
The problem with using Reddit specifically is that you can't filter by date anymore. Reddit has poisoned their results to show old posts with new dates on Google.



Huh, I wonder, can I download Reddit? Like, all the text posts, ignoring images. I wonder how big of a DB that is and how hard it would be to crawl it myself. It can't be more than a few GB of data. I mean, at this point there is a lot of information there that is just begging to be leveraged.


Pushshift has monthly comment[1] and submission data dumps that you can download. The June 2021 comment dump alone was 20+ GB compressed with Zstandard (.zst).

[1]- https://files.pushshift.io/reddit/comments/
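
For anyone who wants the date filtering back, here is a minimal sketch of how one might scan a dump by date once it is decompressed. It assumes each line is one JSON object with `created_utc`, `subreddit`, and `body` fields, which is how the comment dumps are commonly described; the function name `comments_between` is made up for illustration. Decompression is not shown, since it needs a third-party Zstandard library.

```python
import json
from datetime import datetime, timezone


def comments_between(lines, start, end):
    """Yield (created, subreddit, body) for comments with start <= created < end.

    `lines` is any iterable of strings, one JSON object per line
    (assumed layout of a decompressed dump).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        try:
            # created_utc may be an int or a numeric string, so normalize it
            created = datetime.fromtimestamp(
                int(record["created_utc"]), tz=timezone.utc
            )
        except (KeyError, TypeError, ValueError):
            continue  # skip records without a usable timestamp
        if start <= created < end:
            yield created, record.get("subreddit"), record.get("body", "")
```

Because it is a generator over lines, it never holds the whole file in memory, which matters at 20+ GB compressed per month. You would pass it an open text file (or a decompressing stream) plus two timezone-aware `datetime` bounds.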



