
I created the Events2012 dataset, which contains around 120,000,000 tweets, reduced from an original sample of several billion. It is now over 10 years old, but it is still cited and still requested fairly regularly.

Previously, my response has been to send over the Tweet IDs. Now I have a dilemma: I understand the privacy concerns that led to the ID-only policy, but I also think Twitter data offers a lot of value to researchers. Interestingly, one of Twitter's policies around this also requires researchers to regularly remove tweets that have been deleted on Twitter from the local copies of their datasets, which of course requires checking each Tweet individually using the API...
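For anyone curious what that clean-up step looks like in practice, here is a rough sketch in Python. It is not the code used for Events2012. The `lookup_available` callable is hypothetical: it stands in for whatever API client you have, takes a list of Tweet IDs, and returns the subset that still exists. The batch size of 100 is also an assumption, so check the current API limits before relying on it.

```python
from typing import Callable, Dict, Iterable, List, Set


def chunked(ids: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def prune_deleted(
    dataset: Dict[str, dict],
    lookup_available: Callable[[List[str]], Set[str]],
    batch_size: int = 100,  # assumed limit, not taken from the original post
) -> Dict[str, dict]:
    """Return a copy of `dataset` without tweets the lookup no longer reports.

    `dataset` maps Tweet ID -> locally stored tweet. `lookup_available` is a
    hypothetical wrapper around an API client that returns the IDs from the
    given batch which are still retrievable.
    """
    still_available: Set[str] = set()
    for batch in chunked(list(dataset), batch_size):
        still_available |= set(lookup_available(batch))
    # Keep only tweets that the lookup confirmed still exist.
    return {tid: tweet for tid, tweet in dataset.items() if tid in still_available}


if __name__ == "__main__":
    local = {"1": {"text": "a"}, "2": {"text": "b"}, "3": {"text": "c"}}

    def stub_lookup(batch: List[str]) -> Set[str]:
        # Stand-in for a real API call: pretend tweet "2" was deleted.
        return {tid for tid in batch if tid != "2"}

    print(sorted(prune_deleted(local, stub_lookup, batch_size=2)))
```

Even with batching, a dataset of this size would need on the order of a million requests per pass, which is why the requirement is so burdensome.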




Didn't know you were on here. I found your thesis very informative, thanks.



