I created the Events2012 dataset, which contains around 120,000,000 tweets, reduced from an original sample of several billion. It is now over 10 years old, but it is still cited and still requested fairly regularly.

Previously, my response has been to send over the Tweet IDs. Now I have a dilemma: I understand the privacy concerns that led to the ID-only policy, but I also think that Twitter data offers a lot of value to researchers. Interestingly, one of Twitter's policies in this area also requires researchers to regularly remove from their local copies of a dataset any tweets that have since been deleted on Twitter, which of course requires checking each Tweet individually using the API...