
A huge concern that no one seems to be mentioning is that pretty much all academic data sets that used the Twitter API were required by the TOS to be released publicly with only the tweet IDs, not the content of the tweets. Some of these have hundreds of thousands of tweets, and the only way to check the work of the researchers, or to build on it, is to use the API to re-fetch everything yourself.

With this new policy, that becomes practically impossible (the costs would be outrageous for a researcher, let alone an individual).

This will have the effect of essentially destroying 15 years' worth of social science work that was based on Twitter data. It's all gone.

If you're a researcher who has published a data set with only IDs but have a private version with all tweet data I highly encourage you to publish that internal dataset. I'm putting together some hosting for anyone that needs it, feel free to get in touch and we can take it from there.




I created the Events2012 dataset which has around 120,000,000 tweets reduced from an original sample of several billion. It is now over 10 years old, but still cited and still requested fairly regularly.

Previously, my response has been to send over the Tweet IDs, but now I have a dilemma because I understand the privacy concerns that resulted in the ID only policy but also think that Twitter data offers quite a lot of value to researchers. Interestingly, one of Twitter's policies around this also requires researchers to regularly remove tweets that have been deleted on Twitter from the local copies of their datasets, which of course requires checking each Tweet individually using the API...
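For anyone wondering what that deletion check looks like in practice, here is a minimal sketch, not Twitter's actual client. It assumes you supply your own `lookup_existing` function (a hypothetical name) that takes a batch of tweet IDs and returns the subset that still exists; the batch size of 100 is also an assumption, so adjust it to whatever your API access allows.

```python
from typing import Callable, Dict, Iterator, Sequence, Set


def batched(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def prune_deleted(
    local_tweets: Dict[int, str],
    lookup_existing: Callable[[Sequence[int]], Set[int]],
    batch_size: int = 100,  # assumed per-request limit, not confirmed here
) -> Dict[int, str]:
    """Return a copy of the local dataset without tweets that no longer exist.

    `lookup_existing` is a hypothetical callable standing in for the API
    call: given a batch of IDs, it returns the IDs that are still live.
    """
    still_live: Set[int] = set()
    for batch in batched(sorted(local_tweets), batch_size):
        still_live |= set(lookup_existing(batch))
    # Keep only tweets whose IDs were confirmed to still exist.
    return {tid: text for tid, text in local_tweets.items() if tid in still_live}
```

Every tweet in the dataset costs at least a share of one lookup request, so the number of requests grows linearly with dataset size. That is why the API pricing matters so much here.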


Didn't know you were on here. I found your thesis very informative, thanks.


It seems to me that as a subject pool, Twitter is extremely bad (like other options that researchers tend to pick out of frugality, like Mechanical Turk). Why use it at all?


I think the paper showing that Twitter activity slightly predicted stock movements was incredibly interesting and valuable to know. I'm sure there are plenty of other papers as well, like analyses of Arab spring?


An analysis of the Arab spring could also have been made with other sources. It's easy material, that's what Twitter is. Easy, but also misleading.


Twitter was a big driving force in that movement. Being able to track moment by moment activity and movements and how this shaped the Arab spring is not something you could achieve with other sources.


> Twitter was a big driving force in that movement. Being able to track moment by moment activity and movements and how this shaped the Arab spring is not something you could achieve with other sources.

Would anyone have been able to do that without paying? Years ago, someone told me you could get at most 1% of the "firehose" without paying, and if you paid you could get ~10% (however the free sample wasn't a subset of the paid sample, so they still grabbed both and de-duped).
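The grab-both-and-de-dupe step could be sketched roughly like this. It is only an illustration: it assumes each tweet is a dict with an `"id"` key, and `merge_samples` is a made-up helper name, not part of any Twitter library.

```python
from itertools import chain
from typing import Any, Dict, Iterable, List

Tweet = Dict[str, Any]


def merge_samples(free_sample: Iterable[Tweet], paid_sample: Iterable[Tweet]) -> List[Tweet]:
    """Combine two overlapping samples, keeping one copy of each tweet ID.

    The first occurrence of an ID wins, and the result preserves the order
    in which IDs were first seen (free sample first, then paid sample).
    """
    merged: Dict[Any, Tweet] = {}
    for tweet in chain(free_sample, paid_sample):
        merged.setdefault(tweet["id"], tweet)
    return list(merged.values())
```

Because the free sample wasn't a subset of the paid one, the merged set ends up larger than either stream alone.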


You can do a lot with the 1% sample. Indeed, you would probably want to filter it further, both because retrieved streams count toward your monthly tweet cap (2 million/mo on the best non-academic free tier, I think 5m or 10m a month if you are a postgrad w/institutional affiliation), and because the filtering capabilities are pretty rich. An obvious use case is tracking big political or industrial influencers and estimating their reach and peer cohort by looking at the different kinds of engagement they get.


I meant that you could reconstruct the moment by moment events after the fact. That's where the value for social science comes in.


link to the stock movement one?



Twitter is the exclusive source of tweets, so there's that.


> feel free to get in touch

How? I am not a full-fledged network scientist, but I know a whole lot of academics in this space and am good enough to swap tips/find minor math errors. I know quite a lot of people staring down the exact problem you describe.


here's fine? Message me on Twitter, @cguess


I did, feel free to DM. Thanks.


Maybe they should’ve considered whether it was wise to tie their research to a company that might not exist in 5 years


If no researchers studied Twitter, that would be a massive failure to analyze a major piece of online activity. I believe that no one giving even basic thought to what researchers do would endorse what you just said.


More like a huge new source of sociological data collected at massive scale appears, and you'd be a complete idiot not to study it. I feel your comment is like saying astronomy is a bad career choice because pollution and cloudy weather are likely to limit the scope of your research opportunities.


That clearly wasn't even an issue in the past. It's only since Musk took over that there's been any talk of Twitter possibly closing up shop, which, as we all know from Musk's well-known history of spewing nonsense, will not happen.


Other than Twitter losing money every year?


Twitter was profitable in 2018 and 2019.


Barely profitable in 2 years but bleeding money for the rest of its existence including the last 3 years…


Twitter's 2021 wasn't really that bad. They increased revenue about 37% year-over-year to $5 billion, at a loss of $220M.

In the pumped-up investment environment of 2021, that kind of relatively small loss was an acceptable trade-off for such high growth.

The revenue collapse of 2022 is clearly almost entirely due to the new ownership who did their best to destroy advertiser trust, first by spending six months disparaging the company, then by indiscriminately firing moderation and ad sales teams.


> at a loss of $220M.

Due to an $800M lawsuit settlement - without that, they'd have been on $580M profit. Which isn't terrible on $5B revenue.


Why can't it happen?

Twitter has $13bn in debt on its balance sheet thanks to the ridiculous purchase price. Last week's bond payment of $300m is, on its own, half of what their total net debt was pre-Musk ($600m)[0].

Twitter's revenue last year was $5bn. In a year's time it will probably be a fraction of even that; Twitter's user base isn't growing and isn't going to magically become more valuable in an economy where advertisers are cutting back on spending.

[0] https://www.wsj.com/articles/how-elon-musks-twitter-faces-mo...


Grab them while they are hot. One week left


Grab and maybe push to `huggingface::datasets`?


Or a torrent.



