It's simple: use the website. It will never stop working as long as people use it, and you can automate and scrape it just fine. Kind of sucks that you have to go through the effort, but then there is no social media corporation that isn't toxic in some way to its userbase.
I actually don't mind scraping it that much, and even enjoy the adversarial aspects of writing scrapers. But it makes a lot of high-level functionality like filtered streams or historical search much less accessible; I'd probably never have learned a lot of network analysis stuff over the last decade or so if I'd had to pay to access streaming data first. Also, I think it's going to be harder for academic researchers to get institutional approval to scrape adversarially, so it could put a dent in a lot of social science research by forcing people to chase grants instead of focusing on their code.
It's ironic: for a couple of (non-Twitter) projects I wound up scraping because either a) they didn't have an API yet (e.g. early crypto pricing sites) or b) I wasn't confident the API would remain intact over the long term. Kind of depressing.
> I think it's going to be harder for academic researchers to get institutional approval to scrape adversarially
This is a good point about what might happen, but it seems worthwhile to address and fix directly. Personally I don't see why adversarial scraping of a publicly published website should require any more ethical consideration/review than using the suggested API would. Ethical concerns should revolve around humans, not the business desires of non-human entities.
The website is awful. It's horrific. It was the worst site I visited regularly until I started using Nitter instances. I'd rather not know what is going on than go to it. I wish they'd wind the UI back 10 years, to back when it was pleasant to browse.
I look forward to someone teaching Elon Musk what version control is, resulting in him wholesale rolling back Twitter's entire stack to 2008. At last, Twitter is written in Ruby again!
IIRC the Twitter website goes to great lengths to mangle itself, presumably to prevent ad blockers.
Of course this used to have the side effect of breaking significant swathes of basic browser UX, especially in areas of accessibility. I assume they’re better now than they once were, but given Musk’s historical behavior I assume he won’t consider breaking something like accessibility to be bad or problematic.
> given Musk’s historical behavior I assume he won’t consider breaking something like accessibility to be bad or problematic.
I don't see why he would spend more to make the website less accessible. Not fixing bugs, sure, just about everyone does that. But breaking it intentionally costs money.
Teslas aren't THE most accessible (by cost) EV out there, but SpaceX def is.
Supposedly the entire accessibility team got laid off [1], so there might not be anyone left to ensure that changes to existing features don't reduce accessibility, or that new features are accessible in the first place. Not that I'd expect Musk to care a whole lot about concerns raised in that regard, considering that he supposedly went ahead with Twitter Blue despite significant (and well-warranted) concerns from the Trust and Safety team [2]. So Twitter will probably become less and less accessible over time.
Ah yes, because "what's the relative cost and complexity to procure a launch for my satellite?" is famously a real example of an accessibility problem faced by people with disabilities...
> I'm pretty sure you know exactly what the poster meant ...
No, I don't. It obviously is trying to say something about accessibility and SpaceX, but I don't see how the sense of accessibility being discussed in the thread even applies to SpaceX, much less what claim is being made.
Agree. It's quite the jump going from a user-centric website to a space launch company that is used by corporations and governments. Like, okay, I guess SpaceX is extremely accessible compared to other options in the space industry, but why would that have any bearing on how accessible the Twitter website is for you and me day-to-day?
This is a comment thread discussing accessibility at Twitter, in a post about Twitter discontinuing part of its platform. It’s not a place to circlejerk Musk’s achievements, regardless of how much you would want that to be so.
Explain, in exact words, what about SpaceX in any way shape or form would indicate that they know how to handle user a11y and how that would transfer over to Twitter— you know, the thing we’re actually discussing here.
Accessibility is not a by-cost thing, and you aren't paying to break accessibility; you're paying to break scraping, and maintaining accessibility while doing that costs money. It also requires having engineers working to keep the site accessible, but Musk fired them.
I think it’s a terrible move to kill stuff like postybirb that a lot of people use to make posts on multiple websites at once. They’ll have to either change their workflow to make a tweet (annoying) or just abandon Twitter altogether. It just makes the website worse for practically no benefit.
Does not take much effort. Below is an example using curl. For reading Twitter feeds I just get the JSON and read the "full_text" objects. I have a simple custom program I wrote that turns JSON of unlimited size into something like line-delimited JSON so I can use sed, grep and awk on it, but HN readers would probably prefer jq. For checking out t.co URLs I use HTTP/1.1 pipelining.
Usage for reading is something like (but not identical to) the sketch below.
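A minimal sketch, assuming the guest-token flow the website's own Javascript uses; the bearer value, endpoint and screen name here are placeholders rather than the exact ones, and which endpoints accept a guest token has changed over time:

    # sketch only; bearer token is a placeholder for the public one
    # shipped in the web client's JS
    BEARER="AAAA..."
    # fetch a guest token the same way the website does
    GUEST=$(curl -s -X POST -H "Authorization: Bearer $BEARER" \
      "https://api.twitter.com/1.1/guest/activate.json" | jq -r .guest_token)
    # fetch a user's tweets as JSON and print every "full_text" field
    # (endpoint illustrative; tweet_mode=extended is what yields full_text)
    curl -s -H "Authorization: Bearer $BEARER" -H "x-guest-token: $GUEST" \
      "https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=example&count=50&tweet_mode=extended" \
      | jq -r '.[].full_text'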
"ahref" is just a script turns URLs on stdin into simple HTML on stdout
Alternatively, if I do not trust the URLs, I might use a script called "www" instead of ahref. It takes URLs on stdin and emits archive.org URLs wrapped in simple HTML on stdout, using the IA's CDX API.
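A rough sketch of what such a "www" script could look like; the script body is illustrative, but the CDX endpoint and its JSON output format (a header row followed by data rows) are the documented ones:

    #!/bin/sh
    # read URLs on stdin; for each, ask the Internet Archive's CDX API
    # for the most recent capture and emit a simple HTML link to it
    echo "<html><body>"
    while read -r url; do
      # output=json returns a header row, then data rows;
      # limit=-1 asks the CDX server for the last (newest) capture
      row=$(curl -s "https://web.archive.org/cdx/search/cdx?url=$url&output=json&limit=-1")
      ts=$(printf '%s\n' "$row" | jq -r '.[1][1] // empty')
      orig=$(printf '%s\n' "$row" | jq -r '.[1][2] // empty')
      [ -n "$ts" ] && printf '<p><a href="https://web.archive.org/web/%s/%s">%s</a></p>\n' \
        "$ts" "$orig" "$url"
    done
    echo "</body></html>"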
There's no way I would use the Twitter website as it requires enabling Javascript and not for the user's benefit.
This solution isn't pretty but I can easily keep tabs on Twitter feeds without any need for a Twitter account, a Twitter "API key" or a so-called "modern" browser.
You can scrape anything you see in the UI (and sometimes stuff you cannot see). Twitter makes almost no effort to stop people from using its internal APIs, which is why the claim that discontinuing the free public API is meant to stop malicious bots is pretty laughable. Unless they seriously increase their ability to detect non-approved clients on the internal API, it would take any malicious actor all of a few hours to transition to using the internal API for whatever they want. Honestly, I assumed most bad actors were already doing this, since things like spamming were already against the ToS of the public API.
Twitter is not alone in using GraphQL this way, having all website visitors use the same token or key. Other websites do it, too, as shown below.
Using GraphQL like this can be an effective dark pattern because, to anyone using a "modern" browser that "tech" companies control, it makes it seem like the text of the website cannot be retrieved without Javascript enabled. That's false, but it nonetheless gets people to enable Javascript because the website explicitly asks them to. Then the website, i.e., the "tech" company, can perform telemetry, data collection, surveillance, and other shenanigans.
Sometimes this practice might not be a deliberate dark pattern; it might just be developers using Javascript gratuitously. For example, HN search, provided by Algolia, works the same way: HN puts URLs with pre-selected query terms and a public token ("API key") on the HN website, and everyone that uses those URLs uses the same key.
Unlike Twitter, HN itself does not ask anyone to enable Javascript. The website works fine without it, including the Algolia search, as shown below.
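For example, using the public REST endpoint documented at hn.algolia.com/api (rather than whatever the search frontend itself calls behind the scenes):

    # search HN from the shell: no Javascript, no personal key required
    curl -s "https://hn.algolia.com/api/v1/search?query=twitter+api&tags=story" \
      | jq -r '.hits[] | "\(.points)\t\(.title)"'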