Hacker News

Very true, they don’t have to give access to their API for free. My question is, why don’t devs use Puppeteer to spin up a Chromium instance and access Twitter’s data that way?



Twitter could add anti-bot protection like Cloudflare’s CAPTCHA. I’ve seen that make a site unusable with Puppeteer. By “unusable” I mean that correctly solving the CAPTCHA either just gives you another CAPTCHA, or gives the “click if you are human” thing, and clicking that just goes to a CAPTCHA or another “click if you are human”.

It didn’t even require doing anything to the page with Puppeteer. Merely using Puppeteer to open a browser window, and then using that window by hand to visit a site behind Cloudflare’s anti-bot protection, was enough to run into the problem.


IIRC, Puppeteer/WebDriver tools generally work by injecting JS through an extension, and their presence can be detected by looking for those objects/functions from JS.
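
For what it's worth, a rough sketch of that kind of check might look like the following. It assumes only two signals: the standard `navigator.webdriver` flag from the W3C WebDriver spec, and the "HeadlessChrome" token that headless Chrome has historically put in its user agent. The `NavigatorLike` type and `automationSignals` function are made-up names for illustration, and real anti-bot services presumably look at far more than this.

```typescript
// Hypothetical minimal shape: only the fields this sketch inspects.
interface NavigatorLike {
  webdriver?: boolean;
  userAgent?: string;
}

// Returns the names of the automation hints found on a navigator-like object.
function automationSignals(nav: NavigatorLike): string[] {
  const signals: string[] = [];

  // navigator.webdriver is defined by the WebDriver spec and is meant to be
  // true when the browser is under automation control.
  if (nav.webdriver === true) {
    signals.push("navigator.webdriver");
  }

  // Assumption: headless Chrome has historically advertised itself with a
  // "HeadlessChrome" token in the user agent string.
  if ((nav.userAgent ?? "").includes("HeadlessChrome")) {
    signals.push("headless-user-agent");
  }

  return signals;
}
```

In a page you would call `automationSignals(navigator)`. An empty array only means neither of these two hints was present, not that a human is driving the browser.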


Isn’t there a stealth plugin that can hide the injected JavaScript?


I actually face the same issue on Firefox with some websites, except I’m not using Puppeteer. I thought it was a glitch with Cloudflare, but I guess not.


That’s part of the reason that “it’s to stop the bots” is BS. Academia and other similar uses will pay or just stop. Bots and other malicious uses will find work-arounds and just change where the game of whack-a-bot is being played.



