In the world of SPA (single page applications), headless browser API is super he...

lofatdairy · on Aug 10, 2022

Highly recommend Playwright (if I'm not mistaken, most of the big developers from Puppeteer were hired by MS to work on Playwright). I run into significantly fewer async/await problems with Playwright than I did with Puppeteer, and the codegen tool is super helpful as a first pass.

snehesht · on Aug 10, 2022

Playwright integrates with a lot of different browsers (Chromium, Firefox, and WebKit) compared to Puppeteer, which mainly targets Chrome.
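
To illustrate the multi-browser point, here is a minimal sketch using Playwright's Python sync API. The helper name `page_titles` and the `BROWSER_ENGINES` tuple are made up for this example, and it assumes the browsers have already been downloaded with `playwright install`.

```python
# Engines that Playwright ships drivers for.
BROWSER_ENGINES = ("chromium", "firefox", "webkit")


def page_titles(url, engines=BROWSER_ENGINES):
    """Load `url` once per engine and return {engine name: page title}.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    titles = {}
    with sync_playwright() as p:
        for name in engines:
            # p.chromium, p.firefox and p.webkit share the same launch() API.
            browser = getattr(p, name).launch()  # headless by default
            try:
                page = browser.new_page()
                page.goto(url)
                titles[name] = page.title()
            finally:
                browser.close()
    return titles
```

Calling `page_titles("https://example.com")` would run the same steps in all three engines, which is the kind of check that needs extra setup with a Chrome-focused tool.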

mynameismon · on Aug 10, 2022

Also useful is the ability to open the Network panel to snoop on requests and find the exact API call you need for your task, instead of having to pull in all of the HTML/JS/CSS crap. Since a lot of SPAs have essentially pushed everything behind JSON APIs, all the information is usually one (authenticated) API call away.
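
Once the Network panel has revealed the endpoint, the call can often be replayed without a browser at all. A rough sketch using only the Python standard library follows. The function names are invented for this example, and the bearer-token header is an assumption: many sites authenticate with cookies or custom headers instead, so copy whatever the panel actually shows.

```python
import json
import urllib.request


def build_api_request(url, token):
    """Build a GET request for a JSON endpoint found in the Network panel.

    Assumes bearer-token auth, which is only one of several common schemes.
    """
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def fetch_json(url, token, timeout=10.0):
    """Send the request and decode the JSON body."""
    request = build_api_request(url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

This skips rendering entirely, so it tends to be much faster than driving a headless browser, at the cost of breaking whenever the site changes its private API.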

XzAeRosho · on Aug 10, 2022

Most content-heavy websites that tend to be scraped usually use server-side rendering for this exact reason, and put many obstacles in the way to make sure the data doesn't get scraped easily. See: product price, stock, delivery information.

snehesht · on Aug 10, 2022

If you're interested in running Puppeteer in containers, take a look at chrome-aws-lambda[1] and the browserless Docker container[2].

Not affiliated with browserless, but they do have a free/paid cloud service. https://www.browserless.io

[1] https://github.com/alixaxel/chrome-aws-lambda

[2] https://github.com/browserless/chrome
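
For a sense of what talking to such a container looks like, here is a hedged sketch. It uses Playwright's Python API rather than Puppeteer (to stay in one language), connecting over the Chrome DevTools Protocol. The helper names are made up, and the `localhost:3000` default and `token` query parameter are assumptions about how the container is run, so adjust them to your setup.

```python
from urllib.parse import urlencode


def browserless_endpoint(host="localhost", port=3000, token=None):
    """Build the WebSocket URL of a containerized Chrome.

    Host, port and the optional token parameter are assumptions.
    """
    base = f"ws://{host}:{port}"
    return f"{base}?{urlencode({'token': token})}" if token else base


def fetch_title(url, endpoint):
    """Open `url` in the remote browser and return the page title."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to an already running Chrome instead of launching one.
        browser = p.chromium.connect_over_cdp(endpoint)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

With a container listening locally, `fetch_title("https://example.com", browserless_endpoint())` would run the page load inside the container rather than on the calling machine.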

btown · on Aug 10, 2022

https://chrome.browserless.io/ is perhaps the best technical demo I've ever seen, and shows off Browserless's capabilities amazingly. An incredibly high-quality service and codebase.