This was my approach too and it's been working great. Nowadays data often isn't rendered directly into HTML anymore; it gets fetched from some JSON API endpoint. So I use network monitoring tools to see where it's coming from and then interface with the endpoint directly. I've essentially written custom clients for other people's sites. One of my scrapers is actually just curl piped into jq. Sometimes they change the API and I have to adapt, but that's fine.
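As a sketch, that kind of pipeline can be as small as this (the URL and field names here are made up, not from any real site):

    # Hit the JSON endpoint you found in the network tab
    # and pull out just the fields you care about as TSV.
    curl -s 'https://example.com/api/v1/items?page=1' \
      | jq -r '.items[] | [.id, .name] | @tsv'

Pagination is often just another query parameter, so looping over pages in the shell is trivial.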
> I understand companies can put roadblocks to hinder this
Can you elaborate? I haven't run into any roadblocks yet but I'm not scraping big sites or sending a massive number of requests.
> Can you elaborate? I haven't run into any roadblocks yet but I'm not scraping big sites or sending a massive number of requests.
Cloudflare Bot Protection[1] is a popular one. The website is guarded by a layer of JavaScript that has to execute before the real content is served. Normal browsers run it transparently; a bare HTTP client gets stopped at the challenge. It can be hard to bypass.
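You can usually see this from the command line: fetching a protected page with a bare client returns the challenge instead of the content (the exact status code and headers vary by site and configuration):

    # Inspect only the response headers. A challenged request
    # typically comes back as HTTP 403 with "server: cloudflare"
    # and a "Just a moment..." interstitial instead of the page.
    curl -sI 'https://example.com/'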