
This is something I find a lot of web scraping tools miss. Are there any you'd recommend that specifically deal with things like async JavaScript content loading, or loading content based on what you click on a page (e.g., in Single Page Apps)?



JavaScript-loaded content is easier in most cases. Just look at your browser's network inspector and grab the URL of the request that fetches the data.

Usually the response is JSON and you can ignore the original page. You might have to authenticate or grab session cookies first, but that's still easier than working with the HTML.
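For illustration, here is a minimal sketch of that approach using only the Python standard library. It assumes the endpoint returns JSON and that session cookies are set by an earlier request made through the same opener; the helper names `make_session_opener` and `fetch_json` are made up for this example, and a real site may need extra headers or a login POST.

```python
import json
import urllib.request
from http.cookiejar import CookieJar


def make_session_opener():
    """Build an opener that stores and resends cookies, like a browser session."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch_json(opener, url, timeout=10.0):
    """GET `url` through the opener and decode the response body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener.open(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To use it, open the page that sets the session cookies with the opener first, then call `fetch_json(opener, url)` with the data URL copied from the network inspector. Both URLs depend on the site.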


Playwright. It can be easily used with JS, Python, Go, Java, etc.


Thanks! Is that like using Selenium? (i.e., you have to manage and code the actions yourself)


Yes, quite similar. According to their own description, it is a "library to automate Chromium, Firefox and WebKit with a single API."
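To give a feel for what "coding the actions yourself" looks like, here is a rough sketch with Playwright's Python sync API. It assumes `playwright` and its browsers are installed; the function names and any selectors you pass in are placeholders for this example, not something from a particular site.

```python
def collect_after_click(page, button_selector, item_selector):
    """Click an element, wait for the content it loads, and return the item texts.

    `page` is expected to behave like a Playwright sync-API Page.
    """
    page.click(button_selector)
    # Block until at least one matching element is attached and visible.
    page.wait_for_selector(item_selector)
    return page.locator(item_selector).all_inner_texts()


def scrape(url, button_selector, item_selector):
    """Open `url` in headless Chromium and collect items revealed by a click."""
    # Imported here so the helper above can be exercised without a browser install.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            return collect_after_click(page, button_selector, item_selector)
        finally:
            browser.close()
```

A call would look something like `scrape("https://example.com", "#load-more", "li.item")`, with the URL and selectors swapped for the real ones.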


Thanks! If there are any third-party managed tools to do this, that would be awesome to know about (i.e., where they somehow run common JS functions/site interactions to test for additional content).


Unfortunately, it's a pathological edge case.

Imagine an async-loaded list that keeps loading more content as it comes in, until it displays all of the content available from the backend.

When would you know such a list is finished loading?

This sounds insane, but it's an easy and common pattern for an ambitious UXer to latch onto, and I've seen it in production pages.

(In the event you are a UXer, please include some sort of status update! Even an overlaid spinner that disappears solves the problem.)
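When the page gives no status signal at all, one common workaround is a heuristic: poll the item count and treat the list as done once it stops changing for a few polls in a row. The sketch below is only that, a heuristic. The name `wait_until_stable` and its default values are arbitrary choices for this example, and a slow backend can still fool it.

```python
import time


def wait_until_stable(count_items, quiet_polls=3, interval=0.5, timeout=30.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll count_items() until it returns the same value quiet_polls times in a row.

    Returns (last_count, settled). settled is False if the timeout expired first,
    which means the list may still have been loading.
    """
    deadline = clock() + timeout
    last = count_items()
    unchanged = 1  # the initial reading counts as the first observation
    while unchanged < quiet_polls:
        if clock() >= deadline:
            return last, False
        sleep(interval)
        current = count_items()
        if current == last:
            unchanged += 1
        else:
            # The list grew (or shrank): restart the quiet period.
            last = current
            unchanged = 1
    return last, True
```

With Playwright, `count_items` could be something like `lambda: page.locator("li.item").count()`, where `li.item` is a placeholder selector. If the page does show a spinner, waiting for it to disappear is more reliable than this guess.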



