I'll definitely investigate using this. I implemented my own MacGyver version of this basic functionality on top of Selenium to grab screenshots for search.marginalia.nu/explore/random, but that script is super sketchy and held together with bubble gum and duct tape. Yours looks a lot better.
By the way, is there a way to extract favicons as well?
No, I've not thought about favicons. That's a really interesting challenge.
I wonder if there's a way to detect favicons just using JavaScript that runs against a page? Not sure if it's easy to tell /favicon.ico apart from the various meta tags.
Would be kind of fun to write JavaScript that runs against the page that first looks for the meta tags, then tries to fetch("/favicon.ico") and returns either the URL or a base64 encoded copy of the image (since the "shot-scraper javascript" command requires you to return JSON).
There are a lot of weird edge cases for favicons. Most browsers fall back to just looking for /favicon.ico if you don't explicitly specify one in the meta tags, and if you do, there are sometimes several different versions.
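A rough sketch of the "look at the tags first, then fall back to /favicon.ico" idea, in case it helps. The function name pickFaviconUrl and the plain {rel, href} objects are made up for illustration; it only picks a URL and does not try the fetch() or base64 step, and it ignores the "several versions" problem by taking the first match.

```javascript
// Hypothetical helper (not part of shot-scraper): choose a favicon URL
// from a page's <link> tags, falling back to /favicon.ico.
function pickFaviconUrl(links, pageUrl) {
  // Keep links that have an href and whose rel contains the token "icon".
  // This matches rel="icon" and rel="shortcut icon", but not
  // rel="apple-touch-icon" or rel="stylesheet".
  const icons = links.filter(
    (l) => l.href && (l.rel || "").toLowerCase().split(/\s+/).includes("icon")
  );
  // Simplification: take the first declared icon, ignoring sizes and types.
  const href = icons.length > 0 ? icons[0].href : "/favicon.ico";
  // Resolve relative hrefs against the page URL.
  return new URL(href, pageUrl).href;
}

// Inside a real page, the inputs could be gathered like this:
// pickFaviconUrl(
//   Array.from(document.querySelectorAll("link[rel]"), (l) => ({
//     rel: l.getAttribute("rel"),
//     href: l.getAttribute("href"),
//   })),
//   document.baseURI
// );
```

The result is a plain string, so it should serialize to JSON without trouble. Whether the fallback URL actually exists on the server would still need a separate request.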
Yeah, maybe it's a pipe dream :-/ But even without them, it looks really useful!
HN post about this from yesterday (which failed to get any traction): https://news.ycombinator.com/item?id=30667588