Related to this: you can also use my shot-scraper tool to scrape web pages from ...

marginalia_nu · on March 14, 2022

That's fantastic!

I'll definitely investigate using this. I implemented my own MacGyver version of this basic functionality on top of Selenium to grab screenshots for search.marginalia.nu/explore/random -- but that script is super sketchy and held together with bubble gum and duct tape. Yours looks a lot better.

By the way, is there a way to extract favicons as well?

simonw · on March 14, 2022

No, I've not thought about favicons. That's a really interesting challenge.

I wonder if there's a way to detect favicons just using JavaScript that runs against a page? Not sure if it's easy to tell /favicon.ico apart from the various meta tags.

simonw · on March 14, 2022

Would be kind of fun to write JavaScript that runs against the page that first looks for the meta tags, then tries to fetch("/favicon.ico") and returns either the URL or a base64 encoded copy of the image (since the "shot-scraper javascript" command requires you to return JSON).
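A rough sketch of the first half of that idea, in case it helps. One detail: icons are usually declared with link elements (rel="icon" or rel="shortcut icon") rather than meta tags, so that's what this looks for. The names pickFaviconUrl and faviconForCurrentPage are made up for illustration, and this only returns a URL string; it doesn't check that /favicon.ico actually exists or do the base64 part.

```javascript
// Hypothetical helper: choose a favicon URL from a page's <link> elements.
// `links` is an array of { rel, href } objects; `pageUrl` is the page's address.
function pickFaviconUrl(links, pageUrl) {
  // rel is a space-separated token list, e.g. "icon" or "shortcut icon".
  const declared = links.find(
    (link) =>
      link.href &&
      (link.rel || "").toLowerCase().split(/\s+/).includes("icon")
  );
  // Fall back to the conventional /favicon.ico location if nothing is declared.
  const target = declared ? declared.href : "/favicon.ico";
  // Resolve relative hrefs against the page URL.
  return new URL(target, pageUrl).href;
}

// Browser-side wrapper (assumes a DOM): gathers the inputs from the live page.
function faviconForCurrentPage() {
  const links = Array.from(document.querySelectorAll("link[rel]")).map((el) => ({
    rel: el.getAttribute("rel"),
    href: el.getAttribute("href"),
  }));
  return pickFaviconUrl(links, document.location.href);
}
```

Since the result is a plain string it should serialize to JSON without trouble. Keeping the selection logic separate from the DOM lookup also means it can be tried out in Node without a browser.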

marginalia_nu · on March 14, 2022

There are a lot of weird edge cases for favicons: most browsers fall back to just looking for /favicon.ico if you don't explicitly specify one in the meta tags, and if you do, there are sometimes several different versions.

Yeah, maybe it's a pipe dream :-/ But even without them, it looks really useful!

Freeboots · on March 15, 2022

I've found some fairly reliable repos in the past, or https://t1.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&... still works (but for how long?)