
You could use a proxy.

"Archiving is always lossy" No.




You're talking to the guy who built the best proxy recorder in the archiving industry ;) ikreymer created pywb: https://pywb.readthedocs.io/en/latest/

I think he has more context than any of us on the limits of proxy archiving vs browser based archiving.

But also if you really need perfect packet-level replication, just wireshark it as he said. Why bother with WARCs at all?


pywb has WARC issues too, due to use of warcio:

https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem


Every archiving tool out there makes trade-offs about what is archived and how. No one preserves the raw TLS-encrypted H3 traffic because that's not useful. When you browse through an archiving MITM proxy, there are different trade-offs: there's an extra HTTP connection involved (that's not stored), a fake MITM cert, and a downgrade of the H2/H3 connection to HTTP/1 (some sites serve different content over H2 than over HTTP/1.1, or can detect the difference).

The web is best-effort, and so is archiving the web.
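
To make the trade-offs above a bit more concrete, here is a rough stdlib-only sketch of what a WARC "response" record looks like on disk: a few WARC headers wrapped around an HTTP/1.x-style message. This is not how pywb or warcio build records; the function name `build_warc_response_record` and the sample response are made up for illustration. Note that in this sketch nothing in the record describes the TLS session, the extra proxy hop, or any H2/H3 framing.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_message: bytes) -> bytes:
    """Serialize one WARC/1.1 'response' record (hypothetical helper, illustration only)."""
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        # Length of the record block, i.e. the serialized HTTP message.
        ("Content-Length", str(len(http_message))),
    ]
    head = "WARC/1.1\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Each record is terminated by two CRLFs after the block.
    return head.encode("utf-8") + http_message + b"\r\n\r\n"


# Made-up response, written the way an HTTP/1.x message is serialized.
http_message = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

record = build_warc_response_record("https://example.com/", http_message)
print(record.decode("utf-8"))
```

Everything the record keeps is what you see printed: the target URI, a timestamp, and the HTTP message. Whatever the capture tool did to obtain that message is only preserved if the tool chooses to write it down somewhere else.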



