Forgive my shallow understanding of block chain, but wouldn't that make the archive immutable? Surely there are times when the Wayback Machine needs to delete snapshots, such as cases of copyright infringement or other illegal activity.
Yes, it would make the archive immutable, but that doesn't prevent the data from being deleted.
A very similar example is found in git repos: while normally you'd have every single bit of data that led up to git HEAD, you can use git in "shallow" mode, which only has a subset of that data. If you delete all but the shallow checkouts, the missing data will be gone forever. The missing data is still protected from being modified by the hashing that Git does - and you're guaranteed to know that data is in fact missing - but that cryptography doesn't magically make the data actually accessible.
> Forgive my shallow understanding of block chain, but wouldn't that make the archive immutable?
Kind of. The current state of the archive is mutable, but changes to that state are logged to an append-only edit history. It's that edit history that is the "blockchain", and starting from a known good state and replaying all those edits must produce the current state. In fact, this is how cryptocurrencies work too: the state is the balances/UTXO set, and the blockchain records transactions, which are effectively just mutations on that state.
In this situation, you'd look at the current state and find the deleted snapshot missing, but the edit log would have an entry saying the snapshot was added (and what its hash was at the time), then another entry saying it was deleted.
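To make that concrete, here's a rough sketch of such an edit log in Python. Everything in it (the `EditLog` class, the "add"/"delete" ops, the entry layout) is made up for illustration and isn't how the Wayback Machine or any real blockchain stores things. It just shows a hash-linked, append-only log whose replay yields the current state.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def entry_hash(prev_hash, op, url, content_hash):
    # Hash the entry together with the previous entry's hash, which links the log.
    payload = json.dumps([prev_hash, op, url, content_hash]).encode()
    return hashlib.sha256(payload).hexdigest()


class EditLog:
    """Hypothetical append-only edit history; entries are never removed."""

    def __init__(self):
        self.entries = []

    def append(self, op, url, content_hash=None):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append({
            "prev": prev,
            "op": op,
            "url": url,
            "content_hash": content_hash,
            "hash": entry_hash(prev, op, url, content_hash),
        })

    def verify(self):
        # Recompute every link; any edited or reordered entry breaks the chain.
        prev = GENESIS
        for e in self.entries:
            expected = entry_hash(prev, e["op"], e["url"], e["content_hash"])
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

    def replay(self):
        # Start from an empty state and apply every edit in order.
        state = {}
        for e in self.entries:
            if e["op"] == "add":
                state[e["url"]] = e["content_hash"]
            elif e["op"] == "delete":
                state.pop(e["url"], None)
        return state


log = EditLog()
snapshot = b"<html>example snapshot</html>"
log.append("add", "example.com/page", hashlib.sha256(snapshot).hexdigest())
log.append("delete", "example.com/page")

current_state = log.replay()  # the snapshot is absent from the current state
history_ok = log.verify()     # but the log still records both the add and the delete
```

After the delete, the snapshot is missing from `current_state`, while the log still holds the hash it had when it was added.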
This is also an issue for major blockchains in deployment now, specifically Bitcoin. There is the potential for illegal content, or links to it, to be stored on BTC’s blockchain [0], and so anyone who holds that blockchain would also possess it.
I believe this would also be an issue for things like Filecoin/IPFS but I’m not sure if the liability issues are different or nuanced.
IPFS works like torrents: users only host things that they choose to, so there's no issue of some people being stuck hosting content they don't want to.
If you put the data itself in the blockchain then that would be true. I'm suggesting putting a hash of the data in a blockchain; you could delete the data and keep the hash in the chain. You couldn't regenerate the hash to check it, which might be a problem, but if the data has been deleted you'd have to accept the hash regardless. It'd only affect that link in the chain. (This is from my limited understanding of blockchain math. I definitely could be wrong.)
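A small sketch of that idea in Python, under the assumption that each link hashes only the previous link plus the data's hash (the `build_chain`, `chain_is_valid` and `check_data` helpers are hypothetical names, not any real blockchain's API). The point is that the chain never touches the data itself, so deleting a snapshot leaves every link verifiable.

```python
import hashlib

GENESIS = "0" * 64  # placeholder previous-link value for the first block (assumption)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_chain(data_hashes):
    # Each link commits to the previous link and to the hash of the data,
    # never to the data itself.
    chain, prev = [], GENESIS
    for dh in data_hashes:
        link = sha256_hex((prev + dh).encode())
        chain.append({"data_hash": dh, "prev": prev, "link": link})
        prev = link
    return chain


def chain_is_valid(chain):
    # Only the stored hashes are needed here, so deleted data doesn't matter.
    prev = GENESIS
    for block in chain:
        expected = sha256_hex((prev + block["data_hash"]).encode())
        if block["prev"] != prev or block["link"] != expected:
            return False
        prev = block["link"]
    return True


def check_data(store, chain, index):
    # True/False when the data is present; None when it was deleted and
    # the recorded hash simply has to be taken on trust.
    data = store.get(index)
    if data is None:
        return None
    return sha256_hex(data) == chain[index]["data_hash"]


store = {0: b"snapshot A", 1: b"snapshot B", 2: b"snapshot C"}
chain = build_chain([sha256_hex(store[i]) for i in range(3)])

del store[1]  # delete one snapshot; its hash stays in the chain
```

Here `chain_is_valid(chain)` still passes after the deletion, and `check_data` returns `None` only for the deleted snapshot; the other two can still be checked against their recorded hashes.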
Paper archives usually contain a ton of copyrighted material, e.g. "John Doe's papers" includes magazines, newspapers, letters written by other people, etc. that are not copyrighted by John Doe.