I can't believe I'm the one to propose this, but being able to unfalsifiably timestamp some data is one of the only actual use cases of blockchains. Archive your data, compute a checksum, store that in a Bitcoin block, and you can prove later on that you actually had the data at that point in time.
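A minimal sketch of the checksum step, assuming SHA-256 as the hash (the comment doesn't name one). The function names `checksum_bytes` and `file_checksum` are made up for illustration. Getting the resulting digest into a Bitcoin block is a separate step not shown here.

```python
import hashlib


def checksum_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of in-memory data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def file_checksum(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file, read in chunks so that
    large archives don't have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Only the 64-character hex digest needs to be published. The archive itself stays private until you want to prove you had it.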
Of course there are other ways to achieve that, such as publishing your checksum to a vast number of neutral third parties: a mailing list, BitTorrent or even a newspaper. You could also rely on a trusted 3rd party who has little incentive to manipulate the data (or would risk a lot if they were caught cheating), such as a bank, an insurer or a notary.
I think archive.org could potentially do something like that by building a Merkle tree containing the checksums of the various pages they've archived during the day and publishing the top hash every day for anybody to archive (or publishing it on a blockchain or whatever, as said above). If later on somebody accuses the archive of manipulation, they can publish the Merkle tree of the day the site was archived, which contains the checksum of the page, and anybody holding the top hash of that day can vouch that it is indeed correct.
It doesn't stop the archive from storing bogus data, but it makes it impossible to "change the past" after the fact, so in this particular situation the journalist could only complain that the archive stored bogus data back in the day and not that it was recently tampered with.
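The daily top hash described above can be sketched roughly like this. Assumptions, none of which come from archive.org: SHA-256 as the hash, leaves that are the checksums of the archived pages, and an odd node at any level paired with a copy of itself (the convention Bitcoin uses). `merkle_root` is a hypothetical helper name. A production design would probably also tag leaves and internal nodes differently so one can't be passed off as the other.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaf_hashes):
    """Fold a list of leaf hashes pairwise until one top hash remains."""
    if not leaf_hashes:
        raise ValueError("need at least one leaf")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pair an unpaired last node with a copy of itself.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Example: three pages archived "today"; only top_hash gets published.
pages = [b"<html>page one</html>", b"<html>page two</html>", b"<html>page three</html>"]
top_hash = merkle_root([sha256(p) for p in pages])
```

Changing any single page after the fact changes its checksum and therefore the top hash, which no longer matches what everybody mirrored that day.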
I immediately had the same idea to use a 3rd party to host checksums; I'm surprised they haven't done this. Blockchain makes a lot of sense from the immutability standpoint, but how would you incentivize people to maintain it? Maybe you can get people to do that for the common good, à la Wikipedia? Not sure about that. Maybe you get Apache to bake it into their webserver and ask people to opt in to dedicating 0.1% of resources to the cause?
I was thinking about using an existing blockchain such as Bitcoin. Of course the drawback then is that archive.org would have to pay a fee every time they submit a new hash. A comment above pointed out that the scheme I described (unsurprisingly) already exists at https://petertodd.org/2016/opentimestamps-announcement
Realistically it might be overkill though. Simply setting up a mailing list where anybody can subscribe and be sent the checksum every day, or even just publishing it at some URL and letting users scrape it if they want, might be sufficient. If we're talking about one checksum every day, it's only a few kilobytes every year; it shouldn't be too difficult to convince a few hundred people and organizations around the world to mirror it.
Merkle trees.
Whoever wants to store a timestamp for some message stores the path from that message to the root of the Merkle tree. Only the root of the Merkle tree of each day needs to be published.
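A rough sketch of what "storing the path" could look like, under the same kind of assumptions as any toy Merkle tree: SHA-256, and an odd node paired with a copy of itself. `merkle_proof` and `verify_proof` are hypothetical names, not any existing library's API. The path is the list of sibling hashes met on the way up, each with a flag saying which side it sits on.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_proof(leaf_hashes, index):
    """Return (root, path) for the leaf at `index`.

    `path` is a list of (sibling_hash, sibling_is_left) pairs,
    ordered from the leaf level up to just below the root.
    """
    if not 0 <= index < len(leaf_hashes):
        raise IndexError("leaf index out of range")
    level = list(leaf_hashes)
    path = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the odd node
        sibling = index ^ 1  # neighbour within the same pair
        path.append((level[sibling], sibling < index))
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return level[0], path


def verify_proof(leaf_hash, path, root):
    """Recompute the root from a leaf and its path; compare to the published root."""
    node = leaf_hash
    for sibling, sibling_is_left in path:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root


# Example: five messages in one day's tree; we keep the proof for message 3.
messages = [b"msg-%d" % i for i in range(5)]
leaves = [sha256(m) for m in messages]
root, path = merkle_proof(leaves, 3)
ok = verify_proof(sha256(messages[3]), path, root)
```

The path grows with the logarithm of the number of leaves, so each user keeps only a handful of hashes while the publisher puts out a single root per day.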