
The web has nothing built-in for archiving and versioning. It's a gaping hole in this technological platform, one that has been noted and criticized for a very long time. The reality of this problem, however, is vehemently denied by the current generation of "technologists". Of course, those are the same people who get six-figure salaries for managing complexity they themselves create - partly through hyper-centralization. Good and resilient archival, on the other hand, necessarily implies some level of decentralization.

I single-handedly maintain a 14-year-old website that used to be a modestly popular web magazine. It's not very expensive, but it's a pain. The DNS system is horrible, and it's easy to lose domain names to some nonsense. (I lost one that used to be a free second-level domain when it was converted to a paid-for zone. Not a matter of money, just paperwork.) Server management is a time drain. Stuff like adding an SSL certificate to a legacy VPS can lead to a cascade of updates and config changes that can take days to make and test.

BTW, everyone sings the praises of archive.org (and deservedly so), but most people here do not seem to realize that they are also a centralized platform that can collapse and take everything down with them. Who archives the archives, etc. Fortunately, they are not the only one of their kind. Unfortunately, it's all very ad hoc.

If the W3C weren't a bunch of corporate shills, there would have been a standard for creating versioned web archives, like, ten years ago. It's obvious that we need one.




'Archival' is an adjective. The gerund 'Archiving' has done good service as a noun for decades. (I will die on this hill.)


Fair enough. Updated.


> Who archives the archives, etc.

Another archive:

https://www.bibalex.org/en/project/details?documentid=283

There are also partial distributed backups by volunteers:

http://iabak.archiveteam.org/


I'm interested in IPFS for allowing sites to be archivable. If a site I was interested in was hosted on IPFS, then I could mirror their content and help serve it over IPFS. If the original host goes down, I'll still be able to help host the content at its original URL, and the URL will still work for anyone else in the world who tries to follow it. And then maybe people will re-host my own content in the same way, even long after I'm gone, if my content is good enough.
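
Mirroring somebody's IPFS-hosted site is basically one "pin" call against your own node. A rough sketch in Python, assuming a local Kubo (go-ipfs) daemon with its HTTP RPC API on the default port 5001; the CID below is just a placeholder, not a real site:

    # Pin (mirror) a site snapshot on the local IPFS node so it stays reachable
    # at its original content address even if the original host disappears.
    # Assumes a local Kubo daemon; the CID is a placeholder.
    import requests

    CID = "bafybeigdyr..."  # content identifier of the site snapshot (placeholder)
    API = "http://127.0.0.1:5001/api/v0"

    resp = requests.post(f"{API}/pin/add", params={"arg": CID})
    resp.raise_for_status()
    print(resp.json())  # e.g. {"Pins": ["<the CID>"]}

Once pinned, the node fetches the content and keeps announcing it, so anyone resolving the same CID can get it from your node as well as from the original host.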


Interesting post. So are you saying that if there was a good standard for versioned web archives, then you could stop maintaining your website and just point people to the archives?


Yup, that's the idea behind projects like https://github.com/HelloZeroNet/ZeroNet and https://github.com/oduwsdl/ipwb.


I would still maintain the website, but it would be much easier, because I could lean on archival features when that makes sense, instead of trying to keep everything "stable" manually.


Nobody could have predicted the global growth of internet users and the sheer quantity of data being created on a per-second scale. Exabytes when? And then what?


There were plenty of people predicting it, pointing out its deficiencies, and explaining what needed to be done. In terms of very high-level ideas, Alan Kay comes to mind.


Let me guess, you didn't provide references because the sites predicting the growth of the internet were not archived? :-)


Ted Nelson was complaining constantly about the deficiencies. Pretty much nailed the issues too. Unfortunately his solutions were difficult to implement.


If you want a non-centralized solution, check out https://archivebox.io or https://github.com/webrecorder/pywb.

(also there is a standard for web archives: WARC)
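
For what it's worth, WARC is also easy to work with programmatically. A minimal sketch using the warcio library (the URL and payload here are made-up examples):

    import io
    from warcio.warcwriter import WARCWriter
    from warcio.statusandheaders import StatusAndHeaders
    from warcio.archiveiterator import ArchiveIterator

    # Write one captured HTTP response into a gzipped WARC file.
    with open('example.warc.gz', 'wb') as out:
        writer = WARCWriter(out, gzip=True)
        http_headers = StatusAndHeaders('200 OK',
                                        [('Content-Type', 'text/html')],
                                        protocol='HTTP/1.1')
        record = writer.create_warc_record('http://example.com/', 'response',
                                           payload=io.BytesIO(b'<html>hello</html>'),
                                           http_headers=http_headers)
        writer.write_record(record)

    # Read it back: each record carries the original URL and capture date.
    with open('example.warc.gz', 'rb') as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == 'response':
                print(record.rec_headers.get_header('WARC-Target-URI'),
                      record.rec_headers.get_header('WARC-Date'))

Tools like pywb can replay files like this, which is what makes WARC workable as an interchange format between archives.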



