I've had a similar problem. In updating my portfolio site recently, I noticed a vast majority of links were dead. Not just live projects published maybe 3 years or more ago (I expect those to die). But also links to articles and mentions from barely one year ago, or links to award sites, and the like. With a site listing projects going back ~15 years, one can imagine how bad things were.
I ended up creating a link component that automatically points to an archive.org version of any URL I've marked as "dead". Dead links were so prevalent that it had to be automated like that.
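For anyone wanting to do something similar, here is a minimal sketch of the idea in Python, not the actual component. The function name, the `dead` flag, and the optional `timestamp` parameter are all hypothetical. It assumes the usual Wayback Machine URL shape of `https://web.archive.org/web/<timestamp>/<original URL>`; when no timestamp is given, the Wayback Machine generally redirects to a capture it has, but that behavior is an assumption here and not something the sketch verifies.

```python
from typing import Optional

# Base of Wayback Machine replay URLs.
WAYBACK_PREFIX = "https://web.archive.org/web"


def resolve_link(url: str, dead: bool = False, timestamp: Optional[str] = None) -> str:
    """Return the URL to render for a link (hypothetical helper).

    Live links are returned unchanged. Links marked dead are rewritten to
    point at the Wayback Machine, optionally pinned to a capture timestamp
    in YYYYMMDDhhmmss form.
    """
    if not dead:
        return url
    if timestamp:
        return f"{WAYBACK_PREFIX}/{timestamp}/{url}"
    # No timestamp: assumed to let the Wayback Machine pick a capture.
    return f"{WAYBACK_PREFIX}/{url}"
```

Calling `resolve_link("http://example.com/a", dead=True, timestamp="20150101000000")` builds a link pinned to that capture date, so the only per-link maintenance is flipping the `dead` flag.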
Another reason why I've been contributing $100/year to the Internet Archive for the past 3 years and will continue to do so. They're doing some often unsung but important work.
It's _not_ a video recording service. It saves and can replay all network requests during a session (including authenticated requests). It's open source and you can self-host it. I'm not affiliated, though I'm very happy that it exists.
I've updated my portfolio before and noticed that as well. I usually include a screenshot or two when I first add a project, so at least that remains.
If the site goes down later, I just remove the link and don't worry about it. My code from 15 years ago is probably atrocious, so I'll consider it a small blessing :P
I figure you're doing the same as someone that cuts an article that they are mentioned in out of a newspaper and frames it on their wall. I've seen plenty of restaurants and businesses do it.
> It could potentially be considered fair use, since I'm not making a profit and I provide commentary.
Although people throw that term around willy-nilly, in our current framework that means being sued for a minimum of $100,000 per supposed violation, and making your fair use defense in front of a judge.
Youtubers have reported spending $50,000 just to begin talking with lawyers and preparing briefs.
To clear things up: robots.txt can retroactively hide content from the archive. If it's changed back to allowing the archive's crawler, content from before the ban can be accessed again.
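To make the mechanism concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The `ia_archiver` user-agent string and the example rules are assumptions for illustration, and this only shows how a crawler evaluates a robots.txt rule, not how the archive itself decides what to hide.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules that block only the archive's crawler.
BLOCKING_RULES = """\
User-agent: ia_archiver
Disallow: /
"""

# The same site after the ban is lifted: an empty Disallow permits everything.
OPEN_RULES = """\
User-agent: ia_archiver
Disallow:
"""
```

With these rules, `is_allowed(BLOCKING_RULES, "ia_archiver", "http://example.com/page")` is false while the same call with `OPEN_RULES` is true, which is the toggle described above.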
Considering the topic of discussion, how sure can you be that archive.is will still be around in a year? Three years? Ten?
As much as I tried, all I could find about it is that it's run by one guy in the Czech Republic who's paying $2000/month out of pocket for hosting, and apparently dislikes Finland.
http://archive.is/robots.txt doesn't seem too bad, it looks like you could slowly inhale everything... in theory. There are no sitemaps (they're there, but empty placeholders); you have to know the site name to be able to get a workable list.
I think http://www.webcitation.org/ might be better in that regard since it's a consortium of "editors, publishers, libraries". See "How can I be assured that archived material remains accessible and that webcitation.org doesn't disappear in the future?" in their FAQ (http://www.webcitation.org/faq). Although from my perspective it seems to be more geared towards academic use.
archive.is is very nice, but they're a URL-shortener as well, so their links are utterly opaque strings of alphanumerics, whereas the Wayback Machine preserves both the full original URL and the date and time it was captured in the archival URL.
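To illustrate the point about the Wayback Machine's URLs being self-describing, here is a small sketch that recovers the capture time and original URL from an archival link. It assumes the common `https://web.archive.org/web/<14-digit timestamp>/<original URL>` shape; the function name is made up, and variants of the format (shorter timestamps, replay modifiers) are deliberately not handled.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Assumed shape: /web/YYYYMMDDhhmmss/<original URL>
_WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_url(archive_url: str) -> Optional[Tuple[datetime, str]]:
    """Return (capture time, original URL), or None if the URL does not match."""
    match = _WAYBACK_RE.match(archive_url)
    if match is None:
        return None
    captured_at = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured_at, match.group(2)
```

An opaque short link carries neither piece of information, so the same function simply returns `None` for it; if the archive disappears, there is nothing in the link itself to recover.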
archive.is does not crawl automatically; it must be pointed at a page by a user. While this makes it particularly useful for snapshotting frequently changed pages, it is not a replacement for the proper Internet Archive.