Archive.is is an incredible service, but the fact that he's paying $3k-4k/mo out of pocket in expenses & time doesn't strike me as sustainable for the long term.
I'm reminded of some non-profit organization that was forced to shut down their websites because they ran out of money. In retrospect they could've set up a trust fund early on, stuck all their money in there, and then had a perpetual annual income from that for operating costs, instead of spending down. All fundraising could've gone into the trust fund in order to boost the annual budget, etc.
>How much does hosting cost you per month at the moment?
>about ~$2600/mo of pure expenses on servers/domains, not counting “work time”, “buying laptop/furniture”, etc. ($100…300/mo covered by donations + $300…500 by ads)
As someone who has run more or less successful donation campaigns in the past, here are my 2 cents.
There's a huge difference between "please donate to our project" and "it costs us $X/month to run it, we have Y users and managed to collect $Z so far, please donate".
The first one will get you about $1 per 10K-1M users. The second one will get your goal fulfilled, as long as you are reasonable, and have enough users. All it takes is a noticeable message and a way to update it automatically based on the money received.
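The "noticeable message that updates automatically" can be as simple as regenerating a banner string whenever a donation arrives. A minimal sketch, assuming a hypothetical `donation_banner` helper; the figures in the usage line are illustrative (loosely based on the ~$2,600/mo quoted above), and the user count is made up.

```python
def donation_banner(monthly_cost: int, users: int, raised: int) -> str:
    """Build an 'it costs $X/month, we have Y users, we collected $Z' message.

    Hypothetical helper: call it again each time a donation is recorded so the
    banner always reflects the current total.
    """
    # Percentage of this month's goal reached, capped at 100%.
    pct = min(100, round(100 * raised / monthly_cost)) if monthly_cost else 100
    # Amount still missing; never negative once the goal is exceeded.
    remaining = max(0, monthly_cost - raised)
    return (
        f"Running this site costs ${monthly_cost:,}/month. "
        f"We have {users:,} users and have collected ${raised:,} this month "
        f"({pct}% of the goal). ${remaining:,} to go. Please donate!"
    )


# Illustrative numbers only.
print(donation_banner(2600, 1_000_000, 300))
```

Showing the concrete gap, rather than a generic plea, is the whole point of the second kind of message.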
This is a very fair answer, and all "permalinks" are lies. At the same time, I wonder if it might not be possible to have snapshots up on, I dunno, IPFS or torrent sites or something such that when the unavoidable happens, not all is lost.
Depending on the format of the archives (hopefully WARC), the owner could hand the entire archive over to the Internet Archive or Archive Team for ingestion by Wayback Machine.
If the concern is perpetual access to archived content but under your terms, that is where the cost comes in. Somebody somewhere is paying for power, cooling, connectivity, and disks. The Internet Archive estimates it costs them $2/GB to host data uploaded in perpetuity. Please consider donating if you're uploading content for permanent archival and/or deriving value from hosted content.
Pretty much. If you really want to maintain access to old stuff in perpetuity, you have to pay for it yourself by either storing it on your own equipment or paying someone to store it on theirs.
It's not perfect. If you want to store content that the Internet Archive must dark for whatever legal or compliance reasons, you'll have to cover the cost of that storage. The cost of such an operation will always be non-zero.
A system of mirrors prevents a single node going down from taking down the whole system (in theory at least; we've all seen plenty of times where failover goes poorly), but it doesn't do anything to ensure the long-term survival of the system as people lose interest, lose the ability to participate, and sometimes die.
If you were around certain internet forums in the late '00s you might have run into an image hosting platform called WaffleImages, which was created in response to yet another popular free image hosting service locking down embedding and ruining thousands of old posts. The goal was to distribute image hosting among community-operated mirrors, and it worked great for a few years. Over time, though, people lost interest, the rate of new mirrors being added dropped to basically zero, and eventually it fell apart.
The problem isn't a single instance going away. The problem is what happens when for whatever reason the owner stops maintaining the project. This is a common problem and despite all the bluster and buzzwords, the IT community hasn't really found a solution. Torrents are the only kind-of-solution, but they are not ideal for something that needs to be constantly updated.
You can believe that cool URIs don't change, or you can go the IPFS route. Similar to the way torrents stay "healthy" as long as they have plenty of seeders, IPFS resources could live as long as people want them to exist (not sure if that's baked into IPFS, though).
Archive.is still doesn’t work if you use Cloudflare DNS, due to a spat between Cloudflare and the operator. So to me, continuity and reliability are already a big question. It's not only a question of economic sustainability, but also an ideological one: what happens if another similar decision is made to lock out a portion of users?
For reference: the spat is that Cloudflare DNS does not leak the querier's geographic information through EDNS, and the archive.is fellow requires geographic information to provide valid DNS lookups. So he intentionally sends back bad results when 1.1.1.1 queries his nameservers.
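The "geographic information" in question is the EDNS Client Subnet (ECS) option from RFC 7871: a recursive resolver may attach a truncated prefix of the client's address so the authoritative nameserver can pick a nearby answer. 1.1.1.1 simply never attaches it, so the nameserver only sees the address of the Cloudflare resolver. A minimal sketch of the option's wire format, assuming a /24 prefix for IPv4 (a common privacy-preserving choice); `ecs_option` is a hypothetical helper, not part of any DNS library, and it builds only the option, not the surrounding OPT record.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code for Client Subnet (RFC 7871)


def ecs_option(client_ip: str, prefix_len: int) -> bytes:
    """Encode an ECS option carrying only the first `prefix_len` bits of client_ip."""
    addr = ipaddress.ip_address(client_ip)
    family = 1 if addr.version == 4 else 2  # address family: 1 = IPv4, 2 = IPv6
    # Zero every bit past the prefix, then keep only the octets the prefix needs.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    truncated = network.network_address.packed[: (prefix_len + 7) // 8]
    # FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (always 0 in a query), ADDRESS.
    payload = struct.pack("!HBB", family, prefix_len, 0) + truncated
    # OPTION-CODE and OPTION-LENGTH header, followed by the payload.
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


# A resolver that sends ECS would reveal roughly "somewhere in 203.0.113.0/24":
print(ecs_option("203.0.113.77", 24).hex())
```

This is also why the privacy argument cuts both ways: the prefix is coarse, but it still tells every authoritative server roughly where the user is, which is exactly what Cloudflare declines to send.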
I love the site, but his stance on this doesn't really make sense to me, and it's a shame that millions and millions of people use 1.1.1.1 daily and archive.is is the one website that doesn't work for those people.
I'm not sure the term "leak" applies. It's an anti-CDN play. Refusing to use EDNS correctly makes the web slower for a lot of people, and it adds little to nothing to privacy, since the server at the answer IP is going to learn your IP at the next step anyway...
As for why archive.is cares so much...that I don't know. Perhaps they rely on such data to give a fast experience, and are tired of this charade...but that's just speculation.
Cloudflare's edge network is sufficiently dense that ECS data is unnecessary in almost all cases. The requesting data center will be close enough to the client that doing geoip on the source IP will have the same results as using ECS.
There's nothing incorrect about what Cloudflare is doing, EDNS does not require ECS data to be included in requests, but for whatever reason the maintainer of Archive.is decided to block 1.1.1.1 over it.
> EDNS IP subsets can be used to better geolocate responses for services that use DNS-based load balancing. However, 1.1.1.1 is delivered across Cloudflare’s entire network that today spans 180 cities. We publish the geolocation information of the IPs that we query from. That allows any network with less density than we have to properly return DNS-targeted results. For a relatively small operator like archive.is, there would be no loss in geo load balancing fidelity relying on the location of the Cloudflare PoP in lieu of EDNS IP subnets.
> We are working with the small number of networks with a higher network/ISP density than Cloudflare (e.g., Netflix, Facebook, Google/YouTube) to come up with an EDNS IP Subnet alternative that gets them the information they need for geolocation targeting without risking user privacy and security. Those conversations have been productive and are ongoing. If archive.is has suggestions along these lines, we’d be happy to consider them.
> I love the site, but his stance on this doesn't really make sense to me
Would not be surprised if he has some personal axe to grind with CF (they are no sheep either).
Also, I would be wary of overestimating the market penetration of any third-party DNS provider; IIRC Google totally dominates this segment and is still below 10%.
The Android app has 418,000 reviews and over 50 million installs, and the iPhone app has 230,000 reviews. The number of people who use it without an app is probably a lot higher.
I lost faith in Cloudflare when they switched from being a neutral infrastructure service to yet another politically-motivated big tech company. In their blog post around the ban of 8chan (https://blog.cloudflare.com/terminating-service-for-8chan/), they acknowledged that they didn't know if 8chan broke any laws but they nevertheless decided to pass personal judgment based on a vague notion that 8chan "inspired" a shooting. That's quite an unprincipled way to operate a fundamental network utility that backs 10% of the Fortune 1000 and 20% of the top 10000 websites.
archive.is is how I browse text-heavy websites like blogs, news, Twitter, and documentation (outline.com is another option, reader mode yet another). It completely debloats a webpage as it archives it (unlike web.archive.org, say). Suits my purposes just fine. I must note, though, that archive.is (from what I recall) forwards the IP address of whoever initiated an archive request to the origin.
archive.is was also a great mirror of Instagram and LinkedIn for public profiles, but it doesn't archive Instagram anymore.
"There is no Instagram content which don’t need to login.
If you can access the page without login, it is sort of “promo preview“, after few pages accessed this way, they add your IP into “promo is over“ list and will redirect to /login on every future request.
I just have not enough fresh IPs to abuse this mechanism."
This is kind of an antithesis to his message in the post; that nothing is actually permanent, and while many people are concerned about continuity of service, ultimately perfect continuity is impossible, whether that's due to the organization running out of money, bad backup practices and a fire, global warming wiping Ashburn Virginia off the map, or the end of the human civilization by some other means.
I've helped with a couple risk registers at tech companies. Two things I've never seen appear in a risk register: The company runs out of money. Human society is wiped out. I've been laughed at once for bringing up variations on these. They're out of scope; risks stop being a threat when there's no one left to care about them.
I think the goal of keeping an internet-scale level of data accessible and searchable, for longer than one lifetime, is an impossible task. Maybe Archive.org/Archive.is can pull it off; I doubt it. It's an insane amount of data. Most of it is totally pointless, but it's really difficult to pick apart what's useful and what's useless, so you have to keep as much as possible without bias. All of that is on hard disks which violently spin around at 8 meters per second, accessed by software which we all know breaks every day but are too afraid to admit it, over a network of other computers with all the same flaws, distributed globally, yet can be significantly disrupted by one roadside construction worker and a jackhammer.
The internet didn't increase the lifetime of data; it decreased it. Sure, we have far more of it at our fingertips than at any other point in history, but that's not lifetime; that's just volume. And that volume has desensitized us; it's fundamentally impacting our innate biological memory capacity, and the social structures we form around memory. We know the Library of Alexandria existed because people wrote about it; the pages lay for thousands of years; its memory passed verbally from person to person.
If all computers stopped functioning tomorrow (they don't even disappear, they're still there, they just don't work), would the memory of Stranger Things still be known in two thousand years? I doubt it. But if the only thing which offers us a satisfying "Yes" is "we keep the computers running, accessible, indexable, searchable", that seems, at the very least, given the extreme challenges we as a species will be facing over the next century, beyond the scope of human possibility.
The Sun’s ability to function like an Earth-wide EPROM eraser might cause some catastrophic disruption given our reliance on the Internet and computing devices. A large enough geomagnetic/solar storm is not unprecedented.
I think it’s one of those things modern parents are going to have to understand about their brave new world, and teach their modern children. Like look both ways before crossing the street, remember that everything you write on the Internet is permanent, so think before you write, or if you don’t want to do that then at least write it under a pseudonym.
I am not a lawyer but I think it would be legally difficult to make it illegal to record what people say and do in public spaces. That was a right that the entire Western media depended on long before the Internet was invented.
The crux may be whether websites that require you to login to use them, like Facebook, are considered public or not. But anything you say or do in Facebook that you don’t restrict to being viewable only by 1st degree friends is probably considered public.
> There is an interesting conundrum here, when we post to the internet do we also consent to having that information saved for all eternity?
I'm not sure about consent, but presume it will be "stuck" and unremovable from the net once it's out there (so be careful what you disseminate). Some people even go out of their way to make sure certain content will never be forgotten from the web.
Yes. I think people need to understand that anything on the internet is, by default, there forever. Post something privately or behind a password if you don't want everyone seeing it.
> all should follow the orders and those that do not need to be responsible for it
Furthermore the tide will be ordered back, the contents returned to Pandora's box, and universal entropy decreased. Failure to comply will result in a fine.
What blissful times we lived in before the daily drumbeat of “Accomplished young professional discovered to have said offensive things on the internet when they were a dumb teenager, reputation tarred and feathered for the rest of their life.”
We don’t know if it’ll be a lifelong thing, though. I suspect that with all this pushback and emotional exhaustion (and I really do believe it’s emotionally exhausting to constantly be hounding people over their morality on the internet), people will pretty soon just stop giving a shit about what someone said 5 years ago.
It’s a numbers game. Billions of people won’t care, but a few dozen can make enough of a stink that a habitually risk averse institution would rather let someone be canceled than risk the controversy snowballing into something bigger.
Even if we archive everything, hundreds of years from now all of “the world's information” could very well be unusable and unreadable for a variety of reasons (no one remembers how to deal with the file formats, EMP, bit rot). Books, however, will continue to work just fine, as they have for thousands of years.
If we're talking that long of a timescale, how long does your typical book these days actually last? I'm no expert, but it makes me wonder how long consumer paper actually lasts. Reasonable(?) search result below.
About reliable email addresses: I'm using my university alumni "email forwarding for life", but this loses me some emails due to DMARC and friends.
What are the alternatives?
I’ve recently migrated four different accounts, three of them Gmail with their own domains, to Fastmail and I was astonished by how easy the process (that I’d put off for _years_) was. Huge weight off my mind and I’ve been very happy with the service since then.
I see the answer to this question more around backups and helping future people overcome technical limitations (knowledge transfer + data archiving).
All things are ephemeral after a certain point but archiving typically lasts much much longer than the human operators. Likewise documenting the process and barriers to overcome will help people in the future solve the problem (and a broader amount of people).
This doesn't have to be public, just needs a way to become public.
Does anyone know what's up with all the different domains for archive.is?
The blog is at blog.archive.today and calls itself the "archive.is blog", but when I visit archive.is or archive.today, I'm brought to archive.vn. When I click the "archive.today" logo in the header, I'm taken to archive.ph.
As I understand it, archive.today/archive.is are the main domains the service is known under; the others are mirrors selected depending on the country you are located in (because of domain bans in some countries).
The site owner once said archive.today is the domain to use when linking, because it will automatically redirect to the correct one.