Archive.today: on the trail of mysterious guerrilla archivists of the Internet (gyrovague.com)
164 points by resolutebat on Aug 5, 2023 | 62 comments



I don't think they will appreciate this article. It is not quite doxxing, but it publishes the results of a hunt for the person's name and location, when it can already be assumed that they don't want that known, given that they have never published it despite being quite high-profile.

The nixos link was edited and removed by the author 3 hours after this submission was posted to HN.

Checking who wrote this blog, the About starts with:

> Jani Patokallio was first bitten by the travel bug at the age of 8 months and hasn't managed to shake it yet. Halfway through racking up 650,000 flight miles

sounds like a nice person (next, they'll tell us how much plastic they bought in a lifetime!), but that aspect aside, I'm not seeing any motive for why archive's personalia should need to be dug into...


archive.today or archive.is - Wikipedia: https://en.wikipedia.org/wiki/Archive.today

Help:Using archive.today - Wikipedia: https://en.wikipedia.org/wiki/Help:Using_archive.today

archive.today - FAQ : https://archive.md/faq

archive.today - wiki : https://wiki.archiveteam.org/index.php/Archive.today

Archive Team wiki : https://wiki.archiveteam.org/

archive.today - Blog : https://blog.archive.today/

Tumblr : https://archive-is.tumblr.com/

Twitter : https://twitter.com/archiveis

  archive.today

  archive.ph

  archive.is

  archive.li

  archive.vn

  archive.fo

  archive.md

  archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion
Launched May 16, 2012; 11 years ago


Thank you. I checked them all, all need the CAPTCHA every single time. I think these are all the same, but on different domains.

Though I am lucky compared to other users, I only need to click a box, and wait ~1.5 secs, no pictures or endless loop.


try switching your dns resolver, as mentioned in thread


I have an endless loop of captchas, it never lets me in!


Me too. I have to disconnect from my VPN hosted at Hetzner to get past the CAPTCHA.


What is the purpose of the mail.ru embedded link on the archive sites?


I think that's code for some Google Analytics-like service (from googling for "top-fwz1.mail.ru/counter"), I mean mail.ru is not only email, but a Yandex/Google-like conglomerate of various web services.


I've also been intrigued about the owner of archive.is and have looked into it a couple of times, but what I managed to find is pretty much all the same stuff as mentioned here.

One interesting thing I'd like to mention is these tweets[1] by archive.is from when he was supposedly questioned about something at the Finnish-Russian border; as a result he blocked the entire site in Finland, although he later lifted the block. I also couldn't find any information about the "Russia vs. http://archive.org case" he mentions in the tweet.

1: https://archive.is/Pum1p


Well, formally, there is a prosecutor office or court decision behind every new block of Russian Censorship Agency, which then merely implements it (and demands the services to remove information or get blocked), but even they stopped pretending they are not in control of rubber-stamping the papers.

https://reestr.rublacklist.net/en/?page=12&q=archive.org

You can see the fine collection of everything from The Anarchist Cookbook to le ironic nasheed remixes, and from exposures of Astral Jews to Alex Jones there:

https://archive.org/details/geo_restricted?tab=collection


> Donations these days are via Liberapay, an obscure French non-profit organization, and YC-backed startup BuyMeACoffee.

I am not sure why Liberapay is qualified as 'obscure'. Their website's "legal" page [0] clearly identifies the organization and its legal representative, while providing contact details. The status of the non-profit organization can be verified in the French government's website [1].

[0]: https://en.liberapay.com/about/legal [1]: https://www.journal-officiel.gouv.fr/pages/associations-deta... - in French


It's obscure because it is not well known, not because it is shady. I had never heard of them before either.


I interpreted the use of the word "obscure" as meaning "little-known" rather than "secretive".

I would expect that only a relatively small proportion of people would have heard of Liberapay, so I think calling them "obscure" is not wrong.


It isn't; it's been around for ages and is specifically a trusted organization handling donations for open source projects.

Chalk the snotty comment, and irrational YC worship, up to your basic herd mentality. The author is, I would bet, a typical HN poster.


The idea that every good project should be a full scale financially stable enterprise with a proper administrative team and dedicated supportive fan club is severely limiting (and is indirectly telling you to know your place in existing chains of power made that specific way). More often than not, services like those are made not by underground kingpins, but by common people who happened to be in the right place at the right time. For example, torrents.ru was once just one of the many regional and global torrent trackers, sometimes run by literal teenagers (albeit that one had the best domain name). Look at it today.

Also, «Маша» (Masha) and «Мойша» (Moishe/Moshe) are completely different names, and I've never ever seen anyone using the former for the latter. Either the author stretches it a bit too much, or the author knows something that should not be publicly revealed in the manner they chose (and the whole post is just an intimidating leak).

Anyway, if the author(s) have successful illegal business, as implied, they shouldn't have any difficulties in acquiring enough spare identities to burn. As a side note, it's quite ironic that “security” is such an idol today that common people need to go out of their way to evade tracking, while even petty internet criminals buy virtual identities in bulk, and have special instrumented browsers to load fake system data with one click.


With archive.today, sometimes IP addresses may stop working. The following ones appear to be still working.

    x=23.137.248.133 # NL
    x=41.77.143.21 # GB
    x=51.38.69.52 # GB
    x=51.79.250.183 # SG
    x=79.133.51.130 # DE
    x=89.253.237.217 # RU
    x=90.156.209.190 # RU
    x=91.193.43.144 # NL
    x=94.140.114.194 # LV
    x=130.0.232.208 # UA
    x=139.99.171.251 # AU
    x=139.99.89.157 # SG
    x=178.17.174.208 # MD
    x=178.250.243.66 # RU
    x=185.101.35.175 # NO
    x=185.125.168.154  # NO
    x=188.143.233.210 # RU
    x=192.124.216.250 # RU
    x=192.210.214.166 # US
    x=193.148.248.205 # NL
    x=193.233.203.196 # MD
    x=217.197.116.88 # RU

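To test, something like

   # Either send a raw HTTP/1.0 request to the IP over TLS ...
   printf 'GET /timemap/example.com HTTP/1.0\r\nhost: archive.is\r\nconnection: close\r\n\r\n'|openssl s_client -connect "$x":443 -ign_eof

   # ... or map the name to the IP (needs root) and fetch with curl over HTTP/1.0, with no User-Agent header
   echo "$x archive.is" >> /etc/hosts
   curl -0A "" https://archive.is/timemap/example.com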
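A rough Python equivalent of the first test above, as a sketch only: it builds the same bare HTTP/1.0 request and sends it to a candidate IP over TLS. The function names `build_request` and `probe` are made up for illustration. Unlike the `openssl s_client` line, this sketch sends `archive.is` as the SNI name, which is an assumption about what the servers expect.

```python
import socket
import ssl

HOSTNAME = "archive.is"  # virtual host used in the commands above


def build_request(path: str, host: str = HOSTNAME) -> bytes:
    """Build the same bare HTTP/1.0 request as the printf one-liner."""
    lines = [f"GET {path} HTTP/1.0", f"host: {host}", "connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def probe(ip: str, path: str = "/timemap/example.com", timeout: float = 10.0) -> str:
    """Connect to ip:443, send the request, and return the HTTP status line.

    Certificate verification is switched off because the goal is only to see
    whether the address answers for archive.is, not to authenticate it.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((ip, 443), timeout=timeout) as raw:
        # Assumption: send archive.is as the SNI name.
        with context.wrap_socket(raw, server_hostname=HOSTNAME) as tls:
            tls.sendall(build_request(path))
            response = b""
            while chunk := tls.recv(4096):
                response += chunk
                if b"\r\n" in response:  # the status line is complete
                    break
    return response.split(b"\r\n", 1)[0].decode("latin-1")

# Example (performs a real network connection):
#   print(probe("23.137.248.133"))
```

Looping `probe` over the list above and catching `OSError` would show which addresses still answer, without touching /etc/hosts.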


In less than 24 hours, all of these except one have stopped working. Numerous HN commenters have an affinity for archive.is, dropping links on countless submissions. These do not work for everyone.


Interesting read. I've thought about this for a while.

My woe with the site is that my connections to any of the clearnet domains seem to get black-holed, or completely blocked by Cloudflare, while using Tor. The onion site works fine for viewing, but to archive pages I need to complete the extremely difficult Cloudflare CAPTCHA.


The captcha page looks like cloudflare, but I don't think they're using cloudflare, haha. They use recaptcha (not sure if that's possible with cloudflare), the `server` header doesn't == 'cloudflare', accessing by direct ip gives "hello world" instead of the "Direct IP access not allowed" cloudflare message, /cdn-cgi/trace isn't accessible.

Not sure why they do that. Is it just because it looks decent, or is it poking fun, maybe because of their issue with 1.1.1.1?
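The three checks described above (the `server` header, the direct-IP response, and `/cdn-cgi/trace`) can be written down as a small heuristic. This is only a sketch: the function names and the example input values are hypothetical, and the signals are the ones listed in the comment, not an official Cloudflare detection method.

```python
def cloudflare_signals(server_header, direct_ip_body, trace_status):
    """Report which of the three observations point to Cloudflare.

    server_header:  value of the `server` response header (or None)
    direct_ip_body: body returned when requesting the bare IP address
    trace_status:   HTTP status code returned for /cdn-cgi/trace
    """
    return {
        "server_header": (server_header or "").strip().lower() == "cloudflare",
        "direct_ip_message": "direct ip access not allowed" in (direct_ip_body or "").lower(),
        "cdn_cgi_trace": trace_status == 200,
    }


def probably_cloudflare(signals):
    """Treat any positive signal as a hint that Cloudflare is in front."""
    return any(signals.values())


# Hypothetical values matching what the comment reports for the archive sites:
observed = cloudflare_signals("nginx", "hello world", 404)
print(observed, probably_cloudflare(observed))
```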


>The captcha page looks like cloudflare, but I don't think they're using cloudflare, haha.

That's amazing, I never bothered to take a look once I saw that page but I did just now, and you're right. Google reCAPTCHA skinned as Cloudflare, hysterical.


He is a king amongst men.

Incredible service to humanity.


Sure, ... unless the person behind the “Denis Petrov” nom de guerre is another Alexandra Elbakyan.


Why is that bad? From searching, she is behind sci-hub?


There's no suggestion that would be a bad thing.

Just highlighting the flaw in your assumption that the anonymous person behind this is male.


The way you phrased it definitely made it sound like you were implying it was a bad thing lol


That'd be the way that you read it, if they were another Alexandra Elbakyan (or Joanna Rutkowska, or Stephanie Wehner, or ... ) then they clearly wouldn't be a King among Men.

Just as I wrote.


The issue is more that your comment can be seen as a response to the /last/ sentence of the original post.


Only to those unaware of who Alexandra Elbakyan is or to those that chose not to investigate.

For those aware the juxtaposition twixt { King of Men } | { Woman who created massive scientific paper archive } tends to dominate.

But sure, not everybody reads things the same way.


No, it is due to the way you phrased the question.


It was a statement, not a question.

That suggests the deeper issue might be language fluency; English can be difficult to parse.


It reads to me the same way xereeto said. That's why I asked. Maybe think about that? Do you have some difficulty in interactions?


> Do you have some difficulty in interactions?

No.

I do have 40+ years of observing commenters on internet forums assuming unknowns are male by default.

Maybe think about that?


Well, one of those you mentioned would.


Both names are unisex


Assuming Russian looking spelling, Alexandra is a female name and Denis is a male name.


Did Alexandra Elbakyan do something wrong? Why wouldn't you want to be compared to them?


> Github ... account called “volth” ... contributed ... to NixOS

Volth maintained the NixOS Perl subsystem:

https://github.com/NixOS/nixpkgs/commits/master?after=1c72dc...

> The obvious denispetrov.com ... programmer ... a New Yorker ... end of a 25-year career and the blog dries up entirely in 2011, so it doesn’t match the place or time

A Perl programmer: http://web.archive.org/web/20050208095206/http://www.denispe...

Archive.is started in 2012, just after that retirement, so why do these not match?


Who is archiving the archive?


What also worries me a bit is that Wikipedia started to use them in their references, to archive paywalled references.

Generally I enjoy archive.today very much, but it seems to be a labour of love which can go away any moment (despite its apparent resiliency), rather than something for the ages...


Wikipedia does not require that references be free to access. Most books, journals and physical newspapers fall into this category. So editors are perfectly entitled to reference paywalled articles. The fact that you can access some of these through Archive.today is really just a nice bonus. I am a bit concerned about the possibility that the service might just vanish someday, but I don't think that's a reason not to use it.


The problem is more that a lot of references will just disappear. It just so happened that earlier today I opened 4 or 5 references for an event that happened around 2009-2012. They all either gave a 404 or just redirected me to the homepage.

This is why I consider these types of archives important: not so much to bypass paywalls, but to ensure content is still available in a decade, or two decades.
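The two failure modes mentioned above (a 404, or a silent redirect to the homepage) are easy to tell apart mechanically. A minimal sketch, assuming you already have the HTTP status and the final URL after redirects; `classify_reference` and its labels are invented for illustration.

```python
from urllib.parse import urlsplit


def classify_reference(original_url: str, status: int, final_url: str) -> str:
    """Label a fetched reference as ok, dead, or redirected to the homepage."""
    if status in (404, 410):
        return "dead"
    original = urlsplit(original_url)
    final = urlsplit(final_url)
    # A deep link that ends up at the site root has most likely been retired.
    started_deep = original.path not in ("", "/")
    ended_at_root = final.path in ("", "/") and not final.query
    if started_deep and ended_at_root:
        return "redirected_to_homepage"
    if 200 <= status < 300:
        return "ok"
    return "unknown"


print(classify_reference("https://example.com/news/2010/story", 200, "https://example.com/"))
```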


The original URLs can go away just as easily as the archive.today mirrors of them, which is why Wikipedia (or any website of record) should contain links to both, IMHO.


Wikipedia - Link Rot : https://en.wikipedia.org/wiki/Wikipedia:Link_rot

Link Rot (link death, Link Breaking, or Reference Rot) : https://en.wikipedia.org/wiki/Link_rot


It's strange to hear the author isn't a fan of cryptocurrency. There are a lot of dubious use-cases for crypto, but facilitating donations for sketchy services is an obvious one.


The person probably has a strong sense of ethics (it would fit the project too).


What is sketchy about it?


Why Archive.today

HN readers, commenters and story submitters are aware of the FAQ entry 'Are paywalls ok?':

"It's ok to post stories from sites with paywalls that have workarounds." : https://news.ycombinator.com/newsfaq.html

This means publishers of paywalled sites can be featured on HN, which is a prime site for garnering potential new paying subscribers and replacing the natural attrition of existing ones.

Some publishers see this as a positive: free samples, a minor amount of the publication's full output, much like a paid agent in a supermarket giving out cheese and/or spiced meat on a stick to encourage new customers.

Other publishers see this as mice nibbling at their cheese.

I use archive.is because almost all of the articles submitted to HN are already archived, so it is a simple copy and paste.

archive.is does not require java to read or to archive an article, it is fast, and the archiving scripts work on most sites.


Interesting read!

For a while now, I've had infrequently occurring arcane cert/SSL issues connecting to archive.ph and its siblings, but trying a couple of links from the article I find I can't get past an endless cycle of "one more step" captcha protection - tried clearing all cookies and revisiting an old url, but to no avail.


archive.today is the "official" name, which redirects to the domain of choice (right now archive.md, at least for me).

archive.is is blackholed in many places.


Change your DNS - you are using CF


Are you suggesting the cert problem is DNS related or the new captcha issue?

DNS was ISP, not 1.1.1.1, and I get the same behaviour after switching to 8.8.8.8.


Archive.* sabotages their DNS records when Cloudflare queries for them. They don't like that Cloudflare doesn't forward EDNS Client Subnet data, so they broke their service for people using 1.1.1.1.

That said, I have the same problem. Even hard coding the IP address I resolved through Google doesn't seem to work. I'm guessing their sabotage may have backfired and is causing issues beyond their intentional scope?
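One way to see whether the answer really depends on the resolver is to ask several resolvers directly and compare the A records. The following is a minimal standard-library sketch, not a full DNS client: it sends a plain A query over UDP (no EDNS options at all) and reads back the IPv4 answers. The function names are made up for illustration.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Encode a minimal DNS query for the A record of `name` (RFC 1035)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)  # RD set, one question
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(message: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = message[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer, always two bytes
            return offset + 2
        offset += 1 + length


def parse_a_records(response: bytes) -> list[str]:
    """Extract the IPv4 addresses from the answer section of a response."""
    _, _, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", response[:12])
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(response, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = _skip_name(response, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", response[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(response[offset:offset + 4]))
        offset += rdlength
    return addresses


def resolve_via(resolver_ip: str, name: str, timeout: float = 5.0) -> list[str]:
    """Ask one specific resolver for the A records of `name` over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver_ip, 53))
        response, _ = sock.recvfrom(4096)
    return parse_a_records(response)

# Example (performs real DNS lookups):
#   for resolver in ("1.1.1.1", "8.8.8.8", "9.9.9.9"):
#       print(resolver, resolve_via(resolver, "archive.is"))
```

If the resolvers return different address sets for the same name, the difference is coming from the authoritative side or the resolver, not from your browser.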


This just helped me realize why I couldn't get to archive.today anymore. However, for me, both Google DNS (8.8.8.8) and Cloudflare DNS (1.1.1.1) resulted in either an infinite captcha loop or a timeout.

I had to switch back to my ISP's DNS to connect successfully.

I did not realize that the choice of DNS resolver could affect access to a website like this. I thought DNS was boring, stable technology. The error conditions weren't even DNS failures (which I would also find surprising from Google or Cloudflare), but a server timeout or, weirder, an infinite captcha loop.


If you use an Apple device and have iCloud Private Relay turned on, one of their providers is Cloudflare and that will cause the same issue.


> I've had infrequently occurring arcane cert/SSL issues

Same error page as https://1.1.1.7/ ?

captcha is CF

Related: Does Cloudflare’s 1.1.1.1 DNS Block Archive.is? (2019)

HN Discussion (209-comments 2023-08-02) https://news.ycombinator.com/item?id=36970702

I just snapshotted this page as a test : https://archive.is/MUhAP = https://news.ycombinator.com/item?id=37009598

Edit of formatting for readability.


Who is your ISP's DNS provider?

Do they run their own resolver, or rely on an extant service?


I do not use my ISP's DNS!

I'm reluctant to disclose my current DNS provider, given that I am able to access archive.is and many are not at this point in time.


I have just checked 8.8.8.8 and they are serving the correct response now. (incorrect earlier)

Edit

I have just checked 9.9.9.9 and they are serving the correct response now. (incorrect earlier)


Incorrect response here with 9.9.9.10 (the unfiltered version of 9.9.9.9), as well as with the corresponding Quad9 DoH.


I'm using Quad 9 and getting the same results. Who is the right DNS provider?


if you can't trust your isp then either find someone that you can trust (by verification) or run your own resolver.

there was a recent move from the eu to have an eu-centric public resolver which brought up the question if/how the big players address country specific filtering requirements which in turn might have shed some light on the fact that gog/cf didn't care; until now.


I run Pi-hole with Unbound - set up is easy and rewards are uncensored DNS, ad-blocking etc..

Oh and - given the right adlists - may also prevent infecting your machine/network/... with malware...

Not to speak of clients which may not be equipped with on-device adblockers, such as TVs etc...



