Most relevant piece but the whole comment is worth a read:
> Archive.is’s authoritative DNS servers return bad results to 1.1.1.1 when we query them. I’ve proposed we just fix it on our end but our team, quite rightly, said that too would violate the integrity of DNS and the privacy and security promises we made to our users when we launched the service.
> The archive.is owner has explained that he returns bad results to us because we don’t pass along the EDNS subnet information. This information leaks information about a requester’s IP and, in turn, sacrifices the privacy of users.
Honestly, it's that type of thing (the frankness, the presence on HN, the willingness to participate, the principled stand on privacy) that got me into Cloudflare products. I now generate hundreds of dollars per month in revenue for them, and that will likely be thousands in the next year or two. His time and effort on HN directly led to customer acquisition and revenue.
That said, I do worry about the incentives Cloudflare has toward their big customers. CF is a great tool for site owners, but like any tool it has the potential to become a great evil (against the user) if the principles ever wane. It's already being used by a lot of sites to make life a living hell for people behind a VPN. As a site owner I absolutely get it: practically zero of my legitimate traffic comes from VPNs (our main demographic tends to skew older and much less technical than the average consumer), but all of the automated attacks against me do. Balancing freedom and rights is hard, but I deeply appreciate the thoughtfulness and principles that CF has displayed over the years.
In this particular case we truncate EDNS to protect the privacy of users because we believe 1) privacy is a fundamental human right; and 2) the original sin of the Internet is that IP addresses are too closely tied to the identities of individuals and services. Truncating EDNS is trying to honor #1 and overcome #2. So is our work on protocols like Oblivious DNS. This work, frankly, upsets some of our customers or potential customers (like Archive.is). But it’s the right thing to do for the long term health of the Internet.
> This work, frankly, upsets some of our customers or potential customers (like Archive.is).
That's a bit unfair, don't you think?
From what I remember of the saga, the original reason for Archive.is's block is that they run their own CDN, and by not knowing the location of the user, they can't determine the closest server to respond with.
So the alternative viewpoint is that Cloudflare is being anti-competitive by technically preventing other CDN providers from working.
Disclosure: I'm a happy Cloudflare user, but all in all I think Archive.is service is far more fundamental for the internet (especially as it's 100% free!). So I would really appreciate if you could figure out a way of working together. Until then, 8.8.8.8 it is!
> they run their own CDN, and by not knowing the location of the user, they can't determine the closest server to respond with.
I feel like the more reasonable answer here is to just let the user take the latency hit. Surely requests being somewhat slower is preferable to requests being outright bitbucketed, right?
The current situation seems to be that for these users it doesn't work at all (either a timeout or an infinite captcha loop); it looks like an archive.is issue, and no explanation is given. So avoiding a service disruption that looks like an archive.is issue does not seem to be the goal.
Forgive my naivety, but can you not just ping several servers and return the best? Could you not even guess first and then asynchronously perform this and then re-route or do so on the next user click? I am not an internet person so this may be a very dumb question.
For a site with longer-lived sessions (e.g. video on demand, gaming etc.) which tolerate a bit of startup delay/inefficiency that can definitely be done.
But for a site that essentially tries to serve you static content as quickly as possible and mostly all at once, that would probably introduce more overhead than it's worth.
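The "ping several servers" idea from the question above could be sketched roughly like this (a hedged sketch: the hostnames are hypothetical, and a real client would guess first, probe asynchronously, and cache the result rather than measure on every request):

```python
import socket
import time

def probe(host: str, port: int = 443, timeout: float = 1.0) -> float:
    """Measure TCP connect time to one candidate endpoint (inf on failure)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return float("inf")

def best_endpoint(latencies: dict) -> str:
    """Pure selection step: the endpoint with the lowest measured RTT."""
    return min(latencies, key=latencies.get)

# Hypothetical CDN nodes; serve from a default node first, then re-route
# once measurements arrive.
candidates = ["eu.cdn.example.net", "us.cdn.example.net", "ap.cdn.example.net"]
# latencies = {host: probe(host) for host in candidates}
# print(best_endpoint(latencies))
```

The overhead argument above is about exactly this: the probes themselves cost round trips, which is wasted time for a one-shot static page.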
In the latter case, that seems like it just wouldn't be such a big deal, right? Since the hit would only happen user side and be a small percentage of the user's time on the site?
I get that they don't want to "take the blame", but it seems like both parties are performing reasonable actions that butt heads, and one party resolves that by simply not performing the service. To me that feels like a worse outcome than slow service, as it just looks like the site is down.
The next naive question I have is about the response of truncation. I understand Cloudflare is preserving privacy. Archive says that privacy is preserved because they truncate the PII. Is this truncation verifiable in the request from Cloudflare? If not, then this seems like an unreasonable expectation ("just trust me bro"). Again, personally I'd rather have the latency hit and I'm not sure I'm seeing a good argument against this.
> In the latter case, that seems like it just wouldn't be such a big deal, right? Since the hit would only happen user side and be a small percentage of the user's time on the site?
True, but it's still the difference between being able to load all embedded resources from a server close to the user or potentially having to haul all of that across an ocean, considering TCP congestion window scaling (which is sensitive to round trip times) etc.
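To put rough numbers on the congestion-window point: a back-of-the-envelope sketch assuming textbook slow start with a 10-segment initial window (real stacks differ, and this ignores loss, pacing, and handshake costs):

```python
import math

def slow_start_time(size_bytes: int, rtt_s: float,
                    mss: int = 1460, init_cwnd: int = 10) -> float:
    """Rough lower bound on transfer time while the congestion window
    is still growing: cwnd doubles each RTT from init_cwnd segments."""
    segments = math.ceil(size_bytes / mss)
    rounds = math.ceil(math.log2(segments / init_cwnd + 1))
    return rounds * rtt_s

# The same ~1 MB page takes 10x longer across an ocean purely because
# every doubling of the window costs one full round trip.
print(slow_start_time(1_000_000, 0.015))  # nearby PoP, ~15 ms RTT
print(slow_start_time(1_000_000, 0.150))  # cross-ocean, ~150 ms RTT
```

Under these assumptions the round-trip count scales with the log of the object size, so RTT dominates for exactly the "serve everything at once" sites being discussed.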
All that said, based on a purported comment by the maintainer of archive.is, the aim of their CDN is actually not improving responsiveness, but delaying legal/law-enforcement responses: https://news.ycombinator.com/item?id=36971650
> Archive says that privacy is preserved because they truncate the PII.
Personally, I don't have a lot of sympathy for either party here:
I think, especially given the comment linked above, Archive's latency/efficiency concerns are just pretext for quite different concerns of their own (having to deal with law enforcement).
And on the other hand, while Cloudflare's EDNS subnet truncation might help user privacy in a few edge cases (as many have said here, the visited site will get the user's IP as soon as they connect to their servers!), it also makes it that much harder for CDNs other than Cloudflare to efficiently serve content using DNS-based routing and forces them to also use Anycast, which is much harder to do.
You shouldn't assume the origin server is set up for direct traffic, either in terms of load management or security (access to the origin might be restricted to CDN IPs on their ACL).
> You shouldn't assume the origin server is setup for direct traffic
You don't need to make any such assumption; the above point stands even in the case of simply hitting the "wrong" (i.e. geographically suboptimal) CDN endpoint.
They have their own Autonomous System with their own anycast IP addresses.
That is quite expensive for an indie project.
Not to mention legal support for compliance in every country of presence. Blocking the 0.x% of visitors coming from Cloudflare is much cheaper for a small project than going down that road.
> They have their own Autonomous System with their own anycast IP addresses.
> That is quite expensive for an indie project. Not to mention legal support for compliance in every country of presence. Blocking the 0.x% of visitors coming from Cloudflare is much cheaper for a small project than going down that road.
I don't buy this. I'm running my own AS and anycast services for £10pm (my ISP are sponsoring my allocations from RIPE).
Also, it feels like Cloudflare's DNS service is more than just 0.x% of the internet....?
£10 a month for an AS with an IPv4+IPv6 subnet plus worldwide POPs that allow you to advertise your subnets over BGP? How did you pull that off? I researched this a while ago, and just the IPv4 subnet alone was at least 10x that amount if you're OK with leasing it from less reputable sources.
You're right, if you've got a legacy internet requirement then that adds another grand a year to your costs. But I disagree that it's "quite expensive for an indie project", especially one that's so popular it needs to run its own CDN.
> In this particular case we truncate EDNS to protect the privacy of users
In other words, you want the data, but prevent others from getting the same advantage?
This is exactly what archive.is is doing, and what you're stomping your collective feet about.
> because we believe 1) privacy is a fundamental human right; and 2) the original sin of the Internet is that IP addresses are too closely tied to the identities of individuals and services.
If you cared about that, you wouldn't block Tor or send us through captcha hell just to pull a single webpage.
> Truncating EDNS is trying to honor #1 and overcome #2. So is our work on protocols like Oblivious DNS. This work, frankly, upsets some of our customers or potential customers (like Archive.is). But it’s the right thing to do for the long term health of the Internet.
'Upsets'? Wow. Talk about a "Rules for thee but not for me."
The user will navigate to the site and reveal their own IP anyway. You achieved basically zero increase in privacy while making any competitor have problems with any of their users who use 1.1.1.1.
And any malicious client that tries to leak data via DNS can just ask for a DNS record like my-ip-is-7.8.9.0.example.com and completely bypass that privacy "enhancement".
Sorry, but the "privacy" here looks like a smokescreen to stifle competition.
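To illustrate the leak described above: encoding data into a hostname takes only a few lines (a sketch of the general technique, with example.com standing in for a hypothetical attacker-controlled zone):

```python
import base64

def exfil_hostname(secret: str, zone: str = "example.com") -> str:
    """Pack arbitrary data into a DNS label; base32 keeps it hostname-safe.
    DNS labels max out at 63 bytes, so longer payloads would need chunking."""
    label = base64.b32encode(secret.encode()).decode().rstrip("=").lower()
    return f"{label}.{zone}"

# Merely resolving this name delivers the payload to the zone's
# authoritative server, no matter how much ECS data the resolver strips.
print(exfil_hostname("ip=7.8.9.0"))
```

The resolver's truncation is irrelevant here because the query name itself is the channel.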
The concern isn't that the website will know the IP, it's that every single entity on the network between Cloudflare and the authoritative DNS server (most or all of which will not be operated by the website) will know it.
It still may not be the right decision, but it's important to frame the trade-off correctly.
Have you thought about spoofing the EDNS subnet to (a less truncated version of) the nearest Cloudflare edge node's IP? This would be no more of a privacy leak than the traffic from your edge to origin servers, while providing enough geographic information to keep latency low. That would at the very least comply with the spirit of the RFC, no?
Sure, we’re to believe that, given that you’re trying to lock out Linux users[1] (which still isn't resolved) and pushing for device attestation[2] to lock out rooted devices at the same time.
> given that you’re trying to lock out Linux users
I had that issue with the Cloudflare bot captcha when trying to access a web novel website. It would loop infinitely on the "please click the checkmark to confirm you're human" thingamajig.
Initially I thought it was because I was a linux user, but I tried to browse the same website on Google Chrome and the issue went away. They were not discriminating against linux, but against Firefox, which is just as bad, if not worse.
I tried everything on FF: deleting all cookies/storage/history, disabling addons, etc. It would still do this on a pristine FF. Ultimately, I admit, not being able to access websites did manage to encourage me to uninstall FF. Despite it not being FF's fault, I'm tired of dealing with this kind of cr*p.
Your link for [1] shows that CF responded, confirmed the bug, and said they fixed it. If that’s not actually the case, have you tried engaging with them again? They seemed very responsive the first time.
CF (like many other companies) are responsive only when the complaint is posted on HN. Regardless, the issue wasn't solved, it came back within a few hours.
I've tried raising this issue on their forum, where I've failed to get the attention of the engineering teams, and while posting the ray ID should be sufficient, all you'd really get is clueless, unpaid volunteers asking you questions in circles like "what website do you see this on" (everywhere), "are you using adblock" (no, and Adblock has never blocked their Turnstile scripts) and "what's your user agent?" (the default Chromium one).
If I had to hazard a guess, it's their bot management script seeing "Linux" in the user agent and detecting missing video codecs (which is par for the course for standard Chromium builds), and concluding it's a headless browser. Between the fact that differences between the JS runtimes of Chromium and headless Chromium are very small these days, and the fact that ClientHello permutation has destroyed bot management vendors' ability to distinguish different browser builds, they decided blocking all Linux users on Chromium was fair enough.
Are you sure this is a widespread, universal bug? Are you sure all Linux Chromium users are affected?
I get that it's a frustrating situation but you're viewing CF in the worst possible light ("trying to lock out Linux users" assumes an intent not on display) and I think it's counterproductive to success.
Pretty sure you are wrong - I run Linux on both my desktop and laptops, and moreover everyone on our engineering team runs Linux as their main OS as well. We haven't seen any Cloudflare-specific problems.
(I have no doubt you are seeing the problem on your PC; but generalizing a single data point to all Linux users just screams "technical incompetence" and makes me want to ignore the post.)
You don't have to "hazard a guess", one of their engineers gave you their email address in that other thread. They also invited you to their discord. Have you tried talking to someone directly at the source?
There’s a fair argument to make against device attestation, but casting the exclusion of rooted devices as the goal of the policy rather than a side effect is a bit disingenuous.
I have huge problems with Cloudflare, but this comment is dishonest.
[1] "Hello, Benedikt from Cloudflare and the Turnstile Team here. Thanks you so much for the report. We looked into this report and identified that there was some false positive and cleared the signal. We have investigated this report and the issue should be fixed. Please reach out to me benedikt@cloudflare.com or at our Cloudflare Turnstile Discord, if you are still encountering problems."
[2]
> Servers commonly use passive and persistent identifiers associated with clients, such as IP addresses or device identifiers, for enforcing access and usage policies. For example, a server might limit the amount of content an IP address can access over a given time period (referred to as a "metered paywall"), or a server might rate-limit access from an IP address to prevent fraud and abuse. Servers also commonly use the client's IP address as a strong indicator of the client's geographic location to limit access to services or content to a specific geographic area (referred to as "geofencing").
> However, passive and persistent client identifiers can be used by any entity that has access to it without the client's express consent. A server can use a client's IP address or its device identifier to track client activity. A client's IP address, and therefore its location, is visible to all entities on the path between the client and the server. These entities can trivially track a client, its location, and servers that the client visits.
> A client that wishes to keep its IP address private can hide its IP address using a proxy service or a VPN. However, doing so severely limits the client's ability to access services and content, since servers might not be able to enforce their policies without a stable and unique client identifier.
> This document describes an architecture for Private Access Tokens (PATs), using RSA Blind Signatures as defined in [BLINDSIG], as an explicit replacement for these passive client identifiers. These tokens are privately issued to clients upon request and then redeemed by servers in such a way that the issuance and redemption events for a given token are unlinkable.
Please see [1] regarding the concerns around attestations and PAT and [2] for what has happened outside HN, which a simple reading of that thread wouldn't otherwise suggest.
So Stavros [1] indicates that archive.is needs that EDNS data to protect themselves against CSAM/ISIS material based attacks, and they suggested solutions but CF refused to cooperate. Is this true?
Exchange Online determines the closest EXO front door by the location of the resolver. Other M365 services do not use this method; it is an artifact from Exchange Server.
EDIT: This is outlined in [0], although it doesn't go into the depth I wish it did.
> By providing local Internet egress and by configuring internal DNS servers to provide local name resolution for Microsoft 365 endpoints, network traffic destined for Microsoft 365 can connect to Microsoft 365 front end servers as close as possible to the user.
More of a meta comment, but thank you for your willingness to upset some customers and potential customers. Having and standing by principles can be damn inconvenient at times, but the world is a much better place because of it.
Also, looking up DNS in one direction and browsing (say over a VPN) in another. The destination site doesn’t always get the same IP that the DNS request gets.
> thank you for your willingness to upset some customers and potential customers
Or, thank you for wasting your customers time attempting to figure out why one or more sites aren't responding appropriately on your network while they work on other networks.
Not everyone is clued into EDNS or why archive.is doesn't function with CF.
I mean, it's archive.is that is intentionally serving an incorrect DNS record (pointing back at Cloudflare's IPs) when it gets a DNS query that every other resolver handles just fine. They may have legitimate grievances with the info being dropped, but in the end they're the ones breaking their own traffic.
That seems like your much stronger older brother hitting you with your own arm and asking "why are you hitting yourself" over and over again though. Cloudflare is standing their ground with their morals, and Archive is standing their ground with their morals. Which one is right is for you to decide.
Without protecting themselves, archive.is wouldn't exist.
And given that it is the subnet number being sent, NOT the full IP address as people here claim, the privacy concern is fairly low (CF knows your IP address in order to deliver the DNS answer back to you, and archive.is knows your IP address when you request resources).
I'll take the performance improvement that EDNS client subnet can provide.
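For reference, RFC 7871's ECS option carries a client prefix with the host bits zeroed, not a full address. A small sketch of what an ECS-forwarding resolver would send upstream (the /24 default is the common choice for IPv4; 1.1.1.1 sends no client prefix at all):

```python
import ipaddress

def ecs_prefix(client_ip: str, prefix_len: int = 24) -> str:
    """The source prefix an ECS-forwarding resolver would send upstream:
    the client's network with the host bits zeroed out (RFC 7871)."""
    net = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    return str(net)

print(ecs_prefix("203.0.113.57"))      # 203.0.113.0/24: one of 256 addresses
print(ecs_prefix("203.0.113.57", 20))  # 203.0.112.0/20: a coarser ~4k block
```

So the authoritative server learns roughly where the client is, not who the client is; how identifying that is depends on how small the forwarded prefix gets.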
> In this particular case we truncate EDNS to protect the privacy of users because we believe 1) privacy is a fundamental human right; and 2) the original sin of the Internet is that IP addresses are too closely tied to the identities of individuals and services.
Your beliefs certainly aren't reflected in how you treat users. It's been a good while since I've been able to visit any cloudflare protected site using Tor. Your broken systems keep presenting me with infinite checkboxes that do absolutely nothing.
If you want to block people who truly believe that privacy is a fundamental human right, at least have the decency to be honest. Tell Tor users that they are permanently blocked so that they don't waste their time clicking on pointless checkboxes.
This, and Warp's insistence on passing my origin IP to my destinations, are two things I wish I could customize when using Cloudflare. Before going corporate, OpenDNS had something similar, and you could even set up custom behaviors per origin IP (for home and work), which I miss. These are good defaults, but I'd like to be able to change them while also allowing particular domains, like particular CDNs, to get my full EDNS info.
While I'm here, I'd also like to layer Zero Trust and Warp+ so I can toggle my internal network while staying on Warp+.
Also, the separation in Zero Trust and tunnels between routed DNS names and private IPs is very confusing. Why do I need both?
Custom DNS entries for Zero Trust DNS would be nice, so I could point internal domains to the external routing without having to have public DNS, or even have the domains match.
> Warp's insistence on passing my origin IP to my destinations
IIRC WARP was only able to forward your origin IP to websites using Cloudflare. Then, as of Aug 2022, their FAQ[1] says your origin IP is hidden regardless of which website. Their IPs do reveal your geolocation though.
There was a bug[2] that revealed your IP to select websites; that seems to have been fixed by Nov 2022.
Disclaimer: I’m not knowledgeable enough to test every possible IP leak mechanism (like WebRTC), so I didn’t do that. I’m basically taking their word for it.
I wish we (and the rest of the world) were as organized as it’d take to support conspiracy theories. Easier explanation probably is that I just woke up about 90 minutes ago. HN is about #9 on the list of things I check every morning. Saw this post and thought: “Oh not this again!” Pinged our team to get latest status since I hadn’t thought about archive.is in several years. And then jumped into the comments.
But your story is way more fun, makes us look way more organized, and, frankly, would give me hope that there are really smart people out there organizing and planning the chaos. So let’s go with yours.
I’m smiling thinking of the reaction I’d get on our marketing team if I suggested: “Hey, I have an idea, let’s post a negative story on Hacker News so then people in the comments will say things and I can reply.”
John (our CTO) and I care about HN because we’re engineers at heart. Unfortunately, most of the people we sell to today have never even heard of, let alone participated in, this community. But, over time, as more of the people in this community get C’s in front of their titles and manage hundred-million-dollar IT budgets, I hope you’ll stay engaged here. More engineers in the C-suite would be great.
Though what you’ll find, sadly, is there’s still lots and lots of chaos.
There is a legend about an online store around the turn of the 2000s that behaved very badly towards customers, and on forums everyone discussed how bad they were.
But those mentions performed very well for SEO and sales.
It's a [m-word] textbook story, I don't believe you don't know it.
There are no "negative stories", there are newsworthy occasions.
Reminders that we're still alive, that we're still working.
And comments about how great we are, which for some reason are far more numerous (and upvoted) in this thread than the technical discussion.
I would bet the scale and scope of Cloudflare's business would make HN a complete waste of time over and over, even if it were some marketing conspiracy. The CEO of an org like CF has way better things to do with his company-time than shill on HN. Plus, if you took 2 minutes to look at eastdakota's comment history and/or mine, you'll see we are longtime active HN users. So if we're somehow in cahoots to drum up some good marketing PR on HN, then we've put more than 10 years of effort into building fake profiles with comment history and have carefully made sure to hide our affiliation, just so I could compliment Cloudflare on HN and hope that the Gods of random actually got it noticed and upvoted, just so the CEO could come in and leave a comment.
Or, maybe we're just people that like and use HN that happen to believe what we say.
> The CEO of an org like CF has way better things to do with his company-time than shill on HN. Plus, if you took 2 minutes to look at eastdakota's comment history
If we took two minutes to look at eastdakota's comment history (putting aside its authenticity), we can see that he has nothing better to do than flame with us in this branch, hidden by the mods from everyone besides us three (how this topic is seen by others: http://archive.is/1VgPM)
Honestly, my go-to explanation would have been "Cloudflare has an automated alert for when they're on the HN front page". Not an urgent, wake-up-the-team alert, but a "post it to a Slack channel" bot.
Maybe I should throw together a Cloudflare Worker that does this.
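The core of such a bot is only a few lines. Here is a sketch in Python against the public HN Algolia search API (a real Cloudflare Worker would be JavaScript on a cron trigger, POSTing matches to a Slack incoming webhook instead of printing them):

```python
import json
from urllib.request import urlopen

# Algolia's public HN search endpoint; the front_page tag restricts
# results to stories currently on the front page.
ALGOLIA_URL = "https://hn.algolia.com/api/v1/search?tags=front_page&query=cloudflare"

def matching_stories(hits: list, keyword: str = "cloudflare") -> list:
    """Pure filter: front-page titles mentioning the keyword."""
    return [h["title"] for h in hits
            if keyword in (h.get("title") or "").lower()]

def check_front_page() -> list:
    """Fetch the current front page and return matching titles."""
    with urlopen(ALGOLIA_URL) as resp:
        data = json.load(resp)
    return matching_stories(data.get("hits", []))

# for title in check_front_page():
#     print(title)  # a Worker would POST this to a Slack webhook instead
```

Run on a schedule (Workers support cron triggers), this gives exactly the "post it to a Slack channel" bot described above.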
>Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.
This is honestly such a tone-deaf answer. At the end of the day, users don't care about the purity of adherence to some far-down technical principles or details. As a product leader, you should task your team to solve real user problems and not spend literal years arguing about EDNS.
Truly, just imagine the user story:
"I can't access this website."
"No worries! That is by design, because the protocol response returned to CF is illegal, and the server simply propagates the error back to you. It would be impure for CF to modify the response in-flight to fix this for you."
"?? I do not care. I am talking to CF, why can't CF just fix the issue?"
Your take is so on the money. I'm a staunch CF advocate, but so much rests on the trust they engender. Their power is a consequence of their standing as a trustable company working across the infrastructure of the Internet. There are so many pathways where the enforcers of good behavior eventually end up as the Pharaohs. I genuinely love Cloudflare while being simultaneously extremely wary of Cloudflare.
It's the opposite for me. Cloudflare presents me with so many infinite "Are you human" check boxes when using Tor that I decided to never pay a single cent to that company unless we found ourselves in a situation where there was no other choice. Thankfully, there is almost always a choice, with most vendors now offering very good alternatives. I work in government IT procurement.
Cloudflare is in the wrong here. They want to "protect" people from their own ISPs, from nefarious web and DNS servers that'll "sacrifice the privacy of users" by - you guessed it - doing exactly the same thing themselves. They've given very little reason to trust them, while giving plenty of reasons to think they might be evil (like protecting known spammers / scammers / phishers).
If another company did what Cloudflare does and homogenized tons of requests behind them, you can bet Cloudflare's CAPTCHA systems would block them in a second.
I have zero respect for Cloudflare's inability to answer criticisms about what they do, about their constant deflections from simple, straightforward questions, and the fact that they do to others what they would never accept anyone else doing to them. It's hypocrisy in the service of trying to become a monopoly by re-centralizing the Internet.
Don't believe me? Go ahead and look for examples of Matthew Prince addressing concerns that much of the non-western world can't access Cloudflare fronted sites because of Cloudflare's "reasons". When you don't find any that have more than just vague platitudes and handwaving, imagine how you'd feel if you were one of those multiple billion people.
EDNS is in part how using one IP address across the world can work without tons of latency for everyone who isn't geographically local. In other words, 1.1.1.1 would be a lot shittier, and the DNS answers they provide would be much less geographically appropriate, if they didn't make use of information about the source of a query.
In other words, Cloudflare expects us to think they're so special that they should get to do what they explicitly don't want others doing.
It's bullshit, particularly for all the people who are victims of Cloudflare's manipulations such as the default use of Cloudflare DNS servers for DNS-over-https on Firefox, which users were never asked about before it was enabled for them (at least in the US).
Cloudflare is not special or unique; tons of resolvers don't support EDNS. archive.is serves them all the same - they only lie in their response if the source is Cloudflare.
It's actually really funny: archive.is works on 1.1.1.1 from time to time, which I'm assuming is when archive.is hasn't updated their IP list / detection logic. I wonder how much time they spend maintaining that; if they blocked everyone without EDNS it would be easy, but since it's just Cloudflare...
Given that Cloudflare has ~80% of the CDN marketshare and ~20% of the total web marketshare (a number that keeps on growing, simply because no one dares to prosecute bad actors and so site operators are all but forced to hide under the protection racket), their push for PAT will lead to users effectively being locked out of a lot of online services if they do not pass the Blessed Test aka have modified their device in any way that The Gods have not determined to be acceptable.
It's bad enough on mobile - try banking apps, Netflix, Google Pay, Samsung's Watch companion app or pretty much any game on a rooted Android, you'll have to bytecode patch your way around. I do not want that crap on desktop as well.
> their push for PAT will lead to users effectively being locked out of a lot of online services if they do not pass the Blessed Test
The blog post you linked to specifically explains that Cloudflare are using PATs to replace CAPTCHAs. Do you find yourself filling out Cloudflare CAPTCHAs on a regular basis when accessing 80% of web services? I personally do not, nor have I heard anyone complain about such a thing.
If you're not filling out a CAPTCHA today you're not going to be required to provide a PAT tomorrow. Even if you _are_ required to provide a PAT to access a service your inability to do so means it'll fall back to what happens today: you fill out a CAPTCHA.
I'm not saying this is amazing or even anything positive at all but I think it does a disservice to exaggerate the situation. It leads to people dismissing the issue out of hand.
> The blog post you linked to specifically explains that Cloudflare are using PATs to replace CAPTCHAs. Do you find yourself filling out Cloudflare CAPTCHAs on a regular basis when accessing 80% of web services? I personally do not, nor have I heard anyone complain about such a thing.
I regularly see applications deny me service simply because my Android device is rooted. The RiF app, back when it was usable, led fanfiction.net to loop around Cloudflare craptchas in its in-app browser because FFN had the anti-bot settings turned to the extreme and something deemed my device to be dangerous.
I have zero reason to assume that once Cloudflare has enough market penetration they can and will switch on a rule to kill off "modified" desktop devices just as well - especially not if rightsholder alliances such as the MPAA, Netflix or whatever demand yet another piece of crap signal to judge my device worthy. Just look how rightsholders almost got Microsoft to implement Palladium for their crypto crap decades ago, and now we got the same shit on virtually all mobile devices with Widevine refusing service if you dared to install an aftermarket firmware or root it. Or how the rightsholders got HDCP embedded everywhere. Or how Secure Boot got all but mandatory.
Thanks but no thanks, everything that even begins to veer in this direction has to be stopped from the start. Cloudflare is in a too powerful (and for media rightsholders, way too tasty) position, and we have seen it in the past (as detailed above) how eventually the rightsholders get their way.
FWIW I don’t disagree with anything you’re saying but you’re going leaps and bounds beyond your original statement, which is what prompted my original comment.
“[link to a post discussing replacing CAPTCHAs with PATs] is the greatest threat to user freedom we’ve ever seen” loses potential allies because the evidence doesn’t back up the statement.
IMO the actual threat is the centralisation of power on the internet. PATs are simply a manifestation of that power. Lots of people in this thread use Cloudflare and enjoy their products so any argument relying on “Cloudflare are evil” will likely also fall flat. “Centralised power is inherently/inevitably evil”… now we’re cooking with gas. But it’s still an uphill struggle losing people all the way.
> Lots of people in this thread use Cloudflare and enjoy their products so any argument relying on “Cloudflare are evil” will likely also fall flat.
Lots of people in this thread likely use and enjoy Google or Apple or Microsoft products, and yet the notion that said companies are evil probably wouldn't get much disagreement except from their most rabid of fanboys.
My problem is that you offer no alternative. That terrible future you foresee? I earnestly believe it has already happened. A sealed, signed, and immutable OS is already a reality on Apple's platforms. Windows and Linux aren't against this idea; they're just late.
When it does happen that Windows and Linux join the sealed and signed party, what can be done to make you whole?
> A sealed, signed, and immutable OS is already a reality on Apple's platforms.
Until I turn SIP/AMFI off and mess with the OS as I wish, which Apple lets me do. I'm not looking forward to random websites deciding that's grounds for denying me access or giving me 10x more captchas.
Until Cloudflare decides to not even serve you the CAPTCHA because your browser is too old. Where "too old" can mean the current Firefox ESR release, or a small niche browser. Or it legitimately is an older browser because you're into retro computing preservation.
Cloudflare positioning themselves as the internet safety police will turn the web into a monoculture. That was the argument used against MSIE, and it's now used against Google. But this is a larger threat, because end-users don't choose to use CF, so it isn't measurable how many people are being adversely affected. Site owners' metrics will only show them the permitted users, and they'll be unaware that their applications are unusable for many people. How do you get a bug report from someone who doesn't appear in your logs?
> Do you find yourself filling out Cloudflare CAPTCHAs on a regular basis when accessing 80% of web services? I personally do not, nor have I heard anyone complain about such a thing.
Sort of. On a regular basis = when I decide to visit less hacker-culture websites, which isn't often, but then I do the CAPTCHA and it stops bugging me for a long time.
So, are you telling me that in the future PAT will prevent me from visiting such websites altogether? Maybe the poster wasn't being hyperbolic then.
I always feel that Cloudflare makes the right decisions and wonder what the critics are worried about. Is the criticism that they are just too darn big?
Are there any other nuanced opinions?
Secondly, doesn't an entity have to have a certain amount of internet traffic to even consider a CDN?
My point of view is that large commercial entities need CDNs; small-time people can get by with the IP provided by their ISP.
The problem is that it's incredibly hard to run a non-static website these days without something like Cloudflare, CloudFront+WAF or Akamai in front. You will get constantly hit by morons running hack attempts, and if you have something that someone might want to disrupt for whatever reason (e.g. a comment section, or a Mastodon instance), and be it just to spam fake Vixgra pills, you will get hit by those as well. Or some extortionists will DDoS you and email you: "give us a thousand dollars and we'll stop".
The core problem is that our governments are doing nothing to keep bad actors accountable - China, Russia, North Korea, Iran, they all can run attacks as they please without any consequence. And ISPs don't do anything if you alert them about hacked routers.
In my day job, I'm responsible for running a few very high profile and high traffic targets, and the amount of bullshit we have to deal with even after CloudFront+WAF in front is absurd. Gotta be three digits worth of abuse emails I sent out to ISPs all over the world, and nothing happened.
Trying to create a monopolistic cartel where only a select few corporations can make devices or browsers, as described in my other comment[1] is seen as evil by most countries out there[2].
Insisting that a browser or device be attested by the manufacturer locks smaller firms and indie developers out of developing alternatives to those browsers or devices, as they'd be caught in a catch-22: they'd be seen as illegitimate until they gained a broader audience, but without attestation they can't gain an audience in the first place.
The same situation exists in other areas as well. Trying to bootstrap a search engine? Well tough luck, because what are you trying to do with a headless version of Chromium?
This in turn creates a cartel where only few players can compete, and the members of said cartel can enter new markets without ever having to compete.
If you're using a Pi-hole for your DNS (or anything else using dnsmasq I suppose), I worked around the issue by creating /etc/dnsmasq.d/02-archive.is.conf (with a Docker bind mount in my case) with the following content:
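A minimal sketch of such an override (the upstream resolver and the exact domain list are assumptions on my part; 9.9.9.9 is Quad9, and any resolver other than 1.1.1.1 should work):

```conf
# /etc/dnsmasq.d/02-archive.is.conf
# Send archive.* lookups to an upstream resolver other than 1.1.1.1.
# 9.9.9.9 (Quad9) is used here purely as an example.
server=/archive.is/9.9.9.9
server=/archive.ph/9.9.9.9
server=/archive.today/9.9.9.9
```

Afterwards restart dnsmasq (on a Pi-hole, `pihole restartdns`) so the new file is picked up.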
Or you could just not use 1.1.1.1. There are other players in this space, and unless you're querying DNS without ever connecting to the server for content, the server is going to learn your IP anyway.
It's not just about the server you're connecting to knowing your IP; that's a given. With ECS, every recursive DNS server in the chain also knows which site you want to visit.
I was willing to give CF the benefit of the doubt, until other posters (and you) pointed out that this is a red herring. Also given Stavros' note [1] on how archive.is needs the EDNS data to protect themselves from CSAM/ISIS material based attacks and that they suggested solutions but CF refused to cooperate, I'm unsure of the motives behind these posters claiming CF is protecting privacy. Matthew Prince's motives in his truth-but-not-full-truth response are obvious.
I'm still giving CF the benefit of the doubt, but I need to do more research.
The funding of archive.is has always been opaque to me. Who is behind it, and why do they want this info? Why and how they can provide this for free is a big unknown to me.
I'm giving cf the benefit of the doubt against archive. At least I know cloudflare and this would be the first "doubt-moment"...
It's weird that others don't have this issue as much; I would have thought CDNs would have been screaming about it for years already, if archive.is's statement is "complete".
Edit: cloudflare does not seem to block what's needed though.
> EDNS IP subsets can be used to better geolocate responses for services that use DNS-based load balancing. However, 1.1.1.1 is delivered across Cloudflare’s entire network that today spans 180 cities. We publish the geolocation information of the IPs that we query from. That allows any network with less density than we have to properly return DNS-targeted results.
The problem is that there are more parties than you, your DNS server (the cab driver) and your destination (the website). The cab driver might have to ask a number of other people how to get to the destination.
Your DNS server probably doesn't have the exact record for you at the ready, but it does know another DNS server that gets you closer to an answer. That's how recursive DNS works and it might happen a few times before you actually get to a result. With ECS now every DNS server in this chain knows 12.45.56.x wanted to visit hacker news.
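As a toy sketch of that visibility (hop names and the example subnet are purely illustrative; real resolvers carry this in the ECS option defined in RFC 7871):

```python
# Toy model of a recursive lookup with ECS enabled: every authoritative
# server consulted along the way can see the client's subnet, not just
# the final one. Hop names and the subnet are illustrative only.
ecs_subnet = "12.45.56.0/24"  # the truncated subnet a resolver would forward
chain = ["root server", ".com TLD server", "ycombinator.com authoritative"]
for hop in chain:
    print(f"{hop} learns: {ecs_subnet} wants news.ycombinator.com")
```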
I talked to the maintainer of archive.is years ago, they said this (hopefully they won't mind me posting):
> There have been numerous attacks where people upload illegal content (childporn or isis propaganda) and immediately reported to the authorities near the IP of the archive. It resulted in ceased servers and downtimes. I just have no time to react. So I developed sort of CDN, with the only difference: DNS server returns not the closest IP to the request origin but the closest IP abroad, so any takedown procedure would require bureaucratic procedures so I am getting notified notified and have time to react.
> But CloudFlare DNS disrupts the scheme together with all other DNS-based CDNs Cloudflare is competing with and puts the archive existence on risk. I offered them to proxy those CloudFlare DNS's users via their CDN but they rejected. Registering my own autonomous system just to fix the issue with CloudFlare DNS is too expensive for me.
When I proposed using the DNS server's IP instead, they said:
> It did not work initially because they have global planetwide cache.
> 1. Someone resolves domain from Brazil.
> 2. Website's DNS get request from Cloudflare Brazil DC.
> 3. The result is replicated to other Cloudflare DCs
> 4. Some from Turkey resolves same domain and get the cached value
> It could be worked around by setting tiny TTL, which would slowly end up in consistent results, but... After "I’ve proposed we just fix it on our end .." all requests for 7 archive.* domains are sent from Symantec USA IP
This makes a lot of sense, and this comment should be higher.
The other comments that only present the Cloudflare side of the situation make it sound like the archive.is owner was being unreasonable, but as we see there is more to it!
I personally tried to use 1.1.1.1 as my resolver a couple of years ago but I use archive.is a lot.
Regardless of who is “at fault”, not being able to access archive.is is a dealbreaker for me so I quickly stopped using 1.1.1.1
But Cloudflare has a lot of other things that work well for me.
Maybe they need it to route the traffic to the right CDN? That kinda would make sense.
While I'm very privacy conscious, I don't really see the benefit to hiding my region in the DNS request. Because the very next step after the DNS is my browser making a request to their webserver, at which time they will have my actual complete IP anyway.
I've talked to some people working for the .nl TLD. They collect logs of DNS requests for every .nl domain to mine data about phishing websites and online scams. They're not using the EDNS information as far as I know (that would be very, very illegal), and I don't know what the introduction of the GDPR has done to their research, but TLD operators not bound by privacy laws, such as American companies, can do whatever they want.
It's not just the website's DNS server that received your subnet information; it's every single location in the chain of DNS resolvers. That includes TLD servers run by data mining companies. Does Verisign need to know that 2001:2345:6789::abcd is looking for news.ycombinator.com?
With caching in place these methods of data gathering aren't all-encompassing, but if you visit some new or uncommon domains you'll be more likely to become part of the dataset.
>Does Verisign need to know that 2001:2345:6789::abcd is looking for news.ycombinator.com?
Verisign wouldn't know that even with ECS, because ECS only includes a subnet prefix, of whatever length the client (Cloudflare's recursive resolver in this case) is willing to give out. There is no benefit and only harm to the client in giving out the whole IP, and indeed that is called out as a bad idea in the ECS RFC.
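A small sketch of what different SOURCE PREFIX-LENGTH choices reveal, using Python's ipaddress module (the /24 default is the commonly cited IPv4 setting, and /0 is effectively the "reveal nothing" position a truncating resolver takes):

```python
import ipaddress

def ecs_scope(client_ip: str, source_prefix: int) -> str:
    """What a resolver reveals in the ECS option for a given
    SOURCE PREFIX-LENGTH (RFC 7871): the client IP truncated
    to that many bits."""
    net = ipaddress.ip_network(f"{client_ip}/{source_prefix}", strict=False)
    return f"{net.network_address}/{source_prefix}"

ip = "203.0.113.57"
print(ecs_scope(ip, 32))  # 203.0.113.57/32 -- the whole IP; the RFC discourages this
print(ecs_scope(ip, 24))  # 203.0.113.0/24  -- a common default
print(ecs_scope(ip, 0))   # 0.0.0.0/0       -- reveal nothing
```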
eastdakota's original comment covers this. The DNS request isn't encrypted, so anyone with control over the network (upstream ISPs via warrants) can use this information to figure out who is attempting to visit. ("Someone is resolving example.com" is less information than "someone in LA is resolving example.com") Meanwhile, the actual HTTPS connection leaks less information. If the website is hosted on a CDN or cloud provider, then someone monitoring the IP traffic only knows that you're visiting something hosted by that CDN. ("The target is visiting a Cloudflare-hosted site" is less information than "The target is visiting example.com") So, there is a slight information leak by sending the geolocation information.
On the other hand, it's possible this doesn't matter. The client might not encrypt the host it's trying to visit. Nation states can correlate packet timing. So if someone really wants to know, they'll probably figure it out. (This is always a risk with things like Tor. If the government is monitoring your connection and some target website's connection, and you are sending a lot of packets at the same time they're receiving a lot of packets, you can guess who is talking to who.)
> If the website is hosted on a CDN or cloud provider, then someone monitoring the IP traffic only knows that you're visiting something hosted by that CDN.
This isn't true, because the request leaks the hostname in the handshake via SNI:
> EDNS IP subsets can be used to better geolocate responses for services that use DNS-based load balancing. However, 1.1.1.1 is delivered across Cloudflare’s entire network that today spans 180 cities. We publish the geolocation information of the IPs that we query from. That allows any network with less density than we have to properly return DNS-targeted results.
edns-client-subnet only provides an IP subnet; the receiving CDN still needs to geolocate it.
So the main difference is that Cloudflare's servers need to be present in the IP geolocation database. Given their prevalence, they're probably in most of them already.
This is obviously not Cloudflare's fault, but I wonder why they don't just mask their identity (e.g. by using a random AWS IP address) when querying archive.is?
AFAICT this wouldn't "violate the integrity of DNS and the privacy and security promises we made to our users" and would solve a big pain point of using 1.1.1.1.
We’ve tried. The owner of Archive.is actively monitors and then returns bad results. This is true even if we recurse through another recursor. It’s a very odd hill to die on.
Have you guys considered just having the resolver not return anything? Such that my system would fallback to another resolver (such as Google or Quad9) and I wouldn't have issues accessing the site?
I guess that still has the privacy implications.. but at least it would work!
I think I'm missing something, but is there a way you can pass along some sort of vague location info for caching purposes without revealing too much? From their tweet they mentioned even continent-level information isn't available, which I can understand. Is there really no middle ground that works here?
From another post the CEO made it sounds like they could do a bunch of things but don’t think they should. Which I understand. Once you start adding workarounds for specific domains I can imagine the whole thing spiralling quickly. The owner of archive.is doesn’t want the traffic, CF probably shouldn’t move heaven and earth in response.
Continent-level information doesn't exist. EDNS Client Subnet doesn't send a location, it sends a subnet. Its "location" then has to be looked up in geolocation databases which may or may not be accurate. There's no subnet that will map to a continent.
I'm not affiliated with CF at all, but if I were I would oppose that idea on a couple levels.
Philosophically I think that lacks respect for the site owner and it would be wrong to deceive them and go against their wishes.
Pragmatically that sounds like a giant maintenance pain in the ass to manage, and not worth the time/money to make somebody's site work who actively doesn't want it to work.
Nextdns: How we made DNS both fast and private with ECS[0]
Cloudflare sends the closest city to where the request was made, which ought to be sufficient to optimize for a CDN, I guess? It appears that archive.is doesn't have locus standi in this debate...
> EDNS IP subsets can be used to better geolocate responses for services that use DNS-based load balancing. However, 1.1.1.1 is delivered across Cloudflare’s entire network that today spans 180 cities. We publish the geolocation information of the IPs that we query from. That allows any network with less density than we have to properly return DNS-targeted results.
>Cloudflare sends the closest city to where the request was made, which ought to be sufficient to optimize for a CDN, I guess? It appears that archive.is doesn't have locus standi in this debate...
>>all requests for 7 archive.* domains are sent from Symantec USA IP
It might be that archive.is only lies to that IP, which would explain why many users in this thread say that archive.is does resolve correctly for them with 1.1.1.1.
Lately I've run into the issue that they're also blocking iCloud Private Relay. Although for some reason the block is only affecting me over IPv6, if I disable v6 it works, which led me to waste a bunch of time debugging.
The seemingly anti-privacy blocks by archive and the fact that nobody knows who funds this extremely popular service has led me down conspiratorial thought patterns.
Cloudflare is one of the operators of egress nodes for iCloud Private Relay, along with Akamai and Fastly and maybe some others (IIRC). So you probably have issues with Archive.is when you're connected through Cloudflare. Although I'm not sure how DNS resolution works on private relay (since it's the DNS server rather than the egress proxy that causes the issue - but if the egress proxy is making the DNS requests then that would explain it).
I have the same issue and just now found out it was because of Private Relay. I hadn't connected it to the fact that I had no issues accessing archive.is when I had a VPN on, which disables Private Relay. Since they got a location (not my location, but a location), they were hunky-dory...
I saw this post and tried it with and without Private Relay and sure enough, turning it on is the issue. Good to know....
Edit: I updated Private Relay to "Maintain general location" for IP Address Location Settings and archive.is loads fine.
Second edit: Maybe not; it all works now, and I think it's either the session or the cache. I'll have to play around with it.
Does enabling a VPN disable private relay or does it tunnel to the VPN through private relay? For some reason I have it in my head that it "double wraps" the traffic but I've never actually confirmed that for a fact.
I don't know for sure what's going on with the DNS. I know the archive.* website messes with Cloudflare's DNS results because they don't know how to set up anycast, but when I use some external tool to look up the right address and hardcode that I still get either a TLS error or "hello world".
archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion still works fine if you don't want to share your geolocation information with every DNS server in the recursive resolver chain.
While archive.is' reaction is questionable (and even then, it's their own server and they can do whatever they want with it), this blog post is Cloudflare advertising through and through. The CEO used the scenario to their advantage very cleverly, but this blog post could describe all of that without constantly reminding us how wonderful CF is.
Or, they could do it while constantly reminding people how wonderful Cloudflare is, just because they want to. I'm not sure why it's supposed to be wrong to advertise honestly.
Many people think any form of advertising is evil. Though incorrect, it's perhaps a predictable reaction to the pervasive and genuinely morally bankrupt advertising we're inundated with on a regular basis.
Cloudflare is also hurting peering and Google Global Cache (GGC) on my ISP.
When I use Google DNS, OpenDNS or even my ISP's own DNS, the IP I get for, say, googlevideo.com resolves to my ISP's cache. But with Cloudflare it gives me a Google IP and I get no cache benefits.
Also, anecdotal evidence: uploading and downloading large files to Google Drive and some other websites is significantly faster when I use Google DNS or OpenDNS. I'm not sure how that works, but maybe the servers are returning an IP my ISP can't route to quickly? My ISP is notorious for having bad routes.
One way you can get around this and still use 1.1.1.1 for most sites would be to use a local recursive resolver that supports domain name-based upstream overrides such as dnsmasq, and setting that as your local resolver.
You can run dnsmasq on anything that can run arbitrary services e.g. on your local workstation, or on your router if it is open enough, or even on something like Termux probably.
Great post! I respect people's decisions regarding the focus on privacy, but above all I just need the internet to work, and this was always a mysterious issue. Finally fixed by switching to Google DNS, which for me personally is not a problem; escaping Google's net would be better, but it's not a requirement.
I'm seeing that now too. It might have changed in the last 4 hours, or maybe the behavior is cyclical for some reason. I posted this article after running into it repeatedly over the last few weeks since switching to Cloudflare DNS.
I wonder if one party or the other actually made a change in response to this hitting the front page again?
The behaviour is cyclical. It has been for years, really. It starts resolving every now and then, but rarely for long and I doubt this time is any different.
I don't think archive.is blocks CF based on their IPs, so they must have some heuristic in place to detect bogus EDNS. Perhaps that heuristic sometimes fails?
Tangentially, various network providers (eg. some mobile networks) block access to some archive sites (particularly the Internet Archive as it's the most well known one) on the basis of them containing adult/pirated content.
Surfacing here for people who like to read comments before clicking through to the article.
TLDR: the site owner was returning wrong DNS responses for people using Cloudflare's 1.1.1.1 DNS service because the site owner doesn't like Cloudflare.
> the site owner was returning wrong DNS responses for people using Cloudflare's 1.1.1.1 DNS service because the site owner doesn't like Cloudflare.
Or you can try to not imagine everyone as hating everything and read the other comment in here posting the archive.* side of the story.
> There have been numerous attacks where people upload illegal content (childporn or isis propaganda) and immediately reported to the authorities near the IP of the archive. It resulted in ceased servers and downtimes. I just have no time to react. So I developed sort of CDN, with the only difference: DNS server returns not the closest IP to the request origin but the closest IP abroad, so any takedown procedure would require bureaucratic procedures so I am getting notified notified and have time to react.
> But CloudFlare DNS disrupts the scheme together with all other DNS-based CDNs Cloudflare is competing with and puts the archive existence on risk. I offered them to proxy those CloudFlare DNS's users via their CDN but they rejected. Registering my own autonomous system just to fix the issue with CloudFlare DNS is too expensive for me.
Hmm, no clue, but if I ask 1.1.1.1 for the A record for archive.is I get 89.253.237.217.
Asking the office DNS the same question, I get 51.38.69.52; asking 9.9.9.9 gives me the same IP as the office DNS. Finally, asking Google (8.8.8.8), I also get 51.38.69.52.
It's not just Cloudflare; it's any DNS resolver that doesn't send the EDNS Client Subnet extension. If you (or the company you work for) run a recursive resolver that doesn't submit such data, you'll run into the same issues.
I've switched to archive.org because archive.* is broken. For stuff that .org doesn't have, there's always the Tor version. The Tor address seems to be more responsive as well, so that's nice.
No, they go out of their way to make a system that can handle people trying to abuse it. Cloudflare doesn't like that system and refuses to help them.
> There have been numerous attacks where people upload illegal content (childporn or isis propaganda) and immediately reported to the authorities near the IP of the archive. It resulted in ceased servers and downtimes. I just have no time to react. So I developed sort of CDN, with the only difference: DNS server returns not the closest IP to the request origin but the closest IP abroad, so any takedown procedure would require bureaucratic procedures so I am getting notified notified and have time to react.
> But CloudFlare DNS disrupts the scheme together with all other DNS-based CDNs Cloudflare is competing with and puts the archive existence on risk. I offered them to proxy those CloudFlare DNS's users via their CDN but they rejected. Registering my own autonomous system just to fix the issue with CloudFlare DNS is too expensive for me.
Isn't the Internet full of major websites that need to be able to handle that kind of abuse? If what archive.* did were really necessary to do so, then why haven't any other websites needed to do the same thing?
>Isn't the Internet full of major websites that need to be able to handle that kind of abuse?
I don't know; since their whole reason for being is to act as (a temporary?) archive of websites that would make them more vulnerable to these attacks than someone like ebay I'd think?
As I understand it, the main reasons people use archive.is over archive.org are because archive.is is more of an immediate proxy/cache/cdn, rather than a long-term archival system that requires a bot to crawl based on schedule parameters. That, and also it includes features to help bypass paywalls by sanitizing some (all?) JavaScript.
On the other hand, Archive.org doesn’t remove or alter scripts or anything like that. And as far as I know you can’t just request them to crawl a site and then browse it there immediately, but you can on Archive.is
It doesn't really change anything for the end user, who wants to access the website and is bummed that it doesn't work. There might be politics in the way, but all they care about is that it doesn't work.
> Supposedly Cloudflare uses a feature of DNS archive.* doesn't like, or vice versa.
From what I understand, it's the opposite: Cloudflare doesn't use a relatively new feature of DNS (EDNS Client Subnet), and that site doesn't like the lack of that feature.
iCloud Private Relay uses a variety of third parties for their network connectivity. If it's working fine, you might be going out via Fastly or someone else. The person who it's not working for is likely getting Cloudflare.
With iCloud Private Relay, it sometimes works for me and sometimes doesn't depending on which CDN's system I'm using at the time.
Intransigent maintainers are such an irritating problem. Any time a maintainer has some strongly-held beliefs or goals that supersede their desire to serve users, you've got a ticking time bomb. When the two clash, they'll decline to serve users in order to serve their strongly-held belief instead. We see it time and time again.
Indeed, they mention that the proper solution is to get their own AS, but that it's too expensive (this is what other major websites do). Cloudflare argues that they are protecting user privacy by truncating the EDNS client-subnet data, and it's commonly agreed that this move does so. Archive.is says explicitly that they want the data because it's cheaper than an alternative that is available to them. The cost of running an AS is disputed elsewhere in this HN thread. To me it seems clear that Cloudflare has the moral upper hand.
Archive.is believes that Cloudflare can simply provide the full EDNS data, and they're technically right. But Cloudflare won't budge because they believe this is hostile to user privacy. I haven't heard a counterargument that Cloudflare is wrong about this.
Cloudflare believes that Archive.is can simply live without the EDNS data, and they're technically right. But Archive.is won't budge because they believe it prevents their abuse prevention techniques. They mention that owning their own AS would solve the problem but that's too expensive.[1]
Blame is in the eye of the beholder, but it seems to me that Archive.is should find alternative abuse prevention techniques like other websites do. Cloudflare has an argument based on privacy. Archive.is has an argument based on the proper solution being too expensive. The expense of running an AS is disputed in this HN thread.[2]