I'm using Chrome on Linux and noticed that this year Cloudflare is very aggressive in showing the "Verify you are a human" box. Now a lot of sites that use Cloudflare show it, and once you solve the challenge it shows it again after 30 minutes!
What are you protecting, Cloudflare?
Also they show those captchas when going to robots.txt... unbelievable.
Cloudflare has been even worse for me on Linux + Firefox. On a number of sites I get the "Verify" challenge and after solving it immediately get a message saying "You have been blocked" every time. Clearing cookies, disabling UBO, and other changes make no difference. Reporting the issue to them does nothing.
This hostility to normal browsing behavior makes me extremely reluctant to ever use Cloudflare on any projects.
I'm a Cloudflare customer, and even their own dashboard does not work with Linux + a slightly older Firefox. I mean, one click and it's "oops, please report the error" to /dev/null.
At least you can get past the challenge. For me, every single time it is an endless loop of "select all bikes/cars/trains". I've given up even trying to solve the challenge and just close the page when it shows up.
It is Cloudflare; I see it too. It's a Cloudflare page, with all the branding and the spinning circle, and then a captcha pops up on the same Cloudflare-branded page.
I run a few Linux desktop VMs and Cloudflare's Turnstile verification (their auto/non-input-based verification) fails for the couple of sites I've tried that use it for logins, on the latest Chromium and Firefox browsers. It doesn't matter that I'm even connecting from the same IP.
I'd presumed it was just the VM they're heuristically detecting, but it sounds like some are experiencing issues on Linux in general.
Check that you are allowing web worker scripts; that did the trick for me. I still have issues on slower computers (Raspberry Pis and the like), however, as they seem to be too slow to do whatever Cloudflare wants as verification in the allotted time.
Sounds like my experience browsing the internet while connected to the VPN provided by my employer: tons of captchas, and everything defaults to German (the IP is from Frankfurt).
The problem is that you are not performing "normal browsing behavior". The vast majority of the population (at least ~70% don't use ad-blockers) have no extensions and change no settings, so they are 100% fingerprintable every time, which lets them through immediately.
Linux + Firefox. Not sure what happened to me yesterday, but the challenge/response thing was borked, and when I finally got through it all, it said I was a robot anyway. This was while trying to sign up for a Skype account, so it could have been an MS issue and not necessarily Cloudflare. I think the solution is to just not use obstructive software. Thanks to this issue I discovered Jitsi, and that seems more than enough for my purposes.
Yeah, Lego and Etsy are two sites I can now only visit with Safari. It sucks. With Firefox on the same machine, it claims I'm a bot or a crawler (not even on Linux, on a Mac).
Fwiw, I was getting blocked by Cloudflare for a long time on Firefox + Linux, and the only thing that fixed it was completely disabling the UA-adjuster browser extension I had installed.
I have Firefox and Brave set to always clear cookies and everything else when I close the browser... the number of captchas everywhere when I come back is a nightmare...
It is either that or keep sending data back to the Meta and Co. overlords, despite me not being a Facebook, Instagram, or WhatsApp user...
You don't even need to use a different browser - Firefox has an official "Multi-account containers" extension that lets you assign certain sites to open in their own sandbox so you can have a sandbox for Google, another for Facebook, etc.
So, what's a good strategy for managing containers? I've used this extension for years, and in the past I was a bit more conservative with my containers (personal, work, google, facebook, twitter, banking, etc.) and now I've gone a bit more ... "ham" as they say ... and I have 29. One example is travel, to keep fare searches from pervading news story ads. But I'm sure there's a way to strike a balance that I've just not yet found.
Great idea. I wasn't even aware of it and had resigned myself to the idea that tracking is inescapable, but I really need to take that back, and even stop using a lot of hostile services. On smartphones it's even worse.
I don't bother with sites that have Cloudflare Turnstile. Web developers supposedly know the importance of page load time, but even worse than a slow-loading page is waiting for Cloudflare's gatekeeper before I can even see the page.
Turnstile is the in-page captcha option, which, you're right, does affect page load. But they defer the loading of that JS as best they can.
Also, Turnstile is a proof-of-work check, and is meant to slow down and verify would-be attack vectors. Turnstile should only be used on things like login, email change, "place order", etc.
Managed challenges actually come from the same "challenges" platform, which includes Turnstile; the only difference being that Turnstile is something that you can embed yourself on a webpage, and managed challenge is Cloudflare serving the same "challenge" on an interstitial web page.
Also, Turnstile is definitely not a simple proof-of-work check; it performs browser fingerprinting and checks for web APIs. You can easily check this by changing your browser's user agent at the header level while leaving it as-is at the JS level (navigator.userAgent); the mismatch puts Turnstile into an infinite loop.
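To make that concrete, here is a minimal sketch (purely illustrative, not Cloudflare's actual logic) of the kind of consistency check a fingerprinting challenge can run once its script reports navigator.userAgent back to the server:

```python
# Illustrative only: flag clients whose HTTP User-Agent header disagrees
# with what their JS runtime reports via navigator.userAgent.
def ua_mismatch(header_ua: str, js_reported_ua: str) -> bool:
    """Return True when the header UA and the JS-reported UA disagree."""
    return header_ua.strip() != js_reported_ua.strip()

# A header-level UA override with an untouched JS runtime:
header_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ..."  # spoofed header
js_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) ..."      # navigator.userAgent
print(ua_mismatch(header_ua, js_ua))  # True -> the challenge keeps escalating
```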
The captcha on robots.txt is a misconfiguration in the website. CF has lots of issues, but this one is on their customer. Also, they detect Google and other bots, so those may be going through anyway.
Sure; but sensible defaults ought to be in place. There are certain "well-known" URLs that are intended for machine consumption. CF should permit (and perhaps rate limit?) those by default, unless the user overrides them.
Putting a CAPTCHA in front of robots.txt in particular is harmful. If a web crawler fetches robots.txt and receives an HTML response that isn’t a valid robots.txt file, then it will continue to crawl the website when the real robots.txt might’ve forbidden it from doing so.
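A small illustration with Python's standard robots.txt parser (the URLs and bot name are placeholders): when a challenge page is served in place of robots.txt, it parses as an empty rule set, so everything looks allowed.

```python
# If an HTML challenge page is served where robots.txt should be, it parses
# as an empty rule set and previously disallowed paths look allowed.
from urllib.robotparser import RobotFileParser

real_rules = "User-agent: *\nDisallow: /private/\n"
captcha_page = "<!DOCTYPE html><html><body>Verify you are a human</body></html>"

def allowed(robots_body: str, url: str) -> bool:
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch("ExampleBot", url)

print(allowed(real_rules, "https://example.com/private/page"))    # False
print(allowed(captcha_page, "https://example.com/private/page"))  # True: rules lost
```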
Using Pale Moon, I don't even get a captcha that I could solve: just a spinning wheel, and the site reloads over and over. This makes it impossible to use, e.g., anything hosted on sourceforge.net, as they're behind the clownflare "Great Firewall of the West" too.
Whoever configures the Cloudflare rules should be turning off the firewall for things like robots.txt and sitemap.xml. You can still use caching for those resources to prevent them from becoming a front door for DDoS.
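As a rough sketch of what that could look like via the API (this assumes the legacy Firewall Rules endpoint and an "allow" action; the zone ID and token are placeholders, and field names or actions may differ on your plan, where a WAF custom rule with a Skip action is the newer equivalent):

```python
# Sketch only: exempt well-known machine-readable paths from challenges.
# Assumes the legacy Firewall Rules endpoint; details may differ per plan.
import requests

ZONE_ID = "your-zone-id"      # placeholder
API_TOKEN = "your-api-token"  # placeholder

rule = {
    "action": "allow",  # stop challenging these requests
    "filter": {
        "expression": 'http.request.uri.path in {"/robots.txt" "/sitemap.xml"}'
    },
    "description": "Never challenge robots.txt or sitemap.xml",
}

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/firewall/rules",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=[rule],  # this endpoint historically took a list of rules
)
print(resp.status_code, resp.json())
```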
It seems like common cases like this should be handled correctly by default. These are cacheable requests intended for robots. Sure, it would be nice if webmasters configured it, but I suspect only a tiny minority does.
For example, even Cloudflare hasn't configured their official blog's RSS feed properly. My feed reader (running in a DigitalOcean datacenter) hasn't been able to access it since 2021 (a 403 every time, even though it has backed off to checking weekly). This is a cacheable endpoint with public data intended for robots. If they can't configure their own product correctly for their official blog, how can they expect other sites to?
I agree, but I also somewhat understand. Some people will actually pay more per month for Cloudflare than for their own hosting; the Cloudflare Pro plan is $20/month USD. Some sites wouldn't be able to handle the constant requests for robots.txt, because bots don't necessarily respect cache headers (if those are even configured for robots.txt), and the bots that look at robots.txt and ignore caching headers are too numerous.
If you are writing some kind of malicious crawler that doesn't care about rate limiting and wants to scan as many sites as possible, putting together a list of the most vulnerable ones to hack, you will scan robots.txt, because that is the file that tells robots NOT to index these pages. I never use robots.txt for some kind of security through obscurity. I've only ever bothered with robots.txt to make SEO easier when you can control a virtual subdirectory of a site, to block things like repeated content with alternative layouts (to avoid duplicate-content issues), or to get discontinued sections of a website to drop out of SERPs.
> sheer number of bots that look at robots.txt and will ignore a caching header
This is not relevant because Cloudflare will cache it so it never hits your origin. Unless they are adding random URL parameters (which you can teach Cloudflare to ignore but I don't think that should be a default configuration).
The thing is, it won't do that by default. You currently have to enable caching when creating a new account. I use a service that detects whether a website is still running, and it does this by using a certain URL parameter to bypass the cache.
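For illustration, that cache-bypass trick is just a unique query parameter per request (the parameter name below is made up), which gives every request a distinct cache key unless the CDN is told to ignore it:

```python
# A unique query string value per request makes each request a distinct
# cache key, so it reaches the origin unless the CDN ignores the parameter.
import time
import requests

def is_site_up(url: str) -> bool:
    cache_buster = {"_probe": str(time.time_ns())}  # hypothetical parameter name
    resp = requests.get(url, params=cache_buster, timeout=10)
    return resp.ok

print(is_site_up("https://example.com/"))
```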
Again, I think you are correct that the defaults should be saner, but if you've ever dealt with a network admin or web administrator who hasn't grappled with server-side caching vs. browser caching, you'll know it would most definitely end up with Cloudflare losing sales because people misunderstood how things work. Maybe I'm jaded at 45, but I feel like most people don't even know to look at headers when they think they've hit a caching issue. I don't think it's based on age; I think it's based on being interested in the technology and wanting to learn all about it. Mostly developers who got into it for the love of technology, versus those who got into it because it was high paying and they understood Excel, or who learned to build a simple website early in life, so everyone told them to get into software.
I scrape hundreds of Cloudflare-protected sites every 15 minutes without ever having any issues, using a simple headless browser and a mobile connection, while real users get interstitial pages.
It's almost like Cloudflare is deliberately showing the challenge to real users just to show that they exist and are doing "something".
It's not just Linux; I'm using Chrome on my macOS Catalina MBP and I can't even get past the "Verify you are a human" box. It just shows another captcha, and another, and yet another... No amount of clearing cookies, disabling adblockers, or connecting from a different WiFi helps. And that's on most random sites (like ones from HN links); I also don't recall ever doing anything "suspicious" (web scraping, etc.) on that device/IP.
A cheeky response is "their profit margins", but I don't think that's quite right, considering that their earnings per share is -$0.28.
I've not looked into Cloudflare much (I've never needed their services), so I'm not totally sure what all their revenue streams are. I have heard that small websites are not paying much, if anything at all [1]. With that preface out of the way: I think that we see challenges on sites that perhaps don't need them as a form of advertising, to ensure that their name is ever-present. Maybe they don't need this form of advertising, or maybe they do.
If you log in to the CF dashboard every 3 months or so, you will see pretty clearly that they are slowly trying to become a cloud provider like Azure or AWS. Every time I log in there is a whole new slew of services that have equivalents on the other cloud providers. They are using the CDN portion of the business as a loss leader.
They run their own DNS infra so that when you set the SOA for your zone to their servers, they can decide what to resolve to. If you have protection set on a specific record, then it resolves to a fleet of nginx servers with a bunch of special sauce that does the reverse proxying that allows for WAF, caching, anti-DDoS, etc. It's entirely feasible for them to exempt specific requests like this one, since they aren't "protect[ing] the whole DNS" so much as using it to facilitate control of the entire HTTP request/response.
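A quick way to see that delegation in practice (this uses the third-party dnspython library; "example.com" is a placeholder for a Cloudflare-proxied hostname):

```python
# For a Cloudflare-proxied record, the A answers are Cloudflare edge
# addresses rather than the origin, which is what lets their reverse
# proxies apply WAF, caching, and anti-DDoS to every request.
import dns.resolver  # third-party: dnspython

name = "example.com"  # placeholder; substitute a Cloudflare-proxied host

print("NS:", [str(r) for r in dns.resolver.resolve(name, "NS")])
print("A: ", [str(r) for r in dns.resolver.resolve(name, "A")])
```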
I run a honeypot and I can say with reasonable confidence many (most?) bots and scrapers use a Chrome on Linux user-agent. It's a fairly good indication of malicious traffic. In fact I would say it probably outweighs legitimate traffic with that user agent.
It's also a pretty safe assumption that Cloudflare is not run by morons, and they have access to more data than we do, by virtue of being the strip club bouncer for half the Internet.
User-agent might be a useful signal but treating it as an absolute flag is sloppy. For one thing it's trivial for malicious actors to change their user-agent. Cloudflare could use many other signals to drastically cut down on false positives that block normal users, but it seems like they don't care enough to be bothered. If they cared more about technical and privacy-conscious users they would do better.
> For one thing it's trivial for malicious actors to change their user-agent.
Absolutely true. But the programmers of these bots are lazy and often don't. So if Cloudflare has access to other data that can positively identify bots, and there is a high correlation with a particular user agent, well then it's a good first-pass indication despite collateral damage from false positives.
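As a toy illustration of "first-pass indication plus other data" (the signal names, weights, and threshold below are all made up, not anything Cloudflare has published): the user agent nudges a score, but other signals decide whether to challenge.

```python
# Toy scoring sketch with made-up weights: the user agent is one weighted
# input among several, so a suspicious UA alone stays below the threshold.
WEIGHTS = {
    "suspicious_ua": 0.3,     # e.g. UA family over-represented in bot traffic
    "no_cookies": 0.2,
    "datacenter_ip": 0.4,
    "failed_js_check": 0.6,
}
CHALLENGE_THRESHOLD = 0.7

def bot_score(signals: dict) -> float:
    """Sum the weights of the signals that fired for this request."""
    return sum(w for name, w in WEIGHTS.items() if signals.get(name))

request = {"suspicious_ua": True}  # Chrome-on-Linux UA, nothing else odd
score = bot_score(request)
print(score, "challenge" if score >= CHALLENGE_THRESHOLD else "pass")  # 0.3 pass
```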
The programmers of these bots are not lazy. This space is a thriving industry with a bunch of commercial bots, the ability of which to evade Cloudflare et al. is the literal metric that determines their commercial viability.
My data says otherwise and you have provided nothing to back up your claim other than saying we have an industry full of dirty money paying programmers to write unethical code. I'm sure it inspires them to do their best work.
Half these imbeciles don't even change the user-agent from the scraper they downloaded off GitHub.
I employ lots of filtering so it's possible the data is skewed towards those that sneak through the sieve - but they've already been caught, so it's meaningless.
I would hope Cloudflare would be way, way beyond a "first pass" at this stuff. That's logic you use for a ten-person startup, not the company that's managed to capture the fucking internet under their network.
> So if Cloudflare has access to other data that can positively identify bots
They do not - not definitively [1]. This cat-and-mouse game is stochastic at higher levels, with bots doing their best to blend in with regular traffic, and the defense trying to pick up signals barely above the noise floor. There are diminishing returns to battling bots that are indistinguishable from regular users.
1. A few weeks ago, the HN frontpage had a browser-based project that claimed to be undetectable
If you're thinking of Google's WEI, I'm thankful that went down in flames:
"Google is adding code to Chrome that will send tamper-proof information about your operating system and other software, and share it with websites. Google says this will reduce ad fraud. In practice, it reduces your control over your own computer, and is likely to mean that some websites will block access for everyone who's not using an "approved" operating system and browser."
Sure, but does that mean that we Linux users can't go on the web anymore? It's way easier for spammers and bots to move to another user agent/system than it is for legitimate users. So whatever causes this is not a great solution to this problem. You can do better, CF.
I'm a Linux user as well, but I'm not sure what Cloudflare is supposed to be doing here that makes everybody happy. Removing the most obvious signals of botting, because some real users look like that too, may be better for those individual users, but that doesn't make it a good answer for legitimate users as a whole. Spam, DoS, phishing, credential stuffing, scraping, click fraud, API abuse, and more are problems which impact real users just as extra checks and false-positive blocks do.
If you really do have a better way to make all legitimate users of sites happy with bot protections, then by all means, there is a massive market for this. Unfortunately you're probably more like me: stuck between a rock and a hard place, with no good solution and just annoyance with the way things are.
The method is the same; it just looks different when n=1. I.e., the method is "wait until you see something particularly anomalous occurring, probe, and see if the reaction is human-like". The more times you say "well, you can't count that as anomalous, an actual person can look like that too, and a bot could try to fake that!", the less effective it becomes at blocking bots.
This approach clearly blocks bots, so it's not enough to say "just don't ever do things which have false positives", and it's a bit silly to say "just don't ever do the things which have false positives, but for my specific false positives only - leave the other methods, please!"
Many/most bots use a Chrome-on-Linux user agent, so you think it's OK to block Chrome-on-Linux user agents? That's very broken thinking.
So it's OK for them to do shitty things without explaining themselves because they "have access to more data than we do"? Big companies can be mysterious and non-transparent because they're big?