
This didn't sound right to me so I did some investigation and I think I found a bug.

Keep in mind that Cloudflare is a complex stack of proxies. When a Worker performs a fetch(), that request has to pass through a few machines on Cloudflare's network before it can actually go to origin. E.g. to implement caching we need to go to the appropriate cache machine, and then to try to reuse connections we need to go to the appropriate egress machine. Point is, the connection to origin isn't literally coming from the machine that called fetch().

So if you call fetch() twice in a row, to the same hostname, does it reuse a connection? If everything were on a single machine, you'd expect so, yes! But in this complex proxy stack, the routing has to work out so that those two requests end up on the same machine at the other end in order to use the same connection.
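
To make the scenario concrete, here's a minimal sketch of a Worker doing exactly that (the hostname and paths are placeholders, not a real API):

    // Minimal sketch: two back-to-back fetch() calls to the same origin
    // hostname from one Worker invocation. Hostname/paths are placeholders.
    export default {
      async fetch(request: Request): Promise<Response> {
        // First request: may pay for a fresh TCP + TLS handshake to origin.
        const first = await fetch("https://origin.example.com/api/one");

        // Second request to the same hostname: whether it reuses the first
        // connection depends on whether it lands on the same egress machine.
        const second = await fetch("https://origin.example.com/api/two");

        return new Response(
          JSON.stringify({ first: first.status, second: second.status }),
          { headers: { "content-type": "application/json" } },
        );
      },
    };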

Well, it looks like the heuristics involved here aren't currently handling Workers requests the way they should. They are designed more around regular CDN requests (Workers shares the same egress path that regular non-Workers CDN requests use). In the standard CDN use case, where you get a request from a user, possibly rewrite it in a Worker, then forward it to origin, you should be seeing connection reuse.
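
That standard case is basically this shape (the header rewrite is an arbitrary example, not anything specific):

    // Sketch of the standard CDN case: take the incoming request, optionally
    // rewrite it, and forward it to origin. The header is a made-up example.
    export default {
      async fetch(request: Request): Promise<Response> {
        const rewritten = new Request(request);
        rewritten.headers.set("x-example-rewrite", "1"); // hypothetical header
        // One outbound fetch per incoming request; this path should already
        // see origin connection reuse across requests.
        return fetch(rewritten);
      },
    };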

But, it looks like if you have a Worker that performs multiple fetch() requests to origin (e.g. not forwarding the user's requests, but making some API requests or something)... we're not hashing things correctly so that those fetches land on the same egress machine. So... you won't get connection reuse, unless of course you have enough traffic to light up all the egress machines.
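
To illustrate the naive version of that idea (a toy sketch only, not our actual routing code): if outbound fetches were keyed purely on the origin hostname, two fetches to the same host would deterministically pick the same egress node.

    // Toy illustration only; not Cloudflare's actual routing logic.
    // If outbound fetches were hashed by origin hostname, two fetches to the
    // same host would deterministically land on the same egress node.
    function pickEgressNode(originHost: string, egressNodes: string[]): string {
      // FNV-1a hash of the hostname.
      let hash = 0x811c9dc5;
      for (let i = 0; i < originHost.length; i++) {
        hash ^= originHost.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0;
      }
      return egressNodes[hash % egressNodes.length];
    }

    // Both calls pick the same (hypothetical) node, so their origin
    // connections could be shared.
    const nodes = ["egress-a", "egress-b", "egress-c"];
    console.log(pickEgressNode("origin.example.com", nodes));
    console.log(pickEgressNode("origin.example.com", nodes));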

I'm face-palming a bit here, and wondering why there hasn't been more noise about this. We'll fix it. Talk about low-hanging fruit...

(I'm the tech lead for Cloudflare Workers.)

(On a side note, enabling Argo Smart Routing will greatly increase the rate of connection reuse in general, even for traffic distributed around the world, as it causes requests to be routed within Cloudflare's network to the location closest to your origin. Also, even if the origin connections aren't reused, the RTT from Cloudflare to origin becomes much shorter, so connection setup becomes much less expensive. However, this is a paid feature.)




> So if you call fetch() twice in a row, to the same hostname, does it reuse a connection?

In my testing, the second fetch() call from a Worker to the same origin ran over the same TCP connection 50% of the time, and in those cases it was much faster.
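
Roughly, the comparison looks like the sketch below (the URL is a placeholder; the clock in Workers only advances across I/O and is deliberately coarse, so the numbers are only indicative):

    // Sketch for comparing back-to-back fetch() latencies to the same origin.
    // Date.now() in Workers only advances across I/O and is deliberately
    // coarse, so treat the numbers as indicative. The URL is a placeholder.
    export default {
      async fetch(): Promise<Response> {
        const url = "https://origin.example.com/ping";

        const t0 = Date.now();
        await fetch(url);
        const t1 = Date.now();
        await fetch(url);
        const t2 = Date.now();

        // If the second request is much faster, it likely skipped a fresh
        // TCP + TLS handshake, i.e. an existing origin connection was reused.
        return new Response(`first: ${t1 - t0}ms, second: ${t2 - t1}ms`);
      },
    };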

We want to use Workers as a reverse proxy - to pick up all HTTP requests globally and then route them to our backend. So our use case is mostly one fetch() call (to the origin) per incoming request. The issue is that incoming requests arrive at a ~random Worker in the user's POP, and it looks like each Worker isolate has to re-establish its own TCP/TLS connection to our backend, which takes a long time (~90% of the time).
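
The Worker itself is essentially just a pass-through, something like this sketch (the backend hostname is a placeholder):

    // Sketch of the reverse-proxy pattern described above: forward every
    // incoming request to a single backend. The hostname is a placeholder.
    export default {
      async fetch(request: Request): Promise<Response> {
        const url = new URL(request.url);
        url.hostname = "backend.example.com"; // placeholder backend host

        // One outbound fetch per incoming request; each pays for TCP + TLS
        // to the backend unless an existing origin connection is reused.
        return fetch(new Request(url.toString(), request));
      },
    };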

What I want is Hyperdrive for HTTPS connections. I tried connecting to the backend via CF Tunnel, but that didn't make any difference. Our backend is accessible via AWS Global Accelerator, so Argo won't help much. The only thing that made a difference was pinning the Worker close to our backend - connections to the backend became fast(er) because the TLS roundtrip was faster, but that's not a great solution.


> The issue is that incoming requests arrive at a ~random Worker in the user's POP, and it looks like each Worker isolate has to re-establish its own TCP/TLS connection to our backend, which takes a long time (~90% of the time).

Again, origin connections are not owned by isolates -- there are proxies involved before we get to the origin connection. Requests from unrelated isolates can share a connection, if they are routed to the same egress point. Problem is that they apparently aren't being routed to the same point in your case. That could be for a number of reasons.

It sounds like the bug I found may not be the issue in your case (in fact it sounds like you explicitly aren't experiencing the bug, which is surprising; maybe I am misreading the code and there actually is no bug!).

But there are other challenges the heuristics are trying to solve for, so it's not quite as simple as "all requests to the same origin hostname should go through the same egress node"... like, many of our customers get way too much traffic for just one egress node (even per-colo), so we have to be smarter than that.

I pinged someone on the relevant team and it sounds like this is something they are actively improving.

> The only thing that made a difference was pinning the Worker close to our backend - connections to the backend became fast(er) because the TLS roundtrip was faster, but that's not a great solution.

Argo Smart Routing should have the same effect... it causes Cloudflare to make connections from a colo close to your backend, which means the TLS roundtrip is faster.


Thank you for looking into it in such detail based on an unrelated thread!

Cloudflare seems to consistently make all types of network improvements behind the scenes, so I’ll continue to monitor for this “connection reuse” feature. It might just show up unannounced.



