The most important part of the blog entry is that they are building on top of async Tokio but not Hyper, so other companies can probably consider Tokio production-ready for large-scale workloads.
I believe the lack of using Hyper was so they could be more flexible in what nonstandard parts of HTTP they could support, e.g. status codes into the 900s.
And conforming to HTTP is a good thing when you are writing a server, where you want all your responses to be perfectly formed. But for a proxy that is mostly passing through other people's traffic unaltered, it is not so important.
You could almost argue it's counterproductive to an extent. One of these invalid response codes might become valid in the future, and then suddenly your proxy breaks valid communication attempts. That is one of the reasons TLS 1.3 has to look like TLS 1.2 in the handshake.
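To make the strict-versus-lenient distinction concrete, here is a minimal sketch. The regex and both helper functions are my own illustration and are not taken from Pingora or Hyper; the point is only that a pass-through proxy can accept any syntactically valid three-digit code, while a server-style check would reject a 999.

```python
import re

# Shape of an HTTP/1.x status line, e.g. b"HTTP/1.1 200 OK".
# Illustrative pattern only; a real parser handles more edge cases.
STATUS_LINE = re.compile(rb"^HTTP/(\d)\.(\d) (\d{3})(?: (.*))?$")

def parse_status(line):
    """Return the three-digit status code, or None if the line is malformed."""
    m = STATUS_LINE.match(line.rstrip(b"\r\n"))
    return int(m.group(3)) if m else None

def strict_accepts(code):
    # Server-style check: only the classes defined today (1xx to 5xx).
    return code is not None and 100 <= code <= 599

def lenient_accepts(code):
    # Proxy-style check: forward anything that parses as three digits.
    return code is not None

code = parse_status(b"HTTP/1.1 999 Nonstandard\r\n")
print(code, strict_accepts(code), lenient_accepts(code))  # 999 False True
```

With the lenient check, a code that becomes valid in some future revision keeps flowing through the proxy without a software update.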
I don't know about HTTP, but TLS had explicit invariants. You could write a working TLS 1.2 proxy in say, 2010, and do it correctly, having of course no idea how TLS 1.3 will work, and it would have worked with TLS 1.3 as conceived over a year before publication, back when it admitted it was TLS 1.3 (0x03 0x04 ~= SSL 3.4)
But real world systems, most prominently "middleboxes" often sold as security devices, did not obey the invariants, they were actually bad implementations of TLS 1.2, but they worked and so they'd been able to proliferate. TLS 1.3 as eventually standardised needed to cope with that nonsense.
So, if there are HTTP invariants, then a proxy merely needs to get those correct and would in principle interoperate despite newer HTTP versions. With TLS 1.2 invariants talking to a TLS 1.3 client that meant a proper proxy would shrug and say "I can't speak TLS 1.3, you need to talk TLS 1.2" and that works fine. Whereas the middleboxes tended to go "OMG. A byte I didn't understand! We're under attack! Light the beacons, all men to the battlements!"
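A rough sketch of the difference, using the legacy version field only. The function names are made up for illustration, and this is a simplification: TLS 1.3 as published negotiates through the supported_versions extension precisely because of the broken boxes described above.

```python
# Wire-format versions: TLS 1.2 is 0x0303, and 0x0304 is what TLS 1.3
# would have advertised in the legacy version field.
TLS_1_0, TLS_1_1, TLS_1_2, TLS_1_3 = 0x0301, 0x0302, 0x0303, 0x0304

def negotiate(client_max, supported):
    """Invariant-respecting peer: answer with the highest version it can
    speak that does not exceed what the client offered, or None."""
    usable = [v for v in supported if v <= client_max]
    return max(usable) if usable else None

def brittle_middlebox(client_max, known):
    """What the broken boxes effectively did: treat an unknown version
    as an attack instead of negotiating down."""
    if client_max not in known:
        raise ConnectionError("unrecognised version, dropping connection")
    return client_max

ONLY_UP_TO_1_2 = {TLS_1_0, TLS_1_1, TLS_1_2}

chosen = negotiate(TLS_1_3, ONLY_UP_TO_1_2)
print(hex(chosen))  # 0x303, i.e. "you need to talk TLS 1.2"
```

The first function never needs to know what 0x0304 means; it only needs to know it is larger than anything it supports.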
At cloudflare-scale, I think I would want a proxy that sits at least partially in the network hardware.
I.e., so that the bytes of a large image or video coming from an origin server and being sent to a client never have to pass through the system RAM or CPU.
I'd probably implement this as mods to the sendfile() API so that a certain number of bytes can be copied from one socket to another, with that request going all the way to the firmware of the network card which will do the actual work.
It probably needs to work with HTTP/3 UDP and encryption too - so decrypt the data from this socket and send it to that socket reencrypted with this other key and packetized to this HTTP/3 stream. The firmware would need some way to do aborts/timeouts and kick a partially complete request back to software too.
Is it complex...? Yes. But will the compute savings be worth it...? At cloudflare scale, I think the answer is yes.
We constantly think about the possibilities of using exotic hardware for acceleration. So far we've got a very, very long way with "commodity" hardware and the Linux kernel. One day, we'll probably do something exotic.
I don't think the hardware need be very exotic - most network hardware already has firmware capable of various types of acceleration, and all you would have to do is partner with the network card vendor and have a joint team write firmware that has all the features you need.
With shared source, the network card vendor can even repackage some of the types of acceleration you write for their other clients - and I bet they're already looking into acceleration of HTTP/3.
It's a win-win - because cloudflare gets massive compute savings, and the network card vendor knows they've built quite a high moat to buying someone else's hardware.
Tokio supports io_uring (https://github.com/tokio-rs/tokio-uring), so perhaps when it's mature and battle-tested, it'd be easier to transition to it if Cloudflare aren't using it already.
Last time I benchmarked io_uring, it was good enough, but I was able to get measurably faster from syscalls and a well-crafted userspace thread pool for storage to NVMe.
(Note: the I/O thread pool I wrote came out noticeably faster than fio, the benchmark tool, so don't assume fio results are the best possible.)
For that reason, I have a small application which can be configured for either io_uring or the thread pool, and the thread pool is preferred for performance.
If someone from Cloudflare is reading... please don't do this. I'm asking for two reasons:
1. Cloudflare has been great at rapidly iterating (compare the timeline between HTTP/3 support in Cloudflare vs Cloudfront... let alone AWS's ALB that still doesn't support it). Introducing hardware accelerators would surely hinder those efforts (ouch, we need Y to do this that the hardware can't do, and the vendor says it'll take 1 year to have new production-ready cards).
2. Cloudflare has been a good upstream contributor for the projects they depend on (the kernel, the Rust language, etc.). Partnering with a hardware vendor inevitably means closed source, deviations from upstream and ultimately a much larger hurdle to be a good citizen to the open-source community.
As you can see, the reasons are entirely selfish. I'd understand it if you do it anyways because the numbers make sense for your company. In the meantime... thanks for holding out until you really can't justify it anymore!
Those are actually reasons I'm hesitant. We want to be able to support protocols etc. very quickly and can't wait for hardware to catch up. As I said elsewhere the combination of CPUs and the Linux kernel has worked pretty well for us.
> Partnering with a hardware vendor inevitably means closed source, deviations from upstream and ultimately a much larger hurdle to be a good citizen to the open-source community.
No, it absolutely doesn't "automatically" mean that. You can totally request some major hardware vendors to fully upstream hardware offload capabilities in their Linux drivers.
Cloudflare has generally been very conservative about its tech stack (eg kernel TCP, well-known OSS solutions and abstractions), which seems to work fine for them. They are certainly leaving performance on the table, but they also don't have to pay to support exotic solutions.
Other firms in the same space do things that are a lot more exotic, and it is worth it for them.
And considering there was no real need to move away apart from the difficulty getting new features I would say even if you are operating at Cloudflare scale Nginx is still fine! (Although they did save hardware resources by moving to their own proxy and increasing cache hit rate - but seemingly none of these were business killing issues.)
We had a great deal of experience with NGINX and the original architecture for Cloudflare was based on NGINX. When we came to move away from it we wanted to fully own our destiny (as we had done years before when writing our own DNS server) and so moving to HAProxy would not have made sense.
I think it's because Lee Holloway (the technical founder of Cloudflare) was used to using NGINX, and so he used it for the original architecture. It's also the case that Pingora replaced part of something that handled both connections to origin servers and reading from cache. So the comment about "not serving files" isn't 100% correct, as the NGINX instance was serving files as part of its work.
Well, it does offer more varied stuff from the get go so I can see why someone wouldn't want to limit their options, HAProxy was and is purely a proxy while NGINX is a bit of everything.
SPOE/SPOA added a bit of programmability to HAProxy but it is still basically only messing around with headers and acting upon that, nothing to do with content.
Back in the day, Cloudflare's WAF was based on OpenResty, so the high-performance Lua-programmability at the edge (which is noted in this blog entry) was probably a factor. Quick research shows HAProxy added Lua support in 2015, which is a bit later than their use of OpenResty.
Lee actually looked at HAProxy when we were building the first prototype of Cloudflare but found he could get better performance and extensibility from NGINX. Things obviously evolved, but we considered HAProxy early on and decided to go another way.
We use Go and Rust and C++ and all sorts of things. I wasn't involved in selecting Rust for this project but in general we've seen Rust be a very good fit for things that might long ago have been written in C.
I wonder if, as part of the rollout, they selectively routed non-RFC-compliant connections to nginx and got 90% of the way there with their changes while they added more support.
Wouldn't "switches to" have been a better word instead of "ditches" which implies they are getting rid of it? Considering Nginx is an old and reliable FOSS software...
What precedent would it set for developers when 10 years down the line, a new shiny tool would just "ditch" the result of their hard work?
>> With the danger of hurting HN Rust hipsters... Language does not matter (sorry guys).
There are also a lot of Rust haters, and people unwilling to change.
From the Cloudflare blog:
>> We chose Rust as the language of the project because it can do what C can do in a memory safe way without compromising performance.
So in their opinion language choice matters. I'm not a Rust hipster, but I do believe in the promises the language makes. Assuming they didn't write any unsafe Rust it's pretty much assured that the new proxy contains ZERO bugs of certain types. See what I did there? I made a (contingent) assertion about the non-existence for entire classes of bugs in a code-base I have not seen, based on the language it is written in. So from that point of view Language does matter.
"How we built Pingora, the proxy that connects Cloudflare to the Internet"
And look at what's being reported: "Rust-written". That isn't the case with any other language, and that's why some people are against it: fanboyism takes over rational decisions.
Nobody mentions the language for the countless new AI/ML-based projects.
Your account was created 15 years ago. The account you were replying to was created 60 days ago.
People who’ve been here a while remember when “written in Go” was appended to everything. When “we decided to rewrite X in Go” would show up every other week. The new thing is going to get attention from people who haven’t used it but are interested.
In a few years we’ll have new accounts complaining that there are too many articles about Carbon or whatever is new then.
The "written in rust" has gotten to the point where I notice myself treating it the same as any other spam/ad. Turned me off from learning the language, which is a shame.
It's really a shame that you won't learn the language because there are a lot of fans. Maybe it would be worth considering that this may be more than just hype. Especially now that the Linux Kernel may be adopting Rust, and most of the big tech companies are building new projects in Rust.
Go's approach to massive parallelism, Erlang and the supervisor/actor model, Rust and its borrow checker. These are all milestones in programming language development. There's a reason for the hype.
The issue for me is the type of fans and not the quantity. They give a sense of the community surrounding the language. My free time is limited and I've been spoiled by active involvement with a few open source communities that have been spectacular. To me, the community is as important as the project.
I find it a bit ironic that this is written under the topic of "Rust hipsters", because your comment reminds me of the defining feature of hipsters: being turned off by hype.
That’s a strange decision making strategy. What did you do when everyone around you was taking the Covid vaccine? Did you decide to skip it because too many people were recommending it?
Comparing my decision to not learn a specific technology in my free time because a very vocal subset of its community is off putting to COVID tells me two things. 1. Using such a false equivalency means you have no actual argument against my decision making strategy, 2. You're likely part of the subset.
I’ve met people who decide not to do things because they feel like they’re being forced into it when they receive too many recommendations.
So someone saying “oh $language is great, it worked really well for me” is clearly upsetting you. I thought “$vaccine has boosted my chances of survival with minimal side effects” might have upset you as well.
The original blog post from Cloudflare [1] is titled "How we built Pingora, the proxy that connects Cloudflare to the Internet". The linked Phoronix article decided to use an overly hyped and editorialized title.
Having "written in Rust" needlessly slapped on to HN, reddit, and other content farm post titles doesn't make me feel like I'm being forced to learn it. It gives the Rust community more of a MLM vibe with the need of the hype crew to talk it up to validate their decision to use it and they're trying to sell me something.
I’m sorry you feel that way. You shouldn’t feel like you’re being forced to learn anything.
That said, the MLM interpretation is the least charitable one possible. A bunch of people excited about a new technology need not have malicious intent when they’re talking about it. Maybe they’re just excited.
I misread what you said. That was my mistake. I’m sorry.
Again I request, please don’t assume that mistakes happen out of malice. There was no deceit in what I did.
Suppose I made this mistake on purpose. What advantage could I possibly have gotten from it? Nothing. My only purpose here was to have a discussion in good faith.
If you’re going into conversations assuming bad faith and deceit and ulterior motives of strangers, don’t be surprised if your biases are confirmed.
No one is forcing you to learn anything even if Rust is successful in the future.
You see vocal 'fan-boys' OK. On the other hand it is really strange that under every 'Rust' post there are few very vocal comments hating on Rust. If you are not interested just ignore Rust posts. If you need to criticize, do it. But it would be more fruitful if it is something more than this vague formalism made by people bothered by title.
> No one is forcing you to learn anything even if Rust is successful in the future
Agreed, which is why I explicitly stated that I don't feel like I'm being forced to learn it. You might be making the wrong assumption given the other person's blatantly deceitful replies.
Posts that are editorialized, shilling, ad/spam, or any other type of post HN doesn't want should be called out by the community in the comments, regardless of whether they're about Rust.
To be clear, I don't hate the Rust language. It has a few features that I find appealing and there are posts about Rust that I feel are worth the read. The cult-like evangelizing and editorializing is what I dislike and makes some, including me, not want to engage with Rust's community.
Funny you should bring up that question. I was recently told, "You know when every government in the world is pushing a vaccination that it must be a trap."
For you it is about fanboys. For me it is direct utility. If there were no 'Rust' in the title, I would miss it in my RSS feed. While I also have filters for specific tech fields that are not language-related, here I can find, for example, what is used in production for the specific language I'm interested in.
I just have to learn now not to look in the comment section, which is not only off-topic but actually arguing why even this topic exists.
>> Language choice doesn’t matter when you have a correct and fast program. The trick is getting there.
Language choice can influence how fast or slow you get there, and also the level of certainty you have. Again, Rust guarantees certain bugs and race conditions do not exist, so in a sense we START from there and never have to worry about "getting there".
> Assuming they didn't write any unsafe Rust it's pretty much assured that the new proxy contains ZERO bugs of certain types.
And assuming that the libraries you depend on also don't have any `unsafe` bugs. This is something people seem to forget when discussing rust. And "the unsafe is well tested" sounds similar to "this legacy C code is well tested". I like rust the language but have consistently negative experiences with evangelists.
Here are some other things that make rust a good choice for developing low level systems:
- Zero cost abstractions like in C++
- A strong type system (and lifetime annotations built into the language, less need for the grind tools)
- In general a stricter language than C++ with far less legacy options
- Newer language with a de facto package manager and build system
There's a lot to like about rust without just saying it's "safer", which is highly nuanced, and people seemingly never actually go into the details. I've had many negative votes for even suggesting that C is currently a better choice for Linux Kernel development compared to Rust. People will just say "Rust is safer" and that's the end of the conversation...
"This legacy C code" usually has millions of lines. "The unsafe parts" are usually in the few dozens.
The safety is also conditional on rustc not having bugs, and your computer working correctly. But you don't get to pretend those are as risky as some large random, less used C code.
Not only that, but there is also the clear boundary of responsibility. If there’s a way to use a library in safe Rust that causes a use-after-free (UAF), that’s the library’s problem. In C, it’s because you are using it wrong.
I mean the guy used clojure, a functional lispy language with emphasis on immutability. It's as "change embracing" as it gets in webdev when phrases like LAMP stack, MEAN stack are the norm.
You can argue writing a replacement isn't hard, but writing one that had (at time of writing) served north of 100 trillion requests without ANY errors in the proxy server itself is impressive to me even with Rust's promises.
Doesn't a server author get to decide what an error is? You could always just decide the fault was in the client, not the server.
For example, I worked with one CDN company where the engineers decided that injecting random NULL bytes into the TCP stream of an HTTP request was perfectly valid. They obviously are in fact idiots.
This is similar to how AWS can always claim 5-9s of uptime. They decide what counts as downtime. Even when large portions of us-east-1 are broken, they have 5-9s of uptime.
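For a sense of how little room "five nines" leaves, here is the arithmetic as a small sketch (the function name is mine, and it assumes a plain 365-day year):

```python
def allowed_downtime_minutes(nines, days=365):
    """Minutes of downtime permitted per period at an availability of
    `nines` nines, e.g. nines=5 means 99.999%."""
    unavailability = 10 ** -nines          # 5 nines -> 0.00001
    return unavailability * days * 24 * 60

for n in (3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes per year")
```

At five nines the budget is a little over five minutes a year, so a single multi-hour regional incident only fits if it is not counted as downtime.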
In the blog post, the claim is "without crashes due to server code". That is different than protocol errors or logic errors (both of which are recoverable in a well written server). There's a lot less weasel room in such a claim.
> For example, I worked with one CDN company where the engineers decided that injecting random NULL bytes into the TCP stream of an HTTP request was perfectly valid. They obviously are in fact idiots.
One time I got back a response to an http request that I could just not parse. It rendered fine in my text editor, but the code I was trying to parse it with was returning a nonsensical error. Fortunately (?) I had the response saved to a file, so I didn't have to keep hitting the server as I tried to figure out how to parse it.
Eventually I broke out a hex editor, and realized something had inserted a null byte in between every other byte of text. Unfortunately I couldn't get the server (third party) to reproduce this behavior.
At least from what I remember of the issue I can't rule that out. I'm reasonably certain I had some sort of documentation saying the response was supposed to be utf8, but that doesn't mean a bug couldn't have (non-deterministically!?) returned it in a different encoding.
I believe it was a nearly entirely ascii CSV file, and I might not have checked any non-ascii characters to see how the pattern held.
It's not uncommon that the word in black and white is "Unicode" and to a programmer writing software on Windows or (especially older) Java that seems to obviously mean UTF-16, while to a programmer writing say a Unix C program that means UTF-8. Both may feel assured they know exactly what's going on... but there's a miscommunication.
You definitely won't be the first person who got given UTF-16 and went "Ah, some fool added NUL bytes in between all my ASCII", I worked with a really excellent colleague, Cezary, a decade or more ago who made exactly this mistake on some data we were receiving.
Of course having never seen it I also can't rule out the documentation just being wrong.
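The effect is easy to reproduce. A small sketch with a made-up two-line CSV (the sample data is hypothetical, obviously not the file from the story):

```python
text = "id,name\n1,Ada\n"          # pure-ASCII sample CSV

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # little-endian, no byte-order mark

print(utf8[:7])    # b'id,name'
print(utf16[:14])  # b'i\x00d\x00,\x00n\x00a\x00m\x00e\x00'

# Every second byte is NUL for ASCII-only text.
every_other = utf16[1::2]
print(every_other == b"\x00" * len(text))  # True

# Reading the UTF-16 bytes as UTF-8 does not fail, because NUL is valid
# UTF-8; you just get NUL characters sprinkled through the text.
mangled = utf16.decode("utf-8")
print("\x00" in mangled)  # True

# Decoding with the right codec recovers the original.
print(utf16.decode("utf-16-le") == text)  # True
```

That also fits the "renders fine in a text editor, fails in the parser" symptom: many editors detect UTF-16 on their own, while a parser told to expect UTF-8 sees the NULs.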
I imagine a dystopian future where the error reads "some customers were experiencing elevated error rates. Their accounts have been terminated in line with Amazon's zero tolerance policy for errors"
Especially with increased concurrency/memory-sharing
Purely speculating here, but I imagine Nginx's choice to split across worker processes (with separate pools) instead of threads (with shared pools) had to do with a) avoiding memory errors due to sharing resources, b) reducing extra (defensive) locks, and/or c) reducing the blast-radius of crashes
All of which Rust helped the team not to have to worry about, enabling more sharing of resources, which solved their main bottleneck
Of course the OP's JVM solution would be memory-safe too (helping with (a)), but then that comes with other costs
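A toy model of why sharing matters for connection reuse. Everything here is hypothetical (worker counts, origin names, and the assumption that a pooled connection is always reusable); it ignores idle timeouts and concurrency limits and only counts how many new origin connections get opened.

```python
def connections_opened(requests, shared):
    """Count new origin connections for (worker, origin) requests.

    shared=True  models threads drawing from one connection pool.
    shared=False models worker processes that each keep their own pool.
    """
    pools = set()
    opened = 0
    for worker, origin in requests:
        key = origin if shared else (worker, origin)
        if key not in pools:       # nothing reusable yet, so connect
            pools.add(key)
            opened += 1
    return opened

# 24 requests spread over 4 workers and 3 origins.
requests = [(i % 4, f"origin-{i % 3}") for i in range(24)]

print(connections_opened(requests, shared=False))  # 12
print(connections_opened(requests, shared=True))   # 3
```

With isolated pools every worker pays the connection setup cost for every origin it touches; with one shared pool each origin is connected to once.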
Yeah, I can see NGINX's choices being made to minimize the classes of errors that Rust tries to help you with at compile time, which is where a lot of the speed/memory benefits of the new proxy came from. It will be interesting to see the code once they open source it.
I'm not someone on the rust bandwagon, but I doubt it's a matter of performance as much as safety.
Remember Cloudbleed? That was caused by a memory safety issue in an HTML parser compiled to C (generated with Ragel).
re: jvm -- you may be able to get that sort of throughput, but I also have to wonder what the latency distribution looks like, and how much control you wind up having over it. Yielding control to a GC in a latency sensitive application isn't an easy give.
This is a good reply with great insight, don't get me wrong. But it feels like on every Rust-related post we now see comments like this where somebody points out the Rust hipsters. It seems like Rust-hipster-mentioner may have reached the same status as the original Rust hipsters.
It's just like when the iPhone came out there were a ton of people dismissing it, saying it was just blind Apple fanboys that loved it.
Sometimes people being vocal about something being good is because it really is good. The iPhone was definitely like that. Rust is too. I would imagine the "it's just Rust hipsters" comments will die out in a few years when Rust is more popular than C++.
Especially since you have one headline mentioning Rust, and the Rust-hipster mention gets commented multiple times. I think we're above break-even now.
> Nginx is an excellent general-purpose proxy that does a lot of things very well and tries to be as resource mindful as possible.
Yeah, this is always the big takeaway in this sort of story — Really good general purpose solutions can often be outperformed by much simpler special-purpose implementations.
Yeah, I think it is very obvious that rewriting something carefully in Rust for your specific use cases is going to be more robust, faster, and more featureful than something general purpose. You could probably cut 95% of the lines in Nginx to meet your use cases after all...
The thing I'm interested in from the article is what they did to replace their lua runtime. It was used for request processing and WAF rules. Is it now baked directly into Rust? If a new WAF rule comes out do they have to recompile their entire proxy? The dynamic nature of Lua and Nginx was very powerful - I wonder how they changed that or what they replaced it with.
They engaged in a rewrite, and they chose a modern language. These are orthogonal.
Even in academia, one sees a prevailing view that language choice is picking a brand of toilet seat. Others see language as one of the central questions of our evolving identity as a species. There are those in the middle, just trying to get work done, considering their language choices but not seeing that as fashion.
I'm not a Rust user, but this is clearly false. Proof: an implementation of any large program will be faster, less buggy, more maintainable, and take less total development time in C than in assembly. Therefore, language choice does matter.
Now, that isn't to say that Rust is uniquely better than any other language out there (which I strongly believe that it's not), just that language design differences do translate into meaningful changes in the trade-space between correctness, performance, and development cost, and that's something you should be aware of even if BrainF fulfills all of your engineering constraints easily.
I think the biggest change here is moving Lua code executed by Nginx to pure Rust; executing third-party code is always slower and more fragile than doing it natively in your own service.
That and have the architecture that fits exactly your need.
Lua was used not for speed or robustness, but because it's an extension language - it allows you to reconfigure parts of the application without recompiling the entire thing. Rust is exactly the opposite, and with ridiculous compilation times, to boot.
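To illustrate the "reconfigure without recompiling" property in isolation, here is a sketch where rules are plain data loaded at runtime. The JSON rule format and function names are invented for this example and have nothing to do with how Cloudflare's WAF or Pingora actually express rules.

```python
import json

# Hypothetical rule format: block when a header contains a substring.
RULES_JSON = """
[
  {"header": "user-agent", "contains": "sqlmap", "action": "block"}
]
"""

def load_rules(text):
    """Parse rules at runtime; shipping a new rule is a data push,
    not a rebuild of the proxy binary."""
    return json.loads(text)

def evaluate(rules, headers):
    """Return the action of the first matching rule, else 'allow'."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    for rule in rules:
        if rule["contains"] in lowered.get(rule["header"], ""):
            return rule["action"]
    return "allow"

rules = load_rules(RULES_JSON)
print(evaluate(rules, {"User-Agent": "sqlmap/1.7"}))  # block
print(evaluate(rules, {"User-Agent": "curl/8.0"}))    # allow
```

A compiled proxy can keep this property by interpreting rule data (or embedding a scripting runtime), at the cost of the rules being less expressive than arbitrary Lua.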
This sort of atavistic, ritual distrust of C (and C++) code is tiresome. Keep in mind that most likely all of the highest-traffic sites in the world are using C (or C++) systems to process their own trillions of transactions. I recently worked on such code (written in C), and it has also had zero errors over more than two years of continuous operation; and I do mean zero.
How many people worked on that code for how long to get it down to that level of error rate? Nobody is saying that C/C++ programs cannot be safe, only that it's more expensive and challenging to do so. Given the regular cadence of the largest tech companies in the world[1] shipping updates for memory safety issues, I think there's enough evidence supporting that position that it can't be dismissed as “atavistic”.
I have a table saw in my garage. It was my dad's that he gave to me when he moved. He bought it in the 70s. It's a fantastic saw - it cuts through pretty much anything. I grew up learning how to use a table saw with this exact piece of equipment. I never had a mishap, nor did my dad - that saw has never tasted human. I was taught to be careful with it, and respect the fact that humans are easier for it to dismember than a board.
It sits in a corner collecting dust and reminding me of the fun times my pops and I built stuff together, because I bought a saw-stop. Sure, I could probably spend the next 2-3 decades safely using the old saw and never hurting myself - but why? I like my hands and fingers attached to my body - being careful will probably keep them in that state, but when I have stronger guarantees from the new gear, I'll use that instead - and recommend the same to others.
Also, if you look at the stats for saw-related accidents, 70% of them are impossible when the safety mode of the new gear is enabled (which is the default way of handling it). This is even in professional companies that keep using the old gear due to inertia and invested a lot in "safe handling practices" for it.
Google most likely serves an order of magnitude more than that and uses C++. Rust is definitely a good fit, but saying that other languages can't do it when there are much larger services than Cloudflare out there not using Rust...
This is an exciting prospect. Some financial exchanges are currently proxying their order entry and market data through Cloudflare, so it's useful for their customers to understand how the proxy works.
Btw, I’d like cf to have a feature that will prevent anyone (including cf) from viewing my traffic. Is this in plans?
I want the protection of cf but don’t want to sacrifice privacy of my customers data..
Thanks! The missing part is the WAF… unfortunately I couldn’t find anything for on premises installation at a pro cf account price level or open source.
Yeah. We don't provide an on-premise solution. It's all cloud-based. If you want us to do inspection we have to terminate TLS. We can do DDoS protection without termination but WAF requires inspection of the payload.
The article is interesting and the solution great but saying it is safer because Rust is misleading. I wrote a Rust milter for Postfix incorrectly and although it didn't crash as such, it completely didn't work because data was coming into it from outside. i.e. it is still possible to write broken code in Rust.
I've personally never considered nginx unsafe either but tbh, I am not exactly an nginx guru on the cutting edge of performance or load.
> The article is interesting and the solution great but saying it is safer because Rust is misleading.
Using Rust (without unsafe) precludes memory management bugs being turned into exploits. That closes an entire category of attack.
Sure, “safety” can be defined along different axes, but Rust handles one entirely. How does that not make it safer than using a memory unsafe language?
Cloudflare uses a custom fork of nginx, with custom extensions, Lua FFI, and improved HTTP/2 and caching modules. So it is a comparison to development of in-house C.
Has anyone ever claimed that if it compiled and ran in Rust, it was logically correct? I've never seen anyone claim that and would call anyone who did an idiot. People do say it will tend to have fewer memory and programming errors out of the chute than C or C++, though, given similarly experienced programmers, because the compiler is much stricter.