You can argue that writing a replacement isn't hard, but writing one that (at the time of writing) had served north of 100 trillion requests without ANY errors in the proxy server itself is impressive to me, even with Rust's promises.
Doesn't a server author get to decide what an error is? You could always just decide the fault was in the client, not the server.
For example, I worked with one CDN company where the engineers decided that injecting random NULL bytes into the TCP stream of an HTTP request was perfectly valid. They obviously are in fact idiots.
This is similar to how AWS can always claim five nines of uptime. They decide what counts as downtime. Even when large portions of us-east-1 are broken, they still have five nines of uptime.
In the blog post, the claim is "without crashes due to server code". That is different from protocol errors or logic errors (both of which are recoverable in a well-written server). There's a lot less weasel room in such a claim.
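To make that distinction concrete, here's a minimal Rust sketch (hypothetical names, not Pingora's actual code or API): protocol and logic errors are ordinary values a well-written server maps to error responses, while "crashes due to server code" would be panics or memory faults that never reach this handling at all.

```rust
// Hypothetical sketch, not Pingora's code or API: a protocol error is a
// *value* the server handles; "crashes due to server code" are the
// panics/memory faults that never reach this handling.

#[derive(Debug)]
enum ProxyError {
    // The client sent something we can't make sense of: recoverable, answer 400.
    MalformedRequest(String),
}

// A well-written server turns protocol/logic errors into error responses.
fn handle(raw_request: &[u8]) -> Result<String, ProxyError> {
    let text = std::str::from_utf8(raw_request)
        .map_err(|e| ProxyError::MalformedRequest(e.to_string()))?;
    if !text.starts_with("GET ") {
        return Err(ProxyError::MalformedRequest("unsupported method".into()));
    }
    Ok("HTTP/1.1 200 OK\r\n\r\n".to_string())
}

fn main() {
    // A NUL-riddled request line is a protocol error, not a server crash:
    match handle(b"G\x00E\x00T\x00 / HTTP/1.1") {
        Ok(resp) => println!("{resp}"),
        Err(ProxyError::MalformedRequest(e)) => {
            println!("HTTP/1.1 400 Bad Request  ({e})");
        }
    }
}
```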
> For example, I worked with one CDN company where the engineers decided that injecting random NULL bytes into the TCP stream of an HTTP request was perfectly valid. They obviously are in fact idiots.
One time I got back a response to an HTTP request that I simply could not parse. It rendered fine in my text editor, but the code I was trying to parse it with kept returning a nonsensical error. Fortunately (?) I had the response saved to a file, so I didn't have to keep hitting the server while I tried to figure out how to parse it.
Eventually I broke out a hex editor and realized something had inserted a null byte between every other byte of text. Unfortunately I couldn't get the (third-party) server to reproduce the behavior.
At least from what I remember of the issue, I can't rule that out. I'm reasonably certain I had some sort of documentation saying the response was supposed to be UTF-8, but that doesn't mean a bug couldn't have (non-deterministically!?) returned it in a different encoding.
I believe it was a nearly entirely ASCII CSV file, and I might not have checked any non-ASCII characters to see how the pattern held.
It's not uncommon for the word in black and white to be "Unicode": to a programmer writing software on Windows or in (especially older) Java, that obviously means UTF-16, while to a programmer writing, say, a Unix C program, it means UTF-8. Both may feel assured they know exactly what's going on... but there's a miscommunication.
You definitely won't be the first person who was handed UTF-16 and went "Ah, some fool added NUL bytes in between all my ASCII." A really excellent colleague of mine, Cezary, made exactly this mistake on some data we were receiving a decade or more ago.
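For anyone who hasn't run into this before, here's a small self-contained Rust illustration (nothing to do with the actual server in the story, just the general phenomenon) of why ASCII encoded as UTF-16LE looks exactly like "a null byte between every other byte", and how decoding it as UTF-16 recovers the text:

```rust
// Plain ASCII encoded as UTF-16LE is exactly the "NUL after every character"
// byte pattern described above.

fn main() {
    let csv_line = "id,name\n1,foo";

    // Encode as UTF-16LE, the way a Windows/older-Java producer might.
    let utf16le: Vec<u8> = csv_line
        .encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect();

    // Viewed as raw bytes, every ASCII character is followed by 0x00.
    println!("{:02x?}", &utf16le[..8]); // [69, 00, 64, 00, 2c, 00, 6e, 00]

    // Decoding it as UTF-16 (not UTF-8!) gets the original text back.
    let units: Vec<u16> = utf16le
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    let decoded = String::from_utf16(&units).expect("valid UTF-16");
    assert_eq!(decoded, csv_line);
    println!("{decoded}");
}
```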
Of course, having never seen it, I also can't rule out the documentation just being wrong.
I imagine a dystopian future where the error message reads "Some customers were experiencing elevated error rates. Their accounts have been terminated in line with Amazon's zero-tolerance policy for errors."
Especially with increased concurrency/memory-sharing
Purely speculating here, but I imagine Nginx's choice to split across worker processes (with separate pools) instead of threads (with shared pools) had to do with a) avoiding memory errors due to sharing resources, b) reducing extra (defensive) locks, and/or c) reducing the blast-radius of crashes
All of which Rust helped the team avoid having to worry about, enabling more sharing of resources, which solved their main bottleneck (rough sketch below)
Of course the OP's JVM solution would be memory-safe too (helping with (a)), but then that comes with other costs
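Purely to illustrate the shared-pool idea from the speculation above (my own sketch, not Cloudflare's design or code, and `Conn` is a made-up stand-in type), here's roughly what "more sharing of resources" across worker threads looks like in Rust:

```rust
// Speculative sketch of the shared-pool idea, not Cloudflare's actual design:
// one Arc<Mutex<Vec<Conn>>> reused by every worker *thread*, where nginx's
// process-per-worker model would keep a separate pool per process.

use std::sync::{Arc, Mutex};
use std::thread;

struct Conn(u32); // stand-in for a pooled upstream connection

fn main() {
    // One pool shared by all workers; the compiler only allows access
    // through the lock, so sharing doesn't need defensive conventions.
    let pool = Arc::new(Mutex::new(vec![Conn(1), Conn(2), Conn(3)]));

    let workers: Vec<_> = (0..4)
        .map(|id| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || {
                // Check a connection out of the shared pool, if one is free.
                let checked_out = pool.lock().unwrap().pop();
                match checked_out {
                    Some(conn) => {
                        println!("worker {id} reused upstream connection {}", conn.0);
                        pool.lock().unwrap().push(conn); // return it when done
                    }
                    None => println!("worker {id} would open a new connection"),
                }
            })
        })
        .collect();

    for w in workers {
        w.join().unwrap();
    }
}
```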
Yeah, I can see NGINX's choices being made to minimize the classes of errors that Rust helps you with at compile time, which is where a lot of their speed/memory benefits came from. It will be interesting to see the code once they open source it.