Their explanation for why Go performs badly didn't make any sense to me. I'm not sure if they don't understand how goroutines work, if I don't understand how goroutines work or if I just don't understand their explanation.
Also, in the end, they didn't use the JSON payload. It would have been interesting if they had just written a static string. I'm curious how much of this is really measuring JSON [de]serialization performance.
Finally, it's worth pointing out that WebSocket is a standard. It's possible that some of these implementations follow the standard better than others. For example, WebSocket requires that a text message be valid UTF8. Personally, I think that's a dumb requirement (and in my own websocket server implementation for Zig, I don't enforce this - if the application wants to, it can). But it's completely possible that some implementations enforce this and others don't, and that (along with every other check) could make a difference.
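To make that concrete: a conforming server has to run something like the following on every text frame, which is exactly the kind of per-message cost that can differ between implementations. This is a rough Go sketch just for illustration, not code from any of the benchmarked servers:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validateTextFrame is the kind of per-message check RFC 6455 mandates for
// text frames: the payload must be valid UTF-8, otherwise the server is
// supposed to fail the connection (close code 1007).
func validateTextFrame(payload []byte) error {
	if !utf8.Valid(payload) {
		return fmt.Errorf("text frame is not valid UTF-8, close with 1007")
	}
	return nil
}

func main() {
	fmt.Println(validateTextFrame([]byte("hello")))    // <nil>
	fmt.Println(validateTextFrame([]byte{0xff, 0xfe})) // error
}
```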
> Their explanation for why Go performs badly didn't make any sense to me.
To me, the whole paper is full of misunderstandings, at least in the analysis. There's just speculation based on caricatures of the languages, like "node is async", "c++ is low level", etc. The fact that their C++ impl using uWebSocket was significantly slower than Node, which used uWebSocket bindings, should have led them to question the test setup (they probably used threads, which defeats the purpose of uWebSocket).
Anyway.. The "connection time" is just the HTTP handshake. It could be included as a side note. What's important in WS deployments is:
- Unique message throughput (the only thing measured afaik).
- Broadcast/"multicast" throughput, i.e. say you have 1k subscribers you want to send the same message to (rough sketch of the fan-out below this list).
- Idle memory usage (for say chat apps that have low traffic - how many peers can a node maintain)
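For the broadcast case, what I mean is roughly this fan-out pattern (a Go sketch with made-up names like `hub` and `subscriber`, nothing from the paper); the cost is dominated by writing the same bytes to N sockets, which the benchmark doesn't measure at all:

```go
package main

import "fmt"

// Minimal fan-out sketch: one message gets queued for every subscriber.
// Illustrative only; a real hub needs backpressure, slow-client eviction,
// and a writer goroutine per connection draining the send channel.

type subscriber struct {
	send chan []byte // buffered per-connection outbound queue
}

type hub struct {
	subscribers map[*subscriber]bool
}

// broadcast queues the same payload for every connected subscriber,
// dropping it for clients whose queue is full (a common trade-off).
func (h *hub) broadcast(msg []byte) {
	for s := range h.subscribers {
		select {
		case s.send <- msg:
		default: // slow client: drop rather than block the hub
		}
	}
}

func main() {
	a := &subscriber{send: make(chan []byte, 1)}
	b := &subscriber{send: make(chan []byte, 1)}
	h := &hub{subscribers: map[*subscriber]bool{a: true, b: true}}
	h.broadcast([]byte("same payload, N sockets"))
	fmt.Println(string(<-a.send), "/", string(<-b.send))
}
```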
To me, the champion is uWebSocket. That's the entire reason why "Node" wins - those language bindings were written by the same genius who wrote that lib. Note that uWebSocket doesn't have TLS support, so whatever reverse proxy you put in front is gonna dominate usage because all of them have higher overheads, even nginx.
Interesting to note is that uWebSocket perf (especially memory footprint) can't be achieved even in Go, because of the goroutine overhead (there's no way in Go to read/write from multiple sockets from a single goroutine, so you have to spend 2 goroutines for realtime r/w). It could probably be achieved with Tokio though.
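By "spend 2 goroutines" I mean the usual pattern with a blocking conn API, roughly like this (gorilla/websocket used as an example of such an API; a sketch, not the benchmark code):

```go
// Sketch of the "2 goroutines per connection" pattern: one goroutine
// blocked on reads, a second one draining an outbound channel, so the
// server can push messages without waiting for the next inbound frame.
package wsexample

import "github.com/gorilla/websocket"

func handle(conn *websocket.Conn, outbound <-chan []byte) {
	go func() { // writer goroutine
		for msg := range outbound {
			if err := conn.WriteMessage(websocket.TextMessage, msg); err != nil {
				return
			}
		}
	}()

	for { // reader loop (the first goroutine)
		_, msg, err := conn.ReadMessage()
		if err != nil {
			return
		}
		_ = msg // hand the inbound message to application code here
	}
}
```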
The whole paper is not only full of misunderstandings; it is also full of errors and contradictions with the implementations:
- Rust is run in debug mode because the --release flag is omitted. This is a very basic mistake.
- Some implementations are logging to stdout on each message, which will lead to a lot of noise, not only due to the overhead of doing so but also due to lock contention in the multi-threaded benchmarks.
- It states that the Go implementation is blocking and single-threaded, while it in fact is non-blocking and multi-threaded (concurrent).
- It implies the Rust implementation is not multi-threaded, while it in fact is because the implementation spawns a thread per connection. On that note, why not use an async websocket library for Rust instead? They're used much more.
- Gives VM-based languages zero time to warm up, giving them very little chance to do one of their jobs: runtime optimization.
- It is not benchmarking websocket implementations specifically; it is benchmarking websocket implementations, JSON serialization, and stdout logging all at once. This adds so much noise that the result should be considered entirely invalid.
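To isolate the WebSocket layer, you'd want a handler that does nothing but echo, with the JSON decode and the logging behind switches so their cost can be measured separately. A rough Go sketch of that idea (gorilla/websocket as a stand-in; this is not the paper's code):

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{}

// Toggles for the extra work the paper mixes into every message.
const decodeJSON = false // measure with and without to isolate encoding/json
const logEachMsg = false // per-message stdout logging serializes on a mutex

func echo(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	defer conn.Close()
	for {
		mt, msg, err := conn.ReadMessage()
		if err != nil {
			return
		}
		if decodeJSON {
			var v map[string]any
			_ = json.Unmarshal(msg, &v) // cost attributed to JSON, not WS
		}
		if logEachMsg {
			log.Printf("recv %d bytes", len(msg))
		}
		if err := conn.WriteMessage(mt, msg); err != nil { // pure echo path
			return
		}
	}
}

func main() {
	http.HandleFunc("/ws", echo)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```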
> To me, the champion is uWebSocket. That's the entire reason why "Node" wins [...]
A big part of why Node wins is because its implementation is not logging to stdout on each message like the other implementations do. Add a console.log in there and its performance tanks.
There is no HTTP handshake in RFC6455. A client sends a text with a pseudo unique key. The server sends a text with a key transform back to the client. The client then opens a socket to the server.
The distinction is important because assuming HTTP implies that WebSockets is a channel riding over an HTTP server. Neither the client nor the server cares whether you provide any support for HTTP, so long as the connection is achieved. This is easily provable.
It also seems you misunderstand the relationship between WebSockets and TLS. TLS is TCP layer 4 while WebSockets is TCP layers 5 and 6. As such WebSockets work the same way regardless of TLS but TLS does provide an extra step of message fragmentation.
There is a difference between interpreting how a thing works and building a thing that does work.
Call it what you will. The point about the handshake is that the TCP connection + HTTP headers need to come before the upgrade to the raw TCP stream. This is part of the benchmark, and while it also exists in the real world, it can be misleading because that's testing connections, not message throughput.
Also, I was wrong about uWebSocket: it does have TLS support, so you can skip the reverse proxy. It deals with raw TCP connections, and thus to encrypt you need TLS support there. It is also a barebones HTTP/1.1 server, because why not. The thing I misremembered is that I confused TLS with HTTP/2, which it does not support. This is unrelated to WS.
I was under the impression that the underlying net/http library uses a new goroutine for every connection, so each websocket gets its own goroutine. Or is there somewhere else you were expecting goroutines in addition to the one per connection?
http.ListenAndServe is implemented under the hood with a new goroutine per incoming connection. You don't have to explicitly use goroutines here, it's the default behaviour.
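Right. For anyone who wants to see it, net/http's Server.Serve accepts a connection and then starts a goroutine for it (roughly `go c.serve(ctx)` in the standard library source), so a plain handler like this already runs per-connection concurrently (a sketch, nothing benchmark-specific):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// net/http already started a dedicated goroutine for this
		// connection before calling the handler, so blocking here does
		// not stall other clients. A WebSocket upgraded from this request
		// keeps living on that same goroutine.
		fmt.Fprintln(w, "served from a per-connection goroutine")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```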
Yes _however_ the nodejs benchmark at least is handling each message asynchronously, whereas the go implementation is only handling connections asynchronously.
Edit to add: looks like the same goes for the C++ and Rust implementations. So I think what we might be seeing in this benchmark (particularly Node vs C++, since it is the same library) is that asynchronously handling each message is beneficial, and that the Go standard library's JSON parser is slow.
Edit 2: Actually I think the C++ version is async for each message! Don't know how to explain that then.
Well, tcp streams are purely sequential. It’s the ideal use case for a single process, since messages can’t be received out of order. There’s no computational advantage to “handling each message asynchronously” unless the message handling code itself does IO or something. And that’s not the responsibility of the websocket library.
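In Go terms: the read loop is sequential no matter what, and handing each message to its own goroutine only pays off if the handler itself blocks. A toy sketch of that trade-off (the `process` function is hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

// process stands in for whatever the application does with one message.
// If it's cheap CPU work, calling it inline keeps ordering and avoids
// scheduler overhead; offloading it only helps when it blocks on IO.
func process(msg []byte) {
	time.Sleep(time.Millisecond) // pretend this is a DB call or similar
	fmt.Printf("handled %q\n", msg)
}

// readLoop consumes frames strictly in order, as they come off the socket.
func readLoop(messages <-chan []byte) {
	for msg := range messages {
		// Sequential handling would be: process(msg)
		// Offloading per message is only worthwhile because process blocks:
		go process(msg)
	}
}

func main() {
	ch := make(chan []byte, 2)
	ch <- []byte("a")
	ch <- []byte("b")
	close(ch)
	readLoop(ch)
	time.Sleep(10 * time.Millisecond) // crude wait for the demo goroutines
}
```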
> I'm curious how much of this is really measuring JSON [de]serialization performance.
Well, they did use the standard library for that, so quite a bit, I suppose. That thing is slow. I've got no idea how fast those functions are in other languages, but you're right that it would ruin the idea behind the benchmark.
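If anyone wants to check, the JSON share is easy to measure on its own with a standard Go micro-benchmark in a _test.go file (the payload shape below is a guess, not the paper's actual message):

```go
package jsonbench

import (
	"encoding/json"
	"testing"
)

// A guessed payload shape; substitute the benchmark's actual message.
var payload = []byte(`{"type":"msg","id":42,"body":"hello world"}`)

type message struct {
	Type string `json:"type"`
	ID   int    `json:"id"`
	Body string `json:"body"`
}

// go test -bench=Unmarshal gives a per-message cost that can be compared
// against the end-to-end numbers from the paper.
func BenchmarkUnmarshal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var m message
		if err := json.Unmarshal(payload, &m); err != nil {
			b.Fatal(err)
		}
	}
}
```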
Yeah, I thought this looked familiar.. I went through this article about a year and a half ago when exploring WebSockets in Python for work. With some tuning and using different libraries + libuv, we were easily able to get performance similar to NodeJS.
I had a blog post somewhere showing the testing and results, but I can't seem to find it at the moment.