If the author is reading this, I think a single repository would be more appropriate than multiple repos [1]. It would be nice to set things up so we can simply git pull, docker run, and execute the benchmarks for each language sequentially.
Something that stood out to me is the author’s conclusion that "Node.js wins." However, both the Node.js and C++ versions use the same library, uWebSockets! I suspect the actual takeaway is this:
"uWebSockets wins, and the uWebSockets authors know their library well enough that even their JavaScript wrapper outperforms my own implementation in plain C++ using the same library!" :-p
Makes me wonder if there’s something different that could be done in Go to achieve better performance. Alternatively, this may highlight which language/library makes it easier to do the right thing out of the box (for example, it seems easier to use uWebSockets in Node.js than in C++). TechEmpower controversies also come to mind, where "winning" implementations often don’t reflect how developers typically write code in a given language, framework, or library.
Their explanation for why Go performs badly didn't make any sense to me. I'm not sure if they don't understand how goroutines work, if I don't understand how goroutines work or if I just don't understand their explanation.
Also, in the end, they didn't use the JSON payload. It would have been interesting if they had just written a static string. I'm curious how much of this is really measuring JSON [de]serialization performance.
Finally, it's worth pointing out that WebSocket is a standard. It's possible that some of these implementations follow the standard better than others. For example, WebSocket requires that a text message be valid UTF8. Personally, I think that's a dumb requirement (and in my own websocket server implementation for Zig, I don't enforce this - if the application wants to, it can). But it's completely possible that some implementations enforce this and others don't, and that (along with every other check) could make a difference.
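For concreteness, here is a minimal Go sketch (not the Zig implementation mentioned above) of that particular check, assuming a server that inspects each complete text message; the function name is mine, and a real implementation would also have to validate across fragmented frames. RFC 6455 expects the connection to be closed with status 1007 when a text message isn't valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// closeInvalidPayload is the RFC 6455 close status (1007) for a text
// message whose payload is not valid UTF-8.
const closeInvalidPayload = 1007

// checkTextMessage returns the close code to send, or 0 to keep the
// connection. Binary messages (opcode 0x2) would skip this check entirely.
func checkTextMessage(payload []byte) int {
	if !utf8.Valid(payload) {
		return closeInvalidPayload
	}
	return 0
}

func main() {
	fmt.Println(checkTextMessage([]byte("héllo")))    // 0: valid UTF-8
	fmt.Println(checkTextMessage([]byte{0xff, 0xfe})) // 1007: invalid bytes
}
```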
> Their explanation for why Go performs badly didn't make any sense to me.
To me, the whole paper is full of misunderstanding, at least the analysis. There's just speculation based on caricatures of the languages, like "Node is async", "C++ is low level", etc. The fact that their C++ implementation using uWebSocket was significantly slower than the Node one, which used the uWebSocket bindings, should have led them to question the test setup (they probably used threads, which defeats the purpose of uWebSocket).
Anyway... The "connection time" is just the HTTP upgrade handshake. It could be included as a side note. What's important in WS deployments are:
- Unique message throughput (the only thing measured afaik).
- Broadcast/"multicast" throughput, i.e. sending the same message to, say, 1k subscribers.
- Idle memory usage (for say chat apps that have low traffic - how many peers can a node maintain)
To me, the champion is uWebSocket. That's the entire reason why "Node" wins - those language bindings were written by the same genius who wrote that lib. Note that uWebSocket doesn't have TLS support, so whatever reverse proxy you put in front is gonna dominate usage because all of them have higher overheads, even nginx.
Interesting to note is that uWebSocket's perf (especially memory footprint) can't be achieved even in Go, because of the goroutine overhead (there's no way in Go to read/write from multiple sockets in a single goroutine, so you have to spend 2 goroutines per connection for realtime r/w). It could probably be achieved with Tokio though.
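To make the two-goroutines point concrete, here is a minimal Go sketch of that pattern, assuming a plain TCP line-echo server rather than a real WebSocket codec; handleConn and the channel size are invented for illustration, and error handling is stripped down.

```go
package main

import (
	"bufio"
	"log"
	"net"
)

// handleConn shows the usual two-goroutines-per-connection layout: the
// calling goroutine blocks on reads while a second goroutine drains a
// write queue. With blocking net.Conn APIs there is no portable way to
// multiplex reads and writes for many sockets inside one goroutine.
func handleConn(conn net.Conn) {
	defer conn.Close()

	outbox := make(chan []byte, 64)

	// Goroutine #2: serializes all writes for this connection.
	go func() {
		for msg := range outbox {
			if _, err := conn.Write(msg); err != nil {
				return // real code would also tear down the reader here
			}
		}
	}()

	// Goroutine #1 (this one): read loop; echo each line via the writer.
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		outbox <- append([]byte(scanner.Text()), '\n')
	}
	close(outbox)
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go handleConn(conn)
	}
}
```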
The whole paper is not only full of misunderstandings, it is full of errors and contradictions with the implementations.
- Rust is run in debug mode, by omitting the --release flag. This is a very basic mistake.
- Some implementations are logging to stdout on each message, which adds a lot of noise: not only the overhead of the write itself, but also lock contention in the multi-threaded benchmarks (see the benchmark sketch after this list).
- It states that the Go implementation is blocking and single-threaded, while it in fact is non-blocking and multi-threaded (concurrent).
- It implies the Rust implementation is not multi-threaded, while it in fact is, because it spawns a thread per connection. On that note, why not use an async websocket library for Rust instead? Those are much more widely used.
- Gives VM-based languages zero time to warm up, giving them very little chance to do one of their jobs: runtime optimization.
- It is not benchmarking websocket implementations specifically; it is benchmarking websocket implementations, JSON serialization, and stdout logging all at once. This adds so much noise that the result should be considered entirely invalid.
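As a rough illustration of the stdout-logging point above, here is a hedged Go micro-benchmark sketch. The payload and names are invented, and it writes to os.DevNull so a run doesn't flood the terminal, which if anything understates the real cost of logging to a terminal or pipe.

```go
package bench

import (
	"fmt"
	"os"
	"testing"
)

// BenchmarkEchoOnly models the per-message work with no logging at all.
func BenchmarkEchoOnly(b *testing.B) {
	msg := []byte(`{"c":1,"ts":1640995200}`)
	b.RunParallel(func(pb *testing.PB) {
		sink := 0
		for pb.Next() {
			sink += len(msg) // stand-in for cheap framing/echo work
		}
		_ = sink
	})
}

// BenchmarkEchoWithLog adds a per-message fmt.Fprintln to a shared file,
// paying a write syscall per message; in runtimes that take a lock around
// stdout, concurrent handlers also contend on that lock.
func BenchmarkEchoWithLog(b *testing.B) {
	msg := []byte(`{"c":1,"ts":1640995200}`)
	out, err := os.OpenFile(os.DevNull, os.O_WRONLY, 0)
	if err != nil {
		b.Fatal(err)
	}
	defer out.Close()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			fmt.Fprintln(out, "received:", string(msg))
		}
	})
}
```

Run with `go test -bench .`; the gap between the two gives a rough sense of how much per-message logging can distort an echo benchmark.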
> To me, the champion is uWebSocket. That's the entire reason why "Node" wins [...]
A big part of why Node wins is because its implementation is not logging to stdout on each message like the other implementations do. Add a console.log in there and its performance tanks.
There is no HTTP handshake in RFC6455. A client sends a text with a pseudo unique key. The server sends a text with a key transform back to the client. The client then opens a socket to the server.
The distinction is important because assuming HTTP implies WebSockets is a channel riding over an HTTP server. Neither the client nor the server cares whether you provide any support for HTTP so long as the connection is achieved. This is easily provable.
It also seems you misunderstand the relationship between WebSockets and TLS. TLS is TCP layer 4 while WebSockets is TCP layers 5 and 6. As such WebSockets work the same way regardless of TLS but TLS does provide an extra step of message fragmentation.
There is a difference between interpreting how a thing works and building a thing that does work.
Call it what you will. The point about the handshake is that the TCP setup + HTTP headers come before the upgrade to the raw TCP stream. This is part of the benchmark, and while it also exists in the real world, it can be misleading because it’s testing connection setup, not message throughput.
Also, I was wrong about uWebSocket: it does have TLS support, so you can skip the reverse proxy. It deals with raw TCP connections, and thus to encrypt you need TLS support right there. It is also a barebones HTTP/1.1 server, because why not. What I misremembered is that I confused TLS with HTTP/2, which it does not support. That is unrelated to WS.
I was under the impression that the underlying net/http library uses a new goroutine for every connection, so each websocket gets its own goroutine. Or is there somewhere else you were expecting goroutines in addition to the one per connection?
http.ListenAndServe is implemented under the hood with a new goroutine per incoming connection. You don't have to explicitly use goroutines here, it's the default behaviour.
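A small illustration of that default behaviour, using hypothetical /slow and /fast paths of my own: because net/http serves each accepted connection on its own goroutine, a handler that blocks (as a WebSocket read loop does after an upgrade) only ties up that one connection.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	// net/http accepts connections in ListenAndServe and starts a new
	// goroutine per connection, so these handlers run concurrently.
	http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(2 * time.Second) // stand-in for a blocking per-connection loop
		fmt.Fprintln(w, "done")
	})
	http.HandleFunc("/fast", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok") // answers immediately even while /slow is blocked
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}
```

Requesting /slow and /fast at the same time (two curl commands, say) shows the fast one returning immediately.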
Yes, _however_, the Node.js benchmark at least is handling each message asynchronously, whereas the Go implementation is only handling connections asynchronously.
Edit to add: looks like the same goes for the C++ and Rust implementations. So I think what we might be seeing in this benchmark (particularly Node vs C++, since it is the same library) is that asynchronously handling each message is beneficial, and the Go standard library's JSON parser is slow.
Edit 2: Actually, I think the C++ version is async for each message! Don't know how to explain that, then.
Well, TCP streams are purely sequential. It’s the ideal use case for a single process, since messages can’t be received out of order. There’s no computational advantage to “handling each message asynchronously” unless the message-handling code itself does I/O or something. And that’s not the responsibility of the websocket library.
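A toy Go sketch of that point, where handle and its 10 ms sleep are stand-ins invented for per-message work: spawning a goroutine per message only pays off when that work blocks on I/O, and it gives up message ordering unless the application restores it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// handle stands in for per-message application work; the sleep models I/O.
func handle(msg string, ioBound bool) {
	if ioBound {
		time.Sleep(10 * time.Millisecond)
	}
	_ = fmt.Sprintf("echo:%s", msg)
}

func main() {
	msgs := []string{"a", "b", "c", "d"}

	// Sequential handling: preserves message order; already optimal when
	// the per-message work is CPU-cheap, as in an echo benchmark.
	start := time.Now()
	for _, m := range msgs {
		handle(m, true)
	}
	fmt.Println("sequential:", time.Since(start))

	// Goroutine per message: only pays off when the handler blocks on I/O,
	// and it forfeits ordering unless the application restores it.
	start = time.Now()
	var wg sync.WaitGroup
	for _, m := range msgs {
		wg.Add(1)
		go func(m string) {
			defer wg.Done()
			handle(m, true)
		}(m)
	}
	wg.Wait()
	fmt.Println("per-message goroutines:", time.Since(start))
}
```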
> I'm curious how much of this is really measuring JSON [de]serialization performance.
Well, they did use the standard library for that, so quite a bit, I suppose. That thing is slow. I've got no idea how fast those functions are in other languages, but you're right that it would ruin the idea behind the benchmark.
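If someone wanted to estimate that share, a rough Go benchmark sketch along these lines (payload fields invented) compares echoing a pre-encoded static message against a stdlib encoding/json round trip per message.

```go
package bench

import (
	"encoding/json"
	"testing"
)

// msg mirrors the kind of tiny payload an echo benchmark might use;
// the exact fields here are made up for illustration.
type msg struct {
	C  int   `json:"c"`
	TS int64 `json:"ts"`
}

var static = []byte(`{"c":1,"ts":1640995200}`)

// BenchmarkStaticCopy models echoing a pre-encoded message.
func BenchmarkStaticCopy(b *testing.B) {
	buf := make([]byte, len(static))
	for i := 0; i < b.N; i++ {
		copy(buf, static)
	}
}

// BenchmarkJSONRoundTrip decodes and re-encodes the payload per message,
// as the benchmarked servers effectively do.
func BenchmarkJSONRoundTrip(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var m msg
		if err := json.Unmarshal(static, &m); err != nil {
			b.Fatal(err)
		}
		if _, err := json.Marshal(m); err != nil {
			b.Fatal(err)
		}
	}
}
```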
Yeah, I thought this looked familiar... I went through this article about a year and a half ago when exploring WebSockets in Python for work. With some tuning and using different libraries + libuv, we were easily able to get performance similar to NodeJS.
I had a blog post somewhere showing the testing and results, but I can't seem to find it at the moment.
It's also interesting that https://github.com/websockets/ws does not appear in this study, given that in the node ecosystem it is ~3x more likely to be used (not a perfect measurement but ws has 28k github stars vs uWebSockets 8k stars)
Thanks for the free access links. I did read through a bit.
The title is misleading because exactly one implementation was chosen for each of the tested languages. They conclude “do not use Python” because the Python websockets library performs pretty poorly.
Each language is scored based on the library chosen. I have to believe there are more options for some of these languages.
As someone who is implementing an Elixir LiveView app right now, I was particularly curious to see how Elixir performed given LiveView’s reliance on websockets, but Elixir didn’t make the cut.
> The title is misleading because exactly one implementation was chosen for each of the tested languages. They conclude “do not use Python” because the Python websockets library performs pretty poorly.
On the contrary, they tried autobahn and aiohttp as well:
> For the Python websocket, a generic module is used which is simply named "websockets". ... This is most likely a module that offers the simplest of websocket functionality. Now, it was mentioned that this only partly explains the poor performance. While writing this report, it seemed unjust not to give Python a fighting chance. So, the websocket server has been rebuilt with the more trusted Autobahn library and the benchmark test has been rerun. This new server does lead to better results ... still unable to finish the benchmark test.... [T]he Python server is rebuilt one more time, this time with a library by the name of "aiohttp." At last, all 100 rounds of the benchmark are able to be completed, though not very well. Aiohttp still takes longer than Go, and becomes substantially unreliable after round 50, dropping anywhere from 30-50% of the messages. It can only be concluded that the reason for this dreadful performance is Python itself.
Was this published as-is to some sort of prominent CS journal? I honestly can't tell from the link. If that's the case, I'm very disappointed and would have a few choice words about the state of "academia".
The author couldn't tell why he didn't manage to get the C or Python program running, but figured the language was probably to blame for some obscure reason.
He also mentioned that he should have implemented multithreading in C++ to be comparable with Node, but meh, that's apparently not his concern either, let's compare them as-is ^^`
Also, he doesn't mention the actual language the library is written in, but that would have voided the point of the article, so I can quite understand that omission :P
But in the end, nothing can be learned from this, and it is hard to believe this is what "research" produces.
Yeah it’s a rubbish paper. It’s just a comparison of some websocket implementations at some particular point in time. It tells you how fast some of the fastest WS implementations are in absolute terms, but there are no broad conclusions you can make other than the fact that there’s more room for optimisation in a few libraries. Whoopty doo. News at 11.
I was able to make a uWebSockets adapter for NestJS pretty easily. It's a bit of a sensitive library to integrate, though: a single write after the connection is gone and you get a segfault, which means a lot of checking before writing if you've yielded since you last checked. This was a few years ago; perhaps they've fixed that.
I have a home grown websocket library I wrote in TypeScript for node.js. When I measured it a couple of years ago here were my findings:
* I was able to send a little under 11x faster than I could process the messages on the receiving end. I suspected this was due to the receiving side having to process frame headers and account for the various forms of message fragmentation (see the frame-header sketch after this list). I also ran both the send and receive operations on the same machine, which could have biased the numbers
* I was able to send messages on my hardware at 280,000 messages per second. Bun claimed, at that time, a send rate of about 780,000 messages per second. My hardware is old with DDR3 memory. I suspect faster memory would increase those numbers more than anything else, but I never validated that
* In real world practical use switching from HTTP for data messaging to WebSockets made my big application about 8x faster overall in test automation.
Things I suspect, my other assumptions:
* A WebSocket library can achieve superior performance if written in a strongly typed language that is statically compiled and without garbage collection. Bun achieved far superior numbers and is written in Zig.
* I suspect that faster memory would lower the performance gap between sending and receiving when perf testing on a single machine
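For anyone curious what the frame-header work in the first bullet involves, here is a rough Go sketch (not the TypeScript library described above) of decoding the RFC 6455 base header: FIN bit and opcode (fragmentation shows up here as opcode 0 continuation frames), the mask bit, the 7/16/64-bit payload length, and the optional 4-byte masking key.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameHeader holds the fields of an RFC 6455 base frame header.
type frameHeader struct {
	Fin     bool
	Opcode  byte
	Masked  bool
	Length  uint64
	MaskKey [4]byte
}

// parseFrameHeader decodes the 2..14 byte header at the start of buf and
// returns the header plus how many bytes it consumed.
func parseFrameHeader(buf []byte) (frameHeader, int, error) {
	if len(buf) < 2 {
		return frameHeader{}, 0, errors.New("need more data")
	}
	h := frameHeader{
		Fin:    buf[0]&0x80 != 0,
		Opcode: buf[0] & 0x0f,
		Masked: buf[1]&0x80 != 0,
	}
	n := 2
	switch l := buf[1] & 0x7f; l {
	case 126: // next 2 bytes are a 16-bit length
		if len(buf) < n+2 {
			return frameHeader{}, 0, errors.New("need more data")
		}
		h.Length = uint64(binary.BigEndian.Uint16(buf[n:]))
		n += 2
	case 127: // next 8 bytes are a 64-bit length
		if len(buf) < n+8 {
			return frameHeader{}, 0, errors.New("need more data")
		}
		h.Length = binary.BigEndian.Uint64(buf[n:])
		n += 8
	default: // 0..125 fits in the 7-bit field
		h.Length = uint64(l)
	}
	if h.Masked { // client-to-server frames carry a 4-byte masking key
		if len(buf) < n+4 {
			return frameHeader{}, 0, errors.New("need more data")
		}
		copy(h.MaskKey[:], buf[n:])
		n += 4
	}
	return h, n, nil
}

func main() {
	// A masked text frame ("hi") as a client would send it.
	frame := []byte{0x81, 0x82, 0x01, 0x02, 0x03, 0x04, 'h' ^ 0x01, 'i' ^ 0x02}
	h, n, err := parseFrameHeader(frame)
	fmt.Println(h, n, err)
}
```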
A meta comment: This paper gives an example of a "teaser abstract". It says what was done, but does not say anything about the actual results. This style is relatively common, but I find it very annoying. There was certainly enough room in the abstract to provide a concise summary of the actual results, which would both inform the reader and perhaps encourage more people to read the entire paper.
That library can use either select, libuv, libev or libevent if I'm not mistaken. Fibers are not used at this point, although other libraries have explored the idea (revoltphp).
If we're assuming the paper author installed a typical PHP, then it's using select for async I/O. It's the slowest implementation of the event loop. Using something like Swoole would extract even more performance out of PHP for async I/O scenarios.
Is this a peer reviewed paper? It does not seem to be. At a first glance, the researchgate URI and the way the title was formulated made me think it would be the case.
I had a quick run with Starlette/uvicorn: similar results to node/uws, a bit faster actually, but not by enough to be meaningful. So I would expect similar results from other modern/fast libraries.
I also found that the "websockets" library is a bit slower (25% or so), all with the default settings of that benchmark.
The issue with Python that the author faced takes 2 minutes to identify and fix: raise ulimits.
Finally, one can question the value of such a benchmark for real-world applications, especially when the supporting article is as poorly researched as others have already pointed out.
--
1: https://github.com/matttomasetti?tab=repositories&q=websocke...