I agree, it could be less than 15 GB for 2M WebSockets. But it's the first benchmark, and I decided to run it without many optimizations - just to have a starting point. There is still a lot of room for optimizations, like reducing internal buffers and so on, but I will leave that for future tests.
It's actually not that different from a traditional (pre-AJAX) HTML forms web app.
That is, the mainframe would send out a burst of data to draw a screen on the terminal, the user would fill out the fields, then the user would hit a button that sent the field contents back to the mainframe.
Thus the mainframe did not have to handle an interrupt for each character, but instead just one for the whole form.
They didn't have the bloat back then that we have these days, like the crazy deep call stacks you see in both object-oriented and functional programming.
CICS programs were frequently written in assembler, sometimes COBOL and PL/I. Mainframe compilers from the early 1970s were much more advanced in many ways than the Unix/C technology described in the Dragon Book.
The mainframe did I/O through channel processors which were quite expensive in themselves, but offloaded a lot of work.
CICS did many things that operating systems do in user space. IBM never did produce a "ring to rule them all" operating system for the 360 series, but with the 370 some academics figured out how to run multiple operating systems in virtual machines. So CICS could run in a VM with the minimal OS that it needed.
Mainframe systems had the source code for CICS and the OS and usually used a custom build, so the kernel didn't have anything it didn't need.
The machine was expensive but cost effective if you used it efficiently, so people did.
Your bottleneck might be the ~150k packets/sec. Soft interrupt handling is likely getting saturated. If you tune your network stack (receive-side scaling, etc.) for high packet throughput, you may be able to get the benchmark to find the application's bottleneck.
If @lganzzzo's oatpp can indeed handle 2M concurrent connections on 8 cores / 52 GB with _presence_ and real data being communicated/broadcast, it would be worth looking into.
We use Phoenix for a number of projects, and in addition to handling lots of WebSocket connections it's a fully featured framework with an excellent workflow, expressive ORM and seamless DevOps.
Seems like oatpp has been built with a single purpose in mind (similar to Redis). Always good to see people diving deep into a topic to push the boundaries of the state of the art.
I'm very interested in the "seamless DevOps" part. Could you point me to some link describing how you manage to achieve that with Phoenix? (Do you use Docker + Kubernetes, or OTP, or something else?)
As others have suggested, use Distillery + Edeliver for zero-downtime deployment.
Step-by-step instructions here: https://git.io/fjcVz (happy to answer any questions you have, please open issues on GitHub as we have no way of checking notifications on HN)
Thanks,
In this benchmark I decided to go without much framework tuning. I mostly took it as-is in order to see what I could get.
In any case, oatpp is a general-purpose web framework; it is understood that dedicated libraries like uWebSockets may be more optimized.
Nevertheless, I believe there is still a lot of room for tuning and optimizing oatpp.
@lganzzzo great work! (bookmarked for further reading...) Out of curiosity, did you consider using Rust for this before using C++? Or did you dive straight into C++?
I'm confused. If there are 20M clients and the server is sending 9M messages per minute, doesn't that mean that each client is sending less than 1 message per minute?
I thought shoehorning was a microcomputer thing until I found out about that!