I agree, it could be less than 15 GB for 2M WebSockets. But it's the first benchmark, and I decided to run it without many optimizations - just to have a starting point. There is still a lot of room for optimizations, like reducing internal buffers and so on, but I will leave that for future tests.
It's actually not that different from a traditional (pre-AJAX) HTML forms web app.
That is, the mainframe would send out a burst of data to draw a screen on the terminal, the user would fill out the fields, then the user would hit a button that sent the field contents back to the mainframe.
Thus the mainframe did not have to handle an interrupt for each character, but instead just one for the whole form.
They didn't have the bloat back then that we have these days, like the crazy deep call stacks you see in both object-oriented and functional programming.
CICS programs were frequently written in assembler, sometimes COBOL and PL/I. Mainframe compilers from the early 1970s were much more advanced in many ways than the Unix/C technology described in the Dragon Book.
The mainframe did I/O through channel processors which were quite expensive in themselves, but offloaded a lot of work.
CICS did many things that operating systems do in user space. IBM never did produce a "ring to rule them all" operating system for the 360 series, but with the 370 some academics figured out how to run multiple operating systems in virtual machines. So CICS could run in a VM with the minimal OS that it needed.
Mainframe systems had the source code for CICS and the OS and usually used a custom build, so the kernel didn't have anything it didn't need.
The machine was expensive but cost effective if you used it efficiently, so people did.
Your bottleneck might be the ~150k packets/sec. Soft interrupt handling is likely getting saturated. If you tune your network stack (receive-side scaling, etc.) for high packet throughput, you may be able to get the benchmark to find the application's bottleneck.
If @lganzzzo's oatpp can indeed handle 2M concurrent connections on 8 cores / 52 GB with _presence_ and real data being communicated/broadcast, it would be worth looking into.
We use Phoenix for a number of projects, and in addition to handling lots of WebSocket connections it's a fully featured framework with an excellent workflow, expressive ORM and seamless DevOps.
Seems like oatpp has been built with a single purpose in mind (similar to Redis). Always good to see people diving deep into a topic to push the boundaries of the state of the art.
I'm very interested in the "seamless DevOps" part. Could you point me to some link describing how you manage to achieve that with Phoenix? (Do you use Docker + Kubernetes, or OTP, or something else?)
As others have suggested, use Distillery + Edeliver for zero-downtime deployment.
Step-by-step instructions here: https://git.io/fjcVz (happy to answer any questions you have, please open issues on GitHub as we have no way of checking notifications on HN)
Thanks,
In this benchmark I decided to go without much framework tuning. I mostly took it as-is in order to see what I could get.
In any case, oatpp is a general-purpose web framework; it is understood that dedicated libraries like uWebSockets may be more optimized.
Nevertheless, I believe there is still a lot of room for tuning and optimizing oatpp.
@lganzzzo great work! (bookmarked for further reading...) Out of curiosity, did you consider using Rust for this before using C++? Or did you dive straight into C++?
I'm confused. If there are 20M clients and the server is sending 9M messages per minute, doesn't that mean that each client is sending less than 1 message per minute?
I thought shoehorning was a microcomputer thing until I found out about that!