*We used Ubuntu 18.04 with the 4.15.0-1031-aws kernel, with sysctld overrides se...

lstodd · on March 5, 2019

omg.

100K/sec was achieved by yours truly 10 years ago on a contemporary xeon with nothing but nginx and python2.6 - gevent patched to not copy the stack, just switch it. (EDIT: and also a FIFO I/O scheduler)

Why does this require 36 cores today??

benfolred · on March 5, 2019

You are comparing apples and oranges.

They are purposely holding the connections around for 1+10%seconds. So first of all, it means that, for a rate of 100k conn/s, they are going to have around 200k open connections after a second. This already imposes a different profile than 100k single request connections per second.

You are also assuming that they need 36 cores to achieve 100k connections per second, which is likely not the case since they quickly moved the bottleneck to the OS. I am assuming they have other requirements that force them to run on such a large machine and they want to make sure they are not running into any single-core bottlenecks (and having a large amount of cores makes it much easier to spot those).

Thaxll · on March 5, 2019

I highly doubt you were able to do 100k connections/sec 10 years ago with the same hardware, you must be confused between requests/sec and connections/sec very different things.

rozap · on March 5, 2019

If you read the article, in the third or so paragraph.

> What this means, performance-wise, is that measuring requests per second gets a lot more attention than connections per second. Usually, the latter can be one or two orders of magnitude lower than the former. Correspondingly, benchmarks use long-living connections to simulate multiple requests from the same device.

lstodd · on March 5, 2019

Your point being? I was talking of single-request connections.

jasonlotito · on March 5, 2019

> I was talking of single-request connection.

Yes. Which is not what's being discussed here.

lstodd · on March 5, 2019

Yeah, what's being discussed here are connections without any i/o over them. Just an fd lingering somewhere in an epoll pool. Which obviuosly is even less taxing. So your point is?

kierenj · on March 5, 2019

..that you are comparing apples and oranges, like he said

StreamBright · on March 5, 2019

Nothing tells more about an engineer than the last undocumented unreproducible hello world micro benchmark conducted by her once and only once some years ago that beats a real world application in terms of req/s leaving out latency profile.

lpgauth · on March 5, 2019

Duh. Of course a C event loop while be faster at accepting connections, that's not the point of the article.

lstodd · on March 5, 2019

They boast only accepting 100K connections per second, not pushing back a meaningful response?

Why this is even here then?

dzik · on March 5, 2019

Would you mind sharing the details? (URL maybe)

I think limiting factor might be not number of cores and outside of erl scope, that is eth card they used, network infrastructure, etc. Even Elixir could be something that impacts the tests.

lstodd · on March 5, 2019

There is no url summing the details unfortunately.

The work in some unknown state is at https://code.google.com/archive/p/coev/

Without the business logic (which was in django IIRC) and deployment details, obviously. Very outdated and some later patches might be missing. No one was interested, you see.

I'd be surprised if there were problems with network, and if there were, that should have been obvious in the metrics.

Maybe the metrics were inadequate

dzik · on March 5, 2019

Sorry, where do the authors claim they achieved >100k connections per second?

lstodd · on March 5, 2019

I'm the author, and that's the truth.

Can't see how this can be replicated as a controlled experiment nowadays, unfortunately.

But if you define exactly what's a request, what's a response, and what the connection/response ratio is let's have a race.

Like, you set the parameters, and whoever serves that on lower-capability hardware wins. Py3 plus low-level C/Rust hacks vs Elixir, say.

dzik · on March 5, 2019

That's the thing. You can always hack something in C to prove there is a better way for a specific task. In the past I did things like that just for fun. But in the real world it does not work like that. You buy into things as a whole, accepting their pros and cons as a whole. If you need to hack - change your tools.

lstodd · on March 5, 2019

Please do not beat the strawman. And don't set him on fire. He's innocent.

I offered to beat whatever you've done by tweaking the Py3 stdlib. Not by writing a plain C implementation.

If you for some reason doubt that this old python thing is of the real world - let me disappoint you. It was done because nothing else could do those 100K rps back then. And it did the thing for five years, until the whole stack was ditched.

ramchip · on March 5, 2019

I think you’re misinterpreting the point of the article. It’s not gloating about how much they scale, or saying their particular tech beats other techs. It’s just explaining how to solve a specific scaling issue on a specific platform.

As an Elixir user who had to deal with high connections/s in the past, I found it interesting and useful. I use Elixir for reasons that have nothing to do with performance so a language comparison isn’t particularly interesting.

jacobn · on March 5, 2019

Was your benchmark for requests/sec or connections/sec?

lstodd · on March 5, 2019

Single-request connections. Response required consulting memcached and updating it from postgres if out of luck, which was very rare but still needed (and patching then-existing postgres C client to be async aware was an undertaking)

jasonlotito · on March 5, 2019

> Single-request connections.

What does that mean? You keep qualifying "connections." It's a connection. It holds onto it's connection for X period of time. An HTTP request is just a single-request connection, which is NOT what this article is discussing.

lstodd · on March 5, 2019

One HTTP connection, one request, one response, connection closed.

I admit I didn't first see that they actually don't do any i/o over those connections.

Well, you know, handling x accepts() per second and holding onto y fds is even less than nothing to be proud of.

jasonlotito · on March 5, 2019

So yeah, those are generally considered to be requests per second. Apples and oranges.

dzik · on March 5, 2019

Did starting more acceptors than the number of cores make any difference?

kt315_ · on March 6, 2019

Author here. We tried number of acceptors that was 4x and 16x number of cores without any difference.

zambal · on March 5, 2019

I'm not related to the author(s) of the article in any way. Just a slightly more careful reader than OP ;)

SnorkelTan · on March 5, 2019

skimming fail :(