Achieving 100k connections per second with Elixir

dzik · on March 5, 2019

This article is quite good, especially the part about the bottleneck caused by the single supervisor in ranch. However, I have to say that the title is a bit misleading because all of this has nothing to do with Elixir; it's all about the Linux kernel and Erlang, and cowboy and ranch are written in Erlang.

Having said that, I will add that I think it is good to have Elixir.

keithnoizu · on March 5, 2019

And here I'm happy with my 8k requests per second Elixir IoT solution.

I use long lived processes and had to come up with some magic to work around the supervisor behavior with high child counts, etc.

Roughly, I randomly assign workers to a node in the cluster if they have not yet been assigned (there is some logic tracking total nodes on cluster and max / target that influences the decision). I verify if the remote (or local) worker is alive or migrating by checking a fragmented process Registry and unique identifier via :rpc (because I do some recovery logic if it's offline and let the caller specify if messages should force a spawn or can be ignored if the process is offline) and then pad the call with some meta information for rerouting so the receiver can confirm it is the same process the message was initially sent to (since processes cycle so frequently that the initial process may have died and a new process may have spawned in the time it took to forward the message).

If the process has changed mid transit or the worker has been flagged for migration the message gets rerouted to the new destination+process. If a process does not yet exist a hash based on the worker type + id and available worker pools is used to select which of the auto generated but named and registered (WorkType.PoolSupervisor_123) worker supervisors spawns the child node.
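As a rough illustration of the hash-based pool selection described above, here is a minimal sketch. It is written in Python rather than Elixir, it is not the SimplePool implementation, and the function name, the key format, and the pool count are all assumptions made for the example; only the `WorkType.PoolSupervisor_123` naming pattern comes from the comment.

```python
import hashlib


def pick_pool_supervisor(worker_type: str, worker_id: str, pool_count: int) -> str:
    """Map (worker type, id) to one of pool_count named supervisors.

    Hypothetical helper: the same worker always maps to the same
    supervisor name as long as pool_count does not change.
    """
    key = f"{worker_type}:{worker_id}".encode("utf-8")
    # Use a stable digest instead of built-in hash(), which is salted per process
    # and would give different answers on different nodes.
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % pool_count
    return f"{worker_type}.PoolSupervisor_{index}"


# Every caller computes the same supervisor name for the same worker.
name = pick_pool_supervisor("WorkType", "device-42", 16)
print(name)
```

A plain modulo like this reshuffles most workers when the number of pools changes; a consistent-hashing scheme would be the usual alternative if pools are added often.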

It's a trip, and needs to be heavily documented. Starting from scratch I'd probably change some things and it probably needs some refinement later this year before the next batch of 250-500k devices get added to the network, but the costs per reporting device are fantastic with plenty of low hanging fruit for improving the cogs further so I'm happy.

https://github.com/noizu/SimplePool/

csisnett · on March 5, 2019

"is a bit misleading because all of this has nothing to do with Elixir" Stressgrid is written in elixir though, https://gitlab.com/stressgrid/stressgrid

dzik · on March 5, 2019

Point taken and I am already looking at stressgrid, "millions of users" is definitely a selling point to me. It is actually quite hard to generate enough and correct traffic to stress test large distributed systems.

dnautics · on March 5, 2019

Presumably they were using cowboy through Elixir. It's not hard, the module is just called :cowboy instead of cowboy.

dqv · on March 5, 2019

Or `Cowboy` with `alias :cowboy, as: Cowboy` ;)

latch · on March 6, 2019

Stick with the :module notation. This makes it clear that you need to be in "erlang mode"...indexes starting at 1, [probably] charlists instead of binaries, and possibly weird (from an elixir programmer's point of view) argument order.

dnautics · on March 6, 2019

To be fair, it's a rare thing to be using an index in elixir at all.

rdtsc · on March 5, 2019

That's fine I think. Elixir is their primary language it seems and using Erlang bits from Elixir is pretty easy, which is nice. It's a great benefit to the BEAM VM ecosystem. I think Elixir is the newer language and people enjoy up-voting it more. They could have said something like using BEAM VM and Cowboy maybe...? But I don't think it's a great misrepresentation either way.

rargulati · on March 5, 2019

I'd love to see data on the average on-call incidents for an application written in language X (say Go) vs those written in Elixir.

Concretely, is it the case, for an application where Elixir/Erlang/BEAM are a great choice but another language would also be fine, that the equivalent Elixir application results in less downtime/fewer pages than the alternative? Anything from the perfect app to something with a ton of races/leaks.

Is this a fair question (maybe I'm presuming too much of the BEAM/supervisor pattern, I have zero experience with it)?

rdtsc · on March 6, 2019

> I'd love to see data on the average on-call incidents

I don't have any hard data to compare, but I have been involved in debugging running Erlang systems. It's very nice having the ability to restart separate supervisors while the rest of the processes handle requests, and being able to do hot code loading to, say, fix bugs or add extra logging. And my all-time favorite: live tracing after connecting to a VM's remote shell. You can just pick any function, args, and process and say "trace these for a few seconds if a specific condition happens". None of those individually are earth-shattering, but taken together they are just so pleasant to use. I wouldn't enjoy going back to anything that didn't have those capabilities.

And yes, that restarting of sub-systems (supervision trees) happens automatically as well. There were a number of cases where it turned a potential "wake up 4am and fix this now, cause everything crashed" into a "meh, it's fine until I get to it next week" kind of a problem.

brightball · on March 6, 2019

Is there a good write up of how to do that somewhere?

rdtsc · on March 6, 2019

Which part or just in general ops with Erlang?

Overall I would say this book is a good start https://www.erlang-in-anger.com/

Supervisors are just a general pattern in Erlang. Any book will have something about it. I like this one: https://learnyousomeerlang.com/supervisors

Restarting frequency and limits are just one of the parameters you specify. So don't need to do anything fancy or special there.

Hot code loading might not be as obvious: http://erlang.org/doc/reference_manual/code_loading.html but it is essentially just compiling the module on the same VM version (or close by, no more than 2 versions away) and copying it to the server in the same path as the original. The original could be saved to a backup file. Then do `l(modulename)` to load it.

For tracing I recommend http://ferd.github.io/recon/. The Erlang in Anger book also has examples of tracing. http://erlang.org/doc/man/dbg.html has some nice shortcuts too, but be careful using it in production, as it doesn't have any overload protection. So if you accidentally trace all the messages on all the processes, you might crash your service :-)

brightball · on March 6, 2019

Tracing is mainly what I was going for. I'm very familiar with the various patterns and the run time, but I haven't seen the tracing aspects referenced in as much detail.

Thanks!

kureikain · on March 5, 2019

I don't have data, but I can tell you from my experience with Ruby, Go, Node and Elixir.

I have zero on-call for Go. I had very few for Elixir. But the bugs were in logic code. Same with Ruby.

But it's a disaster with Node. We used TypeScript, so it catches a lot of type issues. However, the Node runtime is weird. We ran into DNS issues (like you have to bump the libuv thread pool, cache DNS), JSON parsing issues that block the event loop, max memory, etc.

optimusclimb · on March 6, 2019

This would be too heavily influenced by confounding factors.

For instance:

* Are the teams that use certain languages comprised of more experienced people?

* How mature is the company and project? I.e., a faster moving startup cutting more corners, where time was decided to be of the essence (rightly or wrongly), will likely produce more on-call incidents than a slower, more established company that can take its time.

vasilia · on March 5, 2019

I can handle 120k connections per second with my custom made, highly optimized multiprocess C++ server. But the main problem is business logic. Just make 2 SQL queries to MySQL on each HTTP request and look at how it will degrade.

repsilat · on March 6, 2019

There are simple tricks to make those queries not kill performance. Here is a dumb proof-of-concept I made a few months ago: https://github.com/MatthewSteel/carpool

The general idea is combining queries from different HTTP requests into a single database query/transaction, amortising the (significant) per-query cost over those requests. For simple use-cases it doesn't add a whole lot of complexity, can reduce both load and latency significantly, and doesn't lose transactional guarantees.
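To make the batching idea above concrete, here is a minimal sketch of combining lookups from several requests into one query. It is not the carpool project's code: the `LookupBatcher` class, the `users` table, and the request ids are hypothetical, it uses an in-memory SQLite database for self-containment, and it only covers reads, not the transactional writes the comment mentions.

```python
import sqlite3


class LookupBatcher:
    """Collect lookups from many requests and answer them with one query."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = []     # (request_id, user_id) pairs waiting for the next flush
        self.queries_run = 0  # how many SQL queries were actually issued

    def submit(self, request_id, user_id):
        self.pending.append((request_id, user_id))

    def flush(self):
        """Run one SELECT for every pending lookup; return {request_id: name or None}."""
        if not self.pending:
            return {}
        batch, self.pending = self.pending, []
        ids = sorted({user_id for _, user_id in batch})
        placeholders = ",".join("?" for _ in ids)
        rows = self.conn.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
        ).fetchall()
        self.queries_run += 1
        names = dict(rows)
        # Hand each request its own answer from the shared result set.
        return {request_id: names.get(user_id) for request_id, user_id in batch}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

batcher = LookupBatcher(conn)
for request_id, user_id in [("r1", 1), ("r2", 2), ("r3", 1), ("r4", 99)]:
    batcher.submit(request_id, user_id)
results = batcher.flush()
print(results, batcher.queries_run)
```

Four requests are served by a single query here. In a real server the flush would be triggered by a timer or a batch-size threshold, which trades a little latency for the amortised per-query cost.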

Not 100k/sec writes on my laptop, mind you :-).

Perseids · on March 6, 2019

Since looking into modern concurrency concepts I've always thought such (in my opinion obvious) batching should be part of sophisticated ORM frameworks such as Rails' Active Records. Alas, their design decisions always seem to cater for making the dumb usages more performant (sometimes automagically, sometimes adding huge layers of cruft) than rewarding programmers who are willing to learn a few concepts by creating interfaces with strong contracts with better safety and performance.

E.g. please give me guidance on how to better structure my database model so that it doesn't effectively end up as a huge spaghetti heap of global variables. My personal horror: updating a single database field spurs 20 additional SQL queries creating several new rows in seemingly unrelated tables. Digging in I find this was due to an after_save hook in the database model which created an avalanche of other after_save/after_validation hooks to fire. The worst of it: Asking for how this has come to be I find out that each step of the way was an elegant solution to some code duplication in the controller, some forgotten edge case in the UI, some bug in the business logic. Basically ending up with extremely complex control flows is the default.

So of course, if your code has next to no isolation, batching up queries produces incalculable risks.

/rant, sorry.

repsilat · on March 6, 2019

I agree that with that kind of complexity (or with the belief that that kind of complexity is inevitable) it isn't a great idea. You lose isolation, and if you can't predict which rows will be touched you're hosed.

One mitigating factor, this sort of optimisation should be applied to frequent queries more than expensive queries. In some use-cases the former kind may be simple ("Is this user logged in?") even if the latter is not.

And on keeping that complexity down: the traditional story has been "normalise until you only need to update data in one place," but often requirements don't line up well to foreign-key constraints etc. The newer story can work, though: "Denormalise until you only have to update in one place, shunt the complexity to user code, and serialise writes." It's anathema to many, but it is becoming more common (usually in places that don't use RDBMSs though.)

chug · on March 6, 2019

Looks interesting! You mentioned in the docs that it would be simpler once abstractions develop and that made me realize it's similar to facebook/dataloader, just used across requests instead of batching up all of the queries per request. It's also of course a generalized form of it that represents batching a parametrized method more so than just batching retrievals by some kind of unique key. It may be able to serve as something to lift API ideas from though. Like some kind of BatchedTask that has an execute() method that takes an array of args then batches those into an array of array of args for the underlying batched implementation.

https://github.com/facebook/dataloader/blob/master/README.md

repsilat · on March 6, 2019

Ah, thanks for the link, I'll definitely check it out.

vasilia · on March 6, 2019

It may help if you use a single data source. But we are using more than 2000 data sources, and they are all distributed/replicated in different data centres across different countries.

jamra · on March 5, 2019

That seems like a job for sharded databases and caches.

vasilia · on March 5, 2019

Of course, we have both, but 100k is nothing if it's not a CDN server which stores static file in-memory. Moreover, the main metric is latency, not a number of connections. You can scale a number of connections with an L3/L4 load balancer, but not latency.

dnekencjfkerf · on March 5, 2019

> What this means, performance-wise, is that measuring requests per second gets a lot more attention than connections per second. Usually, the latter can be one or two orders of magnitude lower than the former.

Does anyone know how 100k connections compares with other servers?

Thaxll · on March 5, 2019

It's probably easy to do with Java / C# and Go, they're using a 36 cores machine to achieve that with fast CPU, meaning that you need 3000conn/sec per core, very doable with recent frameworks.
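The per-core figure above is a rounding of a simple division, assuming the load spreads evenly over all 36 cores (which real workloads rarely do exactly):

```python
# Back-of-the-envelope check of the per-core rate quoted above.
target_conn_per_sec = 100_000
cores = 36

per_core = target_conn_per_sec / cores
print(round(per_core))  # 2778, i.e. roughly the 3000 conn/sec per core mentioned
```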

ralusek · on March 5, 2019

Should be possible just fine with NodeJS, so long as it's clustered to run an instance per core.

The order of magnitude(s) differentiator for server performance really comes down to whether or not the architecture is blocking or non-blocking.

holoduke · on March 6, 2019

We run about 20k connections per second with nodejs on a 12 core machine. All node is doing is parsing cached JSON, modifying it and serving it back to the client. One server has an uptime of 560 days without any memory/performance issues.

ww520 · on March 5, 2019

Java was able to do that for the longest time. I remember seeing async io based servers doing 500K or 1M connections per machine in the last ten years. In all cases they needed to reconfigure the OS kernel since that's where the bottleneck was.

ww520 · on March 6, 2019

A quick search shows that the problem has been shifted to tackle the so-called C10M problem, with C1M/second. That was a couple of years ago. Not sure what the current state is.

http://highscalability.com/blog/2013/5/13/the-secret-to-10-m...

https://mrotaru.wordpress.com/2015/05/20/how-migratorydata-s...

adontz · on March 5, 2019

With Python/uvloop I can easily get 10K-12K connections per second per core, so 36 cores will be fine with Python too.

ioquatix · on March 5, 2019

Can you show me your example code?

Also, assuming it scales up linearly is a bit risky, although I agree with that kind of conn/s I am sure it will be sufficient.

adontz · on March 5, 2019

Nothing special, just usual asyncio Protocol with uvloop policy.
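For readers unfamiliar with the setup named above, here is a minimal sketch of an `asyncio.Protocol` server with the uvloop event loop policy. It is not the commenter's code: the `HelloProtocol` class and the fixed reply are hypothetical, uvloop is a third-party package and is treated as optional here, and the snippet says nothing about the connection rates quoted in the thread.

```python
import asyncio

try:
    import uvloop  # third-party; optional in this sketch

    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    pass  # fall back to the default asyncio event loop


class HelloProtocol(asyncio.Protocol):
    """Send a fixed reply on connect, then close: one connection, one response."""

    def connection_made(self, transport):
        transport.write(b"hello\n")
        transport.close()  # buffered data is flushed before the socket closes


async def main():
    loop = asyncio.get_running_loop()
    # Port 0 lets the OS pick a free port for this demonstration.
    server = await loop.create_server(HelloProtocol, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # Connect once as a client to show the server answering.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    reply = await reader.read()  # read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    server.close()
    return reply


reply = asyncio.run(main())
print(reply)
```

Protocol callbacks avoid creating a coroutine and stream objects per connection, which is one reason this style tends to be chosen for accept-heavy workloads.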

ioquatix · on March 6, 2019

I would love to see the code.

ioquatix · on March 5, 2019

On my desktop computer with a single thread, Ruby can handle about 2000 conn/s. I'm just going to check a single thread with a similar C++ implementation.

holtalanm · on March 5, 2019

im a simple man. i see Elixir, i upvote.

that being said, this article was pretty informative. The bit about the proposed SO_REUSEPORT socket option was really interesting. Really fun to read about performance bottleneck detection and improvement.
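For context on the socket option mentioned above: `SO_REUSEPORT` lets several listening sockets bind the same address and port, so the kernel can distribute incoming connections among them. The sketch below only demonstrates that behaviour in Python; it assumes a platform that exposes the option (for example Linux 3.9 or later), the helper name is hypothetical, and it is not the patch discussed in the article.

```python
import socket


def reuseport_listener(port: int) -> socket.socket:
    """Create a TCP listener with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set on every socket that shares the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(128)
    return sock


first = reuseport_listener(0)       # port 0: let the kernel pick a free port
port = first.getsockname()[1]
second = reuseport_listener(port)   # a second listener on the very same port

print(first.getsockname(), second.getsockname())
```

Without the option, the second `bind()` would fail with "Address already in use". With it, each listener can be served by its own acceptor.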

edit: wow, downvoting for making a simple joke about liking elixir. Cool.

mrinterweb · on March 6, 2019

I've found that humor in comments on HN is usually not well received. Not sure why, just an observation.

dang · on March 6, 2019

https://news.ycombinator.com/item?id=18817249

Maybe we should add something about this to https://news.ycombinator.com/newsfaq.html.

pmarreck · on March 6, 2019

It's ASD. << that was a joke

I think that the inclination towards "meaningless" humor makes it too much like Reddit. These folks want SUBSTANCE! (Well, that's why _I_ come here, at least!)

thatcat · on March 6, 2019

Simple jokes from a simple man... I laughed anyway.

supermatt · on March 6, 2019

I'd like to see memory consumption charts for this. It seems you miss this on all your posts. Not a criticism (and thank you for what you have done), it's just something I (and others) would like to see, and if you are running the tests it's just another metric to log :D

Also, any update on your previous article? https://news.ycombinator.com/item?id=19094233

kt315_ · on March 6, 2019

We are preparing new benchmark test for major platforms. Among other suggestions it will include memory consumption.

Leace · on March 5, 2019

ejabberd [0], XMPP server is written in erlang and powers chat in some of the biggest MMORPGs [1].

[0]: https://github.com/processone/ejabberd

[1]: https://xmpp.org/uses/gaming.html

77pt77 · on March 5, 2019

By used in MMORPGs you mean the chatting component, not the actual game-play network protocol.

yawaramin · on March 6, 2019

That's exactly what they said.

confounded · on March 5, 2019

Is Elixir/Erlang considered superior to Go for writing high concurrency web servers?

nathan_long · on March 5, 2019

I don't know Go, but that probably depends on your goals. To quote myself from elsewhere:

> Efficiency in the BEAM is mainly in service of its primary goal of fault-tolerance. If one process crashes unexpectedly, the others should continue. By the same logic, if one process is CPU-intensive or IO-blocked, the others should keep making progress smoothly. And if processes are good for isolating errors and performance issues, they should be cheap enough that we can run a lot of them at once. Those assumptions are baked into how the BEAM manages processes.

If raw speed is your only goal, the BEAM probably isn't the best choice. If consistent speed and stability matter, it may be.

More on this at https://dockyard.com/blog/2018/07/18/all-for-reliability-ref...

mastry · on March 5, 2019

That was a helpful and interesting article. Thanks.

rakoo · on March 5, 2019

Not an expert in any of the languages by any means, but Go and Erlang/Elixir focus on different things:

- Go wants to be performant at high concurrency scale

- Erlang/Elixir wants to keep running at high concurrency scales, whatever the issues are in your application code. Performance comes second.

There's no clear cut answer to your question; I guess if you trust yourself to write servers that will hold a large number of connections while doing a lot of processing then Go has an advantage, otherwise you should probably trust the man-centuries behind the BEAM VM and follow the various blog posts/presentations explaining how you can fine-tune your machine to get to super large scales.

anthony_doan · on March 6, 2019

> Performance comes second.

I want to state that "performance" is too general a term here.

The BEAM VM also has a goal of low latency, which can be considered performance. I'm not entirely sure if Go is aiming for that or not. I would never do any numerical stuff on BEAM though, it's very slow.

This article is a bit dated but is interesting between Go and Erlang:

https://www.theerlangelist.com/article/reducing_maximum_late...

rakoo · on March 6, 2019

Very true, thanks for the article. Go also wants to minimize GC duration by making it per-goroutine and some fancy algorithms to make it as short as possible, so I'd say it's part of its goals too.

keithnoizu · on March 5, 2019

I suspect it's probably much more straight forward to horizontally scale across nodes with Erlang/Elixir and OTP than with Go.

brightball · on March 5, 2019

Depends on the definition of superior.

If it's pure benchmarks, then Go is usually going to come in a little bit ahead.

When you get into comparing language design, underlying architectural decisions, problems solved/created/avoided by those decisions it gets more complex.

I did a big write up for code ship a couple of years ago. Had a solid discussion on HN and the comparison remains fairly accurate.

https://news.ycombinator.com/item?id=13497505

ilovecaching · on March 5, 2019

What does high concurrency mean?

Both of them give you less flexibility than is necessary to achieve highly efficient use of all threads on a multiprocessor system. For that, you'll need something like a pool of event loops using async/await. This is the system most common in high performance networking in C++, C, and Rust.

Erlang and Go both sacrifice efficiency to improve maintainability and safety by offering a model that allows you to approach concurrency from a more synchronous mindset. Erlang in particular goes beyond Go in that the Actor model makes it considerably easier to avoid deadlocks and other concurrency bugs, at the expense of a much more opinionated system. Erlang is also less focused on reducing average latency than on keeping latency predictable at scale.

Long story short, Erlang, Go, and the rest are not apples to apples comparisons, and it takes investment in each language to understand the tradeoffs and use cases for each. You should also view them holistically, as in, what language can my team support, and will the wins from Erlang's message queues outweigh the smaller community, or will Go's mid tier performance be enough to avoid writing on top of the low level libevent and building a custom thread pool or fine tuning Go's scheduler.

dscpls · on March 6, 2019

Erlang tends to have excellent cpu utilisation if you follow the most basic principles in Erlang and OTP.

The question is if you can or want to write better concurrent code by rolling it yourself.

alexgaribay · on March 5, 2019

In terms of raw performance, Go will be faster. However, the differentiator here is that the BEAM gives you the guarantees and tools to write highly concurrent applications with a sane mental model, fault-tolerance, and isolated processes. As a sibling said, it's fairly easy to cluster applications. Additionally, if something truly needs to be run in another language for performance, you can write a NIF in Rust or something and execute it from Elixir.

StreamBright · on March 5, 2019

Highly depends on which libraries you are talking about. Essentially you cannot claim that Go will be faster. You can say that fasthttp is faster than Cowboy for a hello world application. Every real world application is much more complex and the stack's performance will be decided by its slowest element.

Thaxll · on March 5, 2019

Go is faster than Erlang and not by a small margin. Erlang is a dynamic and immutable language, which hurts performance compared to languages like Go.

Some benchmark about pure CPU computation:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

Erlang is really slower than Java

Go and Java now:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

You see the big difference, Go and Java are on par but Go usually takes way less memory than Java.

You can't have everything, message passing / immutability with no performance hit.

keithnoizu · on March 5, 2019

Elixir/Erlang is slower than Java for computationally expensive non tail optimized large struct operations maybe. For simple requests with ginormous thread counts the point of failure is going to get hit way sooner with Java than Erlang. If connecting directly to ETS/Mnesia/Dets instead of converting back and forth between an external persistence layer Erlang/Elixir can be amazingly fast. Doing Map Reduce type operations or performing massive data processing tasks in parallel Erlang/Elixir is likely going to outperform Java. Anywhere where thread overhead is more critical than raw per thread performance Erlang/Elixir can perform better than alternative solutions.

shaklee3 · on March 5, 2019

Instead of downvoting this comment, can someone write why this is incorrect? I realize it might not be a popular writing style, but references were provided.

RobertKerans · on March 5, 2019

Because it's not hugely relevant. Yes, Erlang is not a "fast" language by many metrics, but that [very often] is not the reason a team would choose to use it. The sibling and parent comments make up a fairly considered discussion regarding this. A comment in the middle of this conversation saying Go etc win in some benchmark game is a non sequitur

Thaxll · on March 5, 2019

I was replying to someone saying that it depends on the library. The answer is no, it doesn't depend on the library; Erlang is not a fast language and that's ok.

mrdoops · on March 5, 2019

There's different kinds of "fast" is the problem.

Are you juggling lots of messages concurrently and orchestrating across complex topologies of nodes? A BEAM language is going to excel. That's why Whatsapp, Discord, and RabbitMQ use Erlang/Elixir.

Are you trying to go really fast in a straight and simple concurrency scenario? Go/Java/C++/Rust is going to be faster than a BEAM language in those scenarios.

You won't want to implement a complex concurrency run-time in Java whereas Elixir is not a good choice for a 3D game engine.

Still, there's nothing wrong with using both.

alexgaribay · on March 5, 2019

True. It's highly dependent on what is compared. I should have made my point more clear, however. I care more about what the BEAM provides than about the performance I can get in other languages.

jadbox · on March 5, 2019

As far as from my tests and what I've seen reported online, Go and Rust have a substantial lead (20% ish) over Erlang for high throughput servers.

EDIT: I believe this is partially due to Go being a lot more CPU efficient overall than Erlang (see below). So for simple servers, Go and Erlang will match performance, but for slightly more complex web servers that need to crunch some data, Go [and Rust] will outperform the Erlang VM. https://stressgrid.com/blog/benchmarking_go_vs_node_vs_elixi...

fermuch · on March 5, 2019

I would add the detail that for both erlang and elixir, running in one core, multiple cores, or multiple machines is seamless. Clustering is easy.

StreamBright · on March 5, 2019

Well, one of the fastest HTTP libraries out there is rapidoid, and fasthttp comes very close to it, as do actix-raw, hyper and tokio-minihttp. Erlang and Elixir lag behind by a non-trivial margin.

muststopmyths · on March 5, 2019

>Finally, the connections per second rate reaches 99k, with network latency and available CPU resources contributing to the next bottleneck.

Can someone educate me on what they might be talking about here? CPU is ~45% in their final graph. I don't know what network latency means in this context though. Roundtrip for a TCP handshake? That seems unlikely.

Qwertystop · on March 5, 2019

The CPU graph peaks near 97% (teal line) at the time when connections-per-second are highest. Are you looking at the red? That's the version without the two patches.

muststopmyths · on March 5, 2019

oh yeah, you're right. I reversed the two in my head somehow.

dzik · on March 5, 2019

It means CPU is not saturated, so it is not the bottleneck, which means it is likely not enough Erlang processes have been started.

makkesk8 · on March 5, 2019

Even if connections per second can be a magnitude or two lower than requests per second, this result is still quite far off from today's alternatives.

14 core machine comparing .net core with other top webservers: https://www.ageofascent.com/2019/02/04/asp-net-core-saturati...

benwilson-512 · on March 5, 2019

A lot of folks are failing to read the article. They're intentionally holding each connection open for 1 whole second. This is a whole different ballgame than benchmarks where each connection is allowed to terminate as rapidly as it can send back a plain text response.

muststopmyths · on March 5, 2019

Good point. At first glance, holding the connection open for one second seemed a bit meaningless if they're touting connections/sec.

But since they are benchmarking Elixir, there is some amount of overhead involved in that framework's management of connections and requests. If I knew Erlang/Elixir, that would be a fascinating thing to explore.

Edit: I'm assuming the saturated CPU comes from Elixir and not the OS. It would be strange for 100k/sec to saturate the TCP stack with 36 cores.

makkesk8 · on March 5, 2019

Totally missed that. In that case it does make sense.

ergl · on March 5, 2019

That's measuring reqs/s, and as you said, if conns/s is an order of magnitude or two less, that's 700k, or 70k conns/s, which is right around what this post finds.

sergiotapia · on March 5, 2019

That's really exciting! As someone who dropped out of .NET entirely around the time ASP.Net MVC2 came out, where do you recommend I start looking into aspnet core / .net core? Do you still write core .net in visual studio? or can you use vscode?

SnorkelTan · on March 5, 2019

Would be helpful to know the hardware/instance size they used for these tests. TFA doesn't explicitly state it.

zambal · on March 5, 2019

We used Ubuntu 18.04 with the 4.15.0-1031-aws kernel, with sysctld overrides seen in our /etc/sysctl.d/10-dummy.conf. We used Erlang 21.2.6-1 on a 36-core c5.9xlarge instance.

To run this test, we used Stressgrid with twenty c5.xlarge generators.

lstodd · on March 5, 2019

omg.

100K/sec was achieved by yours truly 10 years ago on a contemporary xeon with nothing but nginx and python2.6 - gevent patched to not copy the stack, just switch it. (EDIT: and also a FIFO I/O scheduler)

Why does this require 36 cores today??

benfolred · on March 5, 2019

You are comparing apples and oranges.

They are purposely holding the connections around for 1+10%seconds. So first of all, it means that, for a rate of 100k conn/s, they are going to have around 200k open connections after a second. This already imposes a different profile than 100k single request connections per second.

You are also assuming that they need 36 cores to achieve 100k connections per second, which is likely not the case since they quickly moved the bottleneck to the OS. I am assuming they have other requirements that force them to run on such a large machine and they want to make sure they are not running into any single-core bottlenecks (and having a large amount of cores makes it much easier to spot those).

Thaxll · on March 5, 2019

I highly doubt you were able to do 100k connections/sec 10 years ago with the same hardware, you must be confused between requests/sec and connections/sec very different things.

rozap · on March 5, 2019

If you read the article, in the third or so paragraph.

> What this means, performance-wise, is that measuring requests per second gets a lot more attention than connections per second. Usually, the latter can be one or two orders of magnitude lower than the former. Correspondingly, benchmarks use long-living connections to simulate multiple requests from the same device.

lstodd · on March 5, 2019

Your point being? I was talking of single-request connections.

jasonlotito · on March 5, 2019

> I was talking of single-request connection.

Yes. Which is not what's being discussed here.

lstodd · on March 5, 2019

Yeah, what's being discussed here are connections without any i/o over them. Just an fd lingering somewhere in an epoll pool. Which obviously is even less taxing. So your point is?

kierenj · on March 5, 2019

..that you are comparing apples and oranges, like he said

StreamBright · on March 5, 2019

Nothing tells more about an engineer than the last undocumented unreproducible hello world micro benchmark conducted by her once and only once some years ago that beats a real world application in terms of req/s leaving out latency profile.

lpgauth · on March 5, 2019

Duh. Of course a C event loop will be faster at accepting connections, that's not the point of the article.

lstodd · on March 5, 2019

They boast only accepting 100K connections per second, not pushing back a meaningful response?

Why is this even here then?

dzik · on March 5, 2019

Would you mind sharing the details? (URL maybe)

I think the limiting factor might not be the number of cores but something outside of erl's scope, that is, the eth card they used, network infrastructure, etc. Even Elixir could be something that impacts the tests.

lstodd · on March 5, 2019

There is no url summing the details unfortunately.

The work in some unknown state is at https://code.google.com/archive/p/coev/

Without the business logic (which was in django IIRC) and deployment details, obviously. Very outdated and some later patches might be missing. No one was interested, you see.

I'd be surprised if there were problems with network, and if there were, that should have been obvious in the metrics.

Maybe the metrics were inadequate.

dzik · on March 5, 2019

Sorry, where do the authors claim they achieved >100k connections per second?

lstodd · on March 5, 2019

I'm the author, and that's the truth.

Can't see how this can be replicated as a controlled experiment nowadays, unfortunately.

But if you define exactly what's a request, what's a response, and what the connection/response ratio is let's have a race.

Like, you set the parameters, and whoever serves that on lower-capability hardware wins. Py3 plus low-level C/Rust hacks vs Elixir, say.

dzik · on March 5, 2019

That's the thing. You can always hack something in C to prove there is a better way for a specific task. In the past I did things like that just for fun. But in the real world it does not work like that. You buy into things as a whole, accepting their pros and cons as a whole. If you need to hack - change your tools.

lstodd · on March 5, 2019

Please do not beat the strawman. And don't set him on fire. He's innocent.

I offered to beat whatever you've done by tweaking the Py3 stdlib. Not by writing a plain C implementation.

If you for some reason doubt that this old python thing is of the real world - let me disappoint you. It was done because nothing else could do those 100K rps back then. And it did the thing for five years, until the whole stack was ditched.

ramchip · on March 5, 2019

I think you’re misinterpreting the point of the article. It’s not gloating about how much they scale, or saying their particular tech beats other techs. It’s just explaining how to solve a specific scaling issue on a specific platform.

As an Elixir user who had to deal with high connections/s in the past, I found it interesting and useful. I use Elixir for reasons that have nothing to do with performance so a language comparison isn’t particularly interesting.

jacobn · on March 5, 2019

Was your benchmark for requests/sec or connections/sec?

lstodd · on March 5, 2019

Single-request connections. Response required consulting memcached and updating it from postgres if out of luck, which was very rare but still needed (and patching then-existing postgres C client to be async aware was an undertaking)

jasonlotito · on March 5, 2019

> Single-request connections.

What does that mean? You keep qualifying "connections." It's a connection. It holds onto its connection for X period of time. An HTTP request is just a single-request connection, which is NOT what this article is discussing.

lstodd · on March 5, 2019

One HTTP connection, one request, one response, connection closed.

I admit I didn't first see that they actually don't do any i/o over those connections.

Well, you know, handling x accepts() per second and holding onto y fds is even less than nothing to be proud of.

jasonlotito · on March 5, 2019

So yeah, those are generally considered to be requests per second. Apples and oranges.

dzik · on March 5, 2019

Did starting more acceptors than the number of cores make any difference?

kt315_ · on March 6, 2019

Author here. We tried number of acceptors that was 4x and 16x number of cores without any difference.

zambal · on March 5, 2019

I'm not related to the author(s) of the article in any way. Just a slightly more careful reader than OP ;)

SnorkelTan · on March 5, 2019

skimming fail :(

jschniper · on March 5, 2019

They mention in the Ranch section that "In this test, we set it to 36—the number of CPU cores on our c5.9xlarge."

fabioyy · on March 7, 2019

Opening a connection and closing it after a while is not a very good example of "scalability" ... the kernel does the opening part...

cutler · on March 6, 2019

Great, I can use this for that blogging app I've been meaning to write and sleep at night knowing I won't run out of connections.