Making 1M requests with Python-aiohttp (pawelmhm.github.io)
123 points by dante9999 on April 23, 2016 | 81 comments



Re the EADDRNOTAVAIL from socket.connect(),

If you're connecting to 127.0.0.1:8080, then each connection from 127.0.0.1 is going to be assigned an ephemeral TCP source port. There are only a finite number of such ports available, on the order of ~30-50k, which limits the number of simultaneous connections from a single source address to a specific destination endpoint.

If you're doing 100k TCP connections with 1k concurrent connections, it's plausible that you'll run into those limits, with TCP connections hanging around in TIME_WAIT state after close().

Not that this is a documented errno for connect(), but it's the interpretation that makes sense.

http://www.toptip.ca/2010/02/linux-eaddrnotavail-address-not... http://lxr.free-electrons.com/source/net/ipv4/inet_hashtable...
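For what it's worth, you can check the ephemeral range in use on Linux yourself; a minimal sketch (the procfs path is standard on Linux, the numbers in the comment are typical defaults):

    # Read the ephemeral (local) port range from procfs (Linux).
    with open('/proc/sys/net/ipv4/ip_local_port_range') as f:
        low, high = map(int, f.read().split())

    # Typically prints something like: 28233 ephemeral ports (32768-61000)
    print('{} ephemeral ports ({}-{})'.format(high - low + 1, low, high))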


Generally it's the upper ~32k ports that are ephemeral, and if you churn through more than that per minute in connections, you'll run into the TIME_WAIT issue.

A hacky way to get around that is to enable tcp_tw_reuse, which will let you reuse ports, but it can be risky: a stray SYN from the previous connection that happens to line up with the sequence number of the current connection will close your connection. That shouldn't happen often, and if you can tolerate a small amount of failure it's an easy way to get around this limit.

[0] http://blog.davidvassallo.me/2010/07/13/time_wait-and-port-r...
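A sketch of inspecting and flipping that knob through procfs (equivalent to `sysctl net.ipv4.tcp_tw_reuse=1`; writing requires root):

    # Inspect and enable tcp_tw_reuse via procfs (Linux; writing needs root).
    TW_REUSE = '/proc/sys/net/ipv4/tcp_tw_reuse'

    with open(TW_REUSE) as f:
        print('tcp_tw_reuse =', f.read().strip())

    with open(TW_REUSE, 'w') as f:  # same effect as `sysctl net.ipv4.tcp_tw_reuse=1`
        f.write('1')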


For benchmarking loopback connections, addressing really shouldn't be an issue, as you have an entire /8 subnet (127.0.0.0/8) to split between your client(s) and server(s). You would need some logic to set up e.g. 10,000 listening servers and 1,000,000 clients to get it working, and at some point you'd probably run into memory or other limits.

I'm a little surprised some simple googling didn't turn up any examples of this - I'm sure someone has tried it in order to benchmark high-performance network servers/services?

Apparently IPv6 changes this to a single (loopback) address, but then again, with IPv6 you can use entire subnets per network card.


> Hacky way to get around that is to enable tcp_tw_reuse which will let you reuse ports, but it can be risky if you get a SYN from the previous connection that happens to lineup with segment number of the current connection (which will close your connection)

Actually, Linux will fall back to using TCP timestamps to distinguish between different connections. Ironically, people will also disable timestamps to "fix" other issues[1], which breaks PAWS[2] and may cause the issue you're describing.

[1] It can break with some NATs and some load balancers. Actually, the way I learned about tcp_tw_reuse was when we plugged in a new load balancer. We tested that everything worked fine, but as soon as we sent production traffic, many connections took a few seconds to complete. It took two weeks of staring at packet dumps to find the cause. It turned out the load balancer was set up in an active-active configuration, so different connections had different timestamps; this confused Linux into ignoring some packets. It also turned out that one of the managers had wanted to make everything performant and had copied some sysctls (including tcp_tw_reuse and tcp_tw_recycle) from the Internet without much thought. After restoring the settings, everything worked flawlessly.

[2] https://en.wikipedia.org/wiki/Transmission_Control_Protocol#...


Localhost goes from 127.0.0.1 through 127.255.255.254. By binding each connection to a random IP in that range [1], one could get better mileage.

[1]: https://idea.popcount.org/2014-04-03-bind-before-connect/
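A minimal sketch of that bind-before-connect trick, assuming a server listening on 127.0.0.1:8080 (on Linux the whole loopback /8 is bindable without extra setup):

    import random
    import socket

    def connect_from_random_loopback(dest=('127.0.0.1', 8080)):
        # Any 127.x.y.z address is local on Linux, since loopback is a /8;
        # each source IP gets its own pool of ephemeral ports.
        src_ip = '127.{}.{}.{}'.format(random.randint(0, 255),
                                       random.randint(0, 255),
                                       random.randint(1, 254))
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((src_ip, 0))  # port 0: let the kernel pick an ephemeral port
        sock.connect(dest)
        return sock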


I have a library for doing coordinated async IO in Python that addresses some of the scheduling and resource contention issues hinted at in the later part of this post. It's called cellulario, in reference to containing async IO mechanics inside a cell wall.

    https://github.com/mayfield/cellulario
And an example of using it to manage a multi-tiered scheme, where a first layer of IO requests seeds another layer and then you finally reduce all the responses:

    https://github.com/mayfield/ecmcli/blob/master/ecmcli/api.py#L456


This looks really promising. I've often wanted to be able to do exactly this: run a bunch of async code in the middle of an otherwise synchronous block (classic example: writing a Django view which fires off a bunch of parallel HTTP API requests and continues once all of them have either returned or timed out).


That's almost exactly the use case I began with. It unapologetically requires Python 3.5+, but if you're already there I'd be happy to see and support some of your use cases. Hit me up on GitHub if you want to try it and need some guidance (the docs are nonexistent).


Could you please elaborate on why async would be preferred over a task-queue solution (would it)?


I really keep wishing there were benchmark comparisons of asyncio/aiohttp with gevent/Python 2. Performance would be a killer reason to migrate immediately to Py3.

What I suspect, though, is that asyncio is not all that much better than gevent. Can someone correct me on this?


Is there anything inherent to Python 3 that is slower than Python 2? Or is it just that some of the performant packages still haven't been ported to Python 3?


I keep looking for a reason to switch to Python 3 and can't find one. Plus, if I want to use the cool stuff in PyPy... then I'd better not!

Overall - very little reason to consider Py3 at all. Performance would have been one, if there were a comparison between gevent and asyncio.


>I keep looking for a reason to switch to Python 3 and can't find one.

Unicode? Not having to deal with encoding all over the place has been well worth the switch to Python 3. If performance is a huge issue, I honestly don't know why you would stay on Python (regardless of version).

I wouldn't want to switch back to Python 2.7 if I can avoid it. There's honestly no reason not to go with 3.4 or 3.5 at this point, unless you happen to have a large Python 2 code base.


This is a superficially trivial bit of syntactic sugar, but an example of the way small tweaks can provide big impact. This:

    do_something(*some_args, *some_more_args)
is rocking my world right at the moment. That's a massive time saving feature I've been waiting for and worth the price of a 3.5 upgrade.
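(For the curious, this is PEP 448 in 3.5, which also generalizes unpacking to literals; a quick sketch:)

    # PEP 448 (Python 3.5): multiple unpackings in one call or literal.
    def do_something(*args):
        return args

    some_args, some_more_args = [1, 2], [3, 4]
    print(do_something(*some_args, *some_more_args))  # (1, 2, 3, 4)
    merged = [*some_args, *some_more_args]            # [1, 2, 3, 4]
    options = {**{'a': 1}, **{'b': 2}}                # {'a': 1, 'b': 2}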


Oh cool, I didn't know about this!


>If performance is a huge issue, I honestly don't know why you would stay on Python (regardless of version)

I see this a lot. That one doesn't care enough about performance to switch to, say, C++ or Go doesn't mean one is OK with regressions in what they currently use.


I see you've been downvoted. Interesting, you said that you haven't found a reason to switch and someone looked at it and thought "How dare you not find a reason to switch, here let's teach you a lesson".

But by and large I agree. 3 hasn't provided enough of a carrot, and 2 hasn't been enough of a pain, for many people to want to switch. Especially when it comes to existing stable code bases. For new development, yes, many can and should pick Python 3. But if, say, Python 3 had brought even a 20% speed improvement overall, the move would have been a lot faster.

I find people are OK with accepting some breakage due to rewrites if either the existing stuff is very broken or the new stuff is much better.


But Python 3 is not really an upgrade, is it? It is a very different language, and most people who are pushing (downvoting?) for Python 3 don't seem to understand that.

I have zero problems with Python 3 per se - but when the vast majority of the ecosystem is on Py2 and there is no difference in performance... then I see no reason to accept any breakage.


I think, watching this behavior (like the downvotes you received) over time, I've figured it out. Newer folks come into Python, and many don't want to learn the dominant version, in an effort to focus on the future as they understand it. So Python 2 continuing to live is viewed as a threat to that investment. Even though the two aren't that different and it shouldn't matter which one you use, that isn't a popular point to bring up.

It's a bit of a "newer version is always better" trap. That's true in general for software like a web browser, but not for programming languages that contain breaking changes or feature bloat (both are Python 3 flaws); languages that are conservative in both of those regards are usually held in higher esteem. It's also not a zero-sum game where for 3 to succeed, 2 must fail - not that that is ever going to happen anyway. Thanks to how the PSF and associates handled this, the Python 3 mistakes were never corrected. We're stuck with a permanent split for a long time as a result. It's tragic really, coming from someone who programs in Python daily. There are many ways to resolve it too, but the CPython core dev team refuse to consider any of them.

As a result, guys like you who are thinking rationally become the problem for being 'lazy' (acting in your own best interests, which is exactly what everyone is doing) and are the enemy. People, especially newcomers, get tired of waiting for Python 2 to 'die', and instead of putting the onus on those who made the mistakes with Python 3 (they've stuck it out, refusing to correct their own mistakes because that's more work), it gets twisted and you are now the problem in their mind. Even though you were probably part of what made Python successful to begin with. Amazing how that pans out, right?

Usually these illusions go away once a full-time job is found. There are exceptions, but the vast majority of employers are companies with big Python 2 codebases and features to deliver. If they move services anywhere from CPython, it's to PyPy for the performance gain.


Nah. I'm a long-time Python developer (10 years+) and I moved everything over to Python 3 because there are so many advantages. This includes a number of massive internal code bases that I maintain at my day job. Porting is surprisingly easy nowadays, it used to hurt a lot more.

Management is fine with development time spent on migrating to Python 3, since it's an investment in the future (Python 2 will be EOL in 2020!).


Sounds like you're a lone wolf at a smaller company - a different ballgame from the "longtime Python devs" I'm talking about. I had an employer with a 500 KLOC Python 2 codebase, a billion-dollar business on the line, and new features to deliver. That 2020 date is just a big political stunt, as was 2015, or it wouldn't have been a snap of the fingers to extend it. Code won't stop working in 2020, and security is largely handled by a webserver.

I use Python 3 sometimes as well, but that doesn't mean what I'm saying isn't true.


It is NOT a different language. It's not backwards compatible, sure, but usually only minor changes are required, and 2to3 helps a lot. At this point pretty much all of the libraries have been ported, and their APIs stay the same. The libraries which haven't been ported yet are either notable exceptions (Twisted!) or unmaintained.

Very helpful for porting: http://python-future.org


A few reasons which made me switch:

  - Python 2 will be EOL in 2020, that's four years

  - vastly improved Unicode support

  - a number of new libraries are Python 3 only

  - asyncio and the new async syntax

  - exception chaining (!) - see the quick sketch after this list

  - type annotations

  - lots of improvements all over the place
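A quick sketch of that exception chaining (the file name is made up):

    # Python 3 exception chaining: the original traceback is preserved and
    # shown as "The above exception was the direct cause of the following
    # exception" in the combined traceback.
    try:
        settings = open('app.conf')
    except FileNotFoundError as exc:
        raise RuntimeError('could not load configuration') from exc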


Python 2 will not be EOL in four years - not with the billions of lines of code out there. If the Python foundation dares to do this, it will create a fork, probably even funded by Dropbox, Google, and the like.

I won't dispute you on any other aspects - except two. Have you tried using gevent versus asyncio? gevent is running in production at several of the largest API services in the world. asyncio is not yet deployed at that scale.

Second, about new libraries being Python 3 only - I really dispute that. In fact it's the other way around. For example, the brand new TensorFlow library (which Google uses in production for its own AI) was released for Python 2 only, and Python 3 support was patched in later. This is the case with every new library of consequence that I'm seeing.


The Python foundation won't budge on the 2020 EOL date, for sure. RedHat will support it until RHEL 7 is EOL (~2027), but that's security/critical fixes only - while Python 3 gets all the new features and development efforts.

At some point, the opportunity cost of staying with Python 2.7 is higher than the one-time effort of porting everything to Python 3, so companies will move. Especially the likes of Google and Dropbox.

gevent vs. asyncio - sure, gevent is more mature. But asyncio is undoubtedly the better/nicer API: asyncio's explicit await syntax is much nicer than gevent's implicit monkey-patching, which makes code harder to reason about.

Tensorflow isn't really a new library, it was only recently open sourced.

As for Python 3-only libraries:

- https://pypi.python.org/pypi?:action=browse&show=all&c=595

- https://github.com/aio-libs

Also note that Django will drop Python 2 support in time with the Python 2 EOL: https://www.djangoproject.com/weblog/2015/jun/25/roadmap/

Also, many new Python-based open source projects are Python 3 only.


This is wishful thinking.

There's a cost to maintaining it, and that's why the PSF is moving to 3. I'm sure that if Google and/or Dropbox paid, the PSF might continue maintenance[1].

Anyway, the issue here is that you won't get any new features in Python 2.7 - that already happened last year; currently Python 2 only gets security fixes.

> have you tried using gevent versus asyncio ? gevent is running in production at several of the largest API services in the world. Asyncio is not yet deployed at this scale.

Both of them are available on Python 3; on Python 2 you can't choose. asyncio's strength is that it is integrated into the language. Especially in 3.5 you have new syntax for it: no need to install new libraries, no need to compile anything, no monkey patching. asyncio is also more generalized and can be adapted to other tasks, not just asynchronous network calls.

[1] In fact, Red Hat will surely continue to maintain it until 2027, since they decided to ship it with RHEL 7; the question is whether they will share that with general users or only provide it to paying customers.
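For reference, a minimal sketch of that 3.5 syntax - nothing to install, no monkey patching:

    import asyncio

    async def delayed_hello():
        await asyncio.sleep(1)  # suspends this coroutine, not the whole loop
        return 'hello'

    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(delayed_hello()))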


It seems ridiculous (to me, someone who is superficially aware of the Python ecosystem but not a real user) that they haven't tried to incentivize people to switch with a real carrot. For example: removing the GIL.

Running out the clock waiting for library support to become mostly Python 3 is inevitable, but so far it has been slower than molasses.


Ironically enough, Unicode takes its toll on performance, though the cost has mostly been made up from 3.4 onwards.


The 1 million in the title is misleading (1M per hour is nothing to write home about - only 278/sec). There are frameworks able to hit 1M per minute and more (16,666/sec).


1 million per hour is nothing...

Here's mioco handling 10M HTTP requests per second(1) on my desktop:

https://github.com/dpc/mioco/blob/master/BENCHMARKS.md

1) With a bit of a cheating HTTP server.

With actual proper HTTP parsing it goes down to 368K req/s, but that's still a lot.


1M per minute is something. Could you name those frameworks?


Elixir is the name I see thrown around the most when it comes to stuff like this: http://www.phoenixframework.org/blog/the-road-to-2-million-w...


Even 1M per minute is rather pathetic. The game is around multiple millions per second:

https://www.techempower.com/benchmarks/#section=data-r12&hw=...


Does anyone enjoy doing async work in Python? I've done a few hobby projects, and honestly I was yearning for JavaScript + an async lib after a while. As great as Python is, maybe we should yield async programming to the languages designed for it?


I think async and aiohttp are game changers for Python 3.5. After working with Twisted callbacks for over a decade, it's a pleasure to write async code that does not use the callback approach (granted, Twisted is a mature environment with lots to offer).

I've switched to Python3.5 and aiohttp for all new web service applications. The coding style is clean, enjoyable to write, and easy to debug.

Plus, I've never once been stymied for speed. I know there are applications out there where people expect to be handling zillions of connections - but the bulk of my use cases consider 100 transactions per second a huge throughput, and aiohttp handles that with ease.


Have you used eventlet or gevent? I thought they were game changers. Gevent has been working very well for me for quite a while, without callback hell.


I found it easier than doing it in JavaScript because I could insert `import pdb; pdb.set_trace()` into the code and get an interactive debugger.

Supposedly you can do this in JavaScript by running node with a particular flag, then connecting to a port on localhost and opening the Chrome debugger. However, the multiple times I've tried throughout 2014-2016 have shown it to be incredibly finicky. It is especially frustrating when trying to insert a debugger into an automated test.


I guess I don't know how JS was any more "designed for" async than python was.


From https://developer.mozilla.org/en-US/docs/Web/JavaScript/Even...:

JavaScript has a concurrency model based on an "event loop". This model is quite different than the model in other languages like C or Java.

...

A very interesting property of the event loop model is that JavaScript, unlike a lot of other languages, never blocks. Handling I/O is typically performed via events and callbacks, so when the application is waiting for an IndexedDB query to return or an XHR request to return, it can still process other things like user input.


>>JavaScript, unlike a lot of other languages, never blocks

I think that's a little strong. It's more like "The group controlling Javascript has mostly tried to discourage introduction of things that block".

You can, for example, do a blocking XMLHttpRequest. It's deprecated, but possible. https://jsfiddle.net/923d5sda/


You can do blocking everything.

A for loop with 10,000 repetitions will block the whole interpreter for its duration.

Any JSON parsing does the same.

Processing strings.

Doing math work.

...


Guess I should have said "blocking I/O"? I thought it was a given that a single-threaded language wouldn't magically inject some kind of concurrency around tight loops or CPU-intensive tasks. Node.js people don't really think that sort of thing is "non-blocking", do they?


>Guess I should have said "blocking I/O"? I thought it was a given that a single-threaded language wouldn't magically inject some kind of concurrency around tight loops or CPU-intensive tasks.

Well, Erlang (and Elixir) does just that -- it's preemptive, and implicitly yields under the covers even in loops.

>Node.js people don't really think that sort of thing is "non blocking", do they?

Judging from forum threads and blog posts, a lot of them do, especially web programmers not familiar with blocking and non-blocking who only know that "Node is webscale".


I think those quoted statements are not very good.

The other languages don't have a built-in concurrency model. For C, Java, and others, event loop libraries and applications built on top of them can be found (nginx, netty, ...), as well as libraries that build on top of synchronous IO.

The event loop was also not tied to JavaScript in the older standards. Only the introduction of Promises and other features required the existence of an event loop in order to define when continuations should run.

"The event loop model never blocks" is also only true as long as you (and all the libraries that you use) do not block it. There is no automatic "does not block" guarantee.


>A very interesting property of the event loop model is that JavaScript, unlike a lot of other languages, never blocks.

JavaScript actually always blocks. It's only external function calls that somebody took care to write in an evented style that don't block - anything written in pure JavaScript (from for loops to text manipulation) blocks.

JS single-threaded async without preemption is not something to write home about...


This is a genuine question: in what ways is Python's async implementation lacking? Could it have been baked in a better way?

In what ways are languages that were supposedly designed for async programming different from Python?

Python is definitely lacking an elegant interface for async programming.


I think that Python 3.5 now has a very elegant interface for async programming. I prefer Tornado to the standard library's asyncio, but the new keywords are nice for both packages (disclaimer: I'm the maintainer of Tornado).

The downsides have nothing to do with the design of the language. The problem is that introducing a new concurrency model late in a language's life splits the ecosystem. Most existing packages are synchronous, so if you want to build asynchronous systems you must avoid packages like requests, django, or sqlalchemy and find (or develop) asynchronous equivalents for the functionality you need.

Javascript has an advantage here not because the design of the language is especially well-suited for asynchronous programming, but because it never went through a synchronous/multi-threading phase. Every javascript package is designed for asynchronous use.
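For example, the new keywords work directly with Tornado's futures - a minimal sketch, assuming Tornado >= 4.3 (the first release to accept native coroutines; the URL is a placeholder):

    from tornado.httpclient import AsyncHTTPClient
    from tornado.ioloop import IOLoop

    async def main():
        # A native Python 3.5 coroutine awaiting a Tornado Future directly.
        response = await AsyncHTTPClient().fetch('http://example.com/')
        print(len(response.body))

    IOLoop.current().run_sync(main)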


Yeah, my biggest complaint personally is that combining multithreading and async is a massive pain in the ass. Now, realistically, you aren't usually going to want to do that, except if you have multiple event loops, or are bridging between external synchronous code and internal async code. Otherwise, I really enjoy async python -- of course, I'm also the kind of person who has written my own event loops using synchronous code before, so maybe I'm just crazy like that.


>> The problem is that introducing a new concurrency model late in a language's life splits the ecosystem.

Really you could say that about a number of features/changes in python 3, not just the new async syntax. Python 3 itself was an ecosystem-splitting instrument.


Simple reason - there is NO framework built from the ground up for Node.js-style async programming.

Obviously there are Twisted and Tornado - but gevent and asyncio are the paradigms people are actually using now. If there were a Flask-like framework built from the ground up to leverage async (rather than bolting it on) that included all the batteries for web development, then Python would have a serious edge over Node.


Tornado works in pretty much the same way as asyncio.


Javascript was hardly designed for async work -- it is just that it wasn't designed for anything else.


aiohttp has completely replaced Flask for my "small web apps/web APIs" needs. For my personal performance needs, just running `python myapp.py` is enough - no need for gunicorn or other "complicated" setups.


I'm the CTO at KeepSafe. We open sourced aiohttp.

We wrote aiohttp for our production system. We build everything on aiohttp. In our production systems we constantly handle more requests than in the benchmark, with business logic on each request.

The main reason we like aiohttp a lot is that we can write asynchronous code that reads like synchronous code and does not have callbacks.


IMO you should place all requests within a single ClientSession().

This will provide two benefits:

1. You won't need to use a semaphore. To limit connections, create a TCPConnector() object with its limit set to the value you used in the semaphore and pass it to the ClientSession(); aiohttp will then not make more connections than that limit (the default behavior is an unlimited number of connections).

2. With a single ClientSession(), aiohttp will make use of keep-alive (i.e. it will reuse the same connections for subsequent requests, while keeping at most the limit of connections you set in the TCPConnector() object).

This should improve performance further, and (given a sane limit) it'll also solve the issue with the "Cannot assign requested address" error.

BTW: Even without a limit set, aiohttp will try to reduce the number of open connections, so it might still fix the connection error as long as individual requests don't take long. It's still a good idea to set a limit, just to be nice to the remote server.
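A minimal sketch of what I mean (fetch/urls are placeholders; TCPConnector and ClientSession are aiohttp's real classes):

    import asyncio
    import aiohttp

    async def fetch(session, url):
        async with session.get(url) as response:
            return await response.read()

    async def run(urls):
        # One shared session + capped connector: keep-alive reuse plus a hard
        # ceiling on simultaneous connections, no client-side semaphore needed.
        connector = aiohttp.TCPConnector(limit=1000)
        async with aiohttp.ClientSession(connector=connector) as session:
            tasks = [asyncio.ensure_future(fetch(session, u)) for u in urls]
            return await asyncio.gather(*tasks)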


First off, it's awesome to see more benchmarks (even if it's just personal experimentation) for synchronous vs. asyncio performance. I think the real argument for asyncio right now is that it makes it very easy to write extremely efficient code, even for hobbyist projects. Even though your experiment is only handling 320 req/s, the fact that you were able to do that so quickly and with very, very little optimization is, I think, a testament to asyncio's potential.

Some pointers:

The event loop is still a single thread and therefore subject to the GIL. That means that at any given time, only one coroutine is running in the loop. This is important for several reasons, but probably the most relevant are that

1. within any given coroutine, execution flow will always be consistent between yield/await statements.

2. synchronous calls within coroutines will block the entire event loop.

3. most of asyncio was not written with thread safety in mind

That second one is really important. When you're doing file access, e.g. the `with open('frank.html', 'rb')` part, that's something you may want to consider moving into a run_in_executor call. That will block the coroutine, but it will return control to the event loop, allowing other connections to proceed.

Also, more likely than not, the "too many open files" error is a result of you opening frank.html, not of sockets. I haven't run your code with asyncio in debug mode[1] to verify that, but that would be my intuition. You would probably handle more requests if you changed that - I would do the file access in a run_in_executor with a maximum of 1000 executor workers. If you want to surpass that, use a process pool instead of a thread pool and you should be ready to go, though it's worth mentioning that disk IO is hardly ever CPU-bound, so I wouldn't expect much of a performance boost otherwise.
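A sketch of that run_in_executor suggestion (the handler shape and file name follow the post's example; the pool size is illustrative):

    import asyncio
    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(max_workers=1000)

    def read_file(path):
        # Ordinary blocking read; runs on a worker thread, off the event loop.
        with open(path, 'rb') as f:
            return f.read()

    async def handle_request():
        loop = asyncio.get_event_loop()
        # Awaiting here suspends only this coroutine; other connections keep
        # being served while a pool thread does the disk IO.
        body = await loop.run_in_executor(executor, read_file, 'frank.html')
        return body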

Also, the placement of your semaphore acquisition doesn't make any sense to me. I would create a dedicated coroutine like this:

    async def bounded_fetch(sem, url):
        async with sem:
            return await fetch(url)
and modify the parent function like this:

    for i in range(r):
        task = asyncio.ensure_future(bounded_fetch(sem, url.format(i)))
        tasks.append(task)
That being said, it also doesn't make any sense to me to have the semaphore in the client code, since the error is in the server code.

[1] https://docs.python.org/3/library/asyncio-dev.html#debug-mod...


Thanks for feedback.

> You would probably handle more requests if you changed that - I would do the file access in a run_in_executor with a maximum of 1000 executor workers.

This is a really good point. I'm going to check this and edit the post to add this information there.

> Also, the placement of your semaphore acquisition doesn't make any sense to me. I would create a dedicated coroutine like this:

Looking at my semaphore code the day after writing it, I do wonder if I'm using it correctly. I assumed it works correctly because it fixed my "too many open files" exception, which seems to mean that I'm no longer exceeding the 1024 open file limit. Can you clarify why you think my use of the semaphore does not make sense, and why your suggestion is better? What is the benefit of a dedicated coroutine?

> That being said, it also doesn't make any sense to me to have the semaphore in the client code, since the error is in the server code.

I admit that I focused more on my client than my server. One thing that worries me about my test server is that it does not print any exceptions. Either it does not fail at all, which seems unlikely, or it fails silently, which is more likely and is bad. So I need to check my server code to see what exactly happens there.

> it also doesn't make any sense to me to have the semaphore in the client code, since the error is in the server code.

The main reason for the semaphore in the client code is that it should stop the client from making over 1k connections at a time. My logic here is that if the client won't make 1k connections at a time, the server won't receive 1k connections at a time, and thus there will be no "too many open files" problem on the server (it won't have to send more than 1k responses). However, I see that this logic may not be totally correct; another comment points out that it's possible for sockets to "hang around" after closing: https://news.ycombinator.com/item?id=11557672 so I need to review that and edit the post.

> https://docs.python.org/3/library/asyncio-dev.html#debug-mod...

This looks really great, I'll look into it, thanks.


As per my comment further up, it might be interesting to spin up a handful of listening processes (e.g. 127.0.0.1 through 127.0.0.10) and a handful of clients, and have the clients pick one at random, or something like that. Not so much for real-world testing, but just as an exercise to see if one can push the system to limits other than open connection/address pairs.


No problem, it's especially hard to find external feedback for side projects and experiments so I try to give it when I can.

> I assumed it works correctly because it fixed my "too many open files" exception

It works, so at the end of the day that's what matters. The client vs. server question, from my perspective, ultimately comes down to a question of test realism: in a real-world deployment you couldn't limit connections with client-side code, because there are multiple clients. That's what I mean by "it doesn't make sense given that the error is server-side".

> Can you clarify why you think my use of the semaphore does not make sense, and why your suggestion is better? What is the benefit of a dedicated coroutine?

I'm saying that mostly, but not exclusively, from a separation-of-concerns standpoint. You're acquiring the semaphore in a completely different context than you're releasing it. On one hand, that's partly a programming-style issue. On the other hand, it can have some really important consequences: for example, it's actually the event loop itself that releases the semaphore for you when the task is done, and because of the way the event loop works, it's hard to say exactly when that will happen. You want to hold the semaphore for the absolute minimum time possible, since it's holding up execution of other connections in the loop. Putting it into a dedicated coroutine makes it clearer what's going on, makes the acquirer and releaser of the semaphore the same, and means you are definitely holding the semaphore for the minimum amount of time possible (since, again, execution flow will not leave a particular coroutine until you yield/await another). In general I would say that releasing the semaphore in a callback is significantly more fragile, and mildly to moderately less performant, than creating a dedicated coroutine to hold the semaphore and handle the request.

Does that all make sense?

> Either it does not fail at all, which seems unlikely, or it fails silently, which is more likely and is bad.

That's a fair statement, I think. As an aside, the print statement is slow, so keep that in mind. It might actually be faster to have a single memory-mapped file for the whole thing and just append the error and traceback to it; the built-in traceback library can be very useful for that. That's also a bit more realistic, since obviously IRL you wouldn't be using print statements to keep track of errors. On a similar note, because file access is so slow, you'd be best off finding some way to avoid the server touching the disk once per connection entirely. On a real-world system you'd probably use some kind of memory caching to do that, especially if you're just reading files and not writing them. That lets you spend a little more memory (potentially as little as a single copy of the file) to drastically improve performance.


> Does that all make sense?

yeah it does make sense.

> in a real-world deployment you couldn't limit connections with client-side code

Yeah, that's a very good point. But in a real-world scenario handling this would not be that easy. Limiting the number of available connections on the server side is not a trivial task to implement. Setting up your server to avoid failures and simply return 503 Service Unavailable to some clients or 429 Too Many Requests to others would probably require quite a lot of coding. It's also not very clear to me how this would be implemented - how do people implement things like this? Just putting a check on the number of open files before the line that opens the file, and setting the response code to 503 or 429 instead of opening it? That would only stop the server from opening too many "html" files; it would not stop the server from getting flooded with connections. Is my aiohttp app even the right place to add checks like this? Wouldn't it be better to put haproxy or nginx or some other load-balancing service in front of the aiohttp app and let it handle excess traffic?

Another thing that comes to mind (I need to check this later) is that perhaps some partial "handling" of cases like this could/should be implemented in the aiohttp library. I'm not sure how it behaves now, but maybe it should simply fail to open the file, return 500 to the client, and print a noisy traceback about open files to my logs? I didn't see this behavior when doing my tests, so either it didn't occur, it is not implemented in aiohttp, or it occurred and I somehow missed it. From my experience with Twisted I know that this is how Twisted resources behave: if you have an unhandled exception, Twisted just returns 500 to the client and shows a traceback in the logs.


Keep in mind that 5XX error codes are for server errors and 4XX codes are for client errors. Returning 429 would imply "too many requests (from your computer)", not too many for the service as a whole. Choosing to return a 503 for over-taxed servers is, as far as I can tell, done maybe half the time. Depending on the kind of service you're running, you might want to enforce a server timeout that says "after a certain number of milliseconds of local response time, return a 5XX error code and abandon the connection". That would be one component in an overall strategy for handling high load, which would heavily bias towards serving the easiest responses first. That may or may not be a good idea: what if the "expensive" requests are from paying customers accessing account pages, and the "inexpensive" ones are from a sudden spike in traffic to your homepage due to some good press somewhere? Of course, eventually you'd want to separate these two kinds of traffic entirely, so that customers are only affected by outages they create. You can then focus on expanding your capacity to handle customers directly, instead of lumping that in with the much more unpredictable behavior of general web traffic.

> Just putting a check on the number of open files before the line that opens the file, and setting the response code to 503 or 429 instead of opening it?

So actually this is one of the big benefits of putting the semaphore limiting file access within its own dedicated coroutine (except on the server side instead of the client). It allows you to handle the connection without having to deal with immediate responses. What that means in practice is that your server will be slower to respond under high load, but until it hits the client's (browser) timeout limit, you'll still be able to respond. It actually doesn't require any extra code. Note that this isn't the only way to achieve this result, but it's probably the most direct and simplest, especially given the approach you've taken with the code thus far.
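A sketch of that server-side arrangement with aiohttp (the handler and file names are hypothetical; excess requests simply wait on the semaphore instead of erroring out):

    import asyncio
    from aiohttp import web

    file_sem = asyncio.Semaphore(1000)  # well under the fd limit

    def read_index():
        with open('frank.html', 'rb') as f:
            return f.read()

    async def index(request):
        # Over-limit requests queue here and get answered late rather than
        # never (until the client gives up and times out).
        async with file_sem:
            loop = asyncio.get_event_loop()
            body = await loop.run_in_executor(None, read_index)
        return web.Response(body=body, content_type='text/html')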

A load balancer sits on top of that, ideally monitoring metrics like server CPU usage, memory load, or (most directly) request response time, and then shifts around requests between servers accordingly, to minimize the delay incurred in the aforementioned "wait for semaphore (or other synchronization primitive)" part.

At the end of the day, until you start hitting the limit of concurrent connections that others have mentioned, you don't really actually need to worry very much about how many connections you have open at once. You just want to focus on handling every connection you have as quickly as possible.


Looks pretty interesting to do async in Python. I once did something similar in Node (async by default) with a few lines of code. I think I scraped 12 or 20 million real URLs in 8 hours on a $5 cloud VM. It was limited by network bandwidth.


"Everyone knows that asynchronous code performs better when applied to network operations"

Ummm that seems a bit far reaching.


It depends on what "network operations" you are trying to do.

For high-concurrency purposes, asynchronous programming is far more scalable (see: epoll/kqueue + state machines).

For high-throughput, low-concurrency operations, it doesn't matter as much.


I happen to know of a very major tech company whose scale is insane, yet their core C++ code is based on highly tuned blocking threads. It's not a given that async is the only way to scale.


1,000,000 requests in 52 minutes is just 320 req/sec.

Am I missing something? What's so amazing about this?

I just deployed a production feed that serves 1955 requests/second on a cheap VPS in freaking PHP, one of the slowest languages out there.


> Am I missing something? What's so amazing about this?

The article is not about testing the performance of a web server, but about showcasing performance differences between synchronous and asynchronous code using asyncio. So it's not about serving requests, but consuming them.


Then he should change the title.


Making, not serving. I think the title is pretty accurate.


I don't care for PHP any more than the next guy, but it's usually in the top 25 of the web framework benchmarks (most of the other top languages are Java, Go, and C++): https://www.techempower.com/benchmarks/


Just curious, what's up with this Ur language at positions 1 and 4? I've never heard of it, and I'm probably not experienced enough to make sense of the results, but how did a language that doesn't even have a full official tutorial to its name beat Java, C++, and Go in those rankings by a factor of >2?

I'm genuinely curious.


Ur (Ur/Web?) seems to be built very specifically for the exact things this benchmark checks (dynamic web pages with SQL queries), so it's not surprising that those code paths are highly optimized in the language.


Why do you say it's not amazing? Honestly curious here :)


Because it's trivial.

I would be interested in anything doing 10,000+ req/sec on a cheap VPS. 320 is nothing.

People achieve 2 million requests/second with C++ on EC2:

https://medium.com/swlh/starting-a-tech-startup-with-c-6b5d5...


Oh, I see now... This speaks for itself:

C++/Proxygen = 1,990,130 requests per second

Python/Tornado = 41,329 requests per second

Thanks for sharing, btw


Interesting article. Not sure that the conclusion is all that solid though: "I quantified that 1 C++ server is roughly equivalent to 40 load-balanced python servers for raw computational power based on our HTTP benchmarking. Thus using C++ can really squeeze all the computational juice out of the underlying hardware to save 1/40 off server costs."

Well, for this particular start-up I think C++ was an excellent choice (especially as they already knew C++!) - but what if you could still run the service on a single server with Python? You would still need 2-3 servers with the C++ version (failover, test, etc.).

Or it might turn out that for production load you'd need 10 Python server instances. Sure, one server could handle it with C++ - but then you're not actually saving 39/40, you're only saving 9/10, because you didn't need "the full 40".

Not to mention that it seems unlikely the HTTP request handling is the limiting factor in a distributed OLAP system. You might do 100 "OLAPs/s" and have them easily served by a 40K req/s Python service.

"I guess we could have written it in Python to start off with but, economically, it would be a wastage of labor cost and time because, at some stage, we would have to scrap it for a C++ version to get the performance we need. The Python code will have no economic value once scrapped."

As mentioned, I think C++ was an excellent choice, but the above assumes they'd have to rewrite the entire system in C++ and scrap all the Python. Granted, if you don't know Python but do know C++ very well, it's doubtful that prototyping in Python would be faster than just using C++.

But if (wild guesstimate) you could get 90% of the features at 10% of the LOC/dependencies in Python - who's to say that wouldn't make sense? Maybe parts could be done in C/C++, or the project could move to PyPy for enough of a speedup that no complete rewrite would be needed... etc.

Overall, though, I really think this is a great illustration that high-performance compiled languages deliver on that performance, and can be rather pleasant to work with.


The article is not about serving, but about consuming. Not the same beast.


Absolutely excellent article. I always keep C++ at the back of my mind in case I need it some day, so I think the list of libraries they used will be useful for me in the future. Thanks.


If you want to write fast C++ Web services I recommend a look at Seastar: http://www.seastar-project.org/


Because it's like Dr Evil asking the UN leaders for "ONE MILLION DOLLARS" to not destroy the world...

https://www.youtube.com/watch?v=cKKHSAE1gIs


Because you can get 540 req/s on Raspberry Pi 2 with Elixir/Phoenix.

http://blog.onfido.com/using-cpus-elixir-on-raspberry-pi2/



