> OS threads are orders of magnitude lighter than any Python coroutine implementation.
But Python threads, which add their own overhead through a cross-platform abstraction layer on top of the underlying OS threads, are not lighter than Python coroutines.
You aren't choosing between Python threads and unadorned OS threads when writing Python code.
Everyone has been discussing the relative performance of different techniques within Python; nothing in that suggests people don't understand that some aspects are Python-specific, and there's no reason to think that distinction is even particularly relevant to the discussion.
Okay, then let's do a bakeoff! You outfit a Python webserver that only uses threads, and I'll outfit an identical webserver that also implements async. The server handling the most requests/sec wins. I get to pick the workload.
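For what it's worth, here is a toy in-process sketch of how such a bakeoff could be timed: the same simulated I/O-bound "request" handled by a thread pool and by asyncio. The handler names, `IO_DELAY`, `N_REQUESTS` and the 50-worker pool are hypothetical choices, and sleeping stands in for real socket I/O, so this only illustrates the shape of the measurement, not a real webserver comparison.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workload: each "request" waits on simulated I/O for IO_DELAY seconds.
IO_DELAY = 0.01
N_REQUESTS = 200


def handle_request_sync(i: int) -> int:
    """Thread-based handler: blocks its thread while 'waiting on I/O'."""
    time.sleep(IO_DELAY)
    return i


async def handle_request_async(i: int) -> int:
    """Coroutine-based handler: yields to the event loop while 'waiting on I/O'."""
    await asyncio.sleep(IO_DELAY)
    return i


def run_threads(n: int, workers: int = 50) -> tuple[int, float]:
    """Run n requests on a bounded thread pool; return (completed, requests/sec)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(handle_request_sync, range(n)))
    return len(results), n / (time.perf_counter() - start)


async def _run_async(n: int) -> list[int]:
    # Every request gets its own coroutine; the event loop interleaves them.
    return await asyncio.gather(*(handle_request_async(i) for i in range(n)))


def run_async(n: int) -> tuple[int, float]:
    """Run n requests as coroutines; return (completed, requests/sec)."""
    start = time.perf_counter()
    results = asyncio.run(_run_async(n))
    return len(results), n / (time.perf_counter() - start)


if __name__ == "__main__":
    for name, runner in (("threads", run_threads), ("asyncio", run_async)):
        count, rps = runner(N_REQUESTS)
        print(f"{name}: {count} requests, ~{rps:.0f} req/s")
```

Capping the thread pool at a fixed size mirrors how threaded servers usually bound their worker count; changing `workers` changes the result, which is exactly the kind of knob a fair bakeoff would have to pin down up front.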
FWIW, I have a real-world Python 3 application that does the following:
- receives an HTTP POST with a multipart/form-data body containing three file parts. The first part is JSON.
- parses the form.
- parses the JSON.
- depending upon the JSON accepts/rejects the POST.
- for accepted POSTs, writes the three parts as three separate files to S3.
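The steps above can be sketched roughly like this, using only the standard library. This is not the actual application: the stdlib `email` parser stands in for streaming-form-data (it buffers the whole body, so it's only suitable for small examples), the acceptance rule (a non-empty string `"id"` in the JSON) is hypothetical, and `store(key, data)` is a hypothetical callable standing in for the S3 upload, e.g. a thin wrapper around boto3's `put_object`.

```python
import json
from email import policy
from email.parser import BytesParser
from typing import Callable


def parse_form(content_type: str, body: bytes) -> list[tuple[str, bytes]]:
    """Parse a multipart/form-data body into (field name, raw bytes) pairs, in order.

    Stand-in for streaming-form-data: the stdlib email parser holds the
    whole body in memory rather than streaming it.
    """
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    msg = BytesParser(policy=policy.HTTP).parsebytes(raw)
    if not msg.is_multipart():
        raise ValueError("expected a multipart body")
    return [
        (part.get_param("name", header="content-disposition"), part.get_payload(decode=True))
        for part in msg.iter_parts()
    ]


def handle_upload(content_type: str, body: bytes, store: Callable[[str, bytes], None]) -> bool:
    """Return True and store all three parts if the POST is accepted, else False."""
    parts = parse_form(content_type, body)
    if len(parts) != 3:
        return False
    try:
        meta = json.loads(parts[0][1])  # the first part is JSON
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    # Hypothetical acceptance rule: the JSON must carry a non-empty string "id".
    if not isinstance(meta, dict) or not isinstance(meta.get("id"), str) or not meta["id"]:
        return False
    for name, data in parts:
        store(f"{meta['id']}/{name}", data)  # hypothetical: one S3 object per part
    return True
```

Passing the storage call in as a parameter keeps the accept/reject logic testable without network access; a dict's `__setitem__` works as a fake store.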
It runs behind nginx + uWSGI, using the Falcon framework. For parsing the form I use streaming-form-data, which is Cython-accelerated. (Falcon is also Cython-accelerated.)
I tested various deployment options: CPython, PyPy, threads, gevent. Concurrency mattered more than latency (within reason). I got the best performance (measured as the highest RPS while staying within tolerable latency) using CPython + gevent.
It's been a while since I benchmarked and I'm typing this up from memory, so I don't have any numbers to add to this comment.
Each Linux thread reserves 8 MB of virtual memory for its stack by default. I just tested it, and was able to create one million coroutines in a few seconds with a few hundred megabytes of overhead in Python. If I created just one thousand threads, it could take 8 GB of memory.
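A minimal sketch of that kind of coroutine test, assuming idle tasks that just wait on an `asyncio.Event` (as an idle connection handler might). Note that `tracemalloc` only counts allocations made through Python's allocators, not total process memory, tracing itself slows things down, and per-task sizes vary across Python versions. The function names are hypothetical.

```python
import asyncio
import tracemalloc


async def idle(stop: asyncio.Event) -> None:
    """A coroutine that does nothing but wait, like an idle connection handler."""
    await stop.wait()


async def spawn_idle_tasks(n: int) -> tuple[int, int]:
    """Start n idle tasks; return (tasks still pending, bytes traced by tracemalloc)."""
    stop = asyncio.Event()
    tracemalloc.start()
    tasks = [asyncio.create_task(idle(stop)) for _ in range(n)]
    await asyncio.sleep(0)  # yield once so every task starts and blocks on stop.wait()
    current, _peak = tracemalloc.get_traced_memory()
    alive = sum(not t.done() for t in tasks)
    stop.set()  # release all tasks and let them finish cleanly
    await asyncio.gather(*tasks)
    tracemalloc.stop()
    return alive, current


if __name__ == "__main__":
    n = 100_000
    alive, used = asyncio.run(spawn_idle_tasks(n))
    print(f"{alive} idle tasks, ~{used / n:.0f} bytes each (Python heap only)")
```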
But have you tried creating one thousand OS threads and measuring the actual memory usage? If I recall correctly, I read an article explaining that Linux threads don't literally claim their 8 MB each. I need to recheck that later.
Absolutely false. OS threads are orders of magnitude lighter than any Python coroutine implementation.