> But threads require more system overhead, and eventually adding more threads w...

dragonwriter · on June 12, 2020

> OS threads are orders of magnitude lighter than any Python coroutine implementation.

But python threads, which have extra weight on top of an cross-platform abstraction layer on top of the underlying OS threads, are not lighter than python coroutines.

You aren't choosing between Python threads and unadorned OS threads when writing Python code.

otabdeveloper4 · on June 13, 2020

You're absolutely right.

I'm pointing out that this is a Python problem, not a threads problem, a fact which people don't understand.

dragonwriter · on June 13, 2020

Everyone has been discussing relative performance of different techniques within Python; there is neither a basis to suggest from that that people don't understand that aspects of that are Python specific, nor a reason to think that that is even particularly relevant to the discussion.

willseth · on June 12, 2020

Okay, then let's do a bakeoff! You outfit a Python webserver that only uses threads, and I'll outfit an identical webserver that also implements async. Server that handling the most requests/sec wins. I get to pick the workload.

js2 · on June 12, 2020

FWIW, I have a real world Python3 application that does the following:

- receives an HTTP POST multipart/form-data that contains three file parts. The first part is JSON.

- parses the form.

- parses the JSON.

- depending upon the JSON accepts/rejects the POST.

- for accepted POSTs, writes the three parts as three separate files to S3.

It runs behind nginx + uwsgi, using the Falcon framework. For parsing the form I use streaming-form-data which is cython accelerated. (Falcon is also cython accelerated.)

I tested various deployment options. cpython, pypy, threads, gevent. Concurrency was more important than latency (within reason). I ended up with the best performance (measured as highest RPS while remaining within tolerable latency) using cpython+gevent.

It's been a while since I benchmarked and I'm typing this up from memory, so I don't have any numbers to add to this comment.

heavyset_go · on June 12, 2020

Each Linux thread has at least an 8MB virtual memory overhead. I just tested it, and was able to create one million coroutines in a few seconds and with a few hundred megabytes of overhead in Python. If I created just one thousand threads, it would take possibly 8 gigs of memory.

otabdeveloper4 · on June 13, 2020

Virtual memory is not memory. You're effectively just bumping an offset, there's no actual allocations involved.

> ...it would take possibly 8 gigs of memory.

No. Nothing is 'taken' when virtual memory is requested.

shk1338 · on June 13, 2020

But have you tried creating one thousand of OS threads and measuring the actual memory usage? If I recall correctly I read some article where it was explained that threads in Linux are not actually claiming their 8MB each so literally. I need to recheck that later.

heavyset_go · on June 13, 2020

You're right, I've read the same. Using Python 3.8, creating 12,000 threads with `time.sleep` as the target clocks in at 200MB residential memory.