
Slightly OT:

How does HN feel about the recent craze to make web python asynchronous?

To me, the performance gains are dubious in many cases, and the complexity overhead of handling cooperative multitasking just seems like a step back for a language like Python.




Long term Python user: if I want to do something asynchronously I reach for Go or Elixir (unless it's just way more practical to do it right there in Python). Adding function colors [1] might have been a practical decision, but was IMHO a mistake.

Why do I have to decide if my function is synchronous or not when I write it? I don't want to do that, I want to write only one function that can be called synchronously or asynchronously. In Go or Elixir, the decision is made by the caller, not by the API designer.
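
For anyone who hasn't read [1], a minimal sketch of what the color split looks like in Python (toy example, the names are made up):

    import asyncio
    import time

    def fetch_sync() -> str:
        # "Blue" function: blocks the calling thread.
        time.sleep(0.1)
        return "data"

    async def fetch_async() -> str:
        # "Red" function: can only be awaited from other async code.
        await asyncio.sleep(0.1)
        return "data"

    print(fetch_sync())                # callable from anywhere
    print(asyncio.run(fetch_async()))  # sync callers must bridge via an event loop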

Which leads me to a parallel universe: Go-like asynchronicity should have been introduced with Python 3, when backward compatibility was explicitly broken. The gain of switching to Python 3 would then also have been a much easier sell than "just" fixing strings.

Of course, there are probably a thousand things that I'm overlooking, but this is my feeling...

[1] https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...


> Why do I have to decide if my function is synchronous or not when I write it?

I feel that sync and async functions are fundamentally different. In Python, a coroutine is really just a function you can pause, and while it might seem like the same thing as a normal function, it's actually very different algorithmically speaking, since you can include a lot of low-level optimizations, which is really what async code is all about: getting rid of that IO waiting.
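
To illustrate the "function you can pause" point, a toy example: every await is an explicit suspension point where the event loop can run something else:

    import asyncio

    async def worker(name: str) -> None:
        print(name, "step 1")
        await asyncio.sleep(0)  # suspension point: hands control back to the loop
        print(name, "step 2")

    async def main() -> None:
        # Both coroutines run on one thread, interleaving at their await points.
        await asyncio.gather(worker("a"), worker("b"))

    asyncio.run(main())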

I love async Python, but after working with it for the better part of a year now, it is often a bit tiring, as you've pointed out. It feels a bit like a patch to the language rather than a feature, even with the newest 3.9 version.

Btw you might like https://trio.readthedocs.io which makes asyncio much more bearable!
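
If it helps, the core of trio's structured-concurrency style looks roughly like this (from memory, so treat it as a sketch):

    import trio

    async def child(name: str) -> None:
        await trio.sleep(1)
        print(name, "done")

    async def main() -> None:
        # The nursery scopes both tasks: main() cannot return until they finish.
        async with trio.open_nursery() as nursery:
            nursery.start_soon(child, "a")
            nursery.start_soon(child, "b")

    trio.run(main)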


Is that because the Python interpreter has to explicitly handle coroutines differently from normal functions?

One difference for Elixir/Erlang (which GP mentioned) is that the BEAM VM can interrupt any function by default. (There are a few exceptions when you deal with native code and other things.)


99% of Django apps are CRUD apps with zero need for this. It's easy to get sucked into new-hammer-ism where you have a new hammer and start seeing nails where there aren't any.

The 1% where this is needed does exist, but I suspect that there are far more people using the new async features than actually have need for them. And if you don't need them, you're introducing a lot of complexity, without mature tooling around it to reduce that complexity.

Probably 5 years from now there will be mature tooling around this stuff that lowers the complexity so that it is a good tradeoff for average websites. But for now, I don't need to be an early adopter.


Well, if you have external API calls in your Django app and you are running sync (which I would absolutely advise; with async it is really easy to get unpredictable performance that is sometimes hard to track down), having the ability to run some views async is really crucial.

Otherwise your application might be humming along smoothly one moment and then come to a sudden, complete standstill, or see its performance plummet, when a random external API endpoint starts to time out. Yes, I have been bitten by this :-)

To fix this while running sync, I have dedicated separate application processes to the views that make external calls, but this makes the routing complex. Alternatively, you can juggle timeouts on the external API calls, but this is hard to get right, and you have to constantly check whether calls are timing out just because the external endpoint is a bit slower at some point.

So I think this solves a very real-world challenge.
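
For the record, this is roughly what the escape hatch looks like as an async Django 3.1 view with an async HTTP client like httpx (the endpoint and timeout here are made up):

    import httpx
    from django.http import JsonResponse

    async def partner_status(request):
        # The event loop stays free while the call is in flight, and the
        # timeout bounds how long a flaky endpoint can hold us up.
        # (Assumes the endpoint returns a JSON object.)
        try:
            async with httpx.AsyncClient(timeout=2.0) as client:
                resp = await client.get("https://api.example.com/status")
            return JsonResponse(resp.json())
        except httpx.TimeoutException:
            return JsonResponse({"error": "upstream timed out"}, status=504)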


> plummets when a random external API endpoint starts to time out

You should add something like https://pypi.org/project/circuitbreaker/

Continuously failing external requests should not make each one of your responses slow.
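
Basic usage is just a decorator; something like this (the thresholds are illustrative, check the package docs for the exact parameters):

    import requests
    from circuitbreaker import circuit

    # After 5 consecutive failures the circuit opens and calls fail fast
    # (raising CircuitBreakerError) for 30 seconds instead of hanging.
    @circuit(failure_threshold=5, recovery_timeout=30)
    def fetch_partner_data():
        return requests.get("https://api.example.com/data", timeout=2).json()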


Interesting, will certainly try it out, thanks!

> Continuously failing external requests should not make each one of your responses slow.

It is not really a matter of the responses becoming slow. The problem is that if you are running sync with, say, 6 application server processes, and you get just 6 hits on an endpoint in your app that is hung up on an external API call, your application stops processing requests altogether.


Exactly this. I see the whole Django async stuff being far more relevant for applications with lots of traffic or high request rates, where you are already running on beefy infrastructure with a ton of workers and any small improvement in performance translates into huge real-world cost savings. Your standard blog, not so much.


Isn't that why gunicorn(+gevent) was implemented, doing the switching behind the scenes without waiting for that API call to finish? Is there a good reason I should manually "await" network calls from now on?
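
(For context, the gevent switch is just configuration in gunicorn, something like this; values are illustrative:)

    # gunicorn.conf.py -- the two modes being compared

    # Sync workers: one request per worker at a time, very predictable.
    worker_class = "sync"
    workers = 6

    # gevent workers: monkey-patched greenlets yield on blocking IO, so a
    # slow external call no longer ties up the whole worker.
    # worker_class = "gevent"
    # worker_connections = 1000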


Yes, gevent also fixes this problem. But running all requests async gives you a lot of new problems, in my experience mostly with views that (in some specific calls, e.g. for a specific customer) keep the CPU tied up, for example by serializing a lot of data. Random other requests will be stuck waiting and seem slow, and it is a lot more difficult to find out which view is the actual problem.

I have deployed applications both under gevent and sync workers in gunicorn and would personally never use gevent again, especially in bigger projects. It makes the behavior of the application unpredictable.


How does async/await solve this? I would have thought it has exactly the same problem?


It does have the same problem. Standard-library CPU-heavy functions are generally not async-friendly. You'll be stuck blocking while that 500 KB JSON payload serializes.
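
The usual workaround, sketched below, is to push the CPU-bound call off the event loop (asyncio.to_thread is 3.9+; older versions use loop.run_in_executor):

    import asyncio
    import json

    async def serialize(data: dict) -> str:
        # json.dumps here would block every other coroutine on the loop;
        # asyncio.to_thread pushes it onto a worker thread instead.
        return await asyncio.to_thread(json.dumps, data)

    print(asyncio.run(serialize({"n": list(range(5))})))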


I've been looking at whether this would be appropriate for something like server-side Mixpanel event tracking, or for sending transactional emails or text notifications using a third-party service like Mailgun or Twilio.

From what I can tell it is not intended for that purpose, and outright will not work.


90% of the time I don't want it. Database, cache, etc, not really that bothered. Web requests take 100-300ms to complete, tying up a worker for 300ms isn't much of a problem.

10% of the time I'm calling an API that takes 3s and tying up a worker for 3s _might_ be a problem. Being able to not do that would be really handy sometimes.

Not web servers, but I also do a lot of web scraping, and Python is definitely the best tool I've used for that job (lxml is fast, with great XPath support, and Python is very expressive for data manipulation). Using async for that could dramatically improve our throughput, as it's essentially network-bound and we don't really care about latency in the tens of seconds.
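
For the scraping case, the async win is basically this pattern (a sketch using httpx, but any async HTTP client works):

    import asyncio
    import httpx

    async def fetch_all(urls):
        # Throughput scales with concurrency here because the work is
        # almost entirely waiting on sockets.
        async with httpx.AsyncClient(timeout=30.0) as client:
            responses = await asyncio.gather(*(client.get(u) for u in urls))
        return [r.status_code for r in responses]

    print(asyncio.run(fetch_all(["https://example.com"] * 5)))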

Source: I work on a large production Django site.


Web requests taking over 100 ms is an absolute shame that slow languages like Python are enabling.


I'd love the site to be faster, but it's very hard to do this. For an API called while serving a user request, 100ms is slow, but for the frontend that a user hits directly, it's fairly typical.

As a point of comparison, Amazon's time to first byte for me is 270ms, with a 15ms round-trip time to their servers, so they're looking at about 255ms to serve a page.

To get significantly faster than this, a site must be engineered for speed from the ground up, and the productivity hit would be huge. We've got ~250k lines of Python, which would probably translate to ~750k lines of Go (which is fast, but not that fast), or probably >1m lines of C++. Engineers don't tend to produce that much more or less in terms of line count, so this would likely take ~4-6x the time to create (very rough ballpark). Plus, with a codebase so much larger there's a greater need for tooling support, maintenance becomes harder, more engineers are needed, etc.

When speed is the winning factor, like it sometimes is for a SaaS API that does something important in a hot code path (e.g. Algolia) then this is all worth it. When you're a consumer product where reacting to consumer demand is the most important thing, the speed difference really isn't worth it.


Most of the times I've profiled a web application I found that slow requests were coming from slow database queries.


So, two examples off the top of my head where it's the request latency and not Python at fault:

1. In an incident database, we allow full-text search with filtering. Depending on the complexity of the query and the contents of the database, this can take 10ms or 10,000ms. This isn't something easily changed. It's Lucene's fault.

2. Querying the physical status of a remote site has variable latency because the sensors are on Wi-Fi and it's flaky. We can't easily move the sensors, or make Wi-Fi coverage in some warehouse perfect.

Right now, we circuit break and route potentially slow requests to their own cluster via the router, but it’s a poor solution.


That depends on what that request is doing. It could be fetching a single record from the database and serializing it, or it could be running a complex analysis.


There is no _switch_! You can easily _mix_ sync and async code without any consequences.

Django provides `sync_to_async` and `async_to_sync`, but it's trivial to do this yourself without Django:

https://docs.djangoproject.com/en/3.0/topics/async/#async-ad...

You can write sync code and use async calls only when needed.
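
Concretely, the bridging looks like this with asgiref (which is what Django uses under the hood; the function names here are toy stand-ins):

    from asgiref.sync import async_to_sync, sync_to_async

    def legacy_query() -> str:
        # Ordinary blocking code, e.g. an ORM call.
        return "rows"

    async def handler() -> str:
        # Run the sync function in a thread so the event loop isn't blocked.
        return await sync_to_async(legacy_query)()

    # And bridge the other way, calling async code from ordinary sync code:
    print(async_to_sync(handler)())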

Also, async python is awesome. Things were messy 2-3 years ago, but everything is so much better now.


Async operations have never been about speeding things up, but about preventing synchronous operations from blocking threads.

This should have no (big) performance impact, but it will most probably allow better concurrency, which can be quite critical for a web framework.


Which should allow a Django instance to serve more concurrent requests.

Making views async won't do much to make an individual request faster (besides keeping it from being blocked by other slower requests).


I feel like it's too little, too late, but the idea is good. There's some obligatory reading:

https://news.ycombinator.com/item?id=23218782

Five years later, there are some new frameworks, but much of the ecosystem is still sync-only. This is actually one of the things that is pushing me towards Go lately. Python just doesn't seem to mature fast enough, and tools heavily disagree on conventions.


Some use cases and background are nicely presented in this article https://wersdoerfer.de/blogs/ephes_blog/django-31-async/

HN submission: https://news.ycombinator.com/item?id=24048208


If I look at the database queries on the vast majority of pages in a typical Django project, I see a big list of operations being executed sequentially that could actually be done in parallel.

Additionally (this is my pet use case), if you implement a GraphQL server on top of Django (using one of the many libraries), you tend to get subpar performance, because GraphQL lends itself really well to parallelised data fetching, which is hard to take advantage of at the moment.
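
The pattern I mean, as a toy sketch: independent resolvers gathered instead of awaited one after another:

    import asyncio

    async def fetch_user():
        await asyncio.sleep(0.1)  # stand-in for an independent data fetch
        return {"id": 1}

    async def fetch_orders():
        await asyncio.sleep(0.1)
        return [{"order": 42}]

    async def resolve():
        # Awaited sequentially this takes ~0.2s; gathered, ~0.1s.
        user, orders = await asyncio.gather(fetch_user(), fetch_orders())
        return {"user": user, "orders": orders}

    print(asyncio.run(resolve()))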


uWSGI with gevent patching works far better than async/await.



