Why do we need Flask, Celery, and Redis? (2019) (ljvmiranda921.github.io)
345 points by feross on April 18, 2020 | 181 comments



I see a lot of comments that are talking about how python does not have a go-like concurrency story.

Fyi - python ASGI frameworks like fastapi/Starlette are the same developer experience as go. They also compete on techempower benchmarks, and are used in production by Uber, Microsoft, etc.

A queue based system is used for a very different tradeoff of persistence vs concurrency. It's similar to saying that the use case for Kafka doesn't exist because go can do concurrency.

Running "python -m asyncio" launches a natively async REPL. https://www.integralist.co.uk/posts/python-asyncio/#running-...

Go play with it ;)


I think a hard part with lots of these “what do I use x for” examples is that they start with the tool and then discuss the problem it solves. I find it more helpful to start with a problem, and discuss the various tools that address it, in different ways.

Forget email; say you have an app that scans links in comments for maliciousness. You rely on an internal api for checking against a known blacklist, which follows shortened links first, and an external api from a third party. You want the comment to appear submitted instantly to the poster, but are comfortable waiting for it to appear for everyone else. What are your options?

You could certainly use message queues and workers. If you’re cloud native maybe you leverage lambdas. Maybe you spin up an independent service that does the processing and inserting into the database in the background, and all you need to do is send a simple HTTP request on an internal network.

Your solution depends on your throughput requirements, the size of your team and their engineering capabilities, and what existing solutions you have in place. Everything has its pros and cons. Pretending that celery/redis is useless and that everything would be solved if everyone just used Java ignores the fact that celery and redis are widely popular and drive many successful applications and use cases.


We've been experimenting with nice compromises for using pydata on-the-fly compute with caching. The basic tension is that individual requests get blocked by their blocking APIs, while caches (eg, IP-pinned) are best when kept in the same python app thread.

Right now, we do hypercorn multiproc -> per-proc quart/asyncio/aiohttp IP-pinned event loop -> Apache arrow in-app cache -> on-gpu rapids.ai cache.

But we're not happy with the event loop due to pandas/rapids blocking when concurrent users hit heavy datasets. (Taking us back to celery, redis, etc., which we don't want due to extra data movement...) Maybe we can get immutable arrow buffers shared across python proc threads...

Ideas welcome!


While I agree with the rest of your comment, the sentence "if you’re cloud native maybe you leverage lambdas" made me irrationally angry.


Can you explain why? I use lambdas often and they seem to solve the problems they're meant for well.


It wasn't the lambdas, it was the combination of "cloud-native", which is a very salesmany term, and "leverage", which is my pet hate word. It's exactly as useful as "use", only much more pretentious. I'm just easily triggered with language :P

More off-topic (or, rather, on-topic), I find lambdas great for things like a static website that needs a few functions. I especially like how Netlify uses them, they seem to fit that purpose exactly.


> I'm just easily triggered with language :P

Me too! It makes me irrationally angry when people regurgitate linguistic clichés. I was already mad with:

"python does not have a go-like concurrency story"

when it would be enough (and 1000x less cringe) to say:

"python does not have go-like concurrency"

I think these mindless clichés make language really ugly and dysfunctional, and even worse they are thought-stoppers, because they make the reader/listener feel like something smart is being said, because they recognize the "in-group" lingo. In my experience, people get really offended when you point this out. It's kind of an HN taboo to discuss this. Which is also interesting in itself.

Going forward we should pay more attention to our communication use cases. Btw: I wonder if we can stack several of these clichés. For example: "leverage" + "use case" = "leverage case".


I agree, I think Orwell's "Politics and the English Language" is spot on here. I try to use simpler language whenever possible; I agree that people think using longer words makes them sound smart, but it's just worse for communication.

I've found it's a taboo to discuss anything even slightly personal. People are averse to feeling bad, so criticism needs to be extremely subtle in order to not offend.

> Btw: I wonder if we can stack several of these clichés. For example: "leverage" + "use case" = "leverage case".

I hate you for even thinking of this.


> I've found it's a taboo to discuss anything even slightly personal. People are averse to feeling bad, so criticism needs to be extremely subtle in order to not offend.

The personal association you made between "discussing anything even slightly personal" and "criticism needs to be extremely subtle" makes it sound like your problem isn't language or Orwellian discourse, but the way you subconsciously link discussing personal matters with harshly criticising those you speak with for no good reason.

If your personal conversations boil down to appeasing your own personal need to criticise others, then I'm sorry to break it to you, but your problem isn't language.


You just misconstrued my saying "personal" and clearly meaning "personal criticism" as meaning personal things in general and then criticized me on that straw man. I don't hold that opinion at all.

You also went to "Orwellian discourse", which has a specific meaning, from a text by Orwell I mentioned. It seems to me like you got personally offended, interpreted my comment in the most uncharitable way, and chose to lash out at me instead, and I'm not sure why. I wasn't even talking about anyone specifically.


You were the one associating discussing remotely personal stuff with criticising others, and, if that was not bad enough, your personal take was that you felt the need to keep criticising others while resorting to subtlety, just to keep shoveling criticism without sparking the reactions you're getting for doing the thing you want to do to others.

I repeat, your problem is not language. Your problem is that you manifest a need to criticize others. That problem is all on you.


That is quite a stretch of the imagination, and I sure didn't read it that way. I may be wrong, but here's one fun example from this comment section that I wanted to “respond” to and demand some clarification on.

https://news.ycombinator.com/item?id=22911497

PS: I work with like 65-70% of that stack daily


Ah, okay. If I had said "any remotely personal criticism", like I meant, you'd have an entirely different conclusion, but I guess that doesn't matter when you just want to jump to conclusions.

This shows from the fact that you're saying "criticize" like it's a bad thing.


Besides, serious projects in Go do use additional tools for creating task queues, because they need to handle various stages of persistence, error handling, serialization, workflow, etc., which is not stuff you want to write by hand yourself.

If you don't need all that, then it's not a problem in Python either. You don't even need asyncio, you can just use a ThreadPool or a ProcessPool, dump stuff with pickle/shelve/sqlite3 and be on your way.
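
Something like this covers a lot of ground already (a stdlib-only sketch; the names are illustrative):

    from concurrent.futures import ThreadPoolExecutor
    import shelve

    def work(n):
        # stand-in for a blocking API call or other I/O-bound job
        return n * n

    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(work, range(10)))

    with shelve.open("results.db") as db:
        db["batch-1"] = results  # crude persistence instead of a broker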


I was using gevent in Python about 10 years ago, and from memory, it's roughly similar to goroutines. It's not exactly the same of course, but just like goroutines it's pretty easy to just spawn a few jobs off, wait for them to finish, and get the results.

It's not in the standard library, and there are probably other options too now (not a heavy Python user any more), but Python has had easy parallelism for at least a decade (probably longer).


Gevent is like goroutines with GOMAXPROCS=1. Which is to say not nearly as useful. It gives you concurrency without parallelism, because Python never did shake the GIL. Which goes to show some technical debt will haunt you forever.


Funny you mention the GIL - work on this is being done as part of PEP 554, which was slated for python 3.9 (now in alpha).

And it's showing great promise... though it could be delayed.

https://www.mail-archive.com/python-dev@python.org/msg108063...

2021 is probably going to be the "Gone Gil" moment!


Why don't they use/fork the GoLang runtime?


There are a few reasons:

1. This would definitely break the CPython API, which is not an option for mainstream Python.

2. The Golang runtime isn't really well-understood as a backend for languages that aren't Golang, although I acknowledge that there's no reason in principle why you couldn't compile $LANGUAGE to Golang.


To add nuance to your comment: you can still get some form of parallelism, "just not" thread parallelism, in Python. You can still spawn multiple handler processes, or have threaded code in a C extension.


Yes, you can use multiple processes to get the parallelism. But that's quite limiting compared with goroutines. Passing data back and forth is hard, and you can pretty much forget about shared data structures. Memory usage is also much higher.


It’s arguable whether either of those are ‘in python’.


The multiprocessing module in the standard library is absolutely a Python-native way to do parallelism:

    from multiprocessing import Pool  # f must be a module-level function so it can be pickled

    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
This runs f(1), f(2), and f(3) in parallel, using a pool of five processes.

https://docs.python.org/3.6/library/multiprocessing.html


Whoa! It never occurred to me that `Pool` could be used as a context manager. I've always typed out `.close()` manually like a sucker.


https://docs.python.org/2/library/multiprocessing.html

Python standard library since 2.6. Pretty much the definition of "in python".


> Gevent is like goroutines with GOMAXPROCS=1.

On its own, yes. For webapps, you can easily combine it with a multi-process WSGI server (like gunicorn or similar).


Running four processes with GOMAXPROCS=1 is strictly worse than 1 process with GOMAXPROCS=4. The difference is coarse-vs-fine grained parallelism. Notably, a WSGI+gevent system doesn't allow you to do parallelism within a request, not to mention configuring these WSGI implementations (especially for production) is a bunch of extra headache for which there is no analogy in Go.


The GIL is not a total ban on thread parallelism. It's a significant obstacle, but not a complete stop.

Besides, Go has its own set of problems with parallelism. None of those are best in class.


This comment makes it sound like Go and Python are pretty much in the same class because neither is perfect. I invite anyone who thinks this way to write considerable (shared-memory) parallel code in both languages and see which they prefer.

As for Go's "set of problems with parallelism", they're pretty much just that sharing memory is hard to do correctly without giving up performance. No languages do this well; Rust and Haskell make it appear easier by making single-threaded code more difficult to write--requiring you to adhere to invariants like functional purity or borrowing. If you're writing Python, you very likely have values that are incompatible with these invariants (you want to onboard new developers quickly and you want your developers to write code quickly and you're willing to trade off on correctness to do so).

Go is absolutely best-in-class if you have typical Python values.


Why bother writing shared-memory parallel code if it makes your life so hard? Most of the time you're i/o bound, or network bound, or storage bound. Being compute bound is exceptionally rare these days.


I do it when I have to do it. Most things are I/O bound, but sometimes things are compute bound. And since Python is about two orders of magnitude slower than Go (and Python also makes it much harder to optimize straight-line execution because it lacks semantics for expressing memory layout), you tend to need parallelism more often than you would in a fast language. Sometimes you can leverage pandas or write a small chunk in C, but very often those options aren't available, and naively throwing C at the problem can make your performance worse.


> Go is absolutely best-in-class if you have typical Python values.

"Best in class" is not a relative term. Neither Go nor Python are appropriate choices for highly parallel intercomunicating code. Yes, Python is more limited than Go here, but hardly makes a difference when you avoid it.

Haskell and Rust do make it easier, by forcing developers to organize their code in a completely different way. Erlang does the same, with a different kind of organization. None of those languages are more difficult to program in, but yes, they are hard to learn.


> Neither Go nor Python is an appropriate choice for highly parallel intercommunicating code.

that's an extraordinary claim which needs evidence.


Extraordinary?

Name any single Go feature aimed at helping parallel computing.


I still use and greatly prefer gevent to this day. They are indeed quite similar to goroutines. Asyncio is a different model, and more irritating for me to work with. (I'm sure this isn't the case for everyone. I'm just more productive with gevent, personally.)

Performance is pretty close for both. I was disappointed to not see Python get an official green thread implementation. The counter-argument I commonly see cited is https://glyph.twistedmatrix.com/2014/02/unyielding.html. I personally don't find it to be a very convincing argument.


What's missing in asyncio when compared to gevent?

The coroutine and queue model is the same, right?

Cool thing in 3.8:

Running python -m asyncio launches a natively async REPL.


They're just completely different models. gevent is green threads, asyncio is explicit coroutines. In gevent, you don't use an "async" or "await" syntax. You just spawn greenlets (green threads), which run in the background, and if you want, you can have them block until they return a result. And they're efficient, so you can spawn thousands without a problem.
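
Roughly like this, for example (a sketch; the URL is a placeholder):

    from gevent import monkey
    monkey.patch_all()  # patch first so stdlib sockets become cooperative

    import gevent
    import urllib.request

    def fetch(url):
        return len(urllib.request.urlopen(url).read())

    # spawn greenlets; they run in the background like cheap threads
    jobs = [gevent.spawn(fetch, "https://example.com") for _ in range(3)]
    gevent.joinall(jobs, timeout=10)  # block until all of them finish
    print([job.value for job in jobs])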


I always considered "green threads" and "coroutines" to be the same thing? Are they not?


The difference is that with coroutines, yielding is explicit. Tight-looping will hog the CPU, and a blocking call will block other coroutines too. Typically "green threads" are semantically just threads but cheaper. They're scheduled independently, so there's no risk of them hogging the CPU, and you can use synchronous apis. The one downside is that you need explicit synchronization between them, whereas with coroutines you can mutate shared data structures and not worry about race conditions as long as you don't yield in between.
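
A small asyncio sketch of that explicit-yield behavior (illustrative only): the only places another coroutine can run are the await points, and a blocking call stalls the whole loop:

    import asyncio
    import time

    async def polite():
        await asyncio.sleep(1)  # yields to the event loop; others run meanwhile

    async def hog():
        time.sleep(1)  # blocking call with no await: nothing else can run

    async def main():
        # two polite coroutines finish together in ~1s;
        # swap in hog() and they serialize to ~2s
        await asyncio.gather(polite(), polite())

    asyncio.run(main())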


They are both built on the same technology in the CPython runtime.


If you have a significant python code base that is not async, then all of it needs to be ported to the async model, whereas with gevent I can do monkey patching and move to a concurrency model. If I am starting a fresh project with python and need concurrency, yes, "async" is a better choice, but if you already have some code base then moving to async is a fair amount of work.


With asyncio, your whole app falls over if you accidentally call a library function that makes a sync API call under the covers. gevent (as I understand it; haven't actually used it) will patch all sync APIs and make them async. Also, if you do `aiohttp.get("www.example.com/foo.json").json()`, you get a TypeError because a coroutine has no method `.json()` (you forgot `await`), unless you're using Mypy.
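
(For the record, aiohttp's real API goes through a ClientSession rather than a module-level get, but the forgotten-await failure mode looks roughly like this sketch:)

    import asyncio
    import aiohttp

    async def main():
        async with aiohttp.ClientSession() as session:
            async with session.get("https://example.com/foo.json") as resp:
                data = await resp.json()  # drop this await and you get a
                                          # coroutine object, not the payload
        print(data)

    asyncio.run(main())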


Yep, that about sums it up. gevent can't monkeypatch synchronous code that's implemented in non-Python native modules, but I think pretty much all native Python libraries struggle with those sorts of things, and asyncio of course also can't deal with it.

The vast majority of the time, gevent's monkeypatching works without any issues. With asyncio, you basically have to rewrite everything from the ground up to always use the new async APIs, and you can't interact with libraries that do sync I/O.


> Fyi - python ASGI frameworks like fastapi/Starlette are the same developer experience as go.

Can you provide some context for this statement? I've used Python asyncio extensively in Fargate (no ASGI frontend), and the developer experience is far from Go's; however, I don't see how an ASGI framework can fix this. It seems like it offers the same general coarse-grained parallelism that you get from a containerized environment like Fargate, except that Fargate abstracts over a cluster while ASGI frameworks presumably just abstract over a handful of CPUs.

For example, we have a large data structure that we have to load and process for each request. We want to parallelize the processing of that structure, but the costs to pickle it for a multiprocessing approach are much too large. We've considered the memory mapped file approach, but it has its own issues. We're also looking at stateful-server solutions, like Dask, but now we're talking about running infrastructure. In Go, we could just fork a few goroutines and be on our way.
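
(For concreteness, the memory-mapped-style option looks roughly like this; a sketch using Python 3.8's multiprocessing.shared_memory, assuming the structure can be flattened into a numpy buffer:)

    import numpy as np
    from multiprocessing import Process, shared_memory

    def worker(name, shape):
        shm = shared_memory.SharedMemory(name=name)  # attach, no copy
        view = np.ndarray(shape, dtype=np.float64, buffer=shm.buf)
        print(view.sum())  # process the data without pickling it
        shm.close()

    if __name__ == "__main__":
        data = np.arange(1_000_000, dtype=np.float64)
        shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
        np.ndarray(data.shape, data.dtype, buffer=shm.buf)[:] = data
        p = Process(target=worker, args=(shm.name, data.shape))
        p.start(); p.join()
        shm.close(); shm.unlink()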


you should ask this exact same question here - https://github.com/tiangolo/fastapi/

I don't claim to have expertise in your business domain, but you should get the answer there.


The difference is that it doesn't play well with large parts of the existing Python ecosystem. Concurrency is a pain if you are writing a library or trying to integrate with existing code. Python async is powerful, but:

(1) it's undersupported (e.g. you can in theory download s3 files using async botocore, but in practice it is hard to use because of strict botocore version dependencies)

(2) it isn't natural - once you're in the event loop it makes sense, but using an event loop alongside normal python code is confusing at best.

(3) it got introduced too late. The best async primitives are only available in pretty recent versions of python3.

The difference with go is that go has these primitives built in from the beginning, and using them doesn't introduce interoperability problems.


and on top of it go is performant enough that 90% of the time you don't need them to begin with.


Most common use case for concurrency is for IO intensive workloads. When it comes down to IO, programming language hardly matters.

If you have CPU intensive workload, an optimizing compiler can help. Well, for that you have C or other languages with more mature and aggressive compilers.


I've played with the concurrency in Python, and it's simply not worth it. Much better to use Node.js or Go where the async story is not an afterthought.

Of course if you are stuck with Python it's better than nothing.


> same developer experience as go

You can't say that when Python shoves Pip/Poetry/VirtualEnv/Black/Flake and whatnot at people. In contrast, Go has built-in package management and gofmt. Python is essentially gate-keeping new devs.


Go has had an equally bad packaging experience.

Kubernetes - which is one of the biggest projects built in go - has been struggling with dependency and package management.

Here's the CTO of Rancher commenting on his struggles

https://twitter.com/ibuildthecloud/status/118752909888666419...

https://twitter.com/ibuildthecloud/status/118753821015230873...

This is not trivial stuff... and it shouldn't be trivialised into a go vs python flamewar. Because it can't be.


I've used Python and Go extensively. Go's packaging story has a few rough edges, but Python's is an impenetrable maze of competing tools that each purport to address others' major hidden pitfalls.

To work with Python packages, you have to pick the right subset of these technologies to work with, and you'll probably have to change course several times because all of them have major hidden pitfalls:

* wheels

* eggs

* pex

* shiv

* setuptools

* sdist

* bdist

* virtualenv

* pipenv

* pyenv

* sys.path

* pyproject.toml

* pip

* pipfile

* poetry

* twine

* anaconda

To work with Go:

* Publish package (including documentation): git tag $VERSION && git push origin $VERSION

* Add a dependency: add the import declaration in your file and `go build`

* Distribution: Build a static binary for every target platform and send it to whomever. No need to have a special runtime (or version thereof) installed nor any kind of virtual environment nor any particular set of dependencies.


Yeah, until github is unreachable and the entire Go universe grinds to an immediate halt because nothing will build.

Python packaging is a mess, but Go doesn't even bother. "Just download from some VCS we'll pretend is 100% reliable and compile from source" is not a packaging solution.


How is that any different than the entire Python universe grinding to an immediate halt if there's an issue with pypi.python.org? (Hint: it's not.)

You can certainly debate the difference in uptime between specific services; I don't know either way, but if you told me that PyPi had higher uptime than GitHub, I'd believe you... but that's kinda missing the point. If you depend on an online service to host your release artifacts, if and when that service goes down, it's gonna hurt.

Meanwhile, Python's packaging wars continue to rage on. Go's is simple: a release is a tag in a VCS repository. I'm sure there are issues with that as well, but that should come as no surprise, considering there are issues with literally every packaging solution. At any rate, there's little moral difference between downloading a tarball (or a wheel, or... whatever), vs. pulling a tag from a git repo. It requires equal levels of trust to believe that no one has tampered with prior releases in both cases.

I'd like to also point out that I don't have a dog in this race. I've done a little Go here and there, but frankly I don't like the ergonomics of the language too much, so I stay away from it. I've done (and continue to do) a decent amount of Python. I like the language, but tend to prefer strongly-typed, functional languages, and languages with performant runtimes, so I tend to only use it for smaller projects.


You can trivially run your own local PyPI mirror or install packages directly from some other source (e.g. S3 bucket, LAN storage). Is there a way to do that for Go? If so, I've never seen it done.


It's possible as of the module proxy in Go 1.13, but this was not well-documented, suffers from competing implementations, and was introduced in a way that probably broke more builds than it helped.


Yes, you just vendor your dependencies.

Go has support for a proxy system; the tooling is still immature though.


I’ll take “doesn’t even try, but just works” all day every day. What is Github’s downtime for cloning in the last year, and how does it compare to Pypi? And if you’re really worried, why not use a caching proxy just like you do with Pypi? In my experience (using Python since 2008 and Go since 2012), Go package management has far fewer problems.


I can't build lego from source due to a failed dependency. The docs don't help either, they're plain wrong. To make matters worse, Go pulls the latest dev version so good luck trying to build a stable binary of some complex package. I've opened an issue which was promptly closed and I was told to "just download the binary dist, source builds are for devs". To add insult to injury, each project is built in its own usually broken way. Out of date software? Good luck. Sorry, but I've had overall better experiences installing random Python programs with pip or building D libs with dub. Pulling half of Github rarely qualifies as "package management". It only encourages a giant mess, which is precisely what software development has been lately. Go is probably worse than npm in this respect.


You have no idea what you're talking about. These problems don't exist anymore since godep, and now go modules, which are built into the standard go tooling.


> Yeah, until github is unreachable and the entire Go universe grinds to an immediate halt because nothing will build.

That's what vendoring and the proxy cache are for. This problem hasn't existed since like go 1.8 and is completely resolved in go 1.14.


You're combining multiple problems: maintaining a package for redistribution, and using packages. For the second, the much more common case, 2/3 of the things on your list are irrelevant.


For either case, Python’s story is more complex than Go’s.


Go may be unique in being the only ecosystem built after Python that can't claim it avoided Python's packaging disasters.


How do you figure? Go's packaging is wayyyy better than Python's. I've done considerable work with each and while Go's ecosystem has warts here and there, it's far from disastrous. I can't say that about Python.

If nothing else, Go lets you distribute a static binary with everything built in, including the runtime. Python's closest analog is PEX files, but these don't include the runtime and often require you to have the right `.so` files installed on your system, and they also don't work with libraries that assume they are unpacked to the system packages directory or similar. In general, it also takes much longer to build a PEX file than to compile a Go project. Unfortunately, PEX files aren't even very common in the Python ecosystem.


In the context of

> Pip/Poetry/VirtualEnv

"packaging" refers to the way the language manages dependencies during the build and import process, not how you distribute programs you have built.

Python has a deservedly poor reputation here, having churned through dozen major overlapping different-but-not-really tools in my decade and a half using it. And even the most recent one is only about a year into wide adoption, so I wouldn't count on this being over.

Go tried to ignore modules entirely, using the incredibly idiosyncratic GOPATH approach, got (I think) four major competing implementations within half as long, finally started converging, then Google blew a huge amount of political capital countermanding the community's decision. My experience with Go modules has been mostly positive, but there's no really major new idea in it that needed a decade to stew nor the amount of emotional energy. (MVS is nice but an incremental improvement over lockfiles, especially as go.sum ends up morally a lockfile anyway.)


I'm slowly deprecating a python system at work and replacing it with elixir. We don't use containerization or anything, and installing the python system is a nightmare. You have to set up virtualenvs, not to mention celery and rabbit, and god help you if you're trying to operate it and you forget something or another.

With elixir, you run "mix release" and the release pipeline is set up to automatically gzip the release (it's one line of code to include that). Shoot the gzip over (actually I upload to s3 and redownload), unzip, and the entire environment, the dependencies, the vm, literally everything comes over. The only thing I have to do is sudo setcap cap_net_bind_service=+ep on the vm binary inside the distribution because linux is weird and, as they say, "it just works".


I fully agree with this assessment, but I don’t see how this puts Python’s story on par with Go’s. While GOPATH was certainly idiosyncratic, it generally just worked for me. While go modules aren’t perfect and the history was frustrating, they generally work fine for me. Python feels like an uphill battle by comparison.


If Go sticks with modules and doesn't keep making significant changes (e.g. the proxy introduced in 1.13 was not handled well), then it will be better than Python.

But if Python finally "picks" poetry, sticks with it for a few years and incrementally fixes problems rather than rolling out yet another new tool, that will also be better.

You can only identify the end of the churn for either retroactively. Python just looks worse right now because it's been around longer.


difference here is track record.

go: tends to wait and implement something once the problem is understood. It took 2 years after the go maintainers decided to solve the dependency issues, and as of the latest release it's finally been labelled production ready.

and honestly the proxy issues were not real. go modules was still optional. you could just turn it off.

python is how old now? couple decades? and it has only gotten worse over time.


Another thing Python and Go unfortunately have in common is a community (not necessarily core developers) with knee-jerk reactions to any criticism.

> go: tends to wait and implement something once the problem is understood.

Go's modules provide no additional "understanding" over any of the other Bundler-derived solutions in the world. MVS was the primary innovation, but wanting checksum validation means I have to track all the same data anyway.

> took 2 years after go maintainers decided to solve the dependency issues

This is revisionist history. There were other official "solutions" before ("you don't need it", "vgo is good enough", and "we'll follow community decisions"). If this one sticks, it's fine. But you can't say it's good now just because it's the one we have now - it's good now if it's the one we still manage to have in five years.

Go's track record is not "good" (in that regard I think only Cargo qualifies). At best it's "mercifully short."

> and honestly the proxy issues were not real.

Documentation was poor, the needed flags changed shortly before release, the design risks information leaks, and the entire system should not have been on by default for at least one more minor version.

> python is how old now? couple decades? and it has only gotten worse over time.

Yeah, that's exactly why I said "Python just looks worse right now because it's been around longer." It hasn't gotten worse though, it just also hasn't stopped churning. And if Go doesn't stop churning, in 10 years it will look the same.

The age argument works both ways - multiple major versions of Python predate Bundler. Go has no excuse for taking so long to reinvent "Bundler with incidentals", just like every other language.


I believe Python suffers from no leadership in that space (everyone creates their own packaging, every tutorial advocates something different, many tutorials are outright wrong).

There was also a bad decision of using Python code for installation (setup.py) instead of a declarative language.

Most of those issues are actually fixed in setuptools if you put all settings in setup.cfg and just call an empty setup() in setup.py.

Like here: https://github.com/takeda/example_python_project/blob/master...
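
For example, a minimal declarative layout in that style (hypothetical project metadata; setup.py shrinks to a stub):

    # setup.py
    from setuptools import setup
    setup()

with everything else in the config file:

    # setup.cfg
    [metadata]
    name = example-project
    version = 0.1.0

    [options]
    packages = find:
    install_requires =
        requests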


Cargo works pretty well too.


Cargo is not from Go


As a name for a package manager, "cargo" certainly appears more suitable for "go" than for "rust".


What's interesting is that McDonald's wait times have actually gone up since they moved to kiosk ordering, mobile app ordering, Uber eats etc. They've increased their ability to take orders, but haven't been able to keep up on the supply side. The old way was almost better in that it introduced a natural bottleneck so while it took longer to place your order, once you did, the queue in front of you was shorter.

https://www.businessinsider.com/mcdonalds-spending-millions-...


This is intentional. In fact, I've seen many McDonald'ses that were redecorated such that you can't see the screen with the queued/ready orders from where the kiosks are. This way, you're not discouraged from ordering if you feel like the wait will be too long.

This is also why McDonald's introduced table service, which is only in restaurants that have a layout where it's impossible to hide how many people are waiting. It costs manpower to deliver food to tables, but the additional orders are worth it.

McD's don't mind if you have to wait, they mind if you leave before you order. "Busy-looking queue" is a much more frequent problem than "totally-packed-restaurant".

Source: I like hamburgers + I geek out over stuff like this. I.e., Just Guessing.


I wonder how much of the delay is due to their recent decision to offer breakfast items all day. Many franchisees were complaining at the time about how doing so would reduce throughput, as it expanded greatly the list of possible items a customer could order and necessitated pulling people away to staff tasks that were only required for breakfast, like cooking eggs.


A lot of these measures have come to markets where breakfast is still not served all day (e.g. UK, last I checked).

My reasoning is that they are just trying as hard as they can to decouple order-taking from order-preparing and distributing, taking their cue from Starbucks. Volume of this or that is not really an issue: with the exception of chips, these days they hardly prepare anything at all before it’s been ordered, so it doesn’t really matter whether they make a burger or a muffin.


Yet we are willing to wait for Chipotle, Five Guys, or Shake Shack for 20 min.


?

Chipotle is about as fast as it gets


Not when the person in front of you is getting 6 different orders.


Do you not have tastebuds?


Do you? Those are some of the best dirty burgers you can have on this planet.


They've gotten horrendously expensive in the past few years, though. That's my only real issue. My order has gone from $10 to $17 since they opened in my town, so I just don't go anymore.


I was talking about this just last night. Five Guys is absolutely the worst of the mid-priced hamburger chains, and even their fries are almost as bad as In-N-Out's, which is like making a car whose steering wheel flies off while you're driving.


In-n-out fries must be eaten while they're hot. In that state they are delicious. The precipitous decline in quality upon cooling occurs because they are made by frying potato pieces rather than by assembling various starches in a laboratory.


I think my tastebuds must be off because I feel the same way. There's always a huge line at Five Guys and people rave about it but whenever I eat there I feel like I need a straw for my fries because they're just drowning in grease.

I get that people like it and I'm certainly not going to discourage anyone from doing what they enjoy, but I also feel like social media has turned certain fast food chains into memes where you can't merely just be satisfied with something, you either need to looooovveeeee iiiitttt or demand it be "canceled". There's no middle ground.


The fries are fairly mediocre, but their burgers are pretty fantastic, especially if you’re a fan of animal style.


Saying Five Guys is the worst in a thread that features McDonalds... Five Guys is gourmet compared to McDonalds.


That's why I excluded McD's from the comparison with "mid-priced."


Well, someone tell McDonald's some people are not going there anymore. The few times I do, I use McDrive because the wait is predictable and shorter.

If I had the money I'd spend it to open a good old McDonald's; I'm sure more people would come.


It’s called backpressure and it’s one of the tradeoffs that should be considered when thinking about adopting async work queues. There is no natural backpressure mechanism so it takes a lot more work to ensure you either reject new work after hitting a limit, or scale your workers so you can keep up with the queue backlog. Queues seem great, until they’re not, and then you enter a much more complicated world.
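
A tiny sketch of what adding that limit by hand looks like (stdlib only; the bound is illustrative):

    from queue import Queue, Full

    jobs = Queue(maxsize=100)  # bounded backlog

    def submit(task):
        try:
            jobs.put_nowait(task)
        except Full:
            return "rejected, try later"  # shed load instead of growing forever
        return "accepted"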


It's so sadly true that you can realize it without even looking.

The kiosk system is human-less thinking; we're not machines operating with pure logistics in mind. We like to have responsibilities in a way. When I'm talking to a cashier, he/she feels a duty to get something from A to B. With decoupled ordering... nobody knows who I am, nobody really cares (McDonalds doesn't pay nor train for a welcoming, service-minded attitude). I'm just a thing carrying a ticket.

I've seen things so absurd, people walking around not knowing who to talk to, where to go, what to assume; both customers or employees. It was a surrealist comical situation.

Also my average time to serve for 1 hamburger with no other customer around is 7min. (5-15). Talk about fast food :)

Another point: I don't mean to make people work more, but I even prefer busy waiting lines with hectic kitchen action. That's what McDonalds was: a high-throughput grill. It felt like something... now it's all dull and clinical.

Oh, and lastly... the kiosks are fugly. They break the room space, break the flow of people, and are way too big for their purpose (a stupid tiny $100 '80s monochrome terminal would do better :p)

ps: This decoupling idea was tried at a company I worked for. Exact same principle (which I also gave some credit at first), split everything in small chunks so people can go faster.. it all went worse because nobody took responsibility for anything since a single task was now a dozen tiny bits done by a dozen people not really knowing what their bit was for. They just passed the products from hand to hand, not being able to track who or what was wrong until the last guy received all the shit because he's the one to show the result to the managers :)


That may be true for you, but I like the flow of the kiosks, where I can take my time and not be rushed, and I don't have to interact with someone just to take an order. I'm not sure why it has to be a process that requires a human at all, just to be a cashier.


They also work extremely well in busy tourist hotspots and big cities, where people might struggle to communicate and slow down operations pretty dramatically.


I feel even more rushed when people are in line behind me waiting for me to figure out the UI a cashier knows by heart. I’m not sure why it has to be a process that requires a kiosk at all, just to be a cashier.


social distancing may skew everything towards kiosks.


If I’m worried about catching or spreading the virus, wouldn’t the kiosks be worse? Dozens of people touching the same surface with their bare hands seems more dangerous to me than telling my order to someone over a short distance, especially if masks are involved.


A full answer might depend on whether you are a customer or a cashier.


That is true. We might end up living in biohazard-like cities from now on :)


That seems to be looking at "speaker to order window" instead of "arrival to order window" times. This seems odd to me, as personally I would want to minimize the latter not the former.


I can't speak to the apps, but there's a foodcourt in my building with a kiosked McDonalds. Based on my observations, ordering with it is a slow, time consuming process. It's way faster to just say "I'll have a 5 with a coke". Most people in a hurry opt to stand in line with a cashier, which has a line that moves much faster.


Totally agree, I loathe the kiosks, it is an incredibly slow way to order food. Perhaps they need to implement a voice assistant so you can just ask for a 5 with a coke.


That seems like a good problem to have, because before they had issues with demand. I think, based on McDonald's history of optimization, they feel like they can solve that supply-side issue more easily than the demand-side one they were contending with.


Could it be that the reduced stress on the buyers results in reduced speed from the staff?


Moreover, one could just look at the shelf of already prepared burgers and buy one of them. So quick!


Wouldn’t that be working as intended? More revenue


Actually with uWSGI, be it on Flask, Django or anything else, I don't need Celery or Redis. uWSGI has a built-in celery-ish spooler, cron-ish task scheduler, and memcache-ish key-value store, along with plenty of other toys that I love from the deepest of my heart... And I've been in this (great) situation for years, not planning to move out to more complicated stacks. I would highly recommend uWSGI over Celery and Redis, which I used in the past, prior to doing it all in uWSGI, unless you have a really good reason, which I'm eager to read about. And now that uWSGI supports a lot of languages, even if I have some PHP or whatever to deploy I'll go for uWSGI, one of the most beautiful pieces of software I have the chance to use.
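
For a taste, the decorator API looks roughly like this (a sketch from memory, so treat the details as approximate and check the uwsgidecorators docs):

    from uwsgidecorators import spool, timer

    @spool
    def send_email(args):
        # args arrives as a dict of strings via the spool directory
        print("sending to", args["to"])

    @timer(60)
    def every_minute(signum):
        print("cron-ish periodic housekeeping")

    # from a request handler, enqueue instead of blocking:
    # send_email.spool(to="user@example.com")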


Could you elaborate, e.g. link to documentation, subprojects or example code? I'd love to get rid of Celery because of how difficult it is for me to tweak it for good performance.


Would love to see more detail as well because I'm a bit skeptical. From uWSGI documentation, it looks like to coordinate cron across anything more than a single server setup, you'd need to configure a Legion, which means you're then integrating uWSGI's orchestration framework with whatever you're already using (k8, ECS, etc).

I like minimalism, but sometimes batteries are included for a reason.


I suspect they are using a single server. I found uwsgi's mules and cache2 very useful in that situation.

If someone finds that Redis and Celery are more complication than they need for a given task, then I think they're probably not using an orchestration framework.


This is what I have used in the past. I think they're very convenient if you are just running one instance of uwsgi and want to share some state quickly without going through a persistent database.

If you're worrying about tweaking Celery for performance, then I suspect your uses may be a bit more complex than uwsgi's mules are designed for though.

https://uwsgi-docs.readthedocs.io/en/latest/Caching.html https://uwsgi-docs.readthedocs.io/en/latest/Mules.html

The cache is a simple key-value store. Works well. I later swapped it out for Redis, because I needed to share my cache between multiple machines. Switching to Redis was a very quick and easy replacement, so you don't need to worry about lock-in.

I used mules in a couple of ways. I had some background task mules which mostly just ran in a loop with a `sleep()` call. An example is deleting old records from a database once a day. Another example is listening to an Amazon SQS queue for files being uploaded to an S3 bucket.

I also had some mules which were triggered by web requests. These are normally what you would use "farms" for. The web request sends a farm message, and a mule picks it up and acts on it. For example, I used this for sending webhook callbacks in response to certain web requests.

You could probably also combine this with uwsgi's async mode. That would be useful if you needed a web request to wait for a long running task to finish before sending a response back. I handled that kind of situation with the aforementioned webhook callbacks instead.

A sibling comment has mentioned Legion. I've never used that, so can't comment on whether or not the caching and messaging works together with that.


you can also use dramatiq


How does it work on more than one server? (which is how I hope you deploy all your production apps)


The need for multiple servers kinda depends on the application, no? I'm interested in the single server story.. many of my apps are internal and used by 50 people at most.


Yeah, but just in terms of redeploying without downtime, safety if a new version won’t start properly, and redundancy when one server is down because of an OS error while you were asleep. It's not about the number of clients at all.


Hi, author here! Pleasantly surprised to see this on HN, thanks for posting, feross! Sorry for the Mcdonalds analogy, it's just that it's really near our office and I got that insight while ordering McNuggets! Didn't expect it would cause such a divide.

Agree, Mcdonalds has definitely upped their ordering game recently. Thank you and I appreciate all the helpful comments!


Hey, nice article format for actual humans, with an easy-to-digest flow. Will share with devs. Great work! Thanks


I wish there was a paragraph at the top that made two points:

1. Quite often you don't (I've built dozens of websites without needing Celery)

2. Even if you think you do there's often a much simpler solution that is enough for most needs (Use cron, spawn a process etc)

Celery is a big, heavy lump of code to add to most websites, and it increases deployment complexity.


It’s worth scoping out what your site will need to do up front to some extent. Spending a couple of days setting up a basic background processing system (whether that’s celery, rq, or a home grown system) makes it easy to make great engineering decisions later down the line.

Questions like: should I send this email in-line in the web request? get a very easy answer: no, just do it later. Sure, sending an email is probably fine to do in-line for now, but months in you may realise that things are slow, that you’re sending emails and then rolling back transactions, or committing the transaction but losing the email that needed to be sent, or all manner of other annoying edge cases. Queues don’t solve everything, but they can be an OK answer to a lot of stuff for a long time.

For basic sites, yeah maybe not necessary, but a reliable background processing system has always been a significant accelerator in my projects.
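
For the email case, the deferred version is only a few lines once the queue exists (a sketch assuming a Redis broker; the SMTP call is stubbed out):

    from smtplib import SMTPException
    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")

    @app.task(bind=True, max_retries=3)
    def send_email(self, to, subject):
        try:
            print(f"sending {subject!r} to {to}")  # stand-in for the real SMTP call
        except SMTPException as exc:
            raise self.retry(exc=exc, countdown=30)  # survive transient failures

    # in the web request: send_email.delay("user@example.com", "Welcome!")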


Same in my experience: emails should always be sent in a background process. Luckily for me, using uWSGI to deploy anything in any language, I get a nice little built-in spooler that lets me spool emails without adding a single new piece of software to my stack: not even having to start another process.


Completely agree

Several developers like to overengineer and "go for celery" (also applies to other technologies with other uses) even for small things.

You don't need Celery to run a batch job every day for example. Or to even do some parallel processing.

"Oh but python multithreading sucks" do you know when it does not suck? When your thread is waiting on something else. Also there's the multiprocessing module with a lot of "batteries included" for basic use cases.

Not to mention (my biggest pet peeve with Celery) that it "forces" you to work with a task queue model, not a data queue model. Sure, it helps a lot when you need that, but sometimes you just need a queue.


I prefer rq - Celery is too complex imho.


Also worth checking out nameko, I’ve had good experiences with it


Huey is also worth a shot.


I like it - uses redis as a broker - supports crontab style periodic tasks.

I've usually had to build a small python cron runner using croniter in previous systems - which I think is a pretty clean solution - it just deferred tasks to rq workers. But having direct support in the lib might be nice.
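
The runner I mean is basically this shape (a sketch; the schedule and task path are made up):

    from datetime import datetime
    from time import sleep

    from croniter import croniter
    from redis import Redis
    from rq import Queue

    q = Queue(connection=Redis())
    schedule = croniter("*/5 * * * *", datetime.now())  # every five minutes

    while True:
        next_run = schedule.get_next(datetime)
        sleep(max(0, (next_run - datetime.now()).total_seconds()))
        q.enqueue("myapp.tasks.cleanup")  # defer the actual work to an rq worker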


* depends on rabbitmq


One aspect of this setup I’ve never been able to understand is how the application then gets the result from the worker. If it’s polling for status from the backend, doesn’t that defeat the purpose of having a worker to begin with? Or does this setup only work for tasks that don’t need to have the backend notified about the result, so the front end can just poll for the result via the application?


You’re spot on really. Having the front end wait for a background task to complete broadly defeats the purpose. There are some caveats though: if you’re using an async/threaded web server then it may not matter that you have a pending request hanging around as your web server is free to continue serving other requests. It also may be that you need to run the task on specialist hardware for some reason.

Really though, I think a lot of people use celery for offloading things like email sending and API calls which, IMHO, isn’t really worth the complexity (especially as SMTP is basically a queue anyway). Of course, YMMV depending on your use case.

However, I find it is often more worthwhile for:

1. Tasks which take a long time to run

2. Tasks which need to happen on a schedule, rather than in response to a user request.

There is an option 3 too, which is for inter-software communication. Eg events or RPCs, but I found Celery to be very much a square-peg-round-hole for this, which is why I developed Lightbus (http://lightbus.org). Lightbus also supports background tasks and scheduled tasks. /plug


It's not just about the time the operation takes, it's about reliability. Even if sending an email synchronously doesn't usually take more than a few milliseconds, you still need to handle cases like servers failing in the middle of the request, temporary upstream unavailability, some expired API, account limits reached, etc...

Honestly, I think that mostly anything that doesn't depend directly on the current state of your infrastructure should be done asynchronously. I've had a lot of issues with systems that start out doing everything synchronously: you'll probably need to refactor them to be asynchronous in emergency mode during a crisis.


For email, that's why you set up a relay within your control that will accept messages without a fuss and send them around following SMTP conventions.

But anyway, how is your application supposed to respond after any of those failures? Is it just supposed to ignore the failure and carry on as if nothing happened, leaving your users in the dark? Is it supposed to reliably log every task so that it can retry anything that fails and, in the worst case, feed failures into some monitoring system/process? Or is it supposed to inform the user of any success or failure before the user can move on?

Queue software is only a good match for the first. For the second you will need to roll your own interface with the monitoring system anyway, so it's much easier to roll your own queues and get control of everything. The third one is best done synchronously; it doesn't matter the nature of the process or how long it takes. But funny thing is, I have never seen the first situation in the wild.


If you really need the answer immediately to show to a user on a page, background workers (usually) don't help. However, there are a bunch of situations where there is no feedback to the user other than "we received your request in good order and will see that it's done".

For example, sending a mail can take a while because mailservers have queues and whatnot. If you send a mail after signing up a user for them to verify their email, you don't have to wait until the mail is "really" sent before letting them know that their signup has been processed. You can tell a background worker to send the mail and return to the user much faster. For another example, in a previous job I worked for a big file sharing service. If user wanted their files deleted that caused all sorts of calls to AWS to actually delete the files, which could take a while. However, from the user perspective it was pretty fast because all we had to do was set the file in the database to "delete in progress" state and tell a background worker to delete the files. Then we could show the user that their files were being deleted within a couple dozen milliseconds instead of having to wait for all the AWS calls to complete.


Not to mention the case where the mailserver is down or denies service, which will happen at some point even if you have an HA mailserver, be it AWS email, mailjet or whatnot. One day it'll fail for everyone. Then, what's it going to be? Return an HTTP 500 to the user, rolling back the transaction? Spooling emails removes that failure spot entirely.


In my experience a lot of applications with workers that require a result to be furnished to a client are incorrectly engineered.

The correct pattern should be a client submits a request to get something done to a thin layer, receives a ticket that allows it to claim the result and goes away to either check for the result via polling for the ticket or receives a call back.
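
In Flask terms, that thin layer can be as small as this sketch (an in-memory dict stands in for a real result backend):

    from uuid import uuid4
    from flask import Flask, jsonify

    app = Flask(__name__)
    RESULTS = {}  # stand-in for Redis or a database

    @app.route("/jobs", methods=["POST"])
    def submit():
        ticket = str(uuid4())
        RESULTS[ticket] = {"status": "pending"}
        # hand the actual work off to a queue/worker here
        return jsonify(ticket=ticket), 202

    @app.route("/jobs/<ticket>")
    def poll(ticket):
        return jsonify(RESULTS.get(ticket, {"status": "unknown"}))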


I would agree, generally a task queue makes sense for jobs which are not needed to respond to a query (e.g. generating reports or sending emails or whatever), otherwise you're just adding delays in the response chain.

Although this delay chain might be considered worth it if you don't want to scale the frontend to multiple workers for some reason, e.g. a single-threaded evented runtime or a GIL runtime (python, ocaml), or if you want to avoid CPU-hard tasks being executed on your frontends.

In that case, it might be valuable to transform CPU waits into IO waits by moving the CPU work to a jobs queue, possibly running its workers on a different set of machines entirely.


> If it’s polling for status from the backend doesn’t that defeat the purpose of having a worker to begin with?

It depends. If you needed a coordination layer or you needed to isolate certain types of traffic then it makes more sense. I assume your alternative here is "why not just have a new client tier doing work" which is a reasonable architecture too

> One aspect of this set up I’ve never been able to understand is how the application then gets the result from the worker?

Often they don't in these architectures. I found it a little strange that a job queue was being used to serve (what seems like) synchronous traffic. Usually I see job queues in the wild used for async/send-and-forget workloads

Personally I would rather chain multiple synchronous service calls if I needed a synchronous workflow. It's just simpler to me to stick web workers behind a load balancer and scale that. This is less elegant when things take a very long time though or are prone to retries. Either the client or server needs to be responsible for queuing/message persistence/retries - with services the client does it, with job queues the server does it


In case of a polling requirement from the user's perspective (e.g. waiting for a notification), it's true it doesn't matter. However, it allows the application to respond to 'other' users. The whole point is to serve multiple users at the same time.


What I mean is that if your backend is polling for a task to finish, that is also taking away time from other users. For some tasks that won’t matter because the backend won’t need to be aware of the outcome of the task but there could be some longer jobs that the backend needs to be aware of the outcome right away. You could get it so the polling is done on the front end and then passes the outcome to the backend but that obviously isn’t a good idea because then the backend is trusting outcome data from the front end.


I see your concern is focused on the polling case (e.g. a chat room). As with the McDonalds example, a single cashier can reply to the question "is my order ready?" rapidly. While it takes seconds in the real world, a client polling request would take milliseconds to complete, so it can still serve hundreds of clients in a given second. If you'd need more for that part, there can be more cashiers/apps to make the "asking" part scale indefinitely (even if you have only one worker).


Right now, I'm designing a service that's very similar to OP, with workers waiting for an external API (or APIs) to answer, which can be slow sometimes. Looks like we're going to use websockets or polling to update the client with all the delivered data, and introduce yet another service for it, which would actually play the role of "LED screen" and ready order station in this analogy.

But as we decided to implement an MVP with a single HTTP request from the client, this whole separation doesn't make any sense, exactly as you noted.


Checking the result is just doing a SELECT in your database.


I was more referring to how it knows once it’s finished since the task is asynchronous. So for the backend to find out, it could do the select you mentioned and find out it’s still in progress. Then it checks again next second and still in progress. Then checks again and it’s done. But now your backend has been polling for an update and so you might as well have performed the task in the application because it is still being used up for the duration of the task.


Have your worker POST back to your web server when it’s done. Use websockets to notify the user.


If anyone is looking for another Celery post that goes over common web development use cases for using Celery there's: https://nickjanetakis.com/blog/4-use-cases-for-when-to-use-c...

The above post walks through sending emails out with and without using Celery, making third party API calls, executing long running tasks and firing off periodic tasks on a schedule to replace cron jobs.

There's links to code examples too in an example Flask app which happens to use Docker as well.


They mostly need Celery and Redis because in the Python world concurrency was an afterthought. In most other languages you can get away with just running tasks in the background for a really long time before you need spin up a distributed task queue. In Python I’ve seen Celery setups on a single machine.


You should still use some sort of work queue: your application process may need to restart (deploys), or work could temporarily overflow the available resources (bursts), so having some place to put a task before it gets processed is useful regardless of the underlying concurrency primitives of the language.


Transactionality is enough for most systems. Your order either succeeds or fails; it never stays in an incomplete state. Queues are not a panacea, and they introduce their own problems, like masking failures by delaying them long enough to bring everything to a halt.


Yes, you wouldn't do all work in the task queue - commonly, we make some change in the database which can happen pretty fast, and after that transaction commits, we might defer a task that sends a notification, email, whatever.
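
A minimal sketch of that ordering, with sqlite standing in for the real database and an illustrative Celery task (all names are made up):

    import sqlite3
    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")

    @app.task
    def send_notification(user_id):
        print("emailing user", user_id)  # stand-in for the real email send

    conn = sqlite3.connect("app.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (user_id INT, status TEXT)")
    with conn:  # the fast, transactional part; commits on success
        conn.execute(
            "INSERT INTO orders (user_id, status) VALUES (?, ?)",
            (42, "placed"),
        )
    # only after the commit succeeds do we defer the slow part
    send_notification.delay(42)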


It may not make sense to retain jobs across deployments. What if the contract of the job is changed by the code being deployed? Might be easier to keep it all in-process, letting queues drain in a graceful shutdown.


I haven't found that to be the case typically -- you could always serialize some information into the task to check for things like this.

Also consider if the machine running that process just disappears and that process dies. Putting work into a task queue allows you to do it durably until it can be processed so that it's not lost in some typical "that machine/instance died" scenario.
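
Concretely, a hedged sketch of the version-stamping idea from the first paragraph (the version numbers, names, and migration are all illustrative):

    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")
    TASK_VERSION = 2  # bump whenever the job contract changes

    def migrate_v1_payload(payload):
        return {"order_id": payload}  # illustrative v1 -> v2 shape change

    @app.task
    def process_order(payload, task_version):
        if task_version != TASK_VERSION:
            # this job was queued before the deploy; migrate or reject it
            payload = migrate_v1_payload(payload)
        print("processing", payload)

    # producers stamp the version explicitly when queuing:
    # process_order.delay(some_payload, task_version=TASK_VERSION)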


This might not be viable all the time: what if you have a stream of tasks being put on the queue? The alternative would be to ensure that any change in the job contract is backward compatible, and any breaking change would need a remediation/migration plan for handling pending tasks.


In the business, customer, end-user sense, in many situations you are much better off having a transaction fail with an error than having it finally succeed after 2 hours, once a service disruption has cleared.

Not every problem is the same. Work queues introduce an order of magnitude or more of complexity to an HTTP application. Sometimes that is very much needed. Sometimes it's over-engineering. Sometimes it's a gray area.


> They mostly need Celery and Redis because in the Python world concurrency was an afterthought

You have an operating system that you can use. You don't really _need_ concurrency when you have a machine that can timeshare amongst processes. That's the world Python was designed for.

> In most other languages you can get away with just running tasks in the background for a really long time before you need spin up a distributed task queue

This is partly true, but not a lot of people do this, because you need to persist those tasks unless you want them dropped during a reboot. The same logic goes for entire machines disappearing. You _can_ get by without a distributed system, but you will need to tolerate loss in those scenarios. Those losses are non-trivial for most apps/companies, so it doesn't seem all that practical to me to consider a world without a distributed system (and yes, persisting things in postgres/mysql before they are worked on is still using a distributed system).


I agree. That is why I stopped using languages that have a poor concurrency story. Also, using an external queue and workers considerably increases the operational complexity of the system. An in-process solution with some light persistence for the work queue in an embedded database can go a long way, at least until you outgrow a single machine. Doing both ops and development, I've learned to appreciate simple solutions.


You could use something like async / await without Celery and Redis but Celery brings a lot to the table.

What happens when you want to retry jobs with exponential back off, or rate limit a task, or track completed / failed jobs?

You can wire all of this up yourself, but it's a hugely complicated problem and a massive time sink; Celery gives you this stuff out of the box. With a decorator or two you can do all of those things on any task you want.

I use it in pretty much every Flask project, even on single box deploys.
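
For illustration, a sketch of what those decorator-level features look like (these are real Celery 4+ task options; the task itself is made up). Tracking completed/failed jobs then comes down to configuring a result backend:

    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")

    @app.task(
        autoretry_for=(ConnectionError,),  # retry on these exceptions
        retry_backoff=True,                # exponential backoff between retries
        retry_kwargs={"max_retries": 5},
        rate_limit="10/m",                 # at most 10 executions per minute
    )
    def call_flaky_api(url):
        import requests
        return requests.get(url, timeout=5).json()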


> In most other languages you can get away with just running tasks in the background for a really long time before you need spin up a distributed task queue

In Python too:

    # or ThreadPoolExecutor, depending on the type of work
    from concurrent.futures import ProcessPoolExecutor, as_completed

    def any_function():  # must be a module-level function to be picklable
        return "some result"  # stand-in for real work

    if __name__ == "__main__":  # needed for process pools on spawn platforms
        with ProcessPoolExecutor(max_workers=2) as executor:
            a = executor.submit(any_function)
            b = executor.submit(any_function)

            for future in as_completed((a, b)):
                print(future.result())
Any task you would put in celery would be a good candidate for being first passed to a process pool executor.

No, the real reason we use celery is that it solves many problems at once:

  - it features configurable task queues with priorities
 
  - it comes as an independent service accessible from multiple processes

  - its activity can be inspected and monitored with stuff like flower

  - it has the ability to create persistent queues that survive restart

  - error handling is baked in: errors are logged and won't break your process
 
  - it offers an optional result backend for persisting results and errors
 
  - many distribution and error handling strategies can be configured

  - tasks are composable, can depend on each others or be grouped

  - celery also does recurring tasks, and better than cron

  - you can start tasks from other languages than Python
Now some people use celery when they should use a ProcessPool, because they don't know it exists. But that's hardly because of the language: you didn't seem to know about it either.

It is true that Python's concurrency story is not comparable to something like Go, Erlang or Rust, but very common use cases are solved.

In the same way, we could perfectly well use Python's shelve module instead of redis.

We use redis because it solves many problems at once:

    - it's very performant on a single machine, and can be load balanced if needed

    - it has expiration baked in

    - it offers many powerful data structures 

    - it embeds numerous strategies to deal with persistence and resilience

    - it's accessible from multiple processes, and multiple languages

    - its ecosystem is great, and it's hugely versatile
It's almost never a bad choice to add redis to your website. It's easy to set up, cheap to run, and it shines at managing sessions and caching for any service with up to a few million users a day.
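
For instance, a hedged sketch of the session/caching use with the redis-py client (the key names and TTLs are illustrative):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # cache a rendered fragment for 5 minutes
    r.setex("cache:homepage", 300, "<html>...</html>")

    # simple session storage with an expiration
    r.hset("session:abc123", mapping={"user_id": "42"})
    r.expire("session:abc123", 3600)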

While you're at it, why not later use it for other stuff, like storing the results of background tasks, log streams, geographical pinpointing, HyperLogLog stats, etc.? There are so many things it does better than your DBMS: faster, easier or with fewer resources.

It's such fantastic software really.

But no, nothing prevents you in Python from creating a queue manually and serializing it manually. It's just more work for fewer features.


Thanks, I am well aware of every kind of multiprocessing and quasi-threading in the standard library, having built several large Python systems over the years.

I also understand the benefits of task queues. However, there are many cases in which you do not need any of those. Specifically, everything you wrote applies to the web backend/distributed systems usecase. Doing things in the background in a simple application, not so much. My problem is exactly with introducing distributed systems machinery for a local process on a single machine that doesn't need any of that.


You can signal events to invoke background processing and immediately return a response.


Excellent craftsmanship of a helpful blog. Very reminiscent of the style used by the Head Rush Ajax (http://shop.oreilly.com/product/9780596102258.do) book O'Reilly published back in 2009 and the rest of the Head First series.


I think it's really important to understand task queues and workers, but my experience working with these particular tools isn't exactly fun. I inherited a system built on celery, rabbitmq and nameko. I'd be interested to hear how people set up their systems to debug and test new tasks. I'm currently using manually added pdb.set_trace calls to telnet into a debugging session, but I'd prefer to use an IDE so I can modify code while debugging. Anyone have any tips? One thing I would caution against is putting business logic in the task logic. This is obviously up to whoever sets up the tasks, but it seems like most of the tutorials don't mention how painful this can make testing, especially once you start making chains, chords and subtasks.


For testing you can set "CELERY_ALWAYS_EAGER" to "True" in your config.
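
That makes tasks run synchronously in-process, so tests don't need a broker. A sketch of a test config, using the old-style setting names from above (Celery 4+ renamed these to task_always_eager / task_eager_propagates):

    CELERY_ALWAYS_EAGER = True
    CELERY_EAGER_PROPAGATES_EXCEPTIONS = True  # re-raise task errors in tests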


Does anyone have enough experience with alternatives to Celery to give a good comparison of Celery vs. Dramatiq vs. RQ?


Cool article! But why do you need a database _and_ a message queue? I would think a message queue is the main thing, and a database is only necessary if you want long term persistence. Or you could just use a database as a message queue.
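
On that last point, a hedged sketch of the common database-as-queue pattern (Postgres-specific SQL; the jobs table and its columns are made up):

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        # atomically claim and remove one job; SKIP LOCKED lets many
        # workers poll the same table without blocking each other
        cur.execute("""
            DELETE FROM jobs
            WHERE id = (
                SELECT id FROM jobs
                ORDER BY created_at
                FOR UPDATE SKIP LOCKED
                LIMIT 1
            )
            RETURNING payload
        """)
        row = cur.fetchone()
        if row:
            print("processing", row[0])  # stand-in for the real work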


I think the McDonald's order-by-touchscreen also has the benefit of seeming to take less time, because you are not standing in line behind someone; even if it takes the same amount of time, psychologically it seems shorter.


Very noice! I am normally not a huge fan of stick figures (a la waitbutwhy), but this was very pleasant to read :D And noice work picking an example I can relate to


Celery is good for distributed and persistent message queues which can be monitored. If you just need multiprocessing, use a multiprocessing pool, it comes with python.


I only use celery for sending out emails. It’s overkill.

I wonder how many other people have celery just for email.


McDonald's is very often a love or hate divide as you can already see from some of the comments here. You could prevent that distraction by using a generic food takeout store and ask people to imagine their favorite. The article will resonate with a larger audience, and the comments will be higher quality.


Hi, author here. I definitely didn't intend or expect that! I'm mostly drawing from my own experience and an insight I had while ordering food inside Mcdo.


As someone who also does technical writing, I agree you should draw from your own experience.

It can be hard to find the right analogy. If the subject of the analogy is enough of a hot topic, it will get attention itself. As you can see, you've got comments in here even talking about how McDonald's isn't faster with their new system, or how they could have better optimized for customers, links to articles about McDonald's business model etc. Some of the earlier negative comments about McDonald's were deleted - probably due to downvotes. Since my advice to other writers was sincere and I believe useful, I'm keeping my comment.

For sure I'm not expecting you to change your article. Just hoping that my tip might help you with future technical writing. If not, no worries.


And, of course, you need multiple separate components because Python/Flask has no central "application" concept; there are just multiple stateless processes.

If you had, say, a Java app running with a multithreaded application server model, you could serve and process everything within a single process. No Celery, no Redis, no MQ.

This doesn't mean that the above stack has no use. But whenever picking a tech, you should understand the use case. The "simpler" Python/Flask solution gains complexity when the task at hand is no longer simple.


> Python/Flask has no central "application" concept,

They do; it's a WSGI application. Flask has been multithreaded for many years. Python also has multi-process queues that don't need an extra process.
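
A minimal sketch of that last option, with no external broker involved:

    import multiprocessing as mp

    def worker(q):
        for item in iter(q.get, None):  # None acts as a shutdown sentinel
            print("processing", item)   # stand-in for the real work

    if __name__ == "__main__":
        q = mp.Queue()
        p = mp.Process(target=worker, args=(q,))
        p.start()
        q.put("send-email-42")
        q.put(None)  # tell the worker to exit
        p.join()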


WSGI is purely an interface between a web server and Python. What does WSGI have to do with state?

The application server model means there is a running application, with state, that exposes an HTTP endpoint.

> Flask has been multithreaded for many years

So? You run a blocking thread to perform a long-running task in Flask? With Python? Try it, then report what you find.

Flask/Django are mostly designed to work with a stateless approach. Nothing wrong with that, but it's got drawbacks.

> multi-process queues that don't need an extra process.

More processes usually imply more complexity. And still, since you don't have a central application with state, you NEED an extra piece to manage the results from the queue.


> So? You run a blocking thread to performing a long-running task in Flask? With Python? Try, then report what you find.

I did it for years. It works just fine. The GIL is essentially like running an app on a single core, which works just fine for many use cases. CPU cores are quite powerful.

> More processes usually imply more complexity.

Right, but I'm only saying you can have more processes without requiring Redis.

> And still, since you don't have a central application with a state, you NEED an extra piece to manage the result from the queue.

A regular thread can do that.


When the task complexity rises further, splitting out the message queue and workers into separate processes makes sense again. That way, you can easily scale worker capacity up and down and you can restart your web server processes without having to worry about background job persistence since that is handled by the message queue.


Sure. You can do the same with the application server model. But you don't have to as long as you don't want to.


It has nothing to do with Python; there are plenty of async Python web frameworks.


I wrote Python/Flask because the application server model is inherently flawed in Python, and Flask is not asynchronous AFAIK; you MUST use an async model (because Python and multithreading still don't work well together), so you can't use threads for long-running tasks.


With all due respect, my recent experiences ordering at McDonald's have been nothing short of horrible! As nice as the kiosk is, it has taken the accountability factor for your order out of the picture. I vividly recall getting blank stares from employees when I asked "how much longer until my order is complete?". My past few visits (11 to be specific, from 11/19 thru 2/20) have yielded 8 minutes of wait time on average. This is ordering inside the facility, at 5 different locations. Last I recall, it used to be a lot faster, I think between 1-3 mins tops! I can't say the same for the drive-thru though; orders from the drive-thru always seem to be tagged with a higher priority. I remember this from when I was in my teens in the late 90s. Though it looked like a mainframe system, McDonald's did have a countdown timer that would initiate on orders. I'm all for tech and automating queues, but humans are complex beings, and until the gap is bridged I think we have a lot more room for improvement!


Drive-through is higher priority at most restaurants, because the customer can drive away after ordering but before paying.



