Python Deployment Anti-Patterns (hynek.me)
293 points by craigkerstiens on April 23, 2012 | 122 comments



As a python dev who deploys a lot of software, I found this article to be wonderfully helpful and informative, and a good reflection of current best practices.

Summary of the deployment tools mentioned:

  - Manage remote daemons with supervisord
    http://supervisord.org/    

  - Manage python packages with pip (and use `pip freeze`)
    http://pypi.python.org/pypi/pip
    http://www.pip-installer.org/en/latest/requirements.html

  - Manage production environments with virtualenv
    http://www.virtualenv.org/

  - Manage Configuration with puppet and/or chef
    http://puppetlabs.com/
    http://www.opscode.com/chef/

  - Automate local and remote sys admin tasks with Fabric (see the sketch after this summary)
    http://fabfile.org

Other tips:

  - Don't restrict yourself to old Python versions to appease your tools / libs.

  - Strongly consider rolling your own DEB/RPMs for your Python application.

Author also touted:

  - Celery for task management
    http://celeryproject.org/

  - Twisted for event-based python.
    http://twistedmatrix.com/trac/

  - nginx / gunicorn for your python web server stack
    http://www.nginx.com/
    http://gunicorn.org/
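
To tie a few of these together, here is a minimal fabfile sketch of the kind of deploy task this toolset implies. The host, paths and the "myapp" supervisord program name are made-up placeholders, not from the article:

    # fabfile.py -- a minimal, hypothetical deploy task (Fabric 1.x style)
    from fabric.api import cd, env, run, sudo

    env.hosts = ["app1.example.com"]          # placeholder host

    def deploy():
        """Pull the code, sync pinned deps, restart under supervisord."""
        with cd("/srv/myapp"):
            run("git pull")
            # requirements.txt is the output of `pip freeze` from a known-good env
            run("/srv/myapp/venv/bin/pip install -r requirements.txt")
        sudo("supervisorctl restart myapp")   # supervisord does the actual restart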


Chef requires Ruby programming. Puppet doesn't, but the core is obviously Ruby, and you extend it using Ruby.

You may possibly like my new project:

http://ansible.github.com

The core is Python but you can write modules in any language.


There is also cuisine, which brings chef-like functionality to fabric.

https://github.com/sebastien/cuisine


Thanks for that! We gave up on chef when one of their version updates failed to work with a prior version, both of which were the OS package defaults. Chef silently failed, no error message, nothing in the docs, nothing even in the source code. Had to do a fair bit of searching to find out why.

When open source projects like chef have nobody interested in even documenting much less testing backwards incompatibilities we move them to the bottom of our to-eval list.

This also illustrates a problem with the article's blind enthusiasm for the latest revisions and libraries, i.e., it dismisses the headaches this causes end users, who often don't have the staff or budget to fix whatever breaks during an upgrade. That said, we are at least talking about python, which has had better release QA and backwards compatibility than perl, ruby or, gasp, php.


That said we are at least talking about python, which has had better release QA and backwards compatibility than perl....

I'm curious as to your experience here. I've found that Perl has by far the best backwards compatibility and release QA of the major dynamic languages. What did you encounter?


We don't use as much perl as we used to but the last upgrade issue was amavisd-new (a Spamassassin wrapper). Spamassassin has perl version issues every so often as well. NetDNS used to introduce new bugs about every 4th revision but seems to have been stable for the past couple of years. GNUmp3d and many audio libraries have non-perl revision-related, backwards-compatibility issues with some regularity.


That makes sense. XS components (compiled code which uses the Perl API) don't have binary backwards compatibility between major Perl releases.


The audio library incompatibilities were API changes. Amavisd's issues are not binary either but do seem mostly socket related.


Cool, I'm also investigating the python-based salt stack: http://saltstack.org/ for this purpose but it seems a bit heavy just starting out. Gonna try ansible next.


Currently deploying salt. Community is great, grains are easy to write, and the codebase is very clear. Can't recommend enough.


Yes to all of this, and I'll throw in my two cents on our web server setup, which is nginx + uwsgi (it has served us remarkably well).


+1 to uWSGI and nginx. Now that uwsgi support is built into the nginx distribution, and uWSGI is gaining a reputation for rock-solid performance, it's a no-brainer.


Likewise +1 to Nginx + uwsgi. In the various performance benchmarks* uwsgi beats gunicorn, and uwsgi comes bundled with Nginx now, so there's really no reason to bother with setting up gunicorn any more.

* http://www.peterbe.com/plog/fcgi-vs-gunicorn-vs-uwsgi http://nichol.as/benchmark-of-python-web-servers


Can you restart the uwsgi instance(s) without restarting nginx? That's a pretty significant benefit of gunicorn...


Yes, uwsgi instances run separately from nginx. The uwsgi nginx module simply speaks a binary protocol for reverse proxying rather than HTTP.


uwsgi comes bundled with Nginx? First I've heard of that. Very interesting.

Do you have any easy tutorials on getting Nginx + uwsgi set up?


The uwsgi upstream module has been bundled for a long time now.

Here is the doc from the uwsgi site: http://projects.unbit.it/uwsgi/wiki/RunOnNginx


I chose nginx and gevent after reading a blog post about py server performance. Anyone have any experience with these two?


I use gevent instead of twisted for event-based python. Gevent is a lot nicer to work with and doesn't make your code less readable in the way twisted does with its callbacks and errbacks. It's also a lot easier to do unit testing with gevent.
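
As a rough illustration of what I mean by readability: sequential-looking code instead of callback chains (Python 2 era, gevent's monkey patching; the URLs are placeholders):

    import gevent
    from gevent import monkey
    monkey.patch_all()          # make the stdlib sockets cooperative

    import urllib2

    def fetch(url):
        # reads like blocking code, but yields to the gevent hub on I/O
        return urllib2.urlopen(url).read()

    jobs = [gevent.spawn(fetch, u) for u in ("http://example.com/a",
                                             "http://example.com/b")]
    gevent.joinall(jobs)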


I don't get the negativity on using your distro's packages, at least from the staying-stable perspective. Any decent package manager should let you pin/hold critical packages on a particular version, so if "the next Ubuntu ships with a different SQLAlchemy by default" you just hold the SQLAlchemy package at the version you want and then ignore it until you're ready to make that move.

99% of the time when I hear people complaining about their distro's packages, the complaints are coming from the opposite direction -- they want to run something bleeding-edge and the distro doesn't have it yet. (This is the standard beef Rubyists have with Debian, for instance -- that code that just hit Github ten minutes ago isn't in Debian's repos yet.)


It's usually best to leave these packages for the system python to run pieces of the system. There's one exception that I usually make to this rule: packages with complex C dependencies. NumPy is the first package that comes to mind. I usually prefer to just use the package manager to deal with that.


The worst thing about them is the fact that they are installed into global site-packages that you shouldn’t use for any serious coding.

And yes, they are mostly outdated too.


> "into global site-packages that you shouldn’t use for any serious coding"

Any specific reason for that? I find it quite good and have quite large deployments using .debs only with packages in global location. (tens of packages produced locally - either updated or unavailable dependencies and the service itself) Any direct dependency is handled by package pinning and no update goes into production untested, so the whole "new sqlalchemy suddenly appears" issue does not exist. As long as people don't break API versioning in silly ways, what's the problem with this?

The only version-related issue I remember was when someone thought it would be nice to install something through pip, instead of via package. (went to /usr/local)


Having a global python installation where packages are constantly installed, uninstalled, and updated is the path to madness. If something goes wrong, what can you do? You can't wipe out the system python. You can wipe out a virtualenv though.


What do you mean by goes wrong? Either some package is installed or not. For me, chef manages which ones are. If something is really FUBAR, then wiping is exactly the path I'd take - or more accurately, take that server down for analysis of how exactly it got into that state (so we won't do that again) and bring a clean one up.


Mainly it's about incompatibilities. What if you have two apps that require different versions of a library? If you've installed it in site-packages, then you have little recourse. By separating them out with virtualenv the two apps will work just fine.


Fortunately I'm in a one-service-one-server environment, so I may be biased here ;)


Well, not really, because what if ubuntu packages rely on version X, and you need version Y?


You roll version Y yourself and install it into an alternate prefix. If the server uses debian that means make a new deb and deploy it using the standard tools. apt-get/yum/etc. are very solid deployment tools.


I wasn't saying it was impossible, but what you've described is already about 10 times harder than using virtualenv.


> what can you do?

Understand what went wrong and not ignorantly hit Ctrl-Alt-Delete.


What is the benefit of using global packages? If you pin them, what do you gain? virtualenv give you complete control over the runtime environment of every single application. Stuff like what you wrote can’t really happen.

And BTW, there’s more stuff that can happen than breaking APIs: new bugs that happen only on your system or even better: your code worked only _because_ of a bug. :)


For me the benefit is a much easier security audit/update. Unless you have dependencies packaged by yourself, the only thing you need to monitor is what comes into security-updates of your repository. If there's anything new, do a test deployment, do what you do to verify it's correct and change the pin in production.

Honestly: how many people installing software through virtualenv are registered to security mailing lists for each of their packages (and their dependencies down to things like simplejson)?


But if you _pin_ your packages, how are you updating? You have to monitor it anyway.

High profile projects like simplejson, Django or Pyramid and their deps won’t be missed and the really obscure ones will never make it into the repositories anyway.


Of course. I'm just saying that in case you're pinning only the default upstream versions from your distro so that they don't change, it's easier to automatically report on which packages have a new version in {distro}-security repository. Then retest and change the pin.

The same can be achieved by subscribing to CVEs... but you have to remember to filter the ones you use. Of course that's not a huge difference, so if someone prefers the second way, there's nothing wrong with it ;)


Your system python applications might have different requirements to your project. Just use virtualenv, it's easier and less error-prone.


You never have two apps deployed on the same server that need different versions of a dependency?


Oh the problem has NOTHING to do with "10 minutes ago".

Ubuntu's latest release, 11.10 (yes, I know 12.04 is a couple of days away), ships Python 2.4. I don't remember which Ruby it is, but it's something like ~1.8.7. Ruby is on 1.9.3 and the next version of Rails won't even support 1.8.7.

I'd be fine with a year or two old Python and Ruby....


Huh?

    $ uname -a
    Linux apollo 3.0.0-17-generic #30-Ubuntu SMP Thu Mar 8 20:45:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

    $ python -V
    Python 2.7.2+


They rarely have a modern enough package to be worthwhile. When you control the package pipeline, you have the ability to dictate how modern or conservative your software is.

The problem is that most programming languages (especially Python and Ruby) are ecosystems unto themselves and often move at a much faster pace than any stable distro (or LTS) could keep up with. That's why we have gems and pip.

Cf. Ubuntu LTS MongoDB default is like 1.2 or something. This is why I switched to using 10gen's repo.


Perhaps, but that's a completely different argument than the one TFA is making.

Besides, in the Ubuntu case anyway, LTS is supposed to be old. The whole point of LTS releases is to let slow-moving institutions/enterprises sit on ancient packages for 3-5 years without having to worry about backporting security updates. If you want recent versions of packages LTS is precisely the wrong place for you to be.


I would disagree with this. LTS (I'm referring to Ubuntu LTS as well as RHEL) is still a good fit because there are 9001 other packages your system relies on for day-to-day operations, and I'd rather mess with those as rarely as possible. If you don't use an LTS then you'll have to do whole-system upgrades every few years - or else stop getting security backports. I'd much rather micromanage the dozen highly visible packages that are immediately relevant to my app.

Also, even if you're not on an LTS, that doesn't mean you have the latest and greatest available. The python community moves at its own pace, so there's still a chance that you'll be stuck with the just-before-latest-stable version.


there's still a chance that you'll be stuck with the just-before-latest-stable version.

Is that so bad? Does python really change that much from release to release?


Not entirely, but if your reason for using a non-LTS is to get the latest versions of your stack, then you've just made a really bad choice. As I've said, you're either giving up security updates or are left to test and upgrade your entire system every ~two years in exchange for no guarantee you're on the latest of anything. That tradeoff doesn't make sense no matter which way you look at it.


You don't really want to be on anything earlier than 2.7 right now.


Well, depending on your program.



LTS is the default distro for a lot of server and VPS providers. Often the latest VM images will be rather...unstable, even putting aside the instability of the distro itself.


Regarding virtualenv, I have come to the conclusion that Linux containers are robust enough now (like freebsd jails say two or three years ago) that I don't need to virtualise just python - I can afford to have the whole server as a "virtualenv" - no need for that extra complexity, just install into site-packages. No conflicts, because a whole instance is dedicated. Jails take this to the limit - one virtual machine, one process - say Bind. A vulnerability in Bind? The attacker takes over ... nothing.


Sorry, but I'm not sure I understand your logic. Using virtualenv adds extra complexity, but virtualizing the entire server doesn't? I mean, the only complexity using virtualenv adds is having to run the virtualenv process once. After that, you can still install to site-packages. You just have to install to a different site-packages directory.

Besides that, it's worth pointing out that using a virtualenv is not a security precaution. It's a precaution to prevent mucking up the global python installation for other packages that run on it. Using linux containers to achieve this seems like overkill.


A late reply, but I don't just want the python environment virtualised - if it's important enough that I should section off python, then there is a good chance I will want to consider the whole box as a single unit: python, firewall rules, database, whatever. I tend to think the unit of abstraction should not be the python process, but the server. This is a little easier to grasp when you think of BSD jails, where essentially you can choose to only run those processes that actually matter - it's less a virtualised OS than a pick-and-mix OS.

Apologies for late reply - I guess I am straightening it out in my head more than telling anyone else.


Regarding "Don't use ancient system Python versions" and "Use virtual environments", you can knock out two birds with one stone by just using pythonbrew. It also saves you the hassle of rolling your own deb/rpm if a package doesn't happen to exist.

Also, Chef/Puppet aren't "alternatives" to something like Fabric. Use the former for server provisioning, and use the latter for actually kicking off the deployment process. Trying to shoe-horn the finer deployment steps (git checkout, tarballing, symlinks, building the virtualenv, etc) into Chef was a nightmare every time I tried. Those tasks are better suited for Fabric's imperative design. Plus you can just run any Chef commands from Fabric itself, or use something like pychef for finer grained control. It's a win/win.
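
As a rough sketch (paths, repo URL and service name invented), those imperative steps map onto a Fabric task like this:

    import time
    from fabric.api import run

    def release():
        stamp = time.strftime("%Y%m%d%H%M%S")
        target = "/srv/myapp/releases/" + stamp
        run("git clone -q git@example.com:myapp.git %s" % target)   # checkout
        run("virtualenv %s/venv" % target)                          # fresh venv per release
        run("%s/venv/bin/pip install -r %s/requirements.txt" % (target, target))
        run("ln -sfn %s /srv/myapp/current" % target)               # flip the 'current' symlink
        run("supervisorctl restart myapp")                          # pick up the new release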


Fabric seems to be the most popular deployment tool, yet the author advises against it without giving any reason why.

I'd love to see some proper detail in the article around why.

And not in the vein of "Chef/Puppet are better", but more along the line of "here's what can go wrong with Fabric".


Fabric is an invaluable deployment tool, but you have to know its boundaries and what it was intended for.

With Fabric you tell what to _do_ and with Puppet/Chef you define what the result should _look_ like.

You define what a server should look like and it makes sure that's true for 1,000 servers. Or 10,000.

It’s not about Fabric vs. Puppet, but how to use both in the most efficient way.
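
A toy sketch of the distinction (the helper names are made up, not real Fabric or Puppet APIs):

    def deploy_imperative(run):
        # Fabric-style: a fixed sequence of actions, executed once, in order
        run("apt-get install -y nginx")
        run("cp nginx.conf /etc/nginx/nginx.conf")
        run("service nginx restart")

    def converge_declarative(node):
        # Puppet/Chef-style: describe the desired end state; the tool checks
        # and re-applies it on every run, on every node
        node.package("nginx", ensure="installed")
        node.file("/etc/nginx/nginx.conf", source="nginx.conf")
        node.service("nginx", ensure="running")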


For me, the line between when to use fabric and when to use puppet/chef is whether the task needs the permissions of a more privileged user.

I prefer to deploy applications in the home directory of a dedicated user account with minimal privileges, and use fabric for installing updates and running application tasks.

OTOH, I'd prefer to be using puppet for creating the user accounts, managing the installation and configuration of PostgreSQL, putting app configuration in a safe location, and so on.

This comes down to my (maybe antiquated?) view of having multiple applications running on a single server.


That’s exactly how we do it too. The only difference is that we don’t install the app using Fabric. Instead, puppet installs the deb with the app into the directory.

Maybe I should add a “running apps as root” anti-pattern, but it’s 2012 after all, everyone should know that, right? :-/


It would be fantastic if someone were to write a tutorial how to deploy, for instance, a simple Flask+MySQL or Django+PostgreSQL application using these tools. I'm at a loss as to where to start.



> Those tasks are better suited for Fabric's imperative design.

Yes they are, but IMHO not on the target servers.

I use Fabric to build DEBs that get deployed by Puppet. I prefer to have no build tools on target servers, YMMV.
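
Roughly, the build step looks something like this (my own sketch, not the exact recipe from the article; it assumes fpm is installed on the build box and the names/paths are placeholders):

    from fabric.api import local

    def build_deb(version="1.0.0"):
        # build the virtualenv at its final absolute path on the *build* machine,
        # so the interpreter paths baked into it stay valid after installation
        local("virtualenv /srv/myapp/venv")
        local("/srv/myapp/venv/bin/pip install -r requirements.txt")
        # fold the whole venv (code + deps) into a single .deb
        local("fpm -s dir -t deb -n myapp -v %s /srv/myapp/venv" % version)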


Fabric isn't normally deployed on target servers though - it connects to them through SSH.


That’s not what I meant: If you build your virtualenv on the target server, you need build tools like GCC or development files like libpq-dev. I prefer to have as little stuff on servers as possible.


Thanks - that does sound like a big advantage of the deb / rpm route. Plus you don't really want your app servers spending their CPU time compiling.


That’s a plus too. An “aptitude dist-upgrade” is really fast.


Yes, but unless rpm/deb works with softlinks, it is not atomic right? Do you do anything before/after dist-upgrade or is it included in the pre/post install script?

For example, for our deployment, we rely on softlinks and uwsgi robust reload behavior to avoid losing requests. I've seen many devops who were using hg update/git update as a way to "deploy" (arg!), but I'm not sure about the behavior of deb/rpm.


You can do whatever you want inside of the post-install hooks which are just shell scripts. Including any kind of soft link black magic. :)

And you’re right: replacing files of a running application can lead to all kind of weirdness. I’d even prefer to lose some requests than to risk that.


DEBs for your own project files, or debs containing built python modules that the server needs to run those project files?

If the latter, how well does that mix in with virtualenv? or do you just avoid it entirely?


I’m the dude that wrote the article, so like it says: own stuff + deps.

You can re-initialize a virtualenv to fix it by simply running virtualenv again.

But pinky swear I’ll write the second article. ;)


Get on it! :) I really enjoyed the article and want to know more of your magic.


   > [...] you can knock out two birds with one stone by just using pythonbrew

Would you recommend using pythonbrew on a production system?


> The trick is to build a debian package (but it can be done using RPMs just as well) with the application and the whole virtualenv inside.

I would love to read an article describing some best practices for doing that. I tried it once and found it extremely difficult, reverting to a git checkout + virtualenv kind of deployment.


Check out git's "make rpm" target.

https://github.com/gitster/git/blob/master/Makefile

Hosting your own apt/yum repo is pretty simple.

Does anyone have an example of a similar "make deb" target they could share?

I've heard of git-dpm and git-buildpackage but haven't used them extensively myself. They're the debian git packaging tools.

http://wiki.debian.org/PackagingWithGit


Have a look at fpm: https://github.com/jordansissel/fpm

More details to come.


I'm familiar with building and hosting deps in general. The problem I had was that I tried to build debs for every single dependency which turned out to be really hard to automate. That's why I'm really interested in seeing an approach with the virtualenv embedded into the package.


It will come, I promise.


Long-time Python user here.

For a lot of my projects I write a shell script which builds all of the application dependencies (including services) into a project directory and run them all from there.

It takes a little bit of work to get going --- especially when building a new service for the first time --- but I like that it side-steps language-specific packaging tools (particularly the half-baked Python ones) and lets me pin an application's dependencies and port to various environments (develop on Mac, deploy on Unix) almost exactly. Integrating with Puppet/Chef is just a matter of breaking up the shell script into pieces.


We do this, but also add a couple layers of safety between us and PyPI:

1. Run your own secure, local pypi clone with exact source versions of the packages you use.

2. The packages for production are built into RPMs from the local pypi.

PyPI is great for discovery, getting things running quickly, and testing new versions, but you never want to rely on it, even for development.


I'd really enjoy reading about how to setup a PyPI mirror like the one you use in your development/deployment workflow. It seems like a really good idea, considering I've had problems with PyPI at really inconvenient times in the past.


I think we use this: http://pypi.python.org/pypi/chishop/

You just set it up on a local server, and upload packages the same way they are uploaded to real PyPI.

    python setup.py register sdist upload

You can specify an alternate PyPI server with a ~/.pypirc. There are probably other ways to do this. What's nice about this is you can upload your own private packages or your own personal forks of packages. We do both.
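
For completeness, the upload command above just needs a perfectly ordinary setup.py; a minimal (hypothetical) one like this is enough:

    from setuptools import setup, find_packages

    setup(
        name="myapp",                      # made-up project name
        version="1.0.0",
        packages=find_packages(),
        install_requires=[
            "somelib==1.2.3",              # pin exact versions of private forks/deps
        ],
    )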


I've used ClueReleaseManager to great effect in the past. It lazy loads from an upstream PyPI so you don't need to maintain a full mirror. If you use the same instance as your main PyPI mirror for developing and CI/deployment you never have to worry about syncing manually: any package you develop with will be in the cache by the time your code hits the CI server.


Yeah, plus it speeds package building up and protects against not-so-occasional pypi outages.


I don't think using virtualenv to jam everything into a big deb file is really a best practice.

At the end of the day I do have to do a lot of that for application deployment, but I try to only go as far as packaging libraries (i.e. gems, jars, the Python equivalents) in the rpm/deb file.

RHEL 6 is python 2.6.6, btw.

What happens when there are vulns for your stack?


> What happens when there are vulns for your stack?

That’s a good point and the answer is: you have to monitor the dependencies of your public services (and there aren’t that many).

But you have to do that anyway, because I can’t explain to our customers that their data has been hacked because Ubuntu/Red Hat didn’t update Django (fast enough).


You make it sound like if you do one then you can do 100. Not the case.


My public services don’t have 100 dependencies and that’s on purpose. Relying on magic distribution fairies for all your libraries is IMHO a false sense of security, YMMV.

How do you make sure that whenever one of your dependencies gets updated that your daemons get restarted?

And what do you do if you need a package that isn’t part of your distribution?


Do you have your own linux distro?


My two cents: I am a developer + ops person and deploy Python apps all the time. Typically they are Django and Tornado services. On top we also have a lot of daemons and a ton of library code.

I agree with the OP on most points but do not on a few. First DO use packages that come with the OS. The OP says that you should not have the distro maintainers dictating what you use. I say, use what is widely available. It takes the headache out of a lot of your deployments. If you are looking for a library that converts foo to bar look in your distro's repos before going on GitHub. Your sysadmin will thank you.

Second, DO NOT use virtualenv. It fixes the symptoms (Python's packaging system has many shortcomings such as inability to uninstall recursively, poor dependency management, lack of pre and post install scripts, etc.), but not the problem. Instead, use distro-appropriate packages. Integrate your app into the system. This way you will never end up running a daemon inside a screen session, etc. You also get the ability to very nicely manage dependencies and a clean separation between code and configuration.

Lastly, DO use apache + mod_wsgi. It is fast, stable, widely supported and well tested. If apache feels like a ball of mud, take the time to understand how to cut it down to a minimum and configure it properly.

When it comes to infrastructure, making boring choices leads to predictable performance and fewer headaches more often than not (at least in my experience).


"First DO use packages that come with the OS."

I'd go middle ground, and start here, but consider a self-built package where necessary. It depends in part on the focus of your distro.

virtualenv. What problem does it solve? Different python version/environments? Wouldn't that be better solved with another (virtual) server? I understand if an extra $20/month is an issue, but otherwise ...


I have just been educated by reading more of this thread. I can see an obvious need for one virtualenv, so that you can separate your service and its needs from the system python and its needs. Beyond that my inclination would be to go more servers rather than more virtualenvs, but circumstances vary and my experience is narrow.


Not every Python application is a big web app. We have systems that run several smaller Python apps. Python is everywhere.


> First DO use packages that come with the OS. The OP says that you should not have the distro maintainers dictating what you use. I say, use what is widely available. It takes the headache out of a lot of your deployments. If you are looking for a library that converts foo to bar look in your distro's repos before going on GitHub. Your sysadmin will thank you.

The sysadmin will have no part in the game if you use packaged virtualenvs. OTOH developer time is expensive. Do you really want to pay your developers to implement functionality that a more recent version of a package has already implemented? A good example is IPv6 support in Twisted. It’s getting implemented right now but I guess (and hope) that I’ll need it sooner than it lands in major distros (please no “lol ipv6” here, it’s just an example and the support is growing).

> Second, DO NOT use virtualenv. It fixes the symptoms (Python's packaging system has many shortcomings such as inability to uninstall recursively, poor dependency management, lack of pre and post install scripts, etc.), but not the problem. Instead, use distro-appropriate packages.

I’m not sure what your problem is, but mine is that I don’t want to develop against a moving target and I need to run apps with conflicting dependencies on the same host.

That’s how I started using virtualenv years ago BTW, I’m not talking ivory tower here.

> Integrate your app into the system.

Yes. And I prefer supervisor for that. If your prefer rc.d scripts, be my guest.

> You also get the ability to very nicely manage dependencies and a clean separation between code and configuration.

I don’t get this one TBH.

> Lastly, DO use apache + mod_wsgi. It is fast, stable, widely supported and well tested.

And nginx + uwsgi/gunicorn aren’t?

> If apache feels like a ball of mud, take the time to understand how to cut it down to a minimum and configure it properly.

I know Apache pretty well, because we’re running thousands of customers on it. I’ve written modules for it and been in its guts more than once. And my impression is not a good one.

My point was to look around before you settle. If you think Apache is da best, knock yourself out. However, the stuff I see daily on IRC suggests that it isn’t exactly problem-free.

I’m not going to start a “vi vs. emacs”-style holy war here. That’s why I wrote “shop around before you settle” and not ”don’t ever use Apache”.

> When it comes to infrastructure, making boring choices leads to predictable performance and less headaches more often than not (at least in my experience).

Absolutely. nginx is way past the “new and hacky” state though.


Sysadmin/Ops should have a part in what gets deployed. It is their job. My point is that if you have a choice between Python 2.6 that comes with your distro and Python 2.7 that doesn't, go with 2.6. The cost is minimal. Same for various libraries (PIL, NumPy, etc.) I currently have to deploy Django-Piston directly from a master branch on GitHub due to some terrible decisions made by developers who never gave deployment a second thought. Avoid this. If you would save more hours by using a newer library, minus time it takes to set it up and maintain it in production, go for it, but don't discount the cost on the Ops side.

I don't have a problem with virtualenv: it's a fine development tool, but it is not what I would use in production. If you want separate clean environments for each app, use KVM or Xen and give it a whole server.

supervisord is a fine solution. I just prefer that my processes look exactly like system processes. Upstart and rc.d are fantastic and do everything I need well.

As for separation of code and configuration, I simply mean that in my case I use Debian packages to deploy all of our software. This means that each package must be generic enough that it is deployable so long as its dependencies are satisfied. Thus your config files are mostly external to your packages. Then you can easily use Puppet or some such to deploy code. One other reason to use native distro packages: Puppet does not play well with pip/easy_install, etc.

>> Lastly, DO use apache + mod_wsgi. It is fast, stable, widely supported and well tested.

> And nginx + uwsgi/gunicorn aren’t?

I didn't say that. I am simply stating that saying "don't use apache" is wrong. Do use it. You can also use nginx + uwsgi/gunicorn if you want to, but apache is by no means a bad choice. It's got 25 years of use and nobody that uses it seriously is complaining.

> Absolutely. nginx is way past the “new and hacky” state though.

It is not, and I never said so. I use nginx + apache, where nginx is a reverse proxy. In fact nginx is my top choice for front-end server setups. I am saying, don't deploy things directly out of GitHub. Go with slightly older, more tested stuff. It'll be a bigger payoff in the end.

Overall, I think we are saying the same thing, with slightly different tools we normally reach for. I am just trying to throw a different perspective out there and a different way to do things. Thanks for the detailed reply.


> I currently have to deploy Django-Piston directly from a master branch on GitHub due to some terrible decisions made by developers who never gave deployment a second thought. Avoid this.

JFTR, there is a pretty good solution for this (because in the real world, you can’t always avoid that): a custom pypi server.

You take a git version you know works, do a `python setup.py sdist` and push it to a private repo. We have to do stuff like this for Sybase drivers for example which are open source but not in PyPI (or any distribution). It saves so much pain.

> As for separation of code and configuration, I simply mean that in my case I use Debian packages to deploy all of our software.

Well we do the same but the Debian packages contain code _and_ the virtualenv.

> Overall, I think we are saying the same thing, with slightly different tools we normally reach for.

Mostly yes. Just wanted to add the pypi tip as it hasn’t been mentioned yet in this thread.


Nice tip about the private PyPI server. I might use it in the future if we end up in the same situation again.

Also, one more nice thing about using distro packages for your own deploys: if you ever need to mix Python with anything else it is a godsend. For example, at one point I had a network sniffer/userland packet forwarder that was written in C deployed alongside our Python daemons. Debian packages do not care what you put in them: C, Python, JavaScript or pure data.


This guy seems to be of the opinion that the software should be completely isolated from the deployment operating system.

I know that's a common view and wrapping as much of the site as possible up in a virtualenv certainly has a lot of advantages. But ultimately, your software is going to have to interact with the OS, at some level, otherwise, why do you even have an OS? So the question is: where do you draw the line? He seems to draw it further down the stack than most people (no system python, for instance) but he doesn't give his opinion on, for instance, using the system postgresql.

Anyway, I personally would draw the line further up the stack than him, but take things on a case-by-case basis, and I don't really consider it an "anti-pattern."

With regards to fabric vs. puppet, I understand the advantages of puppet when you have a complicated, heterogeneous deployment environment. But the majority of projects I've worked on have the operations model of a set of identically-configured application servers back-ended against a database server. For this configuration, what does puppet give you? If the author's argument is that the site may eventually outgrow that model, well, I can see puppet becoming necessary, but why not cross that bridge when you get to it?


I think there's some exaggeration as well; today the trend seems to be "use nothing from the distro".

Ok, sure, MongoDB still changes a lot between versions, in this case you should use the latest version.

But stop there. Especially if you're paying for support (like RHEL)

There should be a good reason for you to compile Apache / MySQL / PostgreSQL / Python. Otherwise, use the distro version. One (common) exception would be "we need Python 2.7 but this ships only 2.6"

Most of the "just download and compile" have no idea of the work that goes behind Linux distributions to ship these packages.

Yes, I'm sure you're going to read all security advisories and recompile all your stack every X days instead of running apt/yum upgrade


Yes, you should have a good reason, and if you do: do it. That’s what the article says. ;) Please don’t push it in the wrong direction.

What I actually wrote is: because we’re a LAMP web hoster, we compile MySQL+PHP+Apache ourselves. And because we’re a Python shop, we don’t let Ubuntu/Red Hat dictate which Python version we use.


You're right, sorry about that ;) Guess I was thinking too much about the "compile everything" people.

Great article, btw!


thanks!


Some choices certainly make more sense for web hosters than run-of-the-mill Python dev shops.


If you use puppet/chef for the whole stack, you gain the ability of just starting a new machine and in a couple of minutes having it configured in the exact same way as all the others.

With fabric and similar systems it's a bit harder. Basically you'd have to write your scripts exactly the same way you'd write a puppet/chef recipe: "make sure this is configured that way, make sure that is installed", etc. (or do migration steps) It's very different from fabric's "do this, do that" approach. Unless you run fabric on every single host after you make every change, some of your infrastructure will be lagging behind.

For example, what do you do when you create a new server, or do an upgrade that involves different dependencies? Run a fabric script that migrates from state X to Y? What happens to new machines then? How do you make sure they're in the same state?

I found chef a very good solution even if I have a single server to manage. No need to think about how it was configured before. Migrating to another provider? Just migrate the data, point the server at chef, done.


Point of fact, he does give his opinion on the postgresql issue: "We also have to compile our own Apaches and MySQLs because we need that fine-grained control."

But that's beside the point. I don't think he's arguing that software should be completely isolated from the deployment operating system. That would be absurd, since, as you pointed out, software has to interact with the OS at some level, i.e., to manage system resources. Just because the OS ships with a bunch of packages with specific versions doesn't mean you have to use them. And what I think the OP is saying is that to make your applications portable and easily deployable, you shouldn't.


Good catch. Sorry I missed that. I suppose I've forgotten that a lot of people still use MySQL, even for python. :)

The question "why even have an OS?" was a rhetorical one, meant to illustrate the fact that the argument here is over where one draws the line, between the dependencies you maintain yourself and those you outsource to your OS vendor. Unless you're deploying on Linux From Scratch then you are outsourcing to the vendor at some level, so the only question is, where is that level?

That this guy maintains even his own web and database servers means he wants/needs to control things at a much deeper level than is typical. And I think that's beautiful if it works for him. My point of disagreement is his use of the word "antipattern" to suggest that any other choice is wrong.


First of all I’m afraid the point about Apache & MySQL is misleading and I’ll have to clarify it: we use neither for internal projects. It’s all PostgreSQL. :)

They’re both for our customers as we sell traditional web hosting with a LAMP stack (and yes, we have to compile PHP ourselves too, so we can offer different flavors).

I never said you _have_ to compile everything yourself. In the case of Python, I said you shouldn’t inflict on yourself the pain of programming against Python 2.4 just because you’re on RHEL/CentOS 5 or earlier.

System packages have several problems I outlined too. Most important: you add a middleman to the release pipeline, you get no virtualenv, and there are possible dependency conflicts: I have apps that depend on SQLAlchemy 0.6 and others on 0.7 – they couldn’t run on the same server.

But OTOH we use stock packages as far as possible, because it’s less work. E.g. there’s no reason to compile your own Python on Oneiric and later. Same goes for PostgreSQL, which is up to date.

It’s all about freeing yourself from constraints on the points that matter, not adopting a new religion.


I draw the line at system libraries and long-established system services: email, cron, syslog, DNS, etc.

I always want to compile my full stack; DB, network servers, language.


> isolated

Perhaps a better term would be "decoupled"?


I am glad this is working for the OP, but pushing virtualenv and "self-contained" apps as the one solution is a disservice to the community. There are valid reasons to rely on your OS, assuming you have a homogeneous deployment target (same OS, maybe different versions):

- lots of people argue for virtualenv because some versions may be incompatible. The problem here is the lack of backward compatibility in packages, and frankly, if you need to rely on packages which willingly change their API between e.g. 1.5 and 1.6, or if each of your services depends on a different version of some library, you have bigger problems anyway.

- any sufficiently complex deployment will depend on things that are not python, at which point you need a solution that integrates multiple languages. That is, you end up re-creating what a distribution is all about.

- virtualenv relies on sources, so if some of your dependencies are in C, every deploy means compilation

- I still have no idea how security is handled when you put everything in virtualenv

See also http://bytes.com/topic/python/answers/841071-eggs-virtualenv...


> pushing virtualenv and "self-contained" apps as the one solution is a diservice to the community.

Wow. :(

> There are valid reasons to rely on your OS, assuming you have an homogenous deployment target (same OS, maybe different versions):

I’d love to hear them.

> lots of people argue for virtualenv because some versions may be incompatible. The problem here is the lack of backward compatibility in packages, and frankly, if you need to rely on packages which willingly change their API between e.g. 1.5 and 1.6, or if each of your services depends on a different version of some library, you have bigger problems anyway.

Well, you said there are possible problems but that they shouldn’t matter in an ideal world. Maybe you’re okay with taking that chance, but I’m not. All the code I deploy has been tested rigorously against a certain set of versions and that is the only combination of dependencies I’m willing to consider “working”. Unit tests with different dependencies are just as worthless as integration/functional tests against SQLite instead of the same DB type as in production.

There’s even the possibility that your code works because of a bug and when that one gets fixed, your app goes south because of some weird side-effect.

> any sufficiently complex deployment will depend on things that are not python, at which point you need a solution that integrates multiple languages. That is, you end up re-creating what a distribution is all about.

I’m not sure if I understand what you mean, but yes if you want to use certain features outside the Python ecosystem, you’ll have to buckle up and package them yourself too. “We can’t do that, package XYZ is missing/too old.” isn’t really a good excuse to not do something that is important/good for your business. And that’s one of the main points of the article.

> virtualenv relies on sources, so if some of your dependencies are in C, every deploy means compilation

That’s wrong if you go the way described: The virtualenv is packaged with the code. Build tools don’t belong on production servers.

> I still have no idea how security is handled when you put everything in virtualenv

Just as everywhere else. If you think it’s okay to tell customers that their data has been hacked because Debian was too slow to issue a fix, be my guest. We can’t afford that. What happens on my servers security-wise is _my_ responsibility, and using ancient versions of Python libraries just to be able to blame others for FUBARs is not a solution in my book.


> I’m not sure if I understand what you mean, but yes if you want to use certain features outside the Python ecosystem, you’ll have to buckle up and package them yourself too.

I think his point is that pip can't do this, but learning to actually work with your distro's packaging system properly results in a more powerful and easier to redistribute project.

> Just as everywhere else. If you think it's okay to tell customers that their data has been hacked because Debian was too slow to issue a fix, be my guest.

Your packaging methodology is not what gives you security there. It's that you noticed a vulnerability and deployed a fix. The method of deployment is irrelevant. Your point is that knowing you have a security issue and waiting for upstream to get around to fixing it isn't always acceptable. That goes for everything.

If you know how to roll .debs it's just as easy to patch and release a fixed version of a library. (Or even install the pip version earlier in your sys.path...)


> I think his point is that pip can't do this, but learning to actually work with your distro's packaging system properly results in a more powerful and easier to redistribute project.

Absolutely. And that’s why I package the whole virtualenv into the DEB along with the project. I always have the assurance that the combination of packages inside passes all my tests no matter where I install it.


I disagree. IMO dynamically loaded libraries were a known bad solution even before they were made (DLL hell). Plenty of exploits have come from them, plenty of horror stories, etc. If you have a nice distro anyway, upgrading a given library that everyone packages should be doable.

Windows still uses shared libraries but at least they invented the GAC so applications can specify exactly which version of a library they work with and the installer will install that version if it's not present.


Using tmux is a Python daemon antipattern? And then "there's so much wrong about this approach" that he doesn't bother explaining why? Isn't that why we are reading the article: because we want to know why?

If the author is trying to convince people to change their habits, he is doing a crummy job. He comes across as elitist and "if you don't do it my way you're wrong".


He's talking about creating /etc/init.d scripts that do something like:

    screen python manage.py gunicorn &

That is an anti-pattern. Granted he didn't really explain it so well.


Honestly I didn’t think it would be necessary. :-/

Your example is btw the advanced version. Many people just ssh into the host, fire up screen and start their server inside.


I’m sorry if it came across like that. I’ll consider rewording it.

Update: I added some more context. I would never spit on my beloved tmux. :)


Don’t run your daemons in a tmux/screen

Wow, I always thought of myself as an idiot for doing this. But that is for some not yet launched thing. Who on earth does this for a production website?


The ability of Go to produce a single, self-contained executable is one of the biggest advantages it has over the "scripting" languages. It makes deployment so much simpler.


Virtualenv is a half solution and a hack. Use vagrant and VMs. There's a whole sea of libs and software that isn't "versioned" by pip/virtualenv.

supervisord is the wrong solution. It answers the wrong question (is the process running). It's worse than useless in that it has given false positives. The right question is (is the process responding correctly). Use monit or something else that actually does what's needed.


Vagrant is nice for testing but I’m not going to deploy services to VirtualBox (we use kvm for that). Of course there are dependencies outside of the Python ecosystem but their influence proved to be negligible till now. YMMV.

supervisor is not an alternative to monitoring; I never claimed that. But that’s a whole different story.


virtualenv isn't just used for testing, though. It's widely used to isolate applications' python environments/dependencies from one another.


Why would you want to virtualise an entire system when your dependencies are restricted to a bunch of Python modules?


Personally I would just go for Virtualenv, but you'd be surprised what people do with virtualization. I've mostly seen it in Windows shops, but it's by no means limited to that. A common "anti-pattern" I've seen is to build a vmware image, or set of images, that run your "environment". Developers then spin up copies and request that the changes they need be made to the "master copy".

When it's time to test and deploy, the code is deployed to a vmware image, which is then sent to the testers. When everything checks out, the code is once again deployed to a new copy of the image, which is then promoted to production.

One argument we've seen is that it makes it easy to roll back the system using vmware snapshots. It might be a sensible idea in some cases; I just think it's a bit weird and adds some overhead.

The worst use case of this I've seen is a Telco, where you needed to spin up as many as 12 VMs, depending on what you and your team were working on. But then again that's the same company that read the Perforce guidelines on recommended setup and still decided to just pile all the projects into one repository.


Just a thank you for writing a positive, easy to follow overview with links to more in-depth information. I love when people boil down experience and serve it without a side dish of attitude.



