As a python dev who deploys a lot of software, I found this article to be wonderfully helpful and informative, and a good reflection of current best practices.
Summary of the deployment tools mentioned:
- Manage remote daemons with supervisord
http://supervisord.org/
- Manage python packages with pip (and use `pip freeze`)
http://pypi.python.org/pypi/pip
http://www.pip-installer.org/en/latest/requirements.html
- Manage production environments with virtualenv
http://www.virtualenv.org/
- Manage Configuration with puppet and/or chef
http://puppetlabs.com/
http://www.opscode.com/chef/
- Automate local and remote sys admin tasks with Fabric
http://fabfile.org
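A rough sketch of how these fit together (host names, paths and service names below are made up, so treat it as illustration only):

    # fabfile.py -- ties pip, virtualenv and supervisord together from Fabric
    from fabric.api import env, run, cd

    env.hosts = ["deploy@app1.example.com"]   # hypothetical target server

    def deploy():
        with cd("/srv/myapp"):
            run("git pull")                                   # or unpack a release tarball
            run("virtualenv --no-site-packages env")          # safe to re-run if env/ exists
            run("env/bin/pip install -r requirements.txt")    # versions pinned via `pip freeze`
            run("supervisorctl restart myapp")                # daemon managed by supervisord

Run it with `fab deploy`.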
Other tips:
- Don't restrict yourself to old Python versions to appease your tools / libs.
- Strongly consider rolling your own DEB/RPMs for your Python application.
Author also touted:
- Celery for task management
http://celeryproject.org/
- Twisted for event-based python.
http://twistedmatrix.com/trac/
- nginx / gunicorn for your python web server stack
http://www.nginx.com/
http://gunicorn.org/
Thanks for that! We gave up on chef when one of their version updates failed to work with a prior version, both of which were the OS package defaults. Chef silently failed, no error message, nothing in the docs, nothing even in the source code. Had to do a fair bit of searching to find out why.
When open source projects like chef have nobody interested in even documenting much less testing backwards incompatibilities we move them to the bottom of our to-eval list.
This also illustrates a problem with the article's blind enthusiasm for the latest revisions and libraries, i.e., it dismisses the headaches this causes end users, who often don't have the staff or budget to fix whatever breaks during an upgrade. That said we are at least talking about python, which has had better release QA and backwards compatibility than perl, ruby or, gasp, php.
> That said we are at least talking about python, which has had better release QA and backwards compatibility than perl....
I'm curious as to your experience here. I've found that Perl has by far the best backwards compatibility and release QA of the major dynamic languages. What did you encounter?
We don't use as much perl as we used to but the last upgrade issue was amavisd-new (a Spamassassin wrapper). Spamassassin has perl version issues every so often as well. NetDNS used to introduce new bugs about every 4th revision but seems to have been stable for the past couple of years. GNUmp3d and many audio libraries have non-perl revision-related, backwards-compatibility issues with some regularity.
Cool, I'm also investigating the python-based salt stack: http://saltstack.org/ for this purpose but it seems a bit heavy just starting out. Gonna try ansible next.
Likewise, +1 to Nginx + uwsgi. In the various performance benchmarks* uwsgi beats gunicorn, and Nginx now ships with built-in uwsgi protocol support, so there's really no reason to bother with setting up gunicorn any more.
I use gevent instead of twisted for event-based python. Gevent is a lot nicer to work with and doesn't make your code less readable in the way twisted does with its callbacks and errbacks. It's also a lot easier to do unit testing with gevent.
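A tiny example of the gevent style, in case anyone hasn't seen it (the URLs are just placeholders):

    # Spawn a few greenlets and wait for them -- no callbacks or errbacks needed.
    import gevent
    from gevent import monkey
    monkey.patch_all()   # make stdlib sockets cooperative

    import urllib2

    def fetch(url):
        return len(urllib2.urlopen(url).read())

    jobs = [gevent.spawn(fetch, u) for u in ("http://example.com/", "http://example.org/")]
    gevent.joinall(jobs, timeout=10)
    print([job.value for job in jobs])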
I don't get the negativity on using your distro's packages, at least from the staying-stable perspective. Any decent package manager should let you pin/hold critical packages on a particular version, so if "the next Ubuntu ships with a different SQLAlchemy by default" you just hold the SQLAlchemy package at the version you want and then ignore it until you're ready to make that move.
99% of the time when I hear people complaining about their distro's packages, the complaints are coming from the opposite direction -- they want to run something bleeding-edge and the distro doesn't have it yet. (This is the standard beef Rubyists have with Debian, for instance -- that code that just hit Github ten minutes ago isn't in Debian's repos yet.)
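For reference, a hold can be a couple of lines in an apt preferences file (package name and version here are just an example):

    Explanation: hold SQLAlchemy at the 0.7 series (hypothetical pin)
    Package: python-sqlalchemy
    Pin: version 0.7.*
    Pin-Priority: 1001

or simply `echo python-sqlalchemy hold | dpkg --set-selections`.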
It's usually best to leave these packages for the system python to run pieces of the system. There's one exception that I usually make for this rule: packages with complex C dependencies. NumPy is the first package that comes to mind. I usually prefer to just use the package manager to deal with that.
> "into global site-packages that you shouldn’t use for any serious coding"
Any specific reason for that? I find it quite good and have quite large deployments using .debs only, with packages in the global location (tens of packages produced locally - either updated or otherwise unavailable dependencies, plus the service itself). Any direct dependency is handled by package pinning and no update goes into production untested, so the whole "new sqlalchemy suddenly appears" issue does not exist. As long as people don't break API versioning in silly ways, what's the problem with this?
The only version-related issue I remember was when someone thought it would be nice to install something through pip, instead of via package. (went to /usr/local)
Having a global python installation where packages are constantly installed, uninstalled, and updated is the path to madness. If something goes wrong, what can you do? You can't wipe out the system python. You can wipe out a virtualenv though.
What do you mean by goes wrong? Either some package is installed or not. For me, chef manages which ones are. If something is really FUBAR, then wiping is exactly the path I'd take - or more accurately, take that server down for analysis of how exactly it got into that state (so we won't do that again) and bring a clean one up.
Mainly it's about incompatibilities. What if you have two apps that require different versions of a library? If you've installed it in site-packages, then you have little recourse. By separating them out with virtualenv the two apps will work just fine.
You roll version Y yourself and install it into an alternate prefix. If the server uses debian that means make a new deb and deploy it using the standard tools. apt-get/yum/etc. are very solid deployment tools.
What is the benefit of using global packages? If you pin them, what do you gain? virtualenv gives you complete control over the runtime environment of every single application. Stuff like what you wrote can’t really happen.
And BTW, there’s more stuff that can go wrong than broken APIs: new bugs that appear only on your system or, even better, code that worked only _because_ of a bug. :)
For me the benefit is a much easier security audit/update. Unless you have dependencies packaged by yourself, the only thing you need to monitor is what comes into security-updates of your repository. If there's anything new, do a test deployment, do what you do to verify it's correct and change the pin in production.
Honestly: how many people installing software through virtualenv are registered to security mailing lists for each of their packages (and their dependencies down to things like simplejson)?
But if you _pin_ your packages, how are you updating? You have to monitor it anyway.
High profile projects like simplejson, Django or Pyramid and their deps won’t be missed and the really obscure ones will never make it into the repositories anyway.
Of course. I'm just saying that in case you're pinning only the default upstream versions from your distro so that they don't change, it's easier to automatically report on which packages have a new version in {distro}-security repository. Then retest and change the pin.
The same can be achieved by subscribing to CVEs... but you have to remember to filter the ones you use. Of course that's not a huge difference, so if someone prefers the second way, there's nothing wrong with it ;)
Oh the problem has NOTHING to do with "10 minutes ago".
Ubuntu's latest release, 11.10 (yes, I know 12.04 is a couple of days away), is Python 2.4. I don't remember what Ruby it is, but it's something like 1.8.7. Ruby is on 1.9.3 and the next version of Rails won't even support 1.8.7.
I'd be fine with a year or two old Python and Ruby....
They rarely have a modern enough package to be worthwhile. When you control the package pipeline, you have the ability to dictate how modern or conservative your software is.
The problem is that most programming languages (especially Python and Ruby) are ecosystems unto themselves and often move at a much faster pace than any stable distro (or LTS) could keep up with. That's why we have gems and pip.
Cf. Ubuntu LTS MongoDB default is like 1.2 or something. This is why I switched to using 10gen's repo.
Perhaps, but that's a completely different argument than the one TFA is making.
Besides, in the Ubuntu case anyway, LTS is supposed to be old. The whole point of LTS releases is to let slow-moving institutions/enterprises sit on ancient packages for 3-5 years without having to worry about backporting security updates. If you want recent versions of packages LTS is precisely the wrong place for you to be.
I would disagree with this. LTS (I'm referring to Ubuntu LTS as well as RHEL) is still a good fit because there are 9001 other packages your system relies on for day-to-day operations, and I'd rather mess with those as rarely as possible. If you don't use an LTS then you'll have to do whole-system upgrades every few years - or else you stop getting security backports. I'd much rather micromanage the dozen highly visible packages that are immediately relevant to my app.
Also, even if you're not on an LTS, that doesn't mean you have the latest/greatest available. The python community moves at its own pace, so there's still a chance that you'll be stuck with the just-before-latest-stable version.
Not entirely, but if your reason for using a non-LTS is to get the latest versions of your stack, then you've just made a really bad choice. As I've said, you're either giving up security updates or are left to test and upgrade your entire system every ~two years in exchange for no guarantee you're on the latest of anything. That tradeoff doesn't make sense no matter which way you look at it.
LTS is the default distro for a lot of server and VPS providers. Often the latest VM images will be rather...unstable, even putting aside the instability of the distro itself.
Regarding virtualenv, I have come to the conclusion that Linux containers are robust enough now (like freebsd jails say two or three years ago) that I don't need to virtualise just python - I can afford to have the whole server as a "virtualenv" - no need for that extra complexity, just install into site-packages. No conflicts, because a whole instance is dedicated. Jails take this to the limit - one virtual machine, one process - say Bind. A vulnerability in Bind? The attacker takes over... nothing.
Sorry, but I'm not sure I understand your logic. Using virtualenv adds extra complexity, but virtualizing the entire server doesn't? I mean, the only complexity using virtualenv adds is having to run the virtualenv process once. After that, you can still install to site-packages. You just have to install to a different site-packages directory.
Besides that, it's worth pointing out that using a virtualenv is not a security precaution. It's a precaution to prevent mucking up the global python installation for other packages that run on it. Using linux containers to achieve this seems like overkill.
A late reply, but I don't just want the python environment virtualised - if it's important enough that I should section off python, then there is a good chance I will want to consider the whole box as a single unit: python, firewall rules, database, whatever. I tend to think the unit of abstraction should not be the python process, but the server. This is a little easier to grasp when you think of BSD jails, where essentially you can choose to only run those processes that actually matter - it's less a virtualised OS than a pick-and-mix OS.
Apologies for late reply - I guess I am straightening it out in my head more than telling anyone else.
Regarding "Don't use ancient system Python versions" and "Use virtual environments", you can knock out two birds with one stone by just using pythonbrew. It also saves you the hassle of rolling your own deb/rpm if a package doesn't happen to exist.
Also, Chef/Puppet aren't "alternatives" to something like Fabric. Use the former for server provisioning, and use the latter for actually kicking off the deployment process. Trying to shoe-horn the finer deployment steps (git checkout, tarballing, symlinks, building the virtualenv, etc) into Chef was a nightmare every time I tried. Those tasks are better suited for Fabric's imperative design. Plus you can just run any Chef commands from Fabric itself, or use something like pychef for finer grained control. It's a win/win.
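E.g. something along these lines (names and paths invented, and it assumes the release tarball was already uploaded):

    # Chef converges the machine, Fabric drives the imperative deploy steps.
    from fabric.api import run, sudo

    def release(version):
        sudo("chef-client")   # provisioning: system packages, users, config files
        run("tar xzf /tmp/myapp-%s.tar.gz -C /srv/myapp/releases" % version)
        run("ln -sfn /srv/myapp/releases/myapp-%s /srv/myapp/current" % version)
        run("supervisorctl restart myapp")

Then `fab release:1.2.3` does the whole dance.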
I find the line between when to use fabric and when to use puppet/chef is whether the task needs the permissions of a more privileged user.
I prefer to deploy applications in the home directory of a dedicated user account with minimal privileges, and use fabric for installing updates and running application tasks.
OTOH, I'd prefer to be using puppet for creating the user accounts, managing the installation and configuration of PostgreSQL, putting app configuration in a safe location, and so on.
This comes down to my (maybe antiquated?) view of having multiple applications running on a single server.
That’s exactly how we do it too. The only difference is that we don’t install the app using Fabric. Instead, puppet installs the deb with the app into the directory.
Maybe I should add a “running apps as root” anti-pattern, but it's 2012 after all, everyone should know that, right? :-/
It would be fantastic if someone were to write a tutorial how to deploy, for instance, a simple Flask+MySQL or Django+PostgreSQL application using these tools. I'm at a loss as to where to start.
That’s not what I meant: If you build your virtualenv on the target server, you need build tools like GCC or development files like libpq-dev. I prefer to have as little stuff on servers as possible.
Yes, but unless rpm/deb works with softlinks, it is not atomic right? Do you do anything before/after dist-upgrade or is it included in the pre/post install script?
For example, for our deployment, we rely on softlinks and uwsgi's robust reload behavior to avoid losing requests. I've seen many devops who were using hg update/git update as a way to "deploy" (argh!), but I'm not sure about the behavior of deb/rpm.
> The trick is to build a debian package (but it can be done using RPMs just as well) with the application and the whole virtualenv inside.
I would love to read an article describing some best practices for doing that. I tried it once and found it extremely difficult, reverting to a git checkout + virtualenv kind of deployment.
I'm familiar with building and hosting deps in general. The problem I had was that I tried to build debs for every single dependency which turned out to be really hard to automate. That's why I'm really interested in seeing an approach with the virtualenv embedded into the package.
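From what I understand, one way to do it (not necessarily what the author does) is to build the virtualenv at the path it will live at in production and wrap the whole directory with a tool like fpm -- everything below is made up:

    # Build-machine task; Fabric's local() is used just to keep it in Python.
    from fabric.api import local

    def build_deb(version="1.0.0"):
        local("virtualenv /opt/myapp/env")                            # same path as in production,
        local("/opt/myapp/env/bin/pip install -r requirements.txt")   # so no path rewriting needed
        local("fpm -s dir -t deb -n myapp -v %s /opt/myapp" % version)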
For a lot of my projects I write a shell script which builds all of the application dependencies (including services) into a project directory and run them all from there.
It takes a little bit of work to get going --- especially when building a new service for the first time --- but I like that it side-steps language-specific packaging tools (particularly the half-baked Python ones) and lets me pin an application's dependencies and port to various environments (develop on Mac, deploy on Unix) almost exactly. Integrating with Puppet/Chef is just a matter of breaking up the shell script into pieces.
I'd really enjoy reading about how to setup a PyPI mirror like the one you use in your development/deployment workflow. It seems like a really good idea, considering I've had problems with PyPI at really inconvenient times in the past.
You just set it up on a local server, and upload packages the same way they are uploaded to real PyPI.
python setup.py register sdist upload
You can specify alternate PyPI server with a ~/.pypirc. There are probably other ways to do this. What's nice about this is you can upload your own private packages or your own personal forks of packages. We do both.
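The ~/.pypirc bit looks roughly like this (hostname and credentials made up):

    [distutils]
    index-servers =
        internal

    [internal]
    repository = https://pypi.internal.example.com/
    username = deploy
    password = secret

and then `python setup.py register sdist upload -r internal`.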
I've used ClueReleaseManager to great effect in the past. It lazy loads from an upstream PyPI so you don't need to maintain a full mirror. If you use the same instance as your main PyPI mirror for developing and CI/deployment you never have to worry about syncing manually: any package you develop with will be in the cache by the time your code hits the CI server.
I don't think using virtualenv to jam everything into a big deb file is really a best practice.
At the end of the day, I do have to do a lot of that with application deployment, but I try to only go as far as packaging libraries (i.e. gems, jars, the python equivalent) in the rpm/deb file.
> What happens when there are vulns for your stack?
That’s a good point and the answer is: You have to monitor your dependencies of public services (that aren’t that many).
But you have to do that anyway, because I can’t explain to our customers that their data has been hacked because Ubuntu/Red Hat didn’t update Django (fast enough).
My public services don’t have 100 dependencies and that’s on purpose. Relying on magic distribution fairies for all your libraries is IMHO a false sense of security, YMMV.
How do you make sure that whenever one of your dependencies gets updated that your daemons get restarted?
And what do you do if you need a package that isn’t part of your distribution?
My two cents: I am a developer + ops person and deploy Python apps all the time. Typically they are Django and Tornado services. On top we also have a lot of daemons and a ton of library code.
I agree with the OP on most points but do not on a few. First DO use packages that come with the OS. The OP says that you should not have the distro maintainers dictating what you use. I say, use what is widely available. It takes the headache out of a lot of your deployments. If you are looking for a library that converts foo to bar look in your distro's repos before going on GitHub. Your sysadmin will thank you.
Second, DO NOT use virtualenv. It fixes the symptoms (Python's packaging system has many shortcomings such as inability to uninstall recursively, poor dependency management, lack of pre and post install scripts, etc.), but not the problem. Instead, use distro-appropriate packages. Integrate your app into the system. This way you will never end up running a daemon inside a screen session, etc. You also get the ability to very nicely manage dependencies and a clean separation between code and configuration.
Lastly, DO use apache + mod_wsgi. It is fast, stable, widely supported and well tested. If apache feels like a ball of mud, take the time to understand how to cut it down to a minimum and configure it properly.
When it comes to infrastructure, making boring choices leads to predictable performance and less headaches more often than not (at least in my experience).
I'd go middle ground, and start here, but consider a self-built package where necessary. It depends in part on the focus of your distro.
virtualenv. What problem does it solve? Different python version/environments? Wouldn't that be better solved with another (virtual) server? I understand if an extra $20/month is an issue, but otherwise ...
I have just been educated by reading more of this thread. I can see an obvious need for one virtualenv, so that you can separate your service and its needs from the system python and its needs. Beyond that my inclination would be to go more servers rather than more virtualenvs, but circumstances vary and my experience is narrow.
> First DO use packages that come with the OS. The OP says that you should not have the distro maintainers dictating what you use. I say, use what is widely available. It takes the headache out of a lot of your deployments. If you are looking for a library that converts foo to bar look in your distro's repos before going on GitHub. Your sysadmin will thank you.
The sysadmin will have no part in the game if you use packaged virtualenvs. OTOH developer time is expensive. Do you really want to pay your developers to implement functionality that a more recent version of a package has already implemented? A good example is IPv6 support in Twisted. It’s getting implemented right now but I guess (and hope) that I’ll need it sooner than it lands in major distros (please no “lol ipv6” here, it’s just an example and the support is growing).
> Second, DO NOT use virtualenv. It fixes the symptoms (Python's packaging system has many shortcomings such as inability to uninstall recursively, poor dependency management, lack of pre and post install scripts, etc.), but not the problem. Instead, use distro-appropriate packages.
I’m not sure what your problem is, but mine is that I don’t want to develop against a moving target and need to run apps with contradicting dependencies on the same host.
That’s how I started using virtualenv years ago BTW, I’m not talking ivory tower here.
> Integrate your app into the system.
Yes. And I prefer supervisor for that. If you prefer rc.d scripts, be my guest.
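For reference, a program block is only a handful of lines (paths and names invented):

    ; /etc/supervisor/conf.d/myapp.conf
    [program:myapp]
    command=/srv/myapp/env/bin/gunicorn myapp.wsgi:application --bind 127.0.0.1:8000
    directory=/srv/myapp
    user=myapp
    autostart=true
    autorestart=true
    redirect_stderr=true
    stdout_logfile=/var/log/myapp.log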
> You also get the ability to very nicely manage dependencies and a clean separation between code and configuration.
I don’t get this one TBH.
> Lastly, DO use apache + mod_wsgi. It is fast, stable, widely supported and well tested.
And nginx + uwsgi/gunicorn aren’t?
> If apache feels like a ball of mud, take the time to understand how to cut it down to a minimum and configure it properly.
I know Apache pretty well, because we’re running thousands of customers on it. I’ve already written modules for it and been more than once in its guts. And my impression is not a good one.
My point was to look around before you settle. If you think Apache is da best, knock yourself out. However, the stuff I see daily on IRC makes me think that it isn't exactly problem-free.
I’m not going to start a “vi vs. emacs”-style holy war here. That’s why I wrote “shop around before you settle” and not ”don’t ever use Apache”.
> When it comes to infrastructure, making boring choices leads to predictable performance and less headaches more often than not (at least in my experience).
Absolutely. nginx is way past the “new and hacky” state though.
Sysadmin/Ops should have a part in what gets deployed. It is their job. My point is that if you have a choice between Python 2.6 that comes with your distro and Python 2.7 that doesn't, go with 2.6. The cost is minimal. Same for various libraries (PIL, NumPy, etc.) I currently have to deploy Django-Piston directly from a master branch on GitHub due to some terrible decisions made by developers who never gave deployment a second thought. Avoid this. If you would save more hours by using a newer library, minus time it takes to set it up and maintain it in production, go for it, but don't discount the cost on the Ops side.
I don't have a problem with virtualenv: it's a fine development tool, but it is not what I would use in production. If you want separate clean environments for each app, use KVM or Xen and give it a whole server.
supervisord is a fine solution. I just prefer that my processes look exactly like system processes. Upstart and rc.d are fantastic and do everything I need well.
As for separation of code and configuration, I simply mean that in my case I use Debian packages to deploy all of our software. This means that each package must be generic enough that it is deployable so long as its dependencies are satisfied. Thus your config files are mostly external to your packages. Then you can easily use Puppet or some such to deploy code. One other reason to use native distro packages: Puppet does not play well with pip/easy_install, etc.
>> Lastly, DO use apache + mod_wsgi. It is fast, stable, widely supported and well tested.
> And nginx + uwsgi/gunicorn aren’t?
I didn't say that. I am simply stating that saying "don't use apache" is wrong. Do use it. You can also use nginx + uwsgi/gunicorn if you want to, but apache is by no means a bad choice. It's got over 15 years of heavy use and nobody that uses it seriously is complaining.
> Absolutely. nginx is way past the “new and hacky” state though.
It is not, and I never said so. I use nginx + apache, where nginx is a reverse proxy. In fact nginx is my top choice for front-end server setups. I am saying, don't deploy things directly out of GitHub. Go with slightly older, more tested stuff. It'll be a bigger payoff in the end.
Overall, I think we are saying the same thing, with slightly different tools we normally reach for. I am just trying to throw a different perspective out there and a different way to do things. Thanks for the detailed reply.
> I currently have to deploy Django-Piston directly from a master branch on GitHub due to some terrible decisions made by developers who never gave deployment a second thought. Avoid this.
JFTR, there is a pretty good solution for this (because in the real world, you can’t always avoid that): a custom pypi server.
You take a git version that you know works, do a `python setup.py sdist` and push it to a private repo. We have to do stuff like this for Sybase drivers, for example, which are open source but not in PyPI (or any distribution). It saves so much pain.
> As for separation of code and configuration, I simply mean that in my case I use Debian packages to deploy all of our software.
Well we do the same but the Debian packages contain code _and_ the virtualenv.
> Overall, I think we are saying the same thing, with slightly different tools we normally reach for.
Mostly yes. Just wanted to add the pypi tip as it hasn’t been mentioned yet in this thread.
Nice tip about the private PyPI server. I might use it in the future if we end up in the same situation again.
Also, one more nice thing about using distro packages for your own deploys: if you ever need to mix Python with anything else it is a godsend. For example, at one point I had a network sniffer/userland packet forwarder that was written in C deployed alongside our Python daemons. Debian packages do not care what you put in them: C, Python, JavaScript or pure data.
This guy seems to be of the opinion that the software should be completely isolated from the deployment operating system.
I know that's a common view and wrapping as much of the site as possible up in a virtualenv certainly has a lot of advantages. But ultimately, your software is going to have to interact with the OS, at some level, otherwise, why do you even have an OS? So the question is: where do you draw the line? He seems to draw it further down the stack than most people (no system python, for instance) but he doesn't give his opinion on, for instance, using the system postgresql.
Anyway, I personally would draw the line further up the stack than him, but take things on a case-by-case basis, and I don't really consider it an "anti-pattern."
With regards to fabric vs. puppet, I understand the advantages of puppet when you have a complicated, heterogeneous deployment environment. But the majority of projects I've worked on have the operations model of a set of identically-configured application servers back-ended against a database server. For this configuration, what does puppet give you? If the author's argument is that the site may eventually outgrow that model, well, I can see puppet becoming necessary, but why not cross that bridge when you get to it?
I think there's an exaggeration as well; today the trend seems to be "use nothing from the distro".
Ok, sure, MongoDB still changes a lot between versions, in this case you should use the latest version.
But stop there. Especially if you're paying for support (like RHEL)
There should be a good reason for you to compile Apache / MySQL / PostgreSQL / Python. Otherwise, use the distro version. One (common) exception would be "we need Python 2.7 but this ships only 2.6"
Most of the "just download and compile" have no idea of the work that goes behind Linux distributions to ship these packages.
Yes, I'm sure you're going to read all security advisories and recompile all your stack every X days instead of running apt/yum upgrade
Yes you should, but if you have them: do it. That’s what the article says. ;) Please don’t push it in a wrong direction.
What I actually wrote is: because we’re a LAMP web hoster, we compile MySQL+PHP+Apache ourselves. And because we’re a Python shop, we don’t let Ubuntu/Red Hat dictate which Python version we use.
If you use puppet/chef for the whole stack, you gain the ability of just starting a new machine and in a couple of minutes having it configured in the exact same way as all the others.
With fabric and similar systems it's a bit harder. Basically you'd have to write your scripts exactly the same way you'd write a puppet/chef recipe: "make sure this is configured that way, make sure that is installed", etc. (or do migration steps) It's very different from fabric's "do this, do that" approach. Unless you run fabric on every single host after you make every change, some of your infrastructure will be lagging behind.
For example, what do you do when you create a new server, or do an upgrade that involves different dependencies? Run a fabric script that migrates from state X to Y? What happens to new machines then? How do you make sure they're in the same state?
I found chef a very good solution even if I have a single server to manage. No need to think about how it was configured before. Migrating to another provider? Just migrate the data, point the server at chef, done.
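If you do go the Fabric route for this, you end up hand-rolling the idempotence yourself, e.g. (hypothetical snippet):

    # "Make sure X is true" style, written by hand in Fabric.
    from fabric.api import run, sudo, settings

    def ensure_app_user():
        with settings(warn_only=True):
            result = run("id -u myapp")
        if result.failed:
            sudo("useradd --system myapp")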
Point of fact, he does give his opinion on the postgresql issue: "We also have to compile our own Apaches and MySQLs because we need that fine-grained control."
But that's beside the point. I don't think he's arguing that software should be completely isolated from the deployment operating system. That would be absurd, since, as you pointed out, software has to interact with the OS at some level, i.e., to manage system resources. Just because the OS ships with a bunch of packages with specific versions doesn't mean you have to use them. And what I think the OP is saying is that to make your applications portable and easily deployable, you shouldn't.
Good catch. Sorry I missed that. I suppose I've forgotten that a lot of people still use MySQL, even for python. :)
The question "why even have an OS?" was a rhetorical one, meant to illustrate the fact that the argument here is over where one draws the line, between the dependencies you maintain yourself and those you outsource to your OS vendor. Unless you're deploying on Linux From Scratch then you are outsourcing to the vendor at some level, so the only question is, where is that level?
That this guy maintains even his own web and database servers means he wants/needs to control things at a much deeper level than is typical. And I think that's beautiful if it works for him. My point of disagreement is his use of the word "antipattern" to suggest that any other choice is wrong.
First of all I’m afraid the point about Apache & MySQL is misleading and I’ll have to clarify it: We use neither for internal projects. It’s all PostgreSQL. :)
They’re both for our customers as we sell traditional web hosting with a LAMP stack (and yes, we have to compile PHP ourselves too, so we can offer different flavors).
I never said you _have_ to compile everything yourself. In the case of Python I said you shouldn’t inflict on yourself the pain of programming in Python 2.4 just because you’re on RHEL/CentOS 5 or earlier.
System packages have several problems that I outlined too. Most importantly: you add a middleman to the release pipeline, you lose virtualenv, and you risk dependency conflicts: I have apps that depend on both SQLAlchemy 0.6 and 0.7 – they couldn’t run on the same server.
But OTOH we use stock packages as far as possible, because it’s less work. E.g. there’s no reason to compile your own Python on Oneiric and later. Same goes for PostgreSQL, which is up to date.
It’s all about freeing yourself from constraints in the points that matter, not adopting a new religion.
I am glad this is working for the OP, but pushing virtualenv and "self-contained" apps as the one solution is a disservice to the community. There are valid reasons to rely on your OS, assuming you have a homogeneous deployment target (same OS, maybe different versions):
- lots of people argue for virtualenv because some versions may be incompatible. The problem here is the lack of backward compatibility of packages, and frankly, if you need to rely on packages which change their API willy-nilly between e.g. 1.5 and 1.6, or if each of your services depends on a different version of some library, you have bigger problems anyway.
- any sufficiently complex deployment will depend on things that are not python, at which point you need a solution that integrates multiple languages. That is, you're re-creating what a distribution is all about.
- virtualenv relies on sources, so if some of your dependences are in C, every deploy means compilation
- I still have no idea how security is handled when you put everything in virtualenv
> pushing virtualenv and "self-contained" apps as the one solution is a diservice to the community.
Wow. :(
> There are valid reasons to rely on your OS, assuming you have an homogenous deployment target (same OS, maybe different versions):
I’d love to hear them.
> lots of people argue for virtualenv because some versions may be incompatible. The problem here is the lack of backward compatibility of packages, and frankly, if you need to rely on packages which change their API willy-nilly between e.g. 1.5 and 1.6, or if each of your services depends on a different version of some library, you have bigger problems anyway.
Well, you said there are possibilities of problems but that they shouldn’t matter in an ideal world. Maybe you’re okay with taking the chance but I’m not. Every piece of code I deploy has been tested rigorously against a certain set of versions, and that is the only combination of dependencies I’m willing to consider “working”. Unit tests run with different dependencies are just as worthless as integration/functional tests against SQLite instead of the same DB type as in production.
There’s even the possibility that your code works because of a bug, and when that one gets fixed, your app goes south because of some weird side effect.
> any sufficiently complex deployment will depend on things that are not python, at which point you need a solution that integrates multiple languages. That is, you're re-creating what a distribution is all about.
I’m not sure if I understand what you mean, but yes if you want to use certain features outside the Python ecosystem, you’ll have to buckle up and package them yourself too. “We can’t do that, package XYZ is missing/too old.” isn’t really a good excuse to not do something that is important/good for your business. And that’s one of the main points of the article.
> virtualenv relies on sources, so if some of your dependences are in C, every deploy means compilation
That’s wrong if you go the way described: The virtualenv is packaged with the code. Build tools don’t belong on production servers.
> I still have no idea how security is handled when you put everything in virtualenv
Just as everywhere else. If you think it’s okay to tell customers that their data has been hacked because debian was too slow to issue a fix, be my guest. We can’t afford that. What happens on my servers security-wise is _my_ responsibility, and using ancient versions of Python libraries just to be able to blame others for FUBARs is not a solution in my book.
> I’m not sure if I understand what you mean, but yes if you want to use certain features outside the Python ecosystem, you’ll have to buckle up and package them yourself too.
I think his point is that pip can't do this, but learning to actually work with your distro's packaging system properly results in a more powerful and easier to redistribute project.
> Just as everywhere else. If you think it’s okay to tell customers that their data has been hacked because debian was too slow to issue a fix, be my guest.
Your packaging methodology is not what gives you security there. It's that you noticed a vulnerability and deployed a fix. The method of deployment is irrelevant. Your point is that knowing you have a security issue and waiting for upstream to get around to fixing it isn't always acceptable. That goes for everything.
If you know how to roll .debs it's just as easy to patch and release a fixed version of a library. (Or even install the pip version earlier in your sys.path...)
> I think his point is that pip can't do this, but learning to actually work with your distro's packaging system properly results in a more powerful and easier to redistribute project.
Absolutely. And that’s why I package the whole virtualenv into the DEB along with the project. I always have the assertion, that that combination of packages that’s inside passes all my tests no matter where I install it.
I disagree. IMO dynamically loaded libraries were a known bad solution even before they were widespread (DLL hell). Plenty of exploits have come from them, plenty of horror stories, etc. If you have a nice distro anyway, upgrading a given library that everyone packages should be doable.
Windows still uses shared libraries but at least they invented the GAC so applications can specify exactly which version of a library they work with and the installer will install that version if it's not present.
Using tmux is a Python daemon antipattern? And then "there's so much wrong about this approach" that he doesn't bother explaining why? Isn't that why we are reading the article: because we want to know why?
If the author is trying to convince people to change their habits, he is doing a crummy job. He comes across as elitist and "if you don't do it my way you're wrong".
Wow, I always thought of myself as an idiot for doing this. But that's for some not-yet-launched thing. Who on earth does this for a production website?
The ability of Go to produce a single, self-contained executable is one of the biggest advantages it has over the "scripting" languages. It makes deployment so much simpler.
Virtualenv is a half solution and a hack. Use vagrant and VMs. There's a whole sea of libs and software that isn't "versioned" by pip/virtualenv.
supervisord is the wrong solution. It answers the wrong question (is the process running). It's worse than useless in that it has given false positives. The right question is (is the process responding correctly). Use monit or something else that actually does what's needed.
Vagrant is nice for testing but I’m not going to deploy services to VirtualBox (we use kvm for that). Of course there are dependencies outside of the Python ecosystem, but their influence has proved to be negligible till now. YMMV.
supervisor is not an alternative to monitoring, and I never claimed that. But that’s a whole different story.
Personally I would just go for virtualenv, but you'd be surprised what people do with virtualization. I've mostly seen it in Windows shops, but it's by no means limited to that. A common "anti-pattern" I've seen is to build a vmware image, or set of images, that run your "environment". Developers then spin up copies and request changes they need to be made to the "master copy".
When it's time to test and deploy, the code is deployed to a vmware image, which is then sent to the testers. When everything checks out, the code is once again deployed to a new copy of the image, which is then promoted to production.
One argument we've seen is that it makes it easy to do a rollback of the system, using vmware snapshots. It might be a sensible idea in some cases; I just think it's a bit weird and adds some overhead.
The worst case of this I've seen was at a telco, where you needed to spin up as many as 12 VMs, depending on what you and your team were working on. But then again, that's the same company that read the Perforce guidelines on recommended setup and still decided to just pile all the projects into one repository.
Just a thank you for writing a positive, easy to follow overview with links to more in-depth information. I love when people boil down experience and serve it without a side dish of attitude.