Python on Wheels (pocoo.org)
151 points by donaldstufft on Jan 27, 2014 | 74 comments



Wow, great article. I'd been hearing about wheels for months, but I never really understood why I should care about them (it always seemed to me they solved a problem that didn't exist; namely, binary distributions for pure-Python packages, since you can't support C-extension packages cleanly with wheels anyhow...).

Turns out the answer is:

1) Yes, you can kind of support packages with C extensions, if you're specific and careful.

2) They make server deployments significantly faster.

That's actually pretty neat.
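
For anyone curious, the flow the article describes boils down to something like this (assuming pip 1.5+ with the wheel package installed; the ./wheelhouse path is just an example):

    # build once, on a machine matching production (the slow part happens here)
    pip wheel --wheel-dir=./wheelhouse -r requirements.txt

    # on each server: install only from the prebuilt wheels, no compiling
    pip install --no-index --find-links=./wheelhouse -r requirements.txt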

...but damn, they still look like a massive pain to work with.


I can confirm the pain. Wheels should be transparent, and you shouldn't need to care about them, but that's absolutely not the case. Also, it's the responsibility of the PyPI package maintainer to push wheels, but hardly any do, so you end up building your own wheels when this should already have been done on PyPI, which is silly. And one last thing: https://twitter.com/mitsuhiko/status/426700148409135104 "It turns out, Python binary wheels on Python 2 are rejected by PyPI, I suppose because of the lack of UCS tag in the filename."

Apart from all of this pain, I'm very happy to be able to use them.

I'm really waiting for someone to build a "wheel on demand" website that would act as a proxy to PyPI. Or to turn PyPI into a wheel farm (I don't know how realistic that idea is).


<- PyPI Administrator.

I have plans for either turning PyPI into a build farm, or making a secondary service that acts as a build farm for PyPI. We've just been more focused on cleaning up other issues.


After being bummed by the removal of 'bundle' in pip and writing an article about how we moved to using wheel for faster deployments[1], we moved to using Packer for server builds for our cloud infrastructure. Now we can go back to using a simple 'requirements.txt' file and not worry about install and compile speeds.

Removing 'bundle' was painful for us, and I even contributed a bunch of code to 'pip2pi' [2] to make it easy to set up an S3 bucket as a "local" pip mirror for our EC2 infrastructure, but it was all bandaid after bandaid.
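
For reference, the pip2pi flow was roughly this (bucket name and paths are made up for illustration):

    # build a pip-compatible "simple" package index from our requirements
    pip2pi ./packages -r requirements.txt

    # sync the index to S3 (bucket name hypothetical)
    aws s3 sync ./packages s3://example-pip-mirror/ --acl public-read

    # on the EC2 boxes, point pip at the mirror
    pip install --index-url=https://example-pip-mirror.s3.amazonaws.com/simple/ -r requirements.txt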

So far, Packer has felt much more elegant (despite the effort it took to put it in to our workflow) than wheel, pip2pi, or any other solution.

[1] http://tech.flyclops.com/replacing-pip-bundle-374

[2] https://github.com/wolever/pip2pi

(Edited for grammar.)


Packer's GitHub readme is way too short. Is Packer a replacement for pip? Do you have links to learn more about this tool? We're having similar problems, and I'm looking for alternatives.

Thanks for these links!


For a language that prides itself on simplicity, this seems really painful...


Yes, it is. Coming to this mess from the joy of bundler/gems, npm, or hell, even CPAN will ruin your afternoon.

It's just an unfortunate circumstance of the community not converging on well-designed standards quickly enough, and so the rocky soil has sprouted strange, twisty crops. There are articles on how the setuptools/distutils frankenhack pretty much set off the whole trainwreck that you see today, e.g.: http://blog.startifact.com/posts/older/a-history-of-python-p...

Armin (the OP) has also written about this sad history previously: http://lucumr.pocoo.org/2012/6/22/hate-hate-hate-everywhere/


Would be nice if there were one packaging tool all those languages were using instead of each one reinventing it.


Something that integrated into your operating system distribution perhaps....


Unfortunately most linux systems do not support multi-version installs or installs local to a user or path.


This comes very close: http://nixos.org/nixos/


An OS solution which allowed for versioned packages to be installed on a per-project basis. That'd be something…


aptly put. you might be onto something.


Groan. Well put, but still, groan. 8-)


What's apt going to do for me on Redhat/CentOS?

The problem with package managers is the same as npm/gem.


not much, but it was harder to work yum into a relevant sentence.

..yum.


> or even CPAN?

CPAN, particularly cpanminus, is excellent.


Also Maven. It may require an XML manifest, but it's robust, easy to use, and has become a solid standard. Python may be a (mostly) nice language, but the infrastructure and tooling badly lag the Java world.


Managing dependencies correctly is probably the most important long-term problem to solve in programming, but for some reason it always seems to be given a low priority by language authors and communities, until one day it's become a tangled mess that isn't logistically possible to get out of. I don't get it, and it's frustrating.[1]

[1]: Somehow, all my lines aligned perfectly in this comment box at its default size! https://www.dropbox.com/s/slq8x0rqkj7ydzn/aligned.png


> Managing dependencies correctly is probably the most important long-term problem to solve in programming

... and naming things


It is interesting to think of this in terms of basic computer science problems. A package is a cache of the developer's master version, which needs to be invalidated. It has to be named in such a way that you can decide if it is the correct version. And sometimes it needs to be handled in a transactional way so that you have a view of the data that is consistent.

Put like that, the reliable solution would be a locally replicated SQL database. Perhaps PostgreSQL or SQLite would be a better target for development than the underlying OS?


A great deal of the pain that Armin experienced is due to the fact that wheels are very new and aren't very polished yet. Additionally, there is a vast amount of technical debt in the packaging tools, so it's quite easy to introduce these kinds of problems because of the general architecture (something that we're trying to fix long term!).


Is there any reason to believe wheels will ever become polished, or might they simply be replaced with the next idea that comes along?

Apologies for acting the cynic, but this is the kind of uncertainty that comes when there is constant reinvention, instead of improvement of existing tools.


You know that simplicity of use is very complex to achieve, and requires complex tools behind it, right?

Actually, it is even true for musical instruments: a piano is very complex inside and can be played relatively easily, while a violin is much simpler in construction but much harder to play.


Just go and copy bundler or even npm.

This is a solved problem with a really bad case of "Not-Invented-here" in the python community.

What's funny is I'm pretty sure gems/bundler and npm/packages were heavily influenced by Python's plethora of crappy attempts at package management (eggs, distutils, etc.).


> Just go and copy bundler or even npm.

Anyone who's worked with all three of those would tell you that this isn't yet a “rest on your laurels” situation.


I've worked with all three (if I count Python's ~4 as 1).

There are improvements to be made to all of them, but gosh, Python is a complete mess in this regard.


Given how npm "scales", I would not say they solved the problem, and npm lacks namespaces. Why can't I name my package the way I want?


As opposed to Python's "name it whatever you like because no one will ever find it" naming convention?

Every solution has drawbacks, but python's solution has almost every drawback.


Well, a synth trumps both in complexity, but it's very easy to play, so the analogy stops there.


On the contrary, a synth is the most complex instrument and the easiest to play; that's my point.


Alright then, how about a kazoo? Triangle? Or at the other end (complex AND difficult to play) a pipe organ?


I didn't mean to claim that 100% of what could be called a "musical instrument" fits the pattern perfectly; I was merely illustrating that simple interfaces often require complex systems.

Another illustration, in reverse: chopsticks are among the simplest tools you can imagine, and they require a somewhat steep learning curve, but once mastered they are really the most powerful food-grabbing tool ever invented.

The Chinese brush is of the same nature: extremely simple, very hard to master, and very powerful (expressive) once mastered. Just like the violin (and VIM, and git, etc.)

On the other side of the spectrum you have the synth, IDEs, Google search, etc. Simple UIs, simple to learn, but internals are very complex.


It is like a farce on the "there shall only be one way to do it" mantra. Not to mention that some of the package managers have to be installed through one of the other package managers, creating a very long dependency chain if you want to install a package through the former.


The more that I use Python in the context of other languages, the more the "there should only be one [obvious] way to do it" mantra irks me.

First of all, there are the cases in which reality fails to live up to expectations, like this packaging debacle. If there were an obvious and best way to do everything, we wouldn't need to spend so much money and time engineering software. We wouldn't spend so much time arguing about distributed vs. centralized, OOP vs. declarative vs. functional, type systems, build/test systems, etc. The mantra is arrogant on its face.

Secondly, it flies in the face of innovation. It works against building a creative community of users. So often, when Python users point out the holes in the ecosystem, they are simply told "well, the long and outdated PEP 5991834 already establishes the obvious way to do it" or "just shim it on top of this inadequate corner of the standard lib", or the use case itself is questioned, which patronizes the critic. The mantra becomes an excuse for reinforcing a hive-mind philosophy.

In this case, I wonder if it has subtly undermined the development of a diverse and widely supported package index. The whole point of such a package system is to support the belief that there are actually many ways to do the same thing, and some may be better for certain styles/teams/circumstances. Having the opposite outlook as a design principle for a language will attract users less inclined to contribute to or build such a diverse packaging system. I wonder this particularly because the more "laissez-faire" languages like Perl, Ruby, and JavaScript are precisely the ones that have such great packaging systems and ecosystems.


Python 3.4 will have a pip installer bundled with it, which removes the easy_install pip farce.
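
That's PEP 453, the new ensurepip module. Roughly:

    # Python 3.4+ can bootstrap pip itself, no easy_install needed
    python3.4 -m ensurepip
    python3.4 -m pip --version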


No, the mantra is actually "There should be one-- and preferably only one --obvious way to do it."

And yes, one usually installs easy_install to install pip, but I don't see how that can create a "very long dependency chain". Even when both tools are installed, you can use either to install packages.


The recommended way to install pip has not been via easy_install for a while. Generally we recommend using the ``get-pip.py`` script, which with recent versions will also install setuptools for you.
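
That is, something like this (the script lives in the pip repo; the exact URL may move):

    # download and run the official bootstrap script; recent versions
    # will also install setuptools if it's missing
    curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
    python get-pip.py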


Spent some time this week moving deployment of our venvs over to wheels (we install numpy and matplotlib). Now, instead of close to 10 minutes of deployment time, it's done in about 30 seconds. I can live with the fact that I'll need to make changes to the deployment process now and then while wheel travels towards 1.0, but I'm happy to go that route. And you should be too.
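
Concretely, the split looks like this (paths are just examples):

    # once, on a build host matching production: the slow compile step
    pip wheel --wheel-dir=/var/cache/wheelhouse numpy matplotlib

    # per deploy: fresh venv plus prebuilt wheels, seconds instead of minutes
    virtualenv /srv/app/venv
    /srv/app/venv/bin/pip install --no-index --find-links=/var/cache/wheelhouse numpy matplotlib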


Isn't there a way to make a relocatable virtualenv? I'd be inclined to just do that, compress it, and distribute the entire virtualenv (our deploy environment is homogeneous in the extreme).


Conda (http://conda.pydata.org) is a more general packaging format than wheels, and we use it to distribute the scientific Python stack (including complex C dependencies) with ease. It solves this fundamental problem now, with all the benefits and without waiting for something in the future. See these blog posts for more information: http://www.continuum.io/blog/conda_packaging and http://technicaldiscovery.blogpost.com

In particular look at the `conda package` command for an easy way to bundle up any non-conda packages across a homogeneous environment.


FYI: I think your last link should be "http://technicaldiscovery.blogspot.com/". There doesn't seem to be an active site at blogpost.com.


Yes, thank you! You have the correct link.


A few years ago I wrote a tool to do exactly this [1], which also works for any executable/library, not just Python code. It works by compiling the code with a unique virtual prefix that is something like /tmp/boxes/4031e76a-6bff-11e3-a3b0-002590a9f2cc so that it can be symlinked to any directory in the filesystem. The tool also generates a script that sets environment variables so that includes, libraries, executables, man pages, etc. are all found in the path. It also allows installing several versions of the same package and instantly switching from one version to another, which is very useful during development.

Unfortunately I didn't have more time to work on it, so it is basically unmaintained, but every now and then I still use it, mainly when I need to install software on machines where I don't have administrative privileges, and it works great. I wish I had implemented some kind of dependency handling mechanism; it would have made it much more useful.

[1] https://github.com/ot/bpt


There is a way to make a virtualenv relocatable, but it has some issues where it doesn't always work very well. I'm not sure offhand what those issues are, though.
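
For reference, the flag is below; IIRC the docs call it somewhat experimental, so treat it as best-effort:

    # attempts to fix up the venv's scripts and make its .pth files relative
    virtualenv --relocatable /srv/app/venv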


Isn't that basically docker?


no. oversimplifying here, but..

virtualenvs are just overrides to environment variables that change where things should look to find the python interpreter, libraries, modules, et al.
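
Roughly, all that "source venv/bin/activate" does is:

    # the essence of bin/activate, minus prompt and deactivate bookkeeping
    export VIRTUAL_ENV=/path/to/venv
    export PATH="$VIRTUAL_ENV/bin:$PATH"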

docker is much more involved, using LXC to force process isolation, an effective chroot, networking restrictions, etc.

many (most?) folks' current usage of virtualenv would not be portable to docker without substantive work to accommodate the restrictions that docker imposes.

(note: this isn't a critique of docker; those things are features. they're just orthogonal to this)


A great introduction to wheels. I'm still not sure if wheels really are a good idea though. For simple use cases traditional pip seems fine; for more complex use cases buildout seems better? Having tried to install numpy and pygame in a virtualenv, I do see a need for something better -- but I'm not convinced wheels will solve the problem of complex C library dependencies...?


This is a nice post, and the Python packaging community is making strides in the right direction.

However, it's not there yet. Armin could only bring himself to say this in his post: "It's there, it kinda works, and it's probably worth your time."

Conda is not perfect (it needs more people building conda binaries, as well as a few pull requests to add https improvements), but I will confidently say "it is there, it does work now, and it is worth your time."

This is especially true with a package repository like the one at http://repo.continuum.io and the many personal repositories at http://binstar.org.

People that have used conda for deployment (especially of complex C-dependencies in the NumPy and PyData stack) are saving time and effort today.


I use wheels to speed up the tests for sorl-thumbnail: https://github.com/mariocesar/sorl-thumbnail/blob/master/.tr... Initially I wanted to use wheels for the entire environment, but I found that some packages didn't work; now I use them for just some critical packages, which reduced the 12-minute test run to 3 minutes.

Would love to see wheels getting more attention; this really feels like a great direction for Python packaging.


A lot of people in the PyData ecosystem using NumPy and Pandas are using conda and conda packages (http://conda.pydata.org). Conda packages are easy to build, and many exist at repo.continuum.io and on channels at http://binstar.org


Love sorl-thumbnail, thanks for all the great work!



Ugh. I'm so sick of the open source communities' disdain for binaries. There's no reason we should be recompiling shit everywhere. It's a waste of time, headspace, and processing power.


When you're trying to support multiple architectures and OSes and your ABI took a decade or two to settle down, and all your tools are open-source so you have the source anyways, I can see the instinct to just say "I'll just compile it against the target machine" even if it's absurdly wasteful.


Popular binaries can be uploaded, with a fallback to building from source.


I thought it was another web framework ...


Just like Ruby on Rails :)



I've had the exact same experience as the author of this post. Eggs are great if you just want to install a python library. Horrible if you want to modify the library or distribute a program.

I personally use a mixture of setuptools and distutils2, and it's a confusing mix.


"What the command is supposed to do is to collect all the dependencies and the convert them into wheels if necessary..."

How does it convert the non-wheel packages into wheels?


On a slight tangent: does anyone else think that Docker is probably a better solution than virtualenv in most cases? I never really got on with virtualenv.


You can also try conda's environments, which are a system-level concept and an even lighter-weight approach to some of the problems people are using Docker to solve. If you just need an independent software environment, conda gives you that while sharing the files that can be shared. http://conda.pydata.org
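
For example (assuming conda is installed):

    # create and use an isolated env with its own python and packages
    conda create -n myapp python=2.7 numpy
    source activate myapp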


The subset of problems that virtualenv solves that are also solvable by Docker is pretty tiny. But sure if you have problem that can easily be solved by Docker, then Docker is probably a better solution.


Really? I was thinking you just set up a new Docker env for each python project, then you don't need to worry about conflicting with libs from other projects.

What am I missing?


The most obvious one is that Docker only runs on a small subset of the platforms Python runs on.

Can docker easily access the host filesystem or do you have to copy all your data into each docker env?

Can I access my GPU for doing CUDA/OpenGL/OpenCL things from within docker?

Does docker play nicely with GUI apps?

Calling into a running docker env from an external program (when you are using an app that has python as a scripting or plug-in language) doesn't work as far as I know.

The overhead of setting up, tearing down, and switching between several Docker envs seems to be higher than with tools like virtualenvwrapper or conda.

Now there may be workarounds for these (and all the other) problems but on the surface they don't seem easier than virtualenv.


Docker runs on any Linux. There are a few dependencies, but the list is shrinking very fast. You don't even need AUFS, as you can use devicemapper or even a regular copy if you have neither.

Yes, docker can easily access the host's filesystem with volumes.
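
For example:

    # mount a host directory into the container at /data
    docker run -v /host/data:/data ubuntu ls /data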

Yes, you can access your GPU. I am using docker for mining.

Yes, Docker plays nicely with GUI apps; there are plenty of GUI use cases with docker (docker desktop, Firefox, etc.).

You can do anything you want within a python app and docker. There is a docker-py library that gives access to all docker features.

You can take a look at this: https://github.com/jpetazzo/stevedore in order to easily switch envs.


Sounds like it's high time I take a closer look at docker


Ok, thanks for the reply. Good questions.

I was mainly thinking in the context of developing webapps.

I'm interested in this enough that I intend to investigate and answer all your questions, perhaps in a blog post. I don't know enough right now to answer with certainty. However:

- it will currently only work well if you are developing on Linux. MacOS isn't bad but requires vagrant/virtualbox.

- I suspect switching between Docker envs may be faster, easier, and cleaner than virtualenv, which is why I suggested it.

- There is definitely support for sharing data.


And there's boot2docker for OSX that's even more lightweight than Vagrant - https://github.com/steeve/boot2docker.


You're also forgetting that Python itself is a core part of most modern Linux distributions. This means that even for a minimal install of an OS, there are often packages already inside your site-packages that your operating system depends on for a minimal set of functionality. Docker won't isolate you from these. It's a special kind of pain when you require one version of something and your OS itself requires another version.


Interesting. I've only been working with Docker for a couple of weeks, so I probably have some misconceptions, but I thought that only the kernel was shared with the container host. Of course, you will be installing an OS into the container, and that will bring along its own dependencies. But if you run into this situation then there isn't anything to prevent installing virtualenv in the container too. From my perspective working in the context of server-side python deployment, Docker is looking pretty attractive as an alternative to virtualenv. It allows me to construct an image with the specific dependencies I need, and then run it anywhere.
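
And nothing stops you from using both; inside the container it's still just:

    # keep app deps out of the container OS's site-packages
    virtualenv /opt/app/venv
    /opt/app/venv/bin/pip install -r /opt/app/requirements.txt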


Yeah, that's a very good point.

Thankfully I've never run into this issue.


what about pex?



