Bundling binary tools in Python wheels (simonwillison.net)
110 points by goranmoomin on June 17, 2022 | 32 comments



I have just used this trick to package Node.js as a Python package [0], so you can just do:

  pip install nodejs-bin
to install it.

This works particularly well in combination with Python virtual environments (or docker), it will allow you to have a specific version of Node for a particular project without additional tooling. Just add "nodejs-bin" to your requirements.txt.
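For example, once it's installed in a venv you can drive the bundled Node from Python. A rough sketch (this just relies on a node executable being on the venv's PATH, not on any particular nodejs-bin API):

  # Assumes nodejs-bin is installed in the active virtual environment
  # and puts a `node` executable on the PATH.
  import shutil
  import subprocess

  node = shutil.which("node")
  if node is None:
      raise RuntimeError("node not found; is nodejs-bin installed in this venv?")

  # Run a one-liner through the bundled Node.js interpreter.
  result = subprocess.run(
      [node, "-e", "console.log(process.version)"],
      capture_output=True, text=True, check=True,
  )
  print(result.stdout.strip())  # e.g. v16.x.x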

You can also add it as an optional dependency of another Python package. That's my plan for the full-stack Python/Django/Alpine.js framework I'm building (Tetra[1]); that way users will be able to optionally run this to set up the full project, including Node.js:

  pip install tetraframework[full]
The framework uses esbuild and will use PostCSS. The choice was either to package them with the framework (including Node.js), or to create a way for Python developers to easily install Node.js using the tools they are most used to.

It's brand new; I have only packaged Node.js v16 so far. Once I'm happy there are no show-stopping issues, the plan is to release wheels for current and LTS Node.js versions as they come out.

Developer UX is my number one priority and removing the need for two toolchains is an important part of that.

[0]: https://pypi.org/project/nodejs-bin/

[1]: http://tetraframework.com


That's great! It could be used to avoid needing multi-stage builds in Docker when using a JS frontend in a Python app. Assuming "pip uninstall nodejs-bin" works cleanly, you also have an easy way of keeping the image slim by uninstalling it after the frontend build.


Thanks!

Yes, "pip uninstall nodejs-bin" will remove the full nodejs install from your python environments site_packages. It won't remove node_packages however, you will still need to handle that yourself depending on use case.


The Python wheel "ecosystem" is rather nice these days. You can create platform wheels and bundle in shared objects and binaries compiled for those platforms, automated by tools like auditwheel (Linux) and delocate-wheel (OSX). These tools even rewrite the wheel's manifest and update the shared object RPATH values and mangle the names to ensure they don't get clobbered by stuff in the system library path. The auditwheel tool converts a "linux" wheel into a "manylinux" wheel which is guaranteed by the spec to run on certain minimum Linux kernels.

Wheels are merely zips with a different extension, so in the worst case, for projects where these automation tools fail you, you can do it yourself. You simply need to be careful to update the manifest and make sure shared objects are loaded in the correct order.
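For instance, here's a quick way to poke at a wheel's contents with nothing but the standard library (the filename is just an example):

  # A wheel is a zip archive, so the stdlib can inspect (or repair) one.
  import zipfile

  with zipfile.ZipFile("example_pkg-1.0-cp310-cp310-manylinux_2_17_x86_64.whl") as whl:
      for name in whl.namelist():
          # Includes the .dist-info/RECORD manifest and any bundled .so files.
          print(name)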

On the user side, a `pip install` should automatically grab the relevant platform wheel from pypi.org, or, if there is not one available, fall back to trying to compile from source. With PEP 517 this all happens in a reproducible, isolated build environment, so if you manage to get it building on your CI pipeline it is likely to work on the end user's machine.

I'm not sure what the state of wheels on Windows is these days. Is it possible to bundle DLLs in Windows wheels and have it "just work" the way it does for (most) Linux distros and OSX?


FWICS, wheel has no cryptographic signatures at present:

The minimal cryptographic signature support in the `wheel` reference implementation was removed by dholth;

The GPG ASC signature upload support present in legacy PyPI and then the warehouse was removed by dstufft;

"e2e" TUF is not yet implemented for PyPI, which signs everything uploaded with a key necessarily held in RAM; but there's no "e2e" because packages aren't signed before being uploaded to PyPI. Does twine download and check PyPI's TUF signature for whatever was uploaded?

I honestly haven't looked at conda's fairly new package signing support yet.

FWIR, in comparison to legacy python eggs with setup.py files, wheels aren't supposed to execute code as the user installing the package.

From https://news.ycombinator.com/item?id=30549331 :

https://github.com/pypa/cibuildwheel :

>>> Build Python wheels for all the platforms on CI with minimal configuration.

>>> Python wheels are great. Building them across Mac, Linux, Windows, on multiple versions of Python, is not.

>>> cibuildwheel is here to help. cibuildwheel runs on your CI server - currently it supports GitHub Actions, Azure Pipelines, Travis CI, AppVeyor, CircleCI, and GitLab CI - and it builds and tests your wheels across all of your platforms


You're right: both the infrastructure and the metadata for cryptographic signatures on Python packages (wheels and sdists alike) aren't quite there yet.

At the moment, we're working towards the "e2e" scheme you've described by adding support for Sigstore[1] certificates and signatures, which will allow any number of identities (including email addresses and individual GitHub release workflows) to sign for packages. The integrity/availability of those signing artifacts will in turn be enforced through TUF, like you mentioned.

You can follow some of the related Sigstore-in-Python work here[2], and the ongoing Warehouse (PyPI) TUF work here[3]. We're also working on adding OpenID Connect token consumption[4] to Warehouse itself, meaning that you'll be able to bootstrap from a trusted GitHub workflow to a PyPI release token without needing to share any secrets.

[1]: https://www.sigstore.dev/

[2]: https://github.com/sigstore/sigstore-python

[3]: https://github.com/pypa/warehouse/pull/10870

[4]: https://github.com/pypa/warehouse/pull/11272


We bundle DLLs in our wheels in such a way that it "just works" for the user, but it kind of feels like a hack. First, a main DLL is built completely separately from the wheel. Then a binary wheel is built where the .pyd file basically just calls functions from that main DLL directly. The main DLL is then manually included in the wheel during the build step. Any dependent DLLs can also be manually included inside the wheel as well.
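The loading side ends up looking roughly like this in the package's __init__.py (a sketch with placeholder names, not our exact code):

  import os
  import sys

  _pkg_dir = os.path.dirname(os.path.abspath(__file__))

  # Python 3.8+ on Windows no longer searches the package directory for DLLs,
  # so the bundled DLLs have to be added to the search path explicitly.
  if sys.platform == "win32" and hasattr(os, "add_dll_directory"):
      os.add_dll_directory(_pkg_dir)

  from . import _native  # the .pyd that calls into the bundled main DLL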


Still no way to publish a wheel for musl-based Linux though, is there?


musl-based wheels can use the `musllinux` tag, as specified in PEP 656[1].

For example, cryptography publishes musl wheels[2].

[1]: https://peps.python.org/pep-0656/

[2]: https://pypi.org/project/cryptography/#files


This is also the basis for the postgresql-wheel package, a Python wheel that contains an entire local PostgreSQL binary installation:

https://github.com/michelp/postgresql-wheel

Out of sheer laziness I used CFFI to encapsulate the build, even though there is no dynamic linking involved. There's probably a simpler way but this approach has worked very well for disposable database tests.
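For illustration, a disposable-database fixture can be as small as the sketch below. This is not postgresql-wheel's actual API; it just assumes the bundled initdb and pg_ctl binaries are reachable on PATH, and the port is arbitrary:

  import subprocess
  import tempfile

  import pytest

  @pytest.fixture
  def temp_postgres():
      with tempfile.TemporaryDirectory() as datadir:
          # Create a throwaway cluster and start it on a non-default port.
          subprocess.run(["initdb", "-D", datadir], check=True)
          subprocess.run(
              ["pg_ctl", "-D", datadir, "-o", "-p 54321", "-w", "start"],
              check=True,
          )
          try:
              yield "postgresql://localhost:54321/postgres"
          finally:
              subprocess.run(["pg_ctl", "-D", datadir, "-m", "immediate", "stop"], check=True)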


Very cool! Might be useful in CI scenarios.


About 10 years ago, before Docker, and before (I knew about?) conda, I was using Python packages with binaries inside to create reproducible virtual environments with consistently versioned binary tools to go along with all our Python code. We were avoiding virtualization and running on bare metal so this was the best way to distribute code across a cluster.

I added small setup.py scripts to a dozen open source C and Java packages, and deposited them on an internal package server.

It worked OK, and unlike Conda, to this day you can still run an install without burning through CPU-hours of SAT-solving.

Biggest problem was if there were packages that wanted different versions of the same dependency.


> Biggest problem was if there were packages that wanted different versions of the same dependency.

How often and in what contexts would you encounter this version dependency issue? Was there a good solution or would you work around the issue by finding compatible versions?


This never presented a real problem in practice, but IIRC pip just silently installs over the other version.

And since we had a closed system, it was entirely on us to set it up correctly. It was just a discovery a couple years in that we hadn't properly thought of ahead of time.


FWIW conda has mamba as an option now, which (mostly) fixed the slow SAT solver.


Mamba is completely unusable for me too.


In a similar vein: Python projects can use `auditwheel` to automatically relocate (fixup RPATHs) and vendor their "system" dependencies, such as a specific version of `zlib` or `libffi`[1].

[1]: https://github.com/pypa/auditwheel


It will fail if you're using scikit-build though. I had to write a patcher to fix rpaths post-auditwheel.


next step is to use https://github.com/jart/cosmopolitan to build the binary


Awesome I'm gonna check this out for my bundle!


Suppose you're trying to distribute a piece of binary software that comes in different feature sets: one build includes only the client applications, while another includes both the client and server applications. Both depend on the same shared library that needs to be included with the distribution, and both expose the same Python API, so you can't separate the features into isolated packages. What's the best practice for packaging each bundle as a wheel?

Would you name one package `software_server`, another `software_client`, but call the shared python component `software` in both of them?

Also, how do you manage versioning when the version of the binary software is independent of the version of the Python wrapper around it?


This is not possible using wheels. Binary wheels cannot depend on shared libraries outside the wheel, besides a small number of system libraries.

If you want to distribute a Python package like this, you could use the conda-forge ecosystem and have a package for the base library, and then other packages depending on that, e.g.

foo-core, foo-server, foo-client


You can't depend on shared libraries outside the wheel, but you can include the library inside the wheel. The applications all expect the shared library they depend on to be in a path relative to the location of the application, which prevents putting the shared library in a separate package anyway. So how would you handle this situation with wheels if you don't have the option of forcing users into a specific Python distribution?
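For context, loading the vendored copy looks roughly like this (placeholder names; the point is that the library gets resolved relative to the package rather than the system search path):

  import ctypes
  import pathlib
  import sys

  _here = pathlib.Path(__file__).resolve().parent
  _libname = {
      "win32": "software.dll",
      "darwin": "libsoftware.dylib",
  }.get(sys.platform, "libsoftware.so")

  # Load the copy shipped inside the wheel, next to the Python package.
  _lib = ctypes.CDLL(str(_here / _libname))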


Maybe I misunderstood. After rereading, I think so...

I thought you wanted to distribute a shared library in one package and then have other packages depend on that.

So it's not clear to me what your actual situation is.


This sort of thing sounds like something better left to a proper package manager like Nix. Reproducibility and a complete top-to-bottom dependency chain are key in making sure a package works well into the future.


Which is all nice in theory, except you have to meet people where they are, and that's using pip and publishing on PyPI. Unless you can auto-nixify an arbitrary Python package, and in this case an arbitrary binary package, you're gonna end up doing a lot of work to get that reproducibility.

I don’t use pip because I like pip, I use it because that’s where the software I want is.


Nix has some serious ergonomics issues that need to be addressed. This is dead simple by comparison. Wheels are just zip archives with some specific semantics. Like, I wanted to get into Nix but it's super daunting and there's no real way to gradually migrate a project into it - it's very all-or-nothing.


The issues with this were always:

* size of wheels you can upload is constrained by PyPI

* difficult to support multiple versions across multiple operating systems, unless you provide a source distribution, which is then…

* Still a nightmare on Windows


> size of wheels you can upload is constrained by PyPI

I feel PyPI is pretty generous with its limits. You can even request more once you hit the ceiling; I think it's around 60MB [1]. There are some wheels that are crazy large: tensorflow-gpu wheels [2] are around 500MB each. I think there are discussions out there trying to find ways of alleviating this problem on PyPI.

> difficult to support multiple versions across multiple operating systems, unless you provide a source distribution, which is then…

This can be a problem, but I've found that the situation has improved quite a lot recently. You can create manylinux wheels for x86, x64, and arm64, which cover a lot of the Linux distributions using glibc. A musllinux tag was recently added to cover musl-based distributions like Alpine. macOS wheels support x64 and arm64, and can even be a single universal2 wheel. Windows is still purely x86 or x64 for now, but I've seen some people working on arm64 support in CPython, and once that's in I'm sure PyPI won't be too far behind. There are also some great tools like cibuildwheel [3] that make building and testing these wheels pretty simple.
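As an aside, the packaging library that pip relies on can list which tags the current interpreter will accept, which is handy when deciding which of these wheels to publish (a small sketch):

  from packaging import tags

  # Most-preferred tags come first, e.g. cp310-cp310-manylinux_2_17_x86_64.
  for tag in list(tags.sys_tags())[:10]:
      print(tag)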

> Still a nightmare on Windows

I’m actually curious what is a nightmare about Windows. I found that Windows is probably the easiest of all the platforms to build and upload wheels for. You aren’t limited to a tiny subset of system libs like you are on Linux, and building them is mostly the same process. Probably the hardest thing is ensuring you have the correct VS build kit installed, but that’s not insurmountable.

[1] https://pypi.org/help/#file-size-limit

[2] https://pypi.org/project/tensorflow-gpu/#files

[3] https://github.com/pypa/cibuildwheel


I’ve done this for static-ffmpeg with the added bonus of downloading ffmpeg binaries on first use.
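The first-use download is roughly this pattern (a simplified sketch, not the actual static-ffmpeg code; the URL and cache path are placeholders):

  import pathlib
  import stat
  import urllib.request

  CACHE = pathlib.Path.home() / ".cache" / "my-tool"
  BINARY_URL = "https://example.com/ffmpeg-linux-x86_64"  # placeholder

  def ensure_binary() -> pathlib.Path:
      """Download the binary the first time it is needed, then reuse it."""
      CACHE.mkdir(parents=True, exist_ok=True)
      target = CACHE / "ffmpeg"
      if not target.exists():
          urllib.request.urlretrieve(BINARY_URL, target)
          # Make the downloaded file executable.
          target.chmod(target.stat().st_mode | stat.S_IEXEC)
      return target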


That's a bit different than putting it in the wheel. It will not work with tools for vendoring, caching, or mirroring dependencies. It will also stop working if the separate place you get the binaries from breaks or runs out of money or purges that specific version from their site.


I built a Python version and dependency manager; the program is a standalone executable written in Rust. One installation method is as a wheel, using this trick. It's hosted on PyPI, so users can install it with pip etc.
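The wheel side is essentially a console_scripts entry point that re-executes the bundled binary (a sketch with placeholder names, not the exact implementation):

  import os
  import pathlib
  import sys

  def main() -> None:
      # The native executable is shipped as package data next to this module.
      binary = pathlib.Path(__file__).resolve().parent / "bin" / "mytool"
      # Replace the current process so exit codes and signals pass straight through.
      os.execv(str(binary), [str(binary)] + sys.argv[1:])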



