Underappreciated challenges with Python packaging (pypackaging-native.github.io)
120 points by groodt on Jan 3, 2023 | 71 comments



Pillow has most of the issues that are listed in the article. (Oddly enough for a graphics library, the GPU part is the only one I don't think we've stumbled over at one point or another.)

From a quality-of-life perspective, having the sdist install behind an opt-in flag by default for our package would be great. Unless you're a developer with a lot of -dev packages for imaging libraries already on your system, you're not going to be able to build from source. And even if the error that pops up is along the lines of

  The headers or library files could not be found for {str(err)},
  a required dependency when compiling Pillow from source.
  Please see the install instructions at:
    https://pillow.readthedocs.io/en/latest/installation.html
we still get people posting issues about Pillow failing to install.

Build farms would be nice. We've burned tons of time on it between Travis and GH Actions, and @cgohlke single-handedly making all of the Windows builds for the entire scientific Python community.

Ultimately, something like the Debian packaging system is probably the best open source model for this (though the splitting of the Python standard library so that virtual envs aren't in the base install is a pita). Unstable gets a reasonably current set of packages, and crucially all of the underlying library dependencies are compiled together. It's also not _that_ hard to rebuild individual packages from source in an automated fashion. (This may be what Conda is doing, but I've never looked in detail at their system.)


Could you release the sdist as a separate package and only upload binary wheels for a normal install?


You could, and that's what psycopg2 did for a while (for a different reason, but effectively they separated source and binary releases into different projects). This solves an immediate problem but introduces new challenges, probably the most significant being that Python does not allow OR dependencies ("I want at least one of these"), so splitting the package made dependent projects' lives miserable.


I've been planning to package up a Python project recently, and the internet is annoyingly full of guides which are, I think, out of date; at the very least they suggest quite different things.

I just have a single Python file, meant to be treated as an executable (no package at present). There are a whole bunch of tests, but that's obviously separate. Any suggestions on modern best practices welcome!


If it's pure Python, the only packaging file you need is `pyproject.toml`. You can fill that file with packaging metadata per PEP 518 and PEP 621, including using modern build tooling like flit[1] for the build backend and build[2] for the frontend.

With that, your entire package build (for all distribution types) should be reducible to `python -m build`. Here's an example of a full project doing everything with just `pyproject.toml`[3] (FD: my project).

[1]: https://github.com/pypa/flit

[2]: https://github.com/pypa/build

[3]: https://github.com/pypa/pip-audit
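
For the simplest case, the entire file can be roughly this small (a sketch with a made-up project name; flit also expects a module or package whose name matches):

    [build-system]
    requires = ["flit_core>=3.2,<4"]
    build-backend = "flit_core.buildapi"

    [project]
    name = "example-pkg"
    version = "0.1.0"
    description = "A small example package"
    requires-python = ">=3.7"

With that in place, `python -m build` produces both an sdist and a wheel.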


> including using modern build tooling like flit[1] for the build backend and build[2] for the frontend.

Or you can use setuptools, the package that enables old setup.py builds, as the backend with pyproject.toml. This has the advantages of being mature, unlikely to be abandoned, and possibly familiar if you've used it before. Even then, you can use build as the front-end build tool.


Yes, setuptools is also perfectly fine. It had some rough edges around PEP 621 support for a while, but those were mostly smoothed out in 2021.

(I'll note that maturity is not strong evidence here: distutils is very mature, but is in the process of being deprecated and removed from Python entirely. I don't think that's likely to happen to setuptools, but the fact that behavioral PEPs now exist for all of these tools means that the decline/abandonment of any poses much less of an ecosystem risk.)


I suppose I meant maturity as in: actively maintained and recommended for use for a long period, which doesn't apply to distutils.

And I should've been more upfront about the real reason for suggesting setuptools: there seem to be a number of build tools that support pyproject.toml, including flit, poetry and setuptools (and I'm sure I've seen at least one other). For me, at least, when I was making a small library recently, it was an overwhelming choice for a part of my project that feels like just admin rather than core business logic. I came close to giving up and just using setup.py with `setup()`. At least setuptools with pyproject.toml is a choice that feels safe; it may not be the best, but it will certainly be good enough that I'm unlikely to regret it later, so I didn't need to spend a lot of time looking at the detailed pros and cons of all the choices.


That's very reasonable! I don't mean to disparage that decision at all: setuptools is rock solid and a very safe choice.


Good luck finding clear documentation on how to write the pyproject file. For something that is supposed to be the way forward there hasn't been much effort to make it easy to implement.

There are also annoyances, like the inability to install a script onto the search path without implementing it as a module, something setup.py doesn't require.


The hatch project (PyPA's own tool) has a good guide. https://hatch.pypa.io/latest/config/metadata/#project-metada...

You can also use `hatch new --init` to convert a `setup.py` (whether imperative or setup.cfg-backed) to a `pyproject.toml`.


> Good luck finding clear documentation on how to write the pyproject file. For something that is supposed to be the way forward there hasn't been much effort to make it easy to implement.

PEP 621, which I mentioned, covers the format of `pyproject.toml` in detail. I also linked an example which, to the best of my knowledge, covers all current best practices for that file.

> There are also annoyances, like the inability to install a script onto the search path without implementing it as a module, something setup.py doesn't require.

I'm not sure I'm following. A module in Python is just a Python file, so your script is a module. Are you saying that you can't distribute single-module packages with pyproject.toml? Because I don't think that's true.


The PEP in no way explains how to write a usable pyproject for ordinary projects. It's basically just targeted at people developing installers.

I meant a package: a directory with an __init__.py. You can't install a standalone script.py (or a generated wrapper) as /usr/local/bin/script with a pyproject.


> The PEP in no way explains how to write a usable pyproject for ordinary projects. It's basically just targeted at people developing installers.

Did you look at it[1]?

> I meant a package: a directory with an __init__.py. You can't install a standalone script.py (or a generated wrapper) as /usr/local/bin/script with a pyproject.

I still don't think I understand what your expectation is here: a `pyproject.toml` is just a metadata specification. The only difference between it and `setup.py` is that the latter is arbitrary code.

There's an old, long-deprecated way to use `setup.py`, namely `setup.py install`. But that's been discouraged in favor of `pip install` for years, and the latter behaves precisely the same way with `pyproject.toml`. If you want to install a script into `/usr/local/bin`, `pip install` with a package specified in `pyproject.toml` will work just fine.

[1]: https://peps.python.org/pep-0621/#example
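
To make that concrete, here's roughly what a standalone script.py with a main() function needs (a sketch; the [tool.setuptools] py-modules hint tells setuptools it's a single module rather than a package):

    [build-system]
    requires = ["setuptools"]
    build-backend = "setuptools.build_meta"

    [project]
    name = "script"
    version = "1.0"

    [project.scripts]
    script = "script:main"

    [tool.setuptools]
    py-modules = ["script"]

`pip install .` against that will generate a `script` wrapper and install it into the environment's bin directory, no __init__.py required.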


Not doing that either … until there are ways for only the person (me) to hold the key to the revision submission into pip.


You're confusing `pip` with PyPI. `pip` is a package installer; you can use it to install local packages, or packages that are hosted on an index. In this case, we're solely talking about local packages.


For me that is all handled by Poetry. I really like Poetry. But I've been struggling (off and on) for a week now to install my private package from a private repo with PyPI dependencies...


If you want to release it to PyPI as a Python package, I would personally use Poetry. But your case, a single pure Python package, is a simple one that won't hit many of the problems brought up in the article, whatever tool you use.

If you want a standalone executable, I haven't found a good, single, cross-platform tool for that yet... it seems like there is a separate tool for each platform.


> If you want a standalone executable, I haven't found a good, single, cross-platform tool for that yet.

PyInstaller is cross-platform, and arguably good.


Nuitka works on Windows, Linux, and Mac:

https://nuitka.net/


Keep it simple and fashion-proof. I've been using setup.py for a one-script package for a decade or two:

    from setuptools import setup

    setup(
        name          = 'foobar',
        scripts       = ['foo'],  # install a script from the current folder
        # ...
    )
A few years ago I had to start using twine to register and upload it to PyPI.


I wouldn't recommend that at all these days. setup.py definitely is not future-proof: it's undergoing deprecation.

Better would be a basic pyproject.toml file along the lines of the following:

    [build-system]
    requires = ["setuptools"]
    build-backend = "setuptools.build_meta"

    [project]
    name = "foobar"
    version = "0.0.1"
    dependencies = [
        "...",
    ]

    [project.scripts]
    foo = "foobar:main"
See: https://setuptools.pypa.io/en/latest/userguide/quickstart.ht...


> it's undergoing deprecation

In the "we would rather people not use this but it's going to stay around for a long time" sense. I strongly doubt it will disappear within the next decade or two. There's a long tail of setup.py-based tools.

Last I checked, pyproject.toml only supports the simplest Python/C extensions. Anything fancy, like --with/--without compilation flags to enable/disable optional support, compiler-specific flags (in my case, to add OpenMP), compile-time code generation (like using yacc/lex), etc., requires a setup.py and a bunch of hacking.
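
(To give a flavor of the hacking I mean, here's a hypothetical setup.py fragment keying optional OpenMP support off an environment variable; the names and flags are illustrative, not from any real project:)

    import os
    from setuptools import setup, Extension

    # Hypothetical opt-in switch: FOO_USE_OPENMP=1 pip install .
    openmp_args = []
    if os.environ.get("FOO_USE_OPENMP") == "1":
        openmp_args = ["-fopenmp"]  # GCC/Clang spelling; MSVC uses /openmp

    setup(
        ext_modules=[
            Extension(
                "foo._core",
                sources=["src/_core.c"],
                extra_compile_args=openmp_args,
                extra_link_args=openmp_args,
            )
        ],
    )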


The new style to do that is a PEP 517 build backend.


Yes. That PEP comments:

> The difficulty of interfacing with distutils means that there aren’t many such systems right now, but to give a sense of what we’re thinking about see flit or bento.

Bento is dead. Flit doesn't handle extensions, and points instead to Enscons, which in turn depends on SCons, a build system I have no experience with.

Plus, I sell a source code license. My customers make wheels for their internal PyPI mirrors. I would need to consider how any change might affect them, without the experience to make that judgement.

It seems far easier for me to stay with setup.py than explore a PEP 517 alternative.

So far what I've seen is either people using something like Enscons, or a very complex build system like SciPy's where setuptools just doesn't work. I haven't seen much migration for smaller setup.py systems like mine... but I also haven't been tracking that well enough.

Any pointers for how that would work?


You should continue to use the setup.py to define the extensions, but put all the remaining metadata in the pyproject.toml. The pyproject.toml will reference setuptools as your build backend (like https://github.com/jborean93/pyspnego/blob/main/pyproject.to...), and the setup.py will define anything that cannot be expressed in the pyproject.toml, like C extensions (https://github.com/jborean93/pyspnego/blob/main/setup.py).

The benefit of this is that your project now has metadata that tools like pip/poetry/etc. can use to figure out what is required (Python-wise) to build it. For example, pip will create an isolated venv with setuptools and Cython when installing the project I listed from the sdist. You can now also take advantage of `python -m build` to build the project, rather than a setuptools-specific incantation. This is universal across all build providers, so if you want to change to poetry in the future you can, hopefully with no build script changes.
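
Stripped down, the setup.py shim ends up containing only what pyproject.toml can't express (a sketch with made-up names):

    # setup.py -- all other metadata lives in pyproject.toml
    from setuptools import setup, Extension

    setup(
        ext_modules=[
            Extension("mypkg._native", sources=["src/mypkg/_native.c"]),
        ],
    )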


I know my project is an oddball. So far I have no required external dependencies, and my optional dependencies are for what the linked-to pages refer to as "native dependencies", which can't be specified by pyproject.toml.

My "setuptools specific incantation" is "pip install" or "pip install -e". I do have a setup.cfg.

The recommendation last year was "If you're building an application, use Poetry. If you're building a library, use Flit", and since my package is a library, I've never really considered poetry.

But! I'm switching from argparse to click - my first required dependency! - so within a month or so I'll be putting my toes into the pyproject.toml waters.

Thank you for your pointers. Isn't there also a way to specify the requirements for building the documentation? I didn't see it in your example.


“Undergoing deprecation” as in not deprecated yet. The great majority of packages use it, and Python takes a decade+ to deprecate things.

Also, this is a few lines of well-understood Python, not exactly a huge investment, right? Do several lines even need to be future-proof?

My bet is you’ll need to modify the toml solution more often than the setup.py in the next decade.


Is there a place you recommend where I can learn more about this?


It’s documented; try Google, Stack Overflow, and reading other packages’ setup.py files for tricks.

However, the scripts=[] keyword is the key to the case above.


As detailed in the other answers, there are two parts to this: 1) creating a Python package from your project (and possibly sharing it on PyPI), and 2) making this package available as an end-user application.

For step 2 you can use nuitka or similar, but if your audience is somewhat developer-oriented, you can also suggest they use pipx: https://github.com/pypa/pipx.


The approach I prefer is to not mess with setuptools etc. at all in the first place, and simply make a nice executable package.

e.g. https://github.com/tpapastylianou/self-contained-runnable-py...
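
The gist of the pattern (a sketch, not necessarily exactly what the repo does) is a directory with a `__main__.py`, runnable directly or bundled into a single file with the stdlib zipapp module:

    # myapp/__main__.py -- run with `python myapp`, or build a single file
    # with `python -m zipapp myapp -o myapp.pyz` and ship that instead.
    import sys

    def main() -> int:
        print("hello from myapp")
        return 0

    if __name__ == "__main__":
        sys.exit(main())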


This is an incredible example of organizing information well and making a case to a wide audience. It's difficult enough to shave all the yaks necessary to get a high-level view of all issues related to a problem, and to express all those problems in good writing is an additional tough challenge. These folks have done an amazing job at both.

Shoutout to Material for MkDocs enabling the swanky theme and Markdown extensions. https://squidfunk.github.io/mkdocs-material/


With modern tooling, packaging pure Python code to be used by other Python developers is a relatively painless process.

The main problem with Python packaging is that it's often C/C++ packaging in disguise, across multiple OSes and CPU architectures, and that's far from being solved. Building such a Python wheel is essentially like building a "portable" (i.e., one you don't need to properly install into the system) Linux/Windows/macOS application. That comes with a variety of caveats and requires specialized knowledge one wouldn't pick up playing around with just Python alone.


Is there any consensus on how to deal with packaging and environments in Python by now? Can you suggest a tutorial for that?

I've been out of the loop for a long time, and would like to get an update on how things are in Python in 2023, but I'm not sure if there even is a consensus — what I can find by googling seems to be several kinda competing approaches. This seems surprising, because most "modern" languages seem to have a very well defined set of practices to deal with all of that stuff. Some languages already come with their built-in stuff (Go, Rust), others simply have well-known solutions (like, technically there still exist PEAR and PECL for PHP, but everyone just knows how to use composer, which solves both packaging and dependency-management problems, and it's also pretty clear what problems it doesn't solve).

For Python there seem to be a dozen tools, and I'm not sure which are outdated and not used by anyone, which are useless fancy wrappers (also not used by anyone), and which is the actual go-to tool (if there is any) for all common cases: dependency management, version locking, shipping an executable, environment separation for local scripts, whether I should ever use pip install globally, etc.


As I see it, the standard way to package Python projects is:

https://packaging.python.org/en/latest/tutorials/packaging-p...

and the longer story is that this method has the flexibility to allow other implementations of packaging tools to be used, and so it fosters choice and competition in the ecosystem. In contrast, the old method of packaging was tied to a particular implementation.


From the sibling thread about packaging and deploying a single script, there was no consensus. There was disagreement on the best way to package, and doubts about the mid term future of some suggested solutions. The following alternatives were suggested:

- package with a `pyproject.toml` file configured to use modern tooling

- package with a `pyproject.toml` file configured to use traditional `setup.py` tooling

- package with traditional `setup.py` tooling

- package with poetry

- package with whatever, deploy with nuitka or pipx

- skip the packaging and deploy with PyInstaller

- skip the packaging and deploy with nuitka

Note that, unless the Python world has radically changed while I was looking away, packaging does not ensure a simple way to deploy the package and its single script. I vividly remember `pipenv` crashing on me, so I switched to venv+pip (or was it virtualenv+pip?) and then set up a bash wrapper to call the Python script with the right venv...


I don't think there's a consensus, but there are some good modern options. I think poetry is a good choice, and it seems to be fairly popular. I use it for all my Python projects and haven't found a compelling reason to switch to another option in the past few years.

As to the question of whether you should ever use pip to install packages globally, the answer is almost always no. For command-line tools, the best option IMO is pipx. The second-best option is pip install --user.

If you're developing a library or application, you should always isolate it in a virtualenv, which is something poetry will handle for you when you run `poetry install`.


Python joke:

Sdist is only one letter away from sadist.


This is exactly why I avoid pip et al whenever I can and just use Nix.


Do you know of any up-to-date blogs/howtos/guides on Nix+Python where the Python project contains modules that need to be compiled (e.g. Cython, pybind, etc.)? I've found the basic info at https://nixos.wiki/wiki/Packaging/Python but it doesn't really go in depth for more complex use cases than having a setup.py...


Have you checked out the manual?

https://nixos.org/manual/nixpkgs/stable/#python

Particularly section 17.27.1.2, "Developing with Python". Combine that with a generic guide on using Flakes, and that should get you started.




Python packaging’s greatest challenge is 10 competing tools and standards.


Python packaging gets a lot of criticism. It's a meme. The thing is, it's actually improved dramatically over the years and continues to improve.

The problems it solves are very complex if one looks a little below the surface. It is solving different problems from the ecosystems it's often compared to: Go, Rust, Java, JS.


This is true. Still, everyday things that should be straightforward, or pythonic if you will, are way too convoluted and complicated.

As a Python dev with experience since 2.6, I agree it has gotten better, but it is also rotten at its core. The problems Python has to solve there are hard, but that is why it should be priority number one to solve them elegantly, in a fashion such that 99% of Python users never have to worry about them.

Right now, packaging and dependency management are my number one concern when I think about recommending the language to beginners. The amount I needed to learn just to figure out a good way of developing something with dependencies and deploying it on another machine is way too much. When I was going through this, there was no official "this is how it is done" guide.

Now there is poetry, for example. Poetry is good. But there are still hard edges, for example when you need to deploy to a system without poetry on it.


Well... The setuptools package was around for way too long.

I never managed to get data files packaged into a dist elegantly... annoys me to this day.


I wonder to what degree these complex problems are self-inflicted, due to more than a decade of flip-flopping between a myriad of packaging and environment management solutions. In such an environment, you are bound to have so many different approaches between important projects that trying to bring them all under one umbrella becomes next to impossible. What a Python package is is relatively well defined, but how you build one is completely up to you.

Edit: I've just read another comment which I think pointed out the most pertinent fact: that Python has often served as mere glue code for stuff written in lower-level languages. This results in an explosion of complexity, as a Python package not only has to package the Python part, but also has to be able to build and package the lower-level stuff.


I'd have to look at numbers to be sure, but I think the number of popular packages that include compiled libraries is dramatically higher than it was 10 years ago, which is about when wheels were taking over. The data/science stack has really exploded in that time, and it's very heavily skewed towards packaging C/Fortran/Rust libraries. Previously there was PIL, some database drivers, and numpy was just getting started; I think more of the big packages were pure Python. The earlier egg/easy_install/msi/exe installers have all faded away, and now it's really just wheels and sdists.


What made it click for me is "Python packaging" means "how to install everything that I want to use from Python."

I wouldn't have considered "how to get BLAS onto a system" to be a "Python packaging" issue, but for people who want to rely on it via scipy/numpy/whatever, it is.


How is it very different from NodeJS? Because I find npm way easier to deal with than Python packaging, and it's also dealing with native code. I used Python heavily for 6 years and still have no idea how the packages work as a user trying to install libs; I used to just thrash around till it worked. I don't use it anymore at my new job.

The one thing I understand is npm installs everything locally by default (unless you -g), and in Python it's hard to stay local even if you use a venv.


> How is [Python] very different from NodeJS?

Unlike Node, Python is essentially older than modern package management. When Python developers first decided to tackle distributing their code, `apt-get` did not yet exist.

Early approaches which stuck around way too long let any package do more or less anything at install time, and didn't bother with static metadata (can't figure out what your deps are except by attempting an install!). Subsequent solutions have struggled to build consensus in an already vast and mature ecosystem. Backwards compatibility means compatibility with bad conventions.


That makes sense. Backwards compatibility can be a large burden. I'm still sad that they haven't managed to make a new thing that just works. NodeJS is repurposing a language and runtime originally meant for hacky web scripting, but it ended up ok in the end.

In general, if a community doesn't agree on how to fix a problem, someone else will provide a solution at a higher layer. Now it's common to install a whole Docker image to run some Python code instead of a few Python deps.


Not really. CPAN was already a thing when Python itself was released, long before there were any Python packages to share.


What did cpan.pm actually look like in 1993? Did packages ('distributions') have static metadata? Could the installer perform recursive dependency resolution? Was the resolver complete?

My impression is that none of that resembled contemporary CPAN, and isn't really what I have in mind with the admittedly ambiguous phrase 'modern package management'.

But I'd love to hear more! The history of package management is very interesting to me. Tales of ancient but sophisticated package management systems are very welcome. :)


Fundamentally the problems are the same, but the communities have very different priorities. The vast majority of Node is still web dev, and the community focuses on dependencies that are deployed as JavaScript files (including transpiled code); this allows NPM to mostly ignore issues related to linking native binaries and put more focus on the user-facing features you see and like. Python, on the other hand, has a lot staked on interacting with native binaries (scientific computing, automation, etc.) and with systems older than Node's existence. This consumes a ton of the community's resources, and every new UX improvement also needs to take account of those use cases. If you're ever tasked with keeping projects with native dependencies working over a moderate period of time in both languages, you'd gain a lot of respect for how well Python packaging works.


I don't know how common this is, but some NPM packages involve native binaries, so it must be doable. pg-native for example.


pg-native is a good example, actually. Its readme spells out how you first need a compiler, libpq, and certain commands in your PATH. With psycopg2 (Python's equivalent), the most common scenario is 'pip install psycopg2-binary' and you're good to go.


So the difference is that psycopg2-binary* bundles the libpq native code and pg-native doesn't? I'm no expert, but I think npm packages can include native code if they want, thanks to node-gyp; it's just that the author of node-libpq (which pg-native relies on) seemingly decided not to bundle libpq itself.

* Back when I used this, there was just psycopg2, which had the binary included.


Correct, but the challenges are to compile the dynamic library so it runs correctly on another machine, and, on a given target machine, to choose and load the correct compiled artifact. Python has extremely good support for those compared to other similar ecosystems. The fact that most Python packages decide to bundle on a comparably broad set of platforms, while packages in some other languages do not, is a window into how much easier the ecosystem makes the operation.


That's... completely missing the point of the article. There's nothing about competing standards that solves the problem of the C ABI and how to package non-Python library dependencies.


At this point, there are 10 competing tools but no longer so many competing standards: the standards for Python packaging from 2015 onwards are PEP 517 (build system standardization), PEP 518 (using pyproject.toml to configure the build system), and PEP 621 (storing project metadata, previously standardized, in pyproject.toml). These standards build on top of each other, meaning that they don't offer conflicting advice.

The TL;DR for Python packaging in 2022 is that, unless you're building CPython extensions, you can do everything through pyproject.toml with PyPA-maintained tooling.


The problem is, as long as old Python versions continue to exist, the competing standards will also continue to exist. From a user experience standpoint it is horrible. Depending on the Python, pip, or setuptools version on a system, the install command might do something drastically different in each case. Often it's not even clear what's happening under the hood.


Many of the tools don't attempt to handle the challenges mentioned here.


And 2 competing Python versions.


Python packaging is a solved problem:

https://python-poetry.org/


I recently migrated a project away from Poetry to the traditional setup method. Poetry works great for a simple package, but once you start adding complexity, it just falls apart, because everything is abstracted away and simplified into config files and the command line.


I had the same experience. I think setuptools nowadays is quite good, especially in combination with setuptools_scm.


Oh really? How does it solve binary dependencies?


Based on OP's handle, I would guess optimally



