I don’t know how I feel about this. As a developer, I say good, I’m glad this impacts them. Many distributions cut Python apart and really break aspects of it, such as splitting out pip and standard-library modules like venv into separate one-off packages, effectively removing them from the default install.
On the other hand, Python’s packaging system is such a mess that I’m not sure the alternative/replacement is actually better. I wish the ecosystem could do better here and had an easy way for people who want to help improve this to do so.
I see where you are coming from, but there are legitimate concerns with language package systems such as PyPI. For one thing, PyPI is unvetted, and the problems that brings should be obvious by now.
Another thing is that many upstreams only move the tip forward, and older versions seldom get fixed. If you introduce a dependency in a core package, you take on the work of maintaining that version for as long as the core package lives, which may be much longer than the upstream author wants to work on it.
The language ecosystems usually don't bring any stability guarantees. Just because the upstream author thinks a new incompatible version is the bee's knees doesn't mean core packages can suddenly be rewritten in the middle of a stable period. So there needs to be a system in place for maintaining these packages anyway.
And if there's one thing distributions generally know how to do, it's packaging software, so it's pretty natural that they use the tools they have available. The outcome isn't always great for the end user who wants unrestricted access to the language ecosystem, but that doesn't mean there aren't real problems to solve here.
I've battled with Python's packaging systems and other tooling and performance problems for more than a decade. Every couple of years some new thing comes along promising to fix package management, but it always fails spectacularly on some important, mainstream package or use case. I suspect a lot of this complexity comes down to the degree to which the Python ecosystem depends on C: we need to package not only Python programs but C programs as well. Some people think we should ship C source code and build it on the target machine; others think we should ship pre-built C binaries and dynamically link against them on the target machine. Python supports both mechanisms, and both are perilous in the general case.
And interestingly, Python depends so hard on C because CPython is very slow, and CPython is very slow because the C-extension API is virtually the whole of the CPython interpreter (many interesting optimizations to the interpreter would break compatibility in the ecosystem). So it certainly feels like all of Python's major problems come down to this decision to lean hard on C, and specifically to make so much of the interpreter available to C extensions.
The way out, as far as I can tell, is to define some narrower API surface (e.g., https://hpyproject.org), get the ecosystem to consolidate around that, deprecate the old API surface, and then make the requisite breaking optimizations such that the ecosystem can feasibly do more in native Python. This requires leadership, however, and the hard truth is this has not historically seemed to be Python's strong suit--the Python maintainers seem unable to push through big, necessary changes like this (which is certainly not to say that leadership is easy or that I could do better, particularly when Python is so established in many respects).
Personally, I've come to use Go for 99% of my Python use cases and it's been great. There are dramatically fewer C bindings (<1% of the ecosystem by my crude estimation), so the build/packaging tooling and performance are orders of magnitude better than in Python. Static typing works well out of the box, and real static binaries are not only feasible but trivial (as opposed to Python, where you try to build a zip file with your dependencies and the result is hundreds of megabytes and it's still missing the runtime, std libs, and various .so files). Further still, builds, tests, and every kind of tooling are far faster than with Python, and far simpler to install and manage. Unless you're doing data science, I don't think you'll regret the transition.
>This requires leadership, however, and the hard truth is this has not historically seemed to be Python's strong suit--the Python maintainers seem unable to push through big, necessary changes like this (which is certainly not to say that leadership is easy or that I could do better, particularly when Python is so established in many respects).
I just want to point out that while I agree with this, the reason is also clear: lack of manpower. Somehow, despite Python being one of the top 3 most widely used programming languages, most of the critical infrastructure is maintained by very, very few people, usually in their spare time.
Yeah, agreed -- and while my comment elsewhere is being downvoted to the pits, I do understand the Debian/Ubuntu Python teams' sense of overwhelm; Debian/Ubuntu simply do not have enough hands to handle the OS's changing dependencies on system Python, the moving library sets that are in fashion, AND the small changes needed to reset Python "not-three" to be just a tool, not a system component. So they chose to make good-enough package sets of Python binaries, which seem to please no one but work reliably.
It's clear to me that there are immature people involved, given the name-calling and what was basically vandalism of "not-three" Python on Debian, alongside the incredibly smart, long-term committed people who make Debian uniquely great. I wrote an email to Chris Lamb directly, but how could he work any harder than he already did? Herded cats would do better to listen sometimes.
Language-specific package managers make sense in some cases (generally only for developers, not users), but why should every language have its own siloed ecosystem? This just leads to people reimplementing things in other languages for no reason.
I don't think anyone wishes Go or Rust had chosen to just use the package manager from JavaScript or Python or Java or .NET. There's still quite a lot of innovation in package management--Go's package management seems pretty ideal in my opinion, but a lot of languages expect tooling to cater to the narrow tail of use cases (e.g., arbitrary code execution at install time). In general, package managers are pretty tightly coupled to the ideals of a language community, which tend to vary a lot.
If you want to freeze the universe, just freeze all of it, it's easier than ever to keep running Ubuntu 12.04 forever in a Docker image if you need to. Obviously this is a terrible idea for security and compatibility with newer tools, just don't expect the rest of the world to do double maintenance to keep old versions semi-up-to-date-where-i-want-to-yet-not-updated-where-i-dont.
This pretty much mirrors my experience trying to package Python code for any kind of use. Jesus, it's a mess, with pretty much everyone resorting to incompatible 3rd-party tooling. I really, really hope it gets better and we eventually reach a turning point.
> Firstly, TOML. This is something I’ve been repeating for quite some time already, so I’ll just quickly go over it. I like TOML, I think it’s a reasonable choice for markup. However, without a TOML parser in stdlib (and there’s no progress in providing one), this means that every single build system now depends on tomli, and involves a circular dependency. A few months back, every single build system depended on toml instead but that package became unmaintained. Does that make you feel confident?
This seems crazy bad. I had naively assumed that when the PEP was adopted it included the parser. I genuinely can't believe they adopted an INI-type format without a native parser when Python already has ConfigParser.
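For anyone who hasn't run into this: every PEP 517 frontend and backend has to read pyproject.toml before it can do anything, and without a stdlib parser that first step already needs a third-party import. A minimal sketch of that bootstrap step, assuming the tomli package the article mentions:

    import tomli  # third-party TOML parser; not in the stdlib at the time of writing

    # Read the build configuration that PEP 517/518 mandate.
    with open("pyproject.toml", "rb") as f:  # tomli requires binary mode
        pyproject = tomli.load(f)

    build_requires = pyproject["build-system"]["requires"]      # e.g. ["setuptools>=61", "wheel"]
    build_backend = pyproject["build-system"]["build-backend"]  # e.g. "setuptools.build_meta"
    print(build_requires, build_backend)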
I'm extremely ill informed about these things, so is anyone able to explain why Gentoo actually needs to package Python libs? Or are they talking about Python itself? Over the years I've found distribution packages like `python3-pycurl` etc. to be more of a hindrance than a help when I want to run Python on Linux.
A significant amount of systems tooling has migrated from Perl to Python in modern Linux; RHEL relies heavily on it when you start looking under the covers. The basic Python interpreter installation (the "stdlib") does not include a few key libraries like Requests (think curl) or YAML parsing; these must be installed as complementary (extra) packages on top.
In the article the author makes note of the TOML parser (basically an enhanced INI file design); if a TOML parser is required to install a library (pyproject.toml instructions), and no TOML parser is in the stdlib, how do you install python3-toml (sic) to provide it? It's a circular-dependency, chicken-and-egg problem caused by removing the legacy Setuptools ability to install a library using only stdlib functionality.
This is only one example; others exist. The OS detection library (needed to know which flavor of Linux you're on) is external and has similar (but not identical) needs, since the Python installation paths differ between RHEL-like and Debian-like systems. This one has solutions, but the author is pointing out that those solutions might break given the current trajectory of upstream thinking.
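To make the OS-detection point concrete - a hedged sketch, assuming the library in question is the third-party distro package (the stdlib's platform.linux_distribution() was removed in Python 3.8):

    import distro  # third-party; not part of the stdlib

    # Identify which flavor of Linux we're on, e.g. to pick RHEL-like vs Debian-like paths.
    print(distro.id())       # e.g. "rhel", "debian", "gentoo"
    print(distro.version())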
Requests generally provides a simpler interface and absorbs some of the complexity you'd face using the stdlib directly, e.g. pooling, redirects, retries, proxies, etc. It also smooths over some of the differences in the stdlib across Python versions, though that's less of an issue as of late as older versions are being deprecated.
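A rough illustration of the difference, using a placeholder URL:

    # Third-party Requests: one call with sane defaults (pooling, redirects, etc.).
    import requests
    # Stdlib equivalent lives in urllib.request, with more ceremony.
    from urllib.request import urlopen

    resp = requests.get("https://example.org", timeout=5)
    print(resp.status_code, resp.headers.get("Content-Type"))

    with urlopen("https://example.org", timeout=5) as stdlib_resp:
        print(stdlib_resp.status, stdlib_resp.headers.get_content_type())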
Gentoo, in particular, is very heavily dependent on Python. Its native package manager, Portage, is written entirely in Python, so the base OS install depends on functioning Python and many particular Python libs: https://en.wikipedia.org/wiki/Portage_(software)
Gentoo is in a special place when it comes to packaging Python, too: Portage's slot system allows many parallel Python versions, and each Python lib is installed for all installed Python versions at once. Removing a Python version from PYTHON_TARGETS and uninstalling it will cause all the installed Python libs to eventually rebuild and drop their files for that version. Adding a Python version to PYTHON_TARGETS does the inverse.
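For anyone unfamiliar, those targets are just a couple of lines of Portage configuration; the versions below are illustrative, not a recommendation:

    # /etc/portage/make.conf (illustrative values)
    PYTHON_TARGETS="python3_10 python3_11"
    PYTHON_SINGLE_TARGET="python3_11"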
So Gentoo needs to package Python libs because the OS depends on them. A lot of other distros depend on Python and some Python libs as well, or have very close-to-core tools that depend on it. A lot of the SELinux tools are Python, last I checked. So there needs to be at least some mechanism for including Python libs in-distro.
This is right, but also, at a fundamental level, Gentoo's 'killer feature' has always been a package manager (Portage) that is flexible, modular, portable, maintainable, highly compatible, and efficient. Another package management system layered on top of it (pip or flit for Python) that doesn't share the same values - in this case, doesn't care so much about backwards compatibility or efficient builds - really undermines Gentoo's ability to provide the experience that is the headline reason people use Gentoo. The Gentoo devs are therefore forced into filling some of that gap, and there are some uneasy compromises in the process.
I think this or something similar was posted a week or two ago
I also wondered the same thing, but the issue seems to be mostly for people who want to distribute tools built in Python via Linux package managers
i.e. it should be possible to do that and not have to explain to users that it even uses Python, and certainly not that you have to first set up a virtualenv, then pip install dependencies etc
It's a use case I've never had, and mostly just accepted that Python was ill-suited for, but I can see it would be nice if it worked and at the same time didn't compromise other uses too much
If you have an application (let's say offlineimap, which is one I use), then you have two options:
1.) Install via a package manager (if they have it)
2.) Install via pip (or whatever)
But #2 is a step backwards. Imagine having to learn the ecosystem and tools for every programming language just to get common apps running. And then being responsible for updating all the virtual environments, etc.
Package managers are a very useful abstraction for end users who don't always care and just want shit to work. But fragmentation is increasing, and each language is going the blackjack and hookers route (Futurama reference).
For something like a command line tool, you can package a dedicated python runtime with external dependencies preinstalled and treat it as a virtualenv. Maybe not the most elegant solution, but it works. iTerm2 does this for its Python scripting API.
Another option is to just include the required libraries in your package and have the executable add the paths via site.addsitedir. Pipenv installed via Homebrew does this.
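A hypothetical launcher along those lines (the layout and names are placeholders, not taken from Pipenv or iTerm2):

    #!/usr/bin/env python3
    # Ship dependencies in a private "vendor" directory next to this script and
    # put it on sys.path at startup; addsitedir also processes .pth files there.
    import os
    import site

    here = os.path.dirname(os.path.abspath(__file__))
    site.addsitedir(os.path.join(here, "vendor"))

    from myapp.cli import main  # "myapp" is a placeholder for the real entry point
    main()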
Then you have multiple versions of every dependency for every application you use. Good luck fixing security bugs in all of those.
Also, you have to do that for python apps, for node apps, for go apps, for rust apps, etc.
As a user, I just want to run "apt install application" and have it work, with dependencies kept up to date by the OS package manager, and not have to micromanage the dependencies for all the applications I use.
Expecting system and built-in applications to work on top of a globally installed interpreted language is fragile. Despite adoring the Python language, I groan whenever I need to install something written in Python (that isn't bundled into a binary), or want to program with it on Linux.
It's a mix of hold-your-breath-and-pray, and googling until you find some GitHub issue or Stack Overflow post that addresses the incompatibilities of version x.x.x with OS version y.y.y on architecture z, then downloading some custom wheel that a person uploaded to their personal PyPI instance.
It's one of the most stressful and least-fun parts of being a developer for me.
I keep reading about NixOS lately, but I haven't done a deep dive into it. Does this seem like something that could be useful for this?
Also, does anyone have any resources for pragmatically working through such things or guidelines to help maintain one's sanity?
Particularly if you're on ARM. I did a stint with a company that worked in the AI / ML space, and when we tried to get stuff working on Amazon's Graviton it was a mess. Docker containers are suddenly installing gcc, make, every '*-dev' library you could think of. Builds take forever, and you end up with who-knows-what gigantic .whl files built under the hood. I ended up writing a script to find the wheels and upload anything larger than 500kb to our AWS Artifact pip repo so they didn't have to get built every time the pipeline ran. It worked, but took a while to iron out.
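Not the original script, but a rough sketch of the idea, assuming pip's default wheel cache location on Linux (~/.cache/pip/wheels):

    # Find locally built wheels over 500 kB so they can be uploaded to a private
    # index once instead of being rebuilt on every pipeline run.
    from pathlib import Path

    cache = Path.home() / ".cache" / "pip" / "wheels"
    if cache.exists():
        for wheel in cache.rglob("*.whl"):
            if wheel.stat().st_size > 500 * 1024:
                print(wheel)  # in the real pipeline these would be pushed to the private pip repo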
Wow, this is depressing. Python's always been tricky to distribute, but everything here just sounds like backwards steps from conventions which work, rather than trying to standardize the methods we already have.
* With PEP 517 you can only get the files as a wheel or a tarball (I assume the latter refers to the sdist). The author dislikes having the compression done just to decompress again immediately. Fair enough, but this sounds pretty minor to me. Would there really be much time spent in this compress/decompress? Especially relative to everything else a gentoo update entails? Isn't this just a feature that can be added later, bringing a minor speed-up?
* PEP 517 doesn't support data_files. This is a surprising problem for a distribution maintainer to raise. I thought the whole objection to data_files was that it allows python package installations to stick files in arbitrary places, i.e. do stuff that should be the sole preserve of the system package manager. Why does the author want it to be allowed?
* distutils and "setup.py install" deprecation. I don't see the alternative here. Yes, it requires downstream changes, but proliferation of mechanisms without standardisation is the big problem with python packaging. Either it continues to be supported or it gets deprecated and removed. My vote would certainly be for the latter.
Nothing in the article even claims the changes aren't progress. The objections are about particular hassles that progress is generating for the author as a distro maintainer. And maybe they are valid but as far as I can see they are either quite minor or unavoidable if the state of the world is to improve.
I've been using python for ~20 years and it finally seems to be on a path towards some sort of sanity as far as packaging is concerned. Not there yet, by a long, long way, but heading in the right direction at least.
Edit: Oh, and one other problem: The lack of a standard library toml module when PEP 517 mandates the use of TOML. That is indeed mad. I don't understand how the PEP was approved without such a module being added.
> I thought the whole objection to data_files was that it allows python package installations to stick files in arbitrary places, i.e. do stuff that should be the sole preserve of the system package manager.
But those "arbitrary places" include things like the standard places where man pages and other documentation go, the standard places where shared data (and things like example programs) go, etc. Without data_files the Python installation tools give you no way to provide any of these things with your Python library.
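For context, this is the sort of thing data_files allowed - a hedged sketch with made-up names:

    # setup.py -- illustrative only
    from setuptools import setup

    setup(
        name="mytool",          # placeholder project name
        version="1.0",
        packages=["mytool"],
        data_files=[
            # (target directory relative to the install prefix, [source files])
            ("share/man/man1", ["docs/mytool.1"]),
            ("share/doc/mytool/examples", ["examples/basic.py"]),
        ],
    )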
Man pages are a Linux thing, while Python packages are cross-platform and can be installed on Windows.
It seems out of scope for a Python packager to include such files. As a user, I would also not be very happy to see a pip install start dropping files all over my system. How would that behave in a virtualenv anyway? Sandboxing should be a key feature of a package manager.
> Man pages are a Linux thing, while Python packages are cross-platform and can be installed on Windows.
Windows has help files, which are its version of man pages. So an installer that installed man pages on Linux would be expected to install the corresponding help files on Windows.
> dropping files all over my system
I said no such thing. I said there are certain designated places in a filesystem where certain common items like man pages (or Windows help files) live. Not to mention system-wide configuration files (which on Linux go in /etc), and I'm sure there are others I've missed. An installer that is not allowed to access those places does not seem to me to be a complete installer. Linux package manager installers certainly put files in such places. Windows installers do it too (although the specific items and places are different).
> How would that behave in a virtualenv anyway?
A complete virtualenv would have its own copies of the above types of files, in the appropriate places relative to the root of the virtualenv.
On Arch Linux, you can install packages from the user repository (AUR). Basically these are scripts to install things that aren't in the normal package repository, and the default is to install from source.
Whenever I install the Haskell package "aura", I need to be careful to get "aura-bin". The install-from-source version will install a crazy number of Haskell packages.
That said, not a problem with Haskell per se. I'm very happy with aura-bin.
I bet it may be better on other OSes and distributions, but on Arch, Haskell is a friggin' nightmare.
At least the Arch wiki page for it spends more time complaining about Haskell than giving guidance: https://wiki.archlinux.org/title/Haskell
Well, if they ask for trouble, trouble will come. Haskell packaging is no different from Rust packaging: native executables statically linked from many small packages; both rely critically on cross-module inlining for performance; neither has a stable ABI.
Arch Linux wants to provide an _up-to-date_ compatible set of Haskell packages, instead of relying on the established Stackage like NixOS does. This certainly causes frustration, as they must carry a lot of patches.
End users don't benefit from this decision; they complain about a large number of tiny packages. Haskell devs don't benefit either, as they don't use packages provided by the OS -- just like any other Python/Node/Java/etc dev. Rust programmers use rustup+cargo, Python programmers use pip; likewise Haskell programmers use ghcup+cabal.
That being said, I respect their efforts and the packagers (Felix Yan, et al.). They made great efforts in updating a large number of old packages to be compatible with GHC 9.x.
Haskell packaging/building is a lot more pleasant with Nix. Cabal and Stack are too fragile IMHO, and user settings can far too easily break packages. With Nix you can still do customisation, but I find Nix at least gives a reasonable amount of assurance that your configuration isn't going to cause the entire build to explode.
When things change drastically, the stress ends up in the hands of the maintainers and distributors - that's not good, because there is already an imbalance in OSS (compare the ratio of open "issues" to "contributors").
When you change the rules at the top, everyone needs to comply or else...well, you need someone else to comply on your behalf - and forks happen!
So are we all going to settle on Nix for build and system configuration? Something else? I'm so tired of dealing with teetering imperative build scripts that forget to configure half the world because those were the defaults on the first dev's machine.
I'm guessing we'll eventually converge on something like Nix except where every artifact is in a Docker image. (And with a micro k8s on a "cluster" of one machine to make this mess sorta work.)
The decision to put the /nix directory in the root rather than in /usr or /usr/local seems a deliberate provocation to any self-respecting distro. Since many binaries contain hard-coded paths, this isn't a decision that a distro can easily correct either.
Nix package manager should not break if you change the location of the nix store. However, you will end up having to recompile everything from scratch.
As far as I understand, hardcoding paths and /nix are both deliberate decisions to prevent unexpected implicit dependencies. It does seem like the cheapest way to do it.
I don’t understand why distros would be so attached to FHS? I don’t think there’s ever been significant amounts of consistency between different distros and other Unix OSes when it comes to the details. I’m fairly sure they could build a FHS layer on top if they really wanted to, since the paths inside packages in Nix are fairly standard.
mgorny is responsible for a large number of deprecations of software in Gentoo (including things people use, because they "no longer have dependencies"), so it's kind of ironic to see him complaining about an upstream doing the same thing.
And I agree, deprecating setup.py is one of the worst things the python world is doing, but breaking things for hosts of people hasn't stopped them before. It hasn't stopped anyone.
pipx is a solution without a problem. pip worked well. setuptools was fine. pypa is breaking packaging by making it more complex and less usable IMO.
pip + setuptools using setup.cfg works very well.
It seems to me the Python packaging community is chasing the Node.js community, for very little value, a lot of added complexity, and very little regard for the community and package maintainers in general.
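For reference, the kind of declarative setup.cfg that covers most simple projects (project name and dependencies are placeholders):

    # setup.cfg -- illustrative
    [metadata]
    name = mytool
    version = 1.0

    [options]
    packages = find:
    python_requires = >=3.7
    install_requires =
        requests

    # historically paired with a one-line setup.py: from setuptools import setup; setup()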
It is a solution to the problem of: how do I run something without installing it into my user environment and worrying whether it conflicts with package versions used by other things there? I often use it for simple code demos, sharing a pipx command that runs code from some GitHub repo I made to show how something can be done in Python.
And it does not require any special consideration from package maintainers.
All that being said, the deprecations and removals are being done way before the replacements are in place and tested and features/rough edges rounded out, which I guess is python tradition at this point.
IMHO, the repos maintained by PyPA should all be part of the core team's responsibility (e.g., pip is maintained by PyPA, not core, but it's installed as part of the Python installer you download from the foundation's website...), along with solving building/installing/packaging/distributing as a whole. But the core team decided their volunteers didn't want to work on that (OK, fair), and PyPA stepped up to do it but never really had enough manpower to tackle it well (OK, also fair). And none of the tech companies ever thought, "Gee, we rely so heavily on this awesome tool, maybe we should send them some money to keep the lights on..." So instead it's just been a giant mess.
Sigh. I deeply appreciate the work that Michał does. This post signals a bunch of work that I will have to do in the future to mitigate the eternal churn in the python ecosystem.
> all the breakage is dumped on distribution packagers.
This line sums it up for me. It is the fundamental disrespect for other peoples' time, and the lack of any attempt to maintain backward compatibility that goes along with it that have convinced me that the whole mindset around python is not suitable for any project that needs stability.
For example, I foresee a future where the academic python codebases that happen to have avoided the superfund site of jupyter notebooks will be broken and useless within a decade because no one will be able to install them and use or reuse them due to some as yet unforeseen change from upstream. The maintenance overhead of using python is absurdly high, to the point where it seems that every single project has to have an active maintainer otherwise it rapidly becomes completely useless.
Maybe it will get better, but it would require a significant shift in community priorities.
edit: Oh, don't miss the circular dependency introduced by tomli. You can just curl it onto your system, right?
edit2: No more data_files? Wow. That's a disaster.
Why should the Python developers care about how difficult it is for Gentoo? You are basically asking open source maintainers to do a bunch of work for distribution maintainers for no reward.
Linux distributions of Python are horrifically broken. If you ever use the system pip, you can only break things in horrible ways.
They should care because their behavior is a strong negative signal that they don't care about supporting existing users and taking responsibility for code that they have written and supported in the past and because it will ultimately cost them users and contributors.
From a technical standpoint if there is existing functionality that is widely used and even if it is deprecated it should not simply be removed. Deprecation means that no new projects should make use of that functionality. Actively removing functionality without providing a path forward that supports the same features is a sign of an immature engineering culture.
Gentoo patches system pip to ensure that users don't fubar their systems. Some distros do actually know what they are doing, sometimes better than upstream. I would argue that distro python is so often broken precisely because the core solutions for python packaging are immature to say the least, and entirely deficient and poorly thought out to say the worst. This is likely largely due to the reliance on virtual environments which are the ultimate non-solution and allow the real problems to fester without a solution.
I've never understood distutils or setup.py anyway. I can count the times I did `setup.py install` on two fingers, and was not satisfied with the result in both cases. (Where does it end up? How do I uninstall it?)
For simple packages, it is enough to copy them onto the PYTHONPATH. Usually you would use virtualenv + pip. I would never want to install a Python package globally (unless it is some kind of meta tool like pip). If you have binary components, maybe there is a use case for the distutils machinery, but even then a makefile would probably be better.
The only thing missing for me would be something like a Python GAC (global assembly cache). Let pip install all modules into somewhere like ~/.pythonmodules/mymodule/v1.0.1, and then resolve it at import time (either by a line in requirements.txt, or a new syntax `import django == 4.0`). I believe Go moved to such a scheme some time ago.
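Nothing like this exists today, but as a thought experiment the shim could be tiny - a hedged sketch, with the cache layout and function name entirely made up:

    import sys
    from pathlib import Path

    def use(package: str, version: str) -> None:
        """Prepend ~/.pythonmodules/<package>/<version> to sys.path (hypothetical layout)."""
        candidate = Path.home() / ".pythonmodules" / package / version
        if not candidate.is_dir():
            raise ImportError(f"{package} {version} not found under ~/.pythonmodules")
        sys.path.insert(0, str(candidate))

    # use("django", "4.0")  # then a plain `import django` would pick up that copy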
After one too many times when Python breaking changes got me to the point of a Gentoo system that couldn't be updated (and it usually happens midway through a bigger system/world update, so potentially many applications/tools also don't work) - I switched to https://www.calculate-linux.org/
It's still a real Gentoo system, and unlike Sabayon Linux (many years ago), where you couldn't easily switch between emerge/Portage and whatever Sabayon had - with Calculate it's all seamless. You can still do a manual Stage3 tarball install, or just use the Live image with a GUI installer...
Most (like 99.95%) of the packages come as pre-built binaries; the remaining 0.05% are just packages where I had niche reasons to deviate from the config (USE flags) that Calculate used to build those binaries.
And unlike `emerge -uND world`, which can easily get stuck and even break (usually if you haven't updated the system and toolchain in a while) - `cl-update` has been so good that several years have passed and it hasn't happened - well, kind of.
Thanks to a combo of some Python-based apps being old/weird/whatever, and Python breaking shit between versions - I do have multiple Python versions installed.
Which actually came in handy one time when (through my own mistake), on top of pure Gentoo's emerge/Portage, the Calculate Linux tools (written in Python) also broke. Luckily, older versions of Python (with all the libs/packages tied to those versions) still worked - so it wasn't hard to temporarily use an older Python version to get things stable.