Hacker News
Traps for the Unwary in Python’s Import System (curiousefficiency.org)
84 points by mmastrac on Aug 17, 2022 | 64 comments



When I was getting started in Python, I quickly ran into the "name shadowing trap" and found it was exactly the same problem this poor guy reported way back in 2004: https://bugs.python.org/issue946373

There are hundreds of "reserved" names that must be avoided when naming scripts, many of which are exactly the sort of names someone might use while playing around in a new language, and every update or newly installed module can add more of them, breaking your existing programs.

There really should be a flag or environment variable to specify that the current directory should be ignored (or considered to be at the end of sys.path) for all scripts across the whole system.

For something with such an easy fix, I'm surprised this footgun has been left in place for nearly 20 years. I mean, the bug report even included a patch to fix the problem!

Wherever you are now, wrobell, know that I have shared your pain, and thank you for trying to fix this mess.


I don't know if there's a good solution to this. Ignoring the current directory would only solve the first-order problem: all those package names would still be essentially reserved, you'd just get bitten less often, and you'd have to add your own code to the path explicitly. I think forcing relative imports and adding a new syntax for "project-absolute" imports is the real way forward, but that's totally infeasible because it would break everyone's code.


It'd also be much improved if the names of scripts could never be confused with the names of packages... maybe a new file extension like packagename.pm (to steal from Perl)?

It'd require an update to package names, but nobody would have to update their code.


What are some good strategies for naming things that don’t trigger name shadowing?

If I’m writing a library of, say, GitHub functions for our team’s Wibble project, I want to put it in wibble.github, but that then precludes importing the global github package.

wibble.githubutils? wibble.githubhelpers? wibble.githubwrapper?

It’s such a silly thing but I’ve never really settled on a best practice for this and wonder what others do.
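A minimal sketch of when a `wibble/github.py` module actually shadows a top-level `github` package, assuming Python 3, where absolute imports are the default. It builds throwaway directories in a temp folder; the `github` package it writes is a hypothetical stand-in (not PyGithub), and `tools.py`/`script.py` are made-up names. Imported as part of the package, the two names stay separate; a script stored inside `wibble/` gets that directory as `sys.path[0]`, and that's when the shadowing bites.

```python
import os
import subprocess
import sys
import tempfile


def write(path, text):
    """Create parent directories as needed and write a small source file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


def run(args, cwd, env):
    """Run a fresh interpreter and return its stripped stdout."""
    proc = subprocess.run([sys.executable, *args], cwd=cwd, env=env,
                          capture_output=True, text=True, check=True)
    return proc.stdout.strip()


def demo():
    root = tempfile.mkdtemp()
    site = os.path.join(root, "site")  # plays the role of site-packages
    proj = os.path.join(root, "proj")  # the team's checkout
    # Hypothetical stand-in for the third-party "github" package.
    write(os.path.join(site, "github", "__init__.py"), "ORIGIN = 'third-party'\n")
    write(os.path.join(proj, "wibble", "__init__.py"), "")
    write(os.path.join(proj, "wibble", "github.py"), "ORIGIN = 'wibble'\n")
    write(os.path.join(proj, "wibble", "tools.py"),
          "import github  # absolute import: searched on sys.path\n"
          "from . import github as team_github  # explicit relative: wibble.github\n")
    # A script that lives *inside* the package directory.
    write(os.path.join(proj, "wibble", "script.py"),
          "import github\nprint(github.ORIGIN)\n")

    env = dict(os.environ, PYTHONPATH=site)
    env.pop("PYTHONSAFEPATH", None)  # keep the default sys.path[0] behaviour
    # Imported as part of the package: each name resolves to what you meant.
    as_package = run(["-c", "import wibble.tools as t; "
                            "print(t.github.ORIGIN, t.team_github.ORIGIN)"], proj, env)
    # Run directly: wibble/ becomes sys.path[0], so wibble/github.py wins.
    as_script = run([os.path.join("wibble", "script.py")], proj, env)
    return as_package, as_script


if __name__ == "__main__":
    print(demo())  # ('third-party wibble', 'wibble')
```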


Name shadowing is not unique to Python; it exists in every language whose semantics allow accessing an enclosing scope without fully qualifying the name. A lot of us have been bitten in C++-style languages, where introducing a method-local identifier shadows a member that isn't prefixed with `this`.

Python's name shadowing is the same problem in a different form. I have limited experience with Python, so take my words with a grain of salt. A working strategy is to go sort of Java-style and keep local modules/packages/scripts in their own directory beside the main script. That way any shadowing you do becomes explicit in code.


As long as Wibble is unique, I think you'd be okay. Personally, though, I've been going with option #2: at the start of every script I modify sys.path to move sys.path[0] to the end. I agree with wrobell that it's hacky and far from ideal, but it works, and it should keep me from running into nasty surprises later.
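A minimal sketch of that "option #2" workaround, assuming it runs at the very top of the script before any import a local file could shadow (modules already in `sys.modules` are not re-resolved). The helper name `demote_script_dir` is made up. For what it's worth, Python 3.11 later added a `-P` flag and a `PYTHONSAFEPATH` environment variable that stop the script's directory from being prepended at all; the sketch checks `sys.flags.safe_path` so it doesn't move the wrong entry in that case.

```python
import sys


def demote_script_dir(path=None):
    """Move the first search-path entry (normally the script's directory) to the end.

    Works on sys.path in place by default; pass a plain list to try it out.
    """
    if path is None:
        # Python 3.11+: under -P / PYTHONSAFEPATH there is no script-dir entry to move.
        if getattr(sys.flags, "safe_path", False):
            return sys.path
        path = sys.path
    if path:
        path.append(path.pop(0))
    return path


# At the very top of a script, before importing anything a local file might shadow:
demote_script_dir()
```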

Another idea I saw suggested somewhere was to keep every script or module you create in its own directory to limit the potential for conflicts.


So great to look at a thread from 2004 and see Josiah Carlson, one of my favorite guys ever.


Python packages are a hack held together by duct tape and Stack Overflow answers that everyone copies.

Even worse is packaging the packages for PyPI, especially if multiple packages depend on each other.

Everyone complains about C++, but since C++11 many things are so much easier than in Python. Not to mention Lisp, OCaml, Java, which all have far better solutions.


This stuff has got SO MUCH better over the past few years.

The hardest thing about it these days is mainly that there are so many historical artifacts, so if you're trying to figure out how to use packaging there are a lot of outdated resources.

Modern Python packages are really pleasant to work with. Here's the best current tutorial that I've seen: https://packaging.python.org/en/latest/tutorials/packaging-p...


I'm not so sure. I went down this rabbit hole. It's really a mess.

Setuptools? Poetry? Hatch? PBR? Flit? PDM? Why are there so many build systems? How do I choose the proper one?

The article you posted uses hatch, which I hadn't even heard of until last week.

Should I use setup.py, setup.cfg, or pyproject.toml? Things seem to be moving to pyproject.toml, but lots of existing functionality seems to point to setup.py. Looking for answers on SO turns up a mix of all the different combinations.

And then there's virtual environments...


Ten years ago things were pretty bad, and I can see how it could be confusing trying to pick through all of the different options that have risen and fallen over the years.

In 2022, if you're building an application and are looking for a simple answer, just use Poetry; it's similar enough to what you'll be familiar with if you're coming from other languages (e.g. yarn, cargo, ...). For most use cases you can just lean on Poetry's venv management; you don't need to do more than `poetry shell` or `poetry run` to get access to it. Poetry will create, and for the most part manage, your pyproject.toml file for you.

If you want a bit more flexibility, virtual environment support is built into Python now (`python -m venv`). If you're on macOS, you can get great venv management with pyenv and pyenv-virtualenv. And for sure, things get more complex if you're building a library. But I think Poetry is a solid place to start.
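For the built-in route, the stdlib `venv` module can also be driven from Python. A small sketch, assuming the usual `bin/` (POSIX) or `Scripts/` (Windows) layout and skipping pip so it runs offline; the temp location and helper name are illustrative. The check relies on `sys.prefix` differing from `sys.base_prefix` inside a virtual environment.

```python
import os
import subprocess
import sys
import tempfile
import venv


def create_and_probe_venv():
    """Create a throwaway venv and ask its interpreter whether it is inside one."""
    env_dir = os.path.join(tempfile.mkdtemp(), ".venv")
    # Roughly `python -m venv --without-pip .venv`; symlinks mirror the CLI default on POSIX.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    if os.name == "nt":
        python = os.path.join(env_dir, "Scripts", "python.exe")
    else:
        python = os.path.join(env_dir, "bin", "python")
    # In a venv, sys.prefix points at the venv while sys.base_prefix points at the base install.
    proc = subprocess.run(
        [python, "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    return env_dir, proc.stdout.strip() == "True"


if __name__ == "__main__":
    print(create_and_probe_venv())
```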


I've used Python on and off since 2009, and every time I dive back in there's a new standard. I feel like there are several 'proper' ways to do things, and I don't really mind there being different flavors of packaging frameworks.

I do find it confusing because there's no real documentation of the various frameworks in one place. It makes it unappealing to invest in writing and maintaining anything in Python because I don't know if the process I've used is going to be deprecated soon.

Just a side note: I did end up finding and using Poetry a couple of weeks ago, and I stuck with it, as I was bouncing around between setuptools and pyproject and then needing to decide whether or not I needed a Makefile, etc.


But this directly contradicts the link given above - the "official" python packaging tutorial recommends Hatch, and barely mentions poetry...


"Just use X" is not what I want to hear. Python packaging is a pile of garbage. I don't want to have to use a 3rd party tool! Just fix the damn core tools and make them consistent and intuitive!

Sorry, rant over.


…and logging


But that's always the story with Python, isn't it? There is always a new way to do things that is so much better than all the previous ones. If only everybody were doing it the right way. At least until the next one comes along.

We've come a long way since PEP 20.


Every time I see a language or a framework proclaim that it is designed so that there is one "right" or "obvious" way to do something, you can inevitably see that the right way has changed multiple times over its life, and that becomes more obvious the larger and older a given project is.

I left one company 3 years ago in the middle of a major transition between versions. The "right way" changed fairly drastically, and surprise surprise, they're still stuck in the middle of that transition.

I think this is largely why I have found myself preferring static typing with roll-your-own framework style / library-based ecosystems over batteries-included systems.

When your language or framework of choice suddenly insists that you use AA batteries instead of button cells, and it is up to you to modify every electronic device in your house to use them, it gets real old real fast (an exaggerated example, but not entirely unheard of).


What I wish they would do is what Go does with dependencies: package it all up, compile it, and turn it into a binary. That way I'd know 100% that it will work on AWS Lambda, for instance. I've spent a ridiculous amount of time building dependencies that have to match the Python version and OS. Lambda layers on GitHub help save time, but I don't know if I can trust strangers with this process.

It really should be: write stuff, test it locally on my machine, put it in a container, and upload it. I know Lambda supports containers, but sometimes I feel like it's too much overhead, especially with shitty upload speeds. Still, I really don't see any other choice unless I build the AWS Lambda layers myself.


It's not perfect, but in my day to day use I find it relatively simple to use and as good as any other language's importing of packages.


Wait: does C++ have a package manager like pip or npm or gem that I didn't know about?


There are file system packages (with __init__.py) and PyPI packages.

The former are an extremely poor version of Lisp packages, OCaml modules or C++ namespaces.

The latter are distribution packages (even for distribution packages C++ has Conan etc.).

PyPI packaging gets even worse than usual when attempting to distribute a file system package.


The single worst part of Python imports for me is "from". I think "from" was a mistake and should never have been added. Like many Pythonisms, it's just another layer of sugar that's not needed and is only marginally useful. Especially when dealing with new packages, it's essentially trial and error for me with:

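    import hello                 # binds "hello"; hello.world exists only if something imported it
    from hello import world      # binds "world" (an attribute or submodule of hello)
    import hello.world           # binds "hello"; loads hello.world, used as hello.world
    from hello import world.bye  # SyntaxError: the name after "import" can't be dotted
    from hello.world import bye  # binds "bye" (an attribute or submodule of hello.world)

I didn't realize how much I hated it in Python until I went to Go, where "from" doesn't exist.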


Would you rather write "hello.world.bye" every time you need it?


I appreciate the explicitness, so indeed I would.


Then do that? Is there anything forcing you to use from?

Edit: While opinionated, it's a real question; in my fairly simple use of Python, I have never hit a use of "from" that couldn't have been a plain "import", but if I'm missing a case feel free to correct me.


    bye = hello.world.bye


  import hello.world.bye as bye
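One caveat about the `import ... as` form: it only works when every dotted component names a module. If `bye` is a function or class, `import hello.world.bye` fails, and you need either `from` or the assignment form. A quick check using the stdlib `json.decoder` module as a stand-in for the hypothetical `hello.world`; `can_plain_import` is a made-up helper:

```python
import importlib
import json.decoder                    # binds "json"; used as json.decoder
import json.decoder as decoder_module  # a submodule, so "import ... as" works
from json.decoder import JSONDecoder   # a class: needs "from" (or assignment)

# The no-"from" alternative for non-module attributes: bind by assignment.
JSONDecoderAlias = json.decoder.JSONDecoder


def can_plain_import(dotted):
    """Return True if `import <dotted>` would succeed, i.e. it names a module."""
    try:
        importlib.import_module(dotted)
        return True
    except ImportError:  # ModuleNotFoundError is a subclass
        return False


if __name__ == "__main__":
    print(can_plain_import("json.decoder"))              # True
    print(can_plain_import("json.decoder.JSONDecoder"))  # False
```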


Yeah, being forced to understand the layout of the code you're importing from isn't great. Python imports could have been designed to just do the right thing. But explicit is better than implicit, amirite?



What is the right thing?


Another potentially surprising trap is the sheer number of filesystem syscalls per import. Python searches a ton of potential filesystem locations for each import, and the default config may not search the most likely locations first. This had a significant impact at a previous employer.
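To see how far down the search path a particular import has to travel, a rough sketch using `importlib.machinery.PathFinder`, the finder behind `sys.path` lookups. It only covers path-based modules (not built-in or frozen ones), and it counts `sys.path` entries tried rather than syscalls; since Python 3.3 the default `FileFinder` caches directory listings, so the per-entry cost is lower than it was in Python 2. For real numbers, `python -X importtime` (3.7+) reports per-import timing. `locate` and `demo` are hypothetical helper names.

```python
import importlib.machinery
import os
import sys
import tempfile


def locate(name, path=None):
    """Return (index, entry) for the first search-path entry that can supply
    top-level module `name`, or None. Built-in and frozen modules are not covered."""
    entries = sys.path if path is None else path
    for index, entry in enumerate(entries):
        # PathFinder runs the path hooks (directories, zip files) for this one entry.
        if importlib.machinery.PathFinder.find_spec(name, [entry]) is not None:
            return index, entry
    return None


def demo():
    """Put a throwaway module in the second of two directories and locate it."""
    empty, has_it = tempfile.mkdtemp(), tempfile.mkdtemp()
    with open(os.path.join(has_it, "probe_mod.py"), "w") as f:
        f.write("X = 1\n")
    return locate("probe_mod", [empty, has_it]) == (1, has_it)


if __name__ == "__main__":
    print(locate("json"))  # which sys.path entry supplies the stdlib json package
```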


This was (still is?) a well-known problem at supercomputing facilities. In a given job, every single node accesses a shared r/w directory that Python searches for imports. I believe Python opens (opened?) imports r/w, and this created a transactional 2PC action for each node, so you get thousands of nodes blocking on the filesystem; a test run that should take a few seconds could take hours just to spin up.


Honestly, I will always be saddened by the fact that, in this reality, Python won out over Ruby in the readable-scripting-language wars. Writing Ruby feels elegant, it feels fluid, it feels like creating.

Working with Python always leaves me with a sour taste in my mouth and frustration after trying to figure out how to make modern concepts work well, giving up, and going back to old imperative styles of coding.


It is notable that you said "Writing Ruby feels..." but made no mention of _reading_ Ruby. I think I agree with you that I enjoy writing Ruby more, but I definitely enjoy reading Python.

It's definitely easy to write unreadable Python code, just as you can write easily readable Ruby code. But I think some of the "clever" things that the Ruby language allows (some of the things that make it fun, things that aren't considered obscure or advanced features) tend to make Ruby code more difficult to comprehend.


Eh, that just means you haven't read enough Ruby.

It's really quite readable once you have worked with the language a bunch.


I say the same about Perl.


Also a fantastic language!


"Once you have worked with the language a bunch."

I find that's kind of the point. Python was extremely easy the first time.

Learning Ruby took me twice as long, and the output was not really an improvement.


If we select only for readability, we reduce down to a single language: whichever one most closely matches the reader's background... which puts us back at square one.

You say Python is readable; I say it is obnoxiously pedantic (among many other critiques). Honestly, I find Java easier to parse than Python.

My love goes to Ruby though...


One reason Ruby lost is the culture of monkey patching. The same facility exists in Python but is used much more sparingly because of the mess it makes.
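One concrete difference that may be part of this (an aside, not a claim from the comment itself): Python lets you reassign methods on your own classes at runtime, but CPython's built-in types such as `str` and `object` reject new attributes, so the Ruby habit of reopening core classes isn't available. A small sketch with a made-up `Greeter` class:

```python
class Greeter:
    """Hypothetical class used to demonstrate runtime patching."""

    def greet(self):
        return "hello"


def loud_greet(self):
    return "HELLO"


def patch_own_class():
    """Swap a method on a user-defined class; every instance sees the change."""
    original = Greeter.greet
    Greeter.greet = loud_greet
    try:
        return Greeter().greet()
    finally:
        Greeter.greet = original  # undo the patch


def patch_builtin():
    """Try to add a method to str, Ruby-style; CPython raises TypeError."""
    try:
        str.shout = lambda self: self.upper() + "!"
    except TypeError:
        return "TypeError"
    return "patched"


if __name__ == "__main__":
    print(patch_own_class(), patch_builtin())  # HELLO TypeError
```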


I guess this is what I think of as the "Too Much Magic" problem in Ruby.

I'm a programmer, not a wizard...


Is it part of the Ruby culture, or part of the Rails culture?


I know that when I did some RPG Maker XP hacking, they had crammed a lot of stuff onto Object, so I suspect the culture is not limited to Rails.


I don't know that I ever gave Ruby a fair try but reading the source felt like it was too Perl-inspired for me.


Python's mission statement has been to be the most ergonomic imperative language. That it doesn't support other styles of programming doesn't seem so bad to me, languages do not need to be multi-paradigm.

Where does the frustration come from with Python being more popular than Ruby, is it that some libraries can only be found in Python?


> Where does the frustration come from with Python being more popular than Ruby

Both ecosystems have produced a lot of "content", and there are certainly some things Ruby got better. But because Python has won more market share, the ceiling of those Ruby tools feels lower.

Now the reasons python won aren't hard to find:

* data science / machine learning dominance

* closer to Java/C++ gives it a huge edge in academia as a beginner language

* Python is bigger than the sum of its parts, whereas people see Ruby as a dependency of Rails (only half joking)

* ...


Python has excellent support for dependency injection and monkeypatching these days.

It has an entire arsenal of pre-made decorators.
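On the monkeypatching-plus-decorators point, the usual stdlib tool is `unittest.mock.patch`, which works as a decorator or as a context manager and undoes the patch on exit. A minimal sketch; `stamp` is a hypothetical function under test:

```python
import time
from unittest import mock


def stamp():
    """Hypothetical function under test: format the current epoch time."""
    return f"t={int(time.time())}"


@mock.patch("time.time", return_value=1000.0)
def stamp_with_fake_clock(fake_time):
    # While this function runs, time.time is a MagicMock returning 1000.0.
    return stamp(), fake_time.call_count


def stamp_with_context_manager():
    # Same idea, scoped to a with-block instead of a whole function.
    with mock.patch.object(time, "time", return_value=5.0):
        return stamp()


if __name__ == "__main__":
    print(stamp_with_fake_clock())       # ('t=1000', 1)
    print(stamp_with_context_manager())  # t=5
```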

I've found Python's original paradigm, "create deep, object oriented libraries, and have users write procedural/functional scripts in production," to be absolutely excellent.

These days though, the world's most popular programming language has been extended, and not in a bad way.

You can do just about anything in it, and it'll be a really good experience (and extremely slow code...).


What’s the measure by which Python “won out” over Ruby? Both seem to have pretty broad usage.


Woah, we don't need __init__.py any more. Mind blown.


You still do. These "init-less" packages are something specific called namespace packages, and you shouldn't use them unless you have a good reason to use them. See https://www.python.org/dev/peps/pep-0420/.


Can you please explain why not use them or what's the use case?


The difference is that namespace packages can exist in more than one place at a time, while regular packages cannot.

Python looks for regular packages (which are really just modules that can contain other modules) by searching a pre-defined list of directories, stopping at the first valid entry.

What happens when you run `import foobar`?

Let's say that the search path is:

    sys.path = [
        # The current working directory
        '',
        # System package locations
        '/usr/lib/python3.9/site-packages',
        '/usr/local/lib/python3.9/site-packages',
    ]
And let's say that you have these files on your system:

    /usr/lib/python3.9/site-packages/
    └── foobar/
        ├── __init__.py
        └── nice.py
    
    /usr/local/lib/python3.9/site-packages/
    └── foobar/
        ├── __init__.py
        └── wow.py
In this situation, Python finds the version in /usr/lib/ because that directory comes first in the search path, and ignores the other one entirely. You can do `import foobar` or `import foobar.nice` or `from foobar import nice`, but you can't do `import foobar.wow` or `from foobar import wow`.
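A sketch that reproduces this with two temp directories standing in for the two site-packages locations; `foobar_regular` is a made-up name chosen to avoid colliding with anything installed, and `sys.path` is only modified for the duration of the call.

```python
import importlib
import os
import sys
import tempfile


def write_tree(root, files):
    """Write {relative_path: source} under root, creating directories."""
    for rel, source in files.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(source)


def regular_package_demo(pkg="foobar_regular"):
    """Two copies of a regular package; report (len(__path__), wow importable?)."""
    first, second = tempfile.mkdtemp(), tempfile.mkdtemp()
    write_tree(first, {f"{pkg}/__init__.py": "", f"{pkg}/nice.py": ""})
    write_tree(second, {f"{pkg}/__init__.py": "", f"{pkg}/wow.py": ""})
    sys.path[:0] = [first, second]  # like /usr/lib/... ahead of /usr/local/lib/...
    importlib.invalidate_caches()
    try:
        package = importlib.import_module(pkg)  # the copy in `first` wins
        importlib.import_module(f"{pkg}.nice")
        try:
            importlib.import_module(f"{pkg}.wow")  # only exists in `second`
            wow_found = True
        except ModuleNotFoundError:
            wow_found = False
        return len(package.__path__), wow_found
    finally:
        sys.path.remove(first)
        sys.path.remove(second)
        for name in [m for m in sys.modules if m == pkg or m.startswith(pkg + ".")]:
            del sys.modules[name]


if __name__ == "__main__":
    print(regular_package_demo())  # (1, False)
```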

Now let's assume that you didn't have the `__init__.py` files:

    /usr/lib/python3.9/site-packages/
    └── foobar/
        └── nice.py
    
    /usr/local/lib/python3.9/site-packages/
    └── foobar/
        └── wow.py
From the perspective of the Python import system, `foobar` is no longer a regular package but a namespace package. You will be able to `import foobar`, but you will not be able to access anything inside it! However, you will be able to do both `import foobar.nice` (or `from foobar import nice`) and `import foobar.wow` (or `from foobar import wow`) and be able to access both modules freely that way. That is, `foobar` as a namespace package is no longer useful by itself, but now it acts as a namespace for several other packages that might live in different locations on your system.
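A sketch of the namespace-package case, using two temp directories as stand-ins for the two site-packages locations; the package name `foobar_ns` is made up, and `sys.path` is only modified while the function runs. With no `__init__.py` anywhere, the bare package exposes nothing until its submodules are imported, and its `__path__` spans both directories.

```python
import importlib
import os
import sys
import tempfile


def write_tree(root, files):
    """Write {relative_path: source} under root, creating directories."""
    for rel, source in files.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(source)


def namespace_package_demo(pkg="foobar_ns"):
    """Split one package over two directories that have no __init__.py files."""
    first, second = tempfile.mkdtemp(), tempfile.mkdtemp()
    write_tree(first, {f"{pkg}/nice.py": "NAME = 'nice'\n"})
    write_tree(second, {f"{pkg}/wow.py": "NAME = 'wow'\n"})
    sys.path[:0] = [first, second]
    importlib.invalidate_caches()
    try:
        package = importlib.import_module(pkg)
        bare_has_nice = hasattr(package, "nice")        # nothing loaded yet
        nice = importlib.import_module(f"{pkg}.nice")   # found in `first`
        wow = importlib.import_module(f"{pkg}.wow")     # found in `second`
        portions = len(list(package.__path__))          # both directories
        return bare_has_nice, portions, nice.NAME, wow.NAME
    finally:
        sys.path.remove(first)
        sys.path.remove(second)
        for name in [m for m in sys.modules if m == pkg or m.startswith(pkg + ".")]:
            del sys.modules[name]


if __name__ == "__main__":
    print(namespace_package_demo())  # (False, 2, 'nice', 'wow')
```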

The use case for this feature is the ability to split a package over several distributions. For example, you can install one or both of zope.interface and zope.component, and they will both be available to import from the "zope." namespace. It is also useful within organizations, where you can namespace all of your internally-developed libraries with some identifier specific to your organization.

You could even emulate a reverse-domain namespacing system this way if you wanted to:

    /usr/local/lib/python3.9/site-packages/
    └── org/
        └── best_stuff/
            └── pytools/
                ├── thing1.py
                └── thing2.py

Which you could import as, e.g. `from org.best_stuff.pytools import thing1, thing2`.

As you might imagine, spreading a package across multiple locations is not a desirable outcome if you aren't expecting it. So you shouldn't use namespace packages unless you specifically want the behavior I described above.


What a legend. Thank you very much.


Ah, thanks, I was jumping to conclusions.


It's not your fault, the article also didn't explain it, and it makes me think that the article author didn't understand it either.


https://pypi.org/project/fuckit/

if you need an example


I’ve just accepted it’s weird…

So I always invoke my Python as a module and do relative imports, which at least eliminates the conflicts from my code.

YMMV.
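A sketch of that habit, assuming a made-up package `myapp` whose `main.py` uses an explicit relative import. Run with `-m` from the directory above the package, the relative import resolves and `sys.path[0]` is that outer directory rather than `myapp/`; run as a file path, the same code fails because `main.py` then has no parent package.

```python
import os
import subprocess
import sys
import tempfile


def run(args, cwd):
    """Run a fresh interpreter; return (exit code, stdout + stderr)."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    proc = subprocess.run([sys.executable, *args], cwd=cwd, env=env,
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def demo():
    root = tempfile.mkdtemp()
    pkg = os.path.join(root, "myapp")  # hypothetical project package
    os.makedirs(pkg)
    files = {
        "__init__.py": "",
        "helpers.py": "GREETING = 'hi from myapp.helpers'\n",
        # Explicit relative import: only resolvable when run as myapp.main.
        "main.py": "from . import helpers\nprint(helpers.GREETING)\n",
    }
    for name, source in files.items():
        with open(os.path.join(pkg, name), "w") as f:
            f.write(source)
    as_module = run(["-m", "myapp.main"], cwd=root)                # works
    as_script = run([os.path.join("myapp", "main.py")], cwd=root)  # ImportError
    return as_module, as_script


if __name__ == "__main__":
    for code, output in demo():
        print(code, output.strip().splitlines()[-1])
```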


Python 3 (not being fully backward compatible) was a great chance to add... namespaces everywhere

"Namespaces are one honking great idea -- let's do more of those!" after all

sadly I still have to worry about a system package one day being pulled in instead of my code

(and PyPI could do with namespaces too)

one of the few things the Java world got spot on


A far worse problem that still exists: if there's a name clash, one of your dependencies (which could include some other part of the stdlib!) can end up importing your module instead of the system package it expected.

Worse yet, since the Python stdlib gains new modules in new versions, valid code written in the past might hit a name clash in a new Python version.

It's really unfortunate that there's no single top namespace package for the entire Python stdlib, like "std" in C++, Rust, or Zig.


> Namespaces are one honking great idea -- let's do more of those!

How that quote got into the Zen of Python is a bit of a mystery to me.

I have no idea what a namespace is in Python or how to create one.

I’ve never come across a guide or Python talk teaching me the benefit of namespaces or why they are a honking great idea.


Namespaces are pervasive in Python. Virtually everything you deal with is a namespace: package, modules, classes, instances; even functions are namespaces:

    >>> def func(ns):
    ...     print(ns.stuff)
    ...
    >>> func.stuff = 'honking great'
    >>> func(func)
    honking great


Yes, Java cops a lot of criticism, but the package/module/namespace stuff has never given me any grief.


The fact that Python still can't handle circular imports in 2022 is insane to me!

I don't know of any other language that fails at that.

At least we've had a readable error message for a few years now...
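For what it's worth, Python does run many import cycles; what fails is binding a name from a module that is still half-initialized. A sketch with two throwaway modules, `cyc_a` and `cyc_b` (made-up names), each case run in a fresh interpreter: the version that keeps a module reference and looks the attribute up at call time works, while the `from cyc_a import helper` version raises `ImportError`.

```python
import os
import subprocess
import sys
import tempfile

CYC_A = (
    "import cyc_b\n"
    "def helper():\n"
    "    return 'a.helper'\n"
    "def run():\n"
    "    return cyc_b.use()\n"
)
# cyc_b keeps a module reference and looks helper up at call time: the cycle resolves.
LAZY_B = "import cyc_a\ndef use():\n    return cyc_a.helper()\n"
# cyc_b grabs the name while cyc_a is still half-initialized: ImportError.
EAGER_B = "from cyc_a import helper\ndef use():\n    return helper()\n"


def run_cycle(b_source):
    """Write the two-module cycle to a temp dir and import it in a fresh interpreter."""
    directory = tempfile.mkdtemp()
    for name, source in (("cyc_a.py", CYC_A), ("cyc_b.py", b_source)):
        with open(os.path.join(directory, name), "w") as f:
            f.write(source)
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    proc = subprocess.run([sys.executable, "-c", "import cyc_a; print(cyc_a.run())"],
                          cwd=directory, env=env, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    print(run_cycle(LAZY_B)[1].strip())                     # a.helper
    print(run_cycle(EAGER_B)[1].strip().splitlines()[-1])   # the ImportError line
```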


I love Python. I’m pretty good at it, or so I like to think. Yet circular imports trip me up constantly.



