
The slow startup, combined with the Python ecosystem's general lack of interest in finding a solution for distributing self-contained applications, was the biggest reason we ended up writing our CLI tool in something else even though we are a Python shop.

I'm really curious why there hasn't been much of a desire to change this; it even got worse as time progressed, which is odd.




One "simple" thing that could be done is to make it easier to build python statically, and improve the freezing toolchain.

When I used to care about the problem, I looked into it (https://cournape.wordpress.com/2015/09/28/distributing-pytho...) and got somewhere. It somewhat improves startup times and allows distribution of a single binary.

Lots of libraries are terribly slow to import (e.g. requests), but right now there is little incentive to fix it, as there is no toolchain for building good, self-contained Python CLI apps.
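
For what it's worth, here is a minimal sketch of how to see that cost, assuming requests is installed; run it in a fresh interpreter so nothing is already cached in sys.modules, and treat the number as a rough probe rather than a benchmark:

    import time

    start = time.perf_counter()
    import requests  # noqa: F401 -- the import itself is what we're measuring
    print("import requests took %.3f seconds" % (time.perf_counter() - start))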


I've written a whole bunch of CLI tools over the years and maintained some I didn't author originally, and I've always found it annoying how slow they are. A CLI tool for some larger project can easily take a second just to display --help, or a command line parse error. Tests running against it can be made to progress faster (no forking, no separate interpreter, in-process capture etc., which brings a lot of complexity, subtle differences and error persistence/"tainting" of the execution environment), but even then you might only get a few CLI invocations per second per core.
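
To make the in-process testing idea concrete, here's a rough sketch (not from any particular project; main() here is just a stand-in for a real tool's entry point taking an argv list and returning an exit status):

    import contextlib
    import io

    def main(argv):
        # stand-in for the real tool's entry point: main(argv) -> int
        print("args were:", argv)
        return 0

    def run_cli(argv):
        # Call main() directly instead of spawning `python -m mytool`, capturing
        # stdout in memory. SystemExit covers argparse-style usage errors.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            try:
                status = main(argv)
            except SystemExit as e:
                status = e.code or 0
        return status, buf.getvalue()

    status, output = run_cli(["--help"])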

These experiences are a major turn-off from Python for me.


Indeed this is a long-standing issue with Python.

LWN gave some excellent coverage late last year, in this piece:

https://lwn.net/Articles/730915/


Sure there has been desire to change this. It's a hard problem, and there are tradeoffs.


It's only a hard problem if there is no desire. The slowdowns in interpreter startup did not happen because they were necessary, but because site.py and friends just do more stuff now, and a lot of important internal tooling became unnecessarily complex.


Yeah, this whole "importlib in Python" thing continues to mystify me.


It would be okay if Python supported a faster, lower-level sub-Python within itself, RPython for example.


> it even got worse as time progressed which is odd.

Quite the contrary; as I stated in my other comment, we now have Nuitka.


The parent means it got worse in official Python releases.

Not what some fringe tool can or cannot do.


This is disappointing to me too, but I think there are some problems baked in to the language that make it hard.

- Imports can't be parsed statically (see the small sketch after this list).

- Startup time has two major components: crawling the file system for imports, and running all the init() functions of every module, which happens before you get to main(). The first is only fixable through breaking changes, and the second is hard to fix without drastically changing the language.
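
To illustrate the first point: this is the kind of import no bundler or freezer can resolve ahead of time (the PLUGIN variable is invented for the example):

    import importlib
    import os

    # The module name is only known at runtime, so no static scan of the
    # source can tell a freezer which modules to include in the bundle.
    name = os.environ.get("PLUGIN", "json")
    mod = importlib.import_module(name)
    print("loaded", mod.__name__)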

The import code in CPython was a mess, which was apparently cleaned up by importlib in Python 3, through tremendous effort. But unfortunately I think importlib made things slower?

I recall a PyCon talk where as of 3.6, essentially everything about Python 3 is now faster than Python 2, EXCEPT startup time!

This is a shame, because I would have switched to Python 3 for startup time ALONE. (As of now, most of my code and that of my former employer is Python 2.) That would have been the perfect time to address startup time, because getting a 2x-10x improvement (which is what's needed) requires breaking changes.

I don't think there's a lack of interest in the broader Python community, but there might be a lack of interest/manpower in the core team, which leads to the situation wonderfully summarized in the recent xkcd:

https://xkcd.com/1987/

FWIW I was the one who sent a patch to let Python run a .zip file back in 2007 or so, for Python 2.6 I think. This was roughly based on what we did at Google for self-contained applications. A core team member did a cleaner version of my patch, although this meant it was undocumented until Python 3.5 or so:

https://docs.python.org/3/library/zipapp.html
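
For reference, the stdlib API boils down to a single call. The paths and entry point below are invented, and the source directory is assumed to have no __main__.py (zipapp generates one from the main argument):

    import zipapp

    # Pack the directory `myapp/` into a single runnable archive. The
    # interpreter string becomes the shebang, so `./myapp.pyz` runs directly.
    zipapp.create_archive(
        "myapp",
        target="myapp.pyz",
        interpreter="/usr/bin/env python3",
        main="myapp.cli:main",   # writes a __main__.py that calls myapp.cli.main()
    )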

The .zip support at runtime was a start, but it's really the tooling that's a problem. And it's really the language that inhibits tooling.

Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.

In other words, I have wondered about this "failure" for over a decade myself, and even tried to do something about it. I think the problem is that there are multiple parts to the solution, and the responsibility for those parts is distributed. I hate to throw everything on the core team, but module systems and packaging are definitely a case where "distributed innovation" doesn't work. There has to be a central team setting standards that everyone else follows.

Also, it's not a trivial problem. Go is a static language and is doing better in this regard, but still people complain about packaging. (vgo is coming out after nearly a decade, etc.)

I should also add that while I think Python packaging is in the category of "barely works", I would say the same is true of Debian. And Debian is arguably the most popular Linux package manager. They're cases of "failure by success".


> The import code in CPython was a mess, which was apparently cleaned up by importlib in Python 3, through tremendous effort. But unfortunately I think importlib made things slower?

AFAIK importlib is entirely written in Python and kinda portable across Python implementations, while previously most of it was C code. It's not surprising that something gets slower when rewritten in Python.

> Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.

PyQt applications on Windows typically take two or more seconds before they can do anything, including Enterprise's favourite start-up pastime, splashscreens. Except maybe if you rolled your own .exe wrapper that displayed the splash before invoking any of the Python loading.

That's really, really poor in the age of 4 GHz CPUs from the factory, RAM big enough to fit multiple copies of all binaries on a PC and SSDs with at the very least tens of thousands of IOPS.


Yeah, the time it takes is really mind-boggling if you think about it. I recently had occasion to run Windows XP in VirtualBox on a fairly underpowered MacBook Air.

It not only installed really fast, but at runtime it was fast and responsive! And so were the apps! VirtualBox recommends 192 MB of RAM for Windows XP, and it works fine. Amazing. Remember when everyone said Windows was slow and bloated?

On the other hand, I tried compiling Python 2.7 on a Raspberry Pi Zero, which is probably around as fast as the machines at the time of XP (maybe a little slower). This was not a fun experience!

Actually I just looked it up, and the Pi Zero has 512 MB of RAM. So in that respect it has more power. Not sure about the CPU though... I think I ran Windows XP on 300 MHz computers, but I don't remember. The Pi Zero is 700 MHz, but you can't compare clock rates across architectures. I think they're probably similar though.

---

FWIW I think importing is heavily bottlenecked by I/O, in particular stat() of tons of "useless" files. In theory the C to Python change shouldn't have affected it much. But I haven't looked into it more deeply than that.


IIRC the foundation originally compared the RPi's CPU to a Pentium II running at 266 MHz, which seems about right to me.

IME startup is almost always CPU bound (to a single CPU thread, of course). Note that the Linux kernel also caches negative dentry lookups, so these "is there something here?" stat()s will stay in the dentry cache.


> splashscreens; exe wrapper

I was convinced that IDEA/Eclipse and other JVM-based things used the .exe launcher just for that (the loading screen). But I have never decompiled their .exe to verify it :)


EDIT: I should also add that the length of PYTHONPATH as constructed by many package managers is a huge problem. You're doing O(m*n) stat()s -- random disk access -- which is the slowest thing your computer can do.

m is the number of libraries you're importing, and n is the length of the PYTHONPATH.

So it gets really bad, and it's not just one person's "fault". It's a collusion between the Python interpreter's import logic and how package managers use it.
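
A crude way to see the O(m*n) behavior, approximating what the path-based finder does (the candidate list is simplified; the real machinery probes more variants, e.g. extension module suffixes):

    import os
    import sys

    def probes_for(module_name):
        # One top-level import walks every sys.path entry and checks several
        # candidate file names -- roughly one stat() per check.
        count = 0
        for entry in sys.path:
            for candidate in (module_name + ".py",
                              module_name + ".pyc",
                              os.path.join(module_name, "__init__.py")):
                os.path.exists(os.path.join(entry, candidate))
                count += 1
        return count

    print(len(sys.path), "sys.path entries ->",
          probes_for("somelib"), "probes for a single import")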


You've characterized the problems well. And yes this is a core problem for python - startup time and import processing is limiting in a lot more cases than just CLI tools. And yes the design of the language makes it hard or possibly impossible to solve.

> Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.

Exactly. There is no silver bullet. The problem is how much code gets run on startup, and how Python's dynamic nature makes traditional startup speedup strategies impossible. Is this even fixable?


I don't think it's fixable in Python unfortunately. As someone else pointed out, the fact that it got WORSE in Python 3, and not better, is a bad sign. Python 3 was the one chance to fix it -- to introduce breaking changes.

As I mentioned, this problem has bugged me for a long time, since at least 2007. Someone else also mentioned the problem with Clojure, and with JIT compilers in general. I'm interested in Clojure too, but my shell-centric workflow is probably one reason I don't use it.

In 2012 I also had the same problem with R, which starts even more slowly than Python. I wrote a command line wrapper that would keep around persistent R processes and communicate with them. I think I can revive some of my old code and solve this problem -- not in Python, but in the shell! Luckily, I'm working on a shell :)

http://www.oilshell.org/

In other words, the solution I have in mind is single-threaded coprocesses, along with a simple protocol to exchange argv, env, the exit code, and stdout/stderr (think CGI or FastCGI). Coprocesses are basically like servers, but they have a single thread to make porting existing apps easy (i.e. turning most command line apps into multi-threaded servers is nontrivial).
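
To give a flavor of what I mean (just a sketch, not a real design, with tool_main standing in for an application's actual entry point): the worker reads one JSON request per line carrying argv and env, runs the tool in-process, and replies with the exit status and captured stdout.

    import contextlib
    import io
    import json
    import sys

    def tool_main(argv, env):
        # placeholder for the real CLI's main(); imports happen once, up front
        print("hello from the persistent worker:", argv)
        return 0

    for line in sys.stdin:
        req = json.loads(line)                 # {"argv": [...], "env": {...}}
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            status = tool_main(req["argv"], req.get("env", {}))
        resp = {"status": status, "stdout": buf.getvalue()}
        sys.stdout.write(json.dumps(resp) + "\n")
        sys.stdout.flush()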

If you're interested I might propose something on Zulip. At the very least I want to dig up that old code.

http://www.oilshell.org/blog/2018/04/26.html

I think it's better to solve this problem in shell than Python/R/Ruby/JVM. There's no way all of them will be fixed, so the cleaner solution is to solve it in one place by introducing coprocesses in the shell. I will try to do it in bash without Oil, but it's possible a few things will be easier in Oil.


> introducing coprocesses in the shell

I did this with bash and Python a few years ago when I learned about the "coproc" feature (which, by the way, only supports a single coprocess per bash process, unless I misunderstood it).

But it turns out I tend to open new terminal windows a lot, which meant that the coprocess needs to relaunch all the time anyway, so it wasn't very useful. Even if I start it lazily, to avoid slowing down every shell startup, most of my Python invocations tend to be from a new shell, so there was no real benefit.

Maybe if you have a pool of worker processes that's not tied to any individual shell process, and connect to them over a Unix-domain socket or something...
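
Something like this on the worker side, perhaps (the socket path and wire format are made up, and a real version would need concurrency, error handling, and the actual tool instead of an echo):

    import os
    import socket

    SOCK = "/tmp/pycli.sock"
    try:
        os.unlink(SOCK)          # remove a stale socket from a previous run
    except FileNotFoundError:
        pass

    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCK)
    srv.listen(8)
    while True:
        conn, _ = srv.accept()   # any shell can connect; no parent/child tie
        with conn:
            argv = conn.recv(65536).decode().split("\0")   # NUL-separated argv
            conn.sendall(("would run: %r\n" % argv).encode())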


Hm yeah I was hoping to do it in a way that's compatible with unmodified bash, but maybe it will only be compatible with Oil to start.

Basically I think there should be a "coprocess protocol" that makes persistent processes look like batch processes, roughly analogous to CGI or FastCGI.

I thought that could be built on top of bash, but perhaps that's not possible.

I'll need to play with it a bit more. I think in bash you can have multiple named coprocesses, with their descriptors stored in arrays like ${COPROC[@]} or ${MY_COPROC[@]}. But there are definitely issues around process lifetimes, including the ones you point out. Thanks for the feedback.


I looked into the issue with multiple coprocesses in bash again to make sure. While you would naturally think you could simply give them different names, it's unfortunately not supported:

https://lists.gnu.org/archive/html/bug-bash/2011-04/msg00059...

Bash will print a warning when you start the second one, and the man page explicitly says at the bottom: "There may be only one active coprocess at a time."



