Python startup time: milliseconds matter (python.org)
662 points by vanni on May 2, 2018 | 378 comments



I've always been disappointed by how large software projects, both FOSS and commercial, lose their "can do" spirit with age. Long-time contributors become very quick with a "no". They dismiss longstanding problems as illegitimate use cases and reject patches with vague and impervious arguments about "maintainability" or "complexity". Maybe in some specific cases these concerns might be justified, but when everything garners this reaction, the overall effect is that progress stalls, crystallized at the moment the last bit of technical boldness flowed away.

You can see this attitude of "no" on this very HN thread. Read the comments! Instead of talking about ways we can make Python startup faster, we're seeing arguments that Python shouldn't be fast, we shouldn't try to make it faster, and that programs (and, by implication, programmers) who want Python startup to be fast are somehow illegitimate. It's a dismal perspective. We should be exercising our creativity as a way to solve problems, not finding creative ways to convince ourselves to accept mediocrity.


This isn't an attitude of "no" - it's an attitude of "yes" to other things. The arguments are that making Python startup fast makes other things worse, and we care about those other things.

Here are some other things we can say "yes" to:

- Rewrite as much of Mercurial in Rust as possible, which will provide performance improvements well beyond what Python can possibly offer. https://www.mercurial-scm.org/wiki/OxidationPlan

- Spend resources on developing PyPy, which (being a JIT) has relatively slow startup but much faster performance in general, for people who want fast performance.

- Write compilers from well-typed Python to native code.

- Keep CPython easy to hack on, so that more people with a "can do" spirit can successfully contribute to CPython instead of it being a mess of special cases in Guido's head.

Will you join me in saying "yes" to these things and not convincing ourselves to accept mediocrity?


I have to note that none of the projects you suggested, all of which are good and useful, will do anything to address the CPython startup latency problem under discussion. Why shouldn't CPython be better?

There's also no reason to believe that startup improvements would make the interpreter incomprehensible; the unstated assumption that improvements in this area must hurt hackability is interesting. IME, optimizations frequently boost both simplicity and performance, usually by unifying disparate code paths and making logic orthogonal.


I think you misunderstood the point. These weren't things that would address the cpython startup problem - these were other priorities that can be worked on, instead of (or in addition to) the latency problems under discussion.

Saying yes to fixing one thing usually means saying no to all the other things you can be doing with your time instead. Unless you're lucky and can "kill 2 birds with 1 stone".


> - Write compilers from well-typed Python to native code.

That is one thing I really want to happen, because I think it begins to open up Python to the embedded space. MicroPython is nice, but it still needs an interpreter embedded into it.


There have been plenty of attempts to compile Python to faster code (usually by translating to C).

Cython can use type annotations and type inference to unbox numbers for faster numerical code, but uses ordinary Python objects otherwise. http://cython.org

ShedSkin translates a restricted (statically typeable) subset of Python to C++. https://shedskin.github.io

RPython has a multi-stage approach where you can use full dynamic Python for setup, but starting from the specified entry point, it is statically typed. Since it was created for PyPy, it comes with support for writing JIT compilers. https://rpython.readthedocs.io

Pythran translates a subset of Python with additional type annotations to C++. http://pythran.readthedocs.io

In general, the major hurdle for all attempts to compile Python to native code is that Python code is dynamically typed by default.
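To make the hurdle concrete, here is a minimal sketch (the names add and Money are made up for illustration): the same source line has to perform integer addition, string concatenation, list concatenation, or a user-defined operation depending on the runtime types of its arguments, so a static compiler can't pick one machine-level operation without type information.

```python
def add(a, b):
    # "+" dispatches at runtime on the types of a and b:
    # int.__add__, str.__add__, list.__add__, or a user-defined __add__.
    return a + b


class Money:
    """Hypothetical user-defined type that hooks into "+" via __add__."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)


print(add(1, 2))                          # 3
print(add("py", "thon"))                  # python
print(add([1], [2, 3]))                   # [1, 2, 3]
print(add(Money(150), Money(250)).cents)  # 400
```

A compiler that only sees this source has to keep a generic dispatch path for "+", which is roughly why the projects above either restrict the language or ask for type annotations.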


A hurdle that has been solved for quite some time in Lisp, Scheme, Prolog, Smalltalk, Self (the basis of HotSpot), Dylan, Ruby, and JavaScript.

None of them is less dynamic than Python.

What Python lacks is the funding and willingness to actually push one of those implementations to eventually become the new reference implementation.


Nuitka might be interesting for this use case, too. http://nuitka.net/index.html

It compiles Python to C++ and then compiles that code to machine instructions plus calls into the CPython library.


Thanks for that rundown. It's pretty good.


> Why shouldn't CPython be better?

The point is that "better" is almost never a well-defined direction unless you only consider a single use-case. It's almost always a tradeoff, especially in a widely used project.

A language is a point on a landscape of possible language variants, and "better" is a different direction on that landscape for every user.


> cpython startup latency problem

The problem under discussion is that projects that currently use CPython have slow startup. One potential solution is for those projects not to use CPython. (Certainly it's not the only potential solution, but, a language that tries to be all things to all people isn't going to succeed. Python has so far done an extraordinarily good job of being most things to all people, with "I want native-code performance" being one of the few out-of-scope things.)


>IME, optimizations frequently boost both simplicity and performance, usually by unifying disparate code paths and making logic orthogonal.

I would really like to see some examples where this is the case. Optimizations in my experience have made systems more brittle, less portable and ultimately less maintainable.


Programs are fast when they don't force the computer to do much stuff, so speed overlaps (albeit imperfectly) with simplicity of code. Mostly this explains why programs start fast and then slow down as they cover more use cases. But some optimisations also amount to simplifying an existing system.

For example: as you get to know your use cases better, you might simplify your code by sacrificing unwanted flexibility. Or you might replace a general-purpose data structure with a special-purpose one that is not just faster, but concretely embodies the semantics you desire.
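As a rough illustration (a hypothetical byte-frequency count, not taken from any project in this thread), a general-purpose dict can be replaced by a fixed 256-slot list, because iterating over bytes only ever yields integers 0 to 255. Whether the list version is actually faster in CPython should be measured (Counter's counting loop is implemented in C), but it does encode the constraint that only byte values can occur.

```python
from collections import Counter


def byte_histogram_general(data: bytes) -> dict:
    # General-purpose: a hash table that could hold any hashable key.
    return dict(Counter(data))


def byte_histogram_special(data: bytes) -> list:
    # Special-purpose: iterating over bytes yields ints in range(256),
    # so a flat list indexed by byte value is enough.
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return counts


data = b"hello world"
general = byte_histogram_general(data)
special = byte_histogram_special(data)
# Both agree on every byte that occurs.
assert all(special[b] == n for b, n in general.items())
print(special[ord("l")])  # 3
```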

A case that is not quite a simplification is removing code reuse. Instead of using one function in three different ways, you use three separate optimised functions. Now changes to one use case don't cause bugs in the others. That's the kind of thing that quotemstr meant by "making logic orthogonal".
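A minimal sketch of that idea, using a hypothetical parse_field helper: the general function branches on a kind argument on every call, while the specialized versions each do only their own job, so changing how booleans are parsed can't break integer parsing.

```python
def parse_field(text: str, kind: str):
    # General version: one function, three use cases, a branch on every call.
    text = text.strip()
    if kind == "int":
        return int(text)
    if kind == "float":
        return float(text)
    if kind == "bool":
        return text.lower() in ("1", "true", "yes")
    raise ValueError(f"unknown kind: {kind!r}")


# Specialized versions: each one handles exactly one use case.
def parse_int(text: str) -> int:
    return int(text)  # int() already tolerates surrounding whitespace


def parse_float(text: str) -> float:
    return float(text)  # so does float()


def parse_bool(text: str) -> bool:
    return text.strip().lower() in ("1", "true", "yes")


assert parse_field(" 42 ", "int") == parse_int(" 42 ") == 42
assert parse_field("2.5", "float") == parse_float("2.5") == 2.5
assert parse_field("Yes", "bool") is parse_bool("Yes") is True
```

The cost raised in the reply is visible even here: the whitespace handling now lives in three places, two of them implicitly inside int() and float().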


>A case that is not quite a simplification is removing code reuse. Instead of using one function in three different ways, you use three separate optimised functions. Now changes to one use case don't cause bugs in the others. That's the kind of thing that quotemstr meant by "making logic orthogonal".

And which is what I mean by making systems more brittle, less portable and less maintainable.

You find a corner case in the original function that's not covered, now instead of fixing it in one place you need to fix it in three places with all the headaches that causes.

So the next maintainer thinks: "Gee I can fix this by bringing all these functions together".

So if they're using an OO language, they make an abstract base class from which the behaviour is inherited, or a function factory otherwise.

So now you're back to a slow function with even more overhead, that's even harder to debug.
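A hypothetical sketch of the "function factory" step described above: the three parsers are regenerated from one template, so a fix to the shared stripping logic lands in one place again, at the price of one more level of indirection on every call.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def make_parser(convert: Callable[[str], T]) -> Callable[[str], T]:
    # Shared logic lives here once; each returned parser adds a closure
    # call on top of the call to `convert`.
    def parse(text: str) -> T:
        return convert(text.strip())

    parse.__name__ = f"parse_{getattr(convert, '__name__', 'value')}"
    return parse


def _to_bool(text: str) -> bool:
    return text.lower() in ("1", "true", "yes")


parse_int = make_parser(int)
parse_float = make_parser(float)
parse_bool = make_parser(_to_bool)

print(parse_int(" 7 "), parse_float("0.5"), parse_bool("TRUE"))  # 7 0.5 True
print(parse_int.__name__)                                        # parse_int
```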

So the next maintainer comes around and thinks: "Gee I can speed this up if I break out the two functions that are causing 90% of the bottleneck".

Now you have 4 completely independent functions to keep track of.

Repeat ad nauseam.


I'm so longing for a Python(like) compiler.

MicroPython put together a Python in 250 KB. Why the hell can't we make an LLVM frontend for Python that can use type hints for optimization? Sure, you lose some dynamic features as you optimize for speed, but that's the dream: quickly write a prototype without caring about types, then optimize later by adding types and removing dynamism.
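The source-level half of that workflow already exists with PEP 484 type hints; here is a minimal sketch (the mean functions are made up for illustration). CPython ignores the annotations at runtime, so both versions behave the same; the hypothetical compiler described above would be the tool that exploits them.

```python
from typing import List


# Prototype: no annotations, works on anything that supports + and /.
def mean(values):
    return sum(values) / len(values)


# Later: the same logic with type hints. CPython does not enforce these;
# a checker such as mypy, or an ahead-of-time compiler, could use them.
def mean_typed(values: List[float]) -> float:
    total: float = 0.0
    for v in values:
        total += v
    return total / len(values)


print(mean([1, 2, 3, 4]), mean_typed([1.0, 2.0, 3.0, 4.0]))  # 2.5 2.5
```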

I'm currently learning Racket and LLVM and I have about 70 more years to live. I'm gonna try make Python fast on slow weekends 'til I die.


Since you're after micro-controllers you might be interested in Nim. At this year's FOSDEM we showed off some micro-controllers running Nim code[1]. The language is definitely less Python-like than Cython, but it might just be similar enough for your use cases (I started using Nim as a Python replacement).

1 - https://twitter.com/nim_lang/status/959736268870639616


For people who don't want to bounce through twitter to get to the home page:

https://nim-lang.org/


Unless your Python compiler can use CPython extension modules without a massive performance penalty, it's going to see very limited adoption. The ecosystem matters.


We had a chance with ctypes and now CFFI to move away from the platform-calcifying CPython module interface that is overly coupled to the CPython runtime. I am very disappointed in the lack of affordances that CPython gives to alternative Pythons to support their work. The stdlib is a crufty mess that is overly coupled to CPython as well. The batteries are corroded and need to be swapped out for a modular pack.
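For readers who haven't used it: ctypes, in the standard library, calls C functions through the plain C ABI, with no extension module written against the PyObject API. A minimal sketch, assuming a POSIX system; ctypes.util.find_library("c") can return None on some systems, in which case CDLL(None) opens the running process, whose symbols include libc.

```python
import ctypes
import ctypes.util

# Load the C standard library (POSIX assumption; see the note above).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signatures so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.strlen(b"batteries"))  # 9
print(libc.abs(-42))              # 42
```

CFFI offers a similar approach as a third-party package, and PyPy supports both.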


I don't think it would be hard to add type annotations to existing projects, though. Something similar happened in the JS world with the transition of JS projects to TypeScript and it wasn't a big deal, IIRC.


> I'm so longing for a Python(like) compiler.

There is Cython, of course, which is a great python (+) compiler. I assume you mean a just-in-time compiler as opposed to a static compiler.


No. PyPy already does JIT. And Cython has the speed, but not the size.

The target of my fantasy compiler is microcontrollers. That's why quotemstr's (correct) comment isn't that big of a concern to me.


Just throwing Python code at Cython doesn't really improve performance all that dramatically, because it'll be doing pretty much the same thing as the bytecode interpreter, except that all the saucy special-cases are now unrolled many times across all code.


Isn't that what Numba [1] does quite successfully for a subset of Python?

[1]: https://numba.pydata.org/


Not static AOT, not creating tiny binaries that fit on a microcontroller.

There is a lot of stuff out there that goes in this direction. There is Nuitka (again, no small binaries), and there is even a GCC frontend that can compile some minimal examples, but it was abandoned long ago.

Seriously, the time I spent researching this topic - if a proper compiler engineer would spend that on the actual compiler, it'd be done by now.



Seeing as that page states the resulting executables still require numpy, I'd guess it was the static requirement that it misses.


You might be interested in Matthew Might's course on compilers. In one of his courses he targeted a Python compiler written in his weapon of choice, Racket.


> PyPy, which (being a JIT) has relatively slow startup

    > time pypy -c 'print "Hello World"'
    Hello World
    pypy -c 'print "Hello World"'  0.08s user 0.04s system 96% cpu 0.120 total

    > time luajit -e 'io.write("Hello World!\n")'
    Hello World!
    luajit -e 'io.write("Hello World!\n")'  0.00s user 0.00s system 0% cpu 0.002 total


Sometimes I wonder why we're not all using Lua instead of Python. Lua seems to get a strange amount of hate in some circles, but I've found both Lua and Python to be reasonably pleasant languages to work with.


From my personal account about Lua [1]:

> Three, the language is not a mere combination of syntax and semantics. Any evaluation should also account for user bases and ecosystem, and in my very humble opinion Lua spectacularly fails at both. I'm not going to assume the alternative reality: Lua has a sizable user base, and its ecosystem is worse even for that user base.

> [...] The lack of quality libraries also means that you take on even more risk when you are writing a small program (because you have less incentive to write it yourself). I have experienced multiple times that even the existing libraries (including the standard ones) had crucial flaws and no one seemed to be bothered to fix them. Also, in the embedded setting the use of snippets, rather than proper libraries, is more common, as libraries can be harder to integrate, and unfortunately we are left with the PHP-esque lua-users.org for Lua...

I think this critique still holds today, and unless a miracle happens (like D), I doubt this is fixable.

[1] https://news.ycombinator.com/item?id=13902023


I had a similar feeling a while back, but when I actually picked up Lua for a project I was shocked by how limited the standard library is. Third party libraries aside (which Python clearly has in spades), even just the standard library is pretty sparse. It makes sense, since Lua is at least partly motivated by being lean and embeddable, but it puts Lua in a completely different space for me.


Other folks complain about the lack of libraries. Personally, the two things which turn me off on Lua (luajit) are 1-based arrays and the conflation of hash-tables and array-lists into a single thing.
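To make that complaint concrete for Python readers, here is a hypothetical sketch of a Lua-style table: one container that serves as both a 1-based array and a hash map. Real Lua tables keep a separate array part internally and define "#" as a border; this sketch stores everything in one dict and uses a simplified length rule.

```python
class LuaLikeTable:
    """Hypothetical Lua-style table: 1-based array and hash map in one."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data.get(key)  # missing keys read as None, like Lua's nil

    def __setitem__(self, key, value):
        if value is None:
            self._data.pop(key, None)  # assigning nil removes the entry
        else:
            self._data[key] = value

    def length(self) -> int:
        # Simplified "#": count keys 1, 2, 3, ... until the first gap.
        n = 0
        while (n + 1) in self._data:
            n += 1
        return n


t = LuaLikeTable()
t[1], t[2] = "a", "b"  # "array" entries, starting at index 1
t["name"] = "lua"      # "hash" entry in the same container
print(t.length(), t[0], t["name"])  # 2 None lua
```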


I don't like how everybody invents their own class system.


> the conflation of hash-tables and array-lists into a single thing.

Precisely, one of the things that turns me off on Python (coming from Lua), is the unnatural proliferation of different container types.


Oddly, that's one of the things I like most about Python. The collections module has so many very useful things. Plus, hash tables and array lists shouldn't ever really be the same thing.
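For anyone who hasn't browsed it, a quick tour of a few containers from the standard library's collections module:

```python
from collections import Counter, defaultdict, deque, namedtuple

# deque: fast appends and pops at both ends; maxlen makes it a bounded buffer.
q = deque([1, 2, 3], maxlen=3)
q.append(4)           # the oldest item (1) drops off because of maxlen
print(list(q))        # [2, 3, 4]

# Counter: a dict subclass for tallying.
print(Counter("abracadabra").most_common(1))  # [('a', 5)]

# defaultdict: missing keys get a value from a factory function.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)
print(dict(groups))   # {'a': ['apple', 'avocado'], 'b': ['banana']}

# namedtuple: a tuple with named fields.
Point = namedtuple("Point", "x y")
print(Point(1, 2).x)  # 1
```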


To each their own :-)

You might like Tcl - associative arrays (hash tables), arrays (lists), strings, numbers, and code are all the one type (at least conceptually).


no batteries


Is ... that faster than CPython? Wow. Maybe I should symlink /usr/bin/python -> pypy on my laptop....


It's much slower. On my machine, "$PYTHON -c ''" (execute nothing) takes 60ms on Python 2.7, 80ms on 3.6 and 250ms on pypy.
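If you want to reproduce numbers like these, here is a small sketch that times "<interpreter> -c ''" with subprocess and keeps the best of several runs to reduce noise from caching and scheduling. It always measures the interpreter running the script; python2.7 and pypy are assumed names that are only measured if found on PATH. Absolute numbers will differ by machine and version.

```python
import shutil
import subprocess
import sys
import time


def startup_ms(interpreter: str = sys.executable, runs: int = 5) -> float:
    """Return the best wall-clock time, in milliseconds, of `interpreter -c ''`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([interpreter, "-c", ""], check=True)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


if __name__ == "__main__":
    # Assumed interpreter names; skipped if they are not installed.
    others = [shutil.which(n) for n in ("python2.7", "pypy")]
    for path in [sys.executable] + [p for p in others if p]:
        print(f"{path}: {startup_ms(path):.1f} ms")
```

On Python 3.7 and later, "python -X importtime -c 'import json'" prints a per-module breakdown of import times to stderr, which helps show where the startup time goes.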


It's not. You can test it on your system. In every case I've seen, Python starts up faster than PyPy. Neither takes a super long time.

PyPy has some great performance characteristics, but startup time isn't one of them.


Startup << warmup.


None of these address, for instance, the issue raised about the Firefox build invoking Python many times. This seems both an accepted use case of CPython and an area where CPython has traditionally had a huge edge over the JVM and PyPy. If scripts are not a priority, what is the expected use case of CPython?
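One common way to sidestep per-invocation startup cost, when a build calls many small Python scripts, is to run the tasks inside one long-lived interpreter instead of spawning a process per task. A rough sketch comparing the two; TASK_SRC is a made-up stand-in for a real build step.

```python
import contextlib
import io
import subprocess
import sys
import time

TASK_SRC = "print(sum(range(1000)))"  # hypothetical stand-in for a small build step
N = 5


def run_as_processes(n: int) -> list:
    # One new interpreter per task: pays the full startup cost n times.
    out = []
    for _ in range(n):
        result = subprocess.run([sys.executable, "-c", TASK_SRC],
                                capture_output=True, text=True, check=True)
        out.append(result.stdout.strip())
    return out


def run_in_process(n: int) -> list:
    # Same source executed in the current interpreter: startup is paid once.
    out = []
    for _ in range(n):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(TASK_SRC, {})
        out.append(buf.getvalue().strip())
    return out


for fn in (run_as_processes, run_in_process):
    start = time.perf_counter()
    results = fn(N)
    elapsed = (time.perf_counter() - start) * 1000
    print(f"{fn.__name__}: {elapsed:.1f} ms, outputs: {set(results)}")
```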

I would like to note that CPython's ties to the PyObject C ABI seem to stymie rather than encourage "hacking". CPython seems to have traditionally valued stability over all else; see the issues PyPy has had chasing compatibility with C while retaining speed.

So: normally I'm with you that a language should lean into its strengths, but I've always listed startup time as a primary strength of Python!


In my experience, "script" is usually well-correlated with "a bit of inefficiency is okay." There's a reason that, say, many UNIX commands that could be implemented as scripts (true, false, yes) are actually implemented as binaries. There's a reason that most commercial UNIXes/clones (Solaris, macOS, Ubuntu, RHEL, etc.) switched from a script-based startup mechanism to a C-program-based one.

I certainly write and continue to write Python scripts where even an extra half second won't matter. It's doing some manipulation of data where the cost of what it's doing is dominated by loading the data (e.g., grabbing it from some web service), and even if the script is small and quick, it's not so small and quick that I'll notice 50-100 ms being shaved off of it.

Use cases where CPython continues to make sense to me are non-CGI web applications and things like Ansible, where load time isn't sensitive to milliseconds and runtime performance is pretty good. (Although if you believe the PyPy folks, perhaps everything that's PyPy-compatible should be running on PyPy.)


This hits the nail on the head.

Optimization is very, very rarely completely "free"; it is usually a conscious trade of some property for another trait that's deemed more important in a specific case.

Simplicity for performance. Code size for compilation speed. Startup time for architectural complexity. UX for security.
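One concrete instance of "startup time for architectural complexity" in Python is deferring imports: moving an import into the function that needs it means runs that never call that function never pay for the import. A minimal sketch; json and csv stand in for heavier dependencies, and LazyModule is a hypothetical helper, not a library API. The costs are that import errors surface later and dependencies are no longer visible at the top of the file. The standard library also offers importlib.util.LazyLoader for a more general version of the same idea.

```python
import importlib


def to_json(obj) -> str:
    # Deferred import: json is loaded the first time this runs, not when
    # this module is imported. Later calls hit the sys.modules cache.
    import json
    return json.dumps(obj, sort_keys=True)


class LazyModule:
    """Hypothetical helper: import `name` on first attribute access."""

    def __init__(self, name: str):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only called for attributes not found normally (i.e. module attributes).
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


csv = LazyModule("csv")           # no import happens here
print(to_json({"b": 1, "a": 2}))  # {"a": 2, "b": 1}
print(csv.QUOTE_ALL)              # first access imports csv; prints 1
```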

For a great product, you need to say "no" much more often than not. Do one thing and do it well. Be Redis, not JBoss.

I love how this article gets down to the essence of it: https://blog.intercom.com/product-strategy-means-saying-no/


Agree. I recently found that trying to optimise code can make it a lot more complex.


> Rewrite as much of Mercurial in Rust as possible, which will provide performance improvements well beyond what Python can possibly offer. https://www.mercurial-scm.org/wiki/OxidationPlan

I read that article and I'm still wondering: why Rust?


The last three paragraphs of the section "Why use Rust?" should address that - basically, they have experience with solving this problem by writing parts of the code in C, they are not fans of that experience, and Rust is a compelling better C (and there are specific reasons they don't think C++ is compelling).

Are you asking in comparison to some other language? The most obvious other languages in the "compelling better C" niche, I think, are Ada, D, and Go. Ada and D (I think; I do not know them well) don't have as good a standard library or outside development community, and Go is less suited than Rust to replacing portions of a process. Go would be a reasonable choice if one were writing a VCS from scratch today.


The C++ rationale is bizarre. That a 2008 compiler doesn't provide modern features is unsurprising. Choosing Rust is a strange reaction to this limitation, since it would be just as easy, from a toolchain perspective, to use a modern C++ compiler.


One big advantage of rust is that (while some people won't like this), rust assumes you won't install it from your package manager, but download a script which installs it in your home directory. This script is quick and easy.

Trying to install a C++ compiler from source (and I've done it several times) is a much less pleasant experience, so most people stick with what their package manager provides.


Language features aside, Rust is a lot nicer to work with because it has cargo and using libraries is no longer a pain.


Depends which libraries we are talking about, try to use GUI libraries from Rust.


That isn't fair, all languages besides JS+HTML+CSS have issues with GUI libraries. Either you go Electron/webview or you have to deal with Qt/GTK for cross-platform GUI.


Sure it is fair, Java, C++, C#, VB.NET, Delphi, Objective-C, Swift have quite good GUI libraries available.

And regarding JS+HTML+CSS, they are still in the stone age of RAD tooling.


I think you missed the cross-platform part of my answer.


Some of those languages do have cross-platform GUI offerings.

Even AWT is better than any option currently natively available to Rust.

After all, "using libraries is no longer a pain" is not what I felt when converting an old toy application from Gtkmm to Gtk-rs.


> Go is less suited than Rust to replacing portions of a process

How so? Is it because Go has a GC and Rust doesn't?


That's part of it, but more generally, Rust prioritizes fitting into other programs: it offers direct compatibility with the C ABI for functions and structs (because the C ABI is the effective lingua franca for ~all present-day OSes), it uses regular C-style stacks instead of segmented stacks, threading is implicit, calls between Rust and C in either direction are just regular function calls and involve no trampolines or special handling by the GC/runtime, there is no runtime that requires initialization so you can just call into a random Rust function from a C program without setup, etc.

Go has cgo, and has slowly come to a few of the same decisions (e.g., Go gave up on segmented stacks too), and gccgo exists, so it's certainly possible to use Go for this use case. But it's not as suited as Rust.

There's a nice post about calling Rust from Go which goes to lengths to avoid the cgo overhead, even though Rust can just directly expose C-compatible functions and cgo can call them without any special effort on either side: https://blog.filippo.io/rustgo/


> How so? Is it because Go has a GC and Rust doesn't?

More generally a heavy runtime, which creates issues when you're trying to replace parts of a process which has its own runtime, unless you can make the two cooperate (by actually using the same runtime e.g. jvm or graal or whatever).

Go's FFI is also easy to work with but the source of… other issues which is why the Go community often prefers reimplementing things entirely to using cgo and existing native libraries.


Yes, manually moving objects between GCs is tricky.


Because he works for Mozilla.


That is probably the only reason to use rust over go or cpp.


Really? The only reason?


Only one I can see. Want speed and performance and safety? C++. Want easy multithreading? Go. Rust's only claim to fame is that Mozilla is dogfooding it.


> Want speed and performance and safety? C++.

Speed and performance sure, but safety automatically rules out C++.

> Want easy multithreading? Go

Want speed, performance, safety, and easy multithreading? Rust.


C++ still does multithreading better than Rust, just that Go does it better. Similarly, Go does perf better than Rust, just that C++ is even better. So yeah, if you want the worst of all worlds coupled with the pains associated with a brand-new language (try compiling Rust for an ARMv5 SoC), Rust all the way!


Can you show a benchmark where Go outperforms Rust?


I agree with you. Given limited development resources, saying "no" is difficult but important.


I am slightly afraid to ask, but what is a "well typed python"?


Have you seen MyPy, static type annotations / checking for Python? http://mypy-lang.org/

In context, what I'm really getting at is "a sufficiently non-dynamic subset of Python that it can be compiled statically, but also a sufficiently large one that real Python programs can have a chance of being in the subset." PyPy has a thing called RPython that fits the former but not really the latter (I don't know of any non-PyPy-related codebases that work in RPython). In general, adding complete type annotations to a codebase is pretty correlated with making it static enough to do meta-level things on like compiling and optimizing it - for instance if you have a variable that changes types as the program runs, at least now you've enumerated its possible types. It's not the only way of doing so, but it seems to work well in practice and there seems to be a correlation between compiled vs. interpreted languages and static vs. dynamic typing.
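A minimal sketch of the "at least now you've enumerated its possible types" point (the describe function is hypothetical): once the annotation says a value is Union[int, str], a checker such as mypy, or in principle a compiler, knows it only has to handle those two cases, and an isinstance check narrows each branch to a single type. At runtime CPython still accepts any argument; the guarantee only holds for code that passes the checker.

```python
from typing import Union


def describe(value: Union[int, str]) -> str:
    # The annotation enumerates every type `value` may have. Inside each
    # branch, a type checker narrows it to exactly one of them.
    if isinstance(value, int):
        return f"int with {value.bit_length()} bits"  # value: int here
    return f"str of length {len(value)}"             # value: str here


print(describe(255))      # int with 8 bits
print(describe("hello"))  # str of length 5
```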


Correct. Every time you say yes, you're saying no to something else. It's important to realize what you're saying no to before you say yes.


It's funny when developers themselves think effort is so fungible: as if spending 1 hour on A means you could instead have made 1 hour of progress on B, C, or D, and that it would have been worthwhile. Your post leans on that assumption to the point of fallacy.

I would think developers have the experience to realize this isn't true but I see it all the time on these forums.


I think I'm making the opposite claim - effort isn't fungible (and availability of effort isn't fungible). You can't necessarily spend 1 hour that would otherwise go into, say, rewriting Mercurial into a compiled language and instead spend it on making CPython faster and get the same results. One of these is more likely to work, and also the two problems are going to attract interest from different people.

And one of the things that affects how productive one hour of work will be - and also whether random volunteers will even show up with one hour of work - is the likelihood of getting a change accepted and shipped to users. This is influenced by both the maintainers' fundamental openness to that sort of change, and any standards (influenced by the maintainers, who are in turn influenced by their users) about how careful a change must be to not make the project worse on other interesting standards of evaluation. It's also influenced by the number of people working on the project (network effects) because a more vibrant project is more likely to review your code promptly, finish a release, and get it into the hands of more users.

So I'm claiming that it's better to spend time on rewriting Mercurial in Rust than to spend time on getting CPython startup faster, because the Mercurial folks are actively interested in such contributions and the CPython folks are actively uninterested, and because there are fewer external constraints in making Mercurial startup faster than in making CPython startup faster. And I'm saying that the more we encourage folks to help with rewriting Mercurial in Rust, the more likely additional folks are to show up and help with the same project, thereby making 1 hour of effort even more productive.


> This isn't an attitude of "no" - it's an attitude of "yes" to other things.

You are literally bringing an attitude of "no" to the question of whether you are bringing an attitude of "no" to the discussion....


If those who complain about no-attitudes are insisting that the only acceptable response to anything is "yes", I doubt they'll get far.


FWIW no one who replied to this email thread said something even close to "no". Victor Stinner points out that startup time is something that comes up a lot and mentions some recent work in the area [1].

Python is a big ship; it may not be as nimble as a young FOSS project, but it is always improving, and investments in things like startup time pay dividends to a large ecosystem.

[1] https://mail.python.org/pipermail/python-dev/2018-May/153300...


I get the impression that backwards-compatibility does weigh pretty heavily on the Python core developers these days. There are so many Python installations out there doing so much that the default answer to a change has to be "no". The fact that macOS and popular Linux distributions ship with copies of Python is great, but once something is effectively a component of operating systems, boldness is not a viable strategy. Arguably, one of the reasons why the transition to Python 3 has been so drawn out is that every time somebody installs macOS or one of many Linux distributions, a new Python 2 system is born. I've seen .NET Core developers explain that having .NET Framework shipped in Windows put them under massive constraints, and this was one of the motivations for a new runtime.


I'm not denying this phenomenon, but part of it is surely that widely used projects get more conservative because any change risks breaking something for someone somewhere. And the maintainers tend to feel a sense of responsibility to help people deal with these breakages.


I'll bring a slightly different perspective, as someone who's been using Python professionally for over a decade: there is no such thing as just saying "yes" or "no". Every "yes" to one group is at least an implicit "no" to some other group, and vice-versa.

The Python 2/3 transition is a great example of this. Python 2 continued an earlier tradition of saying "yes" to almost everything from one particular group of programmers: people working on Unix who wanted a high-level language they could use to write Unix utilities, administrative tools, daemons, etc. In doing that, Python said "no" to people in a lot of other domains.

Python 3 switched to saying "yes" to those other domains much more often. Which came with the inherent cost of saying "no" (or, more often, "not anymore") to the Unix-y crowd Python 2 had catered to. Life got harder for those programmers with Python 3. There's been work since then to mitigate some of the worst of it, but some of the changes that made Python nice to use for other domains are just always going to be messy for people doing the traditional Unix-type stuff.

Personally, I think it was the right choice, and not just because my own problem domain got some big improvements from Python 3. In order to keep growing, and really even to maintain what it already had, Python had to become more than just a language that was good for traditional Unix-y things. Not changing in that respect would have been a guaranteed dead end.

This doesn't mean it has to feel good to be someone from the traditional Unix programming domain who now feels like the language only ever says "no". But it does mean that it's worth having the perspective that this was how a lot of us felt in that golden age when you think Python said "yes" to everything, because really it was Python saying "yes" to you and "no" to me. And it's worth understanding that what feels like "no" doesn't mean the language is against you; it means the language is trying to balance the competing needs of a very large community.


"people working on Unix .... In doing that, Python said "no" to people in a lot of other domains."

Could you elaborate on this?

I thought Python was pretty good about supporting non-Unix OSes from early on. It was originally developed on SGI IRIX and MacOS. From the README for version 0.9:

> There are built-in modules that interface to the operating system and to various window systems: X11, the Mac window system (you need STDWIN for these two), and Silicon Graphics' GL library. It runs on most modern versions of UNIX, on the Mac, and I wouldn't be surprised if it ran on MS-DOS unchanged. I developed it mostly on an SGI IRIS workstation (using IRIX 3.1 and 3.2) and on the Mac, but have tested it also on SunOS (4.1) and BSD 4.3 (tahoe).

though it looks like there wasn't "painless" DOS support until 1994, with the comment "Many portability fixes should make it painless to build Python on several new platforms, e.g. NeXT, SEQUENT, WATCOM, DOS, and Windows."

I also thought that PythonWin had very good Windows support quite early on. The 1.5a3 release notes say:

> - Mark Hammond will release Python 1.5 versions of PythonWin and his other Windows specific code: the win32api extensions, COM/ActiveX support, and the MFC interface.

> - As always, the Macintosh port will be done by Jack Jansen. He will make a separate announcement for the Mac specific source code and the binary distribution(s) when these are ready.


So, take the Python 3 string changes as an example.

Python 2 scripting on Unix was great! Python just adopted the Unix tradition of pretending everything is ASCII up until it isn't, and then breaking horribly. And then the Linux world said "just use UTF-8 everywhere!" and really meant "just keep assuming things are ASCII, or at least one byte per code point, and break horribly when it isn't!"

This was great for people writing command-line scripts and utilities. This was a nightmare for people working in domains like web development.

Python 3 flipped the script: now, the string type is Unicode, and a lot of APIs broke immediately under Python 3 due to the underlying Unix environment being, well, kind of a clusterfuck when it came to locales and character encoding and hidden assumptions about ASCII or one-byte-per-character. Suddenly, all those people who had been using Python 2 -- which mostly worked identically to the way popular Linux distros did -- were using Python 3 and discovering the hell of character encoding that everybody else had been living in, and they complained loudly about it.

But for growing from a Unix-y scripting language into a general-purpose language, this change was absolutely necessary. Programmers should have to think about character encoding at their input/output boundaries, and in a high-level language should not be thinking of text as a sequence of bytes. But this requires some significant changes to how you write things like command-line utilities.
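
A minimal Python 3 sketch of what "thinking about encoding at the I/O boundary" looks like in practice. Bytes come in, get decoded once, and everything in between works on `str`. The helper names are hypothetical. `surrogateescape` is the standard error handler Python 3 uses on POSIX for undecodable bytes in filenames and environment variables (PEP 383), so those bytes survive a round trip instead of raising.

```python
def read_boundary(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode once at the input boundary.

    'surrogateescape' maps undecodable bytes to lone surrogates
    instead of raising, so they can be restored on output.
    """
    return raw.decode(encoding, errors="surrogateescape")


def write_boundary(text: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: encode once at the output boundary, restoring escaped bytes."""
    return text.encode(encoding, errors="surrogateescape")


raw = b"caf\xc3\xa9 \xff"   # UTF-8 "café ", then one byte that is not valid UTF-8
text = read_boundary(raw)

print(len(raw), len(text))  # prints "7 6": bytes vs. code points
assert write_boundary(text) == raw  # the stray byte survives the round trip
```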

This is an example of a "yes" to one group being a "no" to another group. Or, at least, of it feeling that way.

Also, saying that Python was a great Unix-y language is not equivalent to "Python only ran on Unix and never supported Windows at all", and you know that, so it was kind of dishonest of you to try to start an argument from the assumption that I said the latter when really I said the former. Don't do that again, please.


You wrote: Python was a great Unix-y language is not equivalent to "Python only ran on Unix and never supported Windows at all",

Let me elaborate further. I recall that Mark Hammond at one of the Python conferences around 2000 said that Python was the language with the best support for Windows outside of the languages developed at Redmond. Hammond did much of the heavy work in making that happen.

I didn't mean to sneak in a dishonest argument.

I am under the genuine impression that Python worked well for Windows, with the narrow Unicode build that matched the UCS2 encoding that Windows used, and was comparable to the experience of developing under Python for Unix.

Similarly, I thought the native Mac support under, say, OS 9, was also well supported, and matched the Mac environment.

I'm not saying that there weren't problems, and I agree that web development is one of the places where those problems came up.

Rather, I'm saying that I think the native Unix, native Windows, and native Mac support were roughly comparable, such that I don't think it's right to say that there was a really strong bias towards Unix.


What I said: Python was a great language for writing traditional Unix-y things like shell scripts, daemons, sysadmin tools, and here's an example of it adopting something that made that much easier.

What you are trying to twist that into saying: Python somehow didn't run on or wasn't used on or was terrible on operating systems not explicitly named "Unix".

I don't see any way to assume good faith on your part given you've repeated that attempt at putting words in my mouth while demonstrating knowledge that indicates you understand perfectly well what it was I really said. I'm going to ignore you now.


What you said was:

> Python 2 continued an earlier tradition of saying "yes" to almost everything from one particular group of programmers: people working on Unix who wanted a high-level language they could use to write Unix utilities, administrative tools, daemons, etc. In doing that, Python said "no" to people in a lot of other domains.

> Python 3 switched to saying "yes" to those other domains much more often.

I would like to know why you singled out Unix when it seems like Python also said "yes" to MS Windows.

Of course Python developers said "no" to other domains. Every language says "no" to some domains. I thought you were trying to make something more meaningful about a specific bias towards Unix.

Eg, as I recall, the Perl implementation was biased towards Unix and was difficult to compile under Windows. The glob syntax, for example, called out to the shell.

Honestly, I was expecting you to point out a difficulty that Python had with non-Unix OSes, specifically with MS Windows, which has since been remedied with Python 3.

I didn't expect this response at all, nor have my attempts to explain myself seemed to have made a difference.

I still don't know why you singled out Unix in your earlier comment. And it seems I will never know.


"Unix-y" is a paradigm or design philosophy, not an operating system. You can write unixy things for any OS. That's what the parent is talking about, not an operating system. https://en.wikipedia.org/wiki/Unix_philosophy


I think what would make things clear for me is if there was an example of how Python did something like a "no" for MS Windows support.

That is, outside of those places where MS Windows might (to the exasperation of Dave Cutler) be considered Unix-y.


>"Python only ran on Unix and never supported Windows at all",

I think the misunderstanding stems from no one having said this :P

One might rephrase "english-unix-ascii" from my other comment to "english-command line tooling-fixed width system encoding".

It was really a problem for web and fullish unicode.


Your wording wouldn't have caused me to raise an eyebrow.

But ubernostrum seemed to be making a stronger statement that Python favored "one particular group of programmers: people working on Unix ... to write Unix utilities, administrative tools, daemons, etc."

Yet I know that I used Python 2.x with the win32 extensions to write daemons for MS Windows, and to write an ActiveX extension for Excel.

That's why I wanted clarification on the basis for ubernostrum's statement, with pointers to why I thought Python was well-supported on other OSes.


The biggest thing is likely the str/unicode change. In py2, if working on a unix system with only ascii, you never had to think about strings. Suddenly with python3, you had to a little bit.

The gain was that for everyone else (read: web, non-English, anywhere where unicode is common), python became much easier to use. But for those specific english-unix-ascii cases, it was a mild inconvenience.

Edit: as ubernostrum pointed out, more than a mild inconvenience if you were porting code. If writing new code, it was not much worse, but porting was absolutely a pain.


And I thought that if working with Python 2 on a MS Windows system, then you also didn't really have to think about strings. That is, Python's narrow Unicode strings matched the native UCS-2 of Windows.

I did some Python 2 programming under Windows and don't recall string issues; certainly fewer issues than I've had in dealing with Python 3 changes.

I agree that what we have now is an improvement. I just don't see why the old way was really Unix-centric.


> In py2, if working on a unix system with only ascii, you never had to think about strings.

On any system with only ASCII, you never had to think about strings. Unix is an irrelevant word in this sentence.


That's a nice sounding comment, but... could you be a little more specific about what particular "traditional Unix-y things" did Python 3 say "no" to?

...I can't really think of many, if any at all. Sometimes you just say "no" to "inertia".


I think it's because they've seen exactly where saying "yes" leads them and they don't like that place.


They hate fast code?


Perhaps the known opportunities for dramatic perf increases require compatibility breaks and some expressiveness downgrades.

That Python 3 transition was a fun time, yeah?


If the Python 3 transition had brought dramatic perf increases, I guess people would have been much quicker to sell that upgrade to their management. Not that I blame the Python team for the lack of it. But the alternate dev history would have been different.


No doubt, huge perf improvements for free would have made the transition more compelling.

However, what I'm suggesting is that large perf wins are not free. Breaking compatibility too much more could have doomed Python 3. And now that the devs know how painful a transition is, they're far less likely to break compatibility again for any reason.


I think part of what explains this attitude in people is "lack of imagination". In the sense that sometimes, especially when an existing project or organization or bureaucracy has become huge and daunting, people cannot imagine excellence anymore, so they believe it to be literally impossible.


To be fair, they are frequently saying no to things other people think they should do (rather than saying no to things like contributions of startup improvements).


"We" is a very abstract term. I am sure that if you proposed a patch that addressed the issue without adverse side effects, it would get accepted.


I think your comment is well-intentioned (I upvoted) but I respectfully disagree. I think wanting Python to be a bit faster is similar to wanting Haskell to have a little bit of mutability. Engineering with restrictions is a good thing: we can build great systems in Haskell because it's a very neat language even though it lacks mutability. We can also build great systems in Python because it's a very neat language even though it's a bit slow. Sure, you can always optimize Python's performance, that's a legitimate problem and it takes a few engineers to solve it. But it's more interesting to work around Python's slowness by engineering tricks such as better algorithms etc.


That's not a great analogy. Haskell is a neat language in part because it doesn't have mutability. Python is a neat language despite being slow.

I can't imagine anyone would object if Python could magically be 10x faster. I can't say the same thing for the Haskell thing.


My whole point is that a 10x speedup cannot just magically happen. The reason Python is slow is not incompetent programming or lack of magic. We know why that's happening. Because every variable in the Python interpreter is a hash map and pretty much every operation is a hash map lookup. How do you optimize this? The only way is to remove language features like `setattr`, and my whole point is that some people use Python because it's flexible enough to do that, so they need their `setattr`.
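
To make the `setattr` point concrete: an ordinary CPython instance keeps its attributes in a per-instance `__dict__`, which is what lets `setattr` add new names at runtime, while `__slots__` trades that flexibility for a fixed layout. A minimal sketch with hypothetical class names; CPython also has internal optimizations (key-sharing dicts, attribute caches), so "every access is a plain dict lookup" is a simplification.

```python
class Flexible:
    """Hypothetical class: attributes live in a per-instance dict."""

    def __init__(self, x):
        self.x = x


class Slotted:
    """Hypothetical class: fixed attribute layout, no per-instance __dict__."""

    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x


f = Flexible(1)
print(f.__dict__)              # the instance's attribute dict
setattr(f, "y", 2)             # a new attribute, added at runtime
print(sorted(vars(f)))         # now includes 'y'

s = Slotted(1)
print(hasattr(s, "__dict__"))  # slotted instances have no __dict__
try:
    setattr(s, "y", 2)         # there is no slot named "y", so this fails
except AttributeError as exc:
    print("AttributeError:", exc)
```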


> The reason Python is slow is not incompetent programming or lack of magic. We know why that's happening. Because every variable in the Python interpreter is a hash map and pretty much every operation is a hash map lookup.

This statement is easily refuted by PyPy. Here's a simple program which runs 70 times faster in PyPy than Python on my machine - including startup time:

     https://pastebin.com/2n0PL9hY

> How do you optimize this?

Semantically, everything in JavaScript and Lua is also hash map lookups. Yet very smart people have made those languages very fast. CPython is not slow for the reasons you stated.


Not all python programs are compatible with PyPy though. Nor do they get much of a performance boost by switching. Pandas and NumPy for example didn't work on PyPy until less than a year ago. And a good chunk of Python codebases are going to use one of those at some point.


> Not all python programs are compatible with PyPy though.

I'm not sure how this is relevant to my reply... I didn't say PyPy was a replacement for Python. I said PyPy, Lua, and JavaScript are existence proofs that dynamic "hash table" languages don't have to be slow. Therefore, CPython must be slow for some other reason.


PyPy can't accelerate the Pandas/NumPy part anyway; the computationally intensive code in NumPy and Pandas is written in C, C++ or Fortran.

Though there are a lot of `if isinstance(foo, Bar)` checks that could be avoided in a statically typed language.


> Sure, you can always optimize Python's performance, that's a legitimate problem and it takes a few engineers to solve it. But it's more interesting to work around Python's slowness by engineering tricks such as better algorithms etc.

Surely you're not implying that improving Python's performance would preclude finding interesting algorithms, nor that this is a suitable rationale for keeping Python slow? Anyway, algos can only get you so far when they're built on slow primitives (all data scattered haphazardly across the heap, every property access is a hash table lookup, every function call is a dozen C function calls, etc).


> I think wanting Python to be a bit faster is similar to wanting Haskell to have a little bit of mutability

I'm sorry but that makes zero sense. Haskell is defined by immutability. People want to use haskell because of that characteristic. I don't want to use python because it is slow.


Sorry, I disagree. There is definitely a sense in which Haskell is desirable because it is immutable; I myself love immutable data structures, and that certainly makes it desirable. But my point is that it imposes a restriction: now you cannot implement algorithms that need mutability, such as hash maps. It is easy to circumvent such problems, but one other way is basically introducing mutability to Haskell which totally doesn't make sense. I think the same goes for Python: if you want to make it significantly faster, then you need to face certain trade-offs. Maybe the data model should be optimized, or maybe `int` shouldn't be arbitrary-precision, or maybe there should be primitive types like `int` and `double`, as in Java, to increase performance. The truth is these are not Pythonic solutions, and just as mutability is not Haskell-esque, optimizing Python by making these trade-offs is not Pythonic.
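
The arbitrary-precision `int` trade-off is easy to see directly. A small sketch; `wrap_int64` is a hypothetical helper that emulates the two's-complement wraparound a fixed-width 64-bit integer (like Java's `long`) would give you, which Python's `int` never does.

```python
import sys


def wrap_int64(n: int) -> int:
    """Hypothetical helper: emulate signed 64-bit two's-complement wraparound."""
    n &= (1 << 64) - 1                             # keep only the low 64 bits
    return n - (1 << 64) if n >= (1 << 63) else n  # reinterpret as signed


big = 2**63             # one past the largest signed 64-bit value
print(big)              # 9223372036854775808: Python just grows the integer
print(wrap_int64(big))  # -9223372036854775808: what a fixed-width int64 would hold

# Ints are heap objects whose size grows with magnitude.
print(sys.getsizeof(1), sys.getsizeof(2**100))
```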


> one other way is basically introducing mutability to Haskell which totally doesn't make sense

It makes perfect sense, because Haskell is not "an immutable language". It's a language in which some things are immutable, and those that are not are explicitly indicated in the type system.


This is why large companies like Google often reinvent the wheel. Open source gives everyone the right to use, but not the power of control. Sure, you can fork, but then your version will diverge from the official one, and the pain of maintaining compatibility may be greater than writing your own from scratch.


It's a byproduct of how many people you have to answer to. I was having a discussion with a coworker about an app that had a lot of features that made it seem cluttered but useful. I think small projects can make bolder choices and enable more options because they have a smaller userbase that would be impacted by their changes, and they want to reach more people, so adding a feature is generally a net benefit. But a larger project cannot risk hurting the large userbase it has already established, so it has to be more cautious about the changes it makes.


I've always been disappointed at how quickly people make sweeping generalizations from a single anecdote. (I also think Python can do better here, but the generalization isn't justifiable.)


With major infrastructure like Python there's a tendency to over-emphasise compatibility between releases.

Look at this post in the same list thread: https://mail.python.org/pipermail/python-dev/2018-May/153300...

Python 3.6 is trying an enormous number of potential paths that code for imports might be found at. Why is that fixed in stone? Couldn't Python 3.(n+1) change that, if it's slow and historical, cutting out a bunch of slow system calls?
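
A rough way to see why the candidate paths add up: for a top-level import, the default path-based finder walks every `sys.path` entry and, in each one, looks for a package directory plus a file with each recognised suffix. The sketch below only enumerates those candidate locations (`candidate_locations` is a hypothetical helper). It is an approximation: CPython's `FileFinder` caches directory listings rather than literally `stat()`-ing every candidate, and zip entries on `sys.path` are handled by a different importer. On Python 3.7+, `python -X importtime` reports real per-module import timings.

```python
import os
import sys
from importlib.machinery import all_suffixes


def candidate_locations(module_name):
    """Hypothetical helper: approximate where a top-level import may be looked for."""
    candidates = []
    for entry in sys.path:
        base = os.path.join(entry or os.getcwd(), module_name)  # "" means the cwd
        # A regular package: <entry>/<name>/__init__<suffix>
        for suffix in all_suffixes():
            candidates.append(os.path.join(base, "__init__" + suffix))
        # A plain module or extension module: <entry>/<name><suffix>
        for suffix in all_suffixes():
            candidates.append(base + suffix)
    return candidates


if __name__ == "__main__":
    paths = candidate_locations("some_module")
    print(len(sys.path), "sys.path entries,", len(paths), "candidate locations")
    for p in paths[:5]:
        print(p)
```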

As someone who makes use of Python to deploy software, it's entirely possible that could cause me a few issues... which I'd fix quite easily. It should be totally reasonable to expect the community using the software to cope with those sorts of changes after a major release; the alternative is ossification.

Django suffered from maintaining too much compatibility, and releasing too slowly, and they fixed it. Three or four years ago everyone was talking about moving away from it; now they release often, deprecate stuff when they need to, and the project is as vibrant as it ever was. Time for CPython to learn the same lesson.


It may also be that they simply don't have an attack on the startup problem.


Competition. HipHop VM lit a fire under the PHP team.


Everyone is focusing on python, but where is this "can do" spirit from mozilla? There are languages with better startup times, such as bash, perl, lua, and awk, that could likely do whatever the python scripts are doing.


> but where is this "can do" spirit from mozilla?

The mail included both

> At some point, we'll likely replace Python code with Rust so the build system is more "pure" and easier to maintain and reason about.

and

> Since I am disproportionately impacted by this issue, if there's anything I can do to help, let me know.


Python3 has the exact opposite problem: Too many devs willing to say "yes" to features and a small number of devs who try to keep things fast and maintainable.

Remember that Python2 was faster.


That changed with the dict improvements in 3.6.


This is true but python's relative slowness (along with the GIL) is an issue that is regularly blown out of all proportion.

Part of the reason for the language's success is because it made intelligent tradeoffs that often went against the grain of the opinions of the commentariat and focused on its strengths rather than pandering to the kinds of people who write language performance comparison blog posts.

If speed were of primary importance then PyPy would be a lot more popular.


You're conflating two kinds of "performance": startup latency and steady-state throughput. We're talking about the former, and you're proposing improvements for the latter. In fact, moving to PyPy is exactly what you shouldn't do to improve startup.

It's surprising but frequently true that startup latency has a greater effect on the perception of performance than actual throughput. Nobody likes to type a command and then be kept waiting, even if the started program could in principle demonstrate amazing feats of computation once warmed up.


The GIL is a pretty nasty problem once you try to scale things beyond one core.

Simply try something like unpickling a 10 GB data structure while keeping your GUI in the main thread responsive. You cannot do that because the GIL locks up everything while modifying data structures. Move the data to another process instead of another thread. Great, your GUI is responsive but you can't access the data from the main thread.

You can say that such a humongous data structure is wrong or that a GUI isn't meant to be responsive or programmed in Python or that I'm holding it wrong. Probably right.


I've flailed around with this a few times in the last year or so and have found that posting things up and down a multiprocessing.Pipe is the least painful alternative.
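
For reference, the `multiprocessing.Pipe` pattern looks roughly like this: a worker process loops on `recv()`, does the work, and sends the result back. Everything that crosses the pipe is pickled on one side and unpickled on the other, which is the cost discussed in the reply below. A minimal sketch: `serve`, the `None` shutdown sentinel, and the squaring "work" are hypothetical placeholders.

```python
from multiprocessing import Pipe, Process


def serve(conn):
    """Hypothetical worker loop: receive requests, send results, stop on None."""
    while True:
        request = conn.recv()                # unpickles what the other side sent
        if request is None:                  # shutdown sentinel
            break
        conn.send([x * x for x in request])  # the result is pickled on the way back
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    worker = Process(target=serve, args=(child_conn,))
    worker.start()

    parent_conn.send([1, 2, 3])
    print(parent_conn.recv())  # [1, 4, 9]
    parent_conn.send(None)
    worker.join()
```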


So you're basically building a distributed application just because you can't share memory properly. This can be very efficient if little communication is involved or a total nightmare if you have gigabytes of data where you need lots of random read access to walk the data structures at high speed. If you're not careful you spend most of your time pickling and unpickling the stuff you send over your pipes while requiring duplication of your gigabyte data structures in order to gain at least some parallelism.

I don't see a way around this mess with the current structure of python. You would have to reimplement the data heavy part completely in another language that provides proper threading models.


"You're holding it wrong" is a poor response to a wide audience, like iPhone users. But it's an OK response to a specialist, like someone tackling the task you describe.


I'm a professional Python developer and I run into performance problems a lot. Python makes things really hard for even specialists to "hold right". Contrast that with Go, which (for all the hate it gets) reads a lot like well-formed Python in single-threaded applications, and reads the way you would like to write Python in parallel applications. And all the while being two orders of magnitude faster. If we don't start taking performance seriously in the Python community, Go (or someone else) will eat our lunch sooner or later.


Go offers faster performance with code that is up to 50% longer - with the commensurate added maintenance burden.

And Go is still very slow compared to C, C++ or Rust.

Since performance is usually a power law distribution (99% of the performance gains are made in 1% of the code), it's frequently more effective - in terms of speed and maintenance burden - to code up hot paths in a language like C, C++ or Rust and keep python.
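
Finding that 1% is a profiling job before it is a rewriting job. A minimal sketch using the standard library's `cProfile` and `pstats`; `cheap_setup`, `hot_loop`, `main`, and `profile_top` are hypothetical stand-ins for real code.

```python
import cProfile
import io
import pstats


def cheap_setup():
    return list(range(1000))


def hot_loop(data):
    # Deliberately dominates the runtime so it stands out in the report.
    total = 0
    for _ in range(200):
        for x in data:
            total += x * x
    return total


def main():
    return hot_loop(cheap_setup())


def profile_top(func, limit=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_top(main))
```

Only once the report confirms where the time goes is it worth weighing a C, C++ or Rust rewrite of that function against the marshaling overhead it adds.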


I accept that Go is more verbose than Python, but your maintainability claim doesn’t match my experience at all. I find that Go is more maintainable for a few reasons: magic is discouraged in Go, everyone writes Go in pretty much the same way and with the same style, Python’s type system is still very immature (no recursive types, doesn’t play nicely with magical libs). Further, in my experience with working with large Python and Go codebases, Python becomes less maintainable very quickly as code size increases, especially in the face of more junior developers. Go seems to be more resistant to these forces, probably because of the rails it imposes. Lastly, any maintainability advantages Python might have had are quickly eaten up by the optimizations, which are necessary in a much greater portion of the code base because naive Python is so much slower than naive Go.

Go is ~100X faster than Python and about half as fast as C/C++/Rust, and I find it to be at least as maintainable as Python for most (but not all!) applications.

As for your power law claim, I agree with the premise but not the conclusion: “rewrite the hot path in C!” is not a panacea. This only works when you’re getting enough perf gain out of the C code to justify the marshaling overhead (and of course the new maintenance burden).

I don’t like bashing on Python, but it doesn’t compete well with Go on these grounds. It needs to improve, and we can’t fix it by making dubious claims about Go. We should push to improve things like PyPy and mypy, as well as other tooling and improvements.


Have you watched David Beazley's talks about using generators to implement coroutines? That might give you a similar pattern to goroutines. If non-blocking IO isn't the challenge, do you make use of the concurrent.futures module?
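
The generator-as-coroutine pattern from those talks looks roughly like this: a generator that receives values through `send()` and pauses at each `yield`. A minimal sketch with a hypothetical `running_average` coroutine. Note that it only interleaves work on one thread; nothing here runs in parallel.

```python
def running_average():
    """Hypothetical generator-based coroutine: send numbers in, get the running mean out."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # pause here until the caller calls send()
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)            # "prime" the coroutine: run it up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```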

While I also encounter efficiency issues, most of them are frustrations with the overhead of serialization in some distributed compute framework or the throughput of someone else's REST API. As much as so many people complain about the GIL, it's never been a blocker for me (pun intended). Perhaps it's because my style in Python is heavily influenced by Clojure.

Now that I think about it, Python's string processing is often my bottleneck.


Coroutines aren’t parallelization, so they’re quite a lot worse than goroutines in terms of performance. If you want parallelism in Python, you’re pretty much constrained to clumsy multiprocessing. Besides parallelism, Python makes it difficult to write efficient single threaded code, since all data is sprinkled around the heap, everything is garbage collected, and you can’t do anything without digging into a hashmap and calling a dozen C functions under the hood. And you can’t do much about these things except write in C, and that can even make things slower if you aren’t careful.

Probably the best thing you can do in Python is async io, and even this is clumsier and slower than in Go. :(


I'm getting confused. Are you trying to do parallel compute or parallel networking?

If parallel networking, the benchmarks I've seen put Python asynchronous IO at about the same speed as Golang. The folks at MagicStack reported that Python's bottleneck was parsing HTTP (https://magic.io/blog/uvloop-blazing-fast-python-networking/). Note their uvloop benchmark was about as fast as or faster than the equivalent Golang code.

If parallel compute, then multiprocessing is the way to go and Python's futures module ain't clumsy. It's just ``pool.submit(func)`` or ``pool.map(func, sequence)``. If you're asking for parallel compute via multithreading, you're going against the wisdom of shared-nothing architecture. Besides, pretty soon you'll want to go distributed and won't be able to use threads anyway.
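
The `concurrent.futures` usage described above, as a minimal sketch. `cpu_heavy` and `run_all` are hypothetical placeholders. With `ProcessPoolExecutor`, the function and its arguments must be picklable, and the `__main__` guard is needed on platforms that spawn rather than fork workers. `ThreadPoolExecutor` exposes the same `submit`/`map` interface, so the pool type can be swapped without touching the calling code.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_heavy(n):
    """Hypothetical CPU-bound task: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_all(inputs, executor_cls=ProcessPoolExecutor, workers=4):
    with executor_cls(max_workers=workers) as pool:
        # submit() schedules one call and returns a Future...
        first = pool.submit(cpu_heavy, inputs[0])
        # ...while map() yields results in input order.
        rest = list(pool.map(cpu_heavy, inputs[1:]))
        return [first.result()] + rest


if __name__ == "__main__":
    print(run_all([10_000, 20_000, 30_000, 40_000]))
```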

In contrast to your experience, I find Python makes it easy to write efficient code. Getting rid of the irrelevant details lets me focus on clear and efficient algorithms. When I need heavy compute, I sprinkle in a little NumPy or Numba. My bottleneck is (de)serialization, but Dask using Apache Arrow should solve that problem.


> I'm getting confused. Are you trying to do parallel compute or parallel networking?

Parallelism conventionally means "parallel computation". For async workloads, you're right--there are third party event loops that approach Go's performance, but that's not the subject of my complaint.

Regarding parallelism, I haven't used Python's futures module specifically, but all multiprocessing solutions are bad for data-heavy workloads simply because the time to marshal the data structure across the process boundary poses a severe penalty. There are many other disadvantages to processes as well--they're far less memory friendly than a goroutine (N Python interpreters running, each with the necessary imports loaded), they require extra support to get logging to work as expected (you have to make sure to pipe stderr and stdout), they're subject to the operating system's scheduler, which may kill them on a whim.

> Besides, pretty soon you'll want to go distributed and won't be able to use threads anyway.

Processes have the same problem in addition to being generally less efficient.

> you're going against the wisdom of shared-nothing architecture

I mean, sort of. If you're doing a parallel computation on a large immutable data structure, you don't lose out on maintainability, but you gain quite a lot of performance (no need to copy/marshal that structure across process boundaries). The loss of maintainability is negligible due to immutability. Besides, there are lots of other good reasons to share things across processes, like connection pools, file handles, and other resources.

Also, it's terribly ironic that you're defending CPython and specifically its GIL on the basis of "shared nothing architecture".

> In contrast to your experience, I find Python makes it easy to write efficient code. Getting rid of the irrelevant details lets me focus on clear and efficient algorithms.

Then you'll love Go--Go has far fewer irrelevant details than Python and Python lacks many _relevant_ details, such as control over memory. Your efficient algorithm in Go will almost certainly be at least two orders of magnitude faster than the equivalent in CPython, without compromising much in terms of readability.

> When I need heavy compute, I sprinkle in a little NumPy or Numba.

I haven't used Numba, but I've seen a lot of Python get _slower_ with NumPy and Pandas (and lots of other C extensions, for that matter). You have to know your problem well or you'll end up with code that is less readable and less performant than the original, and even when it works it's still less readable than the naive Go implementation and not significantly more performant.

> My bottleneck is (de)serialization, but Dask using Apache Arrow should solve that problem

They'll help, but the fact that Python needs these projects when other languages have far simpler solutions is an admission of guilt in my view. That said, I'm excited to see what sorts of things these projects enable in the Python community.


> Processes have the same problem in addition to being generally less efficient.

What I meant was that you should consider a multiprocessing approach that shares essentially no data between processes. As you say, the memory copying overhead is highly inefficient. Once you approach a problem like that, you've already implemented an essentially distributed system and the change is trivial.

I've regretted multithreading enough times to convince me it's almost never the right choice. Mostly because I find I've underestimated the project scale and needed to rewrite as distributed. Maybe those new monstrous instances available on EC2 will change my habits. I've never had such flexible access to a 4TB RAM / 128 core machine before.

> the fact that Python needs these projects

Apache Arrow solves problems for many languages. The ACM article that popped up the other day, "C Is Not a Low-level Language," touched on some of the issues.

https://arrow.apache.org


Python derives a good chunk of its speed (if not all of it) from carefully tuned libraries written in other languages (or even for other architectures, in the case of many machine learning packages). As soon as you try to do a lot of heavy processing in Python, even the compiled versions quickly bog down. IMO the best way to use Python is to cleverly glue together highly optimized code. That way you spend the minimum amount of effort and you get maximum performance.


Multithreading is the glue I need. How am I supposed to write an optimized native module that spawns threads to do computation in numpy and pandas?


Yeah, that was kind of my point :/


I have to say that my first reaction was: "maybe you shouldn't use python for this, then". If you are using a language in a way that it gets worse in subsequent versions, that's a good sign that they're optimizing for something other than what you care about.

The programming language R does not, as I understand it, optimize for speed, because they are optimizing for ease of exploratory data analysis. R is growing quite rapidly. So is python, actually. It doesn't mean that either one is good at everything, and it's probably the case that both are growing because they don't try to be good at everything. A good toolbox is better than a multi-tool.


(I authored the linked post)

While the "maybe you shouldn't use Python" comment could be construed as trolling by some, there is definite truth to your line of reasoning and I agree with the comment.

I absolutely love Python as a programming language for the space it is in. But as someone who needs to think long term about maintaining large projects with lifetimes measured in potentially decades, Python has a few key weaknesses that make it really difficult for me to continue to justify using it for such projects. Startup time is one. The GIL is the other large one (not being able to achieve linear speedups on CPU-bound code in 2018 with Moore's Law dead is unacceptable). General performance disadvantages can be adequately addressed with PyPy, JITs, Cython, etc. Problems scaling large code bases using a dynamic language can be mitigated with typing and better tools.

Python can be very competitive against typed systems languages. But if it fails to address its shortcomings, I think more and more people will choose Rust, Go, Java, C/C++, etc for large scale, long time horizon projects. This will [further] relegate Python to be viewed as a "toy" language by more serious developers, which is obviously not good for the Python ecosystem. So I think "maybe you shouldn't use Python for this, then" is a very accurate statement/critique.


I would characterize Python's weaknesses differently.

Startup time is a problem for Python. But concurrency is much more complex than you state: threading is not the only or best concurrency model for many applications. And certainly removing the GIL will not just enable Python "to achieve linear speedups on CPU-bound code". Distributed computing is real. One of Python's problems for a long time was not the GIL, it was the sorry state of multi-process concurrency.

The speed issues that JITs solve for other languages may not be solvable in Python due to language design.


I'm totally OK with Python's threading choice of saying only 1 Python thread may execute Python code at any time. This is a totally reasonable choice and avoids a lot of complexity with multithreaded programming. If that's how they want to design the language, fine by me.

But the GIL is more than that: the GIL also spans interpreters (that's why it's called the "global interpreter lock").

It is possible to run multiple Python interpreters in a single process (when using the embedding/C API). However, the GIL must be acquired for each interpreter to run Python code. This means that I can only effectively use a single CPU core from a single process with the GIL held (ignoring C extensions that release the GIL). This effectively forces the concurrency model to be multiple process. That makes IPC (usually serialization/deserialization) the bottleneck for many workloads.

If the GIL didn't exist, it would be possible to run multiple, independent Python interpreters in the same process. Processes would be able to fan out to multiple CPU cores. I imagine some enterprising people would then devise a way to transfer objects between interpreters (probably under very well-defined scenarios). This would allow a Python application to spawn a new Python interpreter from within Python, task it with running some CPU-expensive code, and return a result. This is how Python would likely achieve highly concurrent execution within processes. But the GIL stands in its way.

The GIL is an implementation detail, not poor language design.
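
For reference, a minimal sketch of the multi-process workaround described above, using the standard library's `multiprocessing.Pool`. Every argument and every result is pickled and sent across a pipe between processes, which is the serialization cost being discussed. The `cpu_bound` function is a hypothetical stand-in for real CPU-heavy work.

```python
import multiprocessing as mp


def cpu_bound(n):
    # Hypothetical stand-in for CPU-heavy work: sum of squares below n.
    return sum(i * i for i in range(n))


def fan_out(inputs, processes=4):
    # Each input is pickled, sent to a worker process, computed there,
    # and the result is pickled back: this round trip is the IPC cost.
    with mp.Pool(processes=processes) as pool:
        return pool.map(cpu_bound, inputs)


if __name__ == "__main__":
    print(fan_out([10_000, 20_000, 30_000]))
```

The `__main__` guard matters: on platforms that spawn rather than fork workers, each worker re-imports the main module.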


It is a tractable amount of work (~40-80 hrs) to convert CPython from a sea of globals to a context-based system, so that one could have distinct Python interpreters in the same address space. As it is now, you get one. Lua got this right from the beginning: Lua state doesn't leak across subsystems. But there is zero chance I would do this work and then see if it would stick. I'd be wasting 2 weeks of full-time work only to have the CPython folks say, yeah, no, because reasons.

Startup time should be fixed; Python does way too much when it boots. Using blank files:

    $ time lua t.lua 

    real	0m0.006s
    user	0m0.002s
    sys	        0m0.002s

    $ time python t.py 

    real	0m0.052s
    user	0m0.036s
    sys	        0m0.008s
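
A rough way to reproduce this kind of measurement from Python itself is to time repeated launches of the current interpreter with an empty program. This is a sketch, not a rigorous benchmark: it includes process creation overhead, and `median_startup` is a hypothetical helper. The `-S` (skip the `site` import) and `-E` (ignore `PYTHON*` environment variables) flags are standard; on Python 3.7+, `python -X importtime -c pass` also prints a per-module import breakdown.

```python
import statistics
import subprocess
import sys
import time


def median_startup(args=("-c", "pass"), runs=10):
    """Median wall-clock seconds to start the current interpreter and exit."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *args], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    print(f"default: {median_startup() * 1000:.1f} ms")
    # -S skips importing the site module; -E ignores PYTHON* variables.
    print(f"-S -E:   {median_startup(('-S', '-E', '-c', 'pass')) * 1000:.1f} ms")
```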


Lua supports the scenario you describe effortlessly, not to mention that it's actually designed for embedding.

Python can't even be re-initialized in the same process without introducing memory leaks and other non-deterministic gotchas! [1]

[1] https://docs.python.org/3.6/capi/init.html#c.Py_FinalizeEx


> The GIL is an implementation detail, not poor language design.

As I understand it, the GIL simplifies data structures by removing any need to account for concurrent access.

If you remove the GIL you must move your synchronization (mutexes) into the data structures and immediately get a big performance penalty.

If you wanted to avoid this overhead, you'd run into swamplands where the programmer must take care of concurrent access patterns and everything. Also, many CPython modules would stop working because they assume the GIL.

It can be done but last time I read about the GILectomy there was no clear way forward.
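
To make "move your synchronization into the data structures" concrete, here is a minimal sketch of a container that carries its own lock. Even under today's GIL, a read-modify-write such as `counts[key] += 1` is several bytecode steps and can interleave between threads, so it needs explicit locking; without a GIL, built-in objects would need something comparable internally, and paying an acquire/release on every access is the penalty mentioned above. `LockedCounter` and `hammer` are hypothetical names.

```python
import threading
from collections import defaultdict


class LockedCounter:
    """A counter whose own lock guards every read-modify-write."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def increment(self, key):
        # `+=` on a dict entry is a load, an add and a store; without the
        # lock, two threads can read the same old value and lose an update.
        with self._lock:
            self._counts[key] += 1

    def get(self, key):
        with self._lock:
            return self._counts[key]


def hammer(counter, key, times):
    for _ in range(times):
        counter.increment(key)


if __name__ == "__main__":
    c = LockedCounter()
    threads = [threading.Thread(target=hammer, args=(c, "x", 10_000)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(c.get("x"))  # 40000
```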


Yeah, I think this kind of issue is why Ruby, which also has a GIL, seems to be heading for a new concurrency and parallelism model that introduces a new level (Guilds) between threads and processes where the big lock would be held, and where Guilds communicate only by sharing read access to immutable data, and transferring ownership or copies of mutable data.


I agree that this is an implementation detail. If they were to simply use the JS model of "every thread gets its own environment and message passing is how you interact", then you could still use threads safely and achieve some pretty impressive performance improvements in some cases.

Knowing literally nothing about Python other than what I read, I'm kind of confused as to how the current implementation came to be, because it is much easier to design an interpreter that uses the JS model than one that uses a shared environment among multiple threads. I created an Object Pascal interpreter, and it has this design: it can spin up an interpreter instance in any thread pretty quickly because it's greenfield all the way with a new stack, new heap, etc.
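
A minimal sketch of that share-nothing, message-passing style in Python, using only `threading` and `queue`: the worker owns its private state and talks to the main thread solely through queues. Under CPython's GIL this gives isolation but not a parallel speedup, so it only illustrates the communication pattern; the `worker` function and the `STOP` sentinel protocol are assumptions for the example.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to shut down


def worker(inbox, outbox):
    seen = 0  # private state: never touched by any other thread
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(("done", seen))
            return
        seen += 1
        outbox.put(("result", msg * 2))


def run(messages):
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for m in messages:
        inbox.put(m)
    inbox.put(STOP)
    t.join()  # after this, no one else writes to outbox
    replies = []
    while not outbox.empty():
        replies.append(outbox.get())
    return replies


if __name__ == "__main__":
    print(run([1, 2, 3]))
```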


Python's slowness can help improve performance by teaching you to use techniques that end up being faster no matter the language.

Python is so slow that it forces you to be fast.

Consider data analysis: on modern machines, you're almost always better off with a columnar approach: if you have a struct foo { int a, b, c; }, you want to store int foo_a[], foo_b[], foo_c[], not struct foo data[]. It's better for the cache, better for IO, and better for SIMD.

numpy makes it much easier to use the latter than the former, whereas in C, you might be tempted with the former and not even realize how much performance you were leaving on the table. Likewise for GPU compute offloading, reliance on various tuned libraries for computationally intensive tasks, and the use of structured storage.
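
The layout difference can be sketched with the standard library alone (NumPy makes the same idea far more convenient). The "array of structs" form keeps each record together, while the columnar form keeps one contiguous `array.array` of C ints per field, so summing one field scans only that field's buffer. In pure Python the cache benefit is largely hidden by per-element boxing; the full effect shows up in NumPy or C. The field names mirror `struct foo` above; the helper functions are hypothetical.

```python
from array import array
from collections import namedtuple

Foo = namedtuple("Foo", "a b c")


def to_columns(records):
    # Struct-of-arrays: one contiguous buffer of C ints per field.
    return {
        "a": array("i", (r.a for r in records)),
        "b": array("i", (r.b for r in records)),
        "c": array("i", (r.c for r in records)),
    }


def sum_a_rows(records):
    # Array-of-structs: every record is visited just to read one field.
    return sum(r.a for r in records)


def sum_a_columns(columns):
    # Columnar: scan only the buffer holding field `a`.
    return sum(columns["a"])


if __name__ == "__main__":
    rows = [Foo(i, 2 * i, 3 * i) for i in range(1000)]
    cols = to_columns(rows)
    print(sum_a_rows(rows), sum_a_columns(cols))
```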


Sorry, I didn't mean it to be trolling, I just meant it more or less literally. If Rust (for example) gets used for things like Mercurial and Mozilla, is that bad? I'm not saying Python shouldn't care, if it could improve the startup time without sacrificing other things. But presumably the transition from py2 to py3 was not intending to make things slower, it was intending to solve other problems. There are almost always tradeoffs. Even the Mercurial folks quoted in the article said that the things py3 solved were not what they needed. That's a good indicator that Python is not the right language (anymore) for what they're doing.

I am primarily a Python programmer, but if Rust, Go, etc. take over as the language of choice in certain cases, I don't think that's a bad thing. Which doesn't mean one shouldn't write an article to highlight this cost of not having short startup time, just in case this cost wasn't understood by Guido, et al. But my guess (and it's only a guess), is that it was.


> While the "maybe you shouldn't use Python" comment could be construed as trolling to some, there is definite truth to your line of reasoning and I agree with comment.

I wouldn't say I construed it as trolling. More like, "You might be right, but where does that get us?" Not trolling, but also not that constructive, because it's extremely easy to write something like "maybe you shouldn't use Python" but likely hard and time-consuming to make it so.

There are a lot of questions when considering such a move. For example:

- What's the opportunity cost of migrating $lots_of Python to Rust, or some other language?

- Is that really where you can add (or want to add) the most value?

- And what does having to do that do to your roadmap? Maybe it enables it, but surely it's also stealing time from other valuable work you could be doing?

- Longer term, are we sacrificing maintainability for performance? (In your case it sounds like the opposite?)

- How easily can we hire and onboard people using $new_tech? (Again, it sounds like you might reduce complexity.)

Basically I suppose what I'm saying is I find it a little trite when people say, "well, maybe you should do X," without having weighed the costs and benefits of doing so. And in a professional environment, if that's allowed to become a pattern of behaviour, it can contribute to the demotivation of teams. Hence, I found myself a bit irritated by the grandparent post.


Python was always slow to start. Not as slow as the JVM, but by around the 300th test case for hg, or the 100th Python script invocation in any build system, people should start to wonder how to get all of that under one Python process.

It's not like Python is so ugly it'd be messy to do. (It was possible with the JVM after all. It even works by simply forking the JVM, with all its GC threads and so on: https://github.com/spray/sbt-revolver )

Make-style DAGs are nice, but eventually the sheer number of syscalls for process setup (and module import and dynamic linking) is going to be a waste of time.
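
One way to get many scripts "under one Python process" is to pay startup once and run each script with the standard library's `runpy.run_path`, which executes a file as `__main__` with fresh globals. This is only a sketch: modules imported by one script stay cached in `sys.modules` for the next, and scripts that mutate global interpreter state will leak into each other. The `run_batch` helper is hypothetical.

```python
import runpy
import sys


def run_batch(paths):
    """Run each script in this one interpreter; return exit codes (0 = ok)."""
    codes = []
    saved_argv = sys.argv
    for path in paths:
        sys.argv = [path]
        try:
            runpy.run_path(path, run_name="__main__")
            codes.append(0)
        except SystemExit as exc:
            # sys.exit(n) in a script should end that script, not the batch.
            code = exc.code
            codes.append(code if isinstance(code, int) else (0 if code is None else 1))
        finally:
            sys.argv = saved_argv
    return codes


if __name__ == "__main__":
    print(run_batch(sys.argv[1:]))
```

Usage would look like `python run_batch.py a.py b.py c.py`, with one interpreter startup instead of three.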


If one needs Rust, C/C++ level of performance I doubt there is much Python can do and one can wonder if Python was ever the right tool for such a project.


It’s a great tool for prototyping.


If you expect to need the performance of a statically typed, compiled language I don't see why you'd prototype in a dynamically typed, interpreted language.


That's why build systems still look like black magic infused with even darker sh, and a bit of perl sprinkled all over, presumably because the previous maintainers were all out of goat blood.


Most layers of a large project that need to be designed and figured out care nothing of those concerns.


I feel bad for even thinking it but ... I bet Go's startup times are great.


I think your characterization of the GIL is not accurate. Show me ANY real world program that can achieve linear speedups on multicore or multi-processor systems. Humans have not sufficiently mastered multithreading to be able to make such a claim. I am not aware of any "CPU-bound" use cases that would actually use Python like this instead of, say, C or Fortran. And anyway, I submit that it would benefit (both from a design and an execution standpoint) from being multi-process (in other words, using explicitly coded communication).


Regarding the GIL, I've always wondered about Jython but never gotten around to trying it. What are the drawbacks of running it on a JVM to get true multithreading? Having to properly sync the threads like in other environments without global locks?


Nothing, it's just not maintained. People realized, that yeah, python is nice, but why spend years reimplementing it on the JVM, when there's Kotlin. (And Java itself is quite a breeze to program in nowadays. And of course Scala, if you dare go beyond the Pythonic simplicity.)


Jython doesn't look completely unmaintained: https://hg.python.org/jython

It's also not completely obsoleted by Kotlin, e.g. for the use case of calling a Python library from Java. However, the Python semantics are not a great fit for the JVM, so you should expect it to be slower than plain CPython: https://pybenchmarks.org/u64q/jython.php


The supposed attitude of the Python developers about startup time works against the popular niches Python is supposed to be such a great fit for: little scripts, glue, short-running applications.

That’s a problem if that’s an area python wants to compete in.


I might be biased because I'm from the hordes that are moving from Stata and Matlab to Python (but then there are the hordes attracted to data analysis now), but that was never really Python's strong suit, nor its target market.

I mean, I was always into little scripts, but I used Tcl and then Perl.


Back in the 1990s, Python was promoted as a web programming language. This was back in the days when everyone used CGIs. Python came with a cgi module, while in Perl you had to download cgi-lib.pl. I even helped maintain a Python web application that was all CGI-based.

So I can assure you that at one point Python was trying to be in the "short run applications" space. They may have given up since then, but that's a different issue.

As for me, I do write little scripts in Python. I don't like how most of my run time is spent waiting for Python to get ready.

What I really don't like is using NumPy. I tend to re-implement features I want rather than reach for NumPy because that 0.2s import time irks me so much. And it's because the NumPy developers want people to do "import numpy; numpy.reach.into.a.deep.package", so they import most of its submodules.

They used to also eval() some code at import, causing even more overhead. I don't know if that's gone away.
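
Since Python 3.7 (PEP 562), a package can define a module-level `__getattr__`, so that `import pkg; pkg.sub.thing` still works while `pkg.sub` is imported only on first access instead of eagerly in `__init__.py`. A self-contained sketch that writes a throwaway package to a temporary directory; the package name `lazypkg` and submodule `heavy` are made up for illustration, and this is not how NumPy itself is structured.

```python
import importlib
import os
import sys
import tempfile

INIT = '''
import importlib

_SUBMODULES = {"heavy"}

def __getattr__(name):
    # Called only when normal attribute lookup on the package fails.
    if name in _SUBMODULES:
        module = importlib.import_module(f"{__name__}.{name}")
        globals()[name] = module  # cache so __getattr__ isn't hit again
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

HEAVY = "ANSWER = 42\n"


def make_package(root):
    pkg = os.path.join(root, "lazypkg")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(INIT)
    with open(os.path.join(pkg, "heavy.py"), "w") as f:
        f.write(HEAVY)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    make_package(root)
    sys.path.insert(0, root)
    lazypkg = importlib.import_module("lazypkg")
    print("lazypkg.heavy" in sys.modules)  # False: submodule not imported yet
    print(lazypkg.heavy.ANSWER)            # 42: imported on first access
    print("lazypkg.heavy" in sys.modules)  # True
```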


Ah, the days when I knew quite some people doing Zope consulting.

Apparently it is still around.


Ah, Zope. I remember when the IPC Python conference seemed to double in size (I think it was the DC one). 1/2 the people were seemingly there because of Zope.


Both Tcl and Perl are dead languages walking these days, and it's Python that's displaced them. It absolutely competes in that market.


Markets are a funny thing. Both "dead" languages are thread safe and can easily run separate interpreters per thread.

See: "THREADS DONE RIGHT... WITH TCL"

https://www.activestate.com/blog/2016/09/threads-done-right-...


> Markets are a funny thing. Both "dead" languages are thread safe and can easily run separate interpreters per thread.

Perl threading is officially recommended against IIRC? In either case, "threads" in either of them don't share memory (except explicitly and manually), at which point what you have is multiprocessing by a different name.


In Perl 5 threading things are only "shared" semantically. In reality, there is an extra interpreter running where the "shared" variables live, and any value fetches and stores are handled with the `tie` interface (see http://www.perlmonks.org/?node_id=288022 for more information).

In Perl 6 on the other hand, everything is always shared and you can use high level constructs such as atomic increments, supplies, taps, `react` and `whenever` so you don't have to think about any (dead)locking issues as a developer.


Tcl has an "easy" threading mode where each thread runs its own interpreter, and a "hold my beer" mode where you can spawn threads within a single interpreter.

Tcl also has synthetic channels, so you can stay in easy mode but open a bi-directional read/write channel with the two ends in separate threads, with readable/writable events assigned, so you can do automatic event-driven information sharing between threads.

I don't know any other language that gives you options like that.


I do have to wonder if Tcl would have gained significantly greater mindshare if the syntax had been more Algol-like.


Quite a few sysadmins around here have a different point of view regarding Perl.


Dream on.


The linked post is about Python startup being a problem with thousands of invocations. Is Python startup really a problem for the niches you mention, or is it a problem in some extreme edge cases? I would argue this is the latter and perhaps signals that an architecture change for the build or tests would be best.

I have been using Python for small scripts for 20+ years and haven't had this issue. The JVM on the other hand was historically slow to start.


If you need to run thousands of scripts, do you need to (re-)start Python for each script? IMHO what needs to be done for this problem is not faster startup, but a way to avoid startup by implementing a feature where you can keep a single Python "machine" in memory that can make a "soft reset" to execute a fresh script.
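
A rough sketch of that idea on POSIX systems: a long-lived parent imports the expensive modules once, then calls `os.fork()` per script, so each script starts from the warm parent's state and anything it changes is discarded when the child exits. `WARM_MODULES`, `preload` and `run_in_fork` are hypothetical names, and `os.fork` is not available on Windows.

```python
import os
import runpy
import sys
import traceback

# Hypothetical list of modules worth importing once in the parent.
WARM_MODULES = ["json", "decimal"]


def preload():
    for name in WARM_MODULES:
        __import__(name)


def run_in_fork(path):
    """Run one script in a forked child and return its exit status."""
    sys.stdout.flush()  # avoid the child re-flushing the parent's buffers
    sys.stderr.flush()
    pid = os.fork()
    if pid == 0:  # child: starts with the parent's already-imported modules
        code = 0
        try:
            runpy.run_path(path, run_name="__main__")
        except SystemExit as exc:
            if isinstance(exc.code, int):
                code = exc.code
            elif exc.code is not None:
                code = 1
        except BaseException:
            traceback.print_exc()
            code = 1
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(code)  # skip the parent's atexit handlers and cleanup
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    preload()
    for script in sys.argv[1:]:
        print(script, run_in_fork(script))
```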


Yep. Even PHP solved this :)

That said PHP startup (parsing, because there's no on disk bytecode cache like .pyc - though there's an in-memory one [OPcache], as somewhat expected from a server thingie) was always quite fast, and it got a bit faster in php7: https://wiki.php.net/rfc/abstract_syntax_tree#impact_on_perf...


Yep. Tried to use a Raspberry Pi as my main system for a while and one of the pain points was slooooow startup of Python. As a Python fan I was embarrassed.


I don't particularly agree about this being what "Python is supposed to be such a great fit for."

I've been to quite a few PyCons and never heard anyone espousing this view, but I'm open to the possibility that I have missed it. Can you link me to a piece of media that you think persuasively makes the case that this is what Python is supposed to be for?


Python is not optimized for small glue code at all. The fact that it is the sanest language for use in that niche speaks much more about the ecosystem than about Python.

Python seems to be mainly optimized for web servers, scientific computing and machine learning tasks. None of those care about startup time.


Python is really only the target for those because someone lied to all of the systems folk and told them that Ruby was too slow. (The previous wave of infrastructure management tools seemed to all be written in Ruby and nowadays it's Python or Go.) That and python is one of the "official" languages at Google and everyone wants to be Google, right?

Meanwhile, Ruby is making great strides in performance and even has JIT coming in 2.6.


Not sure why you're saying Python was used as an alternative to Ruby when Python is older than Ruby.


Its popularity, and especially its popularity for the mentioned use case, is not older than Ruby's, for those who know their history.


I think this is why Mercurial is switching (largely) to Rust: https://www.mercurial-scm.org/wiki/OxidationPlan


I totally understand that milliseconds matter in the use case described in the article.

For me, personally, I use python to automate tasks - or to quickly parse through loads and loads of data. To me, startup speed is somewhat irrelevant.

I built a micro-framework that is completely unorthodox in nature, but very effective for what I needed - that being a suite of tools available from an 'internet' server, available to me (and my coworkers) over port 80 or 443.

My internet server, which runs python on the backend (and uses apache to actually serve the GET / POST) literally spits out pages in 0.012 seconds. Some of the 'tools' run processes on the system, reach out to other resources, and spit the results out in under 0.03 seconds (much of that being network / internet RTT). To me, that's good enough - adding 30 or even 300 milliseconds to any of that just wouldn't matter.

I totally get that if Python wants to be a big (read bigger?) player then startup time matters more...but for my personal use cases, I'm not concerned with the current startup time one bit.


As expected, language start up time only matters to some people. Often in my case, Python is used to build command line tools (similar to the case of Mercurial).

In such an event, the start-up time of the program might dominate the total run time of the application. And on my laptop or desktop with a fast SSD with good caching and a reasonably fast CPU... that still ends up being 'okay'.

But once I put that on an ARM chip with a mediocre hard drive - some python scripts spend so long initializing that they are practically unusable. Whereas the comparable Perl/BASH script runs almost instantaneously.

Often, to make Python even practically usable on such systems, I have to implement my own lazily loaded module system. It would help to have a language feature that allowed me to say...

    import(eager) some_module
    import(lazy) another_module

which would trigger the import process only when that module becomes necessary (if ever).


Have you tried moving import statements into the functions where they are invoked? My understanding is this is effectively the same as lazy loading the module[1].

[1] https://stackoverflow.com/questions/3095071/in-python-what-h...
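
A minimal illustration of the function-local import trick: the module is imported the first time the function runs, and later calls just find it already cached in `sys.modules`. `decimal` and `exact_total` are arbitrary examples; the point is that the cost moves from program startup to first use.

```python
def exact_total(values):
    # Deferred import: paid on the first call, not at program startup.
    from decimal import Decimal
    return sum(Decimal(v) for v in values)


if __name__ == "__main__":
    print(exact_total(["0.1", "0.2"]))
```

The maintenance cost mentioned below is real: imports are scattered across functions, and an `ImportError` surfaces only when the function is first called.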


I have, and it actually works well (performance wise). The maintenance burden is a little higher.


A little Python preprocessor that lets you annotate your lazy modules sounds like a fun little toy project, actually. Not something I'd use for real, but it would be fun to build.


3.7 makes it easier to use dynamic imports. https://snarky.ca/lazy-importing-in-python-3-7/


Python is moving to have a lazy loader as part of the standard library. I mean, it's there already, at https://docs.python.org/3/library/importlib.html#importlib.u... , but not clearly easy to use, and with a big warning label against using it.

The issue at https://bugs.python.org/issue32192 says the plan is to start with an easier to use system as a PyPI package.
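
For reference, the `importlib.util.LazyLoader` recipe from the importlib documentation looks roughly like this: the module object is created immediately, but its code only executes on first attribute access. It carries the caveats behind that warning label (for example, errors in the module surface at first use, not at import time). `lazy_import` is the recipe's helper name, not a standard-library function, and the `sys.modules` guard is an addition of this sketch.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code runs only on first attribute access."""
    if name in sys.modules:  # already loaded: nothing to defer
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # installs the lazy hook; body not run yet
    return module


if __name__ == "__main__":
    colorsys = lazy_import("colorsys")         # cheap: module body not executed
    print(colorsys.hsv_to_rgb(0.0, 0.0, 1.0))  # first access runs colorsys
```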


I've also written a little asynchronous module loading system. Why not load a module we know we're going to need in the background?
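
A sketch of background module loading with a plain daemon thread. Because of the GIL, this only helps when the main thread is busy with something that releases it (I/O, waiting on the user). As far as I know, a later `import` of a module that is still being imported in the background waits on CPython's per-module import lock, so it is safe but not free. `prefetch` is a hypothetical helper.

```python
import importlib
import threading


def prefetch(*names):
    """Start importing modules in a daemon thread; return the thread."""
    def load():
        for name in names:
            try:
                importlib.import_module(name)
            except ImportError:
                pass  # the real import later will raise properly
    t = threading.Thread(target=load, daemon=True)
    t.start()
    return t


if __name__ == "__main__":
    prefetch("json", "decimal")
    # ... parse arguments, print a prompt, etc. ...
    import decimal  # waits for the background import if it is still running
    print(decimal.Decimal("1.5") * 2)
```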


I think you're telling us about how you're not affected by a problem that does affect other people. I feel like this doesn't add any substantial, interesting points to this discussion.


I have similar use cases. Startup time starts to matter once you either want to build test cases or put scripts in loops. If I have a script that parses one big data file, and I decide to parse 1000, it's often helpful if I can run that script a thousand times rather than refactor it to handle file lists. Or if you want to optimize some parameter.


> To me, startup speed is somewhat irrelevant.

But isn’t that the author’s point? It doesn’t seem like much time but because you’re paying it so often in so many little places it really does add up.


Sort of related story: we needed a scripting language that could run on an x86 RTOS-type architecture compiled with MSVC, and looked into CPython because, well, Python is after all quite a nice language. After spending a considerable amount of time getting it to compile (sorry, I don't recall all the issues, but the main one was that the source code assumed MSVC == Windows, which I know is true in 99% of cases, but I didn't expect a huge project like CPython to trip over it), it would segfault at startup.

During step-by-step debugging it was astonishing how much code got executed before doing any actual interpreting or REPL work. I get that there might be no way around some initialization, but it simply looked like too much to me, and perhaps not overly clean either. Moreover, it included a bunch of registry access (again, because it saw MSVC being used), which the RTOS didn't fully support, hence the segfault.

Anyway, we looked further and thankfully found MicroPython, which took less time to port than we had spent just getting CPython to compile. While it's not a complete Python implementation, it does the job for us, and its startup/init code is something like 100 LOC (including argument parsing etc.). Yes, I know it's not a fair comparison, but the difference is big enough to suggest, at least to me, that CPython might be doing too much at startup, possibly spending time on features that few users need, or dragging along some old cruft. Not sure, just guessing.



Context?


Mercurial's startup time is the reason why, for fish, I've implemented code to figure out if something might be a hg repo myself.

Just calling `hg root` takes 200ms with hot cache. The equivalent code in fish-script takes about 3ms, which enables us to turn on hg integration in the prompt by default.

The equivalent `git rev-parse` call takes about 8ms.
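
The cheap check amounts to walking up the directory tree looking for a `.hg` directory, which avoids starting Python and loading Mercurial just to answer "am I in a repo?". A Python sketch of that logic (ironic given the topic, but it shows how few `stat` calls are involved); this is not fish's actual code, `find_hg_root` is a hypothetical name, and it ignores edge cases such as shared repositories.

```python
import os


def find_hg_root(start="."):
    """Return the nearest ancestor directory containing `.hg`, or None."""
    path = os.path.abspath(start)
    while True:
        if os.path.isdir(os.path.join(path, ".hg")):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        path = parent


if __name__ == "__main__":
    print(find_hg_root() or "not in a Mercurial repository")
```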


Wow, that's quite a difference.

But 8ms is still too slow for me. :) I implemented the Git recognition code myself in my own prompt using the minimal amount of FS operations [1], and it renders in 5 ms from start to finish, including a "git:branch-name/47d72fe825" display.

[1] https://github.com/majewsky/gofu/blob/master/pkg/prompt/git....
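
The core of such a prompt can be a couple of file reads: find the `.git` directory, read `HEAD`, and either take the branch name from a `ref: refs/heads/...` line or fall back to the detached commit hash. A minimal Python sketch of that idea, not a port of the linked Go code; it assumes a plain `.git` directory (no worktree `.git` files, no `GIT_DIR`), and `git_head` is a hypothetical name.

```python
import os


def git_head(start="."):
    """Return (branch, None), (None, commit), or None outside a repository."""
    path = os.path.abspath(start)
    while True:
        head = os.path.join(path, ".git", "HEAD")
        if os.path.isfile(head):
            break
        parent = os.path.dirname(path)
        if parent == path:
            return None  # not inside a Git work tree
        path = parent
    with open(head) as f:
        content = f.read().strip()
    prefix = "ref: refs/heads/"
    if content.startswith(prefix):
        return content[len(prefix):], None
    return None, content  # detached HEAD: the file holds a commit hash


if __name__ == "__main__":
    print(git_head())
```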


(I work on Git in my copious free time)

One of the reasons git-rev-parse takes slightly longer than your implementation is that you just unconditionally truncate the SHA-1 to 10 characters. E.g. run this on linux.git:

    git log --oneline --abbrev=10 --pretty=format:%h |
    grep -E -v '^.{10}$' |
    perl -pe 's/^(.{10}).*/$1/'

You'll get 4 SHA-1s that are ambiguous at 10 characters; this problem gets a lot worse on bigger repositories.

Which is not to say that there isn't a lot of room for improvement. Scope creep in initialization time is one of the things that tends to get worse over time without being noticed, but Git, unlike (apparently) Python, makes heavy use of re-invoking itself as part of its own test suite (tens of thousands of times), so it's naturally kept in check somewhat.

If you have this use-case I'd encourage you to start a thread on the Git mailing list about it.
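
Why a fixed cut is not enough: an abbreviation has to be unique among all objects, so it must grow when two hashes share a long prefix. A small sketch of that idea, computing for each hash the shortest prefix of at least `min_len` characters that no other hash in the set shares. It sorts the hashes and compares neighbours (in sorted order the longest shared prefix is always with an adjacent entry); this is an illustration, not Git's actual implementation, and `unique_abbrevs` is a hypothetical name.

```python
def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def unique_abbrevs(hashes, min_len=7):
    """Map each hash to its shortest unambiguous prefix of >= min_len chars."""
    ordered = sorted(set(hashes))
    result = {}
    for i, h in enumerate(ordered):
        # In sorted order, the longest shared prefix is with a neighbour.
        shared = 0
        if i > 0:
            shared = max(shared, common_prefix_len(h, ordered[i - 1]))
        if i + 1 < len(ordered):
            shared = max(shared, common_prefix_len(h, ordered[i + 1]))
        result[h] = h[: min(len(h), max(min_len, shared + 1))]
    return result


if __name__ == "__main__":
    print(unique_abbrevs(["abcdef1234", "abcdef9999", "1234567890"], min_len=4))
```

With `min_len=10`, any hash sharing 10 or more leading characters with another comes back longer than 10 characters, which is exactly the case an unconditional 10-character cut gets wrong.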


I put similar code in Emacs's vc-hg to get revision information straight from Mercurial's on-disk data structures instead of firing up an hg subprocess.


You mean actually reading dirstate[0] or just the branch/bookmark files?

We also do the latter, but dirstate format isn't easily readable just with shell builtins (lots of fixed-length fields with NUL-byte padding, also we don't even have a `stat` builtin and the external program isn't a thing on macOS AFAIK), so we still fire up `hg status` for that - but only after we decide that there is a hg repo.

[0]: https://www.mercurial-scm.org/wiki/DirState



Somewhat tangentially, I noticed that fish performs quite badly in remote-mounted (sshfs) directories that are git repositories. I wonder if it would be possible to detect a remote-mounted filesystem and turn off or tone down some of the round-trip-heavy operations?


I've gone through your problem myself countless times, and concluded that hitting ctrl+c to interrupt the status line every time it tries to render the current repository state is not very productive.

My git status line uses timelimit (https://devel.ringlet.net/sysutils/timelimit/) to automatically stop if any of the git status parts (dirty/staged/new files) take > 0.1 seconds to finish:

https://github.com/justuswilhelm/pufferfish/blob/master/fish...
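
The same time-boxing can be done from Python with `subprocess.run(..., timeout=...)`, which kills the child and raises `subprocess.TimeoutExpired` once the limit passes. A minimal sketch; `quick_output` is a hypothetical helper, and the 0.1 s default just mirrors the budget mentioned above.

```python
import subprocess


def quick_output(cmd, budget=0.1, cwd=None):
    """Return the command's stdout, or None if it fails or exceeds `budget` seconds."""
    try:
        proc = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, timeout=budget
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return proc.stdout if proc.returncode == 0 else None


if __name__ == "__main__":
    # None outside a Git work tree, if git is missing, or if it is too slow.
    print(quick_output(["git", "status", "--porcelain"]))
```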


I implemented something similar for Xonsh.


Ironically, xonsh itself suffers from a long startup time due to its use of Python. This is my primary (negative) experience with the issue in the linked article, and the reason why I stopped using xonsh.


This is truly a problem. Even more so if you host your application on a network directory. Loading all the small files takes ages. I really wish there would be a good way to compile the whole application with all the modules into one package once you're ready to release. I really wish the creators of Python would have given such use-cases more consideration.

Edit: I'm aware that there are solutions that put everything a program touches into a kind of executable archive. A single file several hundred Megabytes in size. I've tested it. It doesn't really pre-compile the modules. The startup time was exactly the same.
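
For the single-file part, the standard library's `zipapp` module bundles a directory of pure-Python code into one `.pyz` archive that the interpreter runs directly. Consistent with the observation above, this helps distribution and means one file to fetch from a network share, but it does not compile anything to native code or remove import work; it may even be slower if the archive contains only `.py` sources, since bytecode can't be cached back into it. The `myapp` layout and `main()` entry point are assumptions for the example.

```python
import os
import subprocess
import sys
import tempfile
import zipapp


def build_demo(workdir):
    """Create a tiny package and bundle it into workdir/myapp.pyz."""
    src = os.path.join(workdir, "src", "myapp")
    os.makedirs(src)
    with open(os.path.join(src, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(src, "cli.py"), "w") as f:
        f.write("def main():\n    print('hello from the archive')\n")
    target = os.path.join(workdir, "myapp.pyz")
    # `main` names the entry point as "package.module:function".
    zipapp.create_archive(
        os.path.join(workdir, "src"),
        target=target,
        interpreter="/usr/bin/env python3",
        main="myapp.cli:main",
    )
    return target


if __name__ == "__main__":
    pyz = build_demo(tempfile.mkdtemp())
    out = subprocess.run([sys.executable, pyz], capture_output=True, text=True)
    print(out.stdout.strip())
```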


Nuitka (http://nuitka.net/) already does that and much more:

- it compiles your program and makes it standalone, so you can distribute just the exe

- it makes it start faster

- it makes it run faster

- it's fully independent of the system Python. Actually, your system doesn't even need Python at all

I don't get why it's not used; it's very robust, compatible with 3.6, and on some of my scripts I get about a 4x speedup on startup alone.


This is different from the package that I've tested (PyInstaller or py2exe).

Is Nuitka compatible with numpy, pickle, etc.? I remember that numpy was very problematic with compilers like PyPy for a long time.


In my experience it's easier and more reliable to use than PyInstaller or py2exe, and it's cross-platform (but no cross-compilation). It doesn't pack Python files with an executable. It translates the Python code to C, then compiles it.

Nuitka supports numpy officially; you can even see it in the changelogs: http://nuitka.net/posts/nuitka-release-0521.html

I haven't tried pickle.


>This is different from the package that I've tested (PyInstaller or py2exe).

In an ancestor comment you say:

> A single file several hundred Megabytes in size.

Are both points referring to PyInstaller? Asking because I've tried out PyInstaller with small CLI as well as GUI (wxPython) programs, and the resulting EXEs did not reach near that size, IIRC.


It was around 150..300 MB if I remember right. Admittedly, I do import a lot of modules. Program startup for me is in the seconds, not milliseconds. I maybe could cut it down 50% or even 80% but then that would cost a few weeks or a few months. Having a quick and robust solution to be implemented in a single week to cut down startup times by that amount would be highly preferable.


Interesting. I see what you mean. That tip that someone else mentioned, IIRC, in this thread, to import modules inside functions that use them, could cut down startup time, but only if those functions are not called at startup, only later during the program's run. But it would not change the EXE size, I guess.

I wish Python and all other interpreted languages came with a way to build EXEs from the start. It would be great for deployment.


First time I hear about this, and I've looked for alternatives to cxfreeze and its cousins in the past.

Any time I see something like this, I feel like I'm hearing about some homeopathic cancer cure. If Nuitka actually does what it says it does, it's solving a big recurrent problem for the Python community, so why is nobody talking about it?


Having followed Nuitka since it started, I can offer my perspective:

- Before Nuitka, someone already did a "Python->C" compiler along similar lines: translate the Python source code into the C calls that the interpreter will make, eliminating any interpreter overhead and providing C compiler optimization opportunities. That thing sort-of-worked (with v1.5.2 IIRC), but was cumbersome to use and delivered a meager 5% performance improvement for the cases it did support; it was abandoned.

- Nuitka's plan had the same thing as a starting phase; people told the Nuitka guy that he's wasting his time based on prior experience. When he actually delivered a mostly-robust working version (much more usable than the previous attempts ever were), it indeed delivered only a small performance gain compared to CPython.

- As a result, it seemed like the community believed both that the whole thing is futile, and that the developer is fighting windmills.

- a lot of time passes, Nuitka keeps improving with better analysis, translation, compilation, etc - but the community has already cemented its opinion.

- Nuitka remains a useful magic system known to few.

I would say that the early Nuitka versions (and the prior attempt) gave it a SEP field that has never been lifted, and short of e.g. DropBox or Facebook adopting it, nothing will lift it either.


I think Nuitka is an answer looking for a problem. Usually, if people really need more low-level performance, they move away from Python, or they implement whatever system they need in Python itself to solve the problem Nuitka might solve for them.

That said, it's a wonderful project, I hope more and more people will find it useful for them.


Quite the opposite: distributing a Python binary is one of the most popular demands in the community, along with better multi-core support and a JIT.

It used to be packaging and V2/V3, but those are fading away now with wheels everywhere and 3.4+ being the new love affair. Python has been in the habit of improving every year, steadily for 28 years, solving the problems the community asked every time.


How is that the opposite? :)

It just means that there are a lot of folks who stay on Python, but want better deployment. That's great, but we don't see those who simply move away from Python (and use Electron with a Rust backend for example, or go full web + maybe native Android/iOS apps).


That's the question I'm asking.

Not only is it a beautiful tool, but the author has been quietly and steadily working on it for 8 years. Compatibility is the number one goal.

The guy has a lot of rigor and humility, so maybe communication suffered?


Let's give it some visibility then. https://news.ycombinator.com/item?id=16980704


hg allows loading modules at runtime. Maybe that's a problem.


I doubt it, since you can manually pass a list of all the modules you want Nuitka to embed with --recurse-plugins=MODULE/PACKAGE


But the problem is that you can't know the list you need at embed time for hg, because extensions are arbitrary python files discovered at runtime.
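
Roughly why: an extension can be any `.py` file whose path only appears in a config file at runtime, loaded along the lines below with `importlib.util.spec_from_file_location`. Nothing about that path exists when a compiler like Nuitka analyses the program. This is a generic sketch, not Mercurial's actual extension loader; `load_extension` and the `ext_` naming scheme are hypothetical.

```python
import importlib.util
import os
import sys


def load_extension(path):
    """Import a Python file from an arbitrary path known only at runtime."""
    name = "ext_" + os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load extension from {path!r}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


if __name__ == "__main__":
    # e.g. paths read from a user's config file
    for p in sys.argv[1:]:
        print(load_extension(p).__name__)
```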


I collect ideas, especially weird and powerful ideas.

I've learned not to try to talk about it because of that question: "If foo is so great, why isn't everybody using it?"

It's one of the single greatest frustrations of my life. I don't know. I've never made any progress on it. The best you can say is, "Well, that seems to be human nature." The world is full of "magic beans" and most people seem interested in banging their heads against the wall.

(Did you know, you can make a 140hp engine that fits in the volume of two stacked pizza boxes and has only one moving part?)

Anyhow, Nuitka is great, it does do all that.

And the creator is a freakin' saint for putting up with the way he's been treated by the Python community, is my opinion.


Also, you should write a blog about those ideas. Or a place where we can share them, but in a non tin foil way / silver bullet way.

E.g.: after 7 years of having malaria, a tropical disease doctor explained to me that we have been able to cure malaria for years. Generalist doctors usually don't know it because they don't encounter the disease often enough to keep up to date. It's kind of a hard pill to swallow given that I always thought you had it for life.


I'd love to, but from experience I can tell you, you just get skeptics and crackpots and scammers and suckers crawling out of the woodwork and gumming up the scientific/inventive process. That, combined with the apathy of the general public, means that it's just hard to get a real conversation going in a "non tin foil way / silver bullet way".

It also means that a lot of great ideas go nowhere, or take 50-100 years to get adopted. Your experience with malaria is an example. My condolences btw, that sounds terrible.

In any event, there's Rex Research: http://rexresearch.com/1index.htm (IGNORE HOW IT LOOKS!!!) This fellow has been collecting inventions and other weird stuff for decades, since before the internet. He used to run little ads in the back of Popular Science and others like that. Yes, the site looks a little... creative, and much of the stuff he lists is just crackpottery, but the stuff that isn't is mind-blowing.

Just one example, one of my favorite devices: the Hilsch-Ranque Vortex Tube. It's a "Maxwell's Demon" (although it does not violate Thermodynamics. Of course.)

> The vortex tube, also known as the Ranque-Hilsch vortex tube, is a mechanical device that separates a compressed gas into hot and cold streams. It has no moving parts.

http://rexresearch.com/ranque/ranque.htm

You can actually buy these to go on the end of a compressor and provide "spot cold" for cooling off whatever. I emailed a company that sells them once to ask what would happen if you set it up in a feedback loop so that the cold output was chilling the input line, but they were uninterested.

Oh hey, time marches on and it now has a wikipedia page! https://en.wikipedia.org/wiki/Vortex_tube

Anyhow, like I said, I'd like to blog about this stuff, but most of the people who would be into it would be either haters or credulous fools. Speaking of which, YouTube has lots of videos of people talking about and sometimes even demonstrating things. But again, you have to wade through all kinds of bullshit and scammers and hoaxers and skeptics and credulous fools to actually find the handful of people who take this stuff seriously but can maintain a proper detached scientific attitude to actually investigate it. YMMV


Saint is the word, given he has been working really seriously at it for 8 years, alone, never complaining, and giving away everything without any recognition.


Bingo.


It's a function of how many people really need that solution. It's high enough that it exists. It's low enough that it's not a thing you do by default or even talk about much. But if you need it and it's compatible with your code - it's there.


Actually, any sysadmin or scripter wannabe could benefit from Nuitka. Twitter even invented pex because of that. Nuitka took some time to become 100% compatible with the latest versions of CPython, but now it is, so let's all enjoy it.


I like Nuitka and what it's doing, but it's just not that comfortable in every single situation. There's value in seeing the source and being able to modify it in place without a compile-and-redeploy step. There's value in DTrace support, which as far as I can tell Nuitka doesn't have. And other small things, like existing profilers that work specifically with CPython stack traces.

So if there is a reason to use it - great. But that's not every situation.


Can't you use CPython for development and Nuitka for deployment?


Does nuitka build a static executable or do you still need to supply shared libraries with the executable?


It builds a static executable if you pass it the --standalone option.


No, it doesn't.

  $ echo 'print "Hello world"' > hello.py
  $ nuitka --standalone hello.py 
  $ ldd hello.dist/hello.exe
  	linux-gate.so.1 (0xf7f94000)
  	libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7895000)
  	libpython2.7.so.1.0 => /home/jwilk/hello.dist/libpython2.7.so.1.0 (0xf7508000)
  	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf732f000)
  	/lib/ld-linux.so.2 (0xf7f96000)
  	libz.so.1 => /home/jwilk/hello.dist/libz.so.1 (0xf7310000)
  	libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf72f1000)
  	libutil.so.1 => /home/jwilk/hello.dist/libutil.so.1 (0xf72ed000)
  	libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf71eb000)


OK, it builds a static executable against the Python runtime and extensions, including the C ones. It doesn't do that for libc, libutil, etc. But aren't those almost always installed?

If it's a real problem on your machines, it's PR time !


Nuitka has not been able to compile any Python code I've written myself. It's not used because it's incredibly limited.


When was the last time you've tried it?


Is it a perl2exe descendant, packing the interpreter into an executable wrapper?


No, it compiles Python to C, then compiles the C.


I know this isn’t everyone’s favorite, but Cython has a way to convert your Python code into an executable with Python embedded, and I believe it also packs your imports.

Cython is a complicated beast but I feel like it just needs a more friendly wrapper for this to be more widespread.

https://stackoverflow.com/questions/22507592/making-an-execu...

https://github.com/cython/cython/wiki/EmbeddingCython

Why Cython isn’t in the stdlib (I think it could easily replace ctypes) is beyond me sometimes


I worked on one Python application that had a startup time problem because it was on a network filesystem with slow metadata/stat times. It took several seconds to start Python.

We were able to solve most of the problem by zipping up the Python standard library and our application.

That is, if you look at sys.path you'll see something like:

  >>> sys.path
  ['', '/usr/local/lib/python36.zip', '/usr/local/lib/python3.6', ...]
If you zip up the python3.6 directory into python36.zip then it will use that zip file as the source of the standard library, and use the zip directory structure instead of a bunch of stat calls to find the data.

The zip file should also include the pre-compiled bytecode, so that it can be loaded from the archive too.

You can also have Python byte-compile all of the .py files in a directory as part of your build/zip process.

  python -m compileall --help
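A minimal sketch of the zip-plus-bytecode approach, using the stdlib `zipfile.PyZipFile`, whose `writepy()` byte-compiles each module and stores the compiled file in the archive. The function name and the `src/myapp` / `myapp.zip` paths are hypothetical; how much this helps depends on how expensive your filesystem's metadata calls are.

```python
import zipfile


def build_bytecode_zip(package_dir, zip_path):
    """Pack a package directory into a zip archive of compiled modules.

    writepy() compiles each .py file and stores the .pyc in the archive,
    so the interpreter reads one zip directory instead of stat()ing many
    files, and does not need to compile anything at startup.
    """
    with zipfile.PyZipFile(zip_path, mode="w") as zf:
        zf.writepy(package_dir)  # a directory with __init__.py is added as a package
        return zf.namelist()


# Usage (hypothetical paths): put the archive first on sys.path, then import.
#   build_bytecode_zip("src/myapp", "myapp.zip")
#   sys.path.insert(0, "myapp.zip")
#   import myapp
```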


Don't forget

  find . -type f -name "*.py" -delete
right afterwards.

Also note that calls to imp.load_source need to change to imp.load_compiled, and any .py files referenced directly in code need to be changed to .pyc (this is with 2.7; not sure about 3.x).
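On Python 3 the delete step needs one extra flag: by default `compileall` writes bytecode into `__pycache__/`, and those cached files are ignored once the matching `.py` is gone. Only a `.pyc` in the legacy location (next to where the source was) is imported without its source. A minimal sketch using `compileall.compile_dir(..., legacy=True)`, the equivalent of `python -m compileall -b`; the function name is hypothetical.

```python
import compileall
from pathlib import Path


def compile_and_strip(root):
    """Byte-compile every .py under root, then delete the sources.

    legacy=True writes foo.pyc next to foo.py instead of into __pycache__/.
    Python 3 only imports a sourceless .pyc from that legacy location, so
    without it, deleting the .py files would make the modules unimportable.
    """
    root = Path(root)
    if not compileall.compile_dir(str(root), legacy=True, quiet=1):
        raise RuntimeError("byte-compilation failed; sources left in place")
    removed = []
    for src in root.rglob("*.py"):  # "*.py" does not match .pyc files
        src.unlink()
        removed.append(src)
    return removed
```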


Thanks, I will try that!


cool!


I think design choices made in Python simply don't allow for comprehensive ahead of time compilation. For what it's worth, they have recently landed snapshots in Dart that do what you want:

https://github.com/dart-lang/sdk/wiki/Snapshots

It's what Flutter uses on iOS since you can't run JITed code; AOT compile it and load it as just another shared library.



Also, it’s not statically linked, so you need to make sure all of the shared libraries exist on the host, which means installing a whole bunch of trash.


Here's what has worked for me:

1. Don't do that. Either write the driving app in Python or write the subprocesses in an ahead-of-time compiled language. Python's a great language but it's not the right tool for everything.

2. Be parsimonious with the modules you import. During development, measure the performance after adding new imports. E.g., one graph library I tried had its many graph algorithm implementations separated into modules, and it loaded every single one of them even if all you wanted to do was create a data structure and do some simple operations on it. We just wrote our own minimal class.
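One way to do the "measure after adding new imports" step, as a minimal sketch: time the first import of a module in a fresh process. The helper name is hypothetical. For a per-module breakdown, CPython 3.7+ also has `python -X importtime -c "import somemodule"`.

```python
import importlib
import sys
import time


def import_cost(module_name):
    """Seconds spent on the first import of module_name in this process.

    The figure includes any modules it imports transitively. Only a first
    import is meaningful: once a module is in sys.modules, importing it
    again is just a dictionary lookup.
    """
    if module_name in sys.modules:
        raise ValueError(f"{module_name!r} is already imported")
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"{name}: {import_cost(name) * 1000:.1f} ms")
```

Order matters when measuring several modules in one process: a dependency shared by two of them is charged only to whichever is imported first.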


> Don't do that. Either write the driving app in Python

Even if you write the driver in Python, you don't necessarily want to call the program you're testing in the same process. You might want independent launches of a command-line tool, so that you test the same behavior people get when they run the tool. Otherwise, your test suite might trip over some internal state that gets preserved from run to run in ways that command-line invocation wouldn't.


Good point, but I didn't mean to sound specific to testing apps. I just meant, in general, write big apps using Python top-down and something precompiled if you must spawn lots of external processes.


I've definitely seen significant improvements with #2. Unfortunately, it's not very Pythonic to tuck your imports into functions (or under conditionals). It would be nice if imports were more lazily evaluated.
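The stdlib does offer a building block for lazier imports: `importlib.util.LazyLoader`. The sketch below is adapted from the recipe in the importlib documentation; the function name is only illustrative. Note that finding the module (the filesystem lookup) still happens eagerly, and only executing the module body is deferred until the first attribute access.

```python
import importlib.util
import sys


def lazy_import(name):
    """Import `name` lazily: its module body runs on first attribute access."""
    spec = importlib.util.find_spec(name)  # the path search still happens now
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers execution instead of running it
    return module
```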


The slow startup, combined with the Python ecosystem's general lack of interest in finding a solution for distributing self-contained applications, was the biggest reason we ended up writing our CLI tool in something else, even though we are a Python shop.

I'm really curious why there hasn't been much of a desire to change this and it even got worse as time progressed which is odd.


One "simple" thing that could be done is to make it easier to build python statically, and improve the freezing toolchain.

When I used to care about the problem, I looked into it (https://cournape.wordpress.com/2015/09/28/distributing-pytho...) and got somewhere. It improves startup times somewhat, and allows distribution of a single binary.

Lots of libraries are terribly slow to import (e.g. requests), but right now there is little incentive to fix it as there is no toolchain to build good, self-contained python CLI apps.


I've written a whole bunch of CLI tools over the years and maintained some I didn't author originally, I always found it annoying how slow these are. A CLI tool for some larger project can easily take a second just to display --help, or a command line parse error. Tests running against it can be made to progress faster (no forking, no separate interpreter, in-process capture etc. which brings a lot of complexity and subtle differences and error persistence/"tainting" of the execution environment), but still you might only get a few CLI invocations per second per core.

These experiences are a major turn-off from Python for me.


Indeed this is a long-standing issue with Python.

LWN gave some excellent coverage late last year, in this piece:

https://lwn.net/Articles/730915/


Sure there has been desire to change this. It's a hard problem, and there are tradeoffs.


It’s only a hard problem if there is no desire. The slowdowns for the interpreter startup did not happen because they are necessary but because site.py and friends just do more stuff now and a lot of important internal tooling became unnecessarily complex.


Yeah, this whole "importlib in Python" thing continues to mystify me.


It would be okay if Python supported a faster, low-level subset of Python within itself, RPython for example.


> it even got worse as time progressed which is odd.

Quite the contrary: as I stated in my other comment, we now have Nuitka.


The parent means it got worse in official Python releases.

Not what some fringe tool can or cannot do.


This is disappointing to me too, but I think there are some problems baked in to the language that make it hard.

- Imports can't be parsed statically.

- Startup time has two major components: crawling the file system for imports, and running all the init() functions of every module, which happens before you get to main(). The first is only fixable through breaking changes, and the second is hard to fix without drastically changing the language.

The import code in CPython was a mess, which was apparently cleaned up by importlib in Python 3, through tremendous effort. But unfortunately I think importlib made things slower?

I recall a PyCon talk saying that as of 3.6, essentially everything about Python 3 is now faster than Python 2, EXCEPT startup time!

This is a shame, because I would have switched to Python 3 for startup time ALONE. (As of now, most of my code and that of my former employer is Python 2.) That would have been the perfect time to address startup time, because getting a 2x-10x improvement (which is what's needed) requires breaking changes.

I don't think there's a lack of interest in the broader Python community, but there might be a lack of interest/manpower in the core team, which leads to the situation wonderfully summarized in the recent xkcd:

https://xkcd.com/1987/

FWIW I was the one who sent a patch to let Python run a .zip file back in 2007 or so, for Python 2.6 I think. This was roughly based on what we did at Google for self-contained applications. A core team member did a cleaner version of my patch, although this meant it was undocumented until Python 3.5 or so:

https://docs.python.org/3/library/zipapp.html
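For reference, the `zipapp` module can also build such an archive programmatically. This is a minimal sketch: the source directory and the `hello_app:main` entry point are hypothetical.

```python
import subprocess
import sys
import zipapp


def build_pyz(source_dir, target, entry_point):
    """Bundle every file under source_dir into one runnable .pyz archive.

    entry_point uses zipapp's "module:function" form; zipapp writes a
    __main__.py that imports the module and calls the function.
    """
    zipapp.create_archive(source_dir, target=target, main=entry_point)


def run_pyz(target):
    """Run the archive with the current interpreter and return its stdout."""
    result = subprocess.run([sys.executable, str(target)],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

The command-line equivalent is `python -m zipapp src -m "hello_app:main" -o hello.pyz`.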

The .zip support at runtime was a start, but it's really the tooling that's a problem. And it's really the language that inhibits tooling.

Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.

In other words, I have wondered about this "failure" for over a decade myself, and even tried to do something about it. I think the problem is that there are multiple parts to the solution, and the responsibility for these parts is distributed. I hate to throw everything on the core team, but module systems and packaging are definitely a case where "distributed innovation" doesn't work. There has to be a central team setting standards that everyone else follows.

Also, it's not a trivial problem. Go is a static language and is doing better in this regard, but still people complain about packaging. (vgo is coming out after nearly a decade, etc.)

I should also add that while I think Python packaging is in the category of "barely works", I would say the same is true of Debian. And Debian is arguably the most popular Linux package manager. They're cases of "failure by success".


> The import code in CPython was a mess, which was apparently cleaned up by importlib in Python 3, through tremendous effort. But unfortunately I think importlib made things slower?

AFAIK importlib is entirely written in Python and kinda portable across Python implementations, while previously most was C code. It's not surprising something gets slower when written in Python.

> Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.

PyQt applications on Windows typically take two or more seconds before they can do anything, including Enterprise's favourite start-up pastime, splashscreens. Except maybe if you rolled your own .exe wrapper that displayed the splash before invoking any of the Python loading.

That's really, really poor in the age of 4 GHz CPUs from the factory, RAM big enough to fit multiple copies of all binaries on a PC and SSDs with at the very least tens of thousands of IOPS.


Yeah, the time it takes is really mind-boggling if you think about it. I recently had occasion to run Windows XP in VirtualBox on a fairly underpowered MacBook Air.

It not only installed really fast, but at runtime it was fast and responsive! And so were the apps! Virtualbox recommends 192 MB of RAM for Windows XP, and it works fine. Amazing. Remember when everyone said Windows was slow and bloated?

On the other hand, I tried compiling Python 2.7 on a Raspberry Pi Zero, which is probably around as fast as the machines at the time of XP (maybe a little slower). This was not a fun experience!

Actually, I just looked it up, and the Pi Zero has 512 MB of RAM. So in that respect it has more power. Not sure about the CPU though... I think I ran Windows XP on 300 MHz computers, but I don't remember. The Pi Zero is 700 MHz, but you can't compare clock rates across architectures. I think they're probably similar though.

---

FWIW I think importing is heavily bottlenecked by I/O, in particular stat() of tons of "useless" files. In theory the C to Python change shouldn't have affected it much. But I haven't looked into it more deeply than that.


IIRC the foundation originally compared the RPi's CPU to a Pentium II running at 266 MHz, which seems about right to me.

IME/IMB startup is almost always CPU-bound (to a single CPU thread, of course). Note that the Linux kernel also caches negative dentry lookups, so these "is there something here?" stat()s will stay in the dentry cache.


> splashscreens; exe wrapper

I was convinced that IDEA/Eclipse and other JVM-based things used the .exe launcher just for that (the loading screen). But have never decompiled their .exe to verify it :)


EDIT: I should also add that the length of PYTHONPATH as constructed by many package managers is a huge problem. You're doing O(m*n) stat()s -- random disk access -- which is the slowest thing your computer can do.

m is the number of libraries you're importing, and n is the length of the PYTHONPATH.

So it gets really bad, and it's not just one person's "fault". It's a collusion between the Python interpreter's import logic and how package managers use it.
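To make the O(m*n) concrete, here is a deliberately naive model of a path scan that counts existence checks. The candidate file names are a simplifying assumption, not CPython's real import logic. CPython 3's `FileFinder` caches each directory's listing, so it makes fewer probes than this model, although it still checks each directory's mtime on every lookup.

```python
import os

# Simplified assumption: the file names probed per directory, in order.
CANDIDATES = ("{name}/__init__.py", "{name}.py", "{name}.pyc")


def _probes_for(name, search_path):
    """Existence checks made before `name` is found (or the path runs out)."""
    probes = 0
    for directory in search_path:
        for pattern in CANDIDATES:
            probes += 1
            if os.path.exists(os.path.join(directory, pattern.format(name=name))):
                return probes
    return probes


def count_probes(module_names, search_path):
    """Total probes for m modules over an n-entry path: up to m * n * 3."""
    return sum(_probes_for(name, search_path) for name in module_names)
```

With two modules that both live in the last of four path entries, each one costs 3 * 3 + 2 = 11 probes, 22 in total; a module that is never found costs the full 4 * 3 = 12.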


You've characterized the problems well. And yes this is a core problem for python - startup time and import processing is limiting in a lot more cases than just CLI tools. And yes the design of the language makes it hard or possibly impossible to solve.

> Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.

Exactly. There is no silver bullet. The problem is how much code gets run on startup, and how Python's dynamic nature makes traditional startup speedup strategies impossible. Is this even fixable?


I don't think it's fixable in Python unfortunately. As someone else pointed out, the fact that it got WORSE in Python 3, and not better, is a bad sign. Python 3 was the one chance to fix it -- to introduce breaking changes.

As I mentioned, this problem has bugged me for a long time, since at least 2007. Someone else also mentioned the problem with Clojure, and with JIT compilers in general. I'm interested in Clojure too, but my shell-centric workflow is probably one reason I don't use it.

In 2012 I also had the same problem with R, which starts even more slowly than Python. I wrote a command line wrapper that would keep around persistent R processes and communicate with them. I think I can revive some of my old code and solve this problem -- not in Python, but in the shell! Luckily, I'm working on a shell :)

http://www.oilshell.org/

In other words, the solution I have in mind is single-threaded coprocesses, along with a simple protocol to exchange argv, env, the exit code, and stdout/stderr (think CGI or FastCGI). Coprocesses are basically like servers, but they have a single thread to make porting existing apps easy (i.e. turning most command line apps into multi-threaded servers is nontrivial).

If you're interested I might propose something on Zulip. At the very least I want to dig up that old code.

http://www.oilshell.org/blog/2018/04/26.html

I think it's better to solve this problem in shell than Python/R/Ruby/JVM. There's no way all of them will be fixed, so the cleaner solution is to solve it in one place by introducing coprocesses in the shell. I will try to do it in bash without Oil, but it's possible a few things will be easier in Oil.
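The coprocess idea can be sketched from the Python side as a single-threaded loop that reads one request per line and writes one reply per line. The wire format here (JSON with `argv` in, and `status`/`stdout`/`stderr` out) is a hypothetical stand-in, not Oil's actual protocol. It handles argv, the exit status and captured output, and leaves env out for brevity.

```python
import contextlib
import io
import json


def handle_request(line, commands):
    """Run one request inside this already-warm process; return one reply line."""
    argv = json.loads(line)["argv"]
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            status = commands[argv[0]](argv[1:]) or 0
        except SystemExit as exc:  # batch-style code may call sys.exit()
            status = exc.code if isinstance(exc.code, int) else int(exc.code is not None)
        except Exception as exc:
            print(f"{argv[0]}: {exc}", file=err)
            status = 1
    return json.dumps({"status": status, "stdout": out.getvalue(), "stderr": err.getvalue()})


def serve(infile, outfile, commands):
    """Single-threaded loop: requests are handled one at a time, in order."""
    for line in infile:
        if line.strip():
            outfile.write(handle_request(line, commands) + "\n")
            outfile.flush()
```

A shell would start this once as a coprocess, e.g. `serve(sys.stdin, sys.stdout, commands)`, and pay the interpreter and import cost only on that first launch. Because there is one thread, existing command-line code can run unchanged, as long as it tolerates module state surviving between requests.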


> introducing coprocesses in the shell

I did this with bash and Python a few years ago when I learned about the "coproc" feature (which, by the way, only supports a single coprocess per bash process, unless I misunderstood it).

But it turns out I tend to open new terminal windows a lot, which meant that the coprocess needs to relaunch all the time anyway, so it wasn't very useful. Even if I start it lazily, to avoid slowing down every shell startup, most of my Python invocations tend to be from a new shell, so there was no real benefit.

Maybe if you have a pool of worker processes that's not tied to any individual shell process, and connect to them via a Unix-domain socket or something...


Hm yeah I was hoping to do it in a way that's compatible with unmodified bash, but maybe it will only be compatible with Oil to start.

Basically I think there should be a "coprocess protocol" that makes persistent processes look like batch processes, roughly analogous to CGI or FastCGI.

I thought that could be built on top of bash, but perhaps that's not possible.

I'll need to play with it a bit more. I think in bash you can have multiple named coprocesses with their descriptors stored in an array like ${COPROC[@]} or ${MY_COPROC[@]}. But there are definitely issues around process lifetimes, including the ones you point out. Thanks for the feedback.


I looked into the issue with multiple coprocesses in bash again to make sure. While you would naturally think you could simply give them different names, it's unfortunately not supported:

https://lists.gnu.org/archive/html/bug-bash/2011-04/msg00059...

Bash will print a warning when you start the second one, and the man page explicitly says at the bottom: "There may be only one active coprocess at a time."


Best out of 5 times on my Debian testing laptop for a "hello world", in order of worst to best:

    ruby2.5:     83ms (-e 'puts "hi"')
    python3.6:   35ms (-c 'print("hi")')
    python2.7:   24ms (-c 'print("hi")')
    perl5.26.2:  8ms  (-e 'print "hi"')
    C (GCC 7.3): 2ms  (int main(void) { puts("hi"); })
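A minimal Python harness for reproducing this kind of "best of N" measurement. It times whole process launches, so fork/exec overhead is included, which is what the C line above captures. The function name is hypothetical, and the `-S` variant mirrors the flag discussed further down.

```python
import subprocess
import sys
import time


def best_of(runs, argv):
    """Wall-clock seconds of the fastest of `runs` launches of argv.

    Taking the minimum filters out noise from other load on the machine.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    interp = sys.executable
    for label, argv in [
        ("python", [interp, "-c", "pass"]),
        ("python -S", [interp, "-S", "-c", "pass"]),
    ]:
        print(f"{label:10} {best_of(5, argv) * 1000:6.1f} ms")
```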


35ms for Python is ok. What we see in reality is that the imports that a real application will use, adds a whole lot more time.

For example, if you want a snappy command line response for a Gtk-using Python program, you probably want to handle command line arguments before even importing Gtk. Maybe it is --help or an argument that you pass on to another running instance, and you want it to be absolutely snappy and fast.
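A minimal sketch of that pattern: parse arguments first, so `--help` and usage errors exit before anything expensive is imported. The `mytool` name is hypothetical, and `statistics` is a cheap stand-in for a heavy dependency such as Gtk.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="mytool", description="Average some numbers.")
    parser.add_argument("numbers", nargs="+", type=float)
    return parser


def main(argv=None):
    # --help and usage errors raise SystemExit here, before the heavy import.
    args = build_parser().parse_args(argv)

    # Deferred import: only paid for when there is real work to do.
    import statistics  # stand-in for something expensive like Gtk

    return statistics.mean(args.numbers)


if __name__ == "__main__":
    print(main())
```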


I have read that conditional imports are "un-pythonic", but I tend to do exactly that in order to keep resource usage lower.


  $ time ruby --disable-gems -e 'puts "hi"'
  hi

  real    0m0.009s
  user    0m0.008s
  sys     0m0.000s


Sure, two can play that game. Let's add `-S`, which disables the site module, to the Python invocations.

    perl ........... 0m0.012s
    siteless py27 .. 0m0.018s
    gemless ruby ... 0m0.021s
    siteless py36 .. 0m0.025s
    siteful py27 ... 0m0.034s
    siteful py36 ... 0m0.049s
    gemful ruby .... 0m0.089s


Gotta love the lie that Ruby is slow.


> C (GCC 7.3): 2ms (int main(void) { puts("hi"); })

Not really a fair comparison given the other 3/4 have to do all their parsing and compiling. Unless in those 2ms you include compilation time. Or use tcc -run.


The user doesn't care, they just invoke "hg" or "git", and language is always a choice, so it's valid from that perspective.

But the reason I included it is because it gives a baseline for the overhead of invoking any program, no matter how trivial.


I must say even 2 ms feels rather slow just to execute something hot in cache.


  in the temple of tmux
  for the cult of vi
  we sit and wait 
  for venv to activate


Given how slow Python is known to be at starting up, I am puzzled why Mozilla continues to use it in build scripts. Perl is just as portable but starts up roughly 10 times faster.


I wrote the linked post and maintain the Firefox build system. The reason is that in 2018 (and for the past 10 years, honestly) it is far easier to find people who know Python than Perl. Python is essentially the lingua franca in Firefox land for systems-level tasks that don't warrant a compiled language. As I said in the post, Rust will likely infringe on Python over time due to performance, lower defect rate, long-term maintenance advantages, etc.


>As I said in the post, Rust will likely infringe on Python over time due to performance, lower defect rate, long-term maintenance advantages, etc.

Indeed. As hg is moving to use more and more Rust:

https://www.mercurial-scm.org/wiki/OxidationPlan


indygreg knows because he is the author of that wiki page. :)

https://www.mercurial-scm.org/wiki/OxidationPlan?action=info


I didn't realize that so thanks for pointing it out. But sometimes I also comment for third parties reading along to pick up some info as well.


Can't wait for a static, dependency-free hg, but sadly they aren't going to rewrite it entirely in Rust, just the speed-sensitive parts. I don't care if it's written in Rust, D, Go or even C. When we picked fossil over hg and git, the Python dependency made us shy away from hg, although its CLI and overall experience were better than git's. Now we also use fossil as a deployment tool; it can be a statically compiled, drop-anywhere binary.


I would say that there are no immediate plans. But as libraries get ported to Rust, it might just make sense to start writing more features mostly in Rust.

Basically, I have some hope that eventually Rust will take over. However, writing the core components is a critical first step, and selling it on the immediate benefits is much easier than selling a complete rewrite.


> Rust will likely infringe on Python over time due to performance, lower defect rate, long-term maintenance advantages

No love for Haskell[0]? It does look like the best system scripting language out there right now... I have just never tried it to be sure :(

0: http://www.haskellforall.com/2015/01/use-haskell-for-shell-s...


I imagine there are two aspects to this: they probably started in Python and already have a lot of it, and it's probably easier to get new folks involved, which I think is one of their goals.


I've built firefox from source. Python start up time is not really a problem, it's so long to build anything anyway.


I imagine a lot of the pain is for incremental builds where the build system overheads can matter a lot more.


Compile time is absolutely dominated by C++ (and now Rust) compilation and linking. I doubt build system language choice will ever bubble up to relevance, so why optimize for it?


Perl is “just as portable” in the same way that a motorcycle can just as easily drive under a steamroller... it’s not gonna be pretty and there’s no easy way out if you do it.


I write Perl scripts for Windows and Linux, and I don't find portability to be especially onerous. Of course there are platform differences to keep in mind, but is that any different from any other cross-platform scripting language?


Build scripts, especially for a C++ app like Firefox, are a place where "slow startup times" are totally irrelevant.


Did you read the link we're discussing? It's in large part about why the slow startup times for Firefox's build scripts are a problem.

> Changing gears, my day job is maintaining Firefox's build system. We use Python heavily in the build system. And again, Python startup overhead is problematic. I don't have numbers offhand, but we invoke likely a few hundred Python processes as part of building Firefox. It should be several thousand. But, we've had to "hack" parts of the build system to "batch" certain build actions in single process invocations in order to avoid Python startup overhead. This undermines the ability of some build tools to formulate a reasonable understanding of the DAG and it causes a bit of pain for build system developers and makes it difficult to achieve "no-op" and fast incremental builds because we're always invoking certain Python processes because we've had to move DAG awareness out of the build backend and into Python. At some point, we'll likely replace Python code with Rust so the build system is more "pure" and easier to maintain and reason about.


>Did you read the link we're discussing? It's in large part about why the slow startup times for Firefox's build scripts are a problem.

The link we're discussing is about Python runs for the Mercurial test suite. The Firefox build is mentioned in passing. It's not what the post is discussing (and doesn't get into numbers there).


You claim that it is “totally irrelevant”. A Firefox maintainer, even in passing, claims otherwise.


Maybe not totally irrelevant, but it's in most cases not a factor compared to build times.

For tests, yes.

I can't fathom what Mozilla does for thousands of Python invocations during build. Maybe it's test runs as well?


It doesn't get into the numbers, but it expressly gets into the problems.


Naive question: If the startup time matters because you're imposing that startup time hundreds or thousands of times - why not remove the startup time?

I'm saying, use the emacs model. Start hg with a flag so it simply keeps running in the background while listening on a port. Run a bare-bones nc script to pipe commands to hg over a port and have it execute your commands.

This isn't a new problem, nor is it even a new solution. No complete re-write of the interpreter or the tool required.

Anyways, that's my 2¢


There's a paragraph in the OP about how they've actually done this:

> Mercurial provides a `chg` program that essentially spins up a daemon `hg` process running a "command server" so the `chg` program [written in C - no startup overhead] can dispatch commands to an already-running Python/`hg` process and avoid paying the startup overhead cost. When you run Mercurial's test suite using `chg`, it completes minutes faster. `chg` exists mainly as a workaround for slow startup overhead.

Just like this isn't what the usual `emacs` command does (it's `emacsclient`), it isn't what the usual `hg` command does either. There are some disadvantages to this solution and some assumptions it makes, which have apparently led the Mercurial maintainers to conclude, like the Emacs maintainers, that it won't work as the default. Hence the desire for solutions that will.


I hate to admit it, but it's partly why I don't use Clojure more (pardon the side-topic). I can't bear the boot process and the overall cost.

Python is free to tinker with, and all similar interpreters are a joy to use. Anything else is probably better for heavy-duty job environments.


I feel the same way about Clojure. For a LISP, where interactive development via the REPL is supposed to be one of the value-add of the language, it falls completely short in that aspect. They even have entire libraries and design patterns (Component, etc.) to work around the issue, but I find it ridiculous that your entire program structure is dictated by the fact that the REPL boot up time is too damn slow.


It's the main reason I don't use Clojure. I was so excited to learn a modern Lisp. Got all my tools working and wrote my first CLI app. Horrendous load time. I realised it's really only suitable for long-running processes, and I never do that sort of thing, so I can't use it.


Knock, Knock, who's there? ---- Long Pause --- Java!


Python is great for prototyping or even real apps if performance isn't so critical. However, more than once I've found myself in the situation where I wrote a bunch of Python code and then end up starting that code up from another app, just like the thread discusses and I immediately feel like this is an anti-pattern.

What's even more annoying is that my Python code usually calls a whole lot of C libraries (OpenCV, numpy, etc.) So it's like this: app->OS process->python interpreter->my python code->C libraries. That just really feels wrong so I'd like two things:

1) better/easier path to embed python scripts into my app e.g. resident interpreter

2) some way of passing scripts to python without restarting a new process, this may exist and I'm unaware
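For point 2, the stdlib `runpy` module can execute a script file inside an already-running interpreter, so a long-lived Python process (for example, one embedded in your app) can run scripts without starting a new process for each one. A minimal sketch; the function name is hypothetical. Because the scripts share the process, state such as `sys.modules` persists between runs. That is the point here, but it is also a source of cross-run leakage.

```python
import runpy


def run_script(path, **inputs):
    """Execute the script at `path` in this interpreter and return its globals.

    run_name="__main__" makes `if __name__ == "__main__":` blocks run, as
    they would under `python path`. Modules the script imports (numpy, cv2,
    ...) stay in sys.modules, so later scripts that import them pay nothing.
    """
    return runpy.run_path(path, init_globals=inputs, run_name="__main__")
```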


Startup time has also been the biggest gripe I have with Julia so far. Otherwise it's a truly fantastic language to work in. I wasn't able to put the `__precompile__()` function to good use, it seems: the time it takes to execute my program didn't change at all for some reason. Or maybe it's not actually the startup time that caused the problem, but the time it took to perform file IO. Anyway, my program now takes much longer to start up than the Python equivalent (though it runs much faster once started), which is a real disappointment.


precompile doesn't store native compiled code. Though I know from talking to the compiler developers that this is high on the 1.x list. It's an annoyance but at least it has a clear solution in sight.


Truly solving this problem is difficult, but you can hack around it with a zygote process to remove a substantial amount of overhead, in exchange for RAM. While this is generally more of a win for server processes, you can see it applied to a CLI proof of concept:

https://github.com/msolo/pyzy
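The zygote idea in its simplest form, as a sketch (not necessarily how pyzy itself works): a parent pays for the slow imports once, then forks a child per job, so each child starts with those modules already in memory, shared copy-on-write. Assumes POSIX, since `os.fork` doesn't exist on Windows; `run_in_child` and `roundtrip_ok` are hypothetical names and `json` stands in for whatever is actually slow to import:

```python
import json  # stand-in for the expensive imports a real tool would preload
import os
import sys


def run_in_child(task, *args):
    """Fork a child that inherits every preloaded module and run task(*args) there.

    The task's integer return value (0-255) becomes the child's exit status.
    Memory pages are shared copy-on-write with the parent until either side
    writes to them. POSIX only.
    """
    sys.stdout.flush()  # avoid duplicating buffered output in the child
    pid = os.fork()
    if pid == 0:  # child
        try:
            status = int(task(*args) or 0)
        except BaseException:
            status = 1
        sys.stdout.flush()
        os._exit(status)  # never fall back into the parent's code path
    _, wait_status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(wait_status)


def roundtrip_ok(value):
    """Example task that uses a preloaded module without importing it again."""
    return 0 if json.loads(json.dumps(value)) == value else 3
```

The RAM cost mentioned above is the parent sitting resident with everything loaded; `os.waitstatus_to_exitcode` needs Python 3.9+.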


I agree Python's startup time is too slow. But one trick you can use to improve it some is the "-S" flag, which skips site-specific customizations. On my Ubuntu system it brings Python 3.6 startup time down from 36ms to 18ms for me; still not great, but it helps.

The drawback is this may screw up your Python environment, not sure how easy it is to work around it if it does.
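For context, `-S` tells CPython not to import the `site` module at startup, so site-packages directories and `.pth` files are never added to `sys.path`; that's the usual way it breaks things (third-party packages stop being importable). A rough sketch for checking the difference on your own machine; `startup_seconds` is a hypothetical helper and the numbers will vary:

```python
import subprocess
import sys
import time


def startup_seconds(extra_flags=(), runs=10):
    """Best-of-N wall-clock time to start this interpreter and run `pass`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *extra_flags, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"default: {startup_seconds() * 1000:.1f} ms")
    print(f"-S:      {startup_seconds(['-S']) * 1000:.1f} ms")
```

Taking the minimum rather than the mean filters out runs that were slowed by unrelated system noise.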


Proposed solution: steal undump from emacs. https://news.ycombinator.com/item?id=13073566

Perhaps it would be possible to read in the source files, compile them, and preserve an image of the state immediately before reading input or the command line.


I'm pretty sure Python 3 already does this, and that's what the __pycache__ directories it creates when running a command are for.


Those are only bytecode. It helps a little but you still have to load the file from the filesystem and run it on import.
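To make the distinction concrete, here is a sketch showing that even with a precompiled `.pyc` in `__pycache__` (its location comes from `importlib.util.cache_from_source`, which `py_compile` uses by default), every fresh import still executes the module's top-level code. The cache only skips the compile step. The module and helper names are made up for the demo:

```python
import importlib
import py_compile
import sys
from pathlib import Path


def import_twice_with_cache(tmpdir):
    """Precompile a module, import it twice, and count how often its body ran."""
    tmp = Path(tmpdir)
    log = tmp / "log.txt"
    src = tmp / "cached_mod.py"
    # Top-level code with a visible side effect: append a line to log.txt.
    src.write_text(
        f"with open({str(log)!r}, 'a') as f:\n"
        "    f.write('module body executed\\n')\n"
    )
    # Writes __pycache__/cached_mod.<tag>.pyc and returns its path.
    pyc = Path(py_compile.compile(str(src)))
    sys.path.insert(0, str(tmp))
    try:
        for _ in range(2):
            sys.modules.pop("cached_mod", None)  # force a real re-import
            importlib.invalidate_caches()
            importlib.import_module("cached_mod")
    finally:
        sys.path.remove(str(tmp))
        sys.modules.pop("cached_mod", None)
        sys.path_importer_cache.pop(str(tmp), None)
    runs = log.read_text().count("module body executed")
    return pyc.exists(), runs
```

Within a single process the second import would normally be a `sys.modules` hit; popping it here simulates what every new process has to do.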


I was kind of amazed how penalized a script could be by collecting all its “import” statements at the top. Once somebody’s command couldn’t even print “--help” output in under 2 seconds, and after measuring the script I told them to move all their imports later and the docs appeared instantly.
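A sketch of the "move the imports later" fix for a CLI: argparse handles `--help` before any subcommand runs, so a heavy dependency imported inside the command function is never loaded just to print usage. `tool`, `report` and `cmd_report` are hypothetical, and `statistics` stands in for something genuinely slow like numpy. On Python 3.7+, `python -X importtime script.py` prints per-module import times to stderr, which is a quick way to find the culprits first:

```python
import argparse


def cmd_report(args):
    # Deferred import: paid for only when this subcommand actually runs,
    # never for --help. statistics stands in for something slow like numpy.
    import statistics
    return statistics.mean(args.values)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command", required=True)
    report = sub.add_parser("report", help="compute the mean of some numbers")
    report.add_argument("values", type=float, nargs="+")
    report.set_defaults(func=cmd_report)
    args = parser.parse_args(argv)  # --help exits here, before any heavy import
    return args.func(args)
```

`main(["--help"])` prints usage and raises `SystemExit(0)` without touching the deferred import.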


I'm a long time python user, but never really peeked under the hood. However, I have a few ideas.

Optimized module loading: maybe loading one larger 'super' module would be faster than several smaller ones? For example, a Python program could be analyzed to find its dependent modules, and then all of these could be packed into a 'super' module.

Once the Python program executes, it would load the single 'super' module and hopefully bypass much of the dynamic loading work that each separate module import currently performs.

As mentioned previously, this is just off the top of my head and would certainly warrant more investigation/profiling to confirm my hypothesis.
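The analysis half of this is already in the standard library: `modulefinder` statically walks a script's imports, and `zipapp` can bundle an application directory into a single archive. A rough sketch of the analysis step only (`list_dependencies` is a hypothetical helper). Bundling like this would mostly save filesystem lookups; each module's top-level code would still have to run on import:

```python
import modulefinder


def list_dependencies(script_path):
    """Statically find every module a script would import, transitively."""
    finder = modulefinder.ModuleFinder()
    finder.run_script(str(script_path))
    # finder.modules maps module names (including "__main__") to Module objects.
    return sorted(finder.modules)
```

On a real program this list can be long, since it follows imports through the standard library too.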


I'm pretty sure it's too late by now for Python, but I've had some success with compiling C-based interpreters [0] to C; that is, generating the actual C code that the interpreter would execute to run the program. That way you can reuse much of the interpreter, keep the dynamic behavior and still get nimble native executables.

[0] https://github.com/basic-gongfu/cixl#compiling


Should be able to hot boot the VM with the right tooling. You can reuse HPC "checkpoint" code from supercomputing environments as a generic hammer for Python/Ruby/JVM. Some Russians figured out how to do it in userspace without a kernel mod: https://criu.org/Main_Page


People here comment about how Python is slow, but even fast/slow is ill-defined in my opinion. You don't (generally) see people hacking TensorFlow in native languages to speed it up; they just enable CUDA. I imagine the definition of "fast" here is limited to massively parallel, I/O-heavy server workloads.


Reminds me of buildout. It's an awful piece of software. We used it in a previous Flask project, and a simple flask shell took 3 minutes to start. If you type `import` in the CPython shell, it will literally freeze for a few seconds, because it injects one sys.path entry for each package specified!!!
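Why that hurts: the import system probes `sys.path` entries in order, so every entry ahead of the one that actually holds a module costs a filesystem check. A rough sketch that times one import behind N empty directories; the helper and module names are hypothetical, and absolute numbers depend heavily on the OS and filesystem caches:

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path


def time_import_with_padding(n_dirs):
    """Time one import that has to get past n_dirs empty sys.path entries first."""
    name = f"probe_mod_{n_dirs}"  # hypothetical throwaway module
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        padding = []
        for i in range(n_dirs):
            d = root / f"pkg{i}"
            d.mkdir()
            padding.append(str(d))
        target = root / "target"
        target.mkdir()
        (target / f"{name}.py").write_text("VALUE = 1\n")

        saved_path = list(sys.path)
        # Padding goes first, so every one of those dirs is probed before the hit.
        sys.path[:0] = padding + [str(target)]
        importlib.invalidate_caches()
        try:
            start = time.perf_counter()
            module = importlib.import_module(name)
            elapsed = time.perf_counter() - start
        finally:
            sys.path[:] = saved_path
            sys.modules.pop(name, None)
            for entry in padding + [str(target)]:
                sys.path_importer_cache.pop(entry, None)
        return module.VALUE, elapsed
```

Comparing `time_import_with_padding(0)` with `time_import_with_padding(2000)` gives a feel for the per-entry overhead, which buildout-style layouts pay on every import of a module further down the path.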


I'm just curious why more people don't make use of chg to avoid the mercurial startup time. It seemed to solve it for me - are there drawbacks?


Isn’t that really just a band-aid over the real problem, though?

The fact the developers of hg went so far as to make that shows startup time is a real issue.

So why not fix the problem at the source?


Indeed, it seems like a perfectly good solution to me. I guess it's something about purity and not being the perfect solution. Wouldn't it be great if python was as fast as a C program that took many times longer to write? Yes, but that would probably be magic.


At a guess: They didn't hear about it (keeping your ears open is a cost not everyone wants to pay). They don't want to bother with setting it up. They don't want to bother with maintaining it (even if it's as simple as reinstall every time you get a new computer).


That's fair. My experience with it so far has literally just been aliasing hg to chg. It performed all the magic in the background for me.


A recent article in ACM Queue included an off-hand remark that Go's compile time is often faster than Python's startup time. Just sayin'


Would it be feasible to keep a set of Python interpreters around at all times and use a round robin approach to feed each already-on interpreter commands then perform an interpreter environment cleanup out-of-band after a task is complete?
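Something close to this exists in the standard library as a process pool. A sketch, assuming a POSIX system (it requests the `fork` start method explicitly) and work that can be passed as picklable arguments; `handle` and `run_commands` are hypothetical names. Idle workers pull tasks from a shared queue rather than strict round-robin, and instead of cleaning an interpreter out-of-band, `maxtasksperchild` simply replaces a worker after N tasks, which limits (but doesn't eliminate) leaked state:

```python
import multiprocessing


def handle(command):
    """Stand-in for real work done inside an already-running interpreter."""
    return command.upper()


def run_commands(commands, workers=2, recycle_after=100):
    # "fork" keeps this runnable from a plain script on POSIX; unavailable on Windows.
    ctx = multiprocessing.get_context("fork")
    # maxtasksperchild: a worker is retired and replaced after this many tasks.
    with ctx.Pool(processes=workers, maxtasksperchild=recycle_after) as pool:
        return pool.map(handle, commands)  # results come back in input order
```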


The Java ecosystem had this with Drip, and I think it turned out not to be a great idea in practice: the magazine of VMs gets exhausted when you don't want it to, they get into odd states, and there were other problems I can't quite remember.


Or just use the operating system's `fork` system call?

There's also nailgun for Java which sounds like it works a little differently: http://martiansoftware.com/nailgun/


I guess a fork()'ed process triggers copy-on-write behavior in the kernel once the process starts running. So that's latency (the copying) you could still optimize away.


A common solution for web application servers ("preforking").

The idea of keeping persistent interpreters doesn't really work for Python because the interpreter is full of state in places you'd never expect -- it's hard to reset the interpreter to a sane state after it ran some unknown program.


I may be wrong, but I would bet that copy-on-write of pages would be hardly visible for most workloads. Copying is quite fast when you do it in batches (4k per page).


You might want to measure it before you optimize it! Oftentimes I find that forks where I don't write much are quite inexpensive, with little COW action.


You could be right.

Also, could it be that Intel's cache hierarchy plays some really smart tricks behind the scenes, to make this fast?


Kind of like this?

https://github.com/tbug/aiochannel

I also think David Beazley's curio has a wonderful way of explaining the same concept:

http://curio.readthedocs.io/en/latest/reference.html#module-...

Asyncio has made this sort of programming a lot easier. I imagine you could do the same thing with multiprocessing and thread pools.
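For the asyncio version of the idea, a minimal sketch using the standard library's `asyncio.Queue` as the channel (not aiochannel's or curio's own API): one long-lived worker stays up and handles jobs as they arrive, and a `None` sentinel shuts it down. `worker` and `run_jobs` are hypothetical names, and the doubling is placeholder work:

```python
import asyncio


async def worker(jobs, results):
    """Long-lived consumer: stays up and handles each job as it arrives."""
    while True:
        job = await jobs.get()
        try:
            if job is None:  # sentinel: shut down
                return
            results.append(job * 2)  # placeholder for real work
        finally:
            jobs.task_done()


async def run_jobs(items):
    jobs = asyncio.Queue()
    results = []
    task = asyncio.create_task(worker(jobs, results))
    for item in items:
        await jobs.put(item)
    await jobs.put(None)
    await task
    return results
```

`asyncio.run(run_jobs([1, 2, 3]))` returns `[2, 4, 6]`.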


Yes but with the added complexity and resource usage it's not a good general solution. If every app behaved this way we'd be in a worse place overall.


I imagine this could be handled by some kind of "fork". Where you instantly duplicate the whole process with copy-on-write.


What would be really nice is checkpoint and restart (i.e., unexec), but it turns out that it's extremely hard to implement and get right in a non-managed environment.


Slowness is the elephant in the room in Python land. It's as if everybody has decided to cover their eyes in front of this massive pachyderm. A massive delusion.


Delusion? I don't think many cover their eyes. More likely they've come to accept that for their use cases the performance is good enough and the convenience gain well worth it.


It's weird to see someone make this pitch when C systems software development regularly requires us to try and shave off microseconds. Millisecond delays mean you've already fucked up.


For use cases where performance is important, using an interpreted (implementation of a) language is a bad idea.

There are many great reasons to use Python, but execution speed is not one of them.


Performance and startup performance are really separate things.

For instance, for many many CLI tasks a python script will be many times faster than a Java tool, just due to the JVM startup. It doesn't really matter if the Java would even run INSTANTLY... the JVM startup time just kills speed for small CLI invocations.


> It doesn't really matter if the Java would even run INSTANTLY... the JVM startup time just kills speed for small CLI invocations

Luckily, these problems are actually being addressed, slowly, via AOT compilation with Graal and Substrate VM.

Here's a comparison of a simple hello-world program: one written in Java and compiled to a native binary with Substrate VM, the other in Python:

  $ time ./hello.py
  hello world!

  real 0m0.041s
  user 0m0.017s
  sys  0m0.023s

  $ time ./hello.main
  hello world!

  real 0m0.019s
  user 0m0.008s
  sys  0m0.010s

Of course, the comparison is unfair.


Shell is an interpreted language and its startup time is quite fast (5-7 ms on my machine, which is not a particularly fast machine).

In fact, large parts of git were written in shell until they realized that shell is only fast on UNIX because of co-evolution (you can fork without exec, and fork is quite fast), and on other platforms like Windows, existing shell implementations are much slower and there isn't a well-tuned production-ready shell that does things completely differently. Then they started rewriting everything in C.


performance != startup time for lots of applications.

I use a C curses application (dokia) to store some oft-used commands, but anything that won't be run 500x/day or runs longer than ~1/10th of a second I'll write in python for easier/more powerful development


dokia

Google has apparently never heard of this?


https://github.com/skamsie/dokia

It's a random GitHub repo I found. I modified it to default to paste and not prepend 'cd', and I find it convenient for complex bash strings I use once or twice a day, every day.

edit

Whoops, the link 404s now (I put the link in a personal wiki once I realized I liked it). I could put it on GitHub to share if anyone wants, but out of respect I would first ping the original author (skamsie) about why he made it private (or deleted it).


Most of this email thread is comparing Mercurial (Python) to Git. I'm not familiar enough with Git's internals to know why and where the languages are split, but it uses a significant amount of shell scripting and Perl in its code base. You can put 'git-foo' anywhere on PATH and it'll get picked up. So in the comparison they're making, startup time doesn't seem to be an issue for a combination of those languages, but it is for Python. It doesn't sound like their problem is that Python is an interpreted language.
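The 'git-foo' convention is easy to sketch, and it shows why startup time matters for it: every external subcommand is a fresh process, so whatever language it's written in pays its startup cost on each invocation. This is roughly the lookup the comment describes, not Git's actual code; `dispatch` and the `mytool` prefix are hypothetical:

```python
import shutil
import subprocess


def dispatch(tool, argv, path=None):
    """Git-style external subcommands: `tool foo args` runs `tool-foo args` from PATH."""
    if not argv:
        raise SystemExit(f"usage: {tool} <command> [args...]")
    exe = shutil.which(f"{tool}-{argv[0]}", path=path)  # path=None means $PATH
    if exe is None:
        raise SystemExit(f"{tool}: '{argv[0]}' is not a {tool} command")
    # A brand-new process per subcommand: its interpreter startup is paid every time.
    return subprocess.run([exe, *argv[1:]]).returncode
```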


Git is actively rewriting many of their shell and Perl code in C. Performance and portability are given as reasons (having shell and Perl as a dependency on Windows is a bit of a kludge). And shell scripts are much slower on Windows because new process overhead on Windows is ~10x what it is on POSIX platforms. (New threads, however, are faster to create on Windows than on POSIX.)


Why shouldn't the interpreted program start faster? Bytecode is usually at least 2x denser than machine code, so all things being equal, when starting an interpreted program, you should be doing less IO, take fewer page faults, and so run faster, at least if you defer computationally-intensive work to specialized AOT-compiled helpers.

That interpreted programs frequently start slower than their compiled equivalents reflects badly on interpreter implementations, not the concept of interpretation itself.


"There are many great reasons to use Python, but execution speed is not one of them."

Um, I think there are lots of examples where using Python's internal data structures as they were designed results in code that is fast enough.

Even though the language implementation is interpreted, lots of common things can be optimized under the hood using data structure and data type specific execution paths and so on.
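A small illustration of the point, as a sketch: a membership test against a `set` is a hash lookup, while the same test on a `list` is a linear scan, and both are implemented in C under the interpreter. Timings vary by machine, so no numbers are claimed here; `membership_times` is a hypothetical helper:

```python
import timeit


def membership_times(n=100_000, repeat=5):
    """Best-of-repeat time for 100 failed lookups in a list vs. a set of n ints."""
    items_list = list(range(n))
    items_set = set(items_list)
    missing = -1  # worst case for the list: every element gets compared
    t_list = min(timeit.repeat(lambda: missing in items_list, number=100, repeat=repeat))
    t_set = min(timeit.repeat(lambda: missing in items_set, number=100, repeat=repeat))
    return t_list, t_set
```

On CPython the set version is typically faster by orders of magnitude at this size, without leaving Python.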


Maybe you didn't read the post, but in the use cases specified, performance is not important--the author is mostly speaking about deploy and test scripts. Whether these take 1 minute or 10 is not particularly interesting, but you would of course prefer faster if possible. That's the point here--of course a faster Python interpreter is better, and the Python maintainers should place a higher priority on it than the very low priority that they currently do.


But what else do you switch to?

Imagine you import tons of modules which are often only available in Python. This gets you going really quickly with your project, and it runs very smoothly. Porting this to C++ would probably take so long that you'd run out of funding before you finished.

I have hopes that Rust or some descendant of Rust will get us there in maybe 10 years, but in the meantime it would be better to get Python up to speed as much as possible.


If Python had a tenth of the funding JS has, we would have startup time, packaging, GUI and mobile apps solved by now.


For some use cases, Go might be a good alternative to Python. It's performant, yet simple and readable and it has a great ecosystem.


I find laying out a Go project with dependencies to be miserable, though. I write a lot of Go and Python code, and I don’t know why on God’s green earth Google decided to handle dependencies by having you effectively git clone a library, ending up with these huge trees of dependency subdirectories and so on. It’s a mess. I vastly prefer either having one canonically designated folder to install dependencies into, where each dependency is a top-level directory that can be scanned, or having them all stored in a project folder relative to the root of the workspace, similar to node_modules, over the current mess.

Drives me nuts. Look at this layout if you don’t know what I’m referencing:

https://golang.org/doc/code.html#remote

Yes, I have used dep.

And yes, I have all kinds of shortcut commands for navigating my Go workspace.

But look at this canonical example from the docs and it’s easy to see this is a giant mess that is utterly unnecessary. No other language I’ve used has had such a gross problem with dependency layout. It also leads to gross import strings.

It’s one of my biggest criticisms of Go to be honest.

That and its reliance on environment variables that have to be set perfectly in order to actually do anything (thank god for direnv https://direnv.net/).


Isn't a dependency folder like node_modules pretty much what the vendor folder is to Go? Have you tried using a dependency management tool like glide before?


That is one I haven’t tried. I tried a few others whose names I forget, because most alternative managers seemingly stopped development, but this one looks active. Thank you!

I now would shift my argument to the fact that Google should really just adopt this as their standard if it works as advertised.

Link for those who haven’t seen it:

https://glide.sh


Personally I found Go almost uniquely UNsuited to Python-style explorative programming. The lack of generics, and the high-ceremony error handling are pretty much 180 degrees from python.

While it of course doesn't have the backing of a giant multinational, I think Nim (https://nim-lang.org/) is a much better fit for "Python, but fast".


Only if you are an experienced programmer. But a lot of great python tools are created by mathematicians, geographers, biologists, students, sys admins, etc.


I agree, but in this case I meant especially command line tools, such as mercurial. Of course, for scripting experiments etc. Python is the better choice, but these applications are normally not too worried about startup time.


Do we have numpy, matplotlib, scipy, pickle etc. for Go?


Do the build processes mentioned in this email make heavy use of numpy, matplotlib, scipy, pickle, etc.?

NumPy is very exciting and all, but it's a subcommunity of the Python community, not 90% of its usage. NumPy users are probably not, in general, trying to spawn twenty five thousand processes in sequence to accomplish some task. The people who are complaining about fractions of a millisecond of startup time are not inverting massive matrices.


People who are importing numpy usually are also not inverting massive matrices. These libraries provide tons of functionality that you would like to use in a short script. For people accustomed to these libraries the code is quick to write in a reliable fashion, easy to read but not quick to run. Everything is great except the startup time (and speed when you have more data).


This group of people is so precise and exclusive that it is irrelevant. "People using NumPy for system scripting who are intensely sensitive to python startup time" is not a large enough group to be trying to argue Python-global policy with.

"People using Python for system scripting who are intensely sensitive to python startup time" is at least large enough to be worth talking about (since speeding up startup time will mostly only help, modulo any possible resources spent to accomplish it), though I'd notice that it hasn't prevented Python from becoming very popular. And plenty of them will find that Go could meet their needs, in a hypothetical universe in which switching languages was free. (That is, I'm not particularly advocating it. It's a last resort for sure.)

(Also this argument is predicated on the false assumption that Go has nothing like those things. They aren't as mature by any means, of course, and I generally consider them a bad idea [2], but they do exist.)

[2]: https://news.ycombinator.com/item?id=16959022


So precise and exclusive that it's irrelevant?

Google, Facebook, Netflix, Uber, Amazon, and Microsoft are all using NumPy in their data science pipelines, spinning up and tearing down Docker containers for ML-as-a-service. I'm pretty sure they care about the startup time of both Go and Python.


Again, read the article for what the topic of conversation is. If you're "spinning up an entire Docker container", a Python startup time is going to disappear into the multiple seconds that already takes. You are not spinning up several hundred docker containers per second, on a sustained basis for hours at a time, on a single piece of hardware, constrained only by Python startup time. That's going to be a vanishing fraction of the problem, even if you are spinning up that many containers that quickly, and the optimization for that is already obvious (don't do that, do more per container).

You are conflating what systems scripting is, which would be what would be managing the docker containers themselves, with what the docker containers would be doing, which would be very likely starting up just one Python instance to "do the thing". I don't imagine there are very many systems scripts out there in the world being started dozens of times per second that use NumPy. Anything that did, again, the obvious optimization would be "don't do that".


IIRC CPython devs reject performance-related patches if they cause the code to become "less readable".

>> I believe Mercurial is, finally, slowly porting to Python 3.

I just gave up on Mercurial since it didn't let me push to BitBucket nor to an Ubuntu VPS via SSH.

For better or worse, Git just works.


I'm confused, since my daily workflow is pushing to Bitbucket via hg and ssh.


Imagine my confusion back then.

I could push fine to BitBucket if I used Python 2 version locally, and same for my VPS if I used the Python 2 version both locally and remotely.

But as soon as I touched the Python 3 version of Mercurial the pull/push problems began. I don't recall the exact error and maybe it's fixed now (this happened like 6 months ago), but I don't think I'll give it another try for some time.


My work is considering switching to Git mostly because we think adopting Bitbucket will force us to. Is that not true? I'd love some reasons to stay with hg...


Bitbucket was originally for hg though. Why would switching to git be better?



