Compile Python applications into stand-alone executables (pyinstaller.org)
138 points by privacyonsec on Dec 4, 2021 | hide | past | favorite | 89 comments



IIRC a big issue with PyInstaller is that the built executables are really self-extracting archives, which have to write many files to disk before they can run. Compared to a real compiled executable, it’s slow and inelegant.

Unfortunately high-quality bundling into executables just isn’t a focus of Python (nor of any other high-level language). Personally, I’ve gone back to C++ for building command-line apps - as a developer I’d much rather be writing Python, but that’s no good if I can’t actually deliver software to users.


> Unfortunately high-quality bundling into executables just isn’t a focus of Python (nor of any other high-level language).

You seem to be ignoring Tcl, which solved this problem about 20 years ago with Starkits (https://wiki.tcl-lang.org/page/Starkit) and Starpacks (https://wiki.tcl-lang.org/page/Starpack).


> Unfortunately high-quality bundling into executables just isn’t a focus of Python (nor of any other high-level language)

Wouldn't Dart's [1] and Go's executable support qualify as such (high-level languages with good executable support)?

For example, Sass is now also using Dart [2].

[1] https://dart.dev/tools/dart-compile

[2] https://sass-lang.com/dart-sass


Let me know once Dart or Go can run a wide ecosystem of libraries and middleware that applies to almost any situation. There's a reason people still use languages like C++, Python, and Java despite them being objectively bad in some respects. Without mainstream adoption you've got a chicken-and-egg problem.


Sure, here's your reminder that Go supports a wide ecosystem of libraries and middleware that apply to almost any situation.


I think you're missing my point here. Let's say you're given a pytorch model. You can either run it with native python or grab the golang bindings that are used by 5 people, and if you get stuck there, god help you. You'll be hitting obscure bugs in no time.

Even if something officially supports multiple languages, the obscure ones will usually get inferior bindings just because they're not used as much, and will consequently not be as tested and patched as the main ones. It's unfortunate but that's life I guess.


> I think you're missing my point here. Let's say you're given a PyTorch model. You can either run it with native python or grab the golang bindings that are used by 5 people, and if you get stuck there, god help you. You'll be hitting obscure bugs in no time.

To use a Python-specific framework you're better off using Python. Well duh.


Pytorch, despite the name, is really a C++ library with well developed Python bindings. There's certainly nothing in principle about it that's Python specific.


PyTorch's C++ API is experimental and doesn't have feature parity. PyTorch is a Python lib that heavily uses C++ extensions, but I don't think it has ever been a C++ library.


PyTorch is not a Python binding into a monolithic C++ framework.


That’s one thing I like about Nim. It can compile to C++ and so can directly wrap C++ code. While Nim’s ecosystem is small, it’s easy enough to wrap most any C/C++ library. It’s great for OpenCV. Some folks have even been wrapping the PyTorch C++ API directly (1)!

1: https://github.com/SciNim/flambeau


These languages face the same challenges as Python does once you look beyond compiling pure-<language> code to executables. This common misconception is beyond frustrating to constantly have to deal with in these arguments.


I haven't used Dart, but I have 15 years of experience with Python and 10 years with Go. Go absolutely solves many of Python's problems including performance, single-native-binary compilation, dependency management, a lackluster-at-best static typing system, and many others. If you're an experienced Python developer, then you're used to announcements that promise a lot and utterly fail to deliver (consider all of the different package management "solutions", alternative runtimes, etc), but Go really does what it says on the tin.


Could you provide an example of what you mean? It seems obvious that using a non-Go library with Go would be more complicated, but is that not also the case with C or C++? Do they have some special way of using a Go or Python library that Go does not reciprocate?


That's exactly the point. The common misconception that I am frustrated by is that people compare their experience working on a pure-Go project with pure-Go dependencies to their experience working on pure-Python projects with non-Python (e.g. C, Rust, or anything else really) dependencies.

So to illustrate: if you were to work on a pure-Python project with pure-Python dependencies, existing tooling (such as PyInstaller, Nuitka, or others) can provide single-binary executables just as easily as Go can.


pyinstaller is kinda dumb here because it doesn't cache the extracted files; it extracts all of them (which is especially slow on Windows), runs the application from that temporary directory, then deletes it.

At work I've built a simple cross-platform packaging system which avoids issues like this and also separates library components from the main application (the former are rarely changed, the latter is frequently changed). Despite the entire application being ~25 MB, it starts in around 30-50 ms from a network share on Linux and 1-2 seconds on Windows.

On Linux it's just a bash script eventually calling a system-provided Python 3 interpreter; on Windows there's a C# .exe which does roughly the same, but the package also includes a Python interpreter (which is just the official "Python for embedding into applications" ZIP) which the C# process loads via P/Invoke.

The entire thing is less than 1000 lines of Python/bash/C# and works with every package under the sun. Apart from the C# stubs, which basically never change, it's supremely easy to "cross compile", because compilation in this system is just combining the wheels of all dependencies into a single zip, which also makes builds completely reproducible.
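The wheels-into-one-zip idea can be sketched with the stdlib `zipapp` module. This is an illustration of the general approach, not the commenter's actual system; the file names are made up:

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

# Stand-in for the build step: in a real system this directory would be
# the union of all dependency wheels plus the application code.
src = pathlib.Path(tempfile.mkdtemp()) / "app"
src.mkdir()
(src / "__main__.py").write_text("print('hello from the archive')\n")

# Combine everything into a single .pyz that a system-provided
# interpreter can run directly, with no extraction step.
target = src.parent / "app.pyz"
zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")

result = subprocess.run([sys.executable, str(target)],
                        capture_output=True, text=True)
print(result.stdout.strip())  # -> hello from the archive
```

Because the archive is run in place, startup cost is just opening the zip, which matches the fast launch times described above.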


> then deletes it

Is that built into Python or does the developer call that cleanup? I ask because I recently switched from youtube-dl to yt-dlp on a Windows machine and it left multiple temporary directories full of .py files to be cleaned up by other tools. I did Ctrl+C out of it at least once. Perhaps I interrupted the cleanup routine? Is the cleanup routine expected to trap kill signals, or is that on the developer?


The cleanup code is part of the stub/bootloader, so depending on how the process dies it doesn't run.
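A minimal sketch of why that happens: `atexit` cleanup only runs on a normal exit, so a bootloader has to turn catchable signals into normal exits itself, and nothing can catch SIGKILL or survive a hard crash. This is a generic illustration, not the PyInstaller bootloader's actual code:

```python
import atexit
import os
import shutil
import signal
import sys
import tempfile

# Stand-in for the bootloader's extraction directory.
workdir = tempfile.mkdtemp(prefix="bundle-")

def cleanup():
    shutil.rmtree(workdir, ignore_errors=True)

# atexit covers normal termination only; converting SIGINT/SIGTERM into
# SystemExit makes Ctrl+C run the atexit handlers too. SIGKILL (or a
# crash) skips all of this and leaves the directory behind.
atexit.register(cleanup)
signal.signal(signal.SIGINT, lambda *_: sys.exit(130))
signal.signal(signal.SIGTERM, lambda *_: sys.exit(143))

print(os.path.isdir(workdir))  # True while the app is running
```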



- Does not work for native dependencies.

- Does not allow splitting the application up into stable/unstable parts to reduce update sizes.

- Does not directly allow multiple entry points.

- Requires Python to be already installed, or requires bundling Python loosely, which is very slow when used from a network share.

- Uses zipimport under the hood, which is very slow on network shares, and very very very slow on Windows network shares.
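For reference, the zipimport mechanism in the last point is just the stdlib behavior of importing modules from a zip file placed on `sys.path` (module name here is made up):

```python
import pathlib
import sys
import tempfile
import zipfile

# Build a zip containing one module, then import it by putting the zip
# on sys.path -- this is the zipimport mechanism referred to above.
# Every import has to read from the archive, which is what hurts on
# network shares.
zpath = pathlib.Path(tempfile.mkdtemp()) / "libs.zip"
with zipfile.ZipFile(zpath, "w") as z:
    z.writestr("greet.py", "def hello():\n    return 'hi'\n")

sys.path.insert(0, str(zpath))
import greet  # resolved from inside libs.zip

print(greet.hello())  # -> hi
```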


You also need to make sure to compile the binaries on an OS version that is close to your target deployment version; I remember having RHEL 6.x vs. RHEL 7 issues due to libc or a similar dependency.


Yes this. Differing glibc versions caused us headaches.


I thought that is only the case when you bundle into "one file"? If you don't pass the --onefile parameter, all files are in one folder without any archiving, am I wrong?


That’s right - but if you don’t use `--onefile` you don’t really have a “stand-alone executable” at all.


Of course you have a stand-alone executable. At least if by "stand-alone" you mean a copy you can ship to users. Which is at least the only thing I care about. And tell me which popular (say) Windows software actually comes as a single .exe. They're all many files in multiple folders. And there's no problem with that. You have one installer, which users download, which then extracts the files. Your criticism points out a non-issue.


Correct. I don't use --onefile at all. I go the old-skool way, and package my python Windows app(s) into an installer using Inno Setup.


I've not tried it myself, but I've heard people have had good experiences using Nuitka (https://nuitka.net/) for building standalone Python executables.


I've used it a couple of times. When it works, it's great, but there can be issues with bundling dependencies correctly. As such, I've resorted to using it just with the standard library, or a few 3rd party-modules that are nuitka-friendly.



I love Go for quick and dirty CLI apps. They are a bit bigger than the equivalent C app but so much friendlier to develop. Rust is better for more complex processing, but it is also less straightforward (at least for me).


I say this as a Rust fan and a (sometimes overzealous) promoter: it's just easier to wrap your head around programming in Go. They have made it so easy and enjoyable. It really is a pleasure for bringing quick concepts to the table, so to speak.


While not entirely self-contained, `pex` is pretty close. It relies on a Python interpreter in the environment somewhere, but in my experience it successfully packages a venv + code into a single executable "binary" with no extraction stage.


I fully agree. A close runner-up to C++ for bundling into executables is C#, in my opinion.


How well does C# work for this cross-platform? Are its executables very large?

I considered using Go for command-line apps - but in my case they often have to interact with existing C++ code. That’s easy enough to do from Python, but I was advised that interop with other languages is an area in which Go is particularly weak.


I think a standalone "Hello World", built with .NET Core, is around 60-70 MB large on both Linux and Windows. The startup times are alright - slower on Windows than on Linux in my experience because of the file system performance.


Lower than this. If I use the AOT compile option with .NET Core 3.1, I get a 40 MB binary which goes down to 10 MB with the trim option.

Just like Go, which embeds its runtime, a .NET Core binary compiled with AOT embeds the CLR but, unlike Go, it has options to trim out the pieces that aren’t necessary, which I really like.


I did not know about PublishTrimmed and will try it out. Thank you.


I think this is one of those features that was still a bit of a preview in 5; you should use the just-released v6 for PublishTrimmed. There are also situations where it doesn't work out of the box (e.g. WinUI 3!)


> I think a standalone "Hello World", built with .NET Core, is around 60-70 MB large on both Linux and Windows.

Lol what?

What goes into a 70 MB implementation of 'hello world'?


It’s not a 70 MB implementation of Hello world. It’s a 70 MB compressed runtime with a standard library (or at least the most basic parts of it) and a couple KB "Hello world".


Why is the runtime so huge and why is so much of the standard library included?

Java ‘hello world’ compiled into a static binary is just a couple of MB.


The runtime is huge because it includes a very large standard library. Using the .NET linker you can create a "trimmed" binary that only includes the code that actually gets used. A Hello World compiled this way takes 14 MiB "Ready to Run" with native code included, or 11 MiB without.

For a modern CLI app it's basically that easy. For an older or GUI WPF app it's not. It used to be fashionable in .NET to use lots of runtime reflection, dynamic runtime configuration based on config files, dynamic loading, and runtime code generation. I suppose the language was less featureful back then and people looked longingly at Enterprise Java for some reason. This type of dynamism doesn't work with AOT compilation because the compiler needs to be able to statically determine what code is required. The solution is to remove the dynamism, replace it with compile-time code generation, or annotate the reflection to tell the compiler which types will actually be used. As of .NET 6 most of the standard library has been annotated or otherwise made compatible with trimming[0]. WPF apps will probably never be able to use trimming because that's a huge amount of code and MS has like one intern working on it.

I believe the situation with Java is basically the same? You can't just build a Java 6 Spring app into a small AOT bundle, right?

[0]: https://themesof.net/roadmap?product=.NET&release=6.0&q=trim


See PublishTrimmed: by default it includes all the standard library. "Trimming" is what almost all C-style linkers do by default, but C# doesn't because there are some reflection issues (e.g. you can reference types by strings and construct them at runtime, which would fail in a Trimmed environment unless you explicitly tell it not to trim those types).


Can this couple MB static binary run without JRE?


Well it wouldn't be very 'static' if it couldn't, so yes of course.


Could you point me to where I can learn how to do it?


If you use legacy .NET (which is preinstalled on Windows) a GUI Hello World is about 10 kB.


gRPC is one way to go about it if you have control over both the C++ and the Go side. Go binaries work pretty consistently across Linux, which is a positive.


I would have thought most command-line users would be content to install with pip.


Yes, same for cx_Freeze etc. It really comes with a cost: if you have mid-range applications which might spawn several processes of the executable, this becomes really problematic (CPU usage, memory usage).


PyInstaller has been a life-saver for our work. The thing with our work is that we heavily use Python and SciPy stack but we also have quite a few GUI stuff. We preferably have all our code base in Python because our team is mostly familiar with Python. Also keeping both the GUI and the application logic in Python makes it super easy to pass around data (usually numpy arrays) which we do all the time. No server/client architecture or marshalling needed when everything is Python.

And we are mostly developing internal tools, so the executable size isn't a huge concern. But we still want an executable that can be copied from computer to computer and can be opened via a simple double-click, because not everybody in our company is a software engineer, a lot of people are just used to copying the executable to their computer and double-clicking on it to launch it. We don't use the --onefile feature since it makes the launches slow because of the extraction step. So we have a folder with an executable and the dependencies in it.

PyInstaller was a very good fit for our case. Yes the size of the bundle is quite big but it does the job. And allowed us to keep our whole code base in Python.


You are fortunate that your apps depend on SciPy for machine learning and not TensorFlow. TensorFlow is not on the list of supported 3rd party libraries, but I think I will try it anyway later today just to see if it works with pyinstaller.


PyTorch works very well with PyInstaller. The only issue is that it can't use the DataLoader's worker processes, they crash for some reason (even with multiprocessing.freeze_support()).


Thanks, good to know!


Also worth checking out: Nuitka [1]. It actually compiles your Python into machine code (albeit still making use of the CPython interpreter).

Despite the title, pyinstaller doesn't really compile anything, it just bundles your bytecode and the interpreter into a binary. That's often useful, but it's not the same thing. You can get similar-ish results to Nuitka by combining pyinstaller with Cython but it's quite a bit of work.

[1] https://nuitka.net/


> albeit still making use of the CPython interpreter

The CPython runtime. The point of Nuitka is to not use the interpreter :P


This sounds much better than PyInstaller. How was your experience with it? Is it straightforward, or did you see issues or have to tweak your code to make it work?


So far I haven't had to make any tweaks to my code, though I don't try to do anything fancy.

Getting it to work with dependencies can be a bit trickier; some of them need some Nuitka-specific hacks. The Nuitka creator/maintainer actually maintains a shocking number of these himself, in the form of Nuitka "plugins". It can be hard to figure out the right command-line syntax to enable these, which is obviously a small inconvenience compared to creating them in the first place, but can still be annoying. So far I've used the Qt and numpy plugins and they work well. Trio doesn't work at the moment but support is likely coming soon. As one of the top comments for PyInstaller here says, you'll probably be better off using Nuitka from day 1 rather than suddenly applying it to a big existing project.

As with PyInstaller, it has the option to bundle everything into a single file. This is convenient but has the same disadvantage of being a bit wasteful, especially on Windows where it extracts files to the temp directory on every run (as another comment says about PyInstaller). On Linux I think it uses a ramdisk.

A very nice feature of Nuitka is an alternative mode where it doesn't put everything into one file. All the pure Python is compiled to a binary but it leaves all the binary dependencies and resources separate, which you can then deploy together in a directory. This is a bit less convenient for users but a bit more efficient, and has the very nice property of being LGPL-compliant if you're using PyQt/PySide. It's also useful for debugging the single-file mode, since that is really a wrapper around this mode.


Thanks for the detailed note! :)


Not OP, but my experience trying Nuitka vs. PyInstaller on atbswp [0] is that Nuitka takes significantly longer (and more CPU time) to run, and in my case the resulting executable wasn't working; it was because I use wxPython, or some hidden import in my dependencies, I don't remember exactly. Bottom line: I think Nuitka is great if you (or your dependencies) don't do black magic in the imports, otherwise you're in for a lot of debugging.

0: https://github.com/rmpr/atbswp


I really tried to use PyInstaller and Nuitka to deploy a Python command-line tool [1] to Windows, Mac, and Linux users. We couldn't get around some system dependencies like OpenSSL needing to be available and not broken on user machines. We ended up rewriting the whole program from Python into Go [2].

Using Go solved so many long-tail bugs for us and just simplified the whole process of shipping code to user machines.

Here's the old but working build script that built both PyInstaller and Nuitka [3].

[1] https://github.com/wakatime/legacy-python-cli/tree/standalon...

[2] https://github.com/wakatime/wakatime-cli

[3] https://github.com/wakatime/legacy-python-cli/blob/standalon...


And if you need Python-like configurability/reprogrammability with Go, you can use Starlark.

https://github.com/bazelbuild/starlark


Much as I love Python, there certainly are some corners of the ecosystem like that which have issues. Zip file support is another one: it's fine for many common cases, but you don't have to try very hard to hit some niggles. The thing is, I don't know if that's particular to Python, or whether every language ecosystem has this but Python is so widely used that more people hit these issues.


I’ve tried all the options and my view is that unless you test bundling from the start of the project you’re in for a world of hurt, either having to manually patch packages (nltk) or producing executables that are missing dependencies.

It’s the main reason I’m looking at switching to Go for these kinds of apps. Python should have a working solution as part of the standard library.


Agreed. .NET Core, Rust, and Go all have an out-of-the-box option for shipping binaries. I really enjoy using Python to get something done fast, but I'm not a fan of the whole ecosystem.


Strongly agreed.

Where Python fails is dependency management and packaging, much like this xkcd portrays: https://xkcd.com/1987/

It's a shame, too, because the language is perhaps one of the easiest to read, write and just generally develop in, and has a really rich ecosystem which can be leveraged to great success.

I've felt for a long time that language specs should be more clearly separated from their runtimes. Why couldn't we have statically compiled Python? Why did native Java executables need something like GraalVM to be painstakingly introduced over many years, and even then fail to work properly whenever dynamic loading is involved (e.g. the Spring framework)? The answer probably lies in the insane complexity all of that involves; making these decoupled isn't feasible with our current tooling, unless we want to spend a decade developing a new language/runtime like that.


> [Java/GraalVM natively compiled apps] fail to work properly whenever dynamic loading is involved (e.g. Spring framework)

What exactly do you expect if Spring/Java heads insist on late binding and religiously exercise "dependency injection"? You can configure native-image reflection based on a closed-world assumption wrt what classes are known at compile time, but TBH it seems futile if devs use shit-tonnes of annotations, dynamism, and reflection magic. Or, as someone else here said, "in idiomatic Java/Spring code, behavior is expressed through anything and everything, except actual Java code."


I expect the Spring framework to hopefully some day fade into obscurity and die.

Its ample usage of reflection is just evil, polluting your stack traces with needless proxy classes and abstraction upon abstraction upon abstraction. You can debug code pretty easily, whereas doing that with annotations or XML that gets parsed and executed by code that you know nothing about leaves you in hopeless situations more often than you'd like. Instead of solving business problems, you end up solving whatever it is that Spring wants you to do.

Thus, most of your post is spot on. Rather often, it is the frameworks that are keeping us in a pretty unhappy place. In contrast, languages like Go feel a bit more pure in that regard, even generics were only added recently. Not giving the framework developers tools to express endless complexity is probably a good idea.


> Where Python fails is dependency management and packaging, much like this xkcd portrays: https://xkcd.com/1987/

Python packaging has its problems. But other languages are confusing too once you have 5 or more overlapping versions from 4 sources.

> Why couldn't we have statically compiled Python?

We have RPython and Cython.


Doesn’t allow you to create standalone binaries. Cython requires you to package the Python runtime DLLs (on Windows) otherwise it won’t run.


My favorite is the fairly new, but excellent PyOxidizer: https://github.com/indygreg/PyOxidizer

It's written in Rust and can embed resources and dependencies in the executable.


pyoxidizer is amazing indeed.

Just repackaged my own open source project for multiple platforms using PyOxidizer.

I wish it would merge into the python ecosystem itself.

Harel https://github.com/harelba


Thanks, I'll take your .bzl as inspiration [0]. How did you go about developing against the Starlark API, any IDE support?

[0] https://github.com/harelba/q/blob/master/pyoxidizer.bzl


The most recent version of PyInstaller has been flagged by our enterprise FireEye endpoint security default rules. What happens is that PyInstaller has a folder in its library directory containing 4 exes that are used to produce the final exe; one is called run.exe. FireEye quarantines/deletes these. This didn't happen on older versions. It's a pretty big issue if these stubs will always be flagged, or if they get compiled into the final executable such that an application deployed to users might get flagged as malicious.


Soon after going through the python environment selection dance (pyenv fwiw, although sometimes conda), I went through the python binary tool selection dance and ended up on pyinstaller.

However I eventually abandoned hope that I was on a fruitful path, and changed course for web server land, embarking on the python web framework selection dance.


Did you do the Python version selection dance before that though? 2.7/3.6/3.8/3.9 :D


2 to 3 did come up at one point, yes. At the time I chose 2 but nothing much came of the choice since I didn't continue with python at the time, and nothing much came from what I was working on either.

The updates to 3 have a bit more impact now that I'm doing most of my work in python. Some of the features dropping seem too cool to not update.


In my role as IT manager, py2exe and PyInstaller are my go-to tools to make scripts work on machines.

I'm in the IT dept of an oil company - not much in the way of savvy apps, mostly dull ones.

As a manager I have to do lots of data wrangling for this and that form of report, and write scripts to do jobs on machines.

Python is the only full stack I know. PyInstaller is my go-to tool to build an executable and make it portable across environments.


I deployed a Python application (Pandas, PyQt) at work. It was horribly slow on half of the Windows machines, likely because of the antivirus, but I didn't have admin rights to check that. It was impossible to make a single-file exe because it took too long to decompress. I spent days trying to remove all the unused files half-manually (because automated approaches kept failing). In the end I threw all of that out, rewrote the UI, and deployed it as a web application on a server. Now the users are happy.


> I deployed a Python application (Pandas, PyQt) at work ...

>In the end I threw all of that out , rewrote the UI and deployed it as a web application on a server. Now the users are happy.

This sounds familiar.


PSA: If you distribute this kind of software, be ready to deal with many antivirus issues. [1] has helped but it's still a very manual and frustrating process on every release.

[1]: https://github.com/hankhank10/false-positive-malware-reporti...


I have found that a useful way to evade antivirus issues with PyInstaller is to build my own copy of the bootloader and encourage use of the 64 bit version over the 32 bit version (since most malware will use the 32 bit version to infect the most computers).

I have a GitHub CI job which is able to automate the process for each release. https://github.com/jellyfin/jellyfin-mpv-shim/blob/master/.g...


Repeating my comment from every time PyInstaller is being discussed:

One issue with PyInstaller is that, by default, it includes all dynamic libraries that the Python interpreter has on your machine. It makes sense, it needs an interpreter and collects what is needed for it to run.

Unfortunately this might include libreadline.so, which is licensed under GPL, making your resulting executable unable to be distributed under a proprietary license.

There are ways to solve this issue, but one has to search and read documentation (and code, in my case -- when I was researching it, the docs were not clear).
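One way to solve it is a filter in the PyInstaller `.spec` file, which is itself Python executed by PyInstaller. This is a trimmed, hypothetical sketch, not a drop-in file; `Analysis`/`PYZ`/`EXE` are injected by PyInstaller at build time, and the exact entry format should be checked against the current docs:

```python
# app.spec -- hypothetical, trimmed example.
a = Analysis(['app.py'])

# a.binaries is a TOC of (name, path, typecode) entries; drop the
# GPL-licensed readline library before it gets bundled.
a.binaries = [entry for entry in a.binaries
              if not entry[0].startswith('libreadline')]

pyz = PYZ(a.pure, a.zipped_data)
exe = EXE(pyz, a.scripts, a.binaries, a.datas, name='app')
```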


The ability to make self contained executables is important. LispWorks and SBCL Common Lisp make nice, compact executables which opened more use cases for me with those languages.

I tried pyinstaller a few years ago, but it does not support some 3rd party libraries like TensorFlow that I frequently use. That said, I like that they clearly list supported 3rd party libraries so it is a quick check if pyinstaller will work for a specific project.


I use PyInstaller for my Kanmail email client [1] and it’s fantastic at creating Mac app bundles or Windows exes. I tried making actual standalone binaries for another project and, as others have mentioned, they’re incredibly slow to start up.

Still, I am a huge fan of the project and it makes it possible to make webview desktop “apps” (like or hate them) with Python.

[1] https://kanmail.io


A huge corp really wanted our app, but not the cloud-based SPA we offered - they wanted “executables” with no data stored in the cloud. We ended up creating an Electron app using our SPA, with Django packaged via PyInstaller firing up as a child process - thus a local backend with SQLite. It’s now rolled out and works great.


Tip: rather than using --onefile to create a single standalone executable, IMO you're far better off creating a dist and then packaging that into an installer using Inno Setup or similar.

Your app will start quicker, because it won't be a case of the "single executable" doing the old unzip-and-run thing every time.


Py2exe/pyinstaller is what I've always done as well. It works really well for distributing Python apps on Windows in my experience. Have been using that combo for many years now and while it's not perfect it is still the best way to distribute Python applications on Windows.


sigh - that's not "compiling" as programmers usually use the term. PyInstaller merely packages uncompiled Python programs into a self-extracting archive. Cython, on the other hand, can truly compile Python 3 into object files (via GCC) which are then linkable.



