Free-threaded CPython is ready to experiment with

eigenvalue · 2024-07-12T21:16:10 1720818970

Really excited for this. Once some more time goes by and the most important python libraries update to support no GIL, there is just a tremendous amount of performance that can be automatically unlocked with almost no incremental effort for so many organizations and projects. It's also a good opportunity for new and more actively maintained projects to take market share from older and more established libraries if the older libraries don't take making these changes seriously and finish them in a timely manner. It's going to be amazing to saturate all the cores on a big machine using simple threads instead of dealing with the massive overhead and complexity and bugs of using something like multiprocessing.

pizza234 · 2024-07-13T06:30:26 1720852226

> using simple threads instead of dealing with the massive overhead and complexity and bugs of using something like multiprocessing.

Depending on the domain, the reality can be the reverse.

Multiprocessing in the web serving domain, as in "spawning separate processes", is actually simpler and less bug-prone, because there is considerably less resource sharing. The considerably higher difficulty of writing, testing and debugging parallel code is evident to anybody who's worked on it.

As for the overhead, this again depends on the domain. It's hard to quantify, but generalizing to "massive" is not accurate, especially for app servers with COW support.

bausgwi678 · 2024-07-13T08:49:21 1720860561

Using multiple processes is simpler in terms of locks etc, but python libraries like multiprocessing or even subprocess.popen[1] which make using multiple processes seem easy are full of footguns which cause deadlocks due to fork-safe code not being well understood. I’ve seen this lead to code ‘working’ and being merged but then triggering sporadic deadlocks in production after a few weeks.

The default for multiprocessing is still to fork (fortunately changing in 3.14), which means all of your parent process’ threaded code (incl. third party libraries) has to be fork-safe. There’s no static analysis checks for this.

This kind of easy to use but incredibly hard to use safely library has made python for long running production services incredibly painful in my experience.

[1] Some arguments to subprocess.popen look handy but actually cause python interpreter code to be executed after the fork and before the execve, which has caused production logging-related deadlocks for me. The original author was very bright but didn’t notice the footgun.

ignoramous · 2024-07-13T16:32:34 1720888354

> The default for multiprocessing is still to fork (fortunately changing in 3.14)

If I may: Changing from fork to what?

thomasjudge · 2024-07-13T18:15:24 1720894524

"In Python 3.14, the default will be changed to either “spawn” or “forkserver” (a mostly safer alternative to “fork”)."

- https://pythonspeed.com/articles/python-multiprocessing/

lyu07282 · 2024-07-13T18:22:51 1720894971

Same experiences, multiprocessing is such a pain in python. It's one of these things people think they can write production code in, but they just haven't run into all the ways their code was wrong so they figure out those bugs later in production.

As an aside I still constantly see side effects in imports in a ton of libraries (up to and including resource allocations).

coldtea · 2024-07-14T01:01:13 1720918873

>which make using multiple processes seem easy are full of footguns which cause deadlocks due to fork-safe code not being well understood. I’ve seen this lead to code ‘working’ and being merged but then triggering sporadic deadlocks in production after a few weeks

Compared to theads being "pain free"?

skissane · 2024-07-13T07:30:06 1720855806

Just the other day I was trying to do two things in parallel in Python using threads - and then I switched to multiprocessing - why? I wanted to immediately terminate one thing whenever the other failed. That’s straightforwardly supported with multiprocessing. With threads, it gets a lot more complicated and can involve things with dubious supportability

lyu07282 · 2024-07-13T18:45:39 1720896339

There is a reason why it's "complicated" in threads, because doing it correctly just IS complicated, and the same reason applies to child processes, you just ignored that reason. That's one example of a footgun in using multiprocessing, people write broken code but they don't know that because it appears to work... until it doesn't (in production on friday night).

skissane · 2024-07-13T22:12:25 1720908745

I don't agree. A big reason why abruptly terminating threads at an arbitrary point is risky is it can corrupt shared memory. If you aren't using shared memory in a multiprocess solution, that's not an issue. Another big reason is it can lead to resource leaks (e.g. thread gets terminated in a finally clause to close resources and hence the resource doesn't get closed). Again, that's less of an issue for processes, since many resources (file descriptors, network connections) get automatically closed by the OS kernel when the process exits.

Abruptly terminating a child process still can potentially cause issues, but there are whole categories of potential issues which exist for abrupt thread termination but not for abrupt process termination.

lyu07282 · 2024-07-14T07:51:53 1720943513

This is not a matter of opinion. Always clean up after yourself, the kernel doesn't know shit about your application or what state it's in, you can not rely on it to cleanly terminate your process. Just because it's a child process (by default it's a forked process!) not a thread, doesn't mean it can not have shared resources. It can lead to deadlocks, stuck processes, all kinds of resource leaks, data and stream corruption, orphaned processes, etc. etc.

skissane · 2024-07-14T09:03:39 1720947819

> This is not a matter of opinion. Always clean up after yourself, the kernel doesn't know shit about your application or what state it's in, you can not rely on it to cleanly terminate your process.

If you have an open file or network connection, the kernel is guaranteed to close it for you when the process is killed (assuming it hasn't passed the fd/socket to a subprocess, etc). That's not a matter of opinion.

Yes, if you are writing to a file, it is possible abruptly killing the writer may leave the file in an inconsistent state. But maybe you know your process doesn't write to any files (that's true in my case). Or maybe it does write to files, but you already have other mechanisms to recover their integrity in this scenario (since file writing processes can potentially die at any time–kernel panic, power loss, intermittent crash bug, etc)

kaba0 · 2024-07-14T16:14:36 1720973676

Different programming languages have different guarantees when it comes to threads. If IO is hidden behind object semantics, objects aren’t killed by “killing” a thread, and they can be gracefully terminated when they are deemed to no longer be in use.

lyu07282 · 2024-07-14T09:59:53 1720951193

Oh well I see, you will learn the hard way then (:

skissane · 2024-07-15T07:45:57 1721029557

You have no idea what I'm actually doing, yet you are convinced something bad is bound to happen, although you can't say what exactly that bad thing will be.

That's not useful feedback.

mabster · 2024-07-14T23:56:49 1721001409

That's why I've always liked Java's take on this. Throw an InterruptedException and the thread is considered terminated once that has dropped all the way through. You can also defer the exception for some time if it takes time to clean something up.

The only issue there is that sometimes library code will incorrectly defer the exception (i.e. suppress it) but otherwise it's pretty good.

phkahler · 2024-07-12T21:31:41 1720819901

I feel like most things that will benefit from moving to multiple cores for performance should probably not be written in Python. OTH "most" is not "all" so it's gonna be awesome for some.

wongarsu · 2024-07-12T23:09:34 1720825774

I often reach for python multiprocessing for code that will run $singleDigit number of times but is annoyingly slow when run sequentially. I could never justify the additional development time for using a more performant language, but I can easily justify spending 5-10 minutes making the embarrassingly parallel stuff execute in parallel.

throwaway81523 · 2024-07-13T05:29:47 1720848587

I've generally been able to deal with embarassing parallelism by just chopping up the input and running multiple processes with GNU Parallel. I haven't needed the multiprocessing module or free threading so far. I believe CPython still relies on various bytecodes to run atomically, which you get automatically with the GIL present. So I wonder if hard-to-reproduce concurrency bugs will keep surfacing in the free-threaded CPython for quite some time.

I feel like all of this is tragic and Python should have gone to a BEAM-like model some years ago, like as part of the 2 to 3 transition. Instead we get async wreckage and now free threading with its attendant hazards. Plus who knows how many C modules won't be expecting this.

robertlagrant · 2024-07-13T06:25:04 1720851904

Async seems fine? What's wrong with it?

throwaway81523 · 2024-07-13T06:51:52 1720853512

Watch this video and maybe you'll understand ;). Warning, NSFW (lots of swearing), use headphones.

https://www.youtube.com/watch?v=bzkRVzciAZg

This is also good:

https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

web search on "colored functions" finds lots of commentary on that article.

wesselbindt · 2024-07-13T07:37:41 1720856261

I've always found the criticism leveled by the colored functions blog post a bit contrived. Yes, when you replace the words async/await with meaningless concepts I do not care about, it's very annoying to have to arbitrarily mark a function as blue or red. But when you replace the word "aync" with something like "expensive", or "does network calls", it becomes clear that "async/await" makes intrinsic properties about your code (e.g., is it a bad idea to put this call in a loop from a performance perspective) explicit rather than implicit.

In short, "await" gives me an extra piece of data about the function, without having to read the body of the function (and the ones it calls, and the ones they call, etc). That's a good thing.

There are serious drawbacks to async/await, and the red/blue blog post manages to list none of them.

EDIT: all of the above is predicated on the idea that reading code is harder than writing it. If you believe the opposite, then blue/red has a point.

adament · 2024-07-13T08:16:44 1720858604

But a synchronous function can and many do make network calls or write to files. It is a rather vague signal about the functions behavior as opposed to the lack of the IO monad in Haskell.

To me the difficulty is more with writing generic code and maintaining abstraction boundaries. Unless the language provides a way to generalise over asyncness of functions, we need a combinatorial explosion of async variants of generic functions. Consider a simple filter algorithm it needs versions for: (synchronous vs asynchronous iterator) times (synchronous vs asynchronous predicate). We end up with a pragmatic but ugly solution: provide 2 versions of each algorithm: an async and a sync, and force the user of the async one to wrap their synchronous arguments.

Similarly changing some implementation detail of a function might change it from a synchronous to an asynchronous function, and this change must now propagate through the entire call chain (or the function must start its own async runtime). Again we end up in a place where the most future proof promise to give for an abstraction barrier is to mark everything as async.

wesselbindt · 2024-07-13T10:11:04 1720865464

> But a synchronous function can and many do make network calls or write to files

This, for me, is the main drawback of async/await, at least as it is implemented in for example Python. When you call a synchronous function which makes network calls, then it blocks the event loop, which is pretty disastrous, since for the duration of that call you lose all concurrency. And it's a fairly easy footgun to set off.

> It is a rather vague signal about the functions behavior as opposed to the lack of the IO monad in Haskell.

I'm happy you mentioned the IO monad! For me, in the languages people pay me to write in (which sadly does not include Haskell or F#), async/await functions as a poor man's IO monad.

> Again we end up in a place where the most future proof promise to give for an abstraction barrier is to mark everything as async.

Yes, this is one way to write async code. But to me this smells the same as writing every Haskell program as a giant do statement because the internals might want to do I/O at some point. Async/await makes changing side-effect free internals to effectful ones painful, which pushes you in the direction of doing the I/O at the boundaries of your system (where it belongs), rather than all over the place in your call stack. In a ports-adapters architecture, it's perfectly feasible to restrict network I/O to your service layer, and leave your domain entirely synchronous. E.g. sth like

  async def my_service_thing(request, database):
    my_business_object = await database.get(request.widget_id)
    my_business_object.change_state(request.new_widget_color)    # some complicated, entirely synchronous computation
    await database.save(my_business_object)

Async/await pushes you to code in a certain way that I believe makes a codebase more maintainable, in a way similar to the IO monad. And as with the IO monad, you can subvert this push by making everything async (or writing everything in a do statement), but there's better ways of working with them, and judging them based on this subversion is not entirely fair.

> ugly solution: provide 2 versions of each algorithm: an async and a sync

I see your point, and I think it's entirely valid. But having worked in a couple async codebases for a couple of years, the amount of stuff I (or one of my collaborators) have had to duplicate for this reason I think I can count on one hand. It seems that in practice this cost is a fairly low one.

bjourne · 2024-07-14T13:59:17 1720965557

What you write is eerily similar to one of the pain points of Haskell. You can write a compiler that is purely functional. But then you want logging so you must put wrap it with the IO monad. And then also every function that calls the compiler and so on.

pansa2 · 2024-07-13T08:30:45 1720859445

> when you replace the word "aync" with something like "expensive", or "does network calls", it becomes clear that "async/await" makes intrinsic properties about your code explicit rather than implicit.

Do you think we should be annotating functions with `expensive` and/or `networking`? And also annotating all of their callers, recursively? And maintaining 4 copies of every higher-order function depending on whether the functions it calls are `expensive`, `networking`, neither or both?

No, we rely on documentation for those things, and IMO we should for `async` as well. The reason we can’t, and why `async`/`await` exist, is because of shortcomings (lack of support for stackful coroutines) in language runtimes. The best solution is to fix those shortcomings, not add viral annotations everywhere.

wesselbindt · 2024-07-13T09:30:08 1720863008

So here I think we differ fundamentally in how we like to read code. I much prefer being able to quickly figure out things of interest about a function by glancing at its signature, rather than look at documentation, or worse, having to read the implementation of the function and the functions it calls (and so on, recursively).

For example, I much prefer a signature like

  def f(a: int) -> str:

over

  def f(a):

because it allows me to see, without reading the implementation of the function (or, if it exists, and I'm willing to bet on its reliability, the documentation), that it takes an integer, and gives me a string. And yes, this requires that I write viral type annotations on all my functions when I write them, but for me the bottleneck at my job is not writing the code, it's reading it. So that's a small upfront cost I'm very much willing to pay.

> Do you think we should be annotating functions with `expensive` and/or `networking`? And also annotating all of their callers, recursively?

Yes, absolutely, and yes, absolutely. That's just being upfront and honest about an intrinsic property of those functions. A function calling a function that does network I/O by transitivity also does network I/O. I prefer code that's explicit over code that's implicit.

pansa2 · 2024-07-13T09:57:49 1720864669

> Yes, absolutely, and yes, absolutely.

Fair enough, that's a valid philosophy, and one in which `async`/`await` makes perfect sense.

However, it's not Python's philosophy - a language with dynamic types, unchecked exceptions, and racy multithreading. In Python, `async`/`await` seems to be at odds with other language features - it feels like it's more at home in a language like Rust.

wesselbindt · 2024-07-13T10:21:32 1720866092

I completely agree with you. However I've always found the dynamic typing approach to be a bit at odds with

  python3 -c "import this" | head -4 | tail -1

I think the fast and loose style that Python enables is perfect for small scripts and one off data science notebooks and the like. But having worked in large codebases which adopt the same style, and ones that avoid it through static typing and in some cases async/await, the difference in productivity I've noticed in both me and my collaborators is too stark for me to ignore.

I think I should've been more nuanced in my comments praising async/await. I believe that what I say is valid in large IO-bound applications which go beyond basic CRUD operations. In general it depends, of course.

pansa2 · 2024-07-13T10:43:48 1720867428

> I think the fast and loose style that Python enables is perfect for small scripts and one off data science notebooks and the like.

Agreed - I only use Python for scripts like this, preferring statically-typed, AOT-compiled languages for larger programs.

That’s why I think Python should have adopted full coroutines - it should play to its strengths and stick to its fast-and-loose style. However, the people who decide how the language evolves are all employees of large companies using it for large codebases - their needs are very different from people who are only using Python for small scripts.

hiddew · 2024-07-13T08:38:43 1720859923

> The reason we can’t, and why `async`/`await` exist, is because of shortcomings (lack of support for stackful coroutines) in language runtimes

The JVM runtime has solved this problem neatly with virtual threads in my opinion. Run a web request in a virtual thread, and all blocking I/O is suddenly no longer blocking the OS thread, but yielding/suspending and giving and giving another virtual thread run time. And all that without language keywords that go viral through your program.

pansa2 · 2024-07-13T08:48:44 1720860524

Yes, this is similar to how Go works. IIRC the same approach was available in Python as a library, “greenlet”, but Python’s core developers rejected it in favour of `async`/`await`.

throwaway81523 · 2024-07-13T18:49:54 1720896594

The Python community seems to have a virulent hatred of threads. I don't understand the reason. Yes there are hazards but you can code in a style that avoids them. With something like BEAM you can even enforce the style. Async/await of course introduce their own hazards.

robertlagrant · 2024-07-15T08:51:03 1721033463

There's no hatred. There are just lots of libraries that can't be multithreaded, due to historical reasons. This is being worked on right now[0], though.

[0] https://peps.python.org/pep-0703

throwaway81523 · 2024-07-16T01:23:16 1721092996

The GIL only gets in the way of parallelism. Yes there is real hatred. You can experience it if you visit #python on libera and ask anything about threads. "Aieee! The non-determinism! The race conditions! Etc." Of course the async requires an async version of the whole i/o system, it can block on long computations or the wrong system calls, etc. And many rock solid systems are written in Erlang, which uses preemptive lightweight processes for all its concurrency needs.

Lots of microcontroller OS's use cooperative multitasking but once there are enough machine resource's, OS's generally become preemptive. Async concurrency is basically cooperative multitasking with similar issues. Does Python give a way to open a file asynchronously in Linux? It's now possible with io_uring but it was impossible for a very long time (like decades). Erlang and GHC both use thread pools to deal with that. The use the old synchronous open(2) call but move it into an auxiliary thread so it won't block the calling thread.

robertlagrant · 2024-07-16T08:36:01 1721118961

> "Aieee! The non-determinism! The race conditions! Etc."

That doesn't sound like real hatred. Those sound like real concerns, which need to be addressed, and the attempt to remove the GIL is doing so very much with those concerns in mind.

kaba0 · 2024-07-14T17:40:20 1720978820

I think it wouldn’t work as nicely with python, which deeply builds on C FFI. Java has a different history, and almost the whole ecosystem is pure, making it able to take advantage of it.

gpderetta · 2024-07-13T10:00:25 1720864825

Except that at least in python async doesn't mean that. Non async functions can do networking, block, or do expensive operations.

On the other hand async functions can be very cheap.

Again, which useful property does async protect?

robertlagrant · 2024-07-15T08:53:51 1721033631

Yeah gotcha - I'm familiar with the colored functions argument. I see the conceptual downside, but it doesn't seem that bad in practice. I have a pre-commit hook that shouts at me if I call an async function synchronously.

eigenvalue · 2024-07-13T00:26:43 1720830403

I personally optimize more for development time and overall productivity in creating and refactoring, adding new features, etc. I'm just so much faster using Python than anything else, it's not even close. There is such an incredible world of great libraries easily available on pip for one thing.

Also, I've found that ChatGPT/Claude3.5 are much, much smarter and better at Python than they are at C++ or Rust. I can usually get code that works basically the first or second time with Python, but very rarely can do that using those more performant languages. That's increasingly a huge concern for me as I use these AI tools to speed up my own development efforts very dramatically. Computers are so fast already anyway that the ceiling for optimization of network oriented software that can be done in a mostly async way in Python is already pretty compelling, so then it just comes back again to developer productivity, at least for my purposes.

goosejuice · 2024-07-13T05:45:34 1720849534

Kind of sounds like you are optimizing for convenience :)

indigodaddy · 2024-07-13T01:55:18 1720835718

Ever messed about with Claude and php?

lanstin · 2024-07-13T02:01:03 1720836063

I don't think we are supposed to use HN for humor only posts.

saagarjha · 2024-07-13T02:58:02 1720839482

You think wrong

jacob019 · 2024-07-13T02:08:28 1720836508

indigodaddy · 2024-07-13T14:46:01 1720881961

Why the downvotes? Totally serious question. Jesus Christ HN

jillesvangurp · 2024-07-13T04:54:33 1720846473

Right now you are right. This is about taking away that argument. There's no technical reason for this to stay true. Other than that the process of fixing this is a lot of work of course. But now that the work has started, it's probably going to progress pretty steadily.

It will be interesting to see how this goes over the next few years. My guess is that a lot of lessons were learned from the python 2 to 3 move. This plan seems pretty solid.

And of course there's a relatively easy fix for code that can't work without a GIL: just do what people are doing today and just don't fork any threads in python. It's kind of pointless in any case with the GIL in place so not a lot of code actually depends on threads in python.

Preventing the forking of threads in the presence of things still requiring the GIL sounds like a good plan. This is a bit of meta data that you could build into packages. This plan is actually proposing keeping track of what packages work without a GIL. So, that should keep people safe enough if dependency tools are updated to make use of this meta data and actively stop people from adding thread unsafe packages when threading is used.

So, I have good hopes that this is going to be a much smoother transition than python 2 to 3. The initial phase is probably going to flush out a lot of packages that need fixing. But once those fixes start coming in, it's probably going to be straightforward to move forward.

jodrellblank · 2024-07-13T03:01:54 1720839714

https://www.servethehome.com/wp-content/uploads/2023/01/Inte...

AMD EPYC 9754 with 128-cores/256-threads, and EPYC 9734 with 112-cores/224-threads. TomsHardware says they "will compete with Intel's 144-core Sierra Forest chips, which mark the debut of Intel's Efficiency cores (E-cores) in its Xeon data center lineup, and Ampre's 192-core AmpereOne processors".

What in 5 years? 10? 20? How long will "1 core should be enough for anyone using Python" stand?

d0mine · 2024-07-13T04:02:31 1720843351

Number crunching code in Python (such as using numpy/pytorch) performs the vast vast majority of its calculations in C/Fortran code under the hood where GIL can be released. Single python process can use multiple CPUs.

There is code that may benefit from the free threaded implementation but it is not as often as it might appear and it is not without its own downsides. In general, GIL simplifies multithreaded code.

There were no-GIL Python implementations such as Jython, IronPython. They hadn't replaced CPython, Pypy implementation which use GIL i.e., other concerns dominate.

imachine1980_ · 2024-07-13T05:28:36 1720848516

Yes but jython am iron aren't the standard, and I feel the more relevant part is inertia, puppy is design whit lots of concern of compatibility, then being the new standard can totally make difference making both cases not a good comparison.

phkahler · 2024-07-16T15:33:04 1721143984

>> What in 5 years? 10? 20? How long will "1 core should be enough for anyone using Python" stand?

If you're looking for a 32x or 128x performance improvement from python supporting multi-core you should probably rewrite in C, C++, Rust, or Fortran and get that 100x improvement today on a single core. If done properly you can then ALSO get the gain from multiple cores on top of that. Or to put it another way, if performance is critical python is a poor choice.

jodrellblank · 2024-07-19T02:14:36 1721355276

"instead of taking advantage of hardware you own, you should do a LOT of work in a language you don't know" - how is that in any way a reasonable suggestion or alternative?

> "Or to put it another way, if performance is critical python is a poor choice."

To put it another way, just because performance isn't critical, doesn't mean that more performance for free is not desirable or beneficial, or that ignoring 127/128ths of available performance is fine.

Derbasti · 2024-07-13T03:42:06 1720842126

A thought experiment:

A piece of code takes 6h to develop in C++, and 1h to run.

The same algorithm takes 3h to code in Python, but 6h to run.

If I could thread-spam that Python code on my 24 core machine, going Python would make sense. I've certainly been in such situations a few times.

Certhas · 2024-07-13T09:57:46 1720864666

C++ and python are not the only options though.

Julia is one that is gaining a lot of use in academia, but any number of modern, garbage collected compiled high level languages could probably do.

rbanffy · 2024-07-13T22:02:50 1720908170

Fair. Add a couple hours to learn enough Julia to write the code.

DanielVZ · 2024-07-12T22:15:46 1720822546

Usually performance critical code is written in cpp, fortran, etc, and then wrapped in libraries for Python. Python still has a use case for glue code.

andmkl · 2024-07-13T00:22:14 1720830134

Yes, but then extensions can already release the GIL and use the simple and industrial strength std::thread, which is orders of magnitude easier to debug.

woodruffw · 2024-07-13T00:52:47 1720831967

Concurrent operations exist at all levels of the software stack. Just because native extensions might want to release the GIL and use OS threads doesn't mean pure Python can't also want (or need) that.

(And as a side note: I have never, in around a decade of writing C++, heard std::thread described as "easy to debug.")

ipsod · 2024-07-13T00:54:20 1720832060

Really? Cool.

I expected that dropping down to C/C++ would be a large jump in difficulty and quantity of code, but I've found it isn't, and the dev experience isn't entirely worse, as, for example, in-editor code-intelligence is rock solid and very fast in every corner of my code and the libraries I'm using.

If anyone could benefit from speeding up some Python code, I'd highly recommend installing cppyy and giving it a try.

woodson · 2024-07-13T01:22:26 1720833746

Thanks, I haven’t come across cppyy! But I’ve worked with pybind11, which works well, too.

ipsod · 2024-07-13T03:08:19 1720840099

Sure! I tried pybind11, and some other things. cppyy was the first I tried that didn't give me any trouble. I've been using it pretty heavily for about a year, and still no trouble.

nly · 2024-07-13T09:19:47 1720862387

Last I checked cppyy didn't build any code with optimisations enabled (same as cling)

ipsod · 2024-07-13T16:17:11 1720887431

It seems like you might be able to enable some optimizations with EXTRA_CLING_ARGS. Since it's based on cling, it's probably subject to whatever limitations cling has.

To be honest, I don't know much about the speed, as my use-case isn't speeding up slow code.

tho34234234 · 2024-07-13T04:35:44 1720845344

It's not just about "raw-flop performance" though; it affects even basic things like creating data-loaders that run in the background while your main thread is doing some hard ML crunching.

Every DL library comes with its own C++ backend that does this for now, but it's annoyingly inflexible. And dealing with GIL is a nightmare if you're dealing with mixed Python code.

MBCook · 2024-07-12T22:09:12 1720822152

But it would give you more headroom before rewriting for performance would make sense right? That alone could be beneficial to a lot of people.

rty32 · 2024-07-12T23:27:31 1720826851

I think it is beneficial to some people, but not a lot. My guess is that most Python users (from beginners to advanced users, including many professional data scientists) have never heard of GIL or thought of doing any parallelization in Python. Code that needs performance and would benefit from multithreading, usually written by professional software engineers, likely isn't written in Python in the first place. It would make sense for projects that can benefit from disabling GIL without a ton of changes. Remember it is not trivial to update single threaded code to use multithreading correctly.

in Python language specifically. Their library may have already done some form of parallelization under the hood

bdd8f1df777b · 2024-07-13T04:09:25 1720843765

> Code that needs performance and would benefit from multithreading, usually written by professional software engineers, likely isn't written in Python in the first place.

There are a lot of simple cases where multi-threading can easily triple or quadruple the performance.

Certhas · 2024-07-13T12:14:08 1720872848

But multiprocessing can't?

I used to write a ton of MPI based parallel python. It's pretty straightforward. But one could easily imagine trying to improve the multiprocessing ergonomics rather than introducing threading. Obviously the people who made the choice to push forward with this are aware of these options, too. Still mildly puzzling to me why threads for Python are needed/reasonable.

bdd8f1df777b · 2024-07-19T13:53:25 1721397205

A common pain point for me is parallel data loading and processing for PyTorch or TensorFlow. Multiprocessing has a lot of bugs and pain points to deal with when doing ML. Examples: https://github.com/search?q=repo%3Apytorch%2Fpytorch+multipr.... Most of these issues do not exist in a multithreading world, because resource sharing is trivial in that case.

Since Python leads over any other languages in the ML community, and ML is a hot topic right now, it makes sense for Python developers to secure the lead by making the life of ML developers easier, which is by introducing GIL-less multi-threading.

pdhborges · 2024-07-13T20:48:29 1720903709

It doesn't to be puzzling just read the motivation section of https://peps.python.org/pep-0703/

Certhas · 2024-07-14T13:03:19 1720962199

I know that document. It doesn't really answer this point though. It motivates the need for parallelism in slow python by noting that this is important once other performance critical code is in extensions.

But the mai point against multiprocessing seems to be that spawning new processes is slow ...

That single "alternatives" paragraph doesn't answer at all why mp isn't viable for python level parallelism.

I am no longer invested in python heavily, I am sure there are discussions or documents somewhere that go into this more. Might be that it's simply that everyone is used to threads so you should support it for sheer familiarity. All I am saying is it's not obvious to a casual observer.

paulddraper · 2024-07-13T20:07:45 1720901265

> should not be written

IDK what l should and shouldn't be written in, but there are a very large # of proud "pure Python" libraries on GitHub and HN.

The ecosystem seems to even prefer them.

fastasucan · 2024-07-16T09:01:31 1721120491

I never understand this sentiment, that shows up in every topic on python. Who descides why something should or should not be written I Python?

Why shouldn't someone who prefers writing in python benefit from using multiple cores?

phkahler · 2024-07-16T15:43:17 1721144597

>> Who descides why something should or should not be written I Python? Why shouldn't someone who prefers writing in python benefit from using multiple cores?

I did use the words "most things". I'm not saying this is a bad development for Python, or that nobody should use it. But if performance is a top priority, Python is the wrong language and always has been.

I use Python from time to time, it's fun and easy to put certain kinds of things together quickly. But each time I do a project with it, the first thing I ask myself is "is this going to be fast enough?" If not I'll use something else.

fuzztester · 2024-07-12T23:42:17 1720827737

[flagged]

fuzztester · 2024-07-12T23:44:33 1720827873

https://news.ycombinator.com/item?id=40949956

fuzztester · 2024-07-13T01:58:11 1720835891

[flagged]

jacob019 · 2024-07-13T02:19:21 1720837161

Bashing Python in Python development thread is not tactful.

fuzztester · 2024-07-13T05:11:58 1720847518

it may not be tactful but it sure is factful (even if that last word is ungrammatical, at least the sentence is poetical). he he he.

jfc. guido knows ;), this thread is getting weirder and weirder, creepier and creepier.

are you literally implying that I should lie about known facts about python slowness?

"tactful" my foot.

then I guess the creators of PyPy and Unladen Swallow (the latter project was by Google) were/are not being tactful either, because those were two very prominent projects to speed up Python, in other words, to reduce its very well known slowness.

There is/was also Cython and Pyston (the latter at Dropbox, where Guido (GvR, Python creator) worked for a while. Those were or are also projects to speed up Python program execution.

https://en.m.wikipedia.org/wiki/PyPy

https://peps.python.org/pep-3146/

Excerpt from the section titled "Rationale, Implementation" from the above link (italics mine):

[

Many companies and individuals would like Python to be faster, to enable its use in more projects. Google is one such company.

Unladen Swallow is a Google-sponsored branch of CPython, initiated to improve the performance of Google’s numerous Python libraries, tools and applications. To make the adoption of Unladen Swallow as easy as possible, the project initially aimed at four goals:

    A performance improvement of 5x over the baseline of CPython 2.6.4 for single-threaded code.
    100% source compatibility with valid CPython 2.6 applications.
    100% source compatibility with valid CPython 2.6 C extension modules.
    Design for eventual merger back into CPython.

]

Your honor, I rest my case.

jacob019 · 2024-07-13T17:44:58 1720892698

tactful - showing skill and sensitivity in dealing with people. Right or wrong, your comments do not contribute to the discussion. Open source development is about collaboration. Your irrelevant comments are disrespectful to all the hard working developers who have been working together to push the limits of what is possible with what may be the most widely used programming language of our time. I use Python for it's ergonomics and community, not for speed. Removing the GIL has been something we've been yearning for since the Python 2 days. And the speed has improved drastically in recent years with each release. You don't win a prize for being right.

fuzztester · 2024-07-18T01:43:11 1721266991

>tactful - showing skill and sensitivity in dealing with people. Right or wrong, your comments do not contribute to the discussion.

just take a look at the depth of your incredibly rotten stupidity and ugliness of mind, you fucktard and dotard (dotard means something like a senile person, in case you didn't know):

hn user jacob019, you little mofo:

you say that "Right or wrong, your comments do not contribute to the discussion."

that itself is a contradiction in terms.

so you are saying that my comments do not contribute to the discussion even if i am right?

(what an utter fool and liar and swine you are.)

then why should i and anyone else think that your comments contribute to the discussion when they so totally fucking wrong, biased, shitheaded, pissheaded, and other choice epithets?

comgrats, you have really jumped the shark in terms of being a creep.

enjoy your miserable life for what it is worth, you worm.

jacob019 · 2024-07-22T21:43:40 1721684620

Seems I hit a nerve. We never stop growing. Good luck to you friend.

fuzztester · 2024-07-13T05:57:12 1720850232

if Google wanted a performance increase of 5x over the then existing python, I guess we can safely say that Python was slow, amirite. and yes, I know the version number mentioned, and what it is today.

fuzztester · 2024-07-13T01:55:43 1720835743

[flagged]

jacob019 · 2024-07-13T02:13:40 1720836820

It's very unpopular to mention Perl, but I did many cool things with it back in the day, and it still holds a special place for me. Perl taught me the power of regex--it's really first class in Perl. I still have some Perl code in production today. But to be fair, it is really easy to write spaghetti in Perl if you don't know what you're doing.

fuzztester · 2024-07-13T05:52:35 1720849955

don't worry about unpopularity, bro. worry about being true. the rest will take care of itself. if not, you are in the wrong company, forum, or place, and better to work on getting out of there.

fuzztester · 2024-07-15T00:15:43 1721002543

> But to be fair, it is really easy to write spaghetti in Perl if you don't know what you're doing.

bro, you really need to educate yourself some more, if you say things like that.

baloney!

first of all, perl is already spaghetti. I mean, it has all those curlicues, aka sigils. perl devs love it that way.

second of all, that point about being able to write spaghetti is not unique to perl.

many people do it in many languages.

in fact it is common here on hn to see the statement "you can write fortran in any language".

dr_kiszonka · 2024-07-13T04:43:54 1720845834

One day your ISP emails you that they increased your upload and download speeds. Sweet, right? Same here with Python.

fuzztester · 2024-07-13T05:00:31 1720846831

[flagged]

fuzztester · 2024-07-13T05:54:12 1720850052

https://news.ycombinator.com/item?id=40951145

saagarjha · 2024-07-13T03:56:26 1720842986

They’re probably downvoting you because you’ve posted like the laziest trope comment there is. lol Python slow everyone is paid to hide the truth amirite

fuzztester · 2024-07-13T05:39:53 1720849193

[flagged]

porksoda · 2024-07-13T10:27:54 1720866474

Your comments feel like you want a reddit style battle of wits. You throw words like ignorant around rather freely. I downvote these comments because they lower the tone of the discussion. We're not here to talk about each other or feel smarter than others.. Well I'm not.

fuzztester · 2024-07-15T00:36:54 1721003814

I don't give a flying fuck about your feelings or your opinions about my comments, just as you should not about mine. i am a free man. you should try to be one too. we can talk to each other, but that does not mean that either of us has to believe or be convinced by what the other person says. is this not fuckingly blindlingly obvious to you?

if not, you have a serious perception problem.

yes, I do throw around words, just like anyone else, but not freely, instead, I do that after at least some thought, which I do not see happening in the people whom I replied to, nor in your comment above.

fuzztester · 2024-07-13T04:49:41 1720846181

[flagged]

geertj · 2024-07-13T05:38:37 1720849117

You might be getting downvoted because many people know Python is among the slower dynamic languages, and there are other reasons to use it. Speaking for myself, the reasons that make me reach for Python for some projects are the speed of development, large ecosystem of libraries, large developer pool, and pretty good tooling/IDE support.

fuzztester · 2024-07-13T06:03:17 1720850597

then it would be much preferable if they said so explicitly, although both of those are points I already know well. both are common knowledge, not just the first, i e. the slowness. the productivity and other benefits are common knowledge too.

downvoting is such a fucking dumb way of disagreeing. how does one know whether the downvote is because a person disapproves of or dislikes what one said or because they think what one said is wrong. no way to know. amirite? :)

wokwokwok · 2024-07-13T07:33:01 1720855981

> there is just a tremendous amount of performance that can be automatically unlocked with almost no incremental effort for so many organizations and projects

This just isn’t true.

This does not improve single threaded performance (it’s worse) and concurrent programming is already available.

This will make it less annoying to do concurrent processing.

It also makes everything slower (arguable where that ends up, currently significantly slower) overall.

This way over hyped.

At the end of the day this will be a change that (most likely) makes the existing workloads for everyone slightly slower and makes the lives of a few people a bit easier when they implement natively parallel processing like ML easier and better.

It’s an incremental win for the ML community, and a meaningless/slight loss for everyone else.

At the cost of a great. Deal. Of. Effort.

If you’re excited about it because of the hype and don’t really understand it, probably calm down.

Mostly likely, at the end of the day, it s a change that is totally meaningless to you, won’t really affect you other than making some libraries you use a bit faster, and others a bit slower.

Overall, your standard web application will run a bit slower as a result of it. You probably won’t notice.

Your data stack will run a bit faster. That’s nice.

That’s it.

Over hyped. 100%.

anwlamp · 2024-07-13T10:47:53 1720867673

Yes, good summary. My prediction is that free-threading will be the default at some point because one of the corporations that usurped Python-dev wants it.

The rest of us can live with arcane threading bugs and yet another split ecosystem. As I understand it, if a single C-extension opts for the GIL, the GIL will be enabled.

Of course the invitation to experiment is meaningless. CPython is run by corporations, many excellent developers have left and people will not have any influence on the outcome.

pansa2 · 2024-07-13T10:54:43 1720868083

> one of the corporations that usurped Python-dev

Man, that phrase perfectly encapsulates so much of Python’s evolution over the last ~10 years.

gomizari · 2024-07-14T10:11:13 1720951873

Just Python evolution?

Uptrenda · 2024-07-13T22:02:30 1720908150

Why would it make single threaded performance slower? Sorry, but that's kind of ridiculous. You're just making shit up at this point.

dragonwriter · 2024-07-15T00:11:32 1721002292

Removing the GIL requires operations that are protected by it when it exists to be made thread safe in ways which they don't need to be with it, which has some overhead. Even having multiple options from which the situationally correct one (e.g., a lighter-weight one in guaranteed single-threaded or GIL active cases) can automatically be selected has some overhead. (Conceptually, you could have separate implementations that can be selected by the programmer with zero runtime overhead where not needed, but that also has more conceptual/developer overhead, so other than particular things where the runtime cost is found to really bite in practice, that's probably not going to be a common approach for core libraries that might need to be called either way. There's no free lunch here.

QkdhagA · 2024-07-13T22:29:30 1720909770

What is "it"?

If you assume two completely separate implementations where there is an #ifdef every 10 lines and atomics and locking only occur with --disable-gil, there is no slowdown for the --enable-gil build.

I don't think that is entirely the case though!

If the --enable-gil build becomes the default in the future, then peer pressure and packaging discipline will force everyone to use it. Then you have the OBVIOUS slowdown of atomics and of locking the reference counting and in other places.

The advertised figures were around 20%, which would be offset by minor speedups in other areas. But if you compare against Python 3.8, for instance, the slowdowns are still there (i.e., not offset by anything). Further down on the second page of this discussion numbers of 30-40% have been measured by the submitter of this blog post.

Actual benchmarks of Python tend to be suppressed or downvoted, so they are not on the first page. The Java HotSpot VM had a similar policy that forbid benchmarks.

wokwokwok · 2024-07-15T02:29:55 1721010595

https://news.ycombinator.com/item?id=40949564

^ read. The OP responds in the thread.

tldr, literally what I said:

> It also makes everything slower (arguable where that ends up, currently significantly slower) overall.

longer version:

If there was no reason for it to be slower, it would not be slower.

...but, implementing this stuff is hard.

Doing a zero cost implementation is really hard.

It is slower.

Where it ends up eventually is still a 'hm... we'll see'.

To be fair, they didn't lead the article here with:

> Right now there is a significant single-threaded performance cost. Somewhere from 30-50%.

They should have, because now people have a misguided idea of what this wip release is... and that's not ideal; because if you install it, you'll find its slow as balls; and that's not really the message they were trying to put out with this release. This release was about being technically correct.

...but, it is slow as balls right now, and I'm not making that shit up. Try it yourself.

/shrug

quietbritishjim · 2024-07-13T17:09:09 1720890549

If you're worried about performance then much of your CPU time is probably spent in a C extension (e.g. numpy, scipy, opencv, etc.). Those all release the GIL so already allow parallelisation in multiple threads. That even includes many functions in the standard library (e.g. sqlite3, zip/unzip). I've used multiple threads in Python for many years and never needed to break into multiprocessing.

But, for sure, nogil will be good for those workloads written in pure Python (though I've personally never been affected by that).

Demiurge · 2024-07-13T04:42:39 1720845759

Massive overhead of multiprocessing? How have I not noticed this for tens of years?

I use coroutines and multiprocessing all the time, and saturate every core and all the IO, as needed. I use numpy, pandas, xarray, pytorch, etc.

How did this terrible GIL overhead completely went unnoticed?

viraptor · 2024-07-13T04:50:46 1720846246

> I use numpy, pandas, xarray, pytorch, etc.

That means your code is using python as glue and you do most of your work completely outside of cPython. That's why you don't see the impact - those libraries drop GIL when you use them, so there's much less overhead.

quietbritishjim · 2024-07-13T19:53:57 1720900437

The parent commenter said they're using the multiprocessing module, so it's irrelevant to them whether those modules drop the GIL (except for the fact that they are missing an opportunity to using threading instead). The overhead being referred to, whether significant or not, is that of spawning processes and doing IPC.

coldtea · 2024-07-14T00:59:21 1720918761

>using simple threads instead of dealing with the massive overhead and complexity and bugs of using something like multiprocessing

I've never heard threading described as "simple", even less so as simpler than multiprocessing.

Threads means synchronization issues, shared memory, locking, and other complexities.

quotemstr · 2024-07-13T03:38:52 1720841932

What about the pessimization of single-threaded workloads? I'm still not convinced a completely free-threaded Python is better overall than a multi-interpreter, separate-GIL model with explicit instead of implicit parallelism.

Everyone wants parallelism in Python. Removing the GIL isn't the only way to get it.

Galanwe · 2024-07-13T14:35:36 1720881336

> It's going to be amazing to saturate all the cores on a big machine using simple threads instead of dealing with the massive overhead and complexity and bugs of using something like multiprocessing.

I'm saturating 192cpu / 1.5TBram machines with no headache and straightforward multiprocessing. I really don't see what multithreading will bring more.

What are these massive overheads / complexity / bugs you're talking about ?

saurik · 2024-07-12T23:54:50 1720828490

FWIW, I think the concern though is/was that for most of us who aren't doing shared-data multiprocessing this is going to make Python even slower; maybe they figured out how to avoid that?

eigenvalue · 2024-07-13T02:45:32 1720838732

Pretty sure they offset any possible slowdowns by doing heroic optimizations in other parts of CPython. There was even some talk about keeping just those optimizations and leaving the GIL in place, but fortunately they went for the full GILectomy.

simonw · 2024-07-13T00:15:56 1720829756

I got this working on macOS and wrote up some notes on the installation process and a short script I wrote to demonstrate how it differs from non-free-threaded Python: https://til.simonwillison.net/python/trying-free-threaded-py...

vanous · 2024-07-13T05:01:29 1720846889

Thanks for the example and explanations Simon!

nine_k · 2024-07-12T20:57:40 1720817860

Python 3 progress so far:

  [x] Async.
  [x] Optional static typing.
  [x] Threading.
  [ ] JIT.
  [ ] Efficient dependency management.

janice1999 · 2024-07-12T21:02:20 1720818140

Not sure what this list means, there are successful languages without these feature. Also Python 3.13 [1] has an optional JIT [2], disabled by default.

[1] https://docs.python.org/3.13/whatsnew/3.13.html

[2] https://peps.python.org/pep-0744/

jolux · 2024-07-12T21:08:10 1720818490

The successful languages without efficient dependency management are painful to manage dependencies in, though. I think Python should be shooting for a better package management user experience than C++.

yosefk · 2024-07-12T21:13:37 1720818817

If Python's dependency management is better than anything, it's better than C++'s. Python has pip and venv. C++ has nothing (you could say less than nothing since you also have ample opportunity for inconsistent build due to mismatching #defines as well as using the wrong binaries for your .h files and nothing remotely like type-safe linkage to mitigate human error. It also has an infinite number of build systems where each system of makefiles or cmakefiles is its own build system with its own conventions and features). In fact python is the best dependency management system for C++ code when you can get binaries build from C++ via pip install...

wiseowise · 2024-07-12T22:01:37 1720821697

> If Python's dependency management is better than anything, it's better than C++'s.

That’s like the lowest possible bar to clear.

yosefk · 2024-07-13T08:00:24 1720857624

Agreed, but that was the bar set by the comment I was replying to, which claimed Python doesn't clear it.

vulnbludog · 2024-07-13T09:30:40 1720863040

On bro

andmkl · 2024-07-13T00:04:31 1720829071

C++ has apt-get etc. because the libraries do not change all the time. Also, of course there are vcpkg and conan.

Whenever you try to build something via pip, the build will invariably fail. The times that NumPy built from source from PyPI are long over. In fact, at least 50% of attempted package builds fail.

The alternative of binary wheels is flaky.

viraptor · 2024-07-13T05:11:54 1720847514

> C++ has apt-get

That's not a development dependency manager. System package management is a different kind of issue, even if there's a bit of overlap.

> because the libraries do not change all the time

That's not true in practice. Spend enough time with larger projects or do some software packaging and you'll learn that the pain is everywhere.

stavros · 2024-07-12T23:07:39 1720825659

That was the entire point, that C++ is the absolute worst.

dgfitz · 2024-07-13T01:02:58 1720832578

I pip3 installed something today. It didn’t work, at all.

I then yum installed a lib and headers, it worked well.

C++ on an msft platform is the worst. I can’t speak for Mac. C++ on a linux is quite pleasant. Feels like most of the comments like yours are biased for un-stated reasons.

viraptor · 2024-07-13T05:15:41 1720847741

This has nothing to do with languages. You can yum install python packages and expect them to work fine. You can install C++ files using an actual dependency manager like vcpkg or conan and have issues.

You're pointing out differences between software package management styles, not languages.

ansgri · 2024-07-14T12:31:01 1720960261

C++ on linux is indeed pleasant if you use only distro-provided library versions. Some specialized library with specific version, also no big deal. Need some upgraded version of a widely-used library -- get containerized or prepare for real pain.

stavros · 2024-07-13T01:05:51 1720832751

If I had a penny for every time I gave up on compiling C++ software because there's no way to know what dependencies it needs, I'd be a millionaire. Python at least lists them.

ahartmetz · 2024-07-13T08:40:42 1720860042

Is that because the compiler failed with "foo.h not found" or the build system said "libfoo not found"? CMake is most common and it will tell you. Worst case it's difficult to derive the package name from the name in the diagnostic.

It's not great, but usually not a big deal neither IME. Typically a couple of minutes to e.g. find that required libSDL2 addon module or whatever, if there is that kind of problem at all.

stavros · 2024-07-13T09:53:49 1720864429

Yes it is, and it's usually such a big deal for me that I just don't use that software. I don't have time to go through a loop of "what's the file name? What package is it in? Install, repeat". This is by far the worst experience I've had with any language. Python has been a breeze in comparison.

dgfitz · 2024-07-13T13:07:59 1720876079

I’m not going to refute your points. If you’re going to wear rose-tinted glasses about all of the bad parts about python, that’s fine, I also like python.

stavros · 2024-07-13T13:08:54 1720876134

What's rose-tinted about "one of them downloads dependencies automatically, the other one doesn't"?

cozzyd · 2024-07-13T20:44:55 1720903495

I mean this is a documentation problem. It's pretty common for python to import something it doesn't say it depends on too, btw...

dgfitz · 2024-07-13T01:35:43 1720834543

If I had a penny every time I heard something like that on sites like this, I’d be a billionaire :)

npalli · 2024-07-14T14:52:11 1720968731

Have you tried vcpkg on msft (work on linux and mac too btw)? I found it to be much better than pip3 and venv nonsense.

yupyupyups · 2024-07-13T01:39:12 1720834752

Mac has the Brew project, which is sort of like apt-get or yum.

__MatrixMan__ · 2024-07-12T23:01:04 1720825264

Python's dependency management sucks because they're audacious enough to attempt packaging non-python dependencies. People always bring Maven up as a system that got it right, but Maven only does JVM things.

I think the real solution here is to just only use python dependency management for python things and to use something like nix for everything else.

adgjlsfhk1 · 2024-07-12T23:15:21 1720826121

Julia's package manager (for one) works great and can manage non Julia packages. the problem with python's system is that rejecting semver makes writing a package manager basically impossible since there is no way to automatically resolve packages.

woodruffw · 2024-07-13T00:57:36 1720832256

Could you clarify what you mean? pip and every other Python package installer is absolutely doing automatic package resolution, and the standard (PEP 440) dependency operators include a compatible version operator (~=) that's predicated on SemVer-style version behavior.

kaba0 · 2024-07-14T17:48:22 1720979302

There is no way to manage non-hosted dependencies, though, in a cross-platform way. Something attempting it is often worse than nothing, on a distro that has different assumptions — e.g. every package manager that downloads a dynamic executable will fail on NixOS, and gives no easy way to hook into how those get executed.

__MatrixMan__ · 2024-07-15T17:38:59 1721065139

I agree that attempting it is worse than nothing, because you now have expectations that may fail at awkward times. But they've gone an done it so here we are.

NixOS is a stark contrast to Python here. It makes things that can't be done deterministically difficult to do at all. Maybe this sounds extreme from the outside, but I'd rather be warned off from that dependency as soon as I attempt to use it, rather than years later when I get a user or contributor than can't make it work for some environmental reason I didn't forsee and now everything rests on finding some hacky way to make it work.

If Nix can be used to solve Python's packaging problems, participating packages will have to practice the same kind of avoidance (or put in the work to fix such hazards up front). I'm not sure if the wider python community is willing to do that, but as someone who writes a lot of python myself and wants it to not be painful down the line, I am.

pletnes · 2024-07-13T11:09:25 1720868965

This is what we used to have and it was much worse. Source: lived that life 10-15 y ago.

__MatrixMan__ · 2024-07-13T15:29:58 1720884598

15y ago I was using apt-get to manage my c++ dependencies with no way of keeping track of which dependency went with which project. It was indeed pretty awful.

Now when I cd into a project, direnv + nix notices the dependencies that that project needs and makes them available, whatever their language. When I cd into a different project, I get an entirely different set of dependencies. There's pretty much nothing installed with system scope. Just a shell and an editor.

Both of these are language agnostic, but the level of encapsulation is quite different and one is much better than that other. (There are still plenty of problems, but they can be fixed with a commit instead of a change of habit.)

The idea that every language needs a different package manager and that each of those needs to package everything that might my useful when called from that language whether or not it is written in that language... It just doesn't scale.

pletnes · 2024-07-14T16:49:40 1720975780

Valid points. I was also thinking about apt install-ing some C library, try to pip install a package, have it fail to build, look for headers/debug packages in apt, set LD_LIBRARY… you know. It hurts, a lot.

And yet, python is a fantastic language because it’s the remote control to do the heavy, complex, low-level, high-performance stuff with relative ease.

__MatrixMan__ · 2024-07-14T17:42:38 1720978958

I agree. I want to help save python from its packaging problems because it has so much traction out there (for good reason). People in this forum might be ok jumping ship to rust or whatever but most python users are students, scientists, business types... They're not going to learn rust and I don't think we benefit from leaving them behind.

I wish there was a standard interface that tools like pip could use to express their non-python needs such that some other tool can meet those needs and then hand the baton back to pip once they are met. Poetry2nix is an example of such a collaboration. (I'm not trying to be a nix maximalist here, it's just that it's familiar).

The python community is large enough to attempt to go it alone, but many other language communities are not. I think we'd see faster language evolution if we asked less of them from a packaging perspective:

> focus on making the language great. Provide a way to package deps in that language, and use _____ to ask for everything else.

Apt and brew and snap and whatever else (docker? K8s?) could be taught to handle such requests in a non alter-your-whole-system kind of way.

Galanwe · 2024-07-12T21:27:45 1720819665

Not sure this is still a valid critic of Python in 2024.

Between pip, poetry and pyproject.toml, things are now quite good IMHO.

arp242 · 2024-07-12T22:15:49 1720822549

I guess that depends from your perspective. I'm not a Python developer, but like many people I do want to run Python programs from time to time.

I don't really know Rust, or Cargo, but I never have trouble building any Rust program: "cargo build [--release]" is all I need to know. Easy. Even many C programs are actually quite easy: "./configure", "make", and optionally "make install". "./configure" has a nice "--help". There is a lot to be said about the ugly generated autotools soup, but the UX for people just wanting to build/run it without in-depth knowledge of the system is actually quite decent. cmake is a regression here.

With Python, "pip install" gives me an entire screen full of errors about venv and "externally managed" and whatnot. I don't care. I just want to run it. I don't want a bunch of venvs, I just want to install or run the damn program. I've taken to just use "pip install --break-system-packages", which installs to ~/.local. It works shrug.

Last time I wanted to just run a project with a few small modifications I had a hard time. I ended up just editing ~/.local/lib/python/[...] Again, it worked so whatever.

All of this is really where Python and some other languages/build systems fail. Many people running this are not $language_x programmers or experts, and I don't want to read up on every system I come across. That's not a reasonable demand.

Any system that doesn't allow non-users of that language to use it in simple easy steps needs work. Python's system is one such system.

simonw · 2024-07-12T22:34:27 1720823667

"I don't want a bunch of venvs"

That's your problem right there.

Virtual environments are the Python ecosystem's solution to the problem of wanting to install different things on the same machine that have different conflicting requirements.

If you refuse to use virtual environments and you install more than one separate Python project you're going to run into conflicting requirements and it's going to suck.

Have you tried pipx? If you're just installing Python tools (and not hacking on them yourself) it's fantastic - it manages separate virtual environments for each of your installations without you having to think about them (or even know what a virtual environment is).

arp242 · 2024-07-12T22:57:35 1720825055

Managing a farm of virtualenvs and mucking about with my PATH doesn't address the user-installable problem at all. And it seems there's a new tool to try every few months that really will fix all problems this time.

And maybe if you're a Python developer working on the code every day that's all brilliant. But most people aren't Python developers, and I just want to try that "Show HN" project or whatnot.

Give me a single command I can run. Always. For any project. And that always works. If you don't have that then your build system needs work.

simonw · 2024-07-13T00:31:36 1720830696

"Give me a single command I can run. Always. For any project. And that always works."

    pipx install X

arp242 · 2024-07-13T07:35:14 1720856114

Right so; I'll try that next time. Thanks. I just go by the very prominent "pip install X" on every pypi page (as well as "pip install .." in many READMEs).

simonw · 2024-07-13T14:06:13 1720879573

Yeah, totally understand that - pipx is still pretty poorly known by people who are active in Python development!

A few of my READMEs start like this: https://github.com/simonw/paginate-json?tab=readme-ov-file#i...

    ## Installation

    pip install paginate-json

    Or use pipx (link to pipx site)

    pipx install paginate-json

But I checked and actually most them still don't even mention it. I'll be fixing that in the future.

arp242 · 2024-07-15T01:08:35 1721005715

Out of the 3 things I want to install 2 don't work. Both of these seem bugs in pipx so I reported one, but the feedback was borderline hostile and it ended up being closed with "unclear what you want". I'm not even going to bother reporting the other bug.

So whatever the goals are, it doesn't really work. And in general pipx does not strike me as a serious project.

pletnes · 2024-07-13T12:19:24 1720873164

Pipx is great! Although, I always seem to have to set up PATH, at least on windows?

zo1 · 2024-07-13T10:56:09 1720868169

I could say the exact same stuff about NodeJs, c++, go, rust, php, etc. All of these are easy to use and debug and "install easily" when you know them and use them regularly, and the opposite if you're new. Doubly-so if you personally don't like that language or have some personal pet peeve about it's choices.

Guys let's not pretend like this is somehow unique to python. It's only until about a few years ago that it was incredibly difficult to install and use npm on windows. Arguably the language ecosystem with the most cumulative hipster-dev hours thrown at it, and it still was a horrible "dev experience".

lordfrikk · 2024-07-16T08:10:08 1721117408

That does not match my experience. I've been working with Python for a year or so and the packaging problems come up every now and then still.

I've installed/built a few packages written in Go and Rust specifically and had no problems.

Ringz · 2024-07-13T05:27:28 1720848448

That single command is pipx.

imtringued · 2024-07-13T18:02:10 1720893730

Pythons venvs are a problem to the solution of solving the dependency problem. Consider the following: it is not possible to relocate venvs. In what universe does this make sense? Consider a C++ or Rust binary that would only run when it is placed in /home/simonw/.

guhidalg · 2024-07-12T23:00:11 1720825211

Normal users who just want to run some code shouldn't need to learn why they need a venv or any of its alternatives. Normal users just want to download a package and run some code without having to think about interfering with other packages. Many programming languages package managers give them that UX and you can't blame them for expecting that from Python. The added step of having to think about venvs with Python is not good. It is a non-trivial system that every single Python user is forced to learn, understand, and the continually remember every time they switch from one project to another.

nine_k · 2024-07-13T00:21:18 1720830078

This is correct. The whole application installation process, including the creation of a venv, installing stuff into it, and registering it with some on-PATH launcher should be one command.

BTW pyenv comes relatively close.

simonw · 2024-07-13T00:32:17 1720830737

I agree with that. Until we solve that larger problem, people need to learn to use virtual environments, or at least learn to install Python tools using pipx.

VagabundoP · 2024-07-13T19:32:51 1720899171

sudo apt install pipx

pipx install package_name

Takes care of the venv and the script/app path is added to system path.

kallapy · 2024-07-13T11:55:21 1720871721

I reject the virtual environments and have no issues. On an untrusted machine (see e.g. the recent token leak):

  /a/bin/python3 -m pip install foo
  /b/bin/python3 -m pip install bar

The whole venv thing is overblown but a fertile source for blogs and discussions. If C-extensions link to installed libraries in site-packages, of course they should use RPATH.

sgarland · 2024-07-12T23:31:47 1720827107

This is mostly a curse of Python’s popularity. The reason you can’t pip install with system Python is that it can break things, and when your system is relying on Python to run various tools, that can’t be allowed. No one (sane) is building OS-level scripts with Node.

The simplest answer, IMO, is to download the Python source code, build it, and then run make altinstall. It’ll install in parallel with system Python, and you can then alias the new executable path so you no longer have to think about it. Assuming you already have gcc’s tool chain installed, it takes roughly 10-15 minutes to build. Not a big deal.

VagabundoP · 2024-07-13T19:30:32 1720899032

Its more probable that you are trying to install the deps in the system python. And using pip instal xxxxx -u will install them in your user directory rather than the system. I'm pretty sure modern Ubuntu warns you against doing that now anyway.

If you're installing for a small script then doing python -m venv little_project in you home dir is straightforward, just active it after [1]

I'm using rye[2] now and its very similar to Rust's Cargo, it wraps a bunch of the standard toolchain and manages standalone python versions in the background, so doesn't fall into the trap of linux system python issues.

[1]https://docs.python.org/3/library/venv.html [2]https://rye.astral.sh/

Galanwe · 2024-07-13T05:21:01 1720848061

Maybe I am biased, because I learned these things so long ago and I don't realize that it's a pain to learn. But what exactly is so confusing about virtualenvs ?

They really not that different from any other packaging system like JS or Rust. The only difference is instead of relying on your current directory to find the the libraries / binaries (and thus requiring you to wrap binaries call with some wrapper to search in a specific path), they rely on you sourcing an `activate` script. That's really just it.

Create a Virtualenv:

    $ virtualenv myenv

Activate it, now it is added to your $PATH:

    $ . myenv/bin/activate

There really is nothing more in the normal case.

If you don't want to have to remember it, create a global Virtualenv somewhere, source it's activate in your .bashrc, and forget it ever existed.

imtringued · 2024-07-13T18:04:22 1720893862

Only python demands you to source an activation script before doing anything.

Galanwe · 2024-07-14T05:18:47 1720934327

Yes, though just to illustrate that it's a matter of taste, I do prefer the solution of virtualenvs requiring to source a script that append to PATH, rather than a solution requiring the use of a wrapper that executes in its PATH.

I never remember how to run Javascript binaries. Is it npm run ? npm run script ? npx ? I always end up running the links in node_modules/bin

vhcr · 2024-07-12T23:05:11 1720825511

Do you have a problem with Node.js too because it creates a node_modules folder, or is the problem that it is not handled automatically?

arp242 · 2024-07-12T23:16:46 1720826206

I don't care about the internals. I care about "just" being able to run it.

I find that most JS projects work fairly well: "npm install" maybe followed by "npm run build" or the like. This isn't enforced by npm and I don't think npm is perfect here, but practical speaking as a non-JS dev just wanting to run some JS projects: it works fairly well for almost all JS projects I've wanted to run in the last five years or so.

A "run_me.py" that would *Just Work™" is fine. I don't overly care what it does internally as long as it's not hugely slow or depends on anything other than "python". Ideally this should be consistent throughout the ecosystem.

To be honest I can't imagine shipping any project intended to be run by users and not have a simple, fool-proof, and low-effort way of running it by anyone of any skill level, which doesn't depend on any real knowledge of the language.

sgarland · 2024-07-12T23:35:09 1720827309

> To be honest I can't imagine shipping any project intended to be run by users and not have a simple, fool-proof, and low-effort way of running it by anyone of any skill level, which doesn't depend on any real knowledge of the language.

This is how we got GH Issues full of inane comments, and blogs from mediocre devs recommending things they know nothing about.

I see nothing wrong with not catering to the lowest common denominator.

arp242 · 2024-07-12T23:49:14 1720828154

Like people with actual lives to live and useful stuff to do that's not learning about and hand-holding a dozen different half-baked build systems.

But sure, keep up the cynical illusion that everyone is an idiot if that's what you need to go through life.

sgarland · 2024-07-13T00:02:27 1720828947

I didn’t say that everyone is an idiot. I implied that gate keeping is useful as a first pass against people who are unlikely to have the drive to keep going when they experience difficulty.

When I was a kid, docs were literally a book. If you asked for help and didn’t cite what you had already tried / read, you’d be told to RTFM.

Python has several problems. Its relative import system is deranged, packaging is a mess, and yes, on its face needing to run a parallel copy of the interpreter to pip install something is absurd. I still love it. It’s baked into every *nix distro, a REPL is a command away, and its syntax is intuitive.

I maintain that the relative ease of JS – and more powerfully, Node – has created a monstrous ecosystem of poorly written software, with its adherents jumping to the latest shiny every few months because this time, it’s different. And I _like_ JS (as a frontend language).

pletnes · 2024-07-13T11:10:53 1720869053

This is the truth right here. The issues are with people using (not officially) deprecated tools and workflows, plus various half baked scripts that solved some narrow use cases.

Arcanum-XIII · 2024-07-12T21:36:55 1720820215

All is well, then, one day, you have to update one library.

Some days later, in some woods or cave, people will hear your screams of rage and despair.

Galanwe · 2024-07-12T21:45:15 1720820715

Been using python for 15 years now, and these screams were never heard.

Dev/test with relaxed pip installs, freeze deployment dependencies with pip freeze/pip-tools/poetry/whateveryoulike, and what's the problem?

neeleshs · 2024-07-12T22:19:35 1720822775

same here. Been using python/pip for 10+ years and this was never a problem. In the java world, there is jar hell, but it was never a crippling issue, but a minor annoyance once a year or so.

In general, is dependency management such a massive problem it is made to be on HN? Maybe people here are doing far more complex/different things than I've done in the past 20 years

mixmastamyk · 2024-07-13T02:10:03 1720836603

Guessing that folks who write such things are lacking sysad skills like manipulating paths, etc.

It does take Python expertise to fix other issues on occasion but they are fixable. Why I think flags like ‘pip —break-system-packages’ are silly. It’s an optimization for non-users over experienced ones.

est · 2024-07-12T21:48:07 1720820887

Deps in CPython are more about .so/.dll problem, not much can be done since stuff happens outside python itself.

galdosdi · 2024-07-13T18:03:00 1720893780

The shitshow that is python tooling is one of the reasons I prefer java jobs to python jobs when I can help it. Java got this pretty right years and years and years earlier. Why are python and javascript continuing to horse around playing games?

whoiscroberts · 2024-07-13T02:27:15 1720837635

Optional static typing, not really. Those type hints are not used at runtime for performance. Type hint a var as a string then set it to an init, that code still gonna try to execute.

zarzavat · 2024-07-13T11:35:36 1720870536

> Those type hints are not used at runtime for performance.

This is not a requirement for a language to be statically typed. Static typing is about catching type errors before the code is run.

> Type hint a var as a string then set it to an int, that code still gonna try to execute.

But it will fail type checking, no?

mondrian · 2024-07-13T14:36:43 1720881403

The critique is that "static typing" is not really the right term to use, even if preceded by "optional". "Type hinting" or "gradual typing" maybe.

In static typing the types of variables don't change during execution.

zarzavat · 2024-07-13T15:15:22 1720883722

If there’s any checking of types before program runs then it’s static typing. Gradual typing is a form of static typing that allows you to apply static types to only part of the code.

I’m not sure what you mean by variables not changing types during execution in statically typed languages. In many statically typed languages variables don’t exist at runtime, they get mapped to registers or stack operations. Variables only exist at runtime in languages that have interpreters.

Aside from that, many statically typed languages have a way to declare dynamically typed variables, e.g. the dynamic keyword in C#. Or they have a way to declare a variable of a top type e.g. Object and then downcast.

neonsunset · 2024-07-13T15:46:13 1720885573

'dynamic' in C# is considered a design mistake and pretty much no codebase uses it.

On the other hand F# is much closer to the kind of gradual typing you are discussing.

mondrian · 2024-07-13T16:25:39 1720887939

Python is dynamically typed because it type-checks at runtime, regardless of annotations or what mypy said.

kaba0 · 2024-07-14T17:56:45 1720979805

Is java dynamically typed as well, then? It does reify generics, so only some part is type checked, what is the level at which it is statically/dynamically typed?

mondrian · 2024-07-20T12:18:58 1721477938

Yeah Java is partially dynamically typed. I think we can safely conclude that all practical languages are hybrids where some of the type checking is done by the compiler and some of the type checking is deferred to the runtime.

Then when we call a language statically typed we mean most of the type checking is usually done statically. Dynamic type checking is the exception, not the rule.

Even in dynamically typed languages like Python, some of the type checking may be done by an optimizer in the compilation stage. The runtime type check guarding some operations may be removed, because the optimizer decides it knows the types of the values involved once and for all.

kaba0 · 2024-07-14T17:54:19 1720979659

That’s not really “the definition”. Different type systems can express different properties about objects, and there are absolutely cases that something changes about a type.

E.g. in structural typing, adding a new field will change the type to a subtype. Will it make any structural typed language non-static?

VeejayRampay · 2024-07-12T21:45:10 1720820710

the efficient dependency management is coming, the good people of astral will take care of that with the uv-backed version of rye (initially created by Armin Ronacher with inspirations from Cargo), I'm really confident it'll be good like ruff and uv were good

noisy_boy · 2024-07-13T01:11:25 1720833085

rye's habit of insisting on creating a .venv per project is a deal-breaker. I don't want .venvs spread all over my projects eating into space (made worse by the ml/LLM related mega packages). It should atleast respect activated venvs.

nine_k · 2024-07-13T03:10:45 1720840245

A venv per project is a very sane way. Put them into the ignore file. Hopefully they also could live elsewhere in the tree.

VeejayRampay · 2024-07-13T11:10:42 1720869042

well that's good for you, but you're in the minority and rye will end up being a standard anyway, just like uv and ruff, because they're just so much better than the alternatives

fastasucan · 2024-07-16T09:06:39 1721120799

I think uv's use of a global cache means that having several .venv with the same packages is less of a problem.

GTP · 2024-07-12T21:14:13 1720818853

I don't get how this optional static typing works. I had a quick look at [1], and it begins with a note saying that Python's runtime doesn't enforce types, leaving the impression that you need to use third-party tools to do actual type checking. But then it continues just like Python does the check. Consider that I'm not a Python programmer, but the main reason I stay away from it is the lack of a proper type system. If this is going to change, I might reconsider it.

[1] https://docs.python.org/3/library/typing.html

sveiss · 2024-07-12T21:21:36 1720819296

The parser supports the type hint syntax, and the standard library provides various type hint related objects.

So you can do things like “from typing import Optional” to bring Optional into scope, and then annotate a function with -> Optional[int] to indicate it returns None or an int.

Unlike a system using special comments for type hints, the interpreter will complain if you make a typo in the word Optional or don’t bring it into scope.

But the interpreter doesn’t do anything else; if you actually return a string from that annotated function it won’t complain.

You need an external third party tool like MyPy or Pyre to consume the hint information and produce warnings.

In practice it’s quite usable, so long as you have CI enforcing the type system. You can gradually add types to an existing code base, and IDEs can use the hint information to support code navigation and error highlighting.

quotemstr · 2024-07-13T03:48:04 1720842484

> In practice it’s quite usable

It would be super helpful if the interpreter had a type-enforcing mode though. All the various external runtime enforcement packages leave something to be desired.

setopt · 2024-07-13T10:35:38 1720866938

I agree. There are usable third-party runtime type checkers though. I like Beartype, which lets you add a decorator @beartype above any function or method, and it’ll complain at runtime if arguments or return values violate the type hints.

I think runtime type checking is in some ways a better fit for a highly dynamic language like Python than static type checking, although both are useful.