Async in Python, for a long time, has been a horrible hack relying on monkey pat...

PaulHoule · 2023-09-19T18:39:01.000000Z

I worked at a place where we had machine learning systems with a big pile of dependencies that pip could not consistently resolve. I figured out what most of the technical problems were, but I was still struggling with wetware problems, and they eventually put me on a Scala/TypeScript project instead.

One big problem is that pip just starts downloading and installing things optimistically. It does not get a global view of the dependencies, and if it finds a conflict it can't reliably back out from where it is and find a good configuration. The answer is to do what maven does or what conda does: download the dependency graph of all the matching versions and get a solve before you start downloading packages. Towards the end of my time on that project I had built something that assembled a "wheelhouse" of wheels necessary to run my system and would install them directly.
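To make "get a solve before you start downloading" concrete, here is a minimal backtracking sketch. It is not how pip, maven, or conda actually implement resolution; the `INDEX` data, the integer versions, and the `resolve` function are all made up for illustration.

```python
# Hypothetical in-memory index: package -> {version: {dependency: allowed versions}}.
# A real tool would build this from package metadata before fetching any package.
INDEX = {
    "app": {1: {"lib": {1, 2}, "tool": {2}}},
    "lib": {1: {}, 2: {"core": {2}}},
    "tool": {2: {"core": {1}}},
    "core": {1: {}, 2: {}},
}


def resolve(requirements, pinned=None):
    """Return {package: version} satisfying every requirement, or None.

    requirements is a list of (package, allowed_versions) pairs.
    """
    pinned = dict(pinned or {})
    if not requirements:
        return pinned
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in pinned:
        # Already chosen: either it is compatible, or this branch is dead.
        return resolve(rest, pinned) if pinned[name] in allowed else None
    for version in sorted(INDEX[name], reverse=True):  # prefer the newest
        if version not in allowed:
            continue
        deps = list(INDEX[name][version].items())
        result = resolve(rest + deps, {**pinned, name: version})
        if result is not None:
            return result
    return None  # no version works here, so the caller backtracks


plan = resolve([("app", {1})])
print(plan)
```

The newest `lib` (2) needs `core` 2 while `tool` needs `core` 1, so the search backs out of `lib` 2 and settles on `lib` 1. Nothing has been downloaded or installed at that point, which is the whole idea.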

What I figured out was that you could download just the dependency metadata from a wheel with 2 or 3 range requests: a wheel is just a ZIP file, so you can download the end-of-central-directory record and the central directory from the end of the file, learn where the metadata is, and download just that. Recently PyPI got some sense and now lets you download just the metadata.
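A rough sketch of that range-request trick, under some loud assumptions: `fetch(start, end)` is a hypothetical stand-in for an HTTP Range request (here it just slices bytes in memory), `read_wheel_metadata` is my own name, and ZIP64 archives are not handled. This is not the tool described above, only an illustration of the idea.

```python
import io
import struct
import zipfile
import zlib

EOCD = struct.Struct("<4sHHHHIIH")            # end-of-central-directory record (22 bytes)
CDIR = struct.Struct("<4sHHHHHHIIIHHHHHII")   # central directory file header (46 bytes)
LOCAL = struct.Struct("<4sHHHHHIIIHH")        # local file header (30 bytes)


def read_wheel_metadata(size, fetch):
    """Return the raw bytes of *.dist-info/METADATA using at most 3 fetches."""
    # Request 1: the tail of the file, which holds the end-of-central-directory
    # record (its trailing comment can be at most 65535 bytes long).
    tail_start = max(0, size - (EOCD.size + 65535))
    tail = fetch(tail_start, size)
    pos = tail.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("not a ZIP file")
    fields = EOCD.unpack_from(tail, pos)
    cd_size, cd_offset = fields[5], fields[6]

    # Request 2: the central directory, unless the tail already contains it.
    if cd_offset >= tail_start:
        cdir = tail[cd_offset - tail_start:cd_offset - tail_start + cd_size]
    else:
        cdir = fetch(cd_offset, cd_offset + cd_size)

    entries = []  # (name, compression method, compressed size, local header offset)
    i = 0
    while i + CDIR.size <= len(cdir):
        f = CDIR.unpack_from(cdir, i)
        if f[0] != b"PK\x01\x02":
            break
        name = cdir[i + CDIR.size:i + CDIR.size + f[10]].decode("utf-8")
        entries.append((name, f[4], f[8], f[16]))
        i += CDIR.size + f[10] + f[11] + f[12]  # skip name, extra field, comment

    meta = next((e for e in entries if e[0].endswith(".dist-info/METADATA")), None)
    if meta is None:
        raise ValueError("no METADATA entry found")
    _, method, comp_size, start = meta

    # Request 3: the member's local header plus data. The member ends before
    # the next local header, or before the central directory if it is last.
    end = min([e[3] for e in entries if e[3] > start] + [cd_offset])
    chunk = fetch(start, end)
    local = LOCAL.unpack_from(chunk, 0)
    data_start = LOCAL.size + local[9] + local[10]  # skip file name and extra field
    raw = chunk[data_start:data_start + comp_size]
    if method == 8:
        return zlib.decompress(raw, -15)  # raw deflate stream
    if method == 0:
        return raw  # stored without compression
    raise ValueError("unsupported compression method")


# Demo: build a tiny fake wheel in memory and count the simulated requests.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("demo/__init__.py", "")
    zf.writestr("demo-1.0.dist-info/METADATA",
                "Name: demo\nVersion: 1.0\nRequires-Dist: requests>=2\n")
blob = buf.getvalue()
requests_made = []


def fetch(start, end):
    requests_made.append((start, end))
    return blob[start:end]


metadata = read_wheel_metadata(len(blob), fetch).decode("utf-8")
print(metadata)
```

Against a real server, `fetch` would send a `Range: bytes=start-(end-1)` header, and you would probably ask for a much smaller tail first. The sketch grabs the worst-case tail to keep the code short.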

And that's the story of Python packaging. Things are really going in the right direction, but progress has been slow because the community has mistaken "98% correct" (i.e. wrong) for "has 98% of the features somebody might want". It might have been a lot better if somebody with some vision and no tolerance for ambiguity had gotten in charge a long time ago.

ptx · 2023-09-19T20:27:14.000000Z

A new dependency resolver was introduced [0] with pip 20.3 (in 2020) which sounds like it's meant to address the problem you're describing. Were you using an earlier version of pip or is this still a problem?

[0] https://pip.pypa.io/en/latest/user_guide/#changes-to-the-pip...

paulddraper · 2023-09-19T17:31:40.000000Z

> In Java, 99% of all library dependencies are pure JARs

Yes, this is the difference... The Python community has in practice chosen more native dependencies; Java has not. But Java JNI code (if you ever do have it) is just as painful.

> with Postgres being particularly painful

You want a pure Python package, and those have gotten much better.

asyncpg is really, really good if you want async.

Otherwise, pg8000.

sizeofchar · 2023-09-19T17:33:07.000000Z

It got better, with containers.

PaulHoule · 2023-09-19T18:28:03.000000Z

Where I worked containers just gave the data scientists superpowers at finding corrupted Python runtimes. I don't know where they got one that had a Hungarian default charset, but they did.