Awesome Python (github.com/vinta)
86 points by sonabinu on Feb 17, 2019 | hide | past | favorite | 18 comments



There's lots of great stuff in the Python community, lots of very mature, high quality packages.

As with all ecosystems, there's also rubbish, and there's certainly some rubbish on this list. It would be wrong to name packages, but it makes me question "Awesome X" lists, their intentions, the skill in curation behind them, and their usefulness to newcomers.

I would personally not use inclusion on an "Awesome X" list as a signal of quality. The lists have assumed a purely discovery based role for me, which is a shame, because the idea of a curated set of packages or tools for newcomers to an ecosystem is a great one.


Why would it be wrong for you to name packages? The maintainers of the list surely would appreciate objective criticism. FWIW, I've used many of the libs on the list and I wouldn't call any of them rubbish.


So, some of this is subjective, but most of it just comes from repeated bad experiences with the libraries. Feel free to take it with a grain of salt; I'm feeling a bit Sturgeon-y today.

Honestly, I think a lot of the complaints I have below are not so much against the libraries: I use a lot of the stuff below, often because it is the best available. But I think we too often kid ourselves about the states of these libraries, and ignore the ways in which they don't support us — we can "easily work around them", to the point where we might do so unconsciously, and we stop aspiring for better.

There are some … odd? … selections, e.g., ctypes, curses, logging, venv, pathlib, mimetypes, unittest, unittest.mock — these are just modules in the standard library. I suppose they serve a purpose, and perhaps linking to them by broad topic area allows someone unfamiliar with them a hook back into their own stdlib, but that seems like a Google query could do that too.

pipenv: This, someday, will be great, I hope. But recent releases have been essentially completely broken (which has reduced my trust in it), it doesn't completely implement its own spec for Pipfiles, and it still uses virtualenv when the newer venv would be more appropriate. (I also wish they'd adopted something like Cargo's syntax for version numbers. SemVer is a PITA in it, as you must manually write out the range.) Last, it sometimes fails to find packages matching the requirements in the Pipfile, even when solutions exist.
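To make the version-range complaint concrete: Cargo's caret syntax expands to a range automatically, while a Pipfile makes you spell the range out by hand. A toy expander, purely illustrative (it ignores Cargo's special-casing of leading-zero versions):

```python
# Cargo's "^1.2.3" is shorthand for ">=1.2.3,<2.0.0"; in a Pipfile you
# must write the long form yourself. Toy expansion of the caret shorthand:
def caret_to_range(spec: str) -> str:
    # pad with ".0.0" so "^2.20" is treated as "^2.20.0"
    major, minor, patch = (spec.lstrip("^") + ".0.0").split(".")[:3]
    return f">={major}.{minor}.{patch},<{int(major) + 1}.0.0"

print(caret_to_range("^2.20"))  # -> >=2.20.0,<3.0.0
```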

Pandas: I really think Pandas results in hard-to-maintain software. The library itself is a maintenance headache: it links to one and only one version of numpy, and will error if it detects a change there. We use a sort of internal PyPI index s.t. we can share binaries, and this causes whoever builds Pandas to also choose what version of Numpy is associated with that version of Pandas for all consumers. Pandas' main datatype "Dataframe" is about the most obfuscated type name I have ever hit — what is a dataframe? Dataframes share a lot of the same issues as lists-of-dicts: they allow the programmer to not define the structure of the data they're working with, which leads to subtly different/shifting/implicit types as data flows through a program, and consequently makes it harder to reason about, due to the lack of ontology/naming this creates.
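The implicit-typing issue is easy to demonstrate; assuming a reasonably recent pandas, one row with a missing key silently changes a column's dtype:

```python
import pandas as pd

# Rows are just dicts; nothing declares the schema up front.
df = pd.DataFrame([{"id": 1, "qty": 2}, {"id": 2}])

# The row missing "qty" becomes NaN, which silently promotes the whole
# column from int to float -- an implicit type shift that downstream
# code has to know about.
print(df["qty"].dtype)  # -> float64
```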

gevent: gevent made some sense in Python 2, before there was a standard async framework. But stuff written in it does not compose: you cannot compose two separate pieces of code that independently work w/ gevent: the combination of monkey patching and cooperative threading means all it takes is one piece of code to block and the whole program can grind to a halt. E.g., we recently hit a bug whereby calling sh (a library that launches and waits for a child process) in one thread caused the entire program to arrest. The "problem" is that sh makes two threads: one calls waitpid, and one interfaces w/ the child's stdin/stdout/stderr. The waitpid call is blocking even in gevent, but what was a separate background thread is now a "greenlet"; the waitpid call won't return until the child gets data on stdin and then runs, but the thread that supplies it is blocked by the waitpid: deadlock. It's not a sh bug (sh doesn't have any say over gevent) but gevent doesn't think it's a bug (waitpid is blocking, not our problem). The end result is that introducing gevent requires continually global reasoning about your program, and that is hard to do correctly.
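The deadlock pattern is easier to see in a toy model of cooperative scheduling, with plain generators standing in for greenlets (no gevent required): control only transfers at an explicit yield, so a task that blocks without yielding starves the very task it is waiting on:

```python
# Toy model of gevent-style cooperative scheduling: generators stand in
# for greenlets, and control only transfers at an explicit `yield`.
flag = {"set": False}

def waiter():
    # Like a blocking waitpid: busy-waits without ever yielding, so the
    # scheduler can never run the task that would satisfy it. (The spin
    # is capped so this demo terminates instead of hanging forever.)
    spins = 0
    while not flag["set"] and spins < 100_000:
        spins += 1
    yield "ok" if flag["set"] else "deadlocked"

def setter():
    # The task that would unblock `waiter` -- it never gets a turn.
    flag["set"] = True
    yield "flag set"

tasks = [waiter(), setter()]
print(next(tasks[0]))  # -> deadlocked
```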

SaltStack/Ansible/Fabric: Salt has issues around not returning results for commands run (it will time out on nodes, but not really report that). We also had issues w/ Salt falling over in largish environments, b/c large batches of VMs would overwhelm it w/ traffic, disconnect, and then start attempting to reconnect and form a thundering herd. Like many tools in its league (so, it is not alone here), it believes I am running a set of commands across a set of homogeneous instances, and that's that. Unfortunately, my needs are more complex, and sometimes VMs have interdependencies that I cannot express in these systems. (E.g., action A on node N1 depends on action B on node N2. Salt really only allows dependencies on actions within a node. Ansible is similar here too. Salt has some high-level support called "orchestration" IIRC to address this, but the model it uses is fundamentally wrong: I want to just build a dependency graph of actions that might run against different sets of nodes, or even cross nodes. Even with homogeneous nodes, sometimes I need to elect a master, e.g., due to initializing a Raft peer-set, and that means the rest of the nodes need to wait on that to happen and then join that new peer-set.) Also, YAML is a poor language for writing what really is a program in. I'd much rather have a real language like Python here. (Unfortunately, especially around how these programs model the problem, I feel like the whole space is mired in one broken way of thinking: that my machines are homogeneous and that I am going to run the same set of commands in parallel on them. They aren't.)
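The model being asked for, a dependency graph over (node, action) pairs, is easy to sketch with the stdlib's graphlib (Python 3.9+). This shows the desired semantics only; it is not something Salt or Ansible actually offers, and the node/action names are made up:

```python
from graphlib import TopologicalSorter

# Each key is a (node, action) pair; the value is the set of pairs it
# depends on. Note the cross-node edge: n1 joining the cluster depends
# on n2 having initialized the Raft peer-set first.
graph = {
    ("n2", "install"): set(),
    ("n1", "install"): set(),
    ("n2", "init_raft_master"): {("n2", "install")},
    ("n1", "join_cluster"): {("n2", "init_raft_master"), ("n1", "install")},
}

# A topological order gives a valid cross-node execution schedule.
order = list(TopologicalSorter(graph).static_order())
print(order)
```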

sh: I see this get used way too often, when a `subprocess.run` would be far more appropriate. It has some questionable defaults, like TTY allocation.
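For comparison, the stdlib alternative is short and explicit about its behavior, with no implicit TTY allocation:

```python
import subprocess

# subprocess.run covers most shell-out needs with no extra dependency.
result = subprocess.run(
    ["echo", "hello"],
    capture_output=True,  # collect stdout/stderr
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on nonzero exit
)
print(result.stdout.strip())  # -> hello
```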

boto3: It doesn't really … library. The library basically ends up returning to you what appears to be the raw JSON sent back by the server. It doesn't really wrap the AWS APIs in a way meaningful for Python. (That said, it's the library you use to talk to AWS, unfortunately.)
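A sketch of the complaint: boto3 hands back nested dicts, so codebases often end up writing their own thin typed wrappers. The Bucket dataclass and parse_buckets helper below are hypothetical, and the dict is a simplified stand-in for a real list_buckets() response:

```python
from dataclasses import dataclass

# Simplified shape of what boto3's s3 client hands back: raw nested dicts.
raw_response = {"Buckets": [{"Name": "logs"}, {"Name": "assets"}]}

@dataclass
class Bucket:
    name: str

def parse_buckets(response: dict) -> list:
    # The kind of wrapping one might wish the library did itself.
    return [Bucket(name=b["Name"]) for b in response.get("Buckets", [])]

print([b.name for b in parse_buckets(raw_response)])  # -> ['logs', 'assets']
```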

awscli: Defies a lot of basic command-line conventions in ways that are annoying. Is it "help command", "command help", or "--help"? Required options bug me, too.

uwsgi: This thing's logs talk about vassals and serfs, the emperor and seppuku, and no, I am not making that up; sorry, this has no place in a production system — logs should be straight, boring, and to the point. It implements a custom binary version of HTTP, which makes introspecting systems written with it hard, as you cannot just curl them. (There is a side util that someone wrote called uwsgi_curl, but last I saw it had issues with Unix sockets, which is what I'm basically always using.) It tries to be a supervising daemon and an HTTP server all in one, and does too much instead of doing one thing well.

fuckit: I mean, I don't think there is anything wrong with the library. I think it's more a thing of "if you have to use it…".


Thanks, I appreciate your well-argued comment! I agree with all your points, though I wouldn't use the word rubbish. Perhaps I just misunderstood the GP, or am more "forgiving" of quirky software.


Take a look at http://trio.rtfd.org as a replacement for gevent.


In my experience, Awesome lists are not often maintained and become out of date within months.


Which suggests that what we really need is a curated Awesome list of Awesome lists.


They exist. The easiest way to discover them is through a curated awesome list of awesome lists of awesome lists [0].

[0]: https://github.com/jonatasbaldin/awesome-awesome-awesome


Agreed, but at least Python has a canonical 'Awesome X' list. There needs to be a way to take a package index and have the community categorize and review it.

To have a lifetime of more than a few months, it would need to connect to the PyPI and GitHub APIs. Even better would be cross-language mappings like "ORM -> Django (Python), gorm (Go), Diesel (Rust)", with a way that lets each community hook their package APIs together, categorize the packages, and sort by stars or reviews. They all look similar, but it would need a lot of open-source work.


Send a PR imo


I'm not sure I'd call this list curated. It would be nice if it were a bit more opinionated. Where there are multiple libraries in a category it provides no guidance on how to choose one over another. Also, in some of the categories, clear winners exist today. For example in the testing category, just use pytest, and run it with tox.
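For what it's worth, the pytest recommendation is low-ceremony in practice: plain functions plus bare assert statements, which pytest discovers by name. A minimal sketch (reverse_words is just a made-up function under test):

```python
# pytest collects plain functions named test_* and uses bare asserts --
# no TestCase classes or self.assertEqual boilerplate needed.
def reverse_words(s):
    return " ".join(reversed(s.split()))

def test_reverse_words():
    assert reverse_words("hello world") == "world hello"

test_reverse_words()  # pytest would discover and run this automatically
```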

I was going to submit a PR, but the repo has hundreds of PRs and 60 issues already open.

So yeah, it's more a Python smörgåsbord than a curated list. In the end, I'm not sure it's much better than using a search engine.


I think it's still a great starting point, especially for someone who has no idea where to start when looking for specific libraries. It would be good to annotate each item in the list with pros/cons (pull requests, maybe?), so that you can choose whatever fits your needs.


‘pip’ needs a download counter to feed stats necessary to generate this list automatically.


pip has a download counter.

I don't think downloads are a direct measure of good/"awesomeness"; low-level libraries that get pulled in as dependencies, for example, get a higher download count for that, but might not be something often needed by an end user. (In Rust, I see the library for Aho-Corasick all the time, for example.)

In Python, I believe pyasn1 is in the top 10; but parsing raw ASN.1 is probably not something a dev should be doing that often; a higher-level library like cryptography would probably be more appropriate.


Looks useful but isn't 62k stars a lot?


I guess I've been missing out on a trend to replace search engines with curated lists on GitHub. This meta list is the 6th most-starred repo on GitHub, with 102k stars: https://github.com/sindresorhus/awesome


I use GitHub stars to bookmark repos I might need later. Measured that way, 62k is not a very big number, I'd think.


I think the right scale for measuring GitHub stars is the quantile a project lies at in the distribution of stars across all projects. I'm not sure exactly what quantile 62k is, but it's very high.



