Hacker News
Boltons: A set of BSD-licensed, pure-Python utilities (github.com/mahmoud)
312 points by pmoriarty on Oct 9, 2018 | 30 comments



Library author and lazy README updater here to report that technically as of October 09, 2018, boltons is 79 types and 146 functions, for a grand total of 225 utilities.

While I'm here, here are the ones I use most:

- OrderedMultiDict: https://boltons.readthedocs.io/en/latest/dictutils.html#bolt... - esp the .inverted() and .sorted() methods

- Exponential backoff and jitter generator: https://boltons.readthedocs.io/en/latest/iterutils.html#bolt...

- remap (recursive map): https://boltons.readthedocs.io/en/latest/iterutils.html#nest... (recipes here: https://sedimental.org/remap.html)

- Atomic File saving: https://boltons.readthedocs.io/en/latest/fileutils.html#bolt...

- Traceback utilities, structured tracebacks: https://boltons.readthedocs.io/en/latest/tbutils.html

Also, my favorite bolton that's not a bolton, glom: https://glom.readthedocs.io/en/latest/

If you have a bolton you'd like to submit, I'm pretty merge-happy. Here are the criteria: https://boltons.readthedocs.io/en/latest/architecture.html
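For anyone who hasn't met the "exponential backoff and jitter" idea from the list above, here is a minimal stdlib-only sketch. `backoff_delays` and its parameters are hypothetical names chosen for illustration; this is not boltons' implementation or signature, just the general technique: delays grow geometrically up to a cap, and with jitter each delay is drawn at random below the current value.

```python
import random
from itertools import islice


def backoff_delays(start, stop, factor=2.0, jitter=False, rng=random):
    """Yield retry delays that grow geometrically from `start` up to `stop`.

    Hypothetical helper, not boltons' API. With jitter=True, each delay is
    drawn uniformly from [0, current_delay] (the "full jitter" variant).
    """
    if start <= 0 or stop < start or factor <= 1:
        raise ValueError("need 0 < start <= stop and factor > 1")
    delay = float(start)
    while True:
        yield rng.uniform(0.0, delay) if jitter else delay
        delay = min(delay * factor, float(stop))


# First six delays without jitter: doubling, then capped at `stop`.
plain = list(islice(backoff_delays(1, 10), 6))
print(plain)  # [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]

# With jitter the values are random, but they never exceed the cap.
jittered = list(islice(backoff_delays(1, 10, jitter=True), 6))
print(jittered)
```

The generator is infinite on purpose, so the caller decides how many retries to take (here via `islice`).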


Why not contribute these upstream into the standard library?


Because most of these utilities were created to build things, and contributing to the stdlib has never been on one of those critical paths?

But really, @marmaduke's got it right. The standard library has a MUCH higher barrier to entry than PyPI. Have you seen python-ideas [1]? I truly do not have time to have it out with some of the more vocal elements of that group.

But, down the list:

- OMD might be a good fit for the collections module, I've chatted about it a bit with Raymond Hettinger on and off. He wasn't totally against it, so that's something!

- Exponential backoff is too opinionated for itertools, and generally a lot more high-level and conceptually modern than the rest of Python's built-in networking facilities.

- remap: I really like it, even though I'm typically not a big fan of functional programming in Python. But the Python devs are even less appreciative than I, with GvR disliking lambdas, and Py3 dropping reduce() from the builtins. Plus after years of watching Python core dev, it doesn't seem like the people with the time to work on core Python have much time left over to work with IRL complex APIs and other sources of dynamic nested data?

- Atomic file saving probably should go in the stdlib, but probably not while it relies on ctypes (on Windows), and I'm not going to pull out my old Windows laptop again just to start writing C.

- Traceback utilities: Probably the one I wish I had time to push for most. traceback's string-only approach feels really dated in the structured-logging age. I'd expect a lot of FUD around messing with error handling.
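To make "structured" concrete for the traceback point: a rough sketch of turning an exception into plain data that a structured logger could emit as JSON. It uses only the stdlib `traceback.extract_tb`; the function name `structured_traceback` and the dict layout are assumptions for illustration, not the tbutils API.

```python
import traceback


def structured_traceback(exc):
    """Return an exception and its frames as plain dicts (hypothetical layout)."""
    frames = traceback.extract_tb(exc.__traceback__)
    return {
        "exc_type": type(exc).__name__,
        "exc_msg": str(exc),
        "frames": [
            {"file": f.filename, "line": f.lineno, "func": f.name}
            for f in frames
        ],
    }


def fail():
    raise ValueError("boom")


try:
    fail()
except ValueError as e:
    info = structured_traceback(e)

# The innermost frame is last, matching the order of a printed traceback.
print(info["exc_type"], info["frames"][-1]["func"])
```

Because the result is just dicts and lists, it can be filtered, serialized, or attached to a log record without parsing a formatted string.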

Maybe now that boltons is pretty popular it's worth giving it a go just to see what lies beyond the typical listserv response: "put it on PyPI and if it's popular, we'll see" :)

[1]: https://mail.python.org/pipermail/python-ideas/
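For readers wondering what a "recursive map" over nested data looks like, here is a deliberately tiny sketch of the idea. `remap_sketch` is a hypothetical stand-in: it handles only dicts and lists and takes a `visit(key, value)` predicate, which is far less than the real remap offers (see the recipes link for that).

```python
def remap_sketch(obj, visit):
    """Rebuild nested dicts/lists, keeping only items where visit(key, value) is true.

    Hypothetical simplification of the recursive-map idea; children are
    rebuilt first, then the parent decides whether to keep them.
    """
    if isinstance(obj, dict):
        items = ((k, remap_sketch(v, visit)) for k, v in obj.items())
        return {k: v for k, v in items if visit(k, v)}
    if isinstance(obj, list):
        items = ((i, remap_sketch(v, visit)) for i, v in enumerate(obj))
        return [v for i, v in items if visit(i, v)]
    return obj  # leaf value: returned unchanged


# Example: strip every None out of an API-style payload.
data = {"a": None, "b": {"c": None, "d": [1, None, 2]}}
cleaned = remap_sketch(data, lambda key, value: value is not None)
print(cleaned)  # {'b': {'d': [1, 2]}}
```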


Ah, ok!


In addition to the other reply, there's often a reluctance -- on the part of both the Python core team and the developers of popular third-party code -- to add things to the standard library, since doing so ties you to Python's development process and release cycle. This is one reason why pip is not in the standard library, for example (instead, Python ships a module which will go get pip for you); it needs the ability to develop and release at its own pace and on its own terms.


The standard library has a higher barrier to entry than PyPI.


What sorts of things do you typically use OMD for?


Toolz is one of my favorite Python libraries. I use it all the time. It provides even more of this useful functionality. It also has a sped-up version called Cytoolz written with Cython.

https://toolz.readthedocs.io

https://github.com/pytoolz/cytoolz


Toolz is really a game changer since it allows you to write elegant functional code (lazy evaluations, currying, pipes, parallelism) in Python which feels native.

As Python has the "batteries included" philosophy, such useful gems should be included in the standard library...


I will kind of disagree.

While having stuff in stdlib can be convenient, it all tends to be where active development goes to die. Compare httplib with something like Requests.


The comparison would be with urllib.request:

https://docs.python.org/3/library/urllib.request.html

Which right there recommends Requests, but for simple tasks it's kind of unfortunate that people bring in a library dependency.


Can anyone comment on the efficiency or lack thereof of toolz?


It uses `itertools` internally very heavily, which means if you generally are using generators and such, you maintain those benefits, as well as get to work with infinite sequences, etc.

If you do find a `toolz` function to be too slow, there's always the `cytoolz` versions.
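As a small illustration of the laziness point, using only the stdlib `itertools` rather than `toolz` itself: a pipeline over an infinite sequence does no work until something consumes it, and only as much as is asked for.

```python
from itertools import count, islice

# count(1) is infinite; the generator expressions below stay lazy.
squares = (n * n for n in count(1))
even_squares = (s for s in squares if s % 2 == 0)

# Only now is anything computed, and only the first five matches.
first_five = list(islice(even_squares, 5))
print(first_five)  # [4, 16, 36, 64, 100]
```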


cytoolz’s frequencies was around 10x as fast as using a vanilla dict and counting elements manually, which was in turn twice as fast as collections.Counter when I measured them a few years ago.

It’s solid work.
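For context, these are the two pure-Python baselines being compared above, sketched with the stdlib only. The `frequencies` function here is a hand-written stand-in with the same name, not the cytoolz code, and no timings are claimed; measure on your own data and Python version.

```python
from collections import Counter


def frequencies(seq):
    """Count items with a plain dict (the "vanilla dict" baseline)."""
    counts = {}
    for item in seq:
        counts[item] = counts.get(item, 0) + 1
    return counts


words = "a b a c b a".split()
by_dict = frequencies(words)
by_counter = Counter(words)

print(by_dict)  # {'a': 3, 'b': 2, 'c': 1}
print(by_dict == dict(by_counter))  # True
```

Wrapping each call in `timeit.timeit(...)` is an easy way to check the relative speeds for yourself.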


How does cytoolz performance compare with toolz under PyPy?


I'm a big fan of @mhashemi's work in Python in general, even if I don't use Boltons personally.

The discussion regarding inclusion in the stdlib is sensible, and I fully agree with @mhashemi's comments. See similar discussions, for instance, regarding dataclasses vs. attrs (https://github.com/ericvsmith/dataclasses/issues/19).

Sometimes, however, it's a bit annoying when something that you think should be widely available isn't included in the stdlib. The cons of course are:

- Fragmentation (several projects pursuing similar objectives)

- Not being able to use a powerful construct to simplify code in the stdlib itself (the stdlib can only depend on the stdlib).

Regarding, for instance, functional programming, I maintain a list of interesting projects in this domain in Python: https://github.com/sfermigier/awesome-functional-python/blob...

There are at least half a dozen libraries that strive to achieve similar goals, and would probably benefit from being included in the stdlib (at least partially). It's hard to pick a winner among the list (like others in this thread, I went with toolz).

I believe that a way to solve (at least partially) this conundrum could be to work on some common specifications, like the JS community did with Fantasyland (https://github.com/fantasyland/fantasy-land), and then let projects implement the specification the way they want.

WDYT?


Some of these are really fantastic. Particularly the atomic file saving and cache utils, but I can see myself using a lot of these handy util modules.

A lot of this stuff is great to implement and test yourself in large codebases, but when you're prototyping or writing a quick notebook, this stuff ranges from very handy to very powerful.
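For anyone unfamiliar with the atomic-save pattern, a minimal stdlib sketch: write to a temporary file in the same directory, then swap it into place with `os.replace`, so readers see either the old file or the complete new one. `atomic_write` is a hypothetical helper written for this example; it is not the boltons fileutils code and skips details such as preserving permissions.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write(dest_path, mode="w"):
    """Write to a temp file next to dest_path, then rename over it on success."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, mode) as f:
            yield f
            f.flush()
            os.fsync(f.fileno())  # push data to disk before the swap
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Any failure: discard the partial temp file, leave the target untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "settings.txt")
    with atomic_write(target) as f:
        f.write("version 1")
    with open(target) as f:
        saved = f.read()

print(saved)  # version 1
```

If the body of the `with` block raises, the target keeps its previous contents and the temporary file is removed.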



Gob's program!


> Chunked and windowed iteration, in iterutils

I need this. Makes so many functional expressions so simple.
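For reference, both ideas fit in a few lines of stdlib Python. The helpers below (`iter_chunks`, `iter_windows`) are hypothetical names for this sketch and are not the iterutils implementations; they just show what "chunked" (non-overlapping groups) and "windowed" (overlapping, sliding groups) mean.

```python
from collections import deque
from itertools import islice


def iter_chunks(iterable, size):
    """Yield consecutive, non-overlapping lists of up to `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def iter_windows(iterable, size):
    """Yield overlapping tuples of exactly `size` consecutive items."""
    it = iter(iterable)
    window = deque(islice(it, size), maxlen=size)
    if len(window) == size:
        yield tuple(window)
    for item in it:
        window.append(item)  # maxlen drops the oldest item automatically
        yield tuple(window)


print(list(iter_chunks(range(7), 3)))   # [[0, 1, 2], [3, 4, 5], [6]]
print(list(iter_windows(range(5), 3)))  # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
```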


See also more-itertools for lots of other things in this vein.


Could you give some examples?


I wrote some similar code one time that generated SQL which was manually reviewed. It generated a massive WHERE IN clause. Oracle only accepts 1000 values. So I fed it a list of thousands of values, and it chunked them into rows of N objects, and into larger chunks of 1000. Having some code to do that automatically would be quite nice.
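A rough sketch of that kind of batching, assuming numbered bind placeholders (`:1`, `:2`, ...) and a trusted column name; `in_clause_batches` is a hypothetical helper, not code from the project described above, and it covers only the 1000-value split, not the row formatting.

```python
def in_clause_batches(column, values, limit=1000):
    """Yield (sql_fragment, params) pairs with at most `limit` placeholders each.

    Assumes `column` is a trusted identifier; only the values are bound.
    """
    values = list(values)
    for start in range(0, len(values), limit):
        batch = values[start:start + limit]
        placeholders = ", ".join(f":{i + 1}" for i in range(len(batch)))
        yield f"{column} IN ({placeholders})", batch


batches = list(in_clause_batches("id", range(2500)))
print([len(params) for _, params in batches])  # [1000, 1000, 500]

# A small limit makes the generated fragment easy to eyeball.
fragment, params = next(in_clause_batches("id", [10, 20, 30], limit=2))
print(fragment, params)  # id IN (:1, :2) [10, 20]
```

Each pair can be run as its own query, or the fragments can be OR-ed together, depending on what the database and driver allow.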


Why not insert the values in WHERE IN to a temporary helper table and join on that? It seems somewhat easier and might perform better.


Oracle has a limit on the number of parameters in a query, not just in the `WHERE IN` clause. So you'd need to insert them into the helper table in batches of 1000.


Good point, but isn’t that done automatically by whatever tool you’re using?

And if the table you’re selecting from is big enough I would expect that this setup would be faster but it’s hard to say. It would also more easily guarantee that your results will be consistent.


Two that I run into all the time:

1. pairwise iteration over a (cyclic) list of points representing the ordered vertices of a polygon will give you a sequence of edges

2. np.diff takes differences between consecutive pairs. Sometimes you have a different kind of "difference" operation, though. The pairwise iteration helps you express that.
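Both cases can be sketched in plain Python. The helper names `cyclic_pairs` and `pairwise_apply` are made up for this example, and the code uses lists rather than NumPy arrays.

```python
import math


def cyclic_pairs(points):
    """Pair each vertex with the next one, wrapping the last back to the first."""
    pts = list(points)
    return list(zip(pts, pts[1:] + pts[:1]))


def pairwise_apply(values, op):
    """Apply op(a, b) to consecutive pairs: np.diff generalized to any operation."""
    vals = list(values)
    return [op(a, b) for a, b in zip(vals, vals[1:])]


# 1. Edges of a unit square, including the closing edge back to the start.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
edges = cyclic_pairs(square)
perimeter = sum(math.hypot(bx - ax, by - ay) for (ax, ay), (bx, by) in edges)
print(edges[-1], perimeter)  # ((0, 1), (0, 0)) 4.0

# 2. Ordinary differences, then a different "difference": consecutive ratios.
print(pairwise_apply([1, 4, 9, 16], lambda a, b: b - a))  # [3, 5, 7]
print(pairwise_apply([1, 2, 4, 8], lambda a, b: b / a))   # [2.0, 2.0, 2.0]
```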


While convenient, some of the utils can be slowish due to input validation which might not be strictly necessary, etc. - check the implementation if using in a performance critical task.

Apart from that, very useful lib, highly recommended.


Be careful when googling ...


First thought was a mental picture of multiple Michael Boltons.



