It's a nice idea, but I never like writing what amounts to a DSL in strings in my code (yes, that applies to in-code SQL as well, although that's often unavoidable).
I agree, I don't like the magic string approach (even if it is mostly just dot-notation attribute lookup). However, there is some good stuff here, and nested data lookup when value existence is unknown is a pain point for me.
In addition to the string based lookup, it looks like there is an attempt at a pythonic approach:
    from glom import glom, T
    spec = T['system']['planets'][-1].values()
    glom(target, spec)
    # ['jupiter', 69]
For me though, while I can understand what is going on, it doesn't feel pythonic.
Here's what I would love to see:
    from glom import nested
    nested(target)['system']['planets'][-1].values()
And I would love (perhaps debatably) for that to be effectively equivalent to:
    nested(target).system.planets[-1].values()
Possible?
---
edit: Ignore the above idea. I thought about this a bit more and the issue that your T object solves is that in my version:
    nested(target)['system']
the result is ambiguous. Is this the end of the path query and should return original non-defaulting dict, or the middle of the path query and should return a defaulting dict? Unknown.
Objects are overkill. Use an iterable of getitem lookups. They compose easily and don’t have much overhead (cpu or cognitive). From the library I use at work, inspired by clojure.core/get-in, it would look like this:
    def get_in(obj, lookup, default=None):
        """Walk obj via __getitem__ for each lookup,
        returning the final value of the lookup or default.
        """
        tmp = obj
        for key in lookup:
            try:
                tmp = tmp[key]
            except (KeyError, IndexError, TypeError):
                return default
        return tmp

    data = {"foo": {"bar": ["spam", "eggs"]}}
    # find eggs
    get_in(data, ["foo", "bar", 1])
By using __getitem__ you naturally work with anything in the python ecosystem.
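As a rough illustration of that point, the sketch below repeats the get_in function and walks through a dict, a user-defined container, and a list in one lookup. The Grid class is a made-up example for this sketch, not part of any library.

```python
def get_in(obj, lookup, default=None):
    """Walk obj via __getitem__ for each key in lookup; return default on a miss."""
    tmp = obj
    for key in lookup:
        try:
            tmp = tmp[key]
        except (KeyError, IndexError, TypeError):
            return default
    return tmp


class Grid:
    """Hypothetical container: anything that defines __getitem__ will do."""

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, index):
        return self._rows[index]


data = {"grid": Grid([[1, 2], [3, 4]])}

hit = get_in(data, ["grid", 1, 0])               # dict -> Grid -> list
miss = get_in(data, ["grid", 5, 0], default=-1)  # IndexError inside Grid -> default
```

Because the walker only relies on subscripting, it needs no knowledge of the container types it passes through.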
I prefer your nested() version. I don't see why your final example would be ambiguous either: if it always returns a defaulting dict, which you always have to extract with .values(), there is no ambiguity. This is similar to how Java's Optional<T> works. The problem could be that the type checker isn't good enough, so code like if nested(target)['system'] == "lazer" could pass the type checker.
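A minimal sketch of that Optional-style idea, under stated assumptions: the Nested class and its value() method are hypothetical names invented here (the comment above uses .values() for the extraction step), and nothing like this is claimed to exist in glom.

```python
_MISSING = object()  # marks "the path broke somewhere"


class Nested:
    """Hypothetical wrapper: every lookup returns another Nested."""

    def __init__(self, value=_MISSING):
        self._value = value

    def __getitem__(self, key):
        if self._value is _MISSING:
            return self  # already broken, keep absorbing lookups
        try:
            return Nested(self._value[key])
        except (KeyError, IndexError, TypeError):
            return Nested()

    def value(self, default=None):
        """The only way out of the wrapper."""
        return default if self._value is _MISSING else self._value


target = {"system": {"planets": [{"name": "earth"}, {"name": "jupiter"}]}}

name = Nested(target)["system"]["planets"][-1]["name"].value()
absent = Nested(target)["system"]["moons"][0].value(default="n/a")
still_wrapped = Nested(target)["system"]  # never a plain dict until .value()
```

The type-checker concern also shows up here: comparing still_wrapped to a string is legal Python and simply evaluates to False, so forgetting the extraction step fails silently rather than loudly.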
The other problem that T solves and nested doesn't is being able to reuse a spec. Once you have a spec, whether created with T or directly, you can call glom multiple times with the same spec, pass the spec to another function, etc.
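The reuse argument can be sketched with a small recorder object. This is only an illustration of the general pattern: the Path class below is hypothetical and is not how glom's T is implemented.

```python
class Path:
    """Hypothetical stand-in for a recorded lookup; not glom's actual T."""

    def __init__(self, steps=()):
        self._steps = steps

    def __getitem__(self, key):
        # Record the step instead of performing it.
        return Path(self._steps + (key,))

    def __call__(self, target, default=None):
        # Replay the recorded steps against a concrete target.
        for key in self._steps:
            try:
                target = target[key]
            except (KeyError, IndexError, TypeError):
                return default
        return target


last_planet = Path()["system"]["planets"][-1]["name"]

sol = {"system": {"planets": [{"name": "earth"}, {"name": "jupiter"}]}}
empty = {"system": {"planets": []}}

# The same recorded path is applied to two different targets.
results = [last_planet(sol), last_planet(empty, default="unknown")]
```

Since the path is built before any target exists, it can be stored, passed to other functions, and applied as many times as needed.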
Sorry to double reply, but yes the "T" spec is exactly what I would want. IMO it's yet another step better than using a "spec" made of untyped nested lists and dicts.
I'm fairly sure object() also returns an object which is unique and distinct from all other objects. The only difference as far as I can see is that make_sentinel returns an object that has a unique and distinct type from all other objects, but I don't see why you'd be checking the type of your sentinels in Python.
Hehe, boltons.typeutils.make_sentinel should really be documented better for more experienced developers. A few small advantages: a nice repr, pickleability, and (back to the first advantage really) good rendering in a Sphinx autodoc context. :)
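To make the difference concrete, here is a hedged sketch of a named sentinel next to a bare object(). The Sentinel class is a hypothetical illustration and is not the boltons implementation; it uses copy.deepcopy to show identity surviving a round trip, which goes through the same __reduce__ hook that pickle consults.

```python
import copy


class Sentinel:
    """Hypothetical named sentinel; one shared instance per name."""

    _instances = {}

    def __new__(cls, name):
        if name not in cls._instances:
            instance = super().__new__(cls)
            instance.name = name
            cls._instances[name] = instance
        return cls._instances[name]

    def __repr__(self):
        return self.name

    def __reduce__(self):
        # Reconstruction calls Sentinel(name), which returns the shared instance.
        return (Sentinel, (self.name,))


MISSING = Sentinel("MISSING")
plain = object()

same_after_copy = copy.deepcopy(MISSING) is MISSING  # identity preserved
plain_after_copy = copy.deepcopy(plain) is plain     # a fresh object comes back
```

With a bare object(), an `is` check against the copy no longer matches, and its repr is an anonymous memory address rather than a readable name.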
I feel the same way, although my instinct is generally to build a custom generator. It only costs a couple of lines, but it is plain old Python and quite explicit:
    from glom import glom

    target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
                                     {'name': 'jupiter', 'moons': 69}]}}
    glom(target, {'moon_count': ('system.planets', ['moons'], sum)})
    # vs
    def iter_moons(t):
        for planet in t['system']['planets']:
            yield planet['moons']
    sum(iter_moons(target))
would have to combine with `defaultdict`s if your nested data is only sometimes there though
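One way to handle the sometimes-missing case, sketched here with dict.get defaults rather than defaultdict (a deliberate swap, since it leaves the input data untouched). The sample dictionaries are invented for illustration.

```python
def iter_moons(t):
    """Yield moon counts, tolerating a missing 'system', 'planets', or 'moons'."""
    for planet in t.get("system", {}).get("planets", []):
        yield planet.get("moons", 0)


complete = {"system": {"planets": [{"name": "earth", "moons": 1},
                                   {"name": "jupiter", "moons": 69}]}}
partial = {"system": {"planets": [{"name": "mercury"}]}}
empty = {}

totals = [sum(iter_moons(d)) for d in (complete, partial, empty)]
```

The generator stays explicit, but each level of nesting now needs its own default, which is the bookkeeping the comment above is pointing at.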
Why is writing DSLs strictly worse than writing complicated transformations built on top of the limited constructs provided by the language itself? (You said never)
I definitely wouldn't recommend it either. Sorry not to be clear.
In my experience, it's almost impossible to dissuade people from generating strings if the API affords it. We can't even stop people from generating SQL by concatenation!
Musing: does Python have a way (Mypy?) to declare that a method can accept only constant expressions?
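One partial answer to that musing is typing.LiteralString from PEP 675, available in the standard library from Python 3.11. It is not quite "constant expressions only": it accepts string literals and strings composed from other literal strings, and it is enforced only by type checkers that implement the PEP, never at runtime. The lookup function below is a hypothetical example, not a glom API.

```python
try:
    from typing import LiteralString  # Python 3.11+
except ImportError:  # older interpreters: fall back so the sketch still runs
    LiteralString = str


def lookup(target: dict, path: LiteralString):
    """Hypothetical dotted-path lookup that should only see literal paths."""
    for key in path.split("."):
        target = target[key]
    return target


data = {"system": {"name": "sol"}}
found = lookup(data, "system.name")  # accepted: the path is a literal

# A checker that implements PEP 675 is expected to reject a call such as
# lookup(data, user_supplied_text), where the path is an arbitrary str.
```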
This might have been unintentional, but I suspect "Spectre of Structure" and "Python's missing piece" refer to Nathan Marz's specter library for clojure [1], similarly touted as clojure's missing piece. I tend to agree in the case of specter, given the mind-boggling types of transformations that are easily (and simply) expressed in it (and often run faster than idiomatic clojure as well). Highly recommended if you ever need to work with deeply nested data structures.
What's really interesting is that we're approaching the same ideal state from different directions. Specter goes from Clojure's immutability to something more practical, while glom goes from Python's super dynamic system to something more declarative and immutable.
This is really cool. Did you ever consider an API to do the reverse - to insert a value at a particular point in the data?
My interest stems from this issue[0] on the Ruby issue tracker to make a symmetrical method to Hash#dig (which does something similar to, but more limited than glom) called Hash#bury. The problem in the issue was that inserting a value at a given index in an array proved difficult and unnatural in Ruby, so I was wondering if there were other solutions out there.
Another question occurs to me - does glom only support string keys?
glom not only supports more than string keys, it also supports assigning to non-dictionary objects. That's a part of the API we're working on right now, actually.
As for the data insertion, mutation may be in the future, but for now glom only transforms and returns new objects. Definitely something to think about though, bookmarked! :)
This seems like lenses for python... neat! I often use python to mess around with things, and almost always miss Haskell's lenses when doing so. This seems like an interesting solution.
I haven't played with this yet, but it looks really handy. I deal with much more JSON on the command line than I'd like, so I think having both a single library and command line tool to reshape that data will make that much easier. I've used jq a few times, but when I want to move a little beyond what it does I usually end up writing a Python script. Hopefully this will make that transition smoother.
Haha, I'm all for console usage, but let me tell you, there's nothing quite like that feeling of moving a working spec into a dedicated application with exception handling, logging, etc. :)
I'm not really versed in the idioms/social mores of Python, so please take the following with a grain of salt:
This seems like it usefully solves a problem, but the invocation pattern is suspect to me -- Instead of "glom" taking the target for picking-apart plus a magic little bit of DSL, what if "glom" took a single parameter, the aforementioned DSL, and returned a function that would perform the corresponding search when called on a target? Even if Python or this package optimises away repeatedly searching (by the same spec|in the same manner), the convention the package prescribes is odd to me, right after the first few paragraphs of intro.
The big, classical school of Python definitely prefers top-level functions. Still, I definitely understand that aesthetic, and am on board with not using functools.partial to achieve it. So: https://github.com/mahmoud/glom/issues/14 :)
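The two calling conventions discussed here can be compared in a short sketch. The lookup function is a hypothetical stand-in for a target-first call such as glom(target, spec), and compile_spec is an invented name for the spec-first variant; neither is taken from the glom API.

```python
from functools import partial


def lookup(target, spec):
    """Hypothetical stand-in for a target-first call such as glom(target, spec)."""
    for key in spec:
        target = target[key]
    return target


def compile_spec(spec):
    """Spec-first variant: return a function that waits for a target."""
    def run(target):
        return lookup(target, spec)
    return run


first_planet = compile_spec(["system", "planets", 0])
first_planet_partial = partial(lookup, spec=["system", "planets", 0])

target = {"system": {"planets": ["mercury", "venus"]}}
names = [first_planet(target), first_planet_partial(target)]
```

Both forms produce a reusable callable. The partial version needs the keyword argument because the target comes first in the signature, which is part of why a dedicated spec-first entry point can read more naturally.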
Similarly, statistical distributions in SciPy can be used in "frozen" form (pre-parameterized) or in a more general form where you supply the parameters at the same time you are requesting some attribute of the distribution. It seems to me to be a situation where one is useful if you expect reuse, and the other is useful if you don't.
It seems to me like the advantage to focus on here is the improved error / `None` handling, which will speed debugging and make handling expected edge cases easier. I've seen a lot of inexperienced developers tripped up entirely by this kind of data access, and seen plenty of experienced developers waste time debugging it because of the exact error cases the announcement references.
The `T` object, which the article describes as its most powerful, can be a useful pattern in some situations, but it's worth pointing out it isn't new or unique to this project.
The author says in another thread here that he first started working on the "stuff leading up to glom" in 2013. One older example, which is virtually identical though less complete, is this Stack Overflow answer I posted in 2012: https://stackoverflow.com/a/9920723/500584
I'd seen the general pattern even before that post, if not the Pythonic syntax. I don't think that it's much of an improvement over defining a `lambda`, so again I would say the thing to focus on is the improved debuggability and the simpler, dot-notation-as-generic-attribute-or-item-accessor syntax. I think `T` is largely a distraction, or should be reserved for advanced users.
I would like to see the author debugging an application with 10 levels of object wrapping that had one of the middle object’s name misspelled.
Libraries like these shine only if they have brilliant tracing and debugging capabilities; otherwise they are too easy to reduce to literally a single function.
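To show what step-level tracing might look like in the "single function" version, here is a minimal sketch. PathError and get_path are hypothetical names for this illustration and say nothing about how glom reports errors.

```python
class PathError(LookupError):
    """Hypothetical error type that records where a lookup broke."""


def get_path(obj, path):
    """Walk obj step by step; on failure, report the step and the path so far."""
    for depth, key in enumerate(path):
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError) as exc:
            walked = path[:depth]
            raise PathError(
                f"step {depth} ({key!r}) failed after {walked!r}: {exc!r}"
            ) from exc
    return obj


target = {"system": {"planets": [{"name": "earth"}]}}

try:
    get_path(target, ["system", "planest", 0, "name"])  # misspelled on purpose
except PathError as err:
    message = str(err)  # names the failing step and the steps that succeeded
```

Even this small amount of context pinpoints a misspelled middle key, which is the scenario described above.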
affordances to add tracing prints, or drop into a pdb at any level
The Inspect specifier type provides a way to get visibility into glom’s evaluation of a specification, enabling debugging of those tricky problems that may arise with unexpected data.
Inspect can be inserted into an existing spec in one of two ways. First, as a wrapper around the spec in question, or second, as an argument-less placeholder wherever a spec could be.
Inspect supports several modes, controlled by keyword arguments. Its default, no-argument mode, simply echoes the state of the glom at the point where it appears:
It looks quite similar in spirit to Clojure's Specter library (https://github.com/nathanmarz/specter), and even seems to have a nod to it (The Spectre of Structure).
Striking a balance between ease of use / simplicity and powerful features is a tough exercise but you did well.
I can foresee the CLI being quite useful to do away with the run-of-the-mill sed / awk / grep [...] mess. Specifically for the less CLI inclined people out there.
Can it be used bidirectionally, without having to repeat the work?
I have a need to transform between pairs of structures, in both directions, and ever since I found JsonGrammar (https://github.com/MedeaMelana/JsonGrammar2) I've been pining for a Python version.
It depends on the complexity of the spec, but we've already done some programmatic building of glomspecs, so for many cases I think the answer is yes! Once we feel out the patterns I think glom will gain some utilities for this purpose.
It seems like a subset of glom specs would be uniquely invertible. For example, the spec `{'c': 'a.b'}` could trivially invert to `{'a.b': 'c'}`. I'm not sure how you'd invert more complex specs which make function calls, e.g. sum or len.
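The trivially invertible case can be sketched as follows. This is a simplified model written for illustration: it handles only dict-to-dict specs made of dotted paths, treats dotted keys on the output side as paths too, and is not glom's behavior. The helper names (get_path, set_path, reshape, invert) are all hypothetical.

```python
def get_path(obj, dotted):
    """Read a value from nested dicts using a dotted path."""
    for key in dotted.split("."):
        obj = obj[key]
    return obj


def set_path(obj, dotted, value):
    """Write a value into nested dicts, creating intermediate dicts as needed."""
    *parents, last = dotted.split(".")
    for key in parents:
        obj = obj.setdefault(key, {})
    obj[last] = value


def reshape(spec, source):
    """Apply a {destination_path: source_path} spec to source."""
    out = {}
    for dest, src in spec.items():
        set_path(out, dest, get_path(source, src))
    return out


def invert(spec):
    """Swap destination and source paths; only valid for pure path specs."""
    return {src: dest for dest, src in spec.items()}


spec = {"c": "a.b"}
original = {"a": {"b": 42}}

reshaped = reshape(spec, original)
restored = reshape(invert(spec), reshaped)
```

Round-tripping works here because every entry is a one-to-one path mapping. A spec step such as sum or len discards information, so no general inverse exists for it.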
I had a quick look, but I didn't see filtering expressions, only shaping expressions. It seems like glom is more of a result shaper/mapper. Can you filter with glom (maybe with lambdas or something)? I could see the two going together quite well if you were "glomming" a big Python object.
There is already a well established Gnome project with the same name: http://www.glom.org It is a GTK+ front-end to PostgreSQL, similar to Microsoft Access.
The CLI is in a pretty preliminary state, usable but not as robust as it will be in a few weeks. It only supports built-in parsers (JSON and Python literals). What formats are you thinking? YAML?
The writing style is just insufferable. Even the API documentation is littered with hyperbole and self-congratulation. We get it, you're proud of your work and extremely proud of yourself.
> "as simple and powerful as glom"
> "big things come in small packages"
> "small API with big functionality"
> "power is only surpassed by its intuitiveness"
> "simplicity is only surpassed by its utility"
> "shortest-named feature may be its most powerful"
For heaven's sake, give it a rest!
It's a big red flag about your priorities that when I go looking for a precise specification, I can't find answers to simple questions and instead end up wading through incessant marketing phrases. I tried, and I finally gave up halfway through the API doc. It might even be the case that glom is a good idea—but you're making it really hard to trust you as a source of objective information about it.
Show, don't tell. My advice to you: you'll generate more interest if you delete every congratulatory word on those pages and focus entirely on helping your readers understand what glom does instead of trying to sell it to them.
How I wish one could publish a dry document and expect people to read all the way to the bottom. I've published enough libraries to know that's not the case. glom's free software so it's all there, as "shown" as can be.
Thank you for glom, and don't let those comments get you down.
Personally speaking – my heuristic is that maintainers who put effort into marketing copy (even if it's awkwardly exaggerated) are the kind of people who really want their users to enjoy the project, and that often predicts a low-friction experience. Keep doing what you're doing!
Dude, this looks really nice! I use nested stuff like that all the time - nested layers of flat data ;) - and glom (+T) do seem super appropriate and nice!
I was frustrated. I apologize. I do still believe the frequency and intensity of hyperbolic language is a real obstacle to understanding and appreciating your project, and I hope you take that feedback to heart. But to call you "extremely proud of yourself" was unnecessarily personal, and I'm sorry I said that.
You're yellin at a tutorial man. You gotta let some flavor text slide. :)
That said, I'm no liar. Kurt and I (as a team), really did write stuff leading up to glom in 2013 (years ago), and have written stuff like it enough times that I've lost count (countless :P). If this isn't research, I don't know what is. Heck, I'm even getting a fun little peer review!
You've created something useful and it will save a lot of programmers a ton of time.
I didn't find your writing insufferable. I've written tongue in cheek (or over the top) posts about my projects in the past. If they can't see the humor and the usefulness of the project, their loss.
Thank you for creating Glom and thank you for posting it on HN. Count me in as one of your users.
don't sweat it dude. I've noticed a lot of people on hn are crabby assholes for absolutely no good reason. like on the post linking to Google's codelabs (where there are hundreds of tuts about all sorts of things in the Google ecosystem) there were only two comments and they were complaints. and recall that every time an electron app is posted almost every comment is whining about the performance. and every time a rust article is posted there's whining about how it's more complicated than js. and every time there's a js article posted there's whining about how it's not type safe like rust. and every time someone posts a personal page someone has to point out how it's "garbage on mobile" as if they're doing people a favor pointing out flaws (as if they don't understand that mobile is the most heterogeneous platform out there). I swear people don't know how to be grateful for free shit or just keep their mouths shut when something doesn't tickle their own particular fancy. I wager it's a defense mechanism because they themselves aren't making anything and so they need to assert their superiority in some way (because people that are busy doing stuff don't have time to complain about things irrelevant to their own work). kudos to you and Kurt for releasing a library to the community that's different and interesting and fuck the haters.
you know what's actually insufferable? taking pot shots at someone giving you something for free. either say thank you or move on. it's like yelling at your mom for making you breakfast in the morning: downright unseemly.
if the condescending tone of the top comment is ignored it becomes valuable feedback. good libraries/api's (free or otherwise) do not need to use marketing buzzwords to sell themselves when a clear demonstration of the functionality is usually more than enough.
see the python requests library documentation for a good example
Did you seriously just recommend Requests, a project which uses "Non-GMO", "organic", and "grass-fed" to describe itself on the very page you linked, as a good example of not using buzzwords?
Come on. kreitz holds the title of Python marketspeak tycoon for a reason. :P
>if the condescending tone of the top comment is ignored it becomes valuable feedback. good libraries/api's (free or otherwise) do not need to use marketing buzzwords to sell themselves when a clear demonstration of the functionality is usually more than enough.
this fallacy is called affirming the consequent. yes good libraries might not need marketing but that does not say anything about whether good libraries can have marketing.
it's anyone's guess whether APIs/libraries whose documentation uses buzzword-y marketing get more usage than those that market themselves entirely on functionality.
Think of buzzwords as familiar faces for the readers. Sure, you can overdo it and make a buzzword soup, but a few buzzwords can give the reader a quick idea of the product.
I think the more practical piece of advice is for people like zestyping to constructively offer their (valid) perspective on writing style without personalizing the criticism. I recognize however that offering such advice may be a fruitless endeavor depending on the person (like expecting a leopard to change its spots). Source: zestyping needlessly insulted me in front of colleagues over 10 years ago and it still stings a bit :-D
Publicly deriding me as having an irreparable character flaw, based on something I said over ten years ago, which I can't possibly defend or apologize for because I have no idea who you are or what you're referring to—doesn't that seem a little low, though?
It sounds like this is still bothering you after all this time. Please consider reaching out to me (my e-mail address is my HN username at gmail); I'd be glad if we could sort this out in a private conversation. I can't promise that I'll take back what I said without knowing what it was, but I will do my best to understand what you experienced and why it was upsetting to you.
I prefer the `get_in()` method from Toolz: http://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoo...