Glom – Restructured Data for Python (sedimental.org)
313 points by mhashemi on May 9, 2018 | 97 comments



It's a nice idea, but I never like writing what amounts to a DSL in strings in my code (yes, that applies to in-code SQL as well, although that's often unavoidable).

I prefer the `get_in()` function from Toolz: http://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoo...


I agree, I don't like the magic string approach (even if it is mostly just dot-notation attribute lookup). However, there is some good stuff here, and nested data lookup when value existence is unknown is a pain point for me.

In addition to the string based lookup, it looks like there is an attempt at a pythonic approach:

  from glom import T
  spec = T['system']['planets'][-1].values()
  glom(target, spec)
  # ['jupiter', 69]
For me though, while I can understand what is going on, it doesn't feel pythonic.

Here's what I would love to see:

  from glom import nested
  nested(target)['system']['planets'][-1].values()
And I would love (perhaps debatably) for that to be effectively equivalent to:

  nested(target).system.planets[-1].values()
Possible?

--- edit: Ignore the above idea. I thought about this a bit more and the issue that your T object solves is that in my version:

  nested(target)['system']
the result is ambiguous. Is this the end of the path query, which should return the original non-defaulting dict, or the middle of the path query, which should return a defaulting dict? Unknown.

The T object is a good solution for this.


Objects are overkill. Use an iterable of getitem lookups. They compose easily and don’t have much overhead (cpu or cognitive). From the library I use at work, inspired by clojure.core/get-in, it would look like this:

    def get_in(obj, lookup, default=None):
        """Walk obj via __getitem__ for each key in lookup,
        returning the final value of the lookup or default.
        """
        tmp = obj
        for key in lookup:
            try:
                tmp = tmp[key]
            except (KeyError, IndexError, TypeError):
                return default
        return tmp


    data = {"foo": {"bar": ["spam", "eggs"]}}
    # find eggs
    get_in(data, ["foo", "bar", 1])
By using __getitem__ you naturally work with anything in the python ecosystem.
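
For instance (a quick sketch using the get_in above; the data is made up), mixed dicts, lists, and tuples all walk the same way, since they all implement __getitem__:

    mixed = {"servers": [("web01", {"port": 8080})]}
    get_in(mixed, ["servers", 0, 1, "port"])
    # 8080
    get_in(mixed, ["servers", 0, 1, "host"], default="unknown")
    # 'unknown'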


I prefer your nested() version. I don't see why your final example would be ambiguous either: if it always returns a defaulting dict which you then have to extract with .values(), there is no ambiguity. This is similar to how Java's Optional<T> works. The problem could be that the type checker isn't good enough, and code like `if nested(target)['system'] == "lazer"` could pass the type checker.


The other problem that T solves and nested doesn't is being able to reuse a spec. Once you have a spec, whether created with T or directly, you can call glom multiple times with the same spec, pass the spec to another function, etc.
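
For example (a rough sketch; the target dict is borrowed from the post's examples and the helper name is made up):

  from glom import glom, T

  target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
                                   {'name': 'jupiter', 'moons': 69}]}}

  last_planet = T['system']['planets'][-1]  # build the spec once

  def moon_count(t, spec=last_planet):
      # the spec is just a value: it can be passed around and reused
      return glom(t, spec)['moons']

  glom(target, last_planet)  # {'name': 'jupiter', 'moons': 69}
  moon_count(target)         # 69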


That’s called defining a function.


My thoughts also turned to toolz.

Here's an example comparison:

glom

  glom(target, ('system.planets', ['name']))
  # ['earth', 'jupiter']
toolz

  list(pluck('name', get_in(('system', 'planets'), target)))
  # ['earth', 'jupiter']


The real advantage over toolz/get_in is in the breadth of types glom can support (http://glom.readthedocs.io/en/latest/api.html#setup-and-regi...), the extensive fallback behavior (http://glom.readthedocs.io/en/latest/api.html#advanced-speci...), and "T" specifier (https://sedimental.org/glom_restructured_data.html#true-pyth...), which allows performing object-oriented traversals and calls.

Check the post for more examples of where path-based access falls short of the mark!

(Sidenote: I find it very clunky that get_in() takes a "default" kwarg and _also_ a "no_default" kwarg. Use a Sentinel object! http://boltons.readthedocs.io/en/latest/typeutils.html#bolto... )
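
For instance, a sketch of the sentinel-default pattern (not the toolz API; the _MISSING name and this get_in body are made up for illustration):

  from boltons.typeutils import make_sentinel

  _MISSING = make_sentinel(var_name='_MISSING')  # readable repr, pickles by name

  def get_in(obj, lookup, default=_MISSING):
      tmp = obj
      for key in lookup:
          try:
              tmp = tmp[key]
          except (KeyError, IndexError, TypeError):
              if default is _MISSING:
                  raise  # no default given: let the original error propagate
              return default
      return tmp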


Sorry to double reply, but yes the "T" spec is exactly what I would want. IMO it's yet another step better than using a "spec" made of untyped nested lists and dicts.

DSL in strings = bad

DSL in native syntax = good


Sentinel is one of my favorite objects in Python.


What's the advantage of make_sentinel vs just doing _MISSING = object()?


A Sentinel is guaranteed unique and distinct from all other objects.


I'm fairly sure object() also returns an object which is unique and distinct from all other objects. The only difference as far as I can see is that make_sentinel returns an object that has a unique and distinct type from all other objects, but I don't see why you'd be checking the type of your sentinels in Python.


Hehe, boltons.typeutils.make_sentinel should really be documented better for more experienced developers. A few small advantages: a nice repr, pickleability, and (back to the first advantage really) good rendering in a Sphinx autodoc context. :)


a nice __repr__ is the main thing


I feel the same way, although my instinct is generally to build a custom generator. It only costs a couple of lines, but it's plain old Python and quite explicit:

    target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
                                     {'name': 'jupiter', 'moons': 69}]}}

    glom(target, {'moon_count': ('system.planets', ['moons'], sum)})
    # vs
    def iter_moons(t):
        for planet in t['system']['planets']:
            yield planet['moons']

    sum(iter_moons(target))
You would have to combine this with `defaultdict`s if your nested data is only sometimes there, though.


For simple cases like this it doesn't even cost a couple lines as sum() can take a generator expression:

  sum(planet['moons'] for planet in target['system']['planets'])


Combining ideas from parent and grandparent:

  sum(planet.get('moons', 0) for planet in target['system']['planets'])


That's a great example! Mind if I use it? ;)


You can also use a list comprehension like so.

>>> sum([x['moons'] for x in target['system']['planets']])


the only string parsing is 'a.b.c', which is mostly equivalent to T.a.b.c so you can completely ignore that capability

the very slight difference is that using T you must be explicit about attribute access vs key access, whereas 'a.b.c' will try both
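
A quick sketch of that difference (the Sys class and data are made up; glom and T are from the library):

  from glom import glom, T

  class Sys(object):
      planets = ['earth', 'jupiter']

  target = {'system': Sys()}

  glom(target, 'system.planets')     # ['earth', 'jupiter']: the string spec tries keys and attributes
  glom(target, T['system'].planets)  # ['earth', 'jupiter']: key access, then explicit attribute access
  # glom(target, T['system']['planets']) would fail here, since Sys has no __getitem__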


Why is writing DSLs strictly worse than writing complicated transformations built on top of the limited constructs provided by the language itself? (You said never)


The issue isn't DSLs, it's string DSLs. Doing it within the language can provide stronger guarantees.

As a simple example, you get some syntax checking.


Precisely. Until my autocomplete, syntax highlighting, and linting engines recognize it, I'll pass.


I think y'all might be getting a bit up in arms about a non-issue. Seasoned glommers (and experienced Pythonistas) will almost always be using T: http://glom.readthedocs.io/en/latest/api.html#object-oriente...

While mentioned in the post, it's not front-and-center because that's the sort of Python superpower that can look scary to less-experienced devs.


They’re pretty comparable when the strings are constants. The fun begins when the string is dynamically created.


I wouldn't recommend dynamically generating strings. Strings are just a shorthand for Path objects, use Path instead: http://glom.readthedocs.io/en/latest/api.html#specifier-type...


I definitely wouldn't recommend it either. Sorry not to be clear.

In my experience, it's almost impossible to dissuade people from generating strings if the API affords it. We can't even stop people from generating SQL by concatenation!

Musing: does Python have a way (Mypy?) to declare that a method can accept only constant expressions?


This might have been unintentional, but I suspect "Spectre of Structure" and "Python's missing piece" refer to Nathan Marz's Specter library for Clojure [1], similarly touted as Clojure's missing piece. I tend to agree in the case of Specter, given the mind-boggling types of transformations that are easily (and simply) expressed in it (and often run faster than idiomatic Clojure as well). Highly recommended if you ever need to work with deeply nested data structures.

[1] https://github.com/nathanmarz/specter


Total coincidence! Reading the README, Nathan and I are definitely on the same wavelength though. When I get a chance I'll add it to the analogies doc: http://glom.readthedocs.io/en/latest/by_analogy.html :)


What's really interesting is that we're approaching the same ideal state from different directions. Specter goes from Clojure's immutability toward something more practical; glom goes from Python's super-dynamic system toward something more declarative and immutable.


This is really cool. Did you ever consider an API to do the reverse - to insert a value at a particular point in the data?

My interest stems from this issue[0] on the Ruby issue tracker to make a symmetrical method to Hash#dig (which does something similar to, but more limited than glom) called Hash#bury. The problem in the issue was that inserting a value at a given index in an array proved difficult and unnatural in Ruby, so I was wondering if there were other solutions out there.

Another question occurs to me - does glom only support string keys?

[0]: https://bugs.ruby-lang.org/issues/11747


glom not only supports more than string keys, it also supports assigning to non-dictionary objects. That's a part of the API we're working on right now, actually.

As for the data insertion, mutation may be in the future, but for now glom only transforms and returns new objects. Definitely something to think about though, bookmarked! :)


Check the updateIn, mergeIn, mergeDeepIn methods in immutable.js. Maybe even asMutable, asImmutable for mutation.



'string-key', T['string-key'], T[not-string-key] :-)


There is also this Python lens library. https://github.com/ingolemo/python-lenses

I can't say how they compare, but they have some overlapping features.


My favourite approach to this so far, that I would like other libraries to copy, is Elixir’s Access protocol, which gives you e.g.:

    foo = %{key: [[1, 2], [3, 4], [5, 6]]}

    path = [:key, Access.all, Access.at(0)]

    get_in foo, path
    # => [1, 3, 5]

    update_in foo, path, &(&1 * 10)
    # => %{key: [[10, 2], [30, 4], [50, 6]]}

    foo
    |> put_in([:key, Access.all], "foo")
    |> put_in([:new_key], "bar")
    # => %{key: ["foo", "foo", "foo"], new_key: "bar"}
That third form is essentially the equivalent of building up a complex object through a series of mutations—but entirely functional.


This seems like lenses for python... neat! I often use python to mess around with things, and almost always miss Haskell's lenses when doing so. This seems like an interesting solution.


I haven't played with this yet, but it looks really handy. I deal with much more JSON on the command line than I'd like, so I think having both a single library and command line tool to reshape that data will make that much easier. I've used jq a few times, but when I want to move a little beyond what it does I usually end up writing a Python script. Hopefully this will make that transition smoother.


Haha, I'm all for console usage, but let me tell you, there's nothing quite like that feeling of moving a working spec into a dedicated application with exception handling, logging, etc. :)


I'm not really versed in the idioms/social mores of Python, so please take the following with a grain of salt:

This seems like it usefully solves a problem, but the invocation pattern is suspect to me. Instead of "glom" taking the target for picking apart plus a magic little bit of DSL, what if "glom" took a single parameter, the aforementioned DSL, and returned a function that would perform the corresponding search when called on a target? Even if Python or this package optimises away repeatedly searching by the same spec or in the same manner, the convention the package prescribes seems odd to me, right after the first few paragraphs of intro.
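
Something close to that style is possible today with functools.partial (a sketch, assuming glom's second parameter is named "spec"; the helper name and data are made up):

    from functools import partial
    from glom import glom

    get_planet_names = partial(glom, spec=('system.planets', ['name']))

    get_planet_names({'system': {'planets': [{'name': 'earth'}, {'name': 'jupiter'}]}})
    # ['earth', 'jupiter']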


The big, classical school of Python definitely prefers top-level functions. Still, I understand that aesthetic, and am on board with providing it without making users reach for functools.partial. So: https://github.com/mahmoud/glom/issues/14 :)


The Python regex library does this, optionally (via re.compile).


Similarly, statistical distributions in SciPy can be used in "frozen" form (pre-parameterized) or in a more general form where you supply the parameters at the same time you are requesting some attribute of the distribution. It seems to me to be a situation where one is useful if you expect reuse, and the other is useful if you don't.


It seems to me like the advantage to focus on here is the improved error / `None` handling, which will speed debugging and make handling expected edge cases easier. I've seen a lot of inexperienced developers tripped up entirely by this kind of data access, and seen plenty of experienced developers waste time debugging it because of the exact error cases the announcement references.

The `T` object, which the article describes as its most powerful feature, can be a useful pattern in some situations, but it's worth pointing out that it isn't new or unique to this project.

The author says in another thread here that he first started working on the "stuff leading up to glom" in 2013. One older example, which is virtually identical though less complete, is this Stack Overflow answer I posted in 2012: https://stackoverflow.com/a/9920723/500584

I'd seen the general pattern even before that post, if not the Pythonic syntax. I don't think that it's much of an improvement over defining a `lambda`, so again I would say the thing to focus on is the improved debugability and the simpler, dot-notation-as-generic-attribute-or-item-accessor syntax. I think `T` is largely a distraction, or should be reserved for advanced users.


I would like to see the author debugging an application with 10 levels of object wrapping where one of the middle objects' names is misspelled.

Libraries like these shine only if they have brilliant tracing and debugging capabilities; otherwise they are too easy to reduce to literally a single function.


http://glom.readthedocs.io/en/latest/api.html#debugging

affordances to add tracing prints, or drop into a pdb at any level

The Inspect specifier type provides a way to get visibility into glom’s evaluation of a specification, enabling debugging of those tricky problems that may arise with unexpected data.

Inspect can be inserted into an existing spec in one of two ways. First, as a wrapper around the spec in question, or second, as an argument-less placeholder wherever a spec could be.

Inspect supports several modes, controlled by keyword arguments. Its default, no-argument mode, simply echoes the state of the glom at the point where it appears:
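
For instance, a hedged sketch of the wrapper form (the echoed output format here is from memory and may differ):

  from glom import glom, Inspect

  target = {'system': {'planets': [{'name': 'earth'}, {'name': 'jupiter'}]}}

  glom(target, ('system.planets', Inspect(['name'])))
  # prints the path, target, and spec being evaluated at that point,
  # then returns ['earth', 'jupiter'] as usual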


It looks quite similar in spirit to Clojure's Specter library (https://github.com/nathanmarz/specter), and even seems to have a nod to it (The Spectre of Structure).


Oh nice! Total coincidence, I assure you. Still, I should read on this and add it to the analogy doc: http://glom.readthedocs.io/en/latest/by_analogy.html

Declarative data transformation generates a lot of comparisons (almost all of them great, though!).


Ah you beat me to it! :)


Looks really neat.

Striking a balance between ease of use / simplicity and powerful features is a tough exercise but you did well.

I can foresee the CLI being quite useful to do away with the run-of-the-mill sed / awk / grep [...] mess. Specifically for the less CLI inclined people out there.


in a similar spirit, i wrote "sanest", sane nested objects, tailored specifically for json formats: https://sanest.readthedocs.io/

it does not have the exact same feature set though. my focus was mostly on both reading and modifying nested structures in a type safe way.


Compare/contrast with pstar: https://github.com/iansf/pstar


Can it be used bidirectionally, without having to repeat the work?

I have a need to transform between pairs of structures, in both directions, and ever since I found JsonGrammar (https://github.com/MedeaMelana/JsonGrammar2) I've been pining for a Python version.


It depends on the complexity of the spec, but we've already done some programmatic building of glomspecs, so for many cases I think the answer is yes! Once we feel out the patterns I think glom will gain some utilities for this purpose.


Thanks, that's awesome!


It seems like a subset of glom specs would be uniquely invertable. For example, the spec `{'c': 'a.b'}` could trivially invert to `{'a.b': 'c'}`. I'm not sure how you'd invert more complex specs which make function calls, e.g. sum or len.
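
As a toy illustration of that "uniquely invertible" subset (the helper name is made up, and it only handles flat {output_key: 'dotted.path'} specs):

  def invert_spec(spec):
      # invert a spec like {'c': 'a.b'} into {'a.b': 'c'};
      # anything involving function calls (sum, len, ...) has no obvious inverse
      return {path: key for key, path in spec.items()}

  invert_spec({'c': 'a.b'})
  # {'a.b': 'c'}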


I had a quick look, but I didn't see filtering expressions, only shaping expressions. It seems like glom is more of a result shaper/mapper. Can you filter with glom (maybe with lambdas or something)? I could see the two going together quite well if you were "glomming" a big Python object.


Filtering is supported through the OMIT value: http://glom.readthedocs.io/en/latest/api.html#glom.OMIT (another example: http://glom.readthedocs.io/en/latest/snippets.html#filtered-... )

Lambdas and functions are always a safe fallback, but glom does its best to keep your specs readable and roundtrippable (gotta love a nice repr()).
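
A rough sketch of the OMIT pattern from those docs (from memory; the target reuses the post's example data):

  from glom import glom, OMIT

  target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
                                   {'name': 'jupiter', 'moons': 69}]}}

  glom(target, ('system.planets',
                [lambda p: p['name'] if p['moons'] > 1 else OMIT]))
  # ['jupiter']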


That's cool! Thank you for pointing it out!


There is already a well established Gnome project with the same name: http://www.glom.org It is a GTK+ front-end to PostgreSQL, similar to Microsoft Access.


This reminds me a little of the excellent dpath lib: https://github.com/akesterson/dpath-python


I'm probably being dense, but I don't see a good description of the input data types supported - the CLI says "json or python".

It would be great to have clarification if this is JSON only, or supports other data structures, or parsers could be plugged in?


From within Python, all objects are supported by default. If you can parse it, you can glom it. You can even register additional behaviors for specific types to keep your specs tight: http://glom.readthedocs.io/en/latest/api.html#setup-and-regi... (example: http://glom.readthedocs.io/en/latest/snippets.html#automatic...)

The CLI is in a pretty preliminary state, usable but not as robust as it will be in a few weeks. It only supports built-in parsers (JSON and Python literals). What formats are you thinking? YAML?


Aha. Thanks! I understand better now, I didn't realize it was so general-purpose :)


This is pretty neat and is something I've been thinking about for a current project.

Does anyone know if something similar exists in Java/Scala land?


I think this is similar to lenses in FP languages - check out Monocle for Scala.


Aside: are there any libraries for JSON or dict-like formats with xpath-style querying that are as quick under the hood as lxml is for xml?


I think this project marks the first step in transitioning pip to a JS-like repository of single-function modules. Hurray for kool-aid.


I'm curious about how the author replaced DRF with glom.


DRF still takes care of negotiating formats, etc., but glom replaces the serializers. I'll see about getting an example in the repo, stay tuned.


Shameless plug: https://www.npmjs.com/package/safely-nested

Nothing special, glom just reminded me of it.


Looks slick!


The writing style is just insufferable. Even the API documentation is littered with hyperbole and self-congratulation. We get it, you're proud of your work and extremely proud of yourself.

> "as simple and powerful as glom"

> "big things come in small packages"

> "small API with big functionality"

> "power is only surpassed by its intuitiveness"

> "simplicity is only surpassed by its utility"

> "shortest-named feature may be its most powerful"

For heaven's sake, give it a rest!

It's a big red flag about your priorities that when I go looking for a precise specification, I can't find answers to simple questions and instead end up wading through incessant marketing phrases. I tried, and I finally gave up halfway through the API doc. It might even be the case that glom is a good idea—but you're making it really hard to trust you as a source of objective information about it.

Show, don't tell. My advice to you: you'll generate more interest if you delete every congratulatory word on those pages and focus entirely on helping your readers understand what glom does instead of trying to sell it to them.


Hey Ka-Ping! Maybe I did get carried away :)

How I wish one could publish a dry document and expect people to read all the way to the bottom. I've published enough libraries to know that's not the case. glom's free software so it's all there, as "shown" as can be.

But referring you to the code wouldn't be very considerate either. Instead, here's this literate code version that I prepared in advance. Hopefully this will be of more help to you: http://glom.readthedocs.io/en/latest/faq.html#how-does-glom-...


Thank you for glom, and don't let those comments get you down.

Personally speaking – my heuristic is that maintainers who put effort into marketing copy (even if it's awkwardly exaggerated) are the kind of people who really want their users to enjoy the project, and that often predicts a low-friction experience. Keep doing what you're doing!


Dude, this looks really nice! I use nested stuff like that all the time - nested layers of flat data ;) - and glom (+T) do seem super appropriate and nice!


My only concern is that the documentation renders poorly on mobile. So I couldn't read the right half of the page.


I was frustrated. I apologize. I do still believe the frequency and intensity of hyperbolic language is a real obstacle to understanding and appreciating your project, and I hope you take that feedback to heart. But to call you "extremely proud of yourself" was unnecessarily personal, and I'm sorry I said that.


Looks like a nice package. I will try it in my next project. The upcoming features sounded interesting. Keep up the good work!


In http://glom.readthedocs.io/en/latest/tutorial.html#access-gr...,

> After years of research and countless iterations, the glom team landed on this simple construct:

'years of research', 'countless iterations', 'glom team', really?


You're yellin at a tutorial man. You gotta let some flavor text slide. :)

That said, I'm no liar. Kurt and I (as a team) really did write stuff leading up to glom in 2013 (years ago), and have written stuff like it enough times that I've lost count (countless :P). If this isn't research, I don't know what is. Heck, I'm even getting a fun little peer review!


You've created something useful and it will save a lot of programmers a ton of time.

I didn't find your writing insufferable. I've written tongue in cheek (or over the top) posts about my projects in the past. If they can't see the humor and the usefulness of the project, their loss.

Thank you for creating Glom and thank you for posting it on HN. Count me in as one of your users.


don't sweat it dude. I've noticed a lot of people on hn are crabby assholes for absolutely no good reason. like on the post linking to Google's codelabs (where there are hundreds of tuts about all sorts of things in the Google ecosystem) there were only two comments and they were complaints. and recall that every time an electron app is posted almost every comment is whining about the performance. and every time a rust article is posted there's whining about how it's more complicated than js. and every time there's a js article posted there's whining about how it's not type safe like rust. and every time someone posts a personal page someone has to point out how it's "garbage on mobile" as if they're doing people a favor pointing out flaws (as if they don't understand that mobile is the most heterogeneous platform out there). I swear people don't know how to be grateful for free shit or just keep their mouths shut when something doesn't tickle their own particular fancy. I wager it's a defense mechanism because they themselves aren't making anything and so they need assert their superiority in some way (because people that are busy doing stuff don't have time to complain about things irrelevant to their own work). kudos to you and Kurt for releasing a library to the community that's different and interesting and fuck the haters.


This needs to become a copy pasta + a version where we s/hn/reddit/g and every friggin salty post needs it shoved to them.


Also, often it's better if power surpasses intuitiveness a bit. :)


you know what's actually insufferable? taking pot shots at someone giving you something for free. either say thank you or move on. it's like yelling at your mom for making you breakfast in the morning: downright unseemly.


if the condescending tone of the top comment is ignored it becomes valuable feedback. good libraries/api's (free or otherwise) do not need to use marketing buzzwords to sell themselves when a clear demonstration of the functionality is usually more than enough.

see the python requests library documentation for a good example

http://docs.python-requests.org/en/master/


Did you seriously just recommend Requests, a project which uses "Non-GMO", "organic", and "grass-fed" to describe itself on the very page you linked, as a good example of not using buzzwords?

Come on. kreitz holds the title of Python marketspeak tycoon for a reason. :P


You forgot the line that's the biggest offender of all, "... for Humans".


>if the condescending tone of the top comment is ignored it becomes valuable feedback. good libraries/api's (free or otherwise) do not need to use marketing buzzwords to sell themselves when a clear demonstration of the functionality is usually more than enough.

this fallacy is called affirming the consequent. yes good libraries might not need marketing but that does not say anything about whether good libraries can have marketing.


It's anyone's best guess whether APIs/libraries whose documentation has buzzword-y marketing get more usage than those that market themselves entirely on functionality.


Think of buzzwords as familiar faces for the readers. Sure, you can overdo it and make a buzzword soup, but a few buzzwords can give the reader a quick idea of the product.


I think the more practical piece of advice is for people like zestyping to constructively offer their (valid) perspective on writing style without personalizing the criticism. I recognize however that offering such advice may be a fruitless endeavor depending on the person (like expecting a leopard to change its spots). Source: zestyping needlessly insulted me in front of colleagues over 10 years ago and it still stings a bit :-D


I think that practical advice is good.

Publicly deriding me as having an irreparable character flaw, based on something I said over ten years ago, which I can't possibly defend or apologize for because I have no idea who you are or what you're referring to—doesn't that seem a little low, though?

It sounds like this is still bothering you after all this time. Please consider reaching out to me (my e-mail address is my HN username at gmail); I'd be glad if we could sort this out in a private conversation. I can't promise that I'll take back what I said without knowing what it was, but I will do my best to understand what you experienced and why it was upsetting to you.


Cool... it looks like... errr... javascript :D



