Gron – Make JSON Greppable (github.com/tomnomnom)
366 points by capableweb on Nov 6, 2020 | 91 comments



From the FAQ, before someone asks the obvious:

Why shouldn't I just use jq?

jq is awesome, and a lot more powerful than gron, but with that power comes complexity. gron aims to make it easier to use the tools you already know, like grep and sed.

gron's primary purpose is to make it easy to find the path to a value in a deeply nested JSON blob when you don't already know the structure; much of jq's power is unlocked only once you know that structure.
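
To make that concrete, a toy blob of my own (output format per the README):

    $ echo '{"user":{"posts":[{"title":"hi"}]}}' | gron
    json = {};
    json.user = {};
    json.user.posts = [];
    json.user.posts[0] = {};
    json.user.posts[0].title = "hi";
    $ echo '{"user":{"posts":[{"title":"hi"}]}}' | gron | grep title
    json.user.posts[0].title = "hi";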


In simpler words, as a user: The jq query language [1] is obtuse, obscure, and incredibly hard to learn if you need it only for quick one-liners once in a blue moon. I've tried, believe me, but I should probably spend that much effort learning Chinese instead.

It's just operating at the wrong abstraction level, whereas gron is orders of magnitude easier to understand and _explore_.

1: https://stedolan.github.io/jq/manual/


> In simpler words, as a user: The jq query language [1] is obtuse, obscure, and incredibly hard to learn if you need it only for quick one-liners once in a blue moon.

I don't agree that jq's query language is obtuse. It's a DSL for JSON document trees, and it's largely unfamiliar, but so is xpath or any other DOM transformation language.

The same thing is said about regex.

My take is that "it's obtuse" just translates to "I'm not familiar with it and I never bothered to get acquainted with it".

One thing we can agree on, though, is that jq's docs are awful at providing a decent tutorial for new users to ramp up.


I'm pretty good with regular expressions. I have spent a lot of time trying to get familiar with jq. The problem is that I never use it outside of parsing JSON files, yet I use regular expressions all over the place: on the command line, in Python and Javascript and Java code. They are widely applicable. Their syntax is terse, but relatively small.

jq has never come naturally. Every time I try to intuit how to do something, my intuition fails. This is despite having read its man page a dozen times or more, and consulted it even more frequently than that.

I've spent 20+ years on the Unix command line. I know my way around most of it. I can use sed and awk and perl to great effect. But I just can't seem to get jq to stick.

Aside, but there are a lot of times when it's "I know jq can do this, but I forget exactly how, let me find it in the man page" and then... I find jq's man page as difficult as jq itself when trying to use it as a reference.

Anyway, $0.02.

Edited to add: as a basic query language, I find it easy to use. It's when I'm dealing with json that embeds literal json strings that need to be parsed as json a second time, or when I'm trying to manipulate one or more fields in some way before outputting that I struggle. So it's when I'm trying to compose filters and functions inside jq that I find it hard to use.
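
For the doubly-encoded case, `fromjson` is the part I always have to look up. A toy example (made-up payload):

    $ echo '{"payload":"{\"id\":7}"}' | jq '.payload | fromjson | .id'
    7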


Agreed. I only use jq once a month, or once every two months at most. Every time I want to do something I just search for my use case, since I can't seem to remember the syntax.


But I have tried to learn jq's syntax (it's pretty much a mini-language) and it has been incredibly difficult.

I also remember that when I first tried learning regex it was also very difficult. That is, until I learned about finite state machines and regular languages; after that CS fundamentals class I was able to make sense of regex in a way that stuck.

Is there a comparable theory for jq's mini-language?


Not a theory per se, but my "lightbulb moment" with jq came when I thought about it like this:

jq is basically a templating language, like Jsonnet or Jinja2. What jq calls a "filter" can also be called a template for the output format.

Like any template, a jq filter will have the same structure as the desired output, but may also include dynamically calculated (interpolated) data, which can be a selection from the input data.

So, at a high level, write your filter to look like your output, with hardcoded data. Then, replace the "dynamic" parts of the output data with selectors over the input.

Don't worry about any of the other features (e.g. conditionals, variables) until you need them to write your "selectors."
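
A toy example of that shape-first approach (made-up input; the filter mirrors the output):

    $ echo '{"user":{"name":"Sam","langs":["go","c"]}}' \
        | jq '{author: .user.name, first_lang: .user.langs[0]}'
    {
      "author": "Sam",
      "first_lang": "go"
    }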

YMMV, but that's what's worked for me


I don't know of any formal theory, but it feels a bit like functional programming because you don't often use variables (an advanced feature, as the manual says). I kind of got a feel for it by realizing that it wants to push a stream of objects through transformations, and that's about it. A few operators/functions can "split" the stream, or pull the stream back into place. Like, uh,

    $ cat in.json
    {"a":1,"b":2}
    $ jq -c '{a}' in.json
    {"a":1}

The . is the current stream, so if I just do ". , .", it's kind of pushing two streams along:

    $ jq -c '.,. | {a}' in.json
    {"a":1}
    {"a":1}

Then, of course, say:

    $ jq -c '{a, b, c: .}' in.json
    {"a":1,"b":2,"c":{"a":1,"b":2}}

It was going through the . stream, and I pulled the . stream right back in while doing so.

So it kind of helps to keep straight in my head when I've kind of got multiple streams going, vs multiple values.

Someone (almost anyone) can probably explain better with formal theory, but I just kind of got a feel for it and kind of describe it like this.


I think it's more like "I'm not familiar with it and getting it to do something that seems like it should be easy is surprisingly hard, even though I'm putting in some effort." I've become pretty good at jq lately, but for several years before that I would occasionally have some problem that I knew jq could solve, and resolved to sit down and learn the damn thing already, and every time, found it surprisingly difficult. Until you get a really good understanding of it (and good techniques for debugging your expressions), it's often easier just to write a python script.

I love jq, and without detracting from it, gron looks like an extremely useful, "less difficult" complement to it.


Adding: in fact, gron's simplicity is downright inspired. It looks like all it does is convert your json blob into a bunch of assignment statements containing the full path from root to node, plus the ability to parse that back into an object. Not sure why I didn't think of that intermediate form being way more greppable. Kudos to the author.

Just as an example, this just took me about a minute to get the data I wanted, whereas I probably spent a half an hour on it yesterday with jq:

  curl -s https://static01.nyt.com/elections-assets/2020/data/api/2020-11-03/national-map-page/national/president.json | gron | grep -E 'races.*(leader_margin_votes|leader_margin_name_display|state_name)' | grep -vE 'townships|counties' | gron -ungron


If it had a name that didn't collide horribly with jQuery in search, I think it would be fine.


I spend a decent amount of time at the command line wrangling data files. It's fun for me to get clever with other tools like awk and perl when stringing together one liners and I enjoy building my knowledge of these tools, but jq has just never stuck.


Is it possible that you learned awk and Perl when you were but a child, and now your aging system is becoming read-only?

- a fellow read-only system


Quite possibly; I did first play with Perl about 15 years before encountering jq. Some days I do feel as though my head is simply out of room, as my brain has been replaced by a large heap of curly braces, semicolons and stack traces.


read-only - getting older it sure feels like that.


I mean, I've been using grep and sed for 15 years now and I still struggle with anything beyond matching literals, since they use a "nonstandard" regexp syntax and the GNU and BSD variants behave very differently, making for a lot of bugs in scripts that need to work on both Linux and macOS. (Of course you can install GNU tools on macOS and BSD tools on Linux, but the whole advantage of bash scripts is that you can assume certain things are installed on the user's system; if you can't satisfy that assumption you may as well use Python or similar.)

I think gron has value for those simpler grep cases, but for anything beyond that, jq is the way to go. (Incidentally, I'm very dissatisfied with all of the tools that aspire to be "jq for yaml", and with the relative dearth of tools for converting YAML to JSON on the command line.)


> GNU and BSD variants behave very differently, making for a lot of bugs in scripts that need to work on both Linux and macOS

Perl shines for this use case (assuming it is present on the machines you are working with). It is slower than grep/sed/awk for most cases, but it is more powerful and more portable across platforms.

>converting YAML to JSON on the command line

check out https://github.com/bronze1man/yaml2json


> Perl shines for this use case (assuming it is present on the machines you are working with). It is slower than grep/sed/awk for most cases, but it is more powerful and more portable across platforms.

Agreed.

For better or worse, when performance is not a concern in my scripts, I just shell out to "perl -pe" rather than trying to deal with grep, sed or awk.
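
For example (a trivial sketch; any perl 5 behaves the same under GNU or BSD userland):

    $ printf 'foo123 bar\n' | perl -pe 's/(\d+)/<$1>/g'
    foo<123> bar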

It just works.


`jq` started to click for me after watching this introductory video[1] closely and playing with some examples as I went.

The slides are linked in the video description and at [2]. You'll need them because unfortunately the video is produced in such a way that the speaker video window often obscures important parts of his presentation.

[1] https://www.youtube.com/watch?v=_ZTibHotSew

[2] https://www.slideshare.net/mobile/btiernay/jq-json-like-a-bo...


I gave up on jq and use jello[1][2] instead. gron too looks nice.

[1] https://blog.kellybrazil.com/2020/03/25/jello-the-jq-alterna... [2] https://github.com/kellyjonbrazil/jello


Thanks - I'm glad to find out that I'm not the only one that struggled with jq.


While it's true that jq's DSL has a bit of a learning curve, being able to try expressions and see immediate feedback can help immensely.

Here is a demo of a small script I wrote that shows jq results as you type using FZF: https://asciinema.org/a/349330 (link to script is in the description)

It also includes the ability to easily "bookmark" expressions and return to them, so you don't have to worry about losing an expression that's almost working when you experiment with another one.

As a jq novice, I've personally found it to be super useful.


I really can't relate to the language being hard to learn. I've been using jq for a while now and have only had to look at the docs once when a key contained special characters and dot notation didn't work. I've usually been able to just guess the syntax in a couple of tries, but that might be because I'm just used to using weird notation for manipulating data structures (css, emmet, xpath, various "functional" globs of map/filter/reduce/zip...)


Even simpler: gron is a tool made in the spirit of Unix while jq is not.


It's just an array programming language. Not everything has to be C, and I think it's unfair to call a language obtuse and incredibly hard to learn just because you're not used to the style.


I regularly use jq to summarize the structure of big JSON blobs, using the snippet written here (I alias it to "jq-structure"): https://github.com/stedolan/jq/issues/243#issuecomment-48470...

For example, against the public AWS IP address JSON document, it produces an output like

    $ curl -s 'https://ip-ranges.amazonaws.com/ip-ranges.json' | jq -r '[path(..)|map(if type=="number" then "[]" else tostring end)|join(".")|split(".[]")|join("[]")]|unique|map("."+.)|.[]'
    .
    .createDate
    .ipv6_prefixes
    .ipv6_prefixes[]
    .ipv6_prefixes[].ipv6_prefix
    .ipv6_prefixes[].network_border_group
    .ipv6_prefixes[].region
    .ipv6_prefixes[].service
    .prefixes
    .prefixes[]
    .prefixes[].ip_prefix
    .prefixes[].network_border_group
    .prefixes[].region
    .prefixes[].service
    .syncToken
This plus some copy/paste has worked pretty well for me.


Wow, it looks so easy, why doesn't everybody do that? /s

That jq query looks like an unwise Perl one-liner.


Yes, it is arguably an unholy contrivance, but someone's already written it, and invoking it as a shell alias or likewise is both easy and useful.

    $ jq-structure my-file.json


Hey! That's kind of how I use the CLI (API?) at AWS. It works pretty well! And fortunately (for me), not too much thinking involved.

BTW: I have a D3 front-end dashboard/console for the app (not admin) that makes this a little bit harder, but D3 is pretty organized (and well-documented), if you can figure out what you are trying to do with it.


It feels like finding a deeply nested key in a structured document is a job for XPath. Most people, myself included until recently, don't realize that XPath 3.1 operates on JSON.


FWIW, my two cents:

I like that jq's query expression syntax is command line (bash) friendly. My hunch is that xpath expressions would be awkward to work with.

I've done too much xpath, xquery, xslt, css selectors. For my own work (dog-fooding), I settled on mostly using very simple globbing expressions, then using the host language's 'foreach' equivalent for iterating result sets.

Globbing's double asterisk wildcard is the feature I most miss in other query engines. https://en.wikipedia.org/wiki/Glob_%28programming%29
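
In bash it's opt-in, e.g.:

    # ** matches across directory levels once globstar is enabled
    $ shopt -s globstar
    $ printf '%s\n' data/**/*.json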

Looping back to command-line xpath: there's always some impedance mismatch between the query and host languages. IIRC, one of the shells, like chubot's oilshell or fish?, has more rational expression evaluation (compared to bash).

You especially see this with regexes. It's a major language design fail that others haven't adopted Perl's first-class regex intrinsics. C# has LINQ, sure. But that's more xquery than xpath. And I've never liked xquery.

In other words, "blue collar" programming languages should have intrinsic path expressions. Whatever the syntax.

YMMV.


Is there a command line tool that supports XPath on JSON?


Yes, xidel. The author hangs out here.


I use xmllint for html and xml. I don't think it supports json.


Most people ignore XPath.


Can jq do what gron does?


Technically no, because jq offers no comparable way to interface with the line-based world of unix tools for interop.

Practically, most things you'd do with gron and grep, sed, awk, ... you could do using only jq as well. Jq comes with massive cognitive overhead though, and has a bunch of very unpleasant gotchas (like silently corrupting numbers with absolute value > 2^53, although very recent jq graciously no longer does that, iff you do no processing on the number).

I find jq pretty useful, but I have no love for it.


Actually, I think it might be possible to implement gron in jq: you can produce "plaintext", not just json, and the processing facilities jq offers might be powerful enough to escape everything appropriately. But it's not something I'm curious enough about to actually try.


> Can jq do what gron does?

It really depends on what you want to do, and thus on what you think gron does.

If all you want to do is search for properties with a given value then yes, jq does that very well.

Unlike gron, jq even allows users to output search results as valid json docs. Hell, jq allows users to transform entire JSON docs.

However, if all you want to do is expand the JSON path at each symbol, then I don't know if jq supports that use case. But then again, why would anyone want to do that?


The linked readme demonstrates gron outputting search results as json.


I love that this handles big numbers without modifying them, unlike `jq` - https://github.com/stedolan/jq/issues/2182

   $ echo "{\"a\": 13911860366432393}" | jq "."
   {
     "a": 13911860366432392
   }

   $ echo "{\"a\": 13911860366432393}" | gron | gron -u
   {
     "a": 13911860366432393
   }
I can now happily uninstall `jq`. I've been burned by it way too many times.


ouch, I did not know that! thanks for the warning, need to check if my installed version has the fix already.


I think you should just invest the hour or so it takes to learn jq. Yes, it's far from a programming language design marvel. But it covers all of the edge cases, and once you learn it, you can be very productive. (But the strategy of "copy-paste a one-liner from Stack Overflow the one time a year I need it" isn't going to work.)

I think structured data is so common now, that you have to invest in learning tools for processing it. Personally, I invested the time once, and it saves me every single day. In the past, I would have a question like "which port is this Pod listening on", and write something like "kubectl get pod foo -o yaml | grep port -A 3". Usually you get your answer after manually reading through the false-positives. But with "jq", you can just drive directly to the correct answer: "kubectl get pod foo -o json | jq '.spec.containers[].ports'"

Maybe it's kind of obtuse, but it's worth your time, I promise.


How about a tool which outputs nicely formatted JSON with every line annotated with a jq expression to access that value?

But then I squint a little bit at the default gron (not ungron) output, and that's actually what I see.
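
Squinting made literal (toy input; strip the leading "json" and each path reads as a jq expression):

    $ echo '{"xs":[{"n":1}]}' | gron
    json = {};
    json.xs = [];
    json.xs[0] = {};
    json.xs[0].n = 1;
    $ echo '{"xs":[{"n":1}]}' | jq '.xs[0].n'
    1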


But how do you get that '.spec.containers[].ports'?

It seems to me that for your example use case, gron is at least useful to first understand the json structure before making your jq request. And, for simple use cases like this one, enough to replace jq altogether.


Well, the schema of the JSON is something you have to come up with on your own. I happen to have seen like 8 trillion pod manifests so I know what I'm looking for, but if you don't, you have to figure out the schema in some other way. To reverse engineer something, I usually pipe into keys (jq keys, jq '.spec | keys', jq '.spec.containers[] | keys', etc.)

For Kubernetes specifically, "kubectl explain pod", "kubectl explain pod.spec", etc. will help you find what you're looking for.


> Well, the schema of the JSON is something you have to come up with on your own. I happen to have seen like 8 trillion pod manifests so I know what I'm looking for, but if you don't, you have to figure out the schema in some other way.

Well, or you just do

    kubectl get pod foo -o json | gron | grep port

and you will get the answer to the original question, plus the path.


Better to learn xidel (http://www.videlibri.de/xidel.html), which is standards-based and saner to read.


With respect, it might have been just an hour for you.

For me, it was hours and hours and hours and days and days wasted with jq, before I found gron.

Not looking back.


I don't find this grep-able at all:

    json[0].commit.author.name = "Tom Hudson";
Now I need to escape brackets and dots in regex. Genius!

I have a 5-line (!) jq script that produces this:

    json_0_commit_author_name='Tom Hudson'
This is what I call grep-able. It's also eval-able.

> What if there's a json object with commit and json_commit?

Then I'll use jq to filter it appropriately or change delimiter. The point is ease of use for grep and shell.


I think this is a good point. It's definitely hard to grep for certain parts of gron's output, especially where arrays are involved, because of the square brackets. I find that using fgrep/grep -F can help with that in situations where you don't need regular expressions, though.

It's not an ideal output format for sure, but it does meet some criteria that I considered to be desirable.

Firstly: it's unambiguous. While your suggested format is easier to grep, it is also lossy as you mention. One of my goals with gron was to make the process reversible (i.e. with gron -u), which would not be possible with such a lossy format.

Secondly: it's valid JavaScript. Perhaps that's a minor thing, but it means that the statements are eval-able in either Node.js or in a browser. It's a fairly small thing, but it is something I've used on a few occasions. Using JavaScript syntax also means I didn't need to invent new rules for how things should be done, I could just follow a subset of existing rules.
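
A quick toy check (in Node's sloppy mode, the eval'd assignment creates a global `json`):

    $ node -e 'eval(`json = {}; json.name = "gron";`); console.log(json.name)'
    gron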

FWIW, personally I'm usually using gron to help gain a better understanding of the structure of an object; often trying to find where a piece of known data exists, which means grepping for the value rather than the key/path - avoiding many of the problems you mention.

Thanks for your input :) I'd like to see your jq script to help me learn some more about jq!


With `fgrep` and quoting in single quotes ('s), you don't have to escape anything.
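
E.g. (hypothetical commits.json, value borrowed from upthread):

    $ gron commits.json | fgrep 'json[0].commit.author.name'
    json[0].commit.author.name = "Tom Hudson";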


What is your jq script?


    #!/usr/bin/jq -rf
    
    # For each [path, leaf] pair from tostream, join the path parts with
    # underscores (non-word characters become "_") and print a
    # shell-quoted key=value line.
    tostream | select(length == 2) | (
     ( [ .[0][] | tostring | gsub("[^\\w]"; "_") ] | join("_") )
     + "=" +
     ( .[1] | tostring | @sh )
    )


Can you share the jq script ?



While I like what jq lets me do, I actually find it really difficult to use. It's very rare that I manage to use it without having to consult the docs, and when I try to do anything remotely complex it often takes ages to figure it out.

I very much like the look of gron for the simpler stuff!


"Or you could create a shell script in your $PATH named ungron or norg to affect all users"

You could also check argv[0] to see if you were called via the `ungron` name. Then it would be as simple as a symlink, which is very easy to add at install/packaging time.

(I know it's fairly broadly known, but this is the "multicall binary" pattern: https://flameeyes.blog/2009/10/19/multicall-binaries/)
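
A userland sketch of the same dispatch while it isn't in the binary (wrapper name made up; gron's real flag is --ungron/-u):

    #!/bin/sh
    # gron-wrap: dispatch on the name we were invoked as.
    # Install: put it in $PATH and `ln -s gron-wrap ungron`.
    case "$(basename "$0")" in
      ungron|norg) exec gron --ungron "$@" ;;
      *)           exec gron "$@" ;;
    esac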


Please submit this as an issue to the repo.


Looks like someone has done so here:

https://github.com/tomnomnom/gron/issues/77


Yup, and it’s been merged :-)


The author of the tool did a really nice tutorial on doing bug bounty recon using Linux, in which he also uses gron: https://youtu.be/l8iXMgk2nnY?t=1335


So it basically flattens a json file into lines of flattened-key = value, which makes it easy to just grep.


Same as html2/xml2.


This was on the front page before, which generated some good discussion:

https://news.ycombinator.com/item?id=16727665


Yeah, two years ago, and roughly no commits since.

(Has bug reports, has PRs, it's not 'done'.)


I'm sorry to say this is down to my (the author's) mental health issues over the last few years.

I hope to be able to face dealing with people's issues and PRs soon.


I'm sorry to hear that - I didn't mean it as a criticism, (you're free to work on or not work on whatever you want of course!) I was just surprised at the level of traction the post was getting I suppose.

All the best.


No problem at all! I'm as surprised as you are to be honest! Still makes me happy to see people getting use out of any of my tools though :)


Wow. I am typically hesitant to adopt new tools into my flow. Oftentimes they either don't solve my problem all that much better, or they try to do too much.

This looks perfect. Does one thing and does it well. I will be adopting this :-)


I achieve something similar with the following function in ~/.jq

  def flatten_tree: [leaf_paths as $path | {"key":$path | join("."), "value": getpath($path)}] | from_entries;
e.g.

  curl -s http://consul.service.consul:8500/v1/catalog/service/brilliant-service | jq -r 'flatten_tree'
I haven't felt the need to devise the reverse transformation yet, but it works great for grepping blobs of unknown structure.


If you'd like to use something like this in your own APIs to let your clients filter requests or on the CLI (as is the intention with gron), consider giving "json-mask" a try (you'll need Node.js installed):

  $ echo '{"user": {"name": "Sam", "age": 40}}' | npx json-mask "user/age"
  {"user":{"age":40}}
or (from the first gron example; the results are identical)

  $ gron "https://api.github.com/repos/tomnomnom/gron/commits?per_page=1" | fgrep "commit.author" | gron --ungron
  $ curl "https://api.github.com/repos/tomnomnom/gron/commits?per_page=1" | npx json-mask "commit/author"
If you've ever used Google APIs' `fields=` query param you already know how to use json-mask; it's super simple:

  a,b,c - comma-separated list will select multiple fields
  a/b/c - path will select a field from its parent
  a(b,c) - sub-selection will select many fields from a parent
  a/*/c - the star * wildcard will select all items in a field


This is a fantastic idea.

I have installed gron on all my development machines.

Will probably use it heavily when working with awscli. I'm not conversant enough in the jq query language to not have to look things up when writing even somewhat complex scripts. And I don't want to learn awscli's custom query syntax. :)

Thought at first that it might be possible to replicate gron's functionality by some magic composition of jq, xargs, and grep, but that was before I understood the full awesomeness of gron - piping through grep or sed maintains gron context, so you can still ungron later.

Nice work, thank you!


Very nice! I don't like that it can also make network requests, though. It's a potential security hole and completely unnecessary given that we already have curl and pipes for that.


Catj is worth a mention. Similar, but written in node.js.

https://github.com/soheilpro/catj


I use this all the time when working with new APIs.


1. Is there a name/"standard" for the format gron is transforming json into?

2. Thesis: jq is cumbersome when used on a json input of serious size/complexity because upfront knowledge of the structure of the json is needed to formulate correct search queries. Gron supports that "uninformed search" use-case much better. Prove me wrong ;)
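
Concretely, the uninformed-search workflow I mean (hypothetical blob.json, value borrowed from the README):

    # step 1: find where a known value lives, knowing nothing of the structure
    $ gron blob.json | grep -F 'Tom Hudson'
    json[0].commit.author.name = "Tom Hudson";
    # step 2: only now do I know enough to write the jq query
    $ jq '.[0].commit.author.name' blob.json
    "Tom Hudson"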


1. There isn't really a name for it, but it's a subset of JavaScript and the grammar is available here specified in EBNF, with some railroad diagrams to aid understanding: https://tomnomnom.github.io/gron/

2. That's pretty much exactly why I wrote the tool :)


gron outputs Javascript!


Is this better than the old solution of json_pp < jsonfile | grep 'pattern'?

While that's only useful for picking out specific named keys without context, that's often good enough to get the job done. Added bonus is that json_pp and grep are usually installed by default so you don't have to install anything.
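
The missing context in a nutshell (json_pp here is Perl's JSON::PP pretty-printer; key order may vary):

    $ echo '{"a":{"port":80},"b":{"port":443}}' | json_pp | grep port
          "port" : 80
          "port" : 443
    # two hits, but no clue which object each came from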


JSON Path Names at https://www.convertjson.com/json-path-list.htm will do this too, plus break down all the pieces in a nice searchable table format. Disclosure: I'm the author.


There is also JMESPath, which implements a proper spec.

https://gatling.io/2019/07/31/introducing-jmespath-support/


JMESPath has a spec, that is true, but JMESPath also has some serious limitations [1]. If I'm doing any JSON manipulation on the command line then I'll reach for jq.

That said, gron certainly looks like it offers simplicity in cases where using jq would be too complicated.

[1] https://news.ycombinator.com/item?id=16400320


My humble attempt at building a tool similar to this and jq, but with a saner DSL syntax: https://github.com/jsqry/jsqry-cli2


Assuming GRON is short for "grep json", it seems like the tool could have a better name. It looks like it is useful beyond just the grepping case.


I love it. I have never used jq without first googling how to do the thing I want to do. It does not do one thing well; it does many things not so well.



Gron is awesome, I use it frequently to quickly find jsonpath queries to kubernetes manifest properties.


Fantastic tool. Thanks for sharing.


fx is also very handy for easily inspecting json, and is one I like using too.

https://github.com/antonmedv/fx


Oh, it's interactive. Perfect!



