I can't stand jq. I realize this is an unpopular opinion, and our codebase at work has plenty of jq in the bash scripts, some of it even code that I wrote. I begrudgingly use it when it's the best option for me. But something about it rubs me the wrong way - I think it's the unintuitive query syntax and the need to search for every minute step of what I'm trying to do, and the frequency with which that leads to cryptic answers that I can only decipher if I am some sort of jq expert. But I have this instinctive reaction to all DSLs that embed themselves into strings, like htmx and tailwind (both embedded in attribute string values).
I realize some people like it, and it's a well-made piece of software, and I will even admit that sometimes there is no better choice. But I guess I just hate that it's necessary? I guess I could also admit it's the least-bad option, in the sense that it's a vast improvement over various sed/awk/cut monstrosities when it comes to parsing JSON in bash. Certainly once you find the right incantation, it's perfect - it transforms some raw stdin into parsed JSON that you can manipulate into exactly what you need. But for me, it ranks right next to regex in terms of "things I (don't) want to see in my code."
I hate that the jq command is always some indecipherable string in the middle of the script. The only real alternative I've ever used is piping to a Python program that I define inline in a heredoc, but that ends up being at least as nasty as the JQ script.
> I hate that the jq command is always some indecipherable string in the middle of the script
It might be worthwhile to just learn how jq works. At the end of the day, you need to learn some language to parse json. I hate DSLs too, but I cannot think of anything as useful and concise as jq.
> but that ends up being at least as nasty as the JQ script
That's exactly why jq is so nice. Nice alternatives just don't exist.
> That's exactly why jq is so nice. Nice alternatives just don't exist
Write a simple Python script, parse JSON into native objects, manipulate those objects as desired with standard Python code, then serialize back into JSON if necessary. Voila, you have a readable, maintainable, straightforward solution, and the only dependency (the Python interpreter) is already preinstalled on almost every modern system.
Sure, you may need a few more lines of code than what would be possible with a tailor-made DSL like jq, but this isn't code golf. Good code targets humans, not "least possible number of bytes, arranged in the cleverest possible way".
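A minimal sketch of the parse / manipulate / serialize workflow described above. The record shape (`name`, `stars`) and the function name `keep_popular` are made up for illustration; in a shell pipeline one would pass `sys.stdin` and `sys.stdout` instead of the in-memory streams used here.

```python
import io
import json


def keep_popular(src, dst, min_stars=100):
    """Read a JSON array from src, keep popular entries, write JSON to dst."""
    records = json.load(src)                      # parse into native lists/dicts
    popular = [r for r in records if r.get("stars", 0) >= min_stars]
    popular.sort(key=lambda r: r["stars"], reverse=True)
    json.dump(popular, dst, indent=2)             # serialize back to JSON
    return popular


# Hypothetical input; io.StringIO stands in for sys.stdin / sys.stdout.
sample = io.StringIO('[{"name": "a", "stars": 5}, {"name": "b", "stars": 250}]')
out = io.StringIO()
result = keep_popular(sample, out)
```

The trade-off discussed in this thread is visible even at this size: the logic is ordinary Python, but it takes noticeably more lines than a one-line filter would.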
The simple existence of DSL tools like jq is the testament to the fact that people don't want to go to a generic language to solve every kind of problem.
I'm also convinced that a big subset of "use generic language for everything" do it because they want to use their shiny hammer on that nail as well.
> Sure, you may need a few more lines of code than ...
jq integrates very nicely into bash scripts. Especially in between pipes a short&simple jq-snippet can work wonders for the readability of the overall script.
On the other hand, if the bash script becomes too complex it may be a good idea to replace the entire bash script with python (instead of just the json-parsing-part)
> ... if the reader happens to be familiar with the niche language "jq".
Eh. Linux/Unix has always had an affinity for DSLs and mini-languages. If you're willing to work with bash, sed, awk, perl, lex, yacc, bc/dc etc. jq doesn't seem like it should cause too much consternation.
> Especially in between pipes a short&simple jq-snippet
Many of them are not short and simple, though. And each time you do some transformation, you pretty much need to go in and out of jq at each step if you want to make some decisions or get multiple types of results without processing the original multiple times.
The point in my career at which I used jq the most was when I was doing a lot of work with Elasticsearch doing exploratory work on indexed data and search results. Doing things such as trying to figure out what sort of values `key` might have, grabbing ids returned, etc.
Second to this, I've mostly used jq to look at OpenAPI/swagger files, again just doing one-off tasks, such as listing all api routes, listing similarly named schemas, etc.
From what I've seen in the companies I've worked for, this is fairly consistent, but naturally I can't speak for everyone's use-cases. At the end of the day, I don't think most people use jq in places where readable or maintainable would be most appropriate.
Yea except the python solution is probably going to be several hundred lines, instead of a few.
Python is often not installed in server environments unless it's a runtime environment for Python.
Want to use a non standard library? Now your coworkers are suddenly in Python dependency hell. Better hope anyone else that wants to use this is either familiar with the ecosystem, or just happens to have an identical runtime environment as you.
Or someone could just curl/apt/dnf a jq binary to use your 3 line query, instead of maintaining all of this + 200 lines of Python.
I go to jq for the same reason I go to regular expressions. If you tell me this is too complex
(?:[A-Z][a-z]+_?(\d+))
Then I don't know what to tell you. Do you think that's too complex and should be a python script too? I don't think so. It looks complex, but if you just learn it, it's easier than a 'simple' script to do the same thing.
I'd argue it's good code if you don't have to sift through lines of boilerplate to do something so trivial in jq or regex syntax.
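For readers who want to see what that pattern actually does, here is a small sketch using Python's `re` module. The sample string and the helper name `trailing_numbers` are illustrative assumptions, not part of the original comment.

```python
import re

# The pattern quoted above: a capitalised word, an optional underscore,
# then a captured run of digits.
PATTERN = re.compile(r"(?:[A-Z][a-z]+_?(\d+))")


def trailing_numbers(text):
    """Return the digit groups captured by PATTERN, in order of appearance."""
    return PATTERN.findall(text)


# "item_9" is skipped because the word does not start with a capital letter.
numbers = trailing_numbers("Item_42 and Row7, but not item_9")
```

With one capture group, `findall` returns just the captured digits, so `numbers` here is `['42', '7']`.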
I do lots of exploratory work on various structured data, in my case often debugging media files via https://github.com/wader/fq, which means doing lots of use-once queries on the command line or in a REPL. In those cases jq's line-friendly and composable syntax and generators really shine.
> Something not having alternatives doesn't make it necessarily nice
Of course not, but compared to every alternative today, jq is eons better than everything else. Its conciseness, ease of use, and ease of learning all make it awesome. So as of right now, it is the nicest thing to use by far.
Personally though, I don't think I do wish for better. Jq is missing nothing that I want.
I really like jq, but I think there is at least one nice alternative to it: jet [1].
It is also a single executable, written in clojure and fast. Among other niceties, you don't have to learn any DSL in this case -- at least not if you already know clojure!
I hadn't seen this before. At a quick glance, the syntax looks fine. Though I don't know what command line utility I'd need to use it. It makes me wonder how hard a translator from jq syntax to jsonpath would be... Then we could have our cake and eat it too.
In my (potentially unpopular) opinion, jq has the same appeal to nerds that stuff like Perl does. I say this as someone who did Perl for 20 years but now prefers Python or JS…
For many people regexes are as bad as jq queries… and vice versa. I would not recommend writing a Python script instead of a regexp, but it may indeed work just as well for small data and be more readable.
I love regex and have been mastering it since 1999. So much so that in 2013 I used it in production to parse a binary protocol with dynamically sized fields. I believe the project is still talking to 10k-plus devices. Google must’ve just released protocol buffers… I would love to finally see regexes that can work over custom flows of objects and also on trees.
I also loved XPath, which is very powerful and very comprehensible; then there is CSS1/2/3, which is again for queries over structured, tree-like data.
The prospect of now learning jq does not appeal to me that much, even though I appreciate its ingenuity. I may recommend it to dev/ops colleagues now and then, but for me this syntax is a lot of additional cognitive pressure which does not necessarily pay off. Of course, if there is a large amount of JSON data, it is the Swiss Army knife.
But nowadays I’ll likely use some LLM to generate the jq query for me. I would also joke with my bash-diehard colleagues who would love one more DSL…
For simple things like navigating down one key or one array entry, I know the syntax by heart, and it's incredibly useful. But for anything more complicated, I'm too lazy to look up the documentation.
jq will fall into the bucket along with sed/awk of "tools I once wished to become an expert on, but will never do so because ChatGPT came along".
Would also put regex into that bucket, but they're so ubiquitous that I've already learned regexes. I wonder if the new wave of coders learning coding via ChatGPT will think of regexes the same way I think of sed/awk.
I think these very terse languages are precisely the ones you shouldn't unleash ChatGPT on. It needs to be really exact and if it is wrong, you can easily end up with something that is an infinite loop or takes exponential time with respect to the input.
My way of using ChatGPT is just to ask it to give me some complicated sed/awk command, and then I can usually understand easily if the command is correct, or easily look it up. So it is very good for learning.
Many problems seem to have the property that it's easier to verify a solution than to come up with one. If someone provides a filled-out sudoku puzzle, it's relatively straightforward to check if they've followed the rules and completed it correctly. However, actually solving the puzzle from scratch requires a different kind of thinking and might take more time.
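To make the "verifying is easier than solving" point concrete, here is a sketch of a sudoku checker; a solver would need search or constraint propagation on top of this. The function name and the sample grids are illustrative assumptions.

```python
def is_valid_solution(grid):
    """Check a filled 9x9 grid: every row, column and 3x3 box holds 1..9 once."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(group == digits for group in rows + cols + boxes)


# A valid grid built from a shifting pattern (illustrative data only).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Swapping two cells in one row keeps that row valid but breaks two columns.
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]
```

The whole check is a handful of set comparisons, which is roughly the amount of effort it takes to sanity-check a generated sed/awk/jq command against a sample input.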
I've also found that learning by "ask ChatGPT, paste, verify" is so much faster and more fun than banging my head against concrete to deeply read documentation to reason about something new.
I've started doing this for new programming languages and frameworks as well, and it shortens the learning curve from months down to days.
Agree - by the time I need more than grep and reach for json parsing, it’s already complicated enough for a Python script. stdin piped to json.loads ain’t that bad.
Def. seen jq thrown into sed/awk scripts where a readable programming language was the right move. People spend hrs finding the right syntax to these things ~ not always well spent.
I've got similar feelings about it and recently I started experimenting with writing scripts in Nushell rather than bash + jq. I get the json object as a proper type in the script, get reasonable operations available on it and don't have to think of weird escaping for either the contents or the jq script. It cuts down the size of my scripts by about a half and I'm very happy with the results.
Yeah, Python is like 10-20x the number of lines required to do the same thing as jq (especially with the boilerplate of consuming stdin), but that's also why it's more readable. But generally I agree - I would choose jq over some weird bash/python hybrid most of the time. I just wish it was more immediately readable.
Simple jq programs are easy to read because simple jq programs are just path expressions, and the jq language is optimized to make path expressions easy to read. Path expressions like
.[].commit | select(.author == "Tom Hudson")
which basically says "find all commits by Tom Hudson" in the input.
`.[]` iterates all the values in its input (whether the input be an array or an object). `.commit` gets the value of the "commit" key in the input object. You concatenate path expressions with `|`, and array/object index expressions you can just concatenate w/o `|`, so `.[]` and `.commit` can be `.[] | .commit` and also `.[].commit`. Calls to functions like `select()` whose bodies are path expressions are.. also path expressions.
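For comparison, a rough Python counterpart of that path expression, written step by step. It is only a sketch: the helper names and the sample input are assumptions, and unlike jq it raises `KeyError` when an entry lacks `commit` or `author`.

```python
import json


def values_of(container):
    """Mimic jq's `.[]`: yield the values of a list or of a dict."""
    if isinstance(container, dict):
        yield from container.values()
    else:
        yield from container


def commits_by(data, author):
    """Rough counterpart of `.[].commit | select(.author == author)`."""
    for entry in values_of(data):        # .[]
        commit = entry["commit"]         # .commit
        if commit["author"] == author:   # select(.author == ...)
            yield commit


# Hypothetical input shaped like the example above.
doc = json.loads("""
[
  {"commit": {"author": "Tom Hudson", "message": "fix parser"}},
  {"commit": {"author": "Someone Else", "message": "update docs"}}
]
""")
found = list(commits_by(doc, "Tom Hudson"))
```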
Perhaps the most brilliant thing about jq is that you can assign to arbitrarily complex path expressions, so you can:
The syntax is strange probably because of this trying to make path expressions so trivial and readable.
jq programs get hard to read mainly when you go beyond path expressions, especially when you start doing reductions. The problem is that it resembles point free programming in Haskell, which is really not for everyone.
The other thing is that jq is very much a functional programming language, and that takes getting used to.
Also, here’s something that seems not widely appreciated: You can write super clever unreadable one-long-line jq programs embedded in bash scripts (I hear you on the point-free thing), or you can write jq programs that live in their own files, with multiple lines, indentation, comments, and intermediate assignments to variables with readable names. I recommend the latter!
This also won't work since it'll crash on missing fields. e.get("commit", {}).get("author", "") maybe (ignoring the corner case of non-list top level object).
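A sketch of the difference being pointed out here: direct indexing raises on the first entry with a missing key, while the chained `.get` calls with defaults skip such entries. The sample data and function name are hypothetical.

```python
def commits_by_tolerant(entries, author):
    """Collect commits by `author`, skipping entries without commit/author."""
    result = []
    for e in entries:
        commit = e.get("commit", {})
        if commit.get("author", "") == author:
            result.append(commit)
    return result


# Hypothetical data: the second entry has no "commit", the third no "author".
entries = [
    {"commit": {"author": "Tom Hudson"}},
    {"sha": "abc123"},
    {"commit": {"message": "no author recorded"}},
]

matches = commits_by_tolerant(entries, "Tom Hudson")

# The strict version fails as soon as it meets the entry without "commit".
try:
    strict = [e["commit"]["author"] for e in entries]
    strict_ok = True
except KeyError:
    strict_ok = False
```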
This is a non-problem solved by the jq example. Clearly nobody sane writes (or consumes) APIs which sometimes produce array of object, sometimes produce singular objects of the same shape... Or maybe I'm spoiled from using typed languages and cannot see the ingenuity of the python/javascript/other-untyped-hyped-lang api authors that it solves?
> Clearly nobody sane writes (or consumes) APIs which sometimes produce array of object, sometimes produce singular objects of the same shape...
Has nothing to do with arrays, it has to do with the fact that Python dicts with string indexes and Python objects with properties are different things, unlike JS where member and index access are just different ways of accessing object properties.
> Or maybe I'm spoiled from using typed languages and cannot see the ingenuity of the python/javascript/other-untyped-hyped-lang api authors that it solves?
This isn't an untyped thing: both JavaScript (and thus JSON) and Python have type systems (even if they usually don't statically declare them), and those type systems, and thus the syntax around objects, differ between the two.
Oops, yep totally. Even more futzy! Think if I was doing this a lot I'd totally pull out one of those "dict wrappers that allow for attr-based access" that lots of projects end up writing for whatever reason
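One way to get that kind of attribute-based access without writing a wrapper class is the standard library's `types.SimpleNamespace` combined with the `object_hook` parameter of `json.loads`. This is just a sketch with made-up sample data; keys that are not valid Python identifiers would still need `getattr`.

```python
import json
from types import SimpleNamespace


def loads_as_attrs(text):
    """Parse JSON so that every object becomes an attribute-accessible namespace."""
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))


# Hypothetical document for illustration.
doc = loads_as_attrs('{"commit": {"author": "Tom Hudson", "parents": [{"sha": "abc"}]}}')
author = doc.commit.author
first_parent = doc.commit.parents[0].sha
```

Missing fields now raise `AttributeError` instead of `KeyError`, so this changes the syntax but not the need to handle absent keys.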
I wish it had won over jq because JMESPath is a spec with multiple implementations and a test suite where jq is... well jq and languages have bindings not independent implementations.
> I wish it had won over jq because JMESPath is a spec with multiple implementations and a test suite where jq is... well jq and languages have bindings not independent implementations.
jq has multiple implementations too! In Go, Rust, Java, and... in jq itself.
> jackson-jq aims to be a compatible jq implementation. However, not every feature is available; some are intentionally omitted because they are not relevant as a Java library; some may be incomplete, have bugs or are yet to be implemented.
Where JMESPath has fully compliant 1st party implementations in Python, Go, Lua, JS, PHP, Ruby, and Rust and fully compliant 3rd party implementations in C++, Java, .NET, Elixir, and TS.
Having a spec and a test suite means that all valid JMESPath programs will work and work the same anywhere you use it. I think jq could get there but it doesn't seem to be the project's priority.
I've found Ruby much nicer for writing dirty parsing logic like this in a "real" language, it lets you be more terse and "DRY" than Python. Which in bigger software projects doesn't hurt me as much but when I'm primarily trying to write something that otherwise would be well handled by SQL or JQ I found Ruby the better middleground for me.
"Indecipherable string" to me means you likely don't understand the language or how it works.
The language itself works very well for what it needs to do.
It does not work the same way as something like parsing an object and manipulating it in python.
It is a query language. You are building up a result not manipulating objects.
Definitely unintuitive if you are coming from a programming language.
Once learned it makes a lot more sense and is even preferable depending on your needs.
> it's the unintuitive query syntax and the need to search for every minute step
I love jq as a power tool and have the same challenges. I think the best path would have been for JavaScript to adopt something akin to JsonPath, although I more often reach to jq out of familiarity than use it in kubectl.
I hadn't looked into JsonPath as a standard, and on closer inspection, it looks to be stalled out. Maybe I'll keep piping kubectl get <resource> -ojson | jq '<what I'm looking for>'.
The responses to this comment seem to miss a vital point that the comment is making: languages executed within a different primary language are usually opaque to the tools in use. Those tools are usually aimed purely at the primary language, not any secondary languages used within it. Tools for the secondary language are now much harder to use because they (usually) have to be invoked and used via the primary language.
If I’m working on a Python script which has some jq embedded in it, then these problems probably exist:
- My editor will only syntax colour the Python, and treat jq code as a uniform string with no structure
- My linter will only consider Python problems, not jq problems
- My compiler, which is able to show parsing errors at compile time rather than runtime, will not give me any parsing errors for jq until execution hits it (yes, Python has a compilation step)
- jq error messages that show a line number will give me a relative line number for the jq code, rather than the real line number for where that code lives in the Python file
- My debugger will only let me pause and inspect Python, and treat the jq execution as a black box of I/O
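The compile-time point in the list above can be sketched with nothing but the standard library. Python has no built-in jq, so this example swaps in a regular expression as the embedded language; the broken pattern and the names `HOST_SOURCE` and `extract` are made up for illustration.

```python
import re

# Hypothetical host-language source containing a broken embedded pattern
# (the opening parenthesis is never closed).
HOST_SOURCE = '''
import re

def extract(text):
    return re.findall(r"([A-Z][a-z]+", text)
'''


def embedded_error_surfaces_at_runtime():
    """Return the error message raised only when the embedded code is executed."""
    code = compile(HOST_SOURCE, "<host>", "exec")  # Python compiles it without complaint
    namespace = {}
    exec(code, namespace)                          # defining extract() is also fine
    try:
        namespace["extract"]("Item")               # the pattern is parsed only here
    except re.error as exc:
        return str(exc)
    return None


message = embedded_error_surfaces_at_runtime()
```

The host compiler sees only a string literal, so the mistake stays invisible until execution reaches that line, which is the same situation as a jq program or SQL statement stored in a string.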
I’m discussing this as a jq problem, but this happens far more commonly with SQL inside any host language. No wonder ORMs are so popular: their value isn’t just about hiding/abstracting SQL, it’s about wrangling SQL as a secondary language inside a different primary one.
- Microsoft’s LINQ for C#
- Webdev-focused IDEs which aim to correctly handle HTML and Javascript inside server-side languages (e.g. PHP)
jq is way too much for what I need. I hacked together a filter in C to reformat JSON and I like it better than every JSON library/utility I have tried. For simple reformatting, jq is slow and brittle by comparison. Also, I can extract JSON from web pages and other mixed input. All the JSON utilities I have tried expect perfectly-formed JSON and nothing else.
I also find VisiData is useful for adhoc exploring of JSON data. You can also use it to explore multiple other formats. I find it really helpful, plus it gives that little burst of adrenaline from its responsive TUI, similar to fx and jless mentioned.
For my toolbox I include jq, gron, miller, VisiData, in addition to classics like sed, awk, and perl.
I understand where you're coming from and often feel the same, but I'm also afraid that this is a clear case of inherent complexity: querying JSON is just a complex problem and requires a complex query language, regardless of how well a piece of software implementing it is designed. The same is valid for regexes of course.
The main problem is treating one-thing and many-things the same way. It's not a great PL design choice (and it's why we can't have slurp as a filter). If streams (not arrays) were also first-class, we would easily have `smap`, `sselect` etc and the code would look like a functional programming language where | is the pipeline operator.
Otherwise, it's fine if you try to keep the thought "everything is a 'filter' or a composition of filters, and a 'filter' is a function that either maps, flatMaps or filters things" in your mind at all times.
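That mental model can be sketched in a few lines of Python: every filter takes one value and returns a list of zero or more outputs, and the pipe is a flatMap over the current stream. This is only an illustration of the idea, not how jq is implemented; all names below are made up.

```python
def pipe(*filters):
    """Compose filters left to right, like `|`: each output feeds the next filter."""
    def composed(value):
        stream = [value]
        for f in filters:
            stream = [out for item in stream for out in f(item)]  # flatMap
        return stream
    return composed


def iterate(value):
    """Like `.[]`: one input, many outputs."""
    return list(value.values()) if isinstance(value, dict) else list(value)


def field(name):
    """Like `.name`: one input, exactly one output (None if the key is absent)."""
    return lambda value: [value.get(name)]


def select(predicate):
    """Like `select(...)`: one input, zero or one output."""
    return lambda value: [value] if predicate(value) else []


program = pipe(iterate, field("commit"), select(lambda c: c["author"] == "Tom Hudson"))
result = program([{"commit": {"author": "Tom Hudson"}}, {"commit": {"author": "Ann"}}])
```

Here "map", "flatMap" and "filter" are all the same shape (value in, list out), which is why one-thing and many-things end up being treated alike.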
`jq` and `GNU Parallel` share a place in my brain: I know they're wonderful tools, but I need either so rarely that I spend more time re-grokking the syntax of each one than I would just writing a bash/sed/awk/perl, ruby, or python script to do what I need.
`jq` solves the problem of JSON in legacy shells. But I think the real problem is that the world is stuck using Bash rather than a more modern shell that can parse JSON (as well as other data structures) as natively as raw byte streams.
The problem with Bash is to do anything remotely sophisticated you end up embedding DSLs (a bit of awk, some sed, a sprinkle of jq, and so on and so forth) into something that is itself already a DSL (ie Bash).
Whereas a few more modern shells have awk, sed and jq capabilities baked into the shell language itself. So you don’t need to mentally jump through hoops every time you need to parse a different type of structured data.
It’s a bit like how you wouldn’t run an embedded Javascript or Perl engine inside your C#, Java or Go code base just to parse a JSON file. Instead you’d use your language's native JSON parsing tools and control structures to query that JSON file.
Likewise, the only reason jq exists is because Bash is useless at parsing anything beyond lists of bytes. If Bash supported JSON natively, like Powershell does (and to be clear, I’m not a fan of Powershell but for whole different reasons) then there would be literally no need for jq.
Community refuses to admit that powershell is a much better alternative to the bash/python combo, and here we are stuck in this mess. CI/CD script spaghetti is usually the most unstable piece of code in a company.
> Community refuses to admit that powershell is a much better alternative to the bash/python combo
Because it's not.
Powershell is very nice as a glue language for .NET components, and it's better as a general purpose shell/scripting language than the old DOS-inspired Windows Command Prompt, for sure.
I greatly dislike case-insensitivity. It's a source of many problems for users and implementors.
For implementors case-insensitivity means the need for full Unicode support is urgent, while Unicode canonical equivalence does not often make the need for full Unicode support urgent. In practice one often sees case-insensitivity for ASCII, and later when full Unicode support is added you either have to have a backwards compatibility break or new functions/operators/whatever to support Unicode case insensitivity.
For users case-insensitivity can be surprising.
For code reviewers having to constantly be on the lookout for accidental symbol aliasing via case insensitivity is a real pain.
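A small sketch of why "ASCII-style first, Unicode later" ends in a compatibility break: simple lowercasing and full Unicode caseless matching disagree on some inputs. The second function follows the normalize-then-casefold approach described in Python's Unicode HOWTO; the function names and sample strings are illustrative.

```python
import unicodedata


def simple_caseless_equal(a, b):
    """Case-insensitive comparison via plain lowercasing."""
    return a.lower() == b.lower()


def unicode_caseless_equal(a, b):
    """Caseless comparison using NFD normalization plus full case folding."""
    def fold(s):
        nfd = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", nfd.casefold())
    return fold(a) == fold(b)


# German sharp s: case folding maps it to "ss", lowercasing leaves it alone.
simple_result = simple_caseless_equal("STRASSE", "straße")
full_result = unicode_caseless_equal("STRASSE", "straße")

# Precomposed vs. combining accent: equal only after normalization.
accent_result = unicode_caseless_equal("\u00c9", "e\u0301")
```

Two identifiers that the simple rule treats as distinct become aliases under the full rule, which is exactly the kind of change that cannot be introduced later without breaking something.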
Why does it have to be bash+python? I'm finding myself using node.js scripts glued together by bash ones these days unless I'm working on a lot of data. Doing that means you can work with json natively.
`json.loads` in Python exists, and Python does the intuitive thing when you do `{"a": 1} == {"a": 1}`, at least for most purposes (you want the other option? `is` is right there!). Stuff like argparse is not the easiest thing to use but it's in the standard library and relatively easy to use as well.
Not going to outright say that node.js scripts are the worst thing ever (they're not), but out-of-the-box Python is totally underrated (except on macOS where `urllib` fails with some opaque errors until you run some random script to deal with certs)
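A tiny sketch of the equality point made in this comment: parsed JSON compares by value with `==`, while `is` checks identity. The sample documents are made up.

```python
import json

# Two separate parses of equivalent JSON, with keys in a different order.
a = json.loads('{"a": 1, "b": [1, 2]}')
b = json.loads('{"b": [1, 2], "a": 1}')

same_value = a == b    # structural comparison; key order does not matter
same_object = a is b   # identity; two parses produce two distinct objects
```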
Assuming <data> will be a key-value-object aka dict, it would be something like this:
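import json
# '<data>' is a placeholder for the actual JSON text
data = json.loads('<data>')
bar = None
# parsed JSON objects are dicts, so use key access rather than attribute access
if foo := data.get('foo'):
    bar = foo[0]['bar']
print(bar)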
If you can't be sure to get a dict, another type-check would be necessary. If you read from a file or file-like-object (like sys.stdin), json.load should be used.
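A sketch of the two follow-up points: an `isinstance` check for the case where the top level is not a dict, and `json.load` for file-like input. The function name `first_bar` is hypothetical, and `io.StringIO` stands in for `sys.stdin` or an open file.

```python
import io
import json


def first_bar(stream):
    """Return data["foo"][0]["bar"] from a JSON stream, or None if the shape differs."""
    data = json.load(stream)            # json.load reads from a file-like object
    if not isinstance(data, dict):      # top level may be a list, string, number...
        return None
    foo = data.get("foo")
    if isinstance(foo, list) and foo and isinstance(foo[0], dict):
        return foo[0].get("bar")
    return None


value = first_bar(io.StringIO('{"foo": [{"bar": 42}]}'))
other = first_bar(io.StringIO('[1, 2, 3]'))
```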
I love nodejs, it's my go-to language for server side stuff.
Even with that bias though, I have to admit that it's awful for typical command line script stuff.
Dealing with async and streams and stuff for parsing csv files is miserable (I just wrote some stuff to parse and process hundreds of gigs of files in node, and it wasn't fun).
Python is the right tool for that job IMHO.
Also, weirdly, maybe golang? I just came across this [1] and it has one of my eyebrows cocked.
Any language not designed specifically for the shell will suck for shell work, more or less. Ruby, python, node, whatever, they all have the same problem: you write too much stuff and care about things you shouldn't have to care about while in a shell.
You're probably right. I just wish there was an easier way to handle json on the command line that didn't turn into its own dsl. The golang scripting seems interesting, might be what motivates me to learn the language.
Apparently, the old community needs to literally die with their old habits for the new to take their place. No amount of good argumentation can be fruitful here. And there is a ton of it; pwsh is simply on another level than the existing combos.
The fact that you have to learn a new language to parse JSON is frankly insulting. If you've gotten to the point you're parsing JSON with a shell script, you should've switched to a real language a week ago.
Some people are weird and are in awe of the elegance of piping 8 obscure commands, but if I'm given this shit and have to keep it working, I'm rewriting it on the spot.
Are you rewriting it in the first language you learned?
Sometimes less general tools are nice. If they fit the problem space well, they can be very expressive without feeling unwieldy. And in some contexts reducing the power/expressivity is actually a good thing (e.g. not using a C interpreter to make your program and your config file use the same 'language')