My assumption is that he probably just made assumptions about the format. Stuff like "my strings won't be represented by hexadecimal escape sequences" and "my json file will be split up line by line".
It's really convenient when you can get away with stuff like that, and even if it's not a "proper" solution, at the end of the day it really doesn't always have to be.
FWIW, my preference for JSON-related data transforms and JSON-driven computation is almost always JQ - it is a Turing-complete interpreted language with the JSON iterator as a first-class object.
I suggested the above for those that can't justify the time it takes to make JQ perform well for whatever relatively minor needs they have.
JQ is back under active development, and there are a number of optimised Go and Rust spinoff GitHub projects that'll likely do some interesting work now that the mainline is starting to release again.
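A minimal sketch of the kind of thing I mean, with invented field names, just to show the shape of it:

# pull one field out of each array element as plain text (-r drops the JSON quoting)
jq -r '.items[] | .name' data.json
# or build new objects on the fly
jq '[.items[] | {id, total: (.price * .qty)}]' data.json

From there, select(), reduce, and user-defined functions take you to the Turing-complete end of it.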
The main benefit (indeed the purpose) of gron is that the output is much easier to manipulate into what you need using basic gnu utils, instead of trying to shoehorn it into jq's syntax.
Having used gron in bash scripts, I think "…is much easier to manipulate […] using gnu utils" is overselling it somewhat.
You can `grep` through it, yes. But the fact that the output is still structured as JavaScript (which makes it nice in some more immediate ways if you're working interactively) makes it tricky to deal with in e.g. `awk` -- you'd still have to parse the JS line and implement string unescaping, which makes it more complicated than necessary. And if you're working on, e.g., JSON in a bash variable, running `gron -v` is also cumbersome enough for me to want to reach for another tool.
So, I still like `gron`, I just think it has a rather small niche. One that could maybe be a bit larger if there were a way for it to output delimited records with user-specified delimiters (provided you know the data well enough to be sure the delimiters don't appear in any strings).
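FWIW the pattern I end up reaching for most often is gron on one side and gron -u on the other, with ordinary filters in between (the file and paths here are made up):

# keep only the entries whose path mentions "email", then rebuild the JSON
gron data.json | grep email | gron -u
# or strip the JS syntax by hand if all you want is the raw string values
gron data.json | grep 'users\[' | sed 's/^[^=]*= //; s/^"//; s/";$//'
# (the second one still leaves JSON string escapes in place, which is exactly the awk problem above)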
> I too can never remember jq syntax when I need to. I usually just end up writing a Python script
Same here! That's why, for small things, I made pyxargs [1] to use Python in the shell. In another thread I also just learned of pyp [2], which I haven't tried yet but looks like it's even better for this use case.
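For really small things, even an inline python3 -c is enough to stay in the shell (the key name here is just an example):

python3 -c 'import json, sys; print(json.load(sys.stdin)["name"])' < data.json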
This is a fantastic use case for ChatGPT. Last week I had to extract a bunch of data that was in some nested arrays, so I pasted a JSON sample into ChatGPT and asked it to use jq to extract certain fields when certain conditions were met; a couple of tweaks later and I had exactly what I needed.
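For the curious, that kind of extraction looks roughly like this in jq (field names invented here, not the actual query):

jq -r '.orders[] | select(.status == "shipped") | .items[] | .sku' sample.json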
Honestly, whenever I need to use jq, I just search my bash history (fzf ftw) for the last time I used it. How'd I get it to work the first time? lost to the sands of time...
jq is definitely tough to learn, I can never remember it.
But the whole argument that jq is a unitasker not worth learning and that traditional Unix tools are better is weird. The traditional Unix tool mentality was "do one thing only and do it well", then pipe things together. jq fits perfectly into the Unix toolset.
Yes, jq is definitely a tool like that. The problem nowadays is more that there are many competing options. jq seems to be the most popular for JSON parsing, but once in a while another tool is used that maybe has better ergonomics but a different syntax, etc. You can master jq, but it will not be enough since there is no consensus. This used to be easier, I believe, when the tools in common use were fewer. So let's imagine awk got JSON support. Would that be for the better or worse?
Also, Alton Brown doesn't advise against unitaskers because he's some kind of hater. The reason you don't have unitaskers in the kitchen is because they take up physical space. You have to decide which tools are worth the space they take up, and it's hard to justify that for a tool you will only use once in a while. That's not a concern with computers any more, so the reasoning doesn't apply here.
I recall a tool that rewrites JSON into a dot notation that is easily grep-able. It prefixed each value with its full path so all the parents were in front, something like a.b.c=d
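That sounds like gron, discussed elsewhere in this thread. Its output looks roughly like this:

echo '{"a":{"b":{"c":"d"}}}' | gron
# json = {};
# json.a = {};
# json.a.b = {};
# json.a.b.c = "d";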
"Those tried and true commands we were referring to? None other than the usual awk sed cut grep and of course the Unix pipe | to glue them all together. Really, why use a JSON parsing program that only could only do one function (parse JSON) when I could use a combination of tools that, when piped together, could do far more?"
IMHO, drinking the UNIX Kool-Aid means not only using coreutils and BSD userlands but also using the language in which almost all of those programs are written: C. For me, that means gcc and binutils are amongst the "tried and true commands". Also among them is flex. These are found on all the UNIX varieties I use, usually because they are used in compiling the OS. As such, no special installation is needed.
When I looked at jq source code in 2013, I noticed it used flex and possibly yacc/bison. No idea if it still does.
Using UNIX text processing utilities to manipulate JSON is easy enough. However if I am repeatedly processing JSON from the same source, e.g., YouTube, then I use flex instead of sed, etc. It's faster.
jq uses flex in the creation of a language interpreter intended^1 to process any JSON. I use flex not to create a language interpreter but to process only JSON from a single source. The blog author uses shell script to process JSON from a single source.^2 I think of the use I make of flex as being like a compiled shell script. It's faster.
The blog author states that jq is specific to one type of text processing input: JSON. I write a utility that is specific to one source of JSON.
2. I also used flex to make a simple utility to reformat JSON from any source so it's easier to read and process with line-oriented UNIX utilities. Unlike jq and other JSON reformatters, it does not require 100% correct JSON; e.g., it can accept JSON that is mixed in with HTML, which I find is quite common in today's web pages.
> When I looked at jq source code in 2013, I noticed it used flex and possibly yacc/bison. No idea if it still does.
It has bison and flex files in the source code currently.
> jq uses flex in the creation of a language interpreter intended^1 to process any JSON. I use flex not to create a language interpreter but to process only JSON from a single source. The blog author uses shell script to process JSON from a single source.^2 I think of the use I make of flex as being like a compiled shell script. It's faster.
Like, you have a flex template and you fill it in with ad-hoc C code? Nice, I would find that more readable than a jq script, although a basic jq script is just a mix of bash and JavaScript, and when I grok it for the 123rd time (because it's unnatural and odd inside a shell) it gets better.
I still use awk, since I learnt it in 1991, and only use the others when I can't make awk do all I need, so that means leaning on Python for more complex logic when required. Tried jq a few times and it did my head in too (yes, I'm a lazy idiot obviously - guilty as charged, because if I wasn't lazy I would write a JSON parser I could actually use!).
I naturally gravitate to the simplest solution with the most well-known and proven tools. Not a fan of boiling oceans or immersing myself in obscurity when there is a deadline to meet.
> If I can’t do it with grep sed awk and cut and a for or while loop then maybe I’m not the right guy to do it. Oh sure, I could learn Python. But I don’t need Python on a daily basis like I do bash.
I get the sentiment, but to me it sounds like a case of everything looking like a nail when all you have is a hammer.
Those are fine tools for an ad-hoc one-liner. But if you're building something that doesn't fit into a line or two in the terminal, you're better off with a proper scripting language. I don't care what it is — Python, Perl, Ruby, JavaScript, whatever — anything's better than shell script. Shell is just too brittle, with too many gotchas and edge cases. It's extremely hard to write more than about 10 lines of shell script without bugs. It's even harder if you don't know your exact shell. The only exception is when you know your script has to be executed on a system that only has a shell on it and you absolutely cannot install or require any other interpreter.
The trick here is to get the hell away from JSON quickly. I only ever use jq to turn JSON into text and then use my crusty old tools on that.
If I’m working on my own thing I’ll use a text based format.
Although recently I parsed JSON with head, tail and sed. It was numeric data nicely formatted in lines, so it was easier to just strip away all the semantic structure than to actually parse it.
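When the data does need real parsing first, jq's @tsv filter is a handy way to take that first step out of JSON (field names made up for the sketch):

jq -r '.records[] | [.id, .value] | @tsv' data.json |
  awk -F'\t' '$2 > 100 {print $1}'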
They had a simple problem, they used simple tools they were familiar with.
That doesn't make other solutions worse, or their solution better, it just means the problem was simple enough that it barely mattered one way or another.
I resent the insular tone of jq. If I'm going to be processing the data in any way, I'd prefer to make the leap into a familiar environment rather than kid myself that I'm querying the doc like SQL.
I think more Linux distros should adopt PowerShell. It's really good, and I think it solves this new-tools-versus-old-tools issue better.
I think with pwsh you won't need to use bash when you can and Python when you have to; you just use pwsh. It bridges the gap between shell and programming better than ... a polyglot bash + Python combo.
I find that DuckDB is great for processing JSON files. I can never remember how to use jq other than for pretty-printing. Using duckdb in a bash script with the SQL in a heredoc is pretty powerful.
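Something along these lines, assuming a duckdb build where the JSON extension is available (file and column names invented):

duckdb <<'SQL'
SELECT status, count(*) AS n
FROM read_json_auto('events.json')
GROUP BY status
ORDER BY n DESC;
SQL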
Parsing JSON using 'cut' and 'awk' and 'sed' sounds like the jankiest thing ever.
Sure, maybe you get it working, after much pain. Then someone decides to format your input JSON a bit differently, and everything fails catastrophically (there's a sketch of that failure mode below).
This is the same problem described in the infamous answer "You can't parse [X]HTML with regex":
(Now, if you've been caught in a time-travel accident, and if you must parse JSON using classic Unix tools, consider using lex and yacc. They are sufficiently powerful to do the job correctly.)
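The "formatted a bit differently" failure is easy to reproduce; a contrived sketch with made-up keys:

# works while the producer pretty-prints with "name" first...
printf '{\n  "name": "box",\n  "size": 3\n}\n' | grep '"name"' | cut -d'"' -f4
# ...reorder the keys and compact the output, and the same pipeline
# silently prints the wrong field instead of failing loudly
printf '{"size":3,"name":"box"}\n' | grep '"name"' | cut -d'"' -f4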
JSON is a particularly bad example, given that most of the commonly used scripting languages have standard libraries to parse JSON into a native data structure.
I thought this was going to be about modern replacements for `ls` or `more`. Or maybe `ip` vs `ifconfig`.
awk and friends are good tools to process text. Not JSON. A new field, or a format change, will thrash your brittle script contraption. Please use the right tool for each job.
awk, sed, etc. belong in a museum, now that we have so many tools and libraries that can handle structured data.
The whole early Unix obsession with plain text files was a step in the wrong direction. One grating holdover of that is the /proc filesystem. Instead of a typed, structured API you get the stuff as text to be parsed, file system trees, and data embedded in naming conventions.
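To make that concrete: even reading available memory means scraping text and knowing which column is which (a Linux-specific sketch, and just one of many such cases):

awk '/^MemAvailable:/ {print $2, $3}' /proc/meminfo
# prints something like: 16123456 kB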
I unironically disagree with this; structured data is incredible and powerful.
But an important part of the early internet was "it's Just Text".
And in fact, the reason why JSON is so great is that if you want to use it as Just Text it works just the same!
It's a translation layer between systems that really demand highly structured data and flexible systems where, as long as you can think about it, you can get from anywhere to anywhere else with a few simple programs that are on every machine in the known universe.
I disagree. I think awk and sed are useful for some things, even though they are not very good for JSON. (I also think JSON is not the best format for structured data anyway.)
In the /proc file system, they could have made the data format better even without using JSON, though. (For example, null-terminated text strings might be better than putting parentheses around the data; what if the data includes parentheses? You could use the PostScript format for escaping (string literals in PostScript are written with parentheses around them, and can be nested), but it would be better and simpler not to require any escaping, wouldn't it?)
(Anyway, one of my own ideas for operating system design is that it would have structured data done in a much better way, which avoids this and other problems. There would be a common structure for most things, designed to improve efficiency compared with JSON, XML, etc., as well as offering other advantages. This could also be used natively from the command shell, so unlike Nushell, the system itself would be designed for this.)
The title should read modern GNU tools. Linux is a kernel and supplies nothing useful for command line lovers.
Nonetheless, the more modern tools are generally better, faster and more feature rich. I'd pick them over the older versions unless there was a compelling reason not to.
Yes, I was a bit confused about this too. Bash (1989) itself is a modern Linux tool. The Unix classic (released between 1973 and 1985 according to the author) would be sh (1979) or perhaps ksh (1983).
I wonder if the script the author created would handle the edge cases of JSON, like escaped characters. I doubt it, and this sounds like a famous joke to me:
Definitely not. I wrote it to parse the already very nicely formatted output that I had. For example, to get value B so I could search-and-replace based on value D, when values A, B, C, D are in order on separate lines.
valueb=$(grep D "$json" -B2 | grep B)
Now valueb = B when I started with D, and I can sed 's/B/D/' pretty easily. Now, if the JSON I got had been one big giant blob, as it often is, I'd have had to at least use gron, which is a nifty tool I learned about by posting this here, and for that I'm grateful!
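Spelled out on a made-up snippet, that trick looks something like:

# keys always come out in the same order, e.g.
#   "name": "backup-a",
#   "state": "ready",
#   "id": "1234",
# so the line two above a known id is the name:
name=$(grep '"id": "1234"' "$json" -B2 | grep '"name"' | cut -d'"' -f4)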
Thanks. For known cases I would probably do the same. But I've already burnt my hand on these regexps so many times that I feel itchy when I see a brick. I also lost some files due to `cut` + unexpected spaces, but that's another story.
Show the damn code. Otherwise I'm just going to presume that the awk & sed reimplemented JSON parsing in an indecipherable, buggy way.