My assumption is that he probably just made assumptions about the format. Stuff like "my strings won't be represented by hexadecimal escape sequences" and "my json file will be split up line by line".
It's really convenient when you can get away with stuff like that, and even if it's not a "proper" solution, at the end of the day it really doesn't always have to be.
FWIW, my preference for JSON-related data transforms and JSON-driven computation is almost always JQ - it is a Turing-complete interpreted language with the JSON iterator as a first-class object.
I suggested the above for those that can't justify the time it takes to make JQ perform well for whatever relatively minor needs they have.
JQ is back under active development, and there are a number of optimised Go and Rust spinoff GitHub projects that'll likely do some interesting work now that the mainline is starting to release again.
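A minimal sketch of the kind of thing I mean, with invented field names, just to show the shape of it:

# pull one field out of each array element as plain text (-r drops the JSON quoting)
jq -r '.items[] | .name' data.json
# or build new objects on the fly
jq '[.items[] | {id, total: (.price * .qty)}]' data.json

From there, select(), reduce, and user-defined functions take you to the Turing-complete end of it.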
The main benefit (indeed the purpose) of gron is that the output is much easier to manipulate into what you need using basic gnu utils, instead of trying to shoehorn it into jq's syntax.
Having used gron in bash scripts, I think "…is much easier to manipulate […] using gnu utils" is overselling it somewhat.
You can `grep` through it, yes. But the fact that the output is still structured as JavaScript (which makes it nice in some more immediate ways if you're working interactively) makes it tricky to deal with in e.g. `awk` -- you'd still have to parse the JS line and implement string unescaping, which makes it more complicated than necessary. And if you're working on, e.g., JSON in a bash variable, running `gron -v` is also cumbersome enough for me to want to reach for another tool.
So, I still like `gron`, I just think it has a rather small niche. One that could maybe be a bit larger if there were a way for it to output delimited records with user-specified delimiters (provided you know the data well enough to be sure the delimiters don't appear in any strings).
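FWIW the pattern I end up reaching for most often is gron on one side and gron -u on the other, with ordinary filters in between (the file and paths here are made up):

# keep only the entries whose path mentions "email", then rebuild the JSON
gron data.json | grep email | gron -u
# or strip the JS syntax by hand if all you want is the raw string values
gron data.json | grep 'users\[' | sed 's/^[^=]*= //; s/^"//; s/";$//'
# (the second one still leaves JSON string escapes in place, which is exactly the awk problem above)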
> I too can never remember jq syntax when I need to. I usually just end up writing a Python script
Same here! That's why, for small things, I made pyxargs [1] to use Python in the shell. In another thread I also just learned of pyp [2], which I haven't tried yet but looks like it's even better for this use case.
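For really small things, even an inline python3 -c is enough to stay in the shell (the key name here is just an example):

python3 -c 'import json, sys; print(json.load(sys.stdin)["name"])' < data.json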
This is a fantastic use case for ChatGPT. Last week I had to extract a bunch of data that was in some nested arrays, so I pasted a JSON sample into ChatGPT and asked it to use jq to extract certain fields when certain conditions were met; a couple of tweaks later and I had exactly what I needed.
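For the curious, that kind of extraction looks roughly like this in jq (field names invented here, not the actual query):

jq -r '.orders[] | select(.status == "shipped") | .items[] | .sku' sample.json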
Honestly, whenever I need to use jq, I just search my bash history (fzf ftw) for the last time I used it. How'd I get it to work the first time? lost to the sands of time...
jq is definitely tough to learn, I can never remember it.
But the whole argument that jq is a unitasker not worth learning and that traditional Unix tools are better is weird. The traditional Unix tool mentality was "do one thing only and do it well", then pipe things together. jq fits perfectly into the Unix toolset.
Yes, jq is definitely a tool like that. The problem nowadays is more that there are many competing options. jq seems to be the most popular for JSON parsing, but once in a while another tool is used that maybe has better ergonomics but a different syntax, etc. You can master jq, but it will not be enough since there is no consensus. This used to be easier, I believe, when the tools in common use were fewer. So let's imagine awk got JSON support. Would that be for the better or worse?
Also, Alton Brown doesn't advise against unitaskers because he's some kind of hater. The reason you don't have unitaskers in the kitchen is because they take up physical space. You have to decide which tools are worth the space they take up, and it's hard to justify that for a tool you will only use once in a while. That's not a concern with computers any more, so the reasoning doesn't apply here.
I recall a tool that rewrites JSON into a dot notation that is easily grep-able. It prefixed each value with its full path so all the parents were in front, something like a.b.c=d
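That sounds like gron, discussed elsewhere in this thread. Its output looks roughly like this:

echo '{"a":{"b":{"c":"d"}}}' | gron
# json = {};
# json.a = {};
# json.a.b = {};
# json.a.b.c = "d";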
"Those tried and true commands we were referring to? None other than the usual awk sed cut grep and of course the Unix pipe | to glue them all together. Really, why use a JSON parsing program that only could only do one function (parse JSON) when I could use a combination of tools that, when piped together, could do far more?"
IMHO, drinking the UNIX Kool-Aid means not only using coreutils and BSD userlands but also using the language in which almost all of those programs are written: C. For me, that means gcc and binutils are amongst the "tried and true commands". Also among them is flex. These are found on all the UNIX varieties I use, usually because they are used in compiling the OS. As such, no special installation is needed.
When I looked at jq source code in 2013, I noticed it used flex and possibly yacc/bison. No idea if it still does.
Using UNIX text processing utilities to manipulate JSON is easy enough. However if I am repeatedly processing JSON from the same source, e.g., YouTube, then I use flex instead of sed, etc. It's faster.
jq uses flex in the creation of a language interpreter intended^1 to process any JSON. I use flex not to create a language interpreter but to process only JSON from a single source. The blog author uses shell script to process JSON from a single source.^2 I think of the use I make of flex as being like a compiled shell script. It's faster.
The blog author states that jq is specific to one type of text processing input: JSON. I write a utility that is specific to one source of JSON.
2. I also used flex to make a simple utility to reformat JSON from any source so it's easier to read and process with line-oriented UNIX utilities. Unlike jq and other JSON reformatters, it does not require 100% correct JSON; e.g., it can accept JSON that is mixed in with HTML, which I find is quite common in today's web pages.
> When I looked at jq source code in 2013, I noticed it used flex and possibly yacc/bison. No idea if it still does.
It has bison and flex files in the source code currently.
> jq uses flex in the creation of a language interpreter intended^1 to process any JSON. I use flex not to create a language interpreter but to process only JSON from a single source. The blog author uses shell script to process JSON from a single source.^2 I think of the use I make of flex as being like a compiled shell script. It's faster.
Like, you have a flex template and you fill it in with ad-hoc C code? Nice, I would find that more readable than a jq script, although a basic jq script is just a mix of bash and JavaScript, and when I grok it for the 123rd time (because it's unnatural and odd inside a shell) it gets better.
I still use awk, since I learnt it in 1991, and only use the others when I can't make awk do all I need, so that means leaning on Python for more complex logic when required. Tried jq a few times and it did my head in too (yes, I'm a lazy idiot obviously - guilty as charged, because if I wasn't lazy I would write a JSON parser I could actually use!).
I naturally gravitate to the simplest solution with the most well-known and proven tools. Not a fan of boiling oceans or immersing myself in obscurity when there is a deadline to meet.
> If I can’t do it with grep sed awk and cut and a for or while loop then maybe I’m not the right guy to do it. Oh sure, I could learn Python. But I don’t need Python on a daily basis like I do bash.
I get the sentiment, but to me it sounds like a case of everything looking like a nail when all you have is a hammer.
Those are fine tools for an ad-hoc one-liner. But if you're building something that doesn't fit into a line or two in the terminal, you're better off with a proper scripting language. I don't care what it is — Python, Perl, Ruby, JavaScript, whatever — anything's better than shell script. Shell is just too brittle, with too many gotchas and edge cases. It's extremely hard to write more than about 10 lines of shell script without bugs. It's even harder if you don't know your exact shell. The only exception is when you know your script has to be executed on a system that only has a shell on it and you absolutely cannot install or require any other interpreter.
The trick here is to get the hell away from JSON quickly. I only ever use jq to turn JSON into text and then use my crusty old tools on that.
If I’m working on my own thing I’ll use a text based format.
Although recently I parsed JSON with head, tail and sed. It was numeric data nicely formatted in lines, so it was easier to just strip away all the semantic structure than to actually parse it.
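When the data does need real parsing first, jq's @tsv filter is a handy way to take that first step out of JSON (field names made up for the sketch):

jq -r '.records[] | [.id, .value] | @tsv' data.json |
  awk -F'\t' '$2 > 100 {print $1}'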
They had a simple problem, they used simple tools they were familiar with.
That doesn't make other solutions worse, or their solution better, it just means the problem was simple enough that it barely mattered one way or another.
I resent the insular tone of jq. If I'm going to be processing the data in any way, I'd prefer to make the leap into a familiar environment rather than kid myself that I'm querying the doc like SQL.
I think more Linux distros should adopt PowerShell. It's really good, and I think it solves this new-tools-versus-old-tools issue better.
I think with pwsh you won't need to use bash when you can and Python when you have to; you just use pwsh. It bridges the gap between shell and programming better than ... a polyglot bash + Python combo.
I find that DuckDB is great for processing JSON files. I can never remember how to use jq other than for pretty-printing. Using duckdb in a bash script with the SQL in a heredoc is pretty powerful.
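Something along these lines, assuming a duckdb build where the JSON extension is available (file and column names invented):

duckdb <<'SQL'
SELECT status, count(*) AS n
FROM read_json_auto('events.json')
GROUP BY status
ORDER BY n DESC;
SQL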
Parsing JSON using 'cut' and 'awk' and 'sed' sounds like the jankiest thing ever.
Sure, maybe you get it working, after much pain. Then someone decides to format your input JSON a bit differently, and everything fails catastrophically (there's a sketch of that failure mode below).
This is the same problem described in the infamous answer "You can't parse [X]HTML with regex":
(Now, if you've been caught in a time-travel accident, and if you must parse JSON using classic Unix tools, consider using lex and yacc. They are sufficiently powerful to do the job correctly.)
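The "formatted a bit differently" failure is easy to reproduce; a contrived sketch with made-up keys:

# works while the producer pretty-prints with "name" first...
printf '{\n  "name": "box",\n  "size": 3\n}\n' | grep '"name"' | cut -d'"' -f4
# ...reorder the keys and compact the output, and the same pipeline
# silently prints the wrong field instead of failing loudly
printf '{"size":3,"name":"box"}\n' | grep '"name"' | cut -d'"' -f4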
JSON is a particularly bad example, given that most of the commonly used scripting languages have standard libraries to parse JSON into a native data structure.
I thought this was going to be about modern replacements for `ls` or `more`. Or maybe `ip` vs `ifconfig`.
awk and friends are good tools to process text. Not JSON. A new field, or a format change, will thrash your brittle script contraption. Please use the right tool for each job.
awk, sed, etc. belong in a museum, now that we have so many tools and libraries that can handle structured data.
The whole early Unix obsession with plain text files was a step in the wrong direction. One grating holdover of that is the /proc filesystem. Instead of a typed, structured API you get the stuff as text to be parsed, file system trees, and data embedded in naming conventions.
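To make that concrete: even reading available memory means scraping text and knowing which column is which (a Linux-specific sketch, and just one of many such cases):

awk '/^MemAvailable:/ {print $2, $3}' /proc/meminfo
# prints something like: 16123456 kB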
I unironically disagree with this; structured data is incredible and powerful.
But an important part of the early internet was "it's Just Text".
And in fact, the reason why JSON is so great is that if you want to use it as Just Text it works just the same!
It's a translation layer between systems that really demand highly structured data and flexible systems where, as long as you can think about it, you can get from anywhere to anywhere else with a few simple programs that are on every machine in the known universe.
I disagree. I think awk and sed are useful for some things, even though they are not very good for JSON. (I also think JSON is not the best format for structured data anyway.)
In the /proc file system, they could have made the data format better even without using JSON, though. (For example, null-terminated text strings might be better than putting parentheses around the data; what if the data includes parentheses? You could use the PostScript format for escaping (string literals in PostScript are written with parentheses around them, and can be nested), but it would be better and simpler not to require any escaping, wouldn't it?)
(Anyway, one of my own ideas for operating system design is that it would have structured data done in a much better way, which avoids this and other problems. There would be a common structure for most things, designed to improve efficiency compared with JSON, XML, etc., as well as offering other advantages. This could also be used natively from the command shell, so unlike Nushell, the system itself would be designed for this.)
The title should read modern GNU tools. Linux is a kernel and supplies nothing useful for command line lovers.
Nonetheless, the more modern tools are generally better, faster and more feature rich. I'd pick them over the older versions unless there was a compelling reason not to.
Yes, I was a bit confused about this too. Bash (1989) itself is a modern Linux tool. The Unix classic (released between 1973 and 1985 according to the author) would be sh (1979) or perhaps ksh (1983).
I wonder if the script the author created would handle the edge cases of JSON, like escaped characters. I doubt it, and this sounds like a famous joke to me:
Definitely not. I wrote it to parse the already very nicely formatted output that I had. For example, to get value B so I could search-and-replace based on value D, when values A, B, C, D are in order on separate lines.
valueb=$(grep D "$json" -B2 | grep B)
Now valueb = B when I started with D, and I can sed 's/B/D/' pretty easily. Now, if the JSON I got had been one big giant blob, as it often is, I'd have had to at least use gron, which is a nifty tool I learned about by posting this here, and for that I'm grateful!
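Spelled out on a made-up snippet, that trick looks something like:

# keys always come out in the same order, e.g.
#   "name": "backup-a",
#   "state": "ready",
#   "id": "1234",
# so the line two above a known id is the name:
name=$(grep '"id": "1234"' "$json" -B2 | grep '"name"' | cut -d'"' -f4)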
Thanks. For known cases I would probably do the same. But I've already burnt my hand on these regexps so many times that I feel itchy when I see a brick. I also lost some files due to `cut` + unexpected spaces, but that's another story.
Show the damn code. Otherwise I'm just going to presume that the awk & sed reimplemented JSON parsing in an indecipherable, buggy way.