Tips on adding JSON output to your CLI app (kellybrazil.com)
183 points by kbrazil on Dec 5, 2021 | 110 comments



I hadn't seen jc before (by the author of this piece: https://github.com/kellyjonbrazil/jc ) - what a great idea! It has parsers for around 80 different classic Unix utilities such that it can convert their output to JSON.

    ~ % dig example.com | jc --dig | jq
    [
      {
        "id": 61315,
        "opcode": "QUERY",
        "status": "NOERROR",
        "flags": [
          "qr",
          "rd",
          "ra"
        ],
        "query_num": 1,
        "answer_num": 1,
        "authority_num": 0,
        "additional_num": 1,
        "opt_pseudosection": {
          "edns": {
            "version": 0,
            "flags": [],
            "udp": 512
          }
        },
        "question": {
          "name": "example.com.",
          "class": "IN",
          "type": "A"
        },
        "answer": [
          {
            "name": "example.com.",
            "class": "IN",
            "type": "A",
            "ttl": 85586,
            "data": "93.184.216.34"
          }
        ],
        "query_time": 29,
        "server": "10.0.0.1#53(10.0.0.1)",
        "when": "Sun Dec 05 15:12:08 PST 2021",
        "rcvd": 56,
        "when_epoch": 1638745928,
        "when_epoch_utc": null
      }
    ]


Oh Christ yes, it supports lsof! The output from lsof has always been hard to script up.
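
For example, something like this (a rough sketch; the `command` field name is assumed from lsof's column headers, so check `jc -h --lsof` for the actual schema):

    ~ % sudo lsof | jc --lsof | jq 'group_by(.command) | map({command: .[0].command, open_files: length})'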


Turns out I can cross a todo item off my list, because I wanted to make exactly the same thing.

Now, to find a query tool with a saner language than Jq...


If you like python you might check out jello[0]. I basically wrote it to give you the power and simplicity of python without the boilerplate, in a jq-like form factor. Jello also allows you to use dot-notation instead of dict bracket notation, so it does make things easier on the command line.

Also, there is jellex[1], which is a TUI wrapper around jello that can help you build your queries.

[0] https://github.com/kellyjonbrazil/jello [1] https://github.com/kellyjonbrazil/jellex
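
For instance, reusing the dig output from the comment above (a rough sketch; in jello the parsed input is exposed as the `_` variable):

    ~ % dig example.com | jc --dig | jello '_[0].answer[0].data'
    "93.184.216.34"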


What kind of thing are you trying to do?

jq can get pretty deep, but for most things in this area I'm not sure how it could be improved upon. I'd be interested in hearing alternatives, though.

https://github.com/fiatjaf/jiq

is a realtime-feedback wrapper which I find useful when crafting one-off command-line invocations of jq and things start getting crazy.


Stepping just a little beyond regular ‘loop and filter’ is already difficult without consulting the manual each time, exacerbated by the impossibility of finding those things in the manual without skimming through most of it. Making an ‘if’ for variations in the input structure is easily a twenty-minute job. Outside of the basic features, Jq's syntax is increasingly arcane and unintuitive; maybe those working with it daily do remember the ‘advanced’ stuff, but I don't.

I actually collected a sizable list of alternatives to Jq:

https://github.com/TomConlin/json2xpath

https://github.com/antonmedv/fx

https://github.com/simeji/jid

https://github.com/jmespath/jp

https://jsonnet.org

https://github.com/borkdude/jet

https://github.com/jzelinskie/faq

https://github.com/dflemstr/rq

https://github.com/mgdm/htmlq

And there's someone else's list of stuff around Jq: https://github.com/fiatjaf/awesome-jq

However, personally I think that next time I might instead fire up Hy, and use the regular syntax with the functional approach for any convoluted processing I come up with. Last time I mentioned this, another HNer made a Jq-like tool with Lisp-like syntax: https://github.com/cube2222/jql (from https://news.ycombinator.com/item?id=21981158).



New parser contributions for JC are always welcome!


Another DON'T: silently switching between JSON and human-readable output depending on whether the output destination is a pipe. It's just an extra hassle when I'm writing my downstream command. Or it could be phrased as a DO: give the user a switch to pick the output format if you have both.


I started to try to make an argument for pipe autodetection, but I just can't. It seems like a useful feature but is actually a trap. Shell scripts that are going to rely on JSON output should always explicitly specify that they want JSON output, and giving them the ability to shortcut that by autodetecting a pipe will only make it easy to ignore that, and then break when some other format comes into vogue. Human-readable output should generally be the default, unless the programs are explicitly designed only as part of a pipeline.

Having multiple outputs is a great feature, though. I'm especially fond of tooling in Kubernetes that allows you to nicely pipe things in and out in multiple formats.
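
For example, kubectl lets the caller pick the representation explicitly instead of guessing from the destination:

    # human-readable table
    kubectl get pods
    # the same data as JSON, for scripting
    kubectl get pods -o json | jq -r '.items[].metadata.name'
    # or use the built-in JSONPath output
    kubectl get pods -o jsonpath='{.items[*].metadata.name}'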


Pretty much every linux tool pulls these shenanigans. I hate it when there's no flag to control output.


What about things like ANSI color codes?


My preference would be to leave them in if they're the default, but to have an option to switch entirely to a more machine-readable format. It's a single line of sed to strip them out[0], and it's a bigger pain to figure out why a program has different behavior when debugging.

[0] https://superuser.com/a/380778
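
The linked answer boils down to something like this (GNU sed; `somecommand` is a placeholder, and BSD sed may need a literal escape character instead of `\x1b`):

    # strip SGR (color) escape sequences from the output
    somecommand --color=always | sed 's/\x1b\[[0-9;]*m//g'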


I typically run the command output through `cat` to see if it does different things when outputting to the terminal vs piping.

The biggest changes are typically ANSI color codes and column length. Some programs do strange things, like changing how they escape characters (ls).

JC turns off ANSI color codes when its output is not a terminal, but that's it.


Changes in column length can't really be helped. The issue is that when a command outputs to a TTY, you can use that file descriptor to get the width of the TTY (i.e. your terminal emulator). But if you are piping a command then its STDOUT is no longer a TTY and thus you have no idea how wide the terminal is.
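
In shell terms the distinction is roughly this, which is why the width information is simply unavailable once stdout is a pipe:

    if [ -t 1 ]; then
      # stdout is a TTY: we can ask how wide it is
      echo "terminal width: $(tput cols)"
    else
      # stdout is a pipe or file: there is no terminal to measure
      echo "stdout is not a TTY; falling back to a default width"
    fi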


Hope it supports NO_COLOR (http://no-color.org/)

    env NO_COLOR=1 ...


I hadn’t seen this before - thanks for sharing!


Good question; those are more reasonable to me since they aren't visible as characters and don't change the structure. So if I'm looking at the colorized output I can still use that as the basis for a sed or awk script operating on the non-colorized version.


Oh, and I still want `--color=always` or equivalent for when I'm piping into a pager.


CLI and JSON would be amazing if terminals made a step forward too, because both raw JSON and the triangled in-browser-console style of JSON logging just suck for daily reading. A new terminal could either detect patterns or use explicit cues in the JSON to format structures, and show the raw data on demand. E.g. this json5: [{_repr:"ls:file", name:"foo.txt", type:"text/plain", size:512, access:"664", …}] could be presented the way ls usually presents it, but processed further as JSON. A whole lot of representations (and editors) – including ui-based ones – could be added to the system (e.g. /usr/local/share/repr/ls:file (+x)) to format any sort of data, instead of formatting it in-program with getopt and printf. And when there is no repr file, well, you still have triangles mode. We're too stuck with the text=text=text idiom. Structure=text=ui would be so much better.

(I’m aware of powershell and am ignoring it consciously)
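
A toy sketch of what one of those repr scripts might look like, using jq as the formatter (the `ls:file` record shape is hypothetical, taken from the example above):

    #!/bin/sh
    # hypothetical /usr/local/share/repr/ls:file
    # reads JSON records on stdin and renders them as ls-ish lines,
    # while the raw structured data stays available to the next program
    exec jq -r '.[] | "\(.access)  \(.size)\t\(.name)"'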


This kind of reminds me of the IPython display protocol:

https://carreau.github.io/posts/29-JupyterCon-DisplayProtoco...


Such a shell already exists

https://github.com/lmorg/murex

You’d have to learn a new shell syntax but at least it’s compatible with existing CLI tools (which Powershell isn’t)


I was talking about terminal [emulators], not a shell. A shell has nothing to do with how the output stream is displayed. Murex is more like powershell in this regard, which is on/around the level of implementation that I personally find inappropriate.


I know you said terminal but short of rewriting the entire stack, TTYs and all, you have to work with what you’ve got.

With murex, it does actually have a lot of intelligence built in that adapts how the output stream is displayed (e.g. rendering images in the terminal, "prettifying" JSON if STDOUT is a TTY, colourising STDERR red, etc.).

Plus I disagree with your GP point that how JSON is handled shouldn't be a shell thing. You're talking about data being re-encoded in different formats depending on the output. That absolutely should be a shell thing (where the logic of the pipelines is handled), with the terminal being a dumb rendering client. The last thing I want is output being modified by the terminal, leaving me scratching my head as to whether a command is running correctly or whether the terminal is displaying it weirdly.


> You’re talking about data being re-encoded in different formats depending on the output.

I see now that my comment was vague on this part, but no, my idea is not about a terminal transforming the output. It's about a presentation layer only, just like ANSI escape codes (but programmable through repr scripts). The stream remains intact; you only see it formatted. Murex does roughly that, but it takes ownership of data processing with its own syntax and commands (like an IDE of the CLI), while I believe it should be done by separate tools like jq (like the coreutils of the CLI). It would be okay to use a dumb terminal with this system; you'd just have to parse the JSON in your head when reading the output. See also my other comment for clarity: https://news.ycombinator.com/item?id=29456775


There’s no practical difference between jq doing it or shell builtins. Either way it is a piped process.

Murex could ship its coreutils as external executables like GNU does, but that wouldn't be a particularly efficient way of doing it. Whereas Bash, zsh, Fish etc all have builtins just the same as murex (e.g. if/endif, for, switch, read, echo, time, jobs, etc).

The only difference between murex and bash in that regard is that murex builtins can do more intelligent data parsing than just dumb byte streams. Of course if you wanted to use jq with murex you still can. Just as you can still use sed, awk, `perl -pi -e` and others too.


Nushell does at least some of this


See also libxo:

> The libxo library allows an application to generate text, XML, JSON, and HTML output using a common set of function calls. The application decides at run time which output style should be produced. The application calls a function "xo_emit" to produce output that is described in a format string. A "field descriptor" tells libxo what the field is and what it means.

* https://github.com/Juniper/libxo

Then add an "--output-format" option.


A +1 from me on something like `--format` - pipe auto-detection feels unnecessary and like an inevitable footgun.

As just one example, the Azure CLI defaults to human-readable output, but has an "output" parameter so you can have JSON if you want - I've never once wanted any kind of format auto-detection, and I have to say that I still don't.
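
For reference, that looks something like this (the `--query` flag takes a JMESPath expression):

    # table for humans
    az vm list --output table
    # JSON for scripts
    az vm list --output json | jq -r '.[].name'
    # or let the CLI do the extraction itself
    az vm list --query '[].name' --output tsv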


Yup! In fact, there's only one good reason to do automatic pipe detection, and that's if your tool normally outputs ANSI escape codes, which aren't something you want in something going to a pipeline.


Do you advocate for ls to change its behavior? Given that it changes its output format based on pipe detection in ways that involve more than just ANSI escape codes.

Perhaps this is something that is actually not a "fact," and is something more like an "opinion."


TL;DR: principle of least surprise. Bringing `ls` into this isn't a great idea.

Advocate? No. Would I excuse it? Yes.

When I write tools, I write two kinds: ones that are intended to be consumed by something and ones that are not. If something is even marginally the former, I assume the former.

I can excuse a tool having output-changing behaviour if it's the latter, and only in the case of dropping ANSI escape codes, but only because it makes sense to pipe such output into, say, `pbcopy` or `tee`, without the escape codes, for capture.

Automagically removing ANSI escape codes is the _most_ I'd be comfortable with, and only because it _reduces surprise_. Something sent to stdout should be the same as something sent to a pipeline, but I can forgive somebody adding ANSI codes to make things clearer to a human reader, for whom those are invisible in the stream.

Would I advocate for `ls` not to send ANSI escape codes to a pipeline? Yes. Do I think it's a good idea to pipe from `ls`? Most certainly not! `ls` is written with assumptions about the consuming terminal that defeat the principle of least surprise.


So... if you're okay with how ls changes its output format based on automatic pipe detection, then it seems like there is perhaps more than "one" good reason to do automatic pipe detection. :-)

Which is eminently reasonable. But your hard-line stance expressed above is less so.

> Bringing `ls` into this isn't a great idea.

I perceived your hardline stance as unreasonable personally, and so I wondered where the format changing functionality that most 'ls' implementations perform based on pipe detection fit in your worldview. It seems to me like your hard-line stance is not actually so hard-line. But you do a lot of mental gymnastics to get there, and you end up asserting that 'ls' shouldn't be used in pipelines. Which seems kinda nuts to me to be honest.

The better answer here IMO is that one should use "good judgment" when it comes to changing things based on automatic pipe detection. It's a matter of taste and there are more use cases for it than simply removing ANSI codes.


I don't see what's hardline about it: my line is just the principle of least surprise. And to be as emphatic as possible about it, I'm talking about tools that are meant to be consumed by pipelines.

> I perceived your hardline stance as unreasonable personally, and so I wondered where the format changing functionality that most 'ls' implementations perform based on pipe detection fit in your worldview.

Because the output of `ls` is a trashfire for parsing already. I'd strongly discourage _anyone_ from attempting to parse its output.

> But you do a lot of mental gymnastics to get there, and you end up asserting that 'ls' shouldn't be used in pipelines. Which seems kinda nuts to me to be honest.

No, I've just had to fix a bunch of broken-ass shell scripts in the past because people didn't realise that the output of `ls` doesn't obey the principle of least surprise. No mental gymnastics here, just a lot of experience fixing other people's problems. About the only good thing that ever comes out of these is that it gives me a way to show people what they can do with safer, more predictable tools, like `find`.


"there's only one good reason to do automatic pipe detection" and "please try to adhere to the principle of least surprise" are two different statements. The first is making precise numeric claim about the number of good reasons to do something, while the latter is clearly about espousing a principle. Principles are to be balanced.

I'd encourage you to pop up a level here and look at your original comment that sparked this thread:

> Yup! In fact, there's only one good reason to do automatic pipe detection, and that's if your tool normally outputs ANSI escape codes, which aren't something you want in something going to a pipeline.

There's no nuance there. No balancing of principles. No discussion about "least surprise." Just an opinion masquerading as a fact, and one that I happened to disagree with absent any other context. Your replies since then seem to think you presented a more nuanced case than what you actually did.

You've clarified a bit since then, but really haven't acknowledged that your original claim appears to be quite a bit stronger than where you've settled after a few comments into this thread.

> No, I've just had to fix a bunch of broken-ass shell scripts in the past because people didn't realise that the output of `ls` doesn't obey the principle of least surprise. No mental gymnastics here, just a lot of experience fixing other people's problems. About the only good thing that ever comes out of these is that is gives me a way to show people what they can do with safer, more predictable tools, like `find`.

So now we've moved on from discussing when and where tools should do automatic pipe detection to best practices for shell scripting. Yes, of course, I wouldn't try to parse the output of `ls` in a shell script. But that doesn't mean I'm not going to use `ls` in pipelines ever. Shell scripts are just a subset of what shell is used for. I also use it interactively. In such cases, `ls | grep foo` is a pretty common thing for me to do to see what's in the current directory. Never had a problem with it. But if I were to follow your advice, you'd want me to do what, something like `find ./ -maxdepth 1 -name 'foo'`? (Although, that includes hidden files.) That's a lot less convenient.


Except when you want it, e.g. feeding the output to a pager, or grepping something out and then sending it to the terminal. I hate it when tools make this assumption and change things behind my back.


I said _one_ good reason. If a tool did that, I might be a bit annoyed, but I'd at least understand _why_: I'd be less surprised if the colours and formatting disappeared, but tools like awk, cut, sed, &c., behaved.

Ideally, I'd just prefer it if libxo were common outside of FreeBSD, and I didn't have to worry about massaging stuff into structured data.


The problem with auto-detection is that in POSIX-like shells it only works for the simplest of problems (e.g. `ls` becoming a single-column list when piped) because ultimately everything is treated by the shell as a whitespace-delimited list and treated by the OS as an untyped stream of bytes.

However, more modern shells fix this problem by having typed pipelines and builtins written to understand more than just a flat file of bytes.

Take _murex_ for example (disclaimer, I'm the author of that shell):

  » jobs
  PID   State      Background  Process  Parameters
  2104  Executing  true        exec     sleep 9000000
  2240  Executing  true        exec     sleep 9000000

It's readable but what if I wanted to pass it as a table?

  » jobs | cat
  ["PID","State","Background","Process","Parameters"]
  [2104,"Executing",true,"exec","sleep 9000000"]
  [2240,"Executing",true,"exec","sleep 9000000"]

OK, so it auto-detects that it is running in a pipe and outputs it as a jsonlines table. That would be annoying in Bash. But with a type-aware shell, the shell knows it's a jsonlines table, e.g.

  » jobs | debug | [[ /Data-Type/Murex ]]
  jsonl 

...but what can we do with a jsonlines table? Well you can select individual columns:

  » jobs | [ PID State ]
  [
      "PID",
      "State"
  ]
  [
      "2104",
      "Executing"
  ]
  [
      "2240",
      "Executing"
  ]

run SQL against it:

  » jobs | select * where PID > 2200
  ["PID","State","Background","Process","Parameters"]
  ["2240","Executing","true","exec","sleep 9000000"]

iterate through each row:

  » jobs | foreach proc { if { =$proc[0]>2200 } then { echo $proc } }
  [2240,"Executing",true,"exec","sleep 9000000"]

or even just convert it into another format, like CSV:

  » jobs | format csv
  PID,State,Background,Process,Parameters
  2104,Executing,true,exec,sleep 9000000
  2240,Executing,true,exec,sleep 9000000

...or YAML...

  » jobs | format yaml
  - - PID
    - State
    - Background
    - Process
    - Parameters
  - - "2104"
    - Executing
    - "true"
    - exec
    - sleep 9000000
  - - "2240"
    - Executing
    - "true"
    - exec
    - sleep 9000000

And it all just works without you having to think or even know what data format is traversing the pipeline.

However unfortunately none of this is possible with Bash. And thus the majority of tools are forced to be dumb to compensate.


I don't think typed pipelines in a shell are a virtue. I think dumb bytes are ideal. Text manipulation can be laborious but it's always straightforward.


I used to think that when PowerShell was my only exposure to typed pipelines. But after using murex I've totally changed my opinion. You still have POSIX pipes there for when you want dumb bytes. But you also have typed pipes layered on top to allow one to manage structured data without having to think about it.

It's a bit like having jc integrated into the shell, except that it supports TOML, YAML, tables of various formats and all sorts - and autodetects the content type too - so you don't have to learn a dozen different tools for managing a dozen different content types.

So it’s the best of both worlds.


It's only straightforward if you're using tools that understand what those dumb bytes are. Often you'll get away with a mixture of grep, sed and awk but the moment you turn to something like jq you've done away with the assumption that the pipeline is just dumb bytes. However rather than having to explicitly remember which laborious set of commands work well together, murex will provide you with a suite of builtins that have the same flags and behaviour regardless of the structure of the bytes being read in by them. And if you prefer your pipes to be untyped POSIX byte streams then murex still works here because it will suggest (via tab-completion suggestions) the right commands to use against each previous command.


> It's only straightforward if you're using tools that understand what those dumb bytes are. Often you'll get away with a mixture of grep, sed and awk but the moment you turn to something like jq you've done away with the assumption that the pipeline is just dumb bytes

I guess my claim is that the minute you need to parse the dumb bytes as anything other than strings, your problem is complex enough that you shouldn't be using shell to solve it, and you should write a proper program instead. That's not hard and fast, for example I use fish and `math` is very powerful.

`jq` is also a bad example, I think, in many ways... it explicitly doesn't follow the Unix philosophy, and instead does its work by parsing an opaque string according to its own DSL. Much better is a tool like gron, which allows you to process JSON using familiar tools like grep.
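
For anyone who hasn't tried it, gron flattens JSON into greppable assignments (output shown from memory), and `gron --ungron` turns the filtered lines back into JSON:

    $ echo '{"user": {"name": "alice", "ids": [1, 2]}}' | gron
    json = {};
    json.user = {};
    json.user.ids = [];
    json.user.ids[0] = 1;
    json.user.ids[1] = 2;
    json.user.name = "alice";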


libxo is integrated into FreeBSD and many of its core utilities, so structured output is supported out of the box there.
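
For example (flag spelling from memory; see the libxo documentation for the exact form):

    # the same utilities, different output styles chosen at run time
    df --libxo=json,pretty
    netstat -i --libxo=xml
    w --libxo=html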


perfect for the systemd island. Horrors for everyone else.

Output should be readable, not structured.


MacOS Monterey added a CLI tool for speedtesting, and I noticed they have a

  -c: Produce computer-readable output
So I tried it out:

  ~ networkquality -c | jq '{dl: (.dl_throughput / 1000000), ul: (.ul_throughput / 1000000)}'
awesome!

  {
    "dl": 176.861488,
    "ul": 6.742952
  }
(Starlink in Perth, Western Australia)


I don't really understand the point about flattening.

> This way I can easily filter the data in jq or other tools without having to traverse levels.

How is doing `jq '.cpu.speed'` any harder than doing `jq '.cpu_speed'`?

IMO as long as you aren't going insane with nesting levels, it's actually better to have a proper structure than dumping everything into an ugly flat object.


The article is not advocating flattening willy-nilly. The point is to have a bias for flatter structures to make it easier for the user, but of course not all structures can or should be flat. On the other hand, don't over-engineer your data structure so that it makes finding things difficult.

Grabbing an attribute is not necessarily any harder in a deeply nested structure, but filtering based on multiple deeply nested attributes in different branches can make a query quite complex.


> The article is not advocating flattening willy nilly.

It certainly seems like it does when the first example for flattening is oversimplifying an already simple structure that doesn't really need flattening. Maybe that was not meant to be a serious example but rather just for ease of understanding, but then the article should have probably said so.

> filtering based on multiple deeply nested attributes in different branches can make a query quite complex

Can you elaborate on this please? Maybe I'm just too tired to think clearly at 1 am, but I don't see how filtering is any harder. You would just do something like `jq '.foo | select(.bar.baz >= 42 and .qux.moo.asd == "abc")'`.


I can’t think of one off the top of my head, so maybe I’m misremembering why I originally decided I preferred flatter structures. It’s possible it was due to some limitation on my part as I was first learning to work with JSON.

Looking back at some other JSON output, like `ip addr`, I’m not seeing anything egregious. There are just a couple useless top-level keys in the `iostat` output that make the queries longer and don’t seem to add value.

Using the flat structure with a type field lends itself well to lazily outputting JSON lines, so maybe that's why I tend to represent CLI output that way. It's sort of how you would format the data when sending it to ELK or Splunk.

Since I initially thought of JC as a CLI tool, I was trying to reduce line length. Even the name JC was selected so it wouldn't significantly increase the line length. So not nesting data inside meta objects is something I would encourage.


> It certainly seems like it does when the first example for flattening is oversimplifying an already simple structure that doesn't really need flattening.

For the purposes of showing how the concept is applied, a verbose example would be counterproductive. This doesn't require explanation, because in practice developers don't make work for themselves that they don't need to.


That jq expression is surely more difficult to summon than a line-based grep pattern.


Grep would not be able to do numeric comparison, would it?


Not directly, but once you want to do that, IMO, you've moved out of "shell script" territory and into "actual program" territory.


Deep nesting makes it harder to access data in a systematic way.

If you are accessing deeply nested data, you have to account for each layer of keys possibly not existing: if `cpu` exists, then see if `speed` exists, then access `speed`. There's nothing wrong with deep nesting as long as you can guarantee the key and data will be generated, but more often than not, when the data isn't generated, the key simply won't exist.

And people do get carried away with nesting. Also, it is nice to have core information available at the surface level of the JSON file.
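
In jq the defensive version of that lookup stays short, but you do have to remember to write it; the `//` operator supplies a fallback when the path comes back null:

    $ echo '{"cpu": {"speed": 2.4}}' | jq '.cpu.speed // "unknown"'
    2.4
    $ echo '{}' | jq '.cpu.speed // "unknown"'
    "unknown"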


In addition to an option for writing output as JSON, consider also adding an option for streaming output to stdout. Those two features were added to GCC9 gcov and are what enabled me to write a tool that parallelizes coverage report generation.

In practice this enabled generating coverage reports orders of magnitude faster than traditional gcov wrappers like lcov
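
If I remember the flags right, the combination looks something like this (check `gcov --help` for the exact spelling on your version):

    # emit the JSON intermediate format on stdout instead of writing .gcov files
    gcov --json-format --stdout src/foo.gcda | jq .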


If you’re going to make a schema, which is a good idea, then make a command line option that emits it.


Yep, in JC you can see the schema for any parser like so:

    $ jc -h --dig


If you ever need to fight against this annoying JSON trend (e.g., when your tool only emits certain information in stupid, ungrepable JSON), consider filtering the output through gron so that it becomes saner.


Jq makes life easier. I'd say for complex output, using filters is more readable than using awk to extract random positions of substrings.


Thank you, sounds very useful for quick queries: https://github.com/TomNomNom/gron


Grep is for simple text, jq allows much more powerful searching/selection. Don't get me wrong, I believe it should be a choice and not force JSON on users, but for some it's useful.


why is JSON ungrepable?

grep key file.json | awk -F: '{print $2}'

if you're already searching for a key, seems like you're just wanting the value.

granted, i hardly ever (have i actually ever??) interact with JSON this way, so i'm not exactly familiar with pitfalls.


The pitfall is that JSON has zero guarantees for how often line breaks do and don't occur, and is often used to represent hierarchical data. Grepping for 'key: foo', and some liberal use of -A and -B, may find you what you're looking for, but grep is simply the wrong tool for that job. (And how do you handle a key with newlines in it?) jq [0] is the right tool, but jq's syntax is its own, and is harder to use (unless you use it regularly).

[0] https://stedolan.github.io/jq/


when all you have is a hammer, everything looks like a nail. i can see rudimentary attempts at trying to get some data on a system you might not have full control over and are just needing something. i'm sure we've all been logged into a remote system maintained by someone else. cavemen throwing rocks at the spaceship type scenarios.


When it's not formatted, just emitted as a single line.

(Granted grep still works, but...not nicely.)


gotcha. i'm imagining a pretty gnarly regex to return the data after the first colon after the match but only up until the next comma or square bracket or curly bracket. yeah, that's unpleasant.

thankfully, there are tools like jq to help maintain one's sanity


To be honest I find it funny that anyone would attempt to “parse” such a simple format with the wrong tool when there are plenty that can extract data in dot notation.


JSON is greppable if all you need is a simple key-value from a known format and indentation. It's much harder if you don't know the indentation/line breaks, or if it's whitespace-free, or if your key can ever appear in your data.


Or just:

    awk -F: '/key/ {print $2}' file.json


    "kb_read_s": 0.12
This worries me. JSON doesn't have support for fixed-point math, does it? When will some random POSIX tool spit out scientific notation at me?

Also, if you just output a flat schema, is there much of a point in this vs just:

    cpu: 0.2
    kw_read_s: 0.12
The difference is that you can use a JSON parser vs splitting on new lines and colons?

I do like the idea of JSON output as an option but before every bug and mistake gets canonized as POSIX or some other standard can we at least talk about the output format for a bit?


You'll prefer JSON the minute it becomes:

    cpu: 0.2
    kb_read_s: 0.12
    mac_addr: 10:AA:FF:00:55:66
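
That third line is exactly where a naive colon-split falls over, while a real parser doesn't care:

    $ printf 'cpu: 0.2\nkb_read_s: 0.12\nmac_addr: 10:AA:FF:00:55:66\n' | awk -F: '/mac_addr/ {print $2}'
     10
    $ echo '{"cpu": 0.2, "kb_read_s": 0.12, "mac_addr": "10:AA:FF:00:55:66"}' | jq -r '.mac_addr'
    10:AA:FF:00:55:66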


    dict((k.strip(), v.strip()) for k, _, v in (line.partition(':') for line in text.splitlines() if line.strip()))
Also, who says we can't have a universal parser for this format, just like we have for JSON? Not everyone needs to write the one-liner above; just use libtextformat.parse(text) or whatever we would call it.


Parsing outputs is a point where the Unix philosophy always breaks down for me. It looks like this:

  {program -> human-text -> parser}+
When it could look like this:

  program
    -> {struct-text -> program}+
    [-> formatter] -> human-text .


I don't think your approach is a problem. I do think that JSON might not be the ideal format for struct text, especially if you really just use it as a flat dict of key-value pairs.


Maybe not, but more often than not you really have an array of objects, and that's where JSON helps.


Look at my original example. The JSON punctuation just muddled the text format without adding any additional value. The point of JSON is the (very limited) type system, nested schema, and heterogeneous format (if array A has two elements B and C, they don’t need to be the same type). None of that seems applicable to what you want out of a UNIX utility.


What I was getting at is:

    cpu: 0.2
    kw_read_s: 0.12

is describing a single 'object'. How do you describe an array or list of these objects? Something like:

    cpu: 0.2
    kw_read_s: 0.12

    cpu: 1.3
    kw_read_s: 0.4

Do you use a blank line to denote a new 'object'? In JSON it would be done this way, and this conforms to how the vast majority of command output maps (as they tend to be rows of columnar data):

    [
      {
        "cpu": 0.2,
        "kw_read_s": 0.12
      },
      {
        "cpu": 1.3,
        "kw_read_s": 0.4
      }
    ]

The other benefit to JSON here is that the formatting doesn't matter. This could also be expressed as a block of text with no spaces or newlines between elements:

    [{"cpu": 0.2,"kw_read_s": 0.12},{"cpu": 1.3,"kw_read_s": 0.4}]

Finally, this can also be streamed, using JSON Lines:

    {"cpu": 0.2, "kw_read_s": 0.12}
    {"cpu": 1.3, "kw_read_s": 0.4}


The OP specifically advocates against your proposed pattern and points out why nesting isn’t ideal. Also JSON doesn’t support fixed point numbers so JSON.encode(JSON.parse(input)) != input even if your equality assumes that formatting doesn’t matter.


You're not wrong, but sometimes simple formats aren't as simple as they look. It sounds like what you want is basically a format like /etc/os-release:

    NAME="Rocky Linux"
    VERSION="8.5 (Green Obsidian)"
    ID="rocky"
    ID_LIKE="rhel centos fedora"
    VERSION_ID="8.5"
    PLATFORM_ID="platform:el8"
    PRETTY_NAME="Rocky Linux 8.5 (Green Obsidian)"
    ANSI_COLOR="0;32"
    CPE_NAME="cpe:/o:rocky:rocky:8.5:GA"
    HOME_URL="https://rockylinux.org/"
    BUG_REPORT_URL="https://bugs.rockylinux.org/"
    ROCKY_SUPPORT_PRODUCT="Rocky Linux"
    ROCKY_SUPPORT_PRODUCT_VERSION="8"

On the surface this seems great, but those quotation marks are kind of annoying. Is it possible there's an escape syntax that's used in case the name also includes quotes, e.g. VERSION="8.5 (Green \"Aqua\" Obsidian)"? Is it also possible you can embed newlines in between the quotes too? Who knows... Thankfully, with JSON there is a simple spec.


    json.parse(input)


Memory inefficient.

Type support is lacking. No date support, for example.

Unnecessary quotes around keys make it harder for humans to read.

Different implementations allow for repeated keys.

    otherformat.parse(input) // just as reasonable


  cut -d: -f2-


> JSON doesn’t have support for fixed point math

Plain text doesn't have support for numbers at all, which isn't much of a solution.


Right. So if we replace plain text then let’s maybe do better than JSON or not do it at all.

Let’s put it this way: if I proposed XML as the substitution for plain text, would you rather keep plain text or switch to XML?


My opinion about JSON is that:

1. You don't need to categorize every piece of data.

2. You don't need to include everything in a single JSON file.

Deeply nested JSON is very annoying. The key-value pair structure of JSON is simply being abused at this point. Also, I really don't appreciate using numerical values as keys. Please use a list.


I took a leaf out of zfs's book and made all my apps' output look like zfs/zpool.

Optional header, selectable columns, one line per record, machine readable (raw) vs human readable numbers.

I’ve nothing against JSON output but I just don’t need it when you can print out two columns, select on the first, and print the second.

  $ users -H -o name,hair |
  > awk '$1 == "gorgoiler" {print $2}'
  gray

Admittedly, that awk invocation is so commonly used it could probably be a lot more terse. Also, this whole house of cards collapses when you have data containing spaces.


This is one area where I find PowerShell preferable. It allows commands to return structured data in a standardized way, which really helps interop between programs from different publishers.


Imagine if HTTP APIs had output similar to lsof or df -h - nobody would write a script to use them! JSON makes a lot of sense, but a human-parseable format is also needed.


You're conflating two different concerns:

1. human review

2. scripting / automation

In case one, human readable formats are obviously preferable. But the moment you need to script a command, you want it in a machine readable format.

A perfect example of the difference between the two is how badly spaces in file names are handled. Granted, POSIX deserves a lot of the blame here too.


come to the powershell-side, we have ConvertFrom-Json


Honestly I do not really understand this idea because, AFAIK, JSON was designed for Javascript in a web browser. By and large, Javascript-enabled web browsers expect access to generous amounts of memory. This is not the case for the common UNIX userland programs. These programs do not expect large amounts of memory and many are written with the intent that they may be used to process text line-by-line. This JSON idea reminds me of Windows "PowerShell". Microsoft actually has to limit how much memory can be used by the shell. Why is that.

https://devblogs.microsoft.com/scripting/learn-how-to-config...

One of the things I like most about the UNIX userland is that I can use small programs to edit very large files, without needing lots of memory. I want programs that are designed to accommodate the possibility of line-by-line processing.

If the intent is to make output network friendly, maybe something like netstrings is useful. Easy to parse. Low memory footprint.

Seems to me this JSON idea is not designed to improve performance, agility or resource efficiency but to ignore the UNIX example in favour of a different, slower, approach that is perceived as easier for some people to use. Namely those who do not want to spend the time to learn how to use an existing, faster solution with lower resource requirements.


This isn't the sort of philosophy debate you're trying to make it out to be. The output of a lot of tools quite simply isn't in a machine-friendly format, and it can be a nightmare to try to write a parser for them yourself.

You are misinterpreting the Unix philosophy. It's fine to use a bunch of sed, awk, grep, etc. when you're either transforming text or processing already well-structured data. But trying to write a full-fledged parser for something with only human-readable output, especially as a shell script, definitely goes against that philosophy. Congratulations, you've managed to piece together 50 commands in a pipeline and create a monstrosity that's far from the minimalist philosophy.

In fact, I would argue that by using `jc` together with `jq` you can actually create some nice pipelines for parsing the data that will be much more in line with the Unix philosophy.

Nobody ever said this was designed to improve performance, but I have a hard time believing your claims about it being significantly slower, which aren't backed up by any source. Most likely, eliminating the JSON conversion would be at most an unnecessary micro-optimization. But if your code was truly performance-critical, you wouldn't be piecing it together with shell pipelines that cause a bunch of unnecessary forks; you'd write it in something like C instead.

And the "JSON was designed for the web browser" argument doesn't hold much water either. You're about several decades too late for that; JSON is extremely ubiquitous and used in a lot of non-browser contexts. Sure, some people depending on their needs may use other formats like XML or protobuf, but JSON is still very common.


Yes. The performance point boiled down to this:

> These programs do not expect large amounts of memory and many are written with the intent that they may be used to process text line-by-line

Which is only a problem if you are being very silly, don't choose NDJSON (newline-delimited JSON) and instead shove 10GB of data in a big [] array that the parser has to read in all at once. Almost every single JSON library can do NDJSON already. One of the most heavily used JSON-over-stdio applications is the Language Server Protocol, which uses JSON-RPC 2.0 and is entirely NDJSON. Same for about 15 different log-yeeting tools. Nobody has ever suggested switching LSP to plain text for performance reasons, only lower-overhead binary formats that don't throw out everything gained by having structure at all.

Large memory use by JSON is not something inherent to the encoding that plain text is somehow immune to. All sorts of CLI programs read stdin in all at once, and you don't see plain text getting slammed for exorbitant memory use.

In the context of the original post (`jc`, etc.), we're talking about essentially constant-sized output that's just much easier to parse, so the complaint isn't relevant to those tools at all.
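
Concretely, the difference is just one big array versus one value per line, and jq can both convert between them and process the line-delimited form a record at a time (file names and fields here are made up for illustration):

    # one giant array: the whole document has to be parsed at once
    jq '.[] | select(.level == "error")' big-array.json
    # NDJSON: each record is parsed and emitted independently
    jq -c 'select(.level == "error")' big.ndjson
    # converting the former into the latter is a one-liner
    jq -c '.[]' big-array.json > big.ndjson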


> But if your code was truly performance-critical, you wouldn't be piecing it together with shell pipelines that cause a bunch of unnecessary forks, you'd write it in something like C instead.

You are underestimating the power of Unix tools. A chain of Unix tools can match or exceed the performance of C programs written by average programmers. That is the true beauty of Unix and partly why it is still relevant today. The author has little idea about performance and doesn't understand how Unix works; otherwise he wouldn't make arrogant claims like:

> With jc, we can make the linux world a better place until the OS and GNU tools join us in the 21’st century!


> AFAIK, JSON was designed for Javascript in a web browser

It wasn't. It was _inspired_ by JS's syntax (and that of Python), but wasn't designed for it. Crockford designed it as a lightweight data exchange format that used a familiar syntax. Quoting from the json.org website itself:

> JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

JSON isn't terribly difficult to parse, nor does it require "generous amounts of memory". Shy of something like s-expressions, it's about as straightforward as you can get when it comes to structured data.

Netstrings are really useful but they encode strings, not structured data.


JSON in shell might not be faster (or it might be, I've not benchmarked), but it certainly is more efficient to do select and filter and whatnot using something like jq. It's not about the network; it's about making the output more predictable when running a script. I've lost count of the number of times I've tried to capture a specific part of the output of a command only to be tripped up by an edge case like spaces or something else.


The PowerShell idea isn't new; it is how the REPL worked across all Xerox PARC workstations.

While Windows isn't a whole-language OS like those, .NET and COM get pretty close to it, and that is what PowerShell knows about, instead of raw text.

This is what is missing across most traditional UNIX shells: integrating raw text, UNIX IPC (and newer mechanisms like D-Bus/gRPC), shared libraries, and structured data into a single REPL experience.


In actuality most CLI output is quite small and can easily be represented as a single JSON document. There are a few commands that can produce huge amounts of output and those are good candidates for using JSON Lines as noted in the article.

JC has streaming parsers that lazily output JSON lines for these types of commands. (ls, ping, vmstat, iostat, etc.)
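
For example, something like this (assuming the `-s` suffix naming for the streaming parsers):

    # one JSON object per ping reply, emitted as each reply arrives
    ping example.com | jc --ping-s | jq -c .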


The default behaviour of PowerShell is to stream objects one by one, the same as typical UNIX shells. In principle, it can process unlimited amounts of data on a single pipeline with a small, fixed amount of memory. The exceptions are commands like Sort-Object, which do require everything to be held at once in RAM. In theory, it could do an offline sort like the UNIX "sort" command does, but the issue is that that might break some scripts that rely on .NET objects that aren't serializable. If you're super keen, it would be possible to add this feature and develop a "Sort-ObjectOffline", at the risk that very rarely it might shred some objects...

The problem with JSON is that it does not support streaming by default. It's possible to use non-standard JSON-like formats to work around this, but then you're no longer using JSON!


> The problem with JSON is that it does not support streaming by default.

ndjson is worth knowing about. We use it for things too large to stream.

https://github.com/ndjson/ndjson-spec


"The probelm with JSON is that it does not support streaming by default."

This summarises the problem I have with JSON more succinctly. It was not designed for streaming, thus "it does not support streaming by default".

Non-standard, line-oriented JSON formats are usable, although as a user I cannot see how they offer any significant improvement over previous approaches with fewer brackets, braces, colons, commas and quotes (BBCCQ). Consider the BSD utility mtree or the BSD version of stat. These have options to output text in "shell-friendly" formats[1], minus all the BBCC and excessive Q. Sure, people could add options to utilities to output XML, or line-oriented JSON, but generally they don't. Why is that. Perhaps there is a reason.

You said it best: "It's possible to use non-standard JSON-like formats to work around [JSON's limitations], but then you're no longer using JSON!"

Maybe JSON is just about hype or something. An attractant for today's "developers". This would explain why I am just not attracted by it.

1.

https://man.netbsd.org/mtree.8

https://man.netbsd.org/stat.1


A counter example where the utility does output XML is https://man.netbsd.org/envstat.8

There were some upvotes then a series of downvotes as the top comment in this thread changed. Thus, opinion on this issue is mixed.

JSON of course stands for "object" notation. Once we start delimiting the "object" with a newline, or limiting its size/length, it arguably starts to sound more like a string, and the problem of memory requirements is abated.

As the parent comment suggested, when the "object" notation is used for delimited JSON it's really not "JSON" anymore. Is it still describing "objects", or is it describing strings, with the addition of brackets, braces, commas, colons and more quotes? Let the reader decide.

It is reasonable to ask what was the purpose of JSON in the first place, and, if it was designed for sending data over a network, whether previous solutions like netstrings could accomplish the same things as delimited JSON.


for me it is:

readable,

handy since probably every language has libs that work with it fine,

there's a lot of tools that work with jsons e.g generating code classes from json,

it's insanely popular,

really easy to learn


JSONL is the answer for arrays. “[…50mb…]” is too big to be processed in a streaming mode, but particular items usually cannot be split anyway, and “{…}\n50mb more” is what you need.


When I was at Amazon a long time ago, there was a suite of tools that used structured text.

I think it was called "recs".


There's also GNU recutils: https://www.gnu.org/software/recutils/


I don't know if it's just me but I've found jq and/or json output of little relevance to my day-to-day command line usage. I will almost always reach for `python -c` before I reach for jq -- for better or worse.


That’s why I created jello[0]. You get the power of python without the boilerplate so the experience is closer to jq with python syntax.

[0] https://github.com/kellyjonbrazil/jello


Starred and bookmarked, will definitely reach for this the next time I need to work with json in a pipeline. Thanks for the link! Cool project.


This would be a great option for find/xargs. -print0/-0 do the job for each other, but then there's a disconnect when a user wants to view the list of files.



