
You're asking me why I use a tool instead of parsing a binary format manually? Does that really need explanation?

If that is your attitude, why use any command-line tools ever? Why use "ls" when you can call readdir()? Why use "ps" when you can parse /proc?
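For concreteness, here is an illustrative sketch (assuming a Linux /proc) of what "just parse /proc" actually entails, versus the tool that already does it:

```shell
# Hand-rolling what "ps" already does: walk /proc and read each comm file.
# Linux-only, and every format quirk of /proc becomes your problem.
for d in /proc/[0-9]*; do
  printf '%s %s\n' "${d#/proc/}" "$(cat "$d/comm" 2>/dev/null)"
done | head -3
# The standard-tool equivalent is simply: ps -eo pid,comm
```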

You just pointed me to Kernighan and Pike a second ago. I didn't expect I would need to justify why piping standard tools together is better than programming everything manually.




I never said anything about not liking command-line tools. In fact I love them and think they do an awesome job!

In any case, you just proved my point. You think it's insane to parse binary data while scripting, and I do too. That is why I think passing binary objects around on the shell is insane.

Now if you were talking about text-based objects (not binary ones), then that is an entirely different story, and I feel that is what we do today. In your example you have rows, which could be called objects, and members, which would be separated out into columns. Arguing that one text-based format is better than another is not something I am interested in doing -- mostly because there are a million different ways one could format the output. If you were to do "objects", I think they would have to be binary to get any of the benefits one could perceive.

To be honest, I feel the output you posted is a bug in readelf. I would expect all data in that column to be in the same base.

I will level with you: I can see some benefits to passing binary between command-line programs, but I think the harm it would do would outweigh the benefit.

But if you really wanted to do that, you could. There is nothing stopping command-line utility makers from outputting binary, or any other format of text. You don't need the shell to make that happen.

What I think everybody is asking for is for command-line developers to standardize their output into something parsable -- which I feel most command-line utilities already do. They give you many different ways to format the data as it is. Some do this better than others, and I think that would hold true even if somebody forced all programs to produce only binary, or only JSON text, when piped.


This isn't about binary vs. text; it is about structured vs. unstructured.

The legacy of UNIX is flat text. Yes, it may express some underlying structure, but you can't access that structure unless you write a parser. Writing that parser is error-prone and an unnecessary cognitive burden.
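As an illustrative sketch of that burden: recovering even one field from flat `ls -l` output means committing to column positions that nothing guarantees (the /tmp path here is just for the demo):

```shell
# A fragile ad hoc parser: assumes the filename is always field 9 of
# "ls -l" output, and that names never contain spaces.
mkdir -p /tmp/flatdemo && : > '/tmp/flatdemo/a file'
LC_ALL=C ls -l /tmp/flatdemo | awk 'NR > 1 { print $9 }'
# prints "a" -- the space in "a file" silently truncates the name
```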

PowerShell makes it so the structure of objects is automatically propagated between processes. This is undeniably an improvement.

I'm not saying PowerShell is perfect. From what I understand the objects are some kind of COM or .NET thing, which seems unnecessary to me. JSON or some other structured format would suffice. What matters is that it has structure.

I still don't think you appreciate how fragile your ad hoc parsers are. When things go wrong, you say you "feel" readelf has a bug. What if they disagree with you and they "feel" it is correct? There is no document that says what readelf output promises to do. You're writing parsers based on your expectations, but no one ever promised to meet your expectations. But if the data was in JSON, then there would be a promise that the data follows the JSON spec.
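A hedged sketch of that difference, using python3 as a stand-in JSON consumer (the thread names no particular parser, and the record contents are made up):

```shell
# Ad hoc text: every consumer guesses at the format from examples.
rec='vmlinux  0x186a0  FUNC'
echo "$rec" | awk '{ print $2 }'   # 0x186a0 -- hex? decimal? just a string?

# JSON: the spec pins down syntax and types, so any conforming parser agrees.
jrec='{"name": "vmlinux", "addr": 100000, "type": "FUNC"}'
echo "$jrec" | python3 -c 'import json, sys; print(json.load(sys.stdin)["addr"])'
# prints 100000, unambiguously a number
```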


> From what I understand the objects are some kind of COM or .NET thing, which seems unnecessary to me. JSON or some other structured format would suffice.

They are .NET objects which, in some cases, wrap COM or WMI objects. The nice thing about them isn't just the properties, though. You can also have methods. E.g. the service objects you get from Get-Service have Start() and Stop() methods; Process objects returned from Get-Process let you interact with that process. Basically, wherever a .NET class already existed to encapsulate the information, that class was used, which gets you a lot more functionality than just the data contained in properties.


If the data were in JSON, it would promise that it followed the JSON spec -- but it's not; it follows its own spec, which in the case of readelf is apparently undefined.

Other programs that expect to be machine-parsable define their output in great detail. In the initial post of yours that I replied to, you mentioned ps. ps has many options to help you get the data you want without using the standard parsing tools at all. That is because its output was expected to be consumed both by humans and possibly by other programs.
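For example, ps's documented format controls let a script request exactly the fields it needs, a small sketch:

```shell
# "-o pid=,comm=" selects the columns, and the "=" suppresses each header,
# giving output intended for consumption by other programs.
ps -o pid=,comm= -p $$
```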

Now take readelf, on the other hand. Its man page clearly talks about being more readable. Its author cares about how it will look on a terminal and even went through the effort of implementing -W, which makes it nice to view on larger terminals. It even shows in print_vma, where somebody went out of their way to print hex if the number was larger than 99999. If the author really cared about the output being parsable, they would have added an OUTPUT FORMAT CONTROL section providing the contract you are looking for. Just putting the data in JSON does not solve your problem. Why? Because an author who did not take the time to define the output properly in the man page is not likely to implement a JSON output type when piped, let alone take the time to document the object structures in the man page.
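That print_vma behavior matters to a parser: just past the 99999 cutoff the "same" column silently changes base, and common text tools don't agree on how to read it back. A sketch:

```shell
# 0x186a0 is decimal 100000 -- just past the 99999 cutoff.
# A parser that is told the base reads it correctly:
echo $(( 0x186a0 ))                  # 100000
# A parser that isn't, doesn't: gawk's numeric coercion of input strings
# ignores the hex prefix and yields 0 (other awks may differ).
echo 0x186a0 | awk '{ print $1 + 0 }'
```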

You say it's not about binary vs. text, but I don't think that's so clear. There are lots of things to consider:

* Speed of encoding and decoding.

* Memory consumption issues, with larger objects needing to be fully decoded before they can be used or processed.

* Binary data would need to be encoded and would likely result in much more overhead.

It's not clear to me that a binary option would not be better than a text one. Pipes today are not just used for simple scripts and system management.

There are lots of things that concern me; maybe it is just implementation details:

* Not all command-line programs are written with the expectation of being parsed. How do we handle that? Force the programmer to make all output parsable regardless of whether they ever intended the program to be used in a script?

* Would a program write the same thing to stdout even when it is flowing to a terminal? Are terminals not for humans?

* Would structure be enforced? One of the awesome things about stdin/stdout is that you can send ANY data you want.

That all said, I would love it if programs whose output is intended to be parsed offered a JSON option. I am not against structured output. I am against forcing programmers to shoehorn their output into some format that may not be the best fit for their program. I think a well-designed and documented command-line tool that expects to be parsed by other programs will go out of its way to ensure the format is documented and adhered to.


It does follow a standard. It's DSV (delimiter-separated values). Unix tools are really good at handling that -- awk and cut specifically.
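A sketch of that, using an /etc/passwd-style record (the field layout is the standard passwd one; the user is made up):

```shell
# A colon-delimited record: delimiter-separated values (DSV)
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# cut selects fields by position
echo "$line" | cut -d: -f1,7     # alice:/bin/bash

# awk does the same via -F, and can filter or transform as it goes
echo "$line" | awk -F: '{ print $1, $7 }'   # alice /bin/bash
```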





