Useful Uses of cat (two-wrongs.com)
274 points by todsacerdoti 3 months ago | 180 comments



The reason these sequences of commands always start with cat, for me at least, is that I just cat’d the file only to find it was too long or noisy.

    cat filename.txt 

    Up | grep "thing I want"
Is fewer keystrokes than

    cat filename.txt

    grep "thing I want" filename.txt
Or more likely

    cat filename.txt

    grep filename.txt "thing I want"

    grep "thing I want" filename.txt


You can also do:

    grep "thing I want" !$
Bash (and similar) will replace !$ with the last parameter of the previous command.

This is a trick I’ve used lots when wanting to perform a non-piping operation on a file I’ve just ‘cat’ed (e.g. ‘rm -v !$’)
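
For instance (hypothetical file name):

  cat notes.txt
  rm -v !$    # the shell expands !$ to notes.txt, the last argument of the previous command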

I’d never criticise anyone for “useless” use of ‘cat’ though. If the fork() overhead was really that critical then it wouldn’t be a shell command to begin with.


Even easier, press alt+. (fewer keystrokes too! cli golf is fun), it'll copy-and-paste the last parameter from the previous command. If you press . multiple times it'll go further back into your history.


As I tend to live with set -o vi, [esc] _ gets the same. I hardly ever remember to set:

  bind '"\e."':yank-last-arg
So [esc] _ does it on systems where I haven't customised my environment. A major drawback though is it doesn't go through history like alt . does.

However, when 'set -o vi' is enabled you can easily go through history and edit with familiar vi keystrokes, or run 'fc' to fix the previous command in EDITOR, or press 'v' to edit the current command in EDITOR.


Command line vi modes are my bread and butter

I think you meant 'vv', and I wonder if it's not something that's set up by oh-my-zsh? It's great though


'v' will start the visual editor. 'fc' will edit the previous command. Both invoke EDITOR in bash's vi readline command mode.

I wasn't aware that bash and zsh did it differently, I assumed they'd both use the same readline - now I'm aware there's more to understand about it :)


Ha my bad I assumed zsh because I saw the `bind` command being used. I did not know that it was also in bash

I find that zsh's mode is actually better than readline's. For example, it can handle visual mode, which is why the same function is `vv` and not just `v`. Zsh can handle text objects too - very useful to be able to ci" for example. Fish's can too and is quite good, IIRC

Too bad the `v` command does not work in gdb so it seems it's more of a bash thing than a readline thing

Relevant : https://superuser.com/questions/1543120/make-readline-edit-i...


Often the shortcuts that people point out in these threads are too niche for me to remember, but that seems like a pretty great one, I’ll try to remember it.


alt + . is easy to remember, but did you know there is a way to recall any argument from the previous command in bash?

Press Escape, then the number of the argument from the previous command, confirm with Ctrl+Alt+Y.

Example:

  > command arg1 arg2 arg3
Escape, 1, Ctrl+Alt+Y gives you arg1.


alt + <number> + alt + . should do the same thing. No need for ctrl+alt+y, is there?


Not sure that would work everywhere but !$ definitely does for retrieving the last arg of the last command. 40 years of muscle memory right there


Alt + . is a readline binding, so that's available wherever GNU readline is used. Immediate examples that come to mind are Python PDB shell and GDB.


And thus you get vim modes there too, if you set your .inputrc for it!


Nobody mentions $_ ? It gives you the last argument used, so:

  cat filename.txt

  grep "what I want" $_
expands "$_" to "filename.txt"


I just use <esc>. on the command line to bring back the last argument.

Then I can look at it before hitting return


shopt -s histverify

(With histverify set, history expansions are reloaded into the editing line for review instead of running immediately.)


I’m sure there are cases where the fork overhead matters. But, alas, I don’t type or read that fast.


!! is the previous command line BTW

useful shorthand

> sudo !!

for example.

Again, bash specific.


my preference for this example is "arrow-up, ctrl+a sudo", but it will not work inside "screen" out of the box of course, since ctrl+a is screen's escape key


!!, !$ etc are old and supported by zsh as well.


<!$ grep ... or <$_ grep ... work, too.


Press alt and dot (full stop) to insert last word from the previous command line:

    $ cat file
    $ grep stuff alt-.
Alternatively, make use of the READNULLCMD mechanism in Zsh:

    $ < file
translates to

    $ ${READNULLCMD:-more} < file
Thus you can

    $ < file
then UP (or ctrl-p which I find more ergonomic) and continue with "grep stuff":

    $ < file grep stuff
(Redirections can be anywhere in the command.)

https://zsh.sourceforge.io/Doc/Release/Redirection.html#Redi...


I look through a lot of logs, so I've aliased `not` to `grep -vE`, and using the method you describe over several iterations I end up with a history that has a lot of

    cat log | not spam1 | not spam2 | not 'spam(3|4)' | .... | less
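
For anyone wanting to replicate it, the alias is presumably just:

  alias not='grep -vE'    # "not <pattern>" drops lines matching an extended regex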


Zsh global aliases can be nice for commands commonly used in pipelines:

  alias -g V='| grep -vE'
  alias -g L="| $PAGER"
Then you can do

  cat log V spam1 V spam2 L
I also like

  alias -g G='| grep'
  alias -g X='| xargs'
The possibilities are endless...


So are the injection possibilities. That makes me very nervous.


If you already have injection into my command line or environment, I don't see how not setting an alias is going to save me?


What's the scenario?


Something sets $PAGER to "rm -rf /" or something along those lines.

Really I’d be more worried about accidental invocations. Aliases are not scoped, so if you’re dealing with one of those programs with non-standard option handling and uppercase single-character switches…


Tons of things use environment variables like EDITOR or PAGER, and tons of things can "inject" "rm -rf ~" or whatnot. But if they can inject that, they can probably also do all sorts of other nefarious things. There is nothing special about a global alias here.

And an accidental typo is not really a big deal, certainly not an "injection possibility".


I actually use `alias -g L='| nvimpager'`,

As for accidental invocations, yeah I agree it feels dangerous, but in a few hundred thousand lines of shell history since I set them, never had a problem.


Haven’t had a problem, or haven’t noticed you’ve had a problem? I kid - but only partially.

What I really wish for is some sort of tool that would let me pipeline like that, but also easily examine each step in the chain for sanity. Sort of a workbook for shell.


Auto-expanding aliases seems like it would help:

  # Expand any alias on the line, then insert the space that triggered the widget
  function expand-alias() {
      zle _expand_alias
      zle self-insert
  }
  zle -N expand-alias                # register the function as a ZLE widget
  bindkey -M main ' ' expand-alias   # run it every time space is pressed


I've been using Linux for a very long time and never thought of that as an alias. I love it! Thank you for sharing.


I really like that alias. Thanks!


I've always thought UUoC while working interactively is fine. One shouldn't interrupt their flow to fix minutiae in a one-time command.

UUoC criticism, to me, belongs when one sits down to script.


Eventually I will want to debug/change the script, at which point we get right back to the situation where I want cat at the front of the line.


Or what I find myself doing often:

cat "filename.txt"

Up | grep "thing"

Up | grep -v "not thing"

Up | grep -v "other thing"

Etc. It's just easier to build this way even if the initial cat is unnecessary.


> grep filename.txt "thing I want"

> grep "thing I want" filename.txt.

…every time


Try:

  < file.txt grep pattern
Fewer keystrokes.


That doesn't work for the first step though, where you want to print the file to stdout without processing it in any way


Fair. One could get used to the following for the first step:

  < file.txt cat
But then one has to ctrl-w cat. It is a pity that this is not an alternative to cat for a single file:

  < file.txt


What's really useless are all the "useless use of cat" comments (and shellcheck warnings, although I take it that for a shell script there could be cases where one less process may be justified [although really, if you're at that point, you've got other things to worry about than cat, sadly]).

I use "cat ... | ... | ... " like in TFA and just like many in this thread because it simply makes sense. It's more intuitive. It's easier to read. It requires less braincycles to remember how this or that command wants its parameter passed, etc.

I think the "useless use of cat" movement made its time: it failed. Many of us are never going to give up our use(less|ful) of cat (you decide). So stop wasting your time complaining about it.


I use shellcheck a lot, and I've found so far that its complaints fall neatly into one of two buckets: "oh, whoops, nice catch" and "shut up, shellcheck".

"useless use of cat" goes into the latter bucket. Complaining at me about it does not actually improve the code; it's just a nag about a bad habit that, arguably, isn't even a bad habit.


I think you're mistaken here, and confusing two different usages of cat.

"Useless" uses of cat aren't bad habits during interactive usage, for all the reasons people mention here which I won't rehash.

For scripts, however, the story is different than for one-off commands. For one thing, it's slower due to the extra forks and copying of data across pipes, so there's at least that. For another, it prevents the command from inspecting the other end of its pipe, which can negatively impact usage in some cases. (For example, if the program knows its input is from a terminal, it may flush its output on every newline it sees.) Moreover, a bunch of the arguments for the interactive case (like "it's fewer keystrokes" or whatever) don't even apply to the script case in the first place...

The end result here is that you definitely shouldn't assume some habit is just fine with scripting merely because it's fine when you're typing on the terminal, or vice-versa.


Those are all reasonable points, but:

For shell scripts, I would argue quite vehemently that the most important goals should be correctness and readability, with performance being a very distant third concern. I'd even be tempted to argue that performance shouldn't be a consideration at all, except of course that argument would be misinterpreted to support some absurd edge case until I'd have to admit that of course performance is a little bit of a concern. But in any case, I can't recall a single example of a cat pipe being the root cause of an unacceptable performance problem in a shell script.

On the readability point, the example that probably irritates me most often is a cat pipe into some commands into a while loop. I much prefer this:

    cat file.conf | sed -e 's/pattern/replacement/g' -e 's/reallybigolhonkinpattern/other-replacement/g' | tr... | while read line; do...
to this:

    sed -e 's/pattern/replacement/g' -e 's/reallybigolhonkinpattern/other-replacement/g' wrongfile.conf | tr... | while read line; do...
or this:

    sed -e 's/pattern/replacement/g' -e 's/reallybigolhonkinpattern/other-replacement/g' | tr... | while read line; do
        stuff...
    done < file.conf
...and that's a pretty common pattern where the edge case of reading input from a terminal doesn't apply.

So this is the kind of thing that makes me go "shut up shellcheck" instead of "thanks shellcheck!"


Fork performance is a much more severe problem on Windows (WSL1, MSYS2, etc.) than Linux, so I'm not claiming you'll personally run into it per se, but it can affect users of some scripts.

But: performance was just one of the problems I cited. I gave you more than that -- one was a correctness reason (which you do care about) and had nothing to do with performance. And, again, incorrect buffering (which can make the script literally unusable in some cases) was just one example. I've seen needless redirection interfere with Ctrl+C handling too, though I don't recall the exact example. Oh, and there's terminal coloring and ANSI escape processing too, which programs detect differently. Point is, being unable to see the end of the pipe can definitely cause an unnecessary mess in some cases.

As for readability - honestly, part of the reason you find it less readable is that you're missing something else. Namely, this:

  sed -e 's/pattern/replacement/g' -e 's/reallybigolhonkinpattern/other-replacement/g' wrongfile.conf | tr... | while read line; do...
should really have been:

  sed -e 's/pattern/replacement/g' -e 's/reallybigolhonkinpattern/other-replacement/g' -- wrongfile.conf | tr... | while read line; do...
which is in fact both more correct (at least when the file name isn't hard-coded, which is the common case in shell scripts) and more readable than your example; you can immediately spot where the file name is. The difference between that and cat "$blah" | sed ... is very minor at that point (and in fact you should be doing cat -- "$blah" as well...); anybody reading a command like sed without a pipe input knows to look for an input argument. The important point regarding readability here is, it's not like the code gets overly tricky if you write it one way vs. another way. It's just a matter of spending 1-2 extra seconds glancing over. So it's very much a minor thing to be prioritizing above everything else. (If the logic became harder to reason about, that'd be a different story, and it'd put more weight on the readability aspect.)


It's unclear if you missed the filename being wrongfile.conf. Embedding a -- in the middle of a long series of arguments isn't the magic pixie dust that suddenly makes the filename argument stand out the way that it does when it's the very first argument in the pipe.

Yes, I saw your other points, and I chose this example because it is an example drawn from real-world use where there is zero objective reason to wag a finger about "useless use of cat". Those other points are not relevant in this example, and piping a cat into some other commands into a while loop is pretty typical shell code. Forcing me to move a filename argument into the middle of a long line for stylistic reasons should be obviously wrong. It is one case where shellcheck is over-reaching and being a nuisance rather than helping me catch errors.

This has been argued better and to death already: https://stackoverflow.com/a/16619430, http://oletange.blogspot.com/2013/10/useless-use-of-cat.html, https://news.ycombinator.com/item?id=23341711, https://news.ycombinator.com/item?id=36116208, https://news.ycombinator.com/item?id=6367319, https://news.ycombinator.com/item?id=1116085, etc.


If you deliberately chose a specific example then you forgot what this discussion was about? You're giving an existence proof. Yes, there exist situations where it's fine. But this discussion was about what constitutes good habits, not about cherrypicked counterexamples. The whole point of paying attention to good and bad habits is that they sometimes make a difference, and if your habits are bad, you'll sometimes get yourself in trouble. And it's not like you can reasonably expect shellcheck to distinguish the benign cases from the potentially problematic ones either. It has to give you a recommendation about what constitutes good habits, and avoiding redundant calls to cat is a good habit for all the reasons I just listed, even if in some cherrypicked cases it provides ~no benefit.


The concept presented is something I can agree with in principle, but "transforming a filename into the content of the file" is a really thin justification for a responsibility.

By all means don't build something where you have cascading effects and need to retest an entire pipeline, but this is _not_ it.

P.S.

And if you really really want to keep it separate, just do "< access.log head -500 etc etc etc" (no I didn't forget a pipe. And yes the "< inputfile" works even if it's in front of what you're calling).


> And if you really really want to keep it separate, just do "< access.log head -500 etc etc etc" (no I didn't forget a pipe. And yes the "< inputfile" works even if it's in front of what you're calling).

Or just use `cat` and let the pipe separate the different steps. "< access.log head" is nice but it breaks this representation where each step is piped into the next one. Sure, once you’re done fiddling you can rewrite the thing to remove the "cat", but when you are constructing the thing I find it clearer to use cat.


> Sure, once you’re done fiddling you can rewrite the thing to remove the "cat"

Or just… don’t?


Sure. Don't. Or do. Whatever you prefer.


> it breaks this representation where each step is piped into the next one

No it doesn't?

  < access.log head -n 500 | grep mail | perl -e …
is completely valid, and reads right-to-left as well as the cat version. IMO using stdin is preferable to either solution in TFA.


I’m not talking about valid syntax but about visual representation. My eyes don’t scan "<access.log head" the same as "cat access.log | head".


I agree that it's more elegant, but it requires you to remember the operator precedence of < vs |


It's not really a precedence relation as much as it's part of shell syntax. There are separators[0], and any sequence of tokens between separators is a command[1]. A command decomposes further, of course, but the only thing the programmer needs to know here is that separators are a different kind of thing than commands.

[0]: newline, ‘||’, ‘&&’, ‘&’, ‘;’, ‘;;’, ‘;&’, ‘;;&’, ‘|’, ‘|&’, ‘(’, or ‘)’.

[1]: https://www.gnu.org/software/bash/manual/bash.html#Simple-Co...


This works but it's so ugly :(


I think it's beautiful? Reads left-to-right, has nice symmetry if you pipe to an output file like:

    < infile some_cmd > outfile
and really clarifies what's a shell command, what's a redirection, etc.


It would make more sense like:

    infile > some_cmd > outfile
Then the arrow would be pointing in the right direction. But then it would be unclear whether "infile" is a file or a command. Which is why people use:

    cat infile | some_cmd > outfile
You can now interpret "cat" as a keyword that specifies that "infile" is a file.


It is more pleasant to the eye if you remove the spaces:

    <infile some_cmd >outfile
Just like you wouldn't add spaces in the middle of '2>&1' when redirecting stderr to stdout:

    <infile some_cmd >outfile 2>&1


> Just like you wouldn't add spaces in the middle of '2>&1' when redirecting stderr to stdout:

Mostly because if you do it doesn’t work: '2 >&1' is not the same as '2> &1' (invalid syntax) which is not the same as '2>& 1', which …is the same as '2>&1'.


> "transforming a filename into the content of the file" is a really thin justification for a responsibility.

Uh... I dunno, but my lizard brain thinks that the whole idea of mediating filesystem operations on storage and IPC mechanisms like pipes is a lot more complicated, magic, and deserving of a single command than merely filtering the data on stdin.

I agree with the article and the logic, and think this historic meme was basically wrong originally. You string up your chain of pipelines with the first element being "where does it come from?" and not merely whatever the first operation happens to be just because that operation allows for some kind of file input or redirection syntactically.


> transforming a filename into the content of the file" is a really thin justification for a responsibility

This is one of those things where I think it is until it isn't.

I sometimes second-guess myself when I think I might be over-single-responsibilifying. "Well in practice these two things are so trivial that this feels a little silly."

It often turns out to have been a good call in hindsight, especially when working with other people who aren't necessarily thinking about these things at all. If the responsibilities have been sufficiently split up, they're more likely to change only the part that needed to be changed, and less likely to complectify the two things together that really shouldn't've been. Or when I go "oh wow that thing that I thought I overly-abstracted sure composes well with this unexpected new thing!"

Hardcore separation of concerns is just another method of defensive programming.

> < access.log head -500 etc etc etc

It's too bad that the syntax is so different. Why does the first stage not end with "|"? There's space for shell syntax improvements, here. Maybe a 'cat'-like builtin that translates `cat foo | bar` into `bar <foo` so you can have the nice syntax but don't needlessly create processes would leave everyone happy.


In the example given on the website, it could all be done in the perl! No need for the pipeline at all.


I definitely think if you want to use `cat` then just go ahead, it's fine. Sometimes these things are a power play, a way to distinguish between people who know the social codes and those who don't. In this case, it probably had a reasonable origin even if it's now more of a way to beat on newcomers. On old systems, memory was limited, disk was slow and forking was expensive. Saving a process in a script or one liner was a noticeable improvement performance-wise.

I learned some bash from an old-timer who would write an infinite-loop like this:

  while :; do 
    # loop body here
  done
This works because the `:` is a way to set a label, and it implicitly returns 0. It's just a weird wrinkle of the language. So, why not do `while true`? On old systems, `true` was not a builtin and would call `/usr/bin/true`. Writing the loop this way saves a process fork on each iteration.

On a modern system, you'd be hard pushed to measure the difference, so it really doesn't matter which style you prefer.


> This works because the `:` is a way to set a label, and it implicitly returns 0. It's just a weird wrinkle of the language.

Do you have a source for that? I thought it was just POSIX built-in for true. Like `.` vs. `source`. What's a label in this context anyway?


Hah, yeah I was completely wrong on that! Should have fact checked myself. That's a falsehood I absorbed at some point and didn't question.


Well there's a blog post idea for you, sounds primed for the front page already - Falsehoods Programmers Believe About Colons...


: is the prefix for labels in DOS batch files.


No need to ask for a source. The word "label" in the POSIX shell documentation only occurs in the description of `case`, and it doesn't appear in the manual pages for bash, dash, zsh, etc.


Equally surprised. I know ':' is a label in sed, but I'm not aware of labels in (ba)sh. If it's indeed a label, is there a goto?


> This works because the `:` is a way to set a label, and it implicitly returns 0.

Nope. Unix shell doesn't have labels (are you mixing with DOS batch files?).

: is a shell builtin that does nothing. In the bash man page, look for the first entry of the "SHELL BUILTIN COMMANDS" section. https://www.gnu.org/software/bash/manual/html_node/Bourne-Sh...
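
A couple of quick illustrations of `:` as the do-nothing builtin (hypothetical file name):

  : any arguments here are expanded and ignored   # exit status is 0
  : > file.txt                                    # truncate a file, a common idiom
  while :; do date; sleep 1; done                 # the infinite loop from upthread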


infinite loop in C:

  for(;;){
      // loop body here
  }


I'm in the camp of using

    <input X|Y|Z >output
The point of this syntax is that I can readily replace it with

    F() { X|Y|Z; }
    <input F >output
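
A concrete sketch of that substitution, with hypothetical command and file names:

    summarize() { grep -v DEBUG | sort | uniq -c; }
    < app.log summarize > summary.txt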


Since I like spaces around stuff I started putting two spaces after the infile at some point.

  < infile  x | y | z > outfile
I just didn't like how the filename was so close to the command name ;)


I'm in this camp as well, starting with input file redirection just makes so much sense to me.

    </proc/0/environ xargs -0
I also don't tend to want enormous volumes of text in my terminal scrollback so I generally view files or pipe verbose commands to `less`, then when I find what I want to send to the terminal I use the `|` less command to pipe it to `cat`.

Or to grab just a few lines for my later reference:

    kubectl get po/my-pod -o yaml | less
    /* find the lines I'm interested in */
    -N
    |^sed -n 34,35p


Oh `F()`. I should use redirection more with my Bash functions, and add it to my list (pun intended) of functional tricks [1].

[1] https://evalapply.org/posts/shell-aint-a-bad-place-to-fp-par...


This is by far the best explanation I've read for using that method. Thank you!


I have an old note named the same as this blog post:

    cat /dev/sdb > backup.img # make a disk image

    cat /dev/sdb > /dev/sdc   # clone disk

    cat ~/Downloads/*         # play Russian roulette with your terminal

    cat > file                # minimalistic text editor, ^D to exit saving, ^C to exit erasing the file

    cat << wq > file          # nearly complete emulation of ed

    grep -r bongo . | cat     # shorter than typing --color=never

    cat -v file               # cause 20 points of damage to wizards of bell labs

    cat file > file           # empty a file without removing the file

    cat meow meow > meows     # duplicate file contents


Oh and another one I use all the time is as an "identity transform" when selecting between filters, e.g.

    dostuff () { 
        if [[ $1 = clean ]]; then 
            grep -v dirt
        else 
            cat
        fi | do_other_stuff
    }

See also https://www.in-ulm.de/~mascheck/various/uuoc/ (via https://lobste.rs/s/rtvp2u/useful_uses_cat#c_0xpqkr )


`cat x | Y | Z` is Subject-Verb-Object.

`Y x | Z` is Verb-Subject-Object.

That's why I prefer using cat "uselessly".


   < x Y | Z > w
Y takes input redirected from x, piped into Z, which outputs into w.


No, that's obviously taking the inner product of the <x| bra and |Z> ket while applying the Y operator, and multiplying by a scalar w


> Y takes input redirected from x, piped into Z, which outputs into w.

I.e.

  x | Z | tee w | Y
? that's... something else entirely.


No, the command is

    < x Y | Z > w
where x and w are files, not commands.

Something that "cat file | ..." advocates might be overlooking is that a redirection ("<inputfile", ">outputfile", "2>errorfile") can appear anywhere within a simple command, so these:

    command -option < file
    < file command -option
    command < file -option
are all exactly the same -- and of course very similar to

    cat file | command -option
If the purpose of typing "cat file | command" is to put the input file at the beginning (which does make logical sense), you can achieve the same thing with "< file command". Admittedly, it does look a bit strange if you're not accustomed to it. (It even works with csh and tcsh.)


So I take it you're also an advocate for this?

  < <(curl http://...) command -option
Because that's what you get if you address the actual argument and still insist on input redirection.

Input redirection is inconsistent with every other command to retrieve data. Not only does it not have the same syntax, it's combining two actions into one step of the pipeline.


Redirection isn't a step in the pipeline; it's syntax which indicates to the shell that it should perform certain file descriptor manipulations when it arranges the pipeline.

I would say that < <(command arg) is a useless use of process substitution (UUoPS). You just want command arg |.

The redirection variant doesn't eliminate command and does not move arg out of command's argument position; it's just superfluous syntax.

Just because we want "< file" instead of "cat file |" doesn't imply that we want "< <(command arg)" instead of "command arg |". It's not even the same rewrite pattern at all.

However, if someone wrote:

  cat <(curl https://example.com/file) | next
then that is now the UUoC pattern "cat file |". We can apply the transformation to eliminate cat:

  < <(curl https://example.com/file) next
Now in so doing, we have moved the process substitution such that there is an obvious match for the UUoPS pattern. We apply that rewrite rule as well:

  curl https://example.com/file | next


> Redirection isn't a step in the pipeline

"Get data" is the pipeline step we're talking about. Using "< file" combines it with the first transformation step, instead of keeping it as its own separate step as all other such data sources require.


< <(...) is precisely what you need when you want to pipe input into a while read loop and mess with variables.
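
A sketch of the difference (hypothetical log name): with a plain pipe the loop body runs in a subshell, so variable changes are lost; with < <(...) they survive:

  count=0
  grep mail access.log | while read -r line; do count=$((count+1)); done
  echo "$count"    # prints 0 in bash: the loop ran in a subshell

  count=0
  while read -r line; do count=$((count+1)); done < <(grep mail access.log)
  echo "$count"    # prints the number of matching lines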


> Redirection isn't a step in the pipeline

It isn't implementation-wise, but it is semantically. If anything, < is a useless syntax that should never have been in the shell to begin with. If you want to peephole optimize cat... just do that, just like it replaces tons of other commands with built-ins.


The shell < has never been intuitive to me, for some reason, but this has helped me see it in a new light. Thanks!


cat is a verb though


cat x is a subject


"x" is not a subject.


There is another reason: to make sure the program doesn't do anything funny with the file, like modifying it. I know it won't happen with "head", but for commands I am not familiar with, it is a way to be sure. An example of a command that does "something funny" is gunzip. With just a file as an argument, it will decompress the .gz file and erase the original instead of reading it and dumping its content to stdout.

I usually prefer to do "< file command" though. "cat" adds an extra layer of indirection, forcing stream processing and hiding the original file, but that's usually unnecessary. If you really don't trust the program, it is not an adequate solution anyways.
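
For example (hypothetical file name):

  gunzip access.log.gz       # replaces access.log.gz with access.log
  gunzip -c access.log.gz    # streams decompressed data to stdout, keeps the file
  < access.log.gz gunzip     # same via redirection; gunzip never sees the file name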


If you want to hide the file's device and inode number from the program that will be consuming the file's contents, then, yes, using cat makes sense. I've never had to do that. Just redirecting stdin is enough.


Or use zcat.


I like this sort of blog post - simple, clear example in a short format. Of course we can nitpick the implementation but I think the principle is solid: don't be afraid to sacrifice some brevity for modularity or composability.

I do think he could have woven in the Parnas source a bit better. The first quote seems to rely a lot on the surrounding context of the essay and it's never contextualized. This bit:

> The problem is that these subsets and extensions are not the programs that we would have designed if we had set out to design just that product.

Just feels a bit disjointed in the article. I get the high-level message but thought it could be woven into the blog author's narrative better. Anyways, it sounds like a good article though.


Author here. You're right. Originally I had a much longer excerpt but it got a bit rambly. I overcorrected and cut out too much; I'll see what I can do to fix it. Thanks!


It's a nitpick BTW. Of course a bit more context would help I think, but I probably appreciate your edits to keep it brief too. As I mentioned, I really appreciate how concise and to the point it was. Thanks for writing.

I actually noticed your reply because I came back to this comment to save your blog. Not just for the content but the page style. The eggshell white (or similar) color, font, foot(side)notes. It's nice.


I can't be the only one who clicked on this hoping for an explanation of why they owned a cat.


I thought it might reference the book " 101 Uses for a Dead Cat "

https://en.wikipedia.org/wiki/101_Uses_for_a_Dead_Cat

"It consisted of cartoons depicting the bodies of dead cats being used for various purposes, including anchoring boats, sharpening pencils and holding bottles of wine."

"By December 7, 1981, it had spent 27 weeks on the New York Times Best Seller list."


Yeah... sadly no cat was found ):


I also use cat this way, and for me the biggest reason is it just allows for a more intuitive left-to-right reading of any pipeline.

Things like this:

head -n 500 access.log | grep ...

head -n 500 <access.log | grep ...

Feel like you start with the filename, then go leftwards to the first operation, then start reading rightward again through the pipe. At least in my brain, it feels slightly more awkward.


I wish math notation and computer programming had just settled on postfix over prefix early on, it's so much more natural to read. Of course, we kind of get it with object oriented programming, some languages have UFCS [1], F# has that pipe operator etc.

It's funny, when learning programming, I think Haskell was the language that introduced me to the pattern of having a chain of operators processing a stream to build up a result (and I'd later cover it again in SICP), and I loved how clean it looked compared to imperative code. But I now find it one of the harder to read languages due to it all being prefix, whereas Java/Kotlin/C#/Javascript now all have stream constructs that use method calls, so read left-to-right, source-to-sink

And I'm reminded that I need to give Forth a proper go sometime

[1] https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax


You can do...

  <access.log head -n 500 | grep ...
... though that's less familiar to many, I'm sure.


Neat, I didn't realize you could order it like that.

I'm realizing now another (and potentially stronger influence) is just years of muscle memory starting pipelines with cat.


If we're going to talk about unnecessary extra processes like useless cat, we should merge the head and grep commands into a sed, and possibly just merge everything into perl:

    <access.log sed -n '/mail/p; 500q' | perl -e ...
If perl is processing the file line-by-line then filtering lines by regex and stopping at line X is trivial, and you don't even need sed.


I think your point is valid and I don't dispute it, but if I had a nickel for every time I said "just do it all in perl" and regretted it, I'd be... well, perhaps not a rich man, but I'd have lunch covered for a few weeks.


Perl has the advantage of only having one implementation, unlike sed and grep (e.g. BSD or GNU) and /bin/sh (can be one of many POSIX shells), so upgrading this pipeline to 100% perl is safer in some respects. The example in the article is light on details so it's hard to comment very deeply.

I have heard snarky Perl putdowns ad nauseam at work and on HN and may have regretted using it a handful of times but I can say worse or similar for other popular tools, languages...


I usually do it all in awk:

    awk '/mail/ && NR <= 500 {...}' access.log
If you want N matched lines:

    awk '/mail/ && i < 500 {i++; ...}' access.log


I wish I knew awk as well as I know perl, since then I wouldn't need to hear recommendations for CPAN modules and spurious style prescriptions.


Another reason to do `cat | foo` is because you can then quickly change it to `zcat | foo` or some other source for the data by editing the very beginning of the command.
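
e.g. (hypothetical rotated log):

  cat access.log | grep mail | wc -l
  zcat access.log.1.gz | grep mail | wc -l    # same pipeline, compressed source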


Right, but when using grep, one can use zgrep directly to go through archived files.


less is useful for this if you have the lesspipe [0] preprocessor installed

0: https://github.com/wofr06/lesspipe


Most immediate use: I have to google less shell syntax.


Stream of consciousness troubleshooting dies on the search page. Especially with ads.


Well, the whole thing can be written in perl -ne ...

but that absolutely kills modularity / composability

so yes, "cat xyz" plays a source - and can be replaced with another source - without touching all other stuff.


> so yes, "cat xyz" plays a source - and can be replaced with another source - without touching all other stuff.

This. I often have to read logs and being able to apply the same oneliner on older logs just by adding a "z" in front of "cat" to read .log.gz instead of .log files is extremely useful.


Good point! I often experiment with cat response.json and then replace cat with curl once I'm happy.


After reading the article... all the reasoning... I think I could go with this, for the reasons exposed:

    perl -ne 'last if $. > 500; /mail/ && print' access.log
When you're done with the part after the &&, remove the 'last if $. > 500'

For me, the most useful use of (GNU) cat is cat -A weird.file

It saves my day or solves weird issues (X-files) with files generated by (not so) junior sysadmins, copy/pastes, end of lines, invisible diffs, etc... many times each year.
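
For instance (GNU cat), a stray tab and carriage return show up immediately:

  $ printf 'foo\tbar\r\n' | cat -A
  foo^Ibar^M$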


But then it's not modular! I need this one-liner to be more complicated because I read a classic paper recently.


Typo in the title – should be "cat" not "Cat" (same as the h1 in the linked article)


The only time I made serious use of big pipelines was when I was trying to diagnose some bug in the production system using the log files. At that time, I would make a pipeline with lots of `head`, `tail`, `grep`, etc. I would reorder the filters, change the file it is running on, sometimes run it on multiple files. Using `cat` to feed the initial input was way better than trying to avoid UUoC. Concatenating files became possible, reordering became simple. It was just overall a much better experience with `cat`. In my daily life if I am writing shell pipelines, I still use cat because it just flows better in my brain. Step 1, take the input; step 2, apply filter; step 3, apply other filters; step 4, print it out. When I don't even know which filter I am going to be using, it is easier to write the cat and glob the files while my brain catches up on whether I want grep or sed or something else.


Among the uses of cat, I saw no one mention you can write or append text to a file, like this:

    cat > file.txt  # after this command, just type your text
Or to append:

    cat >> file.txt
It can also work with heredoc syntax:

    cat > file.txt << EOF
    This is the text to write down
    EOF


Somewhere along the way it had been so long since I used cat to concatenate files that I had to relearn to do so. That felt either very silly or, “I’ve forgotten more about X than you’ll ever know”

But probably the former.


Related, I find bash and other shells are the only languages where I can stream data through processes.

Like, R can pipe, and pandas can .pipe() but they both complete the function on the entire data before it pushes a copy of the output to the next step.

Why isn’t this flow of data a primitive outside of shell?


I am not sure about the design choices of the pandas library or of R, but generally, many programming languages limit the number of processes they spawn during evaluation. Moreover, designing a pipeline that produces a complete output before it is processed by the next function tends to be simpler in these languages. The shell environment is explicit about process creation and offers interactivity, leading to an expectation that a pipeline of processes might not complete before being interrupted. As a result, true pipelining is highly effective in such contexts. Some languages, like Haskell, or certain libraries, employ lazy evaluation to achieve similar outcomes. For instance, by using the polars library instead of pandas, one can achieve true row-by-row pipelining for specific operations in Python.


Another rather useful use of `cat` is to make the input stream appear in a specific location within another stream. e.g. My website [1] maker [2] uses `cat -` to inject content into templates.

  shite_template_common_default_page() {
      cat <<EOF |
    $(shite_template_common_header)
    <main id="main">
      $(cat -)
    </main>
  EOF
      shite_template_standard_page_wrapper
  }
[1] https://evalapply.org

[2] https://github.com/adityaathalye/shite?tab=readme-ov-file#te...


This blogger does not seem to know that in the POSIX shell syntax, redirections can be specified anywhere in the command:

   < access.log head -n 500 | grep mail | perl -e
Now you can delete "head -n 500 |".

> If we then delete only the head processing step we’re left without a step that transforms the string access.log into the contents of the access log.

By introducing "cat access.log" we have the same problem: if we delete only the cat processing step, we're left without a step that transforms the string access.log into the contents. For the useless cat to have the nice property that you can cleanly delete it from the command line, you need:

   < access.log | cat | head -n 500 ...
:)


The useless cat plays the role of ‘a process that produces a stream’, which gives you higher confidence that you can substitute it with a different ‘process that produces a stream’ - like the curl command mentioned in the article, or perhaps a server whose stdout output you want to analyse.

It should be the same as giving your pipeline a file input handle via < access.log… but why take the risk?


That's the whole mistake. Streams are not sourced by processes, but by kernel objects. You don't need a process to read bytes from a serial port, for instance. You don't need "cat /dev/ttywhatever | program". Interrupt handlers in the drivers already drive the activity of bytes being received.


I'm sorry, that doesn't work:

  $ echo "foo" > access.log     # create the access.log with some content
  $ < access.log | cat | head -n 5
  # no output
This works:

  $ cat access.log | head -n 5
  foo
  $ < access.log cat | head -n 5
  foo
  $ < access.log cat | cat | head -n 5
  foo
  $ < access.log head -n 5
  foo


this is basically the midwit-meme in action:

- uses: cat file.txt | grep foo # doesn't know why or that you don't have to

- uses: grep foo file.txt # knows that's religiously the right way, knows maybe one or two options and that "cat" is bad.

- uses (when appropriate): cat file.txt # knows all the ways to do it, pros/cons of each, can make sound judgements and trades in real-time

all three groups are actually mostly doing nothing that wrong.

if someone or shellcheck points out "useless use of cat", and it's directed at the 1st or 2nd category it's helpful. it's either saying "was that a mistake?" or "did you know that you didn't have to...".

if someone points out "useless use of cat" just to be a pedant, to throw rocks at an adult who knows full well what they are doing, it's not as helpful.

all that being said, if you are in that 3rd category, it's not just knowledge but also maturity that's required, not personal preference alone. the author sure seemed to have a chip on their shoulder. I suspect the truth is somewhere in-between.


`cat -n` prints with line numbers, which is occasionally handy.


nl?


It's not more modular to use cat instead of head's filename argument. OP invented their own definition of modularity to fit their narrative and bias. What are you going to use if you need to tail? cat a 1GB file through the kernel to reach tail's stdin? You're not just spawning an extra process; you're copying the data from one process to another via read/write system calls for no good reason.


If the example used tail rather than head, I would expect passing the file (rather than piping contents) should be noticeably faster for a large file.


I wanted to test this and found a surprising result, although with shuf instead of tail...

  ( head -n40000 filelist_h.txt | grep 'v\|$' >> /dev/null; )       0.00s user 0.02s system  43% cpu  0.045 total
  ( tail -n40000 filelist_h.txt | grep 'v\|$' >> /dev/null; )       0.00s user 0.01s system  22% cpu  0.066 total
  ( shuf -n40000 filelist_h.txt | grep 'v\|$' >> /dev/null; )       0.71s user 1.37s system  12% cpu 16.874 total
  ( cat filelist_h.txt | head -n40000 | grep 'v\|$' >> /dev/null; ) 0.00s user 0.01s system 101% cpu  0.017 total
  ( cat filelist_h.txt | tail -n40000 | grep 'v\|$' >> /dev/null; ) 0.05s user 0.45s system  54% cpu  0.930 total
  ( cat filelist_h.txt | shuf -n40000 | grep 'v\|$' >> /dev/null; ) 0.60s user 0.91s system  96% cpu  1.565 total
The results are fairly repeatable:

- as expected, head and tail alone are equally quick

- shuf is ridiculously slow due to overhead compared to cat | shuf

- cat | shuf is just a bit slower than cat | tail.

- cat | tail is slower than tail.

- cat | head is faster than head (but only because of overhead)

Caveats are that this is WSL2 and the file is 480 MB (5 million lines) in a mounted Windows directory, although that helps magnify the fact that slow I/O can influence how you pipe commands.


Why would "shuf -n40000 filelist_h.txt" be slower than "cat filelist_h.txt | shuf -n40000"?

Seems like there's a random-access optimization that's severely impacting I/O performance.


I have no earthly idea, but WSL2 is known[1] for its snail-like cross-platform file transfers. Some of my colleagues install MSYS2 in parallel with WSL2 so they can sync our nightly much faster but still not have to jump through all the hoops of getting our proxy, certificates, network shares, etc working in MSYS2.

I use the extra time waiting for WSL2 to get a coffee.

[1] https://github.com/microsoft/WSL/issues/4197


If the shell were to treat cat as a builtin, could it implement cat| as a stdin redirect?

I wish I could write "useless cat" without people pestering me about it.


Bash allows the $(< file) syntax instead of $(cat file), and I think the latter might be converted to the former.
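
For example (hypothetical file name):

  content=$(cat notes.txt)   # runs an external cat
  content=$(< notes.txt)     # special-cased by bash; no cat process is spawned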


It's built-in in Zsh: The substitution ‘$(cat foo)’ may be replaced by the faster ‘$(<foo)’.

https://zsh.sourceforge.io/Doc/Release/Expansion.html#Comman...


Is that only for command substitution?


What do you mean? It's specifically for $(cat …) / $(< …): the latter is a faster equivalent of the former. Other than that, see my comment about $READNULLCMD.


Thoughtful article - thanks. Humorous aside: I recently came across a hilariously named, occasionally useful, related utility: "tac".


It ships with coreutils! `rev` is another one in a similar vein, though that's not coreutils.


I'm a bit surprised `rev` isn't in coreutils, it's ancient. Originally written because `cut` doesn't allow selecting fields from the end of a line. So to get the last field in each line, you reverse each line (right to left) with `rev`, select the first field with `cut`, and then reverse it again with `rev`.
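
A quick sketch of that trick:

  $ echo 'one:two:three' | rev | cut -d: -f1 | rev
  three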


On my Ubuntu machines, `man rev` indicates that it came with a package called util-linux, so it seems like it might actually be considered even more essential than coreutils. Funny enough, that manpage lists `tac` under its "See Also" section, so maybe these two packages are tied together at the hip.

You describe a really clever use case for rev. On the surface, my intuition was that reversing, cutting, and reversing again wouldn't work if you had escape sequences, but I suppose that's a moot point with cut!


If only output redirection syntax was w> instead of >w. Then you could write pipelines with compelling HTML-esque symmetry:

<x Y | Z w>


$ killall cat | cat | cat | cat | cat | cat

Terminated: 15

Which cat died first?


I’m guessing none, since if I assume the commands are executed sequentially then, at the time of execution of killall, the cat commands are not yet running.

Fascinated to know the actual answer though.


Pipelines are executed in parallel.


The Schrödinger’s cat.


A useful use, back in the day, was to assemble partial ISO files of OpenSolaris SXDE, the first open version before Ian Murdock/OpenIndiana. It's more an anecdote, but considering the usefulness of netcat in many cases, such "cat binary use" might still be relevant in some embedded scenarios.


This is a pointless argument, if only for one shell grammar rule:

    <filename.txt grep mypattern
That syntax works the same as "grep mypattern <filename.txt".


I don't agree with this argument. You can make the same argument against cat:

>cat access.log | head -n 500 | grep mail | perl -e …

>we find that cat performs two responsibilities:

>1. Printing an error to stderr if the file doesn't exist

>2. Copying a file to stdout


Hm, what commands are there that don't support filenames?

`tr` is the only one I can think of, excluding ones that really can only operate on already-open file descriptors (the `read` builtin, `flock` in certain modes, ...)
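
Indeed (with GNU tr; hypothetical file name):

  tr a-z A-Z < file.txt    # works: tr only reads stdin
  tr a-z A-Z file.txt      # fails: "tr: extra operand 'file.txt'"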


> When we’re satisfied with our Perl script, it’s not unreasonable to think that we might want to run it across the entire access log rather than just the first 500 records. If we then delete only the head processing step we’re left without a step that transforms the string access.log into the contents of the access log. We can move that responsibility into the grep call, but this would mean we had to change some existing component in order to remove another – no good!

What? Who cares, but also, if you do, then try this:

  <the_file the_pipeline_here
taking his example:

  <access.log head -n 500 | grep mail | perl -e ...

  <access.log grep mail | perl -e ...

  <access.log perl -e ...
Yes, you have a lot of freedom over where you put the redirections. I often write

  nroff -man >/dev/null blah.1
to check for errors in a man page I'm editing, and you can see that intersperses I/O redirection with command-line words.

So if you're really concerned about what edits you might have to make to

  head -n 500 access.log | grep mail | perl -e ...
to remove the head(1) and/or the grep(1) and so have to move where the `access.log` goes, well, just write

  <access.log head -n 500 | grep mail | perl -e ...
and now you don't have to move `access.log`.

> The natural solution is a useless use of cat.

The problem with useless uses of cat(1) is mainly that it betrays a misunderstanding of how the shell works, so "that's a useless use of cat" is a way to teach someone something they're missing. (Useless uses of cat are also useless uses of CPU cycles and energy, but the vast majority of the time those will be in the noise, so it's not a huge deal.)


Similar to TFA's footnote 4 (switching from `cat` to `curl`), it's easier to edit the filename with `cat`, because it's nearer the start.


I use cat a lot. It allows me to remember fewer switches of other commands, and simplifies the syntax of complex commands (pipe chains)

I treat cat as a stream-fier: given a file it creates a stream on stdout

BTW, is it true that the origin of cat's name is that in slang "to cat" means to vomit (to vomit a file to stdout)? This is what I'd always known, but recently I read that cat stands for conCATenate

I prefer cat as vomit....


cat is cool but I also like its sibling tac, I use it quite a bit.

It's also possible to make a much faster cat (I have; I considered naming it cheetah), 10%+ faster.


Honestly, I've been UUoC-ing all my life and everything is fine. Never ran into a problem. I mostly use `zsh`, and `< file | cmd` works easily even when you remove the `cmd`, so that's fine. But honestly, it's both bought and lost me nothing, which means I shouldn't care about it.

What I do like is doing something like:

    diff -aui <(xxd binary1) <(xxd binary2)


came looking for feline-related content. left disappointed.


Useful uses of cat:

- pest control

- entertainment

- transporting small solar arrays


- Neighbourhood open WiFi networks mapping (aka: the War Kitteh)

From this gem of a DEFCON talk: https://youtube.com/watch?v=rJ5jILY1vlw


My cat recently discovered that her fur can absorb a toilet bowl full of water. This seems like it could be useful, if you need a very wet floor immediately.


> transporting small solar arrays

I definitely need to hear more about this one.


solar panel need sun.

cat like sleeping in sun.

attach solar panel to cat.

cat occasionally moves to remain in sun as it naps.


Here you go. This is the second time in the past few years that cat was used to process bikers in the NW WA wilderness. The first time was fatal. I used to live about 15min from where this incident occurred. https://www.seattletimes.com/seattle-news/cougar-attacks-cyc...


Especially since the versatility of A Cat is unbounded.


And here I had been hoping for tips and tricks for printing the file catalog on TRSDOS, Flex, Sinclair or RISC OS.


I was half-expecting it to be about this strange book: https://en.wikipedia.org/wiki/101_Uses_for_a_Dead_Cat


Try tumblr


Or reddit


Those looking for useful uses of the category of (locally small) categories will similarly be disappointed.


I'm sure this generic "joke" would be a real gut buster over on Reddit.


As a cat servant I care not about gut-busting.


Ah, HN. The home of upvoting contrarian pedantry.



