
I agree with the sentiment, but my critique applies so generally that it's worth stating outright: if a command accepts a filename as a parameter, you should absolutely pass it as a parameter rather than `cat`ing the file over stdin.

For example, instead of starting the pipeline with `cat foo.txt`, you can write it as:

    grep '^x' foo.txt \
        | sed 's/a/b/g' \
        | awk '{print $2}' \
        | wc -l > bar.txt
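
For comparison, the `cat`-based version under discussion presumably looked like this (reconstructed here for context); it spawns an extra process just to feed the file into `grep`:

    # the equivalent `cat`-based pipeline -- one extra process, same result
    cat foo.txt \
        | grep '^x' \
        | sed 's/a/b/g' \
        | awk '{print $2}' \
        | wc -l > bar.txt
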
This is by no means scientific, but I've got a LaTeX document open right now. A quick `time` says:

    $ time grep 'what' AoC.tex
    real    0m0.045s
    user    0m0.000s
    sys     0m0.000s

    $ time cat AoC.tex | grep what
    real    0m0.092s
    user    0m0.000s
    sys     0m0.047s
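
A single `time` run like this is noisy (page cache, scheduling), so if the difference matters, repeat each variant. A minimal sketch, assuming bash and the same AoC.tex:

    # run each variant 100 times so per-invocation noise averages out
    time for i in $(seq 100); do grep what AoC.tex > /dev/null; done
    time for i in $(seq 100); do cat AoC.tex | grep what > /dev/null; done
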
Anecdotally, I've witnessed small pipelines that absolutely make sense totally thrash a system because of inappropriate uses of `cat`. When you `cat` a file into a pipeline, the system must (1) `fork` and `exec` an extra process, (2) copy the file into `cat`'s memory, (3) copy `cat`'s memory into the pipe, and (4) copy the pipe's contents into `grep`'s memory. That's a whole lot of copying for large files -- especially when the first real command in the sequence (here, `grep`) usually performs a major reduction on the input data!
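
If you like having the filename up front for readability, input redirection gives you that without the extra process or the copies; a small sketch of the same pipeline:

    # the shell opens foo.txt as grep's stdin directly; no cat process
    < foo.txt grep '^x' \
        | sed 's/a/b/g' \
        | awk '{print $2}' \
        | wc -l > bar.txt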



In my opinion, it's perfectly fine either way unless you're worried about performance. I personally tend to reach for the more performant option when there's a choice, but a lot of the time it just doesn't matter.

That said, I suspect the example would be much faster without the pipeline, because a single tool can do it all (I'm leaving in the substitution and the column print, which are actually unused in the final result):

    awk '/^x/{gsub("a","b"); print $2; count++} END{print count+0}' foo.txt
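
And if you drop the parts that don't affect the result, the whole pipeline collapses to a single `grep` -- a minimal equivalent, assuming all you want in bar.txt is the matching-line count:

    # grep -c prints the number of matching lines, same as grep '^x' | wc -l
    grep -c '^x' foo.txt > bar.txt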



