Okay, so... starting from UNIX v5: OpenBSD adds a -n flag that suppresses the trailing newline, Plan 9 adds the -n flag and also pushes argv into a buffer (why?) before printing, FreeBSD does all that and additionally suppresses the trailing newline if the last argument ends with "\c" (why?), and GNU does... something complicated.
It's calling "write", which is a system call. Without the buffer, it would call write once for argv, and once for the newline if nflag is not set. Calling a system call twice would result in twice as many context switches, and thus be very slightly slower.
djb's allocator alloc() does this. He preallocates a 4 KB static buffer before hitting system malloc(). Avoiding the overhead of malloc() is pretty important for the performance of systems like qmail that fork many small processes.
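The shape of it is roughly this (a paraphrase from memory, not djb's actual alloc.c; names and sizes are approximate):

#include <stdlib.h>

#define SPACE 4096                  /* preallocated static space */

static char space[SPACE];
static unsigned int avail = SPACE;  /* bytes of static space still free */

/* Serve small allocations out of the static buffer; fall back to
   malloc() only when it runs out. A short-lived process may never
   call malloc() at all. */
char *alloc(unsigned int n)
{
    n = (n + 7) & ~7u;              /* keep results 8-byte aligned (hand-waved) */
    if (n <= avail) {
        avail -= n;
        return space + avail;       /* carve from the top of the buffer */
    }
    return malloc(n);
}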
With a COW fork() I bet it's smaller. I smell a test coming on, but alas it's late here and I'm going to bed.
I'm also guessing that 4k was chosen because a malloc() of 1 page is faster than a malloc() of >1 page. Of course that's with the assumption that the systems use a 4k page size.
If your system has 32,000 processes and you allocate 4 KB to each of them... that's 128 MB. I'm not crapping my pants at that figure because even the oldest machine in my office, an old Thinkpad, has 2 GB of RAM.
It has to do with the way plan9 guarantees atomic writes of small buffers. A single write() call will correspond to a single read() call if the buffer is big enough to receive it.
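POSIX pipes give you a weaker version of the same guarantee: a write() of at most PIPE_BUF bytes is atomic, so a message sent with one write() can come back out in one read(). A quick sketch (ordinary POSIX, not Plan 9):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    char msg[] = "reset";               /* e.g. a control message to a driver */
    char buf[64];
    ssize_t n;

    if (pipe(fd) == -1)
        return 1;
    write(fd[1], msg, strlen(msg));     /* one write()... */
    n = read(fd[0], buf, sizeof buf);   /* ...comes out in one read() */
    printf("got %zd bytes: %.*s\n", n, (int)n, buf);
    return 0;
}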
Overhead such as...? The only one I can think of is the mutex lock/unlock, but there are the _unlocked variants for those.
"And why bother with stdio (or its Plan 9 equivalent) when it's so easy to do the buffering faster yourself?"
For the same reason you'd use any abstraction: to keep your program simpler. Even if you consider it "so easy" there are plenty of opportunities there for off-by-one errors and buffer overflows that would disappear if you used stdio instead.
Remember that echo.c evolved long before UNIX had shared libraries. So avoiding stdio completely might mean the size of your binary drops from 50K to 5K. If you're on a PDP-11 with 256KB of RAM (shared by several users), this makes a big difference.
My earliest UNIX programming was in the mid-80s on a machine with a (luxurious!) 1.5MB of RAM. I definitely remember avoiding stdio when writing small utilities that I wanted to start up and run fast.
Also remember that back in those days echo was not a shell builtin. (Hell, back then even the testing operator '[' wasn't a builtin. Some UNIXes, like OS X, still ship a vestigial /bin/[ executable!) Programs like echo, which were run constantly from shell scripts, had to be coded to start up as fast as possible.
This was actually the reason for FreeBSD's more complicated version. Revision 106835:
Put echo on a diet, removing unnecessary use of stdio and getopt.
Before...
-r-xr-xr-x 1 root wheel 58636 Oct 28 05:16 /bin/echo
After...
-rwxr-xr-x 1 root wheel 12824 Nov 12 17:39 /usr/obj/usr/src/bin/echo/echo
Please point out the plentiful opportunities for off-by-one errors in Plan 9's echo.c. If you send me a copy of Plan 9's echo.c using stdio I will run it and benchmark it for you.
Plan 9's echo is used to talk to drivers from shell scripts, some of which expect to receive messages in a single buffer. That's why they went out of their way to make echo use a single write.
V6's assembly version also does less. The reason GNU's is so complex is that it has a line-numbering feature (cat -n) not supported in V6 or V7, and it tries to read and write in large chunks to avoid the overhead of calling stdio functions in a loop. It also tries to take advantage of non-portable extensions where possible, falling back to portable code where they aren't supported. Yeah, it looks a bit complex at first, but it's not really that bad if you actually take the time to read it.
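The large-chunk part is just the classic copy loop; something like this (the buffer size here is arbitrary and mine, not GNU's):

#include <unistd.h>

/* Copy stdin to stdout in large chunks: one read()/write() pair per
   128 KB instead of one stdio call per character or per line. */
int main(void)
{
    static char buf[128 * 1024];
    ssize_t n;

    while ((n = read(0, buf, sizeof buf)) > 0)
        if (write(1, buf, n) != n)
            return 1;               /* short or failed write */
    return n < 0;                   /* nonzero exit if the last read failed */
}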
Those features simply do not belong in a program whose purpose is to concatenate its input. If you want to number a file's lines, `echo ,n | ed file | sed 1d` or `awk '{ print NR " " $0 }'` will do just fine. You could even wrap your ed or awk script in a shell script with a descriptive name like "lineno" rather than something silly like "cat -n". The reason GNU's is so complex is that it does many things and does them poorly. The V6 implementation does exactly what it says on the tin, does it well, and does nothing more: it catenates files.
I am confused by your definition of "poorly". Are you asserting that GNU cat is slow, or unportable, or uses too much memory, or some other actual noticeable problem?
But now you've spawned many processes when one could have been used, and you'll incur the wrath of people who think that 'grep foo file' is 1000x more efficient than 'cat file | grep foo'.
I liked your use of ed. You can avoid one pipe with the -s option: `echo ,n | ed -s file`. Also, another POSIX one-process option besides awk: `pr -tn file` (with different padding).
"The GNU Hello program produces a familiar, friendly greeting. Yes, this is another implementation of the classic program that prints “Hello, world!” when you run it.
However, unlike the minimal version often seen, GNU Hello processes its argument list to modify its behavior, supports greetings in many languages, and so on. The primary purpose of GNU Hello is to demonstrate how to write other programs that do these things; it serves as a model for GNU coding standards and GNU maintainer practices."
i don't get why people are being snide about code that does more and so has more lines. a small amount of code is very nice and elegant, but if it doesn't do what people need then it's pointless.
I think their point is that you should separate it into different utilities/binaries, each of which would be very simple and have fewer bugs, and let users combine them as they wish.
For example, instead of cat -v, you'd have a second utility called, say, 'nonprint', which would just translate non-printing chars, and you'd call it using something like `cat file | nonprint`.
You are making the system more complicated for everyone because of features that only a few users know about. This is how code bloat starts its life cycle.
If you need more features from a basic utility like echo or cat you should create your own version, maybe with a slightly different name, and leave the original as it is.
Those crazy kids at Berkeley cooked up BSD, which was written to meet their needs and subsequently forked into a few variants. The GNU people made a GNU collection of core utilities that met their particular needs and desires.
The Unix nerds at my University felt as you did, and ran a UNIX System V variant into the late 90's.
Anyone can find those features in the man page. I guarantee that the number of people who use those features is much larger than the number of people who read the source before today.
The UNIX style promoted in the "cat -v Considered Harmful" paper may have made sense at one time, but it doesn't make sense anymore. For example:
It seems that UNIX has become the victim of cancerous growth at the hands of organizations such as UCB. 4.2BSD is an order of magnitude larger than Version 5, but, Pike claims, not ten times better.
This logic gives the same consideration to people who are digging in the source code for these utilities as to people who actually use them. When you consider the relative numbers, that's a very elitist attitude (for some value of "elite").
Also consider the explanation given in another comment for why "cat -n" is unnecessary:
If you want to number a file's lines, `echo ,n | ed file | sed 1d` or `awk '{ print NR " " $0 }'` will do just fine.
Munging text like that is a pretty common skill for Unix users, but by no means universal. If the man page for cat is pretty simple and readable, and the feature doesn't bloat the code to the point of causing maintenance problems, and there's somebody who's willing to write the code, then enabling "cat -n" is a win for users.
Not a very good strategy for avoiding bloat... and that's not even mentioning that you'd end up with 100 times the number of tools you have now. I would not call such a system simpler; indeed, it would be inferior on every possible point.
In a real system, even basic utilities rarely look like the result of a CS101 homework assignment. This is perfectly fine; in this particular case the feature set of GNU echo is perfectly reasonable, and when the code is this small anyway, the size of the executable likely depends more on the various headers than on the code itself.
You are making the system more complicated for everyone because of a program that only a few users know about. That is how code bloat starts its life cycle.
If you need a basic utility like echo or cat, you should create your own version and not bother others with it.
It's just good practice, in general, to keep individual programs simple. If you have a look at DMR's description of why the pipe was invented, it suddenly clicks.
All of these tools are intended to be composable, analogous to functions. 'cat -v' is like a function with too many arguments, one that does too much. If you need, for example, to allocate a block of zero'd memory, you don't add new flags to malloc(); you use memset() after allocation, you write a for loop, or you use calloc().
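To make the analogy concrete, here are the two composable routes to zeroed memory (a trivial sketch):

#include <stdlib.h>
#include <string.h>

void example(void)
{
    /* Compose malloc() with memset()... */
    double *a = malloc(100 * sizeof *a);
    if (a != NULL)
        memset(a, 0, 100 * sizeof *a);

    /* ...or use the purpose-built function. Either way, malloc()
       itself never needed a "zero it for me" flag. */
    double *b = calloc(100, sizeof *b);

    free(a);
    free(b);
}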
Likewise, the basic tools available on Unix can and should be thought of as functions, which take some number of arguments plus the implicit argument of an input channel. They produce as their output an integer return value and two output channels, stdout and stderr. Making a function that does too much (and this is as subjective for functions as it is for command-line tools) is known to be bad practice, but for the shell, it is often misunderstood. To misunderstand this is to misunderstand the core principles of the Unix environment.
It has nothing to do with the typical non-programmer user. On, say, Linux or OS X, the user doesn't write functions or talk to the shell very often. They click buttons in a GUI that doesn't, in any meaningful sense, offer composable programs; it's an inefficient but simple way to interact with the machine, one that matches their habits and understanding. cat, echo, sed, and awk aren't for these users; they're for programmers. The typical user does not know or care whether cat can show non-printing characters, but as a programmer, I certainly care about a clean design for my environment.
> And what if I actually wanted to print "-n"? There's no way to do it.
Good point. I first tried "echo \-n", and "echo -- -n". No luck. I get the correct visual effect with "echo - ^Hn" (^H generated with ^V, backspace), but the embedded backspace is still actually part of the output :P "echo - ^Hn | col" strips it. Seems like quite an oversight, really, and it's a prime example of how bugs sneak into code via features.
I just wish that in the early UNIX days they reserved some of the flags to mean one thing only, and required all commands to have them (where it made sense). Like:
-r recursive (i.e., it should always be recursive mode if a command operates on files, and should always exist if it makes sense for that command)
-v verbose
-s sort
-i ignore case
-q quiet (suppress output)
If there were, say, 20 well-chosen standard flags (and they were enforced) it could have given the UNIX tools another level of nice regularity.
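getopt(3) at least makes the parsing uniform, even if the letter choices never were. A sketch using a few of those hypothetical standard flags:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int rflag = 0, vflag = 0, qflag = 0, c;

    /* -r recursive, -v verbose, -q quiet: the imagined standard set */
    while ((c = getopt(argc, argv, "rvq")) != -1) {
        switch (c) {
        case 'r': rflag = 1; break;
        case 'v': vflag = 1; break;
        case 'q': qflag = 1; break;
        default:
            fprintf(stderr, "usage: %s [-rvq] [file ...]\n", argv[0]);
            return 2;
        }
    }
    /* remaining operands start at argv[optind] */
    printf("recursive=%d verbose=%d quiet=%d\n", rflag, vflag, qflag);
    return 0;
}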
In C it's generally accepted that forward jumps using goto are okay because the version that avoids goto would be complicated and confusing. Often used for error handling and memory management.
In both cases, to avoid duplicating code. If you malloc() something in your function and intend to free() it before returning, it's often considered best practice to write the return block (which includes the free()s) once, and use "goto returnblocklabel" if you need to return early.
So rather than this:
void test()
{
    int *x = malloc(1000);
    int *y = malloc(1000);

    for (int i = 0; i < 999; i++) {
        if (badThing) {
            free(x);
            free(y);
            return;
        }
        doStuff();
        if (otherBadThing) {
            free(x);
            free(y);
            return;
        }
    }
    free(x);
    free(y);
    return;
}
You'd have:
void test()
{
    int *x = malloc(1000);
    int *y = malloc(1000);

    for (int i = 0; i < 999; i++) {
        if (badThing)
            goto ret;
        doStuff();
        if (otherBadThing)
            goto ret;
    }
ret:
    free(x);
    free(y);
    return;
}
I realize your example is contrived, but a simple 'break' statement (which in essence is a goto...) would work just as well. :) I somewhat vaguely recall a situation in my C class where I wanted to use goto to avoid duplicating code, but the professor had previously threatened huge negative points if his scripts detected one. (That whole semester was as much about conforming your code to his narrow specifics, because "that's what happens in the real world," as it was about learning C.)
The way it is used in that code is pretty horrible, IMO, and splitting it into functions and using "return" in place of goto would have been far better.
Following the general software trend, it just keeps growing bigger and slower while still doing nothing new. But is the standalone binary actually used anywhere, really? Bash, at least, uses its builtin.