A Little Story About the `yes` Unix Command (matthias-endler.de)
452 points by mre on Oct 11, 2017 | hide | past | favorite | 172 comments




FWIW, GNU true.c is that way because the GNU policy is for all programs to have --help and --version options.

Another fun fact: in some UNIXes, true looked (looks?) like:

  #!/bin/sh
  # Some multiple line copyright notice from AT&T
Yup, that's a shell script with only comments, all of which are a copyright notice for what is, essentially, an empty file. I saw that on Solaris something like 15 years ago. I'm sure it made its way to other flavors of UNIX.


There are some amusing notes about the /bin/true copyright claim here: http://trillian.mit.edu/~jc/humor/ATT_Copyright_true.html


"Note also the claim that there is no warranty, which in this case presumably means that if the program actually does something, you can't sue them."

I love this.


Now you've got me wondering what it would mean to try and enforce the copyright on an empty file. You'd think there would be some prior art, right? :)


There’s a lovely story[1] from the early days of personal computing:

“GO.COM contained no program bytes at all – it was entirely empty. However, because GO.COM was empty, but still a valid program file as far as CP/M was concerned (it had a directory entry and file-name ending with .com), the CP/M loader, the part of the OS whose job it is to pull programs off disk and slap them into the TPA, would still load it!

So, how does this help? Well, using the scenario above:

• the user exited WordStar

• the user ran DIR (or whatever else they needed) and at some future point would be ready to re-run Wordstar

• the user now ‘loaded’ and ran GO.COM

• the loader would load zero bytes of the GO.COM program off disk into the TPA – starting at address 0100h – and then jump to 0100h – to run the program it just loaded [GO.COM]!

• result – it simply re-ran whatever was in the TPA when the user last exited to DOS – instantly [WordStar in this example]!

So, GO.COM, which consisted of zero bytes of code – and sold for £5 a copy is, I figure, the most profitable program ever written (as any other program will return mathematically fewer £s per byte than GO.COM did)!”

1. http://web.archive.org/web/20160304014157/http://peetm.com/b...


Thanks! I remember reading about this a while back, but lost the link.


Empty files remind me of an episode where I worked. We were supposed to provide a set of files for a court case. But there was a missing zero byte file due to a glitch in processing. So just create one, right? NOPE. The client insisted we find and copy the original empty file from the original media.


You just know it's because, if you just made a new zero-byte file, the opposition in the case would claim that you forged the file and that the evidence was then tainted. Stupid, but your goal here isn't to reproduce the file, it's to stop a stupid argument from convincing stupid people to make a stupid decision.


> your goal...[is] to stop a stupid argument from convincing stupid people to make a stupid decision.

This is the best explanation of legal practice I have ever seen.


But... to be sent, the file is recreated a bunch of times right? Because a computer only ever copies, and never “moves”, right? How can it then possibly matter whether the 0b file was recreated by a computer from the instruction of copying the file (which then just turns into an instruction to create the file), or if it was created directly through instructing the computer to create the file?

Gah, something something colored bits. http://ansuz.sooke.bc.ca/entry/23


Computers do "move" files, in some cases, by updating the directory entry in the filesystem. In such a case, it might be that no change was made to the file's data on disk, so no copy was made at all.
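A small Python sketch of that point (file names are hypothetical; assumes a POSIX filesystem): a rename within one filesystem only relinks the directory entry, so the inode stays the same and no data is copied.

```python
import os
import tempfile

# Rename a file within one directory and show that the underlying
# inode is unchanged -- the "move" only rewrote the directory entry.
d = tempfile.mkdtemp()
src = os.path.join(d, "evidence.bin")   # hypothetical names
dst = os.path.join(d, "moved.bin")

open(src, "wb").close()                 # a zero-byte file, fittingly
inode_before = os.stat(src).st_ino

os.rename(src, dst)                     # "move" = relink, not copy
inode_after = os.stat(dst).st_ino

assert inode_before == inode_after      # same file object on disk
assert os.stat(dst).st_size == 0
```

A move across filesystems, by contrast, really is a copy followed by a delete.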


I interpreted "provide a set of files for a court case" as some form of sending it either through the Internet or to some portable storage media. He also said "The client insisted we find and copy the original empty file from the original media", which also indicates that. I therefore intended "move" to mean "move from one storage media/computer to another". You're of course correct that if you "move" a file on one file system, the bits won't actually be copied (or moved) anywhere.


Sure, no argument. I was responding only to the "computer only ever copies" statement. Sometimes it's hard to know how literal someone is being. :)


It's like homeopathic copying: A zero byte string can contain trace amounts of the original.


You wanted to provide a zero-byte file of the wrong colour. Unlike in physical reality, in legal world it strongly matters what colors are your bits (or in this case, what color is the lack of them).

http://ansuz.sooke.bc.ca/entry/23


Not a wholly insane request if they were interested in the file metadata as much as the file data.


You might think so, but metadata travels separately from the actual file, so that wasn't a factor.


Depending on how a file is copied, its metadata can be preserved.
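For illustration, in Python that's the difference between shutil.copy and shutil.copy2 (copy2 additionally tries to preserve timestamps and other metadata where the platform allows):

```python
import os
import shutil
import tempfile

d = tempfile.mkdtemp()
src = os.path.join(d, "original.txt")
with open(src, "w") as f:
    f.write("x")
os.utime(src, (0, 0))            # pretend the file dates from the epoch

plain = os.path.join(d, "plain.txt")
faithful = os.path.join(d, "faithful.txt")
shutil.copy(src, plain)          # copies data + permission bits only
shutil.copy2(src, faithful)      # also copies timestamps

assert int(os.stat(faithful).st_mtime) == 0   # mtime preserved
assert int(os.stat(plain).st_mtime) != 0      # mtime is "now"
```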


Well, I can see the point of leaving a trail that shows you aren't fraudulently claiming the file is zero bytes. Otherwise, it's just your word.


Yeah, if they didn't find a copy on the backup/original media to confirm it's zero bytes, then really all they've provided is their assumption that it's zero bytes.

I'm no lawyer, but the testimony of the IT guy (and associated reasoning/etc) seems like a totally different kind of evidence than a true and accurate copy of a file from an original medium. And having the IT guy testifying seems like way more of a pain than just making him find the file.


"prior art" is a term concerning patents, whereas this is copyright.

There needs to be a certain level of creative work to qualify for copyright, and an empty program that does nothing is rather unlikely to qualify.

The definitive case on that subject must be John Cage's "4'33"", which is 4 minutes and 33 seconds of silence. Its copyright has been upheld in court. Yes, the estate of John Cage sued somebody who "quoted" too much of his work of silence.

That sounds somewhat silly, yet it's well-reasoned: the infringing artist actually credited Cage in the CD insert. And 4'33" isn't really only silence. Audio recordings feature the pianist sitting down, opening and closing the piano cover, and the audience making various noises. It's somewhat "out there", obviously, but just the fact that it's rather well-known kind of shows that it did what it was supposed to do.


I'm surprised that neither GNU yes nor GNU cat uses splice(2). Here's what I get. Note that ./rust is the final Rust program from the article modified to print stats and exit after a set byte count[1], ./splice is a simple C program that uses splice to copy the input string from a generated anonymous temporary file to stdout, and ./consume uses splice to move data from stdin to /dev/null. In all tests the default string of "y\n" is used.

  $ ./rust | cat >/dev/null
  5.220 GB/s (8589934592 bytes in 1.645 seconds)

  $ ./splice | cat >/dev/null
  14.514 GB/s (8589934592 bytes in 0.551 seconds)

  $ ./rust | ./consume 
  9.312 GB/s (8589934592 bytes in 0.922 seconds)

  $ ./splice | ./consume 
  25.118 GB/s (8589934592 bytes in 0.318 seconds)
As you can see, cat(1) is the real bottleneck.

splice is Linux-only. Most systems have sendfile(2), including Linux, but I didn't test it. The implementation and semantics of sendfile vary across platforms.
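For a feel of the interface, here's a hedged sketch using Python's os.sendfile wrapper (Linux semantics assumed: since kernel 2.6.33 the output fd need not be a socket, whereas e.g. macOS still requires one):

```python
import os
import tempfile

# Copy a generated file of "y\n" lines into another file in-kernel,
# without the data passing through user-space buffers.
src = tempfile.TemporaryFile()
src.write(b"y\n" * 2048)                 # 4 KiB payload
src.flush()

dst = tempfile.TemporaryFile()
total, offset = 0, 0
while True:
    # sendfile(out_fd, in_fd, offset, count) -> bytes transferred
    n = os.sendfile(dst.fileno(), src.fileno(), offset, 1 << 16)
    if n == 0:                           # EOF on the source
        break
    offset += n
    total += n

assert total == 4096
assert os.fstat(dst.fileno()).st_size == 4096
```

Like splice, the point is that the kernel moves the pages itself; the loop only shuffles offsets.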

[1] I did this before I wrote ./consume (which takes an optional byte count limit), having assumed that GNU cat was using splice. As cat doesn't have a way to limit the number of bytes read/written, and other tools like head or tail definitely don't use splice, the simplest way to limit the benchmark without introducing a bottleneck was to have the producers themselves print stats and exit. The relevant diff is

    + let mut count = 0u64 as usize;
    + let bytes = (1u64 << 33) as usize;
    ...
    -while locked.write_all(filled).is_ok() {}
    +while locked.write_all(filled).is_ok() {
    +  count += filled.len();
    +  if count >= bytes {
    +      break;
    +  }
    +}


> I'm surprised that neither GNU yes nor GNU cat uses splice(2). ... splice is Linux-only.

The fact that splice is Linux-only is most likely why. GNU programs are portable to many different OSes, many of which no longer exist in any real form. They try not to use any single-OS-specific feature if at all possible.


splice(2) and tee(2) were basically tailor-made for cat(1) and tee(1), and for some reason I was under the impression that GNU tee was using splice(2) and/or tee(2). Using them could be trivial: just a few extra lines of code that fall through to the existing methods, for a huge speed-up in performance. (Performance matters because the consumers could be CPU-bound, and an inefficient cat or tee might be taking away resources that could be used by the consumer.)

Regarding portability, GNU tail uses the Linux-specific inotify(7) API to respond faster to writes. Like a lot of OSS, coreutils uses the BSD .d_type member extension[1] of struct dirent to avoid unnecessary stat() calls. There are many other more intrusive OS-specific details baked into coreutils, but often it's the nature of the problem: in many situations you're dependent on platform-specific details or features. For the most part, these nitty-gritty platform-specific details are far more intrusive in terms of code complexity than the performance optimizations.

[1] Missing from Solaris, and probably most other SysV derivatives.


Most importantly, GNU programs ought to work on Hurd.


Forgive me if I'm missing something subtle, but aren't the first two examples Useless Use Of Cat?


No, the use of cat is intentional.

The output of yes(1) is normally consumed by another program, not redirected to a file, so this is what the examples emulate.


Yes they are. Why not just send it straight to /dev/null?


1) Because splice(2) requires either the source or sink to be a pipe. The source is a regular file so the sink has to be a pipe.

2) Because the example Rust program(s), emulating yes(1), had no [simple] way to measure throughput except by piping to another program. We can't fairly compare program A that writes directly to /dev/null with program B that writes to a pipe even if program A can measure its throughput.

2a) What jwilk said.

3) For some reason I thought that glibc had optimizations to elide fwrites to /dev/null, and some of my code was using stdio (e.g. for the final trailing bytes less than the pagesize). I could've sworn either glibc or bash did this, but I can't find any mention of it now. I realize it would be crazy difficult and ugly for glibc to do this (because of dup2, etc.), but glibc does a lot of crazy things, and in any event I didn't bother checking beforehand.

Mostly it comes down to fairly comparing benchmarks and kernel facilities. Otherwise, yes, those would be classic examples of Useless Use of Cat.


Show us the source of your programs. :)


Here's a cleaned-up version:

  #ifndef _GNU_SOURCE
  #define _GNU_SOURCE 1
  #endif
  
  #include <errno.h>    /* ERANGE errno */
  #include <inttypes.h> /* strtoumax(3) */
  #include <limits.h>   /* LONG_MAX */
  #include <stdint.h>   /* uintmax_t */
  #include <stdio.h>    /* fflush(3) fprintf(3) fread(3) fwrite(3) tmpfile(3) */
  #include <stdlib.h>   /* EXIT_FAILURE */
  #include <string.h>   /* strlen(3) */
  
  #include <err.h>      /* err(3) errx(3) */
  #include <fcntl.h>    /* loff_t splice(2) */
  #include <sys/time.h> /* struct timeval gettimeofday(2) */
  #include <unistd.h>   /* _SC_PAGESIZE pread(2) sysconf(3) write(2) */
  
  #ifndef HAVE_SPLICE
  #define HAVE_SPLICE __linux__
  #endif
  
  #define MIN(a, b) (((a) < (b))? (a) : (b))
  #define MAX(a, b) (((a) > (b))? (a) : (b))
  
  #ifndef howmany
  #define howmany(x, y)  (((x) + ((y) - 1)) / (y))
  #endif
  
  #define UMAX_PREC (sizeof (uintmax_t) * CHAR_BIT) /* NB: assumes no padding */
  #define UMAX_HALF ((UMAX_PREC + 1) / 2)
  #define UMAX_LO(n) ((n) & ((UINTMAX_C(1) << UMAX_HALF) - 1))
  #define UMAX_HI(n) ((n) >> UMAX_HALF)
  
  static inline _Bool
  add_overflow(uintmax_t *r, const uintmax_t a, const uintmax_t b)
  {
          if (~a < b)
                  return 1;
          *r = a + b;
          return 0;
  }
  
  /*
   * Implement multiplication using a polynomial with four multiplications and
   * three additions, except we can optimize out some operations.
   */
  static inline _Bool
  mul_overflow(uintmax_t *_r, const uintmax_t _a, const uintmax_t _b)
  {
    uintmax_t a[2] = { UMAX_LO(_a), UMAX_HI(_a) };
    uintmax_t b[2] = { UMAX_LO(_b), UMAX_HI(_b) };
    uintmax_t r[2];
  
    /* if both are non-0, we'd always overflow */
    if (a[1] && b[1])
      return 1;
  
    /* either a[1] or b[1] must be 0 here, so no intermediate overflow */
    r[1] = (a[1] * b[0]) + (a[0] * b[1]);
  
    /* if the result has MSW bits set, we'd overflow */
    if (UMAX_HI(r[1]))
      return 1;
  
    r[0] = a[0] * b[0];
  
    return add_overflow(_r, r[0], r[1] << UMAX_HALF);
  }
  
  #if 0
  static uintmax_t
  add(uintmax_t a, uintmax_t b)
  {
    uintmax_t r;
    if (add_overflow(&r, a, b))
      errx(1, "arithmetic overflow (%ju * %ju)", a, b);
    return r;
  }
  #endif
  
  static uintmax_t
  mul(uintmax_t a, uintmax_t b)
  {
    uintmax_t r;
    if (mul_overflow(&r, a, b))
      errx(1, "arithmetic overflow (%ju * %ju)", a, b);
    return r;
  }
  
  static uintmax_t
  toumax(const char *p)
  {
    char *pe;
    uintmax_t n;
  
    errno = 0;
    if (UINTMAX_MAX == (n = strtoumax(p, &pe, 10)) && errno != 0)
      err(1, "%s", p);
    if (*pe != '\0' || p == pe)
      errx(1, "%s: invalid number", p);
  
    return n;
  }
  
  static uintmax_t
  gcd(uintmax_t a, uintmax_t b)
  {
    uintmax_t c;
  
    while (a != 0) {
      c = a;
      a = b % a;
      b = c;
    }
  
    return b;
  }
  
  static uintmax_t
  lcm(uintmax_t a, uintmax_t b)
  {
    return mul(a, b) / gcd(a, b);
  }
  
  static size_t
  xgetpagesize(void)
  {
    long n = sysconf(_SC_PAGESIZE);
    if (n <= 0)
      err(1, "sysconf");
    return n;
  }
  
  static double
  gtod(void)
  {
    struct timeval now;
    gettimeofday(&now, NULL);
    return now.tv_sec + (now.tv_usec / 1000000.0);
  }
  
  int
  main(int argc, char **argv)
  {
    _Bool cflag = 0, nflag = 0;
    uintmax_t count = UINTMAX_MAX, lines = 0, total = 0;
    int optc;
  
    while (-1 != (optc = getopt(argc, argv, "c:n:"))) {
      switch (optc) {
      case 'c':
        if (nflag)
          errx(1, "-c and -n mutually exclusive");
        count = toumax(optarg);
        cflag = 1;
        break;
      case 'n':
        if (cflag)
          errx(1, "-c and -n mutually exclusive");
        lines = toumax(optarg);
        nflag = 1;
        break;
      default:
        return EXIT_FAILURE;
      }
    }
    argc -= optind;
    argv += optind;
  
    const char *const line = (argc > 0)? argv[0] : "y";
    const size_t linesize = strlen(line) + 1;
    const size_t pagesize = xgetpagesize();
    const size_t blocksize = MAX(mul(pagesize, 16), lcm(linesize, pagesize));
    const uintmax_t filesize = mul(blocksize, howmany((1UL << 20), blocksize));
  
    if (nflag)
      count = mul(linesize, lines);
  
  //  fprintf(stderr, "linesize:  %zu\n", linesize);
  //  fprintf(stderr, "pagesize:  %zu\n", xgetpagesize());
  //  fprintf(stderr, "blocksize: %ju\n", blocksize);
  //  fprintf(stderr, "filesize:  %ju\n", filesize);
  
    FILE *fh;
    if (!(fh = tmpfile()))
      err(1, "tmpfile");
    for (size_t i = 0; i < filesize; i += linesize) {
      int n = fprintf(fh, "%s\n", line);
      if (n == -1)
        err(1, "fprintf");
      if ((size_t)n != linesize)
        errx(1, "wrote %d bytes, expected %zu", n, linesize);
    }
    if (0 != fflush(fh))
      err(1, "fflush");
  
    double begin = gtod();
  #if HAVE_SPLICE
    int fd = fileno(fh);
    _Bool eof = 0;
    do {
      loff_t p = 0;
      size_t r;
      ssize_t n;
  
      /* NB: no LOFF_MAX available */
      _Static_assert(sizeof p <= sizeof (long), "unexpected type for loff_t");
      if (filesize > LONG_MAX)
        errx(1, "filesize too large (%ju > %jd)", filesize, (intmax_t)LONG_MAX);
  
      while ((size_t)p < filesize && count >= blocksize && !eof) {
        r = MIN(blocksize, count);
        r -= r % blocksize;
        n = splice(fd, &p, STDOUT_FILENO, NULL, r, SPLICE_F_MOVE);
        if (n == -1)
          err(1, "splice");
        p += n;
        count -= n;
        total += n;
        eof = n == 0;
      }
    } while (!eof && (count || (!cflag && !nflag)));
  #endif
    {
      char buf[BUFSIZ], *p, *pe;
      size_t n;
  
      while (count) {
        rewind(fh);
        if (!(n = fread(buf, 1, MIN(sizeof buf, count), fh))) {
          if (ferror(fh))
            err(1, "fread");
          break;
        }
  
        p = buf;
        pe = &buf[n];
        while (p < pe) {
          if (!(n = fwrite(p, 1, (size_t)(pe - p), stdout)))
            err(1, "fwrite");
          p += n;
          count -= n;
          total += n;
        }
      }
    }
    if (0 != fflush(stdout))
      err(1, "fflush");
  
    double elapsed = gtod() - begin;
    fprintf(stderr, "%.3f GB/s (%ju bytes in %.3f seconds)\n", ((double)total / elapsed) / (1UL<<30), total, elapsed);
  
    return 0;
  }


hmm going in the direction of [binary] instead of [shell script], why not just go further and ditch the whole C stdlib by passing -ffreestanding and then compiling

  void _start(void) {
    exit(0);
  }


I read similar things before, on Reddit: https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes...


That comment from Theo on the BSD one is rather funny.


On my Mac I'm getting

$ yes | pv -r > /dev/null

[34.9MiB/s]


This topic comes up every now and then. I thought this post was particularly insightful,

"One thing to keep in mind when looking at GNU programs is that they're often intentionally written in an odd style to remove all questions of Unix copyright infringement at the time that they were written.

The long-standing advice when writing GNU utilities used to be that if the program you were replacing was optimized for minimizing CPU use, write yours to minimize memory use, or vice-versa. Or in this case, if the program was optimized for simplicity, optimize for throughput.

It would have been very easy for the nascent GNU project to unintentionally produce a line-by-line equivalent of BSD yes.c, which would have potentially landed them in the 80s/90s equivalent of the Google vs. Oracle case."

https://news.ycombinator.com/item?id=14543640


That last rust example is less readable than modern template-metaprogramming variants of C++.

There is something elegant about being able to beat it on speed with about 30 lines of C.


The main function is egregiously overcomplicated.

This is how I'd write it instead:

  fn main() {
        match env::args_os().nth(1) {
                None => write(b"y\n"),
                Some(arg) => {
                        let mut arg = to_bytes(arg);
                        arg.push(b'\n');
                        write(&arg);
                }
        }
        process::exit(1);
  }
It's just as efficient and far more readable. It avoids Cow, a lambda, and the chain of combinators.


Doesn't this only print once, and not over and over, like `yes` should?


The write function takes care of repeating it. See the complete source code in the blog post.


> There is something elegant about being able to beat it out on speed with about 30 lines of C

What is this referring to? If it's referring to GNU C, the linked C implementation is about 100 non-comment lines, and is three times slower on the author's machine than the Rust version.


see sister comment


Ah, I'm sorry, I thought that program was meant to be humorous. :P


As others have pointed out elsewhere, what matters here is not so much the performance of the language but how much you can avoid calling IO code by making the path as straightforward as possible. You can get comparable performance (6.5 GB/s vs 8 GB/s on my machine) with 10 lines of Python code, which probably took me the same time to write as the original Rust version.


I'm sure a lot of people would like to see the code for comparison. I know I would.


The code is below. I have to admit that after a few more tests I realized that the performance greatly depends on the length of the input, in some cases going as low as 2 GB/s and in others reaching the performance of the GNU yes installed on my system. This is probably because I build the buffer in a quite naive way. With a bit more work maybe that can be fixed.

    #!/usr/bin/python
    import sys

    yes = "yes\n"
    print(sys.argv)
    if len(sys.argv) == 2:
        yes = sys.argv[1] + "\n"

    size = 4096
    buf = bytes(yes * size, 'utf-8')

    while True:
        sys.stdout.buffer.write(buf)
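One way to address the input-length sensitivity mentioned above is to build the buffer up to a roughly fixed byte size instead of a fixed repeat count. A sketch (the function names make_buffer and run are mine; the limit parameter exists only so it can be exercised without looping forever):

```python
import io

def make_buffer(line, target=1 << 20):
    # Repeat the line (newline included) up to ~target bytes, always a
    # whole number of lines, so the buffer size no longer tracks the
    # length of the input string.
    data = (line + "\n").encode()
    return data * max(1, target // len(data))

def run(out, line="y", limit=None):
    # limit=None reproduces yes(1): loop forever writing the buffer.
    buf = make_buffer(line)
    written = 0
    while limit is None or written < limit:
        out.write(buf)
        written += len(buf)
    return written

# Exercise it against an in-memory sink rather than stdout:
sink = io.BytesIO()
n = run(sink, "hello", limit=1 << 20)
assert sink.getvalue().startswith(b"hello\n")
assert n % len(b"hello\n") == 0
```

In the real program you'd call run(sys.stdout.buffer, arg) with no limit.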


Simply putting a comment above each function would make it bearable.


not as readable as this faster variant

  #include <sys/uio.h>

  //2048 "y\n" s
  static char __attribute__((aligned(4096))) str[] = "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\
ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny
\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n
y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n";
  static struct iovec v[] = {
  	{str, 4096},			//adjust number of these to taste
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  };
  
  
  int main(int argc, char** argv) {
  
  	while(1) {
  		//scatter-gather write (on linux there is basically no way for write to write an odd number of characters, except EOF, so we need not check for partial writes)
  		(void)writev(1, v, sizeof(v) / sizeof(*v));
  	}
  	return 0;
  }


Bah, that'll never pass code review. How about this:

  #include <sys/uio.h>
  
  #define str(x) #x
  #define expand(x) str(x)
  #define y1 y\n
  #define y2 expand(y1) expand(y1)
  #define y4 y2 y2
  #define y8 y4 y4
  #define y16 y8 y8
  #define y32 y16 y16
  #define y64 y32 y32
  #define y128 y64 y64
  #define y256 y128 y128
  #define y512 y256 y256
  #define y1024 y512 y512
  #define y2048 y1024 y1024

  //2048 "y\n" s
  static char __attribute__((aligned(4096))) str[] = y2048;
  static struct iovec v[] = {
  	{str, 4096},			//adjust number of these to taste
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  	{str, 4096},
  };
  
  
  int main(int argc, char** argv) {
  
  	while(1) {
  		//scatter-gather write (on linux there is basically no way for write to write an odd number of characters, except EOF, so we need not check for partial writes)
  		(void)writev(1, v, sizeof(v) / sizeof(*v));
  	}
  	return 0;
  }


Both suffer from way too much binary bloat; for a small performance hit we can shave off a whole 4 KiB.

Is anyone going to take a stab at an enterprise edition?


Here you go.

   <?xml version="1.0" encoding="UTF-8"?>
   <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="3.0">

      <xsl:output method="text"/>
  
      <xsl:template match="/">
         <xsl:call-template name="y"/>
      </xsl:template>

      <xsl:template name="y">
         <xsl:text>y&#10;</xsl:text>
         <xsl:call-template name="y"/>
      </xsl:template>
   </xsl:stylesheet>


"Enterprise edition" would be the same thing but written in Java and require an Oracle backend.


Don't forget LDAP, SAML and SNMP support. What's the OID for 'y' again?



Could reuse type from 2.16.840.1.113883.18.347. It's an extended yes/no, to account for "flavours of null". (Not even joking...)


I just died a little inside.


That doesn't have the same functionality - it doesn't accept words from the command line.


And the naive "readable" version is half the speed of python. Makes you wonder how typical that would be in real code?


Fun fact:

  $ python yes.py | pv -r > /dev/null
  [11.2MiB/s]

  $ python3 yes.py | pv -r > /dev/null
  [4.95MiB/s]
As for why the naive rust version is slower, it's because without adding a BufWriter in rust, stdout is line-buffered, so each line emits a write system call, while with python, stdout is buffered. Python 2 emits writes of 4096 bytes, and python 3... 8193 bytes (edit: not a typo, this is 8KiB + 1). That's the likely cause for it being slower.

Edit: A minimal version of the naive rust version would be:

  fn main() {
    loop {
      println!("y");
    }
  }
On the same machine as with the python tests above, I get:

  $ ./yes | pv -r > /dev/null
  [4.81MiB/s]
which is actually as slow as python 3, despite doing 4 thousand times more system calls.

A version with buffering would look like:

  use std::io::{stdout,BufWriter,Write};

  fn main() {
    let stdout = stdout();
    let mut out = BufWriter::new(stdout.lock());
    loop {
      writeln!(out, "y").unwrap();
    }
  }
And produces 129MiB/s on the same machine. And that's strictly doing the same as what the python version does (with a default buffer size of 8KiB, apparently).

And fwiw, on the same machine, both GNU yes and the full rust solution from OP do 10.5GiB/s.
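The effect of the buffer can be shown without timing anything, by counting how many writes actually reach the underlying stream. A sketch using Python's io machinery (the CountingRaw class is mine, standing in for an unbuffered fd):

```python
import io

class CountingRaw(io.RawIOBase):
    # A raw sink that just counts the writes reaching it, i.e. what
    # would be write(2) system calls on a real file descriptor.
    def __init__(self):
        self.calls = 0
        self.nbytes = 0
    def writable(self):
        return True
    def write(self, b):
        self.calls += 1
        self.nbytes += len(b)
        return len(b)

raw = CountingRaw()
out = io.BufferedWriter(raw, buffer_size=8192)   # ~python3-sized buffer
for _ in range(100_000):
    out.write(b"y\n")                            # 100,000 tiny writes...
out.flush()

assert raw.nbytes == 200_000
assert raw.calls < 100   # ...collapse into ~25 8 KiB writes underneath
```

Line-buffered output, as in the naive Rust version, would instead hit the raw stream once per line.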


> As for why the naive rust version is slower, it's because without adding a BufWriter in rust, stdout is line-buffered, so each line emits a write system call, while with python, stdout is buffered. Python 2 emits writes of 4096 bytes, and python 3... 8193 bytes. That's the likely cause for it being slower.

Does it have nothing to do with the fact that string-of-bytes is the default in Python 2, whereas string-of-characters is the default in Python 3? Or is that perhaps related to the explanation you gave? Forcing the byte interpretation, Python 3 is slightly faster than Python 2 for me. Forcing the character interpretation, Python 2 wins, but not by as much as before.

Bytes:

  while True:
      print(b'y')
Characters:

  while True:
      print(u'y')


Your bytes version outputs lines of, literally, `b'y'`.

The characters version is still a clear win for python2 on my machine (8.9MiB/s vs. 5.6MiB/s).

It's also worth noting that the buffering behavior of python is only happening because the output is a pipe to pv. If it were the terminal, it would be line buffered, like the naive rust version.


python3 seems to do much better than either of those for me when using an unbuffered write(1, ...) syscall (plus it prints the correct thing)

    $ cat yes3.py
    stdout = open(1, 'wb')
    while True:
        stdout.write(b'y\n')
    $ python3 yes3.py | pv -r > /dev/null
    [13.7MiB/s]

    $ cat yes2.py
    import os
    stdout = os.fdopen(1, 'wb')
    while True:
        stdout.write('y\n')
    $ python2 yes2.py | pv -r > /dev/null
    [7.77MiB/s]


For better comparison with my numbers, I ran your scripts on my machine:

  $ python3 yes3.py | pv -r > /dev/null
  [18.4MiB/s]
  $ python2 yes2.py | pv -r > /dev/null
  [10.2MiB/s]
In both cases, a 4KiB buffer is used by python. That's still way slower than the equivalent rust code with a 4KiB buffer (use BufWriter::with_capacity(4096, stdout.lock()) instead of BufWriter::new(stdout.lock())).


Out of curiosity:

yes2.py:

  import os
  stdout = os.fdopen(1, 'wb')
  while True:
      stdout.write('y\n')


  $ python yes2.py | pv -r > /dev/null
  [9.12MiB/s]

  $ pypy yes2.py | pv -r > /dev/null
  [45.5MiB/s]
So pypy does a good job of speeding it up.

Off a quick 9 second run, python2 with profiling:

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          1    4.272    4.272    9.301    9.301 yes2.py:1(<module>)
   30856080    5.029    0.000    5.029    0.000 {method 'write' of 'file' objects}
          1    0.000    0.000    0.000    0.000 {posix.fdopen}
          1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


I got 1.26Gb/s in Python2 on MacOS (i7 mac air)

    import os
    ys = 'y\n' * 2048
    while True:
        os.write(1, ys)
update: seems to peak at around 3GB/s with ys = 'y\n' * 2**16

Plain old `yes` gets a measly 33Mb/s!!


On my Macbook with way too much running, I get about 800MiB/s in Python2 with your script, but 1.12GiB/s with this Python3 script:

    import sys
    
    s = b'y\n' * 1024
    
    write = sys.stdout.buffer.write
    while True:
        write(s)
    
    # 1.12GiB/s
Plain old yes comes in at 22MiB/s.

The Python3 docs say to "use the underlying binary buffer object" when reading or writing binary data.

https://docs.python.org/3/library/sys.html


tested on my C1 on scaleway:

  perl -e 'print "y\n" while 1' | pv -r > /dev/null
  [3.37MB/s]
  perl -e 'print "y\n" x 2048 while 1' | pv -r > /dev/null
  [ 425MB/s]
  yes | pv -r > /dev/null
  [11.1MB/s]


>because without adding a BufWriter in rust, stdout is line-buffered, so each line emits a write system call,

Why the hell is it line-buffered when writing to a pipe? Yet another common sense “enhancement”?


It's always line buffered. Unlike python, where its behaviour depends on whether stdout is a tty or not.


That doesn’t answer my question, let me reformulate it: what’s the rationale behind this behavior?


For a program like this, execution is I/O bound rather than CPU bound, so even if Python is generally less CPU efficient than Rust the effect is overwhelmed by a different I/O strategy. It's just like how Node has greater max throughput for trivial server workloads than a naive C implementation.


According to empirical evidence in my sibling post, that's not true in this case.


Your post seems to demonstrate _exactly_ what the post you're replying to is saying: most of the speed has to do with I/O strategy (buffering and syscall usage), not the actual speed of the involved language. Maybe I'm missing a distinction you're making, but I'm not following how your previous comment leads to what you're saying here.


The rust version that does exactly what python does (buffered output) is an order of magnitude faster (even if I force the rust buffer size to be 4KiB like with python2).


To head off this comment chain, I've softened the language in my original comment to "overwhelmed by" rather than implying that non-I/O factors are wholly irrelevant. :)


That doesn't make a whole lot of sense when the guy above you posted that two versions of python (2 vs 3) have significantly different output throughput rates.

If it's I/O bound why are they changing at all?


It is I/O bound, but Python 2 and 3 do I/O differently. From the same comment that you're referring to:

> Python 2 emits writes of 4096 bytes, and python 3... 8193 bytes (edit: not a typo, this is 8KiB + 1).


>If it's I/O bound why are they changing at all?

Who said I/O functions didn't change between Python2 and Python3?


The GNU variant was discussed recently at: https://news.ycombinator.com/item?id=14542938

The commit that sped up GNU yes has a summary of the perf measurements: https://github.com/coreutils/coreutils/commit/3521722

yes can be used to generate arbitrary repeated data for testing or whatever, so it is useful to be fast


Interesting, but is there actually a practical benefit to having gigabytes of the word yes being generated, or is this all just optimisation porn?


Seems like it might even be better if yes was rate limited to a couple of lines per second or less. It would be more than enough for its intended use and when users inevitably run it in a shell to see what it does it wouldn't generate tons of output.

It seems like a violation of "do one thing well" to use it for generating data for testing, isn't dd and /dev/zero a better way to do that?


Someone else ITT pointed out a good reason:

> yes can be used to generate arbitrary repeated data for testing or whatever, so it is useful to be fast


I thought about that too, but I suspect it may be some sort of race to idle thing.


Really? You didn’t even mention reducing system calls? That’s basically what full_write does: try to output a whole buffer with one system call.

In your regular program, even with just a normal setvbuf call to set up block buffering would make a huge difference.


  import sys

  while True:
      sys.stdout.write("y\n" * 2**16)
gives me over 4GiB/s


Sure, but you're building strings in memory. It might not be a lot of memory, especially if you're able to run Python, but the native `yes` command can run on the smallest of embedded systems, which is why its speed is impressive.


The author says "no magic here" for the C version:

  for (;;)
    printf("%s\n", argc>1? argv[1]: "y");
but it's not totally obvious to me whether the argument to printf would be evaluated on every iteration of the for loop or not. Does the compiler know that those don't change, and is the answer to that question fairly basic C knowledge or not?


The compiler can assume that argc will not change within the loop so it can optimize it. I just looked up the assembly output from gcc and it pulls the argc>1 outside of the loop and replaces the printf with puts. So something like:

  if (argc > 1)
      for (;;) puts(argv[1]);
  else
      for (;;) puts("y");
The replacing of printf with puts is based on gcc having specific knowledge about the printf library function.


This is exactly the OpenBSD implementation.


Those performance improvements make the code more complicated too. How fast does the yes command actually need to be?


I would desire the yes command to be as slow as possible while still performing its basic function of automatically confirming questions that come up during an installation.

This is because I clusterssh into 40 machines and hey I haven't formally accepted them into my known_hosts file yet so I type "yes" to acknowledge them, but whoops I had already accepted two of them so now they are spitting out the letter 'y' as fast as they possibly can and now I have to wait for all of that output to transfer over the wire onto my machine despite pressing ctrl+c a minute ago.


Then just use yes | head -50? There's no need to artificially slow it down when there are more reasonable means of capping output than relying on SIGINT.


yes | head -50 will not accept new signatures into known_hosts.


I'm not familiar with clusterssh, but assuming that it passes flags to ssh you can add `-o StrictHostKeyChecking=no`.

Of course, the parameter does what it says, but the security implications are really the same as for blindly yessing your way through. The best is to pre-populate your hosts file using ssh-keyscan.


Oh, thank you. I was unaware of ssh-keyscan.


Ah, so, this raises an interesting question: what do people use yes for?

One answer (a bad answer) is "generating artificial load".

This makes the I/O strategy oddly relevant -- different strategies result in different numbers of syscalls and (potentially) kernel lock acquisitions.

Couple this with someone trying to benchmark something's behavior with an antagonist load, and the story gets downright painful to contemplate in terms of confusing results.

Source: in my younger years, I did this. In more recent years, I've seen other engineers do it.


In this case it's probably just for fun and exercise, but generally don't think of speedy code as expending more effort; think of it as making the computer waste less effort. (Unless the program isn't blocking anything and is just waiting for the network, and other such cases.) But any code that actually does run, especially in a loop, doesn't run in a vacuum, so I'd rather err on the side of speed. These tools don't get rewritten every 2 years in a new framework, which probably helps. Why not hone them if they're going to be that heavily used?


The git commit that sped things up mentioned using yes to generate test data.


Yeah — probably it doesn't matter how fast the yes command is. It will be faster than the program prompting for y/n.

FreeBSD's was nice and simple until someone decided to make it faster in r319897.


The yes command will repeat whatever string you give it, so it's useful for load/traffic testing. Speed helps this particular use-case.


That's a possible secondary use, but not really what the command is for.

Why not just use /dev/zero or /dev/urandom for generating load?


Because the yes command will repeat whatever string you give it. If you need to parse the incoming load in a certain way, /dev/{zero,urandom} won't be suitable.


yes will repeat arbitrary data (including binary).


I know right. The only time I have ever used it has required just 1 y/n and a new line. In that case, these "performance improvements" might make it slower for real world use cases.


This could be applied to something like memset, another deceptively simple function.


Can anybody explain to me what this syntax is? This is the first time I've seen anything like it, and I've been programming in C since I was a teenager.

    main(argc, argv)
    char **argv;
    {
    }


That's a K&R (Kernighan and Ritchie) style function declaration. Compilers still support it but the version you'd be more familiar with (ANSI C style) has been standard since at least the late 80s IIRC. ANSI C was standardised in 1989 but that process had been in progress for something like 5 years beforehand.


Does it have to do with how, in BCPL (one of C’s ancestors), everything was basically an integer? So everything is an integer unless told otherwise?


Specifically, everything is an integer with the auto lifetime (gets discarded with the stack frame) unless otherwise specified.

I think most modern compilers will warn on encountering such short hand due to the common error of accidentally declaring integers when you meant to assign a value to a variable you forgot to declare.


main's return type and argc both default to int. argv is declared and given its type between the function header and the function body, which is perfectly legal but quite unusual these days.

This syntax is probably nice if you use a line-oriented editor like ed.


I was mostly talking about the odd re-declaration of argv. Everything else is quite clear.


It's not that unusual to run into it in old C code. The game NetHack is still being developed and it's programmed this way too (at least everything I've looked at in its code).


all the reddit users from the linked article missed the SIGPIPE trick!

you don't need to check the return value from write() as your process will be terminated with SIGPIPE if it tries writing to a closed pipe.

saying that, none of them check the return code correctly: if the consumer only reads a byte at a time you could eventually get 'yyyyyyyyyyyyyyyyyy' (without any newlines)

quite impressive that so many implementations of "yes" have the same bug :)


You can't rely on SIGPIPE to terminate it or your program will fail to terminate on error if its output is redirected to a disk file.


write(2) is a blocking call (unless O_NONBLOCK is set), so you should never have a partial write while the descriptor is connected to something valid.


please demonstrate that consumer.

on linux there is no way to do this without your consumer being a kernel driver that purposefully accepts one byte at a time. all file io and pipes have basically no way to accept one byte only (except EOF)


you wouldn't see it on Linux as pipes are implemented using complete pages (always powers of 2), but there's probably some OS out there with a different implementation where you can set the buffer size to be an odd number, and then you'd see the bug with plain simple cat

I didn't say you'd likely see it in practice :)


Not impressive at all. Basically he had to write a lot of manual buffering code to reach GNU yes throughput. I would suggest using an infrastructure which already provides proper IO.

e.g. perl or clisp.

    $ perl -C0 -e'print "y\n" x (1024*8) while 1' | pv > /dev/null
    ^C.4GiB 0:00:11 [6.17GiB/s] [              <=>                                                     ]

    $ yes | pv > /dev/null
    ^C.3GiB 0:00:07 [6.64GiB/s] [         <=>
And with linux-only vmsplice the record stands at 123GB/s https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes...


I guess a stretch goal would be to make a "shouldi" command that can consume more y's per second than yes can produce. Of course at that point the shell itself would probably become the bottleneck.


The shell can't be a bottleneck. If you run `yes | shouldi`, all the shell does is setup the pipe, give one of its ends as stdin to shouldi, and the other end as stdout to yes.


The shell does not even enter into the picture. The shell sets up the pipe and starts your programs. The `yes` program writes its output to stdout which is buffered in kernel and then directly read by your hypothetical `shouldi` program.


yes >/dev/null ?


Rust strike force is back, rewriting Unix tools in a sane language


    main(argc, argv)
    char **argv;
    {
      for (;;)
         printf("%s\n", argc>1? argv[1]: "y");
    }
Is beautiful, readable, and minimal. The "optimized" Rust version is complicated and over 50 lines of code. At what point does performance optimization go too far?


This is an unfair comparison. For starters, this version too is unoptimized, so you need to compare it with the first Rust code, which is about the same size.

Moreover, a lot of the "bloat" that comes with Rust code isn't there because it's more performant. Neither C++ nor Rust tries to be a faster language (after all, the optimization tricks you can do in your program are limited to what you can do with assembly and its cousin, C); they try to be safer by providing more abstractions and restrictions (compile-time checks).


As a non-Rust developer my first thought upon seeing the optimized version was “On the other hand let’s not visit Rust, ‘tis a silly place.”


There was a lot of things about Rust that made think it was weird before I actually gave in and tried it. Now I adore Rust. You should try it.


Since it can be used to repeat arbitrary data, it is liable to be used in performance-sensitive tasks (mostly along the lines of pumping dummy data to a program or piece of hardware that's being benchmarked).


Funny, I just learned about this command a couple of days ago as a simple way to max out your CPU. I was trying to drain the battery on my Macbook Pro and running 4 of these at the same time did the trick nicely. Redirected to /dev/null and run in the background: "yes > /dev/null &"


Curiously, yes(1) is not standardized by POSIX.

Are there any UNIX systems that don't have it?


any idea why it is so much faster on fedora 26?

$ yes | pv -r > /dev/null

[10.3GiB/s]


benchmarking is useless when comparing different conditions

ie your computer is probably faster than op's


Of course. Though 37MiB/s compared to 10GiB/s probably means that the GNU version used by Fedora has a faster implementation than Apple's version.


You're right, my mistake, I didn't check the speed of the OP's


I use the yes command to defrost my lunch. I open up a couple of tabs running

yes > /dev/null

Then place my frozen lunch on the back of my macbook. Give it an hour or so and boom, defrosted.


Hah, nice! Somewhat related, I once worked on a project that used a high-reliability PC meant for extended use in "extreme" outdoor environments. One of the issues the manufacturer worried about was the PCB and solder joints experiencing thermal fatigue from lots of seasonal and night/day temperature cycles.

Their ingenious solution was to always run the system towards the warmer end of its spec, so it included a program that would monitor the temperature inside the case and spawn/kill a bunch of threads doing compute-intensive math to keep the temperature constant when the user's workload wasn't enough!


If only mining cryptocurrency had been a thing back then...


That's amusing.

The laser cutter I operate simply has reverse cycle air conditioner mounted in one of the cabinet doors.


Couldn't they slow down the cooling fans? Or was it a passively cooled PC?


Passively cooled. The thing was originally meant for extended use in "extreme" industrial environments (eg at a power substation, inside a wind turbine, etc..), so it had no vents or moving parts at all. Everything was heatpiped to the metal case, which looked like a heatsink.

Similar looking model from same company: https://selinc.com/products/3355/


Reminds me of http://thedailywtf.com/articles/Just-a-WarmUp , a trick which I also discovered independently many years ago and used to keep my fingers warm while attending lectures in a nearly-unheated room.


Prior art for https://xkcd.com/1172/ ?


Geez man, at least mine some cryptocurrency.

Then your hot lunch will buy you another hot lunch down the road!


> I open up a couple of tabs

So, you're basically thawing your lunch manually. Have you considered the "parallel" command?


Hrm, no I hadn't. I don't know much about this command. Could you give me a (thawing) example?


It's pretty straightforward:

  parallel -- yes yes yes yes >/dev/null
If you've more processors than that, add on extra yesses.


That manual counting of cores is what I thought we would let `parallel` handle! I have never actually used parallel, though, so I don't know how to best do it.


one way on a leenukz

    parallel -- `sed -ne 's:^processor.*:yes:p' < /proc/cpuinfo`
on openbsd:

    parallel -- `perl -e 'printf("yes "x\`sysctl -n hw.ncpu\`)'`
don't try the latter in a shell script without being prepared to debug tho.


Right. I was unclear. What I really meant was that I was thinking that `parallel` could automatically spin up more and more jobs until it sensed that there is no further performance to be gained. I'm not sure to what extent that is true, though.


I just realized a simple shell script would make a passable imitation of JMeter, without the horrible UI and xml.


A shell script wouldn't be enough (I've tried). However, there are plenty of CLI-based load testing tools, including one I've written myself. And if you need something more advanced, there is always Gatling, which is run via the command line and produces proper HTML reports and graphs, plus is extended in code (eg in Scala) rather than GUI controls.


OT but if you want a nice alternative to jmeter, check out Gatling. Instead of XML, you can code your scenarios in scala.

edit : hnlmorg beat me to it :)


    for p in `seq 1 $(sysctl -n hw.ncpu)`; do yes > /dev/null & done
Alternatively you could run Boinc with a cancer research project or something - defrost your food AND help the world become a better place.


$(nproc --all) on Linux


Lol I defrost the chocolate like this by placing it near the air vent. But care should be taken to make sure that the wrapper doesn’t touch the chocolate while peeling off, because I don’t think the hot air from the vent is hot enough to kill all harmful microbes.


I think this method is much more healthy for you.


that's interesting. i open and close the mac slack app 5 times to defrost my lunch.


I just open a webgl page, seems more effective


I really don't know whether or not your comment is serious, but I damn sure want it to be


maybe it's supposed to be slow. wouldn't a faster 'yes' just spam stdin much faster? you only need to answer yes occasionally, and no faster than once a second


k3

   while[1;`0:"yes\n"]


tldr; watched pootie tang


  env::args().nth(1).unwrap_or("y".into());
this ridiculously complicated syntax to perform such a simple thing is why I will never accept Rust. What a clumsy, ugly language. I'll just stick with learning ANSI Common Lisp, which pays immediate dividends.


It's fairly idiomatic so you get used to it but if you prefer something a little more C-ish you can write it like that (as a side note the code in TFA forgets to import std::env and BufWriter):

    use std::env::args;

    fn main() {
    
       let expletive =
            match args().nth(1) {
                Some(arg) => arg,
                None => "y".into(),
            };
    
        loop {
            println!("{}", expletive);
        }
    }
This is the equivalent to the unoptimized C version. I would argue that it's a little more readable too, substituting "loop" instead of the idiomatic "for(;;)" and the more verbose match syntax instead of the ternary (the terser rust equivalent being the `unwrap_or` you seemed to dislike).

Personally I think I'd use the "unwrap_or" version, when you're familiar with the language it's completely transparent, easier to parse and expresses intent better I think. For an outsider I can see why it would look like a strange incantation though (but the same could be said about ternaries or CL's "do" construct for instance).

The only thing I'd deem inelegant here is the `"y".into()`. It's probably not obvious what it does and why it's necessary to somebody not familiar with Rust.


That example is indeed more readable, but it still reads like methods on objects which is a fundamental issue for me. Any non trivial program written in an object-oriented paradigm eventually becomes incomprehensible, whereas the functional approach doesn’t rely on the state machine model. Two different approaches. I didn’t care for objects back in 1990 and I certainly care even less for them now. I upvoted your example in appreciation for the effort you went through though.


I agree that for these types of constructs a more functional style could look better. That being said the OOP approach as the merit of reading in the proper order: you get your args, pick the 1st and then use that or the default "y" value.

If you rewrite this to use function calls instead of methods you end up with something like:

    (or (cadr (args)) "y")
Which is fine if you're used to lisp syntax but it's really a matter of taste at this point.


The only difference from

  expletive = 'y' if len(sys.argv) == 1 else sys.argv[1]
is the memory safety guards, which happen to be the key feature of Rust


But what if one doesn’t care about memory safety because one has other languages at one’s disposal, like shell, AWK, Lisp? What benefit does Rust bring then? And at what price?

And who is to say that the programmers of Rust built a perfect language that always generates code which is bounds-safe? I question that as I know from experience that no human has ever written 100% correct code 100% of the time; even machines aren’t capable of achieving that. Therefore, I hold that the entire promise of Rust is flawed. I don’t like Rust one bit.


We had a large community discussion about this last year: https://brson.github.io/fireflowers/


I don't think that theme is bright enough.

Perhaps neon green with yellow letters would do the trick?


Problem is that Rust is not memory safe anymore, since they switched from GC to refcounting and added unsafe. Pair that with being no concurrency safety (deadlocks, races) there are only macros left which do make rust attractive.

On the other side there are proper languages which do provide all safeties and beat rust or C++ in performance, such as pony. A proper type-system does help (compile-time guaranteed), but you could also add such features to the run-time system (e.g. a GC or safe threading with a single-writer system such as parrot). With rust you don't have any of it, you have to manually add locks or mutexes into your threaded code, and try to avoid unsafe.



