I use parallel all the time for embarrassingly parallel scientific computations on a cluster. It is elegant and very easy to use, and it's one of the programs I'm most grateful for.
Recently the developers fixed a major bug for me: child jobs on other nodes were not being killed when parallel was killed. This was the only thing stopping me from recommending it to my labmates; now there's no reason not to use it!
I use GNU Parallel. I like it because its interface is simple: you pipe filenames to it, just like xargs, and output is nicely collated to the screen.
I used to use ppss, which does the core task just as well, but its interface is more complex.
I mostly use these tools to optimize large numbers of PNGs before deployment, using optipng, pngout, and/or my own lossypng. These programs take a while to run so using all my cores gets the job done a lot quicker.
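A minimal sketch of that workflow (the file names are made up, and `echo` stands in for optipng so the line is safe to run without it installed):

```shell
# Run the optimizer over many files, four jobs at a time (-P4).
# `echo` is a stand-in for optipng here; drop it to really optimize.
printf '%s\n' one.png two.png three.png | xargs -n1 -P4 echo optipng -o5
```

With parallel the same fan-out would read something like `find . -name '*.png' | parallel optipng -o5`.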
Having never used parallel, I still believe parallel's output was correct:
./Ham'
'Jam'
'Spam
would be identical to ./Ham\nJam\nSpam (if \n is the correct way to write the newline here) or
'./Ham
Jam
Spam'
This would be identical to what you wrote; parallel only punts to quotes when it doesn't have a canonical way of representing the character otherwise. The fact that you don't need an explicit operator to concatenate two strings in the shell may be what's throwing you off.
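A quick sketch of that implicit concatenation, using the strings from the example above:

```shell
# Adjacent quoted strings in the shell concatenate with no operator.
nl='
'
a='./Ham'"$nl"'Jam'"$nl"'Spam'   # quote-punting form, pieces glued together
b='./Ham
Jam
Spam'                            # single-quoted form with literal newlines
[ "$a" = "$b" ] && echo identical
```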
Interestingly enough, 'Ham\n\nJam\nSpam' becomes
./Ham'
''
'Jam'
'Spam
So parallel is just literally wrapping every newline in quotes. I believe this would be identical, and if you analyzed it you'd see that the two newlines are right next to each other.
What if your command has a pipe in it? Then putting echo in front won't work, because the command after the pipe still executes. The --dry-run option always works and doesn't require editing the command itself.
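A minimal illustration of the pipe pitfall (the grep command and log file name are made up):

```shell
# Prefixing echo only neutralizes the stage before the pipe;
# wc still executes and counts echo's single line of output
# instead of the command being printed for inspection.
echo grep ERROR app.log | wc -l
```

parallel --dry-run avoids this because parallel itself prints the fully built command, pipe and all, without running anything.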
Nice. I just had to check to see if xargs has a similar feature. It doesn't, though the --no-run-if-empty and --verbose options are both handy. I believe I've used xargs with "echo <commandlist>" to proof output before committing it.
You can simply re-run that piped to bash (or your shell of choice) to execute commands if you wish (say, if parallel isn't available).
E.g.,
echo foo bar baz | xargs -n1 -t echo ls | bash
... will execute 'ls foo; ls bar; ls baz', while showing the expanded command.
Depending on how you want to do your counts or piping, you could run that after the xargs / parallel execution; that would also be much more efficient (fewer processes and execs) anyhow.
I use GNU Parallel with s3cmd to move big data sets in and out of S3. I can easily saturate any network connection. I was able to GET ~2TB from S3 onto a Gluster cluster in a little more than an hour by using GNU Parallel to spread the GETs across 8 instances. Incredibly powerful, easy to use tool.
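A rough local sketch of that fan-out (bucket and key names are invented, and `echo` stands in for s3cmd so this runs without credentials; the multi-instance version would hand parallel a list of hosts via --sshloginfile instead):

```shell
# Eight concurrent GETs; `echo` is a stand-in for s3cmd here.
printf 's3://mybucket/part-%d.gz\n' 1 2 3 | xargs -n1 -P8 echo s3cmd get
```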
This is exactly how I feel every time I see an article or presentation about tools (maybe even really small ones) that just get the job done: tools I didn't know about and never thought to look for, even though I can think of so many cases where they would have been incredibly useful.