GNU Parallel - build and execute command lines from standard input in parallel (savannah.gnu.org)
150 points by jcsalterego on Oct 17, 2010 | 36 comments



I'll express here what I feel while reading any man page:

Examples should be at the top!

(10 years of frustration in that one line message :p)


I've just resigned myself to searching for /EXAMPLES as soon as I need one.


    LESS='+/EXAMPLE' man parallel


This is very cool, but a little opaque at first read... For a quicker, more digestible intro, watch the video introduction (linked on the page): http://www.youtube.com/watch?v=OpaiGYxkSuQ


Well, the example (at least the first one) in that video is a bit skewed. First he runs gzip, then immediately runs 'parallel gzip' without dropping the disk caches, so in the latter case the bottleneck is the CPU rather than disk IO (everything is read from the disk cache in RAM). IMO, for work that is IO bound we won't see any significant improvement from parallel or anything similar.
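
(For a fairer comparison on Linux you can drop the page cache between runs. A minimal sketch, assuming root and Linux's /proc interface; the .log filenames are made up:)

    # flush dirty pages, then drop the page cache so both runs start cold
    sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
    time gzip *.log

    sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
    time parallel gzip ::: *.log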


Ideas for next video are most welcome. The ideal task:

1. Is single threaded

2. Takes a lot of CPU

3. Is a task that everyone can understand and relate to, and which is close to a real world scenario

I have loads of examples meeting requirements 1 and 2. It is 3 that is the hard part.

Post them to parallel@gnu.org


How about doing something with ImageMagick or mencoder? I think video encoding/decoding gives a nice balance between disk and CPU usage.


Here's an ImageMagick example; over six minutes with xargs, under 20 seconds with parallel:

  $ ls *.png |wc -l
  3580

  $ time ls|sed 's/\(.*\)\..*/\1/'|parallel convert {}.png {}.ppm
  ls --color  0.00s user 0.01s system 63% cpu 0.016 total
  sed 's/\(.*\)\..*/\1/'  0.01s user 0.00s system 39% cpu 0.025 total
  parallel convert {}.png {}.ppm  97.39s user 61.87s system 890% cpu 17.883 total

  $ time ls|sed 's/\(.*\)\..*/\1/'|xargs -I {} convert {}.png {}.ppm
  ls --color  0.01s user 0.00s system 63% cpu 0.016 total
  sed 's/\(.*\)\..*/\1/'  0.01s user 0.00s system 39% cpu 0.025 total
  xargs -I {} convert {}.png {}.ppm  93.08s user 47.88s system 38% cpu 6:10.88 total
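
(For what it's worth, GNU parallel's {.} replacement string strips the extension, so the sed step can be dropped entirely; a sketch, untimed:)

    ls *.png | parallel convert {} {.}.ppm

The CPU columns above tell the same story: parallel keeps roughly nine cores busy (890% CPU) while xargs -I runs one convert at a time (38% CPU).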


Ah, one of these again.

I wrote a simpler one myself a couple of years ago (http://code.google.com/p/spawntool/). All it does is read commands from stdin, one per line, and keep a desired number of processes running until all command lines are exhausted. Simple.

I wrote my own because I got tired of all kinds of substitution and quoting issues with xargs. With spawn I only need to generate the shell commands, and instead of piping them to bash I pipe them to spawn. This also means I can easily review my command line generation with less (so that quotes etc. are good) before I eventually switch to sh or spawn.
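
(A minimal sketch of that workflow, with gzip standing in for any command; bash's printf %q does the quoting:)

    # generate the command lines and eyeball the quoting first
    for f in *.log; do printf 'gzip -9 %q\n' "$f"; done | less
    # same generator piped to sh (or to spawn/parallel) once it looks right
    for f in *.log; do printf 'gzip -9 %q\n' "$f"; done | sh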


If it is simple, I would love to see the examples from http://www.gnu.org/software/parallel/man.html#example__worki... converted to spawn.


See also the 'push' shell: http://code.google.com/p/push/


It seems like 90% of the uses for this can be taken care of with xargs:

    echo "file1 file2" | xargs -P 2 gzip


As I understand it, xargs only runs on the local machine; GNU parallel can run on remote machines as well. So parallel is the cluster-friendly version of xargs's -P.
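
(A minimal sketch, assuming passwordless ssh to a hypothetical server1; --trc transfers each input file, returns the named result, and cleans up, and ':' adds the local machine to the pool:)

    parallel -S server1,: --trc {}.gz gzip ::: *.log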


Yep. There is also dxargs, which looks very useful: http://www.semicomplete.com/blog/geekery/distributed-xargs.h...




Don't you need

    echo "file1 file2" | xargs -P2 -n1 gzip

?
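
(Without -n1, xargs packs both names into a single gzip invocation, so -P2 never has a second process to run:)

    # one gzip invocation gets both arguments; nothing runs in parallel
    echo "file1 file2" | xargs -P2 gzip
    # two gzip invocations, one file each, run concurrently
    echo "file1 file2" | xargs -P2 -n1 gzip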


Probably, although that doesn't seem to be the focus of xargs.

And the version of xargs included with Solaris 10 doesn't have the -P option, in which case installing GNU parallel is a slightly easier option than installing a different version of xargs.


Nice example from the docs:

Convert .mp3 to .ogg running one process per CPU core on local computer and server2:

    parallel --trc {.}.ogg -j+0 -S server2,: \
        'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3
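
(Reading the flags: --trc {.}.ogg transfers each .mp3 to the remote host, returns the .ogg, and cleans up; -j+0 runs one job per CPU core; -S server2,: runs on server2 plus the local machine, ':' being the local host.)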


On first impression, I like this much more than ppss. The distributed setup is much easier and the documentation is more thorough.


Has anyone managed to get this working on OS X?

Standard "./configure && make && make install" outputs errors.


The MacPorts version worked for me.

Perhaps the Portfile will have the patches you need to manually compile on OS X:

http://trac.macports.org/browser/trunk/dports/sysutils/paral...


Just tried it and got errors galore. Oh well, I'll keep trying.


Make sure you are using GNU Parallel and not another version of parallel. Try:

    parallel --version


In what ways is this better than make -j?


Because it can be run on the command line, ad hoc. make -j is great for pre-existing command lists and dependencies, but as the man page describes, parallel is like xargs, which I use all the time on the command line for ad hoc actions (it frees me from having to write a bash loop).
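
(The ad hoc contrast, with gzip as a stand-in:)

    # the bash loop this replaces, one file at a time
    for f in *.txt; do gzip "$f"; done
    # the same job ad hoc, one process per core
    parallel gzip ::: *.txt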


make requires a Makefile, whereas one can pass parameters directly to parallel.

There also seem to be a few more options revolving around job success/failure and how to react: (a) ignore failed jobs and report how many at the end, (b) cleanly exit as soon as a job fails, and (c) stop all jobs as soon as one fails. A sketch of the three follows.
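
(The flag spellings below follow the current man page and have changed across versions; (a) is the default behaviour, with the exit status counting failed jobs:)

    parallel gzip ::: *.log                      # (a) run everything; exit status counts failures
    parallel --halt soon,fail=1 gzip ::: *.log   # (b) let running jobs finish, start no new ones
    parallel --halt now,fail=1 gzip ::: *.log    # (c) kill all running jobs as soon as one fails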


Those (a), (b), and (c) points sound like strengths of make to me.


Sorry, those were features of parallel, not make (unless I'm mistaken).



parallel is intended to work on arbitrary commands.


So is make.


It can, but that was not its intended purpose. That is, you can figure out a way to map your task onto a dependency hierarchy and save it to a Makefile, but why do that when you could use something designed for the job?


Would it work on a PS3 running Ubuntu?


Should work if you install the moreutils package in Ubuntu (lynx).


The "parallel" in moreutils is an unfortunate naming collision, it is a trivial (< 200 LOC) program that is in no way comparable to GNU parallel.



