Parallel – A command-line CPU load balancer written in Rust (github.com/mmstick)
148 points by 0xmohit on Aug 26, 2016 | 63 comments



Further overloading this overloaded utility name screws up packagers and hinders usage of any of the variants. Rename it please.


Agreed. GNU Parallel does something similar and this is confusing.

But this does claim to be a reimplementation of much of GNU Parallel's functionality. If it's compatible, maybe there isn't too much of an issue.

In the interest of silly UNIXism, I hereby vote for the name "serial" instead. It's just like "more" and "less", right?


I believe it intends to support a subset of GNU parallel and be command-line compatible with that subset. And "perpendicular" would be a far better joke name, IMO. :P


Hitachi already made your theme song: https://www.youtube.com/watch?v=xb_PyKuI7II :)


subpar


The code is more complicated than it needs to be. It spawns N threads, then each thread forkexecs child processes. These threads communicate through an atomic int.

There is no need for threads. Just spawn background processes:

    echo 1 &
    echo 2 &
    wait -n   # returns as soon as any one job finishes (bash 4.3+)
    echo 3 &
    wait -n
    echo 4 &
    ...
The key is that the wait() system call will hang until any child process finishes.
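
For illustration, a rough sketch of that thread-free design in Rust (assuming the `libc` crate for wait(); the `echo` jobs are just stand-ins and error handling is omitted):

    use std::collections::VecDeque;
    use std::process::Command;

    fn main() {
        // Hypothetical job list: echo the numbers 1..8, at most 4 at a time.
        let mut jobs: VecDeque<String> = (1..=8).map(|i| i.to_string()).collect();
        let max_jobs = 4;
        let mut running = 0;

        loop {
            // Top up to the concurrency limit.
            while running < max_jobs {
                match jobs.pop_front() {
                    Some(arg) => {
                        let _ = Command::new("echo").arg(arg).spawn().expect("spawn failed");
                        running += 1;
                    }
                    None => break,
                }
            }
            if running == 0 {
                break; // nothing left to run or to reap
            }
            // Block until *any* child exits, then loop to schedule the next job.
            let mut status = 0;
            if unsafe { libc::wait(&mut status) } > 0 {
                running -= 1;
            }
        }
    }

No threads and no signal handling; the kernel's wait() does the load balancing.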


I doubt that will make it measurably more efficient. Threads are cheap.


My point wasn't optimization. Just that the author was making it more complicated than was needed.

It's like discovering that "ls" is spawning 5 threads for internal communication.


In a language with good threading support, internal threads typically make things easier rather than harder. I tend to spawn lots of threads in Rust just because I can, and it simplifies the code a lot compared to some async callback mess.

In particular there is no sane way to async waitpid() on POSIX.


pselect / self-pipe / etc. on SIGCHLD, and on receipt of SIGCHLD, loop through all known child processes with wait(WNOHANG).

... I'm not sure I can disagree with "no sane way".
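
Roughly, a hedged sketch of that self-pipe approach in Rust (assuming the `libc` crate; the `sleep` children stand in for real jobs, and error/EINTR handling is omitted). The handler only calls write(), and the main loop, where a real program would select()/poll() over several descriptors, reaps everything that has exited with waitpid(-1, ..., WNOHANG):

    use std::process::Command;
    use std::sync::atomic::{AtomicI32, Ordering};

    static WAKE_FD: AtomicI32 = AtomicI32::new(-1);

    // Async-signal-safe: a single write() to the self-pipe.
    extern "C" fn on_sigchld(_sig: libc::c_int) {
        let fd = WAKE_FD.load(Ordering::Relaxed);
        let _ = unsafe { libc::write(fd, b"x".as_ptr() as *const libc::c_void, 1) };
    }

    fn main() {
        let mut fds = [0 as libc::c_int; 2];
        unsafe {
            libc::pipe(fds.as_mut_ptr());
            WAKE_FD.store(fds[1], Ordering::Relaxed);
            libc::signal(
                libc::SIGCHLD,
                on_sigchld as extern "C" fn(libc::c_int) as libc::sighandler_t,
            );
        }

        let mut live = 0;
        for secs in ["1", "2", "3"] {
            Command::new("sleep").arg(secs).spawn().expect("spawn failed");
            live += 1;
        }

        let mut buf = [0u8; 64];
        while live > 0 {
            // Block until the handler pokes the pipe (this is where a real
            // select/poll loop would go), then drain it.
            let _ = unsafe { libc::read(fds[0], buf.as_mut_ptr() as *mut libc::c_void, buf.len()) };
            // Reap every child that has exited so far.
            loop {
                let mut status = 0;
                let pid = unsafe { libc::waitpid(-1, &mut status, libc::WNOHANG) };
                if pid <= 0 {
                    break;
                }
                live -= 1;
            }
        }
    }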


The only thing you can do from a signal handler is flipping a static and only one signal handler can be set. Good luck making this reusable.

In fact, I challenge you to solve the problem "spawn process; waitpid for 15 seconds; otherwise kill hard" in Rust (or C++ if you feel like it) on POSIX, once with threads and once without, sticking to what's permitted by the standard and in a way that lets multiple processes be waited for.

Then also measure CPU impact :)


> The only thing you can do from a signal handler is flipping a static and only one signal handler can be set.

This is false: you can call any async-signal-safe function. Incidentally, write() is one of them.

Another trick is the close-on-exit pipe.


"Async-signal-safe" is a C concept (from the POSIX world where C is your interface to the system, and library calls vs. system calls are behind the abstraction layer), so it doesn't directly apply to Rust. But the underlying semantics of signals are simple to describe: you get interrupted at some instruction pointer and jump into a new function. You can do whatever you want provided you uphold safety, correctness, liveness, etc.

If you change a variable, it has to be one that isn't prone to being cached in a register or the stack by the main program. POSIX's sig_atomic_t does this; in Rust you can use the normal atomic types. They are a tiny bit too careful if this is thread-local, but an ordinary thread-local variable is permitted to be cached within the same thread, and signal handlers break that.

If you take a lock, you have to do something reasonable if the lock is already held, including by the code you interrupted. So you probably shouldn't lock at all. The biggest reason for a POSIX function not to be async-signal-safe is because it wants to call malloc, which takes out a lock (at least a per-thread or per-CPU lock) on the heap. If you get signaled during a malloc, and the signal handler tries to malloc, you deadlock.

But anything that does not risk liveness or correctness problems is fair game. In particular, basically all system calls are fair game, since they're just sending a message to the kernel. C's fprintf() will want to buffer in userspace, which involves an allocation, but write() will at most buffer in the kernel, and the kernel-side code doesn't have the problem of having flow control interrupted while you're in a signal handler. Even if you were previously in a blocking write() when you received a signal, the kernel will return from its implementation of write before delivering the signal back to userspace, so there isn't a re-entrant call to the kernel-side write code. libc's fprintf() doesn't have that luxury.

(And yes, the concept of Rust on POSIX is a bit ill-defined, because POSIX is a set of C-language APIs, which can be implemented in any valid way in C, including header macros. Rust threads use pthreads, yes, but inter-thread communication doesn't involve whatever sig_atomic_t is typedef'd or #defined to.)
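
As a concrete illustration of the "store to an atomic from the handler" pattern, a minimal sketch assuming the `libc` crate to install the handler (the `sleep` child is a stand-in; a real program would pair the flag with pselect()/ppoll() rather than polling, to close the race between checking and sleeping):

    use std::process::Command;
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::{thread, time::Duration};

    static GOT_SIGCHLD: AtomicBool = AtomicBool::new(false);

    // Nothing here can allocate or take a lock.
    extern "C" fn on_sigchld(_sig: libc::c_int) {
        GOT_SIGCHLD.store(true, Ordering::SeqCst);
    }

    fn main() {
        unsafe {
            libc::signal(
                libc::SIGCHLD,
                on_sigchld as extern "C" fn(libc::c_int) as libc::sighandler_t,
            );
        }
        let _ = Command::new("sleep").arg("1").spawn().expect("spawn failed");

        loop {
            if GOT_SIGCHLD.swap(false, Ordering::SeqCst) {
                // Reap with waitpid(-1, ..., WNOHANG) here.
                break;
            }
            thread::sleep(Duration::from_millis(50));
        }
    }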


Note that the lock is a red herring. A single-threaded malloc wouldn't be async-signal-safe either unless explicitly designed to be so, which is hard. In fact, anything that touches mutable state, including thread-local state, is a problem.


Yeah, good point. The weird part is that it almost introduces a new thread, in that it introduces a new flow of control (and, if you ask for it, a new stack), but it doesn't count as a new thread in all the usual ways, like thread-local variables or PIDs. So any code called from a signal handler must be "thread-safe", but that thread-safety cannot rely on assigning an identifier to the thread, because you're using the same identifier as the parent thread.


> Yeah, good point. The weird part is that it almost introduces a new thread, in that it introduces a new flow of control (and, if you ask for it, a new stack), but it doesn't count as a new thread in all the usual ways, like thread-local variables or PIDs.

Worse: most signal handlers that do anything other than setting globals destroy errno in one way or another, and when you return, the code that was interrupted has a good chance of malfunctioning.


> This is false: you can call any async-signal-safe function. Incidentally, write() is one of them.

Which malloc() is not. Anything that might internally allocate is out of the question. The list of functions that are safe to call in C alone is very limited, and even then the question of errno arises.


why are you bringing up malloc?


Because you do not necessarily know which call will allocate memory in Rust and even in C it can be tricky. You literally can only go with the whitelisted functions.


Yes, that's why you should, as I suggested, only call functions explicitly documented as async-signal-safe.


Here's a sound implementation with signals using only the Rust standard library and POSIX, no threads:

https://play.rust-lang.org/?gist=ba4802a59f462cb8bce0c1bac92...

It would be significantly simpler with sigwaitinfo() if you didn't care to do a real select loop, but I assume most programs want a real select (or poll/epoll/whatever) loop.
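
For comparison, a hedged sketch of the threaded variant of the same challenge (spawn, wait up to 15 seconds, then kill hard), using the standard library plus the `libc` crate for the SIGKILL; one watcher thread per child, with error handling trimmed:

    use std::process::Command;
    use std::sync::mpsc;
    use std::thread;
    use std::time::Duration;

    fn main() {
        // Hypothetical long-running child standing in for the real job.
        let mut child = Command::new("sleep").arg("60").spawn().expect("spawn failed");
        let pid = child.id() as libc::pid_t;

        let (tx, rx) = mpsc::channel();
        let watcher = thread::spawn(move || {
            // Blocks in waitpid() until the child exits (or is killed below).
            let _ = tx.send(child.wait());
        });

        match rx.recv_timeout(Duration::from_secs(15)) {
            Ok(status) => println!("exited in time: {:?}", status),
            Err(_) => {
                // Deadline passed: kill hard, then collect the status the watcher reaps.
                unsafe { libc::kill(pid, libc::SIGKILL) };
                println!("timed out and killed: {:?}", rx.recv());
            }
        }
        let _ = watcher.join();
    }

The cost is one blocked thread per outstanding child, which is exactly the trade-off being argued about above.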


Which proves my point. It's messy, inefficient, and now you've written code that is not composable, since you have a global (and the only possible) signal handler.


I don't think it's less efficient than threads (but someone should test it!). It scales better.

It's not composable, but it's not impossible to work around that if you can assume non-POSIX. Platforms with kqueue just get this right because kqueue can wait for processes. Linux will let you use a different signal to alert for process completion, so pick one of the realtime signals (which is its own game of global namespaces, but hey), and library code that uses SIGCHLD won't be affected. So that's Linux, OS X, and the BSDs. I don't know of a good workaround for Solaris.

A POSIX-compliant way that makes this more composable is to make a single, dummy process to be a process group, setpgid() all your actual children into that process group, and have that process spend its time doing waitpid(0, WNOHANG) and sending you notifications over a pipe or something. That is higher overhead, but my guess is that it scales better for many child processes (O(1) extra processes vs. O(n) extra threads).


It's less efficient if the "master" has to select over multiple things, e.g. waiting for children to finish, doing some IO with them, etc.

In that case you are pretty much limited to polling on the thing (unless, apparently, you use kqueue; I did not know you could wait on process events).


This looks like a perfect real-world, complete, useful application that a beginner in Rust could look at.


A Rust beginner could also look at Jonathan Turner's lessons from solving the first 12 Project Euler problems [0].

[0] https://www.jonathanturner.org/2015/10/lessons-from-first-12...


>The owner of www.jonathanturner.org has configured their website improperly. To protect your information from being stolen, Firefox has not connected to this website.

I added an exception as I pretty much always do but in case the owner of the site is around here, they might want to investigate this.


Website owner here: odd, what configuration of Firefox gives you that warning? When I connect with a recent OS X version, it seems to work okay.


Firefox on Windows here. Looks like it's getting the certificate for github, but throwing an error because your domain isn't listed in that certificate. https://imgur.com/a/UeXtC


Yeah, it's not set up for https. Do you see the same thing with http://www.jonathanturner.org/?


Nope, that's fine. Not sure why 0xmohit posted the HTTPS link in the first place.


Same error for Firefox on Linux.


I've just pinged them, thanks.


I always tended to autogenerate a makefile with fake goals like:

    all: g1 g2

    g1:
        job1

    g2:
        job2

with `all` being the one goal that depends on them all, and then I used make -j 4


From: https://www.gnu.org/software/parallel/history.html

> parallel dates back to around the same time. It was originally a wrapper that generated a makefile and used make -j to do the parallelization.


parallel is awesome for when you want to run a number of jobs that isn't known in advance, such as using it as an xargs replacement:

  find . -type f | parallel --jobs 5 process_file
That will run "process_file foo" for every file in the directory, with 5 processes running in parallel.


Not being a jerk, but what does parallel bring to the table that xargs doesn't?

  find . -type f | xargs -I {} -n 1 -P 5 process_file {}


You can have remote workers with parallel (not sure if you can do that with xargs or not).

    parallel --progress --wd '.' -S '4/fourworkers@blehworker.org,8/eightworkers@eightworkers.org,2/:' "./complicated_command --param1={1} --param2={2}" ::: $(some_shellcommand_to_generate_param1_values.sh) ::: $(some_shellcommand_to_generate_param2_values.sh)
It's very flexible, and it's become my go-to for speeding up parallelizable work. The next step up would be Celery or Spark or something.



Not comparing it with the simpler, better moreutils parallel is really unfair.


I hope you won't take this the wrong way, but can you elaborate on why you feel Tollef's parallel (from moreutils) is better than GNU Parallel?


It was addressed elsewhere in the thread. My take is: 1/ it's written in C. 2/ it doesn't suffer from feature creep.

The code is 427 lines of C. GNU parallel is 10k+ lines of Perl. Considering mmstick compares the loading times, it's easy to see where the difference comes from.


I wrote a Go utility that is not designed to be a direct replacement, but rather a nicer CLI tool that does something quite similar. My version is quite a bit more limited, but totally meets my current needs.

https://github.com/amattn/paral


Why can this be faster than GNU parallel?


GNU Parallel is extremely sluggish because it does all sorts of different things behind your back: It buffers output on disk (so the output from different jobs is not mixed and you are not limited by the amount of RAM - it will even compress the buffering if you are short on disk space), it checks if the disk is full for every job (so you do not end up with missing output), it gives every process its own process group (so a process with children can be killed reliably with --timeout and --memfree), and a lot of other stuff.

It lets you code your own replacement string (using --rpl), and lets you make composed commands with shell syntax:

    myfunc() { echo joe $*; }
    export -f myfunc
    parallel 'if [ "{}" == "a" ] ; then myfunc {} > {}; fi' ::: a b c
It does not need a special compiler, but runs on most platforms that have Perl >=5.8. Input can be larger than memory, so this:

    yes `seq 10000` | parallel true
will not cause your memory to run full.

You can read a lot more about the design in `man parallel_design` and see the evolution of overhead time per job compared to each release on: https://www.gnu.org/software/parallel/process-time-j2-1700MH...

In other words: Treat GNU Parallel as the reliable Volvo that has a lot of flexibility and will get the job done with no nasty corner case surprises.

It is no doubt possible to make a better specialized tool for situations where the overhead of a few ms per job is an issue and where you neither need brakes, seatbelts nor airbags. xargs is an example of such a tool, and you can have both GNU Parallel and xargs installed side by side.


One possible reason is that GNU parallel is a perl script.


A version in C that I think was first released before GNU parallel is in "moreutils": https://joeyh.name/code/moreutils/

A few years ago, Debian made GNU parallel provide the "/usr/bin/parallel" executable, instead of moreutils. The maintainer of moreutils had some interesting things to say about that: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=597050#75


> It is 5143 lines of code, and the antithesis of a simple, clean, well-designed unix command.

Only 5k? :D

But, yes, the criticisms are valid. I recommend moreutils.


He lost me at the point where he complained that GNU parallel "includes the ability to transfer files between computers". For me at least that is _the_ feature of GNU parallel that actually makes it really useful. Which I guess is the problem with all these discussions: one person's useless bloat is another person's no. 1 killer feature.


But in the spirit of Unix, shouldn't parallel exec some file transfer program?


The actual Perl code calls out to ssh and rsync (or can be configured to use something else) when it's time to actually connect and transfer files. It just does it in a way that is nice and reasonably transparent to the end user.

It felt like his complaint was that that was 'bloat' since Real Men can achieve almost the same thing by just piping some output through some bash scripts they just hacked together.


And it is exactly the hacking part that GNU Parallel tries to help with: A lot of the helper functions in GNU Parallel could be done by expert users (--nice, --tmux, --pipepart, env_parallel, --compress, --fifo, --cat, --transfer, --return, --cleanup, --(n)onall).

But non-expert users will invariably make mistakes (e.g. get quoting wrong, not getting remote jobs to die if the controlling process is killed, or re-scheduling jobs that were killed by --timeout), and why not just have small wrapper scripts built into GNU Parallel that are well-tested, so the non-expert users can enjoy the same stability as the expert users?


Having written my fair share of those hacky wrapper scripts before I discovered GNU parallel, I certainly am very happy that it offers everything I need in a single, easy-to-use command.


It does: Rsync.


One of the reasons for the high line count is the decision not to depend on modules that are not part of the core of Perl 5.8.

Quite a few of the lines are to deal with different flavours of operating systems.

The benefit of this is that by copying the same single file you can have GNU Parallel running on FreeBSD 8, Centos 3.9, and Cygwin.


> a version in C that I think was first released before GNU parallel

While this is technically correct, it is misleading: GNU Parallel existed before it became GNU. See details on: https://www.gnu.org/software/parallel/history.html


I'm sure that's the cause. Their test script is just a straight `echo` of the input, so each test process will exit essentially immediately - it's unlikely the parallel aspect actually kicks in to any decent degree. The majority of the test is spent in the Rust/Perl code vs actually running commands. That said, while the test isn't hugely useful, the fact that this Rust implementation has much less overhead is still a notable improvement.

To add to this, parallel mentions as much in its man pages (that there is a certain startup cost, and a certain job-startup cost), and offers tips for speeding up the processing of jobs which exit fairly fast. But there's no reason you couldn't also do those things in the Rust version, so it's going to win every time. When dealing with commands which take a while to complete though, the extra overhead of the perl script would probably be negligible.


Err, no. It execs other processes. The runtime overhead of running the interpreter is irrelevant, as it adds no overhead on top of the general runtime of the sub-processes.


GNU Parallel takes a surprising amount of CPU time. It does have various tasks (tracking its children, feeding them input, gathering all their output and printing it to the screen in the correct order), but I'm still surprised by how much CPU it takes.


Hmm, the example put up in the GitHub README is a bad test case (for performance comparison), as it does almost no processing in the actual subprocesses. A better one could be creating thousands of small files (with dd or something in a directory, though that's I/O-bound, not CPU-bound). Best might be some kind of repeated floating-point exponentiation.


Like headlines that are questions (answer: "no"; next), I tend to skip over headlines that say "written in $language". It usually means they've copied something that already exists, or that it doesn't have much distinctive about it other than being written in that language.


I often use make to run jobs in parallel. It is nice to be able to continue after an error or breakage, especially when working with large workloads.


Using Makefiles, however, depends on the results being files. If that is not the case, make cannot tell how far you got.

GNU Parallel has --joblog and can continue from where it left off or retry all failed jobs again.



