Unix: When pipes get names (itworld.com)
179 points by conductor on Oct 8, 2013 | 66 comments



Don't forget the ordinary UNIX domain socket: you can think of them as TCP connections bound to 127.0.0.1, but assigned a path name instead of a port number. This can be useful if you have e.g. 3 instances of the same program that need to do some client/server work: rather than figuring out a port number assignment, they can use a UNIX socket. Linux also has an extension to this mechanism (the "abstract" namespace), where the UNIX socket does not create a corresponding file.

In addition to doing cool things like passing open files over a UNIX socket, or credentials (your remote server can verify the UID of whoever opened the socket at the other end!), SOCK_DGRAM UNIX sockets are reliable and accept packets beyond normal UDP limits. This is useful for load balancing (you don't have to proxy your connection but can pass the newly accepted socket directly to whoever is going to send data to it).

So if you want to make a simple server that requests/replies with good-sized packets, you can use AF_LOCAL/SOCK_DGRAM: the client calls bind() and sendto(), while the server calls bind(), then recvfrom() (which fills in the address of the caller) and sendto() back.
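To make that concrete, here's roughly what both flavours look like from the shell (socket paths are made up; the stream example assumes a netcat with -U support, e.g. the OpenBSD variant, and the datagram example leans on socat's UNIX-RECVFROM/UNIX-SENDTO addresses instead of hand-rolled bind()/sendto()/recvfrom() calls):

    # stream flavour: a path instead of a port
    nc -lU /tmp/demo.sock                       # "server"
    echo hello | nc -U /tmp/demo.sock           # "client", from another terminal

    # datagram flavour: the server binds a SOCK_DGRAM socket and answers each
    # request; the client binds its own socket so the reply has somewhere to go
    socat UNIX-RECVFROM:/tmp/dgram.sock,fork EXEC:cat &
    echo ping | socat - UNIX-SENDTO:/tmp/dgram.sock,bind=/tmp/dgram-client.sock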


For people like me who were aware that this magic exists, but not how it is done, you might want to look at:

http://www.normalesup.org/~george/comp/libancillary/ [code]

http://www.lst.de/~okir/blackhats/node121.html [description]


Can you give a link to good example code and documentation about this?


http://nodejs.org/api/net.html#net_net provides one example: instead of binding to a host/port, it listens on a UNIX socket.


I feel I should mention Plan 9's pipeline branching. Not named pipes, but something which I guess served a significant subset of named pipes' use cases. The Plan 9 shell, rc, had a feature where when a subprocess was invoked with the syntax:

  <{ls}
that string would get replaced by a filename, which when opened would be hooked up to the stdout (or stdin for >) of the invoked command.

Used as an argument, this allowed you to essentially do non-linear piping in a mostly transparent way. I say mostly because I don't think you could seek.

Every once in a while I find myself wishing there was something like this in bash. Maybe there is...


In bash, that's referred to as process substitution[1]; the syntax is only slightly different:

    <(command)
Edit to add: both ksh and zsh support process substitution, too:

    ksh: <(command)
    zsh: either =(command) -- uses a temporary file
             or <(command) -- uses a named pipe
[1]: http://tldp.org/LDP/abs/html/process-sub.html


You shouldn't bring up Plan9 when discussing Linux.

It's akin to bringing up Mozart's writing symphonies at the age of 8 to the proud mother of a somewhat disabled child that has finally mastered the potty at the tender age of 12.

Seriously, you can literally list every single new IPC introduced in *nix in the last 3 decades and counter it with a better design and execution under Plan9. It's simply the result of a proper design as opposed to an afterthought. No real point in bringing it up unless you can constructively use the information to fix Linux.


Why? This seems like an issue that could be fixed in the shell, i.e. no need to involve any kernel design at all.


yea your comment certainly is constructive.


It's not. That's the point. You can't get constructive about this comparison. I suppose the only exception is to go the Wayland path and push IPC outside the kernel into userland. At least there you can reason redundancy vs. redundant... But I honestly can't think of a single instance when that led to actual work getting done ;)


> You shouldn't bring up Plan9 when discussing Linux.

But the discussion isn't about Linux.


Wow. This is entirely incorrect on a simple, factual level.


you can run

  diff <(ls /path/to/dir/1) <(ls /path/to/dir/2)
in bash/zsh - I'm not sure if it's a per-executable thing or actually built into the shell?


ah, so you can! brilliant. thanks for that!


> I'm not sure if it's a per-executable thing or actually built into the shell?

It's built into the shell in zsh, and it must also be in bash. By way of explaining how I know, I am first going to tell you a little bit about the program called strace, but probably not backwards-talking little people:

http://en.wikipedia.org/wiki/Strace

When invoked like so:

    strace diff <(ls /path/to/dir/1) <(ls /path/to/dir/2)
strace positions itself between the running process which it invokes (in this case, diff) and the OS kernel. It then prints for you all of the communications traffic, in the form of system calls and return values, between the kernel and the process. Because it runs after the shell has expanded the command line, it gets to see the command line as the program sees it, as opposed to how it was typed.

Now, the output scrolls by pretty fast (there is a -o option to save the output to a file, which I neglected to bother with here), so I saw the end first:

    stat("/proc/self/fd/11", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
    stat("/proc/self/fd/12", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
    open("/proc/self/fd/11", O_RDONLY)      = 5
    open("/proc/self/fd/12", O_RDONLY)      = 7
and so on. Obviously, diff is getting information from /proc/self/fd/11 and /proc/self/fd/12. Weird. Where do those come from?

Scrolling up to the top of my xterm, we have the answer:

    execve("/usr/bin/diff", ["diff", "/proc/self/fd/11", "/proc/self/fd/12"], [/* 68 vars */]) = 0
You can look up execve for yourself; the upshot is, its second argument is the argv passed in to the process by the runtime environment, which reflects the command line after shell expansion. Thus, the <(ls foodir) stuff is a shell feature, and not per-command.


Or, more easily:

    echo <(:)


Ok, you're really going to have to explain that to me - what is (:) ? Or : for that matter!?



List your files and count them, too:

    ls -l|tee >(wc -l)
Works in bash, but not sh.


Nice example.

One minor correction I'd make though, is that you don't need '-l' in your 'ls' parameters, as 'ls' automatically outputs one item per line when you're piping its output; plus, the '-l' option adds an additional line.

Though if you did still want to see the file permissions but have a correct line count then the following would crop the 1st line from 'ls -l':

    ls -l | sed 1d | tee >(wc -l)
Weirdly though, I thought 'ls' had an option to show a file count, but I can't find it in 'man'.


Not only did Plan 9 have named pipes, it lacked any other sort. Pipes were implemented as a synthetic filesystem.

  http://man.cat-v.org/plan_9_2nd_ed/3/pipe


Speaking of pipes, what would be a good way to branch pipes for some work and then join them back again in order? Running:

  ls | pee "./cruncher" "./muncher"
will produce output from both ./cruncher and ./muncher in a mixed order. But often it would be more useful to synchronize the output so that all of ./cruncher output comes first and then ./muncher output.

You could redirect ./cruncher and ./muncher output to individual named pipes and then pipe a 'cat cruncherpipe muncherpipe' to follow that:

  mkfifo cout mout
  ls | pee "./cruncher >cout" "./muncher >mout" | cat cout mout
But that requires the manual precreation of the named pipes which isn't very clean. Using <(./cruncher) on the command line would automatically create an output pipe but doesn't allow redirecting the pipeline to each of the subcommands.

Any brilliant ideas?


You need to buffer the input in any case, so why not just:

    LS=$(ls)
    ./cruncher <<END
    $LS
    END
    ./muncher <<END
    $LS
    END


This is why I visit here regularly. Even though I have been tinkering with Unix/Linux/BSD for years, I was unaware of named pipes. I might never have a need for it but it is cool to know.

Conductor, thanks for posting the link.


Are there any particularly clever uses for named pipes in shell scripts? I don't think I've ever seen any.


Example 1:

In one shell:

mplayer -input file=named_pipe

In a different shell:

echo 'pause' > named_pipe

or:

echo 'quit' > named_pipe

So you can control your music player easily. (I bind the echo script to hotkeys.)

Example 2:

mysql LOAD DATA INFILE wants a filename. So I made a named pipe and sent my data there (putting that command in the background since it will hang till the data is read). Then I used that named pipe as the filename for LOAD DATA.
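In sketch form (table and path names invented; note that plain LOAD DATA INFILE reads the file on the server side, so the server has to be able to see the pipe):

    mkfifo /tmp/import.pipe
    ./generate_rows > /tmp/import.pipe &    # hangs until mysql starts reading
    mysql mydb -e "LOAD DATA INFILE '/tmp/import.pipe' INTO TABLE mytable"
    rm /tmp/import.pipe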

Example 3:

You want to send your data over ssh, but you also want to send commands over ssh (you can't do both at the same time, except in the command line which might not be enough). You could send the data over a second ssh session - but then you can't do your preliminary commands first.

So you make a named pipe on the server and send the data to it in one ssh session. Then you open a second ssh session, do your preliminary commands, then your data command using the named pipe as a filename.

(There are other ways to do this with renaming file descriptors in a shell, but a named pipe is easier, and much more flexible.)

Example 4:

You want to start a long running process that waits for data. You let it read from the named pipe, and then you can write to the named pipe from any other program as needed.
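Example 4 in sketch form (names invented). The one gotcha is that the reader sees EOF whenever the last writer closes the pipe, so either loop around the read or keep a dummy write descriptor open:

    mkfifo /tmp/worker.pipe
    ./long_running_worker < /tmp/worker.pipe &
    exec 3> /tmp/worker.pipe    # hold the pipe open so the worker doesn't
                                # see EOF between writers

    # later, from any other script or shell:
    echo "do-something" > /tmp/worker.pipe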


I actually did exactly this a few years ago (I thought I was so clever). I had a video with hardcoded subtitles, which I wanted to transcribe; I couldn't just let mplayer run because I cannot transcribe in realtime, and slowing it down would still be a problem because it would take much more time and would necessitate backtracking.

So, I set up a FIFO as described, and then I bound the key '1' in Emacs to echo 'p' to it. Voila. I could seamlessly pause and unpause the video while typing away furiously. The strategy was quite a hack, but it worked very well for my purpose.


> You want to send your data over ssh, but you also want to send commands over ssh (you can't do both at the same time, except in the command line which might not be enough). You could send the data over a second ssh session - but then you can't do your preliminary commands first.

SSH already has a Control facility for this.


That won't help. How would you feed the data to stdin of a command run in a different process?

The Control facility is for multiplexing the data streams, but that solves a different problem.


The Control facility lets you reuse the connection, but not the session. So you still need two sessions, say, one ssh and one scp.


I don't think named pipes work particularly well in shell scripts. A program writing to a pipe will not terminate until the output is consumed. However, consuming the output will not happen until the program finishes and the script arrives at the program that reads from the pipe. Similarly, a program reading from the pipe will not terminate until the program writing to it does. If you start reading before you start writing, then you simply wait. Anonymous pipes solve this problem by launching all of the programs at the same time. You could work around this by telling the shell to run the command in the background (append & in bash).
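A tiny illustration of both the failure mode and the workaround:

    mkfifo p
    echo hello > p    # hangs: the open() blocks until someone opens p for reading
    cat p             # never reached

    # works: run the writer in the background so the reader gets started too
    echo hello > p &
    cat p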

Having said that, I have had occasion to use named pipes in a shell script. Basically, I had two programs and I wanted to determine which lines were in one, but not both, of their outputs. Using named pipes, I was able to do something like:

  mkfifo foo
  ./prog1 > foo &
  ./prog2 > foo &
  cat foo | sort | uniq -u
Effectively, this provides a way to combine the output of multiple commands. I took it as a matter of faith that doing this won't interleave lines. I assume that as long as both programs flush only at line breaks then there is not any problem.


Agreed, every time I've used a named pipe in a shellscript it's caused me nothing but pain.

IIRC, they're guaranteed not to interleave write()s less than PIPE_BUF (4k?) but if your program is buffering internally it might not end writes on a newline.


You don't need named pipes for this:

{ ./prog1; ./prog2; } | sort | uniq -u


Does:

    sort < foo | uniq -u
not work with named pipes?


It works fine.

The advantage to

    cat foo | sort | uniq -u
is that it is (marginally) easier to drop in another filter in front of sort.


This argument is less strong in the face of:

  <foo  sort | uniq -u
(redirections can be at the beginning too.)

Also: sort -u instead of sort|uniq.


It's still a reason to disprefer the form that was given. I don't see any reason to prefer the cat version over yours, except habits, though. That said, there's only weak reason to avoid the cat.


It is also simpler, in the sense that it presents a left-to-right order that visually illustrates the pipeline, without the incongruous left-facing angle bracket.


Only time I've used it is as a poor man's screen -x

  mkfifo file; script -f file
The other user simply does

  cat file


Sort a file in "parallel" (note quotes!) in a multicore machine (note I'm showing 2 cores but this could be generalized to more):

    mkfifo split_aa
    mkfifo split_ab
    mkfifo sorted_aa
    mkfifo sorted_ab
    split -n2 input_file split_ & 
    sort split_aa >> sorted_aa & 
    sort split_ab >> sorted_ab & 
    sort -m sorted_aa sorted_ab > final_sorted
It's left as an exercise for the reader to determine if this actually saves any time, and if this technique could be used to implement a truly parallel sort.


What about?

    sort --parallel=2 input_file


Well, yeah, that's the easy way. :)

This is mostly to illustrate a technique of "parallel processing using fifos and split input" It's almost like a miniature mapreduce running locally. See also "bashreduce" (https://github.com/erikfrey/bashreduce) which I didn't write.


In this discussion† from yesterday, the task of finding the longest path below a directory came up. You can see both a clumsy solution, using a named pipe and iterative Python, and an elegant one-liner (in a reply), using an anonymous pipe and functional Python.

This is not to say named pipes do not have their uses. Just that you should think before reaching for them first.

https://news.ycombinator.com/item?id=6510640


If you have a slow-starting program, you can use named pipes as its stdin/stdout. Then write a little script that writes to the inpipe and reads from the outpipe (interleaving the two so as not to fill up the pipe buffer; I think that's about 4k or so). Now that little script starts up "instantly" compared to the original slow-starting program. http://mywiki.wooledge.org/NamedPipes shows an outline of it.
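Very roughly (program and pipe names are invented, and a real version needs to keep the write end open and match requests to replies, which is what that page covers):

    # one-time setup: start the slow program once, wired to two named pipes
    mkfifo /tmp/slow.in /tmp/slow.out
    ./slow_starting_program < /tmp/slow.in > /tmp/slow.out &

    # the "fast" wrapper each caller runs instead
    echo "$1" > /tmp/slow.in    # hand the request to the already-running process
    head -n 1 /tmp/slow.out     # read one line of reply back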

http://mywiki.wooledge.org/ProcessManagement#I_want_to_proce.... is another neat trick: when a background process ends, make it echo to some pipe; some other "controller" process is continually reading from that pipe. Since the read is blocking, it'll wait patiently for the echo before it does whatever it should do (maybe even based on whatever it read from the pipe). So much less hacky than sleep-polling :-)
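Something like this, say (job names invented; opening the FIFO read/write works on Linux and keeps the controller from seeing EOF between jobs):

    mkfifo /tmp/done.pipe
    ( ./long_job_a; echo job_a > /tmp/done.pipe ) &
    ( ./long_job_b; echo job_b > /tmp/done.pipe ) &

    exec 3<> /tmp/done.pipe     # controller holds the pipe open read/write
    for i in 1 2; do
        read -r finished <&3    # blocks until some job announces it's done
        echo "$finished completed, running the follow-up step"
    done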


I use a FIFO to pipe alpine's new mail alert to notifyd; not sure how clever that is, but it works for me.


Can you post the script? I'd love to use it.


Used to use my own wee C prog that just passed the alert unparsed to libnotify; currently using a modified version of this far superior shell script (which I can't take any credit for): https://github.com/Bruce-Connor/alpine-osd-notify/blob/maste... (You have to set newmail-fifo-path in your pinerc.)


Poor man's (unauthenticated!) remote shell:

    mkfifo /tmp/buf;
    cat /tmp/buf | /bin/sh -i 2>&1 | nc -lp 31337 > /tmp/buf
And then, in another terminal:

    nc 127.0.0.1 31337


Great article! I didn't know how to make named pipes, just anonymous ones.

I was recently messing around with Arduinos and Raspberry Pis - totally clueless about baud rates and serial in general, but I had the bug so kept going, burned a few chips, then a few more, and finally did something that didn't smell of smoke : )

Then it went ding - the | in Linux was passing data one byte at a time like a Serial connection to an Arduino was. Somehow I've never looked at computers the same way since. Time to go play with some named pipes!


A named pipe will not change the buffer size. The way pipes work (both anonymous and named) is they send data in groups that are exactly as large as they came in.

So probably something is using the write function with just one character at a time. The standard wrapper over write (i.e. the stdio functions like puts and printf) can be set to buffer the data first then write it out.
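If the writer is using stdio, you can often change this from the outside with GNU coreutils' stdbuf (assuming the program doesn't override its own buffering):

    ./producer | ./consumer                # stdio block-buffers when stdout is a pipe
    stdbuf -oL ./producer | ./consumer     # force line-buffered output
    stdbuf -o0 ./producer | ./consumer     # force unbuffered output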


Is there something you can do with named pipes that you couldn't also do with a regular text file?


A regular file has two obvious disadvantages: first, you write all the data to disk and read it back again, which costs you time and disk space. In a pipe, the data is streamed, with the OS doing buffering for you. Second, if you have one process writing to a plain file and another process reading from it, you need some way of signaling when there's data ready to be read and when the reader should wait, or when the stream has ended. A pipe provides this for you: in blocking IO mode, the reader just issues a read() and blocks until the writer writes something.


What if the file is in /tmp or shared memory? Then it won't cost that much time, and there's no disk space/access cost.

The blocking/signalling is still missing though.


I don't understand why you'd do it, but sure, if you're determined to avoid pipes even in situations where they'd be really well suited to the problem, a scheme using files on a RAM-based filesystem and roll-your-own signaling/blocking/buffering could conceivably be made approximately equivalent, performance-wise, to a pipe.

Out of curiosity, is there some good reason you'd do this instead of just using a mechanism that's specifically made to solve this problem?


Well, for one thing, on some systems (e.g., OS X), /tmp is disk-backed.

But also, by writing out to /tmp on memory-backed systems, the lack of blocking means you grow memory use potentially indefinitely if the reader is slow or delayed for some reason. That will ultimately turn into swapping, which is just disk access again.

There's also no great way to truncate the beginning of a regular file.


The blocking/signalling is missing?

What's wrong with

$ ./producer > my_rambacked_file &
$ tail -f my_rambacked_file | ./consumer

Well, okay, I guess this way you don't get a signal that the producer is finished.


It's hard to have multiple processes logging to the same file, but using pipes, POSIX defines writes shorter than PIPE_BUF to be atomic. On Linux this is 4096 bytes. So if you log less than 4096 bytes per message from multiple processes, you can ensure they won't scramble their messages.

http://manpages.courier-mta.org/htmlman7/pipe.7.html

> POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1-2001 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.) The precise semantics depend on whether the file descriptor is nonblocking (O_NONBLOCK), whether there are multiple writers to the pipe, and on n, the number of bytes to be written.
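So a sketch of the multi-writer logging case looks like this (paths invented; it relies on each message being a single write() of a whole line shorter than PIPE_BUF, e.g. line-buffered output of short lines):

    mkfifo /tmp/log.pipe
    cat /tmp/log.pipe >> combined.log &    # one reader drains the pipe into the log

    ./worker_a > /tmp/log.pipe &           # any number of writers; whole-line writes
    ./worker_b > /tmp/log.pipe &           # under PIPE_BUF won't get interleaved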


A pipe is really just a memory buffer. When it gets full, the writing application will block and wait for the reading one to drain it. So in some respects they are primarily IPC and synchronisation mechanisms.


The main benefit is that it avoids needing to write to the disk (which can also be avoided by making the file on a ramdrive). In general, it behaves the same way that a normal file behaves, in the sense that you read what was written, in the order it was written.

The only difference in behavior I can think of is timing. Generally, when you read a text file, you see what is written at that moment in time (barring race conditions). With named pipes, you read what has been written, and then wait until the program writing closes the pipe. If anything gets written in the meantime, you still see it. This lets you use them for message passing in a way that normal files do not.


Besides the speed aspect, which most other people have mentioned, you can avoid running out of hard disk space.



Avoid hitting the disk.


I wonder how much of Go's (et al.) channel abstractions could be done this way.


In Go you can control the length of the channel queue, and can create a zero-buffering channel which blocks the writer until a reader is available. Unfortunately mkfifo does not seem to allow this at all.


fifos by themselves aren't useful if you're trying to do multi-process signaling with them, as input from multiple programs is not separated or delimited in any way. fifodirs are a nice signalling mechanism built on top of fifos:

http://www.skarnet.org/software/s6/fifodir.html


Is there a stack-like mkfilo ?



