Here, exec associates the file descriptor (3 here; replace with any desired descriptor) with the pipe created by mkfifo. The filesystem path to the pipe is removed immediately after we obtain a file descriptor to it, so that the only remaining reference to the pipe in the system is this script's descriptor; when the script dies, the kernel automatically frees the pipe.
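A minimal sketch of that pattern, assuming bash (the path and the descriptor number are arbitrary):

    #!/bin/bash
    fifo=/tmp/demo.$$
    mkfifo "$fifo"
    exec 3<>"$fifo"   # open read-write so the open() itself cannot block
    rm "$fifo"        # unlink the name; FD 3 is now the only reference
    echo hello >&3    # the now-anonymous pipe still works...
    read -r line <&3
    printf '%s\n' "$line"
    exec 3>&-         # ...and the kernel frees it once the last FD closes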
Rephrased for clarity: POSIX requires support for descriptors 0 through 9 at a minimum. POSIX does not specify a maximum, and shells may provide more than 9. From the spec:
> Open files are represented by decimal numbers starting with zero. The largest possible value is implementation-defined; however, all implementations shall support at least 0 to 9, inclusive, for use by the application.
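As a quick illustration, bash happily goes past 9 (a strictly minimal POSIX shell is only required to handle 0 through 9, so this may fail elsewhere):

    exec 12>/tmp/fd-demo      # open descriptor 12 for writing
    echo 'hello via fd 12' >&12
    exec 12>&-                # close it again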
Despite writing shell scripts for nearly three decades, I too was unaware of what POSIX had to say, but I can't recall ever needing more than a couple of extra FDs at most.
"Modern Unix systems offer named pipes, also known as fifos, which can be used to hand-craft arbitrary process communication topologies. However, if combined one-to-many and many-to-one piping are setup by using named pipes, another problem will occur. Due to the limited buffering offered by typical programs, deadlocks can easily occur when a process consuming data from many producers with more than one input, blocks waiting for input from one of the processes feeding it. This can cause a second feeding process to block, waiting to send its output to another one of the consumer process’s inputs, and, thereby, blocking the upstream process feeding both processes that provide data to the consumer one."
dspinellis has commented on another dgsh discussion¹ (along with you). Interestingly, it includes a light comparison to pipexec².
I stumbled upon pipexec while trying to find a battle-tested solution to extend a data-munging task where I was relying on zsh's multios³, mostly because orchestrating the interactions with a coproc'd jq for output was fighting me. There is something both frustrating and soothing about finding a seven-year-old comment pointing out why my path was doomed before I'd even started: people have solved the problem already, and people far smarter than me also found the trap.
We could do something like dgsh, but so far I haven't seen a lot of uptake / demand. Every time it's mentioned, somebody kinda wants it, and then it kinda peters out again ... still possible though.
I think flat files work fine for a lot of use cases, and once you add streaming, you also want monitoring, more control over backpressure/queue sizes, etc.
I use multios and even I'm not that attached to it. The majority of my use is combined with process substitution, and could be replaced with common-ish tools like pee¹ (or pipexec for more complex cases), as sketched below. The only occasion when I'm thankful for it is when I want to use a shell function as a target, but there are workarounds for that too.
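Roughly the shape of that replacement, with some_cmd standing in for whatever hypothetically produces the output:

    # zsh multios: stdout duplicated into two process substitutions
    some_cmd > >(gzip > out.gz) > >(wc -l)
    # moreutils' pee does the same fan-out and works from any shell
    some_cmd | pee 'gzip > out.gz' 'wc -l'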
As a noclobber user the footgun is largely hidden to me, but I feel its presence. multios without globbing support would be less useful, but would still work for most of my use cases. Scanning my shell history I see various cases of relying on zsh's ability to apply sorting and filtering to globs with multios' input redirection, but only a couple where I want that in output redirection. The input instances could easily be rewritten using cat and globbing.
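A hypothetical instance of what that input-side rewrite looks like (made-up file names):

    # zsh multios input redirection, with glob qualifiers picking the
    # three most recently modified logs (om = order by mtime)
    wc -l < *.log(om[1,3])
    # the multios-free rewrite: same glob qualifiers, fed through cat
    cat *.log(om[1,3]) | wc -l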
Even with multios unset, the behaviour differs between zsh and bash. For example, nomultios disables all the expansion, so zsh behaves more like dash, with ': >t{1,2}' creating a file instead of producing an error like bash does.
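My reading of how ': >t{1,2}' lands in each shell, worth verifying locally:

    : >t{1,2}
    # dash          no brace expansion: creates one file literally named 't{1,2}'
    # bash          the word expands to two targets: 'ambiguous redirect' error
    # zsh +multios  creates both t1 and t2, output duplicated to each
    # zsh -multios  still opens (truncates) every target, but only the
    #               last redirection receives the output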
[FWIW, I google'd multiios to link the option in my original comment. It really feels like it needs the double i, and I read the single-i name the same way you do.]
---
I'd be one of those people whose desire for dgsh-like functionality wanes. If it were a slight DSL that I could "upgrade" pipelines to, I'd probably use it, but not enough to warrant working on it or switching other tooling to support it.
The end result of this morning's pipeline was breaking my jobs up and applying some judicious use of nq² to keep track of them. I'd follow your advice and move on to more specialist tools if the job grew significantly or if it became a regular occurrence.
This is neat, but outside of a contrived ouroboros example, what’s a real-world use case for this?
There’s a natural flow of outputs becoming inputs and I’m struggling to identify a situation where I would feed things back into the source. Also, named pipes kind of solve that already.
Agreed -- their only example is `pipexec -l 2 -- [ LS /bin/ls -l ] [ GREP /bin/grep LIC ] '{LS:1>GREP:0}'` which appears to be `ls | grep LIC` with more steps. Seems like a (cool) solution without a real problem.
(I'd love to be wrong though and see a real use case for some cool feedback loop of commands)
I recently wanted this for some scripting over SSH. I basically want to run a script on a remote machine and read its output back, but implement it as a function instead of a wrapper around SSH.
Chatbots, where the bot only needs to be line-driven and you can connect it to any CLI chat interface. Or perhaps, run your AI agent attached to a shell, and have it treat standard IO as a shell session.
I suppose such feedback could be used for reaching a fixpoint. Suppose you have a build system that reads targets to be built from stdin and outputs to stdout targets that are dependent on that target and must now be rebuilt. With an ouroboros, the build system will continue to run, even if the dependency graph is dynamically cyclical, until the fixpoint is reached and the build terminates.
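A hypothetical, non-streaming rendition of that idea, with 'rebuild' standing in for an assumed tool that reads target names on stdin and prints any targets its work newly invalidated (empty output meaning the fixpoint is reached):

    work='initial-target'
    while [ -n "$work" ]; do
        work=$(printf '%s\n' "$work" | rebuild)
    done

The ouroboros version would stream the same loop through a single feedback pipe instead of re-invoking the builder each round.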
It can be used as client/server communication locally. I’ve done that (without pipexec) for bots in a multiplayer game. That way I could implement the bot ai independently and test them against each other.
I did pretty much this exact same thing too. The game runner spawns two bots and acts as the middleman, piping their output to each other. I did it in Node, though; I wouldn't want to code this in bash.
This reminds me of MIT's OpenCourseWare course 6.01SC, unit 1, on state machines, which goes all the way to building Fibonacci with them without recursion. TBH, the moment the teacher translated the state machines to electrical circuits it felt like a leap, and I couldn't quite understand their relationship; maybe I missed a prerequisite course. I tried to express that course in Rust as one of my first projects learning the language, here: https://github.com/sebosp/rustexercises/blob/develop/ocw601s... I think a similar iteration in the direction of this project would be to build the dependencies as drag-and-drop boxes in the browser (maybe egui), connect the state machines with clicks, and let you download the generated code as either bash or compilable Rust code. You know, for kids.
What a beautifully designed tool. In our Python codebase we end up reaching for inline sh scripting a lot whenever we need to pipe between processes. In a way it feels ok — after all, no one has any qualms about reaching for inline SQL to get things done, so what’s wrong with a little shell script in the middle of a Python module?
Just as there are efforts — both wise and misguided — to represent the building of an SQL query with Python syntax, what Python tools are there to build sh pipelines between processes with a more pythonic syntax? Do they provide value in excess of the novelty tax one has to pay for using a non standard library?
It's an interesting package, and I have some of the use cases it appears to address. However, the documentation is inadequate for quickly understanding how to robustly build some of the more complex cases, in particular how to build bash-style process substitution. Robust here means the pipeline exits non-zero if any of the substituted processes fail, as demonstrated by this example:
#!/bin/bash
set -beEu -o pipefail
cat <(date) <(false)
echo did not exit non-zero
If this is addressed, it would be worth spending more time to figure out pipexec.
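One partial mitigation I know of, worth double-checking against your bash version: since bash 4.4, $! is set to the PID of the most recent process substitution, so you can wait on it and propagate its status, though that only covers the last substitution in the command:

    #!/bin/bash
    set -eEu -o pipefail
    cat <(date) <(false); wait "$!"  # wait returns false's status; set -e fires
    echo this line is never reached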
Considering how trendy Airflow/Dagster are these days, and concurrency too... I assume a leaner, OS/language-agnostic solution to this problem might emerge in the not-too-distant future.
It would be cool to have a tool similar to this, but for composing a graph of commands similar to aws step functions (or Jenkins pipelines). I’d call it mapexec :)
~ $ man stdin | grep stdin,
stdin, stdout, stderr - standard I/O streams
The input stream is referred to as "standard input"; the output stream is referred to as "standard output"; and the error stream is referred to as "standard error". These terms are abbreviated to form the symbols used to refer to these files, namely stdin, stdout, and stderr.
On program startup, the integer file descriptors associated with the streams stdin, stdout, and stderr are 0, 1, and 2, respectively.
This alignment is, indeed, very much deliberate. Take a peek at this:
No need to pick sides. Manuals are a best effort by people writing code to communicate their intent. RTFM is an expression of frustration that people don't pay attention to their efforts before spouting off.
An example use case would be like so: https://unix.stackexchange.com/a/216475/585293