I find sponge really useful. Have you ever wanted to process a file through a pipeline and write the output back to the file? If you do it the naive way, it won’t work, and you’ll end up losing the contents of your file:
The shell will truncate myfile.txt before awk (or whatever command) has a chance to read it. So you can use sponge instead, which waits until it reads EOF to truncate/write the file.
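Concretely, the two versions look roughly like this (the awk program is just a placeholder):

awk '{print $1}' myfile.txt > myfile.txt          # naive: the shell truncates myfile.txt before awk runs
awk '{print $1}' myfile.txt | sponge myfile.txt   # sponge soaks up everything, then writes the file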
"Have you ever wanted to process a file through a pipeline and write the output back to the file?"
Personally, no. I prefer to output text processing results to a different file, maybe a temporary one, then replace the first file with the second file after looking at the second file.
With this sponge program I cannot check the second file before replacing the first one. The only reason I can think of not to use a second file is if I did not have enough space for the second file. In that case, I would use a fifo as the second file. (I would still look at the fifo first before overwriting the first file.)
You’re implying that all uses of `sponge` are interactive with a user actively involved.
I use sponge a lot in scripts where I've already checked and validated its behavior. Once I'm confident it works, I can just use sponge for simplicity.
Perhaps the backups are a separate process, run beforehand or after this process once several changes (perhaps to several files) have been made, so not backing up intermediate states?
Backups should be handled by an external process; otherwise, your script also needs to take into account:
- Number of backups to retain and cleaning up old versions
- Monitoring of backups. What if backing up fails?
- The script itself could unintentionally tamper with the backups
When you do, just make sure you use the `--tmpdir` option of `mktemp` to put the temp file in the same directory as the final file. This makes sure both files are on the same filesystem, so `mv` is atomic (and faster).
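A minimal sketch of that pattern, assuming GNU mktemp and using awk as a placeholder processing step:

dest=myfile.txt
tmp=$(mktemp --tmpdir="$(dirname "$dest")" tmp.XXXXXX)   # temp file lands next to the destination
awk '{print $1}' "$dest" > "$tmp"                        # placeholder pipeline
mv -f "$tmp" "$dest"                                     # same filesystem, so the rename is atomic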
I can see how this is handy, but it's also dangerous and likely to bite you in the ass more than not. I think sponge is great, but I think your example is dangerous. If you make a mistake along the way, unless it's a fatal error, you're going to lose your source data. Typo a grep and you're screwed.
The technique I found works well is to edit the file in vim and use the !G (process lines 1-N in shell) (or use emacs in a similar way). Gives you infinite undo and redo until you get the commands right. Then you can view the history and make a shell script like this using sponge.
For example, edit a file, go to line 1 and type:
!Gsort
File is run though sort and results replace the buffer. To undo, use ‘u’, to redo, use CTRL-r.
! is also a normal mode command/operator. It accepts a motion and then drops you into the command line with :{range}! pre-filled, where {range} is the range of lines covered by the motion. !G in normal mode is exactly equivalent to :.,$!
It's OK, I've been a "serious" Vim user for ~7 years and I just learned about it this year. It's such an enormous program with so much functionality that is hard to fault somebody for missing any individual piece of it.
What I find weird is that there's no analogous normal mode operator command for dropping into a command line without the ! prefix. It's easy enough to write your own (good excuse to learn about "operator pending mode"), but I often find myself scratching my head at what made it into the builtin commands and what did not.
That's kind of how I use git. I would never use "sponge" or "sed -i" outside of a git repo or with files that haven't been checked in already.
I agree it would be nice to have this in the filesystem; some filesystems support this (e.g. NILFS[1]), but none of the current "mainstream" ones do AFAIK. In the meanwhile, git works well enough.
This is not a file system I would be interested in. If you’ve ever snooped in on fs activity it is constant and overwhelming on even an average system. IDEs can have undo, vi and emacs have undo. As others in the thread have said, just use multiple files.
Personally I’d be interested in a shell having undo capability, but not a file system.
> This is not a file system I would be interested in. If you’ve ever snooped in on fs activity it is constant and overwhelming on even an average system.
I'm not sure how these sentences are connected. Are you implying that allowing undo would make those problems significantly worse? I'm not sure of that. If you have a CoW filesystem, which you probably want for other reasons, then having a continuous snapshot mode for recent activity would not need much overhead.
If you're saying there's too much activity to allow an undo, well, I assume the undo would be scoped to specific files or directories!
Right. You should always test the command first. If the data is critical, use a temporary file instead. I usually use this in scripts so I don’t have to deal with cleanup.
Couldn't `mv` or `cp` from the temp file to `/etc/passwd` be interrupted as well? I think the only way to do it atomically is a temporary file on the same filesystem as `/etc`, followed by a rename. On most systems `/tmp` will be a different filesystem from `/etc`.
mv can't be interrupted, or, more correctly, the rename system call cannot.
rename is an atomic operation from any modern filesystem's perspective: you're not writing new data, you're simply changing the name of the existing file, and it either succeeds or fails.
Keep in mind that if you're doing this, mv (the command-line tool), as opposed to the `rename` system call, falls back to copying if the source and destination files are on different filesystems, since you cannot really mv a file across filesystems!
In order to have truly atomic writes you need to:
open a new file on the same filesystem as your destination file
write contents
call fsync
call rename
call sync (if you care about the file rename itself never being reverted).
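A rough shell sketch of that sequence, assuming GNU coreutils (whose sync accepts file arguments and a -f/--file-system flag); generate-new-passwd is a made-up command standing in for whatever produces the new contents:

tmp=$(mktemp /etc/passwd.XXXXXX)      # same filesystem as the destination
generate-new-passwd > "$tmp"          # write the contents
sync "$tmp"                           # fsync the data before the rename
mv -f "$tmp" /etc/passwd              # rename(2): atomic replacement
sync -f /etc/passwd                   # flush the filesystem so the rename itself survives a crash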
Without the fsync() before rename(), on system crash, you can end up with the rename having been executed but the data of the new file not yet written to stable storage, losing the data.
ext4 on Linux (since 2009) special-cases rename() when overwriting an existing file so that it works safely even without fsync() (https://lwn.net/Articles/322823/), but that is not guaranteed by all other implementations and filesystems.
The sync() at the end is indeed not needed for the atomicity, it just allows you to know that after its completion the rename will not "roll back" anymore on a crash. IIRC you can also use fsync() on the parent directory to achieve this, avoiding sync() / syncfs().
And yes, the code you highlighted is exactly this special-case in its current form. The mount option "noauto_da_alloc" can be used to disable these software-not-calling-fsync safety features.
I'd like to know why as well. The inclusion of the fsync before the rename implies to me that the filesystem isn't expected to preserve order between write and rename. It could commit a rename before committing _past_ writes, which could leave your /etc/passwd broken after an outage at a certain time. I can't tell whether that's the case or not from cursory googling (everybody just talks about read-after-write consistency). Maybe it varies by filesystem?
The final sync is just there for durability, not atomicity, like you say.
You can use `/etc/passwd.new` as a temporary file to avoid the problems you mentioned. In the worst case, you'll have an orphaned passwd.new file, but /etc/passwd is guaranteed to remain intact.
"Responsibly" is subjective here. I could argue that responsible thing to do is to use as little resources as possible, and in that case, directly overwriting the file would be the "responsible" thing to do.
> I could argue that responsible thing to do is to use as little resources as possible
No, you couldn't, because a sponge is intentionally using more resources: it soaks up as much water as it can. And the program is intended to soak up all of the output. Otherwise it would be `cat`.
This is why I usually just use a temporary directory and do a quick
git init .
git add .
git commit -m "wip"
... and proceed from there. So many ways to screw up ad hoc data processing using shell and the above can be a life saver. (Along with committing along the way, ofc.)
EDIT: Doesn't work if you have huuuuge files, obviously... but you should perhaps be using different tools for that anyway.
I guess if you want something single-file that resembles git (though now that I think about it, I'm not sure that's a requirement at all), you can also try Fossil (https://www2.fossil-scm.org).
You cannot use `sudo cat` to open a file with root privileges, because `sudo cat > foo` means "open file foo with your current privileges, then run `sudo cat` passing the file to it", and the whole root thing only happens after you already tried and failed to open the file.
I knew a person who would give something like this as a sysadmin / devops interview question. It was framed as "'sudo cat >/etc/foo' is giving a permission denied error! what's wrong?" Usually the interview candidate would go off on a tangent...
Cat won’t write the file. If you mean `command | sudo cat > file.txt`, that won’t work because the redirection is still happening in the non-root shell. You could do `command | sudo sh -c "cat > file.txt"` but that’s rather verbose.
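The usual shorter workarounds look something like this (a sketch; both work because the file is opened by a process that is itself running as root):

command | sudo tee file.txt > /dev/null    # tee, running under sudo, opens the file itself
command | sudo sponge file.txt             # same idea with sponge, if moreutils is installed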
Are there any legitimate reasons to have a particular file both as an input to a pipe and as an output? I wonder whether a shell could automatically “sponge” the pipe’s output if it detected that happening.
The shell can't detect that unless the file is also used as input via `<`. So it couldn't do that in HellsMaddy's example, since the filename is given as an arg to awk instead of being connected to its stdin.
Yeah, it seems like the kind of command that you only need because of a quirk in how the underlying system happens to work. Not something that should pollute the logic of the command, imo. I would expect a copy-on-write filesystem to be able to do this automatically for free.
> I would expect a copy-on-write filesystem to be able to do this automatically for free.
this is an artifact of how handles work (in relation to concurrency), not the filesystem.
copy-on-write still guarantees a consistent view of the data, so if you write on one handle you're going to clobber the data on the other, because that's what's in the file.
what you really want is an operator which says "I want this handle to point to the original snapshot of this data even if it's changed in the meantime", which a CoW filesystem could do, but you'd need some additional semantics here (different access-mode flag?) which isn't trivially granted just by using a CoW filesystem underneath.
Do people really use copy-on-write filesystems though? I mean it'd be great if that were a default, but I rarely encounter them, and when I do, it's only because someone intentionally set it up that way. In 30+ years of using Unix systems, I can't even definitively recall one of them having a copy-on-write filesystem in place. Which is insane considering I used VAX/VMS systems before that and it was standard there.
sed --in-place has one file as both input and output. It's not really different from any pipe of commands where the input and output files are the same. But sed -i also writes its output to a temporary file and renames it over the original, rather than truncating the file it is still reading (and it keeps a backup copy if you give -i a suffix).
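For example, with GNU sed:

sed -i.bak 's/old/new/' myfile.txt   # keeps the original as myfile.txt.bak
sed -i 's/old/new/' myfile.txt       # no backup, but still edits via a temp file and a rename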
When I found `sponge`, I couldn't help but wonder where it had been all of my life. It's nice to be able to modify an in-place file with something other than `sed`.
I usually use tac | tac as a stupid way of pausing a pipeline until the input is complete, though it doesn't work in this case. A typical use is when you want to watch the output of something slow-to-run changing, but watch doesn't work for some reason, e.g.:
while : ; do
( tput reset
run-slow-command ) | tac | tac
done
`cat` will begin output immediately. `tac` buffers the entire input in order to reverse the line order before printing it. Piping through tac ensures EOF is reached, while piping through tac a second time puts the lines back in order.
'sort -o file' does the same thing but I like the generic 'sponge'. Not sure why it's a binary, since it's basically this shell fragment (if you don't bother checking for errors etc):
(cat > $OUT.tmp; mv -f $OUT.tmp $OUT)
Hmmm ... "When possible, sponge creates or updates the output file atomically by renaming a temp file into place. (This cannot be done if TMPDIR is not in the same filesystem.)"
My shell fragment already beats sponge on this feature!
It would be nice to update to use anonymous files where supported (Linux does). This allows you to open an unnamed file in any directory so that you can do exactly this, write to it then "rename" it over another file atomically.
This was such a footgun! It may be fairly intuitive if you know how shell redirection is implemented, but it's hard to think of that at the moment you're writing a command.
Ah yes I glanced too quickly over the surface here. It does look more like redirection. Will have to look at it more. Appreciate your helpful response vs the downvoters and the one unhelpful/snarky response.
The downvoting here is the equivalent of getting shamed for “asking a stupid question at work.”
Yes I should have done a bit more homework but shaming for asking a clarifying question is unreasonable. Those of you who have the downvote trigger-finger can and should do better.
I wouldn't overinterpret the downvotes - it's impossible to know what people were thinking, and the mind tends to arrive at the most irritating, annoying, or hurtful explanation.
The same principle works the other way too - when a comment doesn't contain much information, readers tend to interpret it according to whatever they personally find the most irritating, annoying, or hurtful, and then react to that. Our minds are not our best friends this way.
The (partial) solution to this is to include enough disambiguating information in your comment. For example if your comment had contained enough information to make clear that your question was genuinely curious rather than snarkily dismissive, I doubt it would have gotten downvoted.
It's hard to do that because generally our intention is so clear and transparent to ourselves that it doesn't occur to us to include it in the message. Unfortunately for all of us on the internet, however, intent doesn't communicate itself.
I agree with you, the downvotes are unnecessary, it was actually a good question.
tee actually does sorta work for this sometimes, but it’s not guaranteed to wait until EOF. For example I tested with a 10 line file where I ran `sort -u file.txt | tee file.txt` and it worked fine. But I then tried a large json file `jq . large.json | tee large.json` and the file was truncated before jq finished reading it.
It's useful when you want to edit some input text before passing it to a different function. For example, if I want to delete many of my git branches (but not all my git branches):
$ git branch | vipe | xargs git branch -D
`vipe` will let me remove some of the branch names from the list, before they get passed to the delete command.
I mostly rely on fzf for stuff like this nowadays.
You can replace vipe with fzf —multi for example and get non-inverted behavior with fuzzy search.
What's more, I don't use it in a pipe (because of poor ^C behavior), but via a hotkey in zsh that brings up git branch | fzf, lets me select any number of branches I need, and puts them on the command line. This is extremely composable.
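A sketch of such a zsh binding (the widget name and the key are arbitrary; assumes git and fzf are on PATH):

# zsh widget: pick branches with fzf and insert them on the command line
fzf-git-branches() {
  local picked
  picked=$(git branch --format='%(refname:short)' | fzf --multi) || return
  LBUFFER+="${picked//$'\n'/ }"    # append the selected branches, space separated
  zle reset-prompt
}
zle -N fzf-git-branches
bindkey '^G' fzf-git-branches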
I vaguely recall in ~2010 coming across a Plan 9 manpage(?) that seemed to imply that Emacs could work that way in a pipe (in reference to complex/bloated tools on non-Plan 9 systems), but that wasn't true of any version of Emacs I'd ever used.
And what if you decide mid-vipe "oh crap I don't want to do this anymore"? In the case of branches to delete you could just delete every line. In other cases maybe not?
Actually, it doesn’t. ZQ is the same as :q! which quits without saving with a 0 exit code. So all of your git branches get deleted in this example, since you left the file as it was. You definitely want :cq here.
To be fair, that scenario would be just as bad or worse without vipe.
Also, you can construct your pipeline so that a blank file (or some sentinel value) returned from vipe means "abort". A good example of this is when git opens your editor for an interactive rebase: deleting all lines cancels the whole thing.
Yeah you could just as well say "oh crap I didn't mean to do that" after finishing a non-interactive command. However, at least knowing my own lazy tendencies, I could imagine feeling comfortable hitting <enter> on this command without a final careful review, because part of me thinks that I can still back out, since the interactive part isn't finished yet.
But maybe not. I haven't tried it yet (and it does seem really useful).
It will depend on the commands in question. The entire unix pipeline is instantiated in parallel, so the commands following vipe will already be running and waiting on stdin.
You could kill them before exiting the editor, if that's what you want. Or you could do something else.
The other commands in the pipeline are run by the parent shell, not vipe, so handling this would not be vipe specific.
2. Fix my text editor to recognize URLs, and when clicking on the URL, open a browser on it. This silly little thing is amazingly useful. I used to keep bookmarks in an html file which I would bring up in a browser and then click on the bookmarks. It's so much easier to just put them in a plain text file as plain text. I also use it for source code, for example the header for code files starts with:
/*
* Takes a token stream from the lexer, and parses it into an abstract syntax tree.
*
* Specification: $(LINK2 https://dlang.org/spec/grammar.html, D Grammar)
*
* Copyright: Copyright (C) 1999-2020 by The D Language Foundation, All Rights Reserved
* Authors: $(LINK2 http://www.digitalmars.com, Walter Bright)
* License: $(LINK2 http://www.boost.org/LICENSE_1_0.txt, Boost License 1.0)
* Source: $(LINK2 https://github.com/dlang/dmd/blob/master/src/dmd/parse.d, _parse.d)
* Documentation: https://dlang.org/phobos/dmd_parse.html
* Coverage: https://codecov.io/gh/dlang/dmd/src/master/src/dmd/parse.d
*/
and I'll also use URLs in the source code to reference the spec on what the code is implementing, and to refer to closed bug reports that the code fixes.
P.S. I mentioned having links in the source code to the part of the spec. The only problem with this is when the spec (i.e. the C11 Standard) is not in html form. I can only add the paragraph number in the code. What an annoying waste of time every time I want to check that the implementation is exactly right.
which is a godsend to me. Now, in the dmd code generator, I put in links to the detail page for an instruction when the code generator is generating that instruction. Oh, how marvelous that is! And there is joy in Mudville.
Intel actually lags behind the industry in that they don't really have a formal specification. Arm have a machine readable specification that can be verified by a computer whereas Intel have this weird pseudocode.
Also uops.info is a good reference for how fast the instructions are
ts timestamps each line of the input, which I've found convenient for ad-hoc first-pass profiling in combination with verbose print statements in code: the timestamps make it easy to see where long delays occur.
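For example (the build script name is made up):

./build.sh 2>&1 | ts '%H:%M:%.S'   # prefix each line with a wall-clock timestamp
./build.sh 2>&1 | ts -i            # or show the increment since the previous line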
errno is a great reference tool, to look up error numbers by name or error names by number. I use this for two purposes. First, for debugging again, when you get an errno numerically and want to know which one it was. And second, to search for the right errno code to return, in the list shown by errno -l.
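For example:

errno ENOENT     # look up the number and description for a name
errno 13         # look up the name for a number
errno -l         # list them all (pipe through grep to search descriptions)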
And finally, vipe is convenient for quick one-off pipelines, where you know you need to tweak the input at one point in the pipeline but you know the nature of the tweak would take less time to do in an editor than to write the appropriate tool invocation to do it.
It's a shame errno is even needed, and it's one of my pet peeves about Linux.
On linux, errno.h is fragmented across several places because errnos are different on different architectures. I think this started because when Linus did the initial alpha port, he bootstrapped Linux using a DEC OSF/1 userland, which meant that he had to use the DEC OSF/1 BSD-derived values of errno rather than the native linux ones so that they would run properly. I'm not sure why this wasn't cleaned up before it made it into the linux API on alpha.
At least on FreeBSD, determining what errno means what is just a grep in /usr/include/sys/errno.h. And it uses different errnos for different ABIs (eg, linux binaries get their normal errnos, not the FreeBSD ones).
That is not entirely true. There is sysexits.h, which tried to standardize some exit codes (originally meant for mail filters). It is used by, for instance, argp_parse().
Some programs do follow a specification (sysexits.h) where numbers 64 and higher are used for common uses, such as usage error, software error, file format error, etc.
That’s not an exit code; that is a fake exit code which your shell made up, since your shell has no other way to tell you that a process did not exit at all, but was killed by a signal (9, i.e. SIGKILL). This is, again, not an actual exit code, since the process did not actually exit.
See here for why you can’t use an exit status of 128+signal to fake a “killed by signal” state, either:
Today the tagline is that moreutils is a "collection of the unix tools that nobody thought to write long ago when unix was young", but the original[1] tagline was that it was a "collection of the unix tools that nobody thought to write thirty years ago".
Well, Joey started moreutils in 2006, so it's more than half way to that original "30 years ago" threshold!
The pasteboard, like many things in OS X, is from NextStep. As for why they called it a pasteboard and not a clipboard, I have no idea, presumably someone thought it would be more descriptive.
I was prepared to mock this before I even clicked, but I have to say this looks like a nice set of tools that follow the ancient Unix philosophy of "do one thing, play nice in a pipeline, stfu if you have nothing useful to say". Bookmarking this to peer at until I internalize the apps. There's even an Ubuntu package for them.
I don't think it's a good idea to rely on any of these being present. If you write a shell script to share and expect them to be there, you aren't being friendly to others, but for interactive command line use, I'm happy to adopt new tools.
> Bookmarking this to peer at until I internalize the apps. There's even an Ubuntu package for them.

Ditto. But I will probably forget they exist and go do the same old silly kludges with subshells and redirections. May I ask if anyone has a technique for introducing new CL tools into their everyday workflow? It helps if things have good man, apropos and help responses, but the problem is not how new tools function, rather remembering that they exist at all.

Sometimes I think I want a kind of terminal "clippy" that says: "Looks like you're trying to match using regular expressions, would you like me to fzf that for you?"
My two cents on this is that if you do something enough that one of these tools is a good tool for it, it’ll quickly become a habit to use the tool. And if there’s a rare case where one of these tools would have been useful but you forgot it existed, you’re probably not wasting too much time using a hackier solution.
That being said, I’ve been meaning to add a Linux and Mac install.sh script to my dotfiles repo for installing all my CLI tools, and that could probably serve as a good reminder of all the tools you’ve come across over the years that might provide some value.
> May I ask if anyone has a technique for introducing new CL tools into their everyday workflow?
Pick one tool a year. Write it on a sticky note on your monitor. Every time you're doing something on the command line, ask yourself "would $TOOL be useful here?".
You're not going to have high throughput on learning new tools this way, but it'll be pretty effective for the tools you do learn.
When you learn about a tool, actually think about a specific situation/use case where you would actually use it. Try it out as soon as you learn about it. I find I can only do one tool at a time. You remember it better if it's something you would use frequently at the moment. If I get a big old list of useful stuff, I'd be lucky to actually incorporate more than 2 of them at a time anyway. If you don't remember the tool, it's probably because you wouldn't use it enough to retain knowledge of it.
I like this one. Thanks. Mentally binding a new tool name specifically to one regular task sounds like a good entry point for a repeated learning framework.
>I don't think it's a good idea to rely on any of these being present. If you write a shell script to share and expect them to be there, you aren't being friendly to others, but for interactive command line use, I'm happy to adopt new tools.
Isn't that a shame though? Where does it say in the UNIX philosophy that the canon should be closed?
It's not that different than refraining from using non-POSIX syntax in shell scripts that are meant to be independent of a specific flavour of unix, or sticking with standard C rather than making assumptions that are only valid in one compiler.
There are shades of grey, of course. Bash is probably ubiquitous enough that it may not be a big issue if a script that's meant to be universal depends on it, as long as the script explicitly specifies bash in the shebang. Sometimes some particular functionality is not technically part of a standard but is widely enough supported in practice. Sometimes the standards (either formal or de facto) are expanded to include new functionality, and that's of course totally fine, but it's not likely to be a very quick process because there are almost certainly going to be differing opinions on what should be part of the core and what shouldn't.
Either way, sometimes you want to write for the lowest common denominator, and moreutils certainly aren't common enough that they could be considered part of that.
I think it's fine if you're on a dev team that decides to include these tools in its shared toolkit, but none of these rise to the level that I think warrants them being a dependency for a broadly-distributed shell script. There are slightly-less-terse alternatives for most of the functionality that only rely on core utilities. I don't think it's being a good citizen to say "go install moreutils and its dozen components because I wanted to use sponge instead of >output.txt".
The "What's included" section direly needs either clearer/longer descriptions, or at least links to the tools' own pages (if they have them) where their use case and usage is explained. I've understood a lot more about (some of) the tools from the comments here than from the page - and I'd likely have skipped over these very useful tools if not for these comments!
Ok, longer descriptions from the tools' man pages:
---
chronic runs a command, and arranges for its standard out and standard error to only be displayed if the command fails (exits nonzero or crashes). If the command succeeds, any extraneous output will be hidden.

A common use for chronic is for running a cron job. Rather than trying to keep the command quiet, and having to deal with mails containing accidental output when it succeeds, and not verbose enough output when it fails, you can just run it verbosely always, and use chronic to hide the successful output.
---
combine combines the lines in two files. Depending on the boolean operation specified, the contents will be combined in different ways:

and: Outputs lines that are in file1 if they are also present in file2.
not: Outputs lines that are in file1 but not in file2.
or: Outputs lines that are in file1 or file2.
xor: Outputs lines that are in either file1 or file2, but not in both files.

The input files need not be sorted.
---
ifdata can be used to check for the existence of a network interface, or to get information about the interface, such as its IP address. Unlike ifconfig or ip, ifdata has simple-to-parse output that is designed to be easily used by a shell script.
---
lckdo: Now that util-linux contains a similar command named flock, lckdo is deprecated, and will be removed from some future version of moreutils.
---
mispipe:
mispipe pipes two commands together like the shell does, but unlike piping in the shell, which returns the exit status of the last command, mispipe returns the exit status of the first command.

Note that some shells, notably bash, do offer a pipefail option; however, that option does not behave the same, since it makes a failure of any command in the pipeline be returned, not just the exit status of the first.
---
pee:
[my own description: `pee cmd1 cmd2 cmd3` takes the data from the standard input, sends copies of it to the commands cmd1, cmd2, and cmd3 (as their stdin), aggregates their outputs and provides that at the standard output.]
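A quick sketch of what that looks like (each quoted command gets its own copy of stdin, and their outputs all land on pee's stdout):

echo hello | pee 'tr a-z A-Z' rev     # prints HELLO and olleh (output order not guaranteed)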
---
sponge, ts and vipe have been described in other comments in this thread. (And I've also skipped some easier-to-understand ones like errno and isutf8 for the sake of length.)
---
zrun:
Prefixing a shell command with "zrun" causes any compressed files that are arguments of the command to be transparently uncompressed to temp files (not pipes) and the uncompressed files fed to the command.
The following compression types are supported: gz bz2 Z xz lzma lzo
[One super cool thing the man page mentions is that if you create a link named z<programname> eg. zsed, with zrun as the link target, then when you run `zsed XYZ`, zrun will read its own program name, and execute 'zrun sed XYZ' automatically.]
They come with good manpages if you install them. I don't disagree that a little more detail on the linked web page might help people decide whether or not to install, but in keeping with how these are well-behaved oldschool-style Unix commands, they also come with man pages, which is more than I can say for most command line tools people make today (the ones that assume you have Unicode glyphs and color terminals [and don't check if they're being piped or offer --no-color] and don't accept -h/--help and countless other things that kids these days have eschewed).
Anywait is the tool I've always wanted, and implemented in pretty much the same way I would do so. However, besides waiting on a pid, I occasionally wait on `pidof` instead, to wait on multiple instances of the same process, running under different shells (wait until all build jobs are done, not just the current job).
ched also looks quite useful; automatically cleaning the old data after some time is great, as I commonly leave it lying around.
age also looks great for processing recent incoming files in a large directory (my downloads, for example)
p looks great to me, I rarely need the more advanced features of parallel, and will happily trade them for color coded outputs.
I looked to see what nup is because I don't understand the description...only to find out it doesn't actually exist. I'm assuming it's intended to send a signal to a process? But if so, why not just use `kill -s sigstop`?
pad also doesn't exist, but seems like printf or column could replace it, as these are what I usually use. I think there's also a way to pad variables in bash/zsh/etc.
whl is literally just `while do_stuff; do :; done` and repeat is just `while true; do do_stuff; done`. It never even occurred to me to look for a tool to do untl; I usually just use something along the lines of `while ! do_stuff; do c=$((c + 1)); echo $c; done`. While the interval and return codes make it almost worthwhile, they themselves are still very little complexity; parsing the parameters adds more complexity than their implementation does.
spongif seems useful, but is really just a variation of the command above.
These are necessarily bash functions, not executables, but here are two tools I'm proud of, which seem similar in spirit to vidir & vipe:
# Launch $EDITOR to let you edit your env vars.
function viset() {
    if [ -z "$1" ]; then
        echo "USAGE: viset THE_ENV_VAR"
        return 1                       # return, not exit: exiting would kill the interactive shell
    else
        declare -n ref=$1              # nameref: assigning to ref sets the named variable
        local f
        f=$(mktemp)
        printf '%s\n' "${!1}" > "$f"   # dump the current value into a temp file
        $EDITOR "$f"
        ref=$(cat "$f")
        export "$1"
        rm -f "$f"
    fi
}
# Like viset, but breaks up the var on : first,
# then puts it back together after you're done editing.
# Defaults to editing PATH.
#
# TODO: Accept a -d/--delimiter option to use something besides :.
function vipath() {
    varname="${1:-PATH}"
    declare -n ref=$varname
    local f
    f=$(mktemp)
    printf '%s\n' "${!varname}" | tr : '\n' > "$f"   # one path entry per line
    $EDITOR "$f"
    ref=$(tr '\n' : < "$f")
    ref=${ref%:}                                     # drop the trailing : left by the final newline
    export "$varname"
    rm -f "$f"
}
Mostly I use vipath because I'm too lazy to figure out why tmux makes rvm so angry. . . .
I guess a cool addition to viset would be to accept more than one envvar, and show them on multiple lines. Maybe even let you edit your entire env if you give it zero args. Having autocomplete-on-tab for viset would be cool too. Maybe even let it interpret globs so you can say `viset AWS*`.
Btw I notice I'm not checking for an empty $EDITOR. That seems like it could be a problem somewhere.
`ts` is like `cat`, but each line gets prefixed with a timestamp of when that line was written. Useful when writing to log files, or finding which step of a program is taking the most time.
`sponge` allows a command to overwrite a file in-place; e.g. if we want to replace the contents of myfile.txt, to only keep lines containing 'hello', we might try this:
grep 'hello' < myfile.txt > myfile.txt
However, this won't work: the shell truncates myfile.txt when it sets up the redirection, before grep has read anything, so the contents are simply lost. The `sponge` command waits until its stdin reaches EOF before writing any output:
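grep 'hello' < myfile.txt | sponge myfile.txt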
You will want to read the man pages. 'pee' is straightforward to understand if you already understand 'tee'. 'pee' is used to pipe output from one command to two downstream commands.
I know tee but have no idea what pee would do considering this description.
From your description I guess something like « pee a b » would be the same as « a | b »? If so that’s cool, but the one line descriptions definitely need a rework.
The unix shell was designed to optimize for simple pipes. To make pipe networks you usually have to use named pipes, aka fifos.
I once had to use an IBM mainframe shell (CMS, if I remember correctly) which had pipes. However, IBM in their infinite wisdom decided to make the pipe network the primary interface; while this made complex pipes a bit less awkward, simple (single-output) pipes were wordy compared to the unix equivalent.
The need for ifdata(1) has become even more acute with the essential replacement of ifconfig(1) by ip(1), an even more inscrutable memory challenge. However, it would be even nicer if its default action when not given an interface name was to do <whatever> for all discovered interfaces.
Do not “ip -brief address show” and “ip -brief link show” serve as suitable replacements for most common uses of ifdata(1)? The ip(8) command even supports JSON output using “-json” instead of “-brief”.
In fact, I take it back. ifdata(1) is not in any way a replacement for ifconfig(1) for most things. The problem is that just running ifconfig with no arguments showed you everything, which was generally perfect for interactive use. Now to get any information from ip(1) you have to remember an argument name. If you do this a lot, it's almost certainly fine. If you do it occasionally, it's horrible.
One does have to wonder though, why isn't -brief the default and the current default set to -verbose or -long. I look at -brief on either command and it has all the information I am ever looking for.
We were discussing ifdata, not ifconfig. From the documentation, ifdata is explicitly meant for use in scripts. And in scripts, using “ip -brief” or “ip -json … | jq …” may well be suitable replacements for ifdata.
Most of them are nice to have, but they still ship an incompatible suboptimal parallel, which you explicitly have to check against in your configure, if you expect GNU parallel.
Oh so that’s it! I use GNU parallel a lot, and installed moreutils yesterday, and parallel seemed to behave a bit… different. Couldn’t quite understand why, as I didn’t expect moreutils to replace parallel when you install them.
I'm sorry, what? First, moreutils package installs its parallel as parallel-moreutils. Second, pacman (like any other pm) wouldn't allow overwriting files belonging to other packages.
So just today I was wondering if there was a cli tool (or maybe a clever use of existing tools...) that could watch the output of one command for a certain string, parse bits of that out, and then execute another command with that parsed bit as input. For example, I have a command I run that spits out a log line with a url on it, I need to usually manually copy out that url and then paste it as an arg to my other command. There are other times when I simply want to wait for something to start up (you'll usually get a line like "Dev server started on port 8080") and then execute another command.
I know that I could obviously grep the output of the first command, and then use sed or awk to manipulate the line I want to get just the url, but I'm not sure about the best way to go about the rest. In addition, I usually want to see all the output of the first command (in this case, it's not done executing, it continues to run after printing out the url), so maybe there's a way to do that with tee? But I usually ALSO don't want to intermix 2 commands in the same shell, i.e. I don't want to just have a big series of pipes, Ideally I could run the 2 commands separately in their own terminals but the 2nd command that needs the url would effectively block until it received the url output from the first command. I have a feeling maybe you could do this with named pipes or something but that's pretty far out of my league...would love to hear if this is something other folks have done or have a need for.
$ mkfifo myfifo
$ while true; do sed -rune 's/^Dev server started on port (.*)/\1/p' myfifo | xargs -n1 -I{} echo "Execute other command here with argument {}"; done
In the other terminal, run your server and tee the output to the fifo you just created:
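$ run-dev-server 2>&1 | tee myfifo    # run-dev-server stands in for your actual command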
A named pipe sounds like a good way to fulfill your requirement of having the command runs on separate shells.. In the first terminal, shove the output of commend A into the named pipe. In the second terminal, have a loop that reads from the named pipe line by line and invokes command B with the appropriate arguments.
You can create a named pipe using "mkfifo", which creates a pipe "file" with the specified name. Then, you can tell your programs to read and write to the pipe the same way you'd tell them to read and write from a normal file. You can use "<" and ">" to redirect stdout/stderr, or you can pass the file name if it's a program that expects a file name.
1. Run one command with output to a file, possibly in the background. Since you want to watch the output, run “tail --follow=name filename.log”.
2. In a second terminal, run a second tail --follow on the same log file but pipe the output to a command sequence to find and extract the URL, and then pipe that into a shell while loop; something like “while read -r url; do do-thing-with "$url"; done”.
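Roughly, with made-up command names:

# terminal 1: run the thing, keeping its full output visible
my-dev-server > server.log 2>&1 &
tail --follow=name server.log

# terminal 2: pull the URL out of the log as it appears and act on it
tail --follow=name server.log \
  | grep --line-buffered -o 'http[^ ]*' \
  | while read -r url; do do-thing-with "$url"; done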
I searched for ts for a long time. I had used it many years ago but forgot the exact name of the tool and the package. No search engine could find it, or I just entered the wrong search words.
I wrote a utility like ts before, called it teetime, and was thrilled with my pun. It was quite useful when piping stdout from a compute-heavy tool (a multi-hour EDA tool run), as you could see from the delta time between log lines what the most time-consuming parts were.
A lot of these things -- and a lot of shell tools in general -- strike me as half-baked attempts to build monads for the Unix command line. No disrespect intended; nobody understood monads when Unix was invented. But it makes me wonder what a compositional pipe-ish set of command line tools would look like if it were architected with modern monad theory in mind.
moreutils indeed has some great utils, but a minor annoyance it causes is still shipping a `parallel` tool which is relatively useless, but causes confusion for new users or conflict (for package managers) with the way way way more indispensable GNU parallel.
But that’s the point of parallel. The use case is for when you have N processors and a 10Gb NIC. Each job is CPU bound or concurrent license bound, or some jobs may take longer than others. Parallel allows you to run X jobs simultaneously to keep the CPU or licenses busy.
I have used vidir from this collection quite a bit. If you're a vi person, it makes it quite convenient to use vi/vim for renaming whole directories full of files.
Ha! I use vifm now too, since this functionality is pretty much built in, but I still use vidir for one-offs and I thought it might appeal to some of the true minimalists.
> > chronic: runs a command quietly unless it fails
> Isn't that just `command >/dev/null`?
Oftentimes you want to run a command silently (like in a build script), but if it fails with a nonzero exit status, you want to then display not only the stderr but the stdout as well. I've written makefile hacks in the past that do this to silence overly-chatty compilers where we don't really care what it's outputting unless it fails, in which case we want all the output. It would've been nice to have this tool at the time to avoid reinventing it.
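For reference, a rough shell sketch of that pattern (not exactly what chronic does; chronic keeps stdout and stderr separate, while this merges them):

# usage: quiet-unless-fail some-command args...
out=$(mktemp)
"$@" > "$out" 2>&1        # run the real command, capturing everything
status=$?
if [ "$status" -ne 0 ]; then
    cat "$out"            # on failure, replay the captured output
fi
rm -f "$out"
exit "$status"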
new ideas are great. but this isn't a new idea. "run a command silently unless it fails" is basic; it's the sort of thing that should make one think "i should search the shell man pages" rather than "i should roll my own new utility."
I guess that chronic doesn't just show stderr, but also stdout if the command fails. If I am not mistaken, your example would hide the stdout output, even when the command fails.
I think I could pick apart about half of these and show how they aren't needed, and the showpiece example is particularly weak since sed can do that entire pipeline itself in one shot, especially any version with -i.
You don't need either grep or sponge. Maybe sponge is still useful over simple shell redirection, but this example doesn't show it.
One of the other comments here suggests that the real point of sponge vs '>' is that it doesn't clobber the output until the input is all read.
In that case maybe the problem is just that the description doesn't say anything about that. But even then there is still a problem, in that it should also stress that you must not unthinkingly do "sponge > file", because the > is done by the shell and not controlled by the executable, and the shell may zero out the file immediately on parsing the line, before any of the commands get to read it.
This makes sponge prone to unpleasant surprise because it leads the user to think it prevents something it actually has no power to prevent. The user still has to operate their shell correctly to get the result they want, just like without sponge.
So it's a footgun generator.
Maybe it's still actually useful enough to be worth writing and existing, but just needs some better example to show what the point is.
To me though, from what is shown, it just looks like an even worse example of the "useless use of cat" award, where you not only use cat for no reason, you also write a new cat for no reason and then use it for no reason.
But there is still something here. Some of these sound either good or at least near some track to being good.
Just a small comment about sponge: looking at the example, it's doing "sponge file", not "sponge > file". Given that, it's totally up to sponge to decide when it's going to open the output file.
It's been a very long time since this happened, but in my early days of using Linux, I experienced naming collisions with both sponge and parallel, and at the time I didn't know how to resolve them. I don't remember which other sponge there was, but I imagine most Linux users are familiar with GNU parallel at this point.
How might one use sponge in a way that shell redirection wouldn’t be more fully-featured? The best I can currently think of is that it’s less cumbersome to wrap (for things like sudo.)
I tried to prototype my own implementation of `vidir` in Node.JS a while back (not realizing that it existed under this name in moreutils), and I ended up getting derailed after realizing how incomplete Node.JS's support was for getting the group / username corresponding to a G/UID: https://github.com/stuartpb/whomst
My favorite missing tool of all time is `ack` [0]. It's grep if grep were made now. I use it all the time, and it's the first thing I install on a new system.
It has a basic understanding of common text file structures and also directories, which makes it super powerful.
I was introduced to vidir through ranger's :bulkrename feature. Extremely handy. I don't think I've used the other stuff, but from reading the thread, vipe sounds great.
heh. I use `chronic` all the time, `ifdata` in some scripts that predate Linux switching to `ip`. I occasionally use `sponge` for things but it's almost always an alternative to doing something correctly :-)
Looking at the other comments, I suspect one of the difficulties in finding a new maintainer will be that lots of people use 2 or 3 commands from it, but nobody uses the same 2 or 3, and actually caring about all of them is a big stretch...