Moreutils – Unix tools that nobody thought to write (2012) (joeyh.name)
266 points by pmoriarty on Aug 22, 2020 | 220 comments



Nice collection of quite useful tools. Some of these, like mispipe, can be easily replicated using a more modern shell (bash, zsh); others are just shortcuts (e.g. ifne, chronic).

But what immediately stood out to me is `vidir`. I really like the idea of editing file names with an editor. Using loops and regex in a shell for mass renaming can be a mess. It should be way easier with `vim`. This tool made me install moreutils.
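A minimal sketch of the workflow (the directory name is just an example):

  $ vidir ./photos
  # $EDITOR opens with one numbered line per entry; edit the names,
  # save and quit, and vidir performs the renames (deleting a line
  # deletes the corresponding file)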


On most POSIX systems you can use fc(1) to edit a command in your $EDITOR. In vi(1) and friends you can then use "%!ls" to replace the contents of the buffer with the directory listing and edit the commands you want.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/f...

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/v...
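A sketch of the workflow (assuming fc drops you into vi; the editor is picked via FCEDIT/EDITOR):

  $ fc        # opens the previous command line in the editor
  :%!ls       # inside vi: filter the whole buffer through ls, replacing it
              # with the directory listing
  # edit the listing into the commands you want, then save and quit;
  # fc runs the resulting buffer as commands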


> you can then use "%!ls" to replace the contents of the buffer with the directory listing and edit the commands you want.

I often used :r !ls for that, thanks for the tip + it's shorter


Note that in general this is not a one-to-one replacement. `%!cmd` sends the current buffer contents to `cmd`’s stdin and replaces the buffer with `cmd`’s output. In the case of ‘ls’ this works since it doesn’t read anything from stdin.


In Emacs, you can do this by invoking shell-command-on-region, by default bound to C-u M-| - iirc if no region is active it replaces the buffer, but I rarely use it that way so I might be wrong about that. At worst, C-x h C-u M-| gets you there.


huh! I've been working on posix systems for 20 years and I didn't know that...


You just improved my life. Thank you.


Check out renameutils[1], which has the advantage over vidir of doing sanity checks before renaming.

Also, this can be done in emacs using wdired.

[1] - http://www.nongnu.org/renameutils/
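For example, renameutils' qmv opens the file list in your editor much like vidir does (a sketch; flags from memory, see qmv(1)):

  $ qmv *.jpg                             # dual-column buffer: source and editable destination
  $ qmv --format=destination-only *.jpg   # one editable name per line, closest to vidir

It refuses to apply the edits if they would collide or overwrite existing files.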


The ranger file manager also has this built in.


Was about to point this out, but it's good to know if you ssh into a box which doesn't have Python installed.

Edit: Or use a more portable terminal FM; nnn comes to mind.


MMV is reasonably powerful but straightforward, can produce & consume rename listings, and is included in a lot of distros: https://git.deb.at/w/pkg/mmv.git


See writable dired-mode in emacs. Here's a short demo: https://youtu.be/8l4YVttibiI


I use Emacs shell-mode for most of my terminals, which allows navigating and editing the buffer just like any other file. I'll often make several copies of a command, then use a macro to alter each one; after checking that they look right, pressing Enter will send them all to the shell.


I always tried to use the shell within emacs (eshell, multi-term) but it didn't render curses apps and all the fancy stuff like emojis correctly. Out of curiosity, how do you manage that?


Check out emacs-libvterm. I’ve replaced all terminal emulators with it. All the benefits of an Emacs buffer and no issues with curses or lag.


When I used Emacs, Eshell was perhaps my favorite feature. I loved how modern it felt in many ways; why shouldn’t you be able to just `cd` into a remote location (TRAMP+SSH under the hood), or invoke Elisp commands like Magit straight from the shell? It could however be quite slow, and choked on some common escape codes (e.g. the progress bars emitted by the modern Ubuntu `apt` command).

Regarding curses, I found that there were two solutions. The first is to automatically spawn curses apps in a proper terminal emulator; you just have to set up the `eshell-visual-commands` variable properly. The second is to replace curses apps with Emacs apps, e.g. htop with helm-top. Personally, I ended up going the second route after a while, as I realized that there are actually very few curses apps that are important to me, and that Emacs apps are better integrated if you use Emacs for everything else.

If you rely on a lot of curses apps, a “real” terminal like emacs-libvterm may however suit you better. Renders curses apps and emojis as well as any other terminal I’ve used. It’s also much faster than ansi-term and friends at rendering.


I use a separate terminal (st) for curses-like things. They work in Emacs ansi-term, but since that doesn't offer the same navigation/editing interface as normal Emacs buffers I don't find it compelling enough to use.

Eshell is interesting, but I can't use the bashisms I'm used to, and I can't copy commands back-and-forth between the prompt and a standalone script.

shell-mode lives in-between these two extremes: it runs a normal shell, but Emacs manages the buffer. I don't know about ansi-term or eshell (or alternatives mentioned by others), but shell-mode handles emoji fine, as well as progress bars and colour codes; I think it defaults to TERM=dumb, so many programs won't output colour, etc. unless you override it to something like TERM=xterm-256color.


Usually with ansi-term instead, if I'm honest. Also look into emojify-mode if your system doesn't handle emoji fonts well or at all.


Belatedly to add - look into the eshell-visual-* lists, to configure commands, subcommands (like "git log"), and I think some other traits, such that matching commands get their own ansi-term buffer.



It's one of the reasons I like using the nnn file manager. It has a batch rename feature with exactly that functionality.


+1 for the powerful yet minimalistic nnn where concise help is only a question mark away and the default mnemonics make sense. Wonderful piece of software.


Honestly one of the most well maintained projects I’ve ever used. The author has a coherent “philosophy” and vision of how the program should work and does a good job of preventing feature/scope creep. Don’t see that too often


There are Vim plugins that let you rename files and directories within Vim, such as https://github.com/qpkorr/vim-renamer.

Also if you happen to use fern[0] you can mark off multiple files and directories, hit a hotkey and now you can edit their paths in a Vim buffer.

[0]: https://github.com/lambdalisue/fern.vim


I've known about `moreutils` for years, but the only two I ended up actively using myself are `vidir` and `vipe`.

Outside of `moreutils`, one of the utilities I always install on all my machines is `atool`, which is just so much nicer and more intuitive than trying to remember the command-line options needed to handle the various tar.*, rar, zip, 7zip, lzip, etc. archive formats from the command line.
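For example (illustrative invocations; these wrapper commands ship with atool):

  $ aunpack project.tar.xz    # extract; stray top-level files go into a subdirectory
  $ als backup.rar            # list contents
  $ apack logs.zip logs/      # create an archive, format inferred from the extension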


you might also be interested in vimv and vimcp

https://github.com/thameera/vimv

https://github.com/danieldugas/vimcp


I used vidir yesterday for something that would have been more annoying with other tools - if you swap the numbers at the beginning of the line, then you'll safely swap the file contents. Useful if your TV episodes are named the wrong thing...


Emacs has dired for that reason. Imagine ls that you can edit.


> vidir

I just use ranger for that.


Inspired by this project I put together sysadmin-tools:

https://github.com/skx/sysadmin-util

Later they were replaced by a BusyBox-style collection of utilities written in golang (mostly to ease installation):

https://github.com/skx/sysbox


There's a few programs I wish were standard on all Unix systems by now:

* tree - print directory structure

* bat - cat, but actually designed for reading files. Syntax highlighting, line numbers, automatic paging.

* rg - better grep

* direnv - local environment variables


One of the reasons is Rust, but another reason is that the standard shied away from adding anything with frills because UNIX-likes run on all kinds of systems, many of which don’t have space for this. There is a reason why something as seemingly ubiquitous as bash or vim is not available everywhere (but sh and vi are). The base set of utilities is really “do more with less”, and the ability to keep to this philosophy means that I can connect to a router with 2 MB of MMC storage and still have an extremely functional base set of tools, rather than having some arbitrary subset of the standard utilities because they just can’t be supported on platforms like that.


Exactly, I can telnet into my shitty Chinese router and still have cat and sh


My phrasing was a bit narrow-minded there, I meant consumer *nix distributions like Debian and what have you. These all manage to make room for Thunderbird and whatever awful music client they've come up with, but lack handy CLI tools.


A few that I wish hadn't been ignored from 4BSD:

    jot -- print sequential or random data 
    rs  -- reshape a data array
    vis -- display non-printable characters in a visual format
and from Unix >V7:

    apply -- apply a command to a set of arguments
    mc -- multicolumn print
and from AT&T:

    tw -- file tree walk
And then the obligatory personal tools that have stood the test of time:

    align -- align text columns
    crop -- crop lines to a width
    dabl -- delete adjacent blank lines
    dtb -- delete trailing blanks
    emboxxen / deboxxen -- convert to/from box drawing characters
    eol -- convert line endings
    field -- simple line field extraction
    freeze / thaw -- cross-shell synchronization
    mergl -- merge lines into previous lines' blanks
    pad -- pad lines to a width
    put / take -- cross-shell pipe
    uni -- unicode character properties and search


Some of the above can easily be written in C or Python or Ruby (and some of the simpler ones even in shell), at least for simple versions, maybe without all the frills of the originals.

And seq in Linux is like jot for sequential data, at least.

And from my blog: An Unix seq-like utility in Python: https://jugad2.blogspot.com/2017/01/an-unix-seq-like-utility...


Yes, they're generally available in some form (now generally including the ‘originals’ now that V8–V10 have been published). And they're all very simple, which is the whole point of Unix tools.

I've found I'm using the `put`/`take` pair, for splitting a pipeline across shells, more again in the WFH era where I often have multiple ssh sessions into a machine, when I would probably have used the clipboard in a GUI session. Also useful for things like seeing stderr from different parts of a pipeline in different windows. `put` is just `cat >>$(mkpipe "$1")` and `take` is `cat $(mkpipe "$1")` where `mkpipe` is

    fifo="${TMPDIR:-/tmp}/fifo-$(id -u)-$1"
    test -p "$fifo" || mkfifo "$fifo"
    echo "$fifo"


>And they're all very simple, which is the whole point of Unix tools.

Not all Unix tools are simple. E.g. make. But I know what you mean - the Unix philosophy.

https://en.m.wikipedia.org/wiki/Unix_philosophy

The TAOUP book by Eric Raymond (The Art Of Unix Programming) has a lot about that.

https://en.m.wikipedia.org/wiki/The_Art_of_Unix_Programming

And my IBM developerWorks tutorial / case study on Developing a Linux command-line utility (in C) may be of interest to people who want to write their own Unix tools that play well with others.

https://jugad2.blogspot.com/2014/09/my-ibm-developerworks-ar...


> tree

I agree, tree is great. It can be somewhat replicated by using "find .", if you are in a hurry.

> bat

If you need that functionality, why don't you open the file with vim, for example?

> rg - better grep

Grep is pretty nifty, I don't see how it could be improved. What is the main advantage of rg over grep?

> direnv - local environment variables

I read the manpage for direnv and I was really scared. What is it for? I have never used an .envrc. If I need to see my environment I can simply use "env". What is it missing?


direnv is super useful if you're working in Python and have multiple projects: you can add an .envrc file that activates the right virtualenv whenever you enter a directory. It's one less thing to worry about.

I figure you can also use it in conjunction with `module load` to automatically load the right module when you enter a project's directory. Modules are used in many computing clusters to manage environments.

Also direnv will never execute an .envrc without permission, you need to run `direnv allow` whenever there is a new or newly modified .envrc.
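A minimal illustrative .envrc (the paths and variable are made up, and `direnv allow` is still needed after creating or editing it):

  # .envrc
  source .venv/bin/activate                        # assumes a virtualenv at ./.venv
  export DATABASE_URL=postgres://localhost/dev_db  # any project-local variables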


I've got a hand rolled direnv that's quite simple:

  function lenv {
    path=$PWD
    while [[ ! -f "$path/.env" && "$path" != '/' ]]; do
      path=$(dirname "$path")
    done
    if [ -f "$path/.env" ]; then
      source "$path/.env"
    fi
  }
  function cd { builtin cd "${@}"; lenv; }

It lacks the security but it works recursively and allows more than just environment variables. A common script I have is to automatically build for instance:

  here=$(dirname ${BASH_SOURCE[0]})
  function autobuild {
    (
      mkdir -p build
      cd $here/build
      ../configure
      while true; do
        make && make check
        inotifywait -qr -e close_write,delete $here/src $here/test $here/Makefile.am
      done
    )
  }
I've tried generalizing the latter, but between different targets, different build systems, different project structures, etc., copying and pasting something like this per project is the least worst option.


direnv does allow more than just environment variables, you can run arbitrary commands. Personally I wouldn't run potentially expensive commands like building the project, but if that's really what's best for your workflow, it can be done.

If there is no .envrc in the directory it will look for one in the parent directory, but it won't activate more than one.

Also one thing it does that your hand-rolled version does not is that it remembers what the environment variables were before and restores them to that state when you exit the directory.


> Personally I wouldn't run potentially expensive commands like building the project

My above example wasn't running the command, just defining it to be run manually. The .env becomes a dumping ground for all sorts of project specific stuff. The docs say direnv doesn't support this.

> Also one thing it does that your hand-rolled version does not is that it remembers what the environment variables were before and restores them to that state when you exit the directory.

I've thought of adding this, but apart from a couple of things I reset manually I haven't really found the need.


Ohh, okay, I see.

You're right, you can't define aliases/functions using direnv. That's a shame, that does seem useful.


Re: bat. I agree with you, I don’t get the hype over this program either. It’s pretty easy to configure 'less’ to use syntax highlighting and ‘less’ is installed basically everywhere.

The main advantage of rg is how much faster it is than grep. It’s WAY faster.

I don’t use direnv but I have colleagues who do. I think the idea is that it can set environment variables/run arbitrary commands when you cd into a certain directory (such as activating a Python virtual env, for instance).


I also like that rg's default mode is to search all files recursively, skipping binary files and honoring .gitignore. 90% of the time this is just what I want.


I don't think they should be standard at all. I am familiar with all of them pretty much since the moment they were born (which is not that long, TBH) and the only one I actually use is tree. Which is also quite a judgement call, since it can be replaced by exa, which is more powerful (I'm considering it; I'm pretty sure that you will like it, BTW, so take a look). bat is useless, since there is vim/view. rg is actually good, but cannot be used as a drop-in replacement for ag (I don't remember what the problem is exactly, but I tried several times and burntsushi even gave me some advice on that, and I still use ag, which is slower, so I assume there actually was some serious reason). And, by the way, no, neither of them is actually a replacement for grep.

(Also, yeah, let's ignore that Rust in the standard libraries is still a problem.)


Good list of things you can install to better your CLI experience, but I am not sure if I agree that they fill out missing features.

Why is bat better than less?

I admit to catting files for reading as much as the next guy, but I almost always have the tiniest regret that I didn’t feed it into less and kept my terminal cleaner.

As for RG and AG: other than being faster, how are they better? I thought they were API-compliant drop-in replacements, so it’s not like they fill in a missing role.


> I thought they were API-compliant drop-in replacements so it's not like they fill in a missing role.

Could you please let me know how you came to that understanding? Because I'd really like to fix it. ripgrep was never intended to be POSIX compatible. It would have been a straight-jacket over its functionality. See: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...

> As for RG and AG. other than being faster, how are they better?

For ripgrep at least, a lot of it comes down to speed, better output formatting by default and suppressing results you probably don't want by default (but that can be turned off quite easily).

But there are other features, like better Unicode support, built-in support for searching compressed files, support for file types and a few other things. The README includes a bit more. ;-)


Hi, thanks for setting me straight.

I have actually read a few of your (great) articles about ripgrep. I hope I didn’t come off as dismissive of your impressive work - though rereading my comment I should perhaps have chosen my words a bit more thoughtfully.

I did know that RG would skip more files than normal grep, so I guess I knew it’s not a safe replacement for all shell scripts.

But other than that, I’ve never thought of AG or RG as tools that bring something new to the table. I’ve always thought of them as drop-ins, with slightly different defaults and much faster speed. And the discussion related to TFA is tools that are missing, not tools that are better.

Maybe it’s a mix of the name and countless blog posts telling readers to use RG instead of GREP, that led me to consider RG as a better grep and not a different tool.


Interesting. Hmm. Not quite sure how to combat that one. The issue is that even though ripgrep is not POSIX compatible, it can pretty easily replace grep in most or all circumstances. The similarities between the tools are, for example, much greater than their differences. I know for me at least, I don't ever run grep interactively. The only time I ever use it is in shell scripts.


> As for RG and AG I thought they were API-compliant drop-in replacements

To grep? Not that I know, I don't think either respects POSIX grep to say nothing of GNU grep. Though the trivial usage is identical, and many basic options carry over (e.g. -C and friends).

> other than being faster, how are they better?

Way better defaults for one. `rg` defaults to being recursive, ignoring binary files, respecting various types of ignore files, not being limited to BREs, coloring, …

They also have useful convenience features like "file categories" (e.g. `-t` will expand to a bunch of predefined include glob patterns so you don't have to input them by hand, which can be tedious), or parallelism (recursive ripgrep will parallelise searches across files, with grep you have to remember to combine `find`, `xargs -P` and `grep` for that to happen).
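A few illustrative invocations of those defaults and conveniences:

  $ rg TODO                  # recursive, honors .gitignore, skips binary files
  $ rg -t py 'def main'      # restrict the search to the predefined Python file type
  $ rg -C2 --no-ignore foo   # two lines of context, don't honor ignore files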


>cat, but actually designed for reading files. Syntax highlighting, line numbers, automatic paging. //

`less` not good enough?


I’ve never seen syntax highlighting in “less”.


Ha! Here’s how to have syntax highlighting while using less: https://superuser.com/questions/71588/how-to-syntax-highligh...
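Roughly the recipe from that answer (the script path is the one shipped by the Debian/Ubuntu source-highlight package and varies by distro):

  export LESSOPEN="| /usr/share/source-highlight/src-hilite-lesspipe.sh %s"
  export LESS=' -R '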


While I don't use syntax highlighting, I find `lesspipe` useful for files that have a reasonable ‘presentation form’ as well as a source form — html, markdown, etc.


bat is another great alternative

https://github.com/sharkdp/bat


Mentioned at the top of this thread ;)


> rg

Never going to happen because it's written in rust. ag (https://github.com/ggreer/the_silver_searcher) on the other hand is written in C.


Never say never :)

If Rust being added to the Linux kernel[1] isn't far-fetched, I don't think adding a utility in Rust is crazy either.

[1] - https://lore.kernel.org/lkml/CAKwvOdmuYc8rW_H4aQG4DsJzho=F+d...


Interesting.

Isn't it the case that LLVM doesn't support all architectures that Linux does?


I think this is just for kernel modules that you can choose to compile out.


That's what I expected too (as one of the people on the thread / one of the authors of the farthest-along set of bindings for modules), but Linus and Greg KH both seem fairly inclined for it to be on by default and used for core code (small bits of core code, in the beginning) and not simply modules. I'm not sure how that's going to play out with architecture support, and that's one of the things I want to get out of the Plumbers session this upcoming week.

https://github.com/fishinabarrel/linux-kernel-module-rust/is... is a chart of Linux architectures and whether Rust and LLVM support them.


Oh, wow, that would be huge. Not just architecture support, but even having a toolchain on some platforms might be problematic as I can imagine making rustc a requirement to build Linux to be fairly controversial in and of itself. And that architecture list is not looking very good…would be very interested to know what they decide on as well!


What's never going to happen?


A Rust program being standard on all UNIX systems.


I'll bite: Why?


Rust is designed for a fairly small number of relatively modern systems; there are many close-to-POSIX systems that can’t (and I would guess likely will never) be able to run Rust code in any formal capacity. To the hopeful: even C++ has been unable to win here, despite having many more years of a head start and much more accommodation for stranger platforms. It’s just not going to happen.


What kind of platforms are you thinking of? People have got Rust running on the Raspberry Pi.

https://old.reddit.com/r/rust/comments/cdcads/


Linux on ARM is a very mainstream platform from the perspective I’m looking at this from, I’m afraid. I’m talking about strange Unices running on architectures that GCC may or may not maintain a backend for, or maybe a BusyBox available on some debug interface, maybe a school project to build a simple POSIX-compatible OS or a bootstrap for a platform that had been recently jailbroken. In these cases C is almost always the go-to language and I suspect it will remain so for the foreseeable future as this is precisely the long tail of distributions that standards were intended to provide a base set of tooling for.


Could you give some actual examples? This sounds interesting but I'm not sure I've ever seen something like that in the wild.


This is clearly not one of the examples saagarjha was thinking of, but an actual example where Rust is not an option is the x32 ABI for x86_64 platforms. This ABI works well with GCC, but LLVM suffers from multiple code generation bugs. As Rust is based on LLVM, Rust for the x32 ABI does not work well either. While it's possible to use a mixture of ABIs, so that you have a mostly-x32 system but with some x64 binaries, this requires a multilib setup with multiple copies of system libraries. A pure x32 system cannot currently have Rust utilities.

(I have some patches to improve things but I have not been able to submit them to LLVM yet, and with those patches I did manage to get a working x32 build of rg on my system. I hope to be able to do so in the future.)


Rust supports many platforms but only has "tier 1 support" on a handful of mainstream architectures.

https://doc.rust-lang.org/nightly/rustc/platform-support.htm...


MIPS and SPARC to name a few. Almost all routers are on MIPS and most of them are running some form of Linux or BSD.

AFAIK, rust is still marked as "guaranteed to build" on these platforms, but assume only Linux. BSDs, not so much.


In my experience, most consumer routers are ARM?


Maybe new ones, but there's a ton of MIPS ones too.


The Debian ripgrep binary is 5.1MB. That would consume over 15% of a 32MB flash, which is the common constraint you have to work with in many (actually, discounting Android, it might even be most) Linux deployments.


You're not likely to even find full-fat grep on such platforms (only busybox)


bat is Rust too.


+1 for ag (the_silver_searcher)


Why do you want automatic paging by bat, if you can just open it in Vim.


I'm still stuck on the fact that one of the utilities is called `pee`..

In fact the docs even say:

> make sponge buffer to stdout if no file is given, and use it to buffer the data from pee.

Translation: pee into a sponge!


I seriously hate this package, and the manner of combining different utils with different names in the same package in general. The reason being, "sponge" is an actually useful tool and for me it's pretty much the only useful tool in the package. So I need to install the whole moreutils package on Ubuntu to have "sponge" and I have to clog my bin namespace with all this trash. It would be mildly annoying from the perfectionist point of view, but ok. But there is also a "parallel" command in this package, which is trash as well. Meanwhile GNU "parallel" is not trash, but an actually useful program, maybe more useful than "sponge", and it comes in a separate package. So I either have to do some tinkering, or I have to choose which one I need more: "sponge" or "parallel" and install only one package. It's the year 2020. This is stupid.

This is by the way the reason why GraphicsMagick is better than ImageMagick (I still use the latter, though, because it doesn't cause the same problems for me as moreutils does, and it's just more popular than GM).


On principle, I agree with you -- this project should be split into multiple packages. However, it's the distro's job to harmoniously put various open source packages together. It's impossible for package authors to make everyone happy. Plus, in the moreutils case, the author has already gone to great lengths to publish his work for free; he shouldn't have to put up with more work to the point where it stops being fun for him/her. Finally, it's just NOT appropriate to call parts of this project "trash". He scratched his itch and was kind enough to publish his work. If you don't like it, just don't use it, no need to be harsh about it.

The right way for distros to package this is to split the moreutils source distribution into moreutils-* packages, plus a meta package that pulls in all of the commands. So your complaint actually belongs in your distro's bug tracker -- it has nothing to do with the moreutils project.


Calling software "trash" is uncalled for, but squatting on a namespace is a real thing and not something that you can justify behind the usual "someone did this for free and owes you nothing" because this make the landscape worse for everyone else.


In version 7, ImageMagick has switched to just "magick" instead of "convert", "mogrify" etc., and for compatibility with version 6 it still supports subcommands like "magick convert" etc., but at least the namespace cluttering is gone.

https://imagemagick.org/script/command-line-tools.php

Edit: I wasn't quite right, the commands still exist, but they are symlinks to "magick" and can be called using the subcommand mentioned above; you still have to delete the symlinks to clean up the namespace.

https://imagemagick.org/script/porting.php#cli
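For example (a sketch; the file names are made up):

  $ magick convert photo.png photo.jpg       # v6-style subcommand form
  $ magick photo.png -resize 50% small.png   # native v7 form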


Since Debian Stretch / Ubuntu 17.04 they do the right thing.

If you install parallel & moreutils you get the GNU parallel as /usr/bin/parallel and moreutil's parallel as /usr/bin/parallel.moreutils. If you only install moreutils, it provides /usr/bin/parallel.

You can use the debian alternatives system to flip the order.

https://superuser.com/a/1253492/78988


Another alternative might be to have a set of packages, 'moreutils-sponge, moreutils-pee, ... moreutils-common, moreutils-doc', which could be installed independently, and, if desired, a 'moreutils' virtual package dependent on the whole set for simplified installation.

Debian also has namespace-collision policies and an 'alternatives' facility for deciding which of multiple implementations of a tool (gawk, nawk, mawk; vim, vim-tiny, nvi; python3, python2; etc.) is primary on a given system.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=597050

https://www.debian.org/doc/debian-policy/ch-binary.html#virt...

https://www.debian.org/doc/debian-policy/ap-pkg-alternatives...


you raise a great point. why not install the package, then delete the /usr/bin files you don't need?

would it solve the problem if the author namespaced the commands with a hyphenated prefix? curious about these considerations, which it seems like you've spent time thinking about. what do you see as the "best practices" regarding a set of utilities that are maintained and published together?


In a most general sense, this is not really a problem of moreutils, but of how tools are installed/distributed in Linux distributions. But since we have to be realistic, yes, package authors should take such problems into consideration.

I think if tools are absolutely unrelated, they just should be distributed as separate packages. GNU coreutils is tolerated mostly because it's so ubiquitous (so much so that it causes Stallman to grumble about "you mean GNU/Linux, not Linux"). moreutils is late to the party, it isn't ubiquitous, and the usefulness of any tool in the package is questionable, so the author really shouldn't be so brave as to assume that if he thought he needed all of them, everyone else will.

If there is reason enough to distribute a package with several separate callable binaries (as with ImageMagick), I think git or GraphicsMagick are perfect examples of how it should be done. A hyphenated prefix is also ok. Even if your tool is supposed to be used by somebody 50 times a day, too long of a name isn't really a problem, since the user can always just make an alias (as I do with most of the tools I use frequently).

Sure, you can always combine binaries from 2 packages manually, but as I said, it just requires some tinkering, so I cannot simply keep a list of utils in some textfile and install them on a new PC in a matter of minutes.


Stick the binary wherever your textfile is, and copy it to your PATH on setup.


I remember building the gnu core tools on sun os and solaris boxes. I had a license to the compiler from sun, then I would get GNU compiler going, then I'd get the whole GNUniverse going.

The GNU build/install process lets you set a prefix, so it was customary to have the binaries prefixed with `g`, giving `gmore`, `gcd`, etc., if you were afraid of breaking old user scripts that used the Sun versions.

The build for this looks pretty simple. I cloned the source, ran "make". I got some XML errors about docbook, but it did create a `sponge`

  otool -L sponge
  sponge:
   /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
looks like one could copy it to `~/bin/`


Your package manager will still see that the two packages want to own the same file and give you a conflict error unless you force the install, which is very messy. Namespacing the commands would solve this.

But in general it is bad practice to package a bunch of unrelated tools together. You should be able to install them separately to only have what you need. You'd want different packages for each command, or sets of related commands with common dependencies. And then have a meta package that includes all of your packages for people who want to install it all in one go.


Expecting the user to clean up cruft to install the software they want is a terrible philosophy and not in line with Unix at all; now you've created a situation where, to install "sponge", you need to place 10 other random binaries onto your machine and know enough about the process to know to delete them...


Or use Nix to install these kinds of package and only create the symlinks you care about.


Another alternative might be, if the programs are single source code files and can mostly work by itself, you can just compile the ones you need and put the binaries where you need them, rather than installing a package.


How is sponge different from just normal file redirection?


See the first example on the web page:

  % sed "s/root/toor/" /etc/passwd | grep -v joey | sponge /etc/passwd
If you tried this as:

  % sed "s/root/toor/" /etc/passwd | grep -v joey > /etc/passwd
The shell would overwrite /etc/passwd with an empty file before sed has a chance to read from it.


I don't recall how difficult it is to maintain a debian package.

It seems like it shouldn't be too much work to fork the repository, and tweak the packaging logic to only include this one binary. You'd still have to compile the whole thing every time, but rebasing would be straightforward.


Don't most shells implement something like ">"? How is sponge an improvement on this?


In the provided example

  % sed "s/root/toor/" /etc/passwd | grep -v joey | sponge /etc/passwd
It looks like this allows an in place modification to the same doc. If you used

  % sed "s/root/toor/" /etc/passwd | grep -v joey > /etc/passwd
Bash will process the redirection first, then execute the commands in the pipe. So, it will have cleared out /etc/passwd with the non-appending redirection ">" before sed operates on it. You'd end up with a blank /etc/passwd file. (ymmv, I don't know how other shells would handle this)

Sponge, it seems, would cause the pipe to allow the preceding commands to complete before it outputs.


you can sudo sponge, and not run the entire command with elevated privileges.

(i'd welcome a method of doing that in zsh.)
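For instance (a sketch; the account name is made up), only the final write runs as root while the filtering happens under your own user:

  $ grep -v baduser /etc/passwd | sudo sponge /etc/passwd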


Is there a way to namespace on install? Or does that have to be configured in the package?


What I normally see in Archlinux is a rename of the less common conflicting utility when the package is built. In this example, the moreutils package's parallel utility and its manpage got renamed to parallel-moreutils.


You put them in different directories, e.g. Solaris-style /opt/package/…, and arrange your PATH accordingly.


I am not aware of such a way with apt on ubuntu. It can be different with some other package managers, obviously.


dpkg-divert


Docker


I wish there was a short and simple command for “list all files of a directory and no subdirectories“. ls doesn’t seem to have a switch for that kind of functionality.

It’s discussed on stack-overflow. [0]

Someone even wrote a nodejs tool for this functionality [1], but I would rather have something written in a compiled language.

[0]: https://stackoverflow.com/questions/10574794/how-to-list-onl...

[1]: https://github.com/mklement0/fls


But the top answer has it:

> find . -maxdepth 1 -type f

A little clunky, but there's certainly no need to mess about in JavaScript land. If you use it often, create a shell macro.


find doesn't return the same output as ls when run on another directory. Maybe there is a switch to do this? I'm not sure.

    $ mkdir -p tmp && touch tmp/a tmp/b tmp/c
    
    $ find ./tmp -maxdepth 1 -type f
    ./tmp/a
    ./tmp/c
    ./tmp/b
    
    $ ls -1 ./tmp 
    a
    b
    c


You can also use basename with the find command

    $ find ./tmp -maxdepth 1 -type f -exec basename {} \;
    b
    a
    c
I don't know the downsides of using basename though.


Of all the solutions, this seems to be the simplest with readily available tools. Unfortunately it's still not as short as I would like it to be, and I hope I remember the syntax with -exec $CMD {} \; the next time I need it.

Thanks to all the helpers, who took their time to propose solutions.

But what I was trying to say is: I would be ok with adding a new command to the tools I use on my machine, VMs and containers. fzf, fd, rg are tools which make my life easier. I would prefer to have the features of fls in ls itself, or a tool similar to fls in coreutils, moreutils or another package.


    $ find /tmp/x ! -path /tmp/x -prune -type f -exec sh -c 'for p; do printf '\''%s\n'\'' "${p##*/}"; done' - {} +
    c
    b
    a
Seems to do the trick and is POSIX only (so works on busybox and mac). You can probably just tack `| sort` at the end if you want it sorted.


This answer reminds me of the classic HN comment - why use Dropbox when "you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software."

People value software that's easy to use. That command you typed is a Frankenstein's monster.

https://news.ycombinator.com/item?id=9224


This had become a discussion of the best way to write the "shell macro" version that could replace the node.js program

  fls () {
   find . -maxdepth 1 -type f
  }
vs

  fls () {
   find . ! -path . -prune \
    -type f \
    -exec sh -c 'for p; do printf '\''%s\n'\'' "${p##*/}"; done' - {} +
  }
The user would just type `fls` in the directory they were exploring.


You can run it in a subshell pretty easily, like this:

    $ (cd tmp && find . -maxdepth 1 -type f)


I think this is what you are looking for

    $ find ./tmp -maxdepth 1 -type f -printf '%f\n'


I'm on MacOS... :-/

   $ find ./tmp -maxdepth 1 -type f -printf '%f\n'
   find: -printf: unknown primary or operator


If you use brew or macports you can install the coreutils package and then you're good to go with gnu find.

I also use MacOS and I have setup the gnu userland on my local so it matches the linux environment on our servers and containers.


You’re right that defaulting to the GNU coreutils is probably more convenient. However, NOT doing this is a good way to ensure that any scripts you write remain portable and don’t use GNU extensions. That’s the reason I stick with the default BSD coreutils on macOS.


> You’re right that defaulting to the GNU coreutils is probably more convenient.

It's not about convenience. It is about having a development environment that matches the target runtime environment.

You can run into nasty surprises when the default behavior of a tool in your dev environment is different than in your production environment.

I've been writing shell scripts professionally for over 20 years and I have always taken this approach and it has served me well.


You could also try restricting yourself to POSIX shell.


find <directory path> -type f -d 1

Works for me on OSX.


Not got a shell handy, but "-exec ls"?


The -ls flag to find prints the same as ls


not quite.

from the man page:

> The format is identical to that produced by ``ls -dgils''.


List all files and no directories :

  ls -1p | grep -v '/$'
You may wrap it in an alias or .bashrc function. I use the reverse (list only directories):

     function lsd {
         ls -1p $* | grep '/$'
     }


There are two main responses to this comment as I write this. One of them lists only regular files. One of them lists anything that isn't a directory. And that's why there isn't a short and simple command.


ah, yes, the simple test only had plain files: no symlinks, hidden files, or anything else you might want to distinguish in a shell script.

But I don't think it's too hard to solve this problem. A command which filters should also be able to invert the filter. One example for this is grep -v.


> Someone even wrote a nodejs tool for this functionality

Coming from a more low-level background but doing front-end development from time to time, I am always amazed by the sheer number of seemingly useless JavaScript reimplementations of what would be bash one-liners.

For your question, how about "find <dir> -maxdepth 1 -type f"?

You can put that in an alias or script in your path


Try submitting a pr to exa. Probably the fastest way to get it solved properly.


Thanks for mentioning exa. I saw it before, but never tried it. Maybe this is an opportunity to learn Rust.


Couldn't you just alias the find command mentioned in the SO thread?


Just curious, how is `find . -maxdepth 1 -type f` not simple?


The output is different between find and ls when you run it on another directory.


ls *(.) if you use zsh.


Technically the request was for ls *(^/) in Zsh-ese.


I use vipe frequently, to let me pipe from/into vim. This enables fun shortcuts as:

    xclip -o | vipe | xclip
This one in particular lets you edit your clipboard with $EDITOR.



I use `chronic` in almost all of my cron jobs so that cron sends me an email with the output only if the command exits with a non-zero return code.
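An illustrative crontab entry (the address and script are made up); chronic swallows the output unless the command fails, so cron only sends mail on errors:

  MAILTO=admin@example.com
  0 3 * * * chronic /usr/local/bin/nightly-backup.sh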


ts looks like it's a subset of https://github.com/ThomasHabets/ind

sponge can be replaced with dd

Other than that, yeah nice ideas. And very much in the unix spirit of one tool to do one thing.


sponge buffers the whole input before writing the output. Its utility would be in reading from a file, working on it with other tools, and writing the result back to the original file, all in one line.


I know I wanted to do this, but I am very happy such a utility doesn't exist. Unix utilities usually work in a pipeline fashion: read line -> process -> write to output. This allows them to allocate a limited amount of memory. If you read all input into memory, you are asking for trouble: 'cat /dev/zero | sponge a'.


I routinely operate on machines with gigabytes of memory, but rarely write pipelines which output gigabytes of data, so this has never been a concern for me. But even then, there’s still swap space which is effectively like using a temp file but lazier.

By the way, I think you’d be in even more trouble if you wrote:

    cat /dev/zero > a


That is the point.


Seems like the example command should be easily doable with just `sed -i`, though.

sed -i 's/root/toor/;/joey/d' /etc/passwd

But I get your point.


Oh, neat! Thanks.


Yes, they weren't written "when Unix was young", but people had thought to write some of them over a decade before this toolset. lckdo and ts came long after setlock and cyclog (also the precursor to multilog) from the 1990s.

* https://cr.yp.to/daemontools/setlock.html

* http://cr.yp.to/daemontools/multilog.html

* http://cr.yp.to/daemontools/upgrade.html

* http://jdebp.uk./Softwares/nosh/guide/commands/cyclog.xml#CO...


The sponge example:

    sed "s/root/toor/" /etc/passwd | grep -v joey | sponge /etc/passwd
I think it can be rewritten as:

    sed -i '/joey/d; s/root/toor/' /etc/passwd


Yes, but -i on sed is specific to GNU, I don't think it exists on BSD/OSX/busybox


-i is definitely not specified by POSIX, but it is supported on all those platforms with some small differences, for instance on OSX the backup extension (-i.bak) is not optional.


Amazingly, there is no portable way to use -i that works on both GNU and BSD sed implementations. Which means if you’re writing a portable script, you can’t use -i at all.

(Would love to be proved wrong on this)


According to Stack Overflow [1], this works:

    sed -i.bak 's/foo/bar/' filename
[1]: https://stackoverflow.com/a/22084103/3266847


That works when specifying a backup extension, but not if you don’t want to create a backup file.

    sed -i '' ...
works on BSD sed but not GNU. Meanwhile:

    sed -i'' ...
    sed -i ...
both work on GNU but not BSD.


Well, yes, you can't use it without creating a backup file, but it uses "-i" and is portable.


What is the difference between `sponge` and redirecting with `>` (or `>>`)? Is it about better compatibility with pipes or something?


Well, one thing is that if you’ve tried it you’ll run into the problem he illustrates there: you are writing to the same file you are reading from, which won’t work right.

My guess is sponge buffers all the input and then sends it to output once stdin is closed.


I wondered this too so looked at the man page [1]

> Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.

[1] https://linux.die.net/man/1/sponge


Imo sponge is the most useful of all of them. For example if you want to filter a file in place you can do ‘grep str file | sponge file’. If you do that with a redirect you’ll end up with an empty file.

Some commands (eg. sed) have an in-place option, but many don’t.


While we're on the topic and on the wave of modern riffs on classic tools, personally I'm pining for a remake of xargs—because I never can whack it into submission with anything more complex than `xargs rm`. Specifically, passing multiple arguments from the input to the called command apparently just can't be properly done, at least not on OSX.


parallel may be what you are looking for: https://www.gnu.org/software/bash/manual/html_node/GNU-Paral...

The defaults are a lot saner, and it's easier to pass arguments how you want.
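For the multiple-arguments case, GNU parallel can read several input items per job and place them explicitly, e.g. (a sketch; the file names are made up):

  $ printf '%s\n' old1.txt new1.txt old2.txt new2.txt | parallel -N2 mv {1} {2}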


Hmm, I guess I considered Unix utils rather narrowly specialized, so `parallel` meant ‘my CPU got too many free cores’ for me. I'll take a closer look at it, thanks.


Where is the MS-DOS utility "ncd"? It was like "cd" but guessed where you wanted to go from a hint of a few characters. If the choice was not immediately obvious it offered a menu or a tree. I shortened it to "n". -- There was some Linux utility but setting it up was annoyingly tedious, with lots of useless and cryptic options.


Didn't know about "ncd", ended up re-inventing it. The following script, named 'F', takes a regexp(3) pattern in $1 and works reasonably well in the 'all text is hypertext' environment of (p9p) Acme. If it finds one match, it opens it via plumb(1). Otherwise it lists results; one right-click opens the clicked result, again via plumb(1).

Yes, the find -type f focuses on files; nonetheless it's easy enough to right-click-select the relevant directory to open it rather than the file.

  #!/usr/bin/env rc
  
  if (~ `{find -type f | 9 grep $1 | wc -l} 1)
   plumb `{find -type f | 9 grep $1}
  if not
   find -type f | 9 grep $1


Que? What is rc?

"rc - implementation of the AT&T Plan 9 shell".

Do I have to install that?


Nevermind now that I think about it, this is basically most of ncd:

    #! /bin/bash
    d=$(find / -name "$1" 2>/dev/null | head -1)
    echo Jumping to $d, Control-D to return
    cd $d
    bash
Actually it is now perfect. If the guess is wrong, you can return and add letters to your search string.


Consider fzf: https://github.com/junegunn/fzf

I have it bound to ctrl-e for edit: https://gist.github.com/FeepingCreature/5f575b6fcda041a48f58...

It's like adding the vscode file finder to bash :)

The additional code is to make bash "act as if you typed in the command manually". So you can arrow-up to find it in the history buffer with the filename expanded.


Ach. It was "Norton Change Directory" http://www.softpanorama.org/OFM/norton_change_directory_clon...

But I already made my own. It is quite perfect:

    #! /bin/bash
    d=$(find / -name "$1" 2>/dev/null | head -1)
    echo Jumping to $d, Control-D to return
    sleep 1
    cd $d
    bash


Use z[1]. Also use fzf.

[1] - https://github.com/skywind3000/z.lua


cd cannot be an external command in Unix-like systems, it has to be a shell built-in.


But cd can be invoked from a shell function (which doesn't create a subshell).


It could have been a script for adding to shell startup files.


I think that's the default behaviour in Zsh or Fish if you press tab.


Probably immediate subdir. What I am talking about is universal jump-to-anywhere.

Like I have subdir "/media/tnoko/sdc1/capture/Roinaa/".

"ncd Roi" should be totally sufficient and unique command to go there.


Notably for this there's autojump[0], z[1] and fasd[2]. More work than an efficient script for sure, but more fancy options around it all. Autojump works pretty well with zsh (and before that with fish) for me.

[0] https://github.com/wting/autojump

[1] https://github.com/rupa/z

[2] https://github.com/clvv/fasd


For this I use autojump[0]; what I really like is the integration with the ranger FM[1]. I don't know exactly how ncd works, but with autojump you need to visit the directory at least once to be able to fuzzy-jump to it.

0: https://github.com/wting/autojump

1: https://github.com/fdw/ranger-autojump


A lot of these seem easily achievable in a POSIX shell manner, but the one that most annoys me is

> sponge: soak up standard input and write to a file

I do that all the time...

cat > file.txt

Update: reading the comments on here, apparently sponge sucks up all content before opening the destination file which allows editing an input file in place. Minor advantage there.


The purpose of sponge is that you can do this:

grep foo file.txt | sponge file.txt

If you do this with redirections then file.txt will be truncated before it's been processed, leaving you with an empty file instead of what you wanted. Sponge collects its input first and then writes everything out at the end, so you can output to a file that was used as an input.

(Parent updated while I was writing. Oh well)


That doesn't seem like an efficient way to do it.

Commands that process a file in place (sed -i) write to a temporary file in the same filesystem and then rename it to the target file, which works if you want to process files that don't fit into memory.


I apologise, I was speaking loosely. I don't know if sponge collects input in memory or in a file. (Frankly I've never needed to care since the files I've ever needed this tool for have all been small)


Without inspecting the implementation of `sponge` that's how I assumed it already internally worked.


Not minor; it's literally the point of this tool


It lets you edit in place, which is very handy. It also keeps the original file permissions, which "blah bar > foo ; mv foo bar" does not do. Convenient. Frankly, I think that should be in the posix standard.


I wrote a util a bit like this when I realised there was no command for emptying a directory: https://github.com/adamserafini/emptydir


While your code is very nicely factored, well-documented and quite readable, bash is not an easy language for this sort of utility. Any names containing spaces or starting with dashes require very careful handling.

If you do stick with bash, I would recommend giving shellcheck a try. It won't catch everything, but I've learned a lot from running it on my code.


What's the difference between just deleting the whole directory and recreating it with `mkdir`? Or just doing an `rm -rf dir/*`?


Hidden directories, links within dir... (usually, you want a "nofollow" default for destructive operations)


So this where errno comes from. I discovered it by accident on my system a few years ago and have been using it ever since. Really handy when working in C. Still no idea why I have that package installed though.

On another note

> pee: tee standard input to pipes

Goddammit guys!


In case you are like me and didn't get the sponge example, the difference from a shell redirect is it doesn't truncate the input so allows in-place modification of the input/output file.


Is it possible to make some of these moreutil commands aliases of some combination of coreutils commands?


Some of these I do not find so useful, although I do use ts, and sometimes sponge.


It's funny that `vidir` spawns `nano` by default on my system.


It probably honors your $EDITOR env variable, which often defaults to `nano`. You can change it to anything you want including `vi[m]` :)


Check your $EDITOR environment variable.


That'd be the Debian alternatives system in action.

$ update-alternatives --list editor



ts seems to fit the Unix philosophy of do one thing and do one thing well quite nicely. It works pretty well for writing to logs.


> sponge

Why not use > file?

> mispipe

In bash there's PIPESTATUS for that.


Re sponge: the shell will open the output file for writing before invoking the command so in the example joey provides, /etc/passwd will be an empty file by the time sed opens it.


heh, yeah that's a mistake everyone makes once, and hopefully not more than once.


That makes sense, but in the example provided, why can't we use sed with the in-place (-i) flag?


Because the -i flag works differently on GNU sed vs non-GNU sed and there is no way to use it portably.


I think this is a good question as moreutils doesn’t do a very clear job of explaining it.

Sponge lets you do a series of piped transformations on 1 or more files and overwrite the result back to the original files.

Using >, things will just break.

I’m guessing this also means you can’t pipe something bigger than what your memory can hold.


> Why not use > file?

This, I learned the hard way, and I am not the only one. So in order for you not to make that mistake: the first thing that happens is that "file" is truncated, that happens before any command is run, so

  sed s/foo/bar/ file > file 
will always result in an empty "file", because it will be truncated before "sed" is run. I lost a couple of hours of work like that, once, and then I learned.

From the manpage of sponge

  sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.
So, the command

  sed s/foo/bar/ file | sponge file 
will do what you expect


Try

    sed s/foo/bar/ -i file 
to edit file.


That's what I do now; one thing to note though is that the "-i" option is not always available. Usually, now, it is, but we still have a couple of Solaris servers at work that don't support it.

This, btw, is why I hate shell scripts. There are so many variants of bourne shells and UNIX tools that writing a portable script is a minefield, as if properly dealing with spaces wasn't tricky enough...


Right you are. The -i option is not standard (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/s...)


Great question.

There's a good answer on the Unix & Linux Stack Exchange here: https://unix.stackexchange.com/questions/207919/sponge-from-...


> Why not use > file?

I'm assuming sponge buffers everything in memory so you don't get concurrent modification bugs.


Yes, if you try to sed a file back to itself for instance you'll get a truncated file (or maybe a no-op?). In fact, that's exactly the example sponge's manpage uses.


It's a truncated file:

  $ cat test
  moo
  $ cat < test > test
  $ cat test
  $


I was wondering about tee for that:

  $ cat test 
  moo
  $ cat < test | tee test > /dev/null
  $ cat test
  moo
...seems to work. Am I just getting lucky with a race condition?


> Am I just getting lucky with a race condition?

Probably this is buffering, either from dd or from the kernel.

  $ dd if=/dev/urandom bs=1M count=1 of=test.bin 2>/dev/null
  $ cp test.bin test2.bin
  $ cat < test2.bin | tee test2.bin >/dev/null
  $ diff test.bin test2.bin
  Binary files test.bin and test2.bin differ
  $ wc -c test{2,}.bin
   131072 test2.bin
  1048576 test.bin
Notice the file got truncated at exactly 128k; a nice round size for a write cache.


Yes, check this, with f.txt containing 99999 numbers:

    $ seq 1 99999 > f.txt
    $ cat < f.txt | tee f.txt > /dev/null
    $ wc -l f.txt
    23696 f.txt


I think using

    tac | tac
is a safe bet since it must read the whole input first, by definition.


Possible, since you don’t pipe into the file but instead ask tee to write into it.


That redirection will make the shell truncate the file long before sed has a chance to open it.


Can I use pee into a sponge?


With the annoying half-way implementation of GNU parallel: it does not support the basic features, is not better or faster, but still tramples over this namespace.

I really like many of the moreutils tools, but dealing with constant parallel breakage (kind of a Hadoop for dummies) is annoying.



