Yep, I've also been loving mcfly, bat, and fd. I mentioned this further down in the post, but I compiled a list of all these modern unix tools a couple months ago: https://github.com/ibraheemdev/modern-unix
This compilation is very useful. I've been needing to review my CLI toolkit for a bit, and this looks like it'll make it easy for me to try out some alternatives. Thank you for putting the time and energy in to create it.
Just what I've been looking for! Thanks for making this. I'd love to see someone build a "new" userland with all the old gnu tools replaced where there's a good modern alternative.
I’ve mixed feelings about rg in exactly this respect.
On one hand, it’s awesome. It’s both excellent from a programmer’s standpoint and indispensable from a user’s standpoint. It’s just so neat it’s hard not to love it. A remote interviewer thought I just had a custom alias for grep -r and it was hard not to laugh and share the joy and relief that I didn’t.
On the other hand, it’s decidedly anti-Unix, arguably even more so than GNU utilities. The man page goes on for ages, and not because it’s written badly (it’s not), but because the tool does so much. I really wish you didn’t have to bundle all of that functionality into a monolithic binary to get this kind of performance.
For reference, walking directory trees was specifically cut out of the Plan 9 userland and left for du(1) because it was stupid how many implementations of it there were. Yes, Rust’s is in a library, but then you don’t have Unix as your OS, you have Rust instead. “Every language wants to be an OS”[1].
While I don't think anti-Unix is a totally accurate description, there are definitely aspects of ripgrep that aren't compositional. Some of it is performance, but some of it is just about output. Its default output format to a tty is fundamentally not compositional. So I end up building stuff into ripgrep that you might otherwise achieve via pipes. I do draw a line, because it otherwise gets ridiculous.
The plan9 comparison is kind of meh to be honest, because it never achieved widespread adoption. Popularity tends to beget complexity.
But otherwise this isn't so surprising to me. One of the hardest parts about making code fast is figuring out where to fall on the complexity vs perf trade off. It's pretty common that faster code is more complicated code. Directory traversal is a classic example. There are naive approaches that work in most cases, and then there are approaches that work well even when you throw a directory tree that contains hundreds of thousands of entries at them. And then of course, once you add gitignore filtering, you really want that coupled with tree traversal for perf reasons.
When you specialize, things get more complex. Then we build abstractions to simplify it and the cycle repeats. This to me seems more like a fundamental aspect of reality than even something specific to programming.
Whoops, that’ll teach me to philosophize in public.
(Next up, an attempt at self-justification, followed by even more philosophizing.)
I don’t think it’s a totally accurate description, either, for what it’s worth :) The use of the GNU tools (and not, I don’t know, Active Directory) as a reference point was in part to signal that the comparison is intentionally horribly biased. In any case it’s more of a provocative synopsis of my sentiments than it is a criticism.
I hope you take the grandparent comment as idle musings and not as any kind of disapproval. My mixed feelings are actually more about what ripgrep tells us about the limits of pipes-and-streams organization than they are about ripgrep itself.
The tty prettyprinting problem (and the related machine-readable-output problem) is (of course) not limited to ripgrep; I don’t even know how to satisfyingly do this for ls. What people consider ridiculous can differ: I’d certainly say that egrep fork/execing fgrep in order to filter its input through it is ridiculous, but apparently people used to do this without batting an eye. Having a separate program whose only purpose in life is to colorize grep output (outside of some sort of system-wide convention) would be silly, though, and that’s a much stronger standard. So no scorn from me here, even if I’d have preferred the man page were a couple of screenfuls shorter.
I’m not sure how much stock one should put in adoption. (If AT&T’s antitrust settlement had expired a couple of years later, would we be using Unix today?) Popularity begets complexity in various ways: handling of edge cases, gradual expansion of scope, fossilization of poorly factored interfaces. Not all of those are worthy of equal respect, except they’re not discrete categories. But I’m not evangelizing, either; only saying that Plan 9 has an unquestionable (implementation) simplicity aesthetic, so it seems useful to keep it in view when answering the question “how complex does it need to be?” Even though its original blessed directory walking ... construct, du -a | awk '{print $2}', is a bit kooky.
The case of traversal with VCS exclusions is fascinating to me, by the way. It looks like it begs to be two programs connected by a unidirectional channel, until you introduce subtree exclusion, at which point it starts begging to be two programs connected by a bidirectional channel instead, and the Unix approach breaks down. I’m very interested in a clean solution to this, because it’s also the essential difference between typesetting with roff and TeX: roff (for the most part) tries to expand complex commands first, then stuff the resulting primitives into the typesetter; TeX basically does the same, except there’s a feedback loop where some primitives affect the expansion state (and METAFONT, which is a much more pleasant programming environment in the same vein, doubles down on that). It seems that some important things that TeX can do and roff can’t are inevitably tied to this distinction. And it’s an important distinction, because it’s the distinction between having your document production be a relatively loosely coupled pipeline of programs (that you can inspect at any point) and having it be a big honkin’ programming environment centered around and accepting only its own language (or plugin interface, but that’s hardly better). I would very much like to have this lack of modularity eliminated, and the walk-with-VCS-exclusion issue appears to be a much less convoluted version of the same even if it’s not that valuable in isolation.
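To make that concrete, here is a rough shell sketch of the distinction (ignore-patterns.txt is just a stand-in for whatever exclusion data you have):

```
# Unidirectional: walk everything, then filter the output. The result is
# right, but the walker still descends into every excluded subtree.
find . -type f | grep -v -f ignore-patterns.txt

# Subtree exclusion wants to feed back into the walk itself; in the classic
# toolset that means folding the logic into a single program (find):
find . -name .git -prune -o -type f -print
```

The second form is exactly the kind of answer I would like to avoid, but it is the one the pipeline model pushes you toward.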
So why bother building a whole crate for this? Well, because, to make it fast, it takes low thousands of lines of code (to cover the variety of memr?chr{2,3} variants) to make use of platform specific SIMD instructions.
This is a good example of something the Plan9 folks would probably never ever do. They'd write the obvious code and (probably) demand that you accept it as "good enough." (Or at least, this is my guess based on what I've seen Rob Pike say about a variety of things.)
I have a lot of sympathy for this view to be honest. I'd rather have simpler code. And I feel really strongly about that. But in my experience, people will always look for something faster. And things change. Back in the old days, the amount of crap you had to search wasn't that big. But now that multi-GB repos are totally normal, the techniques we used on smaller corpora start to become noticeably slow. So if you don't give the people fast code, then, well, someone else will.
Anyway, none of this is even necessarily a response to you specifically. I'd say they're just kind of idle musings too. (And I mean that sincerely, not trying to throw your words back in your face!)
> “how complex does it need to be?”
Yeah, I think "need" is the operative word here. And this is kinda what I meant by popularity breeding complexity I think. How many Plan9 users were trying to search multi-GB source code repos or walk directory trees with hundreds of thousands of entries? When you get to those scales---and you accept that avoiding those scales is practically impossible---and Plan9's "simplicity above everything else" means dealing with lots of data is painfully slow, what do you do? I think you either jump ship, or the platform adapts. (To be clear, I'm not so certain of myself as to say that this is why Plan9 never achieved widespread adoption.)
> The case of traversal with VCS exclusions is fascinating to me, by the way. It looks like it begs to be two programs connected by a unidirectional channel, until you introduce subtree exclusion, at which point it starts begging to be two programs connected by a bidirectional channel instead, and the Unix approach breaks down.
Yeah, I think this (and many other things) are why I consider the Unix philosophy as merely a guideline or a means to an end. It's a nice guardrail, and where possible, hugging that guardrail will probably be a good heuristic that will rarely lead you astray. That's valuable. It's like Newton's laws of motion or believing that the world is flat. Both are fine models. You just gotta know not only when to abandon them, but that it's okay to do so!
But yes, this has been my struggle for the past several years in my open source work: trying to find that balance between performance and complexity. In some cases, you can get faster code without having to pay much complexity, but it's somewhat rare (albeit beautiful). The next best case is pushing the complexity down and outside of the interface (like memchr). But then you get into cases where performance bleeds right up and into APIs, coupling and so on.
> On the other hand, it’s decidedly anti-Unix, arguably even more so than GNU utilities.
When you have a design guideline – in this case, "Do one thing and do it well" – then you must never forget that it's just a guideline, a means to an end, a tool to achieve your goal. If your guideline conflicts with or contradicts your goal, then the guideline is wrong – or should be ignored, at least. Otherwise, the guideline becomes an ideology, and ideologies can be harmful.
I see this too often with various guidelines ("do one thing and do it well", "never use gotos", "linked lists are bad"), which are zealously repeated without understanding whether they make sense in that particular context. (I'm not saying this applies to you; this is a general observation.)
It has a flag to add a "Git" column to the output, showing the status of each file in the listing (new/modified/etc.). It also has a `--tree` flag which does exactly what you think it does. This let me remove my old `tree` command and consolidate into one.
The main thing I get out of it is super amazing fuzzy-search of my terminal history. Normally with bash's built in CTRL-R reverse history search, you pretty much have to know EXACTLY what you're searching for, and if there are a lot of similar commands between the most recently run match for a search and the one you're ACTUALLY searching for, you may have a really hard time finding it since you have to go backwards through the commands one at a time. For myself and many people, we were getting really good at typing `history | grep '<regex>'`, maybe with more pipes to grep for further refinement.
But with fzf[1], that whole workflow of searching through your history is probably two orders of magnitude faster. Now you hit CTRL-R and you start typing any random part of the command you're trying to remember. If there was some other part of the command you remember, hit space and type that search term after the first search term. FZF will then show you the last 10-ish matches for all the search params you just typed, AND it will have done all this lightning fast, with no UI lag or hitching.
I don't know what other people use FZF for, as this is the SINGLE feature that's so good I can't live without it anymore.
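If anyone wants to try this: the CTRL-R behavior comes from fzf's optional shell key bindings, not from fzf itself, so you have to load them in your shell config. The exact line depends on how you installed fzf; the one its own install script appends for bash looks roughly like this:

```
# in ~/.bashrc (path varies by install method / distro package)
[ -f ~/.fzf.bash ] && source ~/.fzf.bash
```

After that, CTRL-R replaces the whole `history | grep '<regex>' | grep ...` dance described above.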
Here's an example with ripgrep. Say you search for something with a lot of results: you can feed those results into fzf and then add additional filters. Useful for when you're not "exactly" sure what your search term needs to be. I use it when ripgrepping a lot of code bases all at once....
I also have a folder with a bunch of curl commands that I can search, and applying fuzzy finding to the results helps me explore and find what I need.
Contrived example: search for "t" in all these files and then pipe to fzf: `rg t . | fzf`
I use it multiple times every day to switch to already existing git branches, checkout remote branches with creation of a tracking branch, or to checkout remote branches in detached HEAD mode. I made git aliases for those, and this is what it looks like in my .gitconfig:
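(Roughly along these lines; the alias names and exact commands below are an illustrative sketch rather than a verbatim copy.)

```
[alias]
    # fuzzy-pick an existing local branch and switch to it
    fco = "!git branch --format='%(refname:short)' | fzf | xargs git checkout"
    # fuzzy-pick a remote branch; checking out its short name creates a
    # local tracking branch
    fcr = "!git branch -r --format='%(refname:short)' | fzf | sed 's#^[^/]*/##' | xargs git checkout"
    # fuzzy-pick a remote branch and check it out in detached HEAD mode
    fcd = "!git branch -r --format='%(refname:short)' | fzf | xargs git checkout --detach"
```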
1. built in to zsh so I can fuzzy find my command history
2. similarly, ctrl-t lets me do, for example:
```
bat <ctrl t>   (and fuzzy find whatever file I want)
```
3. it's a vim plugin, so it replaces the need for NERDTree; it can be used as a file explorer, buffer finder, a way to fuzzy find lines instead of / search, a branch switcher, etc.
4. fzf-tab for zsh gives you fuzzy tab completion in your terminal too
I find ripgrep's defaults sensible when searching for text but fd misses files that I expect it to find -- this has happened a few times now and I have gone back to find.
For instance: 'fd .apk' to search for Android builds.
Unlike find, fd follows the rg approach of respecting VCS ignores, which does have both advantages and disadvantages. But if you know you’re specifically searching for a binary, just pass -u (once to include VCS-ignored files, or even twice to also include hidden ones) and you’re golden. In your specific example you probably also want -F (fixed string) or -g (glob), because fd does regexes by default and I doubt you want to type '\.apk'.
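So for the .apk example, something like one of these should do it (using the flags described above; a sketch, adjust to taste):

```
fd -u -F .apk       # literal substring match, including VCS-ignored files
fd -uu -g '*.apk'   # glob match, also including hidden files
```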
As to find, its flexibility certainly comes in handy on occasion, but...
Have you looked at the original man pages for 7th ed. (http://man.cat-v.org/unix_7th) or Plan 9 (http://man.cat-v.org/plan_9_3rd_ed)? They’re basically tldr with fewer examples, and exhaustive, except there’s so much less stuff to exhaust. I was both pleasantly surprised by how simple they are and unpleasantly surprised by how far we’ve fallen in terms of complexity (though my recent comment https://news.ycombinator.com/item?id=27498189 explains why that’s an oversimplification). The GNU man pages (where they even exist) or the POSIX ones are a usability abomination in comparison. Even BSD’s are still quite bad.
If I had to succinctly put my finger on it, I would say: it's "new wave" because it doesn't search everything by default. Of course, ripgrep wasn't the first to do that, but maybe is the first to do it while retaining competitiveness with GNU grep.
Of course, what's "new wave" to one person might be "mild evolutionary step" to another. (And perhaps even "evolutionary" implies too much, because not everyone sees smart filtering by default as a good thing.)
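To make the "doesn't search everything by default" point concrete (a rough sketch, not an exhaustive list of what gets filtered):

```
rg PATTERN           # skips .gitignore'd files, hidden files and binary files
rg -uuu PATTERN      # turns all of that off; roughly equivalent to grep -r
grep -rn PATTERN .   # the traditional approach: search everything
```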
I would say 90% of how ripgrep works is based on long standing Unix tooling. "Smart" filtering by default is certainly not in the Unix tradition, but to say ripgrep has nothing to do with Unix is just wrong.
To be honest, I don't really know what you're after here. It is fine to say that the "new wave" is not rooted in Unix, but that doesn't mean it's inaccurate to call ripgrep a Unix tool.
What I am after is that there is nothing about new wave UNIX to talk about, unless we are now supposed to start talking about Rust and ripgrep adoption at The Open Group.
No, that's what you think new wave UNIX is. This strikes me as just pedantic nonsense. You don't need to go around policing what "new wave" is supposed to mean to everyone.
Yup, ripgrep (and ack before it) is definitely a new style! Doesn't mean it has nothing to do with Unix, or that "new wave Unix" is itself inaccurate or nonsensical.
When you start quoting the dictionary to prove a point, maybe it's time to take a step back.
It has no concrete specific definition as far as I'm aware. It's just a colloquial phrase to distinguish it from something that has been around for a long time, while also drawing focus to its similarities.
I think you should read 'new wave unix' as 'new programs used by people that spend a lot of time on the CLI, replacing programs they used before that were most of the time very old and unchanged for quite some time, so not in line with "modern" expectations'. A good example is fd, which works like find but is more intuitive to me (and to many other people): you use fd PATTERN instead of find -name PATTERN or something.