Ripgrep 13.0 (github.com/burntsushi)
547 points by mfrw on June 14, 2021 | 150 comments



Shoutout to Burntsushi

He knows how to keep a good changelog: https://github.com/BurntSushi/ripgrep/blob/master/CHANGELOG.... It's good to read through and see the progress.

He also cares about ergonomics as well as performance, which I like. I find it much easier to use than existing grep tools.

Interestingly, I believe ripgrep started off as a test harness to benchmark the Rust regex crate (he is the author of that as well). It was originally not intended to be a consumer CLI. I guess he dogfooded it so well that it became a successful tool.


Thanks. :-)

That's right. IIRC, I wrote it as a simple harness because I wanted to check whether someone could feasibly write a grep tool using the regex crate and get similar performance. Once I got the harness up and running though, I realized it was not only competitive with GNU grep in a larger number of cases than I thought it would be (after some perf bug fixes, IIRC), but was substantially faster than ag. So that's when I decided to polish it up into a tool for others to use.


Do you use any tooling for the Changelog/issue tagging (PERF, BUG) etc?


Nope. I just try to keep it updated as I go. But in practice, I usually forget to update it and it just ends up becoming part of the release process.[1]

[1] - https://github.com/BurntSushi/ripgrep/blob/master/RELEASE-CH...


I recently switched to `rg` after many years of using `ag`, and the fact that it properly handles `.gitignore` syntax makes my life much better.


Dude it’s really good


Changelogs become good when one knows what they're doing.

If not, one cannot convincingly tell other people what the changes are actually for, and the changelogs end up vague or missing details.


lots of other cool rust goodies in his github account as well.


xsv has been a real boon for wrangling large csvs.


Also consider installing ripgrep-all [1], an extension making it possible to search through PDFs, eBooks, ZIPs and other documents. The integration with fzf is also fantastic!

[1] https://github.com/phiresky/ripgrep-all
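
For anyone who hasn't tried it: ripgrep-all installs a binary called rga that is used much like rg itself. A minimal sketch (the pattern and path are just placeholders):

    rga 'needle' docs/    # also searches inside PDFs, eBooks, ZIPs, etc. under docs/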


`cargo install ripgrep-all` fails to compile due to a missing cachedir dependency, and the author won't make a new release out of master due to a possible deadlocking issue. This fails to inspire confidence:

https://github.com/phiresky/ripgrep-all/issues/88


I have been using ripgrep-all since the very beginning. Can't live without it!


Aaaaa, yes yes yes. I need that in my life.


:O


ripgrep is absolutely fantastic software. If you're not using it yet I guarantee it's worth taking the time to install and learn basic usage.

On macOS: "brew install ripgrep" - then use "rg searchterm" to search all nested files in the current directory that contain searchterm.

You may be using it without realizing already: the code search feature in VS Code uses ripgrep under the hood.


One thing I like about VS Code is the search feature.

The results are instantaneous, and I only recently found out from a co-worker that it uses ripgrep.


Seconded: ripgrep is something I install within 30 seconds of being on any computer. It's incredibly fast and featureful!


As mentioned elsewhere, --hidden, which now has the shorthand -., is a good default to add via the .ripgreprc.

I wouldn't want to miss anything in my home directory or repository-specific dotfiles.
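
For reference, a minimal sketch of what that looks like. ripgrep reads one flag per line from whatever file the RIPGREP_CONFIG_PATH environment variable points at; the glob line is optional and just keeps --hidden from dragging the .git directory into results:

    # ~/.ripgreprc, referenced via: export RIPGREP_CONFIG_PATH=~/.ripgreprc
    --hidden
    --glob=!.git/*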


This would be great, except I rarely—bordering on never—want to search .git, and when I do that's the only thing I want to search.


Yes but I frequently would like to find files in .gitignore (e.g., where is this mess inside generated source code).

Edit: oh, this is different from --no-ignore.


I just have a separate alias for rg --no-ignore


Actually, most of the time you might want to use rg -i searchterm. I don't know why it's not the default. Maybe they want to make it similar to grep but with better perf?


It's not the default because I personally find it pretty unintuitive, and while I don't have any numbers, I think most others do too. When I'm comparing ripgrep with other tools that enable smart case (like -i, but is disabled if there are any literal capital characters in your pattern) by default, I still get tripped by it because I forget about it.

On Unix, you can just do `alias rg="rg -S"`, and you get smart-case by default. :-) Or if you truly always want case insensitive by default, then `alias rg="rg -i"`.

"make ripgrep similar to grep" is certainly one goal, but there are other goals that compete with that and whether or not ripgrep matches grep's behavior with respect to case sensitivity is only one small piece of that.

It was proposed to enable --smart-case by default: https://github.com/BurntSushi/ripgrep/issues/178
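
To make the smart-case behavior concrete (the patterns are just illustrative):

    rg -S 'model'    # all lowercase, so case-insensitive: matches model, Model, MODEL
    rg -S 'Model'    # contains an uppercase letter, so case-sensitive: matches only Model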


Oh, I hope the current default stays!:)

I usually know when I want case insensitive match - and I frequently want case-sensitivity with all lowercase search terms. Eg: doing a search for methods or variables in ruby - where "model" is right, but "Model" is a class or module (and vice-versa - but then smart case would work).


Don't worry, it will. :-)

While I do allow for breaking changes between major versions, I don't really see myself ever doing any kind of major breaking change. Especially one that subtly changes match semantics in a really major way.


ripgrep also has a configuration file for options without the need for an alias. https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#c...


I think that's a bit of a personal choice. I definitely prefer the current default of an exact match including case. Partially because I'm very familiar with the common `-i` flag to ignore case which is used in `grep`, `git grep`, `ack`, `ag` and probably others. But I've never used a flag to make a search case-sensitive. Plus it's easy enough to alias `rg` to `rg -i` if you prefer that as the default.


The other flag to know about is --no-ignore, or alternatively -u/--unrestricted.

By default, ripgrep does not search a bunch of files, in particular anything gitignored and anything with a leading dot (which the docs call "hidden"):

https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#a...

These flags override that.
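
Roughly how the unrestricted flag stacks, per the guide linked above:

    rg foo        # default: skips gitignored files and hidden files/directories
    rg -u foo     # --no-ignore: also searches gitignored files
    rg -uu foo    # adds --hidden: also searches hidden files/directories
    rg -uuu foo   # adds --binary: also searches binary files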


-i is ignore-case, right?

Usually I'm searching code for things like variable names, so actually case-sensitive searches are a better default for me.


I almost never use -i. I may do rg "[Dd]efault" if I want to search for "default" or "Default", but not "DEFAULT". Usually I just want to search in a very specific case since I'm looking for variables.


--smart-case is worth looking into as well. you can also put whatever arguments you want in your ripgreprc file to tweak the "defaults" to your liking.


My favorite 'new wave Unix' utility by far. So ergonomic, yet an easy switch from egrep.


Yep, I've also been loving mcfly, bat, and fd. I mentioned this further down in the post, but I compiled a list of all these modern unix tools a couple months ago: https://github.com/ibraheemdev/modern-unix


This compilation is very useful. I've been needing to review my CLI toolkit for a bit, and this looks like it'll make it easy for me to try out some alternatives. Thank you for putting the time and energy in to create it.


I bumped into three blog posts a while ago that listed similar modern alternatives – lots of overlap:

https://darrenburns.net/posts/tools/

https://darrenburns.net/posts/more-tools

https://darrenburns.net/posts/even-more-tools


Awesome. Would be nice if the utils were marked as zero dependency and/or cross platform.



Just what I've been looking for! Thanks for making this. I'd love to see someone build a "new" userland with all the old gnu tools replaced where there's a good modern alternative.


Awesome list. Double awesome that almost all tools are cross-platform.


love it. very easy to understand what the tool can do :)


I’ve mixed feelings about rg in exactly this respect.

On one hand, it’s awesome. It’s both excellent from a programmer’s standpoint and indispensable from a user’s standpoint. It’s just so neat it’s hard not to love it. A remote interviewer thought I just had a custom alias for grep -r and it was hard not to laugh and share the joy and relief that I didn’t.

On the other hand, it’s decidedly anti-Unix, arguably even more so than GNU utilities. The man page goes on for ages, and not because it’s written badly (it’s not), but because the tool does so much. I really wish you didn’t have to bundle all of that functionality into a monolithic binary to get this kind of performance.

For reference, walking directory trees was specifically cut out of the Plan 9 userland and left for du(1) because it was stupid how many implementations of it there were. Yes, Rust’s is in a library, but then you don’t have Unix as your OS, you have Rust instead. “Every language wants to be an OS”[1].

[1]: https://wiki.c2.com/?LanguageIsAnOs


While I don't think anti-Unix is a totally accurate description, there are definitely aspects of ripgrep that aren't compositional. Some of it is performance, but some of it is just about output. Its default output format to a tty is fundamentally not compositional. So I end up building stuff into ripgrep that you might otherwise achieve via pipes. I do draw a line, because it otherwise gets ridiculous.

The plan9 comparison is kind of meh to be honest, because it never achieved widespread adoption. Popularity tends to beget complexity.

But otherwise this isn't so surprising to me. One of the hardest parts about making code fast is figuring out where to fall on the complexity vs perf trade off. It's pretty common that faster code is more complicated code. Directory traversal is a classic example. There are naive approaches that work in most cases, and then there are approaches that work well even when you throw a directory tree containing hundreds of thousands of entries at them. And then of course, once you add gitignore filtering, you really want that coupled with tree traversal for perf reasons.

When you specialize, things get more complex. Then we build abstractions to simplify it and the cycle repeats. This to me seems more like a fundamental aspect of reality than even something specific to programming.


Whoops, that’ll teach me to philosophize in public.

(Next up, an attempt at self-justification, followed by even more philosophizing.)

I don’t think it’s a totally accurate description, either, for what it’s worth :) The use of the GNU tools (and not, I don’t know, Active Directory) as a reference point was in part to signal that the comparison is intentionally horribly biased. In any case it’s more of a provocative synopsis of my sentiments than it is a criticism.

I hope you take the grandparent comment as the idle musings that it is and not as any kind of disapproval. My mixed feelings are actually more about what ripgrep tells us about the limits of pipes-and-streams organization than they are about ripgrep itself.

The tty prettyprinting problem (and the related machine-readable-output problem) is (of course) not limited to ripgrep; I don’t even know how to satisfyingly do this for ls. What people consider ridiculous can differ: I’d certainly say that egrep fork/execing fgrep in order to filter its input through it is ridiculous, but apparently people used to do this without batting an eye. Having a separate program whose only purpose in life is to colorize grep output (outside of some sort of system-wide convention) would be silly, though, and that is a much stronger standard. So no scorn from me here, even if I’d have preferred the man page were a couple screenfuls shorter.

I’m not sure how much stock one should put in adoption. (If AT&T’s antitrust settlement had expired a couple of years later, would we be using it today?) Popularity begets complexity in various ways: handling of edge cases, gradual expansion of scope, fossilization of poorly factored interfaces. Not all of those are worthy of equal respect, except they’re not discrete categories. But I’m not evangelizing, either; only saying that Plan 9 has an unquestionable (implementation) simplicity aesthetic, so it seems useful to keep it in view when answering the question “how complex does it need to be?” Even though its original blessed directory walking ... construct, du -a | awk '{print $2}', is a bit kooky.

The case of traversal with VCS exclusions is fascinating to me, by the way. It looks like it begs to be two programs connected by a unidirectional channel, until you introduce subtree exclusion, at which point it starts begging to be two programs connected by a bidirectional channel instead, and the Unix approach breaks down. I’m very interested in a clean solution to this, because it’s also the essential difference between typesetting with roff and TeX: roff (for the most part) tries to expand complex commands first, then stuff the resulting primitives into the typesetter; TeX basically does the same, except there’s a feedback loop where some primitives affect the expansion state (and METAFONT, which is a much more pleasant programming environment in the same vein, doubles down on that). It seems that some important things that TeX can do and roff can’t are inevitably tied to this distinction. And it’s an important distinction, because it’s the distinction between having your document production be a relatively loosely coupled pipeline of programs (that you can inspect at any point) and having it be a big honkin’ programming environment centered around and accepting only its own language (or plugin interface, but that’s hardly better). I would very much like to have this lack of modularity eliminated, and the walk-with-VCS-exclusion issue appears to be a much less convoluted version of the same even if it’s not that valuable in isolation.


> I hope you take the grandparent comment as the idle musings that it is and not as any kind of disapproval.

Yup, all is well. :-)

My favorite example of trading performance for simplicity is `memchr`. I have a small little write-up on it here: https://docs.rs/memchr/2.4.0/memchr/#why-use-this-crate

The essence of it is that if you want to find a byte in a slice in Rust, then that's really easy:

    fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
        haystack.iter().position(|&b| b == needle)
    }
So why bother building a whole crate for this? Well, because, to make it fast, it takes low thousands of lines of code (to cover the variety of memr?chr{2,3} variants) that make use of platform-specific SIMD instructions.

This is a good example of something the Plan9 folks would probably never ever do. They'd write the obvious code and (probably) demand that you accept it as "good enough." (Or at least, this is my guess based on what I've seen Rob Pike say about a variety of things.)

I have a lot of sympathy for this view to be honest. I'd rather have simpler code. And I feel really strongly about that. But in my experience, people will always look for something faster. And things change. Back in the old days, the amount of crap you had to search wasn't that big. But now that multi-GB repos are totally normal, the techniques we used on smaller corpora start to become noticeably slow. So if you don't give the people fast code, then, well, someone else will.

Anyway, none of this is even necessarily a response to you specifically. I'd say they are also just kind of idle musings too. (And I mean that sincerely, not trying to throw your words back in your face!)

> “how complex does it need to be?”

Yeah, I think "need" is the operative word here. And this is kinda what I meant by popularity breeding complexity I think. How many Plan9 users were trying to search multi-GB source code repos or walk directory trees with hundreds of thousands of entries? When you get to those scales---and you accept that avoiding those scales is practically impossible---and Plan9's "simplicity above everything else" means dealing with lots of data is painfully slow, what do you do? I think you either jump ship, or the platform adapts. (To be clear, I'm not so certain of myself as to say that this is why Plan9 never achieved widespread adoption.)

> The case of traversal with VCS exclusions is fascinating to me, by the way. It looks like it begs to be two programs connected by a unidirectional channel, until you introduce subtree exclusion, at which point it starts begging to be two programs connected by a bidirectional channel instead, and the Unix approach breaks down.

Yeah, I think this (and many other things) are why I consider the Unix philosophy as merely a guideline or a means to an end. It's a nice guardrail, and where possible, hugging that guardrail will probably be a good heuristic that will rarely lead you astray. That's valuable. It's like Newton's laws of motion or believing that the world is flat. Both are fine models. You just gotta know not only when to abandon them, but that it's okay to do so!

But yes, this has been my struggle for the past several years in my open source work: trying to find that balance between performance and complexity. In some cases, you can get faster code without having to pay much complexity, but it's somewhat rare (albeit beautiful). The next best case is pushing the complexity down and outside of the interface (like memchr). But then you get into cases where performance bleeds right up and into APIs, coupling and so on.


> On the other hand, it’s decidedly anti-Unix, arguably even more so than GNU utilities.

When you have a design guideline – in this case, "Do one thing and do it well" – then you must never forget that it's just a guideline, a means to an end, a tool to achieve your goal. If your guideline conflicts with or contradicts your goal, then the guideline is wrong – or should be ignored, at least. Otherwise, the guideline becomes an ideology, and ideologies can be harmful.

I see this too often with various guidelines ("do one thing and do it well", "never use gotos", "linked lists are bad"), which are zealously repeated without understanding whether they make sense in that particular context. (I'm not saying this applies to you, this is a general observation)


bat [0] (a cat replacement) and fzf [1] are the other two I would miss dearly, standing above some other "new" tools I use very regularly.

  [0]: https://github.com/sharkdp/bat
  [1]: https://github.com/junegunn/fzf


See also delta for diffing (and as a drop-in replacement for git diff) and exa for ls.

https://github.com/dandavison/delta

https://github.com/ogham/exa
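
If you want to try delta as git's pager, the setup is roughly this (going from memory of delta's README, so double-check there):

    [core]
        pager = delta
    [interactive]
        diffFilter = delta --color-only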


Yes, I have it enabled as git's diff. I have tried exa but didn't see much difference. Anything remarkable I may be missing from exa?


It has a flag to add a “Git” column to the output, showing the status of each file in the listing (new/modified/etc.). It also has a `--tree` flag which does exactly what you think it does. This let me remove my old `tree` command and consolidate into one.
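
Roughly, if exa is installed (check exa --help for the exact flags):

    exa -l --git    # long listing with a git status column
    exa --tree      # recursive tree view, replacing a separate tree command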


fzf is another that is just absurdly powerful. fzf comes in handy in so many useful little scripts for me.


What do you use fzf for? I can't figure out a use case for it but everyone seems to love it; what am I missing?


The main thing I get out of it is super amazing fuzzy-search of my terminal history. Normally with bash's built in CTRL-R reverse history search, you pretty much have to know EXACTLY what you're searching for, and if there are a lot of similar commands between the most recently run match for a search and the one you're ACTUALLY searching for, you may have a really hard time finding it since you have to go backwards through the commands one at a time. For myself and many people, we were getting really good at typing `history | grep '<regex>'`, maybe with more pipes to grep for further refinement.

But with fzf[1], that whole workflow of searching through your history is probably two orders of magnitude faster. Now you hit CTRL-R and you start typing any random part of the command you're trying to remember. If there was some other part of the command you remember, hit space and type that search term after the first search term. FZF will then show you the last 10-ish matches for all the search params you just typed, AND it will have done all this with no UI lag, no hitching, and lightning fast.

I don't know what other people use FZF for, as this is the SINGLE feature that's so good I can't live without it anymore.

[1] - https://github.com/junegunn/fzf#key-bindings-for-command-lin...


Here's an example with ripgrep. Say you search for something with a lot of results: you can feed those results into fzf and then add additional filters. Useful for when you're not "exactly" sure what your search term needs to be. I use it when ripgrepping a lot of code bases all at once.

I also have a folder of a bunch of curl commands that I can search and apply fuzzy finding on the results that helps me explore to find.

Contrived example, search for "t" in all these files and then pipe to fzf: rg t . | fzf


I use it multiple times every day to switch to already existing git branches, check out remote branches with creation of a tracking branch, or check out remote branches in detached HEAD mode. I made git aliases for those, and this is how it looks in my .gitconfig:

  [alias]
    cof = !git for-each-ref --format='%(refname:short)' refs/heads | sort | uniq | fzf | xargs git checkout
    cor = !git branch --list --remotes | sed 's@origin/@@' | sort | uniq | fzf | xargs git checkout
    cord = !git branch --list --remotes | sort | uniq | fzf | xargs git checkout


1. Built in to zsh so I can fuzzy-find my command history.
2. Similarly, ctrl-t lets me do, for example, `bat <ctrl-t>` and fuzzy-find whatever file I want.
3. It's a vim plugin, so it replaces the need for NERDTree: it can be used as a file explorer, buffer finder, fuzzy line search instead of / search, branch switcher, etc.
4. fzf-tab for zsh gives you fuzzy tab completion in your terminal too.


I use it for switching git branches. I have this in ~/bin/git-bsel:

    git for-each-ref refs/heads/ --format='%(refname:short)' --sort='-authordate' | fzf +s --query "$*" | xargs git checkout


I actually prefer fzy to fzf because it doesn't go fullscreen, but I think the simplest convincing example is:

    alias f='fd | fzy'


Mine are fd [0] to replace `find` and tldr [1] to replace `man`.

[0]: https://github.com/sharkdp/fd

[1]: https://tldr.sh/


I find ripgrep's defaults sensible when searching for text, but fd misses files that I expect it to find -- this has happened a few times now and I have gone back to find.

For instance: 'fd .apk' to search for android builds.


Unlike find, fd follows the rg approach of respecting VCS ignores, which does have both advantages and disadvantages. But if you know you’re specifically searching for a binary, just pass -u (once to include VCS-ignored files, or even twice to also include hidden ones) and you’re golden. In your specific example you probably also want -F (fixed string) or -g (glob), because fd does regexes by default and I doubt you want to type '\.apk'.

As to find, its flexibility certainly comes handy on occasion, but...

  find: paths must precede expression
Aaaargh.
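
Concretely, for the .apk example above, something along these lines should work:

    fd -F .apk           # fixed-string match, still respecting VCS ignores
    fd -u -g '*.apk'     # glob match, also searching gitignored files (add another -u for hidden ones)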


I like cheat.sh as an alternative to tldr which doesn't need to be installed at all, because it can be used with curl:

  $ curl cheat.sh/tar


fd is fantastic as well, and I can get away with tldr 80% of the time before diving into man.


Have you looked at the original man pages for 7th ed. (http://man.cat-v.org/unix_7th) or Plan 9 (http://man.cat-v.org/plan_9_3rd_ed)? They’re basically tldr with less examples, and exhaustive, except there’s so much less stuff to exhaust. I was both pleasantly surprised by how simple they are and unpleasantly surprised by how far we’ve fallen in terms of complexity (though my recent comment https://news.ycombinator.com/item?id=27498189 explains why that’s an oversimplification). The GNU man pages (where they even exist) or the POSIX ones are a usability abomination in comparison. Even BSD’s are still quite bad.


What is so "new wave" about it, when it is mostly about GNU/Linux, POSIX has hardly changed, and it has basically been CLI-like for 50 years?


If I had to succinctly put my finger on it, I would say: it's "new wave" because it doesn't search everything by default. Of course, ripgrep wasn't the first to do that, but maybe is the first to do it while retaining competitiveness with GNU grep.

Of course, what's "new wave" to one person might be "mild evolutionary step" to another. (And perhaps even "evolutionary" implies too much, because not everyone sees smart filtering by default as a good thing.)


Yeah, but that has nothing to do with UNIX.

Kudos for your ripgrep achievements, though.


I would say 90% of how ripgrep works is based on long standing Unix tooling. "Smart" filtering by default is certainly not in the Unix tradition, but to say ripgrep has nothing to do with Unix is just wrong.

To be honest, I don't really know what you're after here. It is fine to say that the "new wave" is not rooted in Unix, but that doesn't mean it's inaccurate to call ripgrep a Unix tool.


What I am after is that there is nothing about new wave UNIX to talk about, unless we are now supposed to start talking about Rust and ripgrep adoption at the Open Group.

Now that would be new wave UNIX


No, that's what you think new wave UNIX is. This strikes me as just pedantic nonsense. You don't need to go around policing what "new wave" is supposed to mean to everyone.


"a group of people who together introduce new styles and ideas in art, music, cinema, etc."

https://www.oxfordlearnersdictionaries.com/definition/englis...


Yup, ripgrep (and ack before it) is definitely a new style! Doesn't mean it has nothing to do with Unix, or that "new wave Unix" is itself inaccurate or nonsensical.

When you start quoting the dictionary to prove a point, maybe it's time to take a step back.


I am really curious what new wave UNIX is supposed to be.


It has no concrete specific definition as far as I'm aware. It's just a colloquial phrase to distinguish it from something that has been around for a long time, while also drawing focus to its similarities.


I think you should read 'new wave unix' as 'new programs used by people that spend a lot of time on the CLI, replacing programs they used before that were most of the time very old and unchanged for quite some time, so not in line with "modern" expectations'. A good example is fd, which works like find but is more intuitive to me (and to many other people): you use fd PATTERN instead of find -name PATTERN or something.


THANK YOU FOR PROVIDING A “If you have not heard of _____ before, this is what it does” LINE RIGHT IN THE RELEASE NOTES.

Seriously: the amount of time I’ve spent trying to figure out what something does after clicking an interesting HN headline just boggles me.

I feel like I want to start using ripgrep just because of that line.


You're welcome. :) Someone suggested it long ago, and adding that blurb is specifically part of my release process: https://github.com/BurntSushi/ripgrep/blob/master/RELEASE-CH...


Highlights: A new short flag, -., has been added. It is an alias for the --hidden flag, which instructs ripgrep to search hidden files and directories.

BREAKING CHANGES: Binary detection output has changed slightly.


Lack of this alias was one of the things that put me off ripgrep. I'm using the silver searcher instead, as it is plenty fast and its CLI makes sense to me and doesn't change; AFAIR it's even mostly the same as ack, which I used before.


loving '-.'. I use --hidden a lot.

also notable: an underlying lib got some vectorization speed improvements.


I didn’t know this was the same author as xsv, a tool I find indispensable now.

Going to install ripgrep ASAP


As a ripgrep user, TIL about xsv. Thanks for mentioning it, definitely one for the toolbelt.

Can't believe I wasn't following BurntSushi on GitHub, what a track record.


Yeah. It almost seems unfair but I'll just use both and be glad they are available.


Haven't tried ripgrep before but finally installed it. It is noticeably faster than older tools and, more importantly, the CLI ergonomics are much better! Thanks!


On a tangential note: there are a bunch of tools rewritten in Rust; here's a handy non-exhaustive list [1].

[1]: https://zaiste.net/posts/shell-commands-rust/


I love Rust, but I love and recommend ripgrep because of its usefulness, reliability and robustness, not because of the language it's written in.

I haven't looked into all the tools in that list, but I would not for example recommend exa over ls as it is simply not reliable enough: if a filename ends in a space, you won't see that in exa, and that bug has been reported for years. To me that is a clear blocker, and if it is still there, I simply cannot trust the file listing from exa, no matter how pretty it may look.


Static linking goes a long way to making Rust worth considering when choosing tools.

Anecdote: I once had to recover a system with a corrupted libpcre.so. This will break almost every standard gnutil. The easiest way to do it without a recovery OS was to use a few alternatives written in Rust, which don't have this problem because they statically link their dependencies (and cargo still worked, so it was easy to install them).


Your anecdote is amusing, but I hope you’re not using it as an example to support your claim that Rust’s static linking should be considered when choosing tools.


Here's a hopefully more exhaustive list that isn't restricted to Rust: https://github.com/ibraheemdev/modern-unix, meaning it can include other awesome tools like fzf (written in Go).


If you like fzf, give skim [1] a go. It's basically a faster (anecdotally) fzf, but it can also be used as a Rust library. I made a little personal to-do manager using it, together with SQLite for persistence. I also improved my fish shell ergonomics with [2].

[1] https://github.com/lotabout/skim [2] https://github.com/zenofile/fish-skim


Thank you for this compilation. I think the demos are a cherry on top; it becomes an easy sell :) Thank you :)


Wasn't ripgrep one of the first inspirations for the "rewrite existing tools in Rust with better usability and performance" wave?

Either way, thanks for a quality tool; all the more so if it inspired others to come up with good Rust tools.


Maybe in Rust. xsv does predate ripgrep though. Maybe ripgrep was the first one to get really popular though. To be honest, I dunno.

With that said, Go tools were doing this before Rust even hit 1.0 as far as I remember. There was 'sift' and 'pt' for "smarter" greps for example, although I don't think either of them ever fully matured.


Little trick if you ripgrep a lot of web projects:

alias rg="rg --max-columns 200 --max-columns-preview"

So now if you hit a minified css or js file, it will truncate any match longer than 200 characters instead of flooding your screen with a million chars line.


This is helpful, but it doesn't (yet) show you the actual matching part of the line, just the first N chars.


The best feature about ripgrep (apart from its great performance): Sane defaults out of the box!


This can be an underrated feature of software, although I feel like we're getting better at it. I remember being so frustrated with guides to using software back in the early 00s where part of the guide would be reconfiguring default settings. It always seemed to be a sign of sub-par quality software.


I unconsciously reach for ripgrep on a daily basis. Finding references is just incredibly easy without the output being cluttered by stuff listed in .gitignore (I'm looking at you, node_modules..)

This has reminded and motivated me to submit a donation to the project.


For those also looking I found it in the FAQ[0]:

------------------------------------------

How can I donate to ripgrep or its maintainers?

As of now, you can't. While I believe the various efforts that are being undertaken to help fund FOSS are extremely important, they aren't a good fit for me. ripgrep is and I hope will remain a project of love that I develop in my free time. As such, involving money---even in the form of donations given without expectations---would severely change that dynamic for me personally.

Instead, I'd recommend donating to something else that is doing work that you find meaningful. If you would like suggestions, then my favorites are:

    The Internet Archive
    Rails Girls
    Wikipedia


* [0] https://github.com/BurntSushi/ripgrep/blob/64ac2ebe0f2fe1c89...


I have a chapter on ripgrep (includes lot of examples for Rust regex as well) here: https://learnbyexample.github.io/learn_gnugrep_ripgrep/ripgr...

The -r option is handy as well for some search and replace problems (faster than GNU sed, literal search with -F, can use PCRE2 when needed, etc): https://learnbyexample.github.io/substitution-with-ripgrep/
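
For example, -r/--replace only rewrites what ripgrep prints (it never modifies files), so a quick literal substitution preview looks like:

    rg -F 'colour' -r 'color' src/    # show matching lines with the replacement applied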

---

I'm currently checking out another tool written in Rust - frawk (https://github.com/ezrosent/frawk) as an awk alternative


ripgrep always struck me as the "industrial strength faberge egg" of Knuth/McIlroy fame [1].

Yes, it's an impressive achievement to make it that fast, but you could get the same or better performance by just using an index to search. I've been using GNU id-utils [2], which uses an index, for a long time, and it gets comparable performance for a fraction of the source code and brain power, and likely energy use too.

[1] http://blobthescientist.blogspot.com/2017/10/knuth.html

[2] https://www.gnu.org/software/idutils/


The comparison doesn't really make much sense IMO. Searching with and without indexing is targeting two very different use cases. I wanted to try idutils and see if it suffered from pitfalls (like if the indexing scanner didn't grab a token) or how it handled regexes. But I couldn't get it compiled using these commands[1]. It failed with:

    gcc -DHAVE_CONFIG_H -I.   -D_FORTIFY_SOURCE=2  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -MT fseterr.o -MD -MP -MF .deps/fseterr.Tpo -c -o fseterr.o fseterr.c
    fseterr.c: In function 'fseterr':
    fseterr.c:74:3: error: #error "Please port gnulib fseterr.c to your platform! Look at the definitions of ferror and clearerr on your system, then report this to bug-gnulib."
       74 |  #error "Please port gnulib fseterr.c to your platform! Look at the definitions of ferror and clearerr on your system, then report this to bug-gnulib."
          |   ^~~~~
    make[3]: *** [Makefile:1590: fseterr.o] Error 1
So I can't even easily get idutils built to try it, but my suspicion is that it misses some key use cases. And from what I can tell, its index doesn't auto-update, so you now have to use your brain power to remember to update the index. (Or figure out how to automate it.)

ripgrep is commonly used in source code repos, where you might be editing code or checking out different branches all the time. An indexing tool in that scenario without excellent automatic updates is a non-starter.

[1] - https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=iduti...


I've been using ugrep these days (https://github.com/Genivia/ugrep) - and am pretty happy with it.


I like this page for a comparison of ripgrep and similar tools:

https://beyondgrep.com/feature-comparison/


For SSDs, if the read queue depth is big, performance improves significantly compared to sequential reads. Does ripgrep have an option to read many files in parallel?


ripgrep uses parallelism by default.

Parallelism applies to directory traversal, gitignore filtering and search. But that's where its granularity ends. It does not use parallelism at the level of single-file search.
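
The thread count can also be set explicitly if the default heuristic doesn't suit a particular disk, e.g.:

    rg -j 8 'pattern' .    # use 8 worker threads (-j/--threads)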


I was actually searching for a superior grep and found ripgrep and ugrep. Both are nice, but ugrep wins for me for its versatility and stronger fuzzy matching.


How does ripgrep compare to Hyperscan? I always wanted to wrap that library in a grep utility and never got around to it


Ripgrep includes one of the most prominent algorithms from Hyperscan internally for some expressions.

Longer story: Ripgrep uses Rust's regex library, which uses the Aho-Corasick library. That does not just provide the algorithm it is named after, but also "packed" ones using SIMD, including a Rust rewrite of the [Teddy algorithm][1] from the Hyperscan project.

[1]: https://github.com/BurntSushi/aho-corasick/tree/4499d7fdb41c...


This post by the author is a great introduction to the techniques used in ripgrep https://blog.burntsushi.net/ripgrep/


That's awesome!


Ripgrep uses one of Hyperscan's string search algorithms. In general, Hyperscan isn't wonderfully suited for a "grep use case", as it's not focused on quick compile times. It was built for cases where the regex sets (and it's usually sets, sometimes large sets) are known in advance and where it's worth doing "heroics" to optimize scanning of these sets on a precompiled bytecode.

It's not a direct comparison with Rust's regex crate or with ripgrep, but this article shows the comparison with re2. Notice that on average it takes 140K worth of scanning to "catch up" with RE2::Set with 10 patterns - the situation would be even more marked with 1 pattern.

https://www.hyperscan.io/2017/06/20/regex-set-scanning-hyper...

Personally, I'm dissatisfied with the approach to regex scanning in Hyperscan (too heavyweight at construction and too complex) but not much more pleased by the Rust regex crate or RE2 (frankly, the whole compile-a-giant-DFA-as-you-go isn't that great either). I feel on the verge of taking another crack at the problem. Lord knows the world needs another regex library...


I personally still really like the lazy DFA approach. Especially for the single-pattern use case. In particular, it handles the case of large Unicode character classes quite well by avoiding the construction of huge DFAs unless the input actually calls for it.

With that said, I have longed for simple ways of composing regexes better. I've definitely fallen short of that, and I think it's causing me to miss a whole host of optimizations. I hope to devote some head space to that in the next year or so.


I'm OK with laziness, but lazy != "lazy DFA". I can't help but think people keep building RE2-style DFA constructions because they don't know how to run NFAs efficiently. The idea that your automata has to keep allocating memory is pretty weird, especially in MT land.

What I'm thinking about lately is sticking a lot closer to the original regular expression parse tree when implementing things. Yes, that leaves performance on the table relative to Hyperscan, but I suspect the compile time could be extremely good. Also, it would be better suited to stuff like capturing and back-references.

Like I said, I suspect I'll be building "Ultimate Engine the Third" sometime in the not-too-distant future (Hyperscan's internal name was "Ultimate Engine the Second", an Iain M. Banks reference).


Yeah, we've had this conversation before. :-) I look forward to seeing what you come up with!


ripgrep's internals are generic over the regex engine, so not only is it possible to plug Hyperscan into ripgrep, but someone has already done it: https://sr.ht/~pierrenn/ripgrep/

(I should really add a link to that in the README.)


I don't see it listed in the notes, but using the --sort-files flag no longer seems to have a performance penalty.


It still disables parallelism. No changes there. If you have an easily reproducible benchmark on a public corpus, I could try to provide an analysis. (Note: if the corpus is small enough, it might not be possible to witness a perf difference. Or it may even be possible that using parallelism is slower due to the overhead of spinning up threads.)


Interesting!

Perhaps the difference is then explained by my choice of search term. The term I tried after upgrading must happen to appear early in the sorted corpus.

I just now tried it with a very rare term, and it does indeed take longer overall to complete the search.


Any idea how this works compared to eg GNU sort --parallel? (or clever tricks with partial sort and merge)?

I'm guessing rg can be faster in general - because of less memory allocation/copying by sorting before outputting?


I'm not familiar with GNU sort's --parallel flag.

That --sort-files disables parallelism is not really a theoretical limitation.

See: https://github.com/BurntSushi/ripgrep/issues/152


Just to jump on the burntsushi bandwagon, his nfldb project taught me postgres. I'll always be thankful for that.


I use `git grep` all the time, and I'm always amazed how crazy fast it is on huge repos. And, of course, that also uses gitignore, has many parameters (including path filters after --).

I guess ripgrep might be worth trying. Anyone used both of these who can compare? I'd like to know what I'm missing out on.


I find the output of rg easier to read vs git grep. However the latter has some useful options ("--and", "--or") that rg lacks:

https://github.com/BurntSushi/ripgrep/issues/875#issuecommen...
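
For anyone who hasn't used those flags, they combine patterns on a per-line basis, e.g.:

    git grep -e 'foo' --and -e 'bar'    # lines containing both foo and bar
    git grep -e 'foo' --or -e 'bar'     # lines containing either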


ripgrep works outside of a git repo


Can someone help me? Why not use "grep -r"? Is it faster or better suited for recursive search?


Check out this post for an in-depth analysis: https://blog.burntsushi.net/ripgrep/



ripgrep is just one in the line of "more ergonomic greps"; for example, ack and ag were popular before ripgrep.


Thanks so much for ripgrep!! It has become my goto for searching large code bases!!


Out of the many traditional CLI commands that have modern counterparts (in Rust) nowadays, this is the only one that really stuck. I need it, it's THAT much better than grep. Thanks, BurntSushi.


One time, I almost accidentally deleted my files because I typoed rm instead of rg. Was saved by the -i flag... Since then, I've been using "alias gr=rg"


Kinda feel like rm should have been a longer command to type, like delete or unlink.


On a Mac, apparently it's safer to let the Finder handle it.

    # Move deleted files to macOS user trash (safer)
    trash () {
      command mv "$@" ~/.Trash ;
    }


I don’t see how this would have helped in this case.


alias rm='rm -i'


There are a number of Emacs packages which provide ripgrep integration in emacs. It makes for a nice combination of tools.


How does it compare to the_silver_searcher?


You might be interested in this blog post: https://blog.burntsushi.net/ripgrep/


There's a comparison to similar tools in the readme

https://github.com/BurntSushi/ripgrep#quick-examples-compari...


similar ergonomics, better performance


I'd like to know this, too


If you're on macOS, you can also install ripgrep via MacPorts: "sudo port install ripgrep"


ripgrep is one of the most powerful tools in my toolkit. It is amazing being able to scan through huge codebases for strings and patterns in what feels like an instant. It eliminates the need for code intelligence or a full fledged IDE in my experience.


I use fd and ripgrep 90% of the time before I need to return to find/grep these days.


I love ripgrep. It’s insanely fast. And very useful to my workflow.


Fun trivia: vscode uses ripgrep for its search feature


When I first started using VS Code, I was googling around for a ripgrep extension for search. It took me a bit to find that, oh, it's already the default. It's shocking getting used to not having 40 years of legacy, coming from emacs :)


It's Emacs, so configuring it to use ripgrep instead of native grep is possible in a few lines of Emacs lisp in your configuration file. See https://stegosaurusdormant.com/emacs-ripgrep/
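
A minimal sketch of the idea (this just points M-x grep at rg; the linked post does something more polished, and these settings are only a guess at sensible defaults):

    ;; Assumes ripgrep is on PATH.
    (setq grep-command "rg -n --no-heading ")
    (setq grep-use-null-device nil)   ; rg doesn't need /dev/null appended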


I had done that. The point is with a modern editor, I don't have to.

Emacs is unusable in a modern dev environment with its defaults. You have to write lisp.


I'm using ivy/avy/counsel in Emacs and there's counsel-rg, which I use all the time. It's fantastic. It's not stock, but it's really easy to configure (and it calls ripgrep under the hood, so you need to install ripgrep).



