
If you find yourself needing to search a large (multi-gigabyte) codebase, you should at least try some CLI tools first:

  fl() { # find line
          # rg emits file:line:match; fzf splits on ':' so {1}=file and {2}=line,
          # -n 3.. limits fuzzy matching to the match text, and --ansi lets fzf
          # render the colors rg produces with --color=always
          rg "$@" . --color=always --line-number --no-heading --smart-case |
                  fzf --ansi --preview-window='top:60%:+{2}+3/2' \
                          --preview='bat --style=full --color=always --highlight-line {2} {1}' \
                          --delimiter=':' -n 3.. \
                          --bind "enter:execute(vim {1} +{2})"
  }
Obviously not the same, but I often find it enough. Short demo https://imgur.com/a/hsyINjS
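Since fl just forwards its arguments to rg, anything rg accepts works; the patterns and the glob filter below are only illustrative, not from the demo:

  # search for a pattern, restricted to Nix files
  fl 'mkDerivation' -g '*.nix'
  # plain search anywhere under the current directory (smart-case is already on)
  fl 'openssl'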



Not really comparable, honestly. "Gigabytes" has nothing to do with it. Sourcegraph can, for example, index multiple repositories across multiple languages and link them together at large scale. You're doing an extremely easy case where 99% of the "meaty parts" are written in a single language (Nix) in a single repository (nixpkgs) with a very formulaic structure, and where the answer you're looking for is also in that same repository. Finding 90% of things just isn't hard for that reason. I love Nix, and I love that this works, but it covers only a fraction of the cases these tools handle.

The hard case is a simple extension of your example: you found the definition of package X in nixpkgs. Now how do you find all the users of X across, say, 10 other repositories? Or all of GitHub? That isn't theoretical; if you make a backwards-incompatible API change to a NixOS module, you might want to know exactly that. Suddenly you need a lot more machinery in place to make this work. Now change X to something like an RPC interface defined in protobufs, and change your query to "which clients are using this interface and which servers implement it", keeping in mind these can all be in different languages in different repositories. That is not so easy with ripgrep, but tools like Kythe or Sourcegraph handle it with far, far greater ease.
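For contrast, here is roughly what the manual route looks like with plain CLI tools. The repo names and the option being searched are placeholders, and this already assumes you know which repositories to clone in the first place:

  # shallow-clone each repo you suspect uses the module, then grep across them
  for repo in NixOS/nixpkgs nix-community/home-manager; do
          git clone --depth=1 "https://github.com/$repo" "/tmp/${repo##*/}"
  done
  rg 'services\.myModule' /tmp/nixpkgs /tmp/home-manager

That works until the answer to "which repositories?" is "all of GitHub."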

Also, for many cases you actually need language-aware search: the search engine needs to understand more structure than just UTF-8 bytes to answer you. Ripgrep won't help you find the definition of that fucked-up thing that was defined by a template instantiation, hidden behind a macro in C++, in a header that was generated at build time, which you are only looking up because it was barfed out of some huge stack trace from production. Sourcegraph can answer that instantly with no false positives (assuming you have SCIP indexing as part of your build system).
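A tiny sketch of why plain text search comes up empty there (the symbol name and paths are made up): the identifier only exists in generated sources, so grepping the checked-in tree finds nothing until you've done a full build and know where the generated files land.

  rg 'FooBarHandlerImpl' src/        # no hits: the definition is emitted at build time
  rg 'FooBarHandlerImpl' build/gen/  # hits, but only in generated code, after a full build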

Yes, ripgrep is nice, and I use it all the time when writing nixpkgs patches. But tools like Sourcegraph, Kythe, OpenGrok, etc. are really a completely different class of tool.

And the "X gigabytes" fact isn't really that impressive when you realize all the weight is in the .git/ directory of Nixpkgs; ripgrep will instantly filter that out and never even search it, so it isn't actually searching a working set of that size. The actual pkgs directly is in contrast about 300MB. It still is crazy fast though, no doubt.


No argument; rolling out change sets is also a huge win. My point is that many people don't know about the tools they already have at their disposal.



