Yup, all is well. :-)
My favorite example of trading performance for simplicity is `memchr`. I have a small little write-up on it here: https://docs.rs/memchr/2.4.0/memchr/#why-use-this-crate
The essence of it is that if you want to find a byte in a slice in Rust, then that's really easy:
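Something like this one-liner sketch of the obvious approach (the function name `find_byte` is just illustrative, not the crate's API):

```rust
// The "obvious" version: a linear scan, one byte at a time.
fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

fn main() {
    assert_eq!(find_byte(b"foobar", b'b'), Some(3));
    assert_eq!(find_byte(b"foobar", b'z'), None);
}
```

The iterator version even tends to optimize reasonably well, which makes the case for a dedicated crate seem weak at first glance.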
So why bother building a whole crate for this? Well, because making it fast takes a few thousand lines of code (to cover the variety of memr?chr{2,3} variants) that use platform specific SIMD instructions.

This is a good example of something the Plan9 folks would probably never ever do. They'd write the obvious code and (probably) demand that you accept it as "good enough." (Or at least, this is my guess based on what I've seen Rob Pike say about a variety of things.)
I have a lot of sympathy for this view to be honest. I'd rather have simpler code. And I feel really strongly about that. But in my experience, people will always look for something faster. And things change. Back in the old days, the amount of crap you had to search wasn't that big. But now that multi-GB repos are totally normal, the techniques we used on smaller corpora start to become noticeably slow. So if you don't give the people fast code, then, well, someone else will.
Anyway, none of this is even necessarily a response to you specifically. I'd say they are also just kind of idle musings too. (And I mean that sincerely, not trying to throw your words back in your face!)
> “how complex does it need to be?”
Yeah, I think "need" is the operative word here. And this is kinda what I meant by popularity breeding complexity, I think. How many Plan9 users were trying to search multi-GB source code repos or walk directory trees with hundreds of thousands of entries? When you get to those scales---and you accept that avoiding those scales is practically impossible---and Plan9's "simplicity above everything else" means dealing with lots of data is painfully slow, what do you do? I think you either jump ship, or the platform adapts. (To be clear, I'm not so certain of myself as to say that this is why Plan9 never achieved widespread adoption.)
> The case of traversal with VCS exclusions is fascinating to me, by the way. It looks like it begs to be two programs connected by a unidirectional channel, until you introduce subtree exclusion, at which point it starts begging to be two programs connected by a bidirectional channel instead, and the Unix approach breaks down.
Yeah, I think this (and many other things) are why I consider the Unix philosophy as merely a guideline or a means to an end. It's a nice guardrail, and where possible, hugging that guardrail will probably be a good heuristic that will rarely lead you astray. That's valuable. It's like Newton's laws of motion or believing that the world is flat. Both are fine models. You just gotta know not only when to abandon them, but that it's okay to do so!
But yes, this has been my struggle for the past several years in my open source work: trying to find that balance between performance and complexity. In some cases, you can get faster code without having to pay much complexity, but it's somewhat rare (albeit beautiful). The next best case is pushing the complexity down and outside of the interface (like memchr). But then you get into cases where performance bleeds right up and into APIs, coupling and so on.