Hacker News new | past | comments | ask | show | jobs | submit login

How does this compare to other regex parsing libraries, performance wise?



He gives a link to this at the bottom: https://github.com/BurntSushi/regexp/tree/master/benchmark/r...

(You have to scroll down to the code block sections to see the numbers).

Rust beats Go pretty soundly (on this specific benchmark), but C beats Rust pretty soundly.

Why is C so much faster?


Yeah, this is really the only comparison I have so far. It was included with a PR for adding regexes to Rust, so hopefully it will make it into the shootout proper in the next release.

I also have some more specific benchmarks against Go: https://github.com/BurntSushi/regexp/tree/master/benchmark

(For any curious readers out there, the benchmark is multithreaded in Rust[1], with a particularly nice use of futures.)

> Rust beats Go pretty soundly (on this specific benchmark), but C beats Rust pretty soundly.

I believe the dynamic regexes also beat Go on this specific benchmark, but the optimizing compiler is likely to blame there. (And to be clear, Go still beats Rust native regexes in some benchmarks, primarily because it has more clever optimizations. It was written by Russ Cox, after all!)

> Why is C so much faster?

The C program is pretty elaborate, but here are my guesses.

* There are many ways to implement regex matching. For example, RE/C++ uses both a very very quick DFA and a NFA simulation. My library only has the (slower) NFA simulation. I don't know what Tcl uses for an implementation strategy, but whatever it is, it's likely a factor.

* If string replacement is done in place, then it's going to avoid a good chunk of allocation. (In this particular benchmark, it would be perfectly legal to do string replacement in place.)

I'm sure there are other things I'm missing, but these seem like good candidates for a good chunk of the difference.

[1] - https://github.com/BurntSushi/regexp/blob/master/benchmark/r...


> Why is C so much faster?

Because processors are made with C in mind, as it's the lingua franca, and so they end up being optimized for it.


> Because processors are made with C in mind, as it's the lingua franca, and so they end up being optimized for it.

No, Rust exposes the C memory model pretty directly, and it uses the backend of an industrial-strength C++ compiler (clang/LLVM). The difference here is algorithmic.

(Rust does have some codegen bugs that affect performance, however, though I suspect they won't affect this code.)


Ok, but why is Python faster than Rust implementation? (And yes, I assume those are C libs behind python).

NOTE: Rust is supposed to have zero cost abstractions. I think this is more to lack of optimization on regexp part, than some fault of language.


The Rust implementation hasn't been extensively microoptimised.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: