Rust Performance: A story featuring perf and flamegraph on Linux

anp · on July 24, 2016

I'll be around on and off if anyone wants to discuss, although I hope it is a fairly uncontroversial post. Since so many Rust folks (including myself) come from non-native code, I think it's good to take time to document the discovery process as it pertains to Rust and ideally make it easier for future programmers to pick up.

the_duke · on July 25, 2016

Great writeup.

anp · on July 25, 2016

Thanks!

honkhonkpants · on July 25, 2016

Bounds checking can be a nuisance in Golang as well. One trick I've picked up: if I want to iterate over some small prefix of a slice, say the first 10 bytes of a slice of bytes that is known to be longer than 10 bytes, it is tempting to do this:

  for i := 0; i < 10; i++ {
    something(honk[i])
  }

But if you do that, you get a bounds check on every iteration. It is much faster to do this:

  subhonk := honk[:10]
  for _, v := range subhonk {
    something(v)
  }

There's no range check inside the loop of a golang range iterator, for obvious reasons.

anp · on July 25, 2016

Rust does this as well with iterators and iterator adapters -- I just forgot to include the version that did so in the blog post. Writing a loop like this:

    let mut count = 0;
    for b in &bwt[(i * self.k) + 1 .. r + 1] {
        if b == a {
            count += 1;
        }
    }

gave me similar performance to the filter/count iterators, as both elide bounds checking. I thought I mentioned that in the parting thoughts, but I may be misremembering and I have to run.
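For anyone who wants to compare the variants side by side, here is a rough sketch of the three shapes being discussed. The function names and signatures are made up for illustration; `start` and `end` stand in for `(i * self.k) + 1` and `r + 1` from the snippet above, and `a` is assumed to be a plain `u8`. Whether the indexed version actually keeps its per-element bounds check depends on what the optimizer can prove, so measure before trusting it.

```rust
/// Indexed loop: each `bwt[i]` is a checked access, so a bounds check
/// may remain in the loop body unless the optimizer can prove it away.
fn count_indexed(bwt: &[u8], start: usize, end: usize, a: u8) -> usize {
    let mut count = 0;
    for i in start..end {
        if bwt[i] == a {
            count += 1;
        }
    }
    count
}

/// Slice first, then iterate: the range is checked once when the
/// sub-slice is created, and the iterator needs no per-element check.
fn count_slice_loop(bwt: &[u8], start: usize, end: usize, a: u8) -> usize {
    let mut count = 0;
    for &b in &bwt[start..end] {
        if b == a {
            count += 1;
        }
    }
    count
}

/// The same idea written with iterator adapters (filter + count).
fn count_filter(bwt: &[u8], start: usize, end: usize, a: u8) -> usize {
    bwt[start..end].iter().filter(|&&b| b == a).count()
}
```

All three return the same count, for example 3 for `count_filter(b"abracadabra", 1, 8, b'a')`. The difference is only in where the bounds checks end up.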

pkaye · on July 25, 2016

Good to know. For some reason I felt that the first version would be faster because I am providing more information...

kibwen · on July 24, 2016

For further utility in benchmarking Rust programs, see cargo-benchcmp: https://github.com/BurntSushi/cargo-benchcmp

conradev · on July 25, 2016

Are perf and flamegraph the de facto tools for recording and viewing performance?

I was looking for an alternative to Instruments (macOS) on Linux and I came across callgrind and kcachegrind, which worked really well for me when profiling some Rust code.

anp · on July 25, 2016

I'm not sure what's de facto, but I most commonly see other Rust/Linux users reaching for valgrind (callgrind, cachegrind, etc). I've recently used massif (part of valgrind) to profile heap allocations as well. In the past I also used oprofile because it gave me a nicely annotated source dump of sample rates in different statements.

I am growing to like perf a lot, though. It seems (haven't measured though) to impose less runtime overhead, and the reporting and source annotation tools are top notch.

cmrx64 · on July 25, 2016

It's important to note that perf is a sampling profiler using the CPU performance monitoring unit, whereas callgrind (and all the valgrind tools, in general) uses processor emulation to record every event.