Monte Carlo profiling (stackoverflow.com)
86 points by mapleoin on Oct 21, 2011 | 19 comments



It doesn't have to be a slow function, either; it could just be a function that's getting called a ridiculous number of times.

Back at Electric Cloud (this was in 2004) I was debugging an apparently livelocked build process using our tool, where make would run forever with the CPU pegged.

It turned out that the particular build tree structure (at the file level) was somewhat pathological for our tool and we were traversing the tree a ridiculous (exponential) number of times relative to its size. We were doing this to constantly calculate a value that depended on sub-tree size.

After ages trying to narrow down the bug I just went into the debugger and broke in to look at a particular data structure. After doing this a few times I noticed that I was always in the same function.

Memoizing the function fixed the problem.
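
The tool in question wasn't Python and these names are invented, but for illustration the fix amounts to caching a recursive subtree-size computation, e.g.:

    import functools

    @functools.lru_cache(maxsize=None)
    def subtree_size(node):
        # without the cache, shared subtrees get re-walked an exponential
        # number of times; with it, each node is computed once
        return 1 + sum(subtree_size(child) for child in node.children)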


This is how common profiling tools (such as gprof) actually work.

"Profiling also involves watching your program as it runs, and keeping a histogram of where the program counter happens to be every now and then."

"The run-time figures that gprof gives you are based on a sampling process, so they are subject to statistical inaccuracy."

-- http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html#SEC11
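
As a rough sketch of that idea (not gprof itself), a Python-level sampler could keep such a histogram like this, assuming a Unix system:

    import collections
    import signal

    samples = collections.Counter()

    def take_sample(signum, frame):
        # record "where the program counter is" at the Python level:
        # the file, function, and line we were interrupted in
        samples[(frame.f_code.co_filename, frame.f_code.co_name, frame.f_lineno)] += 1

    signal.signal(signal.SIGPROF, take_sample)
    signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)  # sample every ~10 ms of CPU time

    # ... run the code you want to profile here ...

    for location, count in samples.most_common(10):
        print(count, location)  # the hotspots are the locations with the most samples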


This is also (roughly) how profiling JIT compilers (like most JVMs) work. The method that needs the more expensive optimizations applied to it is the one that gets the most samples.


Yeah, this is how Intel describes VTune's operation as well. The trick is that you can't make the measurements from inside your own code; you have to have an external process doing the interrupting. As a newbie I didn't know this: I took an internal framework meant for very coarse-grained profiling and applied it at an absurdly fine level. The result was spending much, much too long trying to make sense of the numbers.


Also known as Poor Man's Profiling (see http://poormansprofiler.org/ or http://webcache.googleusercontent.com/search?q=cache%3Apoorm..., as the site seems to be down).

It's a pretty good technique, actually.
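
The script on that page is a shell one-liner around gdb; here is a rough Python sketch of the same idea (my own code, counting individual frames rather than whole collapsed stacks):

    import collections
    import subprocess
    import sys
    import time

    def sample(pid, rounds=10, interval=1.0):
        counts = collections.Counter()
        for _ in range(rounds):
            # ask gdb to attach, dump all thread backtraces, and detach
            gdb = subprocess.run(
                ["gdb", "-p", str(pid), "-batch",
                 "-ex", "set pagination 0",
                 "-ex", "thread apply all bt"],
                capture_output=True, text=True)
            for line in gdb.stdout.splitlines():
                # frame lines look like "#0  0x... in somefunc (...) from libfoo.so"
                if line.startswith("#"):
                    counts[line.partition(" in ")[2].split("(")[0].strip() or line] += 1
            time.sleep(interval)
        return counts

    if __name__ == "__main__":
        for frame, n in sample(int(sys.argv[1])).most_common(20):
            print(n, frame)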


$ pstack PID

Combine with watch

$ watch pstack PID

On RHEL 6, pstack ships in the gdb RPM package.

(bonus: shows all threads)


Yes, I've used this and it's great for anything where you want a C-stack backtrace.

As a variant, if you want a language-level backtrace in an interpreted language, you can:

- set up a signal handler which dumps a (perl, python, etc.) language-level backtrace (see the sketch after this list)

- hit the program with the signal a few times
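
In Python, a minimal version of such a handler might look like this (SIGUSR1 is an arbitrary choice):

    import signal
    import sys
    import traceback

    def dump_stack(signum, frame):
        # print a Python-level backtrace of wherever we were interrupted
        traceback.print_stack(frame, file=sys.stderr)

    signal.signal(signal.SIGUSR1, dump_stack)

Then hit the process with kill -USR1 <pid> a few times; whichever frames keep showing up are your hotspot.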


Great idea!

During testing, my Python server processes handle two signals, SIGUSR1 and SIGUSR2.

SIGUSR1 : runs guppy memory usage & leak profiling (http://guppy-pe.sourceforge.net/)

    import guppy
    _hpy = guppy.hpy()
    log(...,_hpy.heapu(),...)
    log(...,_hpy.heap(),...)
    log(..., <other _hpy info>...)
  
SIGUSR2 : opens an interactive console using rconsole (http://code.google.com/p/rfoo/) and lets me log and muck around with live objects while the server process keeps running. [NOTE: don't use this on production systems; it opens an RPC port that lets anyone control your process remotely!]

        from rfoo.utils import rconsole
        rconsole.spawn_server()
Then on the console:

         $ rconsole
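
For context, the wiring for a setup like this might look roughly as follows; it's a sketch, not the commenter's actual code, and the print calls stand in for real logging:

    import signal

    import guppy
    from rfoo.utils import rconsole

    _hpy = guppy.hpy()

    def on_sigusr1(signum, frame):
        # dump guppy's heap statistics (stand-in for real logging)
        print(_hpy.heap())

    def on_sigusr2(signum, frame):
        # opens an RPC port -- never enable this on production systems
        rconsole.spawn_server()

    signal.signal(signal.SIGUSR1, on_sigusr1)
    signal.signal(signal.SIGUSR2, on_sigusr2)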


Wow, you learn something new every day! That looks awesome. Surprised I've never heard of it before.


I can't help but boggle at the suggestion that in this day and age there are programming environments where doing this is less work than running a real profiler. I mean, come on, profilers that can't handle recursion or that can't show anything finer than function-level granularity?


Well, it means compiling with profiling, running, visualizing the output, and then figuring out the hotspots. That takes some amount of work. (Or you run with valgrind, in which case it runs so painfully slow that before you've gotten to the slow part, you've gone home...)

Compared to that, firing up the debugger and hitting break is pretty easy. If the hotspot is like core meltdown hot, it's probably faster.


Plenty of tools don't require recompiling anything for profiling. Running the program once for profiling seems like it should be less work than running and interrupting it 10 times (heck, a lot of tools won't even require restarting the program if it already happens to be running). Visualizing the output by function or by something more fine-grained should be a matter of a single command. And if it's easier to manually find the hotspots by looking at a bunch of stacktraces than the profiler output, there's something badly wrong with said output.

Now, I'm sure there are special circumstances where this method is the right thing. But that's totally different from it apparently being suggested as best practice to newbies.


There are actually ways to profile Python functions on a per-line basis (e.g. the line profiler from Robert Kern: http://packages.python.org/line_profiler).
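
Usage is roughly as follows (my own toy function; kernprof injects the @profile decorator at run time, so the script is meant to be run under kernprof rather than plain python):

    # example.py -- run with:  kernprof -l -v example.py
    @profile
    def busy():
        total = 0
        for i in range(1000000):
            total += i * i  # line_profiler reports hits and time spent per line
        return total

    if __name__ == "__main__":
        busy()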


Wow, I didn't know this technique had a name :) I used it today trying to figure out why a SharePoint application was eating up the CPU - I broke into the debugger 6-7 times, and 3 of those happened to be in the same function call or its children in the stack trace - problem found!


My team was writing a Linux-based PABX, and a round of changes had caused lockups - but a weird kind. The machine was there, just not responding. It wasn't dead, just maybe 100K times too slow. But when I hit Alt-SysRq to shut the machine down, it responded instantly.

So I wrote an Alt-SysRq handler that dumped the stack of the running process. The failed one was usually the one running, so it was pretty easy to find, and then the stack trace told us enough about why it was deadlocked to make a fix without needing more investigation.

It was essentially just a one-sample sampling (Monte Carlo) profiler.


It also works for finding out which loop a process is stuck in, and in that case with 100% accuracy :)


For people who have done this with success: how many breaks do you consider a "few"? Wouldn't you need to repeat this enough times to make the result statistically significant, or is it more like a rule of thumb?


For Java, running jstack a few times can be much faster and easier than hooking up a profiler, and it will often show you the bottleneck or problem right away.


aka "sampling profiler" as opposed to "instrumentation profiler"



