It doesn't have to be a slow function, either; it could just be a function that's getting called a ridiculous number of times.
Back at Electric Cloud I was working on debugging an apparently live-locked build process using our tool (this was back in 2004), where make would run forever with the CPU pegged.
It turned out that the particular build tree structure (at the file level) was somewhat pathological for our tool: we were traversing the tree a ridiculous (exponential) number of times relative to its size, constantly recalculating a value that depended on sub-tree size.
After ages spent trying to narrow down the bug, I just went into the debugger and broke in to look at a particular data structure. After doing this a few times I noticed that I was always in the same function. Memoizing the function fixed the problem.
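As a rough illustration of that kind of fix (not the actual Electric Cloud code; the Node class and subtree_size function here are invented for the example), memoizing a repeatedly recomputed sub-tree value can be as simple as:

from functools import lru_cache

class Node:
    # Hypothetical build-tree node, purely for illustration.
    def __init__(self, children=()):
        self.children = tuple(children)

@lru_cache(maxsize=None)
def subtree_size(node):
    # Without the cache this gets recomputed by every traversal that touches
    # the node; with it, each sub-tree is counted exactly once.
    return 1 + sum(subtree_size(child) for child in node.children)

lru_cache needs a hashable argument (here the default object identity suffices); a hand-rolled dict keyed on node id works just as well.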
This is also (roughly) how profiling JIT compilers (like most JVMs) work: the method that gets the most samples is the one deemed worth the more expensive optimizations.
Yeah, this is how Intel describes VTune's operation as well. The trick is that you can't make the measurements from inside your own code; you need an external process doing the interrupting. As a newbie I didn't know this: I took an internal framework meant for very coarse-level profiling and applied it at an absurdly fine level. The result was spending much, much too long trying to make sense of the numbers.
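Something like the following is one way to do the interrupting from an external process - essentially the classic "poor man's profiler" idea driven from Python rather than a shell script. It assumes gdb is installed and that you're allowed to attach to the target pid; the function name and parameters are mine:

import collections
import subprocess
import sys
import time

def sample_stacks(pid, samples=10, interval=1.0):
    # Attach gdb briefly, grab all thread backtraces, and tally the innermost frames.
    counts = collections.Counter()
    for _ in range(samples):
        out = subprocess.run(
            ["gdb", "-batch", "-p", str(pid), "-ex", "thread apply all bt"],
            capture_output=True, text=True).stdout
        for line in out.splitlines():
            if line.startswith("#0"):   # innermost frame of each thread
                counts[line] += 1
        time.sleep(interval)
    return counts.most_common(10)

if __name__ == "__main__":
    for frame, count in sample_stacks(int(sys.argv[1])):
        print(count, frame)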
SIGUSR2: open an interactive console using rconsole (http://code.google.com/p/rfoo/) and let me log and muck around with live objects while the server process keeps running. [NOTE: don't use this on production systems - it opens an RPC port that lets anyone control your process remotely!]
from rfoo.utils import rconsole
rconsole.spawn_server()  # serves an rconsole RPC console from a background thread
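The SIGUSR2 hookup itself isn't shown above; one way to wire it up with the standard signal module (the handler name is mine) would be:

import signal
from rfoo.utils import rconsole

def _start_console(signum, frame):
    # Spawn the rconsole RPC server in a background thread; attach to it with
    # the rconsole client from another terminal.
    rconsole.spawn_server()

signal.signal(signal.SIGUSR2, _start_console)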
I can't help but boggle at the suggestion that in this day and age there are programming environments where doing this is less work than running a real profiler. I mean, come on, profilers that can't handle recursion or can't show anything finer than function-level granularity?
Well, it means compiling with profiling enabled, running, visualizing the output, and then figuring out the hotspots. That takes some amount of work. (Or you run under Valgrind, in which case it's so painfully slow that before you've gotten to the slow part, you've gone home...)
Compared to that, firing up the debugger and hitting break is pretty easy. If the hotspot is like core meltdown hot, it's probably faster.
Plenty of tools don't require recompiling anything for profiling. Running the program once for profiling seems like it should be less work than running and interrupting it 10 times (heck, a lot of tools won't even require restarting the program if it already happens to be running). Visualizing the output by function or by something more fine-grained should be a matter of a single command. And if it's easier to manually find the hotspots by looking at a bunch of stacktraces than the profiler output, there's something badly wrong with said output.
Now, I'm sure there are special circumstances where this method is the right thing. But that's totally different from it apparently being suggested as best practice to newbies.
Wow, I didn't know this technique had a name :)
I used it today trying to figure out why a SharePoint application was eating up the CPU - I broke into the debugger 6-7 times, and 3 of those were in the same function call or its children in the stack trace - problem found!
My team was writing a Linux-based PABX and a round of changes had caused lockups - but a weird kind. The machine was there, but just not responding. It wasn't dead, just maybe 100K times too slow. Yet when I hit Alt-SysRq to shut the machine down, it responded instantly.
So I wrote an Alt-SysRq handler that dumped the stack of the running process. The failed one was usually the one running, so it was pretty easy to find, and the stack trace told us enough about why it was deadlocked to make a fix without further investigation.
It was essentially just a one-sample sampling (Monte Carlo) profiler.
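That handler lived in the kernel, so it isn't reproduced here, but for a user-space Python process the standard faulthandler module gives a comparable signal-triggered stack dump (a stand-in illustration, not the PABX code):

import faulthandler
import signal

# Dump every thread's stack to stderr on SIGUSR1, without stopping the process.
faulthandler.register(signal.SIGUSR1)

Then `kill -USR1 <pid>` takes one sample whenever you want it.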
For people that have done this with success: how many pauses do you consider a "few"? Wouldn't you need to repeat this enough to make the result statistically significant - or is it more like a rule of thumb?
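For a rough sense of the numbers (my own back-of-the-envelope calculation, not from anyone in the thread): if the hotspot takes a fraction p of the runtime, the chance that at least two of n random pauses land in it is a simple binomial calculation:

def seen_at_least_twice(p, n):
    # Probability that a hotspot taking fraction p of runtime shows up in >= 2 of n pauses.
    return 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)

print(seen_at_least_twice(0.5, 5))   # ~0.81: a 50% hotspot usually shows up twice in 5 pauses
print(seen_at_least_twice(0.1, 5))   # ~0.08: a 10% hotspot usually doesn't

So a handful of pauses is plenty for a dominant hotspot, while smaller contributors need many more samples (or a real profiler).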
For Java, running jstack a few times can be much faster and easier than hooking up a profiler, and it will often show your bottleneck or problem right away.
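For instance (just a sketch; it assumes jstack is on the PATH and you know the JVM's pid):

import subprocess
import sys
import time

# Take a handful of thread dumps a couple of seconds apart and eyeball the
# frames that keep showing up near the top.
pid = sys.argv[1]
for _ in range(5):
    print(subprocess.run(["jstack", pid], capture_output=True, text=True).stdout)
    time.sleep(2)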