Except the key insight of the answer is that the bottleneck is in a different program than the one suspected. Running the suspect program under a profiler wouldn't yield any answers.
Sure, anyone can find the answer if they already know where to look.
Running the suspect program under a profiler would show blocked states; even without that, you would simply see no difference, conclude that the problem must lie elsewhere, and switch to looking at a full system profile.
FWIW, I always start with a full system profile including all thread states when debugging mysterious performance issues, precisely because "the problem is in another process" is such a common occurrence. This is also why it's important to use a sampling profiler specifically.
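To make the "all thread states, sampled" idea concrete, here is a rough sketch, not a real profiler. It assumes Linux and polls /proc/<pid>/stat to tally which state each process is in across many samples. The function names (parse_stat_line, sample_once, sample_states) are made up for illustration, and it only looks at the main thread of each process; per-thread states would need /proc/<pid>/task/<tid>/stat.

```python
import os
import time
from collections import Counter


def parse_stat_line(line):
    """Return (comm, state) from the text of a Linux /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return comm, state


def sample_once(proc_root="/proc"):
    """Take one snapshot of (comm, state) for every visible process."""
    snapshot = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                snapshot.append(parse_stat_line(f.read()))
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-read, or the line was unparsable
    return snapshot


def sample_states(samples=50, interval=0.02):
    """Tally (comm, state) pairs over repeated snapshots."""
    counts = Counter()
    for _ in range(samples):
        counts.update(sample_once())
        time.sleep(interval)
    return counts


if __name__ == "__main__":
    # Hypothetical usage: print the most frequently observed pairs.
    for (comm, state), n in sample_states().most_common(15):
        print(f"{n:6d}  {state}  {comm}")
```

A process that keeps showing up in state D (uninterruptible sleep) or R while the suspect program mostly sits in S is a hint that the wait is happening somewhere else. A real system-wide sampling profiler also records stacks, which this sketch does not attempt.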