> If you want to profile a multi-threaded application, you must give an entry point to these profilers and then maybe merge the outputs.
It basically boils down to this: multiprocessing profiling is (currently) a giant pain in the ass. You have to manually attach the profiler yourself whenever you launch another process, and every profiled process produces its own output file.
It's not impossible, it's just very annoying. I've been vaguely meaning to write a thing which hooks the fork() call, automatically starts the profiler in the child process, and aggregates all the results back into a single output once all the children exit.
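
For what it's worth, here is roughly what I have in mind, as a minimal sketch rather than a finished tool. It assumes a Unix system, Python 3.7+ (for `os.register_at_fork`), and the stdlib `cProfile`/`pstats` as the profiler. The names `run_forked`, `merge_profiles` and the `profile.<pid>.prof` file naming are made up for illustration. It also only covers children that exit through this wrapper; a child that calls `os._exit()` on its own would never dump anything.

```python
import cProfile
import os
import pstats

# Profiler for the current (child) process; stays None in the parent.
_child_profiler = None


def _start_child_profiler():
    """Fork hook: runs in every child immediately after os.fork()."""
    global _child_profiler
    _child_profiler = cProfile.Profile()
    _child_profiler.enable()


def _dump_child_profile(out_dir):
    """Stop the child's profiler and write its stats to a per-PID file."""
    if _child_profiler is not None:
        _child_profiler.disable()
        path = os.path.join(out_dir, f"profile.{os.getpid()}.prof")
        _child_profiler.dump_stats(path)


# From here on, every child created via os.fork() starts its own profiler.
os.register_at_fork(after_in_child=_start_child_profiler)


def merge_profiles(out_dir, pids):
    """Combine the per-PID dump files into a single pstats.Stats object."""
    paths = [os.path.join(out_dir, f"profile.{pid}.prof") for pid in pids]
    return pstats.Stats(*paths)  # extra paths are merged via Stats.add()


def run_forked(work, n_children, out_dir):
    """Run work(i) in n_children forked processes and merge their profiles."""
    pids = []
    for i in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: do the work, always dump, and never return to the caller.
            status = 1
            try:
                work(i)
                status = 0
            finally:
                try:
                    _dump_child_profile(out_dir)
                finally:
                    os._exit(status)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)  # wait for all children before aggregating
    return merge_profiles(out_dir, pids)
```

You would call something like `run_forked(my_work, 4, "/tmp/profiles").sort_stats("cumulative").print_stats(10)` to get one combined report. The annoying part that remains is the dump on exit: the fork hook gets you the automatic start, but there is no equally clean hook for "child is about to exit", which is why this sketch wraps the child body itself.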
Multi-process profiling (up to ~1M processes) is obviously bread and butter in the HPC world. That's primarily what the tools I referenced are for. Their more recent Python support may not be as solid, especially if there's no good launch framework to hook into, which would be a good reason for using MPI.