It's great to see all this discussion. If anyone has a link to any Intel papers or systematic analysis of how sampling profilers interact with Intel superscalar out-of-order processors, that would be appreciated.
HP Z840, two sockets, 64 GB of RAM, SSD. I don't remember the details and I'm at home now.
The most recent tests were done on my four-core/eight-thread laptop. It's running a more recent version of Windows, so it's a "fairer" test in that way, and it reproduces the problem just fine with my artificial test program, CreateProcessTests.