That microbenchmark is old and quite imperfect; I'm pretty sure it's not actually measuring what it claims to measure.
The point still stands. Creating threads (at least in a normal OS like a modern Linux) is really fast.
E.g., when processing a huge logfile, it's faster to create one short-lived thread per line than it is to use a thread pool with a producer-consumer queue and condition variables.
https://stackoverflow.com/questions/3929774/how-much-overhea...
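
If you want to poke at this yourself, here's a rough sketch of the comparison in Python. It is not the benchmark from the link, and everything in it is my own placeholder: `work`, `thread_per_line`, `pooled`, the batch size of 64, and the 8 workers. Python threads are real OS threads, but the GIL and interpreter overhead dominate here, so don't expect the numbers to match a C/pthreads measurement; treat it as a way to get an order-of-magnitude feel for spawn cost on your machine.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def work(line):
    # Hypothetical stand-in for real per-line processing.
    return len(line)


def thread_per_line(lines, batch=64):
    """Spawn one short-lived thread per line, at most `batch` alive at once."""
    out = [None] * len(lines)

    def run(i):
        out[i] = work(lines[i])

    for start in range(0, len(lines), batch):
        stop = min(start + batch, len(lines))
        threads = [threading.Thread(target=run, args=(i,))
                   for i in range(start, stop)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return out


def pooled(lines, workers=8):
    """Feed lines to a fixed pool; the executor does the queueing internally."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, lines))


def timed(fn, lines):
    # Wall-clock time for one full pass over the input.
    t0 = time.perf_counter()
    fn(lines)
    return time.perf_counter() - t0


if __name__ == "__main__":
    lines = ["x" * 80] * 20_000  # synthetic "logfile", assumed for the demo
    for fn in (thread_per_line, pooled):
        dt = timed(fn, lines)
        print(f"{fn.__name__}: {dt:.3f}s total, "
              f"{dt / len(lines) * 1e6:.1f} us per line")
```

The batching is only there so the sketch never has thousands of threads alive at once; with trivial work per line you could probably drop it. Which variant wins will depend on how heavy the real per-line work is, so swap in something realistic for `work` before drawing conclusions.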