
There are hardware reasons for this even setting aside any software scaling inefficiency. For tasks that can use many threads, modern hardware trades per-thread performance for more overall throughput from a given amount of silicon.

When you max out parallelism, you're using 1) SMT hardware threads, which "split" a physical core so that (ideally) each runs at somewhat more than half the core's single-thread speed, and 2) the small "efficiency" cores on newer Intel and Apple chips. Also, a single-threaded run can feed a ton of watts to the one active core, since it doesn't have to share much of the power/cooling budget with the others, letting it boost to a higher clock rate.
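You can see the "split" directly by comparing logical CPUs to distinct physical cores. A rough sketch, assuming Linux sysfs (the `/sys/devices/system/cpu` paths are Linux-specific; the function returns None elsewhere):

```python
import os
from pathlib import Path

def core_topology():
    """Return (logical_cpus, physical_cores), or None if sysfs is unavailable.

    On an SMT machine, logical_cpus is typically 2x physical_cores:
    two hardware threads share each physical core.
    """
    logical = os.cpu_count()
    cpu_dirs = sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*"))
    cores = set()
    for d in cpu_dirs:
        topo = d / "topology"
        try:
            # A physical core is identified by (package, core_id).
            pkg = (topo / "physical_package_id").read_text().strip()
            core = (topo / "core_id").read_text().strip()
        except OSError:
            return None  # non-Linux, or topology not exposed
        cores.add((pkg, core))
    if not cores:
        return None
    return logical, len(cores)

if __name__ == "__main__":
    print(core_topology())
```

On a chip with both performance and efficiency cores, the ratio won't be a clean 2x, since efficiency cores usually don't have SMT siblings.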

All these tricks improve throughput, or you wouldn't see the wall-time reduction and chipmakers wouldn't ship them. But they also increase how long each thread takes to finish a unit of work in a heavily multithreaded run, which is why the total CPU time ends up higher than in a single-threaded run.
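You can measure the effect yourself by comparing wall time against summed CPU time for the same total work at different worker counts. A minimal sketch, assuming a Unix system (it uses `os.times()` to read the CPU time of reaped child processes, which reports zero on Windows); `burn` and the loop size are arbitrary stand-ins for real work:

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

def burn(n):
    """CPU-bound busy work: a hypothetical unit of work."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def measure(workers, n=1_500_000):
    """Run `workers` copies of burn() in parallel; return (wall_s, cpu_s).

    Child CPU time comes from os.times() after the pool shuts down
    (Unix-only; processes must be reaped for it to be counted).
    """
    t0 = time.perf_counter()
    before = os.times()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(burn, [n] * workers))
    wall = time.perf_counter() - t0
    after = os.times()
    cpu = (after.children_user - before.children_user) + \
          (after.children_system - before.children_system)
    return wall, cpu

if __name__ == "__main__":
    for w in (1, os.cpu_count() or 2):
        wall, cpu = measure(w)
        print(f"{w} worker(s): wall {wall:.2f}s, total cpu {cpu:.2f}s")
```

With all cores loaded, you'd expect wall time per unit of work to drop while total CPU time rises, since each worker is running on a slower (shared or efficiency) execution resource.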



