Most web servers run logic concurrently (different requests are handled in parallel). The request handlers then need to access a shared data store, and that's where you get a bottleneck (one that people try to solve with all manner of eventually-consistent caches or distributed stores).
Web servers run logic concurrently because it's the sanest architecture for letting people write their handlers in whatever language they want. They certainly don't do it for performance reasons. The average web app programmer also doesn't use threads directly to run their own logic concurrently.
Now you could argue that whatever shared data store they're accessing needs to run things concurrently in order to maximize performance (and certainly many do), but even then we're usually looking at relatively simple shared-memory patterns, often with fairly coarse locking that scales well to a few cores, but not much beyond that (which often doesn't matter since you can saturate your network and/or memory bus before that). Maybe once in a while you'll see a lock-free concurrent data structure, but those are still fairly exotic.
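To make the "coarse locking" pattern concrete, here's a minimal sketch of the kind of store I mean: one lock guarding the whole structure. The store type and its API are hypothetical, just for illustration:

    // A shared in-memory store protected by a single coarse-grained RWMutex:
    // reads can overlap, but every write serializes on the one lock. Simple,
    // and fine up to a few cores, after which the lock itself is the bottleneck.
    package main

    import (
        "fmt"
        "sync"
    )

    type Store struct {
        mu   sync.RWMutex
        data map[string]string
    }

    func NewStore() *Store {
        return &Store{data: make(map[string]string)}
    }

    func (s *Store) Get(key string) (string, bool) {
        s.mu.RLock()
        defer s.mu.RUnlock()
        v, ok := s.data[key]
        return v, ok
    }

    func (s *Store) Set(key, value string) {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.data[key] = value
    }

    func main() {
        s := NewStore()
        var wg sync.WaitGroup
        // Simulate concurrent request handlers all funneling through the one lock.
        for i := 0; i < 8; i++ {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                s.Set(fmt.Sprintf("req-%d", i), "handled")
            }(i)
        }
        wg.Wait()
        fmt.Println(s.Get("req-3"))
    }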
...that scales well to a few cores, but not much beyond that (which often doesn't matter since you can saturate your network and/or memory bus before that)
You're describing limitations of current implementations that seriously hinder scaling. Developers are far from happy with this state of affairs, which, thankfully, is not a necessity. You most certainly will not saturate the network if the data is distributed wisely. Here's a post I once wrote for highscalability.com that discusses the networking issue: http://highscalability.com/blog/2012/8/20/the-performance-of...
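By "distributed wisely" I mean, roughly, that each key has a single owning node and a request only talks to that shard instead of fanning out across the network. A toy sketch (the node addresses and the modulo placement are made up for illustration; real systems would use consistent hashing or range partitioning, and this is not taken from the linked post):

    // Route each key to the one node that owns it, so a lookup touches a
    // single shard rather than broadcasting over the network.
    package main

    import (
        "fmt"
        "hash/fnv"
    )

    var nodes = []string{"10.0.0.1:7000", "10.0.0.2:7000", "10.0.0.3:7000"}

    // ownerOf hashes the key and picks a shard. Fixed modulo is the simplest
    // possible placement; it illustrates the idea, not a production scheme.
    func ownerOf(key string) string {
        h := fnv.New32a()
        h.Write([]byte(key))
        return nodes[int(h.Sum32())%len(nodes)]
    }

    func main() {
        for _, key := range []string{"user:42", "cart:42", "session:abc"} {
            fmt.Printf("%s -> %s\n", key, ownerOf(key))
        }
    }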
The memory bus is a bigger problem in general: if you do enough work on enough cores, you will saturate the bus. But this, too, is simply the current state of affairs, not some insurmountable barrier to scaling.