TL;DR: Efficiency difference: you can think of virtual threads as roughly 1,000 times (3 orders of magnitude) less expensive than OS threads in the general case. Exception: if you are doing purely CPU-bound work, regular threads will serve you better (but that's not how most web servers/services operate).
But if you're waiting 100s of milliseconds for your database (or any network service) to respond, and you have many of those (blocking) method/function calls in flight... virtual threads are the way to go in terms of efficiency.
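For concreteness, here's a minimal sketch of that blocking style on a virtual thread (Java 21+). queryDatabase and process are hypothetical stand-ins for real I/O:

```java
public class BlockingOnVirtualThread {
    // Hypothetical stand-in for a real database call; pretend it
    // blocks for ~100 ms waiting on the network.
    static String queryDatabase(String sql) {
        try { Thread.sleep(100); } catch (InterruptedException ignored) {}
        return "row";
    }

    static void process(String row) {
        System.out.println(row);
    }

    public static void main(String[] args) throws InterruptedException {
        // While queryDatabase waits, the virtual thread parks and its
        // carrier OS thread is freed to run other virtual threads.
        Thread t = Thread.ofVirtual().start(() -> process(queryDatabase("SELECT 1")));
        t.join();
    }
}
```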
Great video explaining all of this (at the timestamp; all 30 minutes are worth watching):
On a wimpy old desktop running lots of other stuff, I got 75,000 threads with that snippet (I had to increase the max_map_count tunable first with sysctl -w vm.max_map_count=500000, a knob that is well documented for larger thread counts). Considering that a real-world use case with that much concurrency (such as the "100k threads frequently waiting for 100 ms DB queries" scenario) would run on a bigger machine, and that the actual application context data and TCP connection state would dwarf the thread memory requirements in those blocked contexts, I'll still call virtual threads a solution to quite rare use cases, pending stronger evidence.
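For reference, a hedged guess at the kind of snippet being discussed: it just spawns parked OS threads until the system refuses to create more (the thread counts above come from the machine, not from this code):

```java
public class MaxPlatformThreads {
    public static void main(String[] args) {
        long count = 0;
        try {
            while (true) {
                // Each platform thread maps 1:1 to an OS thread and just parks.
                Thread.ofPlatform().start(() -> {
                    try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) {}
                });
                count++;
            }
        } catch (OutOfMemoryError e) {
            // Typically "unable to create native thread" once OS limits are hit.
            System.out.println("Created " + count + " OS threads before failing");
        }
    }
}
```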
I don't doubt virtual threads are efficient when microbenchmarked against OS threads, but there's only so much to be gained when optimizing a part of the system that wasn't a bottleneck to begin with.
(Also, I hope that in this scenario we have a beefy DB that can actually exploit the concurrency of that many pending queries per client, and isn't just putting them in an ever-growing queue! I guess it could cope; maybe it has 1,000 read-replica servers, each with 100 attached SSDs, or 100 cores serving from in-memory data.)
Right... but using virtual threads I got 1,000,000 (in a second or two...), and likely for a fraction of the memory. I can even get to many millions without a problem.
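A minimal sketch of that experiment (Java 21+): one million virtual threads, each parked on a blocking sleep as a stand-in for I/O:

```java
import java.time.Duration;
import java.util.concurrent.Executors;

public class MillionVirtualThreads {
    public static void main(String[] args) {
        // One virtual thread per task; creation is cheap enough that
        // a million of these can start in seconds.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofSeconds(1)); // simulated blocking I/O
                    return null; // Callable, so the checked exception is allowed
                });
            }
        } // close() waits for all tasks to finish
    }
}
```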
I agree that for a low-volume, typical CRUD-only server that never bursts above a few dozen requests per second, it's not an issue.
Messaging is one special "rare" case.
Imagine a messaging channel in a Slack/Discord/Telegram/WhatsApp-type app with thousands of participants (say 10,000) in one channel. One person posts a message...
9,999 messages are generated (one for each of the other participants). One might say that can be handled with a few machines...
Now imagine a person posts something super interesting to the channel, and 100 people respond with an emoji reaction almost instantly.
Now you have 100 * 9,999 = 999,900 (!) messages, all within a few seconds, and that's for just one message _reaction_. Each of those messages likely involves multiple steps of IO (sending/receiving data over the socket, writing to one or more databases...), so a thread might be blocked for hundreds of milliseconds at various points, which with OS threads would ultimately require on the order of 100-1000x more servers as scale increases.
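Here's a sketch of how that fan-out could look with virtual threads; deliverTo is a hypothetical stand-in for the per-recipient socket and database work:

```java
import java.util.List;
import java.util.concurrent.Executors;

public class FanOut {
    // Hypothetical per-recipient work: a socket write plus one or more
    // database writes, each of which blocks the calling thread.
    static void deliverTo(String recipient, String message) {
        // ... blocking I/O here ...
    }

    static void fanOut(List<String> recipients, String message) {
        // One cheap virtual thread per delivery: while a delivery is
        // blocked on I/O, it parks instead of pinning an OS thread.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (String recipient : recipients) {
                executor.submit(() -> deliverTo(recipient, message));
            }
        } // waits for all deliveries to complete
    }

    public static void main(String[] args) {
        fanOut(List.of("alice", "bob"), "hello");
    }
}
```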
The thing is that you _can_ write an efficient program on regular threads by using asynchronous (non-blocking) techniques. Another name for that style of programming is _callback hell_ :) . And when something does go wrong, it is a lot harder to debug.
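For contrast, a minimal sketch of that asynchronous style using CompletableFuture (fetchUser, fetchOrders, and render are hypothetical stubs). With two stages it looks tame; real chains branch, nest, and pass context along, and a failure's stack trace rarely points at the call that caused it:

```java
import java.util.concurrent.CompletableFuture;

public class CallbackStyle {
    // Hypothetical async stubs; real versions would do non-blocking network I/O.
    static CompletableFuture<String> fetchUser(long id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }
    static CompletableFuture<String> fetchOrders(String user) {
        return CompletableFuture.supplyAsync(() -> "orders-of-" + user);
    }
    static String render(String orders) {
        return "<html>" + orders + "</html>";
    }

    public static void main(String[] args) {
        fetchUser(42)
            .thenCompose(CallbackStyle::fetchOrders) // runs when the previous stage completes
            .thenApply(CallbackStyle::render)
            .exceptionally(ex -> "error page")       // failures surface here, far from their origin
            .thenAccept(System.out::println)
            .join();
    }
}
```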
OS threads on Linux are fast and you can have a lot of them, e.g. https://news.ycombinator.com/item?id=37621887