Can't remember which talk it was from, but I remember enjoying Joe Armstrong speaking about the importance of a language being "correct" rather than fast:
The performance problem is solved. You just wait 20 years.
Nonsense, it still works. Hardware prices are still falling. People aren't writing web apps without garbage collection. Performance/$ is still going to rise.
While I do understand what you mean about writing web apps in managed languages to optimize for programmer productivity rather than code speed, garbage collection is not slow in general. In fact, garbage collection can be faster than manual memory management in some cases, for several reasons: for example, a tracing collector's work can be proportional to the size of the live working set only, while manual collection has to touch each piece of garbage explicitly.
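To make that concrete, here is a minimal C++ sketch (the Node type and the list size are just illustrative) of the "touch every piece of garbage" cost of manual reclamation; C++ has no tracing collector, so the live-set-proportional cost of a copying GC is only described in the comments.

    // Manual reclamation: tearing down a large list frees, and therefore
    // touches, every dead node. A copying collector would instead trace
    // only the objects that are still live and never look at the garbage.
    struct Node {
        int value;
        Node* next;
    };

    int main() {
        Node* head = nullptr;
        for (int i = 0; i < 1000000; ++i)    // build a million-node list
            head = new Node{i, head};

        while (head) {                       // cost proportional to the garbage
            Node* dead = head;
            head = head->next;
            delete dead;                     // one explicit free per object
        }
    }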
There are a lot of things that make GC performance an issue, including the stop-the-world behaviour of some collectors, which makes real-time operation impossible, as well as the tendency of programmers to produce a lot more garbage in their programs. In general, I think most people overstate the perceived inefficiencies of GC when it comes to throughput (latency predictability is the main culprit, IMHO).
The problem with garbage collection is not only the speed penalty. Garbage collection will also use a lot more memory than manual memory management or automatic reference counting. This is especially bad when running in a memory-constrained environment and dealing with data-intensive tasks. Think image or video manipulation on a mobile or embedded device.
Reference counting is very expensive time-wise and memory-wise if you have lots of small objects: memory-wise because of the additional count field each object needs, and time-wise because of the counter update at every gain and loss of a reference. These updates can also easily become huge concurrency bottlenecks, since they introduce false write sharing. Another issue with reference counting is of course that cyclic structures are hard to handle.
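To illustrate both costs at once, here's a small C++ sketch using std::shared_ptr as a stand-in for a refcounted runtime; the per-update atomic traffic and the cycle are the points of interest, and the Node type is just made up.

    #include <memory>

    // Every shared_ptr copy/destruction performs an atomic increment or
    // decrement on a shared control block: the "counter update at every
    // gain and loss of a reference", and a contended cache line under
    // concurrency.
    struct Node {
        std::shared_ptr<Node> next;   // strong edge: keeps the count up
        // std::weak_ptr<Node> next; // a weak edge would break the cycle
    };

    int main() {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;   // b's use_count is now 2
        b->next = a;   // a's use_count is now 2
        // When a and b go out of scope, each count only drops to 1, so
        // neither Node is freed: a leak that plain reference counting
        // cannot reclaim but a tracing GC would.
    }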
It is true that GC works best when you have a sufficient amount of free memory relative to the amount of garbage produced. On any kind of memory-constrained (embedded) device I would naturally treat every memory allocation as a critical issue to handle, with clear memory usage budgets.
I mean, this conversation has been around for easily 10 years. We live in a world where we already have to do this; look at all the progress in the last decade in the prevalence of futures, async I/O, and channels/queues/whatever. Memory bandwidth and latency have proven to be far more of a performance drag, and they have improved far more slowly; the likely sources of future performance problems lie in areas other than single-core clock speed.
As it turns out, Erlang already provides much better memory locality than your average language. Each process runs in a contiguous memory area (heap + stack), which fits perfectly with a many-core world.
This limitation only holds if you can't utilize lots of small cores. Erlang would absolutely rock on a 1024-core machine, where you hardly need a scheduler at all.
Yeah, a wafer-scale CPU with hundreds of gigabytes sprinkled between those cores. Cores aren't the issue, memory bandwidth is. 90% of the silicon will be devoted to communication. Kinda like a brain...
People are inventing decent languages, but that doesn't mean they will be used, since adoption is a matter of fashion and historical accident as much as anything.