In particular, a lot of it was written by Siu Kwan Lam (https://github.com/sklam), who doesn't seem prone to taking public credit but has been quietly doing an awesome job.
Yes, Nuitka (http://nuitka.net/) and Cython (http://cython.org/) both compile a Python program to a chain of C API calls. The performance benefit isn't typically all that large, since removing dispatch overhead matters much less than using more efficient data representations. Cython has the additional benefit that it lets you declare some typing assumptions, which then enables serious performance gains.
There are also runtime compilers for Python subsets such as Numba (http://numba.pydata.org/) and Parakeet (https://github.com/iskandr/parakeet), which can potentially give you huge speedups (as long as you stay within the world of numerical/array-oriented code).
I have experience with Cython, and it works pretty well -- you basically write Python code, add some type hints, and it will compile and run quite a bit faster. Of course, it will not be as fast as plain C, because it still uses the smarter (but slower) Python data structures, but it provides a nice middle ground between pure Python (or PyPy) and writing the performance-critical part of your software in C by hand, which is a bit more work, especially if you need to transfer multidimensional arrays between C and Python.
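For a concrete (hypothetical) sketch of what those type hints look like -- the `cdef` declarations are what let Cython compile the loop down to C integer arithmetic instead of Python object operations:

```cython
# fib.pyx -- illustrative only; build with e.g. `cythonize -i fib.pyx`
def fib(int n):
    # Typed locals: the loop below becomes a plain C loop.
    cdef int i
    cdef long a = 0, b = 1
    for i in range(n):
        a, b = b, a + b
    return a
```

The untyped version is still valid Cython and still compiles; the annotations are what turn "remove dispatch overhead" into "use C-level data representations", which is where the serious gains come from.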
You can use any Python to write high-performance webapps, because the performance requirements for "webapps" are low and the backends are usually servers/services and/or more constrained by data structures/algorithms/disk than by execution speed.
There really are some "webapps" with tight performance requirements. For example, Google's web frontend is a statically compiled C++ application. And yes, anithero, you could write that sort of thing in Cython and, with some type declarations, get acceptable native performance. You will probably miss a few opportunities for optimization, but you'll catch the low-hanging fruit and get an orders-of-magnitude speedup over plain Python.
Interesting. I was actually thinking of using something like this to take parts of the slower web frameworks that have tonnes of functionality, such as Django's ORM or templating, and get them on par with microframeworks.
I seem to remember that, performance-wise, the RPython guys thought a JIT implementation of Python would be faster because of the dynamic nature of Python.
This is not a compiler that does Python -> LLVM. It is a way to build LLVM IR from Python using bindings to LLVM's C++ API. The alternative is to output LLVM IR as text and call the LLVM command-line tools to do stuff with it.
For example, you can use this to generate LLVM IR at runtime, and use the LLVM JIT to compile and execute that LLVM code.
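To illustrate the "IR as text" alternative mentioned above, here is a minimal sketch in plain Python (string formatting only, no bindings): it emits a valid LLVM IR function as text, which you could then feed to the LLVM command-line tools (e.g. `llc` to produce assembly, or `lli` if the module has an entry point). The helper name `emit_add_function` is made up for the example.

```python
def emit_add_function(name="add"):
    # Emit the textual LLVM IR for: i32 add(i32 a, i32 b) { return a + b; }
    # Doubled braces are literal braces in the f-string.
    return f"""\
define i32 @{name}(i32 %a, i32 %b) {{
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}}
"""

ir = emit_add_function()
print(ir)
```

The API-based approach builds the same module as in-memory objects instead of text, which is what makes runtime JIT compilation (as described above) convenient.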