OK, I'm not familiar with MATLAB performance. :-) But image processing is very computationally expensive, and the demo here is unbelievably fast. It must be optimized for its specific tasks; you just can't get that kind of speed from a general-purpose package.
In retrospect, it's not surprising that they're optimizing some parts in assembler, even using SSE (Streaming SIMD Extensions, where SIMD stands for Single Instruction, Multiple Data; see http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions). I hadn't heard of SSE, but it's the kind of vector parallelism that gives supercomputers their speed.
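To make the SSE idea concrete, here is a minimal C sketch that adds two float arrays four elements per instruction. It is not code from the demo being discussed. It uses the SSE intrinsics _mm_loadu_ps, _mm_add_ps, and _mm_storeu_ps from <xmmintrin.h>. The function name add_arrays is hypothetical, and the sketch assumes a GCC/Clang-style compiler that defines __SSE__ when targeting x86 with SSE enabled. On other compilers or platforms it falls back to a plain scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ... */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
 * With SSE, each loop iteration adds 4 floats using a single vector add. */
static void add_arrays(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* load 4 floats (unaligned is OK) */
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vsum = _mm_add_ps(va, vb);  /* one instruction, 4 additions */
        _mm_storeu_ps(out + i, vsum);      /* store 4 results */
    }
#endif
    /* Scalar loop: handles the leftover tail, or the whole array without SSE. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

For arrays of length 6, the vector loop handles elements 0 through 3 with one _mm_add_ps, and the scalar loop finishes the last two. Image-processing kernels can apply the same pattern to rows of pixel values.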