This extends a paper from a few years ago in which they described the split architecture; the new paper describes optimizations to the hardware and to the compiler they use with it. Those optimizations increase performance by a factor of about 6 over their earlier work.
Additionally, I don't think this processor executes code strictly linearly. Its hardware can detect functional code, break it down, and run the pieces in parallel; they make full use of their multiport split memory to do something like eight times as much work per cycle as a (heavily pipelined!) Core 2 Duo. I admit that it probably won't work on iterative code, but there's enough functional code floating around that this could see some use as a coprocessor.
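To make the functional vs. iterative distinction concrete, here's a rough sketch in Python. This is my own illustration, not anything from the paper: the function names are made up, and a thread pool is only a stand-in for whatever the hardware actually does. The point is just that calls to a pure function are independent of each other, while a loop that feeds each result into the next step has to run in order.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Pure function: the result depends only on the argument,
    # and nothing outside the function is read or modified.
    return x * x


def parallel_squares(values):
    # Every call to square() is independent of the others, so they can be
    # evaluated in any order (or at the same time). pool.map still returns
    # the results in input order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, values))


def iterate(seed, steps):
    # Loop-carried dependency: each step needs the value produced by the
    # previous one, so the iterations cannot simply be run side by side.
    x = seed
    for _ in range(steps):
        x = (x * x + 1) % 97
    return x
```

Calling `parallel_squares([1, 2, 3, 4])` gives the same list no matter how the work is scheduled, which is the property I'd guess the hardware relies on; `iterate` is the kind of code I'd expect to get no benefit.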