Transposed direct form II lets a biquad be computed with two scalar-by-vector multiplications and some shuffling, which I believe should be faster than the proposed matrix solution.
I'd be happy to see benchmarks of that. The problem is that the "shuffling" creates serial data dependencies: each output sample has to wait for the state updated by the previous one. The matrix form doesn't have that problem within a block. Sure, the direct forms need fewer multiplications, but the multiplication count is not what affects performance most.
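To make the dependency argument concrete, here is a rough Python sketch. It is an illustration under assumptions, not a benchmark and not necessarily the exact matrix formulation discussed above: the names `tdf2` and `block2` are made up for this example, `a0` is assumed to be normalized to 1, the block size is fixed at two samples, and the input length is assumed to be even.

```python
def tdf2(b, a, x):
    """Transposed direct form II biquad, one sample at a time."""
    b0, b1, b2 = b
    a1, a2 = a  # a0 is assumed to be normalized to 1
    s1 = s2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + s1            # needs s1 from the previous iteration
        s1 = b1 * xn - a1 * yn + s2  # needs yn, which was just computed
        s2 = b2 * xn - a2 * yn       # needs yn as well
        y.append(yn)
    return y


def block2(b, a, x):
    """Same filter in state-space form, unrolled to two samples per step.

    Every row of M is a dot product with v = [s1, s2, x0, x1]. The rows
    depend only on the state left by the previous block, not on each other.
    Assumes len(x) is even; a trailing odd sample would be dropped.
    """
    b0, b1, b2 = b
    a1, a2 = a
    # One-step state-space model derived from the TDF2 equations:
    #   z' = A z + B x,  y = s1 + b0 * x
    A = [[-a1, 1.0], [-a2, 0.0]]
    B = [b1 - a1 * b0, b2 - a2 * b0]
    # Precomputed products A*A and A*B for the two-step state update
    AA = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    AB = [A[i][0] * B[0] + A[i][1] * B[1] for i in range(2)]
    M = [
        [1.0, 0.0, b0, 0.0],                # y0
        [A[0][0], A[0][1], B[0], b0],       # y1
        [AA[0][0], AA[0][1], AB[0], B[0]],  # s1 after two samples
        [AA[1][0], AA[1][1], AB[1], B[1]],  # s2 after two samples
    ]
    s1 = s2 = 0.0
    y = []
    for n in range(0, len(x) - 1, 2):
        v = [s1, s2, x[n], x[n + 1]]
        # Four independent dot products; none waits on another's result
        y0, y1, s1, s2 = [sum(m * vi for m, vi in zip(row, v)) for row in M]
        y.extend([y0, y1])
    return y
```

Both functions should produce the same samples up to rounding. The block form does more multiplications per sample, but within a block nothing waits on a previously computed output, which is the property the SIMD argument relies on. This sketch says nothing about actual speed; only a benchmark of real vectorized code can settle that.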