I think that is largely because they aren't using enough workers. IIRC they use 3 * cpu_count workers for the sync frameworks, which just won't be enough.
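A rough back-of-the-envelope sketch of why the worker count matters. It assumes each sync worker handles exactly one request at a time, and the core count and per-request time are made-up numbers. By Little's law, a pool of blocking workers can't exceed workers / time-per-request, so a fixed 3 * cpu_count pool hits a ceiling as soon as requests spend most of their time waiting on I/O. `sync_throughput_ceiling` is a hypothetical helper, not part of any benchmark tooling.

```python
def sync_throughput_ceiling(workers: int, ms_per_request: float) -> float:
    """Hypothetical helper: upper bound on requests/second for blocking workers.

    Little's law (L = lambda * W): with at most `workers` requests in flight,
    each taking `ms_per_request`, throughput can't exceed workers / W.
    """
    return workers * 1000 / ms_per_request


# Assumed numbers for illustration: 8 cores, ~20 ms per request (mostly DB wait).
cores = 8
workers = 3 * cores  # the 3 * cpu_count sizing
print(sync_throughput_ceiling(workers, 20))      # ceiling with 24 workers
print(sync_throughput_ceiling(2 * workers, 20))  # doubling workers doubles it
```

While the CPU isn't saturated, the ceiling scales linearly with the worker count, which is why an undersized pool makes sync frameworks look worse than they are.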
They also misleadingly de-emphasise latency variation. One Python async framework I'd never heard of was top of the pops on throughput there even though the latency numbers suggested it had pretty much fallen apart in the test.
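To make the latency point concrete, here is a small sketch using invented samples, not real benchmark data. A server whose median looks great can still have a p99 that has blown up, which a headline throughput number hides. It uses the standard library's `statistics.quantiles` with `n=100`, whose 99 cut points run from p1 to p99. `summarize` is a hypothetical helper.

```python
from statistics import median, quantiles


def summarize(latencies_ms):
    """Hypothetical helper: return (p50, p99) for per-request latencies in ms."""
    cuts = quantiles(latencies_ms, n=100)  # 99 cut points: cuts[0] is p1, cuts[98] is p99
    return median(latencies_ms), cuts[98]


# Invented samples for illustration only.
steady = [10.0] * 990 + [15.0] * 10        # consistent, slightly slower median
falling_over = [5.0] * 950 + [2000.0] * 50  # faster median, collapsing tail

print(summarize(steady))
print(summarize(falling_over))
```

The second server wins on median latency but its p99 is more than a hundred times worse, which is the kind of result that should disqualify a "winner" rather than sit in a secondary tab.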
[1] https://www.techempower.com/benchmarks/