You're right that rayon (I don't know much about wide) is inherently slower than hand-written optimization. But from what I've seen, I suspect rayon isn't even the performance bottleneck: there's a decent margin for improvement (I'd expect at least double the throughput) without doing anything arcane.