I have a Verilog module that implements the "Mux-Based Data Reversal" design with an overflow output. Yosys/nextpnr synthesize it with a 102 MHz timing estimate on the Lattice iCE40 HX8K.
As for the 74F design, yeah, it uses the logarithmic approach. It still has massive multiplexers, but there are only 5 stages for a full 32-bit shift, plus the extra gates to handle the carry bit.
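For anyone following along, here's a rough behavioral sketch of the logarithmic approach in Python (not the actual Verilog or the 74F schematic). Each stage is one rank of 2:1 muxes that either passes the word through or shifts it by a power of two, so 5 stages cover shift amounts 0 to 31. The function name `log_shift_right` and the carry convention (carry = last bit shifted out, 0 when the shift amount is 0) are my own assumptions for illustration.

```python
WIDTH = 32
STAGES = 5                  # log2(32): one mux rank per shift-amount bit
MASK = (1 << WIDTH) - 1


def log_shift_right(value, amount):
    """Logical right shift modeled as 5 cascaded mux stages.

    Returns (result, carry). The carry convention here is an assumption:
    it is the last bit shifted out, or 0 if nothing was shifted.
    """
    value &= MASK
    amount &= WIDTH - 1     # only the low 5 bits select stages
    carry = 0
    for stage in range(STAGES):
        step = 1 << stage   # this stage shifts by 1, 2, 4, 8, or 16
        if (amount >> stage) & 1:
            # Highest bit dropped by this stage; after the final active
            # stage this equals original bit (amount - 1).
            carry = (value >> (step - 1)) & 1
            value >>= step
        # else: the mux rank passes the word through unchanged
    return value, carry


result, carry = log_shift_right(0b1011, 2)
print(result, carry)
```

A left shift can reuse the same right-shifting core by bit-reversing the word before and after, which is the idea behind the data-reversal variant as I understand it.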
For what it's worth, you might want to consider logic families other than F. It's no longer the fastest, it can be less well-behaved and more power-hungry than others, and it's getting harder to find and more expensive.
In terms of speed, you'll find that LVC at 3.3V is likely to outperform F at 5V, and AUC at 3.3V will definitely outperform everything short of ECL -- the catch there is that AUC is technically not specified for 3.3V operation. To stay at 5V, LVC is a good choice if available in the functions you need (LVC is maddeningly inconsistent in that some parts run at 3.6V max and others at 5V max), or look around at the various high-speed families otherwise. The big bus drivers in ABT are great if they're available.
Thank you so much, I'll ingest this into my system, haha.
Also, very cool results, that's really impressive!
I'm a computer engineer by trade (primarily FPGAs), so it's always great to see other FPGA engineers (hi!). :)