
(Author here) Thanks, and I agree. That said, this is running at 100 MHz on (what I believe is) a relatively low-end FPGA, and my implementation remains simplistic, with many things left to optimize. But this is a very good and deep question: how can we compare these things? What is the right metric? (FLOPS per watt? Err, no, no floating point involved ;) ). I am wondering, and it seems like quite a difficult/subtle question.

I love GPUs. I spent many years working with them (still do!), and they are beautiful pieces of hardware and engineering (as modern CPUs are). They have evolved beyond our craziest dreams since the NVidia register combiners (https://www.khronos.org/registry/OpenGL/extensions/NV/NV_reg...). The performance we get nowadays is absolutely mind-boggling (I often think we don't fully realize how powerful they actually are).

Can we dream of some sort of mixed platform where we could 'burn in' very specific functions into FPGA-type hardware that would seamlessly interact with our modern GPUs/CPUs? Is it already happening?




One could probably go for something like the product of the number of gates involved and the clock frequency to quantify the efficiency of an implementation. Then you can either have a very simple processor with only a few gates running at a high clock frequency to get all the computations done, or a very large parallel implementation with many gates but requiring only a lower clock frequency.
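To make that trade-off concrete, here is a tiny sketch of such a "gates × busy time" comparison. All the numbers are made up for illustration, and `gate_time_product` is just a hypothetical helper, not an established metric:

```python
def gate_time_product(gates, clock_hz, cycles_needed):
    """Gates multiplied by how long the design is busy on one workload.

    Lower is "cheaper" under this (simplistic) metric: a design that ties
    up fewer gates for less time did the same work more efficiently.
    """
    return gates * (cycles_needed / clock_hz)

# Small serial core: few gates, high clock, many cycles per frame.
serial = gate_time_product(gates=5_000, clock_hz=500e6, cycles_needed=1_000_000)

# Wide parallel datapath: many gates, low clock, few cycles per frame.
parallel = gate_time_product(gates=500_000, clock_hz=100e6, cycles_needed=5_000)

print(serial)    # 10.0 gate-seconds
print(parallel)  # 25.0 gate-seconds
```

Under these (invented) numbers the serial core comes out ahead, but tweak the cycle counts and the parallel design wins; the point is only that the metric lets the two extremes be compared on one axis.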

One must obviously count only the gates actually used (for example, excluding a processor's floating-point units if they sit idle) and account for idle time if a frame is completed faster than the frame time. Counting gates might also be somewhat tricky, for example in an FPGA, where multiplexers and memory are used to build look-up tables that then implement gates. One could either count the actual gates in the FPGA, because those are the gates really in use, or count the gates in the design as if it were implemented in an ASIC. On the other hand, the difference is probably just a small constant factor, so it might not matter that much.

In the end, power consumption should capture this pretty well, as it scales with the number of actually switching transistors and the clock frequency. One would still have to account for differences in process technology and especially in supply voltage, which enters the power consumption quadratically.
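For the quadratic voltage dependence, the textbook dynamic-power approximation is P ≈ α·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). A quick sketch with made-up component values:

```python
def dynamic_power(alpha, c_farads, v_volts, f_hz):
    """Classic CMOS dynamic-power approximation: P ~ alpha * C * V^2 * f."""
    return alpha * c_farads * v_volts**2 * f_hz

# Same hypothetical design at two supply voltages, same clock:
p_high = dynamic_power(alpha=0.2, c_farads=1e-9, v_volts=1.0, f_hz=100e6)
p_low  = dynamic_power(alpha=0.2, c_farads=1e-9, v_volts=0.5, f_hz=100e6)

print(p_high / p_low)  # 4.0 -- halving V quarters the dynamic power
```

Which is exactly why comparing raw wattage across chips built on different process nodes (and thus different supply voltages) says as much about the technology as about the design.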




