
(Author here) Thanks, and I agree. That said, this is running at 100 MHz on (what I believe is) a relatively low-end FPGA, and my implementation remains simplistic, with many things left to optimize. But this is a very good and deep question: how can we compare these things? What is the right metric? (FLOPS per watt? Err, no, no floating point involved ;) ). I am wondering, and it seems like quite a difficult/subtle question.

I love GPUs. I spent many years working with them (still do!), and they are beautiful pieces of hardware and engineering (as modern CPUs are). They have evolved beyond our craziest dreams since the NVidia register combiners (https://www.khronos.org/registry/OpenGL/extensions/NV/NV_reg...). The performance we get nowadays is absolutely mind-boggling (I often think we don't fully realize how powerful they actually are).

Can we dream of some sort of mixed platform where we could 'burn in' very specific functions into FPGA-type hardware that would seamlessly interact with our modern GPUs/CPUs? Is it already happening?




One could probably go for something like the product of the number of gates involved and the clock frequency to quantify the efficiency of an implementation. Then you can either have a very simple processor with only a few gates running at a high clock frequency to get all the computations done, or a very large parallel implementation with many gates but requiring only a lower clock frequency.
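To make that trade-off concrete, here is a tiny sketch of such a "gates × busy time" comparison. All the numbers are made up for illustration, and `gate_time_product` is just a hypothetical helper, not an established metric:

```python
def gate_time_product(gates, clock_hz, cycles_needed):
    """Gates multiplied by how long the design is busy on one workload.

    Lower is "cheaper" under this (simplistic) metric: a design that ties
    up fewer gates for less time did the same work more efficiently.
    """
    return gates * (cycles_needed / clock_hz)

# Small serial core: few gates, high clock, many cycles per frame.
serial = gate_time_product(gates=5_000, clock_hz=500e6, cycles_needed=1_000_000)

# Wide parallel datapath: many gates, low clock, few cycles per frame.
parallel = gate_time_product(gates=500_000, clock_hz=100e6, cycles_needed=5_000)

print(serial)    # 10.0 gate-seconds
print(parallel)  # 25.0 gate-seconds
```

Under these (invented) numbers the serial core comes out ahead, but tweak the cycle counts and the parallel design wins; the point is only that the metric lets the two extremes be compared on one axis.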

One must obviously count only the gates actually used (for example, excluding a processor's floating-point units if they sit idle) and account for idle time if a frame is completed faster than the frame time. Counting gates might also be somewhat tricky, for example in an FPGA, where multiplexers and memory are used to build look-up tables that then implement gates. One could either count the actual gates in the FPGA, because those are the gates really in use, or count the gates in the design as if it were implemented in an ASIC. On the other hand, the difference is probably just a small constant factor, so it might not matter that much.

In the end, power consumption should capture this pretty well, as it scales with the number of actually switching transistors and the clock frequency. One would still have to account for differences in process technology and especially in supply voltage, which enters the power consumption quadratically.
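For the quadratic voltage dependence, the textbook dynamic-power approximation is P ≈ α·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). A quick sketch with made-up component values:

```python
def dynamic_power(alpha, c_farads, v_volts, f_hz):
    """Classic CMOS dynamic-power approximation: P ~ alpha * C * V^2 * f."""
    return alpha * c_farads * v_volts**2 * f_hz

# Same hypothetical design at two supply voltages, same clock:
p_high = dynamic_power(alpha=0.2, c_farads=1e-9, v_volts=1.0, f_hz=100e6)
p_low  = dynamic_power(alpha=0.2, c_farads=1e-9, v_volts=0.5, f_hz=100e6)

print(p_high / p_low)  # 4.0 -- halving V quarters the dynamic power
```

Which is exactly why comparing raw wattage across chips built on different process nodes (and thus different supply voltages) says as much about the technology as about the design.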




