This is an idea that's been around forever, and from a hardware perspective it's super easy. The problem is finding applications for it. You need something difficult enough to require at least dozens of instructions (the FPGA isn't going to run at the CPU clock speed, so it needs to be a decent chunk of work to justify the trip), done often enough to dedicate silicon to (the FPGA can't be dark 99.999% of the time), but not so often that a full custom implementation would be justified instead. Then you need to define a set of custom instructions that map to this logic and build support for those instructions into the compiler - accounting for the fact that you don't just need to dispatch data to the FPGA, you need to reprogram it each time you change instructions, and that takes forever in CPU terms.
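To make the "custom instructions plus compiler support" part concrete, here's a minimal sketch of what invoking an FPGA-backed operation from C might look like, assuming a RISC-V target whose core exposes the fabric through the custom-0 opcode space. The encoding (opcode 0x0b, the funct3/funct7 values) and the operation itself are invented for illustration; a real toolchain would wrap this in an intrinsic or builtin rather than raw inline assembly:

```c
/* Hypothetical sketch: calling an FPGA-backed custom instruction on a
 * RISC-V core. The opcode/funct values below are assumptions, not any
 * real core's encoding; this only shows where the compiler/toolchain
 * work has to happen. */
#include <stdint.h>

static inline uint64_t fpga_op(uint64_t a, uint64_t b)
{
    uint64_t result;
    /* .insn r <opcode>, <funct3>, <funct7>, rd, rs1, rs2
     * Emits a raw R-type instruction in the custom-0 opcode space. */
    __asm__ volatile(".insn r 0x0b, 0x0, 0x00, %0, %1, %2"
                     : "=r"(result)
                     : "r"(a), "r"(b));
    return result;
}
```

And even this is the easy half - before the instruction means anything, the fabric has to have been loaded with the matching bitstream, which is the reprogramming cost mentioned above.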
It's not impossible; it's just very hard, it impacts every single part of the stack, and it's difficult to justify.
Correct, but at the same time that overstates things: very often the choice between a full custom IC and an FPGA is dictated by the expected product volume, not just by functional considerations.
Where either solution is functionally workable, it is typically cheaper to ship FPGAs in released products when the volume is small, while full custom may be cheaper when the volume is in the millions to hundreds of millions.
That includes amortizing the non-recurring engineering (NRE) cost over the total units; NRE is typically much higher for full custom than for FPGA -- although sometimes they are actually in the same ballpark.
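A back-of-the-envelope sketch of that break-even, with all figures invented purely to show the shape of the tradeoff (higher NRE but lower unit cost for full custom, the reverse for FPGA):

```c
/* Toy break-even calculation: every number here is made up for
 * illustration only. The crossover volume is roughly
 * (NRE_asic - NRE_fpga) / (unit_fpga - unit_asic). */
#include <stdio.h>

int main(void)
{
    double nre_asic  = 2000000.0;  /* hypothetical full-custom NRE       */
    double nre_fpga  =  200000.0;  /* hypothetical FPGA-product NRE      */
    double unit_asic =       5.0;  /* hypothetical per-unit ASIC cost    */
    double unit_fpga =      50.0;  /* hypothetical per-unit FPGA cost    */

    /* Below this volume the FPGA product is cheaper overall; above it,
     * the extra NRE of full custom is paid back by the cheaper units. */
    double break_even = (nre_asic - nre_fpga) / (unit_fpga - unit_asic);
    printf("break-even volume: ~%.0f units\n", break_even);
    return 0;
}
```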
Aside from that you are correct: people sometimes imagine that almost any application can be significantly accelerated with FPGAs, but even where fine-grained parallelism exists to be exploited (well known not to be the case in all application areas), the space of viable FPGA solutions is further reduced by the cases where full custom makes better engineering and financial sense.
Maybe they're targeting a different arch layer? Is there any mileage in pushing that sort of tech into on-chip routing? As you get more and more cores, obviously interconnect area becomes more of a problem (and bus bandwidth more constrained). Is there much to be gained from a compiler being able to say "This next bit of code wants as much uncontended bandwidth as you can muster between 5 cores and L1"? That way, actually reconfiguring anything would be a bunch of microcode, rather than something the compiler took direct control over.
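Purely as a thought experiment, such a hint might look like the pragma below - no real toolchain exposes this, and the name and parameters are entirely made up; the point is only that the compiler would express intent, and the hardware would apply whatever interconnect configuration it can via microcode:

```c
/* Hypothetical compiler hint for interconnect reservation; the pragma
 * is invented for illustration and would be ignored (with a warning)
 * by current compilers. */
#include <stddef.h>

void scatter_gather(double *dst, const double *src, const size_t *idx, size_t n)
{
    /* Hypothetical: request an uncontended path between the cores
     * running this region and their shared cache slice. */
    #pragma fabric reserve_bandwidth(cores: 5, target: "L1", policy: "exclusive")
    for (size_t i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}
```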
There's some (perhaps a lot of) potential for this in in-memory analytic processing (e.g. in an analytics-focused DBMS).
Also, a specific potential use of FPGAs is pattern matching over large amounts of text/data: if you do it at all, you're likely to do it often, and it can't be a fixed full-custom implementation since the circuit depends on the specific pattern.
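The software analogue of that point: the matcher's state machine is derived from the pattern itself, which is exactly why a fixed circuit can't cover it - on an FPGA, that per-pattern structure is what would get synthesized into the fabric. Here's a small sketch using a KMP failure table for a literal string (an illustration of "the machine depends on the pattern", not a proposed FPGA design):

```c
/* Count occurrences of a literal pattern using KMP. The fail[] table is
 * built from the pattern alone; in an FPGA matcher, that per-pattern
 * structure is what would be baked into the configured logic. */
#include <stdio.h>
#include <string.h>

int kmp_count(const char *text, const char *pat)
{
    int n = (int)strlen(text), m = (int)strlen(pat);
    if (m == 0 || m > 256) return 0;

    /* fail[i]: length of the longest proper prefix of pat[0..i] that is
     * also a suffix of it. */
    int fail[256];
    fail[0] = 0;
    for (int i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) k++;
        fail[i] = k;
    }

    /* Scan the text once, advancing the state machine per character. */
    int count = 0;
    for (int i = 0, k = 0; i < n; i++) {
        while (k > 0 && text[i] != pat[k]) k = fail[k - 1];
        if (text[i] == pat[k]) k++;
        if (k == m) { count++; k = fail[k - 1]; }
    }
    return count;
}

int main(void)
{
    printf("%d\n", kmp_count("abababca", "abab"));  /* prints 2 */
    return 0;
}
```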