Hacker News new | past | comments | ask | show | jobs | submit login

That point about toolchain complexity just reminds me of where compilers were 20 years ago. You could get significant speed bumps because one vendor had an optimisation another didn't. That's probably still true for things like the intel compiler, for all I know. I do remember at one point doing computer vision stuff before NVidia ate the world that it wasn't uncommon to come across code that expected to have access to intel-compiled maths libraries to be able to run in real-time.

The point is that the free toolchains got good enough that in general you don't need to care, and I think it's reasonable to expect the same dynamics to apply here.




It's still true for compilers -- for instance gcc 10 had a greater than 10% perf improvement for AArch64 targets compared to gcc 9 as a result of various optimisation work: https://community.arm.com/developer/tools-software/tools/b/t...


The difference is that software compilation is a fairly local optimization process - allowing you to change/inline one bit of generated code without significantly affecting the performance of the rest of the generated code - address space is cheap. On the other hand, especially when the fpga approaches capacity, one change could cause a significant portion of the design to have to be rerouted as space is limited causing significant performance differences or not meeting the timing requirements anymore.


> The difference is that software compilation is a fairly local optimization process - allowing you to change/inline one bit of generated code without significantly affecting the performance of the rest of the generated code

Not really.

The uop cache, and L1 code cache, of modern chips is rather small. You can often grossly increase performance locally by loop-unrolling, but if that causes the "hot path" to no longer fit in uop-cache (or L1 cache), then you've lost a chunk of global performance vs a small local-gain in performance.

Global vs local optimization is just a tough subject in general. Even on CPUs (which is probably easier than FPGAs)


OP wrote "a fairly local optimzation". Compared to what a placement algorithm must do for an FPGA, your example is still exactly that, and one that's relatively easy to get right with a few heuristics.


Yep. But to me that only changes what the problem is that the toolchain solves. It doesn't affect the social and economic pressures on how the relevant toolchains evolve.

To a certain (admittedly limited) degree the "limited space" problem with FPGAs maps intriguingly to the "limited memory" conditions that early LISP and FORTH compilers spawned in.


More to the point the market for FGPAs is fundamentally limited by some factors.

If you are making 10,000+ units the economics are overwhelmingly in favor of ASIC over FPGA.

(In particularly if you are the first mover that proves the market with an FPGA you should have started an ASIC design in parallel to it because the second and third movers will have ASIC from day one and a cost structure 10x or 20x better than you!)

This keeps FPGA a niche market because it serves niche markets.


> The point is that the free toolchains got good enough that in general you don't need to care, and I think it's reasonable to expect the same dynamics to apply here.

No the question was: would the vendor want this?

I'm referring to this comment above:

> Programmable logic seems like a space that could be disrupted with a company that has fully open-source tooling around its hardware along with great documentation. (...)




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: