> But the CPU then needs to use a built-in "hardware JIT" to translate the sequential command streams it receives into its internal data-flow operations. Something that's extremely inefficient and overly complicated; besides, it conceptually can't even help much. There is only so much parallelism extractable from an inherently sequential description of computation… (The craziest part about this is that we have plenty of information available at the high level which we happily erase during compilation, just so we can present a "nice flat command stream" to the CPU. That's imho pure insanity.)
Nonsense
Ca. 30 years ago there were (serious!) efforts to fix this at the CPU level: the whole VLIW CPU / smart-compiler-with-explicit-parallelism approach. Intel Itanium, Transmeta, Elbrus, to name a few. It didn't work out well.
Systems research might produce something that looks good but doesn't work out well in the end.
But I already tried to explain why I think those things failed: it's all about the software.
You can't build a "sufficiently smart compiler" if the task is to parallelize an inherently sequential command stream.
This endeavor failed because it was embedded in the wrong overall approach.
The whole point is that people didn't think in terms of systems design any more at that time! They looked at "the hardware problem" in isolation.
The expectation was that you could take a program written for "a fast PDP-7", add some "smart compiler" magic on top, and have it run (hopefully) faster on an improved hardware architecture. This didn't work out; imho to no surprise.
I think the problem lies in the mental model of computers that has dominated our perception for some time now. At least when it comes to general computation, we're almost exclusively left with the model of a sequential command-stream machine: CPUs execute (interpret) a sequence of instructions, and those instructions may alter an "infinite" (or at least "gigantic enough") global state in arbitrary ways. That's the basic model.
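Just to make that model concrete, here is a minimal sketch (toy opcodes, nothing real) of what "a sequence of instructions mutating one big flat state" boils down to. All the hardware ever gets to see is this flat stream; any structure the programmer had in mind is already gone:

```go
package main

import "fmt"

// A toy "sequential command stream machine" (made-up opcodes, purely for
// illustration): one instruction after the other, each free to mutate the
// big flat global state (here: mem).
type Op int

const (
	LOAD  Op = iota // reg[a] = mem[b]
	ADD             // reg[a] = reg[b] + reg[c]
	STORE           // mem[a] = reg[b]
)

type Instr struct {
	op      Op
	a, b, c int
}

func run(prog []Instr, mem []int64) {
	var reg [16]int64
	for _, ins := range prog { // strictly one at a time; each step may depend on all previous ones
		switch ins.op {
		case LOAD:
			reg[ins.a] = mem[ins.b]
		case ADD:
			reg[ins.a] = reg[ins.b] + reg[ins.c]
		case STORE:
			mem[ins.a] = reg[ins.b]
		}
	}
}

func main() {
	mem := []int64{0, 2, 3}
	// mem[0] = mem[1] + mem[2]
	prog := []Instr{
		{LOAD, 1, 1, 0},
		{LOAD, 2, 2, 0},
		{ADD, 0, 1, 2},
		{STORE, 0, 0, 0},
	}
	run(prog, mem)
	fmt.Println(mem[0]) // 5
}
```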
As long as your software is basically written with this model as its background, you can't "fix this at the hardware level", at least not without real magic.
What we do now is mitigate the problem as best we can, with methods that have imho reached completely insane levels.
I think we could find much better solutions if we took a step back and rethought the machine as a whole. I mean, hardware/software co-design is a thing, just not when it comes to general computation, and especially not combined with OS / runtime (software) design.
My current personal favorite idea for how things could be improved and simplified is to see a computer as a graph of event-stream processors. This mental picture scales nicely across different "zoom levels": from raw hardware to the runtime level to (distributed?) application design. Only it would require languages that compile down to such a model efficiently; a command-stream language wouldn't map well, or likely even acceptably (the same problem as with the old "more exotic" hardware designs).
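To make that less abstract, here is a minimal sketch of the mental picture, using Go channels and goroutines as stand-ins for event streams and processors (the names and the trivial linear topology are of course just made up):

```go
package main

import "fmt"

// A toy "graph of event-stream processors": each node consumes events from
// an input stream, does purely local work, and emits events downstream.
// There is no big mutable global state; the wiring between nodes IS the program.
func node(f func(int) int, in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for ev := range in {
			out <- f(ev)
		}
	}()
	return out
}

func main() {
	// Event source: 1, 2, 3
	src := make(chan int)
	go func() {
		for i := 1; i <= 3; i++ {
			src <- i
		}
		close(src)
	}()

	// Wire up a small graph: src -> double -> increment -> sink.
	doubled := node(func(x int) int { return x * 2 }, src)
	result := node(func(x int) int { return x + 1 }, doubled)

	for ev := range result { // sink: prints 3, 5, 7
		fmt.Println(ev)
	}
}
```

The point is that the edges of the graph stay explicit all the way down, instead of being erased into a flat command stream that the hardware then has to reverse-engineer.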
BTW: for number crunching, VLIW and compiler-guided parallelism work quite fine. Vector CPUs, anybody? Those were the fastest computers once, afaik. And from MMX up to AVX the whole idea is a big mainstream success. (We're not even talking about GPUs here of course, that's not the topic; nobody tried to replace "typical" CPUs with GPUs, so this doesn't count.)
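A tiny illustration of why this works for number crunching but not in general (Go syntax purely for illustration; whether a particular compiler actually vectorizes the first shape is a separate question, but it is the shape vector units and VLIW like):

```go
package main

import "fmt"

// Independent iterations: c[i] depends only on a[i] and b[i], so the work can
// be done 4, 8, 16... elements at a time (SIMD/VLIW friendly).
func addVec(a, b, c []float64) {
	for i := range c {
		c[i] = a[i] + b[i]
	}
}

// Loop-carried dependency with a non-associative update: every step needs the
// previous result, so no compiler, however smart, can meaningfully widen it.
// The parallelism simply isn't in the description.
func iterate(x, c float64, n int) float64 {
	for i := 0; i < n; i++ {
		x = x*x + c
	}
	return x
}

func main() {
	a := []float64{1, 2, 3, 4}
	b := []float64{10, 20, 30, 40}
	c := make([]float64, len(a))
	addVec(a, b, c)
	fmt.Println(c, iterate(0.5, 0.1, 8))
}
```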