
> but their instruction window does not seem to scale.

The issue with scaling instruction windows really has more to do with (i) the scheduling logic and (ii) the physical register file than it does with ISA semantics. In particular, the scheduler is usually quite expensive to scale, because it needs to compare each completed result against the not-yet-fulfilled sources on waiting instructions.

> All the dependency information available to a compiler/runtime is lost when code-generating to x86 (same for ARM), yet a processor core needs to extract the same information again in hardware.

I think this is a bit different from what you describe above w.r.t. software dataflow. When you talk about reactive programming in software, dependences are things like event streams or promises that become fulfilled. That's very different from (say) an individual 32-bit register value flowing from producer to consumer instruction, and the compiler usually only knows about the latter type of dependence. Perhaps you're envisioning a different type of reactive/dataflow programming framework though, like chaining together arithmetic transformations or something similarly low-level?
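A small illustration of the two granularities (illustrative only, using stock Python; neither example is tied to any particular framework):

```python
# (1) Coarse-grained, software-visible dependence: a future/promise whose
#     completion the runtime tracks explicitly.
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as pool:
    fut = pool.submit(lambda: 6 * 7)  # dependence = one task's completion
result = fut.result()
print(result)  # 42

# (2) Fine-grained dependence: each value below flows producer -> consumer
#     through (conceptually) a register. The compiler sees this RAW
#     dependence, but it is erased at the ISA level and must be
#     rediscovered by the out-of-order core's renaming/scheduling logic.
a = 6 * 7   # producer
b = a + 1   # consumer -- the dependence an OoO scheduler has to track
print(b)    # 43
```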

---

All of that said, there has been some interesting work in compiling conventional software down to a more "dataflow-like" architecture. See e.g. the TRIPS project at UT Austin [1] from about 10 years ago -- the idea there was to build a fabric of execution units connected by a mesh interconnect on the chip, and then stream data through.

[1] https://www.cs.utexas.edu/~trips/




I was indeed thinking more of fine-grained dataflow or event-driven tasks, in the style of VAL/SISAL or, very recently, OCR (https://xstack.exascale-tech.com/wiki/index.php/Main_Page).

I heard there were some fundamental problems with TRIPS, but never got any detailed explanation. Any idea what they were? Is fine-grained dataflow still too much overhead to do in hardware, or was there something else?

There were some dataflow-critique papers back in the '80s and '90s, but most of their points seem moot these days: low transistor occupancy (compared to the 90%+ of simple RISC microprocessors) and the high cost of the associative memory needed for the token matchers. These days we are working with billions of transistors instead of tens of thousands, and we commonly ship processors with megabytes of associative memory in the form of SRAM cache.
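For reference, the token matching those critiques targeted looks roughly like this (a sketch of the classic tagged-token idea, not any specific machine): tokens carrying (destination instruction, input port, value) are matched in an associative store, and an instruction fires once all its operands have arrived. The dict here stands in for the hardware CAM whose cost was the original objection.

```python
ARITY = {"add": 2}  # operand count per opcode (hypothetical tiny ISA)
pending = {}        # instr_id -> {port: value}; the "matching store"

def inject(instr_id, op, port, value):
    """Insert a token; fire the instruction if all operands have matched."""
    slot = pending.setdefault(instr_id, {})
    slot[port] = value                 # associative match + insert
    if len(slot) == ARITY[op]:         # full operand set: fire
        del pending[instr_id]
        return op, sorted(slot.items())
    return None                        # still waiting

print(inject(1, "add", 0, 10))  # None -- waiting for the second operand
print(inject(1, "add", 1, 32))  # fires: ('add', [(0, 10), (1, 32)])
```

In hardware the dict lookup becomes a wide CAM search, which was expensive at 1980s transistor budgets but is in the same family as the tag-matching SRAM structures shipped in every modern cache.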


Cool, I have some more reading to do -- thanks for that link!

My understanding of TRIPS is that compiler complexity is an issue -- it depends on being able to chunk the program into hyperblocks with fairly high control locality (i.e., execution remains in one hyperblock for a while), because the transition cost is higher than a simple branch on a conventional superscalar. In the best case this can be great (32 ALUs' worth of ILP, IIRC), but in the worst case (e.g., a large code footprint) it can degrade terribly.

IIRC there are some more practical issues too -- e.g., the ISA is specific to the microarchitecture (because it explicitly maps to ALUs in the fabric), so binary compatibility is an issue.

All that said, still a really interesting idea, I think.



