As far as I remember, one of the main ideas was "dataflow variables", which are lazily evaluated in parallel (?)
I've read about this paradigm in the literature, e.g. with the Lucid language and others. And the book goes into some depth about it.
But somehow this paradigm hasn't caught on, despite the constant production of new experimental languages. And I've never seen a real program in it -- e.g. one over 5K lines that people use.
Anyone have a conjecture on what happened? Is it flawed, or did nobody pay enough attention? I think there is some overlap with async/await which seems to be the dominant concurrency paradigm now (C#, JS, Python, Rust, etc.)
I think one issue is that real systems are composed of multiple languages, and it's hard to bridge a program with wildly different evaluation semantics to C or JavaScript code.
And I guess the paradigm is not "reactive". It wants to control the main loop, but in GUIs and networking code where async/await is appropriate, the app doesn't "own" the thread of control. A new paradigm for batch programs is perhaps of limited use.
I guess I sort of answered my own question -- the model doesn't solve enough problems to justify its cost. (Also, another issue is that it's a model of very fine-grained parallelism, whereas coarse-grained parallelism with limited communication is faster. That's how people optimize in practice.)
> I think there is some overlap with async/await which seems to be the dominant concurrency paradigm now
Dataflow variables are exactly promises where every use that needs the result is awaited; in a language where it's the core paradigm, you just don't write async and await everywhere (and the runtime is often free to abandon calculations whose results won't be used). So, no, it doesn't just have “some overlap” with async/await; async/await is syntax for using the paradigm in a language that is otherwise eager and synchronous.
So it's also wrong to say that it hasn't caught on: the paradigm is pervasive and frequently used in industrial programming, which usually relies on multiparadigm languages rather than languages purely devoted to a single paradigm.
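To make the equivalence with promises concrete, here is a minimal sketch in Python/asyncio (my own toy names -- DataflowVar, bind, read -- not anything from CTM/Oz or the asyncio API): a dataflow variable modelled as a single-assignment future that every reader awaits.

```python
import asyncio

class DataflowVar:
    """Single-assignment variable: reading suspends until it is bound."""
    def __init__(self):
        self._fut = asyncio.get_running_loop().create_future()

    def bind(self, value):
        self._fut.set_result(value)   # binding a second time raises

    async def read(self):
        return await self._fut        # suspends the reader until bound

async def producer(x: DataflowVar):
    await asyncio.sleep(0.1)          # pretend to compute something
    x.bind(42)

async def consumer(x: DataflowVar, y: DataflowVar):
    # In Oz you would just write Y = X + 1; here the await is explicit.
    y.bind(await x.read() + 1)

async def main():
    x, y = DataflowVar(), DataflowVar()
    await asyncio.gather(producer(x), consumer(x, y))
    print(await y.read())             # -> 43

asyncio.run(main())
```

In a language where this is the core paradigm, the await inside consumer is implicit at every use of an unbound variable, which is exactly the syntactic difference described above.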
> (Also, another issue is that it's a model of very fine-grained parallelism, whereas coarse-grained parallelism with limited communication is faster. That's how people optimize in practice.)
Dataflow variables don't require any parallelism, though they can leverage it.
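For instance, the same single-assignment cell can be sketched on top of plain threading primitives (again, DataflowCell is a made-up name for illustration): used from one thread it involves no parallelism at all, and a producer can just as well run on another thread when that helps.

```python
import threading

class DataflowCell:
    """Single-assignment cell: read() blocks until bind() has happened."""
    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        if self._bound.is_set():
            raise ValueError("dataflow cell is single-assignment")
        self._value = value
        self._bound.set()

    def read(self):
        self._bound.wait()            # returns immediately if already bound
        return self._value

# Purely sequential use: no threads, no parallelism, same semantics.
x = DataflowCell()
x.bind(7)
print(x.read() + 1)                   # -> 8

# Optionally, a producer thread can bind while the main thread waits.
y = DataflowCell()
threading.Thread(target=lambda: y.bind(x.read() * 2)).start()
print(y.read())                       # -> 14
```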
Citation needed for both your first and second paragraphs :)
As mentioned in my sibling comment [1] there are lots of definitions of dataflow. It's plausible that the model in CTM can be expressed entirely with async/await, but I'd like to see it.
As for the second claim, if it's caught on, then it should be easy to point to code that uses it. Or name some industrial systems that use it. There are some programming models that live only in specific industries but they also tend to leak out into open source over time.
And what programming language do those systems use? Are they using Mozart/Oz or something else? I wasn't aware of any production usage of that language but I could be wrong.
For dataflow in industrial programming, look at IEC 1131, which is a major standard. It's a different world and, no, it doesn't seem to leak out to the mainstream despite there being some very interesting solutions to I/O-heavy concurrent problems.
Haskell has dataflow support too (just to mention a better-known language), but I'm not sure how popular this is compared to all the other concurrency libraries Haskell provides.
I worked with LabVIEW enough to have an opinion, I think.
The issue is that it isn't actually any easier or faster. The machine still needs to do the same synchronizations and cache invalidations. The engineer becomes very limited in what they can effectively do (recursive structures? They become like {} in Go, performance nightmares).
LabVIEW is very effective in its niche, factory monitoring and automation, but I found beyond that it was a pain and a half.
> LabVIEW is very effective in its niche, factory monitoring and automation, but I found beyond that it was a pain and a half.
I’ve never used LabVIEW, but it’s worth pointing out that it’s only one possible implementation, just like we have many different takes on other paradigms. I’ve heard people say that LabVIEW’s specific design is somewhat flawed, but I can’t comment on that since I’ve not used it myself.
I wouldn't include Excel -- it's similar to but not the same as the model presented in the CTM book.
You could call Excel dataflow, but the problem with that term is that there are a dozen distinct models that can be called "dataflow". It's similar to why "Google Cloud Dataflow" is a bad product name: that's also dataflow, but the term is so general that it's not useful.
For example, in Excel, derived cells are recomputed when their sources change. This is true of some dataflow programming languages but not of others, including the model in CTM. This is expensive and has a lot of bearing on the implementation.
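As a rough illustration of that push-based, spreadsheet-style behaviour (a toy sketch with made-up Cell/derived helpers, not Excel's actual recalculation engine): every write to a source re-runs the formulas of its dependents, which is exactly the work a single-assignment model never has to do.

```python
class Cell:
    """Spreadsheet-style cell: writing it eagerly recomputes its dependents."""
    def __init__(self, value=None):
        self.value = value
        self._dependents = []            # (cell, formula) pairs to refresh

    def set(self, value):
        self.value = value
        for cell, formula in self._dependents:
            cell.set(formula())          # recompute cascades transitively

def derived(formula, *sources):
    """Like entering '=formula' into a cell that references the sources."""
    cell = Cell(formula())
    for src in sources:
        src._dependents.append((cell, formula))
    return cell

a = Cell(1)
b = derived(lambda: a.value + 1, a)      # roughly B1 = A1 + 1
print(b.value)                           # -> 2
a.set(10)                                # editing A1 re-runs B1's formula
print(b.value)                           # -> 11
```

Keeping that cascade cheap is why real spreadsheets track dirty cells, which is where the tradeoffs described next come in.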
In fact, I learned a few years ago that Excel is occasionally wrong in both directions:
- it fails to recompute cells that are dirty in the name of speed
- it recomputes cells that are clean even though it's not strictly necessary
That's probably a fine tradeoff for Excel but you wouldn't want a programming language with those semantics.
I'm interested in any contrary opinions though.