
I’m excited for ideas like this to become mainstream. Today’s approaches to heterogeneous and distributed computing feel like what single-core programming must have been like 40 years ago: you have to manage practically everything by hand. Instead, let the compiler or interpreter or whatever figure out where to actually run the code (CPU vs ALU vs GPU vs remote machine #42), what to keep in which tier of the storage hierarchy (L1 vs L2 vs RAM vs disk vs S3), etc.



We have, though. Spark, for example, does this in just about every language. It's been around for ages, is liberally licensed, and deployed at scale in thousands of enterprises.
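
For concreteness, here's roughly what that looks like: a minimal Spark word count in Scala. This is just a sketch; the input and output paths are placeholders, and the cluster settings are assumed to come from the environment.

    // Minimal Spark word count in Scala. The hdfs:// paths are placeholders;
    // the same code runs on one core or a whole cluster, and Spark decides
    // where each partition is computed and cached.
    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("word-count")
          .getOrCreate() // master/cluster settings come from the environment

        spark.sparkContext
          .textFile("hdfs:///data/corpus.txt")   // placeholder input
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)                    // the distributed shuffle
          .saveAsTextFile("hdfs:///data/counts") // placeholder output

        spark.stop()
      }
    }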


Well, Spark does let you distribute certain forms of computation, but it's limited to those forms (streaming, map-reduce), and it has a large operational footprint. It's also lamentable that distributed code written with Spark looks nothing like its non-distributed counterpart.
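
To make that last point concrete, here's a sketch of the same word count against plain Scala collections and against Spark (assuming a SparkSession named `spark` is already in scope). The per-element lambdas are close, but the execution model around them is not:

    // Local Scala: eager, in-memory, nothing to stand up.
    val localCounts = List("a b", "b c")
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (w, ws) => (w, ws.size) }

    // Spark: needs a session, closures shipped to executors must be
    // serializable, and nothing runs until an action like collect().
    val sparkCounts = spark.sparkContext
      .parallelize(List("a b", "b c"))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()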

Something very much like Spark map-reduce can be implemented in ~100 lines of Unison code:

https://www.unison-lang.org/articles/distributed-datasets/

Some videos on Unison's capabilities over and above Spark:

Distributed programming overview: https://www.youtube.com/watch?v=ZhoxQGzFhV8

Collaborative data structures (CRDTs): https://www.youtube.com/watch?v=xc4V2WhGMy4

Distributed data types: https://www.youtube.com/watch?v=rOO2gtkoZ3M

Distributed global optimization with genetic algorithms: https://www.youtube.com/watch?v=qNShVqSbQJM


Spark still references functions by name, though, right? So if a peer says "call this function", I have to trust both that peer AND whatever system resolves the name to actual CPU instructions.

Plus you're significantly limited in how much you can memoize if you can have different versions of the "same" function.
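
Here's a sketch of the content-addressed alternative (the idea behind Unison's approach), written in Scala purely for illustration; the hashing scheme and memo cache here are hypothetical stand-ins, not Unison's actual representation:

    import java.security.MessageDigest

    // Identify a function by a hash of its definition rather than a name.
    // Two different definitions can never collide, even if they share a
    // name, so a (hash, argument) memo key stays valid across versions.
    final case class FnHash(hex: String)

    def hashOf(definition: String): FnHash =
      FnHash(MessageDigest.getInstance("SHA-256")
        .digest(definition.getBytes("UTF-8"))
        .map("%02x".format(_)).mkString)

    val memo = scala.collection.mutable.Map.empty[(FnHash, Int), Int]

    def callMemoized(h: FnHash, f: Int => Int)(x: Int): Int =
      memo.getOrElseUpdate((h, x), f(x))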


Spark is great for distributed computation, but it also has about a million config switches and is generally kind of 'bulky'. EMR makes management a lot easier, but you still have to fiddle with the number of executors, memory settings, etc. That said, it has been through the wars and is generally pretty solid on some pretty large data sets; once you get it configured, it's pretty good. The best part is just writing the Scala code to run the job. Admittedly, it would be great to have something a bit lighter for certain workloads.
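
For reference, these are the kinds of knobs I mean, set programmatically (the values are arbitrary examples, not recommendations):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("tuned-job")
      .config("spark.executor.instances", "8")      // num executors
      .config("spark.executor.memory", "4g")        // per-executor heap
      .config("spark.executor.cores", "2")
      .config("spark.sql.shuffle.partitions", "64")
      .getOrCreate()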



