
I’m excited for ideas like this to become mainstream. Today’s approaches to heterogeneous and distributed computing feel like what single-core programming must have been like 40 years ago: you have to manage practically everything by hand. Instead, let the compiler or interpreter or whatever figure out where to actually run the code (CPU vs ALU vs GPU vs remote machine #42), what to keep in which tier of the storage hierarchy (L1 vs L2 vs RAM vs disk vs S3), etc.



We have, though. Spark, for example, does this in just about every language. It's been around for ages, is liberally licensed, and deployed at scale in thousands of enterprises.
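
For concreteness, here's roughly what that looks like: a minimal Spark word count in Scala. This is just a sketch; the input and output paths are placeholders, and the cluster settings are assumed to come from the environment.

    // Minimal Spark word count in Scala. The hdfs:// paths are placeholders;
    // the same code runs on one core or a whole cluster, and Spark decides
    // where each partition is computed and cached.
    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("word-count")
          .getOrCreate() // master/cluster settings come from the environment

        spark.sparkContext
          .textFile("hdfs:///data/corpus.txt")   // placeholder input
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)                    // the distributed shuffle
          .saveAsTextFile("hdfs:///data/counts") // placeholder output

        spark.stop()
      }
    }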


Well, Spark does let you distribute certain forms of computation, but it's limited to those forms (streaming, map-reduce), and it has a large operational footprint. It's also lamentable that distributed code written with Spark looks nothing like its non-distributed counterpart.
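
To make that last point concrete, here's a sketch of the same word count against plain Scala collections and against Spark (assuming a SparkSession named `spark` is already in scope). The per-element lambdas are close, but the execution model around them is not:

    // Local Scala: eager, in-memory, nothing to stand up.
    val localCounts = List("a b", "b c")
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (w, ws) => (w, ws.size) }

    // Spark: needs a session, closures shipped to executors must be
    // serializable, and nothing runs until an action like collect().
    val sparkCounts = spark.sparkContext
      .parallelize(List("a b", "b c"))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()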

Something very much like Spark map-reduce can be implemented in ~100 lines of Unison code:

https://www.unison-lang.org/articles/distributed-datasets/

Some videos on Unison's capabilities over and above Spark:

Distributed programming overview: https://www.youtube.com/watch?v=ZhoxQGzFhV8

Collaborative data structures (CRDTs): https://www.youtube.com/watch?v=xc4V2WhGMy4

Distributed data types: https://www.youtube.com/watch?v=rOO2gtkoZ3M

Distributed global optimization with genetic algorithms: https://www.youtube.com/watch?v=qNShVqSbQJM


Spark still references functions by name, though, right? So if a peer says "call this function", I have to trust both that peer AND whatever system resolves the name to actual CPU instructions.

Plus you're significantly limited in how much you can memoize if you can have different versions of the "same" function.
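
Here's a sketch of the content-addressed alternative (the idea behind Unison's approach), written in Scala purely for illustration; the hashing scheme and memo cache here are hypothetical stand-ins, not Unison's actual representation:

    import java.security.MessageDigest

    // Identify a function by a hash of its definition rather than a name.
    // Two different definitions can never collide, even if they share a
    // name, so a (hash, argument) memo key stays valid across versions.
    final case class FnHash(hex: String)

    def hashOf(definition: String): FnHash =
      FnHash(MessageDigest.getInstance("SHA-256")
        .digest(definition.getBytes("UTF-8"))
        .map("%02x".format(_)).mkString)

    val memo = scala.collection.mutable.Map.empty[(FnHash, Int), Int]

    def callMemoized(h: FnHash, f: Int => Int)(x: Int): Int =
      memo.getOrElseUpdate((h, x), f(x))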


Spark is great for distributed computation, but it also has about a million config switches and is generally kind of 'bulky'. EMR makes management a lot easier, but you still have to fiddle with the number of executors, memory settings, etc. That said, it has been through the wars and is generally pretty solid on some pretty large data sets; once you get it configured, it's pretty good. The best part is just writing the Scala code to run the job. Admittedly, it would be great to have something a bit lighter for certain workloads.
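
For reference, these are the kinds of knobs I mean, set programmatically (the values are arbitrary examples, not recommendations):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("tuned-job")
      .config("spark.executor.instances", "8")      // num executors
      .config("spark.executor.memory", "4g")        // per-executor heap
      .config("spark.executor.cores", "2")
      .config("spark.sql.shuffle.partitions", "64")
      .getOrCreate()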



