Didn't post this, but am one of the authors of Legion (more on the language side / Regent [1]). Feel free to ask me questions. I do think our front page does an especially bad job of explaining what we do, so I'm sure you'll have some. :-)
[1]: http://regent-lang.org/
So based on my quick checks... is this task-based dynamic scheduling (as opposed to static thread-based scheduling) that extends out to CUDA / OpenMP / other backends?
Which sounds pretty good to me. I'm writing a very basic version of future + task-based scheduling for a chess AI, so I think this methodology is very powerful. I haven't dug very deeply into your work, but it seems like you guys have a good idea of what you're doing.
Seems like a cool project. Unfortunately, my home computer has an AMD Vega 64, so I won't be able to use Legion. But it's still got a methodology that I personally think is really cool.
---------
The silly example that "sold" me on task-based scheduling was Intel's "Stupid Fibonacci" example. I think you guys should put a similar example on your website: https://software.intel.com/en-us/node/506102
Stupid parallel Fibonacci is basically fizzbuzz; every programmer should understand it. And while it's not very useful, you should be able to represent it in Legion in less than 10 lines of code.
But anyway, the reason we don't really push Fibonacci is that it (a) doesn't really expose the unique features of Legion/Regent and (b) if you write it naively, it probably performs even worse than naive Fibonacci implementations on other parallel frameworks.
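That said, since you ask, here's roughly what it looks like in Regent. This is a sketch in the spirit of our tasks-and-futures tutorial, untested as written here, so take the exact details with a grain of salt:

    import "regent"

    -- Each call to fib is a task launch. The two recursive calls
    -- share no data, so the runtime is free to run them in parallel;
    -- a task's result acts like a future until it's actually used.
    task fib(n : int) : int
      if n < 2 then return n end
      return fib(n - 1) + fib(n - 2)
    end

    task main()
      regentlib.c.printf("fib(10) = %d\n", fib(10))
    end

    regentlib.start(main)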
The real secret sauce in Legion is being able to load a mesh or whatever, partition it a couple of different ways (e.g. for ghost cells vs. the main computation), and then launch tasks without needing to think about what needs to be updated where to get the data to the right place, or about copying data down to the GPU and back.
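To give a flavor of that, here's a toy Regent sketch (made-up field and task names, standing in for a real mesh code, and again untested as written):

    import "regent"

    fspace Cell { temperature : double }

    -- The privileges in the where clause tell the runtime exactly
    -- what this task reads and writes; any copies needed to satisfy
    -- them (including to/from GPU memory) are issued automatically.
    task step(cells : region(ispace(int1d), Cell))
    where reads writes(cells.temperature) do
      for c in cells do
        cells[c].temperature += 1.0
      end
    end

    task main()
      var grid = region(ispace(int1d, 1024), Cell)
      fill(grid.temperature, 0.0)
      var colors = ispace(int1d, 4)
      -- One way to partition the grid; a real code would add a
      -- second, overlapping partition for the ghost cells.
      var blocks = partition(equal, grid, colors)
      for c in colors do
        step(blocks[c])  -- no explicit data movement anywhere
      end
    end

    regentlib.start(main)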
We will add AMD GPU support eventually, since Frontier is going to be a GPU machine, but probably not on a time scale that makes any difference to you.
What tutorial/paper/GitHub repo/demo do you think shows the heart of why this is the best thing? I'm just not sure what this does better than some of the other strategies out there: Hadoop/HDFS-based, S3/Lambda-based, the privileges/resources style of k8s/cgroups/containers, or the task-based parallelism you can see in Apache Airflow, Jenkins, and other job runners.
If you have a tolerance for video, I'd recommend the bootcamp [1]. Otherwise the tutorial [2].
I'm not familiar with all of the systems you mentioned, but I'd think of it more as a Spark or Dask or MPI competitor and less as a job scheduler. We have tasks, the runtime automatically identifies dependencies between tasks based on what data they use, and the support for describing different partitionings of data is very expressive.
Is this Legion project related to the late-1990s Legion, which involved distributed computing for HPC, was funded by the same agencies, and had collaborations with the same HPC organizations?
It appears not to be. The earlier Legion was developed at the University of Virginia. I see no reference to it on the web page or in the sources on GitHub.
A couple of tentative comments from skimming the paper; I haven't seen Partisan before so take this for what it's worth.
Partisan is an actor model runtime that appears to be (mostly?) compatible with Erlang. Its main selling point is a set of optimizations that improve the performance of actors; these are impressive, but orthogonal to many of the issues Legion attempts to address.
In particular, Partisan seems to mostly take the actor model as-is (in fact this seems to be a goal, so that their runtime can handle existing Erlang applications). That means it's going to have all the usual pitfalls that can occur with actor-based programming. There is a class of bugs that shared-nothing actors make impossible (i.e. concurrent access to shared state). But there is (in my view at least) a much larger class of concurrency bugs that conventional actor models do not attempt to address at all. (I am sure you can find some actor model implementations that attempt to address this, but this effort does not appear to be one of them.)
In contrast, Legion's semantics are sequential. You cannot write a data race, deadlock, or any other sort of concurrency bug because the code behaves as if it were executing literally sequentially. Any parallelism that occurs in the execution is guaranteed by the runtime system to preserve the original sequential semantics of the program. Any access to distributed data is guaranteed to be local because the runtime will move it for you before the task starts executing.
To put it another way, in Legion you have to explicitly think about how to divide up your program (tasks) and data (regions/partitions), but once you do that, the parallelism and distributed memory are implicit. In contrast, nearly all of the common explicitly parallel programming models make one or both of the first two implicit (though the user still has to think about them, because data structures have to be divided, e.g. between actors), and require the user to do the last one explicitly (which is where concurrency bugs creep in).
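Here's a small Regent sketch of what I mean (toy task names, untested as written; the point is that it reads top to bottom as ordinary sequential code):

    import "regent"

    task scale(r : region(ispace(int1d), double), f : double)
    where reads writes(r) do
      for i in r do r[i] *= f end
    end

    task total(r : region(ispace(int1d), double)) : double
    where reads(r) do
      var s = 0.0
      for i in r do s += r[i] end
      return s
    end

    task main()
      var data = region(ispace(int1d, 100), double)
      fill(data, 1.0)
      var p = partition(equal, data, ispace(int1d, 2))
      scale(p[0], 2.0)  -- these two launches touch disjoint
      scale(p[1], 3.0)  -- subregions, so they may run in parallel
      -- total reads all of data, so the runtime makes it wait for
      -- both scale tasks; there are no locks or waits in user code.
      regentlib.c.printf("total = %f\n", total(data))
    end

    regentlib.start(main)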
Chapel is a bit difficult to compare to since the language is so broad. It provides something of a kitchen sink in terms of features: everything from network atomics up through forall-style parallel loops that automatically distribute their iterations across the machine. The upshot of this is that you'll never run into a point where you just flat out can't do something. On the other hand, it's not necessarily obvious how to do what you want (and do it efficiently).
One fundamental difference with Legion is that Chapel is explicitly parallel. As I noted in another comment, Legion has sequential semantics [1]. As far as I know there is no subset of Chapel that has sequential semantics while still retaining the ability to execute code in parallel. This means there are classes of bugs that Legion avoids that Chapel cannot.
On the other hand, Chapel lets users write code that is pretty high-level while still retaining the ability to do distributed execution. In particular, Chapel's forall loops can be combined with domain maps to get distributed-memory execution of code that, if not literally sequential, still looks pretty darn nice.
Domain maps in Chapel provide the ability to specify how data is distributed around the machine in a way which is decoupled from e.g. the forall loops over the data. There is a surprising amount of subtlety to this, and I admit I initially underestimated exactly how important domain maps are in Chapel. In particular, domain maps are permitted to make internal copies of remote data, which means that domain maps can be responsible for ghosting/halo exchanges. Because domain maps are user customizable, this essentially permits the user to put their low-level data movement code in the domain map, while keeping the rest of the code clean and high-level.
The closest analog to domain maps in Legion would be partitions. But in many ways they're completely different beasts. Partitions in Legion offer a similar amount of expressiveness, in the sense that the user can specify partitions to contain any subsets of the data. But in Legion, data movement is fully automatic. Legion does not permit, or require, the user to write custom data movement code of any kind, no matter how complicated the partition is. Also, partitions are not fixed to specific locations in the machine but can be dynamically rebalanced. Of course all of this is technically possible in Chapel, because domain maps are user-customizable, but none of it comes out of the box, and as far as I know this sort of behavior is underexplored in Chapel in general.
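To make the "fully automatic" point concrete, here's a toy Regent sketch (made-up task name, untested as written) where the same region is partitioned two different ways and used by two different phases, with zero data movement or update code anywhere in the program:

    import "regent"

    task phase(r : region(ispace(int1d), double))
    where reads writes(r) do
      for i in r do r[i] += 1.0 end
    end

    task main()
      var data = region(ispace(int1d, 1024), double)
      fill(data, 0.0)
      -- Two independent partitionings of the same data.
      var coarse = partition(equal, data, ispace(int1d, 4))
      var fine = partition(equal, data, ispace(int1d, 16))
      for c in coarse.colors do phase(coarse[c]) end
      -- Switching partitionings between phases requires nothing from
      -- the user; the runtime works out whatever updates are needed.
      for c in fine.colors do phase(fine[c]) end
    end

    regentlib.start(main)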
As a final note I'll mention that the Chapel folks are really friendly. I've been doing some comparisons with Chapel recently and I found it relatively easy to get help on issues I was having, and that certainly has not been true of all the systems I've been using!
Thanks for this! I've played around with Chapel a bit before, and certainly agree that the community is very welcoming. I did come across Legion when I was researching languages in the HPC space, but admit that I did not get a chance to really take it for a spin--I shall rectify this! The sequential semantics and automatic partitioning seem quite powerful; I really appreciate the explanation.
I've been thinking about a fully declarative language (logic-based like datalog/prolog or functional) with an orthogonal optimization layer. This seems to go in that direction, but still based on an imperative base and focused on parallelism. Are there other languages doing this logic/optimization split?
Here's a free idea for anyone who's interested: I want to see a CSS for program optimization. You label your code with whatever semantic labels you want. By default they do nothing, but you can load "code profiles" that instruct the compiler how to optimize the code based on whatever machine- or application-specific properties you care about. This way you can add optimization hints without littering them throughout the code base.