This is intriguing, and it solves real-world problems that I have run into writing Clojure. On the other hand, it is complicated, taking up to four arguments. It reminds me of Common Lisp's `loop` macro, which can handle complex and interesting situations but has its own chapter in "Common Lisp the Language" because it is so complicated. Is it really helping solve the accidental complexity problem at that point? Perhaps. I use the loop macro all the time, so it does bear thinking about.
The OP also speaks of decreasing indentation as a goal instead of decreasing cyclomatic complexity. It is true that one tracks with the other, and I'll be the first to say that I too am an 80 character wide masochist, but optimizing code structure around decreasing indentation is dubious at best.
In a language that I don't quite have the cojones to write, the ideal API for reduce comes from Haskell’s Data.Foldable, where it is called foldMap.
If you don't know Haskell, the idea is simple: the essence of MapReduce is that you have an aggregating data type A with two operations,
empty(): A
combine(A, A): A
such that
combine(combine(x,y), z) = combine(x, combine(y, z))
combine(empty(), x) = x
combine(x, empty()) = x
The requirement that `combine(x, y) = combine(y, x)` is not strictly necessary if the MapReduce implementation maintains shards and shard indices and is careful to only merge neighboring shards; the first property (associativity) is then sufficient.
To make this into MapReduce, you need your underlying data type X and an injection, (x: X) → A. This is the mapping part of MapReduce.
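A rough Python sketch of the pieces described so far, in case it helps. The names `Monoid`, `fold_map`, and `fold_map_sharded` are made up for illustration and do not come from any library. It uses string concatenation as the aggregation because that is associative but not commutative, so it only gives the right answer if neighboring shards are merged in order.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, Sequence, TypeVar

A = TypeVar("A")
X = TypeVar("X")


@dataclass(frozen=True)
class Monoid(Generic[A]):
    """An aggregating type A: an identity value plus an associative combine."""
    empty: Callable[[], A]
    combine: Callable[[A, A], A]


def fold_map(m: Monoid[A], inject: Callable[[X], A], xs: Iterable[X]) -> A:
    """Inject every element into A (the "map"), then combine left to right."""
    return reduce(m.combine, (inject(x) for x in xs), m.empty())


def fold_map_sharded(
    m: Monoid[A], inject: Callable[[X], A], shards: Sequence[Iterable[X]]
) -> A:
    """Aggregate each shard on its own, then merge the partials in shard order."""
    partials = [fold_map(m, inject, shard) for shard in shards]
    return reduce(m.combine, partials, m.empty())


# Concatenation: associative, has an identity (""), but is not commutative.
concat = Monoid(empty=lambda: "", combine=lambda a, b: a + b)

words = ["map", "reduce", "fold", "monoid"]
first_letter = lambda w: w[0]  # the injection X -> A

whole = fold_map(concat, first_letter, words)
sharded = fold_map_sharded(
    concat, first_letter, [words[:1], words[1:3], words[3:]]
)
```

Both `whole` and `sharded` come out the same here; the per-shard `fold_map` calls are the part that could run in parallel.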
So for example “First10” is such an aggregation; given the first 10 (or fewer) in shard 1 and the first 10 (or fewer) in shard 2, you can form the first 10 (or fewer) in shards 1+2. The “map” function can use the empty value to provide “filtering,” so you get “the first 10 which satisfy this logical condition” directly.
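To make “First10” concrete, here is a small Python sketch under my own assumptions: the aggregate is a plain list capped at `LIMIT` items, and `keep_if` is a hypothetical helper that maps non-matching elements to the empty value, which is how the filtering falls out.

```python
from functools import reduce

LIMIT = 10


def first_n_empty():
    """Identity element: no records seen yet."""
    return []


def first_n_combine(left, right):
    """Keep the earliest LIMIT items; left shard comes before right shard."""
    return (left + right)[:LIMIT]


def keep_if(predicate):
    """Build an injection that filters by mapping rejects to the empty value."""
    def inject(x):
        return [x] if predicate(x) else first_n_empty()
    return inject


def first_n(inject, shard):
    """Aggregate one shard."""
    return reduce(first_n_combine, (inject(x) for x in shard), first_n_empty())


even_only = keep_if(lambda n: n % 2 == 0)

shard1 = range(0, 15)
shard2 = range(15, 40)

# Aggregate the shards separately, then merge the neighbors in order.
merged = first_n_combine(first_n(even_only, shard1), first_n(even_only, shard2))

# Same question answered in a single pass over all the data.
single_pass = first_n(even_only, range(0, 40))
```

Note that `first_n_combine` is associative but not commutative, so this is one of the cases where merge order matters.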
Sum is another such aggregation, Count is a specialization of it, Mean is technically just a pair of Sum and Count. Oh, did I forget to mention? Given two aggregations you can easily produce an aggregation that does both. “give me the sum of this field and the first 10 records summed and the count of terms” ... One (parallelizable) pass for all three, because the aggregations compose.
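The composition can be sketched in Python roughly like this. Everything here is illustrative: aggregations are represented as `(empty, combine)` pairs, `both` is a hypothetical combinator that runs several of them side by side, and the record shape with an `amount` field is invented for the example.

```python
from functools import reduce

# Each aggregation is an (empty, combine) pair.
sum_agg = (lambda: 0, lambda a, b: a + b)
count_agg = (lambda: 0, lambda a, b: a + b)  # Sum where every record injects 1
first3_agg = (lambda: [], lambda a, b: (a + b)[:3])


def both(*aggs):
    """Compose aggregations: the aggregate is a tuple, combined slot by slot."""
    def empty():
        return tuple(e() for e, _ in aggs)

    def combine(left, right):
        return tuple(c(a, b) for (_, c), a, b in zip(aggs, left, right))

    return empty, combine


empty, combine = both(sum_agg, first3_agg, count_agg)


def inject(record):
    """One record contributes to all three slots at once."""
    return (record["amount"], [record["id"]], 1)


def aggregate(records):
    return reduce(combine, (inject(r) for r in records), empty())


records = [{"id": i, "amount": 10 * i} for i in range(1, 6)]

# Two shards aggregated independently, then merged: still a single pass
# over each record, and all three answers come out together.
total, first_ids, count = combine(aggregate(records[:2]), aggregate(records[2:]))
mean = total / count
```

Mean is not itself stored anywhere; it is derived at the end from the Sum and Count slots.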
So a macro API is possible where this is instead something like, (I don't know Clojure)
You can argue about whether it decreases complexity or not - I think it's hard to judge until you use it regularly, because right now, at first sight, for me at least, it feels strongly unfamiliar. It looks kinda long and complicated and doesn't play well with the threading macro (this is discussed in the comments).
If it's a neat trick macro you use once every two months then you may never get used to it
But the fact that it yields a clear performance gain is a more objective "win".
It's not just stylistic at this point and I'm inclined to give it a shot
I don’t really find loop complicated because the basic cases are straightforward. The advanced cases are discovered complexity when you deliberately wade into the depths to solve some weird problem.
I don’t think anyone with two gray cells to rub together doubts Stephen Wolfram’s software chops. It’s just he chose to become a billionaire instead of a social outcast like RMS. Their roles could easily have been reversed.
Wolfram Language is one of the coolest languages in existence.
The fact that it's not open source is a huge deal breaker for me.
I can't deploy it on any interesting platforms, nor can I easily bundle it to release to the public and guarantee that the public will be able to run the code.
Check out https://ferret-lang.org? Mostly Clojure syntax (files are .clj) but reduces to C++. Seems to be mostly for embedded/microcontroller applications, Arduino, raspberry etc. Doesn't depend on the stdlib and can run with or without a GC and in as little as 2k of RAM apparently, which is pretty neat!
If you have it working in babashka though, it means it can also be used with Clojure and compiled to native using GraalVM. So anything that runs in babashka can be compiled to native as well. At least in theory.
Super long startup times and large binary size are what turned me off. This is specific to Clojure, not to the JVM. I find the JVM to be fine, but when I looked for another lisp I did want one that was compiled, just because it made my programs easier to distribute.
I invite you to try it. Put together a modest CLI that does SQLite, some network calls, and unzips files. ( https://github.com/djhaskin987/zinc ). Using native-image with any reasonable set of dependencies like this is *horrendous*. Just because you can doesn't mean it's tractable. I spent 10% of my time writing the tool and 90% of it trying to get it to compile. Absolutely the worst experience trying to get something to build in my life, and I'm a devops engineer. Building and shipping code is my thing.
Ya, I'm not going to disagree, it's not the nicest build pipeline.
That said, you can figure it out normally.
Using native dependencies will always be the hardest. I'd recommend first trying to use graalvm friendly libraries, and if not, libraries that are pure Java and don't have native dependencies.
For SQLite for example, you have to include the SQLite C driver, and that's where it gets a bit complicated.
It lists Clojure native image friendly libraries you can use. And it also includes some pre-configured dependencies you can depend on that will bring the correct build config.
Then just pretend like there are no other libraries yet for this "new" language. Or learn about the native image process more deeply and contribute to the effort to add easier support for more libraries.
It is ridiculously difficult to use native-image. The only one I know of to successfully pull this off in a broad, wide-ranging program is the author of babashka, Michiel Borkent. The man is a legend, and using his work I was able to get my program to run on Linux, but getting it to run on Windows was a whole nother hurdle and I wasn't willing to continue. Keeping track of all of those jira issues, matching Clojure releases with native-image releases, getting all the configuration files right, writing the scripts that you need to write the configuration files, and then waiting 30 minutes for anything of value to compile is not really an ideal development cycle.
Sure, sure, the base case is easy. I would be more impressed with an example that has datalevin or SQLite as a dependency, though. Also Cheshire and/or jetty. Real-world dependencies make it difficult.