The work-stealing part can be understood as automatic, but AFAICT it's Java's ForkJoin framework that is responsible for that, not Clojure itself.
Having to state "do this in parallel" (by using clojure.core.reducers fns instead of their clojure.core counterparts) is not automatic but explicit parallelization.
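For example, a minimal sketch (note that r/fold only actually parallelizes foldable sources such as vectors and maps, and its combining fn must be associative):

    (require '[clojure.core.reducers :as r])

    ;; sequential: the familiar core functions
    (reduce + (map inc (range 1000000)))

    ;; explicitly parallel: same shape, but the reducers variants plus fold.
    ;; fold splits the vector into chunks, reduces them on a ForkJoin pool,
    ;; and combines the partial sums with +.
    (r/fold + (r/map inc (vec (range 1000000))))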
I believe that the idea of a compiler that will infer the parallelizable parts of your code crosses the limits of computability: in general, deciding whether an arbitrary piece of code can safely (and profitably) run in parallel is undecidable.
I view the need to use the reducer functions in place of the "vanilla" versions much like a compiler flag (you could stick it in the namespace declaration and not worry about it for the rest of the code). In my mind, what makes this "automatic" is that in at least 80% of cases, that's the only thing you would need to do: all of your code using map, reduce, and friends can remain mostly unchanged.
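Concretely, the "flag" could look something like this (a sketch; my.app is a hypothetical namespace, and you'd still swap reduce for fold at the call sites you want parallel):

    (ns my.app  ; hypothetical, just to illustrate the idea
      (:refer-clojure :exclude [map filter])
      (:require [clojure.core.reducers :refer [map filter fold]]))

    ;; the call sites below are textually unchanged...
    (defn sum-of-even-squares [v]
      (fold + (map #(* % %) (filter even? v))))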
Before reducers, I'd have said that Obj-C's blocks with GCD were the closest thing to "automatic parallelization" (in that you could get concurrent execution without handling explicit concurrency primitives). The problem, of course, is that writing C/Obj-C with closures is a non-trivial shift in design. With Clojure, the way code is currently written is already well positioned to benefit from reducers.
The functions live in a separate namespace at the moment, but perhaps the ultimate plan is to make them the default? Does anyone know?
I suppose there should always be an escape hatch to explicitly choose either of the two for a specific piece of code, but the default could be "JVM, please choose what's best for my code".
Parallelization has some fixed cost, so for small (i.e. most) workloads, using it results in worse performance. So I don't see reducers (or clojure.core/pmap, for that matter) becoming the default.
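A quick REPL check makes the fixed cost visible (a sketch; actual timings will vary by machine):

    (require '[clojure.core.reducers :as r])

    (def small (vec (range 100)))

    (time (reduce + (map inc small)))    ; plain sequential version
    (time (r/fold + (r/map inc small)))  ; reducers version

    ;; below fold's default partition size of 512 elements it degenerates
    ;; into a serial reduce anyway, and just above it the ForkJoin
    ;; task-splitting overhead can easily exceed the work being split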
Generally, the part of a program worth parallelizing is pretty obvious: the part that is long-running, must process lots of data, etc. Most calls to map/reduce/filter in the average (Clojure) program are nothing like that.
Finally, and AFAICT, ForkJoin can actually do worse than a simple FixedThreadPool (as used by clojure.core/send) for workloads that are "symmetric" and not easily divisible into subtasks.
The reducers versions have the drawback that they aren't lazy sequences: because they are defined in terms of reduce, the entire result is produced eagerly once you reduce them. So there is still a place for the core map/filter/etc. functions, which can operate on, and return, lazy sequences without doing any actual work.
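To make the difference concrete (sketch):

    (require '[clojure.core.reducers :as r])

    ;; core map returns a lazy seq: you can consume a prefix of an
    ;; infinite source, and nothing runs until it is consumed
    (take 3 (map inc (range)))                 ;=> (1 2 3)

    ;; r/map returns a reducible, not a seq: no work happens here either,
    ;; but the only way to get values out is an eager reduce/fold
    (reduce + (r/take 3 (r/map inc (range))))  ;=> 6
    ;; (first (r/map inc (range)))  ; error -- a reducible is not a seq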