One significant issue is that most programming languages are inherently temporally inexpressive: they impose either a total order (imperative/impure functional) or no order (purely functional). Ideally the language would make it easy to organize statements so that a programmer could indicate a partial ordering relation without having to use function composition. Additionally, a big problem with function composition in purely functional programming is that multiple long function compositions can only be ordered as monolithic units, whereas in many cases being able to order at the individual function level is desirable. If a language had an explicit notion of a relative temporal index for statements (which could be adjusted), it would be a nice win for writing concurrent code. That would also let compiler writers lift a lot of what they do up to the program source level as macros (which would be a HUGE win).
A futures library solves the case where, in an imperative programming language, the programmer writes a block to compute A, a block to compute B, then combines A and B. Pure FP works fine in that case, since writing (+ (compute-A) (compute-B)) expresses the same ordering.
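For comparison, here is a rough sketch of the futures case in Python using the standard concurrent.futures module. The functions compute_a, compute_b and combine are placeholder names made up for illustration, and the arithmetic inside them is arbitrary; the only point is that A and B are unordered relative to each other, while the combine step is ordered after both.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_a():
    # Hypothetical stand-in for the "compute A" block.
    return sum(range(10))


def compute_b():
    # Hypothetical stand-in for the "compute B" block.
    return sum(range(5))


def combine():
    # A and B are submitted without any ordering between them;
    # the addition waits on both results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(compute_a)
        fut_b = pool.submit(compute_b)
        return fut_a.result() + fut_b.result()
```

This is the imperative counterpart of (+ (compute-A) (compute-B)): the partial order is carried by which results each step waits on, not by statement order.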
It seems to me pure FP is fine unless the computations for A and B have interdependencies, which would result in duplicate computation in pure FP unless you can refactor to a different algorithm. I'm not sure I understand what you mean by wanting to control ordering at "the individual function level", unless you mean interdependencies between computations done in "monolithic units".
A->(loop B until done)->C
A->(loop D until done)->E
C-\
   +-> F
E-/
Something like that. Am I on the right track?
No, because I can still express that in pure FP:
(let [(A (compute-A))]
  (let [(C (compute-C A))
        (E (compute-E A))]
    (compute-F C E)))
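The same dependency graph can be sketched with futures in Python, which makes the partial order explicit: A comes first, C and E are unordered relative to each other, and F follows both. All four compute_* functions are hypothetical placeholders with arbitrary bodies, not anything from a real library.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_a():
    return 3  # placeholder value


def compute_c(a):
    return a + 1  # placeholder: depends only on A


def compute_e(a):
    return a * 2  # placeholder: depends only on A


def compute_f(c, e):
    return c + e  # placeholder: depends on both C and E


def run_graph():
    a = compute_a()  # A is computed once and shared
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_c = pool.submit(compute_c, a)  # C and E may run in either order
        fut_e = pool.submit(compute_e, a)
        return compute_f(fut_c.result(), fut_e.result())
```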
Trying again ...
A->(loop B until (valid B D))->C
A->(loop D until (valid D B))->E
C-\
   +-> F
E-/
Is that better? Care to help me understand what you're getting at?
I was thinking more along the lines of working around data-locality issues. For instance, imagine that you have code that requires very high-latency fetches, or you are working with a data set that doesn't fit in memory. Typically, you have to develop modified algorithms for these scenarios, but there is no reason that a minimal fiber scheduler couldn't adapt seamlessly. Even better, if your conditions change (for instance, mobile devices moving from low-throughput to high-throughput links), a scheduler can adapt, whereas the hand-rolled algorithm must be re-coded.
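To make that concrete, here is a minimal sketch using Python's asyncio as a stand-in for a fiber scheduler. The fetch function and its latency parameter are invented for illustration (the sleep simulates a slow link), and this is only one way such a scheduler might look. What it shows is that the processing code is written once and the scheduler decides the interleaving, so nothing has to be re-coded when the latency changes.

```python
import asyncio


async def fetch(key, latency):
    # Hypothetical high-latency fetch; the sleep simulates the slow link.
    await asyncio.sleep(latency)
    return key * 2


async def process(keys, latency):
    # One task per fetch; the event loop interleaves them while they wait.
    tasks = [asyncio.create_task(fetch(k, latency)) for k in keys]
    total = 0
    # Consume results in whatever order they complete.
    for finished in asyncio.as_completed(tasks):
        total += await finished
    return total


def run(keys, latency):
    return asyncio.run(process(keys, latency))
```

Calling run([1, 2, 3], 0.01) and run([1, 2, 3], 0.0) goes through exactly the same code path; only the schedule differs.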