> Easy - drop the JVM branch and continue the CLR branch.
The CLR branch, IIRC, was dropped because the CLR's reified generics -- which are better than erasure only so long as your type system fits exactly what the CLR assumes -- are a very bad fit for Scala's more expressive type system. That leaves you stuck with a bad interop story unless you radically change Scala's type system to fit the CLR's assumptions.
Reified generics are one of the reasons why Scala on .NET is effectively dead. The VM's type system isn't powerful enough to support some Scala features, such as higher-kinded types.
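To make the "higher-kinded types" point concrete, here's a minimal Scala sketch (names like `Functor`/`ListFunctor` are just illustrative): the trait abstracts over a type constructor `F[_]`, i.e. a generic type that hasn't been applied to its arguments yet. Reified generics as the CLR implements them have no runtime representation for an unapplied type constructor, which is roughly why this doesn't map cleanly.

```scala
// A higher-kinded type: Functor is parameterized by a type
// *constructor* F[_], not a concrete type. Erasure doesn't care;
// reified generics have nothing to reify F[_] to.
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

// One instance, picking F = List.
object ListFunctor extends Functor[List] {
  def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
}

object HktDemo extends App {
  println(ListFunctor.map(List(1, 2, 3))(_ * 2)) // List(2, 4, 6)
}
```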
But the JVM not having reified generics is mostly a blessing for Scala. With reified generics, your language's type system can essentially be only as powerful as the VM's, and I think it's clear that Scala today wouldn't be the same if Sun had gone the reified-generics route.
(Still, it would be nice to have specialized generic containers for some primitive types in the JVM)
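For what it's worth, Scala does offer a compile-time workaround for the primitive-container case: the `@specialized` annotation asks scalac to emit extra, unboxed variants of a generic class for the listed primitives. A minimal sketch (`Box` is an illustrative name):

```scala
// @specialized(Int) tells scalac to also generate an Int-specialized
// bytecode variant of Box, so Int values avoid boxing even though the
// generic version still erases T to Object.
class Box[@specialized(Int) T](val value: T) {
  def get: T = value
}

object SpecDemo extends App {
  val b = new Box(42) // resolves to the Int-specialized variant
  println(b.get + 1)  // 43
}
```

The trade-off is code-size blowup (one class per specialized primitive per type parameter), which is exactly the kind of cost a VM-level solution could amortize.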
Tail calls are actually rather un-optimized in .NET. F# avoids them as much as possible: it used to emit the tail prefix any time it could, but that hurt performance. IIRC, tail calls were also ignored in many cases prior to CLR v4 (and they still have some limitations). The JVM could implement tail calls via analysis; a tail prefix isn't truly needed.
But Xamarin's port of Android's Java code to C#, running on Mono without the JVM, did show a rather large perf increase. I'd assume that's mostly generics eliminating boxing.
What do you mean by "The JVM could implement tailcalls via analysis"? Offhand, I can't see a way for this to be practically viable, but I may just be lacking imagination.
Maybe I'm dense, but why not the same way a compiler implements it? In a recursive function, at the point of recursion, flag the call so that codegen doesn't set up a new stack frame and instead reuses the current one.
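That frame-reuse rewrite is exactly what scalac already does for direct self-recursion, entirely at compile time: `@tailrec` makes it a compile error if the rewrite to a loop is impossible. A small sketch:

```scala
import scala.annotation.tailrec

object TailDemo {
  // The recursive call is in tail position, so scalac compiles it as
  // a jump that reuses the current frame -- no new stack frame.
  @tailrec
  def sumTo(n: Long, acc: Long = 0L): Long =
    if (n == 0) acc else sumTo(n - 1, acc + n)
}

object TailRun extends App {
  // 10 million "calls"; as real invocations this would overflow the
  // stack, but as a compiled loop it runs in constant stack space.
  println(TailDemo.sumTo(10000000L)) // 50000005000000
}
```

The JIT-analysis question in the thread is about doing this for calls that *aren't* direct self-recursion, which the compiler can't rewrite this way.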
But you may be right that this might not be viable to run fast enough in a JIT on a large codebase. Obviously you could tail-call-enable every eligible call, but that probably doesn't generate optimal code. OTOH, they have tracing in the JIT for hotspot detection, right? So if the tracing part counted how many call loops it had hit, it could decide to implement tail calls on that path. Or perhaps it invokes analysis when the stack gets to be 75% full.
Right, my thought was that doing real tail calls everywhere that's eligible would almost certainly be a performance disaster. I'm not sure that hotspot could help, at least if applied naively, since correctness depends on never missing a tail call that could lead to unbounded stack usage.
For example, consider an F#-like language on the JVM. Given the definition `let apply f x = f x`, you could imagine `apply` getting called in all sorts of contexts that don't result in unbounded recursion, so a hotspot-like system would not generate a tail call when compiling the method given the first N invocations. But then if another function is defined as `let rec loop n = if n < 1 then n else apply loop (n-1)` and invoked as `loop 10000000`, then the lack of a tail call in `apply` is fatal.
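A Scala translation of that example makes the problem visible: the recursion goes through `apply`, so `@tailrec` can't help, and the direct version overflows the stack. The standard-library trampoline (`scala.util.control.TailCalls`) is the usual workaround -- constant stack, at the cost of heap-allocating each step. (`applyT`/`loop` are illustrative names.)

```scala
import scala.util.control.TailCalls._

object TrampolineDemo {
  // Analogue of F#'s `let apply f x = f x`, in trampolined form:
  // tailcall() suspends the call instead of growing the stack.
  def applyT(f: Long => TailRec[Long], x: Long): TailRec[Long] =
    tailcall(f(x))

  // Analogue of `let rec loop n = if n < 1 then n else apply loop (n-1)`.
  def loop(n: Long): TailRec[Long] =
    if (n < 1) done(n) else applyT(loop, n - 1)
}

object TrampolineRun extends App {
  // 10 million indirect tail calls; .result drives the trampoline
  // iteratively, so no StackOverflowError.
  println(TrampolineDemo.loop(10000000L).result) // 0
}
```

On the CLR, F# handles this case natively via the tail. prefix; on the JVM you have to opt in to this encoding (and pay for it) yourself.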
So if something kicked in at 75% stack usage (via a guard page?) and forced analysis, wouldn't that work? It'd be like the plethora of JVM options now: another configurable switch.
How much of the stack are you willing to look at when you hit a guard page? The non-tail call might not be right near the top, depending on the structure of the code. I'm not saying it's impossible, but I think the approach of having the higher-level language add the annotations is probably more practical.
If the tail-call-needing function isn't found rather quickly (say, in the first 100? frames), then isn't it unlikely that the tail call will help? That seems like a much rarer case. Is the cost of going 1000 functions deep that expensive? Most threads aren't gonna hit a high % of stack without having some sort of problem anyways.
Of course it'd be better for the JVM to support this in bytecode, just like generics, stack allocation, and pointers. But it seems to me that if there were a real, solid need for TCO, then a JVM could implement it with tunable heuristics (most Java code doesn't need TCO, so if your code really depends on it in complex scenarios, you could always pass a hypothetical flag like -XX:TcoInspectionDepth).
Note that Scala doesn't implement general tail call optimization; the compiler only optimizes direct self-recursive calls, as far as I'm aware. By contrast, F# on .NET implements true tail calls, including cases where the callee is statically unknown at compile time (as in `let apply f x = f x`).
.NET is Windows-only, which is a significant limitation compared to the JVM. I don't think better performance on Windows -- even assuming going to CLR instead of JVM would provide that -- is a good reason for Scala to move to the CLR as the primary target.
I understand your point. Consider this: using Xamarin you can get native performance on iOS and better-than-Dalvik performance on Android. Is there a good tool chain for that in JVM/LLVM world that supports Scala?
"Better than Dalvik" isn't necessarily a compelling selling point looking forward, since Android is almost certainly going to move from Dalvik to ART as the default runtime fairly soon.