Writing a Managed JIT in C# with CoreCLR (xoofx.com)
130 points by benaadams on April 12, 2018 | 27 comments



Thanks, enjoyed the article! Though on this point:

> As you may know, in .NET CoreCLR (and Mono), the JIT has been entirely written in C/C++. That’s sometimes even a joke used by C++ developers to remind the enthusiastic C# developer crowd that their ecosystem is being actually ran by C++.

The CLR has never been only about C#. While C# may be the most popular language to target the VM, it will run anything that has been compiled to valid MSIL (or even emitted at runtime).


No one claimed it's only about C#.


Correct, but the rest of the paragraph sort of gives that impression (at least to me) by mentioning how language compilers are usually written in a more primitive language, like the Go compiler being written in C then later rewritten in Go.

IMHO that sort of blurred the line between the platform and the language in the article... rewriting the Go compiler in Go sort of makes sense, but I don't see why this was a good comparison to the .NET ecosystem. Rewriting the CLR JIT in C# isn't making it "self-hosting", as it's not language-specific.

Does that make sense?


It's true that I put an explicit emphasis on C#.

But let's face it: the whole CoreFX library is in C#, the Roslyn compiler is in C#... the line is already blurred.

So while the CLR is not language-specific (just as the JVM is not Java-specific), rewriting the runtime in the most prominent language of the underlying platform seems an acceptable implicit emphasis ;)


Fair enough :)


I hear it's pretty bad performance-wise compared to the JVM for dynamic languages.


Neither the JVM nor the CLR were originally designed with dynamic languages in mind.

Thanks to the efforts of IronPython's creator, who eventually joined Microsoft for a while, the Dynamic Language Runtime (DLR) was created.

Eventually the JVM followed suit by investing into the infrastructure that eventually became known as invokedynamic.

Where most JVM JIT compilers still beat .NET JIT compilers is that they perform re-optimization, while .NET JITs tend to optimize a method only once; there are plans to at least improve this in RyuJIT.


What dynamic languages run on the JVM? To my knowledge it doesn't support dynamic types at all.


Clojure, Groovy, JRuby, Jython are popular dynamic languages on the JVM, there are others:

https://en.wikipedia.org/wiki/List_of_JVM_languages

> To my knowledge it doesn't support dynamic types at all.

Not sure where you've got that from or what that's supposed to mean.


You missed the most relevant one that even comes bundled with the JDK, JavaScript.


> Clojure, Groovy, JRuby, Jython are popular dynamic languages on the JVM, there are others

Perhaps he meant no dynamic languages have become popular enough on the JVM for it to be common knowledge that the JVM can support dynamic types. None of the four languages you mentioned (Clojure, Apache Groovy, JRuby, Jython) are popular, whereas the alternative JVM languages built from the ground up as static languages (i.e. Kotlin and Scala) are the ones people are adopting.


My impression was that Clojure's popularity was comparable to Kotlin and Scala's. Is that not the case?


I like programming in Clojure -- it's the language that introduced me to lisp syntax and immutable values. But Scala has been adopted and Kotlin is being adopted to a far greater degree than Clojure seems to have been.


Rhino (JavaScript on the JVM, originally written by Mozilla) dates back to 1997 and JRuby to 2001. Suffice it to say, dynamic languages have run on the JVM for 20 years.

Reflection used to be used to implement dynamic types, but since Java 7 (2011) there's been a JVM instruction, invokedynamic, to explicitly support dynamic types.
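As a hedged sketch (my own example, not from the comment): the method-handle machinery underneath invokedynamic is exposed to Java code via java.lang.invoke, letting you resolve and call a method by name at runtime the way a dynamic-language runtime on the JVM would:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DynamicCall {
    // Resolve a method by name at runtime, the way a dynamic-language
    // runtime would before an invokedynamic call site is linked.
    static int callLength(String receiver) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle length = lookup.findVirtual(
                String.class, "length", MethodType.methodType(int.class));
        return (int) length.invoke(receiver);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(callLength("hello")); // prints 5
    }
}
```

Once linked, the JIT can inline through a method handle much more aggressively than through reflective Method.invoke, which is the point of the invokedynamic infrastructure.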


Nice! This sort of direction is definitely the way managed runtimes are going and should go, long term.

Here's a quick comparison vs Graal, which is the same thing but for the HotSpot JVM.

Java 9+ has Java-side interfaces for extending the compiler, called JVMCI. So you don't need the deep hacking with C++ vtables and other such mangling, which I'd guess could be quite fragile in the face of future changes to the runtime. There's a clean plugin API that lets you write a compiler entirely in managed code. You can drop it in as a JAR, set some command line flags, and it'll be used.

HotSpot has an interpreter (written in C++ for now). The CLR never has; instead it uses a model where all code is compiled on first access. This means Graal avoids the re-entrancy issues that Alexandre solves with a counter in a TLS slot. Instead, Graal is invoked asynchronously whilst the main program continues to run. And Graal is itself interpreted until it warms up, at which point it starts to compile itself.

The ICorJitCompiler interface the .NET JIT has to implement is much simpler than the HotSpot equivalent:

http://lafo.ssw.uni-linz.ac.at/javadoc/graalvm/jdk.internal....

Whilst the compile method is kind of the same, the CompilationResult structure is far harder to fill out. That's because HotSpot heavily uses de-optimisation to make things go fast, so the compiler isn't just responsible for production of machine code but also de-optimisation metadata. That metadata enables converting stack frames of optimised code back into interpreter stack frames and is quite complex to produce, in fact it affects the design of the whole compiler. It's also required to produce safepoint metadata, GC pointer maps and more.

http://lafo.ssw.uni-linz.ac.at/javadoc/graalvm/jdk.internal....

There's also an additional reflection API that provides access to things that the compiler needs, like raw bytecodes.

The author finally says:

The benefits to have a Managed JIT wouldn’t be visible in short-term or even medium-term while it could open many possibilities in the long run, but for a project of the - legacy - size of .NET, this is probably too much to ask.

But is it too much to ask? Java not only has Graal but also Project Metropolis, the goal of which is to explore converting HotSpot to Java. HotSpot is older and more complex than the CLR, but apparently they don't consider it impossible. And SubstrateVM is the spiritual successor to Jikes RVM: a JVM written entirely in Java, using magic methods and classes to model pointer arithmetic for the garbage collector, and using the compiler in AOT mode to produce the final binary image.

The real issue is not .NET's size or legacy, but rather, Microsoft's level of investment in it and/or their strategic choices. The .NET team are paying a heavy price for the lack of investment into portability in recent years, and have been frequently distracted by the many re-spins and backwards compatibility breaks. They have also tended to push complexity and performance into the C# language rather than the runtime. Perhaps in hindsight the Java approach has worked out with less tech debt and a better path to the future.


I don't think it's a problem of the level of investment from Microsoft. They have quite a good chunk of top-level engineers working across the board on .NET. They are making lots of progress in the performance area in C#, including the work on AOT in CoreRT...

No, in my opinion, the critical issue .NET has been facing is its failure to enter academic circles. I remember working in Java 20 years ago in my engineering school, and Java was already there... Going to the same school a few months ago to present .NET and Unity... students were literally discovering .NET! The amount of complete misinformation there is very scary... That's the unfortunate backlash of years of Microsoft not being friendly with the Linux & OSS community (closed source, no OSS, proprietary APIs to lock users to their OS, even trying to play not nicely with OSS... etc.).

Now a lot has changed at Microsoft over the past few years to improve this situation; it is definitely not the same, a lot more open. But still, look: .NET is barely visible in academic circles. It will take years of education to re-balance things. In the meantime, the Java community has been working so closely with academic circles for years that they have been able to bring many interesting breakthroughs... not counting the super large ecosystem that Java has, directly and indirectly, because of that. There are a lot more OSS-friendly Java developers... while a large part of .NET developers are more corporate, closed-source developers that are not sharing anything... despite the .NET OSS movement that has been slowly growing in the past years... yet not at the level of Java.

Though, luckily, the design process of the C# language has been a lot more streamlined than Java's (faster evolution in many areas, better choices at the beginning: value types, a lot more control over memory layout, no generic type erasure...) and, in my opinion, it has been able to bring more interesting features than Java has. Also, performance-wise, Java is today, in many tests, behind C# and .NET Core. So things are not that bad after all.

About the JIT compiler, I'm really pleased to see this Graal initiative in Java, as it can help to further advance the idea that it would be very relevant for .NET as well!


It depended on where in the world one was located.

The Portuguese university I graduated from had an agreement with Microsoft Education. I remember visiting it several years later and being positively surprised that all the compiler design classes had migrated to the CLR as the target platform.

The same classes were the first ones, several years earlier, to adopt Java, alongside JavaCC, soon after they were released.


Very true; let me clarify that my comment mostly applies to what I have seen in France ;)


Graal started as the Maxine project at Sun Research labs, back in 2005.

Oracle kept Sun's research labs; Maxine eventually became Graal, absorbed lots of PhD work, especially from the University of Linz, and became integrated into the official JDK in 2018, 13 years later.

The opening line of Project Metropolis was what Java should look like in the next 20 years, and let's be honest, it remains to be seen how long Oracle will be committed to it.

I know you are aware of all this, given the very good comment you posted, but I guess the big question is whether Microsoft is also willing to spend the same amount of money on, let's say, using .NET Native to rewrite the CLR in C#.


> ..using .NET Native to rewrite the CLR in C#

They're already doing this, see https://github.com/dotnet/corert/blob/master/Documentation/i...


I agree that the .NET runtime seems under-invested in, and the lateness to the portability game is damaging.

It would also be nice to see more experimentation; e.g. as you mention, Graal, Truffle, SubstrateVM and Project Metropolis all look amazing, and there doesn't seem to be much on the .NET side to compete other than CoreRT.

However I’m skeptical about the claims that complexity has been pushed into C# and has caused tech debt. What do you mean by that?

To me they seem to have made the right trade-offs. Java is getting value types too, because there's a limit to what runtimes can do on behalf of developers. Spans in C# and .NET seem to be the way languages are moving in general, with e.g. slices in Go, views in C++, etc.

There are definitely some pitfalls for high performance areas but they are actively being worked on as well as ensuring the general case is better, e.g devirtualization of interfaces and abstract classes where possible in the JIT IIRC.


The only technical debt I see in C# is little things like the three different ways of declaring lambdas, for example.

As for the rest, my only complaint is that Java and C# designers should have paid more attention to what was being done in Delphi, Oberon, Component Pascal, Modula-3, Eiffel, and have offered value types, spans, low level primitives for high performance code, AOT/JIT compilers since version 1.0.

And in this regard, C# fares much better than Java.


I'd argue that copying covariant array conversion from Java was a mistake as well. Perhaps in both the language and the runtime.
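A minimal illustration of the cost (my own example, not from the comment): because a String[] may be used where an Object[] is expected, the type error is only caught when a store actually happens, so the runtime has to type-check every array store:

```java
public class Covariance {
    // Arrays are covariant: a String[] is-a Object[], so this compiles...
    static String store(Object[] cells, Object value) {
        try {
            cells[0] = value;
            return "ok";
        } catch (ArrayStoreException e) {
            // ...but the runtime must check every store to stay type-safe.
            return "ArrayStoreException";
        }
    }

    public static void main(String[] args) {
        Object[] cells = new String[1];         // legal conversion
        System.out.println(store(cells, "hi")); // prints ok
        System.out.println(store(cells, 42));   // prints ArrayStoreException
    }
}
```

C# behaves the same way, throwing ArrayTypeMismatchException, which is why generics in both languages made T[] covariance look like a design wart in hindsight.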


Actually they copied it from all the major OOP languages with a common root object that were already prevalent before Java was designed.

I am thinking of Eiffel, Smalltalk, Sather, the Oberon family, Modula-3, Object Pascal.

So it was a natural mistake to make.


By pushing things into the language, what I mean is that for example C# has stack allocation and the JIT is a very straight line compiler (for many years it was basically a C++ compiler, opts wise). The Java guys refused to add this complexity to the language and instead implemented escape analysis and scalar replacement. Now with Graal they're doubling down on that approach: there are no proposals to add stackalloc to Java, but Graal is capable of eliminating allocations in far more cases than previously. In other words Java can automatically mark things as stackalloc (sorta) without the user thinking about it, and it gets better over time. Whereas C# code does not get less allocation heavy over time.
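As an illustrative sketch (the names are mine, not from the comment), this is the shape of Java code where escape analysis and scalar replacement can eliminate the heap allocations that a C# programmer would have to restructure around stackalloc or structs:

```java
// A short-lived object that HotSpot's escape analysis can scalar-replace:
// no stackalloc keyword, the JIT proves non-escape and drops the allocation.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class EscapeDemo {
    static int distSq(int ax, int ay, int bx, int by) {
        // Neither Point escapes this method, so after inlining the JIT
        // can keep their fields in registers instead of allocating.
        Point a = new Point(ax, ay);
        Point b = new Point(bx, by);
        int dx = a.x - b.x, dy = a.y - b.y;
        return dx * dx + dy * dy;
    }

    public static void main(String[] args) {
        System.out.println(distSq(0, 0, 3, 4)); // prints 25
    }
}
```

The semantics are unchanged either way; whether the allocation disappears is purely a JIT decision, which is the "complexity in the runtime, not the language" trade-off being described.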

A more obvious example is value types. CLR guys put value types into the runtime and type system. Now Java is going in that direction too, but actually, the Truffle guys have demonstrated that they can specialise data structures to get the same layouts that value types give you on the fly with compiler techniques again (e.g. List<Integer> is compiled to List<int> behind the scenes when possible). And both C2 and Graal inline code more aggressively than the CLR does, and when functions are inlined together more EA is possible, so that's like passing data by value instead of by reference. So it's not really clear that Valhalla is going to be necessary in the end, if that line of compiler research is pursued further, and over the years I've been revising what I anticipate the performance improvements to be from it downwards.

And finally Spans in C# are yet another example of this. Java has ByteBuffer which provides a similar abstraction, but it doesn't do e.g. array slices or access to the stack. But then again, you hardly need to think about the stack when writing Java because the runtime will use it most of the time the human would have done anyway, and the collections API offers sub-array access with inlined and fully optimised accesses. I would note that Go and C++ are both designed with AOT compilation in mind and neither are widely held up as examples of excellent language design.
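For comparison, a small sketch (my own example) of the windowed, zero-copy view ByteBuffer does offer over its own storage, the nearest analogue to slicing a Span&lt;byte&gt;:

```java
import java.nio.ByteBuffer;

public class SliceDemo {
    // Take a zero-copy window over bytes [from, to) of a buffer,
    // roughly what span.Slice(from, to - from) does in C#.
    static ByteBuffer window(ByteBuffer buf, int from, int to) {
        ByteBuffer dup = buf.duplicate(); // shares storage, own position/limit
        dup.position(from).limit(to);
        return dup.slice();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        for (int i = 0; i < 8; i++) buf.put(i, (byte) i); // absolute puts
        ByteBuffer view = window(buf, 2, 6);
        System.out.println(view.get(0));      // prints 2
        System.out.println(view.remaining()); // prints 4
    }
}
```

Unlike Span&lt;T&gt;, this only works over a ByteBuffer's own backing storage; it can't wrap an arbitrary array segment or stack memory, which is the gap the comment is pointing at.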

Overall, it's not clear that we've reached the limit of what runtimes can do. It seems more likely we've reached the limit of what runtimes written in C++ can do before they hit the sort of complexity scaling limits that make Java and C# so widely used to begin with. The speed with which the Graal/Truffle guys have been able to develop new optimisations that were theoretical for years with the C++ compiler is remarkable - even things where the more traditional branch of Java development is considering .NET style language complexification to get it.


Is there any problem in using this in applications that verify the code signature?


It's the bytecode (the 'intermediate language' in CLR terminology) that is verified by signature, not the dynamically generated machine code, so it won't make any difference.



