This. I don't think enough people appreciate how simple and clear the JVM bytecode's design is.
Most platforms in use today are so complex. Take a look at x86, or one of the latest ECMAScript specifications. Even LLVM bitcode is a bit complicated compared to the JVM bytecode.
I think in programming language research, going forward, we need some research into "high-level bytecodes". I.e. bytecodes that capture high-level concepts in a clear and simple way.
Those bytecodes happened because the JVM bytecode was designed to be easily interpreted, whereas CIL was designed for JIT compilation. So for example CIL's `add` opcode is missing info that needs to come from the context in which it is used and the JVM's `iadd` and variations are easier to interpret.
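To make the `iadd` versus `add` point concrete, here is a rough Python sketch of two toy stack machines. The opcode names (`iadd`, `dadd`, `add`, `ldc.i4`, `ldc.r8`) are real; the `push` pseudo-op, the tuple encoding, and the function names are made up for illustration. A real CLR JIT would infer the operand types statically rather than tag values at runtime, so treat the second function as a picture of the extra work, not of how the CLR does it.

```python
def wrap_int32(value):
    """Wrap a Python int to signed 32-bit two's complement."""
    return ((value + 2**31) % 2**32) - 2**31


def run_jvm_style(code):
    """Interpret JVM-like code: each arithmetic opcode names its operand type."""
    stack = []
    for op, *args in code:
        if op == "push":  # illustrative stand-in for the JVM's constant loads
            stack.append(args[0])
        elif op == "iadd":
            b, a = stack.pop(), stack.pop()
            # The opcode alone says "32-bit int add", no context needed.
            stack.append(wrap_int32(a + b))
        elif op == "dadd":
            b, a = stack.pop(), stack.pop()
            stack.append(float(a) + float(b))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack


def run_cil_style(code):
    """Interpret CIL-like code: `add` is untyped, so types must be tracked."""
    stack = []  # entries are (type_tag, value) pairs
    for op, *args in code:
        if op == "ldc.i4":
            stack.append(("int32", args[0]))
        elif op == "ldc.r8":
            stack.append(("float64", args[0]))
        elif op == "add":
            (tb, b), (ta, a) = stack.pop(), stack.pop()
            if ta != tb:
                raise TypeError(f"add on mismatched types {ta} and {tb}")
            result = wrap_int32(a + b) if ta == "int32" else a + b
            stack.append((ta, result))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack
```

The JVM-style loop can dispatch on the opcode and nothing else, which is the property that makes it pleasant to interpret.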
You can see this design choice even today in how the JVM and CLR work. The JVM starts execution in an interpreter mode, then gradually compiles pieces of code as it detects bottlenecks. So the compilation that happens at runtime is very gradual and based on runtime measurements.
The CLR, on the other hand, has always done JIT compilation, with the ability to cache the compiled code for faster startup (e.g. Ngen). So it has been oriented towards ahead-of-time compilation.
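The "interpret first, compile the hot spots later" idea can be sketched in a few lines of Python. Everything here is hypothetical: `TieredFunction`, `HOT_THRESHOLD`, and the two callables are stand-ins, and a real JVM's heuristics (profiling, multiple tiers, deoptimization) are far more involved than a call counter.

```python
HOT_THRESHOLD = 3  # made-up value; real VMs use larger, tunable thresholds


class TieredFunction:
    """Run a slow 'interpreted' version until the function looks hot."""

    def __init__(self, interpret, compile_fn):
        self.interpret = interpret    # slow path, available immediately
        self.compile_fn = compile_fn  # returns a fast callable when invoked
        self.calls = 0
        self.compiled = None

    def __call__(self, *args):
        if self.compiled is not None:
            return self.compiled(*args)
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            # Pay the compilation cost only once the code has proven hot.
            self.compiled = self.compile_fn()
        return self.interpret(*args)
```

A purely JIT-first design would, in this picture, call `compile_fn` before the first invocation instead of waiting for the counter.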
Sure. When I posted that yesterday, I tried to find information on whether and how the `add` opcode handles differing types. Now that I'm not on my phone, I looked up the ECMA spec and it looks like CIL still only allows like types, with some minor exceptions involving `native int`. So it's just up to the compiler to make sure that actually happens and extend types as necessary.
I also prefer the CIL and also the C# language to Java (though I really like Java and its ecosystem), but we have to admit that MS had 5-10 years to learn from the Java design decisions and their effects, and still did not manage to overcome every problematic point :)
Although it should be mentioned that java language semantics are largely (depending on how you measure them :-) absent from the jvm. (Default methods were a very unusual change in that respect)
And, as you say, subsequent history has weirdly inverted the JVM and CIL. The former is a lot less 'J' and the latter is a lot less 'C' ;-)
I would bet that there's a lot higher rate of .NET users running VB.NET than there are JVM users running Scala/Clojure/Kotlin. They just don't tend to be the sort to post to Hacker News.
I think the bane of James Gosling's existence is being incorrectly associated with Java-the-lacklustre-language instead of correctly being associated with the superb JVM.
If there is a problem with Java bytecode, it is that it is hard to verify. You need multiple passes over the bytecode until you reach a steady state.
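For anyone wondering what "multiple passes until a steady state" looks like, here is a minimal worklist sketch in Python. The instruction set (`push`, `pop`, `goto`, `if`, `return`), the type names, and the `"top"` marker are invented for illustration; this is not the JVM's actual verifier, only the general shape of inferring stack types by dataflow. (Newer class files can also carry `StackMapTable` attributes so the types at branch targets are checked rather than inferred.)

```python
def merge(old, new, pc):
    """Join two stack states that meet at instruction `pc`."""
    if old is None:
        return new
    if len(old) != len(new):
        raise ValueError(f"inconsistent stack height at {pc}")
    # Slots that disagree become "top", meaning "unusable type".
    return tuple(a if a == b else "top" for a, b in zip(old, new))


def verify(code):
    """Return the inferred stack state before each reachable instruction."""
    states = {0: ()}  # instruction index -> tuple of type names
    worklist = [0]
    while worklist:  # loop until no state changes: the steady state
        pc = worklist.pop()
        stack = states[pc]
        op, *args = code[pc]
        if op == "push":
            out, targets = stack + (args[0],), [pc + 1]
        elif op == "pop":
            if not stack:
                raise ValueError(f"stack underflow at {pc}")
            out, targets = stack[:-1], [pc + 1]
        elif op == "goto":
            out, targets = stack, [args[0]]
        elif op == "if":  # pops an int, then jumps or falls through
            if not stack or stack[-1] != "int":
                raise ValueError(f"expected int on stack at {pc}")
            out, targets = stack[:-1], [args[0], pc + 1]
        elif op == "return":
            continue
        else:
            raise ValueError(f"unknown opcode {op!r} at {pc}")
        for target in targets:
            if not 0 <= target < len(code):
                raise ValueError(f"jump out of range at {pc}")
            merged = merge(states.get(target), out, target)
            if merged != states.get(target):
                states[target] = merged
                worklist.append(target)  # state changed, so revisit
    return states
```

An instruction reached along two paths with different types gets re-queued when the second path arrives, which is where the extra passes come from.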
There is also the "issue" that Java bytecode allows arbitrary control flows with goto, while Java doesn't.
IMHO WebAssembly solves that better but I also need to admit that they could already learn from Java.
> IMHO WebAssembly solves that better but I also need to admit that they could already learn from Java.
While that may be true of the language Java, WebAssembly's jump instructions are not without their annoyances either. For example, JVM bytecode requires your stack to be precise when jumping, whereas WebAssembly just cares about the most recent piece. If you expect your jump targets to have the same stack layout, WebAssembly makes the impl handle it. I had to account for this and other differences in my compiler [0].
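A small Python sketch of the difference, with stacks modelled as lists of type names. The function names and parameters are hypothetical; the WebAssembly half follows my understanding of `br` (unwind to the height the operand stack had on entry to the target block, keeping only the values the label expects).

```python
def jvm_jump(stack, expected_at_target):
    """JVM-style: the whole operand stack must match what the target expects."""
    if list(stack) != list(expected_at_target):
        raise ValueError(
            f"stack {stack} does not match target layout {expected_at_target}"
        )
    return list(stack)


def wasm_branch(stack, label_height, label_arity):
    """Wasm-style `br`: keep the top `label_arity` values, drop the rest
    down to the height the stack had when the target block was entered."""
    if len(stack) < label_height + label_arity:
        raise ValueError("not enough values on the stack for this branch")
    carried = stack[len(stack) - label_arity:] if label_arity else []
    return stack[:label_height] + carried
```

So a compiler targeting the JVM has to emit explicit pops to make the layouts agree, while one targeting WebAssembly can leave extra values on the stack and let the branch discard them.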
The specification is dense but still readable. I'd start reading in order, up through the third chapter which explains how to compile various Java snippets to bytecode. Perhaps start there and go back to the second chapter when you need more context.
Back when Java decompilers were much more primitive and easy-to-break than they are now, I wrote some mods for a Java game in a language called "Jade", which was essentially a textual syntax for raw JVM bytecode. Oddly enough I can't easily find any references to this tool online now.
Overtaken by HTML templating languages, gems and a dozen other things with the same name. Eventually these items reach such a density to form a black hole, sucking through the older items, perhaps to another place in spacetime.
This is very cool. You can basically even call all Java classes (and the other way around) from that JavaScript! (And reliably limit the classes that can be used if you want to execute JS in a safe environment.) Why isn't this used more often? (Or is it used more often?)
I can't speak to whether it is used more often, but I would bet the java-based Nashorn vm is significantly slower than the c++-based nodejs. In fact, a cursory google search shows this is the case, and it's not even close.
While it is of course going to be slower, "it's not even close" is a judgement call. The tests you linked show a difference of roughly 1.6x to 3x. In a lot of applications the performance hit is going to be worth gaining access to the whole Java ecosystem of libraries.
Also the performance is very likely to increase with newer versions.
Nashorn isn't that slow, but there's another JS-on-the-JVM project called Graal.js which is about as fast as NodeJS/V8.
One reason Nashorn isn't used that much is that it doesn't expose a node.js compatible API. JS people often want Node specifically, not just the ability to run JS. It has some cool features though. The shell mode is neat.
I investigated trying to use Doppio for Peergos. It is a very cool project. In the end we decided against it because it wasn't fast enough for our use case yet, and it requires the page to download the whole JVM (at least the rt.jar - the runtime) which is >60 MiB. We managed to get around this by manually stripping out the parts of the JVM that we didn't need, which brought it down to a few MiB.
I imagine that once they update it to Java 9 with the modular JDK (and resultant splitting of rt.jar) this reduction will largely happen automatically with it lazily downloading the parts that it needs.
Java applets for JVM written in TypeScript transpiled into JavaScript will be the next big thing. Freeze any browser, anywhere, instantly. Imagine the print out of stack trace with error - so exciting.
Assuming it worked well enough, you could use it to run legacy Java applets without having the Java plugin installed. That would at least improve the security situation a bit while still allowing for legacy applets.
> This paper presents DOPPIO, a JavaScript-based runtime system that makes it possible to run unaltered applications written in general-purpose languages directly inside the browser.
Someone should really have told them about webassembly...
Doppio's lineage goes back even further than 2012. A "JVM in JS" was given in Emery Berger's 691ST course in the fall of 2011. My notes show that I submitted my "finished" JVM on October 26, 2011. I say "finished" because it became obvious rather quickly that a lot of things would be difficult if not impossible in JS (like threading and synchronization) and so we negotiated with Emery to come up with a subset of Java that we could reasonably implement for a class project. IIRC, there were 6-8 VMs written that semester.
Doppio proper (the repository you refer to) started during the spring meeting of 691. Those guys went above and beyond the minimal spec we implemented, and they tackled a lot of stuff that we thought was impossible. Thus the research paper.
IIRC, we also had to write a decentralized chat program in JS that semester. That was also "fun".
Q. Can asm.js serve as a VM for managed languages, like the JVM or CLR?
A. Right now, asm.js has no direct access to garbage-collected data; an asm.js program can only interact indirectly with external data via numeric handles. In future versions we intend to introduce garbage collection and structured data based on the ES6 structured binary data API, which will make asm.js an even better target for managed languages.
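The "numeric handles" arrangement in that answer can be pictured with a small Python sketch. `HandleTable` and its methods are hypothetical names, not part of asm.js; the point is only that the guest program holds integers while the host keeps the real objects.

```python
class HandleTable:
    """Hypothetical host-side table: the guest only ever sees integer keys."""

    def __init__(self):
        self._objects = {}
        self._next_handle = 1  # 0 is reserved here as a "null" handle

    def register(self, obj):
        """Store a host object and return the number the guest will use."""
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = obj
        return handle

    def get(self, handle):
        """Look up the host object behind a handle."""
        return self._objects[handle]

    def release(self, handle):
        # The guest must release handles explicitly: the host's garbage
        # collector cannot see references that exist only as numbers.
        del self._objects[handle]
```

That manual `release` step is the awkward part for a managed-language VM, which is why direct access to garbage-collected data matters.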
I admire the effort, but: doesn't "academic" mean "scientific"? Can there possibly be any "science" in having a well-known VM reimplemented in a well-known programming language?
Please take a look at the contributions in the paper at the end of the introduction. The JVM uses operating system abstractions (like a file system, threads, and synchronous APIs) that were non-existent in the browser when the paper was published, and are still widely unavailable natively. Figuring out how to construct the functionality necessary not just to execute Java bytecode but to provide a full Java Virtual Machine, and then showing how this could be generalized beyond a JVM, is a CS contribution. Disclosure: I'm a labmate of the author.
Here's one of the scientific questions, and FWIW, it's actually written right on the first page: can you implement a programming language that relies on synchronous primitives in a language that has only asynchronous primitives? The answer to that question is not obvious, but it turns out to be "yes". Doppio is an existence proof that it can be done. Furthermore, that claim is falsifiable (see Karl Popper), which makes the process of learning the answer "science."
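One way to picture the synchronous-on-asynchronous trick is an interpreter that keeps the guest's program counter and stack in its own data structures, so it can stop at a "blocking" call and pick up again from a callback. The Python sketch below is only an illustration of that idea, not Doppio's implementation; `events`, `async_read`, `Machine`, and the two-opcode guest language are all invented.

```python
from collections import deque

events = deque()  # stand-in for the browser's event queue


def async_read(callback):
    """Hypothetical async-only host API: the result arrives in a later event."""
    events.append(lambda: callback("data"))


class Machine:
    """Toy guest VM whose `read` looks blocking to the guest program."""

    def __init__(self, program):
        self.program, self.pc, self.stack, self.output = program, 0, [], []

    def run(self):
        """Execute until the program ends or a 'blocking' op has to wait."""
        while self.pc < len(self.program):
            op = self.program[self.pc]
            self.pc += 1
            if op == "read":
                # pc and stack already live on self, so the guest's state is
                # saved; start the async call and hand control to the host.
                async_read(self.resume)
                return
            elif op == "print":
                self.output.append(self.stack.pop())

    def resume(self, value):
        """Callback: deliver the result and continue where the guest left off."""
        self.stack.append(value)
        self.run()


def event_loop():
    """Drain the toy event queue, as the browser would between callbacks."""
    while events:
        events.popleft()()
```

From the guest's point of view `read` simply returned a value; the suspension is invisible to it.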
You may not think this is an important question to answer--and that is your prerogative--but you can't argue that it isn't scientific. One of those other scientific questions is about threading (again, see page 1). BTW, did you know that System.out.println relies on synchronization primitives? You literally cannot write helloworld in Java without invoking a lot of machinery.
> I admire the effort, but: doesn't "academic" mean "scientific"?
No, it doesn't. Or did you forget about music, literature, engineering, business, finance, history, philosophy, dance, art, architecture, nursing, etc.?
> Academic: of, relating to, or characteristic of a school, especially one of higher learning.
It appears as though this was a university project, the kind of project one completes for a master's degree, even. Academic seems to be precisely the right term.
File a bug report, please. Doppio is actively maintained, though our current focus is on Browsix (which incorporates and extends some of Doppio's functionality -- see browsix.org).
After seeing this I wanted to know:
- how many opcodes exist in the JVM
- how easy would it be to create another language that compiles to the JVM.
- how does the JVM represent basic data types
The list goes on. All because someone built a JVM in JS.
What happened to the hacker mentality of HN? Where's the curiosity gone?