Doppio: JVM written in JavaScript

brad0 · on July 12, 2017

This is cool, not because it's useful or performant, but because it shows how simple Java bytecode is.

After seeing this I wanted to know:

- how many opcodes exist in the JVM

- how easy would it be to create another language that compiles to the JVM.

- how does the JVM represent basic data types

The list goes on. All because someone built a JVM in JS.

What happened to the hacker mentality of HN? Where's the curiosity gone?

winter_blue · on July 12, 2017

This. I don't think enough people appreciate how simple and clear the JVM bytecode's design is.

Most platforms in use today are so complex. Take a look at x86, or one of the latest ECMAScript specifications. Even LLVM bitcode is a bit complicated compared to the JVM bytecode.

I think in programming language research, going forward, we need some research into "high-level bytecodes". I.e. bytecodes that capture high-level concepts in a clear and simple way.

jdmichal · on July 12, 2017

I've actually found CIL to be simpler than JVM bytecode, even with additional capabilities. Simple addition is a good example:

JVM: dadd, fadd, iadd, ladd. Addition for double, float, int, and long data, respectively.

CIL: add, add.ovf, add.ovf.un. Addition, signed addition with overflow check, and unsigned addition with overflow check.

bad_user · on July 12, 2017

Those bytecodes happened because the JVM bytecode was designed to be easily interpreted, whereas CIL was designed for JIT compilation. So for example CIL's `add` opcode is missing info that needs to come from the context in which it is used and the JVM's `iadd` and variations are easier to interpret.

You can see this design choice even today in how the JVM and CLR work. The JVM starts execution in an interpreter mode, then gradually compiles pieces of code as it detects bottlenecks. So the compilation that happens at runtime is very gradual and based on runtime measurements.

The CLR, on the other hand, has always done JIT compilation, with the ability to cache the compiled code for faster startup (e.g. Ngen). So it has been oriented towards ahead-of-time compilation.

Different trade-offs.
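A minimal sketch of the "easier to interpret" point, in TypeScript (the language Doppio is written in). The `MiniOp` type and the `runMini` function are made up for illustration and are not taken from Doppio or any real JVM. Because each opcode names its operand type, the loop never has to track what kind of value sits on the stack; an untyped CIL-style `add` would need that information from context. `ladd` and `fadd` are left out to keep the sketch short, and because a JavaScript number cannot hold every 64-bit integer exactly.

```typescript
// Toy operand-stack interpreter (illustrative only, not Doppio's code).
type MiniOp =
  | { op: "push"; value: number }
  | { op: "iadd" }  // 32-bit signed integer add, wraps on overflow
  | { op: "dadd" }; // 64-bit floating-point add

function runMini(program: MiniOp[]): number[] {
  const stack: number[] = [];
  const pop = (): number => {
    const v = stack.pop();
    if (v === undefined) throw new Error("operand stack underflow");
    return v;
  };
  for (const ins of program) {
    switch (ins.op) {
      case "push":
        stack.push(ins.value);
        break;
      case "iadd": {
        // The opcode itself says "int": truncate the sum to 32 bits.
        const b = pop();
        const a = pop();
        stack.push((a + b) | 0);
        break;
      }
      case "dadd": {
        // The opcode itself says "double": a JS number already is one.
        const b = pop();
        const a = pop();
        stack.push(a + b);
        break;
      }
    }
  }
  return stack;
}
```

Running `push 2147483647, push 1, iadd` through `runMini` leaves the wrapped 32-bit result on the stack, while the same operands under `dadd` give the ordinary floating-point sum.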

jdmichal · on July 12, 2017

Sure. I tried to find information yesterday when I posted that regarding whether / how the `add` code handles differing types. Now since I'm not on my phone, I looked up the ECMA spec and it looks like CIL still only allows like types, with some minor exceptions involving `native int`. So it's just up to the compiler to make sure that actually happens and extend types as necessary.

kodfodrasz · on July 12, 2017

I also prefer the CIL and also the C# language to Java (though I really like Java and its ecosystem), but we have to admit that MS had 5-10 years to learn from the Java design decisions and their effects, and still did not manage to overcome every problematic point :)

pjmlp · on July 12, 2017

They also had different purposes driving their design, and we should not forget that.

JVM - Bytecodes only for Java

CIL - Bytecodes for VB.NET, C#, Managed C++ and the 1.0 SDK contained examples for Lisp, Pascal, Eiffel, Ada, ....

Of course, history then took another path for the JVM.

EDIT: Forgot that C++/CLI replaced Managed C++, which was the C++ variant on 1.0.

shellac · on July 12, 2017

> JVM - Bytecodes only for Java

Although it should be mentioned that java language semantics are largely (depending on how you measure them :-) absent from the jvm. (Default methods were a very unusual change in that respect)

And, as you say, subsequent history has weirdly inverted the JVM and CIL. The former is a lot less 'J' and the latter is a lot less 'C' ;-)

cwyers · on July 12, 2017

I would bet that there's a lot higher rate of .NET users running VB.NET than there are JVM users running Scala/Clojure/Kotlin. They just don't tend to be the sort to post to Hacker News.

shellac · on July 12, 2017

But there are probably more CPUs running non-java JVM languages ;-) It all depends what you count, as usual.

(I was really referring to the healthy non-java jvm language community. JRuby, Scala, and Clojure have lively communities and commercial backing)

readittwice · on July 12, 2017

Although this might really be true, I am skeptical that this has something to do with the inherent design of the Java bytecode compared to the CIL.

_pmf_ · on July 12, 2017

I think the bane of James Gosling's existence is being incorrectly associated with Java-the-lacklustre-language instead of correctly being associated with the superb JVM.

jhbadger · on July 12, 2017

Personally, I always associate him with a pre-GNU version of Emacs: https://en.wikipedia.org/wiki/Gosling_Emacs

readittwice · on July 12, 2017

If there is a problem with Java bytecode, it is that it is hard to verify. You need multiple passes over the bytecode until you've reached a steady state. There is also the "issue" that Java bytecode allows arbitrary control flow with goto, while Java doesn't.

IMHO WebAssembly solves that better but I also need to admit that they could already learn from Java.
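The "multiple passes until a steady state" point can be sketched as a worklist dataflow analysis. This is a toy model, not the JVM's verifier: it tracks a single local variable slot, the instruction set `VInsn` is invented, and `verifyToy` and `mergeSlot` are hypothetical names. Each instruction's entry state is merged from all of its predecessors, and whenever a merge changes a state that instruction is visited again, so a backward jump can force an earlier instruction to be re-checked.

```typescript
// Toy fixed-point verifier for one local slot (illustrative only).
type SlotType = "unset" | "int" | "ref" | "conflict";

type VInsn =
  | { op: "storeInt" }
  | { op: "storeRef" }
  | { op: "useInt" } // requires the slot to hold an int
  | { op: "goto"; target: number }
  | { op: "if"; target: number } // falls through or jumps to target
  | { op: "return" };

// Two paths that disagree about the slot's type make it unusable.
function mergeSlot(a: SlotType, b: SlotType): SlotType {
  return a === b ? a : "conflict";
}

function verifyToy(code: VInsn[]): boolean {
  if (code.length === 0) return false;
  // entry[pc] is the slot type on entry to pc; undefined means not reached yet.
  const entry: (SlotType | undefined)[] = code.map(() => undefined);
  entry[0] = "unset";
  const worklist: number[] = [0];
  while (worklist.length > 0) {
    const pc = worklist.pop();
    if (pc === undefined) break;
    const ins = code[pc];
    const before = entry[pc];
    if (ins === undefined || before === undefined) return false;
    let after: SlotType = before;
    const successors: number[] = [];
    switch (ins.op) {
      case "storeInt": after = "int"; successors.push(pc + 1); break;
      case "storeRef": after = "ref"; successors.push(pc + 1); break;
      case "useInt":
        if (before !== "int") return false; // type error found
        successors.push(pc + 1);
        break;
      case "goto": successors.push(ins.target); break;
      case "if": successors.push(pc + 1, ins.target); break;
      case "return": break;
    }
    for (const next of successors) {
      // Falling off the end or jumping outside the code is rejected.
      if (next < 0 || next >= code.length) return false;
      const old = entry[next];
      const merged = old === undefined ? after : mergeSlot(old, after);
      if (merged !== old) {
        entry[next] = merged; // state changed: revisit this instruction
        worklist.push(next);
      }
    }
  }
  return true;
}
```

A program that stores a reference on one path and then jumps back to a `useInt` passes the first visit of that instruction and is rejected only when the merged state comes around again, which is why a single linear pass is not enough in this model.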

kodablah · on July 12, 2017

> IMHO WebAssembly solves that better but I also need to admit that they could already learn from Java.

While that may be true of the language Java, WebAssembly's jump instructions are not without their annoyances too. For example, JVM bytecode requires your stack to be precise when jumping, whereas WebAssembly just cares about the most recent piece. If you expect your jump targets to have the same stack layout, WebAssembly makes the impl handle it. I had to account for this and other differences in my compiler [0].

0 - https://github.com/cretz/asmble#control-flow-operations

electrum · on July 12, 2017

The JVM has a full specification: https://docs.oracle.com/javase/specs/jvms/se8/html/

The specification is dense but still readable. I'd start reading in order, up through the third chapter which explains how to compile various Java snippets to bytecode. Perhaps start there and go back to the second chapter when you need more context.

I've also found this old but still relevant book to be a good guide: http://www.artima.com/insidejvm/ed2/

valarauca1 · on July 12, 2017

Here ya go. Java's internals are super well documented/specified

Loader stuff:

https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.ht...

Byte Code is simple:

https://en.m.wikipedia.org/wiki/Java_bytecode_instruction_li...

guelo · on July 12, 2017

Doesn't look that simple.

valarauca1 · on July 12, 2017

Ah friendo you've never read x64 assembly

mintplant · on July 12, 2017

Back when Java decompilers were much more primitive and easy-to-break than they are now, I wrote some mods for a Java game in a language called "Jade", which was essentially a textual syntax for raw JVM bytecode. Oddly enough I can't easily find any references to this tool online now.

bcg1 · on July 12, 2017

Are you sure it wasn't Jasmin?

http://jasmin.sourceforge.net/

mintplant · on July 12, 2017

Aha, you're right! Thanks.

dboreham · on July 12, 2017

Beyond the Internet Event Horizon :(

codefined · on July 12, 2017

Overtaken by HTML templating languages, gems and a dozen other things with the same name. Eventually these items reach such a density to form a black hole, sucking through the older items, perhaps to another place in spacetime.

gergoerdi · on July 13, 2017

I've used Krakatau, a JVM assembler / disassembler written in Python, to good effect in a project: https://github.com/Storyyeller/Krakatau

_fq4v · on July 12, 2017

People have written x86 emulators in javascript. That doesn't make x86 bytecode simple.

apignotti · on July 12, 2017

If you like Doppio JVM, you may find our project interesting too http://blog.leaningtech.com/2017/06/announcing-cheerpj-java-.... Our approach is to rely heavily on AOT compilation of JARs to JS to achieve higher perf, while still supporting full reflection. Moreover the RT is automatically split to reduce download time and bandwidth. A Swing application starts with ~15MB on our system. http://cheerpjdemos.leaningtech.com/SwingDemo.html

emeryberger · on July 17, 2017

Also, see our follow-on project, Browsix (http://browsix.org), which makes it possible to run Unix applications inside the browser.

cobookman · on July 12, 2017

So now you can run the JVM in JavaScript running in the JVM....

For those who don't know the JVM comes with a JavaScript engine by default: http://www.oracle.com/technetwork/articles/java/jf14-nashorn.... And you can even get v8/nodejs to be compiled in your java jar binary: https://github.com/eclipsesource/J2V8

Capt-RogerOver · on July 12, 2017

This is very cool. You can basically even call all Java classes (and the other way around) from that JavaScript! (And reliably limit the classes that can be used if you want to execute JS in a safe environment.) Why isn't this used more often? (Or is it used more often?)

cderwin · on July 12, 2017

I can't speak to whether it is used more often, but I would bet the java-based Nashorn vm is significantly slower than the c++-based nodejs. In fact, a cursory google search shows this is the case, and it's not even close.

See: http://blog.jonasbandi.net/2014/03/performance-nashorn-vs-no... http://pieroxy.net/blog/2015/06/29/node_js_vs_java_nashorn.h...

(I can't speak to the quality of either of these tests, but the results seem decisive)

Capt-RogerOver · on July 12, 2017

While it is of course going to be slower, "it's not even close" is a judgement call. The tests you have linked show a difference of roughly 1.6 to 3 times. In a lot of applications the performance hit is going to be worth gaining access to the whole Java ecosystem of libraries. Also the performance is very likely to increase with newer versions.

peoplewindow · on July 12, 2017

Nashorn isn't that slow, but there's another JS-on-the-JVM project called Graal.js which is about as fast as NodeJS/V8.

One reason Nashorn isn't used that much is that it doesn't expose a node.js compatible API. JS people often want Node specifically, not just the ability to run JS. It has some cool features though. The shell mode is neat.

james-mcelwain · on July 12, 2017

Here's a great testing library that leans heavily on Nashorn. https://github.com/intuit/karate. I find this a beautiful use of JavaScript in Java.

sandGorgon · on July 12, 2017

I think it is impolite not to give accurate credit. It is TypeScript... and not vanilla JavaScript.

https://github.com/plasma-umass/doppio

rounce · on July 12, 2017

May as well write Java.

mercer · on July 12, 2017

How so?

thangngoc89 · on July 12, 2017

Use Java to create bytecode which is executed in JS to produce JVM bytecode

Scarbutt · on July 12, 2017

Impressive, loaded a clojure.jar, got a repl and wrote/called some silly functions, it worked...

amelius · on July 12, 2017

Of course, the Java bytecode has only a limited number of instructions.

fredley · on July 12, 2017

I would like to run this in Rhino[1] - Mozilla's Javascript engine written in Java - just because. Turtles all the way down...

[1]: https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Rh...

ianopolous · on July 12, 2017

I investigated trying to use Doppio for Peergos. It is a very cool project. In the end we decided against it because it wasn't fast enough for our use case yet, and it requires the page to download the whole JVM (at least the rt.jar - the runtime) which is >60 MiB. We managed to get around this by manually stripping out the parts of the JVM that we didn't need, which brought it down to a few MiB.

I imagine that once they update it to Java 9 with the modular JDK (and resultant splitting of rt.jar) this reduction will largely happen automatically with it lazily downloading the parts that it needs.

mintplant · on July 12, 2017

A similar project from Mozilla: https://github.com/mozilla/pluotsorbet

It was targeted toward running J2ME apps on FirefoxOS phones. With that project dead it's no longer under active development.

crncosta · on July 12, 2017

Atwood's Law:

"Any application that can be written in JavaScript, will eventually be written in JavaScript."

mi100hael · on July 12, 2017

Corollary:

"Any module written in JavaScript will eventually be ported to Go."

kodablah · on July 12, 2017

Done [0] and done [1]. Though, the latter one at least, isn't really in a production state because compiling the JVM stdlib takes hours in Go.

0 - https://github.com/zxh0/jvm.go

1 - https://github.com/cretz/goahead

trynumber9 · on July 12, 2017

See also luje. "An experimental (read: toy) Java virtual machine written in pure Lua."

https://cowlark.com/luje/combined.html#index

flukus · on July 12, 2017

Does it run java applets?

expertentipp · on July 12, 2017

Java applets for a JVM written in TypeScript transpiled into JavaScript will be the next big thing. Freeze any browser, anywhere, instantly. Imagine the printout of the stack trace for an error - so exciting.

pjmlp · on July 12, 2017

It will, just wait for WebAssembly to be more mature.

tgma · on July 12, 2017

(2014)

ndr · on July 12, 2017

"Any application that can be written in JavaScript, will eventually be written in JavaScript." - Jeff Atwood

colordrops · on July 12, 2017

So, every application will be written in JavaScript?

fredley · on July 12, 2017

Every application that can

colordrops · on July 13, 2017

Very few can't

chii · on July 12, 2017

"and any application that can't be written in javascript will soon be compiled to javascript" -- me

peoplewindow · on July 12, 2017

There is also TeaVM:

http://teavm.org/

didibus · on July 12, 2017

This is actually quite awesome.

z3t4 · on July 12, 2017

Would be cool if it could run GUI programs!

bwidlar · on July 12, 2017

JVM written in Javascript, what could be wrong?

flogic · on July 12, 2017

Assuming it worked well enough, you could use it to run legacy Java applets without having the Java plugin installed. That would at least improve the security situation a bit while still allowing for legacy applets.

rorosaurus · on July 12, 2017

Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.

msgilligan · on July 12, 2017

Don't worry, it has Nashorn ;)

Randgalt · on July 12, 2017

Like Taco Town :D Run Javascript in the JVM in Javascript

the_duke · on July 12, 2017

But... why??

Edit:

> This paper presents DOPPIO, a JavaScript-based runtime system that makes it possible to run unaltered applications written in general-purpose languages directly inside the browser.

Someone should really have told them about webassembly...

lihaoyi · on July 12, 2017

The first commit to Doppio was 13 Feb 2012

One year before asm.js appeared, in Mar 2013

Three years before WebAssembly appeared, in June 2015

Five years later, In 2017, WebAssembly still doesn't have a GC and cannot run the Java/Scala/Groovy programs that Doppio was able to run in 2012

raddan · on July 12, 2017

Doppio's lineage goes back even further than 2012. A "JVM in JS" was given in Emery Berger's 691ST course in the fall of 2011. My notes show that I submitted my "finished" JVM on October 26, 2011. I say "finished" because it became obvious rather quickly that a lot of things would be difficult if not impossible in JS (like threading and synchronization) and so we negotiated with Emery to come up with a subset of Java that we could reasonably implement for a class project. IIRC, there were 6-8 VMs written that semester.

Doppio proper (the repository you refer to) started during the spring meeting of 691. Those guys went above and beyond the minimal spec we implemented, and they tackled a lot of stuff that we thought was impossible. Thus the research paper.

IIRC, we also had to write a decentralized chat program in JS that semester. That was also "fun".

Yes, I created a HN account just to post this.

cscurmudgeon · on July 12, 2017

Thank you. This is great work from both an engineering and scientific perspective.

jryan49 · on July 12, 2017

The paper was released at least a year before WebAssembly was even announced...

shakna · on July 12, 2017

... and a year after WASM's predecessor was announced.

jryan49 · on July 12, 2017

Directly from asm.js FAQ (http://asmjs.org/faq.html):

Q. Can asm.js serve as a VM for managed languages, like the JVM or CLR?

A. Right now, asm.js has no direct access to garbage-collected data; an asm.js program can only interact indirectly with external data via numeric handles. In future versions we intend to introduce garbage collection and structured data based on the ES6 structured binary data API, which will make asm.js an even better target for managed languages.

shakna · on July 12, 2017

Just pointing out that the work on what would become WASM was already heavily underway by that point.

wgjordan · on July 12, 2017

This paper was first published on June 9–11, 2014 (first commit Feb 11 2014); WebAssembly was first announced on June 17, 2015.

shakna · on July 12, 2017

asm.js, which led to WebAssembly, was announced on 21st March, 2013.

nteon · on July 12, 2017

Doppio relies heavily on the JavaScript object model, so the asm.js subset of JavaScript isn't very relevant.

dukoid · on July 12, 2017

Webassembly doesn't have GC (yet)

amelius · on July 12, 2017

Does doppio actually rely on the JavaScript garbage collector?

nteon · on July 12, 2017

yes -- it implements Java objects on top of Javascript objects.
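A rough sketch of what "Java objects on top of JavaScript objects" can look like. The layout below (`ToyObject`, `toyNew`, `toyPutField`, `toyGetField`) is hypothetical and far simpler than whatever Doppio actually does. The point is only that there is no hand-written heap: once nothing references a `ToyObject` any more, the host JavaScript engine's garbage collector reclaims it.

```typescript
// Hypothetical object layout for illustration; not Doppio's real model.
type ToyValue = number | ToyObject | null;

interface ToyObject {
  className: string;
  fields: { [name: string]: ToyValue };
}

interface ToyFieldDecl {
  name: string;
  kind: "int" | "ref";
}

// Like "new": allocate a plain JS object with default field values.
function toyNew(className: string, decls: ToyFieldDecl[]): ToyObject {
  const fields: { [name: string]: ToyValue } = {};
  for (const d of decls) {
    fields[d.name] = d.kind === "int" ? 0 : null;
  }
  return { className, fields };
}

// Like "putfield": only declared fields may be written.
function toyPutField(obj: ToyObject, name: string, value: ToyValue): void {
  if (obj.fields[name] === undefined) {
    throw new Error(`no field ${name} in ${obj.className}`);
  }
  obj.fields[name] = value;
}

// Like "getfield": only declared fields may be read.
function toyGetField(obj: ToyObject, name: string): ToyValue {
  const v = obj.fields[name];
  if (v === undefined) {
    throw new Error(`no field ${name} in ${obj.className}`);
  }
  return v;
}
```

Linking two such objects through a `next` field and then dropping the only reference to the first one leaves both eligible for collection by the JavaScript engine, with no extra bookkeeping in the toy VM.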

Koshkin · on July 12, 2017

> (Read the academic paper)

I admire the effort, but: doesn't "academic" mean "scientific"? Can there possibly be any "science" in having a well-known VM reimplemented in a well-known programming language?

nteon · on July 12, 2017

Please take a look at the contributions in the paper at the end of the introduction. The JVM uses operating system abstractions (like a file system, threads, and synchronous APIs) that were non-existent in the browser when the paper was published, and are still widely unavailable natively. Figuring out how to construct the functionality necessary not just to execute Java bytecode but to provide a full Java Virtual Machine, and then showing how this could be generalized beyond a JVM, is a CS contribution. Disclosure: I'm a labmate of the author.

raddan · on July 12, 2017

Here's one of the scientific questions, and FWIW, it's actually written right on the first page: can you implement a programming language that relies on synchronous primitives in a language that has only asynchronous primitives? The answer to that question is not obvious, but it turns out to be "yes". Doppio is an existence proof that it can be done. Furthermore, that fact is falsifiable (see Karl Popper), which makes the process of learning the answer "science."

You may not think this is an important question to answer--and that is your prerogative--but you can't argue that it isn't scientific. One of those other scientific questions is about threading (again, see page 1). BTW, did you know that System.out.println relies on synchronization primitives? You literally cannot write helloworld in Java without invoking a lot of machinery.
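The synchronous-on-asynchronous question can be illustrated with a toy, which is not Doppio's actual mechanism (the paper describes that). The sketch assumes an invented instruction set `ToyInsn` and a hypothetical callback-only host API `ReadLineAsync`. Because the interpreted program's state (`pc`, `stack`) lives in a plain object rather than on the JavaScript call stack, the interpreter can stop at a "blocking" read, return to the event loop, and pick up at the next instruction when the callback fires. The interpreted program sees an ordinary synchronous read.

```typescript
// Toy suspend-and-resume interpreter (illustrative only).
// Hypothetical async-only host API: the line arrives later via a callback.
type ReadLineAsync = (onLine: (line: string) => void) => void;

type ToyInsn = { op: "readLine" } | { op: "print" } | { op: "halt" };

interface ToyThread {
  state: "ready" | "blocked" | "done";
  pc: number;
  stack: string[];
  printed: string[];
}

function resumeThread(t: ToyThread, code: ToyInsn[], readLine: ReadLineAsync): void {
  while (t.state === "ready") {
    const ins = code[t.pc];
    if (ins === undefined || ins.op === "halt") {
      t.state = "done";
      return;
    }
    t.pc++;
    if (ins.op === "readLine") {
      // "Block": record that we are waiting and return to the host.
      t.state = "blocked";
      readLine((line) => {
        // Later: deliver the result and continue where we left off.
        t.stack.push(line);
        t.state = "ready";
        resumeThread(t, code, readLine);
      });
      return;
    }
    // ins.op === "print": pop the top of the stack and record it.
    const top = t.stack.pop();
    t.printed.push(top === undefined ? "" : top);
  }
}
```

Starting a thread on `readLine, print, halt` leaves it in the `blocked` state until the host invokes the saved callback, after which the remaining instructions run to completion.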

munificent · on July 12, 2017

> I admire the effort, but: doesn't "academic" mean "scientific"?

No, it doesn't. Or did you forget about music, literature, engineering, business, finance, history, philosophy, dance, art, architecture, nursing, etc.?

Spivak · on July 12, 2017

> Academic: of, relating to, or characteristic of a school, especially one of higher learning.

It appears as though this was a university project, the kind of project one completes for their masters even. Academic seems to be precisely the right term.

bdcravens · on July 12, 2017

Yes, in the sense that it's called "computer science" with little scientific method involved.

amelius · on July 12, 2017

I got an error, so I falsified it according to the scientific method:

    Error extracting doppio_home.zip: Error: ENOENT: No such file or directory., '/persist/vendor/java_home/lib/images/cursors/cursors.properties'

emeryberger · on July 17, 2017

File a bug report, please. Doppio is actively maintained, though our current focus is on Browsix (which incorporates and extends some of Doppio's functionality -- see browsix.org).

nerdponx · on July 12, 2017

No, it doesn't, and yes, there can.

pgbovine · on July 12, 2017

relevant paper: https://homes.cs.washington.edu/~jfogarty/publications/works...

This is an example of research as illustrated by Figure 5 in that paper (known functionality, novel techniques)