Hacker News

Compiling to a sort of intermediary representation has distinct advantages over either distributing pure source or pure machine code.

Pure source-based interpretation, where software looks directly at human-written text and runs a program based on those instructions (e.g., BASIC, Python, Lua), is inherently slowed by the need to parse and validate the program on the fly. A lot of the processing time is eaten up simply by understanding each line of textual source code and deciding what code to run based on it. The advantage, however, is that code written in one of these languages can be dropped onto another machine running an entirely different operating system or processor architecture and run just fine: it's the language interpreter that is tasked with interpreting the human-readable instructions, so only that interpreter needs to be ported across platforms.
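To make the "parsed on the fly" cost concrete, here's a toy sketch (hypothetical, not any real interpreter) of a line-at-a-time interpreter written in Java. Note that every statement is re-lexed and re-parsed from text each time it runs; that repeated string work is exactly where a pure source interpreter spends much of its time.

```java
import java.util.HashMap;
import java.util.Map;

// Toy line-at-a-time interpreter for a made-up mini-language.
// Supported statement form: "let <name> = <operand> (+|-) <operand>"
public class ToyInterpreter {
    private final Map<String, Integer> vars = new HashMap<>();

    public void run(String line) {
        // Lexing and validation happen on every single execution of the line.
        String[] tok = line.trim().split("\\s+");
        if (tok.length != 6 || !tok[0].equals("let") || !tok[2].equals("="))
            throw new IllegalArgumentException("parse error: " + line);
        int lhs = value(tok[3]);
        int rhs = value(tok[5]);
        int result = tok[4].equals("+") ? lhs + rhs : lhs - rhs;
        vars.put(tok[1], result);
    }

    // An operand is either a literal number or a previously assigned variable.
    private int value(String t) {
        return Character.isDigit(t.charAt(0)) ? Integer.parseInt(t) : vars.get(t);
    }

    public int get(String name) { return vars.get(name); }

    public static void main(String[] args) {
        ToyInterpreter interp = new ToyInterpreter();
        interp.run("let x = 1 + 2");
        interp.run("let y = x + 40");
        System.out.println(interp.get("y")); // 43
    }
}
```

Real interpreters are far more sophisticated (most parse to an AST or bytecode internally), but the principle stands: the text itself is the program, so understanding the text is part of running it.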

Purely compiled code (e.g., C/C++, Rust) has the advantage of being very fast. When your final product is pure machine code, no translation is necessary: you feed your instructions straight to the CPU, and the CPU does its job. The tradeoff is that you need to produce many different sets of instructions, one for each operating system + processor architecture combination you wish to support. Essentially, the choice between compilation and interpretation comes down to whether you can justify the up-front cost of maintaining platform-dependent builds of your software, or the downstream cost of slower runtime performance.

But there is a middle ground: a combined approach between compilation and interpretation that gains us some of the advantages of both, while also pulling in some of the drawbacks. In this "hybrid" approach, software is compiled to an intermediate representation, in this case Java bytecode. You can think of this bytecode as machine code for an abstract machine, the "Java Virtual Machine"; the target architecture of the compilation step is the JVM, and the compiled artifact is our final shippable product. We ship that compiled bytecode, and end users need a copy of the JVM on their systems to execute it.

The JVM is an interpreter of Java bytecode: it reads the bytecode and decides which pathways in its own code to execute for each JVM instruction. This bytecode interpreter is therefore the platform-dependent component that needs to be maintained and ported across processor architectures and operating environments, and yes, you need a copy of it to run any compiled Java artifact. However, because Java bytecode is a rigid format, optimized for simplicity and machine-readability rather than human-readability and expressiveness, an interpreter for it is much, much simpler to develop and maintain than an interpreter for textual source. The simplicity of each instruction, combined with decisions already made during the compilation step, lets us apply optimizations to the bytecode while interpreting it, much like a real CPU would.

The advantages of this hybrid approach are that we keep the platform independence of interpreted code, but at a reduced ongoing cost, since our interpreter component is much simpler; additionally, because of the optimizations we can apply both at compile time and at interpretation time, we regain quite a bit of speed compared to standard interpretation.
The drawbacks, however, are that you do need that external interpreter component to execute the final product, and it still has to be maintained for each platform you want to run your code on; additionally, because there is still a compilation step along the way, there is still an up-front time cost of translating human-readable code into bytecode.
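The "machine code for an abstract machine" idea can be sketched in a few lines. Below is a toy stack-machine interpreter in Java with made-up opcodes (not real JVM bytecode, though the JVM is likewise a stack machine). The point of contrast with the source interpreter: each instruction is a fixed-format byte, so the run loop does no lexing or parsing at all, just a switch over opcodes.

```java
// Toy stack-machine interpreter with hypothetical opcodes, for illustration.
// A (non-JIT) JVM interpreter loop dispatches over real Java bytecode in a
// conceptually similar way.
public class ToyVM {
    static final byte PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    public static int run(byte[] code) {
        int[] stack = new int[16];
        int sp = 0;                                            // stack pointer
        for (int pc = 0; pc < code.length; pc++) {             // program counter
            switch (code[pc]) {
                case PUSH: stack[sp++] = code[++pc]; break;    // operand byte follows opcode
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                case MUL:  stack[sp - 2] *= stack[sp - 1]; sp--; break;
                case HALT: return stack[sp - 1];               // result is top of stack
            }
        }
        throw new IllegalStateException("program ended without HALT");
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4 as a flat sequence of instruction bytes.
        byte[] program = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        System.out.println(run(program)); // 20
    }
}
```

Because the format is this rigid, writing and porting such an interpreter is a far smaller job than porting a full language implementation, which is the maintenance win described above.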

This hybrid approach is actually used by some compiled languages, too. LLVM-based compiler tools, for instance, translate source code into an intermediate representation known as "LLVM IR." However, this intermediate code is not shipped to consumers; rather, it's simply used to apply further optimizations to the code now that it's in a simpler format.

So, long-winded explanations aside... yes. You compile Java code to bytecode, and ship that bytecode rather than either your raw source code or fully compiled machine code. That bytecode should be runnable on any platform you can get it onto, provided there is a functioning JVM built for that target system. Should the "internet collapse," you will need some way to source a Java Runtime Environment to run any compiled Java programs you have, and a Java Development Kit to compile and run any Java source code.
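The whole workflow fits in one file. Here the compile and run commands are noted as comments (the filename `Hello.java` is just an example):

```java
// Hello.java -- compile once, run anywhere a JVM exists:
//   javac Hello.java   (needs a JDK; produces Hello.class, the bytecode artifact)
//   java Hello         (needs only a JRE/JVM; executes the bytecode)
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello from the JVM");
    }
}
```

The `Hello.class` file produced by the first command is the shippable artifact: copy it to any machine with a compatible JVM and the second command runs it unchanged.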




Thanks for your explanation. As is often the case for me when I encounter this level of insight, I need to take a break, relax... and digest everything said word for word. That involves opening some tabs and a lot of reading, but these answers are always appreciated. Thanks.

BTW you made me lol when you followed my ridiculous "internet collapse" scenario.

I'll definitely be coming back to this answer.


Actually, LLVM IR is shipped to consumers in the Apple ecosystem, mainly on watchOS; the difference is that it isn't the same out-of-the-box LLVM IR used by the LLVM project, but rather a massaged version that Apple uses as its distribution format.



