Practical and Portable X86 Recompilation (2014)

ksherlock · on Jan 7, 2015

Interesting... I've also seen it done for the 6502/Nintendo [0], but self-modifying code was a deal breaker.

[0] http://andrewkelley.me/post/jamulator.html
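
For anyone wondering why self-modifying code is such a problem for static translation, here is a toy sketch. It is not taken from jamulator: the `Machine` struct and both functions are made up for illustration, and only the two 6502 opcodes (INX = 0xE8, DEX = 0xCA) are real. The translated function was generated while the byte at address 0 was INX, so it keeps incrementing even after the guest overwrites that byte.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Real 6502 opcodes; everything else in this sketch is hypothetical.
constexpr uint8_t INX = 0xE8;  // increment X
constexpr uint8_t DEX = 0xCA;  // decrement X

struct Machine {
    std::array<uint8_t, 16> mem{};  // tiny guest memory holding code
    uint8_t x = 0;                  // X register
};

// Interpreter: decodes whatever byte is in memory at execution time,
// so it follows any change the guest makes to its own code.
void interpret_at(Machine& m, uint16_t pc) {
    assert(pc < m.mem.size());
    switch (m.mem[pc]) {
        case INX: ++m.x; break;
        case DEX: --m.x; break;
        default: break;  // other opcodes are ignored in this toy
    }
}

// Static translation of address 0, produced ahead of time while
// mem[0] held INX. It never looks at memory again.
void translated_0000(Machine& m) {
    ++m.x;
}
```

If both versions run once, then the guest stores DEX at address 0, and both run again, the interpreter ends with X back at 0 while the translated function ends with X at 2. A static recompiler has to detect such writes and fall back to something dynamic, which is where the approach stops paying off.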

ggambetta · on Jan 7, 2015

Very cool idea. Reminds me of the Emulator-Backed Remakes [1] thing I did a few months ago. Glad to see other people thinking in similar directions :)

[1] http://gabrielgambetta.com/remakes.html

ddevault · on Jan 7, 2015

Having worked a lot in assembly and C, I imagine it would be a lot easier (and probably more performant) to write something that translates x86 instructions into x64 instructions. This is really interesting all the same, though, because it enables you to potentially support as many targets as your C compiler supports.

I once did something similar by converting the Minecraft server to .NET with IKVM and hooking into its terrain generator.

falcolas · on Jan 7, 2015

Sweet project, and awesome walkthrough, but this strikes me as less of a recompilation and more of building a virtual x86 machine whose instructions are defined at compile time.

I imagine that with a perfect optimizing compiler, this distinction would go away, but we're nowhere near that point.

Is it not possible to abstract away some of those calls back into native C++ code (i.e. avoiding the need for cpu and memory classes)?
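
To make the distinction concrete, here is a minimal sketch of what instruction-by-instruction translation tends to look like next to the native code a person would write. The `Cpu` and `Memory` types are assumptions made up for this example; the article's actual classes may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical register file, reduced to what the example needs.
struct Cpu {
    uint32_t eax = 0, ecx = 0, esp = 0;
};

// Hypothetical flat guest memory with little-endian 32-bit access.
struct Memory {
    std::vector<uint8_t> bytes;
    explicit Memory(std::size_t size) : bytes(size, 0) {}

    uint32_t read32(uint32_t addr) const {
        assert(static_cast<std::size_t>(addr) + 4 <= bytes.size());
        return uint32_t(bytes[addr]) | uint32_t(bytes[addr + 1]) << 8 |
               uint32_t(bytes[addr + 2]) << 16 | uint32_t(bytes[addr + 3]) << 24;
    }
    void write32(uint32_t addr, uint32_t v) {
        assert(static_cast<std::size_t>(addr) + 4 <= bytes.size());
        for (int i = 0; i < 4; ++i) bytes[addr + i] = uint8_t(v >> (8 * i));
    }
};

// One C++ statement per original instruction:
//   mov eax, [esp+4]
//   mov ecx, [esp+8]
//   add eax, ecx
//   ret
void translated_add(Cpu& cpu, Memory& mem) {
    cpu.eax = mem.read32(cpu.esp + 4);
    cpu.ecx = mem.read32(cpu.esp + 8);
    cpu.eax += cpu.ecx;
    cpu.esp += 4;  // ret pops the return address
}

// What the same function looks like once it is lifted to native C++.
uint32_t native_add(uint32_t a, uint32_t b) {
    return a + b;
}
```

The translated version still threads every value through the emulated stack and registers, which is the "virtual machine defined at compile time" feel. Getting from there to `native_add` means recovering the calling convention and argument types, which is decompilation work rather than something the compiler can be expected to do on its own.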

tbirdz · on Jan 7, 2015

It is an interesting idea/project, but can it really be considered Reverse Engineering? It seems like it is just translating the original instructions into a C++ form. If the goal is to build an open implementation of the CubeWorld server, then I don't see how this helps. It seems you are still just running the original proprietary code, albeit translated in format.

Additionally one of the goals in reverse engineering is to understand how the algorithms actually work and get a deeper understanding beyond the "black-box" level. This technique doesn't really provide new insight into the code, it just transforms it from one kind of closed binary blob to another kind of closed binary blob.

Again, I don't mean to denigrate the work you have accomplished here, but I would hesitate to classify it as "Reverse Engineering".

drv · on Jan 7, 2015

It's certainly a neat hack, but legally I would avoid this approach unless the original creator is cooperative or at least no longer around. Distributing code derived in this manner is pretty clearly copyright infringement unless the original license allows it. I am not a lawyer, but I would argue that, while reverse engineering to allow interoperability is acceptable, doing so by purely copying the original is not, and performing mechanical transformations on the code (disassembly/recompilation) is not enough to cause the resulting code to be a non-derived work.

That said, I can see how this is a technologically reasonable first step toward a new implementation. Once this initial translation step is done, individual functions can be swapped out for new (non-translated) versions fairly easily by editing source code, as opposed to patching the original binary. Later in the post, there's mention of replacing commonly-executed functions with native versions to improve performance and allow porting to other environments.
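
A rough sketch of that swap, assuming the translated program dispatches calls through a table keyed by the original function address. The table, the address 0x401000, and both function bodies are hypothetical and only meant to show the shape of the idea.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical register file, reduced to what the example needs.
struct Cpu {
    uint32_t eax = 0, ecx = 0;
};

using GuestFn = std::function<void(Cpu&)>;

// Machine-translated body for the function at (made-up) address 0x401000:
//   mov eax, ecx
//   shl eax, 1
void translated_0x401000(Cpu& cpu) {
    cpu.eax = cpu.ecx;
    cpu.eax <<= 1;
}

// Hand-written replacement plus a thin adapter to the emulated calling
// convention (argument in ecx, result in eax; an assumption of this sketch).
uint32_t double_value(uint32_t x) { return x * 2; }
void native_0x401000(Cpu& cpu) { cpu.eax = double_value(cpu.ecx); }

// Calls go through this table, so a single function can be replaced
// without touching any of its callers.
class FunctionTable {
    std::unordered_map<uint32_t, GuestFn> fns_;
public:
    void bind(uint32_t addr, GuestFn fn) {
        assert(static_cast<bool>(fn));
        fns_[addr] = std::move(fn);
    }
    void call(uint32_t addr, Cpu& cpu) const { fns_.at(addr)(cpu); }
};
```

Binding `translated_0x401000` first and rebinding to `native_0x401000` later should give callers the same result either way, which makes it possible to replace functions one at a time and compare behavior as you go.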

dtech · on Jan 8, 2015

I agree. I have no idea about the state of x86 decompilation, but decompiling and refactoring the binary seems like a much better long-term plan.

This will work right up until you're on par with the original server, and then you'll be severely limited since you can't change the world generation.

grokys · on Jan 7, 2015

What are the legal implications of this? As far as I understand, reverse engineering using disassembled code isn't legal. And this isn't just reading the disassembled code - they're taking it wholesale.

delinka · on Jan 8, 2015

You can reverse engineer to your heart's content using whatever you like, provided you didn't "acquire" the original source code. Given the binary, there's nothing to legally prevent you from using or creating tools to convert machine code to assembler code and then find patterns that you can decompile into compilable source code.

Now redistributing that code might maybe possibly get you into hot water, but to find out you'll need to go through a trial. So you use your disassembled source to learn the algorithm, then reimplement it in your own style. Now you're not even in the gray with respect to the original source.

desdiv · on Jan 8, 2015

Don't you need a Chinese wall between the person who disassembles the binary and the reimplementor?

serf · on Jan 8, 2015

http://en.wikipedia.org/wiki/Server_emulator has a bit about the legal implications.

tomyws · on Jan 7, 2015

This is fascinating!

Practical reuse of assembly is resourceful, and this approach to portability is quite cool (take a look at this non-portable approach to fixing up and executing an assembly dump [1]).

I wonder if the future of video game console emulation lies in recompilation, perhaps to an intermediate representation format for LLVM (similar to Dagger[2]).

[1] http://aluigi.altervista.org/mytoolz.htm#dump2func

[2] http://dagger.repzret.org/

brigade · on Jan 7, 2015

Static recompilation, no. See http://andrewkelley.me/post/jamulator.html for an attempt at that and why it isn't useful.

tbirdz · on Jan 8, 2015

These issues, such as self-modifying code, may be serious problems when emulating older systems like the NES, but are they still as problematic on more modern consoles? I believe that modern consoles are already being programmed more in high-level languages (C, C++, etc.) than in assembler. PlayStation even has a GCC-based compiler. Since the code was initially written in a higher-level language and then compiled, would it not be easy to statically recompile it?

However, I am not a games programmer, and do not have any actual experience with console development. If there is anyone on here with experience in these areas, I would appreciate their thoughts on the matter.

brigade · on Jan 8, 2015

Dynamic codegen in games might be less of an issue, but instead you get to deal with emulating an MMU, among other things. And the point is that all of these issues specific to whole-system emulators mean that static recompilation isn't faster than dynamic, so there's no point in doing contortions to get static working. The original language barely matters when all you have is machine code.

As for where newer consoles are easier, the big thing is that there's more dynamic linking so there's more opportunity for high-level emulation of entire systems. Think being able to emulate entire OpenGL function calls rather than the raw GPU-specific register writes.
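
A small sketch of the difference, with everything invented for illustration: the register layout, the `gfx_clear` symbol, and the `HostRenderer` type do not correspond to any real console or graphics API. The low-level path has to model individual register writes; the high-level path intercepts the dynamically linked call by name and hands it straight to the host.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical host-side renderer that both paths end up driving.
struct HostRenderer {
    uint32_t clear_color = 0;
    int clears = 0;
    void clear(uint32_t rgba) { clear_color = rgba; ++clears; }
};

// Low-level emulation: the guest pokes raw GPU registers and the
// emulator must reproduce the hardware's behavior register by register.
enum GpuReg : uint32_t { REG_CLEAR_COLOR = 0x00, REG_COMMAND = 0x04 };
enum GpuCmd : uint32_t { CMD_CLEAR = 1 };

struct LowLevelGpu {
    uint32_t latched_color = 0;
    void write_reg(HostRenderer& host, uint32_t reg, uint32_t value) {
        if (reg == REG_CLEAR_COLOR) {
            latched_color = value;
        } else if (reg == REG_COMMAND && value == CMD_CLEAR) {
            host.clear(latched_color);
        }
    }
};

// High-level emulation: imports resolved by the dynamic linker are
// intercepted by symbol name and implemented directly on the host.
using Args = std::vector<uint32_t>;
using HostCall = std::function<void(HostRenderer&, const Args&)>;

class ImportTable {
    std::unordered_map<std::string, HostCall> calls_;
public:
    void provide(const std::string& symbol, HostCall fn) {
        assert(static_cast<bool>(fn));
        calls_[symbol] = std::move(fn);
    }
    // Returns false when the symbol has no host implementation.
    bool call(const std::string& symbol, HostRenderer& host, const Args& args) const {
        auto it = calls_.find(symbol);
        if (it == calls_.end()) return false;
        it->second(host, args);
        return true;
    }
};
```

With the import table, a host handler registered under `gfx_clear` can call `HostRenderer::clear` directly, and the emulator never needs to know what registers the original library would have written.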

vtbassmatt · on Jan 8, 2015

The Xbox consoles, at least, don't allow self-modifying code, so that particular issue wouldn't exist. I assume PlayStation is the same.

hyc_symas · on Jan 7, 2015

Reminds me of the 8086 -> 68000 recompiler I wrote in my Atari ST days. Translating MS-DOS binaries to GEMDOS was pretty straightforward back then.

amelius · on Jan 7, 2015

Actually, I think it is much easier and less error-prone (and hence more secure) to translate from one machine language to another than it is to translate JavaScript to machine language. Hence, I don't understand why we don't use some form of machine language instead of JavaScript on the web.

The number of different cases to tackle is certainly much smaller.