Having worked a lot in assembly and C and such, I imagine it'd probably be a lot easier (and likely more performant) to write something that translates x86 instructions into x64 instructions. This is really interesting all the same, though, because it lets you potentially support as many targets as your C compiler does.
I once did something similar by converting the Minecraft server to .NET with IKVM and hooking into its terrain generator.
Sweet project, and awesome walkthrough, but this strikes me as less of a recompilation and more of building a virtual x86 machine whose instructions are defined at compile time.
I imagine that with a perfect optimizing compiler, this distinction would go away, but we're nowhere near that point.
Is it not possible to abstract some of those calls back into native C++ code (i.e., avoiding the need for the cpu and memory classes)?
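To make the distinction concrete, here's roughly what the two ends of that spectrum look like. This is a minimal sketch in C++; the struct names, layout, and the example routine are my own invention, not the project's actual output:

    // Hypothetical illustration, not the project's actual output: a translated
    // routine still thinks in terms of x86 state.
    #include <cstdint>
    #include <vector>

    struct Cpu {
        uint32_t eax = 0, ecx = 0;
    };

    struct Memory {
        std::vector<uint8_t> ram = std::vector<uint8_t>(1 << 20);
        uint32_t load32(uint32_t addr) const {
            return ram[addr] | ram[addr + 1] << 8 | ram[addr + 2] << 16 |
                   static_cast<uint32_t>(ram[addr + 3]) << 24;
        }
    };

    // "Recompiled" form: each line mirrors one x86 instruction, so the C++
    // compiler is effectively handed an interpreter unrolled at build time.
    uint32_t sum_translated(Cpu& cpu, Memory& mem, uint32_t arrayAddr, uint32_t count) {
        cpu.eax = 0;                                        // xor eax, eax
        for (cpu.ecx = 0; cpu.ecx < count; ++cpu.ecx)       // inc ecx / cmp / jb
            cpu.eax += mem.load32(arrayAddr + cpu.ecx * 4); // add eax, [base+ecx*4]
        return cpu.eax;
    }

    // Fully lifted native form: same behaviour, cpu/memory classes gone.
    uint32_t sum_native(const uint32_t* values, uint32_t count) {
        uint32_t total = 0;
        for (uint32_t i = 0; i < count; ++i)
            total += values[i];
        return total;
    }

The first form is faithful to the binary but drags the whole emulated register file and address space along; the second is what you'd hope to end up with once a routine is understood well enough to rewrite.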
It is an interesting idea/project, but can it really be considered Reverse Engineering? It seems like it is just translating the original instructions into a C++ form. If the goal is to build an open implementation of the CubeWorld server, then I don't see how this helps: you are still just running the original proprietary code, albeit translated into another format.
Additionally, one of the goals of reverse engineering is to understand how the algorithms actually work and to get a deeper understanding beyond the "black-box" level. This technique doesn't really provide new insight into the code; it just transforms one kind of closed binary blob into another.
Again, I don't mean to denigrate the work you have accomplished here, but I would hesitate to classify it as "Reverse Engineering".
It's certainly a neat hack, but legally I would avoid this approach unless the original creator is cooperative or at least no longer around. Distributing code derived in this manner is pretty clearly copyright infringement unless the original license allows it. I am not a lawyer, but I would argue that while reverse engineering to allow interoperability is acceptable, doing so by purely copying the original is not; performing mechanical transformations on the code (disassembly/recompilation) is not enough to make the resulting code a non-derived work.
That said, I can see how this is a technologically reasonable first step toward a new implementation. Once this initial translation step is done, individual functions can be swapped out for new (non-translated) versions fairly easily by editing source code, as opposed to patching the original binary. Later in the post, there's mention of replacing commonly-executed functions with native versions to improve performance and allow porting to other environments.
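Once everything lives in source form, the swap can be as mundane as repointing a call. A hypothetical sketch; the function names, the terrain example, and the function-pointer indirection are mine, not the project's:

    // Hypothetical sketch; real names and signatures will differ.
    #include <cstdint>
    #include <cstdio>

    using TerrainFn = uint32_t (*)(uint32_t seed, int32_t x, int32_t z);

    // Stand-in for the mechanically translated routine (the emulated x86
    // body would live here in the real thing).
    uint32_t terrain_height_translated(uint32_t seed, int32_t x, int32_t z) {
        return (seed ^ (static_cast<uint32_t>(x) * 73856093u)
                     ^ (static_cast<uint32_t>(z) * 19349663u)) % 256u;
    }

    // Hand-written replacement intended to have identical observable behaviour.
    uint32_t terrain_height_native(uint32_t seed, int32_t x, int32_t z) {
        return (seed ^ (static_cast<uint32_t>(x) * 73856093u)
                     ^ (static_cast<uint32_t>(z) * 19349663u)) % 256u;
    }

    // Callers go through the pointer, so a routine can be swapped one at a
    // time once the native version has been verified against the translated one.
    TerrainFn terrain_height = terrain_height_translated;

    int main() {
        terrain_height = terrain_height_native; // swap in the rewrite
        std::printf("%u\n", terrain_height(42u, 10, -3));
    }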
What are the legal implications of this? As far as I understand, reverse engineering using disassembled code isn't legal. And this isn't just reading the disassembled code; they're taking it wholesale.
You can reverse engineer to your heart's content using whatever you like, provided you didn't "acquire" the original source code. Given the binary, there's nothing legally preventing you from using or creating tools to convert machine code into assembly and then find patterns that you can turn into compilable source code.
Now, redistributing that code might very well get you into hot water, but to find out you'd need to go through a trial. So you use your disassembled source to learn the algorithm, then reimplement it in your own style. Now you're not even in a gray area with respect to the original source.
Practical reuse of assembly is resourceful, and this approach to portability is quite cool (take a look at this non-portable approach to fixing up and executing an assembly dump [1]).
I wonder if the future of video game console emulation lies in recompilation, perhaps to LLVM's intermediate representation (similar to Dagger [2]).
These issues may be serious problems, such as self-modifying code when emulating older systems like the NES, but are they still as problematic on more modern consoles? I believe modern consoles are already programmed more in high-level languages (C, C++, etc.) than in assembler; the PlayStation even has a GCC-based compiler. Since the code was initially written in a higher-level language and then compiled, would it not be easy to statically recompile it?
However, I am not a games programmer, and do not have any actual experience with console development. If there is anyone on here with experience in these areas, I would appreciate their thoughts on the matter.
Dynamic codegen in games might be less of an issue, but instead you get to deal with emulating an MMU, among other things. And the point is that all of these issues specific to whole-system emulators mean that static recompilation isn't faster than dynamic, so there's no point in doing contortions to get static working. The original language barely matters when all you have is machine code.
As for where newer consoles are easier, the big thing is that there's more dynamic linking, so there's more opportunity for high-level emulation of entire subsystems. Think being able to emulate entire OpenGL function calls rather than the raw GPU-specific register writes.
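High-level emulation of that sort usually boils down to a table of known guest entry points that get serviced in host code instead of being executed instruction by instruction. A rough sketch; the guest address, names, and dispatch mechanism are all invented for illustration:

    // Hypothetical HLE sketch; addresses and names are made up.
    #include <cstdint>
    #include <cstdio>
    #include <unordered_map>

    struct GuestCpu { uint32_t pc = 0; uint32_t arg0 = 0; };

    using HleHandler = void (*)(GuestCpu&);

    void hle_gl_clear(GuestCpu& cpu) {
        // Host-side stand-in for the guest's glClear: one host call replaces
        // what would otherwise be a long stream of emulated GPU register writes.
        std::printf("glClear(mask=0x%x)\n", cpu.arg0);
    }

    std::unordered_map<uint32_t, HleHandler> hleTable = {
        {0x80012340u, &hle_gl_clear},   // invented guest address of glClear
    };

    // Called by the emulator whenever the guest jumps to a new address.
    bool dispatch(GuestCpu& cpu) {
        auto it = hleTable.find(cpu.pc);
        if (it == hleTable.end())
            return false;    // unknown address: fall back to normal emulation
        it->second(cpu);     // known library entry point: handle it at a high level
        return true;
    }

    int main() {
        GuestCpu cpu;
        cpu.pc = 0x80012340u;
        cpu.arg0 = 0x4000;   // happens to match GL_COLOR_BUFFER_BIT
        dispatch(cpu);
    }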
Actually, I think it is much easier and less error-prone (and hence more secure) to translate from one machine language to another than it is to translate JavaScript to machine language. Hence, I don't understand why we don't use some form of machine language instead of JavaScript on the web.
The number of different cases to tackle is certainly much smaller.
http://andrewkelley.me/post/jamulator.html