Statically Recompiling NES Games into Native Executables with LLVM and Go (2013)

Animats · on Sept 15, 2017

All that work, and then he hits the expected killer problem - self modifying code. Getting past that without an interpreter is going to be tough. Not impossible, because the self modifying code isn't that dynamic in the places he found it. It's not like it's reading external input and compiling it. There are a limited number of cases to be handled.

It's amusing seeing this in a machine which gets its code from a ROM.

boomlinde · on Sept 15, 2017

That example is not self-modifying, though. The BIT absolute instruction simply uses the LDY immediate instruction as operand data by means of overlapping them. This saves some bytes as you'd otherwise have to use relative branching (and there's no branch-always) or a three byte jump to load a different Y value, but the code remains unsabotaged.

These tricks are IMO an indicator that they had a good idea that the game code was going to be a tight fit (or had possibly already discovered that the hard way), so they were optimizing for size. A jump would be faster than executing the BIT instruction.

The manually loading an address to the stack and running RTS is an interesting trick. Wozniak used it in his SWEET-16 interpreter, pushing the interpreter loop entry point to the stack before jumping to the subroutines that execute the byte code, just so that he could use RTS to return to the interpreter loop. In that sense it fulfills an entirely different purpose than a JMP indirect and also saves some bytes, again at the expense of execution speed and non-gray hair. For a table based interpreter, it fills gap left by the lack of a JSR indirect instruction.

dzdt · on Sept 15, 2017

> It's not like it's reading external input and compiling it.

Some games, because of bugs, allow exactly that. User input might be directly executable. That means that static compilation cannot be achievable in general. Either you have different behaviour in the event of such bugs or you have a full interpreter to fall back on.

[1] https://www.polygon.com/2014/1/14/5309662/bizarre-super-mari...

Gaelan · on Sept 15, 2017

That's an SNES game.

XR0CSWV3h3kZWg · on Sept 15, 2017

Arbitrary execution in super mario bros 3 here: http://tasvideos.org/4961S.html

Perseids · on Sept 15, 2017

Which is one of the reasons I hope WebAssembly becomes the de-facto standard for binary code distribution: It preserves more structure of the original program and allows less of the lowest level shenanigans (it provides a clear and protected control flow, has a small instruction set, no dynamic code generation, and no instruction misalignment / dual-alignment).

zokier · on Sept 15, 2017

Wait, WASM disallows all dynamic code gen? That sounds really disappointing. How do JITs work on WASM?

dangerbird2 · on Sept 15, 2017

They don't now. The spec will almost certainly allow some form of dynamic code generation/loading in the future

http://webassembly.org/docs/future-features/#platform-indepe...

pvillano · on Sept 15, 2017

Preface: these are the questions of a CS student, not attacks.

What about java/.net/regex engines? Is it required that they exist outside the browser, or are performance limited to interpreting, or is there a better solution?

What's the problem with the control flow of assembly?

dogprez · on Sept 15, 2017

NES cartridges also could have their own hardware too. Don’t like the NES sound card? You could ship your own inside of the cartridge. I think notoriously castlevania 3 shipped some custom hardware in their cartridge. Some of the cartridges for the famicom came with an fm synth chip.

mistercow · on Sept 15, 2017

I believe this was even more common on the SNES. The most well-known example is the Super FX chip, which was used in Star Fox and Yoshi's Island among others, but there were actually a ton of these things: https://en.wikipedia.org/wiki/List_of_Super_NES_enhancement_...

convivialdingo · on Sept 15, 2017

Wonderful, detailed deep-dive into static compilation and emulation with an approachable start. Good job.

nickcw · on Sept 15, 2017

Interesting read, shame about the dynamic code problems.

I wonder whether something like this might be better attempted in rpython which will build a JIT compiler for you.

http://rpython.readthedocs.io/en/latest/

ericfrederich · on Sept 15, 2017

This means that they test some condition, and then either transfer control flow to the next instruction, or to a different label. This means that we can mark the possible branch address and the next address as instructions.

I hope all of these NES ROMs were coded well. Seems like you could do something like...

  if False:
      jump_to_data_address

... then he'd be interpreting data as code.

dezgeg · on Sept 15, 2017

This actually happens surprisingly often in 6502 code, because unconditional branches always take three bytes, unlike conditional jumps which take two bytes. So if the value of some condition flags are known to be constant (e.g. ORing with a nonzero constant always clears the zero flag) it saves a byte to use an always-true conditional jump in place of an unconditional jump.

hayd · on Sept 15, 2017

Aside, it's surprising to me that there isn't a (maintained) Java/JVM implementation on top of LLVM...

legulere · on Sept 15, 2017

Reflection is hard to support with ahead of time compilation, yet java code often makes heavily use of it for (de-)serialization.

Reason077 · on Sept 15, 2017

"Reflection is hard to support with ahead of time compilation"

Not true at all. There's nothing in reflection and serialization[1] that is hard to implement with an AOT compiler.

The real problem with AOT-compilation in Java is that there are many optimisations, like devirtualization, that can only be done (or done much more easily/effectively) at runtime.

You can get some of this back with whole-program/closed-world optimisation, but in reality that's highly impractical for Java. Many Java programs are very large and people want the ability to update them quickly and easily, often without even restarting let alone recompiling.

An LLVM-based Java runtime could use a hybrid AOT/JIT model, however. Recompiling parts of the program as needed at runtime based on profiler data, you'd get the fast startup of AOT combined with the high performance of JIT.

Don't get any crazy ideas about beating Hotspot's performance, though.

[1] one exception is classes/methods that are defined at runtime using dynamically generated bytecode. But that's pretty rare.

pjmlp · on Sept 15, 2017

Still there are plenty of commercial options to AOT compile Java, and OpenJDK is getting its free variant with Java 9 for Linux x64, with Windows and OS X already supported on the Java 10 development branch.

Oracle Labs is also pushing forward their research of meta-circular JVM, which makes use of AOT compilation. Although it restricts it to a "native Java" subset.

http://cr.openjdk.java.net/~jrose/metropolis/Metropolis-Prop...

amaranth · on Sept 15, 2017

OpenJDK's AOT compiler, when you also allow it to include the JIT and still recompile hotspots, is about 5% slower than just using the JIT. The only reasons to use it are startup times, ease of distribution (no JVM to ship, sort of), and for use in places that don't allow JITs (iOS).

pjmlp · on Sept 15, 2017

OpenJDK's AOT is also on its early days.

Have you ever profiled Excelsior JET, Aonix, Jamaica, IBM Websphere Real Time for example?

Their products are almost as old as Java, so I imagine they are worth their price.

amaranth · on Sept 15, 2017

I believe those products prohibit doing public benchmarking so even if I had I couldn't tell you the results. :)

pjmlp · on Sept 15, 2017

However if they were really bad, they would have already disbanded the teams, because companies would just use the gratis JDK with pure JIT.

Regarding OpenJDK, according to the Java Languages Summit talk, the Java 10 dev branch already has better optimizations than version 9.

dullgiulio · on Sept 15, 2017

But the exact same is valid for Go...

ksherlock · on Sept 15, 2017

LLVM used to have a .net backend. (https://github.com/llvm-mirror/llvm/tree/release_27/lib/Targ...)

pjmlp · on Sept 15, 2017

No, but there is a LLVM implementation on top of Java/JVM. :)

https://github.com/graalvm/sulong

Which is much better, because we need more language research with type safe languages, not with C and C++.

emerged · on Sept 15, 2017

I had a similar project idea for ages. Instead of LLVM it would static recompile into C macros or C++ inline function calls for each opcode. Then it would compile the output and run the game.

Emulation is just plain fun.

taocp · on Sept 15, 2017

http://mp2.dk/blog/blog/2014/04/14/practical-and-portable-bi...

emerged · on Sept 16, 2017

The compiler tends to treat static variables very differently from locals. I was holding out for efficient local functions which can be forced inlined to keep the emulated registers as actual system registers. Sadly std::function doesn't play well with force inline on all platforms.

k__ · on Sept 15, 2017

lol, it just came to me that clang is an abbreviation for C language. I always imagined it to be the sound metal makes when you slam it together, because bare metal programming etc.

SyneRyder · on Sept 15, 2017

Though for what it's worth, the official pronunciation is the klang metal sound:

http://lists.llvm.org/pipermail/llvm-dev/2008-July/015629.ht...

colordrops · on Sept 15, 2017

Could be a pun.

cdelsolar · on Sept 15, 2017

I came upon this realization very recently too.

eustace · on Sept 15, 2017

Clang is a compiler, part of the LLVM suite ;)

jdmoreira · on Sept 15, 2017

The clang llvm frontend is for C, C++ and Objective-C.

zsmizzle · on Sept 15, 2017

Dupe: submitted 1560 days ago - https://news.ycombinator.com/item?id=5838326

tomjakubowski · on Sept 15, 2017

The polite convention on HN is to say this was discussed previously, and link to it, especially when it was last submitted and discussed over four years ago :-)

vog · on Sept 15, 2017

Please note that duplicate submission are not discouraged on HN in general, especially not after such a long time.

However, thanks for sharing the link to the previous discussion!

DanBC · on Sept 15, 2017

Although clicking the |past| link takes you to this page: https://hn.algolia.com/?query=Statically%20Recompiling%20NES...

dang · on Sept 15, 2017

Reposts are ok after about a year: https://news.ycombinator.com/newsfaq.html.

vog · on Sept 15, 2017

I'm always confused about "Guidelines" and "FAQ". It seems that whenever I have a question about HN, I need to consult both, because it is not clear at all which site covers which question.

Maybe "Guidelines" and "FAQ" should simply be merged?

Edit: Opened an "Ask HN" submission about this: https://news.ycombinator.com/item?id=15255270

icebraining · on Sept 15, 2017

I think Meta Ask HNs are also discouraged :) you're supposed to send them an email.

grzm · on Sept 15, 2017

I understand what you're saying. There's the Show HN guidelines as well. One could imagine them all as three sections on one page. The guidelines are for what you should do; the FAQ for general information. Then again, they're short. It's not a lot of effort to check both.

mohaine · on Sept 15, 2017

This should have a 2013 added to the tittle.