Hacker News new | past | comments | ask | show | jobs | submit login
Statically Recompiling NES Games into Native Executables with LLVM and Go (2013) (andrewkelley.me)
265 points by wooby on Sept 15, 2017 | hide | past | favorite | 44 comments



All that work, and then he hits the expected killer problem - self modifying code. Getting past that without an interpreter is going to be tough. Not impossible, because the self modifying code isn't that dynamic in the places he found it. It's not like it's reading external input and compiling it. There are a limited number of cases to be handled.

It's amusing seeing this in a machine which gets its code from a ROM.


That example is not self-modifying, though. The BIT absolute instruction simply uses the LDY immediate instruction as operand data by means of overlapping them. This saves some bytes as you'd otherwise have to use relative branching (and there's no branch-always) or a three byte jump to load a different Y value, but the code remains unsabotaged.

These tricks are IMO an indicator that they had a good idea that the game code was going to be a tight fit (or had possibly already discovered that the hard way), so they were optimizing for size. A jump would be faster than executing the BIT instruction.

The manually loading an address to the stack and running RTS is an interesting trick. Wozniak used it in his SWEET-16 interpreter, pushing the interpreter loop entry point to the stack before jumping to the subroutines that execute the byte code, just so that he could use RTS to return to the interpreter loop. In that sense it fulfills an entirely different purpose than a JMP indirect and also saves some bytes, again at the expense of execution speed and non-gray hair. For a table based interpreter, it fills gap left by the lack of a JSR indirect instruction.


> It's not like it's reading external input and compiling it.

Some games, because of bugs, allow exactly that. User input might be directly executable. That means that static compilation cannot be achievable in general. Either you have different behaviour in the event of such bugs or you have a full interpreter to fall back on.

[1] https://www.polygon.com/2014/1/14/5309662/bizarre-super-mari...


That's an SNES game.


Arbitrary execution in super mario bros 3 here: http://tasvideos.org/4961S.html


Which is one of the reasons I hope WebAssembly becomes the de-facto standard for binary code distribution: It preserves more structure of the original program and allows less of the lowest level shenanigans (it provides a clear and protected control flow, has a small instruction set, no dynamic code generation, and no instruction misalignment / dual-alignment).


Wait, WASM disallows all dynamic code gen? That sounds really disappointing. How do JITs work on WASM?


They don't now. The spec will almost certainly allow some form of dynamic code generation/loading in the future

http://webassembly.org/docs/future-features/#platform-indepe...


Preface: these are the questions of a CS student, not attacks.

What about java/.net/regex engines? Is it required that they exist outside the browser, or are performance limited to interpreting, or is there a better solution?

What's the problem with the control flow of assembly?


NES cartridges also could have their own hardware too. Don’t like the NES sound card? You could ship your own inside of the cartridge. I think notoriously castlevania 3 shipped some custom hardware in their cartridge. Some of the cartridges for the famicom came with an fm synth chip.


I believe this was even more common on the SNES. The most well-known example is the Super FX chip, which was used in Star Fox and Yoshi's Island among others, but there were actually a ton of these things: https://en.wikipedia.org/wiki/List_of_Super_NES_enhancement_...


Wonderful, detailed deep-dive into static compilation and emulation with an approachable start. Good job.


Interesting read, shame about the dynamic code problems.

I wonder whether something like this might be better attempted in rpython which will build a JIT compiler for you.

http://rpython.readthedocs.io/en/latest/


This means that they test some condition, and then either transfer control flow to the next instruction, or to a different label. This means that we can mark the possible branch address and the next address as instructions.

I hope all of these NES ROMs were coded well. Seems like you could do something like...

  if False:
      jump_to_data_address
... then he'd be interpreting data as code.


This actually happens surprisingly often in 6502 code, because unconditional branches always take three bytes, unlike conditional jumps which take two bytes. So if the value of some condition flags are known to be constant (e.g. ORing with a nonzero constant always clears the zero flag) it saves a byte to use an always-true conditional jump in place of an unconditional jump.


Aside, it's surprising to me that there isn't a (maintained) Java/JVM implementation on top of LLVM...


Reflection is hard to support with ahead of time compilation, yet java code often makes heavily use of it for (de-)serialization.


"Reflection is hard to support with ahead of time compilation"

Not true at all. There's nothing in reflection and serialization[1] that is hard to implement with an AOT compiler.

The real problem with AOT-compilation in Java is that there are many optimisations, like devirtualization, that can only be done (or done much more easily/effectively) at runtime.

You can get some of this back with whole-program/closed-world optimisation, but in reality that's highly impractical for Java. Many Java programs are very large and people want the ability to update them quickly and easily, often without even restarting let alone recompiling.

An LLVM-based Java runtime could use a hybrid AOT/JIT model, however. Recompiling parts of the program as needed at runtime based on profiler data, you'd get the fast startup of AOT combined with the high performance of JIT.

Don't get any crazy ideas about beating Hotspot's performance, though.

[1] one exception is classes/methods that are defined at runtime using dynamically generated bytecode. But that's pretty rare.


Still there are plenty of commercial options to AOT compile Java, and OpenJDK is getting its free variant with Java 9 for Linux x64, with Windows and OS X already supported on the Java 10 development branch.

Oracle Labs is also pushing forward their research of meta-circular JVM, which makes use of AOT compilation. Although it restricts it to a "native Java" subset.

http://cr.openjdk.java.net/~jrose/metropolis/Metropolis-Prop...


OpenJDK's AOT compiler, when you also allow it to include the JIT and still recompile hotspots, is about 5% slower than just using the JIT. The only reasons to use it are startup times, ease of distribution (no JVM to ship, sort of), and for use in places that don't allow JITs (iOS).


OpenJDK's AOT is also on its early days.

Have you ever profiled Excelsior JET, Aonix, Jamaica, IBM Websphere Real Time for example?

Their products are almost as old as Java, so I imagine they are worth their price.


I believe those products prohibit doing public benchmarking so even if I had I couldn't tell you the results. :)


However if they were really bad, they would have already disbanded the teams, because companies would just use the gratis JDK with pure JIT.

Regarding OpenJDK, according to the Java Languages Summit talk, the Java 10 dev branch already has better optimizations than version 9.


But the exact same is valid for Go...



No, but there is a LLVM implementation on top of Java/JVM. :)

https://github.com/graalvm/sulong

Which is much better, because we need more language research with type safe languages, not with C and C++.


I had a similar project idea for ages. Instead of LLVM it would static recompile into C macros or C++ inline function calls for each opcode. Then it would compile the output and run the game.

Emulation is just plain fun.



The compiler tends to treat static variables very differently from locals. I was holding out for efficient local functions which can be forced inlined to keep the emulated registers as actual system registers. Sadly std::function doesn't play well with force inline on all platforms.


lol, it just came to me that clang is an abbreviation for C language. I always imagined it to be the sound metal makes when you slam it together, because bare metal programming etc.


Though for what it's worth, the official pronunciation is the klang metal sound:

http://lists.llvm.org/pipermail/llvm-dev/2008-July/015629.ht...


Could be a pun.


I came upon this realization very recently too.


Clang is a compiler, part of the LLVM suite ;)


The clang llvm frontend is for C, C++ and Objective-C.


Dupe: submitted 1560 days ago - https://news.ycombinator.com/item?id=5838326


The polite convention on HN is to say this was discussed previously, and link to it, especially when it was last submitted and discussed over four years ago :-)


Please note that duplicate submission are not discouraged on HN in general, especially not after such a long time.

However, thanks for sharing the link to the previous discussion!


Although clicking the |past| link takes you to this page: https://hn.algolia.com/?query=Statically%20Recompiling%20NES...


Reposts are ok after about a year: https://news.ycombinator.com/newsfaq.html.


I'm always confused about "Guidelines" and "FAQ". It seems that whenever I have a question about HN, I need to consult both, because it is not clear at all which site covers which question.

Maybe "Guidelines" and "FAQ" should simply be merged?

Edit: Opened an "Ask HN" submission about this: https://news.ycombinator.com/item?id=15255270


I think Meta Ask HNs are also discouraged :) you're supposed to send them an email.


I understand what you're saying. There's the Show HN guidelines as well. One could imagine them all as three sections on one page. The guidelines are for what you should do; the FAQ for general information. Then again, they're short. It's not a lot of effort to check both.


This should have a 2013 added to the tittle.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: