> Actually no, for complicated reasons. The Game Boy chip has a handful of logic gates that operate independently of the master clock and whose exact behavior depends on things like gate delay. These gates create glitches that depend heavily on the physical placement of the gates, the silicon process used to make them, and other weird things like temperature and voltage.
This is fascinating, I previously would have assumed that a gate-level simulation would necessarily be perfect by definition.
And incredibly annoying as the developer - so much work translating and debugging gates, and it's still not "correct" for some definition of correct ;)
Reminds me of hearing about how Crash Bandicoot was developed. Besides the game being something of a masterpiece in itself, the team successfully debugged an issue that caused the memory card to get corrupted when the controller was touched in a certain way. I can't remember the precise reason, but they eventually discovered that running one of the chips at a higher frequency introduced subtle electrical interference with nearby components on the board. This was after much debugging of a supposed software issue in the memory card logic. I often think of that when I have a tricky bug to sort out.
If you want a great and long read, Andy Gavin has a pretty extensive series about Crash Bandicoot on his blog, some entries more technical than others. I've never even played the games for more than a few minutes, but I had a great time reading through them.
Don't be too annoyed. There are significant differences across models and even batches within the same generation of system. There is no completely correct definition of a system.
Some of these differences are CPU observable, so I made this ROM to identify different Game Boy models and revisions:
https://github.com/mattcurrie/which.gb
So is this saying that a possible modern day commercial for a Game Boy might feature: "The artisanal properties of the Nintendo Game Boy make each and every one a unique playing experience!"?
I guess it's impossible to ever have it correct. The gates are just another level of abstraction. You could also simulate the silicon atoms and the electrons flowing and use that to have transistors and then gates. And so on.
This is also the reason why it is so hard to protect against side channel leakage. During my PhD I worked on a solution that would be theoretically perfect (and is formally proved so), but is not in practice because of exactly that.
If you simulate at cycle accuracy you don't need to simulate each gate. Think of the pins on your CPU versus the billions of gates behind them. If you simulated your CPU, you'd only care what's at the pins, not at the gates.
This assumes the gate-level implementation doesn't have any side effects that depend on more than the instantaneous state at the pins. That seems unlikely, since that's a broader class of behavior than just the side effects mentioned in the quote above.
As I understand it asynchronous logic stopped being a common thing decades ago, so those sorts of glitches aren't possible (or less possible?) now. The hard stuff in simulation now is probably handling multiple clock domains.
Still out there, but only used in special situations. The ARM microcontroller I'm working with has an asynchronous prescaler on the real-time clock crystal input before it gets fed into a second synchronous prescaler which you can then read the output of directly.
This is all about reducing power consumption down in the nA range when the CPU is turned off but the RTC keeps ticking over.
Sort of? It's almost more equivalent to networking or distributed processing - circuits in different clock domains can't just send a wire to another domain, they have to go through a synchronizer and do some handshaking and other stuff that's vaguely similar to RPCs. I'm stretching here, it's slightly beyond what I've worked with so far.
Seems like the same as in multithreaded programming, in which you can’t let threads share memory without synchronizing else you get data races and corruption.
it's more like two simple computers talking over serial, and the serial connection bitrate can't be faster than the clock of the slowest of the two computers.
Sort of. You have to be careful when signals cross clock domains because they arrive asynchronously relative to the receiving clock. So any time you go from one domain to another, you have to synchronize the data, which is most often done with a chain of flip-flops.
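A minimal sketch of that flip-flop synchronizer idea in C++ terms (not from GateBoy, just an illustration): the crossing signal passes through two registers clocked by the destination clock, so a first stage that goes metastable has a full cycle to settle before anything downstream looks at it.

struct Synchronizer {
  bool stage1 = false;  // first flop: may go metastable in real hardware
  bool stage2 = false;  // second flop: the value the destination logic actually uses

  // Call once per destination-domain clock edge.
  void tick(bool async_in) {
    stage2 = stage1;    // shift the previously sampled value to the output
    stage1 = async_in;  // sample the asynchronous input
  }
};

Multi-bit buses need more than this (handshakes or Gray-coded pointers), which is the RPC-like part mentioned above.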
There are examples on the github page, literally right below that sentence - have you stopped reading just before that point??
"For example, there's a glitch in the external address bus logic that causes internal bus addresses like 0xFF20 to appear on the external bus even though the logic should prevent that. Due to gate delays, not all of the inputs to gate LOXO (page 8 in Furrtek's schematics) arrive at the same time. This causes LOXO to produce a glitch pulse that in turn causes latch ALOR to make a copy of one bit of the internal bus address. ALOR then drives that bit onto the external bus (through a few more gates) where it can be seen with an oscilloscope or logic analyzer."
The creator—Austin Appleby—also created MurmurHash, which is very useful for data structures like Bloom filters. The canonical MurmurHash implementation was released with a handy non-cryptographic hashing test suite.
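As a rough illustration of why a fast non-cryptographic hash matters there, here's a minimal Bloom filter sketch in C++; std::hash stands in for MurmurHash, and the sizes are arbitrary:

#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

// Every insert and lookup needs several hash probes, so hash throughput
// dominates Bloom filter performance; that's where a fast hash like MurmurHash helps.
struct BloomFilter {
  static constexpr std::size_t kBits = 1 << 16;  // filter size in bits (arbitrary)
  static constexpr std::size_t kProbes = 4;      // probes per item (arbitrary)
  std::bitset<kBits> bits;

  void add(const std::string& item) {
    std::size_t h1 = std::hash<std::string>{}(item);
    std::size_t h2 = (h1 >> 17) | 1;             // derived second hash, forced odd
    for (std::size_t i = 0; i < kProbes; ++i)    // standard double-hashing trick
      bits.set((h1 + i * h2) % kBits);
  }

  bool maybe_contains(const std::string& item) const {
    std::size_t h1 = std::hash<std::string>{}(item);
    std::size_t h2 = (h1 >> 17) | 1;
    for (std::size_t i = 0; i < kProbes; ++i)
      if (!bits.test((h1 + i * h2) % kBits))
        return false;                            // definitely not present
    return true;                                 // possibly present (false positives allowed)
  }
};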
What a marvelous project! The amount of effort the author put into this project makes my jaw drop.
This project reminds me of a question I've had in my mind, though. If this world is a simulation, would it be an accurate simulation of the elementary particles and their interactions (like this project), or an approximated one (like normal Game Boy emulators)?
FurrTek deserves as much credit as I do for doing the initial exceptionally painstaking step of marking out the die traces and cells on the die shot. Not all of his guesses were correct, but they were enough to get things started.
I suspect that this world is a high-level simulation until you pull out a microscope, at which point it switches to low-level mode. ;)
I read somewhere (I'll dig for the article) that one of the conclusions of Quantum mechanics was that "things don't exist unless you look at them". And the claim was that this actually happened on the macro level.
> GateBoy, LogicBoy, and MetroBoy exist to give me a starting point for working on Metron, which is my long-term project to build a set of programming tools that can bridge between the C/C++ universe used by software and the Verilog/VHDL universe used by hardware.
I'd love more details about this. What does it mean to bridge C/C++ and Verilog/VHDL?
SystemVerilog has enough C++-like _stuff_ in it that you can write a surprising amount of code that looks almost like C++ - you can call methods on objects, pass interfaces around, lots of other things that you might not immediately recognize as a hardware language.
That said, SystemVerilog does not _run_ like C++. Conceptually every function and method is running simultaneously all the time, with clock edges keeping things synchronized.
But... You can, with an extreme amount of care, write C++ code that can be almost-trivially translated line by line into SystemVerilog and can produce the same results when executed - provided you follow a whole lot of strict rules that you have to manually enforce somehow on the C++ side.
GateBoy enforces those rules at runtime in debug builds (see Gates.h), and I have written a very quick and dirty proof-of-concept LLVM plugin that can enforce those rules at compile time. I've also written another LLVM tool that can do the automatic translation from C++ to SystemVerilog, and I regression-tested it on a chunk of code from MetroBoy to prove that everything works as claimed. I was able to take the original C++, translate it to SystemVerilog, translate that _back_ to C++ using Verilator, run both C++ versions in lockstep, and verify that every register bit at every cycle was identical.
Eventually I'll get the LLVM tools to the point where they can validate and translate all of GateBoy's codebase, and then those tools will be released in my "Metron" repository.
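To make the "translatable C++" idea concrete, here's a hypothetical sketch of the style that kind of rule set enforces; the names are invented for illustration and are not GateBoy's actual Gates.h API:

#include <cstdint>

// One struct per module; fields become registers; tick() computes next state
// purely from the old state and then commits it, mirroring an always_ff block.
struct Counter {
  uint8_t count = 0;  // would become an 8-bit register in the generated RTL

  void tick(bool reset, bool enable) {
    uint8_t next = count;                          // read old state first...
    if (reset)       next = 0;
    else if (enable) next = static_cast<uint8_t>(count + 1);
    count = next;                                  // ...then commit, like a clocked register
  }
};

Something in this restricted shape maps more or less one-for-one onto a SystemVerilog module with a single always_ff block, which is roughly what makes line-by-line translation feasible.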
Easier than VHDL, harder than C. Chisel's "adder" example on the Wikipedia page is:
class Add extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(8.W))
    val b = Input(UInt(8.W))
    val y = Output(UInt(8.W))
  })
  io.y := io.a + io.b
}
whereas the same in C would be
uint8_t Add(uint8_t a, uint8_t b) {
  return a + b;
}
and while that is not directly usable in SystemVerilog, it can be mechanically translated to
function byte add(byte a, byte b);
  return a + b;
endfunction
There's way more to it than that, but I strongly believe there are better ways of writing hardware than the existing HDLs.
Honestly, from the Chisel and C examples I can't really see much of a difference. Not sure why corporate is making me choose.
That io bundle in your first example is defined in place and then used. If that was previously defined (which would be a shitty Wikipedia example) it would look a lot like your C example.
Would this ever make it into the MiSTer FPGA project as a core? There is currently a GB core included, but I'm not sure whether the VHDL is a 1:1 mapping to the actual HW gates themselves.
The cores they have are more than good enough. I do intend to get GateBoy to where it can be automatically translated to Verilog though, at which point running on a FPGA should be more straightforward.
You either sort by cone, getting great gate-value reuse density, but lose gobs of parallelism; or, you sort by gate-type & eat-shit on gather-scatter. It’s a devil’s bargain, either way. It’s why the majors sell $$$…$$$ custom emulation equipment.
Won't happen though, way too niche market - outside of a single-digit number of hobbyists, the only ones who have an actual use for such equipment are companies for whom the expense is a rounding error on the balance sheet.
I've heard family talk about chip design and Verilog. This is the first time I feel like I've had a glimpse into what they're talking about. Great write-up.
They added/changed/removed instructions to fit the system more closely, e.g. IN/OUT is now done through MMIO in the FFxx area and thus there are dedicated instructions to access that address range.
The Game Boy CPU is significantly less capable than a full Z80. It's missing key features like the IX and IY registers (and thus their addressing modes), most of the 16-bit ALU, the second set of "shadow" registers intended for interrupt handling (instead, almost all Game Boy interrupt routines start by pushing all the current registers onto the stack), and a bunch of other things I forget.
There's also some stuff it adds, such as the swap instruction (which swaps the upper and lower nibbles of a byte) and the stop instruction (which puts the Game Boy into a low-power state until it is woken by a keypress).
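For reference, a one-liner showing what that swap instruction computes on a byte:

#include <cstdint>

// The Game Boy's SWAP instruction exchanges the upper and lower 4-bit nibbles
// of a byte, e.g. 0xAB becomes 0xBA.
uint8_t swap_nibbles(uint8_t x) {
  return static_cast<uint8_t>((x << 4) | (x >> 4));
}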