> Actually no, for complicated reasons. The Game Boy chip has a handful of logic gates that operate independently of the master clock and whose exact behavior depends on things like gate delay. These gates create glitches that depend heavily on the physical placement of the gates, the silicon process used to make them, and other weird things like temperature and voltage.
This is fascinating, I previously would have assumed that a gate-level simulation would necessarily be perfect by definition.
And incredibly annoying as the developer - so much work translating and debugging gates, and it's still not "correct" for some definition of correct ;)
Reminds me of hearing about how Crash Bandicoot was developed. Besides the game being something of a masterpiece in itself, the team successfully debugged an issue that caused the memory card to get corrupted when the controller was touched in a certain way. I can't remember the precise reason, but they eventually discovered that running one of the chips at a higher frequency introduced subtle electrical interference with nearby components on the board. This was after much debugging of a supposed software issue in the memory card logic. I often think of that when I have a tricky bug to sort out.
If you want a great and long read, Andy Gavin has a pretty extensive series about Crash Bandicoot on his blog, some entries more technical than others. I've never even played the games for more than a few minutes, but I had a great time reading through them.
Don't be too annoyed. There are significant differences across models and even batches within the same generation of system. There is no completely correct definition of a system.
Some of these differences are CPU observable, so I made this ROM to identify different Game Boy models and revisions:
https://github.com/mattcurrie/which.gb
So is this saying that a possible modern day commercial for a Game Boy might feature: "The artisanal properties of the Nintendo Game Boy make each and every one a unique playing experience!"?
I guess it's impossible to ever have it correct. The gates are just another level of abstraction. You could also simulate the silicon atoms and the electrons flowing and use that to have transistors and then gates. And so on.
This is also the reason why it is so hard to protect against side channel leakage. During my PhD I worked on a solution that would be theoretically perfect (and is formally proved so), but is not in practice because of exactly that.
If you simulate at cycle accuracy you don't need to simulate each gate. Think of the pins on your CPU versus the billions of gates behind them. If you simulated your CPU, you'd only care what's at the pins, not at the gates.
This assumes the gate-level implementation doesn't have any side effects that depend on more than the instantaneous state at the pins. That seems unlikely, since that's a broader class of behavior than just the side effects mentioned in the quote above.
As I understand it asynchronous logic stopped being a common thing decades ago, so those sorts of glitches aren't possible (or less possible?) now. The hard stuff in simulation now is probably handling multiple clock domains.
Still out there, but only used in special situations. The ARM microcontroller I'm working with has an asynchronous prescaler on the real-time clock crystal input before it gets fed into a second synchronous prescaler which you can then read the output of directly.
This is all about reducing power consumption down in the nA range when the CPU is turned off but the RTC keeps ticking over.
Sort of? It's almost more equivalent to networking or distributed processing - circuits in different clock domains can't just send a wire to another domain, they have to go through a synchronizer and do some handshaking and other stuff that's vaguely similar to RPCs. I'm stretching here, it's slightly beyond what I've worked with so far.
Seems like the same as in multithreaded programming, in which you can’t let threads share memory without synchronizing else you get data races and corruption.
it's more like two simple computers talking over serial, and the serial connection bitrate can't be faster than the clock of the slowest of the two computers.
Sort of. You have to be careful when signals cross clock domains because they arrive asynchronously relative to the receiving clock. So any time you go from one domain to another, you have to synchronize the data, which is most often done with a chain of flip-flops.
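A minimal sketch of that flip-flop synchronizer idea in C++ terms (not from GateBoy, just an illustration): the crossing signal passes through two registers clocked by the destination clock, so a first stage that goes metastable has a full cycle to settle before anything downstream looks at it.

struct Synchronizer {
  bool stage1 = false;  // first flop: may go metastable in real hardware
  bool stage2 = false;  // second flop: the value the destination logic actually uses

  // Call once per destination-domain clock edge.
  void tick(bool async_in) {
    stage2 = stage1;    // shift the previously sampled value to the output
    stage1 = async_in;  // sample the asynchronous input
  }
};

Multi-bit buses need more than this (handshakes or Gray-coded pointers), which is the RPC-like part mentioned above.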
There are examples on the github page, literally right below that sentence - have you stopped reading just before that point??
"For example, there's a glitch in the external address bus logic that causes internal bus addresses like 0xFF20 to appear on the external bus even though the logic should prevent that. Due to gate delays, not all of the inputs to gate LOXO (page 8 in Furrtek's schematics) arrive at the same time. This causes LOXO to produce a glitch pulse that in turn causes latch ALOR to make a copy of one bit of the internal bus address. ALOR then drives that bit onto the external bus (through a few more gates) where it can be seen with an oscilloscope or logic analyzer."
The creator—Austin Appleby—also created MurmurHash, which is very useful for data structures like Bloom filters. The canonical MurmurHash implementation was released with a handy non-cryptographic hashing test suite.
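As a rough illustration of why a fast non-cryptographic hash matters there, here's a minimal Bloom filter sketch in C++; std::hash stands in for MurmurHash, and the sizes are arbitrary:

#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

// Every insert and lookup needs several hash probes, so hash throughput
// dominates Bloom filter performance; that's where a fast hash like MurmurHash helps.
struct BloomFilter {
  static constexpr std::size_t kBits = 1 << 16;  // filter size in bits (arbitrary)
  static constexpr std::size_t kProbes = 4;      // probes per item (arbitrary)
  std::bitset<kBits> bits;

  void add(const std::string& item) {
    std::size_t h1 = std::hash<std::string>{}(item);
    std::size_t h2 = (h1 >> 17) | 1;             // derived second hash, forced odd
    for (std::size_t i = 0; i < kProbes; ++i)    // standard double-hashing trick
      bits.set((h1 + i * h2) % kBits);
  }

  bool maybe_contains(const std::string& item) const {
    std::size_t h1 = std::hash<std::string>{}(item);
    std::size_t h2 = (h1 >> 17) | 1;
    for (std::size_t i = 0; i < kProbes; ++i)
      if (!bits.test((h1 + i * h2) % kBits))
        return false;                            // definitely not present
    return true;                                 // possibly present (false positives allowed)
  }
};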
What a marvelous project! The amount of effort the author put into this project makes my jaw drop.
This project reminds me of a question I've had in my mind, though. If this world is a simulation, would it be an accurate simulation of the elementary particles and their interactions (like this project), or an approximated one (like normal Game Boy emulators)?
FurrTek deserves as much credit as I do for doing the initial exceptionally painstaking step of marking out the die traces and cells on the die shot. Not all of his guesses were correct, but they were enough to get things started.
I suspect that this world is a high-level simulation until you pull out a microscope, at which point it switches to low-level mode. ;)
I read somewhere (I'll dig for the article) that one of the conclusions of Quantum mechanics was that "things don't exist unless you look at them". And the claim was that this actually happened on the macro level.
> GateBoy, LogicBoy, and MetroBoy exist to give me a starting point for working on Metron, which is my long-term project to build a set of programming tools that can bridge between the C/C++ universe used by software and the Verilog/VHDL universe used by hardware.
I'd love more details about this. What does it mean to bridge C/C++ and Verilog/VHDL?
SystemVerilog has enough C++-like _stuff_ in it that you can write a surprising amount of code that looks almost like C++ - you can call methods on objects, pass interfaces around, lots of other things that you might not immediately recognize as a hardware language.
That said, SystemVerilog does not _run_ like C++. Conceptually every function and method is running simultaneously all the time, with clock edges keeping things synchronized.
But... You can, with an extreme amount of care, write C++ code that can be almost-trivially translated line by line into SystemVerilog and can produce the same results when executed - provided you follow a whole lot of strict rules that you have to manually enforce somehow on the C++ side.
GateBoy enforces those rules at runtime in debug builds (see Gates.h), and I have written a very quick and dirty proof-of-concept LLVM plugin that can enforce those rules at compile time. I've also written another LLVM tool that can do the automatic translation from C++ to SystemVerilog, and I regression-tested it on a chunk of code from MetroBoy to prove that everything works as claimed. I was able to take the original C++, translate it to SystemVerilog, translate that _back_ to C++ using Verilator, run both C++ versions in lockstep, and verify that every register bit at every cycle was identical.
Eventually I'll get the LLVM tools to the point where they can validate and translate all of GateBoy's codebase, and then those tools will be released in my "Metron" repository.
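To make the "translatable C++" idea concrete, here's a hypothetical sketch of the style that kind of rule set enforces; the names are invented for illustration and are not GateBoy's actual Gates.h API:

#include <cstdint>

// One struct per module; fields become registers; tick() computes next state
// purely from the old state and then commits it, mirroring an always_ff block.
struct Counter {
  uint8_t count = 0;  // would become an 8-bit register in the generated RTL

  void tick(bool reset, bool enable) {
    uint8_t next = count;                          // read old state first...
    if (reset)       next = 0;
    else if (enable) next = static_cast<uint8_t>(count + 1);
    count = next;                                  // ...then commit, like a clocked register
  }
};

Something in this restricted shape maps more or less one-for-one onto a SystemVerilog module with a single always_ff block, which is roughly what makes line-by-line translation feasible.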
Easier than VHDL, harder than C. Chisel's "adder" example on the Wikipedia page is:
class Add extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(8.W))
    val b = Input(UInt(8.W))
    val y = Output(UInt(8.W))
  })
  io.y := io.a + io.b
}
whereas the same in C would be
uint8_t Add(uint8_t a, uint8_t b) {
  return a + b;
}
and while that is not directly usable in SystemVerilog, it can be mechanically translated to
function byte add(byte a, byte b);
  return a + b;
endfunction
There's way more to it than that, but I strongly believe there are better ways of writing hardware than the existing HDLs.
Honestly, from the Chisel and C examples I can't really see much of a difference. Not sure why corporate is making me choose.
That io bundle in your first example is defined in place and then used. If that was previously defined (which would be a shitty Wikipedia example) it would look a lot like your C example.
Would this ever make it into the MiSTer FPGA project as a core? There is currently a GB core included, but I'm not sure whether the VHDL is a 1:1 mapping to the actual HW gates themselves.
The cores they have are more than good enough. I do intend to get GateBoy to where it can be automatically translated to Verilog though, at which point running on a FPGA should be more straightforward.
You either sort by cone, getting great gate-value reuse density, but lose gobs of parallelism; or, you sort by gate-type & eat-shit on gather-scatter. It’s a devil’s bargain, either way. It’s why the majors sell $$$…$$$ custom emulation equipment.
Won't happen though, way too niche market - outside of a single-digit number of hobbyists, the only ones who have an actual use for such equipment are companies for whom the expense is a rounding error on the balance sheet.
I've heard family talk about chip design and Verilog. This is the first time I feel like I've had a glimpse into what they're talking about. Great write-up.
They added/changed/removed instructions to fit the system more closely, e.g. IN/OUT is now done through MMIO in the FFxx area and thus there are dedicated instructions to access that address range.
The Game Boy CPU is significantly less capable than a full Z80. It's missing key features like the IX and IY registers (and thus their addressing modes), most of the 16-bit ALU, the second set of "shadow" registers intended for interrupt handling (instead, almost all Game Boy interrupt routines start by pushing all the current registers onto the stack), and a bunch of other things I forget.
There's also some stuff it adds, such as the swap instruction (which swaps the upper and lower nibbles of a byte) and the stop instruction (which puts the Game Boy into a low-power state until it is woken by a keypress).
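For reference, a one-liner showing what that swap instruction computes on a byte:

#include <cstdint>

// The Game Boy's SWAP instruction exchanges the upper and lower 4-bit nibbles
// of a byte, e.g. 0xAB becomes 0xBA.
uint8_t swap_nibbles(uint8_t x) {
  return static_cast<uint8_t>((x << 4) | (x >> 4));
}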