I love reading these kinds of reverse engineering discovery tales. I find them technically educational and enlightening overall. They help put my own struggles debugging thorny things into sharp perspective, and make me even more appreciative of the level of control I have over my own code and its runtime environments.
The level of effort put into reverse engineering for emulating classic hardware of all kinds is just remarkable. The Dolphin EMU team has a long history of these kinds of blog posts, for people looking for more of this kind of writing: https://dolphin-emu.org/blog/
I've always had the attitude that if I can reproduce a bug consistently, I can debug it, and I can fix it. Now here's a nail in the coffin of that theory.
Only if the problem is purely software. I've had reproducible problems that were impossible to solve by myself. The process went like this:
- Huh, getting USB disconnects randomly, need to fix this
- Hmm, device driver seems to be fine
- Hmm usbnet driver seems fine
- Hmm xhci driver seems fine
- $5000 and one USB 3.0 protocol analyzer later...
- What the heck is this host controller doing?
At this point it was looking like a hardware problem, but since I didn't have the schematics or Verilog of the host controller I could go no further (nor did I want to... I never imagined USB could be such a deep rabbit hole).
In most of the bugs I've dealt with, the problem came down to something simple that can be explained quickly (e.g. a syntactically valid typo, a config error, a simple logic error).
Seems like "Holy grail" bugs like this happen to need a thorough understanding of complex systems (which I usually like to hide under layers of abstraction).
In defense of the coffin, I'd say this is not so much about a bug, but about reverse engineering real world effects of undefined behavior (as caused by a bug in software).
That is not the goal of mGBA (or any GBA emulator for that matter), btw. If you wanted to reproduce it faithfully, the emulator would be unusably slow.
You are trying to argue about the meaning of the words being used. This is not productive, as what matters is what the author of the argument meant when they used that word.
To be honest, I think it's about crossing the lines of emulation and simulation. In emulation, we mostly consider instruction level behavior as good enough, but here, we'll have to venture below cycle level and integrated hardware effects to copy the behavior.
These test ROMs are interesting. Since they are designed to run on real hardware that lacks debugging capabilities, the user must be able to operate the test suite and check the results manually. They aren't like modern software test suites.
According to the nesdev wiki, automation consists of playing back controller input, waiting for the tests to run, taking a screen shot of the results screen, hashing it and comparing the result to the hash of the expected results screen. This method also allows testing real games to check for reproducible bugs.
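The steps described above can be sketched roughly as follows. This is only an illustration: FakeEmulator is a made-up stand-in for whatever scripting interface a real emulator exposes, and the names press, run_frames, screenshot, result_hash and run_test_rom are all assumptions, not any emulator's actual API.

```python
import hashlib


class FakeEmulator:
    """Hypothetical stand-in for a real emulator's scripting interface."""

    def __init__(self, rom):
        self.rom = rom
        self.frame = 0
        self.pressed = []

    def press(self, buttons):
        # Record which buttons were pressed on which frame.
        self.pressed.append((self.frame, buttons))

    def run_frames(self, count):
        self.frame += count

    def screenshot(self):
        # A real emulator would return framebuffer pixels. This fake returns
        # bytes that depend only on the ROM and the inputs, so runs are
        # deterministic.
        return hashlib.sha256(self.rom + repr(self.pressed).encode()).digest()


def result_hash(emulator, input_log, settle_frames):
    """Play back (frame, buttons) pairs, wait, then hash the results screen.

    input_log must be sorted by frame number.
    """
    for frame, buttons in input_log:
        emulator.run_frames(frame - emulator.frame)
        emulator.press(buttons)
    emulator.run_frames(settle_frames)  # give the test ROM time to finish
    return hashlib.sha256(emulator.screenshot()).hexdigest()


def run_test_rom(emulator, input_log, settle_frames, expected_hash):
    """Return True if the results screen matches the known-good hash."""
    return result_hash(emulator, input_log, settle_frames) == expected_hash
```

The expected hash would come from a run that someone checked by eye once. After that, the comparison is cheap enough to run on every build.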
Definitely just working by chance in this case. This comes up a lot, unfortunately. Small budgets, tight deadlines, games not validated for correctness under software emulation for detecting these types of edge cases prior to shipping.
Another example: there are several GBA games that dereference null pointers, but it works because there is no memory protection fault hardware in the GBA. Since the BIOS is mapped to address 0 and yet locked out from reading post-boot (in a failed anti-copying mechanism), you have to emulate the open bus behavior to run these games correctly as a result.
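To make the BIOS lockout concrete, here is a deliberately simplified toy model. It assumes that a read of the BIOS region from code running outside the BIOS returns the last word that was legitimately read from the BIOS. Real hardware behavior depends on prefetch and access width, and the ToyBus class and its read32 method are hypothetical names for illustration only.

```python
BIOS_SIZE = 0x4000  # the GBA BIOS occupies addresses 0x0000-0x3FFF


class ToyBus:
    """Toy model of the BIOS read lockout; not a full GBA memory map."""

    def __init__(self, bios):
        assert len(bios) == BIOS_SIZE
        self.bios = bios
        self.last_bios_word = 0  # value left over from the last allowed read

    def _bios_word(self, addr):
        aligned = addr & ~3  # force 32-bit alignment
        return int.from_bytes(self.bios[aligned:aligned + 4], "little")

    def read32(self, addr, pc):
        if addr < BIOS_SIZE:
            if pc < BIOS_SIZE:
                # Code running inside the BIOS may read it normally.
                self.last_bios_word = self._bios_word(addr)
            # Code outside the BIOS is locked out and sees the stale value.
            return self.last_bios_word
        raise NotImplementedError("only the BIOS region is modelled here")
```

A game that dereferences a null pointer reads address 0 with its program counter in cartridge ROM, so it gets the stale word rather than the first BIOS instruction. An emulator that returns the actual BIOS contents (or zero) there can send the game down a different code path.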
There are also several anti-emulator routines in the GBA library, such as lying about the type of save memory (flash, EEPROM, or SRAM) that a given game actually has (by including strings in the ROM from the Nintendo SDK used to access them.)
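For context, emulators commonly guess the save type by scanning the ROM for the SDK's library strings, which is exactly what this trick exploits. A minimal sketch, assuming the usual marker prefixes (EEPROM_V, SRAM_V, FLASH_V, FLASH512_V, FLASH1M_V); the function name find_save_markers is made up for this example.

```python
# Marker prefixes left in the ROM by the SDK's save libraries (assumed list).
SAVE_TYPE_MARKERS = {
    b"EEPROM_V": "EEPROM",
    b"SRAM_V": "SRAM",
    b"FLASH_V": "Flash",
    b"FLASH512_V": "Flash (512 Kbit)",
    b"FLASH1M_V": "Flash (1 Mbit)",
}


def find_save_markers(rom):
    """Return the sorted names of every save type whose marker is in the ROM."""
    return sorted(
        {name for marker, name in SAVE_TYPE_MARKERS.items() if marker in rom}
    )
```

If this returns more than one name, or the wrong one, the heuristic has been fooled, which is why emulators tend to fall back on per-game override lists.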
Still, in the case of Hello Kitty, there's got to be a reason the developer added that loop at startup, right? They might have not understood the intricacies of how or why it worked, but presumably they were trying to do something.
> Then I noticed a difference between my test ROM and Pokémon Emerald when they hang. There was music playing in Pokémon. There was also music playing in Sonic Pinball Party. There wasn’t music playing in Hello Kitty, but this gave me an idea.
Maybe there was supposed to be music playing, this was for timing it, and it was removed, broken, or just forgotten about?
For the Game Boy Advance, yes. There was both a secretive internal Nintendo GBA emulation kit, and the public emulator development community had devkit games emulated before the system even officially launched.
I understand it was less practical to do this for say, the NES or SNES. But it should have been done for the GBA. It's not just important for future hardware to be able to easily emulate GBA games, it also protects against product refreshes breaking games, something that plagued quite a few classic Game Boy games as newer models came out and fixed bugs in the older models.
Interesting story. Goes to show you the fallacy of writing fault tolerant systems as opposed to writing noisy systems.
Had the GBA architects made the device halt on invalid memory rather than return the last thing on the bus and hope everything would be ok this entire class of bug wouldn't exist and the authors of that code likely would've found that bug during production.
> Had the GBA architects made the device halt on invalid memory rather than return the last thing on the bus and hope everything would be ok this entire class of bug wouldn't exist and the authors of that code likely would've found that bug during production.
Of course, would it matter if these bugs were found? The games ran fine on real hardware.
There’s no maintenance burden either, because updating a GBA game is literally impossible.
Not quite. The Pokémon Ruby/Sapphire Berry Glitch bug was fixed by connecting to games that had a patch program, such as Pokémon Emerald, the 3rd in the Ruby/Sapphire/Emerald series.
Huh, cool. Does anyone know how this worked? I know the carts themselves are mostly write-once so the bug and patch must have been related to the save portion or some hardware specific to this cartridge, right?
> because updating a gba game is literally impossible
Hey now, I'm sure it's not "impossible". You could probably overwrite specific sectors with a laser or whatever. It probably had more to do with distribution, where the economic and environmental cost wouldn't make sense.
Are they "bugs" from the point of view of the GBA designers or the game programmers? I was under the impression that the game programmers purposely read invalid memory this way in order to prevent emulation.
Correct. When you're coding really close to the metal you discover (and perhaps depend upon) all kinds of shenanigans that a higher level language compiler might work around.
The graphics on the Atari 2600 had to be fed line-by-line during the vertical blanking interval. That made for some really tricky assembly coding, trying to get enough done to make the interval without wasting VERY limited ROM resources (aka you can't cheat with NOP instructions).
I'm assuming you meant "horizontal blanking interval" instead of "vertical blanking interval". But I don't think that's entirely correct either. Some games will change the registers while the beam is actively drawing so that, e.g., the left half of the line will be drawn with one set of settings and the right half drawn with another.
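A toy illustration of that mid-line trick, assuming a single color register and 160 visible pixels per line. It ignores real TIA timing entirely, and render_scanline is a made-up name for this sketch.

```python
VISIBLE_PIXELS = 160  # visible color clocks per Atari 2600 scanline


def render_scanline(initial_color, writes):
    """Draw one line; each (pixel_x, new_color) write lands as the beam reaches it."""
    pending = dict(writes)
    color = initial_color
    line = []
    for x in range(VISIBLE_PIXELS):
        if x in pending:
            color = pending[x]  # register changed while the beam is drawing
        line.append(color)
    return line
```

Calling render_scanline(0x1E, [(80, 0x44)]) gives a line whose left half uses one color and whose right half uses another, which is the effect described above.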