
That's always an issue, but in my testing with emulators, once you rule out external inputs (e.g. keyboard input), execution tends to become predictable and reproducible. Using a VM should help dramatically in reducing randomness. You keep capturing states and narrowing the window until you can reproduce the bug in, say, ten seconds, since too much can happen in the span of ten minutes. The trick is that you have to capture a state right before the problem actually begins, but you don't yet know where that is. So it's possible your state capture may be a bit too late, after the issue has already done its damage (e.g. corrupted memory somewhere).

I know that there are certainly bugs that this kind of technique would never work on. I've hit a few bugs that could only be triggered on live hardware before. But I'm curious if they've tried this kind of approach for this bug yet or not.




Could you post more about your technique? Like, what emulator are you using (qemu)? How do you trigger the snapshots? Are you snapshotting memory or disk or both? How much disk space is consumed by all these snapshots? Do you discard snapshots? How do you know when the bug has been triggered?


Sure, it's my own software. I hook up saving and loading states to key presses, so e.g. F5 would save and F7 would load. F6/F8 would then increment or decrement the save slot number.

You basically have to capture everything possible: all memory, the state of all the CPU registers, the state of all hardware registers, etc. Obviously disk would be a real challenge, where you'd have to keep a delta list of disk changes since program start, or simply not serialize that state. If you miss anything, you can have problems loading states correctly. However, there is quite a bit of tolerance between theory and practice, so if there is something that you really can't capture the state of for some reason (like a hardware write-only register when you weren't logging what was previously written to it), a lot of times you can get away with it anyway.
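As a rough illustration of "capture everything", here is a minimal sketch of a save state for a made-up machine. The `Machine` class, its register counts, and the binary layout are all assumptions for the example, not the commenter's actual emulator; a real system would have far more hardware state to serialize.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical toy system: RAM, a few CPU registers, a few hardware registers."""
    ram: bytearray = field(default_factory=lambda: bytearray(1024))
    cpu_regs: list = field(default_factory=lambda: [0] * 4)  # e.g. A, X, Y, PC
    hw_regs: list = field(default_factory=lambda: [0] * 2)   # e.g. timer, video mode


# Little-endian header: 4 CPU registers, 2 hardware registers (16-bit each),
# followed by the RAM length as a 32-bit integer.
HEADER = struct.Struct("<4H2HI")


def save_state(m: Machine) -> bytes:
    """Serialize every piece of machine state into one blob."""
    return HEADER.pack(*m.cpu_regs, *m.hw_regs, len(m.ram)) + bytes(m.ram)


def load_state(m: Machine, blob: bytes) -> None:
    """Restore the machine from a blob produced by save_state."""
    *regs, ram_len = HEADER.unpack_from(blob)
    m.cpu_regs = list(regs[:4])
    m.hw_regs = list(regs[4:])
    m.ram = bytearray(blob[HEADER.size:HEADER.size + ram_len])
```

Anything left out of `save_state` (a write-only register that was never logged, say) simply keeps whatever value it had at load time, which is the "you can often get away with it" case described above.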

Because the system I am using is so old, snapshots are only 300KB each. Sometimes I dump them to disk, sometimes I just keep them in RAM. I know that a PC would be much more challenging, given how much more hardware is at play, and because VMs aren't quite the same as pure software emulation like qemu (though you could potentially use qemu for this too), but VM software does implement this snapshot system, so clearly it's possible.

You know when the bug is triggered through the visual output. And what's cool is that by saving periodic snapshots automatically to a ring buffer, you can code a special keypress to "rewind" the program. So it crashes, then you go back a bit, and save a disk snapshot there, and wait to see if the bug repeats. If it does, then you turn on trace logging and dump all of the CPU instructions between your save point and the crash. Then you go to the crash point, and slowly work your way back to try and find out where things went wrong.
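The rewind idea can be sketched with a bounded ring buffer of periodic snapshots. This is only a toy under stated assumptions: `ToyMachine`, `RewindBuffer`, the capture interval, and the trace format are hypothetical names and choices for illustration, and the machine is deterministic by construction (no external input).

```python
import copy
from collections import deque


class ToyMachine:
    """Hypothetical stand-in for an emulated system; all state lives in one dict."""

    def __init__(self):
        self.state = {"pc": 0, "acc": 0, "ram": [0] * 16}

    def step(self, trace=None):
        s = self.state
        if trace is not None:
            # Trace logging: record the program counter and accumulator
            # as they are before this instruction executes.
            trace.append((s["pc"], s["acc"]))
        # Deterministic: the next state depends only on the current state.
        s["acc"] = (s["acc"] + s["pc"]) & 0xFF
        s["ram"][s["pc"] % 16] = s["acc"]
        s["pc"] += 1

    def save_state(self):
        return copy.deepcopy(self.state)

    def load_state(self, snap):
        self.state = copy.deepcopy(snap)


class RewindBuffer:
    """Keeps only the most recent `capacity` snapshots."""

    def __init__(self, capacity, interval):
        self.snaps = deque(maxlen=capacity)  # oldest entries fall off automatically
        self.interval = interval

    def maybe_capture(self, machine):
        # Capture a snapshot every `interval` steps.
        if machine.state["pc"] % self.interval == 0:
            self.snaps.append(machine.save_state())

    def rewind(self, machine):
        """Load the newest snapshot and drop it; call again to go further back."""
        if not self.snaps:
            return False
        machine.load_state(self.snaps.pop())
        return True
```

In use, you would call `maybe_capture` before every `step`, call `rewind` once the bug shows up, and then re-run from there with a `trace` list passed to `step` to collect the instructions leading up to the failure.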


Byuu is the author of the excellent SNES (Super Nintendo) emulator bsnes. I would guess he is referring to emulation errors in emulators, where snapshot support is common, and usually of good quality.



