Nestur: NES Emulator in Rust (github.com/spieglt)
153 points by tosh on Jan 6, 2020 | 52 comments



> (Warning: this pipe currently takes you to an empty room, it's not the only one, and I don't know why.)

If you look into speedruns you'll see that pipe behavior is very strange. One of the top speedruns involves pushing Mario into blocks in order to move the camera by something like 10 px, so that when you go down a certain pipe it goes to the warp zone instead of the bonus area.

Might be worth watching some videos about that speedrun and trying to trace what's going on in the code to allow that. It might get you closer to understanding your own pipe issue.

Unfortunately I don't have a link offhand, but search something like `perfect mario speedrun` and you'll probably find it.


You are thinking of Wrong Warping. Super Mario Bros. has, in essence, only one idea for warping out of the level: a table of destinations based on the screen's X location. When you pass an entry, the next entry becomes hot. If an exit is triggered (i.e., entering a pipe, climbing a vine, etc.), the game will go to the hot entry's starting screen.

Normally, Mario is locked to a maximum screen offset to the right (120 pixels, if I recall), and the table is written with the expectation that Mario will be in bounds when entering the warp; however, certain things (bumping into objects, etc.) can cause Mario to gain a few pixels past the limit. The warp will then go to the current hot entry, which is the entry before the normal one.

The notorious use is 4-2. You want to use the vine to warp to level 8, but the vine has a huge animation time. There is a warp pipe farther past. It turns out that it's faster to time a series of bumps to get yourself several pixels past the screen limit and use the pipe. The game uses the vine's warp from the table, and now you can continue on without having gone through the vine animation.
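A toy model of the lookup described above may make the wrong-warp mechanism easier to see. This is a sketch under stated assumptions, not SMB's actual data layout: `WarpEntry`, `hot_destination`, and the thresholds in the usage note are all hypothetical, and the table is assumed to be sorted by camera position.

```rust
/// One entry in a hypothetical warp table: once the camera's left edge
/// reaches `screen_x`, this entry becomes the "hot" one.
#[derive(Debug, Clone, Copy, PartialEq)]
struct WarpEntry {
    screen_x: u16,
    destination: &'static str,
}

/// Returns the destination of the last entry the camera has passed.
/// Any exit (pipe, vine, ...) uses this entry, no matter which object
/// Mario actually touched. Assumes `table` is sorted by `screen_x`.
fn hot_destination(table: &[WarpEntry], camera_x: u16) -> Option<&'static str> {
    table
        .iter()
        .take_while(|entry| entry.screen_x <= camera_x)
        .last()
        .map(|entry| entry.destination)
}
```

With hypothetical entries at $0400 (vine) and $0B00 (pipe), a camera at $0B10 makes the pipe's own entry hot. If Mario has been pushed a few pixels past the scroll limit, the camera might still be at, say, $0AF8 when he enters the pipe, and the lookup returns the vine's destination instead.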

So, the emulator may not be representing the screen's offset correctly.


Thank you, that's exactly what I was referring to and an excellent explanation.

> So, the emulator may not representing the screen's offset correctly.

Yeah, that or the table's not being filled correctly, or it uses a load instruction that does pointer arithmetic, or something. That's my underinformed theory at least. Probably worth checking out.


Trying to divine the eccentricities of the NES would be a nightmare. In order to emulate the hardware properly, there are several bugs that need to be reproduced, such as the OAMADDR bug. Most small projects like this aren't striving for cycle- and bug-perfect emulation, just a proof of concept. If you want state-of-the-art emulation accuracy, look at Mesen [1] or BizHawk [2]. Unfortunately, they both incorporate so many other features that reading the "core" emulation bits is difficult. So for casual reference, projects like this one are great.

[1] https://github.com/sourmesen/mesen

[2] https://github.com/TASVideos/BizHawk


I've not done NES specifically, but I've written some emulators. As the pack-in game and a cultural landmark, I'd probably be targeting SMB 1 accuracy first and foremost. It doesn't seem to do anything crazy like vertical scrolling, so my ignorant assumption would be that it'd be a good game to start with.

> there are several bugs that need to be reproduced such as the oamaddr bug for example

Given what I've heard about how pipes work, that sounds like a good place to start digging in. Do you have any specific references for that? I'd much rather hear first-hand experience than Google results, since so much of early console hardware documentation is flat-out wrong.


Unfortunately I do not; I've toyed around with small CPU architecture emulation and a touch of Game Boy. The bug I was referring to is also called the sprite overflow bug.

http://wiki.nesdev.com/w/index.php/PPU_sprite_evaluation

Edit: There's also a bit of documentation in one of the bizhawk nes cores: https://github.com/TASVideos/BizHawk/tree/master/BizHawk.Emu...


I'll bet you a nickel something screwy is going on with the JMP Indirect logic. If it helps, Super Mario Bros. for the NES has been disassembled with wonderful comments and labels:

https://gist.github.com/1wErt3r/4048722

At some point during every frame, shortly after the NMI routine completes, the game ends up on this particular line of code:

https://gist.github.com/1wErt3r/4048722#file-smbdis-asm-L929

This calculates an address based on the current game mode, and then jumps there. When the player performs any action that changes the level, this mode cycles a few times while the game code blanks the screen, switches gears, and loads in new data. The indirect jump and branching logic that drive the game's mode switching are tricky to get right, and if they're wrong, this little kernel will break in surprising ways.
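For anyone checking their own `JMP ($xxxx)` handling: on the NMOS 6502 core in the NES, the indirect jump never carries into the high byte of the pointer, so a pointer at $10FF takes its high byte from $1000 rather than $1100. A minimal sketch follows; `Memory` and `jmp_indirect_target` are hypothetical names, and placing the pointer at $0006 in the test is only for illustration.

```rust
/// Flat 64 KiB memory, just enough for this sketch.
struct Memory {
    bytes: Vec<u8>,
}

impl Memory {
    fn new() -> Self {
        Memory { bytes: vec![0; 0x10000] }
    }

    fn read(&self, addr: u16) -> u8 {
        self.bytes[addr as usize]
    }
}

/// Target of `JMP ($pointer)` on an NMOS 6502: the low byte comes from
/// `pointer`, the high byte from `pointer + 1`, except that the increment
/// never carries into the page, so `JMP ($10FF)` reads $10FF and $1000.
fn jmp_indirect_target(mem: &Memory, pointer: u16) -> u16 {
    let lo = mem.read(pointer) as u16;
    // Increment only the low byte of the pointer; keep its page fixed.
    let hi_addr = (pointer & 0xFF00) | (pointer.wrapping_add(1) & 0x00FF);
    let hi = mem.read(hi_addr) as u16;
    (hi << 8) | lo
}
```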

When I was troubleshooting my branching routines, I had my emulator pause on SMB's first indirect jump, then found that position in the disassembled code and followed along. I mean, you can achieve similar results by unit testing or working to pass all of Blargg's tests, but I found it much more fun to trace the emulator in a live environment, running "production" code.


> I'll bet you a nickel something screwy is going on with the JMP Indirect logic.

You've lost a nickel! It was a problem with my signed-offset branch function where a bad cast was allowing 0x80 (-128) to overflow. It was really fun trying to troubleshoot by hooking certain memory accesses and subroutines, but what wound up fixing it was investigating the crash of Blargg's `instr_test-v5/rom_singles/11-stack.nes`. Thank you for your help!
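One way to avoid this class of bug is to let Rust do the sign extension instead of hand-rolling it. A sketch, assuming the caller passes the address of the instruction after the two-byte branch (which is what the 6502 adds the offset to); `branch_target` and `crosses_page` are hypothetical helper names, not Nestur's actual fix.

```rust
/// Destination of a 6502 relative branch. `pc_after_operand` is the address
/// of the next instruction; `offset` is the raw operand byte.
fn branch_target(pc_after_operand: u16, offset: u8) -> u16 {
    // Reinterpret as signed (0x80 => -128, 0xFF => -1), sign-extend to 16 bits,
    // then add with 16-bit wraparound so no value can overflow.
    let signed = offset as i8 as i16;
    pc_after_operand.wrapping_add(signed as u16)
}

/// A taken branch costs one extra cycle, plus one more if the target lies
/// on a different 256-byte page than the next instruction.
fn crosses_page(pc_after_operand: u16, target: u16) -> bool {
    (pc_after_operand & 0xFF00) != (target & 0xFF00)
}
```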


> I'll bet you a nickel something screwy is going on with the JMP Indirect logic

I think we're on the same page, except that you sound like you actually know what you're talking about and I'm just following my gut, also known as `guessing`.

Thanks for the link, that's a great reference!


You are thinking of 4-2. This is the 1-2 pipe.

Here is the video: https://youtu.be/i1AHCaokqhg


Oh I recognize where it is, I've wasted away plenty of time in Mario games over the years.

I brought it up because I'm told that the general idea applies to all pipes. They take you to the secret area for that level, and if there are multiple pipes that lead to different areas, then that secret area has to be switched out. That seems relevant if the secret area in this case is an empty space: perhaps that memory hasn't been initialized, or there's a specific load instruction that behaves differently in the emulator than on the hardware. (And for all I know, the emulator could be following the spec correctly, but the hardware worked differently.)

Either way, thank you for the video link.


The key word here is "subpixels."


I understood subpixels to be something different.

The core of what I wanted to point out is the behavior of warp pipes in SMB1, which was outlined quite well here: https://news.ycombinator.com/item?id=21976073

Specifically the part where there's a section in memory somewhere that contains the details of where to warp to, and that varies on Mario's x coordinate.

If a warp pipe in this emulator drops you into an empty level, that sounds to me like that bit of memory isn't getting filled, or there's a jump or load instruction that's supposed to do arithmetic, read args from a register, etc., and it's not implemented correctly.


Yeah, we're saying two different parts of the related thing. Changing Mario's x coordinate is called manipulating his subpixels, which is necessary to manipulate where the warp pipe sends you.


Ah, I see. When I hear subpixel I think rendering, didn't realize that it was also used for that.


Hi Steve, thanks for all you do for Rust!


pcwalton wrote a NES emulator in Rust 7 years ago as a demonstration of Rust: https://github.com/pcwalton/sprocketnes

Also, a nitpick:

> One line of unsafe (std::mem::transmute::<u8>() -> i8)

You don't need unsafe for this, you can do that with an `as` cast.
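To illustrate the nitpick: none of these conversions needs `unsafe`. `as` reinterprets the bits (a no-op for same-size integers), `i8::from_ne_bytes` does the same thing explicitly, and `i8::try_from` is the checked alternative for when values above 127 should be treated as errors. The function names are just for illustration.

```rust
use std::convert::TryFrom;

/// Bit-for-bit reinterpretation with `as`: 0x80 becomes -128, 0xFF becomes -1.
fn via_as(byte: u8) -> i8 {
    byte as i8
}

/// The same reinterpretation, spelled out through the byte representation.
fn via_bytes(byte: u8) -> i8 {
    i8::from_ne_bytes([byte])
}

/// A checked conversion instead: values above 127 are rejected, not wrapped.
fn checked(byte: u8) -> Option<i8> {
    i8::try_from(byte).ok()
}
```

For a branch offset the wrapping behaviour of `as` is exactly what's wanted; `try_from` is only the right tool when an out-of-range byte means something went wrong.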


Yeah, I'm going to fix that. I thought I'd tried `u8 as i8` and it hadn't worked, but I guess not.


Possibly clippy complained about it and you mistook it for a compile error?


[deleted]



Looks like your hobby might be opening Rust code and searching for unsafe just for the thrills :)

Edit: Jesus, kids, learn a little chill; it was a joke. Someone disliked this so much that they went through and downvoted every other post of mine they could. Wow... that’s real serious.


It was mentioned in the README, and stood out to me as surprising (surely you don't need unsafe to cast from unsigned to signed bytes?)


Without knowing enough about Rust to speak authoritatively... u8 can represent larger integers than i8 (no sign bit). Surely it's not completely safe.


`mem::transmute` is roughly equivalent to a `reinterpret_cast` in C++. It treats the bits of a u8 as an i8.

In the Rust definition of safety (mutable xor shared, no data races, memory safety, etc.), treating the bits of a u8 as an i8 is safe and can be done with an `as` cast.


Thanks for clarifying that for me!


Hmmm. I was assuming this was just changing interpretations of the same underlying 8 bits. No?

Edit: yeah, “Casting between two integers of the same size (e.g. i32 -> u32) is a no-op” (https://doc.rust-lang.org/reference/expressions/operator-exp...)


Yup, fair enough.


It's a quote from TFA


I recently wrote a NES emulator in Rust as well. It was one of the most fulfilling side projects I've done. I did a brief write-up too: https://ltriant.github.io/2019/11/22/nes-emulator.html

Kudos on getting to this point. Adding support for more mappers is a nice incremental thing that can be done at your leisure, and once you get save states implemented, playing games at your own pace is super fun :)


Thanks! Mapper 1 has since been finished, along with battery-backed RAM support, and I'll be implementing mappers 2-4 in the next couple of days. I just read your blog post and the audio section caught my eye: I went through the exact same struggles trying to queue 44,100 Hz audio to SDL, including trying to implement dynamic sampling, but it was never quite right and I got quite frustrated with it. So eventually I took another pass at using the callback method instead. I'd thought that wasn't possible, since you can't exfiltrate a reference to the struct you give the audio device, but this time I got it to work with an Arc<Mutex<Vec<u8>>>, and it's wayyy better.

The APU sends all of its samples to a staging buffer, then between video frames, the Mutex for the SDL buffer is locked and the staging buffer is emptied into it. The SDL callback also locks the Mutex 60 times per second, takes evenly spaced samples from the raw data, and truncates what it consumed if it had more than it needed. Now the audio keeps perfect pace with the emulator, whether it slows to 58 FPS or goes too fast, and there's no popping and clicking. If you're curious, here are the relevant spots in the code now:

https://github.com/spieglt/nestur/blob/master/src/audio.rs https://github.com/spieglt/nestur/blob/e49921541d493b3616352...

I just beat the first Zelda dungeon last night, saving the file afterwards, and it was indeed really satisfying.


    ADC = A + V + C
    SBC = A - V - (1 - C)
    SBC = A + (-1 * (V - (1 - C)))
    SBC = A + (-V + (1 - C))
    SBC = A + -V + 1 + C

I am not able to follow the mathemagic behind the third line. Is it normal algebra, or is it in one's complement?


I think it looks odd, too. If you go with this:

    ADC = A + V + C
    SBC = A - V - (1 - C)
    SBC = A + (-1 * (V - (C - 1)))
    SBC = A + (-V + (C - 1))
    SBC = A + -(V + 1) + C

"The SBC instruction subtracts a value from the accumulator register, with an extra 1 subtracted from the result if the carry flag is NOT set."

The explanation fits though. If the carry flag is not set (C == 0) then you subtract an extra 1. If C == 1 then it's A-V.


Wow, yeah, that's a bit off. And incorrect. It should end up being equivalent to:

SBC = A - V - 1 + C

Thanks. I'll fix this up :)


> but the - 1 is taken care of because of the fact that ADC and SBC do the opposite thing with the carry flag.

Also, this line is bothersome: they do opposite things, but that's why you already have (1 - C).

Maybe the correct explanation is that -V is actually (!V + 1) in two's complement, and that + 1 eats up the extra - 1?

So,

    SBC = A - V - 1 + C
    SBC = A + !V + 1 - 1 + C
    SBC = A + !V + C
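The identity A - V - 1 + C = A + !V + C is why SBC can be implemented by feeding the inverted operand into the ADC path. A sketch, assuming binary mode only (the NES's 2A03 has no decimal mode). To avoid a clash with the overflow flag, the operand called V in this thread is named `value` here; `adc`, `sbc`, and `AluResult` are hypothetical names.

```rust
/// Accumulator and flags produced by an 8-bit add or subtract.
#[derive(Debug, PartialEq)]
struct AluResult {
    a: u8,
    carry: bool,
    overflow: bool,
}

/// ADC in binary mode: A + value + C.
fn adc(a: u8, value: u8, carry_in: bool) -> AluResult {
    let sum = a as u16 + value as u16 + carry_in as u16;
    let result = sum as u8;
    // Signed overflow: both inputs have the same sign and the result's sign differs.
    let overflow = (!(a ^ value) & (a ^ result) & 0x80) != 0;
    AluResult { a: result, carry: sum > 0xFF, overflow }
}

/// SBC = A - value - (1 - C) = A + !value + C, so it reuses ADC with the
/// operand inverted. Afterwards, carry set means "no borrow occurred".
fn sbc(a: u8, value: u8, carry_in: bool) -> AluResult {
    adc(a, !value, carry_in)
}
```

As a sanity check, `sbc(0x05, 0x03, true)` leaves A = $02 with carry set, and clearing the carry beforehand gives $01, matching A - V - 1 + C.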


Ah yeah you’re right. I had a brain fart and got my -V and !V confused.

At the time of writing I was sure I had it right too!

I’ll fix that up. Thanks :)


That’s an excellent write up and some well written code! I might use this as a reference if I ever try to write an emulator in Rust :)


I noticed that both this and pcwalton's CPU emulation simply perform the operation, and increment the cycle count.

I'm curious: are the cycle-accurate reads and writes (and false reads and writes -- see https://github.com/zellyn/a2audit/issues/5) not necessary for NES emulation?

I've never emulated a NES, only an Apple II. The cycle-accurate reads and writes are only mostly unnecessary, but they can make a difference, as the discussion in that linked issue shows.


Yes, it can matter, though you can get far without it. NESticle was plenty popular back in the day without anything resembling proper cycle accuracy.

Note that my NES emulator doesn't even get the total number of cycles right, and as a result Super Mario Bros. 3 crashes. (It really isn't a good emulator!)


Sort of related, there's a great cycle-accurate (well, that's the goal I think) emulator project for the 6502 recently discussed here:

https://floooh.github.io/2019/12/13/cycle-stepped-6502.html


Interesting. I see they unrolled their switch statement to include a cycle count.

I just called a "tick" function at the end of each cycle, which is definitely slower: https://github.com/zellyn/go6502/blob/master/cpu/opcodeinstr...

I didn't get to the point of caring about interrupt correctness, but I did port the gate-level 6502 emulation to Go, and ran Klaus Dormann's fairly comprehensive test suite on both in parallel, checking that all memory reads and writes matched.


Performing the instruction is a 90% solution. You do a 90% solution for the CPU, a 90% solution for the PPU, a 90% solution for the sound, etc., and implement the dozen or so most common bank-switching chips. Then 75% of games are playable and 40% of all games are bug-free. Maybe less.

Then you start working on those last 10%s, which include cycle-accurate emulation. As it happens, there's a lot of work that goes into the final 10% of the various NES subsystems. Far, far more work than the first 90%.

As for what percentage of games need cycle accurate 6502 emulation I couldn't tell you.


There are a few games that rely on one specific false read:

> They rely on the dummy read for the sta $4000,X instruction to acknowledge pending APU IRQs.

https://wiki.nesdev.com/w/index.php/Tricky-to-emulate_games
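To make the dummy read concrete, here is a sketch of `STA absolute,X` (opcode $9D) executed one bus access per cycle. It assumes a hypothetical `Bus` that only logs accesses over flat RAM; on real hardware a read of $4015 is routed to the APU and clears the frame interrupt flag, which is the side effect those games depend on. The X = $15 in the test is an assumption chosen so that `sta $4000,X` targets $4015.

```rust
/// One bus access, recorded so the per-cycle behaviour is visible.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Access {
    Read(u16),
    Write(u16, u8),
}

/// Hypothetical bus: flat 64 KiB RAM plus a log of every access.
struct Bus {
    mem: Vec<u8>,
    log: Vec<Access>,
}

impl Bus {
    fn new() -> Self {
        Bus { mem: vec![0; 0x10000], log: Vec::new() }
    }
    fn read(&mut self, addr: u16) -> u8 {
        self.log.push(Access::Read(addr));
        self.mem[addr as usize]
    }
    fn write(&mut self, addr: u16, value: u8) {
        self.log.push(Access::Write(addr, value));
        self.mem[addr as usize] = value;
    }
}

/// Executes STA absolute,X starting at `pc` (the opcode byte); returns the next PC.
fn sta_abs_x(bus: &mut Bus, pc: u16, a: u8, x: u8) -> u16 {
    let _opcode = bus.read(pc); // cycle 1: fetch opcode
    let lo = bus.read(pc.wrapping_add(1)); // cycle 2: address low byte
    let hi = bus.read(pc.wrapping_add(2)); // cycle 3: high byte; X is added to the low byte
    let unfixed = u16::from_be_bytes([hi, lo.wrapping_add(x)]);
    let _ = bus.read(unfixed); // cycle 4: dummy read before the high byte is fixed
    let effective = u16::from_be_bytes([hi, lo]).wrapping_add(x as u16);
    bus.write(effective, a); // cycle 5: the real write
    pc.wrapping_add(3)
}
```

The fourth cycle always reads from the not-yet-fixed address, even when no page is crossed; with operand $40F0 and X = $20 it would read $4010 and then write $4110.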

MMC1 ignores consecutive writes, and one game relies on the initially written false value:

> Bill & Ted's Excellent Adventure resets the MMC1 by doing INC on a ROM location containing $FF; the MMC1 sees the $FF written back and ignores the $00 written on the next cycle.

https://wiki.nesdev.com/w/index.php/MMC1
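A partial sketch of the MMC1 behaviour quoted above, covering only the serial port: a write with bit 7 set resets the shift register, five writes load a register, and a write on the cycle right after the previous one is ignored. `Mmc1` and its fields are hypothetical, the CPU cycle counter is assumed to come from the caller, and the power-on value of `control` is an assumption.

```rust
/// Partial, hypothetical MMC1 model: only the serial port is covered.
struct Mmc1 {
    shift: u8,                     // bits received so far, least significant first
    count: u8,                     // how many bits have been shifted in
    last_write_cycle: Option<u64>, // CPU cycle of the previous serial-port write
    control: u8,                   // control register ($8000-$9FFF)
    loads: Vec<(u8, u8)>,          // (register number, 5-bit value) per completed load
}

impl Mmc1 {
    fn new() -> Self {
        // A power-on control value of $0C is an assumption for this sketch.
        Mmc1 { shift: 0, count: 0, last_write_cycle: None, control: 0x0C, loads: Vec::new() }
    }

    /// Handles a CPU write to $8000-$FFFF on CPU cycle `cycle`.
    fn write(&mut self, cycle: u64, addr: u16, value: u8) {
        let consecutive = self.last_write_cycle.map_or(false, |prev| prev + 1 == cycle);
        self.last_write_cycle = Some(cycle);
        if consecutive {
            return; // back-to-back writes: only the first one counts
        }
        if value & 0x80 != 0 {
            // Reset: clear the shift register and force PRG mode 3.
            self.shift = 0;
            self.count = 0;
            self.control |= 0x0C;
            return;
        }
        self.shift |= (value & 1) << self.count;
        self.count += 1;
        if self.count == 5 {
            // Address bits 13-14 pick the register: 0 control, 1 CHR0, 2 CHR1, 3 PRG.
            let register = ((addr >> 13) & 0b11) as u8;
            if register == 0 {
                self.control = self.shift;
            }
            self.loads.push((register, self.shift));
            self.shift = 0;
            self.count = 0;
        }
    }
}
```

In the test, `INC` writes the old $FF back on one cycle and the incremented $00 on the next, mirroring how read-modify-write instructions write twice; only the reset survives.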


It's equivalent to Apple II. You can get pretty far with the naive solution of atomically completing the instruction, and incrementing the cycle count appropriately, but there'll be some issues until you switch to a cycle accurate state machine with all the warts (false accesses, etc.)

NES actually gets a bit grosser because the PPU runs (IIRC) five cycles for every three CPU cycles or something like that, and you need to handle that case. I've seen some emulators that just run it off of a 15x clock, and sparsely run cycles on the CPU and PPU (but most of the master cycles don't do any work).


I think the NES PPU:CPU clocks are 3:1 for NTSC and 16:5 for PAL. The exact speeds are integer ratios of the color subcarrier frequency. The master clock runs at 6x the color subcarrier frequency, which gives you 12 different hues in the color palette.
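The master-clock approach mentioned above can be sketched with two dividers. The commonly documented values are 12 and 4 master ticks per CPU cycle and PPU dot on NTSC, and 16 and 5 on PAL, which give the 3:1 and 16:5 ratios; `Region` and `run_master_clock` are hypothetical names, and the counters stand in for actual CPU and PPU steps.

```rust
#[derive(Clone, Copy)]
enum Region {
    Ntsc,
    Pal,
}

impl Region {
    /// (master ticks per CPU cycle, master ticks per PPU dot).
    fn dividers(self) -> (u32, u32) {
        match self {
            Region::Ntsc => (12, 4), // 12 / 4 = 3 PPU dots per CPU cycle
            Region::Pal => (16, 5),  // 16 / 5 = 3.2 PPU dots per CPU cycle
        }
    }
}

/// Advances `master_ticks` master-clock ticks and returns how many CPU cycles
/// and PPU dots were run. Most ticks do neither.
fn run_master_clock(region: Region, master_ticks: u32) -> (u32, u32) {
    let (cpu_div, ppu_div) = region.dividers();
    let mut cpu_cycles = 0;
    let mut ppu_dots = 0;
    for tick in 0..master_ticks {
        if tick % cpu_div == 0 {
            cpu_cycles += 1; // a real emulator would step the CPU here
        }
        if tick % ppu_div == 0 {
            ppu_dots += 1; // ...and the PPU here
        }
    }
    (cpu_cycles, ppu_dots)
}
```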


Oh, yep, that's the ratio.


literally fizzbuzz


Sometimes, especially when dealing with interactions between chips, even sub-cycle behavior needs to be emulated for correctness. I suspect for basic emulation though, rigorous cycle accuracy is not really necessary for most software...


I wonder if the name "Nestur" is a callback to "Nester" from the "Howard & Nester" comics in Nintendo Power: https://hn.iodized.net/main.htm



Ahh... a neat coincidence anyway. :)

Thanks for letting me know!


I see an SDL DLL file in the repo, and on my Mac the build fails with `library not found for -lSDL2`. Is there something special I need to do to get this to build, or is it Windows-only?


Ah, `brew install sdl2` fixed it




