Accuracy takes power: one man’s 3GHz quest to build a perfect SNES emulator

T-hawk · on May 17, 2012

Or consider Air Strike Patrol, where a shadow is drawn under your aircraft. This is done using mid-scanline raster effects, which are extraordinarily resource intensive to emulate.

So what's really going on here is that the emulator must emulate not only the SNES hardware, but also the television. Video game emulators have had to deal with this for a long time, to varying and increasing levels of accuracy. Televisions (especially analog CRTs) have quite a bit of emergent behavior in processing the display input that is not easily captured and replicated by your typical frame buffer. Interlacing is a major such phenomenon; most emulators still simply treat the 60 fields per second as 60 distinct frames rather than interlacing them. (And younger players are used to seeing the games that way, never having played on original console and TV hardware.)

The ultimate example of this effect occurs in emulating games that originally used a vector CRT. An emulator writing to a raster frame buffer simply can't replicate the bright, sharp display of a real Asteroids or Star Castle or Battlezone machine.

TV behavior even goes beyond electronics. Consider the characteristics of the phosphor coating and the persistence time between refreshes. Some games made use of effects where that characteristic mattered, so if you want to emulate that with high fidelity, yes that will take a lot of CPU cycles.

aardvark179 · on May 17, 2012

That particular problem isn't a case of emulating the television, but rather accurately emulating the console's video hardware and its interactions with the rest of the system. If one were simply interested in emulating the television's behaviour then you could construct a frame buffer based on the visible sprites and postprocess that (possibly in conjunction with several preceding fields).

If the console allowed sneaky things to be done on each raster line (like changing the colours) then constructing that frame buffer becomes considerably more resource intensive, as it must now probably be done line by line with the correct timing wrt. the rest of the emulation.

If you could pull tricks mid scanline (presumably through careful timing after an interrupt) then the problem will be a whole lot worse, though I'd guess it can be reduced by recording changes to the relevant hardware registers along with timestamps in the emulation so that the timing of your scanlines' construction becomes less of an issue.

T-hawk · on May 17, 2012

You're correct, this particular problem can be handled with sufficiently sophisticated frame buffer logic. I was generalizing from that to other concepts where emulating the television or its signal processing would be required.

I'll give you another example. On the Atari 2600 game console, the vertical sync is software controlled. The software is responsible for enabling the vertical sync pulse. This can be done 60 times per second as standard -- or you could play tricks with it. Suppose you strobe it at a different or even irregular rate. On an analog TV, the picture starts rolling vertically. That breaks way outside the sandbox of a framebuffer, with signal being displayed in overscan areas, and during the normally-blank retrace interval resulting in ghosting effects. (No commercial game did that, but it's been done in tech demos, and conceivably a horror game could do it intentionally for mood.) To produce that same behavior on framebuffer-based hardware, you need to emulate or at least approximate the workings of a TV's vertical sync logic, none of which appears in the console itself.

(I know this from experience, I wrote an Atari 2600 game: http://www.dos486.com/atari/ )

Keyframe · on May 17, 2012

I recently bought 'Racing the beam' book which goes into Atari 2600 VCS details and programming - http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&... I recommend this book to everyone interested in this topic.

near · on May 17, 2012

> I'd guess it can be reduced by recording changes to the relevant hardware registers along with timestamps in the emulation so that the timing of your scanlines' construction becomes less of an issue.

This would be possible in most cases, but the SNES throws another problem at you: the video renderer can set flags that can affect the operation of the CPU. Range/tile over sprite flags, H/Vblank signals, etc.

In my model, I chose to forgo timestamps because they are very tricky to get right with subtle details. Instead, I render one pixel at a time, but I use a cooperative threading model. Whenever the CPU reads something from the PPU, it checks to see if the CPU is currently not caught up with the PPU. If so, it will switch and run the PPU. The PPU does the same with respect to the CPU.

Even with that, all the extra overhead of being -able- to process one pixel at a time knocks the framerate from ~240fps to ~100fps. And it fixes maybe a half-dozen minor issues in games for all that overhead.

This is because scanline-based renderers are notoriously good at working around these issues. There are lots of games that do a mid-scanline write in error, but only a few that do more than one on the same scanline. So all you have to do is make sure you render your line on the correct 'side' of the write. We actually took every game we could find with this issue, and averaged out the best possible position within a line to run the highest number of games correctly. Other emulators take that further and can make changes to that timing on a per-game basis to fix even more issues.

Air Strike Patrol's shadow is actually the only known effect where two writes occur on the same line, and there is no one point that would render the line correctly.

chocolatebunny · on May 17, 2012

Is anyone else reminded of the "copper" effect people would do back in the day where they would cycle the colors of a screen in sync with the horizontal refresh of the monitor to create bars of color the oscillate up and down in really cool patterns?

T-hawk · on May 17, 2012

Indeed. I did copper too, in DOS x86 assembly. Some programs used it to practical effect: you can exceed 256 colors in an 8-bit framebuffer by swapping palette values mid-screen or mid-scanline.

In fact, every Atari 2600 game is a copper effect. The 2600's graphics chip is one-dimensional, working with only one scanline at a time. To display a picture, the software must run in lockstep as the electron beam traces down the screen, changing sprite bitmaps and colors and positions each scanline as appropriate. In other words, the 2600 literally uses the phosphor on the physical TV screen as the frame buffer. No surprise that this was tricky to emulate, and why 2600 emulators took longer to reach usable compatibility levels than emulators for the later more powerful Nintendo systems.

dspillett · on May 18, 2012

> Indeed. I did copper too, in DOS x86 assembly. Some programs used it to practical effect: you can exceed 256 colors in an 8-bit framebuffer by swapping palette values mid-screen or mid-scanline.

This was used on the BBC Master enhanced version of Elite (and some other games or the era) to get a best-of-both-worlds choice of the Beeb's display modes. The bottom third of the screen was in mode 2 (low res, 4 bit colour depth (well, 3 bit plus flash-or-not)) to get the higher colour variation for the control displays and the top two thirds were in mode 1 (twice the resolution but only 2-bit colour depth) to get the higher resolution for the wireframe graphics.

tripzilch · on May 17, 2012

I made copper bars once, just for having done them. I guess by the time I got into democoding (96/97 or so), they didn't impress as much anymore. It was cool that they were technically full-colour in a 256c screenmode, but apart from that they were just horizontal coloured bars to me :)

However there was another very useful trick to changing colours wrt sync. Basically you wanted to have all the gfx drawing done before the vertical retrace (which is quite a bit longer than the horizontal one), then flip the buffer (during) so you'd get a flickerless display at full framerate. Now if you'd change palette colour 0 (background, including screen edges) to red right after the flip, and then back to black again after your drawing routines are done and you begin waiting for the vsync again, you got to see the top of your screen's background red, up until some percentage of the screen height.

This was basically your performance meter. Code more complex routines and the red area becomes bigger. Add even more calculations, it gets to the bottom of the screen, and when it gets too far you won't be done calculating before the next vsync and your framerate drops to half.

Some times I even micro-optimized bits of assembly code by marking the position with pencil on a post-it on the side of the monitor to see if switching around some instructions would make it go up or down a few millimeters :) It really was that stable (given you did exactly the same calculations every frame--which is often the case for demos, but probably not for games). That is, until Windows came along: multitasking meant you were going to miss that vsync every once in a while and the red bar jumping up and down like crazy.

hackermom · on May 18, 2012

The proper term for this effect is "raster bars", from how you traditionally did the effect by waiting for the raster line register to hit a certain vertical position on the screen, and then changing f.e. the background color of that scanline, then wait for the next line, change the color again etc. The name "copper bars" came out of the Amiga scene from how you could easily and without involving the CPU do this effect (and much more) on the Amiga using one of its co-processors, nicknamed the "Copper".

sp332 · on May 17, 2012

Lots of conversation from when this was posted 9 months ago: https://news.ycombinator.com/item?id=2864531

gouranga · on May 17, 2012

Perhaps we should start using CardBus and PCI-E based FPGA cards and go for hardware simulation rather than software emulation?

Performance is far easier to achieve there.

The devices aren't exactly expensive either.

near · on May 17, 2012

Yes, since the primary problem is trying to simulate dozens of unique processes that all run in parallel with a single CPU thread, modern processors are woefully inadequate for the task. Emulation is possible by way of sheer brute force, with processors being thousands of times faster than the original systems.

Multi-core seems promising, but unfortunately even 4-8 threads aren't going to cut it here. Each emulated chip can have several logic units (eg a four-stage pipeline, an ALU, a DMA unit, etc.) And even then, CPUs aren't meant for this level of synchronization. You can only lock and unlock a mutex between two threads at about 100,000 times a second. And even if that were faster, what's going to be more a burden? Requiring a 3GHz single core CPU, or a 1GHz octa core CPU?

FPGAs are great for writing emulators (although I wouldn't say as easy), but the problem with this is even worse than the octa core CPU. Until more people have the hardware than a 3GHz single core CPU, it will continue to be a worse solution for the number of people your software can reach.

Karunamon · on May 17, 2012

Byuu himself! :D

It's great to see you on here. I've read a lot of the writeups on your site and they're very, very fascinating stuff. BSNES/Accuracy has become my favorite emulator as well, when I can spare the clock cycles.

So thanks for being awesome, and doubly thanks for your attention to detail when nobody else seems to think its important.

evan_ · on May 17, 2012

if you're going to add hardware you might as well just get a real SNES from eBay and use a memory cart.

Wilduck · on May 17, 2012

The issue is that buying an SNES becomes decreasingly viable as time goes on while using an FGPA becomes increasingly viable. I don't know if we've reached the point where the balance shifts yet, but presumably we will at some point.

gouranga · on May 17, 2012

Also, the medium on which the software is delivered is not the problem either. It's just information after all. The problem is accurately emulating the software in software which is easier achieved by emulating the harware in hardware :)

factorial · on May 17, 2012

The SNES is now close to 25 years old, and the older the hardware gets, the more serious the issue of "bit rot" will become. Among NES collectors, it's not rare to find people lamenting not being able to play their original cartridges anymore. Those cartridges don't last forever. Thus, emulation serves an important archival purpose, and without it, those games may be lost forever.

mturmon · on May 17, 2012

Nice perspective.

Conservators in museums face the same issues with video art and installation art. The video equipment becomes obsolete and breaks down. The light bulbs are no longer made (an art-prep person for a recent MOCA retrospective had to drive a van from LA to Arizona to lay hands on the last available batch of a certain type of florescent tube). With some minimalist installations, just changing the bulbs can make a huge difference.

Here's a little piece on conservation challenges with a piece by Nam June Paik, who pioneered wall-size video installations:

http://www.digitalartconservation.org/index.php/en/exhibitio...

"In the case of Internet Dream, the splitting system was the Achilles heel of the installation. The video splitter used since 1994 was produced by the South Korean manufacturer DASH. Since the manufacturer helped Paik with the technical realization of many of his works (including Megatron/Matrix in 1995), it is likely that this device was specially constructed for the installation. By 2008, the device’s shutdown function had become problematic, probably a sign of more serious loss of function to come."

You see the same dichotomy between fine-art creators versus conservators, as you see between video game artists and emulator designers: "whatever works in the moment" freedom versus obsessive attention to detail.

rikthevik · on May 17, 2012

Aren't there a bunch of knock-off systems coming out of Asia these days? I thought I saw a box that could be play NES, SNES and Genesis original cartridges down at my local games shop.

voltagex_ · on May 17, 2012

IIRC these clones may not be able to play all games - especially ones with add-on chips (Super Mario RPG is the only example I can think of right now)

duxup · on May 17, 2012

How about a SNES card for my PC with all the hardware on it? ;)

DanBC · on May 17, 2012

Have you heard of the "Amstrad Mega PC"? An Amstrad 'ibm compatible' with a Sega Mega Drive (US: Genesis) built in.

(http://en.wikipedia.org/wiki/Amstrad_Mega_PC)

£999 in 1993 is ridiculously expensive.

Keyframe · on May 17, 2012

Creative Labs made a 3DO like that back then http://en.wikipedia.org/wiki/3DO_Blaster

jiggy2011 · on May 18, 2012

That's about what you expected to pay for a PC back then. My family bought a 486 DEC PC with a 15" monitor and no sound card for around a similar price (granted DEC computers had much higher build quality than amstrads)

sp332 · on May 17, 2012

An FPGA can emulate a lot of different systems, but a SNES is just a SNES.

joshAg · on May 17, 2012

I think the fact that the SNES hardware is proprietary would make it very tough to catch all the corner cases (since we'd be reduced to tons of test and check cycles or using SEMs to examine the chip).

Plus, many cartridges has supplementary chips, so the fpga would also have to include all the different chips used, and all these are proprietary as well.

wtallis · on May 17, 2012

But all the hard work is done: bsnes emulates all of those chips. Sure, it's not trivial to convert bsnes to vhdl, but there's no need for further reverse-engineering unless you want to make pin-compatible replacements. The bsnes code and documentation contain all the information necessary to make an FPGA into a SNES-on-a-chip.

andrewguenther · on May 17, 2012

FPGAs not expensive? Please let me know where you shop...

jacquesm · on May 17, 2012

It depends on the size. Big FPGA's are amongst the most expensive chips that you can buy ($10K and up), but small ones are affordable. The amount of logic in an 80's era game computer should be within the gate budget of a small to mid sized FPGA.

See http://www.fpgaarcade.com/ for many examples of this.

gouranga · on May 17, 2012

Back in '01 (the good old days), we had volume prices of $40 for Xilinx parts which were big enough to get a PowerPC CPU in it.

jacquesm · on May 17, 2012

What were you building?

gouranga · on May 17, 2012

Surface to air missiles!

jacquesm · on May 17, 2012

Holy crap :)

Remind me to stay on your good side. I suddenly realized there may be bad consequences to 'checking in', such as punching out.

simcop2387 · on May 17, 2012

Well for small sized dev boards you can usually find them for about $100-$150. What I don't know is if this would have enough capability to do anything like this with it. I'd imagine the 6502 would be doable, but i don't know about the snes.

radarsat1 · on May 19, 2012

The DE0 Nano dev board is $86 on digikey.

neverm0re · on May 17, 2012

What I find interesting is that most attention is being spent on BSNES when the subject of cycle accuracy and proper emulation of hardware comes up, like this is a new or novel discovery. BSNES is hardly the first to aim for this sort of goal, nor is its efforts as close to complete as the efforts spent by others. The NES scene in particular is now down to the level of breaking out scopes to measure response times on real NES hardware to get the sort of information they need to further push their level of accuracy up. Even 'obscure' systems like the MSX have had this sort of cycle-accurate push in emulation.

It's no longer the 90s and people shouldn't even have to mention NESticle existed in an article unless they're that out of touch with trends of the last decade.

near · on May 17, 2012

> What I find interesting is that most attention is being spent on BSNES when the subject of cycle accuracy and proper emulation of hardware comes up, like this is a new or novel discovery.

I never intended to convey that this is a new idea, sorry. I do believe my cooperative threading model is a new concept in emulation, but it's still a ridiculously old one in computer science.

> The NES scene in particular ...

... is not as rosy as it seems. Having recently written an NES emulator, I can tell you that they're far from completion. For just one example, all of those mapper chips are basically a big unknown. Those chips have ways of detecting scanline edges to simulate IRQs and split-screen effects. This is done by monitoring the bus for certain patterns from the cart-side. And the details of this stuff? Completely unknown. Not even Nestopia nor Nintendulator attempt to simulate this: they just have the PPU -tell- the mapper when a scanline edge is hit. I could be wrong, but I believe I'm the first to even attempt to have the mapper detect scanlines by monitoring the bus.

And we're talking chips that are dozens of times less complex in the worst case than some of the SNES coprocessors.

> It's no longer the 90s and people shouldn't even have to mention NESticle existed in an article unless they're that out of touch with trends of the last decade.

The important part of that article is that the SNES was (and largely still is) in the NESticle phase of development, which was the purpose of bsnes. Unfortunately, just as we go from NESticle (25-50MHz) to Nestopia (800MHz required); ZSNES (200MHz) to bsnes (2-3GHz) needs a huge jump. But this time, that jump is hitting the wall of where most computer users are at. While people didn't notice Nestopia because everyone has at least a 1GHz processor these days, bsnes was not so fortunate.

So the article was more about explaining what that level of overhead is required for.

leeoniya · on May 17, 2012

source: http://byuu.org/bsnes/accuracy

this is also awesome, posted a while back: http://byuu.org/articles/emulation/snes-coprocessors

tluyben2 · on May 18, 2012

There are subtle bugs and issues with timing, but the worst, for me, is that the games are not accurate. I play a lot of 80s games on original hardware, but also (when i'm not at home) on my laptop/ipad/android in emulation. Games I have been playing for almost 30 years are locked solid in my mind; every enemy, path, timing has been set in my brain. I can play those games blindly on original hardware. But not on emulators. For people playing these kind of things for the first time this is not an issue; for me it's not being true to the original. The quirks which were in there are supposed to be in there. Horizontal shooters which had visual/audio issues in the original game because the end boss was actually too big for the poor Z80 + VDP to do fast enough, is now suddenly smooth in the emulator in case you lose the edge of attacking it when the computer is in pain. More fun or not is not the issue here; correct emulation is important to preserve all millions upon millions of carefully crafted assembler instructions on the platform of choice. Until we have this working well, I'll keep buying old computers for peanuts just to make sure.

normchan · on May 17, 2012

follow up to that story here: http://www.tested.com/news/44376-16_bit-time-capsule-how-emu...

kristofferR · on May 17, 2012

That's a great article, thanks!

zxoq · on May 17, 2012

What worries me is if future generations will be able to enjoy the games of today. Will it ever be feasible to emulate a PS3 to the level demanded here? Will my grandchildren in 50 years be able to play GTA 6 on a PS4 emulator? Processing power does not appear to scale to allow this, and there will barely by any of today's consoles still alive by then (also, I doubt 2060s television sets will have HDMI input).

comex · on May 17, 2012

As consoles get faster, accurate emulation becomes harder, but high-level emulation becomes much more accurate: programs get higher level, more dependent on library functions and (much) less on exact timing; hardware gets more uniform and programmable.

The Dolphin Wii emulator isn't perfect - it has the obscure bugs mentioned in the article - but unlike SNES emulators, it doesn't have a lot of game-specific hacks.

Retric · on May 17, 2012

I don't think PS3 games use the same degree of hardware timing hacks so you can probably get high degrees of accuracy without the same level of overhead.

mappu · on May 17, 2012

Not to mention DRM.. it's a serious problem.

drivebyacct2 · on May 18, 2012

Not in the current generation of console optical media. In the future it will be a rotten travesty as no one will ever physically own their games.

Though DLC is locked up as future consoles' content is likely to be.

radarsat1 · on May 18, 2012

Hm. The reason synchronization is needed is because the emulator must simulate the chip behaviours over time, and ensure that they act as if they are all behaving in real-time lock step. Might an alternative approach be to simply simulate each chip independently, in parallel, without synchronization, but "somehow" ensure that their behaviour is exactly corresponding to real time, so that time does not need to be "faked" by synchronization?

Yes, this is probably a recipe for disaster, and I have little idea what mechanism could be used to ensure time accuracy, but just a thought. (Perhaps an RTOS?) I also wonder what would be possible with FPGAs, whether programmable logic might provide a better approach to emulating these chips in synchrony.

Whodi · on May 17, 2012

Great find. This is the most in-depth article I've read on this subject. I actually use this as an open-ended interview question at the game company where I work. The PlayStation 2 has a bunch of quirks with floating point numbers because it doesn't follow the IEEE standard - for example, floats don't become infinity when they overflow, they just get clamped to the maximum possible float. Now you can't use your FPU for the tens of thousands of floating point calculations that are happening per frame. Sure, your processor is faster, but it's fighting with one arm tied behind its back.

formic_ · on May 17, 2012

I think the author is wrong

It doesn't take a 3000mhz machine to emulate a ricoh 5a22

Look how Nintendo has virtual consoles working on a 730 MHz power PC - the wii

artmageddon · on May 17, 2012

I don't think you're understanding the point of the article. Yes, emulators have been around for quite a long time, but none of them are perfect, which is the purpose of this guy's undertaking to create this emulator requiring a 3GHz CPU.

The Wii isn't going to be around forever: those will start to fail in a number of years. In addition, not every single NES game is available on the Wii virtual console, anyway.

JonnieCache · on May 17, 2012

The emulators on the wii virtual console are not accurate at all, not in the way that byuu is talking about. Also the games get heavily patched for that purpose, as well as the emulator being patched per-game.

artmageddon · on May 17, 2012

I think I may have only bought the Donkey Kong Country series of games for the Wii VC. It worked well enough, but I imagine that there were indeed issues with the emulation on the Wii. I didn't do any research on it so I couldn't comment on that either.

The one thing I realized after the last few years is that you can't count on game companies to do backwards compatibility forever. Microsoft (understandably) gave up on patching the 360 over and over again to expand the original Xbox library of playable games on the 360, and Sony stopped selling the PS3 models that had the PS2 hardware directly on-board(not to mention "Other OS" support, but I digress).

tedunangst · on May 17, 2012

Back compat is only useful in a business sense in that consumers don't have to decide between new console and no games or old console and lots of games. Once the new console has lots of games, that simply becomes the only option.