Overhauling Mario 64's code to reach 30 FPS and render 6x faster on N64 [video] (youtube.com)
699 points by kibwen on April 20, 2022 | 260 comments



This is a good lesson about the benefit of time when it comes to software development. Mario 64 could have been a much much better game on the same hardware, but the time cost would have resulted in a release late enough that the entire game might have been considered irrelevant.

Game consoles have had the benefit of the platform freezing for a few years allowing improvements to accumulate and later titles utilizing knowledge gained from the earlier titles.

Much of the software we use every day could benefit from this level of scrutiny but the pressure to deliver on time means we won't get to see such benefits unless it becomes a crazy persons labour of love.


There are externalities to consider that make a later release less likely, too - it was a launch title, matching Super Mario Bros.' pack-in with the NES and Super Mario World being one of two games available when the SNES launched.

Wikipedia’s entry on SM64 tells me that the N64’s launch was delayed because Miyamoto wanted more time to polish the game. I don’t think a later release would have been possible. This was the flagship title, the system seller. Come play Mario. In 3d.


>This was the flagship title, the system seller. Come play Mario. In 3d.

And I think they were right. I bought it just because of Mario64. Otherwise I'd have gone for Playstation.


I decided to buy it for Shadows of the Empire, but Mario64 was actually great (while SotE was not).


You take that back. SotE was rough, but had some excellently varied gameplay and interesting, fun levels with cool enemies. Also some losers in there, but every great game has a bad level. :)


I watched the video through only once, but unless I'm mistaken (the video is a little confusing on this point), the claim is that the original game did run consistently at 30fps (the design framerate of the console), and the point of these hacks is to improve framerate for mods and devices that can play the game at higher rates. Unless I misunderstood something, it sounds like the original game was exactly as good as it needed to be, as released. And that's before you get to the fact that the mods require the RAM expansion pack.


The original Mario 64 would suffer lower framerates in certain areas (e.g. Dire Dire Docks). We can also imagine that if the artists had had more headroom to work with, then they would have spent it on making levels larger and more detailed.


Is it only a technical limitation concerning the size and detail of the levels?

Mario 64 was already pioneering basically everything in 3D platformers, and maybe even 3D games as a whole.

Given it was a launch title, they had to stop somewhere and ship the thing. So, like any software ever, they had to make tradeoffs.

And, well, here we are talking passionately about this game 26 years later. To me that's amazing proof that they made the right tradeoffs.

Because, yes, we now have plenty of time to analyse the thing and find issues with the game. But the six-year-old me, the player me, had nothing to criticize about this game.

For me it is still a perfect masterpiece given its era and technical context. I don't think the game could have been much better than it is without breaking something, and I'm pretty sure the engineers at Nintendo considered the game « finished » and liked the result themselves. It shows in the game. The better would have been the enemy of the good.

But in the meantime you are also right, because Mario 64 DS was a really good extension. I miss that game; it really deserves a remake. But that version hugely benefited from the evolution of the Mario universe and the technical superiority of the DS, so I'm not even sure it could have been done in the 90s.

I hope we'll one day be able to play a newer version of this game. Official or not, I crave SM64DS with the original controls.


I read somewhere that they didn't compile with compiler optimisations; doing so would have fixed the water level.


The video also addressed that the game shipped with the compiler in debug mode. That alone would have helped.


The whole "Nintendo forgot to set the O2 optimization flag" thing has been somewhat debunked, or at best it's misleading. ModernVintageGamer did a video discussing the topic that's pretty well-researched

https://www.youtube.com/watch?v=NKlbE2eROC0

To summarize what he says: since the game was a launch title, it was likely being compiled with an early, less-stable compiler/SDK, with known bugs that wouldn't be fixed until later (he even cites some official developer documentation that mentions this). So setting that flag could have caused known issues/instability at the time, and they just left the optimization out.

And the other thing is that the game uses a lot of libraries that were indeed compiled with the O2 flag, and the performance drop from removing those is far more significant than the tiny gain from adding it to the top level of the makefile, where the rumor suggested it should have been added.


Quake launched a day before Mario 64, so I think 3D games would be fine without M64.


I have nothing against Quake, it's also a wonderful game. But Mario 64 solved more complex problems of a 3D nature: huge scenes with lots of (moving) objects, movement in all directions, a complex move set, and the camera, camera, camera…

Quake is a nice game as it is, but it's much simpler when the player is the camera.


More complex problems?

I'm not sure. A sophisticated camera is quite nice, but quake introduced a visibility determination solution that was used in dozens, perhaps hundreds of games afterwards. It also helped pioneer modding (it has a compiled scripting language), and later on had online deathmatch with predictive netcode (so it actually worked over the internet!) and was one of the first games to support hardware accelerated 3d graphics on pc.

It also has lightmaps, all surfaces are textured, and it runs in software with no hardware acceleration: z-buffering, built-in perspective-correct texture mapping, etc., all done on the CPU.

Obviously, that doesn't make it more fun to play, but Quake was an incredible technical achievement for the time, I don't think anybody would put Mario 64 in the same basket.


Quake supported cameras, as you put it. People made movies in the Quake engine with literal camera shaped models floating around. Also, Quake levels could be huge, much larger than Mario 64. Take a look at Ziggurat Vertigo [0][1], for example. Mario 64 was much more colorful, though, Quake had an intentionally limited color palette.

[0]: https://quake.fandom.com/wiki/E1M8:_Ziggurat_Vertigo [1]: https://www.youtube.com/watch?v=a72k_XBi2ls


They practically bruteforced all technical problems, using dedicated hardware. And simply ignored some others, like lighting.


The epic music of Dire Dire Docks compensated for the framerate. https://www.youtube.com/watch?v=Zqa2mgjbOIM


The music practically eased you into accepting the frame rate, slowing the player down a little with its calming mysterious chime.



I see your Opus 1 and raise you Billy Cobham - Heather https://www.youtube.com/watch?v=e3E9vx5vVck


Wasn't that because it had compiler optimizations disabled, due to a bug in the compiler that was subsequently fixed after the initial cartridge release?


No, it wasn't compiler flags. Put simply, there are too many objects on the screen in that particular part of the level, and the engine has wasteful code that, IIRC, computes a certain piece of physics three times for every object on screen, among other things.


That's part of why performance in Dire Dire Docks was bad, but Kaze goes WAY beyond just enabling -O2 in the compiler.


It wasn't the only reason, but -O0 was a contributing factor. -O2 alone improves things, but doesn't bring the frame rate in DDD up to 30


Assuming they had the spare capacity on the cartridge and the time to develop additional content. Neither of which is a given.


I think the main thing is that Nintendo had splitscreen multiplayer planned, but performance prevented it.

Edit: Got farther into the video and he talks about this at 14:50 ( https://youtu.be/t_rzYnXEQlE?t=890 )


L is real 2401


The video is confusing on this point.

The focus is on all the cool improvements and it seems like there is a lot to like.

But at 6:40, the dev says their solution would not work on the stock 4MB of RAM included with the N64. It is kind of snuck in there.

It’s kind of an important detail, because like the GTA loading screen fix last year it’s way more interesting when stories like this have a What If?! slant to them.

Another commenter mentioned this below, but it’s buried.

https://news.ycombinator.com/item?id=31104112


That's only for one of the many optimisations he did, making use of different sections of memory to improve throughput. Many (all?) of the other ones don't increase the memory requirements. Quite a few (such as re-rolling unrolled loops) should decrease code size (though that won't improve memory use on a ROM cartridge based system) improving cache efficiency.


About midway into the N64's lifecycle a RAM expansion card (the Expansion Pak) was released. His modifications sound like they still run on a real N64, but the Expansion Pak is needed. This was fairly typical of higher-end games made by Rare. Even as a purist, this added detail doesn't really detract from the outcome of the project for me.


Almost all games worked without the Expansion Pak ("Pak" intentionally spelt without a 'c'). In fact there were only 3 released games that required it[1] and even some of those were still playable without it, you just lost some content (eg Perfect Dark).

If this Mario 64 mod requires the use of the Expansion Pak then it's definitely not typical of other N64 games and a point that, like the GP, I felt needed more emphasis rather than a passing remark that was very easy to miss.

I'm not taking anything away from the incredible work that developer has done though. It's impressive and really interesting to see. From a hacker perspective it makes total sense to use the Expansion Pak because most retro gamers who might be interested in this mod would likely already have one. But from an authenticity perspective, this is a little bit of a cheat.

[1] https://en.wikipedia.org/wiki/Nintendo_64_accessories#Expans...


For Perfect Dark, the content you lost was the main game, so it was pretty essential.


Indeed; however, somewhat counter-intuitively, the multiplayer game did work without the Expansion Pak. But arguments about the completeness of Perfect Dark aside, my point is that Perfect Dark is still the exception, in that the vast majority of games worked without the Expansion Pak. Which is contrary to the earlier commenter who said a lot of N64 games required it.


I believe the game can run at 60 FPS on a real N64 with the Expansion Pak installed using his modifications. I don't think all these changes, and mods, are limited to use on emulators.


Noob question, how does one get his code to load on real N64 hardware?


https://krikzz.com/our-products/cartridges/ed64x7.html

These things are amazing for retro gaming enthusiasts because they let us play patches, mods, homebrew, and hard-to-find games on real hardware. Not that emulation isn't also wonderful.


As an aside, Krikzz is a Ukrainian company. He moved to Spain when the war began and is slowly starting to ship stuff again.

Their hardware is amazing.


The framerate dropped in certain areas; it ran at 30fps mostly consistently, but not perfectly so. These mods make it so that even in those areas it still maintains 30fps.


Target FPS varied between N64 games. F-Zero X, Super Smash Bros. and some others ran at 60 FPS, while Zelda only ran at 20 FPS (17 FPS in PAL regions).


One reason for the gap in those specific titles is that FZX and SSB use the 1-cycle color combiner, while Zelda uses 2 cycles.

SSB also has very specific Z management. Levels are drawn without the Z buffer, from back to front, then the character section (1 meter wide) is drawn with the Z buffer enabled, and the rest of the level is drawn without the Z buffer, once again from back to front.

I never inspected FZX, but I suspect its Z management is handled in a similar way.


> ... unless it becomes a crazy persons labour of love

Or there are valid business reasons to keep chipping away, in particular on performance-per-dollar problems at scale. If you're spending $1M/month on hardware to run something, each 10% win is worth enough to employ a couple senior engineers.


Many senior engineers who operate systems which spend 1M/month make 500k yearly

1M * 12 * 10% = 1.2M

500k * 12 * 2 * taxes >= 1.2M

How many 10% wins do they have to achieve in what timeframe to have a good ROI on micro-optimization, considering that these seniors could be producing stuff that allows for multiple times their salary in revenue?


Improvements are one-time investments of engineering time, but ongoing savings in infra costs. Consider also that these savings also permit scaling to more customers or more work.

Suppose you have two engineers that total to $1M of personnel cost. If they deliver a cost savings of 1% per month (compounding) the savings-per-month will have surpassed their pay per month; you will see your infra costs at 85% of what they were after 16 months, and on month 17 you will have made a net profit on those engineers, where the savings they got you has totalled more than what you've paid them.

If your business is growing, you could project your infra costs to grow beyond $1M per month, and the numbers become even more favorable.
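
If you want to sanity-check that arithmetic, here's a minimal sketch (assuming $1M/month infra, roughly $1M/year total for the two engineers, and a compounding 1% reduction per month, as described above):

    #include <stdio.h>

    int main(void) {
        double infra  = 1000000.0;        /* monthly infra cost at the start       */
        double pay    = 1000000.0 / 12.0; /* two engineers, ~$1M/year total        */
        double factor = 1.0;              /* current infra as fraction of original */
        double saved  = 0.0, paid = 0.0;  /* cumulative savings vs cumulative pay  */

        for (int month = 1; month <= 24; month++) {
            factor *= 0.99;                   /* 1% compounding reduction per month */
            saved  += infra * (1.0 - factor); /* this month's savings, accumulated  */
            paid   += pay;
            printf("month %2d: infra at %5.1f%%  saved %9.0f  paid %9.0f\n",
                   month, factor * 100.0, saved, paid);
        }
        return 0;
    }

It prints infra at roughly 85% of the original after month 16, with cumulative savings overtaking cumulative pay around month 17, matching the numbers above.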


Unless a company is running an engineering team composed of monkeys, it will be hard to achieve 1% optimization per month, even without compounding. There are a lot of optimizations which look good on paper but in practice bring other unforeseen cons

Anyway, a profitable software company will be looking at getting a big multiple of 10x the salary of their engineers


I work in performance engineering, although in a client rather than backend role. We usually do better than 1% a month. In fact, we do this even as more than 1% worth of regressions makes it into the build from the software development process.


It depends what you're optimizing. Mature c++ codebase that is routinely profiled? Large legacy codebase with few tests? Mixed workloads?

All hard. But there's also a lot of easy cases out there - even good engineers might code like monkeys if they're under time pressure and trying to grow the business.

I've done a lot of performance work, and while of course no one individual could achieve a 1% savings per month on the entirety of FAANG compute loads, there is lots of code out there that nobody thought would run at scale or be on the critical path of something important, but now it is and it has gotten expensive.


You shouldn't multiply by 12 if the 500k is yearly. Also what's the 2? 2 engineers? The correct math would be:

1M * 12 * 10% = 1.2M

500k * 2 * taxes >= 1M


Someone said "a couple engineers" hence 2. The 12 was a bad copy paste, thanks for pointing that out


As a rule of thumb, I devote about 0.5% of my engineering org to cost optimization. At about 200 people it makes sense to invest a single engineer. By the time you're approaching 800 it makes sense to have a team dedicated to it.

Though cost opt generally involves way more than optimizing code. It's normally about auditing things, creating budget tooling, right sizing, broader architecture redesigns, etc.


I don't see how a rule of thumb could make sense, since the cost of operations can vary so much from company to company.


I mean in this case, I'm confident that the codebase for Mario64 was used as a basis for subsequent N64 titles; when pioneering something like this, you want to spend some extra time during or after development to tweak these codebases.


It was definitely used for at least Ocarina of Time and Majora's Mask, albeit tweaked. I don't doubt that they used it in other titles too.


Much of the software we use every day does benefit from this level of scrutiny. There was a popular post today about changing C++ std::sort for more perf. There are JavaScript engines running 100x faster for a given CPU than they were in the 90s. GCC and LLVM optimize more than ever.


But there’s tons of untapped potential that is rarely realized - compare some of the greatest Doom PWADs made today (on the vanilla Doom engine, not on limit removed engines) vs the original game.

They were technically possible but there wasn’t time to do them and still get the game out on schedule.


Or simply nobody thought things like that could be done (for example, the classic "invisible" sector bridge) before a PWAD did it. Plus, the game was released targeting an 80386, so the stock maps were limited to what ran at acceptable performance on that kind of CPU.


Yeah, “limits” are sometimes in hardware and sometimes conceptual. Look at the demoscene - most of what they do is designed to run on period hardware, but far surpasses anything anyone did at the time (it’s the whole point).


Check FastDoom too :).


Super Mario 64 was a launch title for the N64. One of two launch titles in North America. There really was no way to delay the game without delaying the console.

With that in mind, SM64 is a pretty impressive game for a console launch title. It makes good use of the N64 controller, performs well, and is pretty fun. It's not the best game of all time but it's definitely not a bad game.


This is a good reason to stop changing the core frameworks every 5 minutes, and stop building one off infrastructure to solve individual problems, unless there's a really good reason.

When something is stable and widely used, you can do optimization like this, and justify it for real production systems, and everything built on it benefits.


You clearly need to spend more time in a JavaScript team. Knock that nonsense right out of you :-D


JS: The mediocre language made wonderful by its great ecosystem.... with a dev community constantly trying to use random crap instead...

Seriously guys we have great utility libraries and bundle optimizers... why are we still piecing together micro libraries like this is an old UNIX mainframe?


This is completely true for non-cloud, on-prem hosted code/programs, but the notion disappears when software is hosted in the cloud and updates can occur continuously.


I don't know if there's a term for this: the limit on how far any task can reach. Everything has a deadline and we all live with partial results in a way. It's like civilization's internal resistance (with a cycle of survival on top).


It's not entirely about time; it's as much the benefit of 25 years of experience, of not writing for brand-new prototype hardware, and of not writing a brand-new 3D engine when 2D had been the norm for decades.


That's the part about batch rendering. Everything else there is architecture specific and typically not a problem on modern hardware anymore, unless you really force it to its limits.

The coders accidentally optimizing to reduce GPU compute load when there's GPU compute to spare and memory bandwidth is limited, for example.


This is why honestly developers need to stop seeing issues as technical problems and more as being social ones (which they often are).


This is why we have product software engineers, and UI designers. The UI is a social problem and the purpose/goal of the product usually is. The guts that make that UI and achieve that purpose are very rarely social.


"roll those loops back up and don't compile in debug mode to achieve significant performance gains" is not the takeaway I thought I'd have going in, heh.


Unrolled loops trade instruction cache for cycles. Whether that's good depends on what your bottleneck is; in this case RAM access is a bigger bottleneck than the CPU, due to being shared with the rendering coprocessor. I wonder if we'll hit this crossover point again in the future.
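
To make the tradeoff concrete, here's a toy sketch (not actual SM64 code) of the same loop rolled and hand-unrolled. The unrolled version saves branch and counter work but occupies several times as much instruction cache:

    /* Toy example only -- not from the actual game code. */

    /* Rolled: small code footprint, one branch per element. */
    void scale_rolled(float *v, int n, float s) {
        for (int i = 0; i < n; i++)
            v[i] *= s;
    }

    /* Hand-unrolled by 4: fewer branches and counter updates,
       but roughly 4x the instructions to fetch into the I-cache. */
    void scale_unrolled(float *v, int n, float s) {
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            v[i]     *= s;
            v[i + 1] *= s;
            v[i + 2] *= s;
            v[i + 3] *= s;
        }
        for (; i < n; i++)   /* leftover elements */
            v[i] *= s;
    }

On a machine where memory is the bottleneck (like the N64 with its shared RDRAM), the extra instruction fetches for the unrolled version can cost more than the branches it removes.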


And the N64 code was particularly unrolled, for whatever reason. IIRC, there was one game that bit me when writing a JIT N64 emulator because there was a single code stream with no branches that was larger than multiple MMU pages. My JIT kind of assumed that actual code in practice wouldn't be larger than the actual N64 I$ without branching. Womp womp.

But hey, I guess I don't want to be judged on my mid-90s code either.


> And the N64 code was particularly unrolled, for whatever reason.

My naive assumption is that they performed the majority of testing with debug code (as you would). And, since they had exactly one chance to get things right (being a physical cartridge), they couldn't afford to do all the testing again, with the new/different binaries generated with debug turned off.


My understanding is that the massively unrolled loops thing had to do with how branches felt different versus previous consoles. Compared to filling an I$ line (somewhere in the many dozens of cycles on an N64), a branch mispredict was very cheap. However, branch mispredicts were very visible. Delay slots were present in the ISA to decrease their cost, and the pipeline stalls were noted in pretty much all literature on RISC at the time. On top of that, the N64 really didn't have great performance monitoring tooling, but had a complex enough memory hierarchy to have really benefited from such tooling had it been available. It's pretty hard to understand the system effects pieces of code have without that.


From what I understand (as I didn't start writing code until the mid-2000s), loop unrolling was a very highly regarded optimisation of the early 90s.

And it was a very long time before programmers (and compilers) realised just how bad of an idea loop unrolling was once we entered the era of superscalar processors with instruction caches and half-decent branch prediction.


It never occurred to me that loop unrolling could end up being a de-optimization due to compiler optimizations!


The things the parent comment referred to (instruction caches and branch prediction) are in the processor hardware rather than other compiler optimisations.


To be fair early 90s branch prediction was garbage. Back then it seemed like every new processor release was accompanied by a note on how branch prediction was improved by 80-90% compared to its predecessor.


To be even more fair, counted constant sized loops never required particularly good branch prediction, especially if the code is well laid out so that your "miss" does literally one more cheap step.

Misprediction there means you computed one more vector or value, not a huge loss compared to a cache miss. Branch misprediction hurts a lot when it forces a cache miss mostly, whether it's for data or for instructions, or ties up a deep instruction pipeline completely with lousy abort, which most CPUs do not have. (See: Nehalem.)


He did so much more than that. The not compiling in debug mode was well known before this video, and he addresses that. His speedups are independent of not compiling in debug mode.


I know, and I hadn't appreciated all the ways that doing something less CPU-efficient was actually faster due to the way the hardware was architected.

I just actually shouted when he said that if you simply compiled for release mode it ran 30% faster.


It was common belief in that era that you made code faster by making it longer and more complicated. The opposite is true now (because of cache size and deep OoO) - simple predictable loops are great - but even then people tended to overdo it.

Of course, it's a way worse problem that they compiled at -O0.


>Of course, it's a way worse problem that they compiled at -O0.

There was a Modern Vintage Gamer video going into this. Apparently it wasn't a mistake or misunderstanding, but a bug that existed in the original version of the compiler that caused issues when compiling with optimisation. Even then, the main library that did most of the heavy lifting was compiled with -O2.

Super Mario 64 was a launch title, and they simply didn't have the time to work out the kinks in the compiler to get those few extra fps in Dire Dire Docks.


As I recall, that MVG video was incorrect—Nintendo could have compiled with optimizations and the game would have worked.

The most plausible explanation to me (and, IIRC, the one that Giles Goddard speculated was the case) is that they simply didn't want to take the chance that optimization could surface unknown bugs in the code, and they knew that the unoptimized version worked and played fine, so they played it safe and shipped the known-good unoptimized version. This is a perfectly understandable decision in terms of risk management, even if in retrospect it was too conservative. Less likely, but still very plausible, was that turning off optimization was just a mistake; mistakes happen when you're rushing a release for a deadline, and build deployment was not as orderly and automated back then as it is today.


Sounds like a good call to me. If you have something mission critical that's tried and tested, and reliably does 99% of what you need it to, you don't mess with it. It's very easy in software dev to get caught up in software optimizations, and it's a lot harder to quantify the risk of continuing to mess with something the closer you get to ship date.

Of course it "shouldn't make any difference" but who among us hasn't at some point been bitten by a "that couldn't possibly affect... ohhh. Ohhhhhhh." style bug?

When you're dealing not just with a single piece of software, but an entire hardware+software ecosystem, you don't risk jeopardizing the whole stack for an extra few FPS in some back corner.


Especially in an era where a buggy release could require an expensive return/reprinting step.

Back in those days it had to ship as close to perfect as possible because you couldn’t just send out a patch layer.


-O2 was considered "OK-ish". With -O3 you often get a can of worms.


Yeah, and most development houses would test and develop with -O2 on. The problems came if you ended up thinking you might need to switch near the end.


No, that's not more plausible, because it's only the NTSC release that was compiled in debug mode. The Japanese and PAL releases were -O2.


But those are different versions. Just because they worked with -O2 doesn't mean NTSC would.

I'm not just saying this in a theoretical sense either; I've seen much weirder bugs.


Yeah it's hard to remember the time before you could get 2 near-perfect C/C++ compilers on the 5-ish most popular architectures.

And the architectures have consolidated since then.


It's always been a mix. Inlining, unrolling, and otherwise choosing longer sequences of instructions that are faster is still a major theme of optimization and optimizing compilers.

Even on today's high performance CPUs, which are far more sophisticated than the primitive 5-stage scalar in-order R4300 (not sure it even had any branch prediction actually).


> Even on today's high performance CPUs, which are far more sophisticated than the primitive 5-stage scalar in-order R4300 (not sure it even had any branch prediction actually).

I believe it didn't; with a short scalar pipeline (only 5 stages) the cost of waiting for a conditional branch to be resolved is relatively small, and MIPS of that era actually exposed the idea of a "delay slot": rather than pausing the pipeline while calculating the jump, it still executed the next instruction, which was always executed whether the jump was actually taken or not. Filling this with a NOP makes it equivalent to a pipeline bubble, but I guess the intention was that in some cases it could actually be used for something.

I think a few RISC architectures of that era did similar things, their entire goal was simplicity after all, the argument being that something like a branch predictor and the logic required to reverse speculatively executed instructions "should" instead be used to make the CPU smaller or higher performing in the best case (either as implementing better performing but larger logic, or higher frequency due to shorter critical paths).

I think this was dropped in later ISAs as it confused a lot of people, it wasn't well used by compilers (which is interesting, as much of the MIPS ISA was explicitly designed around what would be easy for a compiler rather than hand-rolled asm), and improving process technology made the transistors needed for expensive superscalar speculative-execution architectures relatively cheaper.


Now that's a 10x programmer if I ever saw one. I'm sure the compiler helped too. The generated code of C compilers from the 80's and 90's versus the generated code created by something like modern GCC -- it's like night and day. We're able to build faster tinier software than we ever have before, and it really helps to illuminate the impact modern tools have had when we apply that advantage to something old we already know.


Ironically, part of the problem was that today's optimising compilers make tradeoffs between space/memory throughput and CPU execution speed that don't hold for the slower memory architecture of the N64. Namely, optimisations like loop unrolling that benefit modern CPUs are a detriment on the N64.

I think most of the cool stuff he did here is around restructuring the code to remove contention on the RAMBUS as it can do operations on different banks in parallel when servicing reads/writes from different components vs causing contention if they wish to read from the same bank. Also writing directly into the bank that will be used by the graphics unit from the CPU while disabling cache features of the RAMBUS, freeing up the cache for more code pages for the CPU to use.

Moving more of the code to execute on the "maxed out" GPU and actually cutting execution time because of the reduced memory bandwidth requirements was also pretty neat.


After 20 years not coding (or coding in python, ruby, R, that sort of thing), I started up writing code in rust that needs performance (emulator). I was absolutely amazed by the quality of the compiler today compared to what I had 20 years ago. It's really amazing to see how clever it has become.


10x in one dimension. But not by the modern definition of 10x since this actually took a lot of time. This is more like a craftsman or artisan who toils away until they are ready and the result is perfect.


I really like the way you put that into words; the creator of the video clearly has a deep passion for his work.


I always wonder if the original devs ever find these videos and are like oh shit we should've done that.


It would certainly be interesting to see their reaction.

Mario 64 is a bit of a worst case as the game was being developed at the same time as the hardware and had to get out on day one no matter what. Between the Japanese and US releases I believe they even fixed some bugs.

Some of the things he did seem quite obvious, like cutting down the repeated calculations for every single coin on the screen. You have to think they would've thought of that if they'd had the time.

It would be interesting to see a series like this on a number of games over the console’s lifetime. Especially if it highlighted the tips and tricks the developers started to learn to get more out of the hardware over time.


> You have to think they would've thought of that if they'd had the time.

Stuck forever in the issue tracker as "priority: low"


Speedrunners famously need to hunt down first press cartridges on the US release because the Shindou release fixed the bug that allows BLJs (backwards long jumps).


This explains a lot for me, since I kept trying to achieve a BLJ but for whatever reason I couldn't.

Now, years later, your comment explains a lot O_O


it's worth noting that the virtual console (wii, wii u) releases are built off of the original ntsc edition, meaning backwards long jumping is in play. but the most recent 3d all stars edition is based off of shindou, meaning blj's are not possible.


Can't even imagine being a dev back when you only had one shot at your game going gold and couldn't update it after the fact to patch bugs.


Tbh it made things much riskier for the game companies, but a lot better for the players.

For example: It was almost impossible to profit on the “make a killer trailer, presell, then deliver junk and ‘patch it later’” model that’s unfortunately popular these days.


These days, I see this as a subsidy for us older folks who wait a few years to buy a game. The clueless pre-order folks are funding my gaming experience!


The subreddit /r/patientgamers [0] is exactly about this philosophy. Games get simultaneously better and cheaper over time and all I need to do is wait? Sign me up.

[0] https://old.reddit.com/r/patientgamers/


[flagged]


Responsible adults waste their time with respectable hobbies like mindlessly watching TV and going to sports games! /s


> mindlessly watching TV

Also childish.

> going to sports games

More teenagy than childish, but still just as bad.


What do you do for entertainment in your spare time?


Real men die on the battlefield. Staying alive is for children.


> playing video games is for children

Geez, that's a pretty narrow-minded opinion


You mean like the old-as-hell adults playing tabletop games with poker or Spanish decks, or dominoes, in a tavern in any European village or town?

Colossal Cave back in the day was an adventure for children. Today it's a great game because the puzzles can be pretty hard even for adults.


Why don't you like fun?


There's always /r/stopgaming


Eh, it's just the agile development process coming to gaming. Deliver the minimum viable product, work out which parts of the game people care about most, and perfect those bits, rather than trying to guess what the market wants and spending the entire budget before the game actually gets real usage. If you just wait 6 months after release you end up with a better product than if the game had just come out 6 months later.


There's definitely a balance. Yes, they needed to test their games a lot more, but no amount of testing will equal millions of people trying all sorts of weird things you would've never thought about. That being said, games have been also getting a lot more complex in multiple ways. Lastly, PC games also have to deal with an exponential number of hardware variations that is very hard to test exhaustively in house.


At launch, yes. But revisions were quite commonly slipped into newer production runs in a general industry sense.


One thing mentioned in the video is his memory tricks require the RAM expansion module. Mario 64 being a launch title couldn't have used the RAM expansion. The RAM module came out two years after the console launched if I remember right.


The vast majority of what he does is a combination of benefitting from hindsight with 30 years of industry learned best practices, a vastly superior development environment (including compiler), and just having the time, with minimal pressure or demands, to dedicate to these problems. What he did was impressive, but something someone experienced with developing on these platforms could achieve reasonably well.


Just to be clear for anyone that doesn't watch it, he explicitly acknowledges most (if not all) that in his commentary. His overall tone is quite deferential to the original devs and the time, tool and technique constraints they were operating under.


I didn’t work on Mario 64 but I did work on other N64 games years ago. And yes I did have that reaction :)


Oh good sir can you please share your story?


there are often dev reaction videos on youtube as well!

in response to mods, as well as speedruns (that may/may not use mods). sometimes the commentary from them is interesting.


Do you have any favourite dev reaction videos?


Psychonauts' dev reaction is my favorite one because they sat down one-on-one with a speedrunner talking about every exploit used to get the fastest time possible.[1]

[1]:https://www.youtube.com/watch?v=lsDc1YVxHA0


The Cuphead devs commenting on speedrun tricks they didn't anticipate being possible.


Not the same game, but there is this for half-life https://youtu.be/sK_PdwL5Y8g


Or do they just sigh in exhaustion that, even so many years later, people are still demanding more out of them, despite them being underpaid and overworked already? Then take some heart medication and go back to work?


None of the games I play these days are CPU intensive so while I understand FPS is important to a large group of gamers I have never paid attention to it. Before clicking I thought "does frame rate really matter for Mario 64? How big of a difference can it make?" I'm absolutely blown away, I can't believe how smooth this looks. I don't know what else to say, I'm speechless. Stellar job. Vroom vroom.


High framerates are cool but often I wish more games had the option to cap framerate at 30/40/45 and use less battery/produce less heat instead of racing to 120fps.


I'm the opposite - I need the framerate to be as fast as possible. I can't even look at a 30fps game!


Ever played on a 144hz+ monitor?


pretty amazing what you can do when you push the release deadline back by 26 years.


Heh, but even then, 6x is still impressive. I doubt in 26 years you could optimize a triple-A title from today to the same extent. It seems that the vast majority of optimizations were actually achieved manually and not through compiler magic and other modern tools. But then again, maybe in 26 years we will have magic tools that will get the job done. ;)


> I doubt in 26 years you could optimize a triple-A title from today to the same extent.

Well... https://news.ycombinator.com/item?id=26296339


Wow Rockstar released his optimization. That's awesome.


I knew this would come up :-)

Doesn't improve overall game performance though.


Loading times were however the main reason I couldn't enjoy that game. Took like 15 minutes from starting it to joining some MP game with friends, then 5-10 minutes between each game looking at loading screens.


> I doubt in 26 years you could optimize a triple-A title from today to the same extent.

It helps that in this very particular case it was made much easier for him by basically enabling optimizations in gameplay code (it's fairly unlikely you'd get that one for free these days). But look at the difference in quality of launch titles for the PS4 vs what's being released today - that's 7-8 years worth of development. It's not unreasonable to expect that with another 2x that we could make similar gains. I've worked on a few AAA games and seen the codebases (and more importantly the issue trackers and profiling data). Today lots of these problems are "known" but it's a tradeoff - do you spend a person-week on fixing a frame drop in an awkward area or do you spend it on an incremental improvement across the board.


And have access to the ram pack that the original devs didn't have.


Wow, this is amazing! Plus he's adding split-screen co-op mode??! Mind blown. Impressive work, seriously.


Apparently the original game was meant to include it, but it was scrapped because it made the game run too slowly. Now that the game has been almost entirely rewritten for performance, this feature can be backported in :)


How would one learn about this depth of C and its optimizations? I'm interested in it as an art form, something to do in my spare time.


C is not that deep. In the end it's just something that produces assembly/machine code: jumps, reads, writes, etc. You need to get a feeling for what the compiler outputs for a given input (try Godbolt!). Then you need to know how those outputs will perform on your target hardware. This video is a good example: it states that modern compilers unroll loops because they usually perform better on modern systems, but not on the N64. So knowing your hardware is just as important.
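
As a concrete starting point, you can paste something as small as this into Godbolt (or run `gcc -S -O0` vs `gcc -S -O2` locally) and compare the assembly the compiler emits:

    /* Tiny function for comparing -O0 and -O2 output. */
    int sum(const int *a, int n) {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += a[i];
        return total;
    }

At -O0 you'll typically see every variable spilled to the stack on each iteration; at -O2 the loop stays in registers (and may get unrolled or vectorised), which is the kind of difference the video is talking about.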


> C is not that deep.

It absolutely is. Look at its memory model, or look at any StackOverflow question or HackerNews discussion about edge-cases of undefined behaviour. Example: [0]. Alternatively, look at challenging interview questions that require a deep understanding of the C language.

C gives the appearance of being a simple language, but has many dark corners that many programmers are unaware of.

[0] https://news.ycombinator.com/item?id=22867059
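
A classic example of the kind of dark corner meant here (a sketch, not taken from the linked discussion): signed integer overflow is undefined behaviour, so the compiler is allowed to assume it never happens and optimise accordingly:

    #include <limits.h>

    /* Looks like a sensible overflow check, but signed overflow is
       undefined behaviour, so at -O2 a compiler may assume x + 1 never
       overflows and fold the whole test to "return 1". */
    int will_not_overflow(int x) {
        return x + 1 > x;
    }

    /* A well-defined way to write the same check. */
    int will_not_overflow_safe(int x) {
        return x < INT_MAX;
    }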


Very little of this is relevant for performance. It’s important to know what’s UB so the compiler doesn’t miscompile your code, but most performance improvements don’t come from “oh the compiler can run aliasing analysis on this better so it’s 10x faster” but “this loop is O(n^3)” or “I should convert this linked list to a flat array”.
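
For example (a toy sketch), the kind of change meant by "convert this linked list to a flat array" looks like this: the same sum, but the array version walks contiguous memory instead of chasing pointers all over the heap:

    #include <stddef.h>

    struct node { float value; struct node *next; };

    /* Pointer chasing: every element can be a cache miss. */
    float sum_list(const struct node *head) {
        float total = 0.0f;
        for (const struct node *n = head; n != NULL; n = n->next)
            total += n->value;
        return total;
    }

    /* Contiguous array: the hardware prefetcher can stream this. */
    float sum_array(const float *values, size_t count) {
        float total = 0.0f;
        for (size_t i = 0; i < count; i++)
            total += values[i];
        return total;
    }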


That's basically what I meant. But even taking all the ugly parts of C into consideration, they are shallow. You don't need to dig into 10 layers of abstraction to get to the bottom of things. And the example posted is about what others say about how to interpret the C standard. I don't care. At the end of the day you can look at the asm output and understand what's happening for the compiler you're using.


Familiarize yourself with the tools for measuring performance on whatever platform you want to optimize for. Then you can just start messing around and seeing what kinds of things make the number go down.


> Then you can just start messing around and seeing what kinds of things make the number go down.

And this will almost certainly require a deep understanding of the instruction set, and dropping into inline ASM, periodically.


Optimizing memory layout and reducing allocations are pretty ISA-independent. But yeah, at some point you will have to start looking at the assembly to optimize further (even if only to know when to tell the compiler to stop inlining a function that causes tons of register spilling inside a tight loop).
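
For the "stop inlining" part: on GCC/Clang the usual knob looks something like this (a sketch with made-up function names, not from any real codebase):

    /* Hypothetical cold helper: rarely called, but if inlined it bloats
       the hot loop and forces extra register spills there.
       __attribute__((noinline)) is the GCC/Clang spelling; MSVC uses
       __declspec(noinline). */
    __attribute__((noinline))
    static void handle_rare_case(int *value)
    {
        *value = 0; /* ...imagine a large, cold error-handling path here... */
    }

    static void hot_loop(int *data, int n)
    {
        for (int i = 0; i < n; i++) {
            if (data[i] < 0)              /* almost never taken */
                handle_rare_case(&data[i]);
            else
                data[i] *= 2;             /* hot path stays small */
        }
    }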


In most cases there’s a lot you can do by just peeking at the source code. Almost all programs ship without having been run through a profiler at all.


    apt-get install linux-tools
    perf record -g myapp
    perf report

Play with the output for a few weeks. "Why is this slow?" is a never-ending question leading down infinite corridors and weird specialist tools.

Cachegrind and VTune are also instructive to play with


Some of that doesn't have to do with C even. Like, changing memory layout of variables, or the way you access data.

A simple example is if you have a 2D array, and the data from the individual rows is laid out consecutively in memory, but you loop over the columns in your outer for-loop and over the rows in your inner for-loop. This means you access the first element of the first row, then the first element of the second row, then the first element of the third row, and so on. All these elements are far apart in memory, but every time you access one element, let's assume it's a uint32_t, the CPU fetches a whole cache line of e.g. 64 bytes and puts it in the CPU cache in anticipation that you will access data close to it in the near future. But you don't, so the CPU has to fetch another 64-byte block for the first element of the second row, uses only 4 bytes from that, and so on. If your 2D array is large enough, by the time you finish the first pass of the inner loop and start reading the second element of every row, the 64-byte cache line that was fetched when you read the first element of the first row was already evicted from the CPU cache somewhere around the time you read the first element of row 2000, so the same 64-byte block has to be fetched from RAM again. This makes a huge performance difference, and is applicable to pretty much every programming language.
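
In code, the two traversal orders described above look like this (assuming a row-major uint32_t array, as in the example):

    #include <stdint.h>
    #include <stddef.h>

    #define ROWS 2000
    #define COLS 2000

    /* Cache-friendly: walks memory in the order it is laid out,
       so each 64-byte line fetched is fully used before moving on. */
    uint64_t sum_row_major(const uint32_t a[ROWS][COLS]) {
        uint64_t total = 0;
        for (size_t r = 0; r < ROWS; r++)
            for (size_t c = 0; c < COLS; c++)
                total += a[r][c];
        return total;
    }

    /* Cache-hostile: jumps COLS * 4 bytes between consecutive reads,
       so each fetched cache line contributes only 4 useful bytes
       before it is likely evicted again. */
    uint64_t sum_col_major(const uint32_t a[ROWS][COLS]) {
        uint64_t total = 0;
        for (size_t c = 0; c < COLS; c++)
            for (size_t r = 0; r < ROWS; r++)
                total += a[r][c];
        return total;
    }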


A big part of it is learning computer architecture. If you have a solid grasp of things like how memory access works and how the CPU pipeline works, you can start to get a better mental picture of how fast a particular piece of code will run based on what instructions the CPU is executing and what the memory layout is.

There are textbooks and classes on computer architecture. Funny enough, many of them use MIPS, which is the architecture used in the N64.

Optimizing for N64 also requires understanding how the RCP works, which is a separate topic.


Old-skool book recommendation. Not necessarily C, more ASM, but still a good book on early-nineties optimization with a mentality that remains valid. I believe that people today would be more concerned about how fast their code runs if it were visible, that is, if they profiled it with good tools!

https://www.amazon.com/Inner-Loops-Sourcebook-Software-Devel...


Read Agner Fog's optimisation manuals.

https://www.agner.org/optimize/#manuals


Start with assembly


Machine code is only moderately less convenient.


It's absolutely amazing the level of expertise and dedication a single person can have on the inner workings of something.


Mario 64 has to be one of the most dissected games ever. It’s kind of wild what speed runners, modders, etc. have uncovered.


Naive question: how did we get access to the source? Is that from a disassembler, or did the source code leak at some point?


From what I understand he used Mario 64 decomp code.

The game was reverse engineered to be compilable 1:1 to the original ROM.

[1]:https://github.com/n64decomp/sm64

Fun fact: The people who did this moved on to Zelda 64 Ocarina of time and recently completed that project as well

[2]:https://zelda64.dev/

There are ongoing projects by other people for Mario Kart 64, Goldeneye, Perfect-Dark, Banjo-Kazooie and others although I think the next target for completion may probably be either Majora's Mask or The Minish Cap.


PD will definitely be the next game to be fully (or near enough) decompiled. The dev behind that one is great.

If you're bored, his write up on the "Challenge 7 Bug" is great read: https://gitlab.com/ryandwyer/perfect-dark/-/blob/master/docs...


Thank you. I was honestly thinking the Minish Cap would be done first (why, I don't know). It sits at 89% complete while PD is at ~74%, but I guess I'm not drilling into what is actually left in each project. I CAN'T wait to see what will be done with the PD source code!


A 30+ FPS Perfect Dark would be amazing. The RAM extension pack wasn't enough for that game. The AI was pretty good IIRC (although definitely went in loops). Had lots of fun playing 1 vs 11 medium difficulty. I miss throwing N-bomb grenades.


What is "SH" region mentioned in [1]?


https://github.com/n64decomp/sm64/blob/master/Makefile#L34

An updated version that was released later


IIRC from an earlier read about this, the code is from a disassembler.


The code is not from a disassembler. The developers of the sm64 disassembly project used a disassembler to look at the machine instruction output of the game’s functions, and then wrote their own implementation in C that results in the same output.


Why wouldn't Nintendo release the source, considering there's little commercial value in this particular game now?


Nintendo is still selling Mario 64 in various ways. You can buy it in "Super Mario 3D All Stars" for $60 on the Switch.


Nintendo, like many other Japanese companies, is very protective of its IP.


That assumes they even still have it. Not sure about Nintendo in particular, but many other games of the era are known to have lost source code.

Nintendo did (supposedly) use an emulator for the N64 games in Nintendo Online (for the Switch); so it's possible that they don't have the source (or maybe it had enough machine-specific code that doing a source port would be more effort).

Of course, if they can re-sell it as a subscription service they probably would want to avoid releasing the source anyway…


Virtual Console (or whatever Nintendo calls it now) releases mean that the game can be a moneymaker decades past the end of retail sales. You also see this with MS-DOS games, sometimes modders will voluntarily stop making their fan patches if an official rerelease makes it to Steam.


> I don’t fault them for making most of these mistakes. C was a rather new language at the time.

In the 90s? It was 20 years old by then and mostly ruling the world already.


Not on game consoles, and perhaps not even for PC games more generally (hedging my bets in case Walter Bright waltzes in and starts talking about how he wrote Empire in C in 1972 or something :P ). It wasn't until the mid-90s that C ate the game industry. Recall that C wasn't standardized until 1989, that C compilers weren't free or ubiquitous or good, that plenty of now-forgotten contemporaneous languages existed, and that making entire games in assembly was still quite popular and liable to get you better performance anyway.


Interesting. I didn’t know console’s market was so different. 90s PCs were dominated by pascal and C variants on the OS and app front, with a tiny bit of assembly sprinkled here and there.


I believe the Nintendo 64 is the first Nintendo console where using C started to become prevalent. I’m not sure about the Saturn or PlayStation. But I know the 16 bit consoles were usually assembly because they had such a limited resources.


I don’t know about the Saturn, but the PlayStation is the first mainline console that I know that had a C SDK and games were written in C on it.

The first Nintendo handheld to use C was the GameBoy Advance which was also the first Nintendo console that had an ARM processor.

It’s true that C only really took off in the 90s while the 70s and 80s are more of a proto-C era which still built the entire Unix system.


I know they don't "owe" anyone anything. But it is disappointing they've apparently declined to release their code changes

It's also apparently unclear if the overhauled version will run on a real N64 or if it only runs in an emulator


He is planning to release the source code at a later date. He is waiting because the source code is intertwined with a ROM Hack he made, and he is unable to release the improvements without releasing the ROM Hack in an unfinished state.

The overhauled version will run on a real N64 as long as it has the RAM expansion pack.


It will run on a real N64 but it needs the memory expansion. So that's one thing he mentioned but people do gloss over. It (the extra memory) helped a lot in reducing memory contention between the CPU and the GPU.


He has patrons backing this project on Patreon, so he may actually owe them something.


He owes them a credit at the end of the video, which they have received.


Well, then as they say "a sucker is born everyday"


I'm a patron of Kaze, and all I want is to keep seeing amazing N64 romhacks. Though having my name appear in the video credits is kinda nice too.


Great work, but a significant fraction of the improvement comes from installing and using the 4MB RAM expansion pak, which isn’t really a fair comparison.


Installing and optimizing the code to make use of the 4MB expansion pak*


My understanding, which may be incorrect, is that while one of the base optimizations he performed requires the second 4MB, most of the rest of the optimizations definitely do not.

So while I would not be surprised if he made some changes that assume more RAM is available in total beyond what was mentioned in the video, I believe most of the refinements covered could still be applied without.


Would be nice if it became the norm that the source code and assets for games were available as an open source purchase after X years


25 years ought to be enough, no?


Very interesting video, and it brought back a lot of nostalgia. At the time this came out I was blown away by it. I remember watching some kid play the Bob-omb level at the department store. Looking back as an adult I no longer feel the awe, but I have the greater satisfaction of learning about the RAM bus and low-level programming considerations.


I remember playing it at the store thinking we had reached the pinnacle of computer graphics, and that there was no way they could get better. Boy was I wrong! Still one of my favorite games of all time.


Does this mean that Nintendo could have theoretically shipped Mario 64 as a 60 Hz game on N64? I know some of this stuff requires the RAM expansion, but not all of it.


This is a serious amount of effort for what I imagine is little payout. I really enjoy work like this and hope that he is able to make a living off it.

Anyone have thoughts on how to make projects like this more sustainable?


Someone on HN once said, if you want to build a reputation and a brand, do interesting things and talk about it. This is the perfect example. No, this will not make him money because there is not a large market for superior Mario 64 performance, but you sure as hell know who he is now, don't you?

And I will say his youtube skills are top notch. The video is very entertaining and informative. I just cannot imagine this person will have trouble making money in the future.

EDIT: Added a missing word


This seems to clearly be a passion project. The author is being paid in the satisfaction that he gets in improving this famous game way beyond what the original creators managed to do, all while being bound by the same technical limitations. He also gets to show off his skills to a fairly large (in absolute numbers at least) global audience of people who are also really into this particular game. I doubt you would see someone going to such great lengths for an obscure title that few other people would appreciate or care about.


People often do things for their own merits and not just in exchange for money.

But the guy has a Patreon and this video already has a half a million views which should be at least $1000 in revenue, possibly 2-3x and it's still quite new.


You can give him money using Patreon: https://www.patreon.com/Kazestuff


He does have a Patreon so it's not entirely volunteer work.


Art doesn't need to have a payout or be sustainable.


Submitted this a few days ago but it did not get any points. I have no idea how HN works.


Time of day and day of the week can impact such things. Lastly, that's just the fickleness of social media sometimes. Sometimes a thing will take off and sometimes it won't.

A recent example is Among Us, which was out for over a year as almost a 'failure' with very few players. It received no major updates and went by unnoticed until the right people pushed for it, and now it's seeped into so many other things. e.g. Among Us player count: https://i.imgur.com/M1j0UOw.png


is there any part of this video that is a side by side comparison? didn't see any obvious marks in the description or comments


At some point, these older games should be open source. They can follow the doom/quake model where textures/art aren't included, and Nintendo can also still sell the games on newer consoles for convenience because not everyone will want to compile it themselves

I guess you could argue a mario 64 custom levels mod competes directly with a new mario game. But halo and cod sell fine. Maybe I have a bias that thinks Nintendo is fearful that custom mods for old games will be more popular than anything new and official


Copyright terms should be limited to 20 years. Open source or not, this project would be legal in that case.


I think about my fundamentalist feelings about copyright (somewhere between complete abolition and, say, the 20 years you suggest) and the practical reality of how that would fundamentally change content industries today. For example, I'm thinking of anime franchises, also Disney, etc., who essentially create whole industries out of remakes and repackaging of their old IP. Essentially, millions of workers hired for the less creative recreation/maintenance of these IPs would be reshuffled and/or lost. Btw, this isn't just creative workers like writers, animators, and the like, but also things like merchandise creation, manufacturing, live events, and so on; there'd be much less of an incentive to maintain a franchise over the mere creation of works that might be separate from each other.

Apart from the business/worker side, the ability of an IP holder to release continuing sequels to works (like manga, movie installments, etc.) and the legal force against fan works essentially gives the original creator more power to shape what is "canon" in their work. Perhaps fans could still decide the original creator's later installments are canon, but the extra legal force sort of makes this de facto, while in a world of limited copyright the power of the original creator is less so. Thus, the way fans consume and experience media will change completely as well.

It would just be a completely different world, one where a lot of these ills (like the ones mentioned by the GP) are addressed. It's hard to say from my vantage point whether the differences would bring their own "ills" (and that might be my bias), but it would certainly be a different world with different rules and different expectations from fans and audiences.


I actually think we would see Disney remaking some of its old IP (or at least the popular properties) more often if copyright were only 20 years.

If they released remakes at the 20-year mark, anyone trying to take advantage of the now-expired copyrights would have to compete with the new "official" versions of the IP.


I'm not sure copyright protects you much against someone continuing a story. Trademark protection might.

At any rate, fan fiction is nothing new, and while it might be a bit iffy to do commercially, I see nobody even trying anything remotely similar without at least the blessing of the original author.

So really I think society has decided who is allowed to continue a story and as far as I can tell it has little to do with copyright and everything to do with public perception.


> Essentially, millions of workers hired to maintain the less creative recreation/maintenance of these IPs would be reshuffled and/or lost.

The other side of that coin is millions of workers now available for new and original projects that have to take real chances and break new ground.

> Thus, the way fans consume media and experience it will change completely as well.

Should we prevent creators from selling their copyrights as well, or do we simply make the "canonical" rights non transferable in that case?

> but it would certainly be a different world with different rules and different expectations from fans and audiences.

Vaudeville and burlesque operated under those rules. Audiences didn't seem to mind. "Who's on first" was not an original act.


You are confusing copyright with trademarks. You can't release another official Harry Potter movie just because it would be legal to copy the old ones.


My controversial opinion is that anything that cannot be bought for a reasonable amount of money also cannot be "pirated" (the act of copying it should not be considered illegal piracy).

The main reason piracy is illegal is that studios/producers/writers lose money because people pirate instead of buying. If I cannot buy, e.g., Super Mario 64 from Nintendo in my country, and I copy the game (and emulate it or whatever), I never actually caused any monetary loss for Nintendo by doing that.


You can still buy Super Mario 64. It’s part of the paid subscription virtual console service on Nintendo Switch


So, no copying Super Mario 64 then.

But my opinion stands for things that are not available this way.


For a long time Super Mario 64 was not available this way, but now it is. How do you handle that case? It's a hell of a lot easier to just have a fixed duration after which copyright expires.


If someone sells nude photographs of themselves to a rich person for $1 million, does that mean you deserve to have them for free?


Under current law, you absolutely can get them for free after the copyright expires (if copies are still around). And you can make as many copies as you want.


If you already saw them... and had a chance to photocopy them... why not? How much money did the artist/model lose by you doing that (assuming a normal person would never pay $1 million for that)?


Nintendo games have much higher replay value than the others you mentioned. They are mostly offline, they have distinct mechanics beyond graphics, and people will happily buy them on multiple consoles. They are not rotting; these are active revenue streams.

Nintendo isn't afraid of mods; they would rather just sell you Mario Maker and create the experience themselves.

Quake and CoD are mostly online, and there's no community anymore. There are over 30 CoD games, so there's no scarcity of the experience since they're so similar. The one mostly offline game series you mentioned, Halo, is closed source and still sold today. Quake 3 is far less likely to be bought and played today than Mario 64. Those games are not active revenue streams and don't have a good path to become one again.


I'm not sure what Nintendo "thinks", but they've been protective ever since the NES days, which arguably allowed them to pull the video game industry out of the funk that befell the other 8-bit consoles. I'm not sure there's any philosophy behind it; it just seems to be corporate culture to be controlling of their consoles and their IP. A while back, I believe a game with an interpreter in it was removed from the Switch's store, even though sanctioned BASIC interpreters exist there. They've always been control freaks, to the point where it doesn't even make much dollar sense.


Btw, that video game crash was limited to the US - it didn't happen in Japan or European markets. I remember complaints that other language Wikipedias covered it as a global thing anyway because they were translated from English.


This reminds me, a little bit, of this fact about Encarta: https://www.tampabay.com/archive/1999/07/19/encarta-differen... (first link I found)


Interesting piece of history! I am too young to have experienced it; I mostly know about it from reading video game history, and yes, the only makers cited were Atari, Commodore, and other American companies.


In my opinion, by far the biggest mistake with copyright is that it depends on the lifespan of the author; it should depend on the release date.

The whole "70 years after death" nonsense is just an ugly hack to compensate for the fact that a purely lifetime-based copyright would last fewer years if you are older. It doesn't even solve that problem: if you are younger you still get more copyright years.


Only under an economic system that abolishes private property. Won't ever happen under capitalist governments, unfortunately.


Intellectual property is imaginary property, private or not. It's not a real thing until committed to a medium.

Upkeep of the fiction that intellectual property is real property has societal benefits for a limited time insofar as it encourages works of art - and the original idea of copyright was reasonably limited. Over 100 years is inhumane.


One might argue that enforcement of traditional property rights is inhumane in many cases too.


I certainly would :)


No, it has benefits only for the private property owner, never for society at large. All private property is this way. Capitalism is a system that deifies private property, so it will only ever get more and more 'private'.


If capitalism is the nail, private property is the hammer. Quite a lot of development that the United States wanted to see happen in the west (which enriched it immensely) was done because it was willing to make grants of private property to the companies (largely railroads) doing the development.

You can debate whether capitalism is (or should have been) the only method for getting things done, but the use of private property as a viable incentive for driving innovation within that system is not really debatable.


This is pretty much the take you get right out of an American high school: full of propaganda. It is extremely simplified, ignores massive amounts of genocide and slavery, and ignores the historical events it could be compared to. The development of America as a nation was built on the corpses of hundreds of millions of Africans, Chinese, and Native Americans, who were notably stripped of their 'right' to private property so that someone else's 'right' to private property could exist.

You do not in any way need to enslave people in order to develop land; only to develop land AND enrich a bourgeois class of people.


I never said any of that. I think you need to take a step back and re-read my comment. I'm not endorsing capitalism; I'm merely saying that so long as capitalism is the one way you can think of to get things done, giving people property in exchange for development is almost always the way that development happens.


This somehow didn't work for ISPs at all.


Wow, so, ok. Former N64 programmer here, so my take is certainly biased but here goes.

This guy has clearly done a lot of work, building on top of the existing decompilation of the Mario 64 codebase, and has a great understanding of the heavily dissected 25-year-old hardware. And that's where my praise stops.

The level of snark here is outstanding. Maybe it's a cultural thing, maybe he's trying to hype up his Patreon, but he comes across to me as a complete egotistical a-hole. Does he really think the Nintendo programmers were 'smelling the grass' rather than working 24/7 to get a ground-breaking game shipped? He does his best to call the team that made several of the most influential and important video games of all time incompetent.

Breaking down his snark:

- The RSP/RDP co-processor had microcode that the game programmers interacted with (you generated a rendering command list on the CPU and called a library function to start the RSP processing that scene). So any criticisms of things the Mario team could have moved to the RSP (i.e. billboard calculations) are shortcomings of the microcode made available by the NCL driver/hardware team - post-launch there were regular revisions to the microcode and optimizations made available.

- A lot of the matrix operations were provided in library form to game teams, and you were encouraged (forced) to use them rather than rolling your own - there was probably a concern that an overly aggressive optimization would send bad data to the RSP/RDP and cause issues on current or future hardware revisions. The divide-by-zero checks were likely there for similar reasons (a rough sketch of the kind of guard I mean is at the end of this comment) - the RSP microcode could run in a mode that skipped near-plane clipping of triangles, which was faster but would blow up spectacularly (huge jaggie triangles all over the screen) if you had triangles that did indeed clip into the near plane.

- I have not seen the Mario 64 source, but I imagine some of the lowest-level code was written in MIPS assembly and has been decompiled into C. If that is the case, there were likely branch-delay-slot optimizations and cycle-counted operations that have been trampled by the reverse engineering. I may be off base here, so I would love to know if that is the case.

- Yes, the gameplay code has unused variables, duplicated code, etc. But this was a brand-new genre of game; there would have been a lot of experimentation and tuning, things would have been in flux until shipping, and there certainly was no time to refactor and run the risk of breaking something.

- I suspect the debug flags in the gameplay code were to avoid compiler bugs. The MIPS compiler would sometimes emit back-to-back floating-point operations that caused the N64's MIPS CPU to hang (but were fine on all other MIPS processors). That may have affected the gameplay code with optimizations enabled. Later there was a command-line tool you had to run on your game ELF that looked for this bad combination of instructions (and you would then change the offending code until the compiler no longer emitted the 'bad' instruction sequence). GCC was not allowed by Nintendo until much later, and had its own issues.

- Older floating-point units were less predictable with respect to numerical accuracy, rounding errors, etc. If early testing showed differences in game control/feel between the debug and optimized builds, then I can see why the team would be more comfortable shipping the debug build rather than risking a bug that was not picked up until after a million or two cartridges had been manufactured.

- The memory bank conflicts were eventually well documented and games worked with/around that knowledge, but Mario 64 was likely well into development before that information was known. As others have pointed out, using the memory expansion module is cheating: it gives you a whole bunch of un-conflicted memory banks to use, and 2x the memory also opens up a lot of optimization opportunities.

- I don't see a 6x speedup. I see ~10% gains in most systems and a best-case frame-time improvement from 29 fps to 41 fps. Which is pointless. An NTSC TV runs at 60Hz: you either hit 60 fps or 30 fps; anything in between results in horrible screen tearing, or your frames bounce between 60 and 30 and movement feels horrific. The game was optimized to hit its targets and to ship.

- His split-screen optimization is crap! Reducing load on the RDP (rasterizer) by killing rendering area (big black bars on the screen) is lazier than most of what he snarked about. The slightly better approach would be to abuse the TV output timing registers so that the screen buffer can be smaller than 320x240 but still fill the screen. You trade black bars for smeary sludge, but on the target NTSC TV that was generally the way to go if you didn't want your game rejected by Nintendo QA. Or actually rewrite the renderer and reduce the amount of pixel fill.

- And a minor nitpick: the sound processing was not done on the CPU (as he claimed). The CPU built audio command lists that the RCP then used to generate the output waveforms.
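
To illustrate what I mean by a defensive divide-by-zero guard, here's a rough sketch (hypothetical code, not the actual library source, so the function name and fallback value are made up):

    #include <math.h>

    /* Hypothetical guard, not real libultra code: bail out on a zero-length
       vector instead of sending NaN/Inf vertex data on to the RSP/RDP. */
    static void vec3_normalize_guarded(float v[3]) {
        float mag = sqrtf(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        if (mag == 0.0f) {
            v[0] = 0.0f;
            v[1] = 0.0f;
            v[2] = 1.0f;   /* arbitrary safe default direction */
            return;
        }
        v[0] /= mag;
        v[1] /= mag;
        v[2] /= mag;
    }

The check costs a branch per call, which is exactly the kind of thing the video optimizes away, but it keeps garbage data from ever reaching the microcode.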


> The level of snark here is outstanding. Maybe it's a cultural thing, maybe he's trying to hype up his Patreon, but he comes across to me as a complete egotistical a-hole. Does he really think the Nintendo programmers were 'smelling the grass' rather than working 24/7 to get a ground-breaking game shipped? He does his best to call the team that made several of the most influential and important video games of all time incompetent.

I just rewatched this video and didn't get any of that, like, at all. There's a little banter to be entertaining, but I think you're reading too much into it.

>- I don't see a 6x speedup. I see ~10% gains in most systems and a best-case frame-time improvement from 29 fps to 41 fps. Which is pointless. An NTSC TV runs at 60Hz: you either hit 60 fps or 30 fps; anything in between results in horrible screen tearing, or your frames bounce between 60 and 30 and movement feels horrific. The game was optimized to hit its targets and to ship.

The idea is to get more headroom for his mods.


>Which is pointless.

Nope. You can target 30 FPS and use the spare cycles to do much more.


I read his snark as being self-deprecating. Rather than insulting the Mario devs (who, as you note, were building a ground-breaking game on a deadline), he's making a joke about how obsessive someone would have to be to dig deep into a 26-year-old game's codebase and hardware internals in order to get acceptable performance on a mod that does things to the game engine that the devs had no intention of implementing.

Thanks for providing more context for what's going on - all of this stuff is neato.


I guess I got the snark from things like this... "I don't fault them for making MOST of the mistakes, C was a rather new language at the time, they didn't know the hardware they were working with and EVEN programmers in 2022 would make many of these MISTAKES. Besides you either get to know about these things or spend time learning about these things or you get to go out and touch grass it's an either or, you can't do both".

Emphasis mine.

He regularly trashes the quality of other people's code in his twitter https://twitter.com/KazeEmanuar/status/1343569532976304131 despite building his following off the years of work of those same people.


Yeah, he definitely comes off as an ass in that thread of Tweets.


"The level of snark here is outstanding."

I didn't interpret it as snark but as attempts to be funny - which for me were just irritating. I can't imagine how much time it took to put in all those "funny" images but for me it was wasted time. I could have done with 20-30 fewer "vroom vroom"'s too.


> The level of snark here is outstanding. [...] 'smelling the grass'

I thought his snark wasn't too off the charts in this video. I'd bet it's more a cultural thing. If you were working on the N64, you're probably like two decades older than Kaze's target audience for his YouTube videos, haha. Like, he said that you can either be a weirdo who spends all day optimizing nearly-30-year-old source code, or someone who "touches grass"[1]. He was, of course, referencing his own lack of a life, not Nintendo's programmers. But you'd have to be pretty familiar with current online memes and insults to get his joke.

> I have not seen the Mario 64 source but I imagine some of the lowest level code was written in mips assembly but has been decompiled into C.

No, we only decompiled C code back into C code. The decompilation matches 1:1 with the released ROM images.[2] IIRC, the only non-libultra ASM code was the decompressor.[3] The C code was pretty obvious due to the "wonders" of IDO -O0 optimization. We were forced to get surprisingly close to what Nintendo wrote. We were stuck with an `f32 [3]` 3-vec instead of a `struct`...
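
(Roughly this kind of thing; I'm sketching from memory rather than quoting the repo, so treat the names as illustrative:

    typedef float f32;
    typedef f32 Vec3f[3];               /* a bare f32[3] "3-vec", no struct */

    void vec3f_copy(Vec3f dest, Vec3f src) {
        dest[0] = src[0];
        dest[1] = src[1];
        dest[2] = src[2];
    }

Matching the original codegen meant keeping plain arrays like this rather than tidying them up into structs.)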

> I suspect the debug flags in gameplay code were to avoid compiler bugs.

As far as we could tell, this wasn't an issue. IDO 5.3 had no issues that we could find when compiling at -O2, but then again it was decompiled code. I personally subscribe to the '-O2 -g' IDO meme, but Giles Goddard thought it was out of fear of the unknown. So, I'd trust him.

We had the `mulmul` patch for IDO. I don't know the chronology for the release of that patch to devs, though. We also haven't been able to find any N64s that have the mulmul issue.

> An NTSC tv runs at 60Hz, you either hit 60fps or 30fps, anything in-between results in horrible tearing of the screen or your frames bounce between 60 and 30 and movement feels horrific.

I wonder what Kaze is doing. I know that there was work on a variable framerate mod/revamp for SM64. I'd guess it'd be outputting at the N64's 60 fps mode with duplicated frames, but you'd have much better luck asking one of the modders/coders who were working on that a couple years ago.

----------

[1] which is Twitter's insult du jour: https://knowyourmeme.com/memes/touch-grass

[2] https://github.com/n64decomp/sm64

[3] https://github.com/n64decomp/sm64/blob/master/asm/decompress...


One tidbit that may be of interest (or not): if you see doubles being used in N64 C code, they are almost certainly accidental. Doubles were slow on the N64's MIPS CPU, but the MIPS C compiler would happily emit them and promote the surrounding floats to keep the precision it assumed you wanted.

    float myValue = x_float + y_float * y_float * 3.0;

Looks ok, except the constant is a double. Performance killer.
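
For comparison, the fix is a single character; the trailing f keeps the constant (and therefore the whole expression) in single precision:

    float myValue = x_float + y_float * y_float * 3.0f;  /* float constant, no double promotion */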

You'd get pretty good at seeing the missing f (and we eventually wrote some simple tooling to find functions using doubles) but I'm sure a number of games shipped with accidental double calculations.


hahah yes the "0.0f" is different from "0.f" is different from "0.0" is different from "0" is a great IDO meme as well. One of the changes we had to make when we started to support the PAL release, which was compiled at -O2, was to go back and change some constants from explicit "1.0f" to "1" to get the code-gen to match. IDO -O0 didn't care so long as you got the f32 vs f64 constant correct. But, IDO -O2 has different codegen.

The wonders of C constants. Sometimes IDO can do the conversion at compile time, and sometimes it can't. But the compiler will treat "0.0f" and a compile-time "(f32)0.0" as different constants for codegen purposes.
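
A contrived example of the kind of constant juggling involved (assuming the usual f32 typedef; the exact codegen differences depend on the IDO version and -O level, so treat this as a sketch):

    typedef float f32;     /* the usual SDK-style typedef */

    f32 a = 1.0f;          /* explicit single-precision constant */
    f32 b = 1;             /* integer constant converted at compile time */
    f32 c = (f32)0.0;      /* double constant cast down; IDO may track this as a separate constant */

All three end up as floats, but IDO at -O2 doesn't necessarily emit the same code for them, which is why we sometimes had to rewrite "1.0f" as "1" just to match the PAL ROM.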

We have a kinda outdated document of weird IDO behavior which is worth a read if you want to trigger some PTSD: https://hackmd.io/vPmcgdaFSlq4R2mfkq4bJg#Compiler-behavior

Also, since SGI/MIPS/IDO came out of Stanford (I think...?), there is actually a surprising number of published papers on IDO's compilation model. Which comes in handy if you are trying for the ultimate joke of decompiling the written-in-Pascal compiler into C: https://github.com/n64decomp/ido


Thanks for filling in the details on the decompilation. I'm surprised the matrix functions weren't in asm, but I believe you. If I remember correctly, NOA did the CPU/library side and NCL did the RSP code (definitely in asm) and the hardware side.

The back-to-back multiplies definitely could cause hangs on real hardware (actually, I'll back up and say they could certainly hang a dev board, and I think the hardware was the same as production). I don't know if particular registers or data caused the hang, but I remember QA picked up hangs with builds that had them in there.

Adding extra frame buffers lets you output at 60Hz without tearing, but you'll bounce between duplicated frames and single frames. You technically get 40 fps (or whatever), but smooth camera movements feel 'off'. That extra frame buffer eats pretty badly into the remaining space in the 4MB base memory (at 640x480 you could have just one frame buffer and render interlaced, so long as your frame rate always hit 60Hz).
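
For a rough sense of the numbers (assuming 16-bit color):

    320 x 240 x 2 bytes = ~150 KB per framebuffer
    640 x 480 x 2 bytes = ~600 KB for a single interlaced hi-res buffer

So double- or triple-buffering at 320x240 already ties up 300-450 KB before you've loaded any code, geometry, or audio.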


If you think this video was snarky, I also watched the older one he references and I think it was worse (despite the initial caveats): https://youtu.be/uYPH-NH3B6k


Small thing, but I hate how much of the modding community (and retro gaming) has turned to videos and YouTube. I just don't have time to watch a 20-minute video on these things. I'm often searching high and low for basic documentation and descriptions, and Google always recommends some 20-minute video; it's infuriating.


- Speeding up the video saves you time

- Listen to the audio only and occasionally take a peek; don't tell me you are being productive 16 hours a day

- Video games are visual, so it's natural to present them with video; you can hardly blame the authors for that


He's not presenting a video game -- he's presenting the technical details of speeding up a video game.

90% of the video is just stock footage in the background while he talks about the technical details. A better approach would be a written article supported by a video demonstrating the speed up.


The good thing is that a platform like YouTube can support creators like him spending so much time on things that perhaps wouldn't otherwise be financially feasible, even if his contributions to the modding community aren't as easily digestible.


At least for Super Mario 64, Ukikipedia is a good text-based resource: https://ukikipedia.net/wiki/Main_Page


Just a note: these wikis are great. nesdev really saved me, for example. I am thankful for people who put things like this out there.


Just the modding and retro communities? YouTube has done that to practically every other community, as was to be expected whenever you suddenly introduce the dangling carrot of money.

I really hate the type of guy who is lurking around enthusiast forums looking for topics to "cover" on his next YouTube stream. Yes, these guys exist and yes they do that.

It's not really an improvement in quality, but it is a degradation: more wasted time and less cooperation, since communication becomes mostly one-sided again.


>I really hate the type of guy who is lurking around enthusiast forums looking for topics to "cover" on his next YouTube stream. Yes, these guys exist.

Actually, that sounds like a decent idea. You lose nothing by not watching the video and focusing on the forums instead, and others who would not otherwise spend time on the forums can be made aware of exciting developments.


You get infuriated that talented modders and hackers perform research and entertain you for free?


Given how much competition there is in "the scene" for clout alone, there is no need to dismiss any criticism with "but it's free". Consumers/end users/spectators can absolutely complain and don't have to put up with anything, and the production value increases as a result.


Idk, for content that doesn't teach me anything important I prefer video over text, as I can play it on the side while doing something else (that doesn't require too much focusing) and if something sounds interesting/important I might pay attention for a while.


This is a generic complaint about any information nowadays. I suppose you could read the transcript of the YouTube video! That's available.


Yeah, being to the point is crucial in these days of abundant content. Sometimes I wish they reinstated the 10-minute limit.


What I do is skip relentlessly through videos (using left/right keys on PC or tap-tap on mobile) and adjust the playback speed to be as fast as can be intelligible, I can get the gist of a 20 minute video in 2-3 minutes this way. Then, if something catches my attention, I go back and watch that at normal speed.


I agree. So often I'm looking for how to do something in Photoshop, and the video is 10 minutes or more when the answer is just "use this tool".



