RP2040 Doom (kilograham.github.io)
447 points by xkriva11 on March 14, 2022 | 79 comments



Awesome! A huge amount of work must have gone into this.

I did some playing around with VGA graphics from the Pico when it first came out (wrote a simple library to produce SNES-like graphics and wrote it all up on my blog: https://gregchadwick.co.uk/blog/playing-with-the-pico-pt6/). It felt like Doom should be doable, but I figured you'd need an off-chip RAM expansion interfaced via the PIO. Clearly not.

The Pico really is a very fun board to play around with. Could be a great target for a retro style mini console thing.


> Could be a great target for a retro style mini console thing.

I've been playing with this thing. It's quite neat: https://shop.pimoroni.com/products/picosystem?variant=323695...

(Note that it's a dev board; if you just want to play AAA games, not the thing to buy. If you want to program a game and show it off, it's what you want.)


The PicoSystem is really neat and has awesome build quality.

One thing I really like about it is the super quick boot compared to all normal devices that need to boot an operating system first.

Highly recommended.


+1 for build quality. I love my PicoSystem even if my goal of learning to program a game for it has fallen flat, ha.


I built a retro-style game console for myself and am now working on building games for it. I never thought Doom was possible without a lot of external hardware for RAM and storage.

https://codetiger.github.io/blog/building-a-retro-style-game...


Costing £58.50 it's way too expensive. ESPlay Micro is only $32 and has better specs.


Thanks for highlighting this - looks awesome.


These are truly fab little devices. I bought a couple for my kids to play simple (not too addictive) games and hopefully get them into programming.

The question is, how would one go about compiling Doom for the PicoSystem? That would be so cool…


Fun VGA experiment -- thanks for writing it up. I've done VGA with FPGAs but I like how Pico is way cheaper with an open tool chain and great accessories.


For a retro mini console, VGA32 based on ESP32 with FabGL is a better choice.


This thing even has networked multiplayer! In about 256K of RAM and 2MB flash! (It's the Raspberry Pi Pico board)

Carmack would be proud!


Those are important limitations, but there's a lot of room for solutions when you have a dual core CPU which is many times faster than a 486.


Doom 1 itself has 4MB of RAM as a minimum requirement:

https://www.computerhope.com/games/games/doomx.htm#doom


The article talks about this: the chip lets you access flash as if it were very slow RAM (fronted by a 16K cache of actual RAM). This allows the author to do things like access the levels and textures directly without loading them into RAM, and in fact to store them compressed and use the second core to decompress them on the fly.
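
To make the "flash as slow RAM behind a small cache" idea concrete, here is a minimal Python sketch. The class name `CachedFlash`, the line size, the number of lines and the direct-mapped layout are all assumptions picked to keep the demo readable; it is a toy model, not a description of how the RP2040's XIP cache is actually organised.

```python
# Toy model: a tiny direct-mapped cache in front of slow "flash".
# LINE_SIZE, NUM_LINES and CachedFlash are illustrative assumptions.

LINE_SIZE = 8   # bytes fetched from flash per miss (assumed)
NUM_LINES = 4   # cache slots (assumed; far smaller than any real cache)


class CachedFlash:
    def __init__(self, flash: bytes):
        self.flash = flash
        self.tags = [None] * NUM_LINES   # which flash line each slot holds
        self.lines = [b""] * NUM_LINES   # cached copy of that line
        self.hits = 0
        self.misses = 0

    def read_byte(self, addr: int) -> int:
        line_no = addr // LINE_SIZE
        slot = line_no % NUM_LINES       # direct-mapped: one possible slot
        if self.tags[slot] != line_no:
            # Miss: pay for one slow flash access and keep the whole line.
            self.misses += 1
            start = line_no * LINE_SIZE
            self.lines[slot] = self.flash[start:start + LINE_SIZE]
            self.tags[slot] = line_no
        else:
            self.hits += 1
        return self.lines[slot][addr % LINE_SIZE]


flash = bytes(range(64))
mem = CachedFlash(flash)
values = [mem.read_byte(a) for a in range(16)]  # sequential reads touch 2 lines
print(mem.hits, mem.misses)
```

Sequential reads mostly hit the cache, which is why reading level data straight out of flash can be tolerable; scattered reads would miss far more often.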


As a layman’s approximation, is this how modern x86 CPUs with multi-level caches work, or is it completely different from such things?


Not the same but not totally different. The MCU abstraction is simple and more like vintage stuff. So it would be closer to an 80s system that executed many routines from memory-mapped ROMs (in addition to system RAM) with an instruction cache on the CPU.

MCUs can have on chip RAM/ROM and off chip [quad] serial RAM/ROM, and even parallel access RAM/ROM like FRAM. Several ways to skin the cat. Or cut the pie, as it were.


Much of that is for art assets. Do it with fewer or lower-resolution textures and sprites, and you could get away with quite a bit less. The executable code can fit in well under one megabyte. You could even procedurally-generate the art, if you've got way more CPU core available than storage.

The SNES ran Doom with two 64k RAM banks (albeit with textures and data such as level geometry running directly from ROM.)


The SNES used a dumbed-down version of Doom:

https://doom.fandom.com/wiki/Super_NES

This port, by contrast, makes a point of porting everything accurately, including multiplayer.


In contrast with these limited platforms:

SNES Doom - 128KB RAM, I think a 2MByte ROM, CPU is 65816 at 3MHz + SuperFX RISC CPU at 21MHz, which also had its own 64KB of RAM.

PSX Doom - 2MB RAM + 1KB fast scratchpad, able to load from a standard 650MB CD, CPU is a MIPS R3051 at 33MHz + the PSX accelerated graphics, not used except to draw strips

So doing this in a device that has not too much more RAM than the SNES and also has to livestream the VGA signal Atari 2600 style is exceedingly impressive. It's a dual CPU unit but basically having to spend a core manually bit banging the VGA signal like that is what fascinated me the most.


I haven’t checked but don’t think that the second core was bitbanging the VGA signal. The RP2040 has PIO (programmable I/O) mini cores that can read directly from RAM (DMA) and address the GPIO pins directly. They most likely used that to their advantage.

Edit: yes, see https://kilograham.github.io/rp2040-doom/rendering.html


FWIW, it's possible to bit-bang DVI at 640x480 on the 2040. Takes about half of the available resources:

https://github.com/Wren6991/PicoDVI


That requires a hefty overclock though (252MHz instead of 133MHz).


DOOM overclocks it to 270MHz. :)


I'm not sure why you are just comparing RAM when the newer system has a huge advantage in being dual core and having a faster clock.


That's cool, it's Doom in a completely self-contained cartridge size. Would it be possible to just hook up a cartridge like this to a monitor (through e.g. usb-c) directly?

Also if the RP2040 is just $1, does this mean we should be able to get e.g. Doom on cheap handheld single-game devices like the old Game & Watch and similar machines? I remember spending hours on these "racing games" or 12-in-1 Tetris LCD machines from the toy shop. How much does a small (2-4") color OLED or backlit LCD cost these days? Actually, what is the cheap handheld market looking like these days? I had a boggle at the local toy shop's website, VTech is still going for it but mainly with baby toys it seems, and those Tetris handhelds are still the same from 20-30 years ago, they cost just €3,99 these days. I'm also seeing some products from a company called Wonky Toys, and miniature Atari arcade cabinets.


There are already a lot of cheap game consoles based on ESP32 which can run Doom and even more advanced games:

QT Py ESP32 Pico: https://blog.adafruit.com/2022/01/24/is-this-one-of-the-smal...

Doom Boy ESP32: https://habr.com/ru/post/512130/

LilyGo FabGL VGA32: https://m.aliexpress.com/item/33014937190.html

ESPlay Micro V2: https://www.makerfabs.com/esplay-micro-v2.html


What a coincidence, I was working on porting doom to the e-ink badger2040 last week. Getting doom to fit into memory was fairly straightforward, but they did a better job than me. I'm very impressed they got the original WADs and networking going as well. Great work!


Hah! I never imagined DOOM as a use case when I was designing the Badger 2040 - how foolish of me, in hindsight it's obvious!

How far did you get with it? Any video of the end result?


I got things drawing with low-complexity WADs, but had issues/graphical snow after a few seconds in and needed more polish to fit the original WAD in memory. I figured the video would be best if it opened with the hangar level, so I haven't made one yet. Might be worth rebasing off this effort instead.


This is impressive.

I'm wondering about a few things:

- "I decided to leave the XIP cache to do its thing, and select a few small areas of hot code or data to promote to RAM manually"[1]: I understood this as you leaving the XIP cache activated. But this seems at odds with "16K of flash XIP cache, that we’ve talked about, but decided not to use."[also 1], which I'm interpreting as "decided not to make use of the XIP cache (i.e. turn it off)" (maybe I'm misreading).

- I thought ARM32 has 12(-14) usable registers (compared to 14-15 in x86-64), so why these mentions of "scarce Cortex-M0+ registers"? (Does FIQ mode reduce the number of usable registers?)

- "not good on a Cortex M0+ where the overhead of a function call is generally 30-40 cycles, with the corresponding loss of most of your precious “in-register” state": are function calls disproportionally slower on Cortex M0+? (Certainly 30-40 cycles seems high.) Why is that? (Registers r4-r11 are callee-saved[2], thus not lost; mutable data might have to be re-read from memory, though--just like on other architectures, but maybe CPU caches are faster on those.)

- "These OR values can be stored in a lookup table indexed by higher bits in the sample position, and thus the 8x space savings can be realized without needing any branches in the code!"[3]: Cortex-M0+ has a 2-stage pipeline[4], I'd hence expect the cost of a jump to be just 1 additional cycle, for the re-processing of the 1st stage for the next instruction (maybe I'm wrong), which would be the same as a memory access. (Maybe multiple jumps can be saved this way, though.) Did measurements show the lookup table to be faster?

[1] https://kilograham.github.io/rp2040-doom/speed_and_ram.html [2] https://en.wikipedia.org/wiki/Calling_convention#ARM_(A32) [3] https://kilograham.github.io/rp2040-doom/sound.html [4] https://en.wikipedia.org/wiki/Cortex-M0%2B#Cortex-M0+
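
For readers who haven't seen the trick quoted in the last bullet, here is a generic sketch of replacing a chain of branches with a lookup table indexed by the high bits of a position. The table contents, the bit widths and the function names are made up for illustration; they are not taken from the article's sound code. Python also can't say anything about Cortex-M0+ cycle counts, so this only shows that the two forms compute the same thing.

```python
# Illustrative only: OR_TABLE's contents and the 8-bit position are assumptions.
OR_TABLE = [0x00, 0x10, 0x20, 0x30]   # one entry per value of the top 2 bits


def or_value_branchy(pos: int) -> int:
    """Pick the OR value with explicit branches on the top 2 bits of pos."""
    top = (pos >> 6) & 0x3
    if top == 0:
        return 0x00
    elif top == 1:
        return 0x10
    elif top == 2:
        return 0x20
    return 0x30


def or_value_table(pos: int) -> int:
    """Same result with a single indexed load and no branches."""
    return OR_TABLE[(pos >> 6) & 0x3]


same = all(or_value_branchy(p) == or_value_table(p) for p in range(256))
print(same)
```

Whether the table wins on real hardware depends on how many branches it replaces and where the table lives, which is exactly the measurement question asked above.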


> - I thought ARM32 has 12(-14) usable registers (compared to 14-15 in x86-64), so why these mentions of "scarce Cortex-M0+ registers"?

Cortex-M0+ is thumb-only (compressed 16bit instruction encoding). "In Thumb state, the high registers, r8-r15, are not part of the standard register set. The assembly language programmer has limited access to them, but can use them for fast temporary storage."

> (Does FIQ mode reduce the number of usable registers?)

There is no FIQ mode in Cortex-M. Instead you usually have the nifty Nested Vectored Interrupt Controller (NVIC), and it is designed so your interrupt handlers can be regular C functions, with no special handling (no special interrupt return instruction) needed.


In Thumb-2 virtually all ARM32 instructions are available, some as 32-bit encodings. But even the 16-bit encodings include some instructions that work on high registers.


> - "I decided to leave the XIP cache to do its thing, and select a few small areas of hot code or data to promote to RAM manually"[1]: I understood this as you leaving the XIP cache activated. But this seems at odds with "16K of flash XIP cache, that we’ve talked about, but decided not to use."[also 1], which I'm interpreting as "decided not to make use of the XIP cache (i.e. turn it off)" (maybe I'm misreading).

The way I'm reading it, you can disable the XIP cache and use that 16KB of RAM for anything else you want. But the author "decided not to use" it for something else, that is, the 16KB are still being used as cache.


> You can refer to Doom Wiki - WAD section to get a bit more detail about the types of lumps mentioned below.

I just HAVE to nitpick on Fandom/Wikia search term squatting. There is a real Doom Wiki at doomwiki.org.


The Pi Pico reinvigorated my love of tinkering with electronics. I can hack my way through C on an Arduino (and would probably still use it for any serious deployment that I didn't expect to turn into a big community effort) but for standing up quick proof of concepts, embedded python is outstanding. Incredible for $4.

These newer compatible boards being released are awesome.


Try ESP32 for about $3, you will be amazed, it can even run a full PC emulator (thanks to FabGL and VGA32).


Cool!


Fun stuff, it always amazes me that people are surprised. Not having lived through it is a part of that I'm sure.

The RP2040 is more powerful than the 80286 in the PC/AT, which was itself hugely more powerful than the original IBM PC (on which DOOM also ran). Put a keyboard, mouse, and a frame buffer on an STM32F4 or F7 and you've got the computational and capability equivalent of the PCs that powered the world in 1985. People did accounting, CAD, spreadsheets, email, all sorts of things on them. Amazing I know, but here we are.


Some good discussion here, but you've got Doom's original system requirements wrong.

It required a 386 with 4MB of RAM. It would not run on a 286, much less the original 4.77MHz IBM PC.

source: https://www.mobygames.com/game/dos/doom/techinfo

IIRC, Doom was very playable, but not exactly smooth, on my cheap 386SX which was... 20MHz? But it ran like butter on the 66MHz 486s in the school's computer lab.


Isn’t it a __lot__ more powerful than a 286? 32 bit dual core at 133 MHz - roughly a Pentium say?

Plus some of the early 8 bit machines drove a display with minimal extra hardware - eg the ZX80 / ZX81 although it was very very slow as a result!


In terms of instructions per clock and I/O bandwidth it compares more favorably to 16 bit architectures than 32 bit ones even though the Cortex-M family is nominally 32 bits.


I think memory bandwidth is key and I don’t know how the RP2040 stacks up but even 486 wasn’t superscalar and maxed out at 66MHz by comparison.


The RP2040 is designed to provide full bandwidth to both cores at once without bandwidth contention. Combined with the PIOs you can do some really impressive bitbanging.


Thanks - that’s good to know. Must be into at least 486 territory then - with the exception of floating point.


Doom required a 386 (and really wanted a 486) iirc.

Wolfenstein worked on 286.


And it required 4MB of RAM... since I had a machine with 2MB, I wasn't able to enjoy it.

But I found a DOS4GW command-line option to "emulate" RAM with a swap file on DOS. I set up about 4MB of virtual memory and ran the game. It took 15 minutes to start the game and show the main menu, another 5 minutes to navigate the menu, and 15 more to get into a game. The frame rate was something like one frame PER minute.


I thought about building a toy PC with the RP2040 but I wasn’t able to solve the GPU problem. Driving a display seems like a very hard task without some dedicated hardware. And using serial output is not fun.


So, FWIW, I've been playing around with this. I've got an FPGA board that has an HDMI output[1]. I have a simple 1280 x 720 frame buffer running on it (read the DRAM, display it on the monitor). I'm building a carrier board to connect it to an STM32F429 Nucleo-144 board using ST Micro's flexible memory controller (FMC) peripheral. This will present the frame buffer contents to the STM32 as memory.

Additionally, some "control registers" are being implemented in the FPGA that can do certain actions. At a minimum they are "clear to one color", "copy region", "scroll region", and "copy glyph". The STM32 has the DMA2 peripheral that does a lot of cool bitblt type functions but these can be nominally slowed down by not synchronizing with the FPGA's schedule for displaying things.

The STM32 is running micropython. The "plan", such as it is, is to let the REPL run using the display as its terminal, and a "graphics mode" to reserve parts of the screen for graphics. The small goal is to re-create sort of the VIC-20/C64/ZX Spectrum kind of "vibe" (interpreted language easy access to the graphics) and then build from there. Clearly the basic frame buffer is like 20% of the FPGA so there is lots of room to do other stuff in there.

[1] https://www.kickstarter.com/projects/1812459948/minispartan6...


Thanks, this sounds awesome and like an interesting topic to learn about in the future.


There are a lot of options on this front. As far as I can tell, most displays for embedded platforms are sold as modules that you interface with via a serial or parallel bus. There are libraries out there to handle the grunt work, if you don't want to dig through data sheets yourself.

If you want something that doesn't use any dedicated hardware, interfacing with analog displays (e.g. NTSC/PAL/VGA) can be done with a handful of resistors on GPIO pins. Conceptually, it is easier but actually dealing with timing is a pain. Again, libraries that deal with the grunt work are available.
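
To give a feel for why the timing is the painful part, here is the arithmetic for the widely published 640x480 at 60 Hz VGA mode (25.175 MHz pixel clock). The mode is picked only as a familiar example; nothing here says which mode any particular project uses.

```python
# Widely published timing for 640x480 at 60 Hz: pixel counts and line counts.
PIXEL_CLOCK_HZ = 25_175_000

H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33

H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK   # pixel clocks per scanline
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK   # scanlines per frame

pixel_ns = 1e9 / PIXEL_CLOCK_HZ                   # time budget per pixel
line_us = H_TOTAL / PIXEL_CLOCK_HZ * 1e6          # time budget per scanline
refresh_hz = PIXEL_CLOCK_HZ / (H_TOTAL * V_TOTAL) # frames per second

print(f"{pixel_ns:.2f} ns per pixel, {line_us:.2f} us per line, {refresh_hz:.2f} Hz")
```

That is roughly 40 ns per pixel and 32 microseconds per scanline, with sync pulses that have to land on time on every line, which is why people reach for libraries or dedicated I/O hardware instead of toggling pins from ordinary code.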


You can use ESP32 for a toy PC, check out FabGL.


The computation power was not so much the issue for this port. The challenge was to make it work with much less RAM and storage available.


Yes, they are solving the storage and RAM challenges partially by throwing CPU at it: not using native pointers, switching between multiple struct sizes where the original had one, compressed integer values, etc., also they restructured drawing to happen in slices as the beam travels, which must have its costs. Also I wonder whether original Doom relied on some GPU hardware, whereas here everything happens in software. RP2040 doom also has to emulate the sound hardware, and handle lots of interrupts to initialize the DMA for each individual video scanline.
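
A minimal sketch of the "not using native pointers" trade, in Python for readability. The record layouts, the `NIL` marker and the helper names are hypothetical and are not the port's actual structures; the point is only that a 16-bit index into a pool halves this record compared with a 32-bit pointer plus padding, at the cost of extra arithmetic on every dereference.

```python
import struct

# Hypothetical linked-list node: a 16-bit value plus a link to the next node.
NATIVE = struct.Struct("<HxxI")   # value, 2 padding bytes, 32-bit pointer
COMPACT = struct.Struct("<HH")    # value, 16-bit index into a pool

NIL = 0xFFFF                      # assumed "no next node" marker


def build_pool(values):
    """Pack values into one bytearray of COMPACT nodes linked in order."""
    pool = bytearray()
    for i, v in enumerate(values):
        nxt = i + 1 if i + 1 < len(values) else NIL
        pool += COMPACT.pack(v, nxt)
    return pool


def walk(pool, index=0):
    """Follow index links; each step costs a multiply and an add."""
    out = []
    while index != NIL:
        value, index = COMPACT.unpack_from(pool, index * COMPACT.size)
        out.append(value)
    return out


pool = build_pool([10, 20, 30])
print(NATIVE.size, COMPACT.size, walk(pool))
```

The index-to-address step in `walk` is the "throwing CPU at it" part: memory shrinks, but every access does a little more work.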

OTOH they are actually overclocking the RP2040 to 270MHz.


The original Doom was all software. GPUs weren't a thing for mainstream PCs at that point.


Sure, but it's not about the speed of the hardware but what was done to port it to the hardware. RAM and ROM would be larger then. It has 256KB, I remember 286's having well over 1MB, 386's even more!

I don't think PC/AT was sub $1 either :)


> I2C networking for up to 4 players

How does one even get these ideas?


two free gpios left, clearly


It is better than this:

> For RP2040 Doom, whilst I thought I might need to build my own single pin PIO networking with some sort of token passing, it turned out I had 2 GPIO pins free that could be configured for I2C, so I decided to just use that instead.


it is almost as though i was referring to that!


Music and sound too???

...what am I doing with my life



I'm speechless. That was... an impressive effort.


Stacksmashing was able to get Doom running on an RP2040 with an LCD screen. I'd love to see it running on the Lily-GO board. https://usa.banggood.com/LILYGO-TTGO-T-Display-RP2040-Raspbe...

It has two buttons built-in and support for LiPo batteries. It might be possible to make this a teeny-tiny handheld device.


This is really nice. Trying to port Doom to RP2040 was on my todo list, but all along I feared that the SRAM would simply not be sufficient for a port with authentic feel and original assets. I'm glad to be proven wrong. I wonder if DEH support is out of the question.

I can't wait to see what kind of a chip they make after the RP2040.


> 320x200x60 VGA output (really 1280x1024x60).

The original game ran (if you had a fast enough PC) at 35 FPS on a 70 Hz display.
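
The arithmetic behind those numbers, as a quick sketch. This only works through the ratios in the quoted line and the comment above; how the port actually maps frames and pixels onto the 1280x1024 signal is described in the article, not here.

```python
from fractions import Fraction

TIC_RATE = 35                                   # original game frame rate

refreshes_per_frame_70 = Fraction(70, TIC_RATE) # whole number of refreshes
refreshes_per_frame_60 = Fraction(60, TIC_RATE) # not a whole number

h_scale = 1280 // 320                           # horizontal repeat per game pixel
v_scale, v_leftover = divmod(1024, 200)         # vertical repeat, unused lines

print(refreshes_per_frame_70, refreshes_per_frame_60, h_scale, v_scale, v_leftover)
```

At 70 Hz each game frame covers exactly two refreshes, while at 60 Hz the ratio is 12/7, so frames cannot all be shown for the same number of refreshes. On the resolution side, 320 divides 1280 evenly, but 200 rows only fill 1000 of the 1024 lines at an integer scale of 5.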


> RP2040 Doom supports up to four players in regular/deathmatch multi-player over a two wire I2C connection.

I thought I had done something worthwhile with I2C.

I was wrong, and I am a bad person.


Just as a programming language must be turing-complete to be worthwhile, a new hardware must be doom-complete to be useful.

Edit: typo


Has anyone seen the schematic for this?

I've got some Raspberry Pi Picos, and would like to try it out.


It interfaces to the Pimoroni Pico VGA Demo Base with the pinout described on the Release page. Is this what you were looking for?

https://github.com/kilograham/rp2040-doom/releases


I would love nethack/slashem on this, but I think NH 3.4.3 needs 2MB of RAM at least.


So I just got my hands on a couple of Picos. I was so excited to find a RPi board in stock, that I forgot to check if it has WiFi or not. I would like to run some form of OpenSprinkler on that (even if not OpenSprinkler, I can use cron jobs to control the sprinkler relays by hand).

Any tips on how to get WiFi working on Pico?


I don't think you understood what you were buying. The Pico has no relation to the rest of the RPi product line.

It is basically an Arduino on steroids. It cannot run cron jobs. It has no network stack, kernel or operating system, and cannot run any software not specifically written for it.


:-D Yeah, I didn't read the description before buying it. On the other hand, it was about $7 each, so not a huge loss.

I have heard that it's possible to run some form of Unix: https://www.zdnet.com/article/now-you-can-run-unix-on-the-ti...

Wanted to hear from my homies here for any ideas.


I guess the RPi Zero W would suit your needs a lot better, and it seems to be in stock in some shops. The Pico doesn't run Linux so cron jobs aren't possible.
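
For what it's worth, the usual substitute for cron on a bare microcontroller is a loop that checks the time against a table. Here is a minimal sketch of just the decision logic in plain Python; the schedule, the relay numbering and the function name are hypothetical, and actually driving relays and keeping the clock set on a Pico are left out because they depend on the firmware and hardware you end up using.

```python
# Hypothetical schedule: (relay number, start minute of day, duration in minutes).
SCHEDULE = [
    (0, 6 * 60, 15),        # relay 0: 06:00 for 15 minutes
    (1, 6 * 60 + 20, 10),   # relay 1: 06:20 for 10 minutes
]


def relays_on(minute_of_day: int) -> set:
    """Return the set of relays that should be energised at this minute."""
    return {relay for relay, start, length in SCHEDULE
            if start <= minute_of_day < start + length}


print(relays_on(6 * 60 + 5))
```

A main loop would call `relays_on` once a minute or so and switch the outputs to match the returned set.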


Pico can run FUZIX, which incorporates parts of real Unix.


This is a work of art, I've learnt a lot reading this, thank you.


Oh boy, that is really impressive.


Wow. That's just amazing work.


Well done!



