The random number generator of DOOM (github.com/id-software)
115 points by danshapiro on July 1, 2015 | 74 comments



In 1982 (several years before DOOM), Ken Perlin invented an algorithm meant to produce random numbers that more closely matched a human's idea of "random" (versus those generated by computers). Humans, it seemed, expected lots of variety, whereas genuinely random output from a CPU contains plenty of repeats and clusters. His work in the field was vital to developing the first 3D shaded graphics in a Hollywood film (Tron), and the noise technique later earned him an Academy Award for Technical Achievement.

Perlin Noise is still used to this day for generating clouds, natural-looking terrains, and other textures that are pleasing to the human brain.

Python implementation: https://github.com/caseman/noise/blob/master/perlin.py

Excellent talk by its creator: http://www.noisemachine.com/talk1/
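
For anyone who wants the gist without reading the Python: the core idea is to hash integer lattice points to pseudo-random gradients and smoothly interpolate between them. A minimal 1D sketch in C (not Perlin's reference implementation, which works in 3D with a fixed permutation table; the hash here is just for illustration):

    #include <math.h>

    /* Perlin's fade curve: 6t^5 - 15t^4 + 10t^3, flat at t = 0 and t = 1. */
    static float fade(float t) { return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f); }

    /* Hash a lattice coordinate to a gradient of +1 or -1 and apply it to the offset dx. */
    static float grad(int ix, float dx)
    {
        unsigned h = (unsigned)ix * 2654435761u;   /* simple multiplicative hash */
        return (h & 1u) ? dx : -dx;
    }

    /* 1D gradient noise: smooth, zero at integer x, small values in between. */
    float noise1d(float x)
    {
        int   i  = (int)floorf(x);
        float f  = x - (float)i;            /* position inside the cell [i, i+1) */
        float g0 = grad(i,     f);          /* contribution from the left point  */
        float g1 = grad(i + 1, f - 1.0f);   /* contribution from the right point */
        return g0 + fade(f) * (g1 - g0);    /* smooth interpolation              */
    }

Summing a few octaves of this at doubling frequencies and halving amplitudes is what produces the familiar cloud and terrain textures.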


I remember that many years ago iTunes had to include an option for "humanizing" the randomness of its shuffle play (or whatever they called it).

Turns out that good randomness didn't seem random to people. They might hear three songs in a row from the same album and think "that's not right, it should be random!" Of course, humans often have a different idea of what "random" is.



What I expect from a music player is to play all my songs in random order, such that once a song has been played, it is not picked again until all songs have been played.

There is a difference between randomly picking an element from a list and reordering a list in a random order. This is not a matter of how random the random numbers really are.
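
In code, that's the difference between calling rand() for every pick and doing a single Fisher-Yates shuffle up front and then walking the result. A C sketch (treating the playlist as an array of track indices):

    #include <stdlib.h>

    /* Pick-at-random: a track can easily repeat before the others have played. */
    int pick_random(int num_tracks) { return rand() % num_tracks; }

    /* Shuffle: every track is played exactly once per pass through the list. */
    void shuffle(int *order, int num_tracks)
    {
        for (int i = num_tracks - 1; i > 0; --i) {
            int j   = rand() % (i + 1);   /* Fisher-Yates: swap with a random earlier slot */
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
    }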


Hence the term "shuffle" rather than "random" I suppose


I've found that a nice way to get a random-feeling mix is to play your library by song title, alphabetically.


One of the classic radio stations in Chicago does this a couple times a year -- they play their entire catalog A-Z. I agree that it does come out pretty nice.


I used to get annoyed by car stereos randomizing CD tracks rather than shuffling. My Honda did once play the same song three times in a row.


And not just textures or terrain. Apply Perlin noise to a vector field and you create smoothly random motion, which can be used to simulate things like air currents.


Good point! Here are two OpenGL animations across multiple 1-dimensional vertices:

Perlin- https://i.imgur.com/ROww6bw.gif Random- https://i.imgur.com/EurTdaZ.gif


I animated these tentacles using Perlin noise: http://www.youtube.com/watch?v=RMwJXP8EOU4

(By manipulating the available variables in a Bezier curve.)


Is your account name an homage?


I'd wager it's just random


'perlin' is a brand new account, and therefore apparently created for this article, so I was wondering if it might in fact be Ken Perlin himself.

But of course homage accounts are more common these days.


Why is the python implementation so much more complicated than the doom one? Are they supposed to be the same?


Seriously? The DOOM PRNG is just a table with a pointer stepping through the values. The Perlin code generates values based on the Perlin algorithm.

The values in the DOOM table could have been generated by Carmack throwing a die many times, for all we know, with the results simply tabulated.


To be fair, the key insight to the DOOM prng is that it was used frequently enough for hundreds of mundane events that a player didn't realize (or care) that it wasn't very random at all. There was never a "guess the number" game. A good write-up is here: http://jmtd.net/log/deterministic_doom/


Yes, that is totally correct.

But that does not make the question of why the DOOM solution is so much smaller than the Perlin noise code any more valid. They are simply not the same thing: one is a table of pregenerated data, the other a generator of data that could be tabulated.


There's enough old Softdisk/id Software code released that you can trace the evolution of this RNG.

First, check out Catacomb (1989). I think this is the oldest of Carmack's games for which source code has been released. This game uses a lagged Fibonacci generator:

https://github.com/FlatRockSoft/Catacomb/blob/master/CATASM....

This makes sense: it's a generator that requires little memory and no expensive operations such as multiplication. You'll probably also find it in Softdisk's Apple II releases.
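
For readers who don't want to dig through the assembly: a lagged Fibonacci generator in its generic form looks roughly like the C below. This is a sketch of the technique, not a transcription of the Catacomb routine; the lags (24, 55) are a classic textbook pair and the seeding is left out.

    #include <stdint.h>

    /* Additive lagged Fibonacci generator: x[n] = x[n-24] + x[n-55] (mod 2^8).
       Just additions and a small circular buffer -- no multiplies -- which is
       what made it attractive on 8- and 16-bit CPUs. */
    static uint8_t lfg_state[55];   /* must be seeded; at least one value should be odd */
    static int     lfg_pos = 0;

    uint8_t lfg_next(void)
    {
        int i = lfg_pos;
        int j = (lfg_pos + 55 - 24) % 55;               /* slot holding x[n-24] */
        lfg_state[i] = (uint8_t)(lfg_state[i] + lfg_state[j]);
        lfg_pos = (lfg_pos + 1) % 55;
        return lfg_state[i];
    }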

Carmack's first 3D game is Hovertank 3D, released in 1991. The LFG still exists, but now a "table-based RND generator" also appears. This seems to be the first appearance of the "Doom RNG".

https://github.com/FlatRockSoft/Hovertank3D/blob/master/IDAS...

The LFG is used when setting up the map, while the table-based one is used for enemy AI during gameplay. Obviously every cycle counts when trying to do 3D on a 286.

The same year also saw the release of Keen Dreams, in which only the table-based RNG survives:

https://github.com/keendreams/keen/blob/master/id_us_a.asm

(The state variables of the LFG are still defined, but the code is missing.)


That's a fascinating history. Thanks for tracing it back.

According to a comment in the Catacomb source, the LFG logic was derived from a macro supplied with the Merlin assembler for Apple IIGS computers:

    ;this routine was converted from
    ;the Random macro on Merlin GS



Haha, I just wanted to link that. To draw people's attention: this article explores the results of replacing this array of numbers with predictable values.


There was a post about tampering with randomness in Doom some time ago with some interesting hn comments explaining the reason for this design: https://news.ycombinator.com/item?id=9429889


I hope I don't sound too dumb, but how does it work, exactly? It doesn't look random to me, more like an unordered (but repeating) sequence.


Quoted from the Doom wikia:

" The Doom pseudorandom number generator is simplistic yet adequate for gameplay. Its simplicity has the virtue of speed.

The file m_random.c in the Doom source code contains a static table 256 bytes long containing numbers between 0 and 255 in a fixed, scrambled order. There is an index to this table which starts at zero. Each call to the function P_Random advances the index by one (wrapping around to zero after 255) and returns the table entry at that index.

There is another function, M_Random, that is identical except that it uses its own independent index. P_Random is used in play simulation situations, such as calculating hit damage. M_Random is used otherwise.

The reason for the existence of two individual indexes is to maintain multiplayer synchronisation: for example, M_Random is used to apply a random pitch variation to sounds. As two players may not hear the same sound effect (they may be in different parts of the level), using a single index would cause the game to become desynchronised. To use a model-view-controller analogy, P_Random is used for random number generation at the 'model', while M_Random is used for random number generation at the 'view'.

The function M_ClearRandom resets both functions' indexes to zero. It is called during initialization of each new level so that demos will be the same each time they are played, and so that multiplayer games are synchronised.

Although the table is 256 bytes long, it does not contain all of the numbers between 0 and 255 inclusive. For example, 0 appears twice and 1 does not appear at all; 145 appears five times, more than any other number. Thus the values are not uniformly distributed, but in fact they are nearly so. The mean value is 128.852, whereas it would be 127.500 if all values were equally likely. All of this suggests that the table was generated using a conventional pseudorandom number generator of reasonable quality. "

[0]: http://doom.wikia.com/wiki/Pseudorandom_number_generator

edit: here's a much better article contributed by user aciuix in this thread: http://doomwiki.org/wiki/Pseudorandom_number_generator
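
The whole mechanism fits in a few lines. Here is a sketch of the structure the quote describes, with the table contents elided (the real m_random.c hard-codes 256 specific byte values):

    /* Sketch only: the real table is 256 fixed bytes in a scrambled order. */
    static const unsigned char rndtable[256] = { 0 /* ...255 more values... */ };

    static int rndindex  = 0;   /* index for M_Random (the "view")  */
    static int prndindex = 0;   /* index for P_Random (the "model") */

    int P_Random(void)          /* play simulation: damage, AI, ... */
    {
        prndindex = (prndindex + 1) & 0xff;
        return rndtable[prndindex];
    }

    int M_Random(void)          /* everything else: menus, sound pitch, ... */
    {
        rndindex = (rndindex + 1) & 0xff;
        return rndtable[rndindex];
    }

    void M_ClearRandom(void)    /* reset both indexes for demos and netplay */
    {
        rndindex = prndindex = 0;
    }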


TL;DR: rand() loops through a fixed set of values, but unpredictable user input leads to an unpredictable sequence of rand() calls (e.g. an animation calling it before or after a sound-effect selector), providing unpredictable output from rand().


This is the most interesting part to me, and the real irony of the "random" output.

The "random number generator" isn't actually the table, but rather the table <-> the amount of times the function is actually called (aka player input as a source of randomness).

If the game were coded so that more of the random calls were deterministic (enemy seeding at the beginning of a level, etc.), it would feel less random. If it were coded so that more of them were non-deterministic (enemy spawning based on the table index when a player enters a room), it would feel more random.

The table is deterministic, but the exact value returned for any given event X is the product of how many random()-calling events happened before event X.

Or, in the ideal engineering way: achieve the minimal randomness required for player belief, and use the saved computational overhead on more interesting things (e.g. graphics).


All PRNGs are just repeating sequences. Usually they are much longer and use clever math to generate the next term of the sequence instead of just storing it in an array, but they always repeat eventually.

For Doom, speed was more important than the length of the sequence, so they just made it a lookup table.


That's the case with every pseudorandom number generator (i.e. any deterministic algorithm that spews out "random" numbers).

The period here is unusually short, but it is enough for the use case.

If you want a non-predictable random bit source, you have to sample a quantum phenomenon using dedicated hardware.


It is basically just that :-)

What I don't understand is the following line:

    prndindex = (prndindex+1)&0xff;
(I'm not a C programmer.) AFAIK, the &0xff is a pointer - but what precisely does it do?


Sibling posts have correctly pointed out that the & here is a bitwise AND. However, I wanted to note that `&` is also the take-address-of operator in C, which is why you think it has to do with pointers: there are two operators (one taking the address of something, the result of which is a pointer, another taking a bitwise AND); they both use the same symbol "&".

The difference is that the AND is a binary operator — i.e., it takes two arguments:

    A & B
whereas the address-of operator "&" is a "unary" operator; it only takes one argument:

    &A
There can be something before it in a more complete context, such as:

    x = 3 + &y;
(though this is more often conventionally written the other way "&y + 3"; I only use this for demonstration)

But here "+" is the binary operator. (You can't have two binary operators in a row; that would be like "3 + * 2", which is nonsensical.) This is similar to "3 + -2": the negative here is a unary operator, the "+" here is the binary operator "add"


It's not a pointer, it's a bitwise AND.

It's essentially the same[0] as `prndindex = (prndindex+1) % 256`, except bitwise AND takes less effort to compute.

Modulo is essentially[1]: `mod = a - b * (a / b)` unless the compiler can manage to use bitwise shifting/masks instead.

Back when Doom was popular, the bitmask would have been much more efficient.

[0] http://stackoverflow.com/q/3072665/2967113

[1] I'm not 100% positive that's how it's implemented with an IDIV instruction.


Any reasonable compiler will implement x%256 using bitwise arithmetic, but C semantics require that x == d*(x/d) + (x%d), which means that if x is negative (x%256) must be negative, while (x&0xff) will be positive.

This difference means that the AND is still faster than the MOD when used on signed integers, if the compiler is not smart enough to prove that the input will never be negative.

For example, with gcc 4.7.3 the resulting assembly is

  movl %edi, %edx
  sarl $31, %edx
  shrl $24, %edx
  leal (%rdi, %rdx), %eax
  andl $255, %eax
  subl %edx, %eax
I.e., six instructions instead of one. This is still faster than using a divide.
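
A quick way to see the difference for yourself (assuming C99's truncated division, where -1 % 256 is -1):

    #include <stdio.h>

    int main(void)
    {
        int x = -1;
        printf("%d\n", x % 256);    /* prints -1  */
        printf("%d\n", x & 0xff);   /* prints 255 */

        /* With an unsigned index there is no sign to worry about, so the
           compiler is free to emit a single AND for the modulo as well. */
        unsigned u = 0xffffffffu;
        printf("%u\n", u % 256u);   /* prints 255 */
        return 0;
    }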


OK, thanks for the clarification everyone!

> there are two operators (one taking the address of something, the result of which is a pointer, another taking a bitwise AND); they both use the same symbol "&".

Don't you just love C? ;-) (On a serious note, I've been meaning to learn it for some time, but didn't yet get down to the nitty-gritty.)


In this case the & is not taking an address, but a bitwise AND. 0xff is hexadecimal for 255, or 00000000000000000000000011111111 in binary (assuming 32-bit integers), so the AND keeps only the lower 8 bits of (prndindex+1), effectively computing modulo 256.


No, it's a bitwise AND operation. The line adds one to prndindex modulo 256.


Incrementing the index and limiting the value to 0-255 (AND 0xff)?


This is a great RNG for games, although most computer scientists would strongly disagree. We focus so heavily on having huge cycle lengths for RNGs that we forget that human memory is not all that great at remembering the exact sequence of even just a handful of random numbers, let alone 256 of them. So you don't need RNGs with guarantees of huge cycles for gaming scenarios like adding error to a projectile's path. The advantage of this RNG is that it's blazingly fast (think of all the cache hits!). Of course it would be useless for any real work such as physics or weather simulation.


Curious. Just eyeballing it, I see there's no 1, and 2 and 222 are repeated, as well as 36 and 136. There are probably more.

If there's a purpose behind the table being constructed with those omissions/repetitions, it's lost on me.


Maybe they were chosen at random?


Looks like it, judging by the note in the other comment by serf.

I had begun to wonder if certain values caused issues when used in code and it was deemed easier to just not get certain values back.

That's crazy talk though.


Naa, that sounds like something game developers totally would do.


http://bit.ly/1NxaZrC

A few numbers like 239 and 242 occur 3 times. And a bunch do not occur at all.


Obviously numbers like 222 and 136 are way more random than 1.


random_number = 4 // just tested it with a dice


For anyone missing the reference, here's the obligatory xkcd: https://xkcd.com/221/


Interestingly, Half-Life (released in 1998) uses a slightly more complex table-based generator:

https://github.com/ValveSoftware/halflife/blob/master/dlls/u...

Note that this is only used for certain parts of the game. For other parts it uses a closed-source, non-table-based generator.


I'm interested as to why this method was picked over, say, doing "MSPlayed % 256".


Assuming by "MSPlayed" you mean "milliseconds played";

Another poster[1] links to a Doom wiki article[2], which explains,

> The reason for the existence of two individual indexes is to maintain multiplayer synchronisation

If you sync the PRNG's state at the beginning of the game, you can simply transmit the other clients' input (such as keyboard/mouse): since each PRNG is in the same state on all the players' machines, you don't need to distribute the results of PRNG choices over the network; each client can just compute them locally. So long as all the code uses the PRNG's output stream for the same purposes, in the same order, everything is deterministic (while appearing to be random).

Milliseconds played would also likely require some sort of OS/hardware interaction to get at a timer. This is an add, an AND, and a memory lookup: likely significantly quicker (bear in mind the hardware Doom was created with/for). (Perhaps there were cycle counters, but I'm not sure that instruction was a thing yet? Though it would have been more reliable back then…)

Also see this post: https://news.ycombinator.com/item?id=9429889

[1]: https://news.ycombinator.com/item?id=9810073

[2]: http://doomwiki.org/wiki/Pseudorandom_number_generator
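
In other words, the multiplayer model is deterministic lockstep: only inputs cross the network, and every client advances an identical simulation. A schematic sketch of the idea (not Doom's actual netcode; the names below are made up for illustration):

    /* Every client starts from the same state, applies the same inputs in the
       same order, and therefore makes the same sequence of P_Random() calls. */
    typedef struct { int forward, side, buttons; } input_t;

    void apply_input(int player, const input_t *cmd);   /* hypothetical game hooks */
    void advance_world(void);                           /* AI, damage, physics ... */

    void run_tic(const input_t cmds[], int num_players)
    {
        for (int p = 0; p < num_players; p++)
            apply_input(p, &cmds[p]);   /* only these inputs were sent over the wire */
        advance_world();                /* all P_Random() consumption happens here,  */
                                        /* identically on every machine              */
    }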


That makes sense, thanks!


That seems like a very unusual (and large) way to produce a pseudorandom sequence. Why not a linear congruential generator or LFSR? They have approximately the same state storage requirements, but much less static data.


Presumably they were optimizing for CPU cycles rather than RAM, and the 256 bytes that it does consume isn't much in the scheme of a game like Doom.

This must be pretty fast (and compact) code:

    rndindex = (rndindex+1)&0xff;
    return rndtable[rndindex];

The example C code at https://en.wikipedia.org/wiki/Linear_feedback_shift_register looks like it would compile to more bytes, and require more CPU cycles to execute.

Based on https://en.wikipedia.org/wiki/Linear_congruential_generator it seems LCGs also micro-optimize for memory rather than CPU performance.

Doom was probably squeezing every last cycle out of 386 @ 25 MHz machines to render frames, but they did have at least 4MB–8MB RAM; 256 bytes seems a good trade-off.
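
For comparison, the textbook LCG is tiny too, but its update needs a 32-bit multiply, which was comparatively expensive on the CPUs Doom targeted (the constants below are the common Numerical Recipes pair):

    #include <stdint.h>

    static uint32_t lcg_state = 1;

    /* Linear congruential generator: state = state * a + c (mod 2^32).
       One multiply and one add per call; return the high bits, which are
       the better-behaved ones for a power-of-two modulus. */
    unsigned char lcg_random(void)
    {
        lcg_state = lcg_state * 1664525u + 1013904223u;
        return (unsigned char)(lcg_state >> 24);
    }

versus an increment, an AND, and a one-byte table load for the Doom version.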


I presume that speed was the determining factor, not saving bytes.

This may also present the advantage of being tweakable to avoid repeated sequences that a better generator might naturally produce but that are not perceived as 'random enough' by humans.


For a video game, I suppose it was random 'enough'.


What do the P and M stand for in the function names?


P for "Game logic/behaviour", M for "Miscellaneous".

http://doomwiki.org/wiki/Doom_source_code


Mandatory xkcd mention: https://xkcd.com/221/


If you realize that any RNG on [0,256) always has a cycle of at most 256, this is actually pretty decent.


Is that true?

Imagine I want to sample [0,2); then I can use the following table: [0,0,0,1,1,0,1,1]. This has cycle length 8. Much longer cycles can be obtained; it just requires the state of the PRNG to be larger than the range being sampled.
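
That example as code, just to make the state-vs-output distinction concrete:

    /* 2 possible outputs, but 3 bits of state, so the period is 8. */
    static const int table[8] = { 0, 0, 0, 1, 1, 0, 1, 1 };
    static unsigned idx = 0;

    int next_bit(void)
    {
        idx = (idx + 1) & 7;    /* 3-bit counter: 8 internal states */
        return table[idx];
    }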


> Is that true?

It is not, at least not the way you (and I) are interpreting it. An RNG's period is bounded by the number of states it has internally[1]:

> The period is bounded by the number of the states

> If a PRNG's internal state contains n bits, its period can be no longer than 2 ^ n results, and may be much shorter.

Our state in the Doom generator is the variable rndindex (or prndindex; we have two generators here); rndindex holds an index in [0, 256) (it is 8 bits), hence we have at most that many states (2 ^ 8, or 256). (I'm assuming the table does not repeat itself, as that would be silly, so in this case, the period is exactly 256 outputs long.)

[1]: https://en.wikipedia.org/wiki/Pseudorandom_number_generator#...


Thanks for pointing this out. I had confused the size of the output with the size of the state.


Yet another random number generator that depends on global mutable state and is therefore unsafe to use in a multithreaded context.


DOOM was, of course, single-threaded and built to run on the single-core CPUs of the time.


People should not get into the habit of depending on global mutable state. What if the project evolves and is now going to be multithreaded? What if parts of the project are extracted to be used in another one? What if a toy project grows to be massive?

I say this because I have investigated many crypto libraries for use, and nearly every one of them (especially openssl) has random generators that depend on global state. The attitude of not caring about correctness and thread safety is just pervasive.
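
For what it's worth, making a table generator like Doom's reentrant only takes passing the state in explicitly; a hypothetical sketch:

    /* The caller owns the state, so each thread or subsystem can keep its own
       independent generator. table must point to at least 256 bytes. */
    typedef struct { const unsigned char *table; unsigned index; } table_rng_t;

    unsigned char table_rng_next(table_rng_t *rng)
    {
        rng->index = (rng->index + 1) & 0xff;
        return rng->table[rng->index];
    }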


It's not crypto. It's a game. The whole thing is global mutable state. ( https://github.com/id-Software/DOOM/blob/master/linuxdoom-1.... ). On the original DOS/4GW version, I don't think there were even any synchronisation primitives available.

DOOM was one of those games that relied on stretching the hardware to its limits to produce previously-unseen visual effects. The only thing that mattered was getting pixels on the screen as fast as possible.

Edit: no, of course there weren't any mutex primitives in DOS/4GW. It's a single-process, single-thread, bare-metal environment. You have to realise that, in 1993 PC gaming, shared-memory multithreading was simply not something that developers would ever encounter. It had no conceivable advantages, as you had only one CPU and not even a graphics accelerator. Games generally wouldn't multitask either; again, this predates consumer availability of Win32 and certainly comes several years before DirectX.


The Doom engine uses a lot of global mutable state. In context, there's nothing wrong with this. Why pass around state explicitly when there's just one instance of the state? Multithreading wasn't on the horizon for game engines in 1993. Single CPU cycles were very much a concern. Developer time was also limited (most of this code was written by a single person in about a year). This wasn't a toy project that could have evolved in a million different ways, and it wasn't a crypto library either. There was a game that had to be shipped, and John Carmack was surely more concerned about getting the game engine to do precisely what it needed to do to run the game than proofing the code for hypothetical future scenarios.


You wouldn't use the Doom PRNG for crypto, just as you shouldn't use a spoon to cut down a tree. Have fun eating cereal with a chainsaw.


You do realize this code is around 20 years old?


I did not. Perhaps I was being too harsh to criticize this particular project.


Have you really never heard of DOOM, or do you not know how old it is?

I would have thought it was quite well known among the tech crowd, even non-gamers.


Doom was released in 1993. I'm guessing that netheril96 wasn't born until 3 years later.

Tech is a very ahistorical discipline. We don't look back at the "old masters". We reinvent things every generation (or in Javascript land, every week).


As Alan Kay said, Programming is "almost a field" because it forgets its own history. As the now-typical HN commenter says, "Who is Alan Kay?"


That's truly sad. And ironic.


Sometimes we forget it. I like to look at old snippets of code, to learn or just out of curiosity.




