DOOM 3 BFG Technical Note [pdf]

mcphilip · on Nov 2, 2013

I love reading about the development of the doom series. One of my early exposures to reading non-trivial code came from studying the (very surprisingly complex) damage output of the BFG9000:

http://www.gamers.org/docs/FAQ/bfgfaq/#3A

tadfisher · on Nov 2, 2013

I used to play competitive DOOM (was in the ZDaemon top 10 almost a decade ago), and while the BFG9000 was certainly the most powerful weapon in the game, it was also the most seriously misunderstood.

New players would always attempt to spam the BFG in a crowded room, then duck behind a corner or run away. Due to the mechanics of the BFG, this is the opposite of the correct strategy.

A short time after a BFG impact with any surface or object, the game basically performs a cluster of hitscans radiating from the player in the direction the shot was fired. Thus the most effective BFG strategy is to stand to the side of an entrance to a crowded room, shoot the BFG at a wall, and strafe toward the entrance until as many enemies as possible come into view.

Get the timing right, and you turn the room into an uninhabitable no-man's land until your six shots run out (ten if you grab the ammo backpack).

I miss those days :)

angersock · on Nov 2, 2013

What's really interesting is the section on linked-lists: author seems to suggest that yeah, arrays of pointers have much better performance.

I'm curious what sort of performance they run into in MT environments--lockless applications are handy with linked-lists.

eliasmacpherson · on Nov 2, 2013

I only clicked to see the comments because it said Doom 3.

Thanks for posting the article and thanks for pointing out the section on linked lists. I agree that that section is really interesting. In what sense do you mean lockless applications - applications built around compare and swap primitives or something else?

I can't really follow what:

"byte otherMembers[64]" is used for.

What useful data is stored in there? If pointers are 32bit and cache lines are 64bit, the comment:

"// over a cache line worth of other members"

doesn't really make sense because you can only fit two full addresses per cache line.

It's a shame that the idList type he is suggesting in favour of the other types, a resizeable heap array, is not declared in the body of the pdf. Might be worth looking it up.

The distinction between non-intrusive and intrusive linked lists is new to me, I didn't know there was a word for that!

This is the best "link" I've found on them:

http://www.altdevblogaday.com/2011/12/06/intrusive-lists/

hrydgard · on Nov 2, 2013

Cache lines on modern architectures are usually 64 or sometimes even 128 bytes, not bits.

eliasmacpherson · on Nov 2, 2013

Thank you, I misread this from the article as saying 64 bit cache lines: "(assuming 32 bit pointers and 64 byte cache lines)"

So:

byte otherMembers[64];

is 512bits of addresses to other members, that will hold 16 32bit addresses.

From the article: "In the case of an array with pointers there may still be a cache miss when reading the ‘valid’ flag but there will be at most one additional cache miss every 16 pointers"

so idList must use operator[] to sensibly iterate over otherMembers.

class idMyClass {

    bool valid;
    byte otherMembers[64]; // over a cache line worth of other members

};

idList< idMyClass * > myObjects;

for ( int i = 0; i < myObjects.Num(); i++ ) {

    if (myObjects[i]>valid ) {
    }

}

simias · on Nov 2, 2013

If your resource is read-mostly you'll probably still get better performances out of an array and clever/partial locking. That being said you can get correct performances out of linked lists using explicit prefetching while iterating.

angersock · on Nov 2, 2013

Ah, interesting. Do you know of any good examples showing prefetching for linked lists?

simias · on Nov 2, 2013

The classic: http://www.akkadia.org/drepper/cpumemory.pdf

Look at 6.3.2 for software prefetching. Like the rest of the paper it's very x86 centric but most of the concepts apply to other architectures.

NAFV_P · on Nov 2, 2013

This class is also confusing because it is used both for a "link" and the "head" of the list.

IMO this class is politically incorrect, there is no mention of the "tail".

larrik · on Nov 2, 2013

Very interesting. I'm just sad they dropped Linux support in the process, just before Linux gaming is taking off.

drill_sarge · on Nov 2, 2013

since doom 3 bfg engine is os:

https://github.com/RobertBeckebans/RBDOOM-3-BFG

mortyseinfeld · on Nov 2, 2013

Oh really? What makes you think Linux gaming is just about to take off? Something to do with Valve?

pachydermic · on Nov 2, 2013

Between that and the humble indie bundles (or just indie games in general), I think it's fair to say Linux gaming is looking much more exciting and healthier than it has ever been in the past (at least the recent past).

Does that mean it's "about to take off?" I guess we'll just have to wait and see. I certainly hope so. Games have always meant pushing technological boundaries, so if games get big on Linux it will hopefully move the whole OS forwards. Finally, the year of the Linux desktop is upon us! (...don't worry about those pesky locked-down mobile devices)

Well... a man can dream, at least.

Confusion · on Nov 2, 2013

Yes, of course. What is your point?

npinguy · on Nov 2, 2013

I swear, as a C#/Java/Javascript developer, reading stuff like this just makes me feel like a shitty developer. I know I'm a very competent programmer and architect but I feel like I could not grasp this kind of low level optimization even if I tried.

breckinloggins · on Nov 2, 2013

Trust me, if you're really a good developer you'd be able to swap all this stuff in your head in a reasonable amount of time.

For example: I'm not generally a Java guy and I'm not an Android expert, but I once had to create an android touch interface to some custom hardware... and it had to run on one of the first "embedded tablet-like" systems available (this was 2009-ish).

In a month or so, I went from knowing nothing about Android and being extremely rusty with Java to:

- Doing nifty tricks in Java to make code DRY and maintainable

- Writing a fancy flashy Android app

- Using the NDK to write a native component that integrated the app with a couple of device drivers

- Deeply investigating the performance characteristics of the Dalvik VM and tuning all sorts of things (both in the OS and in my code) to keep things working smoothly

Could I do any of this again right this second? Nope. I've moved to other projects and other technologies and all this has been swapped out of my head. But the point is that I could get back there again if I needed to.

As a "good developer", you would no doubt experience something similar in this situation. If someone sat you down with a poorly-performing game, some C++ code, and pointed you to enough reference information and profiling tools, in a few weeks you, too, would be talking about cache misses and alignment boundaries and SIMD intrinsics, and...

The funny thing about tech jargon is 99% of it sounds like gibberish even to those of us who've been there before! Don't be too hard on yourself :)

com2kid · on Nov 2, 2013

Read about how the C# compiler creates Lambdas. Learn how eventing really works. Learn what your LINQ code is rewritten into.

A good C# developer knows the runtime and their compiler, the same way a good C++ developer does. Yes there is a performance gulf between what a good C# programmer can do and what a good C++ developer can do (although the size of that gulf is an argument for another day!), but there is also a huge performance difference between what a good C# developer does and what an average C# developer does.

As an example, my team has a log file that is consumed by other multiple teams. Log file parsers have been written in either C++ or C#.

The log file has a bunch of discreet entries in it, each one of which is independent of the others and measures in the tens of bytes.

One of the C++ parsers tries to read the entire log file it at once. It is dog slow and just chokes on larger files.

One of the C# parsers relies on the file system to predict future reads and just pulls the file in entry by entry.

The C# parser is faster and doesn't have file size limitations.

(On a related note, I have worked with C++ UI frameworks that are heavier weight than C# UI frameworks!!)

My point is, a good developer is a good developer, independent of the language.

(That said, some of the analysis of cache performance impressed me, instruction count is not everything!)

drill_sarge · on Nov 2, 2013

the newer virtual texturing stuff is much more interesting

http://s09.idav.ucdavis.edu/talks/05-JP_id_Tech_5_Challenges...

edit:

http://mrelusive.com/publications/presentations/2012_siggrap...

Tibbes · on Nov 2, 2013

Does anyone know the purpose of the self-assignment on page 3? (in the body of IsWriteCombined)

  DWORD error = GetLastError();
  error = error

corysama · on Nov 2, 2013

I often do that just to have a statement on which to place a breakpoint. It gets optimized out, but in debug builds it works fine. Putting a breakpoint on a line that just deals with the error code immediately after the error code will reliably display the error in the Auto Variables watch window in VStudio maligning it very convenient for rapid debugging.

null_ptr · on Nov 2, 2013

No need for the scribd link, Firefox's built-in PDF reader has taken all the Pain out of PDFs for me :)

adamnemecek · on Nov 2, 2013

I think that those are added automatically, scribd was in YC after all.

steveklabnik · on Nov 2, 2013

That is correct on both counts.

buster · on Nov 2, 2013

But Chromium doesn't..