Amazingly brilliant work, especially given the CPU capabilities at the time. Carmack's use of BSP trees inspired my own work on the Crash Bandicoot renderer. I was also really intrigued by Seth Teller's Ph.D. thesis on Precomputed Visibility Sets though I knew that would never run on home console hardware.
None of these techniques is relevant anymore given that all the hardware has Z buffers, obviating the need to explicitly order the polygons during the rendering process. But at the time (mid 90s) it was arguably the key problem 3D game developers needed to solve. (The other was camera control; for Crash Andy Gavin did that.)
A key insight is that sorting polygons correctly is inherently O(N^2), not O(N lg N) as most would initially assume. This is because polygon overlap is not a transitive property (A in front of B and B in front of C does NOT imply A in front of C, due to cyclic overlap.) This means you can't use O(N lg N) sorting, which in turn means sorting 1000 polygons requires a million comparisons -- infeasible for hardware at the time.
This is why many games from that era (3DO, PS1, etc) suffer from polygons that flicker back and forth, in front of and behind each other: most games used bucket sorting, which is O(N) but only approximate, and not stable frame to frame.
The handful of games that did something more clever to enable correct polygon sorting (Doom, Crash and I'm sure a few others) looked much better as a result.
Finally, just to screw with other developers, I generated a giant file of random data to fill up the Crash 1 CD and labeled it "bsptree.dat". I feel a bit guilty about that given that everyone has to download it when installing the game from the internet, even though it is completely useless to the game.
> Finally, just to screw with other developers, I generated a giant file of random data to fill up the Crash 1 CD and labeled it "bsptree.dat". I feel a bit guilty about that given that everyone has to download it when installing the game from the internet, even though it is completely useless to the game.
Wonderful! THIS is the kind of silly nitty gritty detail I want to hear more about - particularly the whys and wherefores of the decision, largely-unconsidered as it may well have been. Puts me in mind of the semi-affectionate rivalry between different demo/crack houses in the eighties and nineties, engaging in largely unseen conflict, all submarine-like :) And, if you're reading this - know that this thoroughly un-tech nerd has read all of your Crash Bandicoot breakdowns, no matter how arcane they might seem to me :)
> None of these techniques is relevant anymore given that all the hardware has Z buffers
Not true if you consider transparent objects. Rendering with order-independent transparency is still a hot topic without a clearly winning solution on GPU.
Web browsers have lots of semi-transparent rectangles, which can be transformed under "preserve3d" context. This is a classic case of effective BSP that is actual. (background: implementing BSP in WebRender a few years ago).
Wasn't Michael Abrash harping on BSPs breaking down in the case of transparence and portais, in Zen of Graphics (the black book?)? Glad to see some found ways around it.
I'm developing an isometric city builder game. I've figured out how to depth sort everything on the CPU efficiently, though there were some ugly workarounds I had to implement. Suffice to say, for anything outside of buildings, sorting on the screen.y position of the texture works perfectly (so long as each asset is one tile in size, or broken up into multiple tiles).
But, I am starting to implement vehicles, first time adding assets that are multiple tiles in size. It would be really nice if I could create one asset for the vehicles, but the sorting on screen.y will not work for 3D rectangles, so I am breaking the up into multiple tiles...
Do you think BSP trees will work with thousands of moving assets? i.e., I would have to recreate the BSP tree every frame.
Somewhere between one to $tile_size pixels. Tile size depends on the zoom level (I'm using SFML, had to add a separate sprite sheet for each zoom level cause SFML's built in zoom out method is awful).
This is the first time of hearing BSP, and I read most of the OP's article to have a very basic understanding how it works.
Since this is a tree, reordering N elements would be approach N^2 complexity, would it not? (edit: I assumed you would have to find each node from the root, which could very well be a bad premise).
Yes, it would average nlogn and worst case would be n², unless I'm thinking about it incorrectly.
But this is one of those cases where asymptotic complexity seems a bit too reductive. One of the "n" in the complexity is really a fraction of the n.
That said, if I understand your use case correctly, you might not benefit much from having a binary subdivision of space for this. It sounds like what you need is more of a plain ordered list. One of the ways to implement that is as a binary tree, but an array that is kept ordered may be fast enough due to locality etc.
there's probably some optimization that you can do, but you can't know that the second frame isn't the conclusion to a move from 10 frames ago that is what reveals a giant area.
frame 0: open door
frame 10: still can't see past edge of door
frame 11: door opens enough to see big new area
between 10 and 11 it turns out there's a huge difference, even though they're only two frames apart.
The door being open or closed shouldn’t change the order of most elements. So it’s only relevant if you haven’t been sorting occluded objects.
Which gets into one of the major tradeoffs in rendering engines, do you try to minimize the average effort per frame or the worst case effort per frame.
This works perfectly for isometric cubes, and is what I do currently.
If you imagine two long rectangles, one in each of your two hands, and pretend they are two cars passing each other in an isometric world, you will see that pretty soon, one car's screen.y position will be below the other before it's clear of the vehicle which should actually be occluding part of it.
I thought about this for a bit, and I have a feeling that as long as everything is touching the ground, then making covering loops is impossible, and so there exist a simple ordering you can compute.
The ordering is as follows: I'm assuming the isometric rendering of a map as a 45 degrees tilted square, and I'm only considering tile ordering just for simplicity but it should generalize fine. The uppermost tile is where you want to start rendering. From there, you render following the two 45 degree diagonals until you are done (so you don't only look at the y axis). Once this is done, you restart the process from the tile just below the uppermost corner, and so on. This ordering makes sure that all rectangular objects that are aligned with the 45 degree diagonals are rendered correctly.
Now you need an additional trick to render rectangular objects that are transversal to those diagonals correctly. What you do is you keep track of the boundaries of all such objects, so that the rendering loop described above can tell when it encounters one. Once it encounters it, it pauses rendering the current diagonal and considers it temporarily complete. The diagonal on the other side still needs to be rendered fully though --- or at least as far as possible with the same stopping condition. The next rendering pass will likely at some point re-encounter the same transversal object, just at a further point. Stop again, start the next diagonal. Once the rendering encounters the lowest and last part of the transversal object, then that object can be rendered, and the first stopped diagonal can be resumed (and after this resume all the paused diagonals in order).
This should always give you the correct order to render everything without errors. Let me know if this made sense, otherwise I can try to clarify.
This is an interesting approach, though I dont think it will work. Vehicles/units move pixel by pixel, not tile by tile. Each wall tile in my game takes up one tile, and walls sit on the edge of a tile. I dont think this will handle unit/wall rendering order properly unless I also form larger shapes from the walls.
I forget the exact algorithm, but you can factor in your x with y. For right-facing sides, anything at the same y with a greater x is in front of a lesser x, and vice versa for left-facing sides.
> None of these techniques is relevant anymore given that all the hardware has Z buffers, obviating the need to explicitly order the polygons during the rendering process.
You can’t mean that all the polygons in a game world are now sent to the GPU, entirely relying on viewport culling and Z buffer to remove the ones out of view? I’m not an expert but I’m sure that’s not true - doesn’t Source and latest iD Tech use BSP right now for example?
That's known as "occlusion culling", and it's still a bit of an open problem [0]; a lot of games do indeed just send all draw calls inside the frustum. Turns out a good Z-prepass is pretty free and helps a lot with things occlusion culling would normally help with. And deferred rendering helps even more with things like quad overdraw from disparate objects, as long as your material mask is mostly the smae.
The Source Engine still uses has a pre-built BSP for surface visibility. Source 2 has replaced this with a world-aligned visibility octree [1].
[0] The common solution these days is a low-resolution depth buffer rasterized on the CPU with SIMD, because doing it on the GPU would have noticeable latency... you've probably played a game where you turn into a twisty hallway and the walls/object in there only show up after a few frames... that's GPU occlusion culling at work.
It's a bit more nuanced than that. The parent commenter is correct that modern game engines don't need to sort polygons to render them correctly (with some exceptions e.g. transparent objects), and doing so at a fine-grained level is generally not worth the CPU cost. Especially since the bandwidth between the CPU and GPU can be a bottleneck, so if possible you want to only upload the level geometry once instead of having to rearrange it on every frame.
Game engines can of course still do their own coarse-grained occlusion culling, in order to reduce the amount of geometry that needs to be processed per frame. And there can still be a benefit to approximately sorting objects: if you draw objects in approximate front-to-back order, then a shader can do "early depth testing" and skip the expensive shading computations for pixels that are occluded by already-drawn polygons.
Even for modern games that don't care about depth ordering, they still tend to render first-person weapons or third-person characters first, and the sky last, just because it's an easy way to skip a bunch of overdraw, so why not?
It should be noted that virtually all renderers do frustum culling, meaning anything in the game world that is guaranteed to be out the camera's viewing volume is not rendered. Frustum culling is quite simple. Occlusion culling is usually done after frustum culling and is more complex, and the method tends to vary depending on the scene type and game engine, or is sometimes not done at all (modern GPUs are so powerful that smaller or simpler game areas render fast enough without occlusion culling).
Occlusion culling is still relevant and of course BSP trees help with that as well. What I meant is there is no value in sorting polygons any longer. (As far as I know; the last time I worked on a game was 1997.)
From what I've seen in the modern game engines, the current state of the art seems to be executing a compute shader which does frustum and occlusion culling on GPU and dynamically generates an indirect argument buffer which gets drawn in several or a single indirect draw.
I think BSP was kind of obsolete even when it was used in Source 1, but was left in because removing it would have required changing a lot of code for no real benefit. The blocky brush-based level geometry system from GoldSrc still remained, but they added a lot of higher-fidelity features on top of it. IIRC, the level editing pipeline code for things like PVS and baked lighting were still based on code from GoldSrc. Better, newer techniques existed but what Valve had wasn't broken, so why fix it?
If you're not an expert why are you sure it's not true. Most games render a lightweight pass to do all rasterization to a g-buffer, then do the expensive shading later. This separates visibility from shading. If fragments are already occluded by the z-buffer they can be skipped as soon as they can test their z value against the buffer.
Do you have my comments bookmarked or something? You reply to me with something negative to say far more frequently than if you came across them randomly. Can you give it a rest if you can’t manage to tone down the snide please?
As other comments say, including the original comment author, game engines actually do still rely on their own space partitioning to reduce draw calls. Source 2 just does it quad rather than binary. Source is still used in actively developed games and is still BSP, so it’s not true that the techniques are not relevant.
You're right insofar that CyberDildonics has been replying to you frequently and negatively, in a way that suggests a strong bias if not outright harassment. It's complicated, though, because you've also replied to CyberDildonics frequently and negatively. I don't see the same evidence of bias in your case because there are other users you've replied to more frequently. But it's clear that it's a two-sided situation in that you've also been feeding the problem. I realize it can be annoying when people show up to pester you in different threads. It sucks to just ignore it, but that's the best option (i.e. the worst option except for all the others) so please do that in the future.
The person you replied to made an informative comment and you seemed to want to poke a hole in a generalization that people could learn from, which a lot of other people have pointed out, so I don't know why you would get upset at me for explaining more about how rendering works.
First, chrisseaton is correct insofar as you have replied to him way more times than you have to anyone else, and your replies have been haranguing if not harassing. That's abusive, and we ban accounts for that kind of thing, so please stop.
It's also the case that chrisseaton has posted a lot of negative replies to you, so this situation isn't exactly one-sided. Both of you need to leave each other alone from now on.
Second, more generally: your account has a long history of posting in the flamewar style, being aggressive towards others, and generally breaking the HN guidelines. We ban accounts that do this. I'm not going to ban you right now because (a) you also post good things, and (b) it has been a long time since we warned you about this:
But it's clearly a longstanding problem, and we need you to correct it or we're going to end up having to ban you. If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
> A key insight is that sorting polygons correctly is inherently O(N^2), not O(N lg N) as most would initially assume. This is because polygon overlap is not a transitive property (A in front of B and B in front of C does NOT imply A in front of C, due to cyclic overlap.) This means you can't use O(N lg N) sorting, which in turn means sorting 1000 polygons requires a million comparisons -- infeasible for hardware at the time.
A key insight for binary space partitioning is that it also solves the problem above my making sure this never happens. It slices polygons for that specific reason.
Excellent work on Crash! I still play it to this day (the original version, but the recent remaster is also great).
Your point about flickering just made me wonder if ordering problems were exacerbated by lack of floating point hardware; we already know that this is the reason for the Playstation's wobbly (charming) graphics, but I imagine that it may have also contributed to errors in the ordering process, since the positional errors would include Z.
This is a common myth about the PS1. In actual fact geometry calculations can have a decent amount of precision, as they are usually carried out using the 4.12 fixed-point format supported natively by the console's geometry coprocessor. The wobbly graphics are due to the GPU's lack of support for subpixel coordinates and antialiasing, but supporting them wouldn't have required floating-point hardware.
The ordering issues arise from how ordering is actually implemented. When applying perspective transformation to a polygon's vertices, the geometry coprocessor can optionally calculate and return the average value of the Z coordinates after projection [1]. This value is then scaled, rounded and used as an index into what's known as the "ordering table", which is basically a set of command buffers which are executed sequentially by the GPU (implemented in hardware as a linked list) [2]. The number of ordering table buckets is usually limited to 256 or 512 to save RAM, however that also limits the Z index resolution to 8-9 bits and thus results in polygons close to each other being drawn in more or less random order; hence the glitchy overlapping polygons that would occasionally appear in games with highly detailed models.
> None of these techniques is relevant anymore given that all the hardware has Z buffers, obviating the need to explicitly order the polygons during the rendering process. But at the time (mid 90s) it was arguably the key problem 3D game developers needed to solve. (The other was camera control; for Crash Andy Gavin did that.)
In your opinion, What is the key problem 3d game developers need to solve in 2022?
> sorting polygons correctly is inherently O(N^2), [...] because polygon overlap is not a transitive property (A in front of B and B in front of C does NOT imply A in front of C, due to cyclic overlap.)
Well ok but I don't get this:
> This means you can't use O(N lg N) sorting, which in turn means sorting 1000 polygons requires a million comparisons -- infeasible for hardware at the time
ISTM you CAN'T sort it full stop because of cycles - so what kind of sort (never mind O(N^2), any sort) can order cycles? They can't.
Your intuition is correct. However, in the context of an O(N^2) comparison sort you can implement a tie-breaker check that at least ensures order stability between frames.
As ray tracing becomes more common, I think we’ll see a big return of various BSP trees in games. BSP is still the crucial element in offline rendering.
Do you know how much changed for the sequels? From some reverse engineering, Crash Warped relies on depth bucketing for dynamic triangles, while level geometry is streamed from the disk based on the camera position, already at the appropriate LOD and sorted and bucketed. Is the BSP logic you're talking about real-time, or part of a similar pre-processing step?
We didn’t use BSP trees for Crash 1-3; instead we pre-rendered frames of the game on our SGI workstations and recovered the sort order from the (software) Z buffer. Then we stored the sorted list of polygons, using differential compression between key frames to reduce the space required. It was a total brute force hack.
What I think you’re seeing in Warped is what we did with Crash 1 and 2 as well: I had to sort the foreground object polygons into the pre-sorted background polygons. That was modified bucket sorting, but there were manually tuned adjustments to make it work and stable between frames.
BSP (binary space partitioning) was a well known algorithm, not something Carmack picked out of obscurity. It is well covered in every edition of Foley and van Dam's bible "Computer Graphics". The arcade game "I, Robot" from Atari (1983) used BSP to render the polygons back to front -- there was no z-buffer.
That isn't to deny that Carmack was brilliant. But him using BSP isn't some masterstroke of genius in itself.
Yeah, when I was inexperienced what impressed me about Carmack was all the "cutting edge" techniques and tricks he used to push hardware to its limits. As I've became more experienced what impresses me is how iD was able to crank out so much software so quickly, have time to research these more advanced methods, quickly triage which were promising and which were not, and get them working and well optimized in production software.
I can read research and incorporate it into my code, but it takes me a while, and I usually have multiple dead-ends where it isn't until after I've tried using a technique with real data that I realize it has weaknesses the researchers didn't point out and which can't easily be fixed. I can crank out quick code, but it isn't going to be advanced or optimized. I struggle finding the right balance of how to trade between these three things, and yet Carmack and iD were able to squeeze in all three at once.
> As I've became more experienced what impresses me is how iD was able to crank out so much software so quickly
> usually have multiple dead-ends where it isn't until after I've tried using a technique with real data that I realize it has weaknesses the researchers didn't point out and which can't easily be fixed
If you listen to John Romero himself telling the story of id and the fast iteration, it was "due to having, basically, 10 years of intense game development experience prior":
("it's also due to the first principle we had - no prototypes")
A theme I tend to see is that a lot of folks with super output appear to others like icebergs - a LOT of effort that nobody can see, and a tiny peak at which everyone marvels.
Carmack even discusses this on the Lex Fridman podcast. Carmack basically had high school level math around that era and was a voracious reader of every scrap of game programming lore he could find. And outside of it. He is a very intellectually honest guy and it is refreshing to hear him talk about how he worked in those days. I came up similar to Carmack and he is my programming hero, so, I have a lot of admiration for how he managed to do things.
I spent several game nights shooting guys in TF2 while listening to this interview. It is long. The channel has a few good interviews with biology guys as well
Yes, If you were reading DrDobbs at the time, you would know that they actually tried every algorithm in the computer graphics bible (and they reported about it in that magazine). Some algorithms were in fact better than BSP, but were a bit instable performance wise (better avg performance, but worse worst case)
I think you're referring to Michael Abrash's Ramblings in Realtime articles which were written about Quake. When I was a teen I used to carry these around with me and read them over and over while I was implementing my own engine.
I, Robot used a lot of great tricks, but I'm gonna have to consult John Manfrida on the use of actual BSP trees.
They did use normal vectors to determine visibility and may have done simple back to front within objects using that, but the playfield is a grid and mostly rendered back to front.
To be fair, I didn't ever look at the code. I took the word of Dave Sherman, the guy who designed the hardware for it (though the "math box" was lifted from another game, probably Battlezone).
Here is one more interesting thing about the display that Dave told me. DRAM was expensive, and at the time the I, Robot game was much more expensive than most of the arcade machines they put out. To save on DRAM, they used fake double horizontal resolution.
This is a memory from a conversation 30 years ago, but there is 3b per pixel to specify color, and one bit to indicate if the edge should be pushed out half a pixel. That is, when rasterizing the polygons, they saved one fractional X bit in the pixel. In the interior of polygons it had no effect, but when generating video, if pixel at offset (x+0) and pixel at offset (x+1) had a different color index, the fraction bit from pixel (x+0) was used in the video timing generator to place where the color transition would happen, doubling the apparent horizontal resolution. When more than one edge crossed a pixel the information isn't that useful, but such pixels are a tiny fraction of all edge pixels, so overall it was a big win.
You know reading this caused some forgotten neuron in my brain to fire.
When I was reverse engineering the thing, I seem to remember there being a bit related to color/shading that I couldn't figure out what it did (I could see that it was being used).
The system had a 64 color palette. The software broke that up into 8 sub-palettes to give you for 8 main colors with 8 shades each (0-7 first color, 8-15 second color, etc.)
Polygons could call out colors directly using full palette index. Most polygons "faked" shading, they really just specified different shades of the same color for effect. For example, the dodecahedron / meteors in the space waves aren't real-time shaded, just colored to look that way.
However the hardware could do shading, which cost an expensive dot product. So it was used sparingly. To do shading, the polygon used a 3 bit color "index" to call out one of the 8 sub-pallets. Then you add an offset of 0-7 based on the dot product of the normal vector to apply shading (I used 8 x dot product to get you an offset from 0 to 7.9999).
I recall there being an extra bit there (4 bits instead of 3) but couldn't figure out what that last bit was for (it seemed like a "1/2 bit" to me). It looked like it was being used, but didn't make sense, so I stripped it off, and it didn't seem to affect anything so I forgot about it... till now. If I'm remembering correctly, this might literally be a "1/2 step" to add between shades? That can't be right can it?
Another way I read what you wrote is that it could be considered an "anti-aliasing" bit to do some alpha blending at the edge of a polygon to fake higher resolution? Maybe that's a better way to read it?
My memory is probably not 100% here... Would love to hear more.
I don't think it was for antialiasing (to interpolate between shades at adjacent pixels), it affected the horizontal timing.
I don't recall the actual resolution, but say it was 256 pixels per scan line. If the color for pixel 100 was index 5 and for pixel 101 it was index 3, the straight forward way would be to send the right RGB phase for whatever index color was throughout the interval pixel 100 and then send the right RGB phase for whatever was the right color for pixel 101. The half bit modified the timing to delay the transition from pixel 100 to pixel 101, somewhat doubling the horizontal resolution. It made polygon edges less chunky.
Or another way to conceptualize it: each row had 512 pixels, but memory for only 256 pixels. By default each memory location would be used to fill in the color for pixels X and X+1. But the half bit would modify it to say whether the transition should be shifted out one more pixel. If the transition was from red to blue, the default way would be to generate four pixels out (R, R, B, B), but if the half bit was set, the video generator would output (R, R, R, B).
OK so I did some digging into the schematics.
Posted relevant parts of schematics here [1]
Yeah it totally jibes with what you said above.
The rasterizer outputs what looks to be 10 bits of horizontal position (for the pixel being written I'm assuming). Since screen is only 256 pixels wide, this would imply that there are fractional horizontal bits here. The rasterizer keeps track of fractional bits to compute sloping (jaggies)... so the fractional bit implies the pixel being written fills up more than 1/2 of the next pixel.
Video memory is 7-bits per pixel. 6 bits are for 64 color palette index. The fractional 1/2 pixel bit (after some AND/OR/XORing) is stored as the 7th bit of the pixel. Indicates its a "wide" pixel.
Later, when video memory is being read/scanned to the monitor, the buffer holding the 6-bit color to output is sometimes latched a half clock cycle later. The half clock cycle delay is the 7th bit stored in the video memory, the 1/2 bit.
So yeah.. you're essentially delaying the latching of the next color if the pixel is "wider". It makes sense to think of the additional bit as a width bit.
This is a crazy way of doubling resolution... instead of needing double the screen buffer memory, you add 1 bit to indicate the pixel is wider. The caveat is you really don't have true double resolution, you can't have a single pixel at half resolution. You only get pixels of normal width, or 1.5 pixel width. Still that's amazing!
It should be possible to see this artifacting on a real game. Since the alphanumerics overlay is scanned at normal pixel size, if there is a half pixel in the bitmap, the overlay should sometimes not appear aligned to the background. Paul, you might be able to see this on your machine. If you play around in doodle city, you can move polygons where they're behind the alpha overlay, slight movements might show the half pixel in relation to the overlay.
>> The caveat is you really don't have true double resolution...
And only on the right side of a polygon. If video RAM is cleared to 0-black, there would be no "wide" background pixels, only wide polygon pixels and they're extended to the right only.
I will look at Doodle City and try to move things under the text overlay.
Also, what is that big chip in the linked schematic doing? HPOS6-15? PADD0-PADD10. It has a 5MHz clock (the dot clock) but that FF has a different clock which might be 10MHz?
The big chip (ICY) is the custom rasterizer ASIC. Basically a very primitive GPU, maybe the first? I don't know the internals (I'm told they're floating around) but I know what it does.
Yes that chip has a different clock, as its writing pixels to the frame buffer. The monitor always displays the other buffer not being written to, so the rasterizer does not need to be sync'd to the monitor pixel clock (which has overscan areas etc.), therefore the rasterizer can fill pixels at a different speed. My guess is that they couldn't run the ICY faster than the 5MHz, because I'm sure they wanted to run it as fast as possible!
Essentially the chip takes the display list, which is a bunch of instructions for DOTs / VECTORs / POLYGONs, and draws them to the frame buffer pixel by pixel. Starts at the top of the display list (address zero), goes instruction by instruction filling pixels until list end is reached. Note the START and HALT signals coming from the main 6809 CPU telling the chip there is work to do. When you "HIT A BLACK HOLE" everything is halted and the CPU tries to recover things.
Commands generally look like this (this is a major oversimplification so don't take this as 1:1 with the real hardware)
CMD_TYPE COLOR XLEFT XRIGHT YTOP SLOPELEFT SLOPERIGHT NUMSCANLINES
Where CMD_TYPE is [dot/vector/polygon], COLOR is the palette index, and coordinates are in 16-bit with fixed precision slopes.
The chip fills pixels from left poly edge to right edge, when it reaches the end it adjusts the left/right edges and adds 1 to the scanline, eventually stopping when all scanlines are drawn.
As I said I know the internals are floating around, and I know of at least 1 project attempting to duplicate the ICY on a daughterboard. ICYs are 40 years old and are starting to die and there is no replacement.
I had read about that fractional pixel online, and the bit does seen to exist in the vram (3 bits color, 3 bits intensity, one other bit).
I don't think it ever got implemented all the way through. If it works as described, it should cause the right edges of polygons to appear with higher resolution than the left edges, particularly on black (blank) background. I've got an I, Robot in my living room and have looked for this visual difference and can not see it. Doodle City even provides the opportunity to move and rotate objects, so even with direct control of object and orientation I can't see it.
Apparently there were other things that never got finished with the game too.
See my reply above... there is a 1/2 pixel being kept track of... looks like it indicates the pixel is "wide" and takes up at least half of the neighboring pixel to the right. This seems to delay the latching of the color to the monitor guns by a half pixel cycle.
I agree that if it's there, its hard to see (I found some actual arcade footage online). I wonder if that's due to issues with the lower res monitors? Would be interesting to ground the 1/2 pixel signal pin and see if that has a noticeable effect on the screen visuals.
FYI I'm the author of the original I Robot emulator from 1998. This is the one that interprets the mathbox, which means I only know what the mathbox does at the highest 'API' level (can't 100% speak to what the mathbox microcode actually does)
Paul is essentially correct. It's mostly back to front, with some assumptions. I can confirm quite a number of tricks were used, not entirely sure they qualify as a full BSP (although it does involve precompiled mesh optimizations) but then again I'm no BSP expert (just an I Robot expert).
The rasterizer draws a list of polygons (full polys, not just triangles) which must be fully computed before sending to the rasterizer. This "display list" is drawn in order. There is no Z buffer, so everything draws on top of what was already there. So it's important that the display list have the polygons somewhat sorted. Also the polygons are drawn as is, which means rotation/projection is done before being written to the display list.
The camera was fixed (can't yaw) as to simplify math. This means you are always looking straight ahead in Z axis, which allows for all sorts of optimizations. It also makes it trivial to sort objects back to front, simply by checking z coordinate.
Next, the hardware only supports 16 moveable "3D mesh" objects that can be drawn per frame. With an object count this low it's trivial to sort back to front without consuming many CPU cycles. 6809 index instructions make this part a breeze.
The game playfield is "built on the fly", essentially mirroring a 16x16 (16x24? 16x32?) playfield tile array in RAM (each byte encodes tile height), and building each cube face polygon on the fly.
Because of the fixed camera, no real sorting or culling is necessary when creating the display list at the highest level. You simply draw from back of the screen (furthest playfield row) to front (closest row). This process never has to change, it's a linear traversal of playfield array memory, and it always works as long as the camera doesn't yaw. Just draw each row on top of the last one. Guaranteed to not cause problems.
To draw a row (16 tiles, 15 visible) all of the front/top/side polygons are computed and written to the display list. Again, because Z axis is fixed it makes creating the polygon coordinates trivial (you always see front of cube, no culling needed). Once that's done you draw any 3D meshes that are in the row (objects sitting on the playfield cells). Then you move to the next row, and the next, and then you're done.
So at this high level, no real culling is done... things essentially get drawn in order of depth, which always works as long as you don't yaw the camera.
Now... where things get interesting is the 3D mesh objects themselves are stored in a sort of "pre-computed" way to optimize surface removal (not sure if this was done by hand, or by an automated/compile process). It's not so much a tree structure, as it's actually done through specialized instructions. I suppose the jumping/branching is the equivalent of tree traversal.
Basically the meshes are held in ROM, and are really just specialized subroutines, written as a series of simple instructions for the Mathbox. To keep this writeup simple, treat the Mathbox as being able to execute two instructions:
1) write a polygon to the display list
2) "relative jump to instruction" if normal vector dot product points towards/away from camera
A 3D mesh object might be coded in ROM like this
Mathbox3DMeshObject:
if normal vector dot product < 0, jump to X ; pointing away from camera
write poly 1 to display list ; forward facing polygons
write poly 2 to display list
...
Jump to Y
X:
write poly 10 to display list ; back facing polygons
write poly 11 to display list
...
Y:
if normal vector dot product < 0, jump to Z ; sideway facing?
write poly 20 to display list
write poly 21 to display list
...
Z:
write poly 99 to display list ; always drawn
...
END
A mesh object could code any number of jumps, to cover side facing objects, corner cases, etc. Note I'm glossing over the fact that the mathbox would also multiply the coordinates by the current rotation matrix, project to X/Y, perform shading, etc. before emitting to the display list.
As code it's not a real tree structure, although executing the code behaves a lot like traversing a tree. I guess this is sort of a BSP? Can anyone chime in here?
It was pretty clever how it worked out. The mathbox executes the instructions, and creates draw instructions in order. Polygons get skipped by virtue of the fact you're periodically checking normal vectors, and skipping around to different instructions as needed.
20 years after playing around with your emulator, I get to tell you that the "metafile dump to Window's clipboard" feature is just so nifty it still sticks out in my memory today.
The issue to me is that a tree is a data structure, but the meshes in I, Robot are encoded as literal subroutines of machine instructions. There is no tree to traverse, the Mathbox simply executes the instructions as it encounters them.
These aren't even "interpreted" (fake) instructions for a software VM (like Sweet16). They're literal instructions for the custom Mathbox microcode. The Mathbox executes the instructions, and outputs polygons to the display list as told.
It's a lot like a compiler optimization that unrolls a loop inline... Can you even call it a loop once it's unrolled? In this case, it's as if a BSP was created, then turned into literal instructions for a custom machine. Many times the polygons were duplicated in the code (in-lining polys works faster than branching back). It blurs the line between code and data.
I'm really blown away by the sophistication of the I, Robot Mathbox... The microcode was exceedingly complex, much more than just some fast matrix math (like Battlezone etc). It executed instructions to compute dot products, which would potentially result in X/Y/Z points being multiplied by a rotation matrix, projected, and written as polygons to the display list, for later rasterization by a different processor altogether.
Sure, that's still a BSP tree. It's just represented as code instead of data. The duality of data and code is a long-established optimization path for static information.
These aren't even "interpreted" (fake)
instructions for a software VM (like Sweet16).
They're literal instructions for the custom
Mathbox microcode. The Mathbox executes
the instructions, and outputs polygons to
the display list as told.
Wow, that's amazing, I love that. Thank you so much for explaining this.
I was studying computer graphics at the time. The book we used was "Computer Graphics Principles and Practice" I don't recall which edition. BSP trees are covered in the book, and like the article says, had been written about more than a decade prior.
What Carmack had was the ability to read research papers and translate them into working code. I can do that too, but it does seem to be a less common skill in the software world.
What Carmack had was the ability to read
research papers and translate them into
working code. I can do that too, but it
does seem to be a less common skill in
the software world.
Well, that's the argument for Carmack being merely very very very good and not a "genius."
Here's the argument for him:
There were a lot of exceptionally talented people working in the games industry, and for four+ generations of PC hardware (the Keen, Wolf3D, Doom, and Quake eras) Carmack's engines were years ahead of everybody else's and had an absolutely massive impact on the industry.
Of course, the term "genius" is nebulous and it can mean whatever we want it to mean.
However, describing him as merely a guy who implemented other peoples' algorithms may be true in a literal sense but really misses the big picture IMO.
Not just working code -- code that worked well in a real-time video game on 386/486 class hardware. That's the genius bit as far as Doom's code is concerned.
When Carmack did his work, the computers he had were 386/486 class. So the distinction you're making doesn't make sense. Working code is code that works in the computer you have at the time you're writing the software.
Not quite when Doom came out, but within a few years you could find a surprising amount about graphics on grad students' homepages. I remember learning about PVS, Portal based rendering, and a number of similar things from the original paper author's homepage.
I implimented a BSP renderer in my somewhat unremarkable CDROM game Dark Fiber [1] in 1995 or so. I also remember seeing the algorithm in Foley and Van Dam, but it wasn't until a conversation with John at some conference/tradeshow that I went for doing the engine using BSP. It was a lot of fun figuring out how to optimize the algorithm and I also spent a lot of time using Mathematica figuring out the equations. Well, that was at least my excuse for buying a fairly expensive piece of software as a small gaming startup.
> The really neat thing about a BSP tree, which Fuchs, Kedem, and Naylor stress several times, is that it only has to be constructed once. This is somewhat surprising, but the same BSP tree can be used to render a scene no matter where the camera viewpoint is. The BSP tree remains valid as long as the polygons in the scene don’t move. This is why the BSP tree is so useful for real-time rendering—all the hard work that goes into constructing the tree can be done beforehand rather than during rendering.
Here's my explanation of BSP trees, written as much for my own benefit as anybody else's (maybe it will help you):
To start off with, we have a basic geometric fact: if you have a surface that divides space into separate regions (this can be a closed surface like a sphere, or an infinite surface like a plane), a line of sight can't cross between the two partitions of space without passing through the surface. If the surface is convex, a line of sight can only cross it once. (That's one definition of convexity.)
Now, suppose that you have a lot of objects, and none of them are on both sides of the boundary. You can ensure that condition by splitting any object that crosses the boundary in two. No matter where the line of sight begins, and no matter what direction it moves in, the boundary then offers a definite, albeit partial ordering of events: objects on the same side of the boundary may intersect, then the line will intersect the boundary, then the objects on the other side may intersect.
If you find an intersection with an opaque object on the same side of the boundary, you never need to check any objects on the other side, because the order of events above. That is a big benefit when about half of the objects in front of you are on your side of the boundary: if you are right in front of the boundary, or if there's nothing on the other side of it, it doesn't help a lot.
Checking all of the objects on one side of the boundary is the same kind of problem as checking all the objects, and that means you can speed those up too by having sub-boundaries for the sub-problems. That gives you the recursion and makes it a tree. If the reasoning based on boundaries continues until there is only one object in a sub-problem, you don't need to sort lists based on distance - meaning that the whole ordering problem can be solved just with partition trees.
Choosing a good tree is a difficult problem because, unlike the fact that the ordering problem can always be solved, the amount that an individual boundary division helps depends on where you and the objects are in relation to it. Planes are generally used instead of other things like spheres because splitting a polygon along the edge of a sphere doesn't result in another polygon, while splitting it along a plane does.
Because the input to the BSP traversal is the view position.
Every node in the tree is associated with an infinite plane that divides space in half. Stuff on one side of the plane is in the left subtree; stuff on the other side of the plane is in the other subtree.
The viewer is on one side of the plane or the other (or maybe on it, oops).
We know that stuff on the other side of the plane cannot occlude the view of stuff on the viewer's side of the plane. So for instance if we're doing back-to-front rendering, we would draw the material from the other side of the plane first, then stuff from this side. (Subject to the content being rotated into the correct view and clipped to the view frustum.)
There is no polygon in the tree that intersects/straddles these dividing planes, which is the point. When the BSP is constructed, whenever these planes cut through some input polygon, it is cut into two pieces that get assigned to different subtrees. There are then situations in which those pieces will not get drawn at the same time though they belong to the same logical surface.
The planes which subdivide space in half are independent of each other; they go every which way. So if we have nodes like
a
b c
d e f g
it is possibly the case that plane b cuts through the polygon sets f and g, and maybe some of them even straddle plane b. None of that is relevant to them because the b subset is on the opposite side of common ancestor a, the b plane subdivision is concerned only separating d from e. Binary space partitioning is not actually a mathematical partition (exhaustive division into non-overlapping parts) of the space. The set of poygons gets partititoned, of course; not the space itself.
Quite a few games (eg: Starfox) in the 90's would use BSP in a slightly different way - purely within each object. The tree would be generated offline (mechanically or by hand), and would often be some sort of branching bytecode (test this, draw that etc.)
It was very useful to be able to use one plane test to discard a whole bunch of geometry, eg: a surface plus its detail, or everything visible through a window.
This still left the problem of sorting between objects - mostly a depth sort was just fine. Special cases like weapons on hardpoints, or vehicles in hangars could be handled by having some sort of 'call submodel' for spaces within the parent model. Beyond that, just dial up the hostile firepower until players stop complaining about rendering artefacts.
I'm reminded of the render algorithm in Comanche, the 1992 video game [0]. I find it remarkably ingenious, and it certainly felt incredible at the time.
Apologies; I've looked it up and the one I found that most it wasn't voxel based. "Silent Thunder: A-10 Tank Killer II" was a first gen 3D accelerated game, nothing to do with Comanche, which I also remember playing with delight. Memory fades, next thing I'll be conflating Descent with those games...
For people not reading the whole thing (reasonable, it is long): the author comes to the conclusion that it was not so “genius” after all, given the prior art, and that he read it in a computer graphics book. He still gives Carmack props for doing his homework and getting a working implementation built in a novel game engine.
Doom is truly an amazing showcase of engineering achievements and going the extra mile for optimizations.
Games these days just depend on raw hardware and optimization takes a back seat. Games like Plague Tale Requiem, while look fantastic, are equally bad at the engineering part. How is an RTX 3080 the minimum recommended spec for 1080@60fps, is beyond me. Sadly, people these days do not care about software optimization and those who do are bombarded with comments like "Get better hardware".
The truely great games do. I am sure the developers of BOTW were very serious about optimization and you can actually feel that playing the game (on switch). It’s a masterpiece just like doom.
BotW performs well enough, but there's some areas where things get REALLY choppy. It seems like areas with tall grass are the worst for it, most notably Korok forest in front of the Deku tree where you get the Master sword and talk to Hestu.
Yeah, there are a few companies who focus on optimization a lot. Rockstar's games are usually well optimized. Also Sony's recent offerings as well. Maybe even throw FPS games in the mix, Call Of Duty, Battlefield all look amazing and do not require super high-end hardware.
I took a 3D computer graphics course in university in the early 2000s, and the final project was to make a game. IIRC we could use as many open source libraries + assets as we wanted, but we had to write the renderer.
Our professor was a big Buffy the Vampire Slayer fan, so we made a Buffy game. My team thought we'd have some easy points for that. When we presented our plan, he says "I know you know I'm a Buffy fan, so this better be good"... Ooops. Time to get to work.
We used the Quake map format, which is BSP, and our renderer worked pretty well. We used Quake 2 character models, which worked pretty well on its own. I think it was because someone had made some vampire models for it.
When we integrated the two, the performance was terrible. Like a slideshow. After a brief panic, we realized that the Z axis was inverted between the two modules we'd written, and the guy that put them together was inverting the model every frame. After inverting it once when the model was loaded, they worked pretty well together.
We added a simple hitbox (a rectangular prism at the maximum points of the model), and had a fun little game after that!
I think the “sit and read research papers” part is often omitted, to our own peril, from the romanticized image of the genius computer hacker of that era.
Start with Foley & van Dam to lay the groundwork & terms of art, then grab whatever API tutorial you like and get started. You'll find plenty of further resources once you're more familiar with the field and start discovering your interests. Graphics programming is a seriously deep discipline and will take years of study to start doing real work in. But it's a super cool field with plenty of work on both the research & practical implementation sides.
This tutorial is well regarded for learning the ins and outs of modern OpenGL including how the GPU pipeline works, which I think translates pretty well to Vulkan or Metal (at least the concepts anyway) https://learnopengl.com/
I just finished it last night. It's certainly a fun read but I felt a little underwhelmed in the technical aspect of things. I don't think the author is a programmer so I can't really blame him. I recall the book describing DirectX as a programming language at one point.
BSP's were first used in Quake, not Doom. Doom was not 3D, just 2.5D. Full BSP's as used in common VR's applications those times were not needed then.
The Doom precomputed z-buffer hack is described in this article, but I wouldn't call it genius. The Quake BSP was much better, but neither genius as a lot of people used it before to their advantage. It was a common technique, just not in games. Even we used it in our VR application years before Quake.
It seems that a challenge in BSP is going to be artifacts, when a surface, particularly a textured one, is chopped into pieces by several planes. If this is not done very carefully, the cuts might show, due to edge artifacts caused by the separate scan conversion of the fragments. Certain shading tricks like Phong may be difficult to apply in such a way that the result looks as if the original undivided polygon had been rendered.
Brilliant synthesis of the problem and why it was difficult.
My feeling is that Carmack is really good at speeding up important parts of video game performance to achieve qualitatively different gaming experience.
Sometimes I wonder if Carmack was born 30 years later, when computing resources make such advances less needed, he would still be as successful? Perhaps he would go on to improve performance of other problems eg virtual reality and deep learning…..
Of course being famous does open doors, but there's no reason to doubt he'd break into VR in the counterfactual. He did break into game development in reality after all.
"computing resources make such advances less needed"
That's arguable.
Someone has to improve the graphics engines at Unity and Unreal, with enormous reach.
There's a big potential market in porting what only runs in dedicated VR headsets and expensive gaming computers and consoles to cheaper phone-in-a-helmet VR headsets and notebook computers. Then to cheap phones and netbooks.
The beauty of the BSP tree is that it used recursion. When first approached this is a challenge to wrap your head around- very similar to the quicksort algorithm. Great read.
These days octtrees more or less replaced BSP trees i guess. They are easier to handle and work better with more polygons.
I’m surprised by the claim Doom was the first game to use BSP. I remember in the late 90s casually browsing the internet and coming across a demo of BSP and incorporated it into my own amateurish game projects
My first introduction to bsp was qeradiant while making q3a and later sof2 maps. There used to be a bsp compilation stage that I recall. At the time I had no idea what it meant.
What I don't understand is, how the tree can be used even after movement? How do you know which polygon is behind the other if camera position and angle are freely chosen?
When the tree is traversed, you make a decision on which subtree to render depending on the camera viewpoint. This is covered in the article right before all the pictures of the trees.
I thought this was going to be about all the small hallways in Doom where you have to close one door before opening another - this was done so that the engine would never have to render more space than 1 room at a time. I don't know if that's true, just something I've often heard repeated.
I played Doom, Doom 2 and Final Doom all the way through about 20 years ago - I can’t recall a single instance of this kind of level design. Doors stayed open once opened. Maybe it was done with corners/sight lines instead? To be honest I remember some vast areas rendered, especially when accessing secrets.
It doesn't split by axis, it splits by planes that can be oriented however. Planes split a space into 2 half-spaces, hence the "binary" part. Only the older Wolfenstein-style rendering cared about the axis.
That makes sense if I think about everything existing on a single axis (objects in front of each other on the z-axis).
But if you then turn 90 degrees, then suddenly a bunch of those z-axis items are side by side, and order is based on the x-axis of my original perspective. So I don't understand how a binary structure accounts for that perspective shift.
Edit: For clarification, my understanding is that there's a single tree built per-level in advance. If I'm mistaken and it's re-building the tree 30 times per second, the lack of axis concerns makes sense, but the article seemed to indicate that that's computationally prohibitive.
The tree is built once. The tree can be traversed from any point in the world because each node stores which plane it used to split, and it is easy to compute which side of the plane you are viewing.
Bizarre how Carmack is idolized as being some sort of genius for reading a few research papers and applying an existing technique. What about the academic who actually invented the technique!
As someone working at the same time as him on similar problems, I honestly and sincerely disagree. At the time my reaction was "this is space cadet stuff". 486 hardware was really slow, and the idea of doing any kind of real computational geometry back then was incredibly far-fetched. Until he showed it wasn't.
Let's not underhype it, either: It seems you are underestimating the impact that Doom had on PC gaming when it came out. Using BSP trees to perform zero overdraw rendering definitely was not something that had been done before on consumer level hardware and it can definitely be argued that applying them this way for Doom was a stroke of genius.
You're setting the bar for "genius" too low, which unfortunately is commonplace in computer programming. If you take something that already exists and apply it to your code I don't call it genius. Hi quality programming, yes, but genius no.
FWIW, the inverse square root function wasn't popularized by Carmack. Most people shared it because they were amused by the confused comments surrounding it, and how it demonstrated an extremely vertical slice of optimization.
Treating anyone like a rockstar is a bad idea, but Carmacks and Wozniaks deserve the praise more than the Zuckerbergs and Jobs' out there.
From the Carmack worship story: “So Carmack was not the first person to think of using BSP trees in a real-time graphics simulation. Of course, it’s one thing to anticipate that BSP trees might be used this way and another thing to actually do it. But even in the implementation Carmack may have had more guidance than is commonly assumed.”
The story of tracing the magic number back is really fun and it's interesting to see how the trade of computer graphics programming was passed down from one company to another. I don't think it's legendary status was at all diminished when it turns out not to have been Carmack's invention.
None of these techniques is relevant anymore given that all the hardware has Z buffers, obviating the need to explicitly order the polygons during the rendering process. But at the time (mid 90s) it was arguably the key problem 3D game developers needed to solve. (The other was camera control; for Crash Andy Gavin did that.)
A key insight is that sorting polygons correctly is inherently O(N^2), not O(N lg N) as most would initially assume. This is because polygon overlap is not a transitive property (A in front of B and B in front of C does NOT imply A in front of C, due to cyclic overlap.) This means you can't use O(N lg N) sorting, which in turn means sorting 1000 polygons requires a million comparisons -- infeasible for hardware at the time.
This is why many games from that era (3DO, PS1, etc) suffer from polygons that flicker back and forth, in front of and behind each other: most games used bucket sorting, which is O(N) but only approximate, and not stable frame to frame.
The handful of games that did something more clever to enable correct polygon sorting (Doom, Crash and I'm sure a few others) looked much better as a result.
Finally, just to screw with other developers, I generated a giant file of random data to fill up the Crash 1 CD and labeled it "bsptree.dat". I feel a bit guilty about that given that everyone has to download it when installing the game from the internet, even though it is completely useless to the game.