Drawing isometric boxes in the correct order

munchbunny · on Aug 18, 2016

Great explanation for a generally messy problem!

Just one thought because I spent some time thinking about this type of problem before deciding to do it a different way in my own work:

If you use an orthographic projection and just actually draw 3-D boxes, this problem becomes trivial, including the intertwined boxes problem, because the GPU will take care of it at a per-pixel level with the Z-buffer.

Of course this also means you're suddenly working in 3-D, which complicates other things. But you can still use 2-D textures by picking the right texture coordinates for the vertices on the box.

These days it usually makes a lot of sense to be doing 2-D stuff using a 3-D engine/library anyway, if only for access to hardware accelerated post-processing and hardware accelerated layering/compositing, in which case using actual boxes on a tilted camera and using "no projection" for your HUD/UI works pretty well.

JoshTriplett · on Aug 18, 2016

"Use a 3D engine with Z-buffering" does seem like the right solution to these problems today. However, even if you don't, taking inspiration from one seems like another way to solve this class of problems, including the case of the three mutually overlapping boxes in the last example: independently decide the "front" box to draw for each screen coordinate. For each box, record which boxes appear "in front" of it, non-recursively. (So, A > B, B > C, C > A.) When rendering, only consider boxes that would render at the given screen coordinate in the absence of occlusion. (So, in the given example of mutual overlap, any given screen coordinate only needs to consider two boxes.) Then, render the front box at that screen coordinate.

If you have any translucency (water, plants/trees, objects), then render the frontmost opaque box followed by other translucent boxes in front of it, in back-to-front order.

(That approach would completely break in the presence of concave shapes, which could allow a single box to appear both in front of and behind another at the same screen coordinate.)

Depending on your shapes (e.g. boxes), you wouldn't need to do that for every screen coordinate; you could make the same decision for entire regions, bounded by the screen-projected coordinates of the vertexes.

However, once you reached any significant degree of complexity, you'd probably still be better off implementing a full 3D engine in software if you didn't have 3D hardware. Z-buffering isn't particularly complex to implement. (More complexity comes in when trying to pre-cull some geometry without rendering it at all, to avoid rendering over the same screen pixel many times.)

munchbunny · on Aug 18, 2016

Thanks for expanding on my musings! Two thoughts to add:

1. Pretty much every 3-D graphics engine worth using supports importing and batch-rendering meshes that are actually made up of multiple sub-meshes. So you can fix the circular occlusion problem and the concave object problem with pretty much no performance loss by just subdividing those offending shapes during the art phase. This simplifies rendering because the engine doesn't need to do any dynamic mesh subdivision.

2. Pretty much every 3-D graphics engine worth using supports depth sorting for translucency, so you effectively get that for free too.

I should have clarified that "3-D" by itself doesn't necessarily get rid of your problem. However, there's very little reason to build your own graphics engine anyway, and modern 3-D graphics engines abstract away the occlusion problem for free.

I suppose I had a third point, a response to your last paragraph: it seems hardly worth it to basically implement your own software rasterizer with a RAM based Z-buffer if getting the dedicated hardware to do it is just a matter of pulling in a 3-D graphics engine. And doing it on dedicated hardware is going to be much faster. Granted you have to learn how to use the 3-D graphics engine and you'll have to do a bunch of work through shaders, but if you're doing graphics stuff, this is staple, bread and butter knowledge. Kind of like how web frontend developers would be expected to be able to write CSS and to be fluent in at least one JS framework.

JoshTriplett · on Aug 18, 2016

> 1. Pretty much every 3-D graphics engine worth using supports importing and batch-rendering meshes that are actually made up of multiple sub-meshes. So you can fix the circular occlusion problem and the concave object problem with pretty much no performance loss by just subdividing those offending shapes during the art phase. This simplifies rendering because the engine doesn't need to do any dynamic mesh subdivision.

True, but this problem also just goes away if you represent objects as meshes of triangles, because individual triangles aren't concave.

> it seems hardly worth it to basically implement your own software rasterizer with a RAM based Z-buffer if getting the dedicated hardware to do it is just a matter of pulling in a 3-D graphics engine.

Agreed completely, if you have that hardware available. I could imagine constrained-computing scenarios (either as a challenge, e.g. "standalone minecraft clone in 4k of code", or a deeply embedded environment) where you wouldn't have a 3D engine available and you might want to implement a simplified rendering engine for a highly constrained rendering problem.

But yes, for almost every scenario, you want to just use the GPU.

munchbunny · on Aug 18, 2016

I'm going into pedantic territory, but since we're on the topic of drawing order for translucent meshes... might as well.

You don't actually avoid the problem entirely when you're using just triangles because it's still possible to construct a "A overlaps B overlaps C overlaps A" cycle of occlusion using three triangles. It's more or less the same corner case as a cyclic occlusion problem with convex pole-shaped meshes.

For this reason it seems like graphics engines tend to just batch-render entire sub-meshes at a time because rendering entire meshes presents little additional complexity over rendering individual polygons (and is more performant). It's a leaky abstraction but the leaks are pretty rare.

JoshTriplett · on Aug 18, 2016

I wasn't suggesting that triangles avoid cyclic occlusion; I was suggesting that a triangle can't be concave. :)

The best solution for the cyclic occlusion problem still seems like a depth buffer.

badlogic · on Aug 18, 2016

A z-buffer solution will not help as soon as you involve transparency. Which I'd assume is the case in 99℅ of applications. Unless your application/game is all boxes :)

phire · on Aug 18, 2016

No, transparency is actually quite rare in games, for pretty much this exact reason, and outside of games is rare to have more than one or two layers of transparency.

The typical method is to draw all opaque objects first with zbuffering enabled. Then sort all the remaining trasnparent objects by depth and draw them from back to front

munchbunny · on Aug 18, 2016

I didn't discuss transparency because transparency handling comes more or less free with a graphics engine.

That's besides the point that partial transparency is actually intentionally used very sparingly in games because it's computationally expensive.

In the same sense that you almost always start a web app front-end with some JS framework, you almost always start a 3-D game with some graphics engine. Graphics engines generally take care of the depth sorting for alpha blending for you.

If you're building your own graphics engine, then I'd mostly just question whether you really need to. Sure you might make something leaner, but the benefits will be marginal at best.

In other words, yes, you're right, transparency complicates things, but it's a well-understood problem and the solution is more or less free with the package when you pick up a graphics engine.

Of course, when you really dig into the bowels of the problem, you'll notice that most graphics engines don't gracefully handle the circular overlap problem. In practice, it's more efficient and lower cost to just cut the object up at asset creation time than to dynamically split things at render time. That means you're just avoiding the problem, but it's a rare enough corner case that it's just cheaper to work around the corner case.

white-flame · on Aug 18, 2016

Boolean transparency can be performed in any order, so you can use non-boxes straightforwardly. It's only semi-transparent textures or alpha-blended objects that need more help if they're to be blended correctly relative to each other (and most situations will involve particle & sizzle effects, which don't need true correctness anyway).

LukeShu · on Aug 18, 2016

You can also implement the Z-buffer in software fairly cheaply, allowing you to keep working in 2D.

dtf · on Aug 18, 2016

I was very glad to find this page earlier this week, after scratching my head while implementing an isometric renderer for a client. I had assumed it would be a simple back-to-front order, but this page and a few others pointed out that the partial ordering required to 1) use an intersection test to build disjoint sets of confused objects, 2) use a pairwise order test to determine inter-object dependencies, 3 ) use a topological sort to resolve the ultimate drawing order.

This method was implemented in some fantastic 1980s computer games, and the developer had named their technique "Filmation". Here's a discussion of that system, plus a demo of the rendering bug caused by not bothering to break dependency cycles as discussed in the main article:

http://bannalia.blogspot.co.uk/2008/02/filmation-math.html

(also I just found another with a load more details on the Filmation games: http://retrospec.sgn.net/users/nwalker/filmation/)

ndr · on Aug 18, 2016

I'm still reading through the article, but can you give an example of why rendering all the blocks ordered[0] by (min x, min y, min z) lexicographically doesn't work[1]? I thought the sophistications is necessary for performance only, i.e. not drawing anything that doesn't appear in the final result.

[0] Assuming (0, 0, 0) it's in the middle at the top of the screen.

[1] Assuming this is what you meant by "simple back-to-front".

Jare · on Aug 18, 2016

Elongated boxes may not be sorted correctly no matter which point you use (min, max, center).

ndr · on Aug 19, 2016

What do you mean by elongated box?

tobr · on Aug 18, 2016

If you're interested in this topic, Andrew Russell recently wrote a blog post and made a video about how they sort sprites in River City Ransom Underground. It's even more complicated because the sprites aren't necessarily simple boxes.

http://andrewrussell.net/2016/06/how-2-5d-sorting-works-in-r...

https://www.youtube.com/watch?v=Ssrkq6_6JYU&t=8m48s

undershirt · on Aug 18, 2016

Thanks, this is very interesting! I added it to the article.

The only difference is that bounding boxes are still used for sprites, but he adds a sort of voxel heightmap inside of it to represent occupied volume.

> heightmap: https://youtu.be/Ssrkq6_6JYU?t=15m21s

If you imagine this heightmap as a building with multiple floors (e.g. 1st floor, 2nd floor) the algorithm takes the lowest common floor of each of these buildings and performs an intersection test between them and nothing else. This allows the sprites to interact more fluidly inside the bounding boxes! Neat.

> heightmap floors: https://youtu.be/Ssrkq6_6JYU?t=16m12s

mparlane · on Aug 18, 2016

The way I have seen this commonly solved is to draw each item from back to front (highest Y value, assuming 0 at bottom of screen and >0 at top). But every item is the size of one square, or is in fact multiple items the size of 1 square which make up the larger object. It just seems like a simpler and more efficient solution and you don't need to worry about cycles.

Jare · on Aug 18, 2016

This is pretty much what we did in the Commandos games back in '97. As explained in other posts, you can't use some simple depth metric for the entire box and z-sort them because elongated boxes will cause problems.

One of the reasons to do this instead of Z-buffering is that Zbuffer will not help with transparency, and most games with these kind of needs will use it, for antialiasing sprite edges if nothing else.

joneholland · on Aug 18, 2016

What's wrong with naively drawing every box ordered by depth? This seems like a bunch of extra work for the same result?

TheCoreh · on Aug 18, 2016

The boxes are not necessarily cubes, so ordering them by depth of say, a central point, would not yield correct results for some edge cases (e.g. elongated boxes)

thomasahle · on Aug 18, 2016

From the examples in the article, the boxes can all be broken into fairly few pieces, which are square when seen from above. Small 'cube towers'.

If you break them up like that, naive painting in order of depth should be fine.

phire · on Aug 18, 2016

Note, it's only worth going through this trouble if you are drawing boxes with a CPU only API.

If you have a GPU available, you might as well be using an orthographic projection and the depth buffer.

panic · on Aug 18, 2016

Using a depth buffer would also side-step the issue of cycles and support intersecting blocks.

Retr0spectrum · on Aug 18, 2016

Even on CPU, how much overhead would a depth buffer cost?

phire · on Aug 18, 2016

It would basically double the memory bandwidth and cache requirements for drawing if you used a full 32bit depth buffer.

But it would probably be worth it. A 16 bit buffer would probably be good enough for isometric drawing and would be faster.

joakleaf · on Aug 18, 2016

I implemented drawing of polygons orthographically quite some time ago. The output was for PDF, so I couldn't raster + zbuffer. Instead I drew all polygons back to front.

I ended up creating a BSP tree (1) for the polygons and drawing them in order according to the tree.

The problem in general with sorted back to front drawing is exactly the "Conundrum" example. However, a BSP tree will take of that, and divide at least one of the cubes in that example.

It may feel like overkill, but I would probably have done something similar in this case. Implementing BSP trees for cubes is very simple, because all planes are xy, xz, or yz.

[1] https://en.wikipedia.org/wiki/Binary_space_partitioning

jaunkst · on Aug 18, 2016

I worked on an Isometric framework for a flash project in 2009. I remember coming to an acceptable solution using axis sort and volume data. I would have killed to have a GPU available then.

pjc50 · on Aug 18, 2016

If you're doing isometric, then use a tile system like 90s videogames. It makes far more sense.

zyxley · on Aug 18, 2016

I was just thinking that.

If you really want an isometric look but need more complicated rendering than layered tiles, it seems like it would be much easier to do 3d rendering but lock the 'camera' into an orthographic projection mode.

Unity and Unreal Engine both have this as a built-in feature that basically just takes a checkbox, for examples off the top of my head.

hcarvalhoalves · on Aug 18, 2016

Much better approach. Also because it makes rendering particles - or whatever things that won't align w/ the grid - correctly a non-issue.

white-flame · on Aug 18, 2016

I'm curious as to the first implementation of this. I sat down and solved this nuisance of a problem in 2008-2009, for as3isolib. I later found a thesis by Ernir Erlingsson from around the same time period, but with a bit of a different data structure holding the objects to sort. I'm not aware of any "classic" games that do a true sort of isometric objects.

The vast majority of actual 2d isometric games use a very naive grid walking, with uniform tile sized elements. If everything moving is limited to 1 plane, then varying square footprint tiles can be used with a simple sort. However, the linked page is the "true" solution to isometric sorting, allowing any size axis-aligned rectangular solid to exist in any non-interpenetrating location in 3-space and sort correctly.

mclightning · on Aug 19, 2016

I don't understand why it has to be so complicated. If you position pivot of your objects on the bottom,lowest corner of its image. Then you can start drawing them by the order of this position.

Start drawing from highest Z position to the lowest. When you have objects staying on top of each other, use order of the parent object as the drawing order for child objects.

1 rule : All buildings are only allowed occupy their own air space only. They are not allowed to have balconies over reaching to airspace of other X/Y area.

chad-autry · on Aug 18, 2016

Seems just break all boxes into cubes, then draw in z index order is the best.

Also as some say, just throwing it into 3D and locking the perspective is waaaay easier on the math. For example, 2 of my own projects.

http://chad-autry.github.io/hex-grid-map/#/demo vs http://chad-autry.github.io/hex-grid-map-3D/#/demo

efnx · on Aug 18, 2016

I may be naive but I do have some experience with isometric games and my solution was to set the index origin of each box as the top back corner (the top of the hexagon if flattened), and then order bottom-to-top, back-to-front. Then draw in order.

http://blog.efnx.com/wp-content/uploads/2007/10/fpsExample.s...

fao_ · on Aug 18, 2016

Wouldn't a 'painter's algorithm'-style solution of:

  break all the shapes into boxes that can occupy only 1 space
  
  for height = 0 .. n:

    for <however many boxes can be displayed in a column>
  
      for <however many boxes can be displayed in a row>
  
        draw box, step to next box (x+1, y+1)

solve these problems? I'm not sure how fast it would be in practice.

munchbunny · on Aug 18, 2016

The way you would do this in practice would be:

1. Start with the abstraction that every shape is actually a group of one or more sub-shapes. (Well-understood and pretty much universally applied abstraction.)

2. Pre-process all of your geometry so that you won't get cycles of occlusion, which is the corner case the author mentioned. If you do have a cycle, then subdivide and group as mentioned in #1.

3. Use some fast but crude algorithm to preemptively reject non-visible shapes.

4. Apply painter's algorithm. to "probably visible" shapes.

resist_futility · on Aug 18, 2016

Unless you take the voxel draw every cube approach, I can't see how you can account for every possible shape and position. Taking the last example further what if the red box continued below the purple box. The next best thing I could think of, without going 3D, was to split each box into layers.

Retra · on Aug 18, 2016

This only works on rectangular prisms (which is the definition given for 'box' in the article.) If the red box continued below the purple one, it wouldn't be such a box anymore.

steaminghacker · on Aug 18, 2016

why not draw the individual polygons as a single batch?