Unless you take the voxel, draw-every-cube approach, I can't see how you can account for every possible shape and position. Taking the last example further: what if the red box continued below the purple box? The next best thing I could think of, without going 3D, was to split each box into layers.
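A rough sketch of the layer idea, not anything from the article: it assumes axis-aligned boxes with integer coordinates, and the names `Box`, `split_into_layers` and `draw_order` are made up for illustration. Each box is cut into horizontal slices, and the slices are drawn bottom layer first. The ordering inside a layer (smaller `x + y` drawn first, assuming that means farther from the viewer) is a simplification and won't hold for every arrangement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    x: int  # min corner
    y: int
    z: int
    w: int  # size along x
    d: int  # size along y
    h: int  # size along z (height)


def split_into_layers(box, layer_height=1):
    """Cut a box into horizontal slices at most layer_height tall."""
    layers = []
    z = box.z
    top = box.z + box.h
    while z < top:
        h = min(layer_height, top - z)
        layers.append(Box(box.name, box.x, box.y, z, box.w, box.d, h))
        z += h
    return layers


def draw_order(boxes, layer_height=1):
    """Return all slices sorted bottom-up, then by x + y within a layer."""
    slices = [s for b in boxes for s in split_into_layers(b, layer_height)]
    # Assumption: a larger x + y is nearer the viewer, so it is drawn later.
    return sorted(slices, key=lambda s: (s.z, s.x + s.y))


# A tall red box next to a purple box that floats one unit above the ground.
red = Box("red", 0, 0, 0, 1, 1, 3)
purple = Box("purple", 1, 0, 1, 1, 1, 1)
order = [(s.name, s.z) for s in draw_order([red, purple])]
```

Here the red box is drawn as three slices, with the purple slice slotted in between the second and third, so a single box never has to be entirely in front of or entirely behind another.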
This only works on rectangular prisms (which is the definition given for 'box' in the article). If the red box continued below the purple one, it wouldn't be a box in that sense anymore.