Hacker News new | past | comments | ask | show | jobs | submit login
An interactive explanation of quadtrees (2014) (jimkang.com)
102 points by nbaksalyar on Feb 14, 2023 | hide | past | favorite | 36 comments



I have had to do some non-trivial work with quadtrees, and one thing I realized, which is not so well-advertised, is that there are really two main categories of quadtrees. So-called "linear" quadtrees are a somewhat different beast that the more typical "pointer to children" ones.

They're really just different representations of the same thing (a quadtree), but those representations have such different use-cases and performance characteristics that I tend to think of them differently.

Linear quadtrees typically use Morton codes (or another space-filling curve), and are good at doing fast queries against static data. You build the tree once, then query it lots. Whereas "standard" quadtrees are more about dynamism — being able to add and remove items quickly.

This is a great SO post on dynamic-style quadtrees: https://stackoverflow.com/questions/41946007/efficient-and-w...

It was quite hard for me to find open-source implementations of linear quadtrees. Whereas there are all kinds of implementations of the more common kind.

One day I'll open-source my code. I was going back to papers from the '80s for some of that stuff (:


I'm also working on a linear octree (like quadtree but 3d) right now.

An array sorted by Morton code is a quadtree/octree and an equally spaced kd-tree.

Radix sorting can be used for sorting in linear time (and GPUs).

When looking at the Morton code, if the most significant bit is 0 then it is on the left side of the x=0 plane, and right side if the bit is 1. You can find the split position where the bit flips with binary search.

Then recurse to both halves using the next most significant bit which corresponds to the y=0 plane. Each level of recursion splits the space in two.

You could also get the splits "for free" from the histograms produced during radix sorting to reduce the number of binary searches.

This idea can be extended to AABBs instead of points by XORing the end point Morton codes and counting the leading zeros.

A lot of literature was written on this topic in the past 10-20 years for the purpose of building acceleration structures for Ray tracing on the GPU.


> It was quite hard for me to find open-source implementations of linear quadtrees.

You probably know this, but the S2 library has one: https://github.com/google/s2geometry


Thanks for the link — no, I had not heard of it.

Though I must say, looking at it briefly, I see no direct mention of "linear quadtree" (or even just "quadtree"), nor morton codes, z-order curves, BIGMIN/LITMAX functions, etc.

I suppose that's another difficulty with quadtrees — they're so common and amenable to hand-rolling that there are lots of different variations on terminology.

Do you happen to know where the linear-quadtree-related code is, in that library?


S2 uses Hilbert curves, which is slightly different but definitely the same idea. https://s2geometry.io/devguide/s2cell_hierarchy

A cell ID in the S2 coordinate system is equivalent to he path to a quadtree node, as in 0101101 describes which sub-sub-sub-etc-quadrant you're talking about.

It seems like linear quadtree in this context would essentially be an array sorted by S2 cell ids? (I'm not entirely clear is linear quadtrees are just the old trick of array-ifying a fixed-fanout tree structure, where k's children are 2k+1, 2k+2, etc.)


A linear quadtree is (most commonly, in my experience) just a list of Morton codes, sorted. So yes, very similar to sorted S2 cell Ids, it would seem.

One way to efficiently do a spatial query on such a sorted list is the BIGMIN/LITMAX algorithm I mentioned earlier. I believe the originating paper is: http://hermanntropf.de/media/multidimensionalrangequery.pdf (see section 4 — "Range Search").

In my particular case, I also only cared about the "black" nodes in a tree (that is, ones with some data inside), so the linear quadtree is also sparse, speeding it up more. I think since S2 is modeling the entire sphere's surface, it is fully dense. A sparse version might be "only include the cells that contain land," if that were all that was needed for the application.

Though in the sparse version, you don't have fixed-fanout (or at least, it's not embodied in the array), so can't use the trick you mention. Pros and cons (:

One key advantage of Morton codes is that, given a cell ID, you can split it into the "x bits" and "y bits", to get a genuine (x,y) coordinate in the mapped space. Then you can, for instance, easily navigate to your (spatial) neighbor by incrementing X or Y. If I understand your last sentence correctly, this is not something you can do so easily with a plain array-ifying trick.


> I think since S2 is modeling the entire sphere's surface, it is fully dense. A sparse version might be "only include the cells that contain land," if that were all that was needed for the application.

I mean, S2 can express any point on the surface, but you don't have to populate any such data structure for all the points, and normally wouldn't. Generally S2 containers are sparse, and not using any custom geo-container code.

Background: S2 cell ID are compact 64 bits, but still contain the size of the cell (roughly, "more trailing zeroes mean we're talking about larger and larger cells"). That is, you can cover "continental US" with just a few cell IDs.

(I'm not a Googler but) The way S2 is normally used, cell IDs are just keys in an ordered key-value store, and if you want to ask for example "what country contains point P", you store cellId->country mapping with as large cells as possible. Then you'd look up the key closest to P in the store. Cell ID 101101... is contained inside cell ID 10110... is contained inside 1011... etc, so all you need to do is find the node with the max cell ID <= the point you are looking for.

B+tree etc are all great for it. For an in-memory array, a binary search would work. No need for fancy custom data structures, just general key-value storage where you can find the key near input P (that is, ordered, not hashmap).


As yencabulator said, you can do point queries quite easily when indexing using s2 cells. But you can also do range queries efficiently, you just have to be a bit more careful with creating the query cells.


It's the S2 cell ids (S2CellId in the library). See also this slide deck: https://docs.google.com/presentation/d/1Hl4KapfAENAOf4gv-pSn...


At one point I wrote a quad tree as part of the Map in SLAM to store probabilities of a point containing an obstacle on a 2d map, the thinking was I could use the quadtree to then navigate the robot.

The reality was for the detail I needed at the scale I needed quad trees sort of sucked at this (university is a great place to try and fail at things, I learned a bunch!)

I'd be really curious what is used now, if there's some shape estimations to collate points into blobs that are cheaper to store and navigate around. Been awhile since I read any books on robotics and navigation and SLAM though.


Understanding how quadtrees work was how I finally managed to develop an intuitive grasp of how octrees work. Great to know for any budding voxel enthusiast


Inspired by this post two years ago, I wrote a more comprehensive version: https://chidiwilliams.com/post/quadtrees/. Also showed how quadtrees could be used for compression.


I feel like quadtrees are almost never the best data structure for a given problem. The rare exception being HashLife.


Quadtrees have the distinction of being the simplest space decomposition data structures that also manifest every pathology and difficulty of said data structures. It is a great data structure to study for this reason.

While they work well in some applications, many people opt for one of two algorithm selection strategies as an alternative. First, select a more narrowly tailored algorithm that fits their application domain and requirements but avoiding many of the pathologies, giving up some generality but retaining the simplicity. R-trees are arguably an example of this (albeit a bit more complex). Second is to use more exotic and complex space decomposition algorithms that are significantly more difficult to implement in practical systems but retain at least as much generality while mitigating or eliminating most pathological cases.

A quadtree isn't a local maxima in the algorithm phase space in the same way a b+tree is.


What pathology and difficulty?

"Second is to use more exotic and complex space decomposition algorithms"

these being what

What is this: "algorithm phase space"


Quadtrees have exceptional scalability but also notorious issues with space complexity and selectivity for some common data models.

Space complexity is unbounded space for non-scalar types, hence why r-trees are used in practice for things like polygons. They have poor search selectivity as a function of index size for many data models, quasi-monotonic data like time-series being an extreme case. In alternative algorithms that try to preserve scalability, these issues are handled via a combination compressing out non-selective branches of the index, for which there are many techniques in literature with various tradeoffs when applied to space decomposition, and by embedding the data model into a higher dimensionality index of a similar type, allowing the algorithm to be selective on additional properties of the data not for the purpose of search but to cause the data to organize in such a way that it is more resistant to manifesting selectivity and space complexity issues.

There are hundreds of described spatial indexing algorithms in literature that tackle the selectivity and space complexity issues in various ways, often by sacrificing the natural scalability of quadtrees in various ways. The space of possible algorithm designs with interesting tradeoffs is very large but bounded. Hanan Samet's canonical text on the algorithm design space weighs in at over a thousand pages and still doesn't cover the full scope of notable algorithm design clusters in public literature.


what else would you use for spacial subdivision? bsp trees?


Something adaptive. Maybe AABB trees depending on the context.


Why cant you use an N tree, what's special about 4?


Maybe having the struct size be constant allows for better space locality in memory? You never have to reallocate nodes, you can pack everything into an array.


I'm now very curious on this. My gut was more that it is 2^(number of dimensions) that is important here. As such, for a single dimensional storage of data, standard binary tree. Add in another dimension, but still want to halve the search space with every comparison, add another branch factor to the tree.

That said, this implies that for 3d space, you would want 8 way trees? But, I don't think I've ever heard of that being done/used.


8 way trees are used for 3D, they’re called Octrees.

https://en.wikipedia.org/wiki/Octree?wprov=sfti1


I'm now super amused with myself for not looking for "oct." Which, yeah, would have been the obvious prefix to look for.


> My gut was more that it is 2^(number of dimensions) that is important here.

Yes, it's just binary search but applied to every dimension.


This was my thinking, exactly. I have not tried implementing one before.


In ray-tracing, people debate back and forth between kd-trees, octrees, quad-AABB trees, oct-AABB trees, and I’m sure there are a few more.

I think quad-AABB has been the most popular option for a while now.


Generally the most popular is a BVH system between objects and a kd-Tree within objects, in my experience.


I'm a noob to DS&A, but I'm writing a little add-on for Blender and have come up against BVH and kd-trees. What makes one more popular for 'within objects' and the other for 'between'?


BVHes handle sparsity and very large scenes with variable density very well, so they are good to be able to see which objects you might intersect. kd-Trees are much better for objects because they handle the high density of geometry very well and are thus very effective if you want to test against a million triangles in close proximity, for example.


Thanks!


So many things to read up on, now! :D Thanks for the pointers!


A B-tree does in fact have >2 children for only a single dimension.


Right. But that is more to optimize cache/block reads, right? Been way too long since I've looked at many of those details. :)


Yeah pretty much, a node in a B-Tree is designed to fill a single page of memory


I'm a fan of spatial hash grids for most of the things I would consider quadtrees for.

It's simpler to reason about and there's no monkeying around with trees when you add an element or move one around.


Discussed at the time:

An interactive explanation of quadtrees - https://news.ycombinator.com/item?id=7668628 - April 2014 (19 comments)




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: