I love quad and octrees but it's good to note that there's quite a bit of literature on them and for a good reason.
I'm by no means an expert on them but here are some things worth knowing if you plan on using them:
1) The implementation shown is called a point region quadtree (pr quadtree) and subdivides the space equally for all divisions. This may seem like a neat approach until you consider what happens when there are two points very close to each other. Because you need each point to be in it's own node, you need to keep subdividing the region, which can result in a theoretically unbounded depth. There's a variant called compressed pr quadtree which handles this problem by removing all nodes that only have 1 non-empty child. This reduces the depth appropriately.
2) There is another strategy for dividing the space called point quadtrees (p quadtrees) which divides the regions at the point you are inserting. This gives an upper bound of the tree depth as the number of nodes inserted, which is still undesirable. To fix this, you have to choose the correct order for point insertion to obtain a well balanced tree with log_4(n) height.
3) The non-academic implementations I've seen do clever things like setting a maximal subdivision number and allowing multiple points within a leaf.
4) There are often alternatives to quadtrees that are simpler and usually effective enough such as subdividing the region into an mxn grid of equal proportion and using the grid squares as buckets for the elements that fall within them.
Spatial data-structures are kinda fun but can have gotchas.
Simply mentioning PR and P Quadtrees tipped me off since that class is the only time I've ever heard of those data structures mentioned. Then I checked out your website and saw you went to UMD. Case closed, haha.
There is another strategy for dividing the space called point quadtrees (p quadtrees) which divides the regions at the point you are inserting. This gives an upper bound of the tree depth as the number of nodes inserted, which is still undesirable. To fix this, you have to choose the correct order for point insertion to obtain a well balanced tree with log_4(n) height.
It seems like you could balance as you went along in a similar fashion to balanced binary trees such as b-trees or AVL trees.
Cool visualization. I hadn't heard of quad trees and this presented them in a very clear way. Within a couple of minutes I understood how they work and what they are used for (the hit detection use-case particularly resonated with me).
Couple of minor gripes, though. It encourages you to play with the first pair of tree/map, which involves a lot of scrolling (which is stolen and zooms if I accidentally hover the tree). This would've been a lot easier if I'd been encouraged to move on and see the side-by-side (or maybe omit the bigger versions completely).
Also, you asked my biggest question ("Why four children?") and then completely failed to answer it. This looks like it could be implemented trivially as a binary tree (twice as deep, of course). Why 4? Why not 2, or 8? Arguing "by definition" tells me nothing.
A quad tree is just a 2 dimensional representation of binary tree. Instead of splitting at the midpoint of 1 dimension, you have a split at the midpoint of 2 dimensions (resulting in 4 divisions).
A different way of putting it is, a binary tree partitions a line, a quad tree partitions a plane. An octree (http://en.wikipedia.org/wiki/Octree) partitions a space, and so on.
Okay, that explanation sits a little better with me. The mathematical simplicity of splitting each dimension in half is very nice and clearly the simplest way to go when looked at in this way. Replacing the current non-answer with something along those lines would be a big improvement, I think.
A description of Quadkeys would be great here too.
The quadkey can be thought of as a representation of the route to a point, i.e each time a rectangle is divided, each of its quadrants is assigned a letter A-D (or number 0-3) and the quad key for a point is the concatenation of those labels from the top of the tree down to (typically) an arbitrary depth.
If a quadkey is then used as an indexed field in a database, the question "All the points in a specific quadrant" is simply a prefix search
Very interesting! I developed a similar system myself, for dates on a timeline instead of spatial data. I just knew that comparing dates of every item was inefficient!
If my points are always moving and updating in position, is this method efficient? If not, what's a good way of finding nearest-neighbor for a set of data points that are constantly morphing?
I'm by no means an expert on them but here are some things worth knowing if you plan on using them:
1) The implementation shown is called a point region quadtree (pr quadtree) and subdivides the space equally for all divisions. This may seem like a neat approach until you consider what happens when there are two points very close to each other. Because you need each point to be in it's own node, you need to keep subdividing the region, which can result in a theoretically unbounded depth. There's a variant called compressed pr quadtree which handles this problem by removing all nodes that only have 1 non-empty child. This reduces the depth appropriately.
2) There is another strategy for dividing the space called point quadtrees (p quadtrees) which divides the regions at the point you are inserting. This gives an upper bound of the tree depth as the number of nodes inserted, which is still undesirable. To fix this, you have to choose the correct order for point insertion to obtain a well balanced tree with log_4(n) height.
3) The non-academic implementations I've seen do clever things like setting a maximal subdivision number and allowing multiple points within a leaf.
4) There are often alternatives to quadtrees that are simpler and usually effective enough such as subdividing the region into an mxn grid of equal proportion and using the grid squares as buckets for the elements that fall within them.
Spatial data-structures are kinda fun but can have gotchas.