Thanks for sharing these details! This looks like a completely new implementatio...

ISV_Damocles · on Jan 13, 2018

Yep, Dr. Sahr came to the office a few times when we were first getting the original version done. He implemented the core of the algorithm and we took on many of the utility methods, most of them in https://github.com/uber/h3/blob/master/src/h3lib/lib/algos.c

Then we dug in on code formatting, performance tuning (we've removed almost all of the H3IndexFat struct representation usage and switched most things to bitwise operations), and testing coverage.

As for the bindings, I don't know which will be open sourced, so I can't say more, but we had asked Dr. Sahr to make sure the API itself never allocates memory, it must be passed in allocated memory, so it's not too hard to make bindings that work with both manual memory management and garbage-collected languages.

Making the emscripten-compiled front-end Javascript "binding" was a lot of fun, though. :)

dvanduzer · on Jan 13, 2018

Awesome! It has been awhile since we spoke, but it is immediately obvious how much cleaner this codebase is compared to the earlier research work. Even if I have to write my own bindings, this is a huge leg up.

A lot of the stuff I'm wanting is stuff I wouldn't expect Uber to care about, but it doesn't hurt to ask. Did you look at implementing pathfinding? (As I recall, Dr. Sahr said A* should be easily implementable in this scheme. Elsewhere in the thread someone mentioned that joins are not easy. Any other tidbits like that that came up?)

jandrewrogers · on Jan 13, 2018

Spatial join performance and scalability is sensitive to tessellation geometry, particularly when dealing with polygons and other non-point geometries or if you have high precision requirements. The abstract join algorithms are similar but it is much more difficult to make them rigorously correct in the general case with hexagonal tessellations, and they'll never be particularly fast at scale.

One of the main reasons that equal-volume cubic tessellations have emerged as the default choice for high-scale analytical DGGS is that they are nearly optimal for scaling out spatial joins between arbitrary geospatial data models. And relatively optimal in most other regards as well, especially computationally; the primary "downside" is that they are 3-dimensional, which is slightly wasteful, though more and more geospatial analysis applications make good use of a direct 3-dimensional representation.

dvanduzer · on Jan 13, 2018

Could you point me at any references on equal-volume cubic tesselations? I've found Dr. Sahr's work on DGGS incredibly accessible given my rudimentary understanding of the math disciplines involved.

An unfortunate aspect of all this, is how few good implementations of these algorithms exist outside of big commercial GIS packages. I'm extremely grateful that a public university financed this particular research originally, or we might not have gotten a well funded open source library.

jandrewrogers · on Jan 13, 2018

So a few things:

I have been writing a blog post that elaborates on the specification of what I believe is the state of the art DGGS for most applications. It is a specification of the best DGGS I know how to design. This is not proprietary IP, just esoteric knowledge. Will be pushed sometime over the next few

If you look closely, I am one of the authors of the standards for such systems. :) There is a boundary where cartographic systems and technology cease to be useful.

One of the big advantages of the 3D embedding DGGS is that the math is dead simple compared to the forced 2D versions. They are extremely powerful in terms of expressiveness, performance, and precision but also relatively transparent. The mere fact of attacking the 2D problem in 3-space reduces its complexity. People just aren’t used to it. The implicit dimensional reduction of 2-space has consequences. In a few years I think all geospatial data will be handled this way.

espeed · on Jan 13, 2018

Isn't the prevailing wisdom to skip 3-space and go directly to 4-space, which makes things even simpler and more powerful -- note 4D projective space (homogeneous coordinates, not quaternions and not counting a time dimension).

Reasons for going to 4D are numerous and varied, such as symplectic geometry only works in even dimensions, image/video reconstruction, matrix multiplication, transformations, and other reasons related to computer vision, gauge theory, and topology.

ISV_Damocles · on Jan 13, 2018

Shortest path was something we were thinking of implementing, but yes, we didn't have an immediate need for it so we didn't. The main complexity is switching between different IJK coordinate systems across different icosahedron faces.

If you only need to know the hex grid distance between the hexagons, but not the actual path, there's a quicker algorithm looking for a common parent between the hexagons (but can have unpredictable slower paths when the hexagons don't share the same base cell, and might have to fall back to an A* algo in that case, anyways).

I don't know what exactly they mean by "joins" elsewhere. We usually use it as a hash index, but the index has a gosper-like curve so b-tree indexes work on it, too, for particular use-cases, and you can also use the parent-child operations (just some bit twiddling) to get approximate area joining super cheap, too.