
> The performance of a graph traversal in all current implementations will be at best equivalent to a well-crafted join, and at worst much slower.

A join must be implemented with a loop or scan to find the relevant target rows by foreign key. This is why foreign keys pretty much have to be indexed: without an index, the scan has no hope of acceptable performance. So remember, a join in a relational database generally means a join PLUS an index lookup.
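To make that concrete, here is a minimal Python sketch (hypothetical tables, not any real engine's internals) of an index nested-loop join. The index is what turns each probe into a cheap lookup instead of a full scan:

    # Hypothetical tables; the dict below plays the role of a foreign-key index.
    orders = [
        {"id": 1, "customer_id": 10, "total": 50},
        {"id": 2, "customer_id": 11, "total": 75},
        {"id": 3, "customer_id": 10, "total": 20},
    ]
    customers = [{"id": 10, "name": "Ada"}, {"id": 11, "name": "Grace"}]

    # Built once, maintained on every write -- this is the index's ongoing cost.
    fk_index = {}
    for pos, row in enumerate(orders):
        fk_index.setdefault(row["customer_id"], []).append(pos)

    # The join itself: each probe is an index lookup, not a scan of `orders`.
    for c in customers:
        for pos in fk_index.get(c["id"], []):
            print(c["name"], orders[pos]["total"])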

If it's not clear, indexes are not free: they have a storage cost and impose a cost on every write to the table. They also make lookups very fast, but still not free. On a large table, you may be incurring the cost of 20-30 comparisons to find the desired rows. That is not a made-up number: a B-tree index reduces a lookup to O(log_2(n)) comparisons, so you need about 20 comparisons to find a record in a table with 1,000,000 rows, and about 27 in a table with 100,000,000 rows.
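A quick back-of-the-envelope check of those numbers (assuming one binary comparison per step, i.e. ceil(log2(n)) comparisons in total):

    import math

    for n in (1_000_000, 100_000_000):
        print(f"{n:>11,} rows -> ~{math.ceil(math.log2(n))} comparisons")
    # output:
    #   1,000,000 rows -> ~20 comparisons
    # 100,000,000 rows -> ~27 comparisons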

Graph databases (at least Neo4j) always talk about index-free adjacency, and now you hopefully understand what that means. The edge between two nodes in a graph database stores the destination node directly, with no need for an index to find it; the traversal is index-free. This can be a tremendous advantage, and that's also not made up: you can subtract the cost of the index lookups I gave earlier.
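A toy illustration of the difference (this is illustrative Python, not Neo4j's actual storage format):

    # With indexed adjacency, edges store IDs, so every hop pays for a lookup:
    nodes_by_id = {"a": {"edges": ["b"]}, "b": {"edges": []}}
    dest_id = nodes_by_id["a"]["edges"][0]   # the edge yields an ID...
    neighbor = nodes_by_id[dest_id]          # ...which costs an index probe

    # With index-free adjacency, edges hold direct references to the
    # destination record, so a hop is a dereference with no lookup at all:
    class Node:
        def __init__(self, name):
            self.name = name
            self.neighbors = []              # direct references to Node objects

    a, b = Node("a"), Node("b")
    a.neighbors.append(b)
    print(a.neighbors[0].name)               # "b", reached without any index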




These methods only work for very small graphs, though. Secondary indexing structures are inefficient and don't scale regardless of the implementation details, which is why even non-graph systems avoid them at large scales on single machines. The scale threshold where indexes lose to the alternatives is low. Direct representations without indexes (like your Neo4j example) are even worse, because very poor locality introduces latency at every step. There is no version of these methods that works well; otherwise this would have been a solved problem 30 years ago.

Latency is the performance killer for graph traversals, so locality is everything, and preserving locality across recursive traversals is the critical computer-science trick that is usually missing. The computational cost of a comparison is irrelevant next to the memory latency of each traversal step.
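For what it's worth, one standard locality-preserving layout (a general technique, not specific to any product) is compressed sparse row: all neighbor lists live in one contiguous array, so expanding a node is a sequential read rather than a chain of scattered dereferences. A sketch:

    from collections import deque

    # CSR layout: node i's neighbors are targets[offsets[i]:offsets[i+1]].
    offsets = [0, 2, 3, 3, 4]
    targets = [1, 2, 3, 1]     # edges: 0->1, 0->2, 1->3, 3->1

    def neighbors(node):
        return targets[offsets[node]:offsets[node + 1]]

    def bfs(start):
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors(node):    # contiguous scan per node
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    print(bfs(0))   # {0, 1, 2, 3}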

The way most "pure" graph systems implement graph traversals is unusably slow outside of trivial cases because they mitigate latency poorly. In practical graph systems, common traversals are typically materialized ahead of time so that a graph traversal operation is not required to satisfy queries, which defeats the point in many cases.
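A sketch of that materialization idea (hypothetical data; a real system would have to maintain this incrementally on every write):

    friends = {
        "ann": {"bob", "cal"},
        "bob": {"ann", "dee"},
        "cal": {"ann"},
        "dee": {"bob"},
    }

    # Precompute friends-of-friends once, so the query below is a single
    # lookup instead of a two-hop traversal at read time.
    friends_of_friends = {
        person: set().union(*(friends[f] for f in fs)) - {person} - fs
        for person, fs in friends.items()
    }

    print(friends_of_friends["ann"])   # {'dee'}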


Very well summarized, particularly the point about locality.

Regarding the intractability of large graph problems, here’s an example of a scalable solution by Facebook for representing the social graph that delivers excellent performance (measured in response times, queries per second, etc.): https://www.usenix.org/system/files/conference/atc13/atc13-b...



