I wonder, why aren't graph databases used more often? Why is neo4j relatively alone?
It seems obvious to me that graph databases are much more parallelizable AND more scalable, since you are essentially able to break up parts of the graph into their own computing nodes quite easily.
The lookups are usually O(1) instead of O(log N), and instead of using indexes and table scans to do joins you literally just traverse a graph at runtime. Plus you have more flexibility, because instead of relational algebra you can literally run any code at any point to walk a graph.
Why aren't they supplanting relational databases despite being faster and more parallelizable and more powerful?
Using the word 'literally' doesn't magically imbue speed into a system. Traversing a graph -- how does that work in a transaction? Is it going to be quicker than striding a packed in-memory hash?
Simple. You store the exact pointer to related data, so you go and get it in O(1). In a join, you have to do an O(log N) search through an index. And all indexes usually have to be loaded into memory, to boot.
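To make the two access paths concrete, here is a minimal in-memory sketch. The names `Node`, `friends_by_pointer`, and `friends_by_index` are made up for illustration and are not any real database's API. One function follows stored references directly; the other finds related rows with a binary search over a sorted list, standing in for a B-tree index on a join table. It ignores disk, caching, and transactions entirely.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Direct references to related nodes (the "exact pointer" idea).
    friends: list[Node] = field(default_factory=list)


def friends_by_pointer(node: Node) -> list[str]:
    # Reaching the adjacency list is O(1); the rest is proportional
    # to the number of neighbors actually returned.
    return [friend.name for friend in node.friends]


def friends_by_index(index: list[tuple[str, str]], name: str) -> list[str]:
    # `index` is a sorted list of (person, friend) pairs.
    # The binary search is the O(log N) step; (name,) sorts before
    # every (name, friend) pair, so it lands on the first match.
    i = bisect_left(index, (name,))
    result = []
    while i < len(index) and index[i][0] == name:
        result.append(index[i][1])
        i += 1
    return result


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.friends = [bob, carol]
index = sorted([("alice", "bob"), ("alice", "carol"), ("bob", "alice")])

print(friends_by_pointer(alice))         # ['bob', 'carol']
print(friends_by_index(index, "alice"))  # ['bob', 'carol']
```

Both return the same answer; the only difference is the log N search step. Whether that difference matters in practice depends on memory layout and cache behavior, which this sketch does not model.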
How would that work in a scale-out, distributed cluster? What is a pointer? How do I figure out which machine an object is really located on? What happens if that machine is down? What if I want to move the object/rebalance the cluster? How do I keep multiple copies of an object (e.g. for fault tolerance)? How do I figure out which copy is the right one?
How do I organize the pointers? Would I use a hash table? A tree? A graph? How would that data structure be distributed? Would every machine store a copy of the lookup data structure, or just some specific machines? What if those machines fail? How do I maintain copies? How do I keep the lookup data structure up to date?
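One way to see why these questions bite: once the graph spans machines, a "pointer" can no longer be a raw memory address. A purely illustrative sketch, assuming a hypothetical `Directory` that maps logical object IDs to the machines holding copies (this is not how Neo4j or any particular system works, and machine names like `m1` are invented):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Directory:
    # Hypothetical lookup structure: object id -> machines holding a copy.
    locations: dict[str, list[str]] = field(default_factory=dict)
    # Machines currently considered unreachable.
    down: set[str] = field(default_factory=set)

    def place(self, obj_id: str, machines: list[str]) -> None:
        self.locations[obj_id] = list(machines)

    def move(self, obj_id: str, src: str, dst: str) -> None:
        # Rebalancing rewrites the directory entry; anything that stored
        # the logical id (not a machine address) stays valid.
        replicas = self.locations[obj_id]
        replicas[replicas.index(src)] = dst

    def resolve(self, obj_id: str) -> str:
        # "Dereferencing" now means a directory lookup plus a liveness check.
        for machine in self.locations.get(obj_id, []):
            if machine not in self.down:
                return machine
        raise LookupError(f"no live copy of {obj_id}")


directory = Directory()
directory.place("node-42", ["m1", "m2"])
print(directory.resolve("node-42"))  # m1
directory.down.add("m1")
print(directory.resolve("node-42"))  # m2
```

Even this toy version leaves the harder questions open: the directory is itself a data structure that has to live somewhere, be replicated, and be kept up to date, and nothing here decides which copy is the right one after a conflicting write.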