SQL might be a good fit for modeling knowledge graphs, since FOREIGN KEYs can be named using the CONSTRAINT constraint_name FOREIGN KEY … syntax. That gives us a way to label edges.
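As a minimal sketch of that idea (the table and constraint names here are invented for illustration), SQLite accepts named foreign-key constraints, with the constraint name doubling as the edge label:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employment (
    person_id  INTEGER,
    company_id INTEGER,
    -- the constraint names serve as the edge labels
    CONSTRAINT works_at    FOREIGN KEY (person_id)  REFERENCES person(id),
    CONSTRAINT employed_by FOREIGN KEY (company_id) REFERENCES company(id)
);
""")

# The labels are visible in the stored schema text.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'employment'"
).fetchone()[0]
print("works_at" in schema)  # True
```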
This kind of approach is pretty common, including in compute engines like Spark's GraphX. I suspect a lot of teams using graph DBs would be better off realizing this: it's good for simple and small problems.
It does fall down for graphy tasks like multihop joins, connect-the-dots, and supernodes. So for GBs/TBs of that, you should either do those outside the DB or use an optimized DB. Likewise, though not explicitly discussed in the article, modern knowledge graphs are often really about embedding vectors, not entity UUIDs, and few if any databases straddle relational queries, graph queries, and vector queries.
> it does fall down for graphy tasks like multihop joins, connect the dots, and supernodes.
These can always be accomplished via recursive SQL queries. Of course any given implementation might be unoptimized for such tasks. But in practice, this kind of network analytics tends to be quite rare anyway.
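For what it's worth, a multihop reachability query is short in recursive SQL. A sketch in SQLite, with a made-up `edge` table and starting node:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edge (src TEXT, dst TEXT);
INSERT INTO edge VALUES ('a','b'), ('b','c'), ('c','d');
""")

# All nodes reachable from 'a', however many hops away.
rows = conn.execute("""
    WITH RECURSIVE reach(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst FROM edge e JOIN reach r ON e.src = r.node
    )
    SELECT node FROM reach
""").fetchall()

reachable = sorted(r[0] for r in rows)
print(reachable)  # ['a', 'b', 'c', 'd']
```

The `UNION` (rather than `UNION ALL`) deduplicates visited nodes, which is what keeps this terminating on cyclic graphs.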
One should note that even inference tasks, often thought of as exclusive to the "semantic" or "knowledge-based" paradigm, can be expressed very simply via SQL VIEWs. Of course this kind of inference often turns out to be infeasible in practice, or to introduce unwanted noise into the "inferred" data, but that has nothing to do with SQL per se and is just as true of the "knowledge base" or "semantic" approach.
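A sketch of inference-as-a-VIEW, using a toy parent table: the classic grandparent rule, `grandparent(x, z) :- parent(x, y), parent(y, z)`, as one self-join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (child TEXT, parent TEXT);
INSERT INTO parent VALUES ('alice','bob'), ('bob','carol');

-- "inferred" facts, derived on demand rather than stored
CREATE VIEW grandparent AS
    SELECT p1.child AS child, p2.parent AS grandparent
    FROM parent p1 JOIN parent p2 ON p1.parent = p2.child;
""")

grands = conn.execute("SELECT * FROM grandparent").fetchall()
print(grands)  # [('alice', 'carol')]
```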
I mean performance breaks: asymptotics catch up with you on bigger data. And again, most graphs are small, so it's often fine, or the work can be done out-of-DB by sending the client the subgraph.
yes, you can always map most structures into tables, or even Excel.
but I think "good fit" is a stretch. when designing systems you generally want to look at data access patterns, and pick a data execution approach that aligns with them.
in tech, unfortunately, RDBMSes are the "hammer" in "if your only tool is a hammer, every problem looks like a nail."
Honestly I think people's assumption that graph databases must be better at representing binary relations might be a bit optimistic. After all, there's no reason relational databases (named after the n-ary relations that tables represent) couldn't handle binary relations.
The one thing that's definite is that SQL is a bad choice for particular kinds of queries, though most graph databases don't seem to go much further than improving (?) the syntax a little bit and adding transitive closure (which is also present in several SQL databases). A few graph databases do allow for more complex (even arbitrary) inference, but this somehow never seems to make the headlines.
Well, the most useful way I've found of classifying this stuff is by analogy with formal grammars. If you view the kinds of paths you can query as the set of words in a language, then you get something like:
- Plain joins query fixed-length paths, i.e. finite languages.
- Adding transitive closure (recursive queries) gets you regular languages, i.e. regular path queries.
However the hierarchy of languages doesn't end there. And some graph databases allow you to add arbitrary grammar rules, which make it possible to add some complex rules like:
- If conditions X, Y, and Z are satisfied, then person A and person B are the same person
- Equality is transitive
- If two people are equal then each property of one is also a property of the other.
The part that makes this tricky is that figuring out two people are equal can cause other people to now suddenly satisfy the condition for equality.
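A rough sketch of that feedback loop, outside any database (the matching condition is invented: records are "the same person" if they share an email or phone number). Merging two records pools their properties, which can expose new matches, so we iterate to a fixpoint with union-find:

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    parent[find(parent, a)] = find(parent, b)

records = {
    1: {"email": "a@x", "phone": None},
    2: {"email": "a@x", "phone": "555"},
    3: {"email": "b@y", "phone": "555"},
}
parent = {i: i for i in records}

changed = True
while changed:
    changed = False
    # Each equivalence class pools all properties of its members.
    merged = {}
    for i, rec in records.items():
        pooled = merged.setdefault(find(parent, i), {"email": set(), "phone": set()})
        for k in ("email", "phone"):
            if rec[k] is not None:
                pooled[k].add(rec[k])
    # Compare classes pairwise; any merge may enable further merges,
    # so the outer loop runs again until nothing changes.
    roots = list(merged)
    for i in range(len(roots)):
        for j in range(i + 1, len(roots)):
            a, b = roots[i], roots[j]
            if (merged[a]["email"] & merged[b]["email"]
                    or merged[a]["phone"] & merged[b]["phone"]):
                union(parent, a, b)
                changed = True

groups = {}
for i in records:
    groups.setdefault(find(parent, i), set()).add(i)
print(sorted(map(sorted, groups.values())))  # [[1, 2, 3]]
```

Records 1 and 3 share nothing directly; they end up equal only because both match record 2, which is exactly the cascading behavior described above.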
You could also construct some examples by creating conditions which aren't "regular" (e.g. person A and person B have a common ancestor the same number of generations back for both of them — the graph analogue of the non-regular language a^n b^n).
Graph databases are likely more optimized for this sort of data storage, but you've hit the nail on the head that SQL databases can be used to represent node/edge-style data.
Node types = tables (each row is a node)
Edges = foreign keys (or join tables, for many-to-many edges)
Edge labels = foreign key constraint names
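A common alternative to that mapping is to collapse everything into generic node and edge tables, storing the edge label as data rather than as a constraint name (all names here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE edge (
    src   INTEGER REFERENCES node(id),
    dst   INTEGER REFERENCES node(id),
    label TEXT    -- edge label as a value, queryable like any column
);
INSERT INTO node VALUES (1, 'person'), (2, 'company');
INSERT INTO edge VALUES (1, 2, 'works_at');
""")

labels = conn.execute("SELECT label FROM edge").fetchall()
print(labels)  # [('works_at',)]
```

The trade-off: constraint names give you one label per foreign key, checked by the schema, while the generic edge table lets you add new edge types without DDL changes, at the cost of weaker schema guarantees.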