> Say, you want to know [people in SF who eat sushi]....If an application external to a database was executing this, it would do one query to execute the first step. Then execute multiple queries (one query for each result), to figure out what each person eats, picking only those who eat sushi.
In SQL, a query like that can also suffer from "a fan-out problem" and get slow. It's often faster to put a subquery in the FROM clause than in the SELECT clause. It's certainly faster than what many developers do: take the rows of one query in the application and send a new query for each row. For example:
```sql
select
  p.name,
  (
    select max(visit_date)
    from visits v
    where v.person = p.id
  ) as last_visit
from people p
where p.born < '1960-01-01'
```
can be slower than:
```sql
select p.name, v.last_visit
from people p
join (
  select person, max(visit_date) as last_visit
  from visits
  group by person
) v on p.id = v.person
where p.born < '1960-01-01'
```
In the second example, you first form a new table with a subquery over the original one. This is not what a new programmer would try first. The first example, with the subquery in the SELECT clause, is closer to how one naturally thinks about the problem. You would also guess that getting each person's last visit date is more efficient once you know whom to look for (here, only the people born before 1960). But in my experience, it often hasn't been.
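To make the comparison concrete, here is a minimal sketch using Python's built-in `sqlite3` with a small in-memory dataset. The `people` and `visits` tables and their rows are hypothetical, shaped to match the column names in the queries above. It runs the correlated subquery, the derived-table join, and the app-side "one query per row" loop, and checks that all three return the same rows. It says nothing about speed, which depends on the database's planner, indexes, and data size.

```python
import sqlite3

# Hypothetical schema and data matching the columns used in the queries above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, born TEXT);
    CREATE TABLE visits (person INTEGER, visit_date TEXT);
    INSERT INTO people VALUES
        (1, 'Ada', '1950-03-01'),
        (2, 'Bob', '1975-07-12'),
        (3, 'Cy',  '1958-11-30');
    INSERT INTO visits VALUES
        (1, '2020-01-05'), (1, '2021-06-10'),
        (2, '2022-02-02'),
        (3, '2019-09-09');
""")

# Subquery in the SELECT clause: evaluated per matching person.
CORRELATED = """
    SELECT p.name,
           (SELECT max(visit_date) FROM visits v WHERE v.person = p.id) AS last_visit
    FROM people p
    WHERE p.born < '1960-01-01'
"""

# Subquery in the FROM clause: aggregate all visits once, then join.
DERIVED_TABLE = """
    SELECT p.name, v.last_visit
    FROM people p
    JOIN (
        SELECT person, max(visit_date) AS last_visit
        FROM visits
        GROUP BY person
    ) v ON p.id = v.person
    WHERE p.born < '1960-01-01'
"""


def app_side_loop(conn):
    """One query for the people, then one extra query per person (N+1)."""
    rows = []
    people = conn.execute(
        "SELECT id, name FROM people WHERE born < '1960-01-01'"
    ).fetchall()
    for pid, name in people:
        (last,) = conn.execute(
            "SELECT max(visit_date) FROM visits WHERE person = ?", (pid,)
        ).fetchone()
        rows.append((name, last))
    return rows


correlated = sorted(conn.execute(CORRELATED).fetchall())
derived = sorted(conn.execute(DERIVED_TABLE).fetchall())
app_side = sorted(app_side_loop(conn))
print(correlated)  # [('Ada', '2021-06-10'), ('Cy', '2019-09-09')]
```

One difference to keep in mind: the inner join drops people who have no visits at all, while the correlated subquery returns them with a NULL `last_visit`. A `LEFT JOIN` against the derived table keeps them.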
Likewise, with this San Francisco sushi query, I was thinking that if it were SQL, I would (1) get all people in San Francisco, (2) get all people who like sushi, and then (3) join the two to find their intersection. Lo and behold, I then read that this humongous graph database uses the same solution:
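The three steps above can be sketched in SQL, again run through Python's `sqlite3`. The `people(id, name, city)` and `eats(person, food)` tables and their rows are hypothetical. The first query joins two derived tables, one per step. The second expresses the same idea with `INTERSECT` on the IDs.

```python
import sqlite3

# Hypothetical tables: who lives where, and who eats what.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE eats (person INTEGER, food TEXT);
    INSERT INTO people VALUES
        (1, 'Ada', 'SF'), (2, 'Bob', 'SF'), (3, 'Cy', 'NYC');
    INSERT INTO eats VALUES
        (1, 'sushi'), (1, 'pizza'), (2, 'tacos'), (3, 'sushi');
""")

# (1) people in SF, (2) people who eat sushi, (3) join the two sets.
SF_SUSHI = """
    SELECT sf.name
    FROM (SELECT id, name FROM people WHERE city = 'SF') AS sf
    JOIN (SELECT DISTINCT person FROM eats WHERE food = 'sushi') AS sushi
      ON sf.id = sushi.person
    ORDER BY sf.name
"""

# Same intersection, written as a set operation on the IDs.
SF_SUSHI_IDS = """
    SELECT id FROM people WHERE city = 'SF'
    INTERSECT
    SELECT person FROM eats WHERE food = 'sushi'
    ORDER BY 1
"""

names = [name for (name,) in conn.execute(SF_SUSHI)]
ids = [pid for (pid,) in conn.execute(SF_SUSHI_IDS)]
print(names, ids)  # ['Ada'] [1]
```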
> The concepts involved in Dgraph design were novel and solved the join-depth problem.... The first call would find all people who live in SF. The second call would send this list of people and intersect with all the people who eat sushi.
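The "send this list and intersect" step in the quote can be illustrated with a two-pointer intersection of sorted ID lists. This is only a sketch of the general idea, assuming each call returns an ascending, duplicate-free list of person IDs; it is not Dgraph's code, and `LIVES_IN`, `EATS`, and `intersect_sorted` are hypothetical names.

```python
from typing import List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Intersect two ascending, duplicate-free ID lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b, so skip it
        else:
            j += 1  # b[j] cannot appear later in a, so skip it
    return out


# Hypothetical stand-ins for two indexes, each mapping a value to sorted person IDs.
LIVES_IN = {"SF": [2, 5, 8, 13, 21]}
EATS = {"sushi": [1, 5, 13, 34]}

sf_people = LIVES_IN["SF"]                           # first call: everyone in SF
result = intersect_sorted(sf_people, EATS["sushi"])  # second call: intersect with sushi eaters
print(result)  # [5, 13]
```

Because both lists are sorted, each call touches every ID at most once, rather than issuing one follow-up query per person.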
I don't want this to become a flame war between SQL and graph databases. But we see a lot of developers come from SQL to Dgraph because SQL join performance gets worse as the data grows, so devs have to resort to aggressive data denormalization. It's even more of a problem when your entire dataset is on a single machine.