
This reminded me of SQL.

> Say, you want to know [people in SF who eat sushi].... If an application external to a database was executing this, it would do one query to execute the first step. Then execute multiple queries (one query for each result), to figure out what each person eats, picking only those who eat sushi.

A query like that in SQL can also suffer from a fan-out problem and get slow. It's often faster to put subqueries in the From clause than in the Select. It's certainly faster than an app taking the rows of one query and sending a new query for each row, as many developers do. For example:

  select
      p.name,
      (
          select max(visit_date)
          from visits v
          where v.person = p.id
      ) as last_visit
  from people p
  where p.born < '1960-01-01'
can be slower than:

  select p.name, v.last_visit
  from people p
      join (
          select person, max(visit_date) as last_visit
          from visits
          group by person
      ) v on p.id = v.person
  where p.born < '1960-01-01' 
In the second example, you first form a new table through a subquery over the original. This is not what a new programmer would try first; the first example, with the subquery in the Select clause, is closer to the natural train of thought. You would also guess that getting the last visit date of each person is more efficient once you know whom to look for (here, only the people born before 1960). But in my experience, it often hasn't been.
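For completeness, engines with window functions offer a third phrasing that also scans visits only once (a sketch, assuming a dialect with window functions such as Postgres; like the join version, it drops people with no visits at all):

  select name, visit_date as last_visit
  from (
      select
          p.name,
          v.visit_date,
          -- rank each person's visits, newest first
          row_number() over (
              partition by p.id
              order by v.visit_date desc
          ) as rn
      from people p
          join visits v on v.person = p.id
      where p.born < '1960-01-01'
  ) t
  where rn = 1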

Likewise with this San Francisco sushi query: I was thinking that if it were SQL, I would (1) get all people in San Francisco, (2) get all people who like sushi, and then (3) join them to find the intersection. Lo and behold, I then read that this humongous graph database uses the same solution:

> The concepts involved in Dgraph design were novel and solved the join-depth problem.... The first call would find all people who live in SF. The second call would send this list of people and intersect with all the people who eat sushi.
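In SQL terms, that three-step plan might look like this (a sketch; lives_in and eats are hypothetical stand-in tables, and it assumes a dialect with intersect, such as Postgres):

  select p.name
  from people p
  where p.id in (
      -- (1) everyone in San Francisco
      select person from lives_in where city = 'San Francisco'
      intersect
      -- (2) everyone who eats sushi
      select person from eats where food = 'sushi'
  )
  -- (3) the in-list is the intersection; joining back to
  --     people just recovers the names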




Now consider doing this in a distributed setting, with SQL tables split across dozens or hundreds of machines (see Facebook's TAO).


So are there any benefits to Dgraph over SQL/RDBMSs (maturity and a widespread query language) for single-machine setups?


I don't want this to become a flame war between SQL and Graph. But we see a lot of developers come from SQL to Dgraph because SQL's join performance gets worse as the data size grows, so devs have to resort to aggressive data normalizations. That's even more of a problem when your entire dataset is on a single machine.

E.g., Stack Overflow normalizes their data to avoid doing joins: https://archive.org/download/stackexchange
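Concretely, the kind of schema shortcut meant here is keeping a precomputed value on the row so the hot read path never joins (a hypothetical sketch, not the actual Stack Overflow dump schema):

  -- with the join: aggregate answers on every page load
  select q.id, q.title, count(a.id) as answer_count
  from posts q
      left join posts a
          on a.parent_id = q.id and a.post_type = 'answer'
  where q.post_type = 'question'
  group by q.id, q.title
versus the join-free read, where answer_count is a duplicated column that the write path must keep up to date whenever an answer is added or deleted:

  -- without the join: read the duplicated column directly
  select id, title, answer_count
  from posts
  where post_type = 'question'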


Did you mean denormalize?


Yes, I meant denormalize. Thanks.



