
The data gets sharded and can be served by different servers, all talking to each other to respond to queries. This is how v0.2 works, which is what the demo is running right now.

A typical RDF triple is (subject, predicate, object). We shard the data based on predicates, so all the triples for one predicate live on one server. This lets us find, say, the list of friends of X really quickly, with a single lookup.
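To make the predicate-sharding idea concrete, here is a minimal in-memory sketch. It is not DGraph's actual code: the class name `PredicateShardedStore`, the round-robin assignment of predicates to servers, and the nested-dict layout are all assumptions made up for illustration.

```python
class PredicateShardedStore:
    """Toy model: every triple for a given predicate lives on one 'server'."""

    def __init__(self, num_servers):
        self.num_servers = num_servers
        # One dict per simulated server: predicate -> subject -> [objects]
        self.servers = [{} for _ in range(num_servers)]
        # predicate -> server index (hypothetical round-robin placement)
        self.assignment = {}

    def server_for(self, predicate):
        # Assign a server the first time a predicate is seen.
        if predicate not in self.assignment:
            self.assignment[predicate] = len(self.assignment) % self.num_servers
        return self.assignment[predicate]

    def add(self, subject, predicate, obj):
        shard = self.servers[self.server_for(predicate)]
        shard.setdefault(predicate, {}).setdefault(subject, []).append(obj)

    def lookup(self, subject, predicate):
        # A single lookup on a single server answers "friends of X".
        idx = self.assignment.get(predicate)
        if idx is None:
            return []
        return list(self.servers[idx].get(predicate, {}).get(subject, []))


store = PredicateShardedStore(num_servers=2)
store.add("X", "friend", "Y")
store.add("X", "friend", "Z")
store.add("X", "lives_in", "Sydney")
friends_of_x = store.lookup("X", "friend")
```

Because the predicate alone decides the server, `lookup("X", "friend")` never has to fan out across machines.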

I think PageRank should be relatively straightforward for DGraph, provided we have edges in the right direction. We don't automatically generate a reverse edge; it has to be provided if queries need it.




"We don't automatically generate a reverse edge"

Wait, then you don't actually have relationships as first class citizens? Does every "thing" know what is connected to it? I mean, where do you draw the line between graph databases and an object database with one way links?


The problem with automatically generating reverse edges is that it causes data explosion and duplication. For example, if you're adding facts like X -- IS_A --> Human, then automatically generating the reverse would produce Human -- Reverse(IS_A) --> X, which would list all the humans on the planet. Whichever machine serves this list would immediately run into memory issues, and any query using such a relationship would be very slow.

DGraph uses a type schema to understand the relationships an entity can have. Say type A declares relationships R1, R2, and R3; then for any entity of type A you can deduce that it can have R1, R2, and R3. Of course, each entity can be of multiple types (e.g. Tom Hanks is an actor and a director).

This approach is very scalable, because it avoids unnecessary scans over the distributed database to find all the relationships an entity can have. Instead, we use the schema to deduce that information and then hit the right servers to get the data.
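A rough sketch of that schema-based deduction, assuming a plain dictionary schema. The type names, the relationship names (`acted_in`, `directed`) and the function `relationships_for` are hypothetical and are not taken from DGraph.

```python
# Hypothetical schema: type -> relationships that type can have.
SCHEMA = {
    "Actor": ["acted_in"],
    "Director": ["directed"],
}

# Hypothetical type assignments; an entity may have several types.
ENTITY_TYPES = {
    "Tom Hanks": ["Actor", "Director"],
}


def relationships_for(entity):
    """Deduce an entity's possible relationships from its types, without scanning data."""
    rels = []
    for type_name in ENTITY_TYPES.get(entity, []):
        for rel in SCHEMA.get(type_name, []):
            if rel not in rels:  # keep order, drop duplicates across types
                rels.append(rel)
    return rels


tom_rels = relationships_for("Tom Hanks")
```

The deduced list tells the query layer which predicates to ask for, so only the servers holding those predicates need to be contacted.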


Wait, I don't understand. If I create a relationship "User 1" -[:LIKES]->"Chocolate Ice Cream", does "Chocolate Ice Cream" know which users liked it or not?


My takeaway is that "Chocolate Ice Cream" would not know about "User 1" unless an explicit relationship "Chocolate Ice Cream" -[:liked_by]->"User 1" is created.
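Here is a toy model of that reading, assuming edges are stored only in the direction they were written. The `OneWayGraph` class and its methods are invented for illustration and say nothing about how DGraph stores edges internally.

```python
class OneWayGraph:
    """Edges are stored only in the direction given; no reverse edge is generated."""

    def __init__(self):
        self.edges = {}  # (subject, predicate) -> [objects]

    def add_edge(self, subject, predicate, obj):
        self.edges.setdefault((subject, predicate), []).append(obj)

    def neighbors(self, subject, predicate):
        return list(self.edges.get((subject, predicate), []))


g = OneWayGraph()
g.add_edge("User 1", "LIKES", "Chocolate Ice Cream")

# Only the forward edge exists so far.
liked_by_before = g.neighbors("Chocolate Ice Cream", "liked_by")

# The reverse direction has to be written explicitly if queries need it.
g.add_edge("Chocolate Ice Cream", "liked_by", "User 1")
liked_by_after = g.neighbors("Chocolate Ice Cream", "liked_by")
```

In this model `liked_by_before` is empty and `liked_by_after` contains "User 1", which matches the behaviour described above.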


I'm guessing not.

This isn't unusual - it's somewhat analogous to the derived (inferred) relationship thing in RDF-style graph databases (e.g., Sydney is-in Australia, Australia is-in Oceania, therefore Sydney is-in Oceania).

This sounds like a great idea, but in practice doesn't always work so well. You end up with an explosion of relations, some of which are completely useless.
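To see how quickly derived relations pile up, here is a small sketch that materializes every inferred is-in fact with a naive transitive closure. The function name and the place names beyond the Sydney example are made up, and a real RDF store would use something smarter than this quadratic loop.

```python
def transitive_closure(pairs):
    """Return the given (a, b) facts plus every fact inferable by chaining them."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a is-in b and b is-in d implies a is-in d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


stated = {("Sydney", "Australia"), ("Australia", "Oceania")}
derived = transitive_closure(stated)

# A chain of 5 nested places: 4 stated facts.
chain = [("p0", "p1"), ("p1", "p2"), ("p2", "p3"), ("p3", "p4")]
chain_closure = transitive_closure(chain)
```

The two stated facts become three, and the four-fact chain becomes ten. A chain of n places yields n(n-1)/2 relations once everything is materialized, most of which nobody will ever query.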

I'm not opposed to this decision being a choice.


Since you mentioned RDF - are you planning SPARQL?

Not a huge fan of SPARQL, but OTOH I don't know GraphQL at all so I can't comment sensibly on a comparison.


Up until v1.0, we're only looking at GraphQL. After that, we'll consider adding other languages depending upon the demand.



