Neither Cayley nor TitanDB is a native graph database. In fact, Cayley supports many storage engines, including MongoDB. Both are graph layers, with the actual data maintenance done by the real database underneath. That makes them easier to build, but it costs them query latency.
DGraph, OTOH, is a native graph database. We do use RocksDB, but the data distribution and maintenance are done by us. It's optimized to keep the number of network calls low: linear in the complexity of the query, not in the number of results. This is of incredible value when you're running real-time queries and serving results directly to end users. Query latency thus isn't much affected by a high fan-out of intermediate results; it stays low while throughput stays high.
In fact, the entire HN traffic is being served by one GCE n1-standard-4 instance right now, using all 4 cores really well :-).
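Roughly, the idea looks like this (an illustrative sketch, not our actual code): because all edges for a predicate are served together, each level of a traversal is one batched call, regardless of how many UIDs flow through it. The fetchNeighborsBatch helper here is a stand-in for that network call, stubbed with an in-memory map:

    package main

    import "fmt"

    // Illustrative sketch, not our actual code: one batched call per
    // query level, no matter how many UIDs flow through each level.
    var edges = map[string]map[uint64][]uint64{
        "follows": {1: {2, 3}, 2: {4}, 3: {4, 5}},
    }

    // fetchNeighborsBatch stands in for a single network call to the
    // server holding `pred`; here it's stubbed with the in-memory map.
    func fetchNeighborsBatch(pred string, uids []uint64) []uint64 {
        var out []uint64
        for _, u := range uids {
            out = append(out, edges[pred][u]...)
        }
        return out
    }

    func main() {
        uids := []uint64{1}
        calls := 0
        for _, pred := range []string{"follows", "follows"} {
            uids = fetchNeighborsBatch(pred, uids) // one call per level
            calls++
        }
        fmt.Println(uids, "reached in", calls, "calls") // calls == query depth
    }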
1. I wouldn't call anything before v1.0 production ready.
2. API docs? There's basically only one endpoint, /query; all queries go through it (see the example after this list). There's a wiki page with some test queries to get you started.
4. It's truly distributed. The data is actually sharded, with each shard holding part of the data and served by a separate instance. The bulk loader instructions generate 3 instances.
5. To keep queries, data storage and data transfer efficient, we assign a uint64 ID to every entity. UID assignment is that operation (a toy illustration follows this list).
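For point 2, here's a minimal sketch of hitting /query from Go. The host/port are placeholders for wherever your instance runs, and the query body is just an example in the GraphQL-like syntax; substitute predicates and IDs from your own dataset:

    package main

    import (
        "fmt"
        "io/ioutil"
        "net/http"
        "strings"
    )

    func main() {
        // Illustrative query; the predicates and the _xid_ value are
        // placeholders, substitute ones from your own dataset.
        q := `{
            me(_xid_: m.06pj8) {
                type.object.name.en
                film.director.film {
                    film.film.name.en
                }
            }
        }`
        // Assumes a local instance listening on 8080; change as needed.
        resp, err := http.Post("http://localhost:8080/query", "text/plain",
            strings.NewReader(q))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, _ := ioutil.ReadAll(resp.Body)
        fmt.Println(string(body))
    }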
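And for point 5, a toy version of what UID assignment means (not our actual implementation): the first time an external ID is seen, it gets the next uint64; after that, the same one is returned.

    package main

    import (
        "fmt"
        "sync"
    )

    // Toy version of UID assignment, not our actual implementation:
    // hand out the next uint64 the first time an external ID (XID) is
    // seen, and return the same one ever after.
    type UIDAssigner struct {
        mu   sync.Mutex
        next uint64
        xids map[string]uint64
    }

    func (a *UIDAssigner) AssignUID(xid string) uint64 {
        a.mu.Lock()
        defer a.mu.Unlock()
        if uid, ok := a.xids[xid]; ok {
            return uid
        }
        a.next++
        a.xids[xid] = a.next
        return a.next
    }

    func main() {
        a := &UIDAssigner{xids: make(map[string]uint64)}
        fmt.Println(a.AssignUID("m.06pj8")) // 1
        fmt.Println(a.AssignUID("m.0bxtg")) // 2
        fmt.Println(a.AssignUID("m.06pj8")) // 1 again
    }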
I take it this is a self-funded project? Good luck, and I hope you hit production soon. Do you have a roadmap for us to keep track of?
A few questions about the storage layer:
Does DGraph support replication too, in case a node fails?
Given your description, I take it you've implemented a custom data distribution protocol on top of RocksDB? Do you have plans to extract this 'distributed RocksDB' out into its own project? How would something like that compare to actordb.com and/or rqlite?
We have funding now; it will be made public soon. So we have enough to keep us going for a bit and to focus solely on the engineering challenges.
DGraph will support high availability, which means all shards will be replicated 3x across servers, so if one server fails, its shards remain available for queries and mutations. In addition, shards will be moved to other servers so that the replication factor stays the same (a rough sketch of that follows). We aim to achieve this using the Raft protocol (via etcd's implementation) by version 0.4.
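As an illustrative sketch of the rebalancing idea (not our implementation; in the real system Raft handles consensus on membership and mutations): each shard tracks its replicas, and when a server dies, replacements are picked from live servers until the factor is back at 3.

    package main

    import "fmt"

    const replicationFactor = 3

    // Illustrative only: drop dead servers from each shard's replica
    // list, then top the list back up to the replication factor from
    // the live servers.
    func rebalance(replicas map[string][]string, live []string) {
        for shard, servers := range replicas {
            kept := servers[:0]
            for _, s := range servers {
                if contains(live, s) {
                    kept = append(kept, s)
                }
            }
            for _, s := range live {
                if len(kept) >= replicationFactor {
                    break
                }
                if !contains(kept, s) {
                    kept = append(kept, s)
                }
            }
            replicas[shard] = kept
        }
    }

    func contains(list []string, x string) bool {
        for _, v := range list {
            if v == x {
                return true
            }
        }
        return false
    }

    func main() {
        replicas := map[string][]string{
            "shard-0": {"srv1", "srv2", "srv3"},
        }
        rebalance(replicas, []string{"srv1", "srv3", "srv4"}) // srv2 died
        fmt.Println(replicas) // shard-0 re-replicated on a live server
    }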
RocksDB is just a medium for us to have something between the database and the disk. All the data arrangement, handling, movement etc. happens above RocksDB. So, no, there's no "distributed RocksDB" here.
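To give a flavor of what "above RocksDB" means: the KV store only sees flat keys, so the graph layer packs (predicate, subject UID) into one key, with the posting list of object UIDs as the value. The encoding below is simplified, not our real one:

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // Simplified, not our real encoding: pack (predicate, subject UID)
    // into a single flat key for the KV store underneath.
    func key(predicate string, uid uint64) []byte {
        buf := make([]byte, 0, len(predicate)+1+8)
        buf = append(buf, predicate...)
        buf = append(buf, 0x00) // separator between predicate and UID
        var b [8]byte
        binary.BigEndian.PutUint64(b[:], uid)
        return append(buf, b[:]...)
    }

    func main() {
        // In the real store this key would map to a posting list value.
        fmt.Printf("%q\n", key("film.director.film", 42))
    }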
The demo doesn't work for me: a CORS violation while trying to access http://dgraph.xyz/query. (EDIT: accessing it manually sends me to a Cloudflare(?) captcha; that might be messing with the query.)