Graph database reinvented: Dgraph gets $11.5M to pursue unique, opinionated path (zdnet.com)
189 points by bryanrasmussen on July 31, 2019 | 93 comments



Congrats on funding round, we need more options in this space. I'm a Neo user today, and I've said before that while I don't think most people will switch to graph databases, at some point I do think most new projects will use them. I chose Neo precisely because of py2neo.

The decision to adapt GraphQL instead of supporting Cypher, what would you say were the big trade offs?

As a startup, your price point is in the 20k/y ballpark, but GrapheneDB/Neo comes in a lot lower. What would I get for paying that much more?

I find that the audience for something that operates at the level of abstraction that a graph does tends to be different than the user who makes decisions based on lower-level features. e.g. design level problems vs. optimization problems.

Would you say more that you can get Mongo and Redis (+redisgraph) users to switch to Dgraph, or instead more that the main customers for graph databases are still in school right now and deciding what tech skills will underpin their careers?

Fascinating space and looking forward to trying the product.


(Dgraph founder here)

> The decision to adapt GraphQL instead of supporting Cypher, what would you say were the big trade offs?

Cypher was built around SQL, with the idea that people are used to SQL. Therefore, by having something similar, people would find it easier to adapt. Many years later, I'd say it still hasn't gained as much popularity beyond Neo4j as sometimes projected.

We bet on GraphQL in 2015 because it was easy to understand, worked with JSON, and resulted in sub-graphs, while Cypher and Gremlin return lists of things. One can go from a subgraph to lists, but not the other way around, because the relationships are lost.

Within three years of its draft spec, GraphQL has taken the developer world by storm. Developers find it easy to wrap their heads around and enjoy working with it. In fact, GraphQL+- (our language) has become one of the USPs of Dgraph.
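Roughly, with our Go client (dgo), a query and its nested result look something like this (a minimal sketch; the name/friend schema and addresses are made up for illustration):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo"
	"github.com/dgraph-io/dgo/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// The result keeps the shape of the query: friends nest under each person,
	// so relationships are preserved instead of being flattened into rows.
	q := `{
	  people(func: has(name)) {
	    name
	    friend {
	      name
	    }
	  }
	}`

	ctx := context.Background()
	txn := dg.NewTxn()
	defer txn.Discard(ctx)

	resp, err := txn.Query(ctx, q)
	if err != nil {
		log.Fatal(err)
	}
	// e.g. {"people":[{"name":"HN user","friend":[{"name":"..."}]}]}
	fmt.Println(string(resp.Json))
}
```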

> Would you say more that you can get Mongo and Redis (+redisgraph) users to switch to Dgraph, or instead more that the main customers for graph databases are still in school right now and deciding what tech skills will underpin their careers?

We have heard from many users who would have used MongoDB or SQL if not for Dgraph. With Mongo, you get scalability, but at the cost of query complexity and transactionality (/correctness). Keys and docs are nice, but they don't let you use relationships to interact with the wider dataset. With Dgraph, you get scalability while also gaining much richer and more sophisticated querying, transactionality (distributed txns), and correctness, without losing query performance. My blog post says more about this.

Redis is a cache, so people are not directly comparing it against Dgraph, a persistence-first database. One could use Redis to perhaps store Dgraph results.


Hey Manish! Congratulations to you and your team on the round! It's a phenomenal achievement.

You mentioned transactions... I work for MongoDB, so I just wanted to add that MongoDB has had ACID transactions since version 4.0. Distributed transactions will be included in the upcoming 4.2 release. There's more info on distributed transactions in the 4.2 release here if you're interested: https://docs.mongodb.com/master/core/transactions/

By the way, it's awesome to see you're using Go. I just got into it recently and it's such a nice language to work with.


Good to hear Mongo will have distributed transactions. Go is awesome. I picked up a t-shirt from Mongo's booth at GopherCon, nice stuff!


Oh, I was just at GopherCon! I was at the Mongo booth.


If anyone on your team is interested, we are working on a new plain text graph database called TreeBase: https://treenotation.org/treeBase/. Perhaps there are ideas you all could take away that would help make your products better.

I've been following Dgraph for a while and your GraphQL+- iteration on GraphQL. Clearly you folks understand the semantics and how to build a database that can handle massive scale.

Our focus is on much smaller databases (at the moment, nothing larger than what would be a few million rows), but I think the syntax we have makes the whole thing easier to understand and makes data more accessible to users. Maybe this is completely irrelevant, but if there might be something of value to your users here, feel free to have someone get in touch: yunits@hawaii.edu. (we are trying to get help advancing the state of the art in DB to make cancer and medical research easier)


> Redis is a cache

Redis is much more than a cache, you can model graph data structures with Redis primitives, and the redisgraph module compares directly against Dgraph.
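For example (a toy sketch using the redigo Go client; the "edges:<node>" key scheme and names are made up), you can store adjacency sets and walk them:

```go
package main

import (
	"fmt"
	"log"

	"github.com/gomodule/redigo/redis"
)

func main() {
	c, err := redis.Dial("tcp", "localhost:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Add directed edges alice->bob and alice->carol as a set.
	if _, err := c.Do("SADD", "edges:alice", "bob", "carol"); err != nil {
		log.Fatal(err)
	}

	// "Traverse" one hop by reading the adjacency set.
	neighbors, err := redis.Strings(c.Do("SMEMBERS", "edges:alice"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(neighbors) // [bob carol] (order not guaranteed)
}
```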


Yeah, sorry for missing Redisgraph.

The entire graph is stored in memory and only spans one server. I personally am not sure what happens when that server goes down. How many minutes worth of data is lost? What's the replication strategy for another server to take over the traffic? And what about transactions and correctness?

Those things matter a lot to Dgraph users who are storing and serving their sources of truth. Horizontal scalability, persistence, correctness, and avoiding downtime with consistent replication have been cornerstones of Dgraph's design.


It's a bit disappointing that you would talk down a competitor without knowing much about them, and then, to add insult to injury, speculate about all kinds of things without having facts, to diss them some more.


I'm more disappointed that a founder answering questions has somebody jumping down their throat for a perceived slight in a tiny portion of their reply. But that's HN.


You're misinterpreting me. I wasn't talking them down. And I definitely don't think of them as a competitor.

My point was that the questions I mentioned above are important to Dgraph users, and I don't think Redis even projects itself as a persistence-first DB. I might be wrong. Either way, our audiences are quite different, at least when it comes to enterprises.

I know the founders of Redis Labs, met them as well and they're awesome people.


Thanks! We're very happy to finally share the announcements.

We bet on GraphQL very early on, right after the initial spec came out, as their language was surprisingly similar to what we were working on. We have not looked back since, as GraphQL's growth has been pretty impressive, as has the creation of the GraphQL Foundation.

We are considering supporting Cypher because many people are actually quite used to it, it's an open specification, and it would also help us make more "apples to apples" comparisons with other databases (hi, Neo4j!).

For now, though, we're focusing on providing native support for pure GraphQL (not only GraphQL+-), but we don't rule out adding Cypher later on.

As for the pricing, this is for a cluster with 2 nodes (1 Alpha + 1 Zero) and includes support and access to enterprise features. This actually often ends up being more affordable than our competition (or so our customers have told us).

We believe it's time for graph DBs to become the default storage system, in the same way people consider SQL and NoSQL options nowadays, so our target audience is much wider than just people interested in graph databases.

Lastly, regarding Mongo and Redis (+redisgraph): I was not aware of this, so I'll check it out soon and then I can give you my opinion :)

Happy to continue the conversation on slack.dgraph.io or discuss.dgraph.io!


FWIW, I would LOVE to see openCypher supported on dgraph! I find it far more expressive and easier to work with for many types of traversals and logical operations.


Very cool, thank you. I'm also eyeing JanusGraph for my next project, mainly because I'm interested in the Compose platform, and it's nice to have competing approaches. Will watch closely.


> As a startup, your price point is in the 20k/y ballpark, but GrapheneDB/Neo comes in a lot lower. What would I get for paying that much more?

Should have tackled this in my other post. But, let me approach this question from two angles.

Is it worth paying $20K for any DB or DB support? If it would save you 1/10th of an engineer per year, it immediately becomes worth it. That is: can you avoid roughly 5 weeks of one SWE's time by using a DB designed to better suit your dataset? If the answer is yes (and in most cases it is), then that price is absolutely worth it. See my blog post about how much money it must be costing big companies to build their own graph layers.

The second part is: is Dgraph worth paying for compared to Neo or others? Note that the price is for our enterprise features and support, not for using the DB itself. Many companies run a 6-node or 12-node distributed/replicated Dgraph cluster, and we only learn about it much later, when they're close to pushing it into production and need support. They don't need to pay for it; the distributed/replicated/transactional architecture of Dgraph is all open source.

How much would it cost to run a distributed/replicated setup of another graph DB? Is it even possible, and can it execute and perform well? And, when you add support to that, what's the cost?

I have no doubt that, once you factor in scalability, Dgraph comes out much cheaper.


Neo4j basically only runs in single-node mode with optional replication for failover. You can't really do distributed graph queries without losing all the advantages of a graph DB (namely, that following links is constant-time).

I haven't looked at Dgraph much, but if they are trying to store the graph in a distributed manner then the use cases will be different.

From experience, managing GrapheneDB/Neo4j takes much less than 1/10th of an engineer per year, so unless your data doesn't fit on one box, you'd be better off with Neo4j.


Isn't this the trap that most startups fall into? "We don't have the money but we do have the engineers, let's build our own in-house solution for cheaper."

20k per year seems like an incredibly reasonable price for a managed distributed cloud store, especially when considering maintenance cost.


(Dgraph author here) We see that over and over again when talking to companies. In fact, my blog post[1] talks about that at length.

[1]: https://blog.dgraph.io/post/how-dgraph-labs-raised-series-a/


> If it would save you 1/10th of an engineer per year, it becomes immediately worth.

This ignores second-order effects.

Is it worth limiting yourself to an ecosystem with only users who are ok paying $20k per year, with "open source" development but all development activity done by one company that is trying to make a profit off something unproven, and then tie your core business data to it? Maybe. Not so clear cut though.


Hi there, I lead product at Dgraph and I'll be happy to answer any questions you might have.

We're very happy we're finally able to share our fundraise, and this is just the beginning of many more features and improvements!

PS: we're hiring ;) https://dgraph.io/careers


I work with hypergraphs, where edges have an arbitrary number of nodes. I'm quickly looking through your documentation, Discuss, and GitHub issues/roadmap to try and find out if edges may have more than two nodes.

This has been asked about already: https://github.com/dgraph-io/dgraph/issues/1#issuecomment-40...

While we have your attention on HN, could you comment? (Sorry if I missed this information elsewhere.)

Thank you!


Hi Richard,

What applications/problems are you solving using hypergraphs? There is definitely a dearth of high performance hypergraph processing engines/systems.


(Dgraph founder here) Dgraph's edges only have two nodes, sorry.


In what scenario would you need an edge with more than two nodes, rather than just making a node to represent the connection?


Hi, I'm looking for a graph-friendly DB that can sync well between many types of clients (desktop, mobile) and a server, similar to CouchDB/PouchDB. I found [1] below, which seems related, but there was no further discussion. Could you give some more information about that?

Thanks, and congrats on the new funding!!

[1]: https://discuss.dgraph.io/t/data-sync-between-clients-server...


I think Dgraph is not a great choice for this right now, as it seems you're looking to be able to use those different replicas offline too?

We use Raft to reach consensus on transactions, so the nodes need to be able to communicate with each other.

I'm definitely adding this as a feature request though, I would love to chat with you more about your requirements if you're interested (francesc@dgraph.io).

Also, have you considered CRDTs? (https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...)
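As a rough illustration of the idea (a toy sketch, not anything Dgraph ships), here's a grow-only counter, one of the simplest CRDTs: each replica only increments its own slot, and merging takes the element-wise max, so replicas that were offline converge once they exchange state:

```go
package main

import "fmt"

// GCounter maps a replica ID to that replica's local increment count.
type GCounter map[string]int

// Inc records one increment on the given replica.
func (g GCounter) Inc(replica string) { g[replica]++ }

// Merge folds in another replica's state by taking the per-slot maximum.
func (g GCounter) Merge(other GCounter) {
	for id, n := range other {
		if n > g[id] {
			g[id] = n
		}
	}
}

// Value is the total across all replicas.
func (g GCounter) Value() int {
	total := 0
	for _, n := range g {
		total += n
	}
	return total
}

func main() {
	desktop, mobile := GCounter{}, GCounter{}
	desktop.Inc("desktop")
	mobile.Inc("mobile")
	mobile.Inc("mobile")
	desktop.Merge(mobile) // sync after being offline
	fmt.Println(desktop.Value()) // 3
}
```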


Thanks for the confirmation, and for adding it as a feature request (sorry for the slightly late response). So it will be more practical for me to rethink my requirements first. I'll send you an email if there are further questions!


Serious question: what are some practical examples of graph-database-backed features providing significant value to “common” software applications (e-commerce, CRUD, CMS, CRM, etc.)? I lack a strong understanding of graph databases and their use cases, and tend to learn best by seeing them in use in domains that I do understand.


I'm using a graph for a collaboration platform where the back end benefits from dynamic relationships between objects: someone "owns" or "shares" something, things "comprise" other things, etc.

Instead of tacking an RBAC system onto a flat document management system, I just designed my data model as a set of objects and different types of relationships. It saves me the trouble of extra metadata and normalization, and access control just becomes a traversal instead of having to manage a separate policy store.
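To illustrate what I mean by access control as a traversal (a toy, self-contained sketch; the edge types and names are made up, not my actual schema):

```go
package main

import "fmt"

type edge struct{ rel, to string }

// A tiny in-memory graph: users, teams, and documents connected by typed edges.
var graph = map[string][]edge{
	"alice":  {{"memberOf", "team-x"}},
	"team-x": {{"owns", "doc-42"}},
	"bob":    {{"shares", "doc-42"}},
}

// canAccess walks the graph from a subject and reports whether the target
// document is reachable over permission-granting relationship types.
func canAccess(subject, doc string) bool {
	allowed := map[string]bool{"owns": true, "shares": true, "memberOf": true}
	seen := map[string]bool{subject: true}
	queue := []string{subject}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, e := range graph[cur] {
			if !allowed[e.rel] || seen[e.to] {
				continue
			}
			if e.to == doc {
				return true
			}
			seen[e.to] = true
			queue = append(queue, e.to)
		}
	}
	return false
}

func main() {
	fmt.Println(canAccess("alice", "doc-42")) // true, via team-x
	fmt.Println(canAccess("carol", "doc-42")) // false, no path
}
```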

Anything that requires collaboration, sharing, user profiles, consent, etc.


For e-commerce, a big one is recommendations. But graphs are already known for that.

We see a lot of usage of Dgraph among developers who are building apps with social data. Or, platforms where they need to unite multiple different data sources from multiple relational DB silos.

To debunk the myth that Graph DBs are special-purpose, we've built two projects ourselves (open source) which put Stack Overflow and Twitter data on Dgraph and serve it. The links are in the blog post under Use cases.

https://blog.dgraph.io/post/how-dgraph-labs-raised-series-a/


The highest value use-cases of graph databases tend to not be customer-facing applications. Think about things which don't have a simple question and answer, but instead require deeper investigation into the connections of things. Things like: customer/employee 360, intelligence (business and government), compliance / reg tech, anti-fraud, anti-money laundering. In terms of customer-facing applications, then I guess recommendation engines is the main one.


In addition to e-commerce as mentioned above, any sort of 'related content' discovery engine to surface additional content to a user based on what they are viewing. A la YouTube related videos.


Using neo4j in a commercial app for healthcare data as a primary store, using cypher and graphql


Do you use the Neo4j GraphQL integration? [1]

Or some other approach for building your GraphQL layer on top of Neo4j?

[1] https://grandstack.io/docs/neo4j-graphql-js-quickstart.html


I would love to chat with you about your needs and requirements in the healthcare field!

If you're willing to share some info, my email is francesc@dgraph.io!


The blog post announcing this is quite informative: https://blog.dgraph.io/post/how-dgraph-labs-raised-series-a/


The post states "Most other graph databases or layers are built as sidekicks."

What specifically falls into that? As someone new to the space, I'd like to understand if he is referring to things like Neo4j, ArangoDB, etc..


Neo4j, at least as of about a year ago, was seriously lacking in the constraints and guarantees needed to use it as a database of record. Performance-wise, it's fundamentally built to perform well at a small set of specific graph operations, while more normal DB operations, or even graph things plus fairly ordinary data fetching alongside them, were necessarily anywhere from so-so to terrible. Looking at how N4j stores its data is enlightening. It's also a huge memory hog (Java, plus leaning hard on cached data in memory to keep performance tolerable, due to the aforementioned trade-offs).

Didn't stop N4j's marketing folks from trying to sell it as suitable for basically any purpose. Gotta get that unicorn growth. Very early-MongoDB in that regard. "No, really, it's suitable for whatever you're doing—probably the best for it, in fact! Uh, what is it you're doing, again?"

That one, at least, is best suited to assisting some other, primary DB, for a certain narrow set of tasks.


Most graph databases are being used as secondary systems, often as an index for other databases, etc. We believe this is due to a lack of features or performance at scale.

Our goal at Dgraph is to be used as main storage similarly to how people use Postgres or Mongo.

Happy to ask Manish (the author of the post and CEO of Dgraph) for more details!


Congrats on the funding! I've been looking into Dgraph (and also played a bit with Badger), as well as other graph databases as a way to store chronographic event data, while enabling rich relationships between the observed artifacts belonging to each event. The problem is that the solutions I find seem an ugly hack compared to a relational solution. Can you point me to any specific Dgraph documentation or case studies for these kinds of workloads?


Let's chat! We're working on improving our docs specifically regarding the data modeling aspects.

If your use case is open source we could even use it as one of our case studies :)

francesc@dgraph.io


By chronological event data, you mean timeseries data? Dgraph can be used for storing that, though it's not specifically designed to store data that "flat".

It should work conceptually. At least, for smaller datasets, it should be alright (gigabytes or something), but for bigger datasets (terabytes), I think a specific TS DB would make more sense.

However, you can take the aggregations from there and store that along with relationships into Dgraph. That'd be a perfect fit.


fyi - We built a product around a way to store chronographic data in a graph database, capable of handling any resource, relationship, property, and/or event detail you can throw at it. It's called IBM Agile Service Manager.

Take a look, feel free to reach out if you'd like to know more:

Update from our technical team on the latest version: https://www.linkedin.com/pulse/new-ibm-asm-v115-some-paralle...

Black Friday with IBM Agile Service Manager (video from our technical team): https://www.youtube.com/watch?v=lJGVAJU6qp8


Generally, I think of graph DBs as being fast-read, slow-write - does Dgraph have this issue? This is actually why I don't think of graph DBs as the source-of-truth DB.


Dgraph's writes are actually very fast. The one-time bulk loader loads at millions of records per second. With recent optimizations [1], the live loader can load 21M records in 5 minutes without indices and in 20 minutes with tons of indices. Note that every live data load also does WAL, consensus, and disk syncs before the write call finishes, to ensure crash resilience.

[1]: https://github.com/dgraph-io/dgraph/commit/d697ca0898f0ac951...


Their non-standard GraphQL query language is a huge problem for adoption: https://github.com/dgraph-io/dgraph/issues/933

Cloud managed Dgraph instances would also be nice.


We're working on that for sure. ETA end-Q3 2019.


Working on standardising dgraph's graphql? Or working on a managed cloud version?


GraphQL+- will still be a thing, since GraphQL doesn't provide all of the things we need to manage a database.

That said, we're working on pure GraphQL support, which will help with any kind of integration with other GraphQL projects. This is really exciting and will hopefully help the many people who do not want to spend their time learning Cypher or Gremlin and already know GraphQL.

Another project we're working on is offering Dgraph as a service. We're currently doing this with some of our customers, so we have the expertise and trust in our product necessary to run this service at scale. The ETA for this is not as clear, but I do expect to have a private alpha by the end of the year.

I hope that helps, if you have any other questions feel free to join us on https://slack.dgraph.io or https://discuss.dgraph.io, or reach me directly on francesc@dgraph.io


From one of the other comments here: Adding support for standard GraphQL


Can anyone recommend readings on the trade-offs between graph databases vs key-value stores (whether it's column-based Cassandra, or key-value based Aerospike, etc.) and traditional ACID-compliant DBs (PostgreSQL)?


Depends how much time you have...

Fifteen minutes: “How to Choose a Database” by Ben Anderson (https://www.ibm.com/cloud/blog/how-to-choose-a-database-on-i...)

Three hours: Jepsen analyses of distributed systems safety. Kyle tests software ranging across the database spectrum.

One week: Designing Data-Intensive Applications by Martin Kleppmann.

Disclaimer: I work with Ben and think he takes a really nice tack on this subject, though it may be orthogonal to your immediate question regarding trade-offs.


Thanks for sharing these, cool handle name btw :)


The line isn't drawn between graph/kv and acid compliance. Dgraph claims to be both a graph database and ACID.


On the topic of ACID, Dgraph is the only graph DB (AFAIK) which has gone through Jepsen testing for its distributed transactional ability. Note that Dgraph passes all of the tests in the latest releases, and provides MVCC and snapshot isolation with linearizable reads.

http://jepsen.io/analyses/dgraph-1-0-2


FaunaDB passed. It also has GraphQL.


quoted from http://jepsen.io/ "We analyzed Dgraph, a distributed graph database, and identified numerous deadlocks, crashes, and consistency violations, including the loss and corruption of records, even in healthy clusters. Dgraph addressed many of these issues during our collaboration, and continues to work on remaining problems."


Yes, we fixed all those issues, and we're currently rerunning all of the Jepsen tests on our new release coming up in a couple of weeks.

Working with Kyle is always a pleasure, so we definitely don't rule out having him take another look at our database in a bit - once there have been enough changes to it that our trust in the project is not where it should be.


  > ...they showed Dgraph to be 10 times or more faster than other options (naming names, which we won't), there is nothing to show for this at this point. Jain said Dgraph may release some benchmarks in the future...

It is pretty dubious to make such claims without benchmarks.

I'm a competitor, also open source and VC-backed, and there is a long line of databases publishing benchmarks for at least baseline tests:

- https://redis.io/topics/benchmarks

- https://gun.eco/docs/Performance (mine. Also see https://www.youtube.com/watch?v=x_WqBuEA7s8 for benchmark against 100M+ records/day, 100GB+ data)

- https://www.datastax.com/nosql-databases/benchmarks-cassandr...

- https://www.arangodb.com/performance/

etc.

Please please please don't hype, just publish results, even if they're basic tests.


Hi there,

We are not hyping; we're blogging about our fundraising success while working on our new release, which will include a series of benchmarks we'll be happy to share with all the details, obviously including source code and hardware setup.

Until then, feel free to get in touch with us if you have any specific questions or need help with performance analysis!


Thanks for having a Docker Swarm HA deployment recipe!! https://docs.dgraph.io/deploy/#using-docker-swarm

Quick question - I'm concerned about the underlying storage here. You don't seem to be using any exotic filesystem, so what's really happening? What's the fault-tolerance story in case a node dies? If I add a new node to the swarm, will it automatically recover? Any way of debugging, etc.?

What's the rule of thumb for the number of nodes, etc.?

Also, I'm not sure if you noticed, but the name of one of your components, "Dgraph Alpha", is poorly chosen. In case you weren't aware, warning bells start ringing in devops teams when they pattern-match anything starting with "alpha".


I’ve looked into graph databases a few times. I feel like there’s a use case I’d love to apply it towards more seriously but I always stop after not fundamentally “getting” the “gains” out of it.

I’m doing quite a bit of NLP and similar conceptual structures of edges with unstructured/structured relationships to content, and I’m always wondering if there is some boon to utilizing a graph.

I guess I lack some success stories where I can see the wins.

I get the sense that there was an earlier push for graph, like NoSQL back in the day - it has its specific cases, but it’s not massively used for general use cases. And a newer sense that there’s emerging thought leadership on doing graph DBs in a new way that is more general.

I love new tech! I just want to know how to use it for the unknown unknowns :).


Congrats on the round, and well done - I am quite impressed by Dgraph so far.

Big question for the founders: how are you going to fend off AWS when you will get bigger? (see what's happening with Elastic, MongoDB, etc)

Also, what prevents you from changing the license down the road?


AWS has Neptune as their managed graph database so it seems unlikely they will fork Dgraph. Azure and GCP could, though.

I believe Dgraph played a little with their licensing but the community's reaction was negative so they reverted back to something somewhat risky for them.

Let's hope they release their managed service quickly and garner a good following!

Observing Dgraph and ArangoDB closely...


> Big question for the founders: how are you going to fend off AWS when you will get bigger? (see what's happening with Elastic, MongoDB, etc)

Just judging by the stock prices, at approximately $100 and $150 today, Elastic and Mongo stock are doing well. I don't see them going down as badly as the media projects.

I think our current license is pretty good. Can't predict the far future, but at least in the near future, I don't see a reason to change it. It's not something we're contemplating right now.


I don't want to sound arrogant, but I think you are underestimating how important this topic will be in the coming years, especially if you are successful and your company grows significantly.

I know the space really, really well, and worked at AWS from 2008 to 2014. Most large enterprise customers are worried about this, especially because it might mean that the supporting company (MongoDB, Elastic, etc.) will not have an easy way to survive in the future.

Right now you are capturing the long tail of the market, and you can grow a lot just with that. But eventually you will need to get enterprise customers, and that's when this topic will surface.

I don't want to be an alarmist, just saying that thinking about this ahead of time might be important for your company.

Tangentially, I am bullish on Elastic, less on MongoDB. My 2 cents.


I am curious about the language choice of Go. Why not C or C++? Apologies if this was already answered.


I'd love to talk about this at length. Having built distributed systems in Google Web Search with C++ for over 6 years, I still enjoy the manual memory management model more than a GC-based model. Identifying and fixing memory leaks is easier, because that's in your control as a language user. Making a GC work better is beyond your control. In fact, that'd be the biggest gripe I have with Go. I wish Go had a manual memory control mode; I'd take it in a heartbeat.

Alright, now that I've criticized Go, time for why I chose it. Go code is simply more manageable and readable over time. Tools like gofmt are habit-forming. Go does memory management, but doesn't put everything on the heap. You still get a lot of control over pointers, something that other GC-based languages hide from the user.

Go tooling is the best I've seen and closely replicates what we had at Google. Go profiles are almost copies of what Google had, and the fact that they're part of the language is incredible.

Go allows normal devs to run thousands of goroutines, utilizing all the cores of the server well. That is just not possible with other languages, and not nearly as simple with C++.
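For a sense of what I mean, spinning up thousands of goroutines is a couple of lines of ordinary Go (a toy example, obviously; the "work" here is just a counter increment):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var wg sync.WaitGroup
	var done int64

	// Launch 10,000 goroutines; the runtime multiplexes them onto all cores.
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1) // stand-in for real per-request work
		}()
	}

	wg.Wait()
	fmt.Println("goroutines finished:", done)
}
```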

Go is a DELIGHT to work with as a programming language. If Go weren't around, I'd be writing Dgraph in C++. Go is perhaps slightly slower on a per-thread basis, but it more than makes up for it with easy to build concurrent systems.

In fact, both Dgraph and Badger outperform many other systems written in C++ and Java. So, you only gain performance with Go. What you do lose is the ability to tightly control memory deallocation, but that's just GC for you.

P.S. If you want to shout Rust: Sorry, I just don't know much about it. And don't plan to switch.


Thank you for this reply, very informative and personally, I love hearing about design decisions and tool choices.


Managing memory is a pain and dangerous, doubly so when your objects form a graph (std::shared_ptr forsakes you), and Go's GC is quite performant. It's a database, so it's mostly I/O-bound, not CPU-bound. Databases also do lots of stuff concurrently, which is easy in Go and a pain to do correctly in C/C++.

There's also prior art for databases built in Go as well, such as CockroachDB.


The Faq says "If your data doesn’t have graph structure, i.e., there’s only one predicate, then any graph database might not be a good fit for you."

I know what a predicate is, but I don't understand how it is being used here. Can someone explain how I can determine if my data has a graph structure? What sort of predicate do we have here?


Think of a JSON map as a document or an entity. Then the keys would be predicates, and the values would be either references to another JSON map (document/entity) or a value (like a string, int or something).

{"uid": "0xabcd", "friend": [{...}, {...}], "name": "HN user", "age": 21 }

This is a valid JSON for Dgraph. It means, the overall JSON map is an entity of UID, 0xabcd. It has friends (other maps), name "HN user", and age 21. Here, "friend", "name", "age" are predicates.
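Roughly, with our Go client (dgo), inserting such a map looks like this (a minimal sketch; the names and blank-node aliases are made up). Every JSON key becomes a predicate, and nested maps become edges to other nodes:

```go
package main

import (
	"context"
	"log"

	"github.com/dgraph-io/dgo"
	"github.com/dgraph-io/dgo/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// "name", "age" and "friend" become predicates; the nested map becomes
	// an edge from the new node to another node.
	doc := []byte(`{"uid": "_:hnuser", "name": "HN user", "age": 21,
		"friend": [{"uid": "_:friend1", "name": "Another user"}]}`)

	_, err = dg.NewTxn().Mutate(context.Background(),
		&api.Mutation{SetJson: doc, CommitNow: true})
	if err != nil {
		log.Fatal(err)
	}
}
```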


Thanks for the reply mrjn. So a predicate includes a relation, but we take it from its logical point of view. (Rather than in an RDBMS, where we contort the relation so that we can see it from the perspective where it has a cardinality of 1.)

And the notion of one-ness of the predicate comes not from the fact that there's only one relation, but the fact that there's only one _logical value per predicate_ - here, we have two values of `friend` which cannot be conveniently coded in an RDBMS without the use of a join table.

So do I correctly interpret your FAQ's "when not to use Dgraph" as saying "Dgraph is probably overkill if predicates naturally have a single value - that is, join tables are rare and you can naturally put foreign keys in the object they logically belong with, rather than in the table where they have a cardinality of 1"?

This makes me think the other answer, by @thundergolfer, is probably wrong (sorry) - actually, an ecommerce site would benefit from an efficient graph DB, since you have an order and now you want to find all the items in it, so the link from an order to an item should be associated with the order. Yet in the standard model, as with his, once you've found the order you now have to filter through the items to find the ones which reference the relevant orders; indeed, this is almost the only way that relation will ever be travelled - logically backwards.

I appreciate your time @mrjn. I like to hope you can benefit from answering my silly questions because someone can improve that faq. I'm trying to pick the right db for a personal project but I never expect to make money from it so I don't feel like I can give you any direct benefit from your time.


I’m not 100% sure, but I think a basic ecommerce site would be a classic example for a relational data model, or a key-value data model if you want to be fancy. Assuming the former, we’d then expect that in this data domain we could only come up with one, maybe two, predicates.

Without thinking too hard and deep, this seems true. An Order HAS Items, an item HAS a price, a customer HAS Orders, Addresses, etc.

I think you’d get quite far modelling the entire problem just with HAS.


But "has" isn't a predicate. "Has" is a relation. If we take "has an item" an "has a price" and "has an order" to be predicates, then we already have three predicates. So that can't be the definition of predicate.


Yes HAS is a relation, but in my example it's also the predicate.

A predicate is a statement that may be true or false depending on the values of its variables. In my example we'd have HAS(order_i, item_2), where both the order and the item are variables (or vertices) in our graph.

It wouldn't make sense to model with HAS_AN_ITEM(order_i), because as you say you'd proliferate predicates all over the place when you introduce new entities.

So yes, HAS is a predicate, and it nicely models a shopping cart.

A single Order, order_i, has the edges:

```
HAS(order_i, item_1)
HAS(order_i, item_2)
HAS(order_i, item_11)
HAS(order_i, item_49)
...
```

Dgraph models this with a 'PostingList' anchored on the predicate (they call it the "attribute"). This model is not particularly advantageous in this shopping cart case, though, as using a single HAS predicate means that in practice almost every 'PostingList' will have the same predicate, and thus we will more often find ourselves joining across PostingLists when doing single-predicate queries, which shouldn't happen.


In this example, item, price, order would be predicates.


No they're variables, or in DGraph parlance, 'Entity's and 'ValueId's.


Hi thundergolfer. mrjn is the person who created dgraph, so I think he knows what the terminology is.


Oh wow haha. This classic turn-up has now happened to me.

I’ll have to rethink the shopping cart example to see why those would be predicates.


Why did you stop supporting remote employees?


We have generally coalesced into remote offices, which have enough folks in there to help each other. We've also started encouraging pair programming, so engineers can question each other more and review code together.

We still have remote employees; in fact, 4 of them are remote. But it's considered on a case-by-case basis. The biggest issue with remote work comes down to individual personality, i.e. how communicative and independent they are. Some people make great remote workers. But I'd say many don't.

When an engineer is at the office, they are team-motivated, and more likely to get help without asking and to avoid getting stuck.


Management by walking around?


How "most advanced graph database" does not supports already standardized SPARQL query? Have you found anything wrong with standard SPARQL?


We are happy to get new feature requests on github.com/dgraph-io/dgraph!

SPARQL is something we're considering, but we haven't heard much interest from any of our customers, so it's not very high up on our roadmap.

I'd love to chat with you to figure out how adding this could help your use case though. You can reach me at francesc@dgraph.io.

Cheers!


Can someone explain to a complete n00b what you need these graph databases like Neo4j and Dgraph for? I've just built some small-business SaaS, always on MySQL.


Did I understand the feature table correctly that backups are an enterprise feature?


Somebody please look at Actian's Versant Object Database (now called "Actian NoSQL") and just clone it. You'll make very big money.


Cool!


Congratulations on the funding, I've been following the project for a while (your tech blog is great) and am especially fond of the spin-off project Badger[1]. I'd like to ask a few questions and make a few "content requests" for the tech blog if I may, since I see @campoy is here and @mrjn will probably stop by later =)

Some things I'd love to read about on your blog

1 - I'd love to read more on the local information retrieval strategies you use for Badger, especially if you have use cases that are comparable to sequential reads (something that Kafka does, but with a structure optimized for magnetic drives) or ISAM style indexation (traditional SQL databases), and how you leverage the metadata byte to speed things up (this has been touched on only superficially on the blog afaik) along with other specialized features/operations Badger supports;

2 - Some insights on how Dgraph's data model influenced the design of Badger, and what parts you decided to explicitly generalize for other use cases, and what use case constraints have you purposefully decided on (aside from the obvious ones like SSD optimized, etc);

3 - Any progress you might have made regarding why some queries in Badger take longer than in RocksDB. That so far has been the biggest cliffhanger in the tech blog, I've been waiting for the sequel to that one for some time now :)

It's difficult to find good resources where people go in depth on these database engineering topics, but your blog is a very good one and a joy to read. Thanks for not simplifying things and keeping it very technical!

Some questions relevant to the post and Dgraph itself

4 - When I (and I assume many others) see "Enterprise Features" on databases, that is kind of a turn-off. It usually correlates with keeping separate codebases that are manually kept in sync (which has problems of its own), and it sometimes feels like gatekeeping some pretty crucial features (e.g. better backups), which makes people either accept a more vulnerable architectural situation or makes the product unsuitable for a few use cases for lack of such features. I do understand and respect the need to make money to keep Dgraph being awesome, but have you considered something like the CockroachDB approach (selling support + mandating a license to sell the DB as a managed service)? If you did, why did you turn it down, and what would make you reconsider it? On a second note, from a maintenance perspective, how are you planning to handle the "feature flagging" and "repo syncing" of the community and enterprise editions?

5 - As someone who's considered using Dgraph in the past, something that I found interesting is that graph databases in some use cases alleviate some of the same data modelling woes that Datomic does[2][3]. How would you differentiate them, and what would you say they each excel at?

6 - Graph databases sit in a spectrum of use cases, where some people want a system that works similarly to a traditional RDBMS but with more flexible/powerful data modelling capabilities, and some people might want to load up big graph-heavy datasets for analytics purposes (i.e. loading a giant product catalog to map product relationships and either doing a bunch of batch queries to feed a secondary dataset, or keeping it live for realtime queries on a recommender system). How would you place Dgraph and its strengths/weaknesses in this spectrum? What alternative solutions would you recommend for the places where Dgraph would be unsuited?

Sorry for writing too many questions (which will probably take up a lot of time to answer) but many of these things (especially on the second half) are not explicitly said anywhere, and are valuable knowledge for making an informed decision on a database product.

[1] https://github.com/dgraph-io/badger

[2] https://augustl.com/blog/2018/datomic_look_at_all_the_things...

[3] https://augustl.com/blog/2018/datomic_look_at_all_the_things...


I'm Lucas, and I'm a backend engineer working for Dgraph Labs. A quick comment regarding 4: CockroachDB is also employing the "Enterprise Features" model: https://www.cockroachlabs.com/docs/v19.1/enterprise-licensin...


Security features should NOT be limited to enterprise only. Performance, monitoring, advanced query capabilities are much better options. Security should be table stakes for ALL software today. Please reconsider.


Has the word opinionated become a meme recently? It seems to be everywhere these days.



