Why Google needed to build a graph serving system (dgraph.io)
324 points by pplonski86 on Feb 16, 2019 | 90 comments



Hi all, author of Dgraph and the article here.

Glad to see this post trending on HN. I'm around if you want to ask any questions, and will share whatever I can about my experience with graphs and the decisions at Google.

Since leaving Google, I've built Dgraph [1]. It is open source, designed for horizontal scalability and ACID transactions, and it provides incredible performance for both reads and writes. And of course, it solves the join-depth problem as explained in the post. So, do check it out!

And do remember to give us a star on GitHub [1]. Your love helps other open source users give Dgraph a real shot in companies facing leadership struggles similar to the ones I experienced at Google.

[1]: https://github.com/dgraph-io/dgraph


I'm sorry your efforts failed internally. Our infrastructure is somewhat ossified these days: the new and exotic are not well accepted. Other than Spanner (which is still working to replace Bigtable), I can't think of a ton of really novel infrastructure that exists now and didn't when you were around. I mean, we don't even do a lot of generic distributed graph processing anymore. Pregel is dead, long live processing everything on a single task with lots of RAM.

I suspect your project would have been really powerful had you gotten the support you needed, but without a (6) or a (7) next to your name it's really hard to convince people of that. I know a number of PAs that would benefit now from structuring their problems in a graph store with arbitrary-depth joins and transactions. I work on one of those and cringe at some of the solutions we've made.

We've forgotten what it's like to need novel solutions to performance problems. Instead, we have services we just throw more RAM at. Ganpati uses over 128GB of RAM now (it might even be more); I suspect a solution like Dgraph could solve its problems much more efficiently, not to mention more scalably.

Good on you for taking your ideas to market. I'm excited to see how your solution evolves.


One of the TLs was a T7. Got overruled by T8 :-).

As Google grows, there are many T7s around. So, you really need a T8 or above to actually get something out the door, when it comes to impacting web search.

P.S. Dgraph is Apache licensed, written in Go with Protobufs/gRPC. Google might have an appetite for that, worth pushing internally. https://github.com/dgraph-io/dgraph


My understanding is that by default, dgraph query responses are in json. Is it possible to get gRPC responses instead?


Dgraph exposes both an HTTP endpoint and a gRPC endpoint. All our official clients use the gRPC endpoint, including the Go, Java, and NodeJS clients.

https://docs.dgraph.io/deploy#ports-usage
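
For example, a minimal query over gRPC with the Go client (dgo, pre-v2 import path) looks roughly like this; treat it as a sketch and check the client repo for the exact API of your version. Note that even over gRPC, the query result arrives as JSON bytes inside the response:

  package main

  import (
      "context"
      "fmt"
      "log"

      "github.com/dgraph-io/dgo"
      "github.com/dgraph-io/dgo/protos/api"
      "google.golang.org/grpc"
  )

  func main() {
      // Connect to a Dgraph Alpha's gRPC port (9080 by default).
      conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
      if err != nil {
          log.Fatal(err)
      }
      defer conn.Close()

      dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

      // Query over gRPC; the result payload is JSON bytes in resp.Json.
      q := `{ people(func: has(name), first: 5) { uid name } }`
      txn := dg.NewTxn()
      defer txn.Discard(context.Background())
      resp, err := txn.Query(context.Background(), q)
      if err != nil {
          log.Fatal(err)
      }
      fmt.Println(string(resp.Json))
  }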


> Our infrastructure is somewhat ossified these days: the new and exotic are not well accepted

I think that's rather uncharitable. I work in T.I. and while I can't namedrop a number of internal projects I can assure you there's a lot of deep innovation happening in our corner. Are they poised to get Spanner-esque adoption across the whole company? Unlikely. But it's unfair to knock the infrastructure and assume we're way past some innovation renaissance.


> Our infrastructure is somewhat ossified these days: the new and exotic are not well accepted.

I know that it does not involve storage, but TensorFlow is purely an infrastructure project. And it's certainly newer than the Freebase story.


And that has backing from Jeff Dean, which underscores the point mentioned elsewhere in this thread and in TFA: with the right kind of political backing, you can do wonders.


What's "Ganpati" ?


To summarize the join-depth problem as you describe it: is it because other solutions use hash partitioning instead of range partitioning to distribute data?

i.e. if all S/P/O were stored in sorted order such that a given S/P/O were on one machine, you no longer have to do broadcasts and can instead issue queries to just the machines holding that range of the keyspace. Since most "scalable" systems use hash partitioning, they have to broadcast, whereas dgraph uses something more like range partitioning.

Or is this too simplistic of an explanation?
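
Concretely, the difference I'm picturing is something like this toy Go sketch (purely illustrative routing logic, not how any of these systems are actually implemented):

  package main

  import (
      "fmt"
      "hash/fnv"
  )

  const numShards = 4

  // With hash partitioning on the subject, the triples answering a query like
  // "who lives in SF?" are scattered everywhere, so it has to be broadcast
  // to every shard.
  func shardsForHashPartitioning() []int {
      all := make([]int, numShards)
      for i := range all {
          all[i] = i
      }
      return all
  }

  // If everything for one predicate (or one keyspace range) lives on a single
  // shard, the same query can be routed to just one server.
  func shardForPredicate(pred string) int {
      h := fnv.New32a()
      h.Write([]byte(pred))
      return int(h.Sum32()) % numShards
  }

  func main() {
      fmt.Println("hash partitioning -> ask shards:", shardsForHashPartitioning())
      fmt.Println("predicate/range sharding -> ask shard:", shardForPredicate("lives_in"))
  }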


It's a good start to solving an array of problems when considering building a low-latency arbitrary depth graph DB.

Dgraph does these:

1. It shards by predicate, so a full join can be executed by one server.

2. Storing SPO as a record (or row) would still be slow because, in graphs, a single SP can result in thousands or millions of objects. That would involve a lot of iteration and multiple disk seeks to read, which gets slow. So, Dgraph stores them in a posting list format, using SP -> list of O, as a single key-value pair.

3. That can then result in values which are really large. So, Dgraph converts all nodes into 64-bit integers.

4. Intersections are really important in graphs. So, posting lists store all ints in sorted order, to allow quick intersection between multiple posting lists (see the sketch below).

5. Add indices, then replication, HA, transactions, MVCC, correctness, linearizable reads, and ...

voila! 3 years later, you have Dgraph!
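
To make point 4 concrete, here's a minimal sketch of intersecting two sorted posting lists of 64-bit UIDs (illustrative Go only, with made-up UIDs; not Dgraph's actual code):

  package main

  import "fmt"

  // intersectSorted walks two ascending posting lists of node UIDs and
  // returns the UIDs present in both, in O(len(a)+len(b)).
  func intersectSorted(a, b []uint64) []uint64 {
      var out []uint64
      i, j := 0, 0
      for i < len(a) && j < len(b) {
          switch {
          case a[i] < b[j]:
              i++
          case a[i] > b[j]:
              j++
          default:
              out = append(out, a[i])
              i++
              j++
          }
      }
      return out
  }

  func main() {
      livesInSF := []uint64{3, 7, 19, 21, 40}     // UIDs of people who live in SF
      eatsSushi := []uint64{2, 7, 21, 33, 40, 52} // UIDs of people who eat sushi
      fmt.Println(intersectSorted(livesInSF, eatsSushi)) // [7 21 40]
  }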

P.S. I intend to find some time and write a paper about the unique design of Dgraph. There's a lot of original research involved in building it.


Yours is the first public mention of MindMeld I've seen. There were a bunch of other related projects. Some of that experience eventually percolated to Tensorflow.

Anyway, good luck with Dgraph! It looks very useful.


I have some experience with Bazel, the open source version of Blaze, and I noticed that its build graph implementation appears to be very memory hungry as well: a few million lines of code already use 10-30GB of memory. Like you mentioned, it requires the entire build graph to fit in memory on one machine.

Do you know what sort of graph solution Blaze uses to handle two orders of magnitude more code, i.e. hundreds of millions of lines of code? I always assumed it was a distributed graph database, but your article seems to indicate something else.


Google's graph serving system is not generally available to be used by other teams, because it's custom-built for one purpose: to serve knowledge as part of the web search infra.

I think if Blaze is using some graph solution, it must be custom built.


That and the availability of machines with enough RAM to waste.


There are some crumbs that leak out; for example, there are mentions of Blaze's "skygraph" feature in the Bazel sources.


For the past couple of years, I've been working on knowledge graph solutions for the enterprise market, and know acutely the pain of trying to get traction on these types of approaches within one's own organization.

Some time before I started working at my current employer, I had a chance to talk to an architect on an EdTech knowledge graph project who advised giving up on these types of efforts due to the complexity of the technology, the politics of the organization, the skills gaps in the engineering corps, and the expectations of customers.

So when I ignored that advice and started this project, I did have a chance to look at Dgraph, but ultimately avoided it to help relax some of those political and skills constraints. We ended up building a graph serving system out of Cayley processors, but maybe as the skills gap closes with experience on Cayley and the political forces wane as revenues increase, using Dgraph will become viable -- it would help us solve more problems and simplify certain operational headaches.

Thanks for writing this experience up. It's encouraging to learn that even giants like Google struggle due to internal factors in this space.


What is your opinion on network representation learning? Can it be used in Dgraph? For instance, do you think it is possible to use node embeddings as indices for faster retrieval?


I'm a bit late to the thread, but what happened to Cerebro and the concept of a Knowledge Engine? You wrote that it was launched as "Knowledge Strip" but wasn't as impactful as you had intended Cerebro to be.


Thanks for writing this up! I worked with the Knowledge Graph as a contractor at Google in 2013. My manager had a neat idea for adding our own Schema and triples (actually quads) for a specific application.

It surprises me how many large companies do not have a ‘knowledge graph strategy’ while everyone is on board with machine learning (which is what I currently do, managing a machine learning team). I would argue that a high capacity, low query latency Knowledge Graph should be core infrastructure for most large companies and knowledge graphs and machine learning are complementary.


I agree. And I saw how averse companies were to graph databases because of the perception that they are "not reliable." So, we built Dgraph with the same concepts as Bigtable and Google Spanner, i.e. horizontal scalability, synchronous replication, ACID transactions, etc.

Once built, we engaged Kyle and got Jepsen testing done on Dgraph. In fact, Dgraph is the first graph database to be Jepsen tested. http://jepsen.io/analyses/dgraph-1-0-2 (all pending issues are now resolved).

Dgraph is now being used as a primary DB in many companies in production (including Fortune 500), which to me is an incredible milestone.


Can you share more about the kinds of use cases and qps that your customers have for dgraph?


Dgraph usage is quite widespread. It is being used for typical graph cases, like recommendation engines, real-time fraud detection, adtech, fintech, etc.

The design is very well suited for just building apps (scalable, flexible schema) as well, given the early choices to use GraphQL (modified) as the query lang and JSON as the response. So, we see even Fortune 500 companies using Dgraph to build mobile apps.

Most open source users run the simplest 2-node cluster, but we regularly see enterprise customers run a 6-node cluster (high availability) or a 12-node cluster (HA + sharding). Given synchronous replication, query throughput scales out linearly as you add more replicas/machines: each replica can reply without the issues of typical eventual consistency models, since Dgraph provides linearizable reads.

Write throughput wise, Dgraph can sustain XXX,XXX records/sec in the live path (and millions in the offline path). See my recent commit: https://github.com/dgraph-io/dgraph/commit/b7189935e6ec93aec...

Some recent public usage mentions of Dgraph: https://github.com/intuit/katlas https://twitter.com/pg_kansas/status/1096260809171353600


Thanks!


The first wave would be people building products directly on a knowledge graph, the second would be knowledge graphs as part of support systems to augment your work, and then probably a third wave to capture inside knowledge. Governments would be a key client, I think.


Sounds like a good enterprise SaaS product idea


The beauty of RDF is that it can support schema promiscuity, i.e. you can have many schemas for the same data. You can do that with OWL. In typical graph databases you are fixed on nodes, properties and edges, but in RDF you can choose arbitrarily what should be a node and what should be a property. My issue with RDF has been performance: it works great for categorical, relationship-heavy data, but not so much for numerical data.


I am also a fan of RDF and OWL. My two ‘practical semantic web’ books (one in Common Lisp and one for Java/Clojure/Scala) are now 8 years old, but you can get free PDFs from my web site. Those books badly need second editions; the material is interesting but outdated.


Mark, are you up for helping with solid security for cl-solid? (The library now works with both Neptune and Allegrograph) https://github.com/gibsonf1/cl-solid - live use of cl-solid library here https://graphmetrix.net , sometimes very lively discussion of Solid here: https://gitter.im/solid/chat (including Tim BL)


I have no time right now. Contact me in 5 weeks and we can talk then.


Great


What's a modern, easy to get started with RDF-compatible database?


I have thought about this also. I am retiring from my job in a month but I would like to keep writing books and also have one commercial software product. I used to sell my NLP toolkit written in Ruby but open source NLP libraries, and deep learning solutions, are so good that I don’t want to be in that business.


If you're interested in challenges, could you consider temporal graphs? Both from the perspective of tracking graph evolution (audit trail), and using a graph to model historical events, where relations occur for periods of time - it always sounds doable and then I sit down to do it :)

Throw in probability and vagueness (history examples: this happened sometime in the 1950s / we're only 50% certain that Henry VII was the father of this child) and it becomes a whole lot more complicated; yet what can be inferred increases in usefulness.
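
To make the modeling problem concrete, the kind of edge record I keep ending up with looks roughly like this (just an illustrative sketch with made-up names, not any particular store's schema):

  package main

  import (
      "fmt"
      "time"
  )

  // TemporalEdge is one way to model a relation that holds over an interval,
  // with explicit uncertainty on both the interval and the fact itself.
  type TemporalEdge struct {
      Subject    string
      Predicate  string
      Object     string
      ValidFrom  time.Time // earliest plausible start ("sometime in the 1950s")
      ValidTo    time.Time // latest plausible end
      Confidence float64   // e.g. 0.5 for "we're only 50% certain"
  }

  // holdsAt reports whether the edge's validity interval covers t.
  func (e TemporalEdge) holdsAt(t time.Time) bool {
      return !t.Before(e.ValidFrom) && !t.After(e.ValidTo)
  }

  func main() {
      e := TemporalEdge{
          Subject:    "Alice",
          Predicate:  "member_of",
          Object:     "Some Society",
          ValidFrom:  time.Date(1950, time.January, 1, 0, 0, 0, 0, time.UTC),
          ValidTo:    time.Date(1959, time.December, 31, 0, 0, 0, 0, time.UTC),
          Confidence: 0.5,
      }
      fmt.Println(e.holdsAt(time.Date(1955, time.June, 1, 0, 0, 0, 0, time.UTC)), e.Confidence)
  }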

My inspiration is http://commons.pelagios.org/ and the digital humanities field.


OOT, but the illustrations on their home page ( https://dgraph.io/ ) are veeery cute, you should check it out.


It makes you wonder how the product handles mushroom and more importantly, snake.



We recently made this image, inspired by my favorite movie. It's licensed under CC-by-ND.

https://twitter.com/dgraphlabs/status/1088516069852012544

Feel free to use it! :-)


Is that a skunk? Kind of an interesting choice…


Badger.


I've been using DGraph for quite a while now in a side project of mine.

I looked at a number of other graph dbs, and landed on dgraph for a few reasons.

a) I like the query language. It felt natural to me. That's just my opinion though. Other languages felt cumbersome, and I didn't feel like I was operating on a graph.

b) I needed write performance. I'd talked to colleagues about their experiences (I have some friends at graphy companies) and every time I asked "what about X graphdb? could it handle this write load?" the answer was "no, definitely not".

I chose dgraph and it's been working like a champ. Support via slack has been solid, they're focusing on the right things (performance, correctness), and I have a lot of confidence in the future of the project.


Not just you: a lot of Dgraph users appreciate the intuitive model of GraphQL+- (a modification of GraphQL), and how it fits naturally with JSON and apps. When I started in 2015, I didn't imagine that our decision to use GraphQL would become such a major "feature." GraphQL has taken off like wildfire, which was hard to imagine back when it had just launched in July 2015.

In fact this year, we plan to align GraphQL+- even more with the official GraphQL spec. Stay tuned!

re: write perf, it just got a whole lot better with 3 changes I made recently, and there's still a lot of room for optimization: https://github.com/dgraph-io/dgraph/commit/b7189935e6ec93aec...


Like I said, a big part of feeling confident in my choice to use DGraph is the direction - the improvements to the QL and performance, as well as the Jepsen tests, are exactly the work I want to see as a consumer.

And that is one hell of an improvement. Nice work.


This is a really neat historical perspective, although I do find one detail odd:

>If you are sneering at the memory requirement, note that this was back in 2010. Most Google servers were maxed at 32GB.

As I recall, Nehalem EP (launched 2008 or 2009?) could handle in excess of 100GB per socket? Not cheap necessarily, but definitely still counted as "commodity hardware" for that era. I say this recalling that even my mid-tier workstation from then could handle 48GB (in that wonky era of triple-channel RAM), though I only had it loaded with 12GB. Then again I could see, if said servers in 2010 were at the end of a purchasing cycle from 2007 or so, that they were "maxed" at 32GB?

Anyway, my from-memory nitpick doesn't detract from the article's ultimate point, though: distribution was an obvious need that would only become more pressing.


The hardware replacement cycle is much longer than you’ve implied. Servers I am building today will still be TCO-positive for 15 years. In 2010 Google data centers would still have been full of their first generation dual-socket Opteron machines.


I will happily stand corrected, because I was only implying based on very limited awareness. I am only directly familiar with workstations, where I've seen replacements occur on the order of 4-7 years, not servers or entire datacenters. However, everything I read seemed to reinforce the idea that servers follow a similar pattern. Maybe I was just wildly misinterpreting?

With the relative stagnation in per-core performance since about 2010, I could easily see something built then being expected to last to 2025, power consumption improvements notwithstanding.


My first guess would be that power consumption drove those decisions.


Insightful article / ad. Maybe the problem with the idea is that in practice not many users want to do joins at arbitrary depths, e.g. "sort all the children of US presidents by height" is probably not a very common query needing a massively distributed architecture?


"Travel from NY to LA over the weekend" was probably not a very common demand needing a massive investment in air travel infrastructure and equipment circa 1900. User demand is bounded by the capability of the tools commonly available.


True, but to a certain extent only. The tail only gets so fat...


That's an impressive quote! Nails it.


I suspect that although any given query that needs a join is rare, there is a long tail of many different such queries. Just having more than one predicate is enough, e.g. "weather in <city> during <event>" or any of the queries in footnote 2 of the article.


Consider [sort all comedy movies by rating]. Such queries happen on a daily basis on movie sites like IMDB or Rotten Tomatoes.

The only way to avoid joins is to specialize your data to a particular vertical. Your flat tables are then built to serve, say, movie data, and can avoid some joins.

But if you're building something spanning multiple verticals, like the Knowledge Graph, which houses movie data, celebrity data, music data, events, weather, flights, etc., then building flat tables specific to each vertical's properties is almost impossible.


Yes, you're right about the need for many flat tables specific to each vertical. However, if other aspects of these verticals need to be specific to the vertical anyway (for example, UI: weather for the week is presented differently than a list of movies and their ratings; or: data quality), then there's still a significant amount of eng work per vertical, so that data flattening step may not be the bottleneck in growing the number of verticals served...


That was the exact issue Google found itself in. Every OneBox had its own backend, which meant many different teams were involved in running and maintaining them.

Now, with the graph serving system in place, all they need to do is slap a new UI on the vertical, while the backend remains the same. Of course, there's real effort involved in building a new UI for the vertical, but it's a lot smaller than building a whole stack for each vertical.

Not to mention, just the movie vertical itself has many "types" of data. Movies, Actors, Directors, Producers, Cinematographers, etc. -- all of these have different properties. By the time one is done flattening all of these into relational tables, they've built a custom graph solution -- which is what happens repeatedly in companies.


That makes sense... Though after reading the article, I thought Google wasn't using a system that supported arbitrary join depths?


This reminded me of SQL.

> Say, you want to know [people in SF who eat sushi]....If an application external to a database was executing this, it would do one query to execute the first step. Then execute multiple queries (one query for each result), to figure out what each person eats, picking only those who eat sushi.

A query like that in SQL could also suffer from "a fan-out problem" and could get slow. It's often faster to put subqueries in the From clause than the Select. It's certainly faster than an app taking the rows of one query and sending new queries for each row, as many developers do. For example:

  select
      p.name,
      (
          select max(visit_date)
          from visits v
          where v.person = p.id
      ) as last_visit
  from people p
  where p.born < '1960-01-01'
can be slower than:

  select p.name, v.last_visit
  from people p
      join (
          select person, max(visit_date) as last_visit
          from visits
          group by person
      ) v on p.id = v.person
  where p.born < '1960-01-01' 
In the second example, you first form a new table through a subquery of the original. This is not what a new programmer would first try. The first example, with the subquery in the Select clause, is closer to the train of thought. Also you would guess that getting the last visit dates of each person is more efficient after you know who to look for (like, only the people born before 1960). But in my experience, it often hasn't been.

Likewise with this San Francisco sushi query: I was thinking that if it were SQL, I would (1) get all people in San Francisco, (2) get all people who like sushi, and then (3) join them to find the intersection. Lo and behold, I then read that it is the same solution in this humongous graph database:

> The concepts involved in Dgraph design were novel and solved the join-depth problem.... The first call would find all people who live in SF. The second call would send this list of people and intersect with all the people who eat sushi.


Now consider doing this in a distributed setting, with SQL tables split across dozens or hundreds of machines (see Facebook TAO).


So are there any benefits to Dgraph over SQL/RDBMSes (maturity and a widespread query language) for single-machine setups?


I don't want this to become a flame war between SQL and Graph. But, we see a lot of developers come from SQL to Dgraph, because the join performance of SQL gets worse as the data size grows, and so devs have to resort to doing aggressive data normalizations. Even more of a problem when your entire dataset is on a single machine.

For example, Stack Overflow normalizes its data to avoid doing joins: https://archive.org/download/stackexchange


Did you mean denormalize?


Yes, I meant denormalize. Thanks.


For the scalability side of graph, we've just started using Amazon Neptune RDF, and have been amazed that we can easily and very quickly run SPARQL queries on 2.6 billion triples on their smallest 2-core, 15GB machine. Incredible capacity.


Where Neptune appears to fall down is write performance. This is what made it non-viable for me. I have colleagues who are struggling / hacking around the write performance issues.

DGraph's write perf seems to be considerably better - I haven't benchmarked formally, just going off of discussions I've had with others.


As others mention, Neptune is based on Blazegraph. It is a layer on top of Amazon Aurora and has the typical graph layer issues I mention in my blog post (in particular, the join-depth problem).

When Neptune was launched, I wrote another article critiquing its design. Worth a read: https://blog.dgraph.io/post/neptune/


I'd missed the announcement of Neptune, so thanks for alerting me. Have you reviewed the Microsoft Azure Cosmos DB, with its graph access layer?


I haven't. But the design is the same: a graph layer on top of another DB, because they didn't want to invest in building a native graph DB.

In general, that approach doesn't lead to good performance, as I explained in the blog post.


FYI, Neptune is based on Blazegraph. Amazon's acquisition of Blazegraph halted the database's open development. I'm sure they would welcome interested contributors: https://github.com/blazegraph/database/issues/86


I actually found out about Neptune through finding out about Blazegraph. Do you know what's happening with the Blazegraph project now?

Just followed your link and saw this comment from one of the original authors in Jan: https://github.com/blazegraph/database/issues/86#issuecommen...

Since Amazon is effectively selling machine time to run Neptune with no cost for the db, wouldn't it be excellent if the original author and team continued to contribute to Blazegraph? I'm not sure if the linked note is a potential nod in that direction or not. (We have customers that also want to have on-prem deployments, which you obviously can't do with Neptune.)


I've been looking for an open source, NoSQL-style graph database that makes it easy to do gather-scatter on a GPU. It's been tough.

Plug: some people are working to fork/restart Blazegraph development, which is the database Amazon took to make Neptune. Its GPU features are missing, as they were originally kept proprietary. If you have any interest and know-how, this could be an exciting project to contribute to!

https://github.com/blazegraph/database/issues/86


There is also JanusGraph, which Google contributes to: http://janusgraph.org


If you don't get the data structures right, performance will suffer.

Imagine building a search engine in an RDBMS without specialized data structures.


JanusGraph is usually used with Cassandra. This is the schema: https://docs.janusgraph.org/latest/schema.html


See mrjn's replies. You need a native graph DB.


Slightly off topic, but while we're on the subject of graph databases: could anyone point me at some useful introductions to the subject? Thank you.


I'd suggest starting with https://docs.dgraph.io/get-started/, and then going to https://tour.dgraph.io. It gives you all you need to understand the query language (based on GraphQL) and get you up to speed building your first app on a graph DB.


I'll echo this - I learned about graph databases using DGraph.


Look at my Python graph database. 0.8.1 is the last release with the vanilla graphdb API; after that, it only exposes a triple store: https://github.com/amirouche/hoply/tree/fa77d31757c835688098...


How does Dgraph compare in performance, features to something like Neo4j?


Slightly old post, but we did some benchmarks a couple of years ago: https://blog.dgraph.io/post/benchmark-neo4j/

We also get to hear benchmark conclusions from users who tried out Neo4j, Janus, Datastax DSE Graph, etc. They typically find Dgraph to be the best performing one.

There's a comparison chart here: https://github.com/dgraph-io/dgraph#dgraph-compared-to-other...


Thank you for the comparisons. I think the approach you've taken with the documentation and the out-of-the-box support for libraries in Go, Python, etc. is excellent; the playground is definitely a winner!!



Thank you. That page freezes Chrome, ironically.


This was a great article, and I'm a competitor! :) Learned a lot about the history.

My one nit-pick is that they say distributed joins are hard, but they don't really justify it other than the assumption that other engineers come from a traditional DB background. It is not a hard problem if you try to solve it on your own: you do the join on the client, incrementally (or debounced if necessary) as the data is streamed. This works very well at scale in production (TB/day) for our system.
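
A bare-bones illustration of what I mean by joining on the client as results stream in (channels standing in for the streaming responses; not our actual system's code):

  package main

  import "fmt"

  type row struct {
      key string
      val string
  }

  // streamJoin consumes two result streams and emits joined pairs incrementally,
  // instead of waiting for either side to finish. Rows seen so far are buffered
  // in per-side maps keyed by the join key.
  func streamJoin(left, right <-chan row, out chan<- [2]row) {
      lbuf := map[string][]row{}
      rbuf := map[string][]row{}
      for left != nil || right != nil {
          select {
          case r, ok := <-left:
              if !ok {
                  left = nil
                  continue
              }
              for _, m := range rbuf[r.key] {
                  out <- [2]row{r, m}
              }
              lbuf[r.key] = append(lbuf[r.key], r)
          case r, ok := <-right:
              if !ok {
                  right = nil
                  continue
              }
              for _, m := range lbuf[r.key] {
                  out <- [2]row{m, r}
              }
              rbuf[r.key] = append(rbuf[r.key], r)
          }
      }
      close(out)
  }

  func main() {
      left, right, out := make(chan row), make(chan row), make(chan [2]row)
      go streamJoin(left, right, out)
      go func() {
          left <- row{"u1", "lives in SF"}
          right <- row{"u1", "eats sushi"}
          close(left)
          close(right)
      }()
      for pair := range out {
          fmt.Println(pair[0].val, "+", pair[1].val)
      }
  }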


There are a variety of distributed join scheduling models to consider, depending on the latency of the network and the size of the data sets. Techniques like Bloom filters, used to accelerate joins by minimizing the data transmitted, are novel to many developers. Layer in HA and transactions, and it becomes hard.
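
As a rough illustration of the Bloom-filter trick (a hand-rolled filter and made-up keys, not any particular system's implementation): node A builds a filter over its join keys and ships just the filter; node B sends back only rows that might match, and false positives get dropped in the final join.

  package main

  import (
      "fmt"
      "hash/fnv"
  )

  // A tiny Bloom filter: k hash functions simulated by salting one FNV hash.
  type bloom struct {
      bits []bool
      k    int
  }

  func newBloom(m, k int) *bloom { return &bloom{bits: make([]bool, m), k: k} }

  func (b *bloom) indexes(key string) []int {
      idx := make([]int, b.k)
      for i := 0; i < b.k; i++ {
          h := fnv.New64a()
          fmt.Fprintf(h, "%d:%s", i, key)
          idx[i] = int(h.Sum64() % uint64(len(b.bits)))
      }
      return idx
  }

  func (b *bloom) Add(key string) {
      for _, i := range b.indexes(key) {
          b.bits[i] = true
      }
  }

  func (b *bloom) MayContain(key string) bool {
      for _, i := range b.indexes(key) {
          if !b.bits[i] {
              return false
          }
      }
      return true
  }

  func main() {
      // Node A builds a filter over its join keys and ships only the filter.
      left := []string{"alice", "bob", "carol"}
      f := newBloom(1024, 3)
      for _, k := range left {
          f.Add(k)
      }

      // Node B transmits only rows whose key might match, instead of everything.
      right := []string{"bob", "dave", "carol", "erin"}
      for _, k := range right {
          if f.MayContain(k) {
              fmt.Println("ship to A for the final join:", k)
          }
      }
  }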


We are using AWS Neptune in building our marketing analytics product. What I have realized is that it has the best of both worlds: the querying power of SQL databases and the schemaless flexibility of NoSQL databases.


Whenever I read mention of Neptune, I feel obligated to mention Blazegraph, which Amazon based Neptune on.

As I've shamelessly advertised elsewhere in these comments, Blazegraph is in the early stages of a fork and/or reboot, as Amazon's acquisition severely hampered progress on the open source project.

https://github.com/blazegraph/database/issues/86



I loved working with Dgraph, but I hate doing system/database administration work. Do you have any plans to provide a hosted solution?


No concrete plans yet, but that's a direction we want to go in.



