GraphQL Introduction (facebook.github.io)
168 points by jbraithwaite on May 1, 2015 | 50 comments



It really pleases me to finally see a big credible player tackling the REST orthodoxy. They state very well why REST APIs are not working well for mobile.

Notably: "Fetching complicated object graphs require multiple round trips between the client and server to render single views. For mobile applications operating in variable network conditions, these multiple roundtrips are highly undesirable."

Now, I'm wondering how they manage to keep the computation of the responses on the server side from getting too expensive. It seems clear that in such a system there is a risk of defining queries that pull way too much data at once. Also, the question of pagination comes to mind. How can you handle that efficiently?


Re: overfetching, this is certainly a risk. Like any tool, it can be misused. One of the motivations of Relay is in fact this very issue. By coupling the data-fetching with the view more tightly, we can more accurately track and fix overfetching earlier in the development cycle.

In terms of keeping it not too expensive, an important attribute of the system is that the server publishes capabilities that clients selectively use. For example, we explicitly do not allow clients to send up arbitrary strings for filtering, query predicates, and whatnot; servers have to explicitly expose those via arguments to fields: eventMembers(isViewerFriend: true) { ... } or similar formulations that are encoded in the type system. This prevents people from issuing inefficient queries (e.g. ordering large data sets without indexes).
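
To make that concrete (the exact fields here are illustrative, not the real schema), a client can only use arguments the server has declared on a field:

    eventMembers(isViewerFriend: true) {
      id,
      name,
      profilePicture(size: 50) { uri }
    }

If eventMembers didn't declare an isViewerFriend argument in the type system, a query using it would be rejected up front rather than executed.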

Re: pagination. This is absolutely a core pattern that we are excited to talk about as we explain the system more. Broadly we handle pagination through call arguments, e.g. friends(after: $someCursor, first: 10) { ... }. There's a lot of subtlety there which I won't go into until we dive into that topic more deeply.
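
Just to give a flavor, a paginated call might look like this (the fields inside the braces are placeholders, not the final API):

    friends(after: $someCursor, first: 10) {
      id,
      name,
      cursor
    }

where the response hands back an opaque cursor you can feed into the next call's after argument.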


Thanks a lot for these insights. I'm definitely looking forward to discovering more about all this.


I like how JSON-LD lets one embed many resources inside a document. And there are plenty of good preludes to this recent emergence: SoundCloud wrote three years ago about how, on the client side, they extract all relevant models and push them into an "instance store" where anyone else can find the data when they need it. https://developers.soundcloud.com/blog/building-the-next-sou...

What's really exciting is an expressive model that allows code to state what dependencies it has. The transport and fulfillment of needed data (which intermediary stores the data passes through, and how pending views are signaled that it's available) is a more mechanistic, rote task that has been faced many times before, albeit one that each company seems to tackle on its own. The further step of declaring and modeling dependencies is what makes GraphQL an interesting capability.

SoundCloud's three-year-old blog post is a good reference showing that "instance stores" (these client-side object database services) have been around for a good while, and can be done with REST just as easily as without.


It seems to me that GraphQL could be awesome when you exclusively control the back-end and the front-end, but do you think it will work as well if you're building an API that also needs to support third-party clients? Would REST still be better in that scenario?

Edit: I see that they partially address this: "Many of these attributes are linked to the fact that “REST is intended for long-lived network-based applications that span multiple organizations” according to its inventor. This is not a requirement for APIs that serve a client app built within the same organization."


That's a great point. We definitely designed it with first-party clients in mind. This doesn't preclude other use cases in the future, but for the short term, third-party support is a non-goal.


It has recently become popular to be anti-REST. It makes you seem smart and more knowledgeable than average.

In practice, I think that a GraphQL API is still a form of REST.

People hate on REST when they should be hating on the bad instances of things that others built and called REST, regardless of whether they actually were, or whether that person had RTFM: https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding...


The post specifically calls out that distinction:

> We are interested in the typical attributes of systems that self-identify as REST, rather than systems which are formally REST.


In OData, these problems are solved by:

* Clients communicate their desired projection and page size (via the $select and $top query string parameters), which the service can then easily map into efficient calls to the underlying data store.

* OData client page sizes are polite requests, not demands. The server is free to apply its own paging limits, which are then communicated back to the client along with the total results count and a URL that can be followed to get the next page of results. Clients are required to accept and process the page of entities they are given, even if that number differs from the count which was requested due to server limits.
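
For a rough illustration (entity and property names made up, and the payload shape is from memory of OData v4, so treat it as approximate):

    GET /service/People?$select=FirstName,LastName&$top=10&$count=true

    {
      "@odata.count": 5023,
      "@odata.nextLink": "People?$select=FirstName,LastName&$skiptoken=...",
      "value": [
        { "FirstName": "Ada", "LastName": "Lovelace" },
        ...
      ]
    }

The server is free to return fewer than ten rows; the client just follows the nextLink for the rest.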

I'd assume GraphQL will adopt similar functionality, if it hasn't already.


I think HTTP/2 solves some of the issues regarding round trips. One real problem with mobile vs. web is versioning: the inability to ensure that you can keep your client and server in sync.

In some ways one of the biggest advances I see from GraphQL/Relay is that it should avoid most of the versioning hell for mobile: there's effectively an agreed interop language for communicating data needs, and thus backwards compatibility during API evolution should be far less complicated.


HTTP/2 removes some of the overhead of requests, but there is still the problem of multiple round trips.

For example, if you request your top three friends and their most recent posts using REST you'll probably need to do four requests. And you can't parallelize them because you need to know your friends' IDs before you can construct the per-friend post requests.
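
Whereas a single GraphQL query along these lines (the field names are invented for the example) gets everything in one round trip, because the server resolves the friend IDs internally:

    me {
      friends(first: 3) {
        name,
        posts(first: 1) { id, body }
      }
    }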


Actually this can be addressed with HTTP/2 although I think the solution may be just as complicated. If the yet-to-be-known parameters are encoded in the query string then the requests can be pushed to the client before the client knows it will be requesting them. This could be done with a middleware that used the Referer header (and maybe some fancy isomorphism) to determine what should be pushed.


True, but then you have the server duplicating logic from the client. This is very similar to the custom endpoint solution, which breaks down when you have multiple clients needing different data. You end up either under or over fetching.


To prevent the over/under fetching you're describing you could partition your endpoints and make multiple requests. That said, it's definitely a code-maintenance win for GraphQL.

It seems like if you were co-executing the client on the server you could trivially achieve perfect fetching. GraphQL may actually over fetch in many situations. Here's an example: the client fetches a list of objects, filters it, and then fetches more data referenced by the results. With GraphQL, if you don't automagically parse the filter out of the client code, you over-fetch. However, the HTTP/2 solution could just push the 2nd fetch as it was made by the co-executed client.

All that being said, GraphQL certainly alleviates the server-side load co-execution would imply and that's likely more suitable to the scale Facebook operates at.


Yes, that's a tricky problem. Generally GraphQL solves it by filtering on the server. You'd request something like `friends(age: 25, order: dob, first: 10) { name, profilePicture }` and pass that straight through to the UI.

There are some situations where this doesn't work too well. For search suggestions, for example, you might not want to request `searchSuggestion(query: "Tom Smi") { <lots of data> }` on every keystroke because sequential queries will have a lot of duplication. In this case we can just fetch the IDs of the results and do a separate query to fetch the data for the people that we don't know about yet.
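
Roughly (these field names aren't the real schema, just the shape of the idea), each keystroke sends something cheap like

    searchSuggestion(query: "Tom Smi", first: 10) { id }

and then, only for the IDs you haven't already cached, a follow-up query asks for the heavier data:

    nodes(ids: [...]) {
      name,
      profilePicture(size: 50) { uri }
    }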

Having the server know about client queries (and therefore preemptively running queries) is something we specifically avoided with GraphQL. If the server already knows what the client wants then sending any query across the wire doesn't make sense at all. It also falls down when the client's data requirements change across client versions. You quickly end up in a place where the server has to know about all possible client queries, which is complex and difficult to maintain.


I assume Relay performs the diff-outward-query/patch-inward-data thing. Frankly, that's fucking brilliant. I hope the people at Facebook are extremely pleased with themselves. If I understand correctly then once the number of items shown was less than the initial range of requested items, further refinements wouldn't need to make a second request. Relay could detect from the ids of the first request that it had all the data it needed.

HTTP/2 optimizations could just be layered on top of GraphQL post hoc. GraphQL/Relay is likely better for that purpose than a bag of endpoints. You get all the benefits you've mentioned and extremely bearable unoptimized performance. I guess I just needed to tease the two problems apart in my head.


Yup, and the same optimization can apply to two different queries requesting different fields on a single object, not just ranges.

GraphQL endpoints don't have to return a single JSON object either, you could hypothetically stream back objects as they become available (Netflix's Falcor seems to be heading in this direction too).


Ah, so it goes the other way too. Sort of like a request inliner/outliner that can perform dynamic optimizations. The client can inline requests using view-state unknown to the server. Then the server can outline them in whatever desirable way to provide an immediate and eventually complete response. That's clever.


As an ex-Facebook employee, I can't wait for this to be released to the public. It is hard for me to overstate how much this infrastructure makes building products a joy, not to mention the tremendous developer and server efficiency that can be attained. I miss having the GraphQL stack so much when working on my own stuff.

The client application should drive the decisions about what data to fetch. After all it's the client that knows what it is actually going to be doing with the data, not the server. Current approaches like having a "fields" option on a REST endpoint are at best a hacky approximation to this.


I'm also excited to see Jafar Husain's Falcor:

https://www.youtube.com/watch?v=hOE6nVVr14c

It's a different implementation of the same concept.


I'm curious to learn how API response caching is affected by GraphQL. In a REST setup, there's a possibility that API responses can be cached. But if the response structure is dictated by the client, it seems like responses might differ and not be cacheable.


In practice it wouldn't make much sense to cache whole GraphQL query responses because your hit rates would be too low due to the variability of the queries. You end up pushing a lot of the caching to the client. That's not really a big issue, because if you're writing something like an Android or iOS app you already need to be caching lots of data on the client side to make the app responsive.

On the server you end up caching at lower levels in the stack. For example a query for user(id: 123456) {id, name} is going to need data from a key-value store containing user info. That access can easily be cached with something like memcache, saving load on the database. Cache-invalidation problems are also much easier to solve at these layers.


Worth noting there's a massive performance penalty to pay when caching at the app level, depending on your stack. On hardware where Rails + memcached is struggling to handle 500 concurrent users, Varnish or nginx will easily handle tens of thousands.


As someone who's used OData for this sort of strongly typed, ad hoc interaction with the server, I'm really happy to see Facebook push this idea. Compared to OData, I really like GraphQL's approach of making a query look like the data it's requesting.


I don't know, there are some tradeoffs there. Their sample appears to be "nearly JSON", which doesn't seem too helpful. Being close to but noncompliant with a standard doesn't bring anything but confusion.

And it isn't obvious what they're using for transport, but it seems like they aren't attempting to model programmatic resources as web resources the way that OData does. This is an okay decision if they're trying to make it transport-neutral (i.e. you can issue the same GraphQL request via Thrift or by HTTP POST), but in that direction lie the sins of SOAP.

In the past I've written a client-side caching layer for OData which was capable of doing the same automatic batching and partial cache fulfillment for hierarchical queries that they describe in the article. It is a good tool for writing complex client applications against generalized data services without giving up performance, and I'm not surprised that companies in our post-browser world are starting to move in that direction.

I'm a little bummed that Facebook is throwing its considerable weight behind yet another piece of NIH-ware, though. Beating up the REST strawman was a poor use of half of this article; I'd be much more interested to hear why we need GraphQL when there exists a standard like OData.


The reasons they list in favour of GraphQL vs REST and ad hoc APIs are really convincing. As a developer of an API that powers multiple mobile applications and a website, I find this really interesting; it would solve a lot of problems we have right now. Unfortunately I know it would probably take too much time to re-implement our whole backend and all the clients to even think about using this in the near future.


People often talk in terms of there being only two layers of a web stack: the client/webapp layer and the server layer. I think what Netflix did (explained at http://techblog.netflix.com/2012/07/embracing-differences-in...) is really the way to go.

It would be nice to have custom adapter endpoints for your clients and devices that in turn fan out to multiple calls at the backend border (if you are using a service-based backend architecture), while still having the option of going directly to the service endpoints for third-party integrations and whatnot. Having this adapter layer based on GraphQL would be neat, assuming one could break down GraphQL queries into individual REST endpoint calls.


What are the alternatives to using something similar to GraphQL and Relay now? I've found Transmit [0], but what are the other alternatives? Has someone tried to integrate BreezeJS [1]? Also, any news about the upcoming Falcor [2]?

[0] https://github.com/RickWong/react-transmit

[1] http://www.getbreezenow.com/

[2] https://www.youtube.com/watch?v=WiO1f6h15c8

P.S. Also, GraphNoQL [3] came out quickly after the announcement, but there's been no progress ever since.

[3] https://github.com/lutherism/GraphNoQL


If you only care about iOS and want a layer on the backend that can sit between your actual app server and the client, you could consider Jetstream:

https://github.com/uber/jetstream

https://github.com/uber/jetstream-ios

It's a considerably different model however. More about realtime updates.


Thanks! I'm definitely gonna look into it!


Forgot to add the following: JayData [0], engen [1], and Astarisx [2]?

[0] http://jaydata.org/

[1] https://github.com/storehouse/engen

[2] https://entrendipity.github.io/astarisx/


FYI - I enjoyed a podcast recently on GraphQL/Relay - http://devchat.tv/js-jabber/152-jsj-graphql-and-relay-with-n...


Thanks!


I don't see why using something like GraphQL should push REST out of the way. I want to create and manage my resources, publish them on the web via a RESTful API, and also provide a way for the user to query those resources in a meaningful way with just one call. That's exactly what I'm doing today, but with a proprietary language (and a query string, which is not ideal).

I see this as the perfect companion for REST and I hope it will be standardized. Kudos


Excuse me for being unreasonably giddy, but I've been eager as hell for this to be released for some time. I have grown to adore React and React Native and really enjoyed Flux. Facebook dev is firing on all cylinders!


If you can't use GraphQL for some reason, there is an alternative: http://www.getbreezenow.com/breezejs


Well, we have to wait a few months for GraphQL to be open-sourced, but anyway, I personally find BreezeJS better, as defining GraphQL queries in strings feels more old-school than using promises or generators.


Since there is already a mix of predicates within the query (for sub-objects), they should have unified the syntax. Something like:

    user {
      id: 3500401,
      name,
      isViewerFriend,
      profilePicture {
        size= 50,
        uri,
        width,
        height
      }
    }

As shown, you could use a different indicator for filter properties that should not be included in the serialized object graph.


Yeah, it's odd. It's not clear why they're not just using JSON... JSON would give you a homoiconic language of arbitrary descriptive power.


This looks seriously useful, but I'm having a hard time seeing how something like Relay will play ball with, say, vector clocks, or client-side undo, as it sounds fairly welded to the Component.

Maybe the idea is to do that all on the server? Or in the Store? Very curious to see the implementation.


This is great. Can't wait for the actual release. One question I have is how GraphQL/Relay works for writing/modifying data on the server?


1. How much more complicated does this make the server? Seems like some pretty fancy code would have to be written to turn GraphQL into SQL.

2. I think the biggest issue with HTTP verbs is that there is no flexible way for the client to control the data that comes back from the server. GETs don't have a body and the other verbs are for adding/modifying data. I'm assuming that they are using POST for everything.
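
If so, I'd guess the request looks something like this on the wire (pure speculation about the endpoint and envelope):

    POST /graphql HTTP/1.1
    Content-Type: application/json

    {"query": "user(id: 3500401) { id, name, profilePicture(size: 50) { uri } }"}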


It is absolutely not that simple on the server, but we hope to do a lot of the heavy lifting for you via our open source release of code and spec, in terms of lexing, parsing, and executing a query.


Can you comment on whether or not you'll be releasing any implementations and if so, what languages they'll be in?


+1, knowing the languages of the open sourced implementations would be very helpful.


Basically an ORM with JSON request/responses, isn't it?


So is Relay an alternative to Flux or how are the two related?



I read that already :) thanks.

The question I have is rather whether Relay is an alternative to Flux or whether the two are meant to be used in conjunction. Also, which use cases would make Flux better and which Relay (if they are not to be used together)?


Relay is one specific implementation of the Flux pattern.

Flux is a generalized pattern, and there are a number of libraries out there which implement their own flavor of it. Relay is one of those libraries, with a focus on describing data dependencies in the same place the data is used.



