Best Practices for Designing a Pragmatic RESTful API

teddyh · on Dec 15, 2019

> Should the media type change based on Accept headers or based on the URL? To ensure browser explorability, it should be in the URL. The most sensible option here would be to append a .json or .xml extension to the endpoint URL.

I disagree with this. The most elegant solution is to use Accept headers, and you should therefore implement that. Of course, since those are hard to use from a browser, you should also solve that problem, but solve that problem separately. I usually do that by supporting an extra ?type=application/json query parameter, which internally the server-side code converts to an Accept header, which is then interpreted normally. Note that I use the media type, not a possibly ambigous “.json” extension. Would “/foo.json” mean that the data is of type application/vnd.hal+json or maybe application/vnd.api+json? Who knows?

IMO, file name extensions do not belong in URLs. URLs were never meant to be files, and we should try to avoid .html, .cgi and .php in our URLs. See also Cool URIs don't change from 1998: https://www.w3.org/Provider/Style/URI

wwweston · on Dec 15, 2019

Using Accept headers is the right thing, for the reasons you specify, and also the additional reason that they allow for the client to specify a list of possibilities that can be negotiated down to a result (server doesn't understand `application/vnd.hal+json`? maybe it can still send you `application/json`).

That said: the extension-implies-media-type approach may not be right for an application server that renders a resource on the fly, but it does seem to have a place in the specific kind filetree-via-http web server, where the resource specified by a given URL is already rendered to a specific media type, and the server really only has two choices for figuring out what that type is: parse the file (potentially expensive) or apply a heuristic set up in the server configuration to the filename (potentially not specific enough or outright wrong). Neither choice is necessarily wrong for that subcase.

I'm iffier on saying as much for media-type-in-query-string method. It's easy enough to use a client built for sending headers like Postman or curl or to augment common browsers with extensions that using the understood HTTP convention seems like the right thing for most cases. The only exceptions I can think of would be those where debugging a media type specific issue needs to happen on machines devs don't control. Needing to debug issues on machines devs don't control is common enough, but issues specific to rendering one media type should be rarer, and the intersection of both of them should be vanishing unless something isn't right elsewhere in the dev process.

Bombthecat · on Dec 15, 2019

Also, rest is interesting because it solves a lot of "base protocol" questions. Accept header is part of standard http.

So that should be the default way.

Supermancho · on Dec 15, 2019

That isnt a problem. Your API docs are the solution. Mapping communication protocols on to your applicarion for zero benefit over any other strategy. Yhis idea has been a hindrance at various cargo cult teams for decades. At least the RFC doesnt try to fool others into the fantasy.

PretzelFisch · on Dec 15, 2019

I don't agree that there should be browser explorability. I don't understand why that would even be a concern. When I explore it's from documentation, postman or curl. The browser is kind of horrible for this why support it at all?

detaro · on Dec 15, 2019

If "browser explorability" is important, this article fails to discuss it properly when it comes to authentication, too.

rumanator · on Dec 15, 2019

Great sunday reading. However, in the section "But how do you deal with relations?" the author presents nested resources as a best practice. I don't feel this is the best course of action. Instead of nesting message resources in, say, `/tickets/12/messages/5` shouldn't a better approach be to store them in `/messages/5` and keep `/tickets/12/messages` as a collection of IDs or summary resources? I mean, messages are a separate entity which might even be moved to a dedicated microservice. Why is it a good practice to nest them within an API?

arethuza · on Dec 15, 2019

"might even be moved to a dedicated microservice"

Allowing for scenarios like that would actually be an argument for HATEOS (where, at least in theory, URLs are treated as opaque).

NB I have tried designing using HATEOS and I'm not a huge fan - but it would help in this case.

avereveard · on Dec 15, 2019

there's a trade-off involved in latency size and server hits

shipping name and description of some resources often read together allows to minimize cost and optimize performances, up to a point

ultimately the client knows what the UX needs and it's in the optimal position to ask for the minimal resources needed, hence graphql et al.

but that leaves the rest API designer in a rut, where to make the call for nesting and where for searching related?

if I had to draw the line, if the related entity has a non null foreign key (or the nearest applicative equivalent) toward the parent it's a prime candidate for nesting

rumanator · on Dec 15, 2019

> there's a trade-off involved in latency size and server hits

I might have not conveyed the point adequately, but my point was orthogonal to networking or HTTP calls. I was referring to how resources were being needlessly nested, thus leaking direct dependencies wrt other resources. More specifically, although tickets might refer to messages, I didn't understood why nesting messages within a ticket passed off as a best practice. I mean, if ticket already provide a collection resource of message resource IDs, why is the path to message resources being defined specifically with regards to specific tickets?

bsaul · on Dec 15, 2019

I think there’s some confusion on what the original problem is: how do you get the ticket, together with its messages, in the minimum number of api calls ? That’s why parent post mentions latency and graphql.

The other problem : « having already queried the ticket, how do i get its messages », does indeed allow for various answers : tickets/Id/messages , or /messages?ticketid=xx or even /messages?ids=a,b,c,d are valid, depending on how orthogonal tickets and messages are, and don’t depend on latency, indeed.

jayd16 · on Dec 15, 2019

Some questions that aren't answered in the article:

How do you structure batch operations like creating multiple things? Is the current answer to make N queries and hope you're using http/2?

Whats the best practice for media types in a json api? Should every object type have a specific media type or maybe just a normal and error wrapper type? Seems like most of the big tech apis don't actually get more specific than 'application/json'.

veesahni · on Dec 15, 2019

> How do you structure batch operations like creating multiple things? Is the current answer to make N queries and hope you're using http/2?

OP here. Two approaches:

1. Have a special endpoint (POST /batch) where you send an array or requests [ {method: "", path: "", body: ""} ] and get an array of responses

2. Yes, assume HTTP/2

Based on conversations with teams who've based their API designs on this post, I've previously recommended #1. But it has it's flaws - a request that's dependent on response from another can't be part of the same batch.

Today, I'd say it makes sense to make sure you implement your API with HTTP/2 and reduce per-hit rate limit "costs" for clients connecting with HTTP/2. This way, you're encouraging HTTP/2 adopting for heavy API users.

Note, if you need to batch only on GET, then something like GraphQL is also interesting.

jayd16 · on Dec 15, 2019

>Have a special endpoint (POST /batch)

This is what always seems hacky to me. Why don't we start with batch handling. Why shouldn't every (POST /resources) accept an array of new resources? We've already resigned to using plural everywhere.

HTTP/2 would be nice, but as a dev that has to serve Unity client, we can't even design APIs that require PATCH.

wwweston · on Dec 15, 2019

> Should every object type have a specific media type

Probably not. Consider that a given object in a system might have a json representation, an xml representation, an html representation, etc. Media types are less about the in-system representation or type for a given entity, and more about how the entity is rendered. Many different in-system entities might be represented by the same media type, a single in-system entity might be rendered as several media types.

(In practice, yeah, application/json will cover you, though application/vnd.api.vN+json gives you some room to maneuver negotiate version/app specifics with various clients using Accepts headers.)

thexa4 · on Dec 15, 2019

If you're interested in API design the upcoming RFC standard might be of interest: https://tools.ietf.org/html/draft-ietf-httpbis-bcp56bis-06

avdempsey · on Dec 15, 2019

While this link is very interesting, its advice doesn’t seem to completely pertain to the kind of “single deployment” APIs most of us are probably making.

From the draft: “This document specifies best practices for writing specifications that use HTTP to define new application protocols, especially when they are defined for diverse implementation and broad deployment (e.g., in standards efforts).”

That’s not to say there aren’t useful ideas here (I found it very interesting in its own right), but the provisions against fixed URL schemes is followed by no commercial HTTP API I’ve ever seen.

rumanator · on Dec 15, 2019

Your link is a bit stale. Here's a link to the latest version of the draft, published October 31st 2019:

https://tools.ietf.org/html/draft-ietf-httpbis-bcp56bis-09

corentin88 · on Dec 15, 2019

It’s great that a standard will eventually be set on API. So many different ways of using them so far.

bradleyjg · on Dec 15, 2019

An API that uses the Link header can return a set of ready-made links so the API consumer doesn't have to construct links themselves. This is especially important when pagination is cursor based.

In the header is nice because then there’s no need to parse the payload to get the next page. But better still is to avoid cursor based pagination. Instead give me a cheap endpoint to get the total number of results and the configured max results per page and have constructable urls. (E.g. “?page=4” or “?offset=500”). This way generating all the urls can be a completely separate process from pulling the results.

mtsr · on Dec 15, 2019

That's indeed the nicer way of paginating, but it breaks if the underlying resultset changes between requests. Which is exactly when cursor based pagination is generally used.

bradleyjg · on Dec 15, 2019

If the changes in the resultset are additive it's no problem as long as they are sorted in such a way as new results go to the end (which the api should at least make an option if possible). Updates to data within results may be a problem because you can end up with a dataset that has a view of the world that doesn't represent any particular time, but in many cases are safe. Deletions screw everything up and should be avoided if possible.

The general solution to this problem is to allow as part of the query some particular time that you want the results to reflect the state of the world as of, but that's obviously going to be expensive to serve.

ptman · on Dec 17, 2019

How about keyset pagination instead of limit & offset? https://use-the-index-luke.com/no-offset

bradleyjg · on Dec 17, 2019

I don’t like it because it introduces a linear dependency for every step on the ingest on the all the prior steps.

It’s convenient for the server but not the client.

caseysoftware · on Dec 15, 2019

+1 to what your other response said

You have to remember the goal of pagination: to move through a collection of results sequentially. If your underlying page is constantly changing (as the other response noted), then you have NO way to know what should be your next intended offset/page to move either or back.

A simple sort with "results always go here" seems like a good approach but now you're packing additional understanding into using your API which is totally out of band with it. Or using a different sort blows it up rending that approach useless.

Cursors are the only approach that actually accomplishes the goal.

bradleyjg · on Dec 15, 2019

If the underlying data is constantly changing, how do cursors solve that problem? The only guarantee I get asking for the next page after a given item is that it won't contain the last item I've already seen. There's no other inherent guarantees. The page could contain all items I've already seen, early pages (in this new version of the underlying dataset) could have items I've never seen, and so on. It's as arbitrary as page numbers but without the corresponding convenience.

rumanator · on Dec 15, 2019

Regarding API versioning, FTA:

> There are mixed opinions around whether an API version should be included in the URL or in a header. Academically speaking, it should probably be in a header. However, the version needs to be in the URL to ensure browser explorability of the resources across versions (remember the API requirements specified at the top of this post?).

I don't see how this is relevant with a RESTful API. In REST, resources are transparent and found through HATEOS/autodiscovery. Thus it's really irrelevant if the URL found through HATEOS includes an API version or not.

However, URL versioning is a indeed a side effect of having multiple services dedicated to serve each version of an API.

In the end, it doesn't feel like path/media type versioning is a relevant issue because it's not an either/or type situation but actually complementary.

PretzelFisch · on Dec 15, 2019

When you see versioning it's a pretty good hint that they use is as a buzz word and it's just RPC wrapped in REST clothing

rumanator · on Dec 15, 2019

RPC-over-HTTP passed off as REST is indeed a nuisance but unfortunately that doesn't get rid of the need to support multiple versions of the same API, specially if you don't control which clients are consuming your services.

injb · on Dec 16, 2019

It's worth reading, but his arguments against using hyperlinks are vague and frankly weak. He says there's not much to be gained because you can't make "significant" changes without breaking client code; that may be true for certain values of "significant", but the bottom line is that you can change a lot more without breaking clients if you use hyperlinks everywhere, than if you don't.

It's not totally clear what he means by "not ready for prime time" but HATEOS has been achieving what it was designed for for a long time - that is, reducing the interdependence between client and server code.

It's also worth noting that using server-generated links everywhere eliminates the need for an entire category of documentation, and makes debugging much more pleasant and efficient (especially when the person debugging didn't write either the client or the server code).

Honestly, if there's one feature that determines whether an API should be described as RESTful or not, it's this. Think very carefully before building an API that requires the client to know how to build URLs!

JamesSwift · on Dec 15, 2019

This has been my goto reference for API design, and is what I send others when discussing API design, for years now. Nice to see it pop up here.

ptman · on Dec 17, 2019

RFC7807 for JSON errors instead of everyone inventing their own schema https://tools.ietf.org/html/rfc7807

jbjohns · on Dec 15, 2019

This has nothing to do with REST (I realize it says RESTful but I wish that term would die as RESTful has nothing to do with REST either). It seems to re-invent OData as well.

I don’t know how it has gone so wrong with REST as it seems pretty easy to understand: it’s how your browser has always worked. Your browser doesn’t know any URLs at all (yes I know about the search engines but that’s configuration). What it knows are data types. It can be taught new data types (e.g. PDF) but not new URLs because it doesn’t know any. So if you want a REST API you need to be designing data contracts between client and server. URLs are a server side implementation detail and completely irrelevant to the discussion.

The advantage to this kind of architecture is that the server and clients can develop in a more more decoupled manner. They need only agree on data types. A new web site existing never requires rebuilding a browser. Only if a new kind of data (e.g. HTML5) is to be supported.

But one thing to be considered, as with all architectural patterns, is if it fits the domain you’re architecting. REST isn’t optimal for any possible problem. Sometimes HTTP-RPC (what the article describes) is an easier fit. But once you realize and accept this you no longer have to follow standards that seem not to make sense for what you’re doing (e.g. HATEOAS which is done automatically if you’re really doing REST, and seems so inefficient if you’re not).

duregin · on Dec 15, 2019

gzip + ssl is still (and will always be) a risky choice, isn't it?

Do any of the newer compression algorithms fix that problem?

deathanatos · on Dec 15, 2019

It depends.

The problem was that SSL supported compression directly, so you could compress the encapsulated stream. What happens in HTTP is that, say the cookie header contained the user's session cookie, and the body was somewhat controllable by an attacker. (E.g., by making CORS requests in the background.) The attacker could repeat "Cookie: auth=a" many times; if your auth cookie started with "a", it would compress slightly better as both could get compressed together, things would be slightly faster, and an attacker could use timing information to discern that he'd gotten the first character correct, and move on to the second.

See: https://en.wikipedia.org/wiki/CRIME

HTTP compression being mentioned in the article only compresses the body. It's still possible to execute the same sort of attack situationally if there's some part of, say, a response body that an attacker can control and a part that response body that the attacker doesn't control and is sensitive and wants to know and somehow only has access to the timing information.

While there is a Wikipedia article on this variant ("BREACH"), I think this is more informative: https://security.stackexchange.com/questions/20406/is-http-c... ; it lists a decent example of trying to get at a CRSF token.

But generally, JSON responses don't mix secret data + attacker controllable data, I feel, so compression should usually be okay. (And IME, it's typically done.) SSL/TLS compression should usually be left off, as that seems much easier to exploit.

Supermancho · on Dec 15, 2019

Anything that includes put patch delete is not pragmatic. Changes have side effects in any nontrivial system, making the semantic goal little more than wishful thinking. As others (even within the first few comments) figure out, being able to specify every mutable element from a GET is the most pragmatic quality of a modern web api.