1. Emphasize synchronization over imperative API calls. Imperative APIs encourage data silos. They are the underlying technical part of the problem that Zapier and IFTTT try to solve. See [0] and [1] for some ideas.
2. Allow users to submit "agents" rather than "requests." An agent is a small program given extremely limited resources (limited process/memory/runtime budget) that you can safely run on your server. It is a possible answer to many of the standards/protocols that wouldn't exist if frontends were running on the same machines as backends. [2]
3. Emphasize composition over integration. Functions compose (nest, recurse). Event emitters don't. As long as APIs are built to be integrated rather than composed, making them work together is a full-time or full-company job (e.g. Zapier). (See the sketch after this list.)
4. Make things immutable. Immutability allows APIs to "play" with one another without fear of setting off the nukes (ie. side effects). It's possible that this approach would make it so that integrating two APIs becomes a job for ML/AI rather than humans.
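To make point 3 concrete, here is a minimal TypeScript sketch of what "functions compose" means in practice; fetchUser and fetchOrders are hypothetical stand-ins stubbed with canned data:

```ts
// Point 3 (composition): each API operation is a plain async function,
// so combining them is ordinary function composition.
type User = { id: string; name: string };
type Order = { id: string; userId: string; total: number };

async function fetchUser(id: string): Promise<User> {
  return { id, name: "Ada" }; // stub: would be an API call
}

async function fetchOrders(userId: string): Promise<Order[]> {
  return [
    { id: "o1", userId, total: 40 },
    { id: "o2", userId, total: 2 },
  ]; // stub: would be an API call
}

// The composed operation is itself just another function that can be
// nested or reused; no event wiring or third-party glue required.
async function userSpend(userId: string): Promise<number> {
  const user = await fetchUser(userId);
  const orders = await fetchOrders(user.id);
  return orders.reduce((sum, o) => sum + o.total, 0);
}

userSpend("u1").then(console.log); // 42
```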
Agents should use a limited, non-Turing-complete model of evaluation. They could combine API calls locally and run simple FSMs, which are easy to formally check for, e.g., the absence of loops. This could save round-trips without needing to sacrifice general, orthogonal APIs.
Imagine that you could give an app server a formula to combine several API calls, much like you give an SQL server a formula to join and filter tables.
Also, you can easily cache agents by calculating a hash of their source, and call them repeatedly without resending their bodies, much like "prepared statements" work in SQL databases.
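Concretely, the caching could look something like this sketch (all names are hypothetical; `compile` stands in for whatever restricted evaluator the server uses):

```ts
import { createHash } from "node:crypto";

// Server-side cache: agent source hash -> compiled agent.
type Agent = (input: unknown) => unknown;
const agentCache = new Map<string, Agent>();

function hashSource(source: string): string {
  return createHash("sha256").update(source).digest("hex");
}

// Register an agent once; the returned id is just a hash of its source,
// so identical agents from different clients collapse to one cache entry.
function registerAgent(source: string, compile: (src: string) => Agent): string {
  const id = hashSource(source);
  if (!agentCache.has(id)) {
    agentCache.set(id, compile(source));
  }
  return id;
}

// Later calls send only the id plus inputs, like executing a prepared statement.
function invokeAgent(id: string, input: unknown): unknown {
  const agent = agentCache.get(id);
  if (!agent) {
    throw new Error("unknown agent: resend the source"); // like re-preparing a statement
  }
  return agent(input);
}
```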
> Agents should use a limited, non-Turing-complete model of evaluation.
That doesn't solve any of the caching worries. If it's not a resource then it's not cacheable.
> This could save round-trips without needing to sacrifice general orthogonal APIs.
Since the introduction of HTTP/2, a relatively large number of requests for cached resources easily beats a relatively small number of requests that are computationally demanding or hit a database.
Is it really much different? In the data flow you're just conceptually moving the network from being after the agent to being before the agent. You can still have the cache in the same spot, it's just that the agent is hitting it directly now instead of over the network.
As far as the actual data access goes, it is equivalent, but you can't cache the agent process itself. So every agent is potentially repeating work centrally, rather than spreading that load out to each client. That may still be a worthwhile win, but it is a new burden on the data centre.
> Allow users to submit "agents" rather than "requests."
Having dealt with something like this, I can assure you that no one wants to learn a new scripting language that only runs on someone else's machine, with no ability to debug it.
I agree things that can be immutable should be, but so much functionality requires mutability. State has to change, or the internet will be a very boring place. If I am shopping, I need to be able to buy something... Facebook users need to be able to post their updates, upload photos, etc.
Am I missing something, or do all of these use cases preclude immutability?
I think it's about delaying the side-effect until `transaction.commit()`, and implementing as much of the business-logic as possible to happen inside the transaction.
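Something like this minimal sketch (all names hypothetical): business logic only records intended effects, and commit() is the single place where the outside world gets touched.

```ts
// Side effects are queued during the transaction and only run at commit().
type Effect = () => Promise<void>;

class Transaction {
  private effects: Effect[] = [];

  // Pure bookkeeping: nothing external happens here.
  enqueue(effect: Effect): void {
    this.effects.push(effect);
  }

  // The one place where side effects actually execute.
  async commit(): Promise<void> {
    for (const effect of this.effects) {
      await effect();
    }
  }
}

// Usage: decide everything first, touch the outside world last.
async function placeOrder(tx: Transaction, item: string): Promise<void> {
  // ...validation, pricing, inventory checks for `item` (all pure)...
  tx.enqueue(async () => { /* charge the card */ });
  tx.enqueue(async () => { /* send the confirmation email */ });
}
```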
Virtual worlds (a model of the real world) could also be useful. Like, let the side effect run in the virtual world and let me inspect that virtual world. Here, "me" could be a programmer figuring out how to code against your API, or some ML thing trying to figure out how to make two APIs talk to one another.
> Allow users to submit "agents" rather than "requests." An agent is a small program given extremely limited resources (limited process/memory/runtime budget) that you can safely run on your server. It is a possible answer to many of the standards/protocols that wouldn't exist if frontends were running on the same machines as backends.
Forgive my ignorance, but how would this work in practice? Would agents be arbitrary code that a client would submit to an API, or would the agent be something that the server itself implements and manages in the backend?
EDIT: after some further reading of the mentioned threads, it appears to me like the idea is that we ship data with its own interpreter of some kind? I can't help but fear this will cause even more JavaScript code to be pushed to the server. :)
I think at a minimum, agents are programs that clients submit to the server. The server runs the agent with a very limited API and likely as a pure function (lang could be JS or wasm maybe). The agent could run in response to events on the server (like webhooks), or in response to requests coming in from the client.
Effectively we are talking about something on a par with stored procedures, but hopefully a bit nicer and portable.
You would of course want to apply at least as much thought to the design of the agent as you would to a prepared statement, so that they can be reused. And that would probably mean, like a prepared statement, that you are sending a query and some config for that query as two separate inputs, one of which is much more stable than the other.
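As a heavily simplified sketch of that split, assuming a JS agent run in Node's vm module with only a time budget (real isolation of memory and syscalls would need an isolate or wasm runtime):

```ts
import vm from "node:vm";

// Run a client-submitted agent as a pure function of its parameters inside a
// bare context with a runtime budget. Nothing from the host is exposed.
function runAgent(agentSource: string, params: unknown): unknown {
  const sandbox = { params, result: undefined as unknown };
  vm.createContext(sandbox); // no require, no fetch, no fs in scope
  vm.runInContext(`result = (${agentSource})(params);`, sandbox, { timeout: 50 });
  return sandbox.result;
}

// Like a prepared statement: the stable "query" (the agent source) is sent
// separately from the per-call "config" (the params).
const source = `(p) => p.items.filter((i) => i.price > p.minPrice).length`;
console.log(runAgent(source, { items: [{ price: 3 }, { price: 12 }], minPrice: 5 })); // 1
```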
As a founder of a startup whose mantra is sync, I couldn't agree more. No API developer looks at API design from this angle. In fact, in a few cases, the design actively prohibited these use cases (for obvious reasons).
Another aspect is communicating error conditions. One would be amazed at the myriad ways an API can fail, and many do a poor job of conveying what went wrong and how to recover, if possible.
I'm curious, which startup is that? FWIW, I think there is a large opportunity here (make sync+collab easy => ecosystem of products that tightly sync) that can be disruptive to many SaaS businesses.
While these ideas (imperative vs sync, immutability + idempotency) are great in theory, I have yet to see anyone putting them into practice while still keeping a great developer experience - that includes Zapier. I guess Stripe is doing this more or less well?
I just wish people implemented their webhook systems well (a popular standard would be nice as well) - REST is fine, it mostly works and it's pretty standard.
This is also something that's missing from OpenAPI - a standard way to describe WebSockets/SSE would be nice. Being able to listen for remote changes cuts down on API calls and leads to more responsive updates.
(and ideally webhook pushes, WebSocket pushes, and WebSocket event pulls look as much alike as possible, so you can switch between them easily as the situation warrants)
Synchronization is still an open problem. CRDTs for example are still evolving fast [0]. I'd imagine the idea would be more doable when the underlying sync primitives mature.
Immutability is a different beast, but it appears that decentralization/p2p is working on it through necessity.
I've coincidentally thought a lot about agents as you describe them, but ultimately it's been hard for me to come up with many real use-cases for them in common scenarios.
In the context of regular REST API requests, it seems like all they're good for is batching requests together and filtering the results down to exactly what the client wants. That can be more directly accomplished in most cases with a query language like GraphQL or SQL. An SQL query is itself an example of an agent you send to a database server. Another option is an API for batching requests and specifying which fields you want on the responses (Google APIs support this).
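For illustration, a hypothetical batch-request shape along those lines (the method names and structure are made up, not any particular API):

```ts
// One round-trip describes several calls plus the fields the client
// actually wants back, i.e. batching with a field mask.
type BatchCall = {
  method: string;                 // e.g. "users.get"
  args: Record<string, unknown>;  // call arguments
  fields?: string[];              // field mask for the response
};

type BatchRequest = { calls: BatchCall[] };

const request: BatchRequest = {
  calls: [
    { method: "users.get", args: { id: "u1" }, fields: ["id", "name"] },
    { method: "orders.list", args: { userId: "u1" }, fields: ["id", "total"] },
  ],
};

console.log(JSON.stringify(request, null, 2));
```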
I can imagine cases where users want to make potentially long-running queries against a large dataset with a richer agent than a regular query language supports (I'm imagining examples where the dataset is something like world terrain and navigational data, or the live state of everything in a large active multiplayer videogame world), but most of the cases I've thought of would strongly benefit from the data being pre-processed into indexes so that the individual queries would be more efficient and not so long-running.

Okay, actually that suggests a pretty radical and interesting example of an agent use-case: the client submits an agent to run on the server which reacts to data updates and maintains a set of indexes on the server, and then the client can issue requests directly to their agent running on the server to query the pre-processed data. (The dataset host may charge the user for the CPU + memory resources to continually execute the agent and for the storage resources to keep its indexes.) In the case that multiple users submit the exact same indexing agent to run over the same input data, the server could execute the indexing agent just once and then offer a discount to the users for coordinating.
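A sketch of such a retained indexing agent, with the update and query shapes assumed purely for illustration:

```ts
// Lives on the server next to the data: maintains an index as updates arrive,
// then answers the owning client's queries from that index instead of scanning.
type Update = { id: string; region: string; elevation: number };

const byRegion = new Map<string, Update[]>();

// Called by the host for every dataset update the agent subscribed to.
export function onDataUpdate(update: Update): void {
  const bucket = byRegion.get(update.region) ?? [];
  bucket.push(update);
  byRegion.set(update.region, bucket);
}

// Called when the owning client queries its own agent.
export function query(region: string, minElevation: number): Update[] {
  return (byRegion.get(region) ?? []).filter((u) => u.elevation >= minElevation);
}
```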
Another use-case for agents also involves retained long-running agents, as a replacement for webhooks. Currently I imagine a lot of service integrations look like this: service A is configured with a webhook to Zapier, and on receiving the webhook, Zapier calls back to service A to get some more data related to the webhook, and then calls service B to do something with the data from service A. This adds extra latency and a third party (Zapier) as a point of failure. If instead of a webhook, service A supported users submitting an agent that gets called with dataset updates, then a user could submit an agent to service A which processes update events and directly contacts service B. This cuts out all the roundtrips with Zapier. (If the agent only contacts service B on some events, then doing this would eliminate most communication between service A and the outside world!)
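A sketch of what such a submitted handler might look like; the event shape, the scoped `api` object, and `sendToServiceB` are all assumptions for illustration:

```ts
// Runs inside service A in response to dataset updates. Most events never
// leave service A; only the interesting ones are forwarded to service B.
type UpdateEvent = { recordId: string; kind: "created" | "updated" | "deleted" };
type RecordData = { status: string; payload: unknown };

export async function onUpdate(
  event: UpdateEvent,
  api: { getRecord(id: string): Promise<RecordData> }, // local read inside service A
  sendToServiceB: (body: unknown) => Promise<void>,    // the agent's only outbound capability
): Promise<void> {
  if (event.kind !== "updated") return;
  const record = await api.getRecord(event.recordId);
  if (record.status === "ready") {
    await sendToServiceB({ id: event.recordId, payload: record.payload });
  }
}
```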
Related, I think agents fit in a lot of cases where latency is important:
* An obvious example is for a bot in an online bot-vs-bot game (like https://screeps.com/). Having the bots be implemented as agents submitted to the game server and executed there is a lot more efficient, dependable, and low-latency than having everyone implement their bots as clients on their computer which they must leave on and connected to the game server as long as they want their bot to be active. (Additionally, having the agents be under the game server's control allows the game to have mechanics involving making a temporary fork of the agent and see how it reacts to certain possibilities, without giving the mainline agent or the player the ability to know what hypotheticals it was tested on.)
* In a spreadsheet, you could consider a cell's formula as an agent. For comparison, consider the very naive solution of implementing formulas in a spreadsheet web-app as a webhook: the user implements the formula as a web service on their own server, and then enters the URL to their web service in the spreadsheet as the formula webhook for a cell. Every time the spreadsheet wants to show a formula result, it calls the web service (passing the formula inputs in the request) for it. This means when the user edits one of the inputs, their computer makes a request to the spreadsheet web-app server with the new input value for the new formula result, and then that server makes a request with the input values to the user's web service. By having the formula instead be an agent that's submitted to the spreadsheet web-app server, then obviously the roundtrip to the user's web service can be cut out, but you can go even further and cut out all the roundtrips on edits by having the spreadsheet web-app server send the agent directly to the client to execute there on changes. Then a user tweaking values in the spreadsheet can immediately see how it affects the formula results without waiting on communication to any servers.
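As a tiny sketch of the formula-as-agent idea: because the formula is a pure function of its inputs, the same source string can be evaluated on the server or shipped to the client for instant recalculation (sandboxing is omitted here; `new Function` is only for illustration):

```ts
// A cell formula as a small pure agent: a function from named inputs to a value.
type Formula = (inputs: Record<string, number>) => number;

const formulaSource = `(inputs) => inputs.A1 * inputs.B1 + inputs.C1`;

// On either side, evaluating it is just calling a function.
const formula = new Function(`return (${formulaSource})`)() as Formula;

console.log(formula({ A1: 2, B1: 10, C1: 1 })); // 21
```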
Another use-case for agents I've thought of involves enabling account delegation with arbitrary restrictions: imagine I want to have a counter in a spreadsheet on Google Docs that I want to allow my friend (or an external service, etc.) to increment. I don't want to allow my friend to access anything else about my account, view the spreadsheet, or change the counter in any way other than incrementing it. Maybe I even want to enforce a rate limit. The standard solution would be for me to make a web service running on my own server which has an "increment" endpoint, and my web service has access to my Google account credentials (or maybe an API key that gives it full read+write access to my spreadsheets). This isn't great because it means I need to maintain a separate server, it's an added point of failure, it's an extra hop of latency, and if it's hacked/stolen/etc., then someone gets credentials that let them do far more with my account than increment a value. Instead, it would be convenient if I could upload an agent that lives next to my Google Docs data and exposes an "increment" endpoint. An agent-created endpoint like this could be set to be publicly available, available to certain users, or available to requests with a specific API token.
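A sketch of what that delegated agent could look like, assuming the hosting platform supplies a `sheet` handle scoped to a single cell and decides who may call `handle`; all names are illustrative:

```ts
// The only capability this agent exposes is "increment", with a rate limit.
type CellHandle = {
  read(): Promise<number>;
  write(value: number): Promise<void>;
};

const WINDOW_MS = 60_000; // 1-minute rate-limit window
const MAX_CALLS = 10;
let callTimes: number[] = [];

export async function handle(
  request: { op: string },
  sheet: CellHandle, // scoped by the platform to exactly one cell
): Promise<{ value?: number; error?: string }> {
  const now = Date.now();
  callTimes = callTimes.filter((t) => now - t < WINDOW_MS);
  if (callTimes.length >= MAX_CALLS) return { error: "rate limit exceeded" };
  if (request.op !== "increment") return { error: "only increment is allowed" };
  callTimes.push(now);

  const value = await sheet.read();
  await sheet.write(value + 1); // the one mutation this agent can perform
  return { value: value + 1 };
}
```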
---
I think the biggest point of agents is that they let the logic come to the data. The number one practical motivator seems to be latency, though there are side benefits like increased availability for the logic and letting users avoid sourcing their own always-on web server or external service just for their logic to live on. It is a big paradigm shift though, and I'm not sure most of my use-cases alone are enough to warrant it. The very latency-sensitive cases like the formula cell and the game bot seem clearly worth it, but the rest seem like they would only be done if we had very high-latency networks (maybe a deep space network would find these strategies useful) or if we had a widespread computing model already in place that made it easy and safe for services to implement.
Maybe HN is not suitable for such long threads, but we have an online forum where we talk about such ideas and I think people would be interested to discuss what you've written here. Shoot me an email (my handle at gmail) if you'd like to join and continue the thread there.
Agree, the author doesn't paint a picture of what might be coming. Would love to hear from the HN crowd on what's coming next in API design. I see a lot of buzz around gRPC; will this be the next standard?
- optimise the amount of data you send (see GraphQL/this person's agent idea)
- optimise where you send it to/from
The latter has a hard limit of c, which we’ll always try to move towards. Distributed computation helps, but trades off speed/consistency. The question then becomes whether you can have an inconsistent model for those few seconds.
You invite the problem of eventual vs strong consistency, split brain issues, etc. There's a reason that the two hard problems in computing are naming things, cache coherency, and off by 1 errors.
At the risk of being considered non-constructive, I wish the author had made a leap of faith and shared their educated guess regarding where API engineering is heading.
I appreciate the effort of sharing a perspective, but I would have warmly welcomed a prospective one as well.
That being said, nice prose and interesting article nonetheless.
It's a pretty good article, but it's odd that auth wasn't mentioned. If chatty clients are a concern, then I would expect token-based auth to be bundled with the problem.
> To work around these complexities, Google built Spanner, a database that provides strong consistency with wide replication across global scales. Achieved using a custom API called TrueTime, Spanner leverages custom hardware that uses GPS, atomic clocks, and fiber-optic networking to provide both consistency and high availability for data replication.
I find it interesting that most of this complexity simply falls away if users host their own data. In my estimation, most people's computing needs would best be satisfied with a smartphone + a raspberry pi in their house hosting their data, protected by a simple auth scheme, and accessed using simple protocols built on HTTP. That would be more than enough to access all their photos and videos, documents, and social feed for their few hundred friends to consume. Things like email would probably still best be handled by the one cousin in the family who works in IT, to manage spam etc.
If only the technical side were the actual problem.
This depends on the user; tech-savvy users may prefer a self-hosted version, especially if it installs in a few clicks. But they are outnumbered by IT-naive users whose only realistic option is hosting by a vendor.
> tech-savvy users may prefer a self-hosted version
Not necessarily. I'd hate to self host stuff. It's a maintenance burden, time wasted getting stuff to work with all of its complexities, and once that's done you need to make sure it keeps working, that bills keep being paid, and you're responsible for drive failures, backups, missing a bill, etc... And that's despite my software and sysadmin experience.
I don't think it has anything to do with "tech-savvy". Hosted is just the better option in almost all cases, especially at individual or small-medium scale.
I actually think the converse to the typical self-hosted statement is true: only a small number of tech-savvy users like to self-host. They just happen to be a vocal minority.
We could improve the convenience of self hosting. And we should. The pendulum always swings, we should start thinking about what we want it to look like, or somebody else will.
A smartphone + raspberry pi does not provide any level of backup, so it is a disastrous setup for important data, such as pictures. Adding backup (and restore) functionality is pretty non-trivial. Adding high-availability ups the level of difficulty even more.
Not to mention, a lot of computing is not done with your own data. Sure, my pictures could stay in my house, and I could access my friends' pictures with relatively simple APIs. But I also need to work with Wikipedia, with StackOverflow's database of answers, with Amazon's database of products, with YouTube's and Netflix's video databases etc.
Maybe room for a middle-ground compromise - "street hosted data", some kind of deployed turnkey micro-datacenter that only accepts traffic from physical networks within a fixed radius / whitelist, etc.
So you get some economies of scale ... I wonder what the "breakeven topology" would look like.
Yup. It's an important topic, but I've been designing APIs for decades; just not the kind he talks about. I do things like device control APIs, these days.
The ones that I do have some issues that he doesn't cover, and a whole lot of issues that he discusses are of no concern to me.
In the APIs of tomorrow, clients should ask servers to prepare operations without executing them. Servers should return IDs which clients may embed within follow-up operations, effectively allowing a client to construct a complex request out of simple operations and combinators.
Eventually the client will ask the server to evaluate the request and produce a result.
This lazy style of execution is what effect systems like https://zio.dev support where a program constructs an effect which is executed later by the runtime.
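A rough sketch of that prepare-now, evaluate-later flow in TypeScript; all the shapes are assumed and this mirrors the idea rather than ZIO's actual API:

```ts
// Operations are descriptions, not executions. The client composes them by id
// and nothing runs until it asks the server to evaluate.
type OpId = string;

type Op =
  | { kind: "call"; endpoint: string; args: Record<string, unknown> }
  | { kind: "map"; source: OpId; field: string }
  | { kind: "zip"; left: OpId; right: OpId };

const prepared = new Map<OpId, Op>();
let nextId = 0;

// "prepare" (conceptually a server endpoint): store the description and hand
// back an id the client can embed in follow-up operations.
function prepare(op: Op): OpId {
  const id = `op-${nextId++}`;
  prepared.set(id, op);
  return id;
}

// The client builds a complex request out of simple pieces...
const user = prepare({ kind: "call", endpoint: "/users/42", args: {} });
const userName = prepare({ kind: "map", source: user, field: "name" });

// ...and only an explicit evaluate(userName) call (not shown) would make the
// server execute anything and produce a result.
```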
I've attempted building this sort of effect-based API over protocols like REST a few times without much success. The biggest problem isn't technical. The hardest problem is convincing coworkers and API consumers of the value of this approach.
[0] https://github.com/braid-work/braid-spec
[1] https://writings.quilt.org/2014/05/12/distributed-systems-an...
[2] https://news.ycombinator.com/item?id=23900749