JSON Patch (zuplo.com)
300 points by DataOverload 34 days ago | 160 comments



I quite like JSON Patch but I've always felt that it's so convoluted only because of its goal of being able to modify every possible JSON document under the sun. If you allow yourself to restrict your data set slightly, you can patch documents much more simply.

For example, Firebase doesn't let you store null values. Instead, for Firebase, setting something to null means the same as deleting it. With a single simple restriction like that, you can implement PATCH simply by accepting a (recursive) partial object of whatever that endpoint returns. Eg if /books/1 has

    { title: "Dune", score: 9 }
you can add a PATCH /books/1 that takes eg

    { score: null, author: "Frank Herbert" }
and the result will be

    { title: "Dune", author: "Frank Herbert" }
This is way simpler than JSON Patch - there's nothing new to learn, except "null means delete". IMO "nothing new to learn" is a fantastic feature for an API to have.

Of course, if you can't reserve a magic value to mean "delete" then you can't do this. Also, appending things to arrays etc can't be done elegantly (but partially mutating arrays in PATCH is, I'd wager, often bad API design anyway). But it solves a very large % of the use cases JSON Patch is designed for in a much more elegant way, in my humble opinion.
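
A minimal sketch of that kind of merge (null reserved to mean delete; the helper name is mine):

    function mergePatch(target, patch) {
      // Anything that isn't a plain object (including arrays) replaces the target wholesale
      if (patch === null || typeof patch !== "object" || Array.isArray(patch)) {
        return patch;
      }
      const result = (target !== null && typeof target === "object" && !Array.isArray(target))
        ? { ...target }
        : {};
      for (const [key, value] of Object.entries(patch)) {
        if (value === null) {
          delete result[key];               // null means "delete this key"
        } else {
          result[key] = mergePatch(result[key], value);
        }
      }
      return result;
    }

    // mergePatch({ title: "Dune", score: 9 }, { score: null, author: "Frank Herbert" })
    // => { title: "Dune", author: "Frank Herbert" }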


The article has a section at the bottom "Alternatives..." [1]. It links to "JSON Merge Patch" which is what you are describing: https://zuplo.com/blog/2024/10/11/what-is-json-merge-patch

That's the format that people tend to naturally use. The main problem is that arrays can only be replaced.

[1] https://zuplo.com/blog/2024/10/10/unlocking-the-power-of-jso...


json merge patch is pretty good. I think it just needs an optional extension to specify an alternative magical value for “delete”. null is a pretty good default, and comports well with typical database patterns, but is outright bad for some things.

I think it also needs a “replace” option at the individual object update level. Merge is a good default, but the semantics of the data or a particular update could differ.

You’re almost surely doing something wrong if replace doesn’t work for arrays. I think the missing thing is a collection that is both ordered and keyed (often not by the same value). JSON by itself just doesn’t do that.

So maybe what’s missing is a general facility for specifying metadata on an update, which can be used to specify the magical delete value, and the key/ordering field for keyed, ordered collections.


Once you add all that, it loses "no need to learn something new". At that point, I think I'd just go with JSON Patch which solves all of these, and more.


> You’re almost surely doing something wrong if replace doesn’t work for arrays. I think the missing thing is a collection that is both ordered and keyed (often not by the same value). JSON by itself just doesn’t do that.

Yeah, you could assign an identity value to each element of the array and then use a subresource to manipulate those elements by identity value. Then you could PUT using the same JSON merge mechanism to clear individual fields, and you could DELETE to remove items from the array by subresource.

This just seems like a reinvention of a crufty piece of XML.


I just mean a mechanism for specifying metadata, and a little metadata.

Off the top of my head, an optional header like "MergeMetadataObjectPropertyName: @mergeMetadata"

Which would cause objects in the merge containing the property "@mergeMetadata" to be treated specially.

The merge metadata could (optionally) specify an alternative to null for the special delete value. Or (optionally) specify the key value for an array representing an ordered, keyed collection. (Or, possibly, to specify the order value for an object used to represent a keyed, ordered collection.)
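
Something like this, say (every name here is invented purely for illustration):

    {
      "@mergeMetadata": { "deleteValue": "__DELETE__", "arrayKey": "id" },
      "score": "__DELETE__",
      "tags": [ { "id": 3, "label": "sci-fi" } ]
    }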

I guess you could just do without the header and specify the metadata using magic values (in the same way null is used as a special value meaning delete), but it seems better to opt in to things like that.

(IMO, json merge patch would have been slightly better if it had no special values by default, but it's not bad. "null means delete" is a small thing, you probably need delete regardless, and, anyway, the ship has sailed on that one.)


Nice! I gotta say I didn't expect a thing called "JSON Merge Patch" to be simpler and more concise than a thing called "JSON Patch" :-)


Same here, I actually made a tool for this called www.jsonmergepatch.com - give it a try


Also the first thing I was thinking. The only reason I can see for using JSON Patch is for updating huge arrays. But I never really had such big arrays that I felt the necessity for something like this.


What if you represented arrays as recursively nested triples like '(1 2 3 4 5) as [[1 null 2] 3 [4 null 5]]. Then you could patch the tree of triples much more succinctly. You might have to disallow nested arrays but this would be about as bad a restriction as disallowing nulls as map values. You could append and delete array indexes better. Or maybe make it an 8-way tree like Clojure does for its vector representation to condense the patch further.


I wonder if adding a merge op would be a viable option e.g.:

    { "op": "merge", "path": "/", "value": { "score": null, "author": "Frank Herbert" } }
It's kind of nice to retain the terse and intuitive format while also gaining features like "test" and explicit nulls. It's of course not spec compliant anymore but for standard JSON Patch APIs the client could implement a simple Merge Patch->Patch compiler.


> appending things to arrays etc can't be done elegantly

Are you referring to the possibility to point to the end of the array? If so, a single minus sign might solve it: "/path/to/the/array/-"

RFC 6901 (JavaScript Object Notation (JSON) Pointer):

> exactly the single character "-", making the new referenced value the (nonexistent) member after the last array element
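
So an append would look like this (sketch):

    { "op": "add", "path": "/path/to/the/array/-", "value": "new item" }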


> I quite like JSON Patch but I've always felt that it's so convoluted only because of its goal of being able to modify every possible JSON document under the sun.

It seems like this is mainly a problem if you're implementing this _ad hoc_ on the client or server side -- is that right?

I mean: presumably most of the time that you want to do either of these, you already have both the old and new object, right? Is it not straightforward to write a function (or library) that takes two plain objects and generates the JSON Patch from one to the other, and then use that everywhere and not think about this (but retain the advantage of "being able to modify every possible JSON document under the sun").

If there are cases where you're making a delta without the original object (i.e., I know I always want to remove one field and add some other, whatever the original state was), it seems like you could have nice helpers like `JsonPatch::new().remove_field('field1').add_field('field2', value)`.

I haven't actually done this so maybe I'm missing something about how you want to use these things in practice?

edit to add my motivation: I'd much rather have something robust and predictable, even if it means writing tooling to make it convenient, than something that _seems_ easy but then can't handle some cases or does something different than expected ("I wanted this to be null, not gone!").


Immer can generate patches from your changes: https://immerjs.github.io/immer/patches

I think I've seen this in other libraries too but I forget which.
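
For reference, generating patches with Immer looks roughly like this (note that Immer emits paths as arrays rather than JSON Pointer strings, so strict RFC 6902 output needs a small conversion step):

    import { enablePatches, produceWithPatches } from "immer";

    enablePatches();

    const base = { title: "Dune", score: 9 };
    const [next, patches, inversePatches] = produceWithPatches(base, draft => {
      draft.author = "Frank Herbert";
      delete draft.score;
    });

    // patches is JSON-Patch-shaped, e.g. roughly:
    // [ { op: "add",    path: ["author"], value: "Frank Herbert" },
    //   { op: "remove", path: ["score"] } ]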


> I mean: presumably most of the time that you want to either of these, you already have both the old and new object, right?

Hm, maybe? I'm thinking about this from the perspective of API design, eg for REST APIs, JavaScript modules etc. How to let users change a single field in an object, and leave the rest alone? Like set the subject of a conversation? JSON Patch lets you do that, and so does this. And whether you have the target object already is kind of.. maybe? Sometimes? I wouldn't make that assumption tbh.


How does it handle arrays/repeated fields?

Eg: how do you update a field of the third element on an array when there are 5 elements in an array?

The same that in `jq` would be:

`jq ".people[2].age |= 50" example.json`


You overwrite the entire array. Replacement is a safe, idempotent operation.

If you're concerned about concurrency for array updates, you'll usually need a form of concurrency control whether the database supports partial updates or not.


Unless your array is only a relatively small part of the file this defeats the purpose of patching. And yes, concurrency is also an issue in simple cases (e.g. append/prepend) that json patch handles fine on its own.


There's no real delete, just add a status: archived for "deletion" and you're fine, nothing else to learn.


Isn't this just application/merge-patch+json?

(RFC 7396)


`/` is a weird choice of delimiter for JSON.

Since JSON is a subset of JS, I would have expected `.` to be the delimiter. That jibes with how people think of JSON structures in code. (Python does require bracket syntax for traversing JSON, but even pandas uses dots when you generate a dataframe from JSON.)

When I see `/`, I think:

- "This spec must have been written by backend people," and

- "I wonder if there's some relative/absolute path ambiguity they're trying to solve by making all the paths URLs."


You're not the only one who thinks that!

JSON Patch uses JSON Pointer (RFC 6901) to address elements, but another method from (very) roughly same time is JSON Path [0] (RFC 9535) and here's one of my favorite mnemonics:

- JSON Path uses "points" between elements

- JSON Pointer uses "path separators" between elements

[0] https://en.wikipedia.org/wiki/JSONPath


and json path is supported in Postgres as a way to query json documents. It’s surprisingly full featured!


I agree "." would make more sense than "/".

I actually think an array would be better, ["foo","bar"] for "foo.bar". How many bugs are introduced by people not escaping properly? It's more verbose, but judging by the rest of the standard, they don't seem to be emphasizing brevity.


Definitely agree on the array. They were smart enough to use JSON for the patch format so that people didn't have to write custom parsers but then used a selector syntax that requires a custom parser (and serializer). It seems like an obvious unforced error.


Not only that; both a dot (“.”) and a forward slash (“/”) are allowed to be used in/as a JSON property name.

An array of keys alleviates this issue entirely because any string element of the array can be used as a JSON object property name too.


You would need to somehow differentiate between absolute and relative paths, though.


Also the escaping uses "~" as the escape character, and it escapes "~" and "/" as "~0" and "~1" instead of "~~" and "~/". This whole spec feels like it was written by aliens.


That I really don't like, it doesn't make any sense. "/" as a path delimiter feels totally ok though. I mean, it's path, after all. Also, I'd expect "." to be a part of a key much more often, than "/". And also it really doesn't matter what delimiter you use.


JSON is derived from JavaScript, it is not a strict subset.

The most glaring issue is the JSON number type versus the JavaScript float. This causes issues in both directions: people consistently believe JSON can't represent numbers outside the float range, and in addition JSON has no way to represent NaN.


Is there any legal JSON that's not legal JavaScript?

If not, it's fair to say it's a subset.


It is a subset as of JavaScript edition ES2019, when JavaScript strings were changed to allow U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR. Prior to ES2019, that was the only known example of legal JSON that was not legal JavaScript.


JavaScript used to forbid U+2028 line separator and U+2029 paragraph separator in string literals, but JavaScript now allows them for compatibility with JSON.

The remaining wrinkle is different handling of "__proto__"


It sort of is like a URL; here are some more examples with both relative and absolute queries: https://opis.io/json-schema/2.x/pointers.html


I would have expected paths to be an array of string/number (e.g. ["persons", 3, "name"]) because keys can be any string.

Such a document may not be wise, but how would you update something like:

    { "charCounts": { "/": 1234, "a": 100, ".": 99 } }


Maybe we're talking about different things, but resources in REST are identified by their URL and URLs use '/' to separate elements in the path.


When convertcsv.com converts JSON to CSV, it uses the / separator to store the structure in the column headings. Probably could use an option for . too.


Yeah, but nobody ever looked at

    {
      "a": {
        "b": {
          "c": []
        }
      }
    }
and thought "I need the list at /a/b/c"


If you've ever done XPath you do!


Yeah, wrote my own XPath-like extension methods to manipulate JSON just like that. Felt very natural and makes it quite easy to generate and process JSON for the cases serialization/deserialization isn't the best option.


Seems like a reasonable thing to me.

    {
      "makes": {
        "toyota": {
          "models: [ ... ]
        }
      }
    }
"I need all the models of Toyota cars."

Or

"Toyota came out with a new Camry, I need to update the Camry object within Toyota's models."


It quickly gets convoluted since Camry has been produced for more than 30 years with regular model refreshes throughout the years, e.g. 1997, 2003, 2025 etc. JSON Pointer quickly falls short since we now require a selection expression to figure out which year model we need to update in the array of models: «makes.toyota.models[?year == `2025`]» (using JMESPath).

Both JSONPath and JMESPath support query expressions, whereas JSON Pointer does not.


Yeah, but that's makes.toyota.models, not /makes/toyota/models.

The point is that this is a data structure and not a web server. It's using a convention from one domain in a different one. Relatively minor in the scope of possible quibbles, but it's just one more thing to remember/one more trivial mistake to make. I'm sure much collective time will be lost to many people using dots where the spec says to use slashes, and not realizing it because it looks right in every other context with dots. Dots also makes copy/pasting easier, because you wouldn't have to migrate the format after pasting.


> The point is that this is a data structure and not a web server.

URIs are not only used on web servers though, they're all over the place; most notably, your filesystem uses `/` as a path separator, so it wouldn't be completely out of place to use it as a path separator elsewhere.


Oh I see what you mean. I misunderstood, I also don't like slash as a separator.


Anecdotal, I did and I do. It's no different than a path on a filesystem.


But it's different than object notation in JS, and considering JSON stands for JavaScript Object Notation, I think dot notation would have been more appropriate for JSON Pointer (and by extension JSON Path). As a bit of a rebel myself, I use dot notation when describing a location in a JSON document, unless I'm forced to use the slash for something like JSON Pointer.


this was one of the biggest learning curves / adjustments tbh but once I got over that it's surprisingly powerful.

It tackles like 80% of cases


Extending the 80/20 analogy, how much additional effort does the last 20% take here? The format seems efficient enough, but I'm wondering about the complexity trade-offs one can expect.


it's really good for re-organizing schemas from for example Notion collections to a simple flat CSV / Google Sheets schema and Airtable

It's really bad at creating deeply nested schemas though, (eg from CSV -> Notion you'll get trouble). It's also limited in things like renaming keys, splitting values into multiple keys (eg a csv string into an array) and those sorts of things


Yeah, it sucks. It was borrowed from XPath. And it's not appropriate for JSON.

https://en.wikipedia.org/wiki/XPath


Why is the path a string and not an array? That means you have to have some way to escape / in keys, and also you need to parse the path string. Parser in parser syndrome. Or otherwise it can't handle arbitrary JSON documents.


JSON pointer escapes slashes by encoding them as "~1", and tildes are escaped by encoding them as "~0". But I agree that using an array would have made much more sense. It would also have allowed the use of integers to disambiguate array indices from object keys that happen to be numbers, without having to parse the document to be patched.
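
The escaping itself is tiny; a sketch (order matters: escape "~" before "/", and unescape "~1" before "~0"):

    const escapeToken = (s) => s.replace(/~/g, "~0").replace(/\//g, "~1");
    const unescapeToken = (s) => s.replace(/~1/g, "/").replace(/~0/g, "~");

    // escapeToken("a/b~c")     => "a~1b~0c"
    // unescapeToken("a~1b~0c") => "a/b~c"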


Probably because JSON Patch was "influenced by" XML Patch.


I've only used JSON Patch once, as a quick hack to fix a problem I never thought I would encounter.

I had built a quick and dirty web interface so that a handful of people we contracted overseas can annotate some text data at the word level.

Originally, the plan was that the data was being annotated in small chunks (a sentence or two of text) but apparently the person managing the annotation team started assigning whole documents and we got a complaint that suddenly the annotations weren't being saved.

It turned out that the annotators had been using a dial up connection the entire time (in 2018!) and so the upload was timing out for them.

We panicked a bit until I discovered JSON Patch and I rewrote the upload code to only use the patch.


> This pointer identifies the name field within the user object. You can refer to an array's entries by their index (ex. /user/friends/0 is Alice). The last element in an array can be referenced using - (ex. /user/friends/- is John). If a property name contains ~, then inside the pointer it must be escaped using ~0, or if it contains / then it must be escaped using ~1.

The biggest thing on my wishlist for a system like this is a standardized syntax for choosing an item in a list by an identifying (set of?) key-value pairs. E.g. for

  {
    "name": "Clark St Garden",
    "plants": [
      { "latin": "Phytelephas aequatorialis", year: 2009 },
      { "latin": "Ceiba speciosa", year: 2009 },
      { "latin": "Dillenia indica", year: 2021 }
    ]
  }
I'd like to be able to specify that I want to update Ceiba speciosa regardless of its index. This gets especially important if we're adding items or trying to analyze diffs of previous versions of a json item


Query by content reminds me of XPath, so I looked it up to see if there was a version for JSON...

Turns out there is https://www.ietf.org/archive/id/draft-goessner-dispatch-json...


Yeah, one option is to use a different content-type for your json-patch values and basically extend JSON Patch[1] to use JSON Path[2] instead of JSON Pointer[3].

[1] https://www.rfc-editor.org/info/rfc6902

[2] https://www.rfc-editor.org/info/rfc9535

[3] https://www.rfc-editor.org/info/rfc6901
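
For the garden example upthread, a JSONPath filter would look something like this (sketch; Goessner-style syntax, RFC 9535 drops the parentheses):

    $.plants[?(@.latin == 'Ceiba speciosa')].year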


I also want to be able to insert into the array after an item as opposed to at a specific index.

Biggest issue with JSON patch is its inability to handle even the simplest concurrent writes.


> Strengths: ... Idempotency: JSON Patch operations can be safely retried without causing unintended side effects.

So, wait, you can't add an item to an array with this (except at a predefined position)? I.e. "add" with path "/.../myarray/~" (if I've understood their notation right) isn't allowed?

I'm not sure if that's good or bad, but it's certainly surprising and could do with saying a bit more explicitly.


You can add to the end of an array by using the path /.../myarray/-, i.e., the index is replaced by a dash.

JSON patch is indeed not idempotent.


I have been using JSONDiffpatch, made by Benjamín Eidelman, in production for some years now. It is perfect, works in a browser and on a node/cloudflare worker/etc. How does JSON Patch compare to JSONDiffpatch? It is not mentioned in the alternatives list.

https://github.com/benjamine/jsondiffpatch


How does it do with array insertion? I didn't like how most diffs handle them so I smashed together two pieces of code I found elsewhere to get something I thought was better. https://github.com/kybernetikos/fogsaadiff


I remember using that package (and its compatible .NET implementation) ages ago, glad to see it's still around and being maintained.

I remember testing out various libraries (not sure if a proper JSON Patch library was already around back then; looking at the spec's age, I think there should have been...) and picking it over all the others because it handled complex objects and arrays way better than all the others.

Would also love to see how it compares.


Thanks for sharing, I'll try and find time to compare and write about it


A challenge I experienced with JSON Patch is convincing external parties it's worth learning. I used this in a customer-facing API at $PREVJOB and had to do a lot of evangelism to get our users to understand and adopt it. Most of our customers weren't tech shops, however, so the contracted engineering staff may have had something to do with it.


I would love to see some (optional) checksumming or original value in the protocol, and a little more flexibility in the root node for other metadata like format versioning etc., rather than just the array of patch ops in the root.

    {
      "checksum": {
        "algorithm": "sha1",
        "normalization": "minify",
        "root-checksum": "d930e659007308ac8090182fe664c7f64e898ed9"
      },
      "patch": [
        {
          "op": "replace",
          "path": "/id",
          "node-checksum": "b11ee5e59dc833a22b5f0802deb99c29fb50fdd0",
          "value": { "foo": "bar", "nullptr": 0 }
        },
        {
          "op": "replace",
          "path": "/cat",
          "original-value": "foo",
          "value": "bar"
        }
      ]
    }


This is what "op": "test" is for. You can use it at the beginning of a patch to verify that the server's object hasn't drifted from your own.
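
For example (paths and values illustrative):

    [
      { "op": "test",    "path": "/user/name", "value": "John Doe" },
      { "op": "replace", "path": "/user/name", "value": "Jane Doe" }
    ]
If the test op fails, the whole patch is rejected.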


Good point, thanks. Specified here: https://datatracker.ietf.org/doc/html/rfc6902#section-4.6

Yet, I would still like a checksum option because of its constant size impact.

I didn't consider before though that checksumming would differ from json's understanding of equality if items are reordered (objects, but also some other cases like integers with explicit sign +0, 0, -0). Those cases could (optionally) be considered in a normalization step.


I'm probably missing a use case here, but with the JSON Pointer spec they use feeling so "URL-y", couldn't you skip the whole meta-json syntax? So rather than doing

    PATCH /user HTTP/1.1
    Content-Type: application/json-patch+json

    [
      {
        "op": "replace",
        "path": "/username",
        "value": "bob"
      }
    ]
why not

    PATCH /user/username HTTP/1.1
    Content-Type: application/json

    "bob"
I feel like you could pretty sensibly implement replace/delete/append with that.

Edit: "test" and "copy" from the json patch spec are unique! So there are those, as well as doing multiple edits all at once in a sort of "transaction".


And then you'd be limited to only one change at a time and lose the benefit of making lots of changes with one request.


I do get that, I just saw the "test" op, which can pass or fail the whole change as a sort of transaction. That is really neat.

But I just find that the 1 by 1 approach is easier to reason about if you're opening this up to the internet. I'd personally feel more comfortable with the security model of 1 URL + Session => 1 JSON key.


I'm actually using it at the moment in my work and I'm often doing 3-4 updates per patch.

You want them to all fail together or not at all.

One-by-one is a bit of a weird suggestion tbh. You shouldn't be reasoning that way about code.

If you are going to get a 4xx response to one of the 4 property updates you want them all to fail at once.

Just like anything else we use like SQL.


$WORK project heavily utilizes the test op to enable atomic updates to objects across multiple competing clients. It winds up working really well for that purpose.


Who generates the test op? Client? Or the server?


Can you elaborate on how this affects the security model?


IMHO, the "JSON patch" concept is useful in contexts that have nothing to do with HTTP, just like having a "diff" format for files.


You might like JSON Merge Patch then - much simpler syntax that avoids the URL stuff


What about auto-formatting the json and sorting all keys, to create some kind of canonical form? Then we can use standard textual patch.
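
A rough sketch of what that canonical form could look like, assuming key ordering is all you need (it ignores number formatting and other normalization wrinkles):

    function canonicalize(value) {
      if (Array.isArray(value)) return value.map(canonicalize);
      if (value !== null && typeof value === "object") {
        return Object.fromEntries(
          Object.keys(value).sort().map(k => [k, canonicalize(value[k])])
        );
      }
      return value;
    }

    // Pretty-printing one value per line keeps textual diffs small
    const canonicalJson = (v) => JSON.stringify(canonicalize(v), null, 2);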


Standard text patches (diffs) are great because they work with all text files but for a specific representation like JSON you can do a lot better. In terms of code volume it's a lot lighter to find a node based on a json path than applying a diff.


The article suggests json patch is idempotent, but it isn’t idempotent for array mutations. JSON merge patch on the other hand is fully idempotent (in the array case the array is replaced).


Indeed it does not seem idempotent. If you move A to B and insert a new A, then rerunning the patch would yield the new A at B.

I suppose a subset of this is idempotent though.

Broadly this seems fraught with peril. It sounds like edge case upon edge case, and would only work in the narrow case where you are 100% sure exactly what the remote document looks like such that you can calculate the patch. If anything gets out of sync, or serialization differs between local and remote, etc., you're going to get subtle bugs...


Similar to how MongoDB update queries work. Natural, since documents there are essentially JSON.

Does anyone know if it’s possible to use that update language on local files, without having a full mongodb running?

https://www.mongodb.com/docs/manual/reference/operator/updat...


We use MongoDB for data storage, so using JSON patch in the PATCH endpoints feels very natural. Just fetch the Mongo document and apply patch. Kind of slow, though.


There is https://github.com/kofrasa/mingo for in-memory objects


I first got excited reading the article, but the following left me thinking that I won't use it:

  Weaknesses
  Maintenance Costs: As APIs evolve, the paths specified in JSON Patches might 
  become obsolete, leading to maintenance overhead.
Then again, if a path becomes obsolete, it means that I would be about to update with a wrongly formatted document anyway.


Never liked it. Ignores the wonderful fact that javascript's type system natively distinguishes undefined from null values.

{ "name": "bob", "phone" null }

This would set the name to bob, null out the phone, but leave all other fields untouched. No need for a DSL-over-json.

Only trouble is static type people love their type serializers, which are ever at a mismatch with the json they work with.


> javascript's type system natively distinguishes undefined from null values.

JSON is not JavaScript (despite the "J"), and `undefined` is not a part of JSON specification.

However, I think every language out there that has a dictionary-like type can distinguish between presence of a key and absence of one, so your argument still applies. At the very least, for simple documents that don't require anything fancy.

I believe this simple merge-based approach is exactly what people are using when they don't need JSON Patch. If you operate on large arrays or need transactional updates, JSON Patch is probably a better choice, though.

> Only trouble is static type people love their type serializers, which are ever at a mismatch with the json they work with.

I don't think it's a type system problem, unless the language doesn't have some type that is present in JSON and has to improvise. Typically, it's rather a misunderstanding that a patch document (even if it's a merge patch like your example) has its own distinct type from the entity it's supposed to patch - at the very least, in terms of nullability. A lot of people (myself included) made that blunder only to realize how it's broken later. JSON Patch avoids that because it's very explicit that patch document has its own distinct structure and types, but simple merge patches may confuse some.


I'm working on something right now that has a need to add/remove a few items to a very large array. (Not merely updating properties of an existing Object.) I ran across JSON Patch as a solution to this but ended up implementing just the part from it that I actually needed. (The "add" and "remove" operators.)

The alternative is to modify the large array on the client side and send the whole modified array every time.
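
For illustration, the kind of ops involved (field names made up):

    [
      { "op": "add",    "path": "/items/-",  "value": { "id": 101, "name": "new item" } },
      { "op": "remove", "path": "/items/42" }
    ]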


You're looking for JSON merge patch which they briefly mentioned https://www.rfc-editor.org/rfc/rfc7386


I hadn't heard of this, but it's nice to see that it's standardized (although I'm not really prepared right now to evaluate that standard properly).

Some time ago (9 years, apparently - time sure does fly), I made a thing to describe patches to a binary using JSON (https://github.com/zahlman/json_bpatch). It was meant primarily for hacking content into someone else's existing binary file, and I spent way too much time on fancy algorithms for tracking "free" (safely modifiable, based on the user's initial assessment and previous patches) space and fitting new content into the remaining space. Overall, I consider it a failure - but a fair amount of this project DNA is likely to survive in future projects.

I also had the idea at some point to make some kind of JSON diffing tool that works at a JSON-structure level instead of a textual level. I guess I don't need to reinvent that wheel.


There is a more important concept at play here: structured data access as a first class citizen. We have pointers and references but we don't have anything for making more than one jump. I'd argue that about 30% of functional programming is just about accessing stuff.

I was trying to extend Python's dataclasses.replace() function to be able to replace deeply nested elements. In vanilla Python this function is used as obj = replace(obj, element1=newval, element2=newval2). Meaning that to replace nested readonly dataclasses this must be called recursively.

Implementing a chain of attribute accesses (var1.var2.var3) was mostly easy, but handling the [] operator for sequences, mappings, or slices was filled with edge cases. And implementing edge cases is a hornet nest just in itself.

Functions as first class citizens are one of the most useful concepts for making dynamic code that remains readable. The next step would be structured accessors as first class citizens.


I'm working on a project using CRDTs (Yjs) to generate efficient diffs for documents. I could probably use JSON Patch, but I worry about relying on something like fast-json-patch to automatically generate patches by diffing JSON documents.

If I have json like

[{"name": "Bob", age: 22}, {"name": "Sally", age: 40}]

and then I delete the "Sally" element and add "Jenny" with the same age, I end up with

[{"name": "Bob", age: 22"}, {"name": "Jenny", age: 40}]

However, the patch would potentially look like I changed the name of "Sally" to "Jenny", when in reality I deleted and added a new element. If I'm submitting this to a server and need to reconcile it with a database, the server cares about the specifics of how the JSON was manipulated.

I'd end up needing some sort of container object (like Yjs provides) that can track the individual changes and generate a correct patch.


I'm currently using json patch in a work project for undo/redo. The patches and their inverse are being collected by `immer`, which can hand you a proxy and record changes made to it.

My plan was to dig up an operational transform implementation on json patches to implement collaborative editing when we decided we need that feature (we don't need fine grained editing of text, but we do have arrays of objects).

I'm now evaluating automerge - it seems to be performant enough and not likely to disappear.

(I can implement OT, but I won't be on this project forever and need to leave something that less senior devs can work with and maintain.)


I confess to counting soft deletes until I have enough of them.

[{"name": "Bob", age: 22}, {"name": "Sally", age: 40, "deleted":"2024-10-18 12:34:56"}]


Just add some unique IDs to your records.


One of the biggest drawbacks of jsonpatch is not being able to patch an associative array (e.g. arrays with a unique key element).

This is most obvious in Kubernetes, where you have a list property like:

    "conditions": [{"type":"a", "status":"b"}, {"type":"c", "status":"d"}, ...]
and you can't really create a patch like "change the conditions[type="a"].status to foo". As a result, you usually end up doing a whole field patch.


I used this standard a long time ago to make a simple server -> client reactive state for a card game by combining it with Vue's reactive objects to get something like a poor man's CRDTs. This is a rough diagram of what it looked like: [1]. Although there was a reducer step on the server to hide private state for the other player, it's still probably wildly impractical for actual games.

[1] https://user-images.githubusercontent.com/50021387/184360079...


JSON Patch doesn't let you represent insertions or deletions to strings. You can only replace them. This makes it useless for collaborative text editing. Thus, we can't use it in Braid-HTTP: https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-b...


For anyone who wants to try this in Python, `yyjson`[1] supports both JSON Patch (RFC 6902) and JSON Merge-Patch (RFC 7386)[2]

[1]: https://github.com/tktech/py_yyjson [2]: https://tkte.ch/py_yyjson/api.html#yyjson.Document.patch


While I acknowledge that JSON Patch can be useful in certain contexts, I find that there are far better alternatives, particularly for the scenarios I encounter. Specifically, when using kustomization.yaml to generate slightly different Kubernetes manifests for various environments (dev/staging/production), tools like jsonnet offer superior functionality and flexibility.


Ooh goodie, I wonder what the next popular data format is going to be. I want to be the first to re-invent XSDs and XSLTs for that one as well!


XML didn't invent the concept of patching files either. :)


Check out JSON Merge Patch


I always found that a simple merge style patch is enough, and a lot simpler. You just have to know that there is a difference between null and undefined. For large arrays, it's usually some sort of bulk operation that can be solved by an array of patch/delete requests, which is both safer and simpler. Maybe I've just not hit a proper use case for this yet


Really sucks we don’t have the JQ/JS notation style for paths in this standard. I think it would have made it much more accessible.


The pros/cons section is giving ChatGPT to me


The RFC 6902 - JavaScript Object Notation (JSON) Patch standard is also used by the AWS Cloud Control API:

https://docs.aws.amazon.com/cloudcontrolapi/latest/APIRefere...


I imagine something like so:

    {
      "delete": ["123-234", "567-700"],
      "insert": [123, 456],
      "substrs": [{"foo": "bar"}, []]
    }
Then delete the ranges from the string, convert the substrs to strings and insert them at the offset.


I'm not sure I understand what you mean, but it almost sounds like something we already have a single verb for: "splice". It means "replace this range with these values":

  z = ['a', 'b', 'c', 'd', 'e', 'f']
  z.splice(2, 3, 'w', 'x', 'y', 'z')

  Array(7) [ "a", "b", "w", "x", "y", "z", "f" ]
Unfortunately it's not on strings in javascript, but the verb has basically that meaning even outside of programming and could work here too.

I don't see how the values in your example would work though.


That seems to be it.

You compare the old string with the new one. What should be deleted is marked for deletion. Say char 10 to 30. Then the insert part points at spots where new stuff should be inserted.

Commas are left as an exercise for the backend.


Funny enough, it makes me think about iTop's way of patching XML files: https://www.itophub.io/wiki/page?id=latest:customization:xml...


This declarative approach is always limited compared to an actual programming language (e.g. JS), and so over time things get bolted on until it’s no longer truly declarative, it’s just a poor man’s imperative language. A good example is HCL which is a mess.

What about just stringifying a JS function?


This is essentially the RPC vs REST debate. Do you want your API to be a schema of data types (REST), or a list of function signatures (RPC)?


You raise a good point, but REST isn't quite declarative, and RPCs isn't quite what I'm proposing.

REST is great but I wouldn't exactly call it declarative, it relies on HTTP verbs, plus there are times when you simply can't express things with pure REST so usually you break the textbook pattern with something like POST /resource/doSomething.

As for RPCs, you have to predefine the operations you want to expose as part of your schema, which is even less flexible than what I'm proposing. Imagine for example that instead of passing a JSON Patch string you'll pass a real JS arrow function (with some limitations, e.g. no closure state). It allows for max flexibility, the user doesn't have to learn any APIs, plus the implementation becomes trivial. It's kind of like SQL without inventing a new language.


This basically is a procedural DSL for patching, built on JSON. Which gives me an idea.

What if the client supplied actual code to do the update? I'm thinking something sort of like ebpf in the kernel - very heavily restricted on what can be done, but still very flexible.



Exactly this. See also the profusion of filter expression languages, eg here's Algolia's: https://www.algolia.com/doc/guides/managing-results/refine-r...

Or here's one built into Symfony: https://symfony.com/doc/current/components/expression_langua...


I don't think Roy Fielding's dissertation included a solution to the halting problem.


So how do you validate the data? You can apply all the changes to the existing record and validate the result, but then you need to put everything in memory. Verifying the operations themselves, however, sounds dangerous... Any pointers?

Also, if someone is using this in production: any gotchas?


If you are using Java, you may want to check out the library I created for American Express and open sourced, unify-jdocs - it provides for working with JSON documents outside of POJOLand. For validations, it also has the concept of "typed" document using which you can create a document structure against which all read / writes will be validated. Far simpler and in my opinion as powerful as JSONSchema. https://github.com/americanexpress/unify-jdocs.


The approach I've generally seen used is that you have a set of validation that you apply to the JSON and apply that to the results of the patch operation.

You probably want to have some constraints on the kinds of patch objects you apply to avoid simple attacks (e.g. a really large patch, or overly complex patches). But you can probably come up with a set of rules to apply generally to the patch without trying to validate that the patch value for the address meets some business logic. Just do that at the end.
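
A sketch of that shape, assuming fast-json-patch plus a JSON Schema validator like Ajv (the schema here is just a stand-in for whatever you already validate writes against):

    import { applyPatch } from "fast-json-patch";
    import Ajv from "ajv";

    // stand-in for your real schema
    const documentSchema = { type: "object", required: ["title"], additionalProperties: true };
    const validate = new Ajv().compile(documentSchema);

    function patchAndValidate(document, patch) {
      // applyPatch(doc, patch, validateOperation, mutateDocument): work on a copy,
      // rejecting structurally malformed ops before they touch anything
      const { newDocument } = applyPatch(document, patch, true, false);
      if (!validate(newDocument)) {
        throw new Error("patched document fails schema: " + JSON.stringify(validate.errors));
      }
      return newDocument;
    }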


JSON Patch solves a real problem, but when you have that problem you wish you didn't have it. Often you can split the document into smaller subdocuments that are easier to work with.


We use JSON patch for all PATCH endpoints in our .NET APIs. The only downside I see is slowness if you need to patch some tens of thousands of resources.


Judging from the tone, is this article written by AI?


This is a problem much easier solved when you allow yourself to diverge your representations of object creation, fetching, and patching.


As I understand it, JSON works well as an interchange format, for ephemeral messages to communicate between two entities.

For JSON Patch to be useful and efficient, it requires both sides to use JSON for representation. As in, if you have some native structure that you maintain, JSON Patch either must be converted into operations on your native structure, or you have to serialize to JSON, patch, and deserialize back to the native structure. Which is not efficient, and so either you don't use JSON Patch or you have to switch to JSON as your internal representation, which is problematic in many situations. (The same arguments apply to the sending side.)

Not only that, but you become dependent on the patch to produce exactly the same result on both sides, or things can drift apart and later patches will start failing to apply, requiring resynchronization. I would probably want some sort of checksum to be communicated alongside the patch, but it would be tough to generate a checksum without materializing the full JSON structure. (If you try to update it incrementally, then you're back to the same problem of potential divergence.)

I mean, I can see the usefulness of this: it's like logical vs physical WAL. Or state- vs operation-based CRDTs. Or deltas vs snapshots. But the impact on the representation on both sides bothers me, as does the fact that you're kind of reimplementing some database functionality in the application layer.

If I were faced with this problem and didn't already represent everything as giant JSON documents (and it's unlikely that I would do that), I think I'd be tempted to use some binary format that I could apply the rsync algorithm to in order to guarantee a bit-for-bit identical copy on the other side. Basically, hand the problem off to a fully generic library that I already trust. (You don't have to pay for lots of round-trip latencies; rsync has batch updates if you already know the recipient has an exact matching copy of the previous state.) You still have to match representations on the sending and receiving side, but you can use any (pointer-free) representation you want.


Isn't this easy though? Just don't over-use OOP. Structured data can also be stored as just structured data.

> I think I'd be tempted to use some binary format

And now you require both sides to use a binary format for representation. And then you have the same list of challenges.


> Structured data can also be stored as just structured data.

Fair point. I probably overstated the weaknesses of the JSON model; it's fine for many uses.

But I like Map, and Set, and occasionally even WeakMap. I especially like JS objects that maintain their ordering. I'm even picky enough to like BigInts to stay BigInts, RegExes to stay RegExes, and graphs to have direct links between nodes and be capable of representing cycles. So perhaps it's just the nature of problems I work on, but I find JSON to be a pretty impoverished representation within an application -- even with no OOP involved.

It's great for interchange, though.

>> I think I'd be tempted to use some binary format

> And now you require both sides to use a binary format for representation. And then you have the same list of challenges.

Requiring the same format on both sides is an important limitation, but it's less of a limitation than additionally requiring that format to be JSON. It's not the same list of challenges, it's a subset.

Honestly, I'm fond of textual representations, and avoid binary representations in general. I like debuggability. But I'm imagining an application where JSON Patch is useful, as in something where you are mutating small pieces of a large structure, and that's where I'd be more likely to reach for binary and a robust mirroring mechanism.


Weird, you work at Mozilla and ignore that JSON in DBs is a thing (and has been for 15+ years).

Anyway, a few resources to help you learn:

https://firebase.google.com/docs/firestore

https://www.mongodb.com/

https://www.postgresql.org/docs/current/datatype-json.html


To the substantive comment: if your JSON is in a DB, then the DB can do whatever fancy thing it can come up with in order to replicate changes. Databases are good at that, and a place where the engineering effort is well-spent. Both sides of the JSON Patch communication can then read and write to the DB and everything will be fine -- but there'll be no need for JSON Patch, no need for any application-level diff generation and application at all.

As for working at Mozilla: oh, it's worse than that, I'm the one who did the most recent round of optimization to JSON.stringify(). So if you're using Firefox, you're even more vulnerable to my incompetence than you realized.

Furthermore, I'll note that for Mozilla, I do 90% of my work in C++, 10% in Python, and a rounding error in JS and Rust. So although I work on a JS engine, I'm dangerously clueless about real-world JS. I mostly use it for fun, and for side projects (and sometimes those coincide). If you expect JS engine developers to be experts in the proper way to use JS, then you're going to be sorely disappointed.

That said, I'd be interested to hear a counterargument. Your argument so far agreed with me.


The point is, 99% of cases where JSON is used, it is already:

* Agreed by both parties to be the protocol they'll use

* Used for "representation" (assuming we mean the same by that, if not, please clarify)

>So if you're using Firefox

I jumped ship ten years ago; but I've heard you guys are doing quite well?

>I'm dangerously clueless about real-world JS.

Agree.

Disclosure: I'm @moralestapia but my laptop ran out of battery and this is my backup account, lol.


Reminds me of devicetree overlays :-)


Where are the recommended diff and 3-way merge libraries for each language?


Meh. I get it.

It’s just so bassackwards to use an imperative, indirect form to describe state, even if it’s just state changes.

Maybe simply specify the new state?


You can also use general purpose compressors like Zstandard to create a generic patch:

    zstd --patch-from old.json new.json -o patch.zst
    zstd --patch-from old.json -d patch.zst -o updated.json


And last I tested, this was more performant and space efficient than JSON Patch...


JSON-patch is not a generic patch. It's not the same thing.


JSON is so simple that stuff like this isn’t a disasterpiece. I’m a fan and definitely need to keep this in mind next time I want to keep an edit history on a json-based dataset.


it's all fun and games until you need to reorder an array


the paranoid feeling that what you just read was LLM generated


especially the weaknesses section reads like LLM output


i just want to patch an array of objects by matching a key...


What’s nice about JSON is that it’s actually valid JavaScript, with some formal specification to avoid any nasty circles or injections.

Why can’t your protocol just be valid JavaScript too? this.name = "string"; instead of mixing so many metaphors?


Because that would require consumers to have a Javascript interpreter to use it.


Because that would require consumers to have an interpreter for the most widely deployed language, ever, and by far.

FTFY


security nightmare; sometimes you don't want consumers to execute code arbitrarily


This is what makes Tcl great as a data interchange format. It comes with a safe mode for untrusted code and you can further restrict it to have no control flow commands to be non-Turing.


Not true. Google, Meta, ... do it at a massive scale, no issues.

It's not really hard to protect yourself against that.

Any (competent) security guy can give you like 4 ways to implement it properly.


I am a (hopefully competent) security guy, please don't run arbitrary code if you can help it. Especially for something as trivial as JSON patching.


Do you mean the ads they serve that contain malware?


Ok hear me out, what if my API accepts WASM fragments that I run against my database but in a sandbox!


Nah, in that case Python would be a better option as it's already installed everywhere.


That is so derangedly untrue.


Starlark is a nice embeddable scripting language, though. Java, Go, and Rust implementations: https://github.com/bazelbuild/starlark/blob/master/users.md#...


But what's your point? Would you truly want consumers of JSON Patch data to embed a JS interpreter?


My point is that the JS interpreter is likely already there.


only if you think of JSON in the context of a browser. JSON is used as serialized representation of objects in embedded systems, config files, etc. where a JS interpreter is unnecessary, absent or unwanted (size, security, platform preferences, ...)


> Why can’t your protocol just be valid JavaScript too?

It is.


It’s delivered in JSON, but you need an interpreter. But the actions are just JS assignment statements and a little glue. Your interpreter could as easily handle that, and with far less bytes. Why call a member variable /name when it’s already .name?



