They reference Protocol Buffers as a source/definition of a "schema" at the bottom and IMHO they're pretty rotten as schema definitions.
There are no meaningful validations or constraints, properties/members can be missing/omitted, and cross-language support is pretty rough when you need more advanced features like annotations to fill the gap caused by crappy types.
Moving to protocol buffers is just pushing the problem out a while until we end up back here. Which is to say, a place where we just need to stop bikeshedding how we define our fucking data. To quote the ineffable R. Sanchez:
"Sometimes I can't even tell you apart, because I don't go by height or age, I go by amount of pain in my ass."
I feel the same about all of these alternatives to HTTP APIs. I can only differentiate them by the amount of pain in my ass they cause; and on that front they're all the same.
> They reference Protocol Buffers as a source/definition of a "schema" at the bottom and IMHO they're pretty rotten as schema definitions.
A past team of mine went from "raw C structures over the wire consumed by 4 different implementations on 4 platforms" to protobufs and it was a huge improvement.
One schema definition shared by all the teams was an end to the nightmare of some developer putting a field in the wrong order and burning debugging time trying to figure out what was going on. Or even better, that bug we hit in .NET where, even though we explicitly defined the struct, size, and ordering, the compiler reordered one of our fields and we couldn't find a way around it, so we had to change the structure on all platforms so the bug didn't exhibit itself.
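For anyone who hasn't been bitten by this, here is a minimal sketch of the field-order failure mode using Python's `struct` module. The layouts, field names, and values are made up for illustration and are not the ones from that project.

```python
import struct

# Hypothetical wire layouts (little-endian, no padding).
# Sender:   uint16 sensor_id, uint32 timestamp, float32 reading
# Receiver: the same three fields, but the first two swapped by mistake.
SENDER_LAYOUT = "<HIf"
RECEIVER_LAYOUT = "<IHf"


def send(sensor_id: int, timestamp: int, reading: float) -> bytes:
    """Pack the fields in the order the sender believes is correct."""
    return struct.pack(SENDER_LAYOUT, sensor_id, timestamp, reading)


def receive_wrong(payload: bytes) -> tuple:
    """Unpack with the mismatched layout; this raises no error at all."""
    return struct.unpack(RECEIVER_LAYOUT, payload)


payload = send(7, 1_700_000_000, 21.5)

# Both layouts are the same size, so the mismatch is invisible on the wire.
same_size = struct.calcsize(SENDER_LAYOUT) == struct.calcsize(RECEIVER_LAYOUT)

good = struct.unpack(SENDER_LAYOUT, payload)
bad = receive_wrong(payload)
```

Nothing fails loudly: `bad` is just a tuple of plausible-looking numbers, which is exactly why it eats debugging time. A single shared schema takes the field order out of human hands.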
Or the great lengths we went through to avoid making any sort of breaking changes. We didn't originally pad our structures out more than an occasional handful of bytes (embedded, constrained storage and slow transfer speed over BTLE) so we quickly ran into all sorts of horrid issues.
Life was much better with protobufs.
Our main complaint was the lack of unsigned types. That sucked.
We didn't use any of the RPC stuff, we just used it for the schema and as a binary data format. Worked great, would recommend any day over raw C structs.
Funny, I hated Protobufs in C. The generated code felt bloated and I was never at all happy with the memory management. We used NanoPB. This was on embedded so that might be a different perspective than yours.
Aye, I think (not that my opinion is worth a shit) y'all are using protobufs correctly, assuming I'm grokking your use case. I'm mostly focusing on discussions about moving from JSON to protobufs... or essentially any problem space where serializing/de-serializing your data types isn't ever going to be an issue (assuming one is not totally negligent).
Most of these articles focusing on protobufs as schema definitions are able to use gzipped JSON just fine. Their only reason to use protobufs is for schema definitions because they believe the "type system" will help them enforce constraints, validations, and/or enable consistency across application boundaries.
The binary format, specification, and platform independence are completely irrelevant for these "schema" definition scenarios being brought up on HN constantly... and yet, they should be the things at the top of the list if you need protobufs and having a better more robust schema definition language should be damn near the bottom.
I think I just figured out a way to sum up my protobuf feelings (so, sorry for the late tldr):
Protobufs are a contract for serializing/deserializing data structures, NOT for enforcing validations/constraints on those data structures.
If you're using 'em as a serialization contract: fuck yeeeeeeaaaaah. If you're trying to use them to improve validations/constraints then: fuck naaaaaaah.
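To make the distinction concrete, here is a rough sketch in plain Python. `Reading` is a hypothetical stand-in for a decoded message (not generated protobuf code), and the rules in `validate` are invented for illustration. The point is only that the decoded type tells you the shape of the data, while the constraints live in code you write yourself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    # Hypothetical decoded message. The defaults mimic a decoder that
    # silently fills in zero values for anything missing from the payload.
    sensor_id: int = 0
    celsius: float = 0.0


def validate(reading: Reading) -> List[str]:
    """Business rules the serialization contract knows nothing about."""
    errors = []
    if reading.sensor_id <= 0:
        errors.append("sensor_id must be positive")
    if not -90.0 <= reading.celsius <= 60.0:
        errors.append("celsius out of range")
    return errors


# All three objects are perfectly well-typed; only validate() tells them apart.
empty = validate(Reading())
absurd = validate(Reading(sensor_id=3, celsius=500.0))
fine = validate(Reading(sensor_id=3, celsius=21.5))
```

Swap the dataclass for whatever your serializer generates and the `validate` step still has to exist somewhere.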
They removed mandatory / optional in Protobuf version 3, and this rendered it useless for Confluent Kafka schema registry at our company.
I read the explanation for this change - to be more flexible about breaking changes, and while it may make sense for some cases, we could not rely on Protobuf 3 in event driven architecture with stricter requirements to data consistency. We went with Avro.
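A hand-rolled sketch of why this hurts, assuming proto3-style implicit presence where a scalar equal to its default is simply not written to the wire. This is not the protobuf library; `retry_count` is a hypothetical `int32` field with number 1, and the toy encoder handles only that one non-negative field.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint encoding for a non-negative integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_retry_count(value: int) -> bytes:
    """Encode the hypothetical field `retry_count = 1` (varint wire type)."""
    if value == 0:
        return b""  # default value: nothing is written at all
    return b"\x08" + encode_varint(value)  # 0x08 = (field 1 << 3) | wire type 0


def decode_retry_count(payload: bytes) -> int:
    """Decode the single-field message; a missing field reads as 0."""
    if not payload:
        return 0
    if payload[0] != 0x08:
        raise ValueError("unexpected field tag")
    result, shift = 0, 0
    for byte in payload[1:]:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return result


explicit_zero = encode_retry_count(0)
never_set = b""  # a producer that forgot the field entirely
```

`explicit_zero` and `never_set` are the same bytes, so a consumer cannot tell "the producer said zero" from "the producer forgot". Later protobuf releases brought back an `optional` keyword in proto3 for explicit presence tracking, but that is still opt-in per field, not a required-field guarantee.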
So far so good. Avro is the longest-supported serialisation format in Confluent Kafka; JSON and Protobuf were added recently. If you are on the JVM stack, the drivers for working with schemas are well supported. We use Python, and the Confluent driver lags behind if you want advanced stuff like Avro unions for the multiple-event-types-per-topic approach [1]; the Avro deserialiser is also missing schema auto-resolution for that scenario. It is not difficult to implement ourselves, but I would prefer not to.
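For readers who haven't seen the union approach: in Avro, a union schema is written as a JSON array of schemas, so one topic can carry several record types. Below is a small sketch of that shape using only the standard library. The event names, the namespace, and the `resolve_branch` helper are all hypothetical, and this is not the Confluent client API, just roughly the kind of lookup we would have to write ourselves.

```python
import json

# Hypothetical event schemas; names and fields are made up for illustration.
ORDER_CREATED = {
    "type": "record",
    "name": "OrderCreated",
    "namespace": "example.events",
    "fields": [{"name": "order_id", "type": "string"}],
}
ORDER_CANCELLED = {
    "type": "record",
    "name": "OrderCancelled",
    "namespace": "example.events",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "reason", "type": "string"},
    ],
}

# An Avro union is a JSON array of schemas: one topic, several event types.
TOPIC_SCHEMA = [ORDER_CREATED, ORDER_CANCELLED]


def resolve_branch(union_schema: list, full_name: str) -> dict:
    """Pick the union branch whose namespace.name matches `full_name`."""
    for branch in union_schema:
        if f"{branch['namespace']}.{branch['name']}" == full_name:
            return branch
    raise KeyError(full_name)


schema_json = json.dumps(TOPIC_SCHEMA)
cancelled = resolve_branch(TOPIC_SCHEMA, "example.events.OrderCancelled")
```

The actual encoding and registry lookup are left out on purpose; the sketch only shows the schema shape and the per-message branch selection.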
It's a vendor pitch, so take it all with a nice grain of salt.
Protocol buffers have large problems in their own way. Just because Google produced them does not mean they are the right choice for any broader adoption than they already have.
I agree with the idea of schemas for API definition. JSON and XML are more transport-level, lacking major semantics that must be enforced in the software. Therefore, schemas need to be expressed with language bindings.
Protocol buffers have good traction here because of the investment that Google has made in IDL with multi-language bindings. There are other serialization formats as well with many language bindings, but investment in the IDL needs to be made. The OP is drafting off of Google.
So, yes to the thesis and no to one of the conclusions.