com2kid · on Nov 13, 2020

> They reference Protocol Buffers as a source/definition of a "schema" at the bottom and IMHO they're pretty rotten as schema definitions.

A past team of mine went from "raw C structures over the wire consumed by 4 different implementations on 4 platforms" to protobufs and it was a huge improvement.

One schema definition shared by all the teams was an end to the nightmare of some developer putting a field in the wrong order and burning debugging time trying to figure out what was going on. Or even better, that bug we hit in .NET where, even though we explicitly defined the struct, size, and ordering, the compiler reordered one of our fields and we couldn't find a way around the bug, so we had to make changes to the structure on all platforms so the bug didn't exhibit itself.
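
To illustrate the field-order problem described above, here is a minimal sketch using Python's `struct` module as a stand-in for raw C structs on the wire. The packet layout, field names, and values are hypothetical and not taken from the original project; the point is only that a receiver with two fields declared in the wrong order decodes without any error and silently produces wrong values.

```python
import struct

# Hypothetical sender layout: device_id (uint16), temperature (int16), flags (uint8).
# "<" means little-endian with no padding, like a packed C struct.
SENDER_FMT = "<HhB"

# Hypothetical receiver that declared the first two fields in the wrong order.
RECEIVER_FMT = "<hHB"


def encode(device_id, temperature, flags):
    """Pack the fields the way the sender's struct lays them out."""
    return struct.pack(SENDER_FMT, device_id, temperature, flags)


def decode_mismatched(payload):
    """Unpack with the receiver's (wrong) field order. No exception is raised."""
    temperature, device_id, flags = struct.unpack(RECEIVER_FMT, payload)
    return {"device_id": device_id, "temperature": temperature, "flags": flags}


payload = encode(device_id=40000, temperature=-5, flags=1)
decoded = decode_mismatched(payload)

# The sizes match, so nothing fails; the values are simply wrong.
print(len(payload), decoded)
```

With a shared schema, both sides generate their field layout from the same definition, so this class of mismatch cannot be introduced by hand.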

Or the great lengths we went through to avoid making any sort of breaking changes. We didn't originally pad our structures out more than an occasional handful of bytes (embedded, constrained storage and slow transfer speed over BTLE) so we quickly ran into all sorts of horrid issues.

Life was much better with protobufs.

Our main complaint was the lack of unsigned types. That sucked.

We didn't use any of the RPC stuff, we just used it for the schema and as a binary data format. Worked great, would recommend any day over raw C structs.

SV_BubbleTime · on Nov 14, 2020

Funny, I hated Protobufs in C. The generated code felt bloated and I was never at all happy with the memory management. We used NanoPB. This was on embedded so that might be a different perspective than yours.

com2kid · on Nov 15, 2020

We also used NanoPB, in both C and C++.

I am pretty sure we used preallocated buffers, so memory management wasn't an issue.

Being able to have a set of definitions that works on all major mobile platforms was all sorts of nice.

rubyn00bie · on Nov 14, 2020

Aye, I think (not that my opinion is worth a shit) y'all are using protobufs correctly assuming I'm grokking your use case correctly. I'm mostly focusing on discussions moving from JSON to protobufs... or essentially any problem space where serializing/de-serializing your data types isn't ever going to be an issue (assuming one is not totally negligent).

Most of these articles focusing on protobufs as schema definitions are able to use gzipped JSON just fine. Their only reason to use protobufs is for schema definitions because they believe the "type system" will help them enforce constraints, validations, and/or enable consistency across application boundaries.

The binary format, specification, and platform independence are completely irrelevant for these "schema" definition scenarios being brought up on HN constantly... and yet, they should be the things at the top of the list if you need protobufs and having a better more robust schema definition language should be damn near the bottom.

I think I just figured out a way to sum up my protobuf feelings (so, sorry for the late tldr):

Protobufs are a contract for serializing/deserializing data structures, NOT for enforcing validations/constraints of those data structures.

If you're using 'em as a serialization contract: fuck yeeeeeeaaaaah. If you're trying to use them to improve validations/constraints then: fuck naaaaaaah.
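
A rough sketch of that distinction, in plain Python rather than real protobuf-generated code: `decode_user` is a hypothetical stand-in for a generated deserializer (it only checks that the bytes have the right shape), and `validate_user` is a separate, hand-written step with made-up rules. A payload can decode cleanly and still be semantically invalid.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class User:
    age: int
    name: str


def decode_user(payload: bytes) -> User:
    """Stand-in for a generated deserializer: checks wire shape and types only."""
    age, name_len = struct.unpack_from("<iB", payload, 0)
    name = payload[5:5 + name_len].decode("utf-8")
    return User(age=age, name=name)


def validate_user(user: User) -> List[str]:
    """Business rules live here, outside the serialization contract (rules are hypothetical)."""
    problems = []
    if not 0 <= user.age <= 150:
        problems.append("age out of range")
    if not user.name:
        problems.append("name is empty")
    return problems


# A well-formed payload carrying nonsense values: age = -7, empty name.
payload = struct.pack("<iB", -7, 0)
user = decode_user(payload)       # succeeds: the bytes match the contract
problems = validate_user(user)    # fails: the values break the business rules
print(user, problems)
```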

playing_colours · on Nov 14, 2020

They removed mandatory / optional in Protobuf version 3, and this rendered it useless for Confluent Kafka schema registry at our company.

I read the explanation for this change - to be more flexible about breaking changes, and while it may make sense for some cases, we could not rely on Protobuf 3 in event driven architecture with stricter requirements to data consistency. We went with Avro.
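
A toy illustration of why that matters for consistency, assuming the behaviour described above (no required fields, so an absent scalar reads back as its default). This does not use the protobuf library; the dictionaries and both reader functions are hypothetical stand-ins that only imitate the two styles.

```python
# Hypothetical event fields and their type defaults.
DEFAULTS = {"order_id": "", "amount_cents": 0}


def read_without_required(wire_fields):
    """Imitates a reader with no required fields: missing values become defaults."""
    return {name: wire_fields.get(name, default) for name, default in DEFAULTS.items()}


def read_with_required(wire_fields):
    """Imitates a reader that treats every field as required: missing values are an error."""
    missing = [name for name in DEFAULTS if name not in wire_fields]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return dict(wire_fields)


incomplete_event = {"order_id": "A1"}  # the producer forgot amount_cents

# The lenient reader cannot tell "forgot the amount" from "the amount is zero".
print(read_without_required(incomplete_event))

# The strict reader rejects the event instead of accepting a made-up zero.
try:
    read_with_required(incomplete_event)
except ValueError as err:
    print(err)
```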

rubyn00bie · on Nov 14, 2020

I've been looking at Avro for a while; how are you all feeling about it? Any suggestions or gotchas from real-world usage?

playing_colours · on Nov 16, 2020

So far so good. Avro is the longest-supported serialisation format in Confluent Kafka; JSON and Protobuf were added recently. If you are on the JVM stack, the drivers for working with schemas are well supported. We use Python, and the Confluent driver is lagging behind if you want some advanced stuff, like supporting Avro unions for the multiple-event-types-per-topic approach [1]; it is also missing auto-resolution of schemas for that scenario in the Avro deserialiser. It is not difficult to implement ourselves, but I would prefer not to do it.

[1] https://www.confluent.io/blog/multiple-event-types-in-the-sa...
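
For readers unfamiliar with the idea, here is a minimal sketch of what an Avro union of event types looks like, written as plain Python data. In Avro a union is a JSON array of schemas, so a topic's value schema can list several record types. The record names, namespace, and fields are made up for illustration, the records are inlined rather than referenced from a registry, and `branch_for` is a hypothetical helper, not part of any Confluent or Avro library.

```python
import json

# Two hypothetical event record schemas.
ORDER_CREATED = {
    "type": "record",
    "name": "OrderCreated",
    "namespace": "example.events",
    "fields": [{"name": "order_id", "type": "string"}],
}

ORDER_CANCELLED = {
    "type": "record",
    "name": "OrderCancelled",
    "namespace": "example.events",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "reason", "type": "string"},
    ],
}

# An Avro union is a JSON array of schemas: the topic value may be either record.
TOPIC_VALUE_SCHEMA = [ORDER_CREATED, ORDER_CANCELLED]


def branch_for(event_type):
    """Pick the union branch whose record name matches the event type."""
    for branch in TOPIC_VALUE_SCHEMA:
        if branch["name"] == event_type:
            return branch
    raise KeyError(event_type)


print(json.dumps(TOPIC_VALUE_SCHEMA, indent=2))
print([field["name"] for field in branch_for("OrderCancelled")["fields"]])
```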

quantified · on Nov 14, 2020

It's a vendor pitch, so take it all with a nice grain of salt.

Protocol buffers have large problems in their own way. Just because Google produced them does not mean they are the right choice for any broader adoption than they already have.

See https://news.ycombinator.com/item?id=18188519 for example, and note that the discussion on personal vs technical aspects has already happened.

I agree with the idea of schemas for API definition. JSON and XML are more transport-level, lacking major semantics that must be enforced in the software. Therefore, schemas need to be expressed with language bindings.

Protocol buffers have good traction here because of the investment that Google has made in IDL with multi-language bindings. There are other serialization formats as well with many language bindings, but investment in the IDL needs to be made. The OP is drafting off of Google.

So, yes to the thesis and no to one of the conclusions.