
What's the issue with protobufs?



I used protobufs for a short while and came to the realization that they’re just Go’s opinionated idioms forced on other languages via awkward SDKs. I particularly did not like having to use codegen, rely on Google’s annotations for basic functionality, or deal with field masks, which are a sort of poor man’s GraphQL.

I get it, Google made trade-offs that work for them, and I agree with their position - but for someone at a smaller company working in a non-Go/Java/C programming language, it was just a ton of friction for no benefit.


protobufs have been around a lot longer than Go


Annotations are somewhat cursed, agreed. The code generation part doesn't have to be painful, though.

In fact, the Go tooling isn't even the easiest example of how to do it; it requires more steps than, say, .NET, where getting server boilerplate, a fully working client, or just generated POCO contracts from a .proto boils down to:

    dotnet add package Grpc.Tools

    <!-- in the .csproj -->
    <Protobuf Include="MyApiContracts.proto" />
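For comparison, the Go route is roughly the following - a sketch assuming the protoc-gen-go and protoc-gen-go-grpc plugins are on your PATH and the .proto declares a go_package option:

    go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
    go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
    protoc --go_out=. --go-grpc_out=. MyApiContracts.proto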


Are you old enough to remember SOAP? Is this drawing any “history repeats” sentiments for you?


So many issues - primarily the non-idiomatic (even in golang!) code generation and the requirement for special tooling to do any troubleshooting - and after all that, you still don't end up with anything particularly interoperable.

At best it works tolerably in a monorepo with tightly controlled deployment scenarios and great tooling.

But if you don't have a Google-like operations environment, it's a lot of extra overhead for a mostly meaningless benefit.


The first issue is that protobufs aren't a standard. That inherently limits anything built on top of them to not be a standard either, and that limits their applicability.

Also, depending on the environment you run in, code-size bloat vs. alternatives can matter.


> Aren’t a standard

You mean like an IETF standard? That is true, although the specification is quite simple to implement. It is certainly a de-facto standard, even if it hasn’t been standardized by the IETF or IEEE or ANSI or ECMA.

> inherently limits anything built on top of them to not be a standard either

I’m not sure that strictly follows. https://datatracker.ietf.org/doc/html/rfc9232 for example directly references the protobuf spec at https://protobuf.dev/ and includes protobufs as a valid encoding.

> depending on the environment

I’ve had several projects that ran on wimpy Cortex M0 processors and printf() has generally taken more code space in flash than NanoPB. This is generally with the same device doing both encoding and decoding.

If you’re only encoding, the amount of code required to encode a given structure into a PB is close to trivial. If I recall correctly, it can also be done in a streaming fashion, so you don’t necessarily even need a RAM buffer to hold the encoded output.
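To make the "close to trivial" point concrete, here is a minimal Go sketch that hand-encodes a hypothetical two-field message (message Reading { uint32 sensor_id = 1; string label = 2; }) using only the wire-format rules - the message, field numbers, and values are made up for illustration:

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // Wire types from the protobuf encoding spec.
    const (
        wireVarint = 0 // varint scalars (uint32, bool, enum, ...)
        wireBytes  = 2 // length-delimited (string, bytes, nested messages)
    )

    // appendTag emits the field key: (field_number << 3) | wire_type.
    func appendTag(buf []byte, fieldNum int, wireType uint64) []byte {
        return binary.AppendUvarint(buf, uint64(fieldNum)<<3|wireType)
    }

    func main() {
        var buf []byte

        // Field 1: sensor_id = 42 (varint)
        buf = appendTag(buf, 1, wireVarint)
        buf = binary.AppendUvarint(buf, 42)

        // Field 2: label = "ok" (length-delimited)
        buf = appendTag(buf, 2, wireBytes)
        buf = binary.AppendUvarint(buf, uint64(len("ok")))
        buf = append(buf, "ok"...)

        fmt.Printf("% x\n", buf) // 08 2a 12 02 6f 6b
    }

Because each field is just appended in sequence, the same logic can write straight to a stream instead of a buffer, which is what keeps the encode-only case so small.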

Do I love protobufs? Not really. There’s often some issue with protoc when running it in a new environment. The APIs sometimes bother me, especially the callback structure in NanoPB. But it’s been a workhorse for probably 15 years now and as a straightforward TLV encoding it works pretty darned well.


Sounds like Stockholm Syndrome. I've worked mostly with JSON/CSV/Thrift in the last 10 years, and XML/SOAP before that, and just recently started interacting with protobuf, so I'd disagree that it is a "de-facto standard."

My largest complaint: observability. With almost literally any other protocol, if you can mitm on the wire, your human brain can parse it. You can just take a glance at it and see any issues. With grpc/pbuf ... nope. not happening.

Also, I really don't like how it tries to shim data into bitmasks. Going back to debugging two systems talking to each other, I'm not a computer. Needing special tooling just to figure out what two systems are saying to each other to shave a quarter of a packet is barely worth it, if at all, IMHO.


> You can just take a glance at it and see any issues. With grpc/pbuf ... nope. not happening.

Sure, but on the other hand, the number of times I’ve needed to do this, compared to JSON/string/untyped/etc systems, is precisely zero. There’s just a whole class of failures that are non-issues with typed setups like protobufs. Protobuf still has plenty of flaws and annoying-Google-isms, but not being human readable isn’t one of them, IMO.


I haven’t worked with protobufs, but I’m old enough to know why people thought they needed protobufs (because hand writing terse wire protocols is painfully dumb).

Be careful about “need”. When people are avoiding doing something painful, they invent all sorts of rationalizations to try to avoid cognitive dissonance. You don’t reach for the tool that hurts to pick up. You reach for something else, and most do it subconsciously.

Nobody is going to try to read protobuf data. Doesn’t mean they don’t need to understand why the wire protocol fucked up.


Nothing is preventing a system from sending you an un-deserializable message disguised as a protobuf, just like with any other encoding. In those cases you need to diagnose the issue, no? Having something human-readable simply makes that straightforward.

If you haven't needed to do this, perhaps you aren't working on big enough systems? I've primarily needed to do this when dealing with hundreds of millions of disparate clients, not so much on smaller systems.


> Nothing is preventing a system from sending you an un-deserializable message disguised as a protobuf

I guess it depends on where you come down on Postel’s law. If you’re an adherent, and are prepared to be flexible in what you accept, then yeah, you will have extra work on your hands.

Personally, I’m not a fan of Postel’s law, and I’m camp “send correct data, that deserializes and upholds invariants, or get your request rejected”. I’ve played enough games with systems that weren’t strict enough about their boundaries and it’s just not worth the pain.


When you have hundreds of millions of clients, there’s a good chance the client thinks it’s sending you the right data (especially when the customer says everything looks right on their end). You need to figure out if there is packet corruption (misbehaving switch), an outdated client, an egress load balancer screwing with packet contents, an office proxy trying to be smart, etc.

This requires looking at what is going over the wire in a lot of cases. It has nothing to do with Postel’s Law, but more to do with telling a customer what is wrong and making your own software more resilient in the face of the world.


Please, please, to anyone following this thread:

Implement a human-readable protocol, then use standardized streaming compression on the wire to get your message size down. Something in the LZ family, because there are tools everywhere that speak them. And consider turning off transport compression for local development.

Being able to scan the data saves so much time on triage. And using zgrep and friends on production data is almost as easy. You will spend tons of effort trying to make something 10% more efficient than zlib (let alone zstd), and the cost is externalized onto your team.
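A minimal Go sketch of that approach - newline-delimited JSON through a standard gzip stream (the record type and field names here are made up; swap in zstd if you prefer):

    package main

    import (
        "compress/gzip"
        "encoding/json"
        "log"
        "os"
    )

    // Telemetry is a hypothetical record type, for illustration only.
    type Telemetry struct {
        Sensor string  `json:"sensor"`
        Value  float64 `json:"value"`
    }

    func main() {
        // json.Encoder writes one record per line; gzip.Writer compresses
        // the stream on the fly, so the output stays zcat/zgrep friendly.
        zw := gzip.NewWriter(os.Stdout)
        enc := json.NewEncoder(zw)

        for _, rec := range []Telemetry{{"temp", 21.5}, {"rpm", 900}} {
            if err := enc.Encode(rec); err != nil {
                log.Fatal(err)
            }
        }
        if err := zw.Close(); err != nil {
            log.Fatal(err)
        }
    }

Piping the output through zcat gets you the readable lines back; for local development you can drop the gzip.Writer and read the JSON directly.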


human readable == inefficient parsing and processing


It’s IPC. I think you are vastly overestimating the local processing overhead.


There's overhead every time you turn a number from its native form into a human-readable decimal representation, and telemetry is mostly numbers.
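For a concrete sense of the two representations being contrasted here, a tiny Go sketch (the value is arbitrary):

    package main

    import (
        "encoding/binary"
        "fmt"
        "strconv"
    )

    func main() {
        v := uint64(1234567890)

        // Human-readable: the number as decimal text (10 bytes), which the
        // receiver has to parse back with strconv.ParseUint.
        text := strconv.AppendUint(nil, v, 10)

        // Binary: the same value as a protobuf-style varint (5 bytes),
        // written and read with no text conversion at all.
        buf := make([]byte, binary.MaxVarintLen64)
        n := binary.PutUvarint(buf, v)

        fmt.Printf("decimal: %d bytes, varint: %d bytes\n", len(text), n)
    }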


This is true of any non-self-describing format - which includes the vast majority of JSON, depending on who you ask: if you aren't specifying a schema in the request, what does `name` really mean?

Self-describing comes with rather large costs compared to a compact format in essentially all cases, so there are lots of good reasons to prefer the compact one - particularly in internal infrastructure, which is where telemetry tends to live.


Is compression also bad because you can’t immediately observe the data?

I’ve heard the ‘binary makes debugging difficult’ excuse before and it’s just nonsense really.


Not as bad as encryption if those are the requirements!


I think the issue is that the ergonomics of protobuf kinda suck, the ergonomics of gRPC really suck, and having to interface with a gRPC API is likely everyone's introduction to protobufs.

Protobufs are a really great idea that's hampered by heinously subpar tooling for everything but Go.



