Hacker News new | past | comments | ask | show | jobs | submit login

If I remember correctly the initial version allowed required fields but it caused all sorts of problems when trying to migrate protos because a new required fields breaks all consumers almost by definition. So updating protos in isolation becomes tricky.

The problem went away with all optional fields so it was decided the headache wasn't worth it.




I used to work at a company that used proto2 as part of a homegrown RPC system that predated gRPC. Our coding standards strongly discouraged making anything but key fields required, for exactly this reason. They were just too much of a maintenance burden in the long run.

I suspect that not having nullable fields, though, is just a case of letting an implementation detail, keeping the message representation compatible with C structs in the core implementation, bleed into the high-level interface. That design decision is just dripping with "C++ programmers getting twitchy about performance concerns" vibes.


Zoox?


According to the blog post of one of the guys who worked on proto3: the complexity around versioning and required fields was exacerbated because Google also has “middle boxes” that will read the protos and forward them on. Having a contract change between 2 services is fine, required fields are probably fine, have random middle boxes really makes everything worse for no discernible benefit.


proto2 allowed both required fields and optional fields, and there were pros and cons to using both, but both were workable options.

Then proto3 went and implemented a hybrid that was the worst of both worlds. They made all fields optional, but eliminated the introspection that let a receiver know if a field had been populated by the sender. Instead they silently populated missing fields with a hardcoded default that could be a perfectly meaningful value for that field. This effectively made all fields required for the sender, but without the framework support to catch when fields were accidentally not populated. And it made it necessary to add custom signaling between the sender and receiver to indicate message versions or other mechanisms so the receiver could determine which fields the sender was expected to have actually populated.


You can now detect field presence in several cases: https://protobuf.dev/programming-guides/field_presence/#pres...

This is very solid with message types, but for basic types you can add `optional field` if needed as well (essentially making the value nullable)


proto2 required fields were basically unusable for some technical reason I forget, to the point where they're banned where I work, so you had to make everything optional, when in many cases it was unnecessary. Leading to a lot of null vs 0 mistakes.

Proto3 made primitives all non-optional, default 0. But messages were all still optional, so you could always wrap primitives that really needed to be optional. Then they added optionals for primitives too recently, implemented internally as wrapper messages, which imo was long overdue.


The technical reason is that they break the end-to-end principle. In computer networking in general, it is a desirable property for an opaque message to remain opaque, and agnostic to every server and system that it passes through. This is part of what makes TCP/IP so powerful: you can send an IP packet through Ethernet or Token Ring or Wi-Fi or carrier pigeon, and it doesn't matter, it's just a bunch of bytes that get passed through by the network topology.

Protobufs generally respect this property, but required fields break it. If you have a field that is marked 'required' and then a message omitting it passes through a server with that schema, the whole message will fail to parse. Even if the schema definition on both the sender and the recipient omits the field entirely.

Consider what happens when you add a new required field to a protobuf message, which might be embedded deep in a hierarchy, and then send it through a network of heterogenous binaries. You make the change in your source repository and expect that everything in your distributed system will instantly reflect that reality. However, binaries get pushed at different times. The sender may not have picked up the new schema, and so doesn't know to populate it. The recipient may not have picked up the new schema, and so doesn't expect to read it. Some message-bus middleware did pick up the new schema, and so the containing message which embeds your new field fails to parse, and the middleware binary crashes with an assertion failure, bringing down a lot of unrelated services too.


If you want to precisely capture when a field must be present (or will always be set on the server), the field_behavior annotation captures more of the nuance than proto2's required fields: https://github.com/googleapis/googleapis/blob/master/google/...

You could (e.g.) annotate all key fields as IDENTIFIERs. Client code can assume those will always be set in server responses, but are optional when making an RPC request to create that resource.

(This may just work in theory, though – I’m not sure which code generators have good support for field_behavior.)


That decision seems practical (especially at Google scale).

I think the main problem with it, is that you cannot distinguish if the field has the default value or just wasn't set (which is just error prone).

However, there are solutions to this, that add very little overhead to the code and to message size (see e.g. [1]).

[1]: https://protobuf.dev/programming-guides/dos-donts/


But you can distinguish between default and unset: all optional fields have has_ method associated with them: https://protobuf.dev/reference/cpp/cpp-generated/#fields


The choice to make 'unset' indistinguishable from 'default value' is such an absurdly boneheaded decision, and it boggles my mind that real software engineers allowed proto3 to go out that way.

I don't get what part of your link I'm supposed to be looking at as a solution to that issue? I wasn't aware of a good solution except to have careful application logic looking for sentinel values? (which is garbage)


Yes, proto3 as released was garbage, but they later made it possible to get most proto2 behaviors via configuration.

Re: your question, for proto3 an field that's declared as "optional" will allow distinguishing between set to default vs. not set, while non-"optional" fields don't.


Ah, ok! Yeah I think we've been working with an older version of protobuf for a while where that wasn't an option.


Yet still in 2024, supporting optional is off by default for some languages in protoc...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: