The static/dynamic debate recurs every ~20 years and has more to do with the economics of the software industry than with the intrinsic merits of either approach.

In times of rapid change, technological revolution, and land grabs for market share, dynamic typing wins out. Why? Because you don't yet know what information you want to encode in your schema. You figure that out by putting products in front of users, changing them, and seeing what resonates and increases your growth rate. This was the case in the early 60s (mainframes & scientific computing), the early 80s (AI & PCs), and the early 00s (Web 2.0).

In steady-state times, when the dominant players have already been established and we're just filling in niches, static interfaces dominate. They are a way of building supplier networks, increasing efficiency, and locking down best practices. At this point the users are largely known; we're just trying to serve them better. This was the case in the 70s (mainframes), the 90s (Windows), and the 10s (Internet/mobile).

The challenge for a new startup selling statically typed APIs is that at this point the ecosystem is already established, the big players are in place, and the remaining startups are just filling in niches. Somebody is likely to win with a statically typed API, but it'll probably be Google (gRPC), Facebook (Thrift), or Microsoft (TypeScript?) rather than some startup.




> Because you don't know what information you want to encode in your schema.

I'd argue that Protobuf actually embraces this -- its killer feature vs. other binary protocols of the era was that adding new fields is always easy.
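
A minimal Python sketch of the wire-format property behind that (hand-rolled for illustration, not the real protobuf library): every field is prefixed with a tag and wire type, so a reader built against the old schema can simply skip tags it doesn't recognize.

    # Hand-rolled illustration of protobuf-style tagging; real code
    # would use the protobuf library instead.

    def encode_varint(n):
        out = bytearray()
        while True:
            b = n & 0x7F
            n >>= 7
            if n:
                out.append(b | 0x80)
            else:
                out.append(b)
                return bytes(out)

    def decode_varint(buf, pos):
        result, shift = 0, 0
        while True:
            b = buf[pos]
            pos += 1
            result |= (b & 0x7F) << shift
            if not (b & 0x80):
                return result, pos
            shift += 7

    def field(tag, wire_type, payload):
        return encode_varint((tag << 3) | wire_type) + payload

    # "New" writer: field 1 (varint) plus a field 2 (string) that the
    # old reader has never heard of.
    msg = field(1, 0, encode_varint(42)) + field(2, 2, encode_varint(5) + b"hello")

    # "Old" reader: only knows field 1, silently skips everything else.
    def read_old(buf):
        pos, known = 0, {}
        while pos < len(buf):
            key, pos = decode_varint(buf, pos)
            tag, wire_type = key >> 3, key & 7
            if wire_type == 0:                  # varint
                value, pos = decode_varint(buf, pos)
            elif wire_type == 2:                # length-delimited
                length, pos = decode_varint(buf, pos)
                value, pos = buf[pos:pos + length], pos + length
            if tag == 1:
                known["id"] = value
        return known

    print(read_old(msg))  # {'id': 42}

Adding field 2 is a non-event for old readers, which is the point.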


Yes, but it's not dynamically typed at all. Dynamic typing is basically always the wrong decision in my experience. You give up a huge amount of basic error checking (e.g. roughly 15% of JavaScript bugs could be prevented by TypeScript) just to avoid a little bit of typing. And it doesn't even save you any time, since with dynamic typing you spend so long figuring out what types functions expect and looking up APIs in documentation, because code completion doesn't work.
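
The error-checking point, transposed to Python for a concrete (made-up) example: with type hints a static checker such as mypy rejects the bad call before the code runs; without them the mistake only surfaces at runtime, and only if that path is exercised.

    # Sketch: the annotated signature lets a checker catch the mistake;
    # untyped code would instead fail with a TypeError at runtime.

    def total_cents(prices: list[float]) -> int:
        return round(sum(prices) * 100)

    total_cents([19.99, 5.00])   # fine
    total_cents("19.99")         # flagged by the checker; runtime TypeError otherwise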


You're conflating the binary encoding with the schema.

Also, there are many other schema encodings that have better type systems than protobuf and allow dynamic schemas, Avro being one example.


Your first sentence is peak HN :)

On a serious note, kentonv properly prefaced his opinion with the qualifier “I'd argue that ...”, but you somehow interpreted it as a statement of fact ...

Anyway, if you think the person you replied to does not understand the difference between a binary encoding and a schema, after helping author protocol buffers at Google, then no one can really claim to know anything here on HN.

Healthy skepticism is fine, as it helps keep the discussion grounded, but applied blindly it will more often than not stifle insightful contributions from people close to the action.

Edited for clarity.


Ignorance of who he's replying to aside, Guthur has an important technical point. Avro's separation of schema & binary is better than protobuf's field tags. You need a schema to meaningfully parse protobufs anyway (usually baked into the codegen), so you might as well make that schema a first-class object and package it with the data. Then you don't need to include or maintain the field tags (shrinking the binary size), you can explicitly reconcile different versions by sending along the new schema, and you don't need to codegen for each language, letting you easily support reflection and dynamic languages.
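
A rough sketch of that reconciliation in Python, assuming the third-party fastavro package (the record names and schemas here are made up): the reader hands the library both the writer's schema and its own, and resolution happens by field name, with no per-field tags on the wire.

    import io
    import fastavro

    # Writer's (newer) schema has an extra "email" field.
    writer_schema = fastavro.parse_schema({
        "type": "record", "name": "User",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "name", "type": "string"},
            {"name": "email", "type": "string"},
        ],
    })

    # Reader's (older) schema lacks "email" but has a defaulted "active".
    reader_schema = fastavro.parse_schema({
        "type": "record", "name": "User",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "name", "type": "string"},
            {"name": "active", "type": "boolean", "default": True},
        ],
    })

    buf = io.BytesIO()
    fastavro.schemaless_writer(buf, writer_schema,
                               {"id": 1, "name": "ada", "email": "ada@example.com"})
    buf.seek(0)

    # Resolution drops the unknown "email" and fills "active" from its default.
    print(fastavro.schemaless_reader(buf, writer_schema, reader_schema))
    # {'id': 1, 'name': 'ada', 'active': True}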


That approach -- sending the schema with the message and having the receiving end use it to reconcile version skew -- seems extremely expensive.

For it to be a win in terms of message size, the message has to be pretty large, or the schemas have to be negotiated at the start of a session that lasts a while. Maybe it would be a win for some use cases, but there are certainly plenty of use cases where the schema would end up much larger than the actual payload.

The bigger problem, though, is complexity. Code at the receiving end has to parse the schema, diff it against its own expectations, build the proper tables to translate the message content to the desired version, and then apply that.

I admit I haven't actually tried building something like this, nor have I looked much at Avro's implementation, but this all sounds really slow and error-prone to me.


> the message has to be pretty large, or the schemas have to be negotiated at the start of a session that lasts a while

Those are the two main use cases for a binary serialization format, though. Either you have a very large row-oriented file with lots of homogeneous records, say an SSTable or BigTable or Hadoop SequenceFile or a Postgres binary column. Or you have a long-lasting RPC connection that serves one particular service interface, where you'd want to negotiate the protocol used by the service at the beginning and keep it up for hours.
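
For the file case, here's a sketch (again assuming fastavro; the record layout is invented) of how the schema is paid for once in the container header rather than per record:

    import io
    import fastavro

    schema = fastavro.parse_schema({
        "type": "record", "name": "LogLine",
        "fields": [{"name": "ts", "type": "long"},
                   {"name": "msg", "type": "string"}],
    })

    buf = io.BytesIO()
    # One schema in the file header, then 100k tag-free records.
    fastavro.writer(buf, schema, ({"ts": i, "msg": "ok"} for i in range(100_000)))

    buf.seek(0)
    # The reader recovers the writer schema from the header.
    print(next(iter(fastavro.reader(buf))))  # {'ts': 0, 'msg': 'ok'}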

I can think of a couple exceptions, like end-user HTTP connections to a public API. But for those, you usually need a complicated capability-based protocol negotiation anyway, because you need to handle hostile (hacked) clients or old versions that are stuck on something from 5 years ago. Google's habit of sticking a custom message type in a MessageSet and sending it along for a feature that triggers on 0.1% of traffic isn't really a thing outside of Google (and possibly other FANGs), not least because most companies can't afford to staff a team to maintain a feature used by 0.1% of traffic.

The solution to the complexity is pretty routine: hide it behind a library. I'm not terribly fond of the particular API that Avro provides, but the wire format is sound & simple, and there's nothing preventing someone from writing alternative library implementations.


This is my major criticism of protobuf and, by extension, gRPC. IMO it emphasises two things: size on the wire and centralised control. The first is laudable, but to my mind it's taken too far because of the second. A decentralised distributed system requires ease of discovery and a decoupling of server and client.

I'm well aware of the many reasons to build distributed systems, not least the ability to distribute engineering effort, so I can see that if team distribution is the primary motivation for creating a microservice system, there would be a desire to make it appear as if it were all one process. Of course it isn't one process, but I can see the desire.



