Hacker News new | past | comments | ask | show | jobs | submit login

> the message has to be pretty large, or the schemas have to be negotiated at the start of a session that lasts a while

Those are the two main use cases for a binary serialization format, though. Either you tend to have a very large row-oriented file with lots of homogenous records, say a SSTable or BigTable or Hadoop SequenceFile or Postgres binary column. Or you have a long-lasting RPC connection that serves one particular service interface, where you'd want to negotiate the protocol used by the service at the beginning and keep it up for hours.

I can think of a couple exceptions, like end-user HTTP connections to a public API. But for those, you usually need a complicated capability-based protocol negotiation anyway, because you need to handle hostile (hacked) clients or old versions that are stuck on something from 5 years ago. Google's habit of sticking a custom message type in a MessageSet and sending it along for a feature that triggers on 0.1% of traffic isn't really a thing outside of Google (and possibly other FANGs), not least because most companies can't afford to staff a team to maintain a feature used by 0.1% of traffic.

The solution for complexity is pretty routine: hide it behind a library. I'm not terribly fond of the particular API that Avro provides, but the wire format is sound & simple and there's nothing preventing someone from writing alternative library implementations.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: