Why We’re Switching to gRPC (fromatob.com)
244 points by protophason on May 27, 2019 | 155 comments



Yeah. Easy things are easy with most technologies... It's only after a while that you start to see the 'problems'.

With grpc... It's designed by Google for Google's use case. How they do things and the design trade-offs they made are quite specific, and may not make sense for you.

There are no generated language interfaces, so you cannot mock the methods. (Except by mocking abstract classes, and nobody sane does that, right)

That's because grpc allows you to implement whatever methods of a service interface you like, and require whichever fields you like - all are optional, but not really, right.

Things that you might expect to be invalid, are valid. A zero byte array deserialised as a protobuf message is a perfectly valid message. All the strings are "" (not null), the bools false, and the ints 0.
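
For anyone who hasn't run into this: a minimal Go sketch of that behaviour, using the well-known wrapper types from the current google.golang.org/protobuf module so it needs no generated code.

    package main

    import (
        "fmt"

        "google.golang.org/protobuf/proto"
        "google.golang.org/protobuf/types/known/wrapperspb"
    )

    func main() {
        var s wrapperspb.StringValue
        var b wrapperspb.BoolValue
        var i wrapperspb.Int64Value

        // Zero bytes unmarshal successfully into any message type...
        for _, m := range []proto.Message{&s, &b, &i} {
            if err := proto.Unmarshal([]byte{}, m); err != nil {
                panic(err) // never happens: empty input is a valid message
            }
        }

        // ...and every field simply reports its default value.
        fmt.Printf("%q %v %v\n", s.Value, b.Value, i.Value) // "" false 0
    }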

Load balancing is done by maintaining multiple connections to all upstreams.

The messages don't work very well with ALB/ELB.

The tooling for web clients was terrible (I understand this may have changed)

The grpc generated classes are a load of slowly compiling not very nice code.

Like I say, if your tech and business is like Google's (it probably isn't) then it's a shoo-in, else it's definitely worth asking if there is a match for your needs.


> With grpc... It's designed by Google for Google's use case. How they do things and the design trade-offs they made are quite specific, and may not make sense for you.

Agreed. It's always important to try to pick technologies that 'align' with your use-cases as well as possible. This is easier said than done and gets easier the more often you fail to do it well! I do think people will read "for Google's use case" and hear "only for Google's scale". I actually think the gRPC Java stack is pretty efficient so it "scales down" pretty well.

I want to skip over some of what you're saying to address this:

> Things that you might expect to be invalid, are valid. A zero byte array deserialised as a protobuf message is a perfectly valid message. All the strings are "" (not null), the bools false, and the ints 0.

Using a protobuf schema layer is wayyyy nicer than JSON blobs but I agree that it is misconstrued as type safety and validation. It's fantastic for efficient data marshaling and decent for code generation but it doesn't solve the "semantic correctness" side of things. You should still be writing validation. It's a solid step up from JSON, not a panacea.


JSON has a bunch of schema systems, including OpenAPI, which is a repackaging of Swagger with some extra stuff and is also endorsed by Google.

Do you consider protobuf superior to those alternatives for web-based (rather than server to server) projects?


I spend all my time server to server so I don't feel qualified to give real advice.

My impression is that if you're going to talk to a browser, that edge stands to gain much more from conforming to HTTP standards. If your edge is more "applicationy" and less "webpagy" then maybe a browser facing gRPC (or GraphQL?) might be more appealing again.

As to the other JSON schema systems, I kinda wish one of them won? It feels like a lot of competing standards still. Not really my area of expertise.


There are a couple gRPC implementations for the browser [0] (officially supported), but they seem to require quite a bit of adaptation, and looked pretty complicated to set up.

[0] https://grpc.io/blog/state-of-grpc-web/


I think OpenAPI/Swagger has won. Haven’t heard of any others recently.


I am by no means arguing with your general point, but some of these may be language-specific.

For example in Go, the service definitions are generated as interfaces and come with an "Unimplemented" concrete implementation. We have a codegen package that builds mock concrete implementations of services for use in tests.

Zero values are also the standard in Go and fit most use cases. We have "optional" types defined as messages that wrap primitives such as floats for times when a true null is needed (has been mostly used for update type methods).
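
For readers who haven't seen the Go output: a rough sketch of that mocking pattern. The type names here are made up (normally they come out of the code generator), but the shape is the same - a test fake embeds the generated "Unimplemented" base and overrides only what it needs.

    package userpb // hypothetical package; only context and errors are imported

    import (
        "context"
        "errors"
    )

    // Hypothetical shapes, roughly what the Go codegen produces for a UserService.
    type GetUserRequest struct{ Id string }
    type GetUserResponse struct{ Name string }

    type UserServiceServer interface {
        GetUser(context.Context, *GetUserRequest) (*GetUserResponse, error)
    }

    // Normally generated: a concrete base type that satisfies the interface.
    type UnimplementedUserServiceServer struct{}

    func (UnimplementedUserServiceServer) GetUser(context.Context, *GetUserRequest) (*GetUserResponse, error) {
        return nil, errors.New("not implemented")
    }

    // A test fake: embed the base, override only the method under test.
    type fakeUserService struct {
        UnimplementedUserServiceServer
    }

    func (fakeUserService) GetUser(_ context.Context, req *GetUserRequest) (*GetUserResponse, error) {
        return &GetUserResponse{Name: "user-" + req.Id}, nil
    }

(For the "true null" case mentioned above, the well-known wrapper types such as google.protobuf.StringValue serve the same purpose as hand-rolled optional messages.)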

The web clients work, but generate a LOT of code. We're using the improbable package so we can provide different transports for browser vs server JS clients btw.

The big win we've seen from grpc is being able to reason about the entire system end to end and have a central language for conversation and contracts across teams. Sure there are other ways to accomplish that, but grpc has served that purpose for us.


Our main hindrance with gRPC was that several disparate teams had strange issues with the fairly opaque runtime. The “batteries included” approach made attempts to debug the root causes quite difficult.

As a result of the above, we have been exploring twirp. You get the benefits of using protobufs for defining the RPC interface, but without quite as much runtime baggage that complicates debugging issues that arise.


That's always the problem with "batteries included". If they don't work it's often not worth the effort to fix them; you gotta toss em.

I'm curious what languages you were using gRPC with. The batteries includedness across tons of languages is a big part of gRPC's appeal. I'd assume Java and C++ get enough use to be solid but maybe that's wishful thinking?


We were mostly using ruby (which uses their C bindings) and golang (where the implementation is native Go).


What kind of problems did you run into, if you don't mind sharing?


One issue that we encountered in several services was gRPC ruby clients that semi-regularly blocked on responses for an indeterminate amount of time. We added lots of tracing data on the client and server to instrument where “slowness” was occurring. We would see every trace span look just like you would hope until the message went into the runtime and failed to get passed up to the caller for some random long period of time. Debugging what was happening between the network response (fast) and the actual parsed response being handed to the caller (slow) was quite frustrating, as it required trying to dig into the C bindings/runtime from the ruby client.


It was a couple of years ago, but the Go gRPC library had pretty broken flow control. gRPC depends upon both ends having an accurate picture of in-flight data volumes, both per-stream and per-transport (muxed connection). It's a rather complex protocol, and isn't rigorously specified for error cases. The main problem we encountered was that errors, especially timed-out transactions, would cause the gRPC library to lose track of buffer ownership (in the sense of host-to-host), and result in a permanent decrease of a transport's available in-flight capacity. Eventually it would hit zero and the two hosts would stop talking. Our solution was to patch-out the flow control (we already had app-level mechanisms).

[edit: The flow control is actually done at the HTTP/2 level. However, the Go gRPC library has its own implementation of HTTP/2.]
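
For anyone fighting something similar: grpc-go does expose those two windows as dial options - a sketch of where the knobs live, not a fix for the bug described above. WithInitialWindowSize is the per-stream window, WithInitialConnWindowSize the per-transport one; server-side equivalents exist as ServerOptions.

    package main

    import (
        "log"

        "google.golang.org/grpc"
    )

    func dial(addr string) *grpc.ClientConn {
        conn, err := grpc.Dial(addr,
            grpc.WithInsecure(),                   // plaintext; fine for a sketch
            grpc.WithInitialWindowSize(1<<20),     // 1 MiB per stream
            grpc.WithInitialConnWindowSize(8<<20), // 8 MiB per muxed connection
        )
        if err != nil {
            log.Fatal(err)
        }
        return conn
    }

    func main() {
        conn := dial("localhost:50051")
        defer conn.Close()
    }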


Yet another batteries included downside...a blackbox http implementation that is hard to debug.


This isn’t to say it happened on every response - it was a relatively small fraction. But, it was enough to tell _something_ was going on. Who knows, it could be something quirky on the network and not even a gRPC issue. But, because the runtime is so opaque, it made debugging quite difficult.


Even Java codegen has issues - like a single classfile so big it crashes any IDE not explicitly set up for it, or a whole bunch of useless methods that lead to autocomplete being terrible.


You can add whatever custom validation you want on top of proto3 (using annotations if you like). Required fields aren't very useful at a serialization level: adding a new required field would always be a backwards incompatible change. You should never do it. They're only useful if you have total certainty that you can define all your APIs perfectly on the first try. But again, if you really want them you can always build something like https://github.com/envoyproxy/protoc-gen-validate on top. That's the benefit of a simple and extensible system vs one that tries to bake-in unnecessary or problematic default behaviors.
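
As a sketch of what "validation on top" can look like in Go (the request type here is hypothetical - normally it would be generated from a .proto - and protoc-gen-validate can generate similar checks from annotations instead of writing them by hand):

    package orders

    import (
        "errors"
        "fmt"
    )

    // Hypothetical decoded request type.
    type CreateOrderRequest struct {
        CustomerId string
        Quantity   int32
    }

    // validateCreateOrder enforces the semantic rules the serialization layer never will.
    func validateCreateOrder(req *CreateOrderRequest) error {
        if req.CustomerId == "" {
            return errors.New("customer_id is required")
        }
        if req.Quantity <= 0 {
            return fmt.Errorf("quantity must be positive, got %d", req.Quantity)
        }
        return nil
    }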

Also: why wouldn't grpc work well with load balancers? It's based on HTTP/2. It's well supported by envoy, which is fast-becoming the de facto standard proxy for service meshes.


>Also: why wouldn't grpc work well with load balancers? It's based on HTTP/2

You answered your own question.

There is always some bit of older infrastructure, like a caching proxy or “enterprise” load balancer, that doesn’t quite understand http/2 yet - it is the same reason so much Internet traffic is still on ipv4 when ipv6 exists - the lowest common denominator ends up winning for some subsection of traffic.


Not wrong on this.

We use gRPC in our tech stack but some pods handle far more connections than others due to the multiplexing/reuse of existing connections.

Sadly Istio/Envoy solutions are in our backlog for now.

We can't fault gRPC otherwise. It's way faster than if we were to encode/decode json after each microservice hop. It integrates into golang nicely (another google Kool-Aid solution!) so a win-win there.


> The messages don't work very well with ALB/ELB.

Doesn't work on ALB because its HTTP/2 support is trash, not gRPC's fault here. Works fine with NLB btw.

> Load balancing is done by maintaining multiple connections to all upstreams.

Again, this is a "feature" of HTTP/2. Use linkerd or envoy that support subsetting among other useful things.

Don't blame your misunderstanding of how technology is meant to be used on said technology.


This feels like a circular argument. Not everyone needs HTTP/2 support (especially for internal services for enterprise applications).


Need it or not... a bigger issue is simply the technical reality of what you have now. If your non-trivial infrastructure doesn’t have great http2 support, it might be a pretty big lift to make that change first.


Yes, it is such a foundational thing that it has to be woven into how you run your infrastructure. Can't just drop it in in most cases. If my memory serves me well, Google basically re-started their entire codebase/infra from scratch - google3 (version 2 was skipped, apparently) - to accommodate this shift.


> There are no generated language interfaces, so you cannot mock the methods.

For Java that isn't true. Java ships with a lightweight InProcess server to stub out the responses to your client.

> Load balancing is done by maintaining multiple connections to all upstreams.

Load balancing is fully pluggable. The default balancer only picks the first connection.
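
For the curious, a grpc-go sketch of swapping that default out: point the client at a DNS name that resolves to all backends and ask for round_robin via the service config (the target name here is made up).

    package main

    import (
        "log"

        "google.golang.org/grpc"
    )

    func main() {
        conn, err := grpc.Dial(
            "dns:///user-service.internal:50051", // hypothetical target; DNS returns every backend
            grpc.WithInsecure(),
            grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
        )
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
    }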

> The tooling for web clients was terrible

Agreed. This is almost entirely the fault of Chrome and Firefox, for not implementing the HTTP/2 spec properly. (missing trailers).


There is a big difference in my book between starting up an in-process server (requiring all sorts of grpc naming magic) to run your extensions of an abstract class inside the grpc magic, and a language-level interface.

On the one level you can say new X(new MyService()) or new X(mock(Service.class)) if you have to, and on the other it's just loads of jibber-jabber.


There is a reason mocking is not supported: It's incredibly error prone. The previous version of gRPC (Stubby) did actually support mocking of the stub/service interfaces. The issue is that people mock out responses that don't map to reality, causing the tests to pass but the system-under-test to blow up. This happened often enough that ability to mock was ripped out, and the InProcess server-client added.

The extra "jibber-jabber" is what makes people confident that their Stub usage is correct.

Some sample bugs NOT caught by a mock:

* Calling the stub with a NULL message

* Not calling close()

* Sending invalid headers

* Ignoring deadlines

* Ignoring cancellation

There's more, but these are real bugs that are trivially caught by using a real (and cheap) server.
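
The same "cheap real server" idea exists outside Java too. In Go, for example, a test can run an actual gRPC server over an in-memory listener - a sketch using google.golang.org/grpc/test/bufconn (the service registration line is hypothetical):

    package users_test

    import (
        "context"
        "net"
        "testing"

        "google.golang.org/grpc"
        "google.golang.org/grpc/test/bufconn"
    )

    func newTestConn(t *testing.T) *grpc.ClientConn {
        lis := bufconn.Listen(1 << 20) // 1 MiB in-memory buffer
        srv := grpc.NewServer()
        // pb.RegisterUserServiceServer(srv, &fakeUserService{}) // hypothetical service
        go srv.Serve(lis)
        t.Cleanup(srv.Stop)

        dialer := func(ctx context.Context, _ string) (net.Conn, error) {
            return lis.DialContext(ctx)
        }
        conn, err := grpc.DialContext(context.Background(), "bufnet",
            grpc.WithContextDialer(dialer), grpc.WithInsecure())
        if err != nil {
            t.Fatal(err)
        }
        t.Cleanup(func() { conn.Close() })
        return conn
    }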


One more thing that's a show-stopper for any public service: No flow control. This means anybody who can connect to your gRPC server can OOM it.


I suppose you can use a proxy to perform rate-limiting.


You should never use protobuf types directly in your code. Always convert to native types at the edges — that will let you do validation.
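
A small Go sketch of that edge conversion (both types here are hypothetical); the conversion function is the natural place to reject anything the schema couldn't express:

    package users

    import (
        "errors"
        "fmt"
        "time"
    )

    // Hypothetical wire-side type, as generated from a .proto.
    type UserProto struct {
        Id        string
        CreatedAt string // say, RFC 3339; the schema can't promise it's well-formed
    }

    // Domain type the rest of the code base works with.
    type User struct {
        ID        string
        CreatedAt time.Time
    }

    func userFromProto(p *UserProto) (User, error) {
        if p.Id == "" {
            return User{}, errors.New("id is required")
        }
        ts, err := time.Parse(time.RFC3339, p.CreatedAt)
        if err != nil {
            return User{}, fmt.Errorf("bad created_at: %w", err)
        }
        return User{ID: p.Id, CreatedAt: ts}, nil
    }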


At that point, what does Protobuf really buy you?


A rock solid serialization and RPC system.


> Things that you might expect to be invalid, are valid. A zero byte array deserialised as a protobuf message is a perfectly valid message. All the strings are "" (not null), the bools false, and the ints 0.

How does this work? How do you make, say, all fields but the second null? Do you just send a message that's (after encoding) as long as the first two fields, where the first field is 0x00 and the second contains whatever data you want?


Two things: 1) protocol buffers intentionally don't allow null values; values that aren't set return a default value. 2) gRPC uses proto3, which does not distinguish between a field being unset and a field set to its default value.
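
To the "how does this work" part: on the wire a message is just a sequence of (field number, value) pairs, so a field you don't set takes up zero bytes and the reader fills in the default. A small Go sketch, hand-building "only field 2 is set" with protowire and decoding it as the well-known Duration type (whose fields happen to be int64 seconds = 1 and int32 nanos = 2):

    package main

    import (
        "fmt"

        "google.golang.org/protobuf/encoding/protowire"
        "google.golang.org/protobuf/proto"
        "google.golang.org/protobuf/types/known/durationpb"
    )

    func main() {
        var b []byte
        b = protowire.AppendTag(b, 2, protowire.VarintType) // tag for field 2, varint
        b = protowire.AppendVarint(b, 500)                  // its value
        fmt.Printf("% x\n", b) // 10 f4 03 - nothing on the wire for field 1

        var d durationpb.Duration
        if err := proto.Unmarshal(b, &d); err != nil {
            panic(err)
        }
        fmt.Println(d.Seconds, d.Nanos) // 0 500 - field 1 is simply its default
    }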


This article comes at a good time because I've been exploring OpenAPI vs. gRPC for a codebase that presently uses neither. Evaluating technology feels like a lot of navel gazing, so it's nice to hear others' experiences even if their uses don't line all the way up with ours.

Disclaimer: Java fanboy bias. For services internal to a company, I think gRPC is an all around win. If you need to talk to browsers, I don't have as many opinions.

Personally, I really prefer working at the RPC layer rather than at the HTTP layer. It's OOP! It's SOA! Pick your favorite acronym! HTTP's use as a server protocol (as opposed to a browser protocol) is mostly incidental. It works great but most of the HTTP spec is entirely inapplicable to services. I prefer named exceptions to 200 vs 5xx/4xx error codes. Do I really care about GET, PATCH, PUT, HEAD, POST for most of my services when all of my KV/NewSQL/API-over-DB services have narrower semantics anyway?

Out of band headers are nice though.

Between protobufs, http2, and a fresh, active server implementation we see pretty solid latency and throughput improvements. It's hard to generalize but I suspect many users will. Performance isn't the only driving factor but it's nice to start from a solid base.

I'm sure missing all the tools like curl and friends is an annoyance, but I like debugging from within my language, and in JVM land at least it's been easy enough.


Have you considered GraphQL? Lots of overlap with gRPC, but much more web-friendly. Much better support for optional vs. required data, too. And comes with server push, replacing the need for WebSockets/SSE.

Only downside I can think of is that there's no analogous mechanism to gRPC streams; you have to implement your own pagination.


I haven't looked into GraphQL much at all, so correct me if I'm mistaken.

From what I understand of it, the big idea is that instead of passing parameters from the client to the server and fully implementing the query logic, stitching, and reformatting etc. on the server side, you now have a way to pass some of that flexibility out to the client. Instead of updating both the server and the client as uses change, more can be done from the client alone.

I spend most of my time on the infra side of things and rarely if ever make my way out to the browser so I can't speak to WebSockets/SSE or web friendliness. Being the "backend-for-backend" I just prefer being more tight-fisted about what my clients can and can't do. I mostly deal with internal customers with tighter SLAs so I like to capacity plan new uses.

Maybe I'm just old fashioned.


I recently chose gRPC as a communication protocol between two devices (sort of IoT).

So far it has worked perfectly as expected. The C++ code generator provides a clean abstraction plus it saved a lot of time (both in programming and debugging). The gRPC proto file syntax also nudges you in the right direction wrt protocol design.

When trying to "sell" gRPC it helps that there are generators for plenty of languages and it's backed by a major company.


I wish that tooling around running protoc to generate stubs and client libraries was simpler. I wish there was a single command I could run to turn a large collection of proto files into libraries for "all" languages (python, java, C++, Node package, etc). Unfortunately there's no universal approach to this.


This seems like an odd requirement. Are you trying to generate stubs for your API users ahead of time? This will likely not work as generated stubs evolve in lockstep with protoc and runtime support libraries, and thus are not guaranteed to work across discrepant versions. Thus, stub code should be generated alongside the consumer/client. It also likely shouldn't be committed into a VCS.


It would be done in CI. Generate stubs -> package/compile -> push to internal package repo.

This way your protocol for your infrastructure is just another library.


Having an explicit 'create client library by generating/compiling proto stubs' step is generally also bad mojo from my experience, unless you're also abstracting API stability and service discovery. If not, it will be unnecessarily painful to make either a change to the service discovery method or a non-backwards-compatible proto change, as you will have to lockstep the service rollout, the library build, and the client bump.


What makes that impossible now? I'm probably overlooking something in your use case, but couldn't you just have a simple build script / makefile that generated the libraries with a different protoc call for each library?


Nothing makes it impossible but comparing it to things like thrift the complexity becomes apparent:

    thrift --gen <language> <Thrift filename>
This handles the following languages: C (depends on GLib), Cocoa, C++, C#, D, delphi, Erlang, Go, Haskell, Java, Javascript, OCaml, Perl, PHP, Python, Ruby, Smalltalk. From this one command I can instantly integrate this into almost every build system I know of.

On the other hand gRPC has the same features but it's a slightly different workflow. For each language you go to the language's code generation page, find the command line option for generating your language's code, read up on some decisions that were made for you, etc. All of that is fine; the part that annoys me a bit is that each language needs a language module for the compiler (if it's not one of the core few languages). For example, in the documentation for generating Go [1] they have you download and install protoc-gen-go from http://github.com/golang/protobuf, assuming that you already have golang installed.

gRPC seems much more focused on the idea that I want to define an API for my code, I want that specification to live inside the project that I am writing, and that you can figure out how to generate stubs on a language by language basis.

What I want is something where I can write a set of system specification files, type one command, and get modules built for all languages. From there I can import those modules using my native language's favorite package manager (npm, composer, Hunter for CMake, etc). Ideally the Protocol Specification, the Library Generation, and the Library Usage are three components that are separate.

[1] - https://developers.google.com/protocol-buffers/docs/referenc...


With the significant caveat that I haven't used it much myself (I mainly work with Bazel which obviates the need), I think Uber's prototool [0] can manage most if not all of that. Might be worth giving it a look.

[0] https://github.com/uber/prototool


bzl build :my_proto_library ?


In many conversations I've had with Google (or Google-Adjacent) engineers I've revealed a truth to them that was quite shocking: Bazel isn't the only build system in wide scale deployment currently. It's also far from the most used build system currently. While Bazel is a monolithic tool that solves this problem there is no external tool that solves this problem of configuring the different semantics and configuration for protos in different languages.


You said you wished for a command and I gave you one.


The truth is gRPC, like kubernetes, was built with decades of lessons from running an RPC framework inside a container-oriented distributed environment; and more importantly, gRPC is the blessed framework inside Google as well, meaning it's qualified to power the largest and most complex distributed systems in the world (I think it'd be safe to omit 'one of' here), which in comparison is not the case for kubernetes.

Addition: Borg and kubernetes are designed with similar goals but different emphasis. They are like complementary twins with different personalities. For this I recommend Min Cai's Kubecon '18 presentation about Peloton [1]; the slide is titled "comparison of cluster manager architecture".

[1] https://kccna18.sched.com/event/GrTx/peloton-a-unified-sched...


Wait, I don’t get it. Kubernetes:Borg::gRPC:Stubby. Google uses gRPC internally to the same extent that they use Kubernetes internally, i.e. hardly at all.


This analogy is very misleading. Kubernetes is probably never going to run any real workload internally at Google, but gRPC powers all external APIs of Google Cloud and, increasingly, other Google properties (e.g. ads, assistant), is used by mobile apps like Duo and Allo, and has some big internal service use cases. The reason Stubby still dominates internally is simply that migrating to gRPC takes lots of time and might be hard to justify, but I do see gRPC being used very widely internally at Google; it’s simply a matter of time. I don’t see that happening to Kubernetes; it’s a joke when compared to Borg.

Google aside, many other companies like Dropbox rely on gRPC extensively to successfully run infrastructure: https://static.sched.com/hosted_files/grpconf19/f7/Courier%2...


I work at Google and my team has real workloads running on Kubernetes.

There's plenty of internal teams that use GCP. Increasingly this might be the direction things are heading.


GCP itself is a job on Borg. ;)


That's not true. GCE uses Borg very differently than normal Google internal systems, which you can imagine is quite natural as they are serving different customers. GCS and other systems, in turn, also differ wildly from GCE. When you talk about GCP as a whole, it becomes impossible to summarize in a few statements, and I doubt there is anyone on earth who is capable of describing it coherently even without a time constraint.


What I said (GCP runs on Borg) is absolutely and technically correct, affirmed by your own comment, which highlights the power and flexibility of Borg. The point being no one[1] at Google relies on Kubernetes for raw cluster management capabilities at scale. They might use it for other things that can make deployment more friendly in some scenarios. (This doesn’t make Kubernetes a bad system by any means, just quite different and not a substitute for Borg whereas gRPC is a direct substitute for Stubby). This debate is better argued in your own eng-misc@ and not on a public forum.

[1]: no one that we care about. At Google this is obviously always incorrect. There’s always that someone who uses weird things like mongoDB and AWS.


And there's no reason why a small project should not. But nobody is going to move, say, indexing to GCP. And when it comes to power laws the big things are big and the small things are not.


This sounds like a No True Scotsman argument to me, that if something runs on GCP instead of Borg, it isn't "real". Also throw in shades of moving goalposts.

Indexing doesn't run on GCP primarily because it's legacy (as in, the first product Google ever did) and thus long predates GCP itself.


It’s neither of those fallacies. The fallacy is to suppose that if you know several people using technology X then it must be quite popular. We see this all the time on HN where people suppose that, say, Erlang is quite popular because there are dozens of companies, each with five engineers, using it. But then we ignore that there are five companies with a hundred thousand engineers each that do everything in c++. It’s the same with these other things. It’s quite likely that K8s satisfies the requirements of 80% of the projects at Google and it’s also quite likely that all of them put together consume 1% of the production resources, so it leads to the question of whether it’s even capable of solving a really large problem, as mehrdada argues elsewhere in this thread.


It's not a Google product obviously, but Snapchat runs on GCP. That's quite big. Is that not a "real" product? Admittedly they're on App Engine, a much older product than Kubernetes, but I suspect they'd be able to run on Kubernetes, and perhaps that's what they would choose if they were to build from scratch right now.


Indexing has been rewritten many times over. Even if you removed all the dependencies on Bigtable and co., I think indexing would be the last to move, for quite practical reasons, due to its sheer size and design. The parent poster picked probably the worst workload to migrate to public GCP. Gmail, YouTube and search serving are easier in comparison.


I guess you’d find that on an rpc-weighted scale Stubby handles several orders of magnitude more traffic than those gRPC endpoints you mentioned.


This may be true today (although even on this metric I’d estimate Kubernetes to be off by some orders of magnitude, bordering zero). My point is there’s a path and plan forward for gRPC adoption and it’s a matter of transition to a new system (which can, admittedly, be very long). For Kubernetes, I don’t think there is a credible path for replacing Borg.


Curiously, what do you find lacking in Kubernetes compared to Borg?


Scale, for one thing.

Kubernetes is not a bad system but it’s not designed to run Google.


What google system runs more than 5k tasks on a single cluster?


https://github.com/google/cluster-data/blob/master/README.md...

"ClusterData2011_2 provides data from an 12.5k-machine cell over about a month-long period in May 2011."


10k replicas in a single cell is the default charge-free quota of every individual engineer at that company. It’s basically zero.


Heh, you have no idea. Lots and lots of systems. And 5k wouldn't even register there...


Can't edit my comment anymore but I meant 5k nodes / 150k pods / tasks.


Can you expand on that, please?


I used to work on one of the Borg teams at Google and now run a Kubernetes platform. Nearly every Kubernetes component (node, master, networking) will melt down at a fraction of Borg scale. It's not even close.


thanks for the answer.


https://kubernetes.io/docs/setup/cluster-large/

More specifically, "No more than 5000 nodes" and "No more than 150000 total pods" is fairly limiting to large (Google-large) clusters.


thanks for the answer.


grpc was built way before containers were a thing. jails and zones were barely out of the gate at the time.


You are talking about stubby, I would guess.

Borg is circa 2003; pb/stubby was before that. Gfs was probably similar in timing to stubby. And many other cluster-level foundations. In the end, Borg is the true cornerstone that ties everything together and completes the Google infrastructure puzzle (or modern global-scale cluster computing).


I'd needed an RPC framework for a few of my projects but every time I considered gRPC, I ended up walking away from it. The big issue is that gRPC has a huge amount of dependencies and it tries to do a lot of things, many of which might be irrelevant for you but will cause extra headache anyway. When all you need is to serialize your stuff and send it over the wire, there are much better lightweight frameworks. For C++, I think RpcLib is one of the best. It doesn't even require maintaining a .proto file, doing a "compile" of schemas every time you change something, etc. The moral of the story is to always look around instead of just going for the most popular solution first.


A couple of subtly wrong points in the article. Firstly, the gRPC payload can be anything; it need not be an encoded protocol buffer. Secondly, there’s not a whole lot of “validation” going on in the protobuf codec. Basically any fundamentally correct buffer encoded as message A will decode successfully as message B for any B. If there are unknown fields they are silently consumed. If there are missing fields they are given the default values, and there is no “required” in proto3. So there is significantly less safety, and significantly more flexibility, in gRPC than people generally realize.


`required` was removed due to the challenges it introduces in designing backwards-compatible API changes[1].

[1] https://github.com/protocolbuffers/protobuf/issues/2497#issu...


"Required" fields that are no longer required, and "optional" fields that are no longer optional are basically 6 of one and half a dozen of another.

I'm personally strongly in the "required" camp because at least the interface makes an attempt at giving clues to a user as to what fields are important. If everything is optional, there's no information being passed as to what is important anymore.


"Basically any fundamentally correct buffer encoded as message A will decode successfully as message B for any B."

This is incorrect. I suspect you're overextending proto3's treatment of unknown fields to include discarding incorrectly typed fields too. If A has field 1 typed as an int, and B has field 1 typed as a string, an A message with field 1 set will not parse as a B message. However, if the A message has no fields set, or sets a field number unknown to B, that could parse successfully with "leftover" unknown fields.


> If A has field 1 typed as an int, and B has field 1 typed as a string, an A message with field 1 set will not parse as a B message.

In the C++ reference implementation, which I wrote, this is not true. The field 1 with the wrong wire type would be treated as an unknown field, not an error.

It's possible that implementations in other languages have different behavior, but that would be a bug. The C++ implementation is considered the reference implementation that all others should follow.

However, shereadsthenews' assertion is not quite right either. Specifically, a string field and a sub-message field both use the same wire type; essentially, the message is encoded into a byte string. So if message A has field 1 type string, containing some bytes that aren't a protobuf, and message B has field 1 type sub-message, then you'll get a parse error.

But it is indeed quite common that one message type parses successfully as another unrelated type.


> In the C++ reference implementation, which I wrote, this is not true. The field 1 with the wrong wire type would be treated as an unknown field, not an error.

Yeah I was about to say, protobuf C++ implementation will definitely treat it as an unknown field. I just had it do that a few days ago. :)


Ok but these messages are isomorphic on the wire:

  message enc {
    int64 foo = 1;
    SomeMessage bar = 2;
  }

  message dec {
    bool should_explode = 1;
    string why = 2;
  }
You can successfully decode the latter from an encoding of the former.


Minor nit, but not necessarily. For basically all values of SomeMessage, dec should fail to parse due to improperly encoded UTF8 data for field 2 (modulo some proto2 vs. proto3 and language binding implementation differences).

Change field 2 to a bytes field instead of a string field and then yes.


I should mention that I consider this a feature not a bug. The isomorphism permits an endpoint to use ‘bytes submessage_i_dont_need_to_decode’ to cheaply handle nested message structures that need to be preserved but not inspected, such as in a proxy application.


True, but UTF8 enforcement was quite absent in all implementations until proto3, and the empty string would be a special case.


Bool will decode from int’s encoding??


Yes, they are both varint encoded on the wire. Refer to https://developers.google.com/protocol-buffers/docs/encoding...
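
A concrete Go sketch, using the well-known wrapper types (Int32Value and BoolValue both put their single field at number 1, and both are varint on the wire):

    package main

    import (
        "fmt"

        "google.golang.org/protobuf/proto"
        "google.golang.org/protobuf/types/known/wrapperspb"
    )

    func main() {
        b, _ := proto.Marshal(wrapperspb.Int32(1)) // encodes as 08 01: field 1, varint 1
        var asBool wrapperspb.BoolValue
        if err := proto.Unmarshal(b, &asBool); err != nil {
            panic(err)
        }
        fmt.Println(asBool.Value) // true
    }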


Sigh.


I don't think this is the case, or at least, I'd expect it to be a bug.

Protocol Buffers should generally be non-destructive of the underlying data. That means even if it encounters the wrong wire type for a field, it should simply retain that value in the unknown field set rather than discard it.


Rich Hickey, the creator of Clojure, gave a talk with some very relevant points in this space (The Language of the System): https://www.youtube.com/watch?v=ROor6_NGIWU


"gRPC is great because most systems just glue one side effect to another side effect, so what's the point of packaging that into REST" – I think a HN comment from a googler

The great thing about Clojure is that you can make holistic systems that are end-to-end immutable and value-centric, which means the side effects can go away, which means gRPC stops making sense and we can start building abstractions instead of procedures!


His point about protocol buffers (i.e. schema-out-of-band protocols) is unfortunately brief in this talk.

Depending on your use case, you may have to do a lot to work around protocol buffers not being self-describing. I haven't seen a good description of the problem online, but if you find yourself embedding JSON data in protobufs to avoid patching middleware services constantly, you should look at something like Avro or MessagePack or Amazon Ion.


I have yet to see enough services expose a gRPC endpoint, at least other than Google. That'll keep the perception of adoption s/low.

I'm writing this as I take a break from working on a polyglot project made up of Kotlin, Rust, Node, where we use gRPC and gRPC-web. We're slowly stealing endpoints from the other services/languages and porting them to Rust.

Without focusing on the war of languages, the codegen benefits of protobufs have made what used to be a lot of JSON serde much easier.


Can you point to a single public Google-run gRPC service? I was under the impression that all connections into Google are proxied by GFE (Google Front-End) servers to internal servers running the Stubby RPC server code. GFE is definitely not running gRPC server code. I don't believe a gRPC endpoint could pass Google's own Production Readiness Review process.


I've seen Google endpoints available over gRPC over the last few years. Many, if not most, of the Cloud endpoints are directly documented as being available over gRPC [0]. For others, like Google Ads, a peek at the client libraries shows it's using gRPC [1] as well.

[0]: https://cloud.google.com/pubsub/docs/reference/service_apis_...

[1]: https://github.com/googleads/google-ads-java/blob/master/goo...


Yes, this. A lot (based on my last interaction 2 yrs ago) of Google's SDKs are convenience methods that hide gRPC behind them. If you use a language that doesn't have an SDK, you can mostly connect directly to their rpc endpoints.


The googleapis [1] repo has the publicly accessible gRPC definitions, which you can access directly. I've done this before, though it was a bit tedious as I had to learn how to manually pass Google credentials (documentation wasn't good enough).

[1] https://github.com/googleapis/googleapis


hey, hey! have you heard about this new "webservices" stuff? with SOAP you can call remote code as if it were right here! and with WSDL you can automatically create the client!


Short summary of web services:

Phase 1: Ad hoc, free-for-all chaos.

Phase 2: SOAP tries to bring order. It fails mainly because the "S" ("Simple") is a lie.

Phase 3: Pendulum swings hard toward simplicity with HTTP plus JSON plus nothing else, thanks.

Phase 4: Things shift possibly more toward the middle (a little structure), but none of the competing systems have become obvious winners.


Most things in life seem to change like a pendulum learning from previous swings, slowly converging to a healthy middle. Extremes are useful since they give perspective and attractive since they're easy to grasp.


Except for the fun time when the WSDL doesn't match the actual implementation because 'they don't support WSDL'. (despite serving one from their SOAP service)


Or worse, when the WSDL has a response type of "Object" ... OMG was this ever painful to generate clients for. Usually cheated and used Node as a bridge service.


Man I miss WSDL. We're building it all over again with gRPC.


gRPC is exactly like XDR and ONC-RPCs from the Unix days of the 90s, ex. NFS.


Apart from “well google (created|uses) it” I don’t really get the benefit of gRPC compared to any other rpc, eg jsonrpc or even xmlrpc, both of which are fairly static, open specifications for a way to communicate, rather than actual releases of a library that apparently has a new release every 10 days or so.


Binary. Streaming. Strongly typed (with caveats). There's a whole article about the advantages/differences linked from the top of this page.


> My API just returned a single JSON array, so the server couldn’t send anything until it had collected all results.

Why can't you stream a JSON array?

Edit: Here's a (hastily created and untested) node.js example, even:

    const { Transform } = require('stream');

    // Object-mode in, text out: emits a valid JSON array incrementally.
    class JSONArrayStream extends Transform {
        constructor() {
            super({readableObjectMode: false, writableObjectMode: true});
            this.dataWritten = false;
        }

        _transform(data, encoding, callback) {
            if (!this.dataWritten) {
                this.dataWritten = true;
                this.push('[\n');
                this.push(JSON.stringify(data) + '\n');
            } else {
                this.push(',' + JSON.stringify(data) + '\n');
            }
            callback();
        }

        _flush(callback) {
            if (!this.dataWritten) {
                this.push('['); // no items: still emit a valid (empty) array
            }
            this.push('\n]');
            callback(); // signal end of output
        }
    }


There can be a bit more to it:

- how can a client send an error message if it runs into problems mid-stream? You can invent a system, but you’re walking into an ad-hoc protocol pretty fast; why not use something others wrote?

- what if the remote end wants to interrupt the stream sender to say “stop sending me this” for any reason? For example, an erroneous item in the stream, or a server closing down during a restart.

- grpc supports fully bidirectional streams, interleaving request and response in a chatty session; how do you do this?

Not that the original article mentioned these. I bristle though when I hear the engineer’s impulse to “why don’t you just”-away at something.


you’d probably need to do some tricks to get it to parse in a browser.

Editing, because I can’t reply: I was specifically going to mention SSE/EventSource but expected an immediate “not everything is a browser” response.

Editing the 2nd: yep, that’s kinda what I meant by “tricks” - essentially splitting out chunks to pass to the json parser.


See my edited example above; you can stream data to this, it'll stream nicely to the browser. This uses "\n"s at the end of every line, which means you can write a very simple streaming parser client side, because you can just split the input at "\n," to get nice JSON bits, but there's certainly JSON streaming libraries on NPM that will parse this for you more "properly". And, it parses with a normal JSON parser too.


Modern browsers understand text/event-stream so no need for dirty tricks ;)


Also: Server sent events


https://news.ycombinator.com/item?id=20023784 Specifically claims that validation isn’t performed as the article mentions

“binary” is not necessarily a benefit, particularly for developer tooling/debugging

Streaming is one area it may have a benefit but honestly the other issues outweigh that possible benefit (and it’s not like there aren’t other ways to stream data to a browser without resorting to polling)


It's not just streaming. Any heavy natively binary objects (think scientific computing) are a pain to marshal without a good binary interface, and it can easily become a real performance issue.


Of course you're being downvoted. But I'm totally with you on this one.

As someone currently responsible for migrating services to gRPC at my company, I always say that the main reason people are switching is "because Google."

While there are merits to gRPC and protobuf, I don't think there are enough advantages to throw away REST and JSON and all the tooling around it. The moment you start switching, you start feeling the pain.

"Because Google" is also the main reason everyone wants to run their crap on Kubernetes. It's pure hype.

Always makes me think of "You Are Not Google": https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb


The article says that one of the advantages of gRPC is streaming and that JSON wouldn’t support streaming.

That’s however just an implementation detail. JSON can easily be written and read as a stream.

Switching your whole architecture, dealing with a binary protocol and the accompanying tooling issues just because of your choice of JSON parser feels like total overkill.

JSON over HTTP is ubiquitous, has amazing tooling and is highly debuggable. Parsers have become so fast that I feel they might even have the opportunity to be faster than a protobuf based solution.

Finally I don’t buy the argument about validation. You have to validate input and output on the boundaries no matter what.

Even when your interface says “this is a double”, it says nothing about ranges (as seen in the article where valid ranges were specified in the comment) for example.


> Parsers have become so fast that I feel they might even have the opportunity to be faster than a protobuf based solution.

Not even close. Even new JSON serializers/deserializers aren't magic. Protobuf is a LOT easier to parse, so it's naturally a LOT faster.

First two duck results for "json vs protobuf benchmark":

https://auth0.com/blog/beating-json-performance-with-protobu...

https://codeburst.io/json-vs-protocol-buffers-vs-flatbuffers...


The first link shows a mere 4% margin when talking to a JavaScript VM.

Even at a 5x improvement, most projects will never reach a point where the transport encoding is a bottleneck. Protobuf has a lot going for it (currently using in a project) but can’t be sold on speed alone.


Is the JSON parser implemented natively, or in JS? It may not be apples-to-apples.


> Is the JSON parser implemented natively, or in JS? It may not be apples-to-apples.

True, but if you're wanting an implementation you can use in Javascript running in the browser, it may accurately reflect reality. You have a high-quality browser-supplied (presumably native) implementation of JSON available. For a protobuf parser, you've just got Javascript. (You can call into webassembly, but given that afaik it can't produce Javascript objects on its own, it's not clear to me there's any advantage in doing so unless you're moving the calling code into webassembly also.)

I don't think browser-based parsing speed is important though. It's probably not a major contributor to display/interaction latency, energy use, or any other metric you care about. If it is, maybe you're wasting bandwidth by sending a bunch of data that's discarded immediately after parsing.


My guess would be that most of the cost is creating the JS objects and the parsing is a relatively small part of the cost, so optimizing it would not help much.


Yea, the V8 json parser is implemented natively and optimized alongside the engine in a way that other serialization methods in Javascript, and JSON in other languages, generally are not.


As mentioned in other comments, gRPC transport is orthogonal to Protobuf serialization. The gRPC runtime library takes no dependency on that. You can use gRPC with JSON. It just happens the default code generators use protobuf IDL and serialization. You can use gRPC library with your own JSON based stub generator.


While that's true I think protobufs are (correctly) seen as the standard preferred way to use gRPC. The first point from the main page:

> Simple service definition

> Define your service using Protocol Buffers, a powerful binary serialization toolset and language

It's a little unfair to call it that orthogonal.


You can't do good streaming using REST / JSON, it's either broken / slow / badly implemented. And that's for one direction, bidirectional streaming is not even possible.


Not all of your API endpoints should respond with JSON, that's all. Create an endpoint for streaming - it's a simple solution.


There's also https://github.com/uw-labs/bloomrpc/blob/master/README.md which is kinda like Postman but for gRPC. I didn't see it mentioned in the Caveats section of the post so maybe useful to someone else too.


gRPC is great, but my issues with it are debugging and supporting the browser as a first class citizen.

We've been working hard on OpenRPC [0]. An Interface Description for JSON-RPC akin to swagger. It's a good middle ground between the two.

[0] https://open-rpc.org


No streaming? I poked through the docs and spec but didn't see it mentioned. Assuming it's just JSON-RPC under the hood that answers my question, but maybe y'all have added support on top.


Have you looked at gRPC-Web?


I have mixed feelings about gRPC-Web, and welcome alternatives. Setting up a proxy with any sort of non-standard config can be a pain, gRPC-Web doesn't translate outbound request data for you which can get ugly[0], and your service bindings may or may not try to cast strings to JS number types which silently fail if over MAX_SAFE_INTEGER.

[0] Instead of passing in a plain object, you build it as such:

  const userLookup = new UserLookupRequest();
  const idField = new UserID();
  idField.setValue(29);
  userLookup.setId(idField);
  UserService.findUser(userLookup);
The metadata field doesn't seem to mind though...


I'd just like to say I appreciate the writing at the beginning of the article.

"While more speed is always welcome, there are two aspects that were more important for us: clear interface specifications and support for streaming."

This offers a quick exit for anybody who already knows about these advantages.


one fundamental issue with grpc seems to be that every request for a given service ends up either creating a thread or using an existing one from a pool of threads. of course, you cannot limit the number of threads because that will lead to deadlocks.

i _suspect_ for google-scale it should all be fine, where available cpu's are essentially limitless, and consistency of data gets handled f.e. due to multiple updates etc. at a different layer.

writing safe, performant, multi-threaded code in the presence of signals/exceptions etc. is non-trivial regardless of what your 'frontend' looks like. async-grpc is quite unwieldy imho.

i have heard folks trying grpc out on cpu-horsepower-starved devices f.e. wireless-base-stations etc. and running into aforementioned issues.


gRPC isn't a requirement for response streaming (as is quoted as one of the main reasons for doing the migration). That can all be achieved with http/json using chunked encoding. In fact, that's what the gRPC-gateway (http/json gateway to a gRPC service) does https://github.com/grpc-ecosystem/grpc-gateway.

gRPC adds bi-directional streaming which is not possible in http, but the use cases for that are more specialized.
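
For reference, a sketch of that kind of streaming with nothing but net/http in Go: emit one JSON document per item and flush, and the client sees results as they are produced. (fetchItems is a made-up data source; this illustrates the idea, not the grpc-gateway itself.)

    package main

    import (
        "encoding/json"
        "net/http"
    )

    // fetchItems stands in for whatever produces results incrementally.
    func fetchItems() []map[string]int {
        return []map[string]int{{"id": 1}, {"id": 2}, {"id": 3}}
    }

    func streamResults(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/x-ndjson")
        flusher, ok := w.(http.Flusher)
        if !ok {
            http.Error(w, "streaming unsupported", http.StatusInternalServerError)
            return
        }
        enc := json.NewEncoder(w)
        for _, item := range fetchItems() {
            if err := enc.Encode(item); err != nil { // one JSON object per line
                return
            }
            flusher.Flush() // push this chunk to the client now
        }
    }

    func main() {
        http.HandleFunc("/results", streamResults)
        http.ListenAndServe(":8080", nil)
    }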


Sure, the actual transfer will be streamed, but most JSON clients wait for the entire response before firing your callback. As far as I know there isn't even a commonly used spec for partially reading a JSON document.


Great to see people using GRPC, but this article doesn't state anything that the actual grpc.io website doesn't, except for the OpenAPI comparison.


I don't see the article answering the "why" question. It was just "We don't like what we were using, so we tried gRPC"


Another alternative is Thrift. It's got lots of language bindings and the servers are superb.


IMHO the only reason to use HTTP for services today is caching.


Parsing speed shouldn’t be a factor in deciding. Parsing protobufs on the browser is going to be way slower than using a native json parser and even on the server, there are java libraries that are much faster than protobufs, e.g. https://jsoniter.com/

That’s why formats like FlatBuffer were written; however, parsing is likely not going to dominate your application, so other factors should influence your decision instead.


> on the server, there are java libraries that are much faster than protobufs

Be careful not to back broad arguments with outlier benchmarks.

In general, it is plainly true that JSON is much more computationally difficult to encode and decode than Protobuf. Sure, if you compare a carefully micro-optimized JSON implementation against a less-optimized Protobuf implementation, it might win in some cases. That doesn't mean that Protobuf and JSON perform equivalently in general.


What is the reason not to use the micro optimized JSON implementation if parsing becomes your bottleneck?


I don't think I said that?


My point is that json is always “fast enough”. Either you don’t care about parsing speed and can use what is most ergonomic or you do care and you’ll use an optimized library.

You’ll never need to move to protobufs due to parsing speed.


> however parsing is likely not going to dominate your application, so other factors should influence your decision instead.

Exactly. If you need performance, you'll use Cap'n Proto or FlatBuffer, which use a native binary interface, so you don't need to create and copy objects, you just map them in from IO.


Sadly, capnproto's awesome RPC system doesn't appear to support streaming, though I do believe gRPC supports FlatBuffers.


> Sadly, capnproto's awesome RPC system doesn't appear to support streaming

Sure it does. You can implement "streaming" in Cap'n Proto by introducing a callback object, and making one RPC call for each item / chunk in the stream. In Cap'n Proto, "streaming" is just a design pattern, not something that needs to be explicitly built in, because Cap'n Proto is inherently far more expressive than gRPC.

That is, you can define a type like:

    interface Stream(T) {
      write @0 (item :T);
      end @1 ();  # signal successful end of stream
    }
Then you can define streaming methods like:

    streamUp @0 (...params...) -> (stream :Stream(T), ...results...)
    # Method with client->server stream.

    streamDown @0 (...params..., stream :Stream(T)) -> (...results...)
    # Method with server->client stream.

Admittedly, this technique has the problem that the application has to do its own flow control -- it has to keep multiple calls in-flight to saturate the connection, but needs to place a cap on the number of calls in order to avoid excess buffering. This is doable, but somewhat inconvenient.

So I am actually in the process of extending the implementation to make this logic built-in:

https://github.com/capnproto/capnproto/pull/825

Note that PR doesn't add anything new to the RPC protocol; it just provides helpers to tell the app how many concurrent calls to make.


Interesting. I do tend to favor protocols that are a bit lower level and more flexible for composing higher-level functionality. Though this does sound pretty complicated to implement using only what capnproto offers right now. Would there be a way to jury-rig "request(n)" backpressure as described in reactive streams [0] (also implemented by RSocket [1]) on top of capnproto? That's what I'm using for omnistreams [2] and it's proven very simple to implement and reason about.

[0] https://github.com/reactive-streams/reactive-streams-jvm

[1] http://rsocket.io/

[2] https://github.com/omnistreams/omnistreams-spec


Actually the more I think about it I don't think it would work, since request(n) assumes the producer can send multiple messages for each request.


> When you use a microservice-style architecture, one pretty fundamental decision you need to make is: how do your services talk to each other?

Problems I don't have when using Erlang/Elixir umbrella apps + OTP.


answer: for trivial reasons. too bad he didn’t dig deeper.


Does anybody remember reading an article along the lines of "we use X because it's cool and trending and we are cool people"? Not saying it's the case here, but has anybody honestly admitted in an article that they used a technology because it makes them look cool?

I do remember reading quite a lot of articles about the inverse of this: "we don't hire people using Windows/IDEs because it says a lot about them, a craftsman should choose his tools wisely, ...", but never the positive.


I got a very strong “kool aid” vibe from this.

I don’t remember any “we don’t hire people on Windows/using an IDE” (the last part would be particularly weird IMO), but I wouldn’t be surprised if somewhere said “if you want to use Windows you’re on your own (support wise) and if it becomes a time sink you switch or find work elsewhere”.

I’ve supported (in terms of dev environment/tooling) people on Macs, Windows and Linux. Windows by far had the weirdest issues to solve/avoid.


> I don’t remember any “we don’t hire people on Windows/using an IDE”

Two famous examples:

http://charlespetzold.com/etc/DoesVisualStudioRotTheMind.htm...

> If you are a startup looking to hire really excellent people, take notice of .NET on a resume, and ask why it’s there.

https://blog.expensify.com/2011/03/25/ceo-friday-why-we-dont...



