
I think you are misreading that comment. He’s saying that stubby is fast, not grpc. In fact performance is the big unknown with grpc adoption within google. It definitely isn’t on par with stubby today and it has to get there before anyone significant will switch to it.



I can't comment on raw numbers (because I simply don't have them) but at least for the service I work on, replacing Stubby with gRPC wouldn't really move the needle even if it was 2-3x slower (it might be faster, this is just for illustration) -- we spend our time waiting on IO from other services or crunching numbers in the CPU. Being a Java service, gRPC/Java might well be just as fast or faster than Stubby Java, but I could understand that Stubby C++ has been hyperoptimized over the years vs. gRPC C core which might have a ways to go. By the latest performance dashboard [1, 2], gRPC/Java is leading the pack but gRPC C++ doesn't seem like it's slouching too much either. I seem to remember the C++ impl crushing Java at performance a while back, so I'm sure that'll change in the future.

Honestly though? It'd take a _very_ demanding workload for your RPC system to be the bottleneck (so long as the two are within constant factors of each other). There are services like that, but they're the exception and not the norm. Most services don't need to do 100kQPS/task. Even then, at that point you're spending a lot of time on serialization/deserialization, auth, logging, etc. Your service is more than its communication layer; even if that layer is important to optimize, it's still just a minor constant factor.
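To put rough numbers on the "minor constant factor" point, here's a minimal sketch. The 2% RPC share and the 3x slowdown are made-up inputs for illustration, not measurements of Stubby or gRPC, and `total_slowdown` is a hypothetical helper:

```python
def total_slowdown(rpc_fraction: float, rpc_factor: float) -> float:
    """Relative per-request cost after the RPC layer gets rpc_factor times slower.

    rpc_fraction: share of per-request cost spent in the RPC layer today (0..1).
    rpc_factor:   how many times slower the replacement RPC layer is.
    """
    if not 0.0 <= rpc_fraction <= 1.0:
        raise ValueError("rpc_fraction must be between 0 and 1")
    # Everything outside the RPC layer is unchanged; only the RPC share scales.
    return (1.0 - rpc_fraction) + rpc_fraction * rpc_factor


# Hypothetical inputs: RPC is 2% of request cost, the replacement is 3x slower.
print(f"{total_slowdown(0.02, 3.0):.2f}x")  # prints "1.04x"
```

Under those assumed inputs the whole request gets about 4% more expensive, which is why the RPC share matters far more than the raw slowdown factor.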

The real problem is inertia. There's a lot of code/tools/patterns built up around Stubby and the semantics of Stubby (including all its features which likely haven't been ported to gRPC yet) and that's difficult to overcome.

Our #1 use of gRPC so far I would imagine is at the edge. gRPC is making its way into Android apps since it's pretty trivial for translating proxies to more or less 1:1 convert gRPC to Stubby calls.

[1] https://performance-dot-grpc-testing.appspot.com/explore?das...

[2] https://performance-dot-grpc-testing.appspot.com/explore?das...


You and I seem to be using a different denominator to quantify "most" services. I'm thinking of it as "most" in terms of who has all the resources / budget. You seem to be thinking of it in terms of sheer number of services or engineers working on them. The fact is that the highly demanding services have the huge majority of the resources, and are the most sensitive to performance issues. If your service uses 10% of Google's datacenter space, you won't accept a 5% or even 1% regression just so you can port to gRPC, because at that scale your team can just staff someone or even several people to maintain the pre-gRPC system forever and still come out ahead on the budget.
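As a back-of-the-envelope sketch of that trade-off (every figure below is a placeholder in arbitrary cost units, not a real Google number, and the function names are hypothetical):

```python
def regression_cost(fleet_cost: float, service_share: float, regression: float) -> float:
    """Extra yearly spend when a service using service_share of the fleet
    becomes `regression` (e.g. 0.01 for 1%) more expensive to run."""
    return fleet_cost * service_share * regression


def cheaper_to_keep_old_system(fleet_cost: float, service_share: float,
                               regression: float, engineers: int,
                               cost_per_engineer: float) -> bool:
    """True if staffing the old system costs less per year than the regression."""
    staffing = engineers * cost_per_engineer
    return staffing < regression_cost(fleet_cost, service_share, regression)


# Placeholder inputs: a fleet costing 1e9 units/year, a service using 10% of it,
# a 1% regression, and 3 engineers at 300,000 units/year each.
print(cheaper_to_keep_old_system(1e9, 0.10, 0.01, 3, 300_000))  # prints "True"
```

With these assumed inputs the regression costs about 1,000,000 units a year against 900,000 for staffing, so the comparison flips quickly if either side moves.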

Totally agree that world-facing APIs will all be gRPC and that makes perfect sense to me.


> You seem to be thinking of it in terms of sheer number of services or engineers working on them.

I'm not sure where I said that, but yes, that's part of the switching cost.

> The fact is that the highly demanding services have the huge majority of the resources, and are the most sensitive to performance issues. If your service uses 10% of Google's datacenter space, you won't accept a 5% or even 1% regression just so you can port to gRPC,

The thrust of my statement was that for many services, RPC overhead is minimal. So even a 2x or 3x increase in RPC overhead is still minimal. I agree, a 5% increase in resource utilization for a large service is something that would be weighed. But let's explore that idea for a moment:

> because at that scale your team can just staff someone or even several people to maintain the pre-gRPC system forever and still come out ahead on the budget.

Not necessarily. Engineers are expensive and becoming ever more expensive, while computing resources keep getting cheaper. Not only that, but engineers tend to be specialized, so you can't just task anyone with maintaining the previous system; it tends to be people who already have deep expertise. And those people also have career aims beyond long-term support of a deprecated system, so there's retention to be considered.

Pretending for a moment that all your services except a small handful moved on to some system B from some system A, if the burden of maintaining system A starts to eclipse the resource cost of moving to system B (which decreases all the time due to improvements in system B, the increasing cost of maintaining system A, and the monotonic reduction in computing resource cost), then you might well just swallow the 5%-10% increase in resources either permanently or temporarily and come out ahead in the end.
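A toy model of that crossover, assuming the cost of staying on old system A grows by a fixed rate each year while the extra resource cost of new system B shrinks by a fixed rate. The rates, starting costs, and the function name are all invented for illustration:

```python
from typing import Optional


def first_year_switch_pays_off(maintain_a: float, extra_resources_b: float,
                               maintain_growth: float, resource_decline: float,
                               horizon: int = 50) -> Optional[int]:
    """First year (0-based) in which maintaining A costs at least as much as
    the extra resources B needs, or None if that never happens within horizon."""
    for year in range(horizon + 1):
        cost_a = maintain_a * (1.0 + maintain_growth) ** year
        cost_b = extra_resources_b * (1.0 - resource_decline) ** year
        if cost_a >= cost_b:
            return year
    return None


# Invented inputs: B's extra resources start at twice A's maintenance cost,
# maintenance grows 10% a year, and the resource premium shrinks 10% a year.
print(first_year_switch_pays_off(1.0, 2.0, 0.10, 0.10))  # prints "4"
```

The exact year is meaningless with made-up rates; the point is only that two curves moving in opposite directions cross sooner than a static comparison suggests.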

Additionally, as system B moves on, staying on system A becomes increasingly risky: security improvements, features, layers which don't know about system A anymore all threaten the stability of your service. If you've checked out the SRE book, you'll know that our SLOs are more important than any one resource. If nobody trusts your service to operate, then they won't use it and then you won't have to worry about resources anymore since the users will have moved on.

> because at that scale your team can just staff someone or even several people to maintain the pre-gRPC system forever and still come out ahead on the budget.

To reiterate the point above, these roles tend to be fairly specialized and hard to staff. Arguably these same engineers are better tasked making system B good enough to switch to so you can thank system A for its service and show it the door.

Bringing this back to Stubby vs. gRPC, it's a pretty academic argument so far. They're both here to stay. And honestly, when we say "Stubby" there are already different versions of Stubby which interoperate with each other, and gRPC will not be any different. Likewise, we still use proto1 in addition to proto2 and proto3 (the public versions) since that just takes time and energy to fix.

We do make these kinds of decisions every day, and it's not always in favor of reduced resources. If we cared for nothing other than resource utilization, we'd be completely C++, no Java, no Python. Realistically, the cost of maintaining systems with equivalent roles can often lead to one or the other winning out, usually in favor of maintainability so long as their feature sets are roughly equivalent. We're fortunate to be in a position that we can choose code health and uniformity of vision over absolute minimum resource utilization. And again, even if we choose system B (higher resources) over system A, perhaps due to the differences in architecture or design choices the absolute bar for performance of that system will be greater than system A, despite starting lower. Sometimes it takes a critical mass of adopters to really shake out all those issues.

I know that quotes from Knuth are often trotted out during these kinds of discussions, but it's true: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

That 3% is where we choose to spend our effort, and that critical 3% includes the ability of our engineering force to make forward progress and not be hindered by too much debt. It also includes real data; check out our Google-Wide Profiling [1].

> Totally agree that world-facing APIs will all be gRPC and that makes perfect sense to me.

Probably not all. We still fully support HTTP/JSON APIs, but at least in our little corner of the world we've chosen to take full advantage of gRPC.

Anyways, thanks for letting me stand on my soapbox for a bit.

[1] https://storage.googleapis.com/pub-tools-public-publication-...


Interesting that you allude to the coexistence of C++, Java, Python, and Go, because I think this bolsters my point. The overwhelming majority of services at Google are in C++. There are individual C++ services that consume more resources than all Java products combined. I think this speaks to the appetite for performance and efficiency within the company, since C++ is demonstrably the most difficult of these languages.



