
You can’t have it both ways. If you need to monitor it and take corrective action (which you do) then you shouldn’t rely on it.

This is an argument for making your liveness probe == readiness probe. It should just check pod availability in a minimal way, and if continuing to send the pod traffic based on this indicator turns out bad because of congestion, you want to see that causing errors and react, not let the scheduler take it out of service for new traffic.

You want liveness & readiness to check the same thing, and it should be a non-trivial check of service health that is also very low latency. And as long as that check is passing, keep sending traffic.

When the check fails, it should always be for a “hard down” reason that tells you the pod could not, regardless of traffic levels, accept traffic because it’s fundamentally internally down.
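A minimal sketch of what a shared liveness/readiness endpoint along these lines might look like, in Python. Everything named here is an assumption for illustration: the `/healthz` path, port 8080, and the `config_loaded` check are hypothetical, not anything from a particular service. The point is only that the check is cheap, ignores traffic levels, and fails solely for "hard down" reasons.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Mapping, Tuple


def config_loaded() -> bool:
    # Hypothetical hard-down check: stands in for "the process finished
    # initializing and can fundamentally serve requests".
    return True


# Only conditions that make the pod unusable regardless of load belong here.
# Congestion, queue depth, and latency are deliberately NOT checked.
HARD_DOWN_CHECKS: Mapping[str, Callable[[], bool]] = {
    "config_loaded": config_loaded,
}


def evaluate(checks: Mapping[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Return (HTTP status, body): 200 if every check passes, else 503."""
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        return 503, "down: " + ",".join(failed)
    return 200, "ok"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/healthz":  # assumed path, shared by both probes
            self.send_error(404)
            return
        status, body = evaluate(HARD_DOWN_CHECKS)
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    # Blocks forever; call this from the service's entry point.
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Under this sketch, both the liveness and readiness probes would point at the same path, so a pod is only ever pulled from rotation for a reason that would also justify restarting it.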




I don't want the pager to go off just because of some slight non-liveness. That's a likely outcome of high utilization (usually viewed as a good thing, isomorphic with low cost). If you're running really hot and a few tasks are shedding load by playing dead intermittently, that's OK up to a point; if a large portion of pods are doing that at a high rate, that might be bad. You might not even alert on it, just throw it up on a dashboard as an informative indicator for operators.
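A rough sketch of that "OK up to a point" policy, in Python. The function name, the input shape (per-pod probe failure rate over some window), and both thresholds are made up for illustration; the real numbers would depend on the service.

```python
from typing import Mapping


def classify(
    failure_rates: Mapping[str, float],
    rate_threshold: float = 0.2,            # assumed: per-pod "high rate" cutoff
    fleet_fraction_threshold: float = 0.3,  # assumed: "large portion" cutoff
) -> str:
    """Classify fleet health from per-pod probe failure rates (0.0 to 1.0).

    Returns "ok", "dashboard" (informational only), or "alert".
    """
    if not failure_rates:
        return "ok"
    # Pods that are playing dead at a high rate.
    flapping = sum(1 for rate in failure_rates.values() if rate > rate_threshold)
    fraction = flapping / len(failure_rates)
    if fraction > fleet_fraction_threshold:
        return "alert"
    if flapping:
        return "dashboard"
    return "ok"
```

With these assumed thresholds, one hot pod out of four only shows up on the dashboard, while two out of three would raise an alert. Whether "alert" should mean a page or just a ticket is a separate choice.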


> "I don't want the pager to go off just because of some slight non-liveness."

That’s just bad engineering. Really, one should want the pager to go off for that and be really pedantic to actually sniff out the root cause and actually fix it.

Hiding that type of issue by letting something like liveness/readiness policy tacitly conceal it is just going to result in a far worse or more systemic issue later with far worse pager disruptions to your life.

You’re skipping flossing every now and then only to need serious root canals later.



