This is a good question, and I think the article doesn't cover this topic well.
From the article:
> The other one is to tell if during a pod's life the pod becomes too hot handling too much traffic (or an expensive computation) so that we don't send her more work to do and let her cool down, then the readiness probe succeeds and we start sending in more traffic again.
Well... maybe. Is it a routine occurrence that an individual Pod becomes "too hot"? If your load balancer can retry a request on, say, a 503 Service Unavailable, you may be better off relying on that retry combined with CPU-based autoscaling to add another Pod (it's simpler; the tradeoff is that the load balancer may spend too much time retrying).
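To make the autoscaling half of that concrete, a minimal HorizontalPodAutoscaler sketch could look something like the following; the Deployment name, replica counts, and 70% CPU target are placeholders of mine, not anything from the article:

```yaml
# Hypothetical HPA: scale on average CPU utilization across the Pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                        # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # add Pods when average CPU goes above 70%
```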
If you can't or don't want to add additional Pods, then your client is going to see that 503 (or similar) anyway. I'd say, then, that the point of a Pod claiming it's "not ready" to get itself removed from the load-balanced pool is to let the load balancer find an available Pod more quickly, but this adds complexity and may be irrelevant if you run enough Pods to have some overhead capacity.
A Rails app is a bit different from a Node/Go/Java app in that (typically at least, if you're using Unicorn or another forking server) each individual Pod can only handle a limited number of concurrent requests (8, 16, whatever it is). It's more likely, then, that any given Pod is at capacity.
But liveness/readiness probes are not so simple. If these probes go through the main application stack, then they're tying up one of the precious few worker processes, even if only momentarily. I haven't worked with Ruby in a number of years, but I remember running a webrick server in the unicorn master process, separate from the main app stack, to respond to these checks. I did not, however, implement a readiness check that tracked the number of requests and reported "not ready" when all the workers were busy.
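In Kubernetes terms, the shape of that setup is roughly the sketch below; the ports, paths, and image name are all assumptions on my part, not something from the article:

```yaml
# Sketch only: assumes the master process serves health checks on its own port.
containers:
  - name: app
    image: my-rails-app:latest        # hypothetical image
    ports:
      - containerPort: 3000           # Unicorn workers serve real traffic here
      - containerPort: 8080           # lightweight health server in the master process
    readinessProbe:
      httpGet:
        path: /ready                  # hypothetical path
        port: 8080                    # probes never tie up a worker process
    livenessProbe:
      httpGet:
        path: /alive                  # hypothetical path
        port: 8080
```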
For the readiness probe a simple endpoint that returns 200 is enough. This tests your service’s ability to respond to requests without relying on any external dependencies (sessions, which might use Redis, or a user auth service, which might use a database).
For the liveness probe I guess you could check whether your service is accepting TCP connections? I don’t think there should ever be a reason for your service to outright refuse connections unless the main service process has crashed (in which case it’s best to let Kubernetes restart the container instead of having a recovery mechanism inside the container itself, like supervisord or daemontools).
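As a rough sketch of that combination (the path, port, and timings here are just assumptions):

```yaml
# Sketch: a trivial 200 endpoint for readiness, a bare TCP check for liveness.
readinessProbe:
  httpGet:
    path: /healthz            # hypothetical endpoint that just returns 200
    port: 3000
  periodSeconds: 5
livenessProbe:
  tcpSocket:
    port: 3000                # fails only if the process stops accepting connections
  initialDelaySeconds: 10
  periodSeconds: 10
```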
> For the readiness probe a simple endpoint that returns 200 is enough. This tests your service’s ability to respond to requests without relying on any external dependencies (sessions, which might use Redis, or a user auth service, which might use a database).
If the underlying dependencies aren't working, can a pod actually be considered ready and able to serve traffic? For example, if database calls are essential to a pod being functional and the pod can't communicate with the database, should the pod actually be eligible for traffic?
> Do not fail either of the probes if any of your shared dependencies is down, it would cause cascading failure of all the pods.
The idea would be that the downstream dependencies have their own probes, and if those fail they get restarted in isolation, without touching the services that depend on them (which are only temporarily degraded by the dependency failure and will recover as soon as it is fixed).