Arguably, this logic should live in another place that monitors the service. Esp...

otterley · on Jan 19, 2024

systemd is a service monitor. It wouldn't be nearly as useful if it wasn't!

gizmo686 · on Jan 19, 2024

From the server's perspective, external problems typically do get fixed on their own. It is nice when resolving the primary issue is sufficient to fix the entire system, instead of needing to resolve the primary issue and then fix all the secondary and tertiary issues.

At my work, we have a simple philosophy for this. The tester is allowed to (on the test system) toggle servers' power, move network cables around, input bad configuration, and so on, in any permutation he wants. So long as everything is set up correctly at the end of the exercise, the system should function nominally (potentially after a reasonable delay).

There should, of course, be a system-level dashboard that notifies someone there is a problem, but that is unrelated to the server-internal retry logic.
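
To make "server-internal retry logic" concrete, here is a minimal sketch of the idea, not what we actually run. The name `retry_until_ok` and its parameters are hypothetical, and it assumes the external dependency signals failure by raising `OSError` (which covers `ConnectionError`). The server keeps retrying with a capped backoff, so once the external problem is fixed it recovers without anyone touching it.

```python
import time


def retry_until_ok(connect, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off between attempts.

    Hypothetical helper: `connect` is any callable that raises OSError
    while the external dependency is unavailable.
    """
    delay = base_delay
    while True:
        try:
            return connect()
        except OSError:
            # Dependency is still down: wait, then try again.
            sleep(delay)
            # Double the wait each time, but never exceed max_delay,
            # so recovery is noticed within a bounded interval.
            delay = min(delay * 2, max_delay)
```

The cap on the delay is what gives you the "reasonable delay" property: however long the outage lasts, the server notices the fix within `max_delay` seconds.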