
A few other ones that come up regularly:

- Servers/applications never restart

- Startup order is fully deterministic

- All instances run the same software revision (maybe somewhat covered by 8)

- Hardware is reliable

- All clients are playing nicely along (and I'm not even talking about attacks)

- Logging and metrics are cheap

- There are no software bugs in any layer




> Startup order is fully deterministic

It still is for a minority of systems.

On a single OS, Google runs an init that is deterministic, as do the BSDs and some Linux distributions.

For distributed systems, your ops team controls for this, usually at the service discovery layer. (Don't publish until ready; don't start the service until you can establish a connection to the required endpoints.)
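To make the "don't start until you can connect, don't publish until ready" rule concrete, here is a minimal sketch. It assumes dependencies are plain TCP endpoints, and the names `wait_until_ready`, `start_and_publish`, `start`, and `publish` are hypothetical placeholders for whatever your service discovery layer actually provides.

```python
import socket
import time
from typing import Callable, Sequence, Tuple

Endpoint = Tuple[str, int]


def can_connect(endpoint: Endpoint, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(
    endpoints: Sequence[Endpoint],
    probe: Callable[[Endpoint], bool] = can_connect,
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every required endpoint answers, or give up after `attempts` rounds."""
    for attempt in range(attempts):
        if all(probe(endpoint) for endpoint in endpoints):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


def start_and_publish(
    endpoints: Sequence[Endpoint],
    start: Callable[[], None],
    publish: Callable[[], None],
    **wait_options,
) -> bool:
    """Start only once dependencies are reachable; publish only after starting.

    `start` and `publish` are hypothetical hooks, e.g. "bind the listener"
    and "register in service discovery".
    """
    if not wait_until_ready(endpoints, **wait_options):
        return False  # never started, never advertised
    start()
    publish()
    return True
```

Note that this only orders the initial startup. It does nothing about a dependency that disappears later, which is the case the replies below are about.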


Many people think it is, but the reality is that it depends a lot on the system you are working on. It can be super tricky, and relying on it will often cause pain later on.

For example, take an init system that starts up all processes. What happens if one of the processes crashes and gets restarted by a process manager? Now the order has already changed, and a process that relied on the restarted one might keep working with outdated data. Similar things can happen on other layers too, e.g. one of the services in a dependency chain might disappear and reappear.

Another example is a developer or administrator manually changing the config of a service and restarting it so the change takes effect. That can also trigger dependency problems.

Now, those are absolutely solvable, either by making sure all services operate gracefully with any startup order or by other mitigations (e.g. "always reboot the full box"). But like everything else in the list, it is still a problem that is observed very often in distributed systems.
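One way to "operate gracefully with any startup order" is to never assume the dependency is there: connect lazily, drop the connection on failure, and reconnect with backoff. A rough sketch, where `ReconnectingClient` and the `connect` factory are hypothetical names and a connection is modeled as a plain callable:

```python
import time
from typing import Any, Callable, Optional

Connection = Callable[[Any], Any]


class ReconnectingClient:
    """Wraps a dependency so callers survive it starting late or restarting."""

    def __init__(
        self,
        connect: Callable[[], Connection],
        max_attempts: int = 5,
        base_delay: float = 0.1,
        max_delay: float = 5.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._connect = connect
        self._conn: Optional[Connection] = None
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._sleep = sleep

    def call(self, request: Any) -> Any:
        delay = self._base_delay
        last_error: Optional[Exception] = None
        for attempt in range(self._max_attempts):
            try:
                if self._conn is None:
                    # Lazy connect: works whether or not the peer started first.
                    self._conn = self._connect()
                return self._conn(request)
            except ConnectionError as exc:
                last_error = exc
                # Discard the connection so state from before a peer
                # restart is not reused on the next attempt.
                self._conn = None
                if attempt < self._max_attempts - 1:
                    self._sleep(delay)
                    delay = min(delay * 2, self._max_delay)
        raise ConnectionError(
            f"dependency unavailable after {self._max_attempts} attempts"
        ) from last_error
```

This covers the connection itself. Any data cached from the old instance of the dependency still has to be invalidated separately when a reconnect happens.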


These are fallacies of distributed systems, and most of them are the very reasons you want a distributed system in the first place. You build distributed systems because you know:

- Servers restart, so you replicate and load balance

- Dependencies matter, so you build circuit breakers and health checks

- Infra is heterogeneous, which is why you use containers

- Hardware is unreliable, so again: replicate, load balance, and HA

- Provide APIs for your clients to communicate with your service

- Logging and metrics are expensive, so collect what you need; Prometheus helps with this, but logging is not fully solved

- Do people really think there are no bugs?



