I emit logs. I mentioned that in my comment above as well.
> So your code assumes that the network/storage/DB are fault-free?
No, it does not assume that the network/storage/DB are completely fault-free. It assumes they are fault-free within the SLA limits provided by your Cloud provider. The software is written so that the platform learns about errors when they happen and can remediate them. For example, when you build a website, you make the app serving layer stateless and choose a backend datastore that is replicated across regions and highly available (like Cloud Datastore or Spanner). If the platform detects that a disk has failed and your app is returning errors, it kills that instance and brings up another one. Similar mechanisms exist for the above-mentioned data services as well, for auto scaling, sharding, ...
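To make the remediation loop concrete, here is a toy sketch. It is not any real platform's API: `Instance`, `new_instance`, `reconcile`, and the 50% error-rate threshold are all hypothetical names and values chosen for illustration. The idea is that the platform watches each stateless instance's error rate and swaps out any instance that crosses the threshold. Because the instance holds no data, the replacement is interchangeable with the one it replaces.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass

# Hypothetical ID generator for new instances.
_ids = itertools.count(1)


@dataclass
class Instance:
    """A hypothetical stateless app instance; durable state lives in the datastore."""
    id: int
    requests: int = 0
    errors: int = 0

    def record(self, ok: bool) -> None:
        # Count every request and every failed request.
        self.requests += 1
        if not ok:
            self.errors += 1

    def error_rate(self) -> float:
        # An instance that has served nothing is treated as healthy.
        return self.errors / self.requests if self.requests else 0.0


def new_instance() -> Instance:
    return Instance(id=next(_ids))


def reconcile(pool: list[Instance], max_error_rate: float = 0.5) -> list[Instance]:
    """Kill instances above the (assumed) error threshold and start replacements."""
    healthy = [i for i in pool if i.error_rate() <= max_error_rate]
    replacements = [new_instance() for _ in range(len(pool) - len(healthy))]
    return healthy + replacements


if __name__ == "__main__":
    pool = [new_instance() for _ in range(3)]   # instances 1, 2, 3
    for _ in range(10):
        pool[1].record(ok=False)                # instance 2 sits on a failed disk
    pool[0].record(ok=True)
    pool = reconcile(pool)
    print([i.id for i in pool])                 # → [1, 3, 4]
```

A real platform would get these signals from health checks and request metrics rather than in-process counters, but the shape is the same. The app only has to surface its errors, and the platform does the replacing.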
If your Cloud provider cannot meet its SLAs, or shows a green status even when the service is down, IMO they are not competent enough.