The Calculus of Service Availability

nickpsecurity · on May 25, 2017

"Thus far, this article has established what might be called the 'Golden Rule of Component Reliability.' This simply means that any critical component must be 10 times as reliable as the overall system's target, so that its contribution to system unreliability is noise. It follows that in an ideal world, the aim is to make as many components as possible noncritical. Doing so means that the components can adhere to a lower reliability standard, gaining freedom to innovate and take risks."
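To make the "10 times" rule concrete, read it as "one tenth the unavailability." The sketch below checks the arithmetic. It assumes independent failures and that the service is down whenever any critical dependency is down (simple serial composition). The 99.99% target and the function names are illustrative, not taken from the article.

```python
# Sketch: how much of a service's unavailability budget a critical
# dependency consumes. Assumes independent failures and serial composition
# (the service needs every critical component up). Numbers are hypothetical.

def composed_availability(availabilities):
    """Availability of a service that requires every listed component to be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def budget_share(component, target):
    """Fraction of the target's allowed unavailability used by one component."""
    return (1.0 - component) / (1.0 - target)

target = 0.9999      # hypothetical 99.99% service target
component = 0.99999  # a critical dependency "10x as reliable" (99.999%)

print(f"{budget_share(component, target):.2f}")  # 0.10
# Three such dependencies already use about 30% of the budget.
print(f"{composed_availability([component] * 3):.6f}")  # 0.999970
```

Each critical dependency at the 10x level uses about a tenth of the budget, so the budget fills up quickly as critical dependencies accumulate. That is why the article pushes to make components noncritical rather than only making them more reliable.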

In information security, the pioneers developed the concept of the Trusted Computing Base. They noted that systems had a lot of attack surface, so problems could show up anywhere, and verification became more costly and less feasible as the system grew in size and complexity. The solution was to design systems where you could trust one or a few components to ensure the security of the system even if all the others were breached. That's the TCB. It was required to be NEAT: Non-bypassable, Evaluable, Always-Invoked, and Tamper-proof.

http://www.landwehr.org/1983-bats-ieee-computer-as.pdf

High-availability engineers came up with similar concepts, such as redundant systems with voters, which let much of the hardware be treated as untrusted. For language designers, the TCB of the language itself is the type system plus the runtime (if any). For proof engineers, it's a tiny proof checker that can catch errors made by large, complex proof assistants. For distributed databases, it's mostly how storage is handled, plus the protocols. This pattern is worth remembering because it keeps showing up over and over.
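One common instance of the "redundant systems with voters" idea is triple modular redundancy (TMR). The sketch below uses it as an illustrative example: three replicas compute the same result and a majority voter picks the answer, so any single replica can be wrong without affecting the output. The voter is the small component that has to be trusted. The function names are hypothetical, and the reliability formula assumes independent replica failures and a voter that never fails.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value a strict majority of replicas agree on, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None

def tmr_reliability(r):
    """Probability that at least 2 of 3 replicas are correct, assuming each
    is independently correct with probability r and the voter is perfect."""
    return r**3 + 3 * r**2 * (1 - r)  # equivalently 3r^2 - 2r^3

print(majority_vote([42, 42, 7]))      # 42 (the faulty replica is outvoted)
print(majority_vote([1, 2, 3]))        # None (no majority)
print(f"{tmr_reliability(0.99):.4f}")  # 0.9997
```

Three 99%-reliable replicas yield roughly 99.97% under these assumptions. In practice, correlated failures and the voter's own reliability cap that gain, which is the same reason a TCB has to be small and carefully verified.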