Enterprise firmware, when it has a soft fail, just croaks as fast as possible. That lets the RAID array hurry up and do its thing, or maybe even lets higher-level replication do its thing.
Except that also only works "most of the time".
Anyone who works at scale with drives and (RAID) controllers knows that even "enterprise" drives can and do take entire controllers down.
It's not uncommon to lose a full set of daisy-chained JBODs to a single disk acting funny.
Consequently, and since storage clusters have to be redundant at the node level anyway, it makes a lot of sense to skip the markup for enterprise firmware and instead design for quick node failure detection and ejection (short timeouts).
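
To make "quick node failure detection and ejection (short timeouts)" a bit more concrete, here is a minimal sketch of a heartbeat-based detector. Everything in it is hypothetical and only for illustration: the `FailureDetector` class, the 2-second timeout, and the injected clock are my assumptions, not something any particular storage system does this way.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class FailureDetector:
    """Tracks the last heartbeat per node and ejects nodes that go quiet."""

    timeout_s: float = 2.0  # assumed value; deliberately short
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    last_seen: Dict[str, float] = field(default_factory=dict)
    ejected: Set[str] = field(default_factory=set)

    def heartbeat(self, node: str) -> None:
        # An ejected node stays out until something explicitly re-admits it,
        # so a flapping disk or controller cannot bounce a node in and out.
        if node not in self.ejected:
            self.last_seen[node] = self.clock()

    def sweep(self) -> List[str]:
        """Eject every node whose last heartbeat is older than the timeout."""
        now = self.clock()
        dead = sorted(
            n for n, t in self.last_seen.items() if now - t > self.timeout_s
        )
        for n in dead:
            del self.last_seen[n]
            self.ejected.add(n)
        return dead

    def healthy(self) -> List[str]:
        return sorted(self.last_seen)
```

The point of the sketch is that the detector does not care why a node went quiet (hung drive, wedged controller, dead JBOD chain); it only cares that the node stopped answering within the timeout, and the cluster's node-level redundancy covers the rest.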