We run a very large installation 100% on spot and have done for a few years. We ...

inopinatus · on Oct 22, 2022

There is a tale - perhaps apocryphal - handed down between generations of AWS staff, of a customer that was all-in on spot instances, until one day the price and availability of their preferred instances took an unfortunate turn, which is to say, all their stuff went away, including most dramatically the customer data that was on instance storage, and including the replicas that had been mistakenly presumed a backstop against instance loss, and sadly - but not surprisingly - this was pretty much terminal for their startup.

Caveat operator.

(I’m sure the parent commenter is either not exposed to this scenario or has otherwise mitigated it)

phamilton · on Oct 22, 2022

We've worked closely with our team at AWS to ensure we are following best practices. The consensus has been that 4+ AZs and 12 instance types is sufficient diversification.

We also have a second, on-demand ASG ready to fire up at a moment's notice if something were to happen with capacity.

We also heavily leverage managed services for state.
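
For readers wondering what that kind of diversification might look like in configuration, here is a minimal sketch. It checks a fleet against the thresholds mentioned above (4+ AZs, 12 instance types) and builds a dict in the shape of the `MixedInstancesPolicy` parameter that EC2 Auto Scaling accepts. The function names, the launch template name, and the `price-capacity-optimized` allocation strategy are assumptions for illustration; the thread does not say which strategy is actually used.

```python
# Thresholds taken from the comment above; everything else is illustrative.
MIN_AZS = 4
MIN_INSTANCE_TYPES = 12


def check_diversification(instance_types, availability_zones):
    """Raise ValueError if the fleet is less diversified than the thresholds."""
    if len(set(availability_zones)) < MIN_AZS:
        raise ValueError(f"need at least {MIN_AZS} distinct AZs")
    if len(set(instance_types)) < MIN_INSTANCE_TYPES:
        raise ValueError(f"need at least {MIN_INSTANCE_TYPES} distinct instance types")


def build_mixed_instances_policy(launch_template_name, instance_types):
    """Build a MixedInstancesPolicy-shaped dict for an all-spot ASG.

    The allocation strategy is an assumption, not something stated in the thread.
    """
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,  # hypothetical name
                "Version": "$Latest",
            },
            # One override per distinct instance type the ASG may launch.
            "Overrides": [{"InstanceType": t} for t in sorted(set(instance_types))],
        },
        "InstancesDistribution": {
            # 0 / 0 means no on-demand capacity, i.e. 100% spot.
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }
```

The AZ spread itself is set on the ASG (its zones or subnets), not inside the policy, which is why it is only validated here. A standby on-demand ASG like the one described could use the same launch template with plain on-demand capacity.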

cypress66 · on Oct 22, 2022

But wouldn't the RDS snapshots or whatever still be there? I don't understand why this caused data loss.

inopinatus · on Oct 22, 2022

There is no RDS in this tale. All their data was on EC2 spot instance storage.

halfmatthalfcat · on Oct 22, 2022

Absolute yikes.

Moissanite · on Oct 21, 2022

Have you observed metal instances taking longer to boot? I did last time I checked, and the difference was big enough to affect pricing in a non-trivial way, given that performance is the same and that you start paying immediately.
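
The pricing effect described here can be put into rough arithmetic. The sketch below assumes billing starts at launch and that boot time does no useful work; the function name and any numbers fed to it are hypothetical, not measurements from this thread.

```python
def effective_hourly_price(hourly_price, boot_minutes, useful_minutes):
    """Price per hour of useful work when boot time is billed but unproductive.

    Assumes billing starts at launch; all inputs are illustrative.
    """
    if useful_minutes <= 0:
        raise ValueError("useful_minutes must be positive")
    billed_minutes = boot_minutes + useful_minutes
    return hourly_price * billed_minutes / useful_minutes
```

The shorter an instance lives (as spot instances often do), the larger the share of billed time that goes to booting, so a longer boot eats into any discount more quickly.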

phamilton · on Oct 22, 2022

This is a good point. They do take longer to boot, which might be part of the reason there's a discount, but the difference hasn't been significant enough for us to avoid them, since diversification is important when running on spot.

khuey · on Oct 23, 2022

Yes, they take significantly longer to boot.