Is there really no way to record metrics (even custom metrics that the CloudWatch Logs agent pushes to CloudWatch) and, once errors pass a certain threshold, bring another instance up and kill the existing one? If you can detect the sporadic errors, there must be some way to automate it.
I’m assuming this isn’t a web server; if it is, it’s even simpler.
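A minimal sketch of the metric-plus-threshold idea, assuming the application writes lines containing `ERROR` to a CloudWatch Logs group. It only builds keyword arguments for the boto3 calls `logs.put_metric_filter` and `cloudwatch.put_metric_alarm`, so it runs without AWS credentials. The log group, namespace, metric name, threshold and SNS topic ARN are all hypothetical placeholders. The actual replace step is also an assumption and is not shown: for example, a subscriber to the topic could call Auto Scaling's `set_instance_health` to mark the instance `Unhealthy` so its group launches a replacement.

```python
"""Sketch: count ERROR log lines as a CloudWatch metric and alarm on it.
Builds kwargs only; pass them to boto3 clients to apply them."""


def error_metric_filter(log_group: str, namespace: str, metric: str) -> dict:
    # kwargs for boto3.client("logs").put_metric_filter(**kwargs):
    # each log event containing the term ERROR adds 1 to the metric,
    # and periods with no matching events report 0.
    return {
        "logGroupName": log_group,
        "filterName": f"{metric}-filter",
        "filterPattern": "ERROR",
        "metricTransformations": [
            {
                "metricName": metric,
                "metricNamespace": namespace,
                "metricValue": "1",
                "defaultValue": 0.0,
            }
        ],
    }


def error_alarm(namespace: str, metric: str, threshold: float,
                topic_arn: str, period_s: int = 300) -> dict:
    # kwargs for boto3.client("cloudwatch").put_metric_alarm(**kwargs):
    # fires when the summed error count in one period reaches the threshold
    # and notifies the (hypothetical) topic that triggers the replacement.
    return {
        "AlarmName": f"{metric}-too-many-errors",
        "Namespace": namespace,
        "MetricName": metric,
        "Statistic": "Sum",
        "Period": period_s,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


# Usage (hypothetical names), with boto3 installed and credentials configured:
#   boto3.client("logs").put_metric_filter(
#       **error_metric_filter("/myapp/app", "MyApp", "AppErrors"))
#   boto3.client("cloudwatch").put_metric_alarm(
#       **error_alarm("MyApp", "AppErrors", 5,
#                     "arn:aws:sns:us-east-1:123456789012:replace-instance"))
```

A single evaluation period keeps the reaction fast. Raising `EvaluationPeriods` trades speed for fewer replacements triggered by a brief burst of errors.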
A statistical rule moves you into the realm of deciding what rate of false positives and false negatives you'll tolerate. In this case, that decision would rest on data from exactly two incidents, which is obviously a bit fraught.
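To make that trade-off concrete, here is a rough sketch. It assumes, purely for illustration, that errors per window follow a Poisson distribution with one rate when the instance is healthy and a higher rate during an incident. For the rule "replace when errors >= threshold" it computes the per-window false-positive probability (healthy, but the rule fires) and false-negative probability (broken, but the rule stays quiet). The rates in the usage block are made up. With only two incidents, the incident rate in particular would be a very shaky estimate.

```python
import math


def poisson_tail(lam: float, k: int) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def error_rates(threshold: int, baseline: float, incident: float) -> tuple:
    """Per-window (false positive, false negative) probabilities for the
    rule "replace the instance when errors >= threshold"."""
    false_positive = poisson_tail(baseline, threshold)         # healthy, but fires
    false_negative = 1.0 - poisson_tail(incident, threshold)   # broken, but quiet
    return false_positive, false_negative


if __name__ == "__main__":
    # Illustrative rates only: 0.5 errors/window when healthy, 4 when broken.
    for k in range(1, 8):
        fp, fn = error_rates(k, baseline=0.5, incident=4.0)
        print(f"threshold={k}: false positive={fp:.4f}, false negative={fn:.4f}")
```

Raising the threshold lowers the false-positive probability and raises the false-negative one. Where to stop depends on whether a needless instance replacement or a missed incident costs more.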