
"Expect bad deploys: Bad changes will get out, but that's okay. You just need to detect this quickly, and be able to roll back quickly."

That's an amazing statement to me. I've always worked in smaller environments where we roll up many changes and try to deploy them perfectly. The penalty for bad changes has been high. This is a really new way of thinking.

It's an exciting way of thinking, but I'm not sure I love it. I wonder how well "sometimes we break things" scales with users of smaller services. I guess the flip side is that "we often roll out cool new things" definitely is desirable to users of small services.




As long as you're not in the spacefaring, automotive, banking/insurance, or medical industries, it's probably acceptable to have some downtime and bugs - nobody will die or have their livelihood destroyed by it.

Given this, your confidence threshold for a release isn't approaching 100%; it's some "good enough" value, past which testing for the next 1% would cost twice the testing you're already doing and is "not worth it". As you burn through some sort of error/downtime budget, you'll adjust that level of confidence, since you're hitting more problems and spending more time responding to them.
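
A minimal sketch of the error-budget idea, assuming a 99.9% monthly availability target; the numbers and function name are illustrative, not from the comment above:

    # Error-budget sketch: how much downtime is still "affordable" this month.
    # The 99.9% target and all numbers are illustrative assumptions.

    MINUTES_PER_MONTH = 30 * 24 * 60  # ~43,200

    def remaining_error_budget(slo_target: float, downtime_minutes: float) -> float:
        """Return the minutes of downtime still allowed before breaching the SLO."""
        allowed = (1.0 - slo_target) * MINUTES_PER_MONTH
        return allowed - downtime_minutes

    budget_left = remaining_error_budget(slo_target=0.999, downtime_minutes=12)
    print(f"Budget left this month: {budget_left:.1f} minutes")
    # A team might freeze risky deploys once budget_left approaches zero,
    # and loosen up again when the budget resets.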

Continuous deployment's upsides are confidence in the release process (since you do it so often) and some assurance that you'll be able to find the problem reasonably fast (since you only have to look through a small number of changes). You'll have fewer big problems and more small ones. There are definitely cases where ten 5-minute downtimes are worse than one hour-long downtime, but usually the former is better.


The point here is that bad changes get out no matter how often or rarely you deploy. Everywhere has deployed buggy code. Rapid deploys simply decrease the amount of time it takes to recover from that.


" I wonder how well "sometimes we break things" scales with users of smaller services" ---

Writing "business software" I have noticed that this doesn't scale at all. I mean when you have a couple of thousand people depending on the software for work bugs are really not tolerated that well.

It's probably different if you have hundreds of servers and can detect bugs when deploying to just one of them, so a bad change only affects a small percentage of users and you can roll back and try again. But if you have a single installation and you break it all the time with your commits, then it probably doesn't work so well. And for the majority of software you really do not need "webscale" installations with millions of "heroku boxen" or droplets etc. Sure, have a few for redundancy, but that really doesn't help with this "deploy master on each commit" type of deal.
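
A rough sketch of that "deploy to a few hosts first, roll back if it breaks" pattern, assuming hypothetical deploy_to, healthy, and rollback helpers that stand in for whatever your deployment tooling actually provides:

    import time

    # Hypothetical stubs; replace with your real deploy tooling.
    def deploy_to(hosts, version):
        print(f"deploying {version} to {len(hosts)} host(s)")

    def healthy(host):
        return True  # replace with a real health / error-rate check

    def rollback(hosts):
        print(f"rolling back {len(hosts)} host(s)")

    def canary_rollout(hosts, version, canary_fraction=0.05, soak_seconds=300):
        """Deploy to a small slice first; roll back if the canaries look unhealthy."""
        canary_count = max(1, int(len(hosts) * canary_fraction))
        canary_hosts, rest = hosts[:canary_count], hosts[canary_count:]

        deploy_to(canary_hosts, version)
        time.sleep(soak_seconds)  # let real traffic hit the canary hosts

        if not all(healthy(h) for h in canary_hosts):
            rollback(canary_hosts)  # only a small share of users was affected
            return False

        deploy_to(rest, version)  # canaries look good, roll out to the rest
        return True

With a single installation there is no "small slice" to deploy to first, which is the poster's point: the whole fleet is the canary.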


That depends on the service. If you can afford outages, that may be fair game. But if you have a high-traffic service running on hundreds or thousands of hosts, you can't take them all offline at once. Deploys can take hours, and so can rollbacks. In that situation, with high SLA requirements, you can't really "expect" bad deploys.



