
"Expect bad deploys: Bad changes will get out, but that's okay. You just need to detect this quickly, and be able to roll back quickly."

That's an amazing statement to me. I've always worked in smaller environments where we roll up many changes and try to deploy them perfectly. The penalty for bad changes has been high. This is a really new way of thinking.

It's an exciting way of thinking, but I'm not sure I love it. I wonder how well "sometimes we break things" scales with users of smaller services. I guess the flip side is that "we often roll out cool new things" definitely is desirable to users of small services.




As long as you're not in the spacefaring, automotive, banking/insurance, or medical industries, it's probably acceptable to have some downtime and bugs - nobody will die or have their livelihood destroyed by it.

Given this, your confidence threshold for a release isn't approaching 100%; it's some "good enough" value, past which testing for the next 1% would cost twice the testing you're already doing and is "not worth it". As you burn through some sort of error/downtime budget, you'll adjust that level of confidence, since you're hitting more problems and spending more time responding to them.
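
A minimal sketch of the error-budget idea, assuming a 99.9% monthly availability target; the numbers and function name are illustrative, not from the comment above:

    # Error-budget sketch: how much downtime is still "affordable" this month.
    # The 99.9% target and all numbers are illustrative assumptions.

    MINUTES_PER_MONTH = 30 * 24 * 60  # ~43,200

    def remaining_error_budget(slo_target: float, downtime_minutes: float) -> float:
        """Return the minutes of downtime still allowed before breaching the SLO."""
        allowed = (1.0 - slo_target) * MINUTES_PER_MONTH
        return allowed - downtime_minutes

    budget_left = remaining_error_budget(slo_target=0.999, downtime_minutes=12)
    print(f"Budget left this month: {budget_left:.1f} minutes")
    # A team might freeze risky deploys once budget_left approaches zero,
    # and loosen up again when the budget resets.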

Continuous deployment's upsides are confidence in the release process (since you do it so often) and some assurance that you'll be able to find the problem reasonably fast (since you only have to look through a small number of changes). You'll have fewer big problems and more small ones. There are definitely cases where ten 5-minute downtimes are worse than one hour-long downtime, but usually the former is better.


The point here is that bad changes get out no matter how often or rarely you deploy. Everywhere has deployed buggy code. Rapid deploys simply decrease the amount of time it takes to recover from that.


" I wonder how well "sometimes we break things" scales with users of smaller services" ---

Writing "business software" I have noticed that this doesn't scale at all. I mean when you have a couple of thousand people depending on the software for work bugs are really not tolerated that well.

It's probably different if you have hundreds of servers and can detect bugs when deploying to just one of them, so a bad change only affects a small percentage of users and you can roll back and try again. But if you have a single installation and you break it all the time with your commits, then it probably doesn't work so well. And for the majority of software you really do not need "webscale" installations with millions of "heroku boxen" or droplets etc. Sure, have a few for redundancy, but that really doesn't help with this "deploy master on each commit" type of deal.
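
A rough sketch of that "deploy to a few hosts first, roll back if it breaks" pattern, assuming hypothetical deploy_to, healthy, and rollback helpers that stand in for whatever your deployment tooling actually provides:

    import time

    # Hypothetical stubs; replace with your real deploy tooling.
    def deploy_to(hosts, version):
        print(f"deploying {version} to {len(hosts)} host(s)")

    def healthy(host):
        return True  # replace with a real health / error-rate check

    def rollback(hosts):
        print(f"rolling back {len(hosts)} host(s)")

    def canary_rollout(hosts, version, canary_fraction=0.05, soak_seconds=300):
        """Deploy to a small slice first; roll back if the canaries look unhealthy."""
        canary_count = max(1, int(len(hosts) * canary_fraction))
        canary_hosts, rest = hosts[:canary_count], hosts[canary_count:]

        deploy_to(canary_hosts, version)
        time.sleep(soak_seconds)  # let real traffic hit the canary hosts

        if not all(healthy(h) for h in canary_hosts):
            rollback(canary_hosts)  # only a small share of users was affected
            return False

        deploy_to(rest, version)  # canaries look good, roll out to the rest
        return True

With a single installation there is no "small slice" to deploy to first, which is the poster's point: the whole fleet is the canary.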


That depends on the service. If you can afford outages, that may be fair game. But if you have a high-traffic service running on hundreds or thousands of hosts, you can't take them all offline at once. Deploys can take hours, and so can rollbacks. In that situation, with high SLA requirements, you can't really "expect" bad deploys.



