Hmm, I don't see how that changes with longer/stricter deploy processes - unless you have some of the tooling in place that makes CD possible in the first place (automated checks, etc.)
I've certainly worked in places with very long and strict deploy processes that managed to mangle production data frequently. Even worse, because the deploy process was so strict and long the bad code managed to stay on production for much longer than 10 minutes (the deploy time mentioned in the article).
There's some vague notion out there that long deploy process == safe, but there's very little evidence to suggest that's the case. If anything, it seems much more dangerous because larger changesets are going out all at once.
It goes back to my original comment above: if you have the proper tooling (tests that must pass before a deploy gets the green light, blue/green deploys, canaries, automated datastore snapshots/point-in-time recovery, granular control of the deployment process), I think continuous deployment provides a great deal of value above what you've invested into the process. But that investment is critical if you've bought into CD. Otherwise, it's "deploy and pray".
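To make the gating concrete, here's a rough sketch of what I mean (the make targets and function names are placeholders I made up, not any particular CI system's API):

```python
# Hypothetical deploy gate: illustrates the ordering of the safety checks
# described above. Every command/target here is a placeholder.
import subprocess
import sys


def run_tests() -> bool:
    """Automated test suite; the deploy is blocked unless this passes."""
    return subprocess.run(["make", "test"]).returncode == 0


def snapshot_datastore() -> None:
    """Placeholder for a point-in-time snapshot taken before rollout."""
    subprocess.run(["make", "snapshot"], check=True)


def canary_healthy() -> bool:
    """Placeholder: check error rates/latency on the canary slice."""
    return subprocess.run(["make", "canary-check"]).returncode == 0


def deploy(target: str) -> None:
    subprocess.run(["make", "deploy", f"TARGET={target}"], check=True)


if __name__ == "__main__":
    if not run_tests():
        sys.exit("tests failed; refusing to deploy")
    snapshot_datastore()   # cheap insurance against data-mangling bugs
    deploy("canary")       # small slice of traffic first
    if not canary_healthy():
        sys.exit("canary unhealthy; recovery is a rollback plus snapshot restore")
    deploy("production")   # blue/green flip for the rest of the fleet
```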
Sure, and I guess my point is: If you haven't invested in those things, waterfall-esque deploy processes are just as bad and perhaps even worse because there's more chance for confounding changes to cause a nasty error.
The only reason waterfall-esque deploy processes work without those things is because companies often waste tons of people-hours on testing things out in the staging environment (which requires time, obviously).
The thing you're missing is that you're amortizing the cost. Yeah, it's typically prohibitive to run the manual testing on every CL. However, if you have any manual testing you need to run, then at some point you have to batch the changes & test them out together anyway. Automated tests don't necessarily solve this problem either: 1) Some automated tests can be time-consuming & so require batching of CLs to run too, 2) it's impossible to predict if you are going to catch all issues via automated testing, and 3) there's always things it's easier to test for manually.
When it comes to data integrity, I would think you need a structured mechanism (at least for larger teams that have a high cost for failure) for rolling back any given CL, whether by tracking writes, having a plan in place to recover from any given CL (e.g. nuking the data doesn't break things), being able to undo the bad writes, or just reverting to a snapshot. Without being careful here, CD-style development feels like lighting up a cigarette beside an O2 tank. Now for web development this is fine since it's not touching any databases directly. More generally it feels like a trickier thing to attempt everywhere across the industry.
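For the snapshot option, something like this rough sketch gives you a known recovery point per CL (the pg_dump/pg_restore commands are just illustrative; the real mechanism might be WAL-based point-in-time recovery or application-level write tracking, and restoring a full snapshot loses any good writes made after it):

```python
# Sketch: tie a datastore snapshot to each CL that goes out so any given
# change has a known recovery point. Commands and naming are illustrative only.
import subprocess
from datetime import datetime, timezone


def snapshot_for_cl(cl_id: str, database: str) -> str:
    """Take a snapshot named after the CL about to be deployed."""
    tag = f"{database}-{cl_id}-{datetime.now(timezone.utc):%Y%m%dT%H%M%SZ}"
    subprocess.run(
        ["pg_dump", "--format=custom", f"--file={tag}.dump", database],
        check=True,
    )
    return tag


def rollback_to(tag: str, database: str) -> None:
    """Restore the snapshot taken before the offending CL went out.

    Note: this discards writes made after the snapshot; a real plan would
    replay or reconcile them (the 'undo the bad writes' option above).
    """
    subprocess.run(
        ["pg_restore", "--clean", f"--dbname={database}", f"{tag}.dump"],
        check=True,
    )
```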
> However, if you have any manual testing you need to run, then at some point you have to batch the changes & test them out together anyway.
Wait, why is that? Manual testing should be reserved for workflows that can't be automatically tested (or at least, aren't yet).
I'm not sure I see why doing any amount of manual testing would necessitate manually testing everything.
> Some automated tests can be time-consuming & so require batching of CLs to run too
I'm not sure I see why this is a problem, and CD certainly doesn't require that only one changeset go live at a time.
> it's impossible to predict if you are going to catch all issues via automated testing
This is also true of manual testing.
> there's always things it's easier to test for manually.
I'd go further and say it's almost always easier to test manually, but the cost of an automated test is definitely amortized and you come out ahead at some point. That point is usually quicker than you think.
> I would think you need a structured mechanism...
This paragraph is entirely true of traditional deploys with long cadences as well. The need (or lack thereof) for very formal and structured mechanisms for rolling back deploys doesn't really have much to do with the frequency that you deploy.
> Now for web development this is fine since it's not touching any databases directly.
Maybe we're speaking about different things here, but the trope about web development is that it's basically a thin CRUD wrapper around a database, so I'm not sure this is true.
> Wait, why is that? Manual testing should be reserved for workflows that can't be automatically tested (or at least, aren't yet). I'm not sure I see why doing any amount of manual testing would necessitate manually testing everything.
I never said you need to manually test everything. This is about continuous deployment, where typically a push to master is the last step anyone does before the system deploys it live shortly thereafter. In the general case, however, how do you know in an automated fashion whether a given CL needs manual testing? If you have any manual testing then you can't just continuously deploy.
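As a rough illustration (purely hypothetical paths and names, not a real tool): the best you can usually do automatically is a heuristic like this, and the "hold" branch is exactly the part that breaks pure continuous deployment.

```python
# Hypothetical heuristic for deciding whether a CL can auto-deploy or has to
# wait for manual QA. The paths and the rule are made up for illustration.
from typing import Iterable

# Areas where we've decided automation alone isn't enough (e.g. billing,
# schema migrations, anything UX-facing that needs a human eye).
MANUAL_QA_PREFIXES = ("billing/", "migrations/", "ui/themes/")


def needs_manual_testing(changed_files: Iterable[str]) -> bool:
    return any(f.startswith(MANUAL_QA_PREFIXES) for f in changed_files)


if __name__ == "__main__":
    cl = ["api/handlers.py", "migrations/0042_add_index.sql"]
    if needs_manual_testing(cl):
        print("hold for the next manual QA batch")  # no longer pure CD
    else:
        print("safe to auto-deploy")
```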
> This is also true of manual testing.
I never opined that there should only be one or the other exclusively, so I don't know why you're building this strawman & arguing it throughout. A mix of automatic & manual testing is typically going to be more cost effective for a given quality bar (or vice-versa - for a given cost you're going to have a higher quality) because (good) manual testing involves humans who can spot problems that weren't considered in automation (you then obviously improve the automation if you can) or things automation can't give you feedback on (e.g. UX issues like colors not being friendly to color-blind users).
> The need (or lack thereof) for very formal and structured mechanisms for rolling back deploys doesn't really have much to do with the frequency that you deploy.
That just isn't true. If you're thorough with your automatic & manual testing you may establish a much greater degree of confidence that things won't go catastrophically wrong. You deploy a few times a year & you're done. Now of course you should always do continuous delivery, so that to the best of your ability you ensure, in an automated fashion, an extremely high quality bar for tip of tree at all times and are always able to kick off a release. Whether that translates into also deploying tip of tree frequently is a different question.

Just to make clear what the thesis of my post is: I was saying continuous deployment is not something that's generally applicable to every domain (continuous delivery is). If you want an example, consider a FW release for some IoT device. If you deployed FW updates all the time you'd be putting yourself in a risky scenario where a bug bricks your units (e.g. the OTA protocol breaks) & causes a giant monetary cost to your business (RMAs, potential lawsuits, etc). By having a formal manual release process where you perform manual validation to catch any bugs/oversights in your automation, you're paying some extra cost as insurance against a bad release.
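To sketch what I mean for the FW case (every name here is a made-up placeholder, not a real OTA service): continuous delivery always produces a canary-tested release candidate, but the full fleet rollout waits on explicit sign-off.

```python
# Illustrative staged-rollout gate for a firmware release. All functions and
# numbers are hypothetical; the point is where the manual gate sits.
from dataclasses import dataclass


@dataclass
class Release:
    version: str
    canary_fraction: float = 0.01   # ~1% of devices get the OTA first


def stage_ota(version: str, fraction: float) -> None:
    # Placeholder: offer the firmware image to `fraction` of the fleet.
    print(f"offering FW {version} to {fraction:.0%} of devices")


def fleet_healthy(version: str) -> bool:
    # Placeholder: check crash/brick/rollback telemetry from canary devices.
    return True


def rollout(release: Release, manual_sign_off: bool) -> str:
    stage_ota(release.version, release.canary_fraction)
    if not fleet_healthy(release.version):
        return "halt: canary devices unhealthy, candidate withdrawn"
    # Pure continuous deployment would go to 100% automatically here; the
    # manual gate is the insurance against bricking the whole fleet.
    if not manual_sign_off:
        return "release candidate staged; waiting on manual validation"
    stage_ota(release.version, 1.0)
    return "full rollout started"


if __name__ == "__main__":
    print(rollout(Release(version="2.4.1"), manual_sign_off=False))
```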
> Maybe we're speaking about different things here, but the trope about web development is that it's basically a thin CRUD wrapper around a database, so I'm not sure this is true.
The frontend code itself doesn't talk to the DB directly (& if it does, you're just asking for huge security problems). The middle/backend code takes frontend requests, validates permissions, sanitizes DB queries, talks to other micro-services, etc. Sometimes there are tightly coupled dependencies but I think that's rarer if you structure things correctly. Even FB, which can be seen as the prototypical example of moving in this direction, no longer pushes everything live. Things get pushed weekly, likely to give their QA teams time to manually validate changes for the week, do staged rollouts across their population to catch issues, etc.
In general I think as you scale up, continuous deployment degrades to continuous delivery because the risks in the system are higher; more users, more revenue, more employees & more SW complexity mean the cost of a problem goes up, as does the probability of a catastrophic problem occurring. When I worked at a startup continuous deployment was fine. When I worked at big companies I always pushed for continuous delivery, but continuous deployment would just be the wrong choice & irresponsible to our customers.