Hacker News
Adopting Feature Flag-Driven Releases (launchdarkly.com)
71 points by ZachNelsonSF on Aug 3, 2016 | hide | past | favorite | 39 comments



It's funny that this is just becoming a thing. I heard about flag-driven releases from somebody in DevOps at Facebook last year, and how integral they were to their release process there.

I ran my own SaaS as a self-taught, solo developer from about 2002-2008, and I used to insert feature flags all the time to test stuff out. Later, when I joined the corporate world and learned more industry standard best practices, I assumed feature flags were just another bad practice I needed to leave behind with my old PHP project. But come to think of it, it was very convenient to:

- test things on a small scale live,

- preview features with the biz folks,

- and it was almost always very smooth to remove the flag and make a feature live, because it was already part of the program's overall control flow.

Funny, the things you come upon out of ignorance versus the drive to do things a certain way because that's how everybody does them, and the good/innovative ideas that may fall through the cracks in the rush from the former to the latter.


Right. Feature flags are nothing new; I've been using them myself since forever.

Their use in CD, A/B testing, blue/green, or gradual rollout scenarios is very convenient though, which probably explains the popularity.


I work at a big EDA corporation, and we make good use of feature flags in our product. Typically, any new feature that needs more testing than usual will be included behind a gate for one or two major releases before being enabled by default. Then, certain customers who work closely with our support staff will be given "access" to the flag (typically, a beta license is needed before the flag can be enabled -- in some cases, the presence of a valid beta license will even enable new features automatically sans flag). This makes it very easy to roll out beta code: since it's included in the release, every customer already has the code, and beta testers just need to be told about a flag and/or given a beta license. We don't ever need to prepare a special beta release or anything like that.

There are other cases where feature flags come in handy, too. For example, sometimes our default behavior does not match our competitor's. For many customers, that's totally fine, it either doesn't matter or they just deal with it. Some customers, however, demand that they get the same behavior as the competitor's product for certain functionality. We can implement a feature that toggles a compatibility mode and stick it behind a flag. These are flags that will typically never be enabled by default, but they make some of our customers very happy. For certain strategic reasons, we obviously prefer that customers just get used to our behavior. (Not to mention that, from a purely technical standpoint, our behavior often makes much more sense anyway. But it's sorta like C: Rust might be technically superior, but C is so entrenched that it's not going to get replaced overnight.)

I think I'm in the opposite boat from you, because this is the first company I've worked for that made extensive use of feature flags. Rather than assuming it was some bad practice, I was actually delighted by just how convenient they make certain things. And the best part is that they don't seem to really have any drawbacks.


I dabbled with feature flags once. We had... issues... with our DBAs who, it seemed, couldn't make any changes to a database without taking it offline for a cold backup first. We floated the idea of feature flags as one way to perform certain live deployments. The ops guys refused outright, for reasons which stank of protectionism at the time.

Time passed. I was tasked with rewriting a fairly complex piece of code and I put the new code behind a feature flag. At the time, there was no facility in place to support feature flags so I had to introduce one as part of the deployment. It all went into a test environment but the deployment scripts didn't set the config to enable it. This was deliberate because I wanted to verify that we wouldn't change existing behaviour (i.e. break anything) by deploying with the flag disabled. The testers, however, were looking for the new functional behaviour and raised a defect.

- when the project manager found out, he claimed that this was responsible for a timescale slippage (the delivery into test was massively late, naturally)

- when my line manager found out, he told me I'd be fired if I tried a "stunt" like that again

- when the DBAs found out, they decreed that, henceforth, all database updates which resulted in a "functional change" would be subject to the full release process (i.e. cold backups).

So the experiment was a total failure. Or a total success. It depends on how you look at it; one amusing moment was when the PM came out of a meeting and, with some gravitas, suggested that we might have to rip out half the new code in an effort to salvage the timescales. When I demonstrated that we could achieve that in, oh, about 5 seconds by flipping the flag, the notion went away. The guy was a contractor and I suspected he was just trying to create more work to keep himself around.

To me, the biggest question with feature flags is managing the removal of dead code. In this case, it was never removed because the task of deleting it would trigger a full project release cycle and the company didn't see the point of spending the money.


We used feature flags to update "live" code, but using if-conditions based on hard-coded test user/client ids. I wrote everything necessary to respect a database flag instead, but that never made it out of a test db, whilst features kept being added behind "id"-based flags.

So now, to remove said flags, you just hunt through the code for the special id string and remove the "no" branches. Oh, and for God's sake don't accidentally make the wrong thing live.

Flawless.


We used feature flags a lot a few years ago when I worked for ehow/demand media. I don't really remember what exactly they controlled, but there was quite a number of them.


Feature flags should be used judiciously. They are better suited to frontend-type changes, enabling a button or something visible.

Flags to enable a web service call or the like are clutter most developers can live without.

I find architects perceive them as the perfect fix for release management. To me, they result in too much config bloat with minimal value.


I agree that feature flags, like any tool, should only be used where it makes sense. I disagree that they should only be used for visual changes though. Often, backend changes can have performance impacts that are only visible at scale, so being able to roll something out gradually can help there.

Also, a flag can be used as a permanent control to disable a part of the system (like an external call) if that part of the system is having problems. Imagine if you had a github activity feed integration, and you wanted to be able to disable that feature when a github outage caused that call to take a very long time. Being able to easily disable that feature without needing to change your code can be immensely useful.
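A rough sketch of that kill-switch pattern, with a hypothetical in-memory flag store (not the LaunchDarkly SDK; all names here are invented):

```python
# Hypothetical in-memory flag store; a real system would back this with
# a flag service or config store so it can be flipped without a deploy.
flags = {"github_activity_feed": True}

def fetch_github_events(user):
    # Stand-in for the real HTTP call to the GitHub API.
    return [{"type": "PushEvent", "user": user}]

def get_activity_feed(user):
    """Return the activity feed, or an empty fallback if the integration
    has been switched off via its flag (e.g. during a GitHub outage)."""
    if not flags.get("github_activity_feed", False):
        return []  # feature disabled: degrade gracefully, skip the external call
    return fetch_github_events(user)
```

Flipping `flags["github_activity_feed"] = False` then disables the integration for every caller without touching the code path itself.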

nb- I work for LaunchDarkly.


How exactly does the api work? Is every call to check a flag making a HTTP request to LD or does the ld client sync periodically?


No, checking a flag doesn't make any sort of network connection. The flag ruleset (which includes details like specific users that should get each variation, particular user attributes that should indicate a specific variation, or a percentage rollout) is streamed to the SDK over a long-lived background connection.

The SDK maintains these rulesets in memory, and when you need to evaluate a flag for a given user, the ruleset is evaluated. This way, there is no I/O at the time that a rule is evaluated, so there is no real performance impact. At the same time, because of the way we stream the rulesets to the SDK, any rule changes take effect immediately.
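In very rough outline, that evaluation path might look like this (a simplified sketch, not the actual SDK; the ruleset schema and method names are invented):

```python
import hashlib
import threading

def _bucket(user_key):
    # Deterministic 0-99 bucket derived from the user key,
    # used for percentage rollouts.
    return int(hashlib.sha1(user_key.encode()).hexdigest(), 16) % 100

class FlagClient:
    """Sketch of an SDK that keeps flag rulesets in memory.

    A long-lived streaming connection (elided here) pushes ruleset
    updates into `self.rulesets`; evaluation itself never does I/O.
    """

    def __init__(self):
        self.rulesets = {}          # flag key -> ruleset dict
        self.lock = threading.Lock()

    def on_stream_update(self, key, ruleset):
        # Called by the background streaming connection; changes take
        # effect immediately for subsequent evaluations.
        with self.lock:
            self.rulesets[key] = ruleset

    def variation(self, key, user, default=False):
        # Pure in-memory evaluation: no network call on the hot path.
        with self.lock:
            ruleset = self.rulesets.get(key)
        if ruleset is None:
            return default
        if user["key"] in ruleset.get("targets", []):
            return True             # user explicitly targeted
        return _bucket(user["key"]) < ruleset.get("rollout_pct", 0)
```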

https://launchdarkly.com/performance.html has some diagrams, etc, detailing this (sorry for the marketing fluff :)


Great, we will be looking at applying this in a new project of ours, and replacing our existing feature flag implementations in other systems if things work out.


What are the situations where you did not (or would not) opt to use feature flags?


I usually skip feature flags for things like bug fixes where the risk that my 'fix' is actually worse than the initial bug is very small, or changes where it cannot make sense to have both modes operating at once. An example of this might be a new billing scheme, where the old system billed by the item, and the new system billed by the pound. There might not be a technical reason why both cannot co-exist, but it would cause marketing/support headaches to have some customers billed one way, and some billed another.

One surprising use case that I have had great success with, though, is in database migrations. I detailed my approach here, if you are interested: http://blog.launchdarkly.com/feature-flagging-to-mitigate-ri...


Interesting, thanks for the link!


No problem! If you happen to be in the SF Bay area (and if you will forgive the self-promotion) I'm going to be presenting this at a meetup in a couple weeks: http://www.meetup.com/AWS-EASTBAY/events/232870012/


Strongly disagree. Working in operations, most days I don't care what the frontend looks like. But, for example, what if a developer wants to switch to a new recommendation algorithm? That's going to go behind a feature flag so we can ramp it up incrementally and switch back to the old algorithm if there are any problems.
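A minimal sketch of that kind of incremental ramp, assuming deterministic per-user bucketing (all names invented):

```python
import hashlib

def use_new_recommender(user_id, rollout_pct):
    """Deterministically bucket users so each one sticks to the same
    variant as the percentage ramps up (e.g. 5 -> 25 -> 100); setting
    rollout_pct back to 0 is an instant switch back to the old path."""
    bucket = int(hashlib.sha1(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_pct

def recommend(user_id, rollout_pct):
    if use_new_recommender(user_id, rollout_pct):
        return "new:" + user_id   # stand-in for the new algorithm
    return "old:" + user_id       # stand-in for the old algorithm
```

Because the bucket is derived from the user id rather than a random draw, a user who saw the new algorithm at 5% still sees it at 25%, which keeps metrics comparable across the ramp.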


Absolutely. In my experience, most unforeseen issues are related to performance at scale with unexpected data (the kind that seems to only ever crop up in production, and is really hard to design a load test for).

Performance issues in the browser are rare, and have nothing to do with the number of users using the application.


Can you elaborate? It sounds like you've had some bad experiences, would love to hear them. (I've had mostly good experiences.)


Sure. They create friction in a release process.

Flags set via config files require constant updates to said files, not a problem in a small team.

Work in a secure environment where config is controlled? You now need a process for controlling all these new flags.

Have hard dependencies on config (this flag MUST exist)? You tie your application to your config; both now need to be released at the same time.

Code clutter? Yes, and it is avoidable. Also, flags are not suitable for every type of feature, so you're in a halfway house anyway. Something which requires a DB change COULD be done via a flag, but it's not worth the effort.

My experiences are mostly around CD pipelines that start breaking because of complexity. Feature flags might suit some application deployments and that's great. In everything I've worked on, they can be replaced by incremental (non-breaking) updates. In that situation anyone can do the release, no deep-feature knowledge required.


It sounds like you are using a bad tool if feature flag creation or changes require editing a config file. We use Gargoyle at Eventbrite and there is a nice API and UI for making changes.


I've seen a few feature flags turn into ways that a company/developer can gloss over fixing a really hard problem.

A personal example: we were developing a new insurance billing workflow. Since it was a huge change, and very risky, and not all health insurance companies make it easy to test, we rolled it out practitioner by practitioner or customer by customer.

We got to about 95% 'enable_improved_billing_flow=True'. The remaining 5% were turning into a huge project to migrate due to the specifics of their practice / their primary insurance companies / multitude of business reasons (those ~5% were an outsized percentage of our revenue). In the end, we ended up having to maintain two billing flows.


I'm curious: How do you think this rollout would have worked without the use of feature flags, and how would that have been better?


When are those conditionals deleted? I assume after a few conditional features it can be hard to make sure all the possible combinations work well.

What strategy do you use to decide when to permanently enable a feature and simplify the codebase?


We have an automated system that emails you after a toggle has been at 100% for 30 days. After that, it emails you once a week. Soon you get bothered by it enough that you take it out. System works.
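The core of such a nag job could look something like this (the flag schema here is invented for the sketch; the actual emailing is elided):

```python
from datetime import datetime, timedelta

def flags_to_nag(flags, now, grace_days=30):
    """Return (flag, owner) pairs that have been at 100% for longer than
    the grace period, i.e. whose owners should get the reminder email.

    `flags` maps flag name -> {"rollout_pct": int,
                               "at_100_since": datetime,
                               "owner": str}  (schema invented).
    """
    return [(name, f["owner"])
            for name, f in flags.items()
            if f["rollout_pct"] == 100
            and now - f["at_100_since"] > timedelta(days=grace_days)]
```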


That's a great idea.


The team I joined uses feature flags like this a lot.

> When are those conditionals deleted?

For us, almost never. A few weeks after the feature flag becomes True everywhere, we just never even look at what would happen if the flag is False. That doesn't help with the confounding of multiple flags, but thankfully multiple flags usually don't collide-- they're all fairly specific changes.

> What strategy do you use to decide when to permanently enable a feature and simplify the codebase?

When I'm refactoring/cleaning-up other stuff.

The really big benefits I see for feature flags are:

- You can merge into master faster. Often my feature has added some utility function or refactored something or added some css that the other engineers could (and should) use. So it's great to have the code rejoined sooner.

- Our tester can test on prod, for the few situations where our QA server is not an exact enough replica of prod.

- It makes it really easy to show marketing/business/TheBoard new features that are _almost_ ready, but whose bugs weren't quite all fixed before boardmeeting/sprintend/etc.


>> When are those conditionals deleted?

> For us, almost never.

That's dangerous. I remember an incident in which a team accidentally emptied their production experiment config (which controls which features are on: during ramp-up as a whitelist of enabled users and/or a percentage in (0%, 100%); most experiments in the file had long since reached 100%). Suddenly all their experiments reverted to 0% and they were serving live traffic in a totally untested state. Some of these experiments had been on for literally years and now suddenly were off. The combination of experiments had never been tested. There had also been many changes to the code since most of these had last been disabled. As you might expect, this didn't go well.

(That incident could also be part of a long and sad cautionary tale about the dangers of global configuration files, btw.)

The best practice I've seen is for the configuration to specify an end date for each experiment. If it's still listed on that date, a ticket is automatically filed against the defined "owner" of the experiment. The ticket reminds the owner to finish the experiment, remove the conditionals from the code, do a release, wait until they're confident there will be no roll-back to prior to that release, and then remove the experiment from the config.
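The end-date check itself can be a small daily job; a sketch, assuming an invented config schema and ticket-filing callback:

```python
from datetime import date

def file_cleanup_tickets(config, today, file_ticket):
    """Daily job: for each experiment past its declared end date, file a
    ticket against the defined owner.  The config schema
    (name -> {"end_date": date, "owner": str}) and the `file_ticket`
    callback are invented for this sketch."""
    for name, exp in config.items():
        if today > exp["end_date"]:
            file_ticket(exp["owner"],
                        "Clean up experiment '%s' (ended %s)" % (name, exp["end_date"]))
```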


For temporary feature flags, we try to clean up as soon as possible. As part of the rollout plan for a new feature, we'll have defined what metrics we're trying to achieve, and once we're confident that we've met those marks, we clean up the flag.

We eagerly create "cleanup branches" and corresponding pull requests to remove a flag-- that front-loads the cleanup work and means we aren't context switching back to try and remember how to clean up a flag.

Also, we've found that in practice flags are usually independent-- testing permutations of flags shouldn't typically be necessary. People usually take an analogous shortcut with feature branches-- we test branches in isolation but don't usually re-test after merging.


If anyone is interested, I wrote a blog post detailing the cleanup branch approach: http://blog.launchdarkly.com/how-to-use-feature-flags-withou...


Check out this article the author wrote in May: http://blog.launchdarkly.com/how-to-use-feature-flags-withou...

The idea is you create a branch to delete those conditionals before the feature flag is activated. Once it's activated, you merge in that branch.


Funny thing is, I've worked with multiple (unrelated) teams who simply refuse to use branches. This keeps boggling my mind (I usually don't work with code directly, since I am a tester and focus on end-to-end verification and validation, not on the details and unit tests), especially the argument that gets used: "we want to move fast". On one occasion I could understand it, since they used SVN and externals, and each branching process somehow required 3 developers and 4-8 hours (I never bothered to drill down into the "why" behind those numbers, but I believe they mismanaged the sources and used externals where they should have been using just binary dependencies). Yet my current team uses git, and straight away decided that branching is too much hassle.

So yeah. Why?


I work at a company that uses feature flags heavily for releases. Even most refactors are done behind a flag.

It works surprisingly well, especially when you have 20 developers all trying to release different features at the same time: you then have a controlled way to isolate the enabling and monitoring of each new feature, one at a time.

It's generally expected that you clean up your flag conditions after things have been tested. And that you write code so removing the flag flow is trivial.


> Even most refactors are done behind a flag

How does this work without cloning vast swaths of code and saying:

    if (feature)
        loadModuleA();
    else
        loadModuleB();


On occasion you do that. But usually it's just a couple if statements in a couple places with very little code duplicated.

Also, again, it's expected that after it's tested you delete the old path. It's a short lived transitional situation.

Sometimes it's annoying. Overall it's safer when you're on a massive scale.


That would tie in cleanly with an IOC container.


I really like feature-flag releases, but they are not without disadvantages.

The one that comes to mind is complexity. Ten features means 2^10, about a thousand possible configurations of your application. Thirty features is 2^30: over a billion possible feature sets for your app.

Ideally, features are independent of each other and work regardless of the enabled/disabled state of the other features. But ideally code doesn't have bugs.

All in all, I think feature flags are great. I just prefer to limit them to the larger, more risky changes.


We do this, and we measure the use and health of the feature with a metrics library such as Coda Hale's.

I highly recommend using an easy-to-use metrics library to test feature flags.


I think feature flags are a smell if used for anything other than controlling which users see which output.

If I have a feature that is, say, a way to calculate a customer quote, turning on that feature without managing the deprecation is poor. Most "features" can be seen as services provided to other areas of the code base, micro-service style or not.

It's a nice idea but I worry


It's weird to see Google best practices continue to leak and become the standard.



