We've tried various schemes like this with the D compiler. The sweet spot seems to be the current design, where new and possibly buggy/breaking features are introduced with the -preview=feature compiler switch. If it proves out, and users have time to adapt, eventually it becomes the default, and a -revert=feature is there for people who need more time to adapt to it.
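A minimal sketch of the opt-in/opt-out lifecycle described above, written in Python for brevity. The Stage enum and is_enabled helper are hypothetical. They only mirror the -preview=feature / -revert=feature behaviour as this comment describes it, not dmd's actual switch handling.

```python
from enum import Enum


class Stage(Enum):
    PREVIEW = "preview"  # opt-in: off unless -preview=<name> is passed
    DEFAULT = "default"  # opt-out: on unless -revert=<name> is passed


def is_enabled(name: str, stage: Stage, args: list[str]) -> bool:
    """Hypothetical resolution of a feature's state from command-line switches."""
    if stage is Stage.PREVIEW:
        return f"-preview={name}" in args
    return f"-revert={name}" not in args


# Usage with an illustrative feature name:
print(is_enabled("feature", Stage.PREVIEW, []))                    # False
print(is_enabled("feature", Stage.PREVIEW, ["-preview=feature"]))  # True
print(is_enabled("feature", Stage.DEFAULT, []))                    # True
print(is_enabled("feature", Stage.DEFAULT, ["-revert=feature"]))   # False
```

The point of the two stages is that the same feature name moves from the first branch to the second once it has proven out, and only the default flips.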
This has so far been working great.
As John Carmack wrote, it enables us to be a little more fearless in introducing new capabilities and putting them on trial.
For example, my recent trial implementation of disallowing more than one mutable pointer to the same variable being passed to a function:
Additionally, but on a slightly different note, I can think of two other common situations where these equivalence validation techniques are useful.
When refactoring or doing performance work, it's very helpful to create two variant implementations, for instance the simple, obviously correct version and the new, optimized version. One can then send the output from a property-based tester or a fuzzer into both implementations and test: assert forall x . referenceVersion(x) == optimizedVersion(x). A good property-based tester or fuzzer will give one high confidence that behavior has been both understood and reproduced. Since changes to legacy code and bug fixes are among the primary causes of defect introduction, these testing techniques usually bring to one's attention, quickly, how fallible one is.
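A minimal sketch of that reference-vs-optimized check. A plain seeded random generator stands in for a real property-based tester (a library such as Hypothesis would add input shrinking); both sum-of-squares functions are hypothetical stand-ins for "simple" and "optimized" code.

```python
import random


def reference_sum_of_squares(xs: list[int]) -> int:
    # Simple, obviously correct version.
    total = 0
    for x in xs:
        total += x * x
    return total


def optimized_sum_of_squares(xs: list[int]) -> int:
    # Hypothetical "optimized" rewrite under test.
    return sum(x * x for x in xs)


def random_int_list(rng: random.Random) -> list[int]:
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]


def differential_test(ref, opt, gen, trials: int = 1000, seed: int = 0) -> None:
    """Assert forall x . ref(x) == opt(x) over randomly generated inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = gen(rng)
        expected, actual = ref(x), opt(x)
        assert expected == actual, f"mismatch on {x!r}: {expected} != {actual}"


differential_test(reference_sum_of_squares, optimized_sum_of_squares, random_int_list)
```

The seed keeps failures reproducible; printing the offending input in the assertion message is what makes a mismatch actionable.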
For non-code artifacts like configurations and build systems, where output is assembled or generated, the Unix diff utility can be useful. One assembles or generates the artifacts in two directories: one representative of the project before the change and the second after the change. Anything that shows up in the directory diff should correspond exactly to what one expected to see as a change. Since the full actions of package assembly and generation are often opaque, this technique provides assurance in parts of the system development process where there are often few other safety nets.
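On the command line, diff -r before/ after/ does this directly. Below is a rough Python equivalent, as a sketch only, for when you want the result inside a test or CI script. The function name diff_trees is hypothetical; it compares file contents byte by byte rather than trusting size/mtime.

```python
import filecmp
from pathlib import Path


def diff_trees(before: Path, after: Path) -> list[str]:
    """List files added, removed or changed between two generated-artifact trees."""
    old = {p.relative_to(before) for p in before.rglob("*") if p.is_file()}
    new = {p.relative_to(after) for p in after.rglob("*") if p.is_file()}
    changes = [f"removed: {p}" for p in old - new]
    changes += [f"added: {p}" for p in new - old]
    # shallow=False forces a content comparison instead of trusting os.stat() signatures.
    changes += [
        f"changed: {p}"
        for p in old & new
        if not filecmp.cmp(before / p, after / p, shallow=False)
    ]
    return sorted(changes)
```

Every line this returns should be a change you expected; anything else is a surprise worth investigating before shipping.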
> One can then send the output from a property based tester or a fuzzer into both implementations and test: assert forall x . referenceVersion(x) == optimizedVersion(x).
You can even do that in a production environment, assuming the "optimised version" is properly protected so it can't take down the entire system if there's an error in it.
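One way to "properly protect" the optimised version in production is shadow execution: callers always receive the reference result, the candidate runs alongside, and its exceptions and mismatches are only reported. This is a sketch under the assumption that both implementations are pure (running a side-effecting candidate twice in production would be unsafe); the names shadowed and report are hypothetical.

```python
from typing import Any, Callable


def shadowed(
    reference: Callable[..., Any],
    candidate: Callable[..., Any],
    report: Callable[[str], None] = print,
) -> Callable[..., Any]:
    """Return a function that serves the reference result while checking the candidate."""

    def call(*args: Any) -> Any:
        expected = reference(*args)
        try:
            actual = candidate(*args)
        except Exception as exc:
            # A crash in the candidate must never take down the caller.
            report(f"candidate raised {exc!r} for args={args!r}")
        else:
            if actual != expected:
                report(f"mismatch for args={args!r}: {expected!r} != {actual!r}")
        return expected

    return call
```

In a real service, report would feed a log or metrics pipeline, and the candidate might run on a sampled fraction of requests to bound the extra cost.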
Additionally, for performance work you can put your performance expectations in a test: run both the old and new version, assert the perf. difference you want in addition to the result being the same.
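A rough sketch of such a test: check the results agree, then assert a minimum speedup using a best-of-N timing (the minimum is less noisy than the mean). The duplicate-detection functions are hypothetical examples, and the threshold should be generous, since wall-clock timings on shared CI machines are noisy.

```python
import time


def best_time(fn, arg, repeats: int = 5) -> float:
    """Best-of-N wall-clock time for a single fn(arg) call."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


def assert_faster(old, new, arg, min_speedup: float) -> float:
    """Same answer AND at least min_speedup times faster, or the test fails."""
    assert old(arg) == new(arg), "implementations disagree"
    speedup = best_time(old, arg) / best_time(new, arg)
    assert speedup >= min_speedup, f"only {speedup:.1f}x faster, wanted {min_speedup}x"
    return speedup


def has_duplicate_old(xs):
    # Quadratic pairwise scan (hypothetical "old" version).
    return any(xs[i] == xs[j] for i in range(len(xs)) for j in range(i + 1, len(xs)))


def has_duplicate_new(xs):
    # Linear set-based version (hypothetical "new" version).
    return len(set(xs)) != len(xs)
```

For example, assert_faster(has_duplicate_old, has_duplicate_new, list(range(1000)), min_speedup=5.0) fails if the rewrite ever regresses to the old cost.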
I've seen a couple of comments here talking about feature flags and I don't think that's quite what Carmack is getting at here.
It seems he's talking more about refactoring/optimizing/debugging working code rather than adding new features. Then the old version of the code becomes a point of reference against which you test your new implementation.
Once you are very confident that your new implementation is no worse than the old implementation you have two choices.
1. Leave your old implementation around as a reference implementation, but mainly use your new implementation. This is useful if your old implementation is functional and readable but slow and your new implementation is less readable but faster. This helps other developers understand your new implementation.
2. Remove your old implementation altogether and just keep your new implementation in your codebase.
Either way the thing Carmack is advocating for is having an extended period of time where the two implementations coexist, even though one of the implementations is "redundant." He is explicitly advocating this despite the fact that you could in theory just retrieve the redundant implementation from version control and not have to "pollute" your codebase with a redundant implementation.
Yes. And the paragraph that most stood out to me was this:
> If the task you are working on can be expressed as a pure function that simply processes input parameters into a return structure, it is easy to switch it out for different implementations. If it is a system that maintains internal state or has multiple entry points, you have to be a bit more careful about switching it in and out. If it is a gnarly mess with lots of internal callouts to other systems to maintain parallel state changes, then you have some cleanup to do before trying a parallel implementation.
If you have clean interface boundaries, it's easy to have two versions of something coexisting. If you don't, you can't just change the component in question, you have to change the whole program, which means you can't go this route. It's not just about deciding to do two implementations of something, it's about writing your whole program in ways that facilitate this style of development (no side effects, etc.)
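A small illustration of what a clean boundary buys you. The Tokenizer protocol and both classes are hypothetical; the caller (word_count) depends only on the signature, so either implementation can be dropped in, and the two can coexist for side-by-side comparison. Both are side-effect free, which is what makes the swap trivial.

```python
from typing import Protocol


class Tokenizer(Protocol):
    """The interface boundary: callers depend only on this signature."""

    def tokenize(self, text: str) -> list[str]: ...


class SplitTokenizer:
    # Old implementation.
    def tokenize(self, text: str) -> list[str]:
        return text.split()


class ScanTokenizer:
    # New hand-rolled implementation that must match the old one's behavior.
    def tokenize(self, text: str) -> list[str]:
        tokens: list[str] = []
        current: list[str] = []
        for ch in text:
            if ch.isspace():
                if current:
                    tokens.append("".join(current))
                    current = []
            else:
                current.append(ch)
        if current:
            tokens.append("".join(current))
        return tokens


def word_count(tok: Tokenizer, text: str) -> int:
    # Caller code is identical whichever implementation is passed in.
    return len(tok.tokenize(text))
```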
I think Carmack is basically using feature flags - but that's just the platonic ideal polished version of this, not the fundamental goal.
The less AAA/polished version is to do this in code instead of in data. Maybe you comment out old.h in favor of new.h, or have a #if 0 ... #else ... #endif chunk in your .cpp files. This can be a lot quicker than adding conditional logic to hundreds of call sites, or implementing hundreds of forwarding stubs, if you don't have a nice dynamic module boundary already. You can make a define, two build variants, and side-by-side compare things that way too. Not quite as nice as runtime switching, but potentially a lot easier, if you've kept things from spilling into common headers too much and have reasonable link times.
Another alternative is to do this on your second computer, carefully synced to the same code/config/debug settings/feature flags/???, and hopefully with the same specs. This manages to work even if things spilled out into headers, but can require some painful coordination to keep in sync.
I don't think so. I think he's suggesting something fundamentally different. Note that he specifically calls out keeping around a reference implementation indefinitely that he implies will never be called by a code path (but is continuously maintained) as a subcase. That's kind of hard to square with feature flags.
There's nothing fundamentally temporary about feature flags, even if that's the popular use case.
You can opt into using the reference implementation of a renderer with a flag.
You can opt into using a debug implementation of an allocator with a flag.
You can opt into your experimental hopefully faster renderer with a flag.
There's very little that's fundamentally different in these cases - maybe some get removed from some builds at compile time - and maybe your plans for the future are a little different.
But if plans change - perhaps maintenance of the reference implementation ends up being more costly and less beneficial than expected - you may very well end up deprecating, and then deleting, your reference implementation. To an outside observer not privy to your plans, this is identical to classic "I'm making my new work opt-in, then opt-out, then the only choice as development progresses" temporary feature flags.
EDIT: Heck, https://martinfowler.com/articles/feature-toggles.html even lists plenty of long-term and even static build-time configuration stuff under the category of feature toggles, so I really can't see what's hard to square, unless you're using a much narrower definition of "feature toggles" than is being used there.
I think, first off, yes, I do have a narrower definition of feature toggles: if you are using a configuration file to perform arbitrary swapping of code modules in and out, I would argue you've turned feature toggles into a synonym for a dependency injection framework, especially if you're allowing two implementations to run at the same time (e.g. to compare their results for correctness and fail with an assert otherwise).
But arguing over definitions is not particularly enlightening or fun, so let's take feature toggles in their full generality as DI frameworks.
I still don't think that's what Carmack is advocating for.
Indeed Carmack specifically calls out:
> Code fearlessly on the copy, while the original remains fully functional and unmolested. It is often tempting to shortcut this by passing in some kind of option flag to existing code, rather than enabling a full parallel implementation. It is a grey area, but I have been tending to find the extra path complexity with the flag approach often leads to messing up both versions as you work, and you usually compromise both implementations to some degree.
Now Carmack isn't talking about feature flags quite in the same way you are (I think the short cut he's talking about here is not doing a full re-implementation and interleaving parts of the old code with new code at runtime).
Nonetheless in one important way he is talking about feature flags in the more general case, which is that his approach should lead to ideally no change to your original codebase (the only exception he lists is if your original code relies heavily on global state). This is not true for feature flags (or DI). You need to write new code just to support it. Indeed your link calls this out as the "carrying cost" of feature flags because of the additional "abstractions or conditional logic" they require.
Hence feature flags are not the AAA form of what Carmack is talking about here (although they are one way to implement the particular functionality Carmack is getting at, if not necessarily his philosophy of doing no violence to the original code). There are several other candidates, but the one I personally gravitate towards is hot code reloading. You write your separate code, then hot-swap it in, and hot-swap your old code back as necessary (either for educational purposes or for testing purposes). You don't need to decide up-front which parts of your code need to be designed around feature flags; it's all hot-swappable. You don't need to write extra logic to enable feature flags; the hot-swapping makes it all come for free. You don't even need to shut down your process and restart it to read a new configuration!
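Python's importlib.reload is a convenient stand-in to show the idea; it is not the native (e.g. DLL-swapping) hot reloading a C++ engine would use. This sketch writes a throwaway module (the name hot_impl is hypothetical), imports it, rewrites it on disk, and reloads it without restarting the process. One caveat: references taken before the reload (e.g. from hot_impl import render) keep pointing at the old code.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_hot_swap() -> list[str]:
    """Import a module, rewrite its source, reload it in-process; return both results."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        mod_path = Path(tmp) / "hot_impl.py"  # hypothetical module name
        sys.path.insert(0, tmp)
        try:
            mod_path.write_text("def render():\n    return 'reference'\n")
            importlib.invalidate_caches()
            mod = importlib.import_module("hot_impl")
            results.append(mod.render())
            # Different source length, so a stale cached .pyc cannot be reused.
            mod_path.write_text("def render():\n    return 'experimental'\n")
            mod = importlib.reload(mod)
            results.append(mod.render())
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("hot_impl", None)
    return results
```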
> Rolling back code and rebuilding to run a test is a pain, and you aren’t going to do it very often, even if you have a suspicion that things aren’t working quite as well in a particular case you hadn’t considered during the rewrite.
> What I try to do nowadays is to implement new ideas in parallel with the old ones, rather than mutating the existing code. This allows easy and honest comparison between them, and makes it trivial to go back to the old reliable path when the spiffy new one starts showing flaws. The difference between changing a console variable to get a different behavior versus running an old exe, let alone reverting code changes and rebuilding, is significant.
This is great advice, always being able to compare and easily switch between parallel implementations is key to maintaining systems long into the future and not doing the v2 rewrites every 6 months.
Creating a solid system that doesn't have breaking changes, but new paths/flows to use at will is key to debugging between the two and maintaining a live app/game/system/infrastructure.
Parallel implementations sometimes take more work and more care with signatures and surfaces/facades in the app, but the benefits (easier debugging, comparing multiple implementations, and easing into new systems when they are fully ready rather than rushed) are lessons learned through pain in the trenches by people who ship.
I’m working daily on a pretty large, gnarly, legacy monolith, and have consequently been thinking a lot about approaches to refactoring, when I came across a (Ruby) library called Suture [1].
It allows you to perform black-box testing, recording and replaying test data (gathered from production).
From there you can refactor your code and even have the library run both new/old codepaths in production, raising errors if there is a mismatch.
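The record/replay idea, sketched in Python since Suture itself is Ruby and its actual API is not reproduced here. The record and replay helpers are hypothetical: calls to the legacy function are captured (in memory here; a real tool would persist them), then replayed against the refactored code, collecting any mismatches.

```python
from typing import Any, Callable

Recording = list[tuple[tuple, Any]]


def record(fn: Callable[..., Any], log: Recording) -> Callable[..., Any]:
    """Wrap the legacy function so every call's arguments and result are captured."""

    def wrapper(*args: Any) -> Any:
        result = fn(*args)
        log.append((args, result))
        return result

    return wrapper


def replay(log: Recording, candidate: Callable[..., Any]) -> list[str]:
    """Re-run recorded calls against the refactored code; return any mismatches."""
    failures = []
    for args, expected in log:
        actual = candidate(*args)
        if actual != expected:
            failures.append(f"{args!r}: expected {expected!r}, got {actual!r}")
    return failures
```

An empty list from replay means the refactor reproduced every recorded behavior, which is only as strong as the coverage of the recorded traffic.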
To your point about constant rewrites, I think using a library like this while continuously refactoring existing code is a pretty exciting idea.
Too bad I need a Java version (maybe a good idea for a side-project).
Very appealing, but one problem I've found with this general idea is equivalent but non-identical results. A simple example is a serialized set: different orderings differ textually, but are equivalent. You can get more complex ones, such as the ASTs for ax+bx and (a+b)x.
Such cases should be pretty rare, but they come up for me all the time.
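One common workaround is to compare through an equivalence-aware check instead of ==. This sketch shows two hypothetical checkers: one normalizes a JSON-serialized set before comparing, and one compares small arithmetic ASTs by evaluating both at random integer points (randomized polynomial identity testing). With the second, a difference is definitive, while agreement is only strong probabilistic evidence.

```python
import json
import random


def same_serialized_set(a: str, b: str) -> bool:
    """Two JSON arrays encoding sets are equivalent if they hold the same elements."""
    return set(json.loads(a)) == set(json.loads(b))


def evaluate(expr, env: dict[str, int]) -> int:
    # expr is a variable name or an (op, lhs, rhs) tuple with op in {"+", "*"}.
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    left, right = evaluate(lhs, env), evaluate(rhs, env)
    return left + right if op == "+" else left * right


def probably_equivalent(e1, e2, variables: list[str], trials: int = 20, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        env = {v: rng.randint(-10**6, 10**6) for v in variables}
        if evaluate(e1, env) != evaluate(e2, env):
            return False  # a counterexample proves the expressions differ
    return True
```

For instance, ("+", ("*", "a", "x"), ("*", "b", "x")) and ("*", ("+", "a", "b"), "x") are reported as equivalent even though their trees differ.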
Not exactly what he's discussing, but similar in concept: I've noticed that if you have a piece of code that needs to be split into two separate pieces and doesn't have well-defined interfaces, it's often good to duplicate the code into two places, then slowly cut away at the redundant parts until the pieces have diverged enough to fulfill different roles.
It's kinda like surgery. You can't just rip open your patient, take out all their organs, then put them back after. You need to keep the patient alive. If you refactor but end up stuck halfway with a bunch of broken code, you either have to heroically finish the refactor and have it introduce minimal regressions and minimal issues or, more probably, you'll run out of time/people will get impatient and the refactor will be abandoned.
I personally find it more akin to sculpting or drawing. I have this kind of “holistic” approach to development where I have a rough yet clear idea of what I set out to do, so I start with the rough lines, then brush it up with increasing detail, chiselling away the unneeded parts and redrawing details over my initial sketch while keeping its overall spirit, yet correcting the details and relative positions I obviously could not get right the first time.
I call it “holistic” because instead of focusing on details and interfaces first, the possible problems kind of serendipitously find obvious solutions, progressively emerging from a bigger whole as I move forward. It feels like dropping a Rubik’s Cube from chest height and having it solve itself by the time it hits the ground.
See how the picture “emerges” in this timelapse[0]. The overall thing is always present, and things get increasingly precise as the artist progresses, adjusting details against the overall design. What you don’t see in the timelapse is the artist zooming in and out every other second to make sure the emerging details fit the overall piece, like you’d zoom in and out of a fractal where everything influences the rest at every level (but note that this does not mean coupling! See how the boat gets scrapped at a fairly advanced point and mostly redrawn to fit the overall composition better). The good thing is you can basically stop at any time and still have a working system, one that will be very easy to improve over time.
This is also how I approach an existing codebase, looking at the overall composition, and will often resort to a parallel implementation of my own, however rough yet capturing its essence, to understand it better. Refactoring the original piece is made much easier then, and depending on its quality, possibly incrementally rewritten in a Ship of Theseus way to gradually eat the original away and make it converge towards a future-proof system.
Is 2018 the date the repost was published, or the date Carmack originally published it? My first thought on reading
> Rolling back code and rebuilding to run a test is a pain
was “bisecting with distributed version control” isn’t that hard. The rest of the article doesn’t definitively place it in any era. If I’m reading this correctly, he’s essentially advocating for a/b(/c/d ...) testing and feature flags?
So I'm interested to know when this idea occurred to him, and to read this HN discussion.
Regardless, the idea (“competing” or alternate implementations in the same build) is compelling.
>If I’m reading this correctly, he’s essentially advocating for a/b(/c/d ...) testing and feature flags?
Feature flags and A/B testing are used more for testing new features or functionality. This is directed at evaluating completely different implementations while trying to keep the same functionality. Particularly regarding performance, he touches on the fact that if not all features are implemented in the new version, you can’t trust the metrics.
This is somewhat like how bridges are built. You keep the old one open while you build the new one alongside. I'm sure there are better analogies, but the idea seems as old as time.
> It is often tempting to shortcut this by passing in some kind of option flag to existing code, rather than enabling a full parallel implementation.
I've fallen into this trap, with a horrible mess of conditions.
But you still need some kind of flag, right? To switch which implementation to run. He's talking about modules, not entirely separate standalone programs, so I guess he means the flag is outside the implementations chosen between (and never passed into any of them).
EDIT He's using console/environment flags, so it just needs a little code to read flags, and switch implementations.
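A minimal sketch of that "little code": the flag is read outside both implementations, on every frame, so flipping it takes effect on the next frame without a restart or rebuild. The cvars dict is a hypothetical stand-in for a game console's variable registry, and both step functions are illustrative.

```python
# Hypothetical console-variable registry; a real engine would let a developer
# console edit these while the game is running.
cvars = {"sim_impl": "old"}


def step_old(positions, velocities, dt):
    # Existing, trusted implementation.
    return [p + v * dt for p, v in zip(positions, velocities)]


def step_new(positions, velocities, dt):
    # Hypothetical rewrite under evaluation; it never sees the flag.
    out = []
    for i in range(min(len(positions), len(velocities))):
        out.append(positions[i] + velocities[i] * dt)
    return out


STEP_IMPLS = {"old": step_old, "new": step_new}


def simulate_frame(positions, velocities, dt):
    # The only place that knows about the switch.
    return STEP_IMPLS[cvars["sim_impl"]](positions, velocities, dt)
```

Keeping the lookup in one dispatch function, rather than threading an option through the implementations, is what avoids the "horrible mess of conditions" described above.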
> The difference between changing a console variable to get a different behavior versus running an old exe, let alone reverting code changes and rebuilding, is significant.
Write templates or interfaces. Integrate existing code against those, but link the correct module implementation to a dedicated binary for each approach.
C++ templates and shared objects or Dagger multibinding in Java are good ways to do this.
He’s arguing for changing the behaviour via an environment variable, and not recompiling/linking at all:
> The difference between changing a console variable to get a different behavior versus running an old exe, let alone reverting code changes and rebuilding, is significant.
I’ve not tried his way, but he’s pretty explicit about what The Way is.
The context here is a game loop. A console variable, in Carmack's games, wouldn't require a restart, never mind rebuilding a module. Switching the variable for the renderer (or some subset thereof) would replace the renderer for the next frame, so the switching time between codebases is measured in milliseconds. That makes for easy visual diffs.
At the expense of adding a hacky conditional to your execution path. Fine for a one off, but not for any larger workload.
Instead, bind different modules to different binary build targets. If you're smart about your module size, compilation time is a non-issue. It will be way more maintainable. And by doing this, you force yourself to modularize your code in an extensible way instead of placing flag hacks everywhere.
> I disagree then. No one does this anymore. The big tech companies certainly don't.
In a way they do. Game/engine development is better suited to this, but it happens in all software, especially widely successful software that must iterate while existing implementations remain available alongside the new one.
A big-tech version might be old versus new Reddit, using the same data store to verify how things work and allowing people to switch over slowly and on demand with a URL switch. Digg famously did not do this and alienated everyone.
Or when Facebook or others swap in a new version of their API, the old versions keep running for a time, selected by a version switch, and so do the apps and libraries built to integrate with them.
Or some A/B testing in terms of software flow or usability/presentation.
Or in Unity, for instance, the old GUI remained available alongside the new Unity UI, allowing people to switch. The same happened with the move from the legacy animation system to the Mecanim animation system, and with their particle systems, where they had two. They have to roll in all new features like this over time, now that so much is built on the engine.
When you use Unity, an example of parallel implementation might be which particle system you use at runtime: maybe both are integrated and you flip between the two to see what looks or works best. Or flipping between legacy animation and Mecanim. We have the ability to switch between UI libraries, particle systems, and animation libraries on the fly because of all the different games/implementations we have to support; it is needed, and all of them need baseline support across every app/game. Same with utilities: they can flip between these systems at runtime to check differences.
Doing parallel implementations can lead to cleaner internals to more easily plug in, and it can prevent the 'version 2' disease of hard cutting over legacy that ends up missing a bunch of features.
This is why I’m curious about when Carmack published this. TFA is dated 2018, but says it’s republishing Carmack’s before it’s lost. The date could provide some interesting context.
This is of course good advice. It's strange that anyone who is more than the greenest professional developer wouldn't do this. Experimental feature flags is basic software development.
>Experimental feature flags is basic software development.
No, it is not basic or even recommended. Experimental feature flags should be used sparingly, especially those that take effect at runtime.
The worst codebases I have worked in were littered with runtime feature flags all over the place, which rendered many tests worthless by creating a combinatorial explosion of runtime complexity.
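To put numbers on the "combinatorial explosion": with n independent boolean runtime flags, covering every combination means 2^n configurations. A tiny sketch that enumerates them (flag_configurations is a hypothetical helper) makes the growth concrete: 3 flags give 8 configurations, 10 flags give 1024.

```python
from itertools import product


def flag_configurations(flags: list[str]) -> list[dict[str, bool]]:
    """Every on/off combination of independent boolean runtime flags."""
    return [dict(zip(flags, values)) for values in product([False, True], repeat=len(flags))]
```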
It is the greenest of developers who use experimental flags liberally.
When I think of some people I worked with who wouldn't grok this, other than the usual types who tend to overestimate maintenance costs carte blanche, it's usually because they confuse API with implementation. This seems to be an increasing problem - people increasingly don't seem to appreciate that you can do massive changes underneath and keep a stable API.
For example let's say a component needs a rewrite. They would scrap the function signatures for the old one, even if the problem being solved and expectations from a callee perspective haven't changed and they would apply equally well to the new thing.
And they would pretty much never use an "interface" in languages that have them, or a wrapper, everything is a direct call into their explicit dependency.
Can be extended to things like web services. Every rewrite renames identical parameters for no particular reason.
Yes, it's a very pragmatic way of improving a running system that you shouldn't break. Copy the old code to a new function/module/class. Keep the old code path in place but add a little feature flag or simply comment out the line that calls the old path and replace it with the new code path. Anything that allows you to easily switch between the old and new. Start fixing it. When everything is done, clean up by removing the old code.
This is a lighter weight version of creating a feature branch and attempting to keep that up to date or doing AB testing. That kind of thing typically involves a lot of project bureaucracy which can make a change more controversial. This way, until you are ready to enable the new path, there is no risk and you can keep on committing/merging changes. The only downside is temporary code clutter.
I've been developing like this for the past few years now but at my previous jobs people still used feature branches, so there are definitely other styles out there.
Seems like it would be useful for determining whether a bug is due to the new code or not, because you can instantly switch back and forth on the fly if needed.
“Of course, to be honest, the consequences usually fell on a more junior programmer who had to deal with an irate developer that had something unexpectedly stop working when I tore up the code to make it “better”.”
Ye people like you kinda suck.
Unfortunately, you’re just now learning that maybe the new implementation isn’t “better” but that it’s just...new.
And now it’s slowed down the team too. I know it’s John Carmack, but still it’s not very collaborative nor empathetic towards your coworkers to plow through code like you’re the only one working on it.
That's literally his job, though. His job is to innovate radically. The job of the junior programmer is to do more routine tidy-up. Innovation creates temporary inconvenience, that's part of its cost.
https://github.com/dlang/dmd/pull/10249
It's hard to tell in advance what effect this will have on the existing D user code base.