Cute little story, but the moral the author takes away from it is actually wrong....

hugh3 · on Oct 21, 2011

Another classic story which illustrates your point: the "Magic/More Magic" switch.

http://catb.org/jargon/html/magic-story.html

Closer examination revealed that the switch had only one wire running to it! The other end of the wire did disappear into the maze of wires inside the computer, but it's a basic fact of electricity that a switch can't do anything unless there are two wires connected to it. This switch had a wire connected on one side and no wire on its other side.

It was clear that this switch was someone's idea of a silly joke. Convinced by our reasoning that the switch was inoperative, we flipped it. The computer instantly crashed.

mechanical_fish · on Oct 21, 2011

Before you remove the onions, make sure you understand why they were added in the first place.

Unfortunately, while this is correct, it is nowhere near strong enough.

Just because you know the initial hypothesis that drove someone to throw the onions in, doesn't mean you know what the onions are actually doing. The onions are interacting with all the other ingredients. Onions are more complicated than you think.

Before you remove the onions you must understand what they are actually doing in your recipe, not merely what you think they are doing... or you must be prepared to find out. Because even very tiny systems are generally too complicated for you to fully understand, you usually have to settle for science: You must have good enough test coverage to detect any breakage as soon as possible after the onions go out, so that you may frantically throw the onions back in and then go back to the drawing board.

And the longer the onions have been in the recipe, the longer the recipe has evolved in the presence of onions. Take the onions out and all the other changes that have happened since the onions went in might have to be tweaked. Oh, the subtle things you could find.

Here is what can happen: You'll take out the onions, and then five years later your customers will complain that your company's red paint has started peeling, ten years ahead of the fifteen-year warranty period. Oh, no. You call an all-hands meeting of your QA team. They spend months doing expensive research, while your product engineers fly from city to city trying to pacify the increasingly irate customers. Eventually the scientists find that the paint can be fixed by mixing in some kind of protein or other. But why did the paint used to work? Oops, the onions had protein in them! Sure enough, if you throw in some special Essence Of Onion Protein compound the problem goes away. Great. Now all you have to do is issue a field recall of about six years' worth of your product.

You will bring this story to your CEO and the shareholders, and they will ask why you didn't just pay for some onions? Or, if removing the onions was critically important, why you didn't run a small set of onion-free test batches on a parallel process line and then put those batches through lots of tests, both in the lab and in the field, before making the change?
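The "parallel process line" idea can be sketched in code as running the old and the new recipe on the same inputs and diffing the results before switching over. This is only a minimal illustration: both recipe functions, the `protein` field, and `compare_recipes` are hypothetical names invented for the sketch, not anything from a real system.

```python
from typing import Callable, Iterable


def make_paint_with_onions(batch: int) -> dict:
    # Hypothetical legacy recipe: the onions happen to contribute protein.
    return {"batch": batch, "color": "red", "protein": 1}


def make_paint_without_onions(batch: int) -> dict:
    # Hypothetical candidate recipe with the onions removed.
    return {"batch": batch, "color": "red", "protein": 0}


def compare_recipes(
    old: Callable[[int], dict],
    new: Callable[[int], dict],
    batches: Iterable[int],
) -> list:
    """Run both recipes on the same batches and collect every difference."""
    differences = []
    for batch in batches:
        old_result, new_result = old(batch), new(batch)
        for key in sorted(old_result.keys() | new_result.keys()):
            if old_result.get(key) != new_result.get(key):
                differences.append(
                    (batch, key, old_result.get(key), new_result.get(key))
                )
    return differences


diffs = compare_recipes(make_paint_with_onions, make_paint_without_onions, range(3))
for diff in diffs:
    print(diff)
```

A comparison like this only catches differences in properties you thought to measure. If nobody records `protein`, the diff comes back empty and the paint still peels, which is why the lab tests need to be paired with tests in the field.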

I've worked as a semiconductor engineer, specifically in charge of diagnosing problems that arose in the field, so believe me when I tell you that I've seen "simple", "well understood" little tweaks in a semiconductor processing recipe cost companies millions of dollars and one hell of a lot of stress. I've also lost six months of my grad student career to one specific bad step in a recipe. Once you get a working recipe you do not change a thing without a specific plan to thoroughly measure and document the effect of that change. Find someone from Intel and ask them how this works: I've heard that you can't so much as touch a knob in Intel's fabs without a signed change order.

5hoom · on Oct 21, 2011

"the longer the onions have been in the recipe, the longer the recipe has evolved in the presence of onions."

That is just an excellent explanation of why removing stuff from a working system can be so dangerous. Thank you.

dotBen · on Oct 21, 2011

And why we should write our recipes (code) carefully to imply intent such that dependencies like this are not created unnecessarily. Consider:

"Add one onion", vs "Dip piece of onion in mixture until it begins to fry"

The latter clearly indicates that no aspect of the recipe should depend on the onion or inherit properties from it, as the onion will be removed before runtime (consumption).
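In code, one rough way to express "dip the onion, then take it out" is a scoped resource. The sketch below assumes Python and uses made-up names (`onion_dipped_in`, `is_hot_enough`); it only illustrates the intent-revealing style, not any particular codebase.

```python
from contextlib import contextmanager


@contextmanager
def onion_dipped_in(mixture: list):
    """Hold an onion in the mixture only for the duration of the block."""
    mixture.append("onion")
    try:
        yield mixture
    finally:
        # The onion always comes back out, even if the check raises.
        mixture.remove("onion")


def is_hot_enough(mixture: list) -> bool:
    # Hypothetical stand-in for "the onion begins to fry".
    return "oil" in mixture and "onion" in mixture


mixture = ["oil"]
with onion_dipped_in(mixture):
    hot = is_hot_enough(mixture)

# After the block the onion is gone, so nothing downstream can rely on it.
print(hot, mixture)
```

The scoping makes the temporary nature of the onion visible to the next reader, though it cannot stop side effects that happen while the onion is in the mixture.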

p0ckets · on Oct 22, 2011

This still wouldn't be sufficient in mechanical_fish's example. Someone might still remove the onions (and the needed proteins) after the introduction of thermometers with the reasoning that this would be a "simple" and "well understood" tweak.

wladimir · on Oct 22, 2011

With very old code bases, you need to start removing things, otherwise it becomes unmaintainable. Sure, you need good testing and code coverage to make sure you did it properly, but that applies to any change.

It doesn't mean that the people implementing it 15 years ago were idiots, no, it all made sense at the time. But after years of maintenance some parts grew into an abomination (losing any touch with the high-level design), and other parts are no longer needed because the specific gadget they were supposed to support no longer exists.

If you don't refactor, you'll eventually have a landmine-ridden place where every change has an impact on 10 different unrelated places. A company I worked at had this problem. They had to keep adding developers to handle bugfixes and feature requests, and it only became worse instead of better because they never made time available for refactoring.

"Removing code that you don't understand" is indeed wrong and shouldn't happen, but at least if you use SCM you can always look back to understand why it was added. And then remove it anyway.

Believe me, that process makes sense for a factory, but for source code you eventually end up with voodoo programming. The source is too complex for any human to understand, and it is impossible to teach new developers who didn't "grow into it".

mechanical_fish · on Oct 22, 2011

Oh, I agree that you'll eventually be unable to maintain or improve a codebase built on coincidence, just as you will eventually reach the limits of a chemical process that uses onions. For example, your onion process will be inconsistent at some level, precisely because onions are a hack with a lot of potential side effects. What breed of onions, again? How big should the individual onions be? Don't onions vary according to the soil you grow them in?

So you'll probably have to take the onions out, sooner or later. But: You've got to be prepared for the real costs of that project. It is very, very tempting to tell yourself that your simple change is "obviously" going to be cheap. Particularly in software, which is not chemistry, let alone biology [1], and which is so close to being deterministic and consistent and provable -- after all, the individual components often are that simple (at least in the absence of cosmic rays and lightning strikes), and every complex program starts out simple and easy to change, and the complexity can grow so slowly that you barely perceive the day-to-day increase. So you can convince yourself that you know what's going on in a software system. And then, whoops, you make a change, and it has a side effect, which has another side effect, and then the bug reports come in, from users you may not have even realized you had.

---

[1] This is why it's great to spend a little time moonlighting as a biologist: You can practice living in a world without certainty. Biologists really don't know more than a small fraction of what's going on. Even individual bacteria are mysterious. If you've been raised on physics you will be shocked by biology. You really have to learn what controls and statistics and experimental design are about when you're trying to measure a biological system.

masklinn · on Oct 21, 2011

> I'm sure many of us have been bitten by this as developers. You see a few lines of codes or a feature and you have no idea why it's there, so you remove it. What happens? Random, seemingly unrelated application starts failing or angry customer calls wondering why some feature is no longer available. Whoops...

Yep. Tells me OP is not a developer, and has not read the article he linked. Because the title makes no sense at all.

CapitalistCartr · on Oct 21, 2011

The story sounds to me like a call to document thoroughly. To avoid the onions, write down the reasoning with every step.
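As a small illustration of writing the reasoning down next to the step, here is a sketch in Python. The function name, the rationale text, and the threshold value are all assumptions made up for the example.

```python
# Assumed threshold for illustration only; a real process would define its own.
HOT_ENOUGH_C = 175.0


def mixture_hot_enough(temperature_c: float) -> bool:
    """Return True when the mixture is hot enough for the next step.

    Why this exists (hypothetical rationale): the recipe used to drop an
    onion into the mixture and wait for it to fry, because no thermometer
    was available. With a thermometer the temperature is checked directly,
    so the onion is only a temperature probe and nothing else should
    depend on it.
    """
    return temperature_c >= HOT_ENOUGH_C


print(mixture_hot_enough(180.0))
print(mixture_hot_enough(100.0))
```

The comment records the original hypothesis behind the step. It does not prove that the step has no other effects.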

loup-vaillant · on Oct 21, 2011

If the priority is to not break anything, sure, don't fix what's not broken.

Now every piece of code which is there for no known reason remains a problem. If you want to keep the code base simple, better remove either the code or your confusion (add a comment, refactor, whatever).

hvs · on Oct 21, 2011

You should never remove code that you don't understand. And when isn't it a priority to "not break anything?"

loup-vaillant · on Oct 21, 2011

Whenever your product is in for the long run. For those cases, it is better to break it temporarily than to eventually be unable to manage it at all. That line of code no one understands is technical debt. Sometimes, it is better to pay the debt right now.

I'm currently working on a 2-million-LOC program which never paid its damned debts, and I weep every day before this holy Big Ball of Mud.

mcantor · on Oct 21, 2011

If our highest priority was always "Don't break anything", we would never ship to production after the first move. "Make it better" is often a higher priority.

yxhuvud · on Oct 21, 2011

Yes you should. It is often the simplest way of discovering why it is there - you see what part will break.

polymatter · on Oct 21, 2011

I think you mean this as part of the investigation into what this code does.

That is, after setting up a test/dummy system, you remove the code, run your tests and try to break it. Perhaps sticking in breakpoints or print lines or whatever.

I doubt you really mean to experiment on the business-critical live system.

That's unlikely to be very effective anyway. The most obscure code is often a fix for some really obscure bug. Maybe it fixes some weird bug in Swedish XP SP1, or something else which you won't find just by removing it and testing. Depending on the situation, you will need to actually read and understand the code and investigate properly.

yxhuvud · on Oct 21, 2011

Obviously, it is not a tactic to use on a live production system. On the other hand, doing no programming at all on a live system is the prudent way of doing it.

Good test coverage helps a lot here though - remove something and you'll get a set of failing tests to investigate.
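A minimal sketch of what that looks like, assuming Python's standard `unittest` module. The function `normalize_name` and its "puzzling" line are invented for the example: delete that line and the second test fails, which tells you what the line was for.

```python
import unittest


def normalize_name(name: str) -> str:
    cleaned = name.strip()
    # Puzzling line: nobody remembers why this replacement is here.
    cleaned = cleaned.replace("\u00a0", " ")
    return cleaned.lower()


class NormalizeNameBehaviour(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_name("  Alice "), "alice")

    def test_non_breaking_space_becomes_plain_space(self):
        # This test pins the behaviour of the puzzling line above.
        self.assertEqual(normalize_name("Alice\u00a0Smith"), "alice smith")


# Run the suite explicitly so the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeNameBehaviour)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

This only works for behaviour that some test actually exercises. Code with no test pinning it can be removed without any test going red.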

grannyg00se · on Oct 21, 2011

Or you notice nothing breaks so you leave it out, joyfully congratulating yourself that you reduced the total LOC count by one. Then six months later things start breaking and you have no idea why.

gbog · on Oct 21, 2011

Refactoring and removing code take courage and tact, but sometimes it needs to be done. If you have a database that is not properly normalized, the more client code you add to it, the harder it is to normalize later.

The biggest problem with refactoring and removing code is explaining it to non-techy project managers. If you say that the code or the design is bad, you are saying the guys behind it are bad (you or the previous team), which is not very tactful. I usually try to explain that a software system is a living thing that needs some regular washing up. Also, it is possible to explain that the next features or optimizations will take less time to implement after refactoring.

yxhuvud · on Oct 21, 2011

I tend to blame myself. "I didn't fully understand the problem space when I first wrote it, and now I have to rewrite it because it is a horrible mess and too complex to maintain".

Works surprisingly often.

jeffdavis · on Oct 21, 2011

"You should never remove code that you don't understand."

Never is a long time.

It's fairly easy for low-skill developers to write code that's time-consuming for high-skill developers to understand. In fact, making simple things hard to understand is pretty much the definition of bad code.

So, you use your judgement. If it's a particularly subtle and important part of the code, spend the extra time to make sure you're not missing anything. If it's not, then just rip it out and don't waste time in a maze of strange control flow, redundant code, useless invariants, and confusing assumptions.

masklinn · on Oct 21, 2011

> If the priority is to not break anything, sure, don't fix what's not broken.

The issue generally arises because things are broken, investigation leads to a piece of code which creates the breakage (everything is perfect before, things are broken after), nothing seems to use it (the project is naturally pretty much void of automated tests), you remove the code, it fixes the issue, and then you learn that prod has started failing hard.

vilhelm_s · on Oct 21, 2011

"Don't ever take a fence down until you know why it was put up." --Robert Frost