Every app has a scary basement—dark old mysterious code vital to its operation (miksovsky.blogs.com)
74 points by Terretta on Sept 7, 2010 | 23 comments



You _can_ turn this kind of scary "basement" code into something that's maintainable, but it's often a multiyear slog. I've salvaged C++ horrors that were virtually all basement code, intricate and fragile and dating back to Win16 code written in 1993.

Here's a recipe that may help you get started on your adventure:

1. Work incrementally. Never try to fix the whole program at once. Overhauling the whole program inevitably introduces lots of disruption for users and delays the addition of new features. This will not be appreciated.

2. Make sure you have tests. Lots and lots of tests. Any time you touch a module, write some tests for that module.

3. Build better module boundaries. This can be a mechanical process. For example, to separate disk I/O code out of a checkout register, declare a RegisterIO class. Every time the GUI code touches the disk, add a new method to the RegisterIO class and move the code there. You'll wind up with a _really_ ugly RegisterIO class containing a lot of ad hoc IO functions. But the RegisterIO class will improve with time.

4. Reimplement small modules one at a time. You may want to run the old and new versions of a module in parallel for a while (I once had a program with two different text rendering subsystems), or run the same unit tests against both the old and the new module.
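Step 3 might look roughly like the sketch below. Only the RegisterIO name comes from the recipe; the Transaction struct, the method names, and the tab-separated file format are all hypothetical, invented to show what "one ad hoc method per disk access" could look like.

```cpp
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type; the real register's data layout is unknown.
struct Transaction {
    std::string payee;
    long cents;
};

// Every piece of disk access that used to live in GUI code moves here,
// one ad hoc method at a time. Ugly at first, but it is a boundary.
class RegisterIO {
public:
    explicit RegisterIO(std::string path) : path_(std::move(path)) {}

    // Moved out of a (hypothetical) "Save" button handler.
    void appendTransaction(const Transaction& t) const {
        std::ofstream out(path_, std::ios::app);
        out << t.payee << '\t' << t.cents << '\n';
    }

    // Moved out of the (hypothetical) code that repaints the register.
    std::vector<Transaction> loadAll() const {
        std::vector<Transaction> result;
        std::ifstream in(path_);
        Transaction t{};
        while (std::getline(in, t.payee, '\t') && (in >> t.cents)) {
            in.ignore(1, '\n');  // skip the newline after the amount
            result.push_back(t);
        }
        return result;
    }

private:
    std::string path_;
};
```

The GUI code then calls `io.appendTransaction(...)` instead of opening the file itself, which is what later makes it possible to test or replace the I/O in isolation.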

Eight years of work later, you'll have a reasonably shiny new program with some odd architectural decisions and a really ancient logging subsystem. :-) But at least you will have been shipping new versions and earning revenue every step of the way.
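For step 4 of the recipe, one way to run the same checks against the old and the new module is to put both behind a shared interface and compare their output. This is a sketch under assumptions: the LineBreaker interface, both implementations, and findMismatches are invented stand-ins, not code from any real program.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical shared interface for the module being replaced.
class LineBreaker {
public:
    virtual ~LineBreaker() = default;
    // Split text into chunks of at most `width` characters. Requires width > 0.
    virtual std::vector<std::string> breakLines(const std::string& text,
                                                std::size_t width) const = 0;
};

// Stand-in for the old module: index arithmetic with substr.
class OldLineBreaker : public LineBreaker {
public:
    std::vector<std::string> breakLines(const std::string& text,
                                        std::size_t width) const override {
        std::vector<std::string> lines;
        for (std::size_t i = 0; i < text.size(); i += width) {
            lines.push_back(text.substr(i, width));
        }
        return lines;
    }
};

// Stand-in for the rewrite: builds each line character by character.
class NewLineBreaker : public LineBreaker {
public:
    std::vector<std::string> breakLines(const std::string& text,
                                        std::size_t width) const override {
        std::vector<std::string> lines;
        std::string current;
        for (char c : text) {
            current.push_back(c);
            if (current.size() == width) {
                lines.push_back(current);
                current.clear();
            }
        }
        if (!current.empty()) lines.push_back(current);
        return lines;
    }
};

// Run both implementations on the same inputs and collect every input
// on which their results differ.
std::vector<std::string> findMismatches(const LineBreaker& oldImpl,
                                        const LineBreaker& newImpl,
                                        const std::vector<std::string>& inputs,
                                        std::size_t width) {
    std::vector<std::string> mismatches;
    for (const std::string& input : inputs) {
        if (oldImpl.breakLines(input, width) != newImpl.breakLines(input, width)) {
            mismatches.push_back(input);
        }
    }
    return mismatches;
}
```

An empty result from findMismatches over a large corpus of real inputs is the kind of evidence that lets you retire the old module with some confidence.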


Michael Feathers's _Working Effectively with Legacy Code_ is entirely about dealing with this sort of code, especially about carefully retrofitting tests so you can safely do deeper restructuring.


Everything is fixable. It's just a question of time, process, testing, re-testing, etc. Is it always worth it? No.


Nobody's arguing otherwise, just talking about techniques for when it's necessary.


"When I was first exposed to this reality, some friends on the nearby Microsoft Word team shared stories about their app's own scary basement: a routine called FormatLine. Given a point in a document and a column width, FormatLine would lay out the next line of text at that point. As I heard it, this routine had evolved into a handful of functions that were each thousands of lines long. Developers assigned to descend into the depths of FormatLine were treated with the same respect and concern as spelunkers attempting to reach trapped miners."

This paragraph is hilarious and horrifying all rolled into one. That is not my idea of a fun project.


I had a friend on the Excel team who told me similar horror stories about some of the very old code that is still in Excel to this day and is about as unmaintainable as can be.


There was a great "scary basement" story posted here about three years ago that seems to have gone virtually unnoticed:

http://news.ycombinator.com/item?id=85587

If the name seems familiar, well, it's either because you bleed yellow or you've heard of CouchDB.


I think this puts into words the potential benefit I see in approaches like Google's GWT, which uses Java's ability to describe strict separations to separate "hairy cross-browser engineering fiddliness" from "view intent".

I did say "potential". :-) And the same lesson definitely applies beyond the need to separate UI concerns, but that issue seems to get the tangliest the fastest.


Yeah, I think we all design systems that try to prevent shooting ourselves in the foot, but the need for performance usually breaks this. From the article:

"To get the most performance out of the PCs of the day, I believe he had the register itself more or less directly read and write transaction data from disk."

He's implying that the normal view/business logic separation wasn't fast enough.


Things like this make it seem worthwhile to create a code translator. In the same way the compiler converts the code from horrific basement language to computer language, it would convert the code from basement language to a properly structured maintainable form. Not just re-formatting, but perhaps changing dated methodologies etc.

Kind of like perltidy, but much more aggressive, pruning and re-arranging and re-writing bits as needed.

On the one hand, in theory a computer could never do it as well as a human, and we should learn to code properly. On the other hand, it's a very common problem, and computers are supposed to be time-saving devices above anything else.


Perltidy just handles formatting. Scary basement code is often poorly formatted and reformatting is a necessary first step in fixing it, but it's a trivial first step. It doesn't address the hard part.

Perlcritic is closer to what you're thinking of; it examines the structure of the code and points out design and implementation problems. It can't fix them for you though, and it only handles common designs that it knows about. It can't judge the quality of a design that it's not already programmed to recognize. Again, a useful tool for fixing all of the common problems, but it can't address the overall design.

The refactoring tools in IDEs like Visual Studio are useful too, but they don't understand your code in a way that would help redesign it. They just automate much of the process of extracting code from one method to another method/class. Useful, but not redesigning your code for you.

Ultimately, it takes an intelligent software designer to do this kind of work. Maybe artificial intelligence will handle it some day, but that's not coming for a while.


Writing such code would, unfortunately, be non-trivial (in the geek meaning) at best. Most likely undecidable.


"To get the most performance out of the PCs of the day, I believe he had the register itself more or less directly read and write transaction data from disk."

I don't buy this excuse. It just sounds like bad code to me.


Times were different back then. The article says the original control dates from 1990, and Wikipedia confirms that Windows 3.0 came out in 1990, so it is plausible. The 486 had just been released the previous year, so it was the New Hotness and not very many people had it. You'd be looking at an install base of mostly 80386s with a substantial contingent of 80286s that your program still needed to run on with acceptable performance. Wikipedia says for the 80286 you're looking at a top end of 12MHz, with 6 and 8 MHz in the field. Since this is a Windows 3.0 program, you had a minimum of 640KB on the system (though of course you can't use it all).

So this is a program that has to run with reasonable performance on a 12MHz 80286 with 2MB of RAM. Are you sure you wouldn't consider having your UI read and write the disk directly? Separation of concerns is a modern luxury. And remember, you're using compilers from 1990, too. Don't expect great inlining. (Besides, only profligate memory spendthrifts would inline a function.)

(Edit, later followup: In fact, "separation of concerns" is one of the major sources of the "bloat" people complain about when they examine modern OSes. Much of the "non-bloated-ness" of earlier generations of OS doesn't come from programmers caring more about their craft, it comes from programmers having to stuff programs into things that can barely hold them and building "clever things" that we would today consider monstrosities like UI widgets that read and write directly to the disk. I am unsympathetic to accusations that, say, a modern KDE desktop is "bloated".)


"So this is a program that has to run with reasonable performance on a 12MHz 80286 with 2MB of RAM. Are you sure you wouldn't consider having your UI read and write the disk directly?"

I wrote code for those machines, and no, I wouldn't write my UI code to read and write the disk directly.

"Separation of concerns is a modern luxury."

This is a myth. Merely an excuse for poor API design. Proper abstraction doesn't mean inefficiency.

Your last paragraph (the "Edit") is almost offensive. The cause of bloat is not separation of concerns. It's that people just don't care about writing efficient code (or they just don't know how). Did you ever write code in the era you speak of? Because your statement comes across as ill-informed conjecture.


To be fair, memory bandwidth was relatively better and pipelines were nonexistent, so there was little reason to inline a function unless the call setup was literally longer than the body.

My favorite monstrosity was the overlay (http://en.wikipedia.org/wiki/Overlay_(programming)). When your OS couldn't page functions from disk for you, you damn well did it yourself. They even started building it into linkers....
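The idea can be shown with a toy simulation. Real overlays swapped machine code in and out of a fixed memory region, which portable C++ can't do; in this sketch an "overlay" is just a named table of functions and "loading" copies that table into a single resident slot. OverlayManager and all of its methods are hypothetical names made up for illustration.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Simulation only: an overlay is a named table of callable functions.
using Overlay = std::map<std::string, std::function<int(int)>>;

class OverlayManager {
public:
    // Make an overlay available "on disk" under the given name.
    void registerOverlay(const std::string& name, Overlay overlay) {
        onDisk_[name] = std::move(overlay);
    }

    // Call `function` in overlay `name`, swapping that overlay into the
    // single resident slot first if a different one is currently loaded.
    int call(const std::string& name, const std::string& function, int arg) {
        if (residentName_ != name) {
            resident_ = onDisk_.at(name);  // evict the old overlay, "load" the new one
            residentName_ = name;
            ++loads_;
        }
        return resident_.at(function)(arg);
    }

    // Number of times an overlay had to be swapped in.
    int loads() const { return loads_; }

private:
    std::map<std::string, Overlay> onDisk_;
    Overlay resident_;
    std::string residentName_;
    int loads_ = 0;
};
```

Calls that stay within the resident overlay cost nothing extra, while bouncing between two overlays reloads on every call, which is why deciding which functions shared an overlay mattered so much.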


The madness still exists today. I'm developing an app that runs on an 8-bit 8051 CPU with a 16-bit address bus and 1MB of code Flash... Let's just say that segmentation and function pointers aren't a perfect match. Luckily the next revision of the hardware runs an AVR32 instead. I'm just hoping that the hardware will be ready for the world before the app is!


It was a rare 286 indeed that came with 2MB of RAM!


Yes, I was being a bit generous and assuming that perhaps Microsoft Money didn't have to run on the absolute minimum spec for Windows 3.0. I couldn't find the exact specs Microsoft gave for the original version. Considering that I can chew through 2MB in my programs nowadays if I accidentally sneeze wrong it still makes the point, I think. :)


"In fact, 'separation of concerns' is one of the major sources of the 'bloat' people complain about when they examine modern OSes."

Um, what is your source for that assertion? From where I'm standing, separation of concerns is a technique that favors the reuse of code, and hence the reduction of bloat.


Separation of concerns at the OS level involves the creation of shared DLLs, and people complaining about bloat inevitably complain about the many megabytes of DLL files that the simplest program needs nowadays, even if it only uses a vanishing fraction of them. People slinging bloat accusations around seem to want a program in which every byte loaded from the disk is executed with high frequency or something.

In my opinion, you're totally correct. This is part of the reason I'm not very sympathetic to the bloat argument. The megabytes of libraries are there for good reasons.


We used to call this "Hemoglobin". It's something scary complicated, created by elves (as far as you're concerned); no one knows how it works, but it does (barely), and only if occasional offerings of live virgins are made at its altar.

Another appropriate term is "Happy Fun Ball" (google it)


This is so true...



