Size is the best predictor of code quality (vivekhaldar.com)
225 points by gandalfgeek on Sept 26, 2011 | 123 comments



Size is the best predictor of many things about software projects.

Note that 'size' is a dimensionless quality; we can only approximate it with certain proxy metrics (KSLOC, Function Points, Budget allocation).

Edit: and gzipped size, and token counts, and logical lines, and Halstead metrics, and cyclomatic complexity, and object points, and ... and ... and ...

For example, project size is the best predictor of whether a project will meet its initial budget/time/feature/quality goals (Boehm, Standish). It totally swamps staff quality, programming language, programming process, tools, libraries, everything in this respect (Boehm).

Per (Standish), a project with a budget > US$10 million at launch has a 98% probability of not meeting its goals and (from memory) a < 50% probability of avoiding cancellation.

In fact I have a totally untested hypothesis that agile "works" because it's mostly applied by small teams to small projects.

(Boehm): Barry Boehm, Software Cost Estimation with COCOMO II

(Standish): The Standish Group CHAOS Report.


"In fact I have a totally untested hypothesis that agile \"works\" because it's mostly applied by small teams to small projects."

I think that agile works more as an interface and contract helper between customer and supplier. A few years ago it was very difficult to convince a customer to follow the iterative way. The customer just wanted all the features, on time.


On the other hand, small project sizes make this possible at all. For a very complex project you have so many potential users and customers that a single agile team cannot deal with all of them.

Size has its own unique problems.


Agile "works" because neither "agile" nor "works" is well defined. (Some agile methodologies are well-defined, of course.)


The issue is that a big budget requires more code (if you have a budget, you need to spend it), and spending it requires hiring programmers. You can't take on a $10 million project and tell people you're hiring 4 programmers and that the project will be done in a few months. You need to hire 100 programmers and tell people it will take a year.

Basically, it's a symptom of work expanding to fill the time available; agile works, IMHO, because it avoids spending time that doesn't need to be spent.


This is one of those chicken-and-egg, correlation-is-not-causation problems.

Does big 'size' cause a big budget, or do big budgets cause blooming 'size'? A bit of both I'd wager.


That reminds me of a little fable I've always found humorous: The Parable of the Two Programmers http://www.csd.uwo.ca/staff/magi/personal/humour/Computer_Au...


One of the big advantages of agile is that it reduces scope. Traditionally the incentive is to think of everything at the beginning, because no changes will be allowed during the project. The result is overproduction of features. Agile methods try to focus on the features that matter most, reducing the size of the project.


The composable-programming evangelists (an offshoot of the movement to minimize mutable state) argue that the highest-quality software is always a small program that composes other small programs, recursively, until all the desired functionality is accumulated.


I just pulled Steve McConnell's (must read!) CODE COMPLETE: A Practical Handbook of Software Construction off my bookshelf. The section How Long Can a Routine Be? references some surprising (but perhaps dated) studies suggesting that the evidence in favor of short routines is "very thin" and the evidence in favor of longer routines is "compelling". These studies are probably biased toward desktop and corporate software written in C during the 1980s.

The consensus is that routines should have fewer than 200 LOC, but that routines shorter than ~30 LOC are not correlated with lower cost, fault rate, or programmer comprehension. btw, the longest function I've seen in commercial software I've worked on was 12,000 LOC! I will not name names. :)

* A study by Basili and Perricone found that routine size was inversely correlated with errors; as the size of routines increased (up to 200 LOC), the number of errors per LOC decreased (1984).

* Another study found that routine size was not correlated with errors, even though structural complexity and amount of data were correlated with errors (Shen et al. 1985)

* A 1986 study found that small routines (32 LOC or fewer) were not correlated with lower cost or fault rate (Card, Church, and Agresti 1986; Card and Glass 1990). The evidence suggested that larger routines (65 LOC or more) were cheaper to develop per LOC.

* An empirical study of 450 routines found that small routines (those with fewer than 143 source statements, including comments) had 23% more errors per LOC than larger routines (Selby and Basili 1991).

* A study of upper-level computer-science students found that students' comprehension of a program that was super-modularized into routines about 10 lines long was no better than their comprehension of a program that had no routines at all (Conte, Dunsmore, and Shen 1986). When the program was broken into routines of moderate length (about 25 lines), however, students scored 65% better on a test of comprehension.

* A recent [sic!] study found that code needed to be changed least when routines averaged 100 to 150 LOC (Lind and Vairavan 1989).

* In a study of the code for IBM's OS/360 operating system and other systems, the most error-prone routines were those that were larger than 500 LOC. Beyond 500 lines, the error rate tended to be proportional to the size of the routine (Jones 1986a).

* An empirical study of a 148 KLOC program found that routines with fewer than 143 source statements were 2.4 times less expensive to fix than larger routines (Selby and Basili 1991).


Judging from my experience, several of these studies may be missing a major confounding factor: the complexity of the problem the code solves. All my longest methods are rather stupid output-formatting stuff or if/else cascades that handle tedious but mostly trivial distinctions. But for code that solves hard problems I often write many small functions for independent steps of the solution.

So the studies that measure method length/bug count correlation within a single code base, or in code written within a single organization, might only measure the fact that code that requires no thinking contains fewer bugs than code that does. Paging Captain Obvious. Some of the other studies address that (e.g. Shen 1985 and the code comprehension studies), but as is so often the case in quantitative studies of programmer productivity, we lack repeated measurements where the only variable is the independent factor whose influence is studied.


That's an interesting point. There's certainly a cap on the complexity of code that can be put into a single long function. (Unless, I suppose, it has inner functions that call one another, like how people do OO in Javascript; such things can be just as complex as whole programs.) Usually it's implementing some conceptually unified thing. Even if that thing is a complex algorithm, it's still cohesive enough to be able to say what it is. And implementing even a very complex algorithm is not particularly complex at a system level.


I've mentioned those findings here before. It's great to have them listed. It's fascinating that, even with all their limitations and age, they fall so completely on one side of the question - the opposite side to what a lot of sophisticated programmers believe.

The last time we debated this on HN, there was a disagreement about how much complexity the interactions between functions add to a program. To me, complex call graphs are even worse than complex code inside a function. I was surprised to learn that an opposing view even existed.


>>complex call graphs are even worse than complex code inside a function

Hear hear.

I have a problem specific to that, here. Some programmers follow the "no documentation should be needed, the code is obvious" dogma in an untyped scripting language -- and you have to look many levels up the call graph before you even find out the damn function parameters' types... :-(


Ideally you have tests which let you know how every function can be (ab)used.


If I wrote 2-20 line functions, then I wouldn't document/test every one, either.


I think the idea is that the short functions call other (short) functions at a lower conceptual level to create a large amount of functionality. By testing that function, you're testing how they work together.

This is different from short functions being short because they don't do much.


One of the effects that is very hard to quantify here is how many of the bugs found in 'short' routines would have gone unnoticed if the routines had been longer. In other words, were all the bugs that were there (both in short and in long functions) actually found?

In trivial flow (short routine) bugs tend to stand out whereas in complex flow (longer routine with multiple levels of nesting) bugs can be much harder to spot.

It may actually be good that more errors were found in shorter routines; after all, that is exactly what it is about: finding errors, not making errors.


I wonder how these 1980s results relate to object-oriented code and other things that have happened since then. Even 25 LOC would be a huge method in Smalltalk-style OO, for example.

Other than OO, we also have much better tools for navigating code now. That may have changed how we approach and understand unknown code.


Another reason (in addition to those already mentioned) why these numbers are not easy to interpret:

  small routines [..] had 23% more errors per LOC
If the smaller routines had, on average, 100/1.23 = 81% or fewer of the lines of the larger routines, then they still had fewer errors per routine.


Good writeup. McConnell is right in that there's a vast abyss between the researchers and practitioners in our trade.

Another good book by McConnell, which also discusses size with summaries of studies, is Software Estimation: Demystifying the Black Art.


Smaller functions tend to be more reusable, leading to fewer LOC per codebase, which could overcome the bugs/LOC penalty.

Also, considering the date, this research could be influenced by the bug-proneness of parameter passing in C, a non-memory-managed language.


For anyone interested in more discussion of this, I suggest grabbing a copy of "Making Software", in particular chapter eleven on Conway's Corollary, which is centered around this 2008 paper: http://research.microsoft.com/pubs/70535/tr-2008-11.pdf

The meat:

  Table 4: Overall model accuracy using different software measures
  Precision  Recall  Model
  86.2%      84.0%   Organizational Structure
  78.6%      79.9%   Code Churn
  79.3%      66.0%   Code Complexity
  74.4%      69.9%   Dependencies
  83.8%      54.4%   Code Coverage
  73.8%      62.9%   Pre-Release Bugs
Or, in plain terms: if people mess with code they don't normally mess with, you can bet real money (with higher confidence than any of the other metrics gives you) that it introduces bugs.

Edit: I have been meaning to make a git tool that analyzes the history of a project to predict which bits of code are the most buggy using this model, but just haven't done it yet. It would be cool to integrate it with GitHub's issues API to see how accurate it is. If someone does make it, let me know!
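For what it's worth, here is a rough sketch of the churn/ownership half of that idea in Python. It is not the paper's organizational-structure model, just a crude proxy built from git history; the function name and the "@" author-marker convention are made up for illustration:

    import subprocess
    from collections import Counter, defaultdict

    def churn_and_ownership(repo="."):
        # One "@<author email>" line per commit, followed by the files it touched.
        log = subprocess.run(
            ["git", "-C", repo, "log", "--name-only", "--pretty=format:@%ae"],
            capture_output=True, text=True, check=True).stdout
        churn, authors = Counter(), defaultdict(set)
        author = None
        for line in log.splitlines():
            if line.startswith("@"):
                author = line[1:]          # assumes no tracked path starts with "@"
            elif line.strip():
                churn[line] += 1           # how often this file gets changed
                authors[line].add(author)  # how many different people change it
        return churn, authors

    churn, authors = churn_and_ownership()
    for path, commits in churn.most_common(10):
        print(f"{commits:4d} commits by {len(authors[path]):3d} authors  {path}")

Files near the top of that list -- churned often, by many different people -- are where a model like the one in the table would tell you to look first.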


> However, I still haven’t found any studies which show what this relationship is like. Does the number of bugs grow linearly with code size? Sub-linearly? Super-linearly? My gut feeling still says “sub-linear”.

Interesting. My gut says exponential, which is why cost and likelihood of project cancellation shoot up in the largest projects.

Edit: I have been corrected below; I concur with quadratic.


Are you sure you mean exponential? I think quadratic is a more reasonable guess.


He might be using "exponential" to mean "superlinear", which seems to be the sense a lot of my students try to use it in (as well as non-technical people).


Now that you mention it, I guess non-technical people do tend to use that definition. I had never thought about it before. It's unfortunate, considering there's a big difference between, say, quadratic and exponential growth. In fact, it tends to be a more impactful difference than between linear and quadratic, especially regarding algorithms.


I am not aware of the difference. Is it that exponential means something like c^n, superlinear is n^c, and linear cn?


What Locke1689 said. If you're not familiar with big O notation, linear is like f(x) = 2x, quadratic is like f(x) = x^2, and exponential is like f(x) = 2^x. There's a huge difference between quadratic and exponential: exponential grows significantly faster as x increases. For computing, the difference is significant. Cobham's thesis says that polynomial algorithms (which includes quadratic) are reasonable to perform, while exponential algorithms aren't.


Superlinear is anything above linear. This includes pseudolinear (e.g., O(n log n)). Exponential is O(m^n). Linear is O(n).

Edit: And polynomial is O(n^m).


Without straining my brain, to me it's reasonable to measure complexity by the number of interacting components, which is a combination, which is geometric, which is a discrete exponential, right?


The number of edges in a complete graph is n*(n - 1)/2, so by that metric it would be quadratic.


Yes, but the number of nodes is n, and an "interaction" among m nodes might not be decomposable into a chain of pairwise interactions, which raises the ceiling back to exponential (actually factorial, which is worse).


What corresponds to interactions that aren't decomposable as pairwise interactions? Race conditions? Resource utilization? Real bugs (and the nastier ones), but probably a minority. So factorial with a relatively small constant in front of it.

But obviously the real answer is to write a program that will correctly verify all other programs.


It can be quadratic. Think about drawing all possible lines between points (possible components in software):

Make a table with columns being # points and the # lines you can draw between them.

  # points  # lines
  1         0
  2         1
  3         3    (triangle)
  4         6    (box with a criss-cross)
  5         10   (5-point star, with everything connected)
  ...

This relation is n * (n - 1) / 2


Yes, combination, which is factorial, which is yet larger than exponential.

Not sure what you mean by "geometric" in this context.


The Art of Unix Programming by esr covers just this question[1]. Nowadays module/file size is one of the most important factors in how I design and write my code.

[1]: http://catb.org/~esr/writings/taoup/html/ch04s01.html


Isn't it the case that small code bases contain fewer features? So, really, isn't this result merely that the bug rate per feature is a constant?


That is not necessarily the case, no. For example, implementing the functionality of printf yourself is rather involved, but calling printf from the standard library is literally a one-liner. In both cases you get the functionality of printf.

Similarly, there are brief ways to write things and verbose ways. Sometimes what is commonly written as large factory classes and interfaces in one language works out to a simple higher-order function or macro in another language. See the old "evolution of a programmer" joke for some extreme examples.


( I HATE that you are downmodded. )

I would argue that it isn't just bugs per feature, but bugs per interaction point: the more features there are, the more interaction points between those features.


Not at all.

Consider a very high quality code base with a lot of features. If you leverage every best practice, using advanced techniques such as functional programming (closures, macros, monads, etc.), aspect-oriented programming, and domain-specific languages as necessary, then you could still have a very small code base.

Most code bloat is due to things like unnecessary duplicate code (such as common error handling, logging, or thread management idioms being inlined and "unrolled" everywhere rather than tucked away behind nice abstractions), different code using different sub-components that are very similar (e.g. every team using their own hand-rolled string class in C++), working around limitations of whatever language is used (stretching limited languages too far), working around design defects, and the like. Anyone who has had even a tiny exposure to functional programming techniques can appreciate its enormous power to reduce code quantity.


I have two questions for y'all on this.

I've heard it said for years that studies show the number of bugs grows roughly linearly with code size and that this holds true across any programming language. (It's repeated, for example, at http://c2.com/cgi/wiki?LinesOfCode) I think I've even seen references to such studies, but I don't remember where. So, HN: what are these studies? Anybody know? (I mean besides the one referenced by the OP on class size. I believe this meme goes further back than that.)

My second question is about how to measure code size. PG said a few years ago: why not just count tokens? I've thought about this ever since and I don't see what's wrong with it. Raw character count is obviously a lame metric, and LOC isn't much better. But token count seems like an apples-to-apples comparison that is easy to measure objectively and leaves out noise such as name length and whitespace. So: what's wrong with token count as a measurement of code size?
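To make the question concrete, here's roughly what I mean by counting tokens, sketched for Python with the standard tokenize module (other languages would need their own lexer). It skips comments and layout tokens, which is presumably the point:

    import io
    import tokenize

    def count_tokens(source):
        layout = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                  tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
        tokens = tokenize.generate_tokens(io.StringIO(source).readline)
        # Count only tokens that carry program meaning: names, literals, operators, keywords.
        return sum(1 for tok in tokens if tok.type not in layout)

    print(count_tokens("if foo:\n    bar()\nelse:\n    baz()\n"))
    # 11 -- and reformatting the same code onto fewer or more lines doesn't change it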


Here's one study showing that complexity grows exponentially with the size of the program:

http://www.cc.gatech.edu/sparc/Resources/readings/berry.pdf

Here's a study showing that addressing changed requirements or fixing program defects requires a maintenance effort that is directly proportional to the size of the program:

http://dl.acm.org/citation.cfm?id=358228


Thanks, I'll take a look. This point from the abstract of the latter is highly reminiscent of the paper cited by the OP:

"Repair maintenance is more highly correlated with the number of lines of source code in the program than it is to software science metrics."

Edit: I read it. This must certainly be one of the studies people are referring to; it covers exactly the question of interest. The major limitation is that all the programs were in PL/I, so it says nothing about language-independence. The important finding is that line count was the most highly correlated variable with bug count of those studied, quite a bit more so than more complex metrics were (Halstead's E). It's also interesting that although the authors write, "There are some very large programs in this set," the largest program was in fact only a very modest (by our standards) 6572 lines of PL/I.


In the Computer Language Benchmarks Game, they compare GZip'ed code size. I disagree with removing comments, since comments tend to be important, and comments generally document weirdness and bugs inherent in the platform/library/language.

http://shootout.alioth.debian.org/help.php#gzbytes


Les Hatton has some of what you are looking for. See "Re-examining the fault density - component size connection":

http://www.leshatton.org/IEEE_Soft_97b.html

...and some of his other work:

http://www.leshatton.org/index_CS.html


Thanks. I'll take a look at those too.


> So, HN: what are these studies?

If you have access to the publication libraries of the ACM and IEEE, you'll find that they publish most of this literature.

(It costs money for both, unfortunately).

> My second question is about how to measure code size. PG said a few years ago: why not just count tokens?

Some schemes do; it depends on how you define "SLOC". The classic problem is if-thens.

How many lines is this?

    if ( foo ) then bar else baz
Or this?

    if ( foo )
    then bar
    else baz
Or this?

    if ( foo ) then
      bar
    else
      baz
In the literature you'll see a distinction between "physical" lines of code and "logical" lines of code. The latter is close to token-based.

> So: what's wrong with token count as a measurement of code size?

I vaguely recall that some properties correlate with physical lines and others with logical. I can't recall what and which, sorry.


If you have access to the publication libraries of the ACM and IEEE, you'll find that they publish most of this literature.

I know where to find research literature. I'm asking for specific citations. Is the claim an urban legend? If "studies show" X, one ought to be able to point to the studies.

How many lines is this?

All three of your examples have the same number of tokens, so to judge by them alone, token count is not just a good measurement of code size, it's a perfect one. My question is what's wrong with it.

I don't see how "logical lines", whatever that is, can possibly be simpler than counting tokens. In fact I don't see how anything can be simpler than counting tokens, since it's easy to know what it means, a tokenizer is always available, and everything irrelevant to the program is by definition dropped.


  I don't see how "logical lines", whatever that is
At least by the definition I'm accustomed to, a logical line is a statement or series of statements which directly and logically belong together, for example a function call or an arithmetic operation.

Logical lines are independent from physical lines as each logical line can be split over multiple physical lines (eg splitting up a long string IO operation), or one physical line can contain multiple logical lines (this is a bad idea in most cases though).

Since the definition hinges a bit on what somebody considers as "logically belonging together", the whole concept is a bit fuzzy. Consider this string formatting operation (Python):

  somestring.split(somechar)[-1].replace("foo", "bar")

Do you consider this one logical line? Or would your logical lines look more like this:

  somestring = somestring.split(somechar)
  somestring = somestring[-1]
  somestring = somestring.replace("foo", "bar")

Both are valid interpretations of logical lines, but they are visually and conceptually quite different, which makes it - imho - a bit problematic to try and use them as a measure of code quality.


You make sense, but the concept itself seems so fuzzy and hard to nail down that I marvel at how it ever arose, given that there already exists an unambiguous and ubiquitous way to distill the logical structure of code free of textual artifacts.


I think token count hasn't gained traction because of languages with syntax and types. "Surely all the boilerplate in defining a class doesn't make it more complex," goes the argument.

I prefer to turn the debate around on its head. Rather than argue complexity metrics (boring) I say I prefer languages without boilerplate because they make complexity harder to camouflage.


One can argue that boilerplate doesn't add to complexity (though I don't agree). But no one can argue that it doesn't add to code size. The studies cited in this thread show either linear or superlinear growth of bug count with code size. If those studies are correct, doesn't that rather settle the issue?


Yep.


One case where I could see actual lines being a more useful measurement is if there is any correlation between bugs and the percentage of the code you can see on-screen at any time.

Not saying that there is such a correlation, just that there may be cases where it is useful to measure by line count.


> I'm asking for specific citations.

None to hand right now.

> I don't see how "logical lines", whatever that is, can possibly be simpler than counting tokens.

I wasn't rejecting token counts per se. I think that it's a useful metric too.

What I was trying to convey is that "logical lines" is the term used in the literature. Logical lines can cover token counts if you define 1 token = 1 logical line. Or it might not. Either way, you have to settle on a definition.


A common LOC metric for languages in the C family is to count semicolons ("real" lines of code).
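As a sketch of how cheap that metric is (naive on purpose: it doesn't skip semicolons inside strings, comments, or for(;;) headers):

    import sys

    def semicolon_loc(path):
        # "Real" LOC for C-family code: one terminator per statement, layout-independent.
        with open(path, encoding="utf-8", errors="replace") as f:
            return f.read().count(";")

    print(sum(semicolon_loc(p) for p in sys.argv[1:]))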


I used to use that all the time. It's great because it's quick, and a pretty damn good sloppy metric. I remember joking that the extra semicolons in for loops and the lack of semicolons in if/while statements balance each other out. People tend not to use semicolons in comments, either, except in commented-out code (which some of us have a pet peeve against anyway).


It's very easy to correct for both for loops and if/while.


Yes, but then the metric is no longer so charmingly trivial.


I actually think raw byte count is a pretty good metric. Documentation size and variable name length are also symptoms of complexity. Specifically, I think wc -c $(find -type f) (how many total bytes) and find | wc -l (how many files and directories) make good metrics. Obviously you need to filter out data files and such.

This has some problems (e.g. spaces vs. tabs, utf8, etc.), but all of these size metrics will be pretty loose.


But this penalizes programmers who like to use long readable names. I'm not one of them (though I used to be), but they have a strong case here.

Take any program. Replace all the names with the smallest possible character sequences. Have you made the program simpler? Or smaller in any meaningful way? Surely not. I'd say what you've done is left its logical structure precisely intact (another way of saying that token count is a good metric) while reducing its readability.


This metric relies on the assumption that people are trying to produce readable code. IMHO long variable names are much more helpful in complex codes than simple ones.


Ok, but now I'm wondering if we have opposite views of code size. In my view, code size is bad bad bad. More code means more complexity. Any time you add code, you're subtracting value; it's just that (if it's good code) you're adding more value than you're subtracting. So a higher score in a code size metric is a bad thing to aspire to, and we should greatly favor approaches to writing software that -- all other things being equal -- lead to smaller programs. I don't think that programmers who use long names for readability should have their programs discounted as longer (and thus more complex). Just because their names are longer doesn't mean their programs are.


No no no. My logic is this: take tight, readable code with short names and replace them with long names, and you'll have worse code. The converse isn't true, because complex (bad) codes are more readable with long variable names.

Complexity -> Code Size
Code Size -> Long Variable Names (win for big codes)
Complexity is bad

Therefore long variable names are a symptom of a problem, but not the problem themselves. Long variable names aren't bad, but they are still a good predictor of badness. Since size metrics are meant to predict badness, long identifiers should increase size metrics.


Oh, I see. You sound like an APLer. We have similar tastes, but many good programmers disagree, so I doubt that long variable names are a predictor of program badness. Not every long name is FactoryManagerFactoryManagerFactory.

Consider a language like K, in which variables usually have one-letter names. The real code-size win for K is not that. It's that the language is so powerful that complex things can be expressed in remarkably compact strings of operators and operands. (Short variable names, I'd argue, are an epiphenomenon. It's because the programs are so small that you don't need anything longer, and longer names would drown out the logical structure of the program and make it harder to read.) Token count is a good metric here. Both line count and byte count come out artificially low, but token count can't.


I came back to say I've thought about your argument a couple more times and I think you're on to something there. The idea that long variable names, even when they add to readability, are a secondary indicator of code badness (because the code is too complex not to be able to get away with short names) is a subtle and interesting way to frame the problem. I'm surprised it didn't get more pushback from the 95+% of programmers who take the opposing view. I suppose this little corner of the thread is a quiet enough backwater that nobody noticed.

But I still don't see how you get around the objection that, according to your preferred metric, if you replace all the names with arbitrarily small character sequences, you get significantly smaller code - yet clearly not better code.


Also reduced its maintainability.


One metric I've seen is gzip-compressed size, which has the nice property that it identifies the size of the incompressible elements -- ie it discounts repetitive boilerplate.

Another interesting set of metrics is Halstead's "software science" metrics[1]. They fell out of favour because initially they were hard to count and didn't seem to correlate with anything else.

[1] http://en.wikipedia.org/wiki/Halstead_complexity_measures
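If anyone wants to play with the gzip metric, it's a couple of lines in Python (this is just the raw compressed size; the Benchmarks Game strips comments first, as mentioned elsewhere in the thread):

    import gzip

    def gzipped_size(path):
        # Compressed size discounts repetitive boilerplate but still charges for real logic.
        with open(path, "rb") as f:
            return len(gzip.compress(f.read()))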


I never understood the gzip one. Repetitive boilerplate is bad; why hide it?


You're trying to understand the "true" size of the software in spite of the idiosyncrasies of a given language.

As I noted somewhere above, "size" is an abstract, dimensionless quality. It can only be approached through proxies. The more the merrier, I reckon, especially if they turn out to correlate with different things.


In the case of most projects, copy/paste code is not just because of the language. It's because of lousy programmers. I've seen large codebases which are made up of a full 40% duplicate code. There's no way to blame that on the language.


You're missing the point of the parent: There is no 'one true' metric. If you use different metrics (actual lines, logical lines, gzip'd size, etc) you may well find different correlations.


But repetitive boilerplate is exactly the last thing that should get away scot-free in a measurement of code size.


It depends on what you want to know. Pure physical lines is one thing, "size" is another.


I want a way to measure how complicated a program is that's independent of language and obviously extraneous things like line length.


You may find that the Halstead metrics I mentioned are closer to what you're after.


I've changed my mind. I'm interested in what I originally said: what's the best way to measure code size, and what are those studies (if they exist). Otherwise we get into debates about size vs. complexity, which is actually less interesting IMO. Size as a proxy for complexity is good enough for me.


"I've heard it said for years that studies show the number of bugs grows roughly linearly with code size and that this holds true across any programming language."

I have absolutely no reason to doubt this, but I suspect this does not look deeply enough at the process.

Bugs, normally, get fixed, especially towards the end of a project, and it is much easier to eradicate bugs, and verify eradication, in a small project, than it is in a large one.

Poor bug-fixing, in the late stages of a large, buggy project, may well introduce further bugs, as well as discovering latent bugs, masked by the original ones.


The Mythical Man-Month


That's a marvelous book-length essay, but not a formal study. Or does Brooks cite research on this?


Writing as little code as possible to fully accomplish a goal has recently become a fundamental principle that I live by.

If I can write a piece of code in fewer lines, I'll do it. That's pretty obvious; we all would. But I try to take it a step further and consciously seek solutions that lead to fewer lines of code. Chopping down a large block of code is an incredibly gratifying feeling for me.

I find that writing less code while maintaining expressiveness usually leads to simpler solutions and, IMO, it is simplicity that reduces the bug count.


I find that in a team environment that I would prefer my co-workers write more lines of code, and longer lines at that.

While, given a moment or two, I can unpack a dense list comprehension (a one-liner), I would rather read several statements that add up to the same thing.

Of course there are many times when you could do the same thing with fewer lines, in a more elegant, straight-forward way. However I have a hard time believing that just having fewer lines is a sufficient goal for flexible, maintainable code.

Then again I'm pretty new at this :)



In my experience, it's the size of individual methods/functions that determines the number of bugs. More than 50 or 75 lines of code per routine greatly reduces its maintainability and increases the number of bugs (often bugs that are difficult to track down).


At my school (Epita, France) our C coding style standard mandated that all functions be <= 25 lines.

Even though some lines didn't count (like those containing a single curly brace), it was very tough, but always possible. This applied to all projects, small and large. They made us write mostly Unix apps, like an FTP server, a command-line NNTP client, a POSIX shell (I still remember how meticulous you had to be when reading all the man pages to implement process control and terminal control correctly!). Plus the code had to be portable across all 3 Unix OSes running at the school: NetBSD, Solaris, Digital Unix. This was in 2000-2001.

For example I just checked the FTP server I wrote for one of the assignments (I still have a copy): 3123 lines and all the functions are <= 25 lines of code. Such rigorousness definitely shaped the quality of the code I now write professionally, 10 years later...


That's awesome! I'm guessing it fits within what I defined - you look at the starting line number and the ending line number of a function, and the difference should be < 50 to 75 LOC. That includes inline comments (though notably not function-definition comments). Code clarity should also be prevalent - meaning, nothing fancy! ;-) Don't cheat the system with single-line if statements (for example). It's a really, really simple rule that works! People have argued with me, saying they needed more LOC for a routine, but not once has that proven to be true - at least not in the code I reviewed. And I'm, by far, not the sharpest tool in the shed. If I can do it, anyone can!


Our coding standard was very strict. It was not possible to cheat and save lines by writing, eg:

  if (func()) a = 1;
You had to write:

  if (func())
    a = 1;
Writing very complex C programs with functions <= 25 lines is definitely possible. All Epita students were routinely doing it!


I was thinking more along the lines of:

  if (func()) { a=1; }

/* curly braces didn't count in your allocation of LOC, but they would in mine. */


If you read the abstract of the paper that the post refers to, it actually says that the size of a _class_ affects the number of bugs _in that class_. That's something very different to the size/bugs correlation for a whole application.


This is a hugely important overlooked point. When the measurements have a systematic bias, a product optimized to those measurements will have systemic problems. In this case: large, easy-to-understand classes that are excessive in number and completely fail to interoperate correctly.


Size by itself is a very elusive metric and varies hugely depending on the language used. At least for a single method/procedure, I found Cyclomatic Complexity (http://en.wikipedia.org/wiki/Cyclomatic_complexity) to be the best predictor of maintainability, if not quality.
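For anyone who hasn't computed it: cyclomatic complexity is essentially one plus the number of decision points in a routine. A rough sketch for Python functions using the standard ast module (real tools count a few more constructs, so treat the numbers as approximate):

    import ast

    DECISIONS = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)

    def cyclomatic_complexity(source):
        tree = ast.parse(source)
        decisions = sum(isinstance(node, DECISIONS) for node in ast.walk(tree))
        # Each extra operand in an and/or chain adds another path through the code.
        decisions += sum(len(node.values) - 1
                         for node in ast.walk(tree) if isinstance(node, ast.BoolOp))
        return decisions + 1

    print(cyclomatic_complexity("def f(x):\n    return 1 if x else 2\n"))  # 2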


The metrics involved here are specifically OO metrics - inheritance depth, number of children and so on, but there's at least one, Weighted Method Count, which uses cyclomatic complexity as a weighting factor. It's described on page 7 of the paper.

It seems counter-intuitive to me too (that the complexity of a method doesn't matter), but perhaps if you have to have that complexity in your application, it's best to have it in one method rather than trying to spread it around with inheritance or some sort of pattern?


It is a result of the paper that once you control for lines of code, Cyclomatic Complexity no longer predicts defect rate.


This theory emphasizes the importance of service-oriented architectures. Note: I am specifically not endorsing WS-*.

Decoupling a larger application into smaller applications with well-defined interfaces should reduce cognitive load and, per this theory, perhaps defects as well.


The fact that this is apparently not painfully obvious is almost as pathetic as the fact that people actually try to use it to justify programming languages that reduce the lines of code instead of learning how to write code that isn't terrible. A giant, bloated codebase written in erlang is still a giant, bloated codebase.

That isn't to say erlang or any other language isn't worth learning (they are), it's just that no language can save you from bad programming.


I don't think the argument is that some language will completely "save you from bad programming", but rather that it will encourage bad programming less than some other language.

Some languages like Java require more lines of code and provide fewer mechanisms for dealing with complexity than other languages (say, Erlang), which means that you're more likely to write bad code in Java than in Erlang.

All languages are equal, but some are more equal than others.


I always had a bit of a problem with this sort of study because you can never know for sure that you uncovered all the bugs in a program (short of doing a formal proof of correctness). This in turn introduces biases. Case in point, it could be that a program with a higher number of LOC is actually more likely to have a real user base which then leads to a larger number of bugs being discovered.


"Bigger is just something you have to live with in Java. Growth is a fact of life. Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly."

I am reminded of http://qntm.org/files/hatetris/hatetris.html


I had a theory that code maintainability ultimately depends on the number of things a programmer can keep in their head simultaneously. That number, N in what follows, is usually between 4 and 12. So if a function has more than N parts it will be divided into 2 functions, if a class has more than N methods it will be divided, and so on. Anything that has more than N parts will be divided. And we must consider that N is not the same for everybody but varies: what is maintainable for one programmer might not be for another.

How do you measure your own N? It could be done like in the movie Rain Man: you throw toothpicks and must quickly count them. Start with a large number and remove some each round until you consistently count the number of toothpicks correctly.

By extension, there is a limit to the number of parts that one person can control; this number is N^N.


The unfortunately named CRAP method of measuring code quality uses a metric based on Cyclomatic complexity (http://en.wikipedia.org/wiki/Cyclomatic_complexity) and code coverage to estimate the change and maintenance risk of code (this page has the equation: http://www.artima.com/weblogs/viewpost.jsp?thread=215899). The paper cited in Vivek's article emphasises that reducing code length decreases cognitive complexity. I would bet that Cyclomatic complexity also correlates with bugs and maintainability on the same basis.
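For reference, the equation on that artima page, as I recall it, is CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m), where comp(m) is the cyclomatic complexity of method m and cov(m) its test coverage percentage. A quick sketch of how it behaves:

    def crap(complexity, coverage_pct):
        return complexity ** 2 * (1 - coverage_pct / 100) ** 3 + complexity

    print(crap(10, 0))   # 110.0 -- complex and untested: change at your peril
    print(crap(10, 80))  # 10.8  -- same complexity, well covered: far lower risk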


More insights about defect prediction in Thomas Zimmermann's publications at:

http://thomas-zimmermann.com/publications/list/Short

A lot of his papers are freely available as PDFs.


After rummaging around in code metrics, I've come to the conclusion that kLOC is the best estimator of the 'hardness' of a codebase. There are a few ways to slice it, of course (no-comment source only? statements only? semicolons only?), but, fundamentally, I do not see any pragmatic use for code quality metrics beyond "How many pages is this?". Everything boils down to the number of moving components in the system.


If you want to simplify it even further there is this: https://github.com/technomancy/bludgeon

    Bludgeon is a tool which will tell you if a given
    library is so large that you could bludgeon
    someone to death with a printout of it.


No code has no bugs.

(A koan.)


Conversely: No code has no features.


Inversely: This feature has no code: http://instantzendo.com (view source to see the code)


I remember that that exact program (written in C!) won a prize at one of the IOCCCs.

Admittedly, I'm pretty sure it won the prize for the "Most Egregious Abuse of the Rules", but it won nonetheless.


Yeah, but there's code there to return the no code page.



No features has no bugs.


So, does this mean that adding tests actually adds bugs to your overall program (main code + tests)? One might hope that well-designed tests at least push the bugs from the main code to the test... wonder if any research has been done on this particular question.


Does this just mean character count or statements/expressions? I've worked with APL-like languages which produce enormous ratios of character to statement/expression but they never seemed particularly easier to debug than say, Python.


I wonder if real-life organizations/bureaucracies obey the same law.


It'd be interesting to look at relative code quality vs. size on a per-language basis. Some languages take a whole lot less boilerplate to accomplish the same thing.


I'd love to know how this is affected when you include whitespace, and if code quality is measurably affected by how much can actually fit on the screen.


There is probably a proof out there for why fewer lines of code is better; but everyone who believes it is too busy pumping out features to bother articulating it.


I would say fewer AND human readable lines are better. The more both can be achieved, the better.


[...] my hypothesis that the number of bugs can primarily be predicted only by the total lines of code [...] I still haven’t found any studies which show what this relationship is like. Does the number of bugs grow linearly with code size? Sub-linearly? Super-linearly? My gut feeling still says “sub-linear”.

That's an interesting question. I'd bet the other way - for example, a 500 kLOC program having more total bugs than the sum of ten 50 kLOC programs.

But even if the sub-linear hypothesis were true, there's the yield problem that semiconductor manufacturers know well. Suppose you have, statistically, one fatal defect per 500 kLOC. That means it's hard to get a functional 500 kLOC program done. But you could get eight or nine out of ten 50 kLOC programs right ...
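To put rough numbers on that yield intuition (assuming, as chip yield models do, that defects land independently, i.e. a Poisson process):

    from math import exp

    rate = 1 / 500_000              # one fatal defect per 500 kLOC, on average
    print(exp(-rate * 50_000))      # ~0.905: about nine in ten 50 kLOC programs come out clean
    print(exp(-rate * 500_000))     # ~0.368: barely one in three 500 kLOC programs does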


The 500 kLOC program will have more bugs, because even supposing it can at best be divided into ten 50 kLOC components, each with the same average number of bugs as a standalone 50 kLOC program, the fact that the components must interact correctly will introduce more bugs. How many more is pretty unclear, but I am certain it is more.


I wonder if they correlated it with programming language, because this would suggest that the more terse a language is, the less prone to errors it is.


I just cut my Visual Studio font size down to 6, but my code runs the same.


Effects are not instant. You must keep your code at font size 6 forever.




