Bug finding is slow in spite of many eyeballs (haxx.se)
166 points by jonaslejon on Feb 23, 2015 | 67 comments



Despite being big and different and having developed some grumpy old forum symptoms, HN is still smart. As such, I think we have a great example of memetic churn and progression here. We start with manifesto-ey essays trying to cut through some habitual thinking and paradigms. The new fundraising model. Bootstrapping. Web-based SaaS is an ocean with a lot of paradigm shifts: the tech options, the business models, the development cycle. Those get people excited and defensive and there's fun to be had arguing it out. There's all sorts of pedantic arguing on the fringes pointing out that this thing's not strictly true. Then there are manifesto-busting pieces. Case studies and anecdotes about manifesto-ed approaches going terribly wrong.

I'm not down on this process, it's progress.

Anyway, I think this is a part of it. "Enough eyeballs" is a big hairy hefty concept, part of the big hairy open source concept. No one could have predicted open source. The cultural and economic (BTW, economy is a big part of culture) machinations of open source add up to something big and impressive and unpredictable. "Enough eyeballs" is a (somewhat squishy) big concept trying to make sense of it. This is like trying to boil down 'platform wars' or 'platform openness' into slogans and rules of thumb. You may come up with an interesting, informative and useful way of looking at things. But underneath it is a big hairy tangle of machinations, circumstances and ungodly complexity. Hence the need for rules of thumb and manifestos in the first place.

This post is part of the more mature stage. The audience is expected to be on Linus' side already and the novel part is discovering that the rule of thumb isn't working in all cases. Digging into the machinations. Who are all these eyeballs? What are these bugs? Are some types of eyeballs more or less effective against some bugs?

I'm not sure where I'm going with this… be excellent to each other?


Anyone who has a couple of years in software engineering knows that there are diminishing returns on the number of "eyeballs" and that some problems are just god damn hard or need to be looked at with the problem in mind.

Software quality is orthogonal to the openness or closedness of the source code.

I think it is more about the will to do something great, the ability to listen to the user base and of course the technical skills of the authors.


The article shows it's not about that:

> Perhaps you think these 30 bugs are really tricky, deeply hidden and complicated logic monsters that would explain the time they took to get found? Nope, I would say that every single one of them are pretty obvious once you spot them and none of them take a very long time for a reviewer to understand.

For me the takeaway is that even something as basic and mature as curl can't get it right in C. Two buffer overflows in as many years, along with TLS and HTTP failures. It's past time to move onto better tools. I hope the "bitcoin piñata" calls attention to the fact that there's now a full SSL stack available in a substantially safer language.


It's the devil that we know. New environments: new exploit classes.


There are some great examples of that (return-oriented programming as a way of getting around non-executable data segments), but surely there are some tools where adopting them was a pure win for safety and correctness over the status quo.


> or need to be looked at with the problem in mind.

The article addresses how hard some of the bugs were (they were not hard), but additionally, sometimes (often?) code needs to be looked at without the problem in mind -- who hasn't written code (or a paper, or a comment on HN), proofed it immediately, committed it, and then realized it has an error your mind elided because you were so sure "what you meant" was there that you glossed over a dumb mistake? Having a different mind review these things avoids that problem.

Edit: I glossed over this -- I was also referring to proofing your work much later, so it's not still fresh in your mind "as you think it is". So, bring a "different mind" to the table.

s/completely different mind/different mind/


I don't think orthogonal is quite right either.

I think it's just a matter of being complicated when you dig into it. Open Source has certain strategies for quality that work, certain advantages (and disadvantages), different economies which mean different abundances and scarcities of different resources. This "Linus’ Law" expresses something novel about the way open source projects can benefit from the advantages of being what they are. It's not an empty statement, just not an airtight, always-right rule.


Yeah, number of eyeballs helps but it's not a catch-all. In some ways it could actually be a hindrance, since less-experienced eyeballs may not be able to mind all the necessary details, or may introduce further bugs/vulnerabilities. For our purposes, these are "bad programmers", whether or not they're good programmers on other stuff.

A simple model for code quality is something like Q = (good programmers) / (good + bad programmers). I would actually argue that it's not even related to the average, and that bad programmers can degrade quality disproportionately to their actual number. I think it might be something closer to Q = (good programmers)^2 / (good + bad programmers)^2. This is what it seems like people are getting at with the whole "negative productivity" idea in the good/bad/10x programmer framework.
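
To put rough numbers on that (a toy sketch with a made-up team, not anything from the article or a real study), the two models diverge quickly:

    /* Toy comparison of the two quality models above for a
     * hypothetical team of 8 good and 2 bad programmers. */
    #include <stdio.h>

    int main(void)
    {
        double good = 8.0, bad = 2.0;
        double linear  = good / (good + bad);                /* 0.80 */
        double squared = (good / (good + bad))
                       * (good / (good + bad));              /* 0.64 */
        printf("linear: %.2f  squared: %.2f\n", linear, squared);
        return 0;
    }

Under the squared model, going from 2 bad programmers to 5 drops Q from 0.64 to about 0.38, which is roughly the "a few bad contributors drag the whole thing down" intuition.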

Openness/closedness of the source code isn't the whole picture but the open-source model can more easily run into the too-many-cooks problem if the source is not properly gate-kept and reviewed in the aggregate. Thus I don't think it's entirely orthogonal. Of course closed-source software is typically commercial, which runs into its own set of pressures that degrade code quality.

I think much more sophisticated testing systems are what will really boost code quality. More advanced detection suites that throw more warnings, mandated full unit test coverage, plus CI frameworks that make sure your code isn't merged if it doesn't clear the warnings and pass the full-coverage unit tests. Randomized address systems and mandated array-bounds checking that catch undefined behavior and off-by-one errors. Basically, making things fail noisily instead of silently and forcing contributors to pay attention.

The C compiler in particular is really bad on the "throwing warnings" thing, even before you get yourself into other kinds of trouble. As in, according to the C standard, "rm -rf /" (or literally any other behavior the compiler wants) is a valid output behavior if you miss a closing quote [1], rather than a compile-time error. That's an absurd definition in a security-minded world, and that's just the most egregious example of undefined behavior allowed by the C standard.
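
To make the "fail noisily instead of silently" point concrete, here's a minimal sketch (mine, not from the article) of the classic off-by-one that a plain build happily runs and a sanitized build reports loudly:

    /* demo.c -- a stack off-by-one. A plain "cc demo.c" build usually
     * just reads whatever happens to sit after the array; building with
     * "cc -Wall -Wextra -fsanitize=address,undefined demo.c" makes the
     * run abort with an out-of-bounds report instead. */
    #include <stdio.h>

    int main(void)
    {
        int totals[4] = {1, 2, 3, 4};
        int sum = 0;

        for (int i = 0; i <= 4; i++)   /* "<=" walks one past the end */
            sum += totals[i];

        printf("sum = %d\n", sum);
        return 0;
    }

A CI gate that builds with those flags and runs the test suite turns this class of bug from silent corruption into a failed merge.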

JVM-style managed code and declared exceptions are annoying but do seem to be a step in the right direction from a security perspective.

[1] http://blog.regehr.org/archives/213


It is a long tail distribution and it happens because of different computational complexity classes. I don't really know if there is anything to be done about it.

I'll give you a high grumpy old forum five though.


Eventually, we need to move towards formally verified software. Some bugs slip past any number of human eyeballs. It won't be feasible to do formal verification of all software anytime soon, but it should be done for operating system kernels, networking and crypto libraries, virtual machines, compilers, and similar security-critical software.

Switching to programming languages with type systems powerful enough to protect against common classes of bugs is a good first step (but not enough).


I feel like if we did this we'd never get to an actual working system. How do you formally verify that a remote file system interacting with a faulty hard disk is doing the right thing? This reminds me of Hurd, which is arguably a superior design that will never be finished.


That's certainly an issue, and there's no easy way around the fact that even formally specifying the behavior of a program that has to interact with the outside world is problematic.

Nonetheless, some of the academic research that has been done on formal verification is quite impressive (including the development of actual nontrivial working software). And it can be done piecemeal -- you could formally certify the functional parts of a crypto library (and demand this standard of future replacements) independently of what goes on in the world of file system drivers.

Realistically speaking, as worse is better, I think a large push for formally verified software probably won't happen for decades to come, but it will happen eventually. It already happened in parts of the hardware industry (Intel certainly don't want the division bug to happen ever again). The main issue is that we need tools that make development easier, to allow keeping up with the rapidly changing requirements in the software world.


>we need tools that make development easier, to allow keeping up with the rapidly changing requirements in the software world.

The practice of test-driven development is a great solution to this. The problem is very few people use it, many people are actually against it, and many of those that do do it, do it incorrectly (not on purpose). I would think that small chunks of singly-purposed functionality would be much easier to verify.


There's kind of a considerable gap between formally verified software and test-driven development, though. For example, a parser or the routines that go into it might pass dozens or hundreds of unit tests, yet still fail to parse certain inputs correctly, fail to parse them in exactly the same way as another parser (as Langsec researchers have emphasized, often producing possible time-of-check/time-of-use vulnerabilities), or even have a memory corruption bug.

I don't mention this in order to criticize test-driven development or the improvements it can bring to software reliability or safety, just to point out that there's still a big gap from there to a formal proof of correctness.
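
To make that gap concrete, here's a hypothetical sketch (names and code made up for illustration): a tiny parser whose entire unit test suite passes, yet which still reads out of bounds on an input none of the tests exercise.

    /* parse_hex_byte() turns "7f" into 0x7f. It quietly assumes the
     * caller always supplies at least two characters. */
    #include <assert.h>
    #include <string.h>

    static int parse_hex_byte(const char *s)
    {
        static const char digits[] = "0123456789abcdef";
        const char *hi = strchr(digits, s[0]);
        const char *lo = strchr(digits, s[1]);  /* may read past the end */
        if (!hi || !lo)
            return -1;
        return (int)(hi - digits) * 16 + (int)(lo - digits);
    }

    int main(void)
    {
        /* The "test suite": every case here passes... */
        assert(parse_hex_byte("00") == 0x00);
        assert(parse_hex_byte("7f") == 0x7f);
        assert(parse_hex_byte("zz") == -1);
        /* ...but parse_hex_byte("") reads one byte past the end of its
         * input, and strchr(digits, '\0') "finds" the terminator, so a
         * short string isn't even rejected. No test here exposes it. */
        return 0;
    }

A fuzzer stumbles onto the short input in seconds; the hand-written tests never will, and only a proof (or at least an exhaustive argument about input lengths) closes that gap completely.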


What are some examples of non-trivial working formally verified software?


There is Quark [1], a web browser with a formally verified kernel. Here, the kernel is a process which manages other slave (helper) processes which actually render the page.

[1]: http://goto.ucsd.edu/quark/


http://compcert.inria.fr/ is the first that comes to mind.


Sure, but there will still be bugs. We might have to figure out what a bug is semantically on the next layer of abstraction as we are developing it (and have the capacity to formally describe it), but there will be opinions, and thus there will be bugs.


Of course. In the end, we don't even have a perfect understanding of the physics on which digital technology is built. Nonetheless, we can evidently do much better than we do today, and delegate more of the hard work to our robotic slaves.


I think a wise person would hesitate at judging a task hard or easy.


"Because in reality, many many bugs are never really found by all those given “eyeballs” in the first place. They are found when someone trips over a problem and is annoyed enough to go searching for the culprit, the reason for the malfunction."

That /someone/ is able to go searching for the culprit instead of having to rely on someone ELSE to look at the source and figure out what is going on is the whole point of the quote, no?


Hazy memory, but three decades ago I read some research from the Open University in the UK that said, roughly, that the number of bugs found is a function of the number of users.

Some apparently "bug free" programs are actually riddled with bugs but they are not found because almost nobody uses them. They are probably not fixed for the same reason ;-)


Yes, open source is indeed awesome that way. Anyone can, in theory, go chase down that bug they tripped over. And many people do! But not nearly as many people as are capable of it (say, developers with some familiarity with the language, et al).

I've made just a few bug fixes to open source software that I didn't have some ownership of. From talking with other devs over the years, that makes me quite unusual, in that almost none of them have made any fixes to other people's code. And here I was feeling bad that I hadn't done more.


I don't think it should be surprising that few users actually look at the code or are willing to dig into a foreign code base. It's still true that open source makes it possible, which is a huge step up from any other model. There's a lot of backlash right now due to some very high profile bugs that have been around for a very long time. But would those bugs have been found if the programs hadn't been open? Also, look at what happened after Heartbleed: another group of people decided to dig into the OpenSSL code base and try to clean it up, finding and fixing lots of other issues, without any authority from the original authors. That's the benefit of openness, in my opinion.


Yes, the "many eyeballs" is a part of it. But I think the author's point is that they are only half. The other half is someone has to trip over the bug. That is, people aren't finding these bugs purely through reading the source code.


There is a huge difference between a code review and the specific search for the cause of a known, defined bug. Should we rephrase Linus' law as "given enough eyeballs, all bugs' causes are shallow"? But maybe that's what the sentence actually meant for the first person that wrote it.


So how many bugs remain?

Mostly rhetorical question, but can any extrapolation be done? If you go back five years, can any of those numbers correlate to the findings since? Do any metrics such as cyclomatic complexity, #defects/kLoC[1][2], unit tests or code coverage help?

In most cases "defect" is not well-defined, nor in many cases easily comparable (e.g., a typo in a debug message compared to handling SSL flags wrong). Is it a requirements or documentation bug, where the specification given to the implementer was not sufficiently clear or was ambiguous? Also, when do we start counting defects? If I misspelled a keyword and the compiler flagged it, does that count? Only after the code is committed? Caught by QA? Or after it is deployed or released in a product?

Is it related to the programming language? Programmer skill level and fluency with language/libraries/tools? Did they not get enough sleep the night before when they coded that section? Or were they deep in thought about 4 edge cases for this method when someone popped their head in to ask about lunch plans and knocked one of them out? Does faster coding == more "productive" programmer == more defects long term?

I'm not sure if we're still programming cavemen or have created paleolithic programming tools yet[3][4].

p.s.: satisfied user of cURL since at least 1998!

    [1] http://www.infoq.com/news/2012/03/Defects-Open-Source-Commercial
    [2] http://programmers.stackexchange.com/questions/185660/is-the-average-number-of-bugs-per-loc-the-same-for-different-programming-languag
    [3] https://vimeo.com/9270320 - Greg Wilson - What We Actually Know About Software Development, and Why We Believe It's True
    (probably shorter, more recent talks exist (links appreciated))
    [4] https://www.youtube.com/watch?v=ubaX1Smg6pY - Alan Kay - Is it really "Complex"? Or did we just make it "Complicated"?
    (tangentially about software engineering, but eye-opening for how much more they were doing, and with fewer lines of code) (also, any of his talks)


While it is technically possible to use statistics as "proof" with N=30, it is stretching it a bit, IMHO.

For example, stating that the amount of reports per year corresponds to the amount of code added, because both are "somewhat linear", is not very solid. I could just as well state that the amount of reports per year is "somewhat exponential" and conclude that it does not correspond to the amount of lines of code added.

This does not make the point overall any less true; it is just that the underlying numbers are too few to draw any grand conclusions from.


Might I humbly suggest that anybody serious about this issue read (sadly, the late) Manny Lehman's "FEAST" publications? He attempts to quantitatively model software evolution, which includes complexity, errors of omission (limitations of domain model), errors of commission ("bugs"), etc. It is fascinating reading. I remember many "Aha!" moments when seeing the graphs. It also contains many quantitatively-derived principles one can operate by, some of which underlie pg's "beating the averages" argument. His wikipedia page is here https://en.wikipedia.org/wiki/Manny_Lehman_%28computer_scien..., and the FEAST pubs are here http://www.eis.mdx.ac.uk/staffpages/mml/feast2/papers.html


I've been using open source for ages, and have rarely taken the time to look at the source.

But, at least it's there.

I sometimes wish there wasn't such motivation to hack software. If we were all working towards a common good (in my case I'd like to see us doing a bit of space exploration, renewable energy tech etc), we wouldn't need to exploit stuff.

But, we're humans. Greedy little scumbags hehe, always looking for the short term gain. All varying forms of politicians.


Whereas closed source proprietary software has no eyeballs on it. Open source is at least an opportunity to identify problems by third parties without reverse-engineering. Open source also allows code analysis tools to do automated tests across wide numbers of codebases.


And that is the key IMHO. Closed-source requires enough commercial incentive by one firm to look at it. Open-source requires enough incentive in the aggregate across many people, each of whom may have their own reasons... and who cares what they are.

Most bugs I have seen figured out have been a collaborative effort. One person finds one part, which leads to the next person figuring something else out, etc. Much harder to do when it is just your job, and you may not even be paid for this work.


Closed-source requires enough commercial incentive by one firm to look at it.

Or a motivated attacker to reverse it for exploits.


> Whereas closed source proprietary software has no eyeballs on it. Open source is at least an opportunity to identify problems by third parties without reverse-engineering. Open source also allows code analysis tools to do automated tests across wide numbers of codebases.

Shared source (http://en.wikipedia.org/wiki/Shared_source) also has these properties (but not the freedom properties of open source).


Closed source does in fact have eyeballs on it. Vulnerabilities are found in software such as Windows by those who are only looking at the binary.

Automated testing tools are also available at the binary level.


Closed source often has eyeballs specifically paid to... eyeball it.


All four combinations exist in nature: open source code with plenty of eyeballs, open source software with no eyeballs, closed source software with tons of reviewers, and closed source software with no reviewers. It'd be interesting to see what the percentage breakdown is amongst these, but it probably wouldn't be surprising.

It's also another tick in the box as to why pre-checkin code review is so important - bad code is often immortal, and it can be really hard to patch out bugs if people have grown to rely on broken behavior, so it's best not to get yourself in that state to begin with.


Also interesting would be a breakdown by language. For example C++ versus Haskell.


"bad code is often immortal." I have to remember to quote that!


The question is: what do the brains behind the eyeballs do?

Do they in fact register the bug, ignore it, file a report, solve it? Without source code we are left to trust the vendor to actually find and solve bugs. The vendor and/or its staff might be under pressure to leave bugs in place. We just don't know.


You have to trust the vendor's interest in making a profit. Nothing wrong with that :-)

Closed source (usually) gives you Ts&Cs, a support model and some level of guarantee of fitness for purpose, which is why we pay to use it. This is not to say that there aren't bugs that seem to take forever to fix, but usually that is not the case, at least not when there is a risk of the product becoming unsalable.

Of course there are closed source programs or appliances that just are not worth their cost because they fail to mitigate the risks of having us trust what we cannot see, but these usually fail to gain any significant market share.


In these days, with SuperFishes and Lenovos selling out their customers for a few bucks more, trusting that the vendor has an interest in making a profit might not by itself be OK.


>You have to trust the vendor's interest in making a profit. Nothing wrong with that :-)

Sometimes there are conflicts of interest. We have documented cases of 3-letter agencies paying companies to leave bugs or unsafe options in code. Sometimes the backdoors may be more valuable than the product itself.

I suspect many corporate clients are finding out these days that the SSL MITM software they used made their infrastructure vulnerable.


This is very true. However, trusting that open source does not include such vulnerabilities is a leap of faith we cannot make, which is why auditing and quality control processes are needed.


But just because they're paid to do it doesn't mean they care, or that they're any good at it.


You can apply that attitude in general to open source software as well.

Most people using open source software trust it as implicitly as they would have to trust closed source software; they have to, because reviewing and comprehending all of the code running on any typical machine is impossible. Most people simply use open source and don't care about (or aren't capable of) dealing with the code, and of the ones that do, there is no guarantee that they're going to be competent.


Totally true - I think if you want to use open source you should be prepared to read the code, both to understand it and for debugging.


Headcount.

"Given enough headcount, all problems are solved in line with the time plan" is the enterprise version of the law. With enough engineers, you can just treat them as replaceable units, just numbers in an Excel sheet. Move them around as needed to meet staffing requirements, or resource allocation requests as they are often called.

(No, I'm not cynical at all.)


maybe? How can we be sure?


Linus' law may scale, but even assuming these eyeballs produce bug fixes, applying those fixes to the source tree does not scale as well.

I have taken time to put my eyeballs on bugs in spidev's ioctl() and TI's spi driver but my bug fixes are not in the tree.

Signing off, adhering to the source standard and attaining enough respect from the established devs to get your fix accepted are the limiting factors.

I already invested significant amounts of time finding these bugs and fixing these issues; I don't have any more to spend to make Mark Brown or other kernel devs happy.

I don't even care about getting the credit for my fixes, but it seems the kernel devs don't want to take my code to the next step and get it integrated.


I've got some different takeaways. First, time-to-discovery is different from the shallowness of the bug. Increasing the number of eyeballs looking at the code increases the likelihood that:

  a) someone will encounter the problem.
  b) that someone will be interested enough to dig into it.
  c) they will then find the problem before giving up.
That's the power of many eyeballs. It's an expression of interest. With fewer eyeballs, you might get bug reports (a), but not have enough people looking to get (b+c) out of the community. With closed source, (b+c) can _only_ be provided by the team. Open Source means that this can be provided by a sufficiently large community.
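
A back-of-the-envelope way to see why sheer numbers matter for (a)+(b)+(c) (the probabilities below are made up, purely for illustration):

    /* If each user independently hits the bug with probability pa, cares
     * enough to dig in with pb, and actually finds the cause with pc,
     * the chance that at least one of N users does all three is
     * 1 - (1 - pa*pb*pc)^N.  Link with -lm. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double pa = 0.01, pb = 0.05, pc = 0.5;
        const double p = pa * pb * pc;
        const int users[] = {100, 10000, 1000000};

        for (size_t i = 0; i < sizeof(users) / sizeof(users[0]); i++)
            printf("%8d eyeballs -> P(someone finds it) = %.3f\n",
                   users[i], 1.0 - pow(1.0 - p, users[i]));
        return 0;
    }

With those numbers, 100 eyeballs track down the cause about 2% of the time and 10,000 about 92% of the time; closed source effectively caps N at the size of the team.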

That open source projects are having security bugs reported can probably be explained by economics. It's becoming harder to find security problems in closed source projects (windows), so researchers shift to open source ones. With an open source project, there's a lot of low hanging fruit to be had with static analysis, copied code and fuzzing. Closed source is a lot harder, so people go for where the ROI is good - either towards the rewards or towards open source for practice.

Finally, shifting to Java just shifts the attack to the JVM, which is just as hard to secure. I still remember the year of Java exploits, complete with a remote DoS attack based around sending a floating point number to the server[1]. There will always be bugs. If your goal is to write secure software, open source is good. If your goal is to avoid bad press, making it expensive to test is probably the way to go instead.

[1] http://www.oracle.com/technetwork/topics/security/alert-cve-...


And: d) the bug report and fix will not be ignored.

Considering how OpenSSL handled bug reports and fixes (by letting them sit in the tracker for years), or Ulrich Drepper's, let's say, less than welcoming attitude, the work done by the eyes can be a waste. And the many eyes soon wither to nil.


Well, the conclusion is not surprising. Bugs aren't found because many people "read" the source code, but because many people of many different skills use it, and therefore every problem that is hard for you will some day find a person who has the specific domain knowledge to fix it much more easily.

Also I'm a little surprised at the size of the dataset and the choices, given that open source probably fixes an easier bug faster than a higher priority one.


Linus' Law is not some kind of catch all that applies to auditing code for security weaknesses. It specifically refers to the rapid quality control that happens when you release early and release often - the bazaar method of software development, as outlined here:

http://www.catb.org/esr/writings/homesteading/cathedral-baza...


I always understood this saying to be referring to the difficulty of fixing bugs once known. Some bugs are really hard to understand, reason about, and figure out how to fix. Given a large number of contributors, someone is likely to have just the right mindset and familiarity that the bug is easy for them to understand and fix (it is "shallow" for them).


This article and your observation remind me that there's a huge gap between ways in which bugs might be "shallow".

If Linus's Law and so on really did rely on people encountering the bugs by chance in everyday use of the software, it's no wonder (in hindsight) this doesn't help with a lot of the security bugs we face, many of which would never be triggered randomly in normal use, but rather require constructing elaborate attack scenarios. Even those that might occur by chance are not likely to be repeatable, and so not likely to get reported or analyzed.

Maybe this points to a change in our prototypical concept of a "bug". When ESR first wrote "The Cathedral and the Bazaar", I would have associated "bug" with something like "the TIFFs produced by program A can't be read by program B", or "program C seems to crash if you have a non-ASCII filename", or "program D drops its network connections if more than 65536 packets are received". Today, I would associate it with something like "an attacker who sends a certificate containing an extension with invalid ASN.1 encoding that follows one or more syntactically valid extensions that are marked Critical and that are unknown to the user-agent can get remote code execution" or "an attacker who sends an XML payload that is parsed correctly by library A and incorrectly by library B due to discrepant handling of Unicode escaping can request operations that should be forbidden".

Well, a literal hot-off-the-presses example would be a bug in handling multibyte characters in the regular expression library in Flash Player, which was exploitable:

http://googleprojectzero.blogspot.com/2015/02/exploitingscve...

In other words, these bugs are often complicated artifacts that require research to find and malice to use -- not annoyances and breakages that are frustrating end users every day.


Given enough eyeballs, all bugs are shallow.

Given the universal quantifier, one would assume that ESR meant security bugs here as well.

Also, it is stated as a law, but ESR did not back it up with much statistical evidence to support it. I would be surprised if it holds in reality for all bugs.

Extra questions:

- Does releasing early and often lead to more (non-shallow) eyeballs? Not really: see Heartbleed or the OpenSSL RNG bug in Debian.

- Do you need more eyeballs, or just the right eyeballs? (Given the right eyeballs, all bugs are shallow)

- Isn't the effect of moving to a safer language larger?


- Does releasing early and often lead to more (non-shallow) eyeballs? Not really: see Heartbleed or the OpenSSL RNG bug in Debian.

The existence of those bugs doesn't disprove the claim. You'd need to show that software that is not released early and often doesn't have (even) fewer eyeballs.


This law mostly applies to usability bugs. If a driver isn't working, somebody is going to notice. Security bugs on the other hand are often invisible during all forms of standard use. Heartbleed for example will never disrupt the user experience. The only way to find that problem is to be actively looking for it.

Generally speaking, 'eyeballs' aren't actively looking for problems. They're passively using software and noticing when something seems out of place. Many security bugs will never seem out of place, and are only discoverable when someone is intentionally probing for security bugs.


Yes, that's the party line. If it was true, the number of open bugs would decrease over time. For most open source programs, it increases.

Mozilla has passed the 1 million bug mark.


To be fair, mozilla uses bugzilla to track everything, not just "bug" bugs. Each new feature in development has a bug, or multiple bugs. When code is refactored there is a bug for that, when somebody wants commit access there is a bug for that, when an employee needs a new laptop there is a bug for that, when a community organizer wants some money or gear for an event there is a bug for that, and so on... Other organizations use their bug/issue trackers in a similar manner.


Mozilla has passed the 1 million bugs _ever filed_ mark. Across all Mozilla products, for that matter.

Some possibly more useful statistics:

For the "Core" product (read: Gecko), there are as of right now 52k open bugs and 230k closed bugs.

For the "Firefox" product (desktop, not Firefox android), there are 21k open bugs and 128k closed bugs.

Note that these "bugs" include feature requests, so unless you think the number of requested features will decline over time....


> For most open source programs, it increases.

I know of at least one study that contradicts this claim:

"We found that with shorter release cycles, users do not experience significantly more post-release bugs and bugs are fixed faster, yet users experience these bugs earlier during software execution (the program crashes earlier)."

http://swat.polymtl.ca/~foutsekh/docs/Khomh-MSR-2012.pdf


I've always read the expression backwards: With enough eyeballs, the shallow bugs will be found.

Things like spelling errors in print statements, comments etc. get corrected. But off-by-one errors, use-after-free and all kinds of subtle logical problems that only manifest once in a while and after long execution will require a focused effort to find (and, more and more, good tools). Daniel's description matches this sentiment.


At least you've never read it as "With enough bugs, the shallow eyeballs will be found".


True. Nor that enough eyeballs with bugs will be found at the shallow end.


To quote Theo de Raadt:

My favorite part of the "many eyes" argument is how few bugs were found by the two eyes of Eric (the originator of the statement). All the many eyes are apparently attached to a lot of hands that type lots of words about many eyes, and never actually audit code.



