I don't think it is a surprise that Linux is about adding features in an organic way, instead of being well thought out like the *BSDs.
However, saying that no testing happens in the Linux kernel is dishonest, to say the least: there are automated test suites maintained by big corporations, like LTP [1] or autotest [2]; thousands of people run different versions of unstable/mainline kernels with different configurations; security researchers run fuzzers and report the issues they find; multiple open source projects run their test suites on current versions of the Linux kernel (which in the end also serves as a test of the kernel itself); and so on.
Linux is basically the kind of project that is big and impactful enough that it naturally gets testing for free from the community.
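For anyone who hasn't looked at what suites like LTP automate, here is a toy C sketch of the general shape of such a test: it exercises one documented syscall behaviour and reports pass/fail. This is not LTP's actual API or harness, just an illustration of the kind of check those suites run by the thousand.

    /* Toy sketch of an LTP-style syscall check; not LTP's real API,
       just the shape of the tests such suites automate. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Documented behaviour: open() on a missing path without
           O_CREAT must fail and set errno to ENOENT. */
        errno = 0;
        int fd = open("/no/such/file", O_RDONLY);

        if (fd == -1 && errno == ENOENT) {
            printf("PASS: open() reports ENOENT for a missing path\n");
            return EXIT_SUCCESS;
        }

        if (fd != -1)
            close(fd);
        printf("FAIL: expected -1/ENOENT, got fd=%d errno=%d\n", fd, errno);
        return EXIT_FAILURE;
    }

The real suites wrap checks like this in a common harness, run them across architectures and configurations, and flag regressions automatically.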
This has always been my problem with the "many eyes make all bugs shallow" theory. It's a random walk. The most common features and configurations will get tested over and over and over, well beyond the point of usefulness. Less common features might not get tested at all. In fact it's worse than a random walk, because usage is even more concentrated in a few areas. Yes, the fuzzers etc. do find some bugs, but relative to the size and importance of the project itself, their contribution is smaller than it would be on most other kinds of projects.
There's just no substitute for rigorous unit and/or functional tests, constructed with an eye toward what can happen instead of just what commonly does happen in the most vanilla usage. Unfortunately, UNIX kernel development - not just Linux - has never been strong on that kind of testing. Part of the reason is the inherent difficulty of doing those sorts of tests on such a foundational component. Part of it is ignorance, which I mean as simply "not knowing" and not as an insult. Part of it is macho kernel-hacker culture. Whatever the reasons, it needs to change.
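To make "unit tests" concrete in a kernel context: this is roughly what an in-kernel unit test looks like when sketched with the KUnit framework's macros. The suite and case names here are invented for illustration, not taken from the tree.

    /* Minimal in-kernel unit test sketched with KUnit macros;
       the suite and case names are illustrative only. */
    #include <kunit/test.h>

    static void example_math_test(struct kunit *test)
    {
        /* Assert on kernel-internal behaviour directly, with no
           userspace, hardware, or timing in the loop. */
        KUNIT_EXPECT_EQ(test, 2 + 2, 4);
    }

    static struct kunit_case example_test_cases[] = {
        KUNIT_CASE(example_math_test),
        {}
    };

    static struct kunit_suite example_test_suite = {
        .name = "example",
        .test_cases = example_test_cases,
    };

    kunit_test_suite(example_test_suite);

The hard part isn't the mechanics, it's carving foundational code into pieces small enough to be tested this way - which is exactly the discipline that's been missing.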
I have been in a few situations where I could have found bugs, because I was exploring deep behavior, but the lack of legible documentation on the expected behavior made me back off. I have no clue what's expected or not; the man pages are just a joke and lkml is mostly unreadable.
> The most common features and configurations will get tested over and over and over well beyond the point of usefulness. Less common features might not get tested at all.
Not that I wouldn't want a full-featured Unix-like kernel with a comprehensive[1] test suite (and I don't know of any major OS that has this: not Windows, macOS or Linux), but I think this is fine.
Common setups work fine; uncommon configurations may have some problems, with possible workarounds. Most people will not run mainline kernels anyway, so a workaround is acceptable.
[1]: What I mean by comprehensive is basically having tests for every possible configuration, which is basically impossible anyway. Probably the closest thing we can get is a formally proven OS, but I don't think we will ever have a general-purpose OS that is formally proven.
> tests for every possible configuration, that is basically impossible anyway
Agreed. Expecting 100% across the entire kernel would be totally unreasonable. OTOH, coverage could be better on a lot of components considered individually.
Any component as complex as XFS is going to have tons of bugs. I don't mean that as an insult. I was a filesystem developer myself for many years, until quite recently. It's the nature of the beast. The problem is how many of those bugs remain latent. All the common-case bugs are likely to be found and fixed pretty quickly, but that only provides a false sense of security. Without good test coverage, those latent less-common-case bugs start to reappear every time anything changes - as seems to have been the case here. That actually slows the development of new features, so even the MOAR FEECHURS crowd end up getting burned. Good testing is worth it, and users alone don't do good testing.
I think filesystems in the kernel do have automated tests; I know xfstests [1] exists, at least. And they exist exactly because filesystem bugs are generally critical: a filesystem bug generally means that someone will lose data.
[1]: Despite what the name may suggest, xfstests is run on other filesystems too. Here is an example of xfstests ported to ZoL (ZFS on Linux): https://github.com/zfsonlinux/xfstests
Yes, xfstests exists. I've used it myself and it's actually pretty good for a suite that doesn't include error injection. But part of today's story is that xfstests wasn't being updated to cover the last several features that were added to XFS. The result is exactly the kind of brittleness that is characteristic of poorly tested software. Something else changed, and previously latent bugs started popping out of the rotten woodwork.
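To give a flavour of what a single xfstests-style case boils down to, here is a toy data-integrity check in plain C: write a known pattern, fsync, read it back, compare. The path and size are arbitrary, and a real case would also cover mount options, error injection, and crash consistency on a dedicated scratch device.

    /* Toy data-integrity check in the spirit of an xfstests case.
       The path and size are arbitrary, for illustration only. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define TESTFILE "/tmp/fs-smoke-test"
    #define LEN      (1 << 20)   /* 1 MiB */

    int main(void)
    {
        char *wbuf = malloc(LEN), *rbuf = malloc(LEN);
        if (!wbuf || !rbuf)
            return EXIT_FAILURE;
        memset(wbuf, 0xa5, LEN);   /* known pattern */

        int fd = open(TESTFILE, O_CREAT | O_RDWR | O_TRUNC, 0600);
        if (fd < 0 || pwrite(fd, wbuf, LEN, 0) != LEN || fsync(fd) != 0) {
            perror("write phase");
            return EXIT_FAILURE;
        }

        if (pread(fd, rbuf, LEN, 0) != LEN || memcmp(wbuf, rbuf, LEN) != 0) {
            fprintf(stderr, "FAIL: data read back does not match\n");
            return EXIT_FAILURE;
        }

        printf("PASS: pattern survived write/fsync/read\n");
        close(fd);
        unlink(TESTFILE);
        return EXIT_SUCCESS;
    }

The interesting cases are the ones this toy skips: what happens on ENOSPC, on a torn write, after a simulated power cut - exactly the less-common paths that don't get exercised for free by ordinary users.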
What’s crazy for me, after reading all this, is how wonderfully stable my Linux (kernel-level)[0] experience has been. I’ve never used any non-ext2/3/4 filesystems, granted, so I haven’t used this code, but I find it hard to believe that these findings are indicative of the code I have used on a relatively run-of-the-mill amd64 machine. So maybe if you’re like me, using a fairly standard distro with the official kernel on somewhat normal hardware, you would have the benefit of millions others testing the same code.
[0]: I have had more than my fair share of user land problems, but I have come to expect that on any platform.
Yeah I've also had good experiences with Linux reliability.
But that's because I intentionally stay on the "happy path" that's been tested by millions of others. I avoid changing any kernel settings and purposely choose bog-standard hardware (Dell).
When you're on the other side, you're not just maintaining the happy path. You're maintaining every path! And I'm sure it is unbelievably complex and frustrating to work with.
-----
Personally I would like software to move beyond "the happy path works" but that seems beyond the state of the art.
Over time you get trained not to do anything "weird" on your computer, because you know that, say, opening too many programs at once can cause a lockup. Or you don't want to aggressively move your mouse too much when doing other expensive operations. (This may be in user space or the kernel, but either way you're trained not to do it.)
There is another post that I can't find that is about "changing defaults". I used to be one of those people who tried to configure my system, but I've given up on that. The minute you have a custom configuration, you run into bugs, with both open source and commercial software.
The kernel has thousands of runtime and compile-time options, so I have no doubt that there are thousands upon thousands of bugs available for you to experience if you change them in a way that nobody else does. :)
> Or you don't want to aggressively move your mouse too much when doing other expensive operations. (This may be in user space or the kernel, but either way you're trained not to do it.)
Operant conditioning by software bugs is totally a thing, but for this particular example I was trained into exactly the opposite behaviour. I do move my mouse a lot during very resource-intensive computations, because that lets me gauge the load on my system (is there UI animation lag? is there cursor movement lag?), and in extreme cases, it can tell me when it's time to do a hard reboot. I've also learned through experience that screensavers, auto-locking, and even auto-poweroff of the screen can all turn what was a long computation into a forced reboot, so avoiding long inactivity periods is important.
This conditioning comes from me growing up with Windows, but I hear people brought up on Linux have their own reasons - apparently it used to be the case (maybe it still is?) that some computations relying on a PRNG would constantly deplete the OS's entropy pool, and so just moving your mouse around would make those computations go faster.
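For the record, the behaviour being described looks roughly like this from userspace: a read from /dev/random could stall until the kernel's entropy estimate recovered, and input events such as mouse movement feed that estimate. (Newer kernels have since reworked /dev/random so it rarely blocks once the system has booted.)

    /* Sketch of the old /dev/random behaviour: on older kernels this
       read could block until the entropy estimate recovered, which
       input events like mouse movement help along. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned char buf[64];
        int fd = open("/dev/random", O_RDONLY);
        if (fd < 0) {
            perror("open /dev/random");
            return 1;
        }

        /* May stall here on an idle, entropy-starved older system. */
        ssize_t n = read(fd, buf, sizeof(buf));
        printf("read %zd bytes of blocking-pool randomness\n", n);
        close(fd);
        return 0;
    }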
Your lens is probably too small; Paul does a great job of quantifying bug probability in more human terms in the presentation above.
I ran all OS dev and support in an environment with 10000 FreeBSD systems and 3000 Linux systems in high scale production. The ratio of kernel panics was similar (although Linux tended to exhibit additional less rigorous failure modes due to more usage of work queues and other reasons). You could expect at least a couple faults per day depending on the quality of hardware and how far off the beaten path your system usage is at this scale. The big benefit I found with FreeBSD is that I could understand and fix the bugs, and I was generally intimidated by doing so on Linux.
How do you define "thoroughly" anyway? Just like a comment in the original post says:
>>But anything less than 100% coverage guarantees that some part of the code is not tested...
> And anything less than 100% race coverage similarly guarantees a hole in your testing. As does anything less than 100% configuration-combination coverage. As does anything less than 100% input coverage. As does anything less than 100% hardware-configuration testing. As does ...
It is combinatorially impossible to "thoroughly" test something as large and complex as the Linux kernel. Other than, e.g., SQLite, I struggle to think of a single substantial piece of software that is truly thoroughly tested.
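As a rough back-of-the-envelope: even if the kernel had only 100 independent boolean config options (it has many thousands), exhaustive configuration coverage alone would mean 2^100 ≈ 1.3 × 10^30 distinct builds, before you even get to inputs, hardware, timing, and interleavings.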
[1]: https://github.com/linux-test-project/ltp
[2]: https://github.com/autotest/autotest