You are piecing a bunch of empirical facts together using reasoning, but there are lots of implicit assumptions in your reasoning.
One assumption is that reducing error cases reduces errors. Maybe other kinds of errors increase in prevalence when you reduce error cases?
You say that even in impure code you can factor out pure parts. You assume this refactoring doesn't introduce bugs.
You talk about the fraction of code that is pure in Haskell repos. But we don't know how the amount of impure code compares to the amount of impure code in other languages. Maybe it is the same, and Haskellers just have some pure code in addition?
Finally, we are talking about the bug-proneness of Haskell, not just an idealized FP language. Maybe something about Haskell causes bugs. There are many candidates. Poor support for debugging was mentioned in another comment. Maybe the libraries are less mature. Maybe there is less IDE support. Maybe something about the syntax causes bugs. Etc. etc.
For this reason I think it would be easier to just experimentally verify the hypothesis directly: Does Haskell cause fewer bugs?
> but there are lots of implicit assumptions in your reasoning.
Yes, there are assumptions, but now we're getting somewhere.
> One assumption is that reducing error cases reduces errors. Maybe other kinds of errors increase in prevalence when you reduce error cases?
I think this is pretty unlikely. If we switch from C to a programming language that makes buffer overflows impossible, we don't expect to see a corresponding increase in, for example, off-by-one errors. After working with a language like Haskell for a while, you start to see that when the compiler tells you about a missing case for Nothing, you don't just sweep it under the rug. You stop and think about that case. I think it's reasonable to assume that when a programmer stops and thinks about a case, he's more likely to get it right. Errors often happen because the programmer neglected to think about a case entirely.
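To make that concrete, here's a minimal sketch (the function and the map are made up for illustration) of the kind of compiler nudge I mean. With GHC's -Wincomplete-patterns warning (it's included in -Wall), omitting the Nothing branch is flagged at compile time, which is exactly the moment you stop and think about that case.

    {-# OPTIONS_GHC -Wincomplete-patterns #-}
    import qualified Data.Map as Map

    -- Hypothetical lookup: the Maybe returned by Map.lookup puts the
    -- "not found" case into the type, so it can't be silently ignored.
    ageOf :: String -> Map.Map String Int -> String
    ageOf name db =
      case Map.lookup name db of
        Just age -> name ++ " is " ++ show age
        -- Delete the next line and GHC warns:
        --   Pattern match(es) are non-exhaustive
        Nothing  -> name ++ " is not in the database"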
> You say that even in impure code you can factor out pure parts. You assume this refactoring doesn't introduce bugs.
I'm not talking about refactoring here. I'm talking about writing your code that way from the start, so regressions aren't an issue. As for the question of how many bugs you might write to begin with, see the previous point.
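As a rough sketch of what I mean by writing it that way from the start (the function names are invented for illustration): the logic lives in a pure function and only a thin outer layer does IO, so there is no pure core to factor out after the fact.

    -- Pure core: no IO, testable with plain equality checks.
    discountedTotal :: Double -> [Double] -> Double
    discountedTotal discount prices = sum prices * (1 - discount)

    -- Impure shell: gathers input and prints output, delegating to the core.
    main :: IO ()
    main = do
      let prices = [12.5, 3.0, 7.25]
      print (discountedTotal 0.10 prices)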
> You talk about the fraction of code that is pure in Haskell repos. But we don't know how the amount of impure code compares to the amount of impure code in other languages. Maybe it is the same, and Haskellers just have some pure code in addition?
In other languages ALL code is impure. There are no other pure languages in mainstream use today. So the question becomes whether the pure code written in Haskell performs any useful function, as opposed to being meaningless fluff. It's obvious that this code performs tons of useful operations, and every time it is used we avoid having to solve the same problem again. It's very clear what problems are solved by that code, and it's very clear that people are able to reuse it to avoid the work. Therefore, it's absolutely not simply additional cruft.
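As a deliberately trivial illustration (the names are invented), one pure function can be reused from several impure call sites, which is the sense in which the pure code does real work rather than sitting beside the impure code as extra fluff:

    -- Pure, reusable logic.
    wordCount :: String -> Int
    wordCount = length . words

    -- Reuse site 1: count the words in a file.
    countFile :: FilePath -> IO Int
    countFile path = wordCount <$> readFile path

    -- Reuse site 2: count the words arriving on stdin.
    countStdin :: IO Int
    countStdin = wordCount <$> getContents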
> Finally, we are talking about the bug-proneness of Haskell, not just an idealized FP language. Maybe something about Haskell causes bugs.
None of the points you mention cause bugs. Lack of debugging support has nothing to do with causing a bug; it just means that you'll have to go about fixing your bug differently. You can look for counter-arguments all day long, but at some point you're just grasping at straws.
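For what it's worth, "going about it differently" often just means dropping in Debug.Trace calls instead of stepping through a debugger. A rough sketch (the function is made up):

    import Debug.Trace (trace)

    -- trace prints its message when the value is forced, so you can peek
    -- at intermediate values without restructuring the code.
    scale :: Double -> [Double] -> [Double]
    scale k = map step
      where
        step x = trace ("scaling " ++ show x) (k * x)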
> For this reason I think it would be easier to just experimentally verify the hypothesis directly: Does Haskell cause fewer bugs?
It's not easier. Building software is hard and takes a long time. Furthermore, constructing a scientifically valid experiment is even more costly. Most people want to be productive, so they're not going to go to that effort. There is huge variation among people, so these kinds of experiments have to be undertaken like medical studies, i.e. with a large n. If you're interested in building complex, real-world systems, you're simply not going to get a large n. Also, Haskell has only become commercially viable in the last 5 or so years, so there hasn't yet been enough time/effort spent to perform these experiments.
But even with all these caveats, we do have some experimental evidence. There was a CUFP report from a company that rewrote a substantial Groovy app in Haskell [1]. And the result is that yes, Haskell did indeed result in fewer bugs.
Groovy was originally designed as a scripting language for manipulating and testing classes written in Java, the statically typed language in which substantial apps were meant to be written. Only later did new managers of Groovy start promoting it for building whole systems, even though it wasn't suitable for that. A better test would be rewriting a substantial Java app in Haskell.