
> I'd estimate that 100% of sample implementations in published research papers have bugs.

> Researchers, even in computer science, are usually not skilled programmers. The product for them is the paper, not a program.

This is so silly and misleading. Any nontrivial software is likely to have bugs, whoever the producer or customer or user may be. That's not the point. Merely having a bug doesn't invalidate an algorithm any more than lacking one validates it. Moreover, even when there isn't a bug, there's still no guarantee you can reproduce results exactly -- stuff like your compiler's particular handling of floating-point may well change the outputs.
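
To make the floating-point point concrete (a purely illustrative snippet, not tied to any particular paper): addition isn't associative, so anything that changes evaluation order, like a compiler reassociating operations under -ffast-math, can change the output.

    # Floating-point addition is not associative, so the order in which
    # a compiler (or you) evaluates it changes the result.
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)   # 1.0
    print(a + (b + c))   # 0.0 -- same math, different output

Neither answer is a "bug"; both are correctly rounded under IEEE 754, which is exactly why bit-for-bit reproduction isn't a reasonable bar.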

Finding a bug that doesn't change the substance of some research is nothing to be smug about, whether it's obvious or subtle. All you'd be doing is wasting your time trying to prove that you're a good programmer and the author is a bad programmer, when that very 'bad programmer' is being productive and advancing science without getting hung up on irrelevant details. The metric you have to care about is finding a bug that materially affects the conclusion being drawn from the experiment; that's the rate you need to report here.




Decades ago I had a friend who was taking a double major in CS and Physics at a school that shall remain nameless. He decided to do his master's thesis in environmental science. The department in question published a lot of papers based on simulations, showing how different factors could affect weather patterns and things like that.

The simulation program they used was written in Fortran by a previous grad student, and there was no version control. Instead, the code was copied from researcher to researcher: each expanded it for their own work, then passed it on to the next researcher in the lab to be modified in turn.

My friend asked for the version of the code that was used for the particular research papers the group had published. They couldn't find it. He asked where the test suite was. There was none. He asked if anyone had audited the code to make sure it worked, and - well, you see where this is going.

Feeling frustrated, my friend spent the next month or two going through the code and writing tests to check that it was actually correct. Amongst other things, he found a + that should have been a - in a core part of the simulation code, and fixing it dramatically changed many of the results the program produced. Because none of the researchers actually kept the programs they used to generate their results, it was impossible to even tell when that bug was introduced, or which of the papers published by the department had correct results.
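
To make the '+ vs -' point concrete, here's a toy sketch (Python, purely hypothetical, nothing to do with their actual Fortran code) of how one flipped sign in a simulation loop flips the qualitative result, and the kind of one-line sanity test that catches it:

    def simulate(steps=10000, dt=1e-3, k=4.0, c=0.5, sign=-1.0):
        # Damped oscillator integrated with explicit Euler.
        # sign=-1.0 is the correct damping term; sign=+1.0 is the
        # one-character bug.
        x, v = 1.0, 0.0
        for _ in range(steps):
            a = -k * x + sign * c * v
            v += a * dt
            x += v * dt
        return x, v

    def energy(x, v, k=4.0):
        return 0.5 * v * v + 0.5 * k * x * x

    e0 = energy(1.0, 0.0)
    print(energy(*simulate(sign=-1.0)) < e0)  # True: energy decays, as it must
    print(energy(*simulate(sign=+1.0)) < e0)  # False: the 'damping' pumps energy in

An invariant like "a damped system can't gain energy" is exactly the kind of thing a test suite could have pinned down years earlier.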

Anyway, this caused a big argument between my friend and his supervisor. My friend held that their department wasn't doing real science, and he ended up being asked to leave the team. He did his thesis in an entirely different field.

I don't know how common this sort of thing is, but it's probably way more common than we'd like to admit. We should demand that any paper which relies on source code publish that code alongside it. This is becoming increasingly common in CS, but it should happen across the board, in every field. If it were up to me, top-tier journals would make it a condition of publication.


Yes, I realize. I never disagreed with this. What you said is entirely consistent with the point I was making, which was that "100% of published research has bugs" and "researchers, even in computer science, are usually not skilled programmers" are quite misleading claims to make about research quality.


It seems to me that this discussion has been about bugs that affect the research.

You're right that no one should care very much if there are typos in research code, or if it crashes on some input that it never sees, but the criticism here has been more substantial than that.



