
I've implemented algorithms I dug out of original research papers, which often included sample code. (That's why I first learned to read Fortran!) I've almost never gotten results that match the authors' exactly. Sometimes the bugs are obvious, and sometimes they're subtle. I'd estimate that 100% of sample implementations in published research papers have bugs. Researchers, even in computer science, are usually not skilled programmers. The product for them is the paper, not a program.

It's the same category of problem as "enterprise software". Whenever the customer is not the user, the user gets screwed. With research, the customer is the journal.




Academia is suffering from a disease of uselessness where papers are written from the standpoint of getting credit. It's not about other people understanding, reusing, or learning how to use what they've found, not to mention the lack of reproducibility. Sadly this limits the usefulness of research for most. Yet for some reason we all still hold academia up on a pedestal.


People in academia always have a pressing career goal. Especially the best of them do not get bogged down by minor issues. I'm not talking about actual fraud here, just a bit of sloppiness that doesn't invalidate the results, but religiously adhering to meticulous standards is not profitable for researchers.

The primary goal is to publish the paper, go to conferences, meet others and network.

While it's not a dichotomy, who will be seen as "better"? Someone who spent tons of time to write reproducible clean code with tests, tracked down all minor details and pondered about things that may change the results by 0.1 percent, or someone who was a bit sloppy but got 3 publications in the meantime and got to know famous professors at conferences?

Attitudes like this determine your success. If nobody values or indeed even sees or knows about your efforts, those efforts are practically wasted. "I'm a really detail oriented person" doesn't have the same ring to it as another few papers on your CV.

I may sound cynical, but the other extreme, idealism, is not useful either.


> While it's not a dichotomy, who will be seen as "better"? Someone who spent tons of time to write reproducible clean code with tests, tracked down all minor details and pondered about things that may change the results by 0.1 percent, or someone who was a bit sloppy but got 3 publications in the meantime and got to know famous professors at conferences?

This is a common narrative, but I'm not certain if the example you give is typical. I've written about this before: https://news.ycombinator.com/item?id=18743531

Since 2018 I have tried to publish at least one paper debunking something every year. So far, if anything, being more careful has led me to publish more, not less. Admittedly, I don't think everyone should be as careful as I am as people would quickly run into diminishing returns, but the idea that being careful necessarily means publishing less has not been true in my experience.

Also, I don't think the typical case is something changing the results by 0.1%. In my experience when I catch an error it's usually larger than that. The first debunking paper I published was about something that was one or two orders of magnitude off the correct value in the typical case, but still received 300 citations...

Ultimately I think it would be best if an academic field adopts uniform standards for publication. That way, a sloppy researcher can't pump out 3 bad papers. Many academic communities have nominal standards that are not enforced. Some examples are discussed here: https://www.osti.gov/biblio/1141709


It often feels like a quixotic fight. Strong claims sell papers, but they are hard to stand behind for a researcher with high standards who is critical of the overall practices in a field. Their "best" recourse, in the idealistic sense, would be to publish an overarching meta-analysis of the field pointing out problematic practices, gaining enemies and being labeled a crusader in the process.

In my field, computer vision, a major problem is incomparable setups, far-reaching claims, etc. A common pattern is introducing some fancy model and showing that it gets slightly better results that could well be due to noise or some tuning on the test set (not even secretly: just trying many different things, putting bold formatting on your best number, and claiming it as SOTA). Whether the improvement was really due to your fancy model is hard to prove. But my experience is that mundane, pedestrian changes in a model introduce far bigger performance changes than whether or not you use the fancy module or architectural tweak proposed in mediocre published works. Which also means you absolutely cannot compare models created by different people, especially when the performance gap is small. Tiny details can screw up the whole ordering, so trying to interpret the order and draw conclusions from it is equivalent to reading tea leaves.

Now surely this applies mostly to mediocre research. But that is the majority. Top of the line research is more solid, but if you're a small unknown researcher, your best strategy is to start getting out there, publish and network.


This is not about code, but look at figure 4 here

[1] https://www.pnas.org/content/115/50/E11790

and tell me why I should ever trust something from someone who swaps the location of towns on a map, while the point of said paper is about the spreading of the plague along trade routes.

Instantly invalidated...(forever!)


Because people will make mistakes. If you expect papers to be 100% accurate, you'll always be disappointed. You have to account for possible mistakes. Whether they're accidental or misleading on purpose is to a degree irrelevant at first - we should always watch out for them.

Even in situations where mistakes cost lives, we put multiple failsafes and reviews and clear instructions in place - and still say they reduce the likelihood, rather than prevent accidents.


You know...

I mailed them, with CC to all involved. No reaction at all. Hamburg and Lübeck still swapped. What should I make of that?


Given the paper already appears to have a correction for something minor, one more would seem appropriate, yes - it’s a pretty jarring error.

But it’s just disingenuous to say it affects (let alone “invalidates”) the actual study. And shouldn’t a proper pedant spot that the top pin isn’t even where Lübeck is - that’s Rostock.


Good spot.


Does the swap in the position of the two cities compromise the findings in any way? If not, it's an irrelevant mistake, like misspelling a word. What do you expect them to do? Retract the paper because they misspelled a word?


It could, because in the timeframe the paper is about, they were geographically near, but had no real direct connection to speak of. Traveling over land along branches of the 'old salt routes' took at least two days, and that was fast. They were transport hubs of different branches of the early Hanseatic League.

[1] https://en.wikipedia.org/wiki/Hanseatic_League

This is not the equivalent of a 'spelling error'.


The brokenness of academia starts with the exploitation of graduate students by educational institutions, which makes it a very lucrative enterprise to over-hire professors, each of whom is like an incubator for the school, which acts as an investor because they get a cut (upper five figures per year per student) no matter what. It's similar to college sports in a way. It's all driven by money.

Most research is useless. Most professors are unneeded given the size of the problem space. Students see this and in the end, besides the very few true geniuses, it's not the most purely motivated that become professors, but the ones who most aggressively play the game in trumping up their results.


> Most research is useless.

I agree.

> Most professors are unneeded given the size of the problem space.

I disagree.

First, most subjects are too deep for a professor to be knowledgeable about more than just their specialty. (Do you expect a single Computer Science teacher to have complete and up-to-date knowledge of formal methods, programming languages, and operating system virtualization?) Even then, they will either be out of date or have spent a lot of time keeping up to date.

The problem is that in many cases, the size of the problem space is much too big for a single group of researchers to have any effect on it. Many 'simple' studies have thousands of factors that can affect the outcome, and almost all of them have too few people, with not enough time and energy to devote to isolating all of those factors. For that reason (and many others), most studies are not replicable, and dubious at best.

This is the main reason why psychological, sociological, medical, and most other fields of research that aren't mathematical are considered dubious. Not because their methods are inherently bad, or their fields inherently invalid, but because most studies do not have the manpower available to do a completely formally correct, ideal study. So they have to make do with what they have, and trust that eventually we will have enough mediocre-evidence studies, together accounting for enough varying factors, that we can iron out the individual flaws through statistical methods.

If research were _truly_ a priority for humanity, and we really dumped all of our effort into scientific research as a society (i.e. governments and companies both prioritized R&D and gave the scientists enough resources to actually do the jobs properly), then we might see these fields as "hard science" rather than "soft science".

But that's like saying, if Jeff Bezos got up and actually objectively used his money properly, he would have billions left over and there would not be starvation or poverty in the modern world. It's an idealistic scenario that is extremely unlikely to happen.


>>> Most professors are unneeded given the size of the problem space.

>most subjects are too deep for the professor to be knowledgable about more than just their specialty

>in many cases, the size of the problem space is much too big for a single group of researchers to have any effect on it.

>most studies do not have the manpower available to do a completely formally-correct ideal study

I agree that the size of the problem space requires a lot of researchers. This is probably why there's so much attention focused on machine learning and big data, since they seem to have the potential to address the problems you've listed here. Of course, there are many technical and ethical issues to be confronted in developing and deploying them.

>If research were _truly_ a priority for humanity, and we really dumped all of our effort into scientific research as a society (i.e. governments and companies both prioritized R&D and gave the scientists enough resources to actually do the jobs properly)

It's not, because the greatest problem for humanity is still subsistence, which requires solving a massive resource distribution problem. There are some governments and some companies that do prioritise R&D and provide enough resources, but they are too few and far between.

>psychological, sociological, medical, and most other fields of research that aren't mathematical, are considered dubious.

>then we might see these fields as "hard science" rather than "soft science".

It seems to me that these fields are "soft" in part because the ethical issues surrounding the surveillance that's needed to collect the data to do a formally correct study are quite formidable.


You can view an undergraduate education (and especially a graduate education) as a significantly negatively paid (or at best unpaid) internship for the job of academia.

This has all the advantages of internships that employers normally enjoy. The internship enables you to hire significantly better employees than you'd otherwise be able to get on the open market, because you can engage in more vetting and because of the power of defaults.

Many smart people go into academia who would have been much happier and more productive outside of it, simply because going to school itself makes pursuing a job in academia much more of a default than it would otherwise have been.


You should be able to contribute to these software projects, and receive credit akin to publishing a new result. Authors and contributors (and bug-fixers) should receive more credit for the subsequent results derived from their software. This aligns incentives more properly: The researchers struggling to gain insight using tools can trust the tools, and they are incentivized to make their new tools useful to the community, rather than publish-and-forget.


If you are a scientist and a programmer, you could make a piece of software useful to others, publish an article about it, and provide it cost-free under the condition that when people use it for research they have to cite the article in their paper. This is how you could get lots of citations.


Many of the best-used robotics packages are handled in this way. The ROS paper itself has thousands of citations.

But it might be tough to get tenure doing that, and few jobs will support those efforts outside robotics startups or research labs, and often proprietary is the law of the land in those domains.

And outside tech R&D, where software still rules but isn't a core competency, you'll likely never have success with this model, sadly.


Pop science coverage is always way overrated as a valuable exercise as well. So easy to cherry pick studies or fundamentally misunderstand the concepts or context.


Something I don't get is why universities don't do a better job of having professional statisticians and programmers on staff for the explicit purpose of providing support to researchers.

I guess grad students are cheaper and "good enough."


From experience:

- Research is an iterative creative process. The creation of the code is an organic process of discovery, not a separate stage from writing its specification.

- The jobs are too small to split up. This is the same in industry. Try to split up a small task (1 person, 2 weeks) over a small team of specialists (Analyst, programmer, DB specialist, dev-ops, tester and now you need a project manager, ...) and suddenly it becomes way, way larger than it should be.

- Most production programmers really dislike working in research environments. The objectives and nature of research code are very different from what turns on and drives professional programmers, so you wouldn't have 'the best professional programmers' working there anyway.

Being able to code well enough to do your own experiments is part of the researcher's skill-set. Grad students should become 'good enough' at it.

Let's not kid ourselves. Independent academic research is a constant struggle for funds. Nobody is going to pay for an order-of-magnitude increase in the costs. If you are looking to up the quality, best look into how to get academic research out of the ridiculous quantified-output-metrics conundrum.


The University of Texas at Austin has statisticians faculty, staff, and students can consult with for free: https://stat.utexas.edu/consulting/free-consulting

I've used the service before and can recommend it.

Don't know anything comparable for programming at UT, though I can say that UT's computational science (different from computer science) program has some good classes on software engineering practice aimed at practicing research scientists and engineers.


As I understand it, most large universities have something like this. At smaller schools it might be less formal. Where I went to college, the students were expected to have their methodology approved by the "stats guy" in the department.

As for programmers, it's not just a matter of cost. Most scientists don't understand software engineering, but most software engineers don't understand science or math. Also, programmers who belong to their own organizational structure can't keep up with requirements that could change from one hour to the next. Despite "agile," realistically most software development is comfortable with timelines of months or years.

Don't get me wrong, I'm an R&D scientist at an industrial business that makes commercial software. The programmers are brilliant, and the software they make is amazing. But I have no choice but to be self-sufficient for my own software needs in R&D. It's two different worlds.


If given a good spec, a clearly written formula, I don't think it's too unreasonable to expect a seasoned programmer to implement it despite not understanding the science that prompted the math. Maybe I'm wrong for some kinds of exotic math, but I've certainly implemented a few mathematical formulas that looked like Greek to me, at least when I started.

Moreover, what I propose is that every researcher have a programmer available to consult with, who would perhaps write small amounts of code (and perhaps make sure the code builds before it's published?). I don't suggest that researchers not write code; I think they should write code. But I think an expert in writing code should be available to help keep the standard of the code high.


In my view, it would help a lot to provide scientists with resources for learning how to write quality software. Most of us don't even know how to write a specification, except by writing the code, and don't have an idea of what we want until we see something begin to work. Today, a lot of the math we use is written directly in code.

This stuff evolves over multiple iterations. Many of my programs never run more than once. Testing them requires being connected to the equipment if any kind of closed-loop control is involved.

I work in an R&D setting in industry. If something I make threatens to become a product, the thing I hand over to the software team is a proof of concept, that could include a hardware design and working code.


I've had these same ideas when working on physics-related code. They should have a few actual software experts around to consult and / or do the development.


Where's the money going to come from, and who's going to do it?

I wasn't making much above minimum wage working as a programmer at a university. People with computer science degrees don't tend to take such jobs, when they've got half the west coast waving 6-figure job offers at them.

Again, it comes down to the user and the customer not being the same person. The person reading the research paper would sure like it to have been written (or at least debugged) by an expert programmer, but it would be some department who would have to have paid for such a programmer. Departments don't have budget for that, and their currency is publication, not correctness proofs.

(I assume departments would have to pay, because we paid for all other IT services. We even had to pay the university for network access. It's not like the library, which is always free to everyone on campus.)


> Something I don't get is why universities don't do a better job of having professional statisticians and programmers on staff for the explicit purpose of providing support to researchers.

Complete lack of incentives to do so.

Academics are, by design, inbred. They care only about what other academics think. And few of them care about code quality. The attitude is slowly, slowly changing, but I think even now in most disciplines it is not the norm to share your code when you publish papers, and it is not surprising for an author to refuse to provide you the code when you email them.

Until journals require academics to submit any code used for simulations/analysis, things won't change.

In any case: To answer your question - there are always some programmers on staff for large scale research projects. I doubt they're judged by code quality, though.


I think ideally, universities should have enough professional statisticians and programmers on staff to provide a mandatory but nonbinding review for every paper any researcher at the university decides to publish. They shouldn't act as gatekeepers to publishing, but would provide feedback to the researchers for the benefit of the researchers.

Compelling researchers to release their source code would be the next step, but they'll fight you tooth and nail on that one. Many researchers are in very competitive fields and think that releasing their source code might give some other team a boost. I think this attitude is contrary to the interests of scientific progress, but because it's a matter people have such strong personal stakes in, it could prove a hard fight.

Chemistry departments often have glassblowing technicians. Perhaps this would be comparable.


> I think ideally, universities should have enough professional statisticians and programmers on staff to provide a mandatory but nonbinding review for every paper any researcher at the university decides to publish.

I guess I wasn't clear enough to connect all the dots. These kinds of policies are made by academics. The people with the authority to mandate this are themselves academics who got promoted. The reason the universities will not do it is because of the nature of the people in charge.

Academics don't work for these people. They are these people.

Pretty much the same story with grant approvers and journal editors. Mostly from academia.

The argument for doing it is to improve the quality of the research output. But guess who evaluates the quality of the research?

It doesn't matter that you and I know this is an incredibly unreliable way to do research. As long as academics can continue to publish their papers with crappy software practices, they will do so.

As I said, there is a lack of incentive to do this. Who will benefit? Probably over half of researchers do not want someone to find flaws in their methodology. In my discipline, it was common to leave out inconvenient details from a paper. Common enough to the point of psychosis: the researchers convince themselves that the flaw is not a problem, so it's not even a question of integrity any more. It's poor incentives leading to a few generations of researchers who are trained to be blind.


Because it would cost money, and I'm honestly not sure most academics (at least in CS) truly realize the dumpster fire that is their coding.


> programmers on staff

That is generally referred to as "research software engineers/groups". They are unfortunately still fairly rare in academia, but are becoming more common. RSECon is dedicated to this topic, and hosts a list of research software groups [1]

[1]: https://rse.ac.uk/community/research-software-groups-rsgs/


I’ve tried to make this work for a while now, and it’s definitely not just an issue of getting statisticians/programmers involved. It’s an issue of power and incentives. My immediate boss is great, he understands that things need to take time and there are a multitude of challenges with development and helps me solve them. Some of the other researchers, however, see me only as a typing monkey.


Anecdote: One day a friend asked me to look at their family member's doctorate because she was having trouble with the analysis section.

All I could do to save face was point her to some resources, because if I said what I really thought, it would be in a trash can set on fire.

Or to put it another way: maybe they don't do it, because if they did, it would stop the requisite amount of papers being published.


100% of programs have bugs. We can't expect to hold the bleeding edge to the same rigor as, say, a flight computer. It's sometimes literally the first implementation of an algorithm or procedure.

The comments here using words like "disease" and such are not wrong, just tilting at the wrong windmill. The problem, I think, is not the initial bugs, but the lack of maintenance and improvement over time. You should be able to contribute to these projects, and receive credit akin to publishing a new result. Authors and contributors (and bug-fixers) should receive more credit for the subsequent results derived from their software. This aligns incentives more properly: The researchers struggling to gain insight using tools can trust the tools, and they are incentivized to make their new tools useful to the community, rather than publish-and-forget.


Does that make hello world the most perfect program? :)


Same here. Not having done a PhD, I've spent a lot of time just reproducing papers I liked, and realized they were often very hand-wavy in the way they achieved their results, often exaggerating the impact of the research.


I pretty much agree with everything you said, but to me it strikes at something deeper.

To me, you don't have to be a software engineer to write some scientific code.

What you need is a scientific mind: logic, repeat, test, falsify, question. Sure, structure and design help everyone, but if you think about code in this way you'll still find the bugs in a small script.

And to me that's the great problem laid bare with the shoddy coding in science, it reveals the emperor's new clothes: scientists and academics, supposedly the professionals in their field, can't think scientifically or apply such thinking to their own work.

They're chasing false positives, and that's deeply troubling...


I think without in-the-trenches software engineering experience, it is hard to determine what class of issues would be a problem.

In this case, it was a dependency on the ordering of filenames. Whenever your code depends on something being in a certain order, you should ask yourself what puts the items in that order. But it's perfectly reasonable to believe that filenames are returned in order; perhaps for your small sample size or when you run "ls" they're in order (because you have ls set up to sort, but don't know it).

The difficult part that comes with experience is knowing what you can and can't trust. I would trust that "select foo from bar order by timestamp desc" to return items in order. I wouldn't trust "readdir" to return items in any particular order (maybe ordered by inode number, but I can't think of any real program where I care about order that would want that order). But you can really go down a rabbit hole when you don't have the right level of trust or experience. (I have seen a lot of unit tests in my time that do something like test a core language primitive or library; scores of tests like assert(append([1], 2) == [1,2]). This comes from a total lack of trust for things that probably can be trusted. Or from people that don't have their homedir full of files like "foo.py" to figure out how some API works and instead put them in their unit test files ;)
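
To make that concrete, here's a minimal Python sketch of the fragile assumption and the one-line fix (the directory name is made up, and the print is a stand-in for whatever the real analysis does):

    import os

    data_dir = "samples"  # hypothetical directory of input files

    # Fragile: os.listdir() makes no ordering guarantee; the result reflects
    # whatever the filesystem hands back, which can differ across machines,
    # filesystems, and OSes.
    names = os.listdir(data_dir)

    # Robust: impose the order you actually rely on instead of assuming it.
    names = sorted(names)

    for name in names:
        with open(os.path.join(data_dir, name)) as f:
            print(name, len(f.read()))  # placeholder for the real processing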

The other thing is figuring out what tests would be valuable and knowing that you have to write them, and the researcher on the other end needs to run them. (I remember so many bug reports like, "I force installed that library after 'make test' failed and I'm getting weird results from my program that uses the library." Well yeah.) I see a lot of people run their programs on simple input, check "yeah, that looks fine", and then trust that code forever. But somehow, it breaks, and they don't notice it on their complex inputs, and it leads to problems like the one in the article.

On the other hand, the thing about nondeterminism is that you can't test for it. Sometimes it's very obvious, you do something like (keys(foo) == bar) and because dictionary keys are explicitly randomized by your language, your test fails the first time and you immediately find your error. But if you're talking about something like OS-dependent APIs, you'll write your tests and never see your bug, because the order is consistent on your particular computer. (At least someone on another OS will eventually find it, and there are plenty of services that will CI your code on various OSes, so you do have the potential to uncover that issue yourself.) An even more severe case deals with concurrency; your unit tests run one thread, so you never see the subtle 1 in a million timing issues until you push to production and see the 1 in a million bug every second because you're processing a million records a second.
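
To illustrate that lucky "fails the first time" case: in Python, string hashing (and hence set iteration order) is randomized per interpreter run, so an ordering assumption like the one below tends to blow up almost immediately. A toy sketch, not anyone's real test:

    labels = {"alpha", "beta", "gamma", "delta"}

    # Fragile: set iteration order depends on string hashes, which are
    # randomized per interpreter run (PYTHONHASHSEED), so this comparison
    # can pass on one run and fail on the next.
    # assert list(labels) == ["alpha", "beta", "gamma", "delta"]

    # Deterministic: compare against an explicit canonical order instead.
    assert sorted(labels) == ["alpha", "beta", "delta", "gamma"]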

(And things get even more obscure than that. I once got paged for one replica in a large job not working correctly. Look at everything that could possibly cause the issue, nothing. "I guess I'll restart the task." Problem persists. Someone suggests "Tell the scheduler to schedule the task on a different machine." Problem goes away and never comes back. Would suck if that one-in-a-million obscure machine-specific problem happens to the machine that's producing the results for your scientific paper.)


I'm under no illusion that it's inherently easy (I consider it to be my job, and that's partly why I get paid), but that's why one should be 'professional' and put in the work for it.

If you make a calculation, you should have processes to verify that the calculation or result is correct, to check that you can repeat it, to think about why it might be wrong even if you think it's right, etc.

If your technique relies on sorting things in a particular order, then you both explicitly sort them in that order, and then you figure out an independent method to verify they are sorted in that order.

Indeed, in one of my jobs that was a bug: old bank code where SQL results were assumed to come back in a consistent, stably sorted order for downstream processing because the queries used an 'order by' clause. When the vendor upgraded the SQL engine to be distributed/multithreaded, the relative order within equal keys changed (and was now genuinely non-deterministic, whereas it had been technically deterministic when the code was written, even though that was an assumption, not a requirement).
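
The general fix is the same in any language: put the tie-breaker into the sort key, then check the property you depend on instead of assuming it. A rough Python sketch with invented records (the SQL analogue is ORDER BY ts, seq rather than ORDER BY ts alone):

    # Hypothetical ledger rows; two of them share the same timestamp.
    rows = [
        {"ts": 100, "seq": 2, "amount": 10.0},
        {"ts": 100, "seq": 1, "amount": -5.0},
        {"ts": 101, "seq": 3, "amount": 7.5},
    ]

    # Fragile: sorting on the timestamp alone leaves the relative order of
    # rows with equal timestamps unspecified.
    # rows.sort(key=lambda r: r["ts"])

    # Robust: make the tie-breaker explicit so the order is fully determined.
    rows.sort(key=lambda r: (r["ts"], r["seq"]))

    # Independently verify the ordering the downstream code relies on.
    order = [(r["ts"], r["seq"]) for r in rows]
    assert order == sorted(order), "rows are not in the expected order"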

But I don't think I'm unreasonable: yes, there is no black-and-white rule for how far down the rabbit hole you go. Is it too far to test each individual instance of integer arithmetic, or floating-point calcs, or individual OSes or processors? In practice, probably yes. But you do need higher-level tests to determine if calcs are going wrong (which might then start you down that route when your results suggest they are going wrong). And yes, there are bugs that are inherently difficult to track down, especially as you go up and up in complexity and abstraction. And there are parts of science where the feasibility of the calculation itself, or its outcome, is the great unknown. But these are edge cases, not 99% of science papers.

What's alarming is the reuse of code they don't inspect, the use of libraries and techniques they don't understand, and the near-total lack of critical thinking about, or inspection of, their data, results, or methods.

They throw stats or a library or a technique at a wall, get a positive result and: bang! into (at least one) paper.

In my eyes that's not how science should work.

I'll also go one step further and say that if you have unexplained bugs, warnings, and errors, and if your calculations are not reproducible, then for 99% of science you should not be publishing just because most of the time when you run it you get a positive result. Would it suck if that one-in-a-million obscure machine-specific problem happened while producing your paper? Sure. Would you be negligent in publishing anyway without a way to prove or assert the correctness of your calculation? Yes.

Is that what everyone's doing? Hell, I think it's orders of magnitude worse. Some languages (like R) are almost built around the philosophy of 'the show must go on' rather than 'be correct or no result for you'. I think that's part of the reason why they're popular.

Again, I don't think that's software engineering specific, I think it's a mindset that is required to make a good scientist.


Could you please tell us more about this R 'the show must go on' philosophy?


In short, R contains silent coercions, vector recycling, lazy evaluation, partial matching on strings in various calls, exceptions to the rules, surprises, unexpected values, inconsistent and weird operations on edge cases and often chooses to continue on with these incorrect and bad values rather than throw errors and stop computation.

I'm not a fan of its writing style (there's a point where being funny hinders readability), but I believe chapter 8 of the R Inferno is replete with examples.

The combination of these factors results in a language where actually trying to reason about the veracity of a computation and coding defensively is exceptionally difficult, but people can load packages and make the wrong computations that look right exceptionally quickly, so they think of it as convenient.


Thanks, bookmarking this answer...


> It's the same category of problem as "enterprise software". Whenever the customer is not the user, the user gets screwed. With research, the customer is the journal.

This is a very good summary of the issues. I work in this space, and I can confirm that for a lot of PIs, the software is at most tangential to the work of writing about the software.

Features that literally have never been run or tested can be written about in papers as if they are 100% foolproof.

The sad part is that these discoveries of program bugs can probably only occur in the top 5% of all research software. The rest are either left unreleased, or are so poorly coded that they will never be vetted in any meaningful way.


I totally agree with this. Having read papers and tried to implement, or even re-implement, the same algorithm, even when the source code is available, it's hard to get results that match.

I'm not a hard core TDD guy, but sometimes a few tests go a long way to making your code reproducible...or even just working correctly.


What’s worse, sometimes the bugs are deliberate. A slight tuning of a parameter that can be explained away but has a significant impact on the output data. This, along with statistical bullshitting (there's always a statistical manipulation that will create the desired result), renders 80% of all research practically worthless.

edit: code change -> parameter tuning.


Or, in machine learning, if that manipulation is not enough to get state-of-the-art results, just don't cite anyone who is better! Now you're SOTA, congrats. It happens way more than people realize. Not as much in the top 2-3 conferences.


I can imagine this might happen in theory, but have you come across evidence of this in practice?


Parameter tuning is pervasive, but I guess I went too far in suggesting that it's a code change.


Another common trick is to tune your program to a dataset then run all of the other programs with default parameters.

Bam, you've got a benchmark where you're fastest, just like everyone else.


Fortunately it's becoming more common to provide permalinks to GitLab etc. so people can access the actual version of the code used in a paper. That being said, having some incentives to reward this practice would help speed up adoption.

Part of the blame for incorrect code also falls on reviewers. If someone agrees to peer review a paper, that should include running the code as part of the general scrutiny. If someone finds a bug after publication, they should let the appropriate people know so an erratum can be published.


> I'd estimate that 100% of sample implementations in published research papers have bugs.

> Researchers, even in computer science, are usually not skilled programmers. The product for them is the paper, not a program.

This is so silly and misleading. Any nontrivial software is likely to have bugs, whoever the producer or customer or user may be. That's not the point. Merely having a bug doesn't invalidate an algorithm any more than lacking one validates it. Moreover, even when there isn't a bug, there's still no guarantee you can reproduce results exactly -- stuff like your compiler's particular handling of floating-point may well change the outputs.
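
The floating-point point is easy to demonstrate. Here's a tiny Python example (unrelated to any particular paper) where merely reassociating the additions changes the result:

    a, b, c = 0.1, 0.2, 0.3

    left_to_right = (a + b) + c   # 0.6000000000000001
    right_to_left = a + (b + c)   # 0.6

    print(left_to_right == right_to_left)  # False

    # Compilers, vectorization, and parallel reductions can all reorder sums
    # like this, so bit-exact reproduction across toolchains isn't a given.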

Finding a bug that doesn't change the substance of some research is nothing to be smug about, whether it's obvious or subtle. All you'd be doing is wasting your time trying to prove that you're a good programmer and the author is a bad programmer, when that very 'bad programmer' is being productive and advancing science without getting hung up on irrelevant details. The metric you have to care about is bugs that materially affect the conclusion being drawn from the experiment; that's the rate you need to report here.


Decades ago I had a friend who was taking a double major in CS and Physics at a school that shall remain nameless. He decided to do his master's thesis in environmental science. The department in question published a lot of papers based on simulations, showing how different factors could affect weather patterns and things like that.

The simulation program they used was written in Fortran by a previous grad student, and there was no version control. It was instead copied from researcher to researcher, expanded on in turn, then passed to the next researcher in the lab to be modified for their own work.

My friend asked for the version of the code that was used for particular research papers the group published. They couldn't find it. He asked where the test suite was. There was none. He asked if anyone had audited the code to make sure it worked, and - well, you see where this is going.

Feeling frustrated, my friend spent the next month or two going through the code and writing tests to check that it was actually correct. Among other things he found a + that should have been a - in a core part of the simulation code, and fixing it dramatically changed a lot of the results the program produced. Because none of the researchers actually kept the programs they used to generate their results, it was impossible to even tell when that bug was introduced, or which of the papers published by their department had correct results.

Anyway, this caused a big argument between my friend and his supervisor. My friend held that their department wasn't doing real science, and he ended up being asked to leave the team. He did his thesis in an entirely different field.

I don't know how common this sort of thing is, but it's probably way more common than we'd like to admit. We should demand across the board that any paper which relies on source code co-publishes that source code. This is becoming increasingly common in CS papers, but it should happen everywhere. If it were up to me it would be demanded by top-tier journals as a condition of publishing.


Yes, I realize. I never disagreed with this. What you said is entirely consistent with the point I was making, which was that "100% of published research has bugs" and "researchers, even in computer science, are usually not skilled programmers" are quite misleading claims to make about research quality.


It seems to me that this discussion has been about bugs that affect the research.

You're right that no one should care very much if there are typos in research code, or if it crashes on some input that it never sees, but the criticism here has been more substantial than that.


100% of programs have bugs.


Most programs don't have significant bugs that affect the entire purpose of the program.

You wouldn't accept a chess program that didn't know how bishops moved, or which only worked when you played a specific opening, but these classes of bug are not uncommon here.



