
> The main reason why research code becomes a tangled mess is the intrinsic nature of research. It is highly iterative work where assumptions keep being broken and reformed depending on what you are testing and working on at any given time. Moreover, you have no idea in advance where your experiments are going to take you, giving no opportunity to structure the code up front so it is easy to change.

I'd say you're confirming the author's theory that writing code is a low-status activity. Papers and citations are high-status, so papers are well refined after the research is "done". Code, however, is not. If the code was considered on the same level as the paper, I think people would refine their code more after they finish the iteration process.




Yes... and no. It is true that after a result is obtained, one could clean up the code for publication. And it is true that coding is not seen as first-class at the moment.

At the same time, you need to consider that such a clean up is only realistically helpful for other people to check whether there are bugs in the original results, and not much else. Reproducing results can be done with ugly code, and future research efforts will not benefit from the clean up for the same reasons I outlined in my previous post.

While easing code review for other people is definitely helpful (it can still be done if one really wants to, and clean code does not guarantee that people will look at it anyway), overall the gains are smaller than what "standard" software engineers might assume. And I'm saying this as a researcher who always cleans up and publishes his own code (mostly just because I want to).


> At the same time, you need to consider that such a clean up is only realistically helpful for other people to check whether there are bugs in the original results, and not much else.

I assumed that most code published could be directly useful as an application or a library. Considering what you're saying, this might be only a minority of the code. In that case, I agree with your conclusion about smaller gains.


Most academic code runs once, on one collection of data, on a particular file system.

Academic code can be really bad. But most of the time it doesn't matter, unless they're building libraries, packages, or applications intended for others. That's when it hurts and shows.

I'm a research programmer. I have a master's in CS. I take programming seriously. I think academic programmers could benefit from better practices. But I think software developers make the mistake of assuming that just because academics write code, the objectives are the same, or that the best practices should be the same too. Yes, research code should have tests, but those should mostly look like running the code on dummy data and checking that the results look like what you expect.
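To make that concrete, here is a minimal sketch of the kind of "dummy data" smoke test described above. The analysis function `fit_linear` is hypothetical, standing in for whatever the research code actually computes; the point is only to feed in synthetic data with a known answer and check the result.

```python
import numpy as np

def fit_linear(x, y):
    # Hypothetical analysis step: ordinary least squares fit of y = a*x + b.
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

def smoke_test():
    # Dummy data with a known relationship: y = 2x + 1.
    x = np.arange(100, dtype=float)
    y = 2.0 * x + 1.0
    slope, intercept = fit_linear(x, y)
    # Sanity check: the recovered parameters should match what we built in.
    assert abs(slope - 2.0) < 1e-6
    assert abs(intercept - 1.0) < 1e-6

smoke_test()
print("smoke test passed")
```

Nothing here needs a test framework; a script that asserts on synthetic inputs and exits cleanly is often enough for a one-off research pipeline.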


I know a lot of "research programmers" (meaning people who write code in research labs but are not themselves the researchers or investigators on a study), and they often have MS degrees in CS - though actually, highly quantitative masters degrees where very elaborate code is used to generate answers is a bit more common than CS per se (math, operations research, branches of engineering, bioinformatics, etc).

Here's the thing: in industry, this background (quant undergrad + MS, high programming ability, industry experience) is pretty much the gold standard for data science jobs. On academic job ladders it's... hmm. By the latest data, MS grads in these fields from top programs are starting at between 120k-160k in industry, with very good opportunities for growth.

I actually think that universities and research centers can compete for highly in-demand workers in spite of lower salaries, but highly talented people will not turn down an industry job with salary and advancement potential to remain in a dead-end job.


Yeah, my standard line about research code is that it is not the product, so it is OK that it is bad. The results are the product, and those need to be good. Someday someone will take those results (in the form of some data or a paper) and make a software product, and that should be good.


I am under the impression that most authors do not even publish functioning code when publishing ML/DL papers, which I find absurd. The paper is describing software. Imo the code is more important than the written word.


Shouldn't checking for bugs be of primary importance? How many times have impressive research results turned out to be a mirage built upon a pile of buggy code? I get the sense that it's far too common already.


> How many times have impressive research results turned out to be a mirage built upon a pile of buggy code?

You're actually making bugs sound like a feature here. I'm pretty sure that if you've gotten impressive results with ugly code, the last thing you want to do is touch the code. If you find a bug, you have no paper.



