How should we critique research? (gwern.net)
90 points by bookofjoe on May 23, 2019 | 20 comments



Should we maybe encourage research that tries to disprove existing research? Or at least ensure that some funding is allocated to reproducing scientific studies, so that as we progress we aren't building on falsehoods?

This is a side-bar, directed more at the unfortunate consequences of the pressure on researchers to produce positive results.


Some ACM conferences are now handing out "badges" that mark papers where the associated "artifacts" are publicly available or have been independently reproduced or replicated: https://www.acm.org/publications/policies/artifact-review-ba...

The main issue with that right now seems to be the lack of real incentives: those badges are nice to have, but in the end you're evaluated by publications in highly regarded journals and conferences, citations, and funding brought in. Those things of course tend to push people toward novel, fancy, exciting work, rather than the time-consuming process of making their work easy for others to replicate, or even replicating other people's experiments. Incentive-wise, the publication business follows the principle of fire-and-forget.

What's needed, but I believe very difficult to introduce, is the strict expectation of independent replication. That would mean replication as part of the core review process, which would A. create a lot of work for reviewers, and B. initially reduce the attractiveness of whichever conference introduces it first.

One thing that would be easy to introduce in computer science would be to make it mandatory to share all code used in experiments. If you cannot or won't do that, you should not be able to take part in this game.


In my CS area, many results aren’t interesting enough to be replicated, even in high quality journals. Often the results aren’t even very relevant to what is otherwise a design paper, and are just there to check some boxes. The fact that research results aren’t replicated is just symptomatic of a much larger dysfunction.

It has been a long time since I’ve read a paper where I actually wanted to use whatever they were selling, let alone reproduce whatever results they were claiming. We don’t even have “novel, exciting, fancy” work!

When something comes out of a paper that really changes the game (like say MapReduce), it gets replicated a lot.


Good point, a lot of work really falls below the threshold where replication would still make sense.

However, I do think there is a lot of incremental, piecemeal type of work that over the years could amount to a decent step forward. It's just that currently, from looking at the code behind quite a few published papers, the statements often just cannot be trusted enough to actually build on these works. It is my view that some subsubfields in CS sustain themselves by avoiding the most pertinent questions, because their answers would reveal that the entire subsubfield has been superseded, or was never that promising to begin with. Unfortunately, that type of noise generation is actually more profitable than a single paper saying "nope", although the value of the latter in terms of knowledge generation is enormously high.


what's your CS area?


In biology and medicine, this funding is provided by industry. Pharma companies and startups often try to reproduce academic findings before investing in further research.

Pharma and biotech startups only make money if a drug works (for the most part), so they tend to have a much higher bar for the robustness of findings than academic researchers and journal editors do.

Of course, this only covers a tiny subset of research.


SURE journal — Series of Unsurprising Results in Economics https://blogs.canterbury.ac.nz/surejournal/


What a beautiful site.

Look at the initial. Look at the notes in the actual margins.


Concur. It's why I pay $1/month to his Patreon account.


Interesting read and worth thinking about the question the way it's framed, although I don't know that there's actually an answer. I agree that how much something matters is how much things would change if it were changed, but I think that depends heavily on the scenario: the questions asked, the design, and so forth. I also think problems in contemporary academia go far beyond statistics, and won't be addressed with statistics, but rather with funding and sociopolitical changes.

Take the conclusion that "issues like measurement error or distributions, which are equally common, are often not important." I strongly disagree with this. In a classic randomized controlled trial, maybe yes. But even then there are potentially serious problems. Just to offer a few examples:

Measurement is key in this age of irreproducibility. It's not uncommon for claims to be made based on results involving one particular measure, when the hypothesis would apply to many measures of the same thing in the sample. When an author claims X causes Y, and there are multiple measures of Y, but results are only reported for one, that is a problem. Modeling the joint effect on Y, rather than on the individual measures of Y, is key.
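A minimal simulation of that point, in Python, under assumptions that are mine rather than the commenter's: two groups with no true difference, five noisy measures of the same latent outcome, and a normal-approximation z-test at the 5% level. It compares the false-positive rate for one prespecified measure, for reporting whichever measure "worked", and for a single composite (the mean of all measures) as a crude stand-in for modeling the joint effect. The function name `simulate_outcome_switching` and all parameter values are illustrative.

```python
import random
import statistics


def z_stat(a, b):
    """Two-sample z statistic with unpooled variances (normal approximation)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def simulate_outcome_switching(n_studies=1000, n_per_group=50, k_measures=5,
                               noise_sd=1.0, seed=1):
    """False-positive rates for: one fixed measure, best of k measures, composite."""
    rng = random.Random(seed)
    hits_single = hits_best = hits_composite = 0
    for _ in range(n_studies):
        groups = []
        for _ in range(2):  # treatment and control drawn from the same population
            subjects = []
            for _ in range(n_per_group):
                latent = rng.gauss(0, 1)  # the construct "Y" itself
                # k noisy measures of the same construct
                subjects.append([latent + rng.gauss(0, noise_sd)
                                 for _ in range(k_measures)])
            groups.append(subjects)
        treat, ctrl = groups
        zs = [abs(z_stat([s[j] for s in treat], [s[j] for s in ctrl]))
              for j in range(k_measures)]
        hits_single += zs[0] > 1.96      # prespecified measure only
        hits_best += max(zs) > 1.96      # report whichever measure looks best
        composite_z = abs(z_stat([statistics.fmean(s) for s in treat],
                                 [statistics.fmean(s) for s in ctrl]))
        hits_composite += composite_z > 1.96  # one test on the combined measure
    return (hits_single / n_studies, hits_best / n_studies,
            hits_composite / n_studies)


single, best, composite = simulate_outcome_switching()
print(f"single: {single:.3f}  best-of-5: {best:.3f}  composite: {composite:.3f}")
```

Exact rates depend on the seed. The single-measure and composite rates should sit near the nominal 5%, while picking the best of five measures inflates the rate well above it, even though every measure targets the same construct.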

Overfitting goes hand in hand with distributional misassumptions, because to the extent that an observed distribution deviates from the assumed one, overfitting models can capitalize on excess information missed by the base distribution. A classic example is fitting a linear regression that assumes normally distributed errors to a very nonnormal outcome: in many cases an interaction term will add significantly even in the absence of a real interaction, because it captures more of the interestingness of the data, information-theoretically speaking.
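A sketch of the spurious-interaction effect, assuming (my choice, not the commenter's) an outcome that is additive on the log scale, Y = exp(X1 + X2 + noise), so the data-generating process has no interaction there. Ordinary least squares is written out by hand with the standard library, and the interaction is judged with the usual partial F statistic for one added term, where roughly 3.84 is the 5% cutoff. Helper names such as `interaction_f` are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(rows, y):
    """Residual sum of squares of an OLS fit, via the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bk * v for bk, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def interaction_f(x1, x2, y):
    """Partial F statistic for adding x1*x2 to the model y ~ 1 + x1 + x2."""
    n = len(y)
    base = [[1.0, a, b] for a, b in zip(x1, x2)]
    full = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    rss0, rss1 = rss(base, y), rss(full, y)
    return (rss0 - rss1) / (rss1 / (n - 4))


rng = random.Random(3)
n = 1000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
# Additive on the log scale: no interaction in the data-generating process.
log_y = [a + b + rng.gauss(0, 0.5) for a, b in zip(x1, x2)]
y = [math.exp(v) for v in log_y]

f_raw = interaction_f(x1, x2, y)      # normal linear model on a skewed outcome
f_log = interaction_f(x1, x2, log_y)  # model matches how the data were generated
print(f"F for interaction, raw scale: {f_raw:.1f}; log scale: {f_log:.1f}")
```

On the raw, heavily skewed scale the interaction typically looks highly significant; on the log scale, where the model matches the data-generating process, it usually does not.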

Measurement error also becomes key in modeling things like mediation effects, or in trying to control for covariates. Residual confounding is increasingly being recognized, and it is largely a matter of measurement error. This is maybe related to overfitting, but many claimed effects can be attributed to measurement error, especially differential measurement error between variables. This is often more of a problem in observational studies, but it can easily apply to experimental designs as well, when there's some ambiguity about how an effect is acting, or about whether it actually acts through the hypothesized mechanisms.
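A small illustration of residual confounding, under assumed parameters: a confounder Z drives both X and Y (X = Z + noise, Y = Z + noise, all unit variances), X has no effect on Y, and the analyst adjusts for a measurement of Z carrying error with standard deviation tau. Working through the covariances, the X coefficient should be about 0 when Z is measured perfectly and about tau² / (1 + 2·tau²), i.e. 1/3 for tau = 1, when it is not. The function names are hypothetical.

```python
import random


def slope_on_x(x, z, y):
    """Coefficient on x from OLS of y ~ 1 + x + z (two-predictor closed form, centered data)."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


def adjusted_slope(tau, n=20000, seed=2):
    """X has NO effect on Y; both depend on Z, which we observe with error sd tau."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + rng.gauss(0, 1) for zi in z]
    z_measured = [zi + rng.gauss(0, tau) for zi in z]
    return slope_on_x(x, z_measured, y)


b_perfect = adjusted_slope(tau=0.0)  # confounder measured exactly
b_noisy = adjusted_slope(tau=1.0)    # confounder measured with error
print(f"X coefficient, exact Z: {b_perfect:.3f}; noisy Z: {b_noisy:.3f}")
```

Adjusting for the noisy measurement only partly removes the confounding, so a null X looks like it has an effect of about a third of a unit.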


Researchers could publish their findings along with sources for all involved data in a machine-readable/analyzable form (formal logic or simple natural language) so that conflicting results in other publications could be quickly found and reports needing additional verification would be scrutinized more.
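One way that proposal could look, as a purely hypothetical sketch: each paper publishes its findings as structured claims (exposure, outcome, direction), and a tool flags pairs of claims about the same relationship that disagree. The `Claim` schema and `find_conflicts` function are invented for illustration, and the paper identifiers are placeholders; no existing standard is implied.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One machine-readable finding (hypothetical schema)."""
    paper: str      # identifier of the publishing paper, e.g. a DOI
    exposure: str   # the "X" in "X affects Y"
    outcome: str    # the "Y"
    direction: int  # +1 increases, -1 decreases, 0 no effect found


def find_conflicts(claims):
    """Return pairs of claims about the same (exposure, outcome) whose directions differ."""
    by_relationship = defaultdict(list)
    for c in claims:
        by_relationship[(c.exposure, c.outcome)].append(c)
    conflicts = []
    for group in by_relationship.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if a.direction != b.direction:
                    conflicts.append((a, b))
    return conflicts


claims = [
    Claim("paper-A", "X", "Y", +1),
    Claim("paper-B", "X", "Y", -1),
    Claim("paper-C", "X", "Z", +1),
]
conflicts = find_conflicts(claims)
for a, b in conflicts:
    print(f"{a.paper} vs {b.paper}: conflicting findings on {a.exposure} -> {a.outcome}")
# prints: paper-A vs paper-B: conflicting findings on X -> Y
```

A real system would need a shared vocabulary for exposures and outcomes, which is the hard part; the pairwise comparison itself is trivial.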


IMO, the most prominent/important critic in this space was Karl Popper. He, for example, championed falsifiability as the criterion for scientific theories.

Popper's two big targets for his criticisms were Marxism and Freudianism, which he accused of being pseudo-science. Despite making a lot of enemies, his criticism made an impact and the fields evolved to address it, to some extent. They often dropped the "science" claim, or adopted more scientific methodologies. Today, Popper's pseudosciences are often characterized as (or evolved into) "soft sciences."^

Still, I think it's telling that 80 years later these general areas of study (economics, psychology, sociology, policy research) contain the majority of the problems this article is talking about.

A big part of the problem is statistics. That is, they are studying statistical phenomena. The relationship between the "big hairy theories" (e.g. supply & demand) and the more scientific/falsifiable hypotheses that they can actually test is... problematic.

So... part of the problem is fixable. The scientific framework needs to be designed for statistics. Experimental design plans can be published prior to experimentation, for example. Negative results need to be published too. Part of the problem is harder. Without a huge increase in independent replication studies, many of these fields are not going to be genuinely producing scientific knowledge, as fields. Individual results are more akin to anecdote.

The bigger problem is that "big" theories (e.g. Keynesian macro, Maslow's hierarchy, etc.) are not generally scientific theories. That split between "small" testable theories and interesting, fundamental big theories is just very hard to bridge in a fundamental way.

^He went easy on liberal economists, possibly because of personal friendships... mostly because they didn't claim to be scientific to the same extent. I think in retrospect, this may have been a mistake.


Popper’s approach still works here. When someone writes a statistics-based paper, they have basically said: “I have a hypothesis that X correlates with Y. If this is not true (falsifiable), X will not correlate with Y in this data.” The hypothesis survives the test, and this is science. Of course, the authors probably did the opposite, but this doesn’t matter for the philosophical underpinnings.

So how do we detect bad studies? With this same framework. Take the same hypothesis, but design a different test. If the hypothesis fails the test, publish that. That is science.

We shouldn’t be so hung up on reproducing old papers or scrutinizing every study. You’ll get lost in the details. If a hypothesis is true, it will stand up to every test of it. So if you doubt a result, test it!


Perhaps publishing bad research should be punishable in some way. Also the system of citations does not work because negative citations also count as citations.


> negative citations also count as citations.

If by "negative citations" you mean works that are widely criticised, I don't see the problem. Most scientific papers cite prior works in order to point out their limitations. This is not a bad thing. We need to understand how previous attempts fell short in order to understand how and why we might want to do better in the future.

Maybe however by "negative citations" you mean citations of works that are plainly wrong. I think this occurs very infrequently, to the point where it probably isn't an issue. I certainly haven't come across works in my area which are cited for being wrong. I don't see the point of citing them either; I'd rather cite a paper that points out the problems and analyses their impact (i.e. something constructive).


The problem is that raw citation counts are used for promotion and hiring. And they look the same for 'did groundbreaking work' as for 'actually so flawed that everyone cites it just to make fun of it and as a cautionary lesson to everyone else to not be so incompetent'. Some authors explicitly take a mercenary attitude and don't care about sloppiness - after all, if someone criticizes them, that just means their citation count goes up...


With respect, I think you don't know what you're talking about. I've never seen cases where a researcher cites a paper to poke fun at it. One might cite an erroneous paper in order to point out errors if one were interested in surveying the types of errors which occur in Science. However, (i) that paper is unlikely to be famous, and (ii) even then it's considered poor form to shit on your colleagues.


Yes, but I rarely even come across negative citations. When they are slightly negative, it is only to differentiate the citing work from the cited one, to "get their own paper out".

It is more common that the cited papers have not even been read.


I don't think negative citation for papers is a good idea. Negative citations for journals might be.

When a paper gets through with errors that reviewers should have caught, maybe that should show in the journal's impact factor.


Determine whether the practical application of research contributes to extending human lifespan for the purpose of inhabiting other planets.



