
I think you're deluding yourself. If you can't get good data, accept that and do something else. Pretending that the best you can do must inherently be acceptable is an insidious idea that causes a lot of wasted effort.



I think you misread me completely. I am not saying "it's hard, but this is the best we can do, so we have to do it." I'm saying it's hard, but it turns out to be worth doing despite that. Using the scientific method and sound statistical reasoning from data is a really effective tool for learning about things, even when you cannot make bulletproof true/false statements. Think of it in a Bayesian way: even a small or medium experiment still moves your belief distribution, despite the fact that you remain relatively uncertain.
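
A rough sketch of that Bayesian point, in Python (the Beta-Binomial model and the numbers here are made up purely for illustration, not something from this thread):

    # Small experiment: 7 "successes" in 10 trials, starting from a vague prior.
    # Even this little data visibly shifts the belief distribution, while the
    # posterior standard deviation stays large, i.e. you remain uncertain.
    from scipy import stats

    prior = stats.beta(2, 2)                      # vague prior over a rate
    successes, trials = 7, 10                     # the small experiment
    posterior = stats.beta(2 + successes, 2 + trials - successes)

    print(prior.mean(), prior.std())              # ~0.50, sd ~0.22
    print(posterior.mean(), posterior.std())      # ~0.64, sd ~0.12

The posterior mean moves substantially, but the spread stays wide: belief updated, certainty not claimed.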


A huge range of things are worth doing without being science. Further, just looking at vast amounts of data is not the scientific method.

It's like using a vast amount of information to build a self-driving car vs. actually letting the car run on real roads. The first can only tell you whether it approximates reasonable driving; the second can tell you whether it avoids getting into dangerous situations. You can collect a lot of information on the US economy, but in the real world the Fed is actively trying to manage things, and you can't remove that factor from the data.


Where is anyone proposing just looking at vast amounts of data? With certain kinds of data you can still learn causal effects observationally, as in econometrics. That's the closest I can find in this discussion. I mean, you're totally right that naive data analysis is bad and more data doesn't fix that, but nobody is advocating for doing that.


You only have two choices: look at data you don't control or data you do. The entire point of experiments is to narrow the range of uncontrolled data as much as possible. Looking at raw data does not help. Looking at huge data sets of minimally controlled experimental data does not help either.

Physicists, for example, can't change the age of the universe they are operating in. It's a rather large unknown, but not exactly an unknown unknown.

At the other end, people trust surveys of eating habits. I don't care if you send out a billion of those things; it's still bad data, in systematic and changing ways.

In between, most animal studies in mice are looking at disease analog X in a population of mice that are fat, minimally stimulated, and so on.


Define "good data". It really does depend on your field. One of our biology professors once told us that any r^2 value above 0.5 could be considered "not bad" on a linear regression in biology - I daresay physicists are used to rather nicer fits than these. But where you draw the line between "good" and "bad" data really is relative to your field of study, and science needs to be evaluated in that context.


While a field may consider X good enough, that does not mean it is good enough. One measure might be the percentage of published papers that are junk, and by that measure many fields fail any reasonable metric.



