May I ask how you reached this insight? What field do you work in?

platz · on July 17, 2017

Classical statistics is much more interested in the explanatory power of models to describe phenomena.

ML is mainly interested in prediction (correlation instead of causation), typically over some data that just fell in your lap.
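To make the correlation-versus-causation point concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than anything from a real dataset: a hidden variable z drives both x and y, and x has no causal effect on y at all. In the observational sample the two are still clearly correlated (about 0.5 in expectation under these assumptions); when x is instead assigned at random, the correlation drops to roughly zero.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, stdlib only."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical hidden confounder z that influences both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational data: x and y both depend on z, but x never affects y.
x_obs = [zi + rng.gauss(0, 1) for zi in z]
y_obs = [zi + rng.gauss(0, 1) for zi in z]

# Randomised assignment: x is set independently of z; y is generated as before.
x_exp = [rng.gauss(0, 1) for _ in range(n)]
y_exp = [zi + rng.gauss(0, 1) for zi in z]

corr_obs = pearson(x_obs, y_obs)  # clearly positive despite no causal link
corr_exp = pearson(x_exp, y_exp)  # close to zero

print(f"observational correlation: {corr_obs:.2f}")
print(f"randomised correlation:    {corr_exp:.2f}")
```

A model trained on the observational sample could use x to predict y quite well, which is fine for prediction but says nothing about what happens if you intervene on x.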

gaius · on July 17, 2017

In a sense Data Science is like the Cult of the MBA. MBAs believe a trained manager can manage anything because management skills are generic. A data scientist believes they can analyse anything because analysis is generic. Both fail in the real world because they discount domain knowledge.

banned1 · on July 17, 2017

Is there a field that does not discount domain knowledge? Or is that just "judgment" and custom analysis? I am trying to understand how all fields map together. Thank you.

sgt101 · on July 17, 2017

The divisions are very confused. I think that sensible people all wish to use domain knowledge if possible. There are two tiers of this. The first is the use of domain knowledge in the manual or procedural construction of the insight system. The second is the use of formalised knowledge in the creation of models that can then be fused with data.

The first case is where data science has got a bad name; people swing into domains and companies full of cocksure ideas, produce insights that are risible or obvious and get ejected. Sometimes it takes years for sufficient knowledge to be acquired by analysts to deal with difficult domains.

Lots of people use Bayesian inference to do the second. Tools like Stan and PyMC3 are really popular and effective.
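As a rough sketch of what "fusing formalised knowledge with data" can look like, here is the simplest conjugate case, a Beta prior updated with binomial observations. It uses plain Python rather than Stan or PyMC3, and the prior and the counts are made-up numbers for illustration only.

```python
from fractions import Fraction

def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data (conjugate update)."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)

# Hypothetical domain knowledge: experts expect the event rate to be near 10%,
# encoded here as a Beta(2, 18) prior.
prior = (2, 18)

# Hypothetical observations: the event occurred in 3 of 10 trials.
posterior = beta_binomial_update(*prior, successes=3, failures=7)

print(beta_mean(*prior))      # 1/10
print(beta_mean(*posterior))  # 1/6
```

The raw data alone would suggest a rate of 3/10; the posterior mean of 1/6 sits between that and the prior's 1/10, and moves further towards the data as more observations arrive.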

sgt101 · on July 17, 2017

I simply don't recognise that characterisation of ML. I think that "data driven AI" fits far better. ML emerged in a number of ways over the years, but a strong driver of the last iteration was the knowledge engineering bottleneck encountered in fifth generation computing and surrounding the demise of the last Turing center.

I invite you to read Chris Bishop's or Stephen Muggleton's books.

Anyone who works with data will find it hard to imagine data that "fell into your lap". All the data I've ever used successfully required slogging and grinding.

platz · on July 17, 2017

Data that "fell into your lap" is shorthand for data that was not generated by a controlled experiment, in which a hypothesis is formed first and the data is then gathered according to a procedure with proper controls and testing.

"Fell into your lap" has nothing to do with how hard the work is; it's about the difference between a controlled experiment and an observational study. The bulk of ML is observational in nature (focusing on prediction) and therefore has nothing to say about causation or understanding the causal variables of the underlying reality.

sgt101 · on July 18, 2017

Yes, it's observational data, but this is quite common in many sciences: observations of stellar events and measurements of ecosystems and weather events, for example. These have led to theories with explanatory power, and machine learning tools can and do as well.

One big deal is applications to dynamic domains.