Hacker News

I have been a data scientist for the last 4 years.

I think (one of) the problems with the data science career field is that there are a lot of juniors who want to run sklearn and call it a day, following tutorials that seem to 'just work' in a way that real-world data never does without a fight.

To get value out of the work, you have to be methodical, careful, and really dig into the data. The observation that 85% of the time is cleaning doesn't eliminate the need to know what you're doing, what approaches to use, how to judge success, how to communicate results, etc.

Another thing to consider: I've found big, boring companies are usually better to do DS at than small ones. Big, boring companies have better discipline in collecting and managing data. Also, a 1% improvement to an existing process matters a lot at BigCo, and very little at a startup - and a lot of DS models are that sort of incremental progress over rules engines or heuristics.




In my world working on data at a BigCo (an industrial plant in my case), I'd say there are three schools of people:

1) 'The Old Guard', who are extremely skeptical. They tend to be extremely dismissive of models and predictions, and distrust anything but the most basic analysis. If they can't do the analysis in an Excel spreadsheet, it's too complicated and "will never work". These people tend to be engineers (mechanical and chem types) and plant operations roles. A lot of the time there is value in listening to their skepticism, but they tend to be extremely conservative by nature (Fortran ought to be enough for anyone...).

2) 'The Optimists', people who think "big data" and "machine learning" are the panacea for every problem in our org. To these people a prediction is as good as a real measurement - they trust forecasting implicitly. They have probably read an article somewhere about machine learning but don't really grasp any of the intricacies. These people tend to be in logistics/accounting/finance type roles, and a large part of my job is spent on phone calls with these people explaining why their forecasts did not match the actual results.

3) 'The KPI guy' - usually a manager who is somewhat out of his depth and wants to distill everything he can into a single number that can be displayed on a dashboard. The end result is a Dilbert-esque situation where the 'KPI guy' decides that to make his mark in the org he needs to come up with a new metric. You end up with the bizarre situation where people are discussing a 'super metric' made by combining other metrics into a single number. I also spend a lot of time on the phone with these guys, because they forget what underpins their super metrics and don't understand all the subtleties they've distilled out of the data by focusing so much on higher-level metrics. They get angry when you question the value of their dashboard. Whenever someone starts talking about "Yield", "OEE", or "DIFOT", there's a good chance they're a 'KPI guy'.
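OEE itself is a handy illustration of how a composite metric distills the subtleties away: it is conventionally the product of availability, performance, and quality, so quite different plant states can collapse to the same headline number. A minimal sketch (the plant figures below are invented):

```python
def oee(availability, performance, quality):
    """Overall Equipment Effectiveness: the product of three component ratios."""
    return availability * performance * quality

# Two hypothetical plants with very different problems...
plant_a = oee(availability=0.90, performance=0.80, quality=0.95)  # sluggish line
plant_b = oee(availability=0.80, performance=0.95, quality=0.90)  # frequent downtime

# ...yet the dashboard shows the same 'super metric' for both.
print(round(plant_a, 3), round(plant_b, 3))  # both 0.684
```

Which of the two plants needs maintenance and which needs a process tweak is exactly the information the single number no longer contains.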

Most of my job is balancing out interactions between these three 'customers': tempering the Optimists' enthusiasm, reining in the KPI guys, and nudging the Old Guard.


This is so spot on about Data Science in the "enterprise" or "legacy" organisations (i.e. basically pre-dating the data hype).

Personally, I find getting stuff done with data in this environment more satisfying than using the latest neural network - I presume you're the same?


I have a thing I do which I like to call "Artificial Stupidity". That is, I like to take a naive implementation and see how far it can get me. Chances are I'll do it in perl. If I need to do some more serious statistics or visualisation with it, I'll haul out R. I have not yet had the opportunity to bring some python into the mix, but that's 50% lack of opportunity and 50% that I've not yet found a normal general-purpose computing activity I can't do with perl.
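The "Artificial Stupidity" idea fits in a few lines (sketched in Python rather than perl, with invented throughput figures): predict the historical mean and measure how hard even that baseline is to beat before reaching for anything fancier.

```python
# Naive baseline: predict the historical mean for every future point.
def mean_baseline(history):
    return sum(history) / len(history)

# Mean absolute error of a set of predictions against actuals.
def mae(predictions, actuals):
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

history = [102, 98, 105, 101, 99]   # hypothetical daily throughput
actuals = [100, 103, 97]            # what happened next

baseline = mean_baseline(history)                    # 101.0
error = mae([baseline] * len(actuals), actuals)      # ~2.33
print(baseline, round(error, 2))
```

Any model that can't clearly beat this number on held-out data isn't earning its complexity.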


Testify! QFFT.

There are other pathologies, too, but it's amazing how much worldview and the basic behavioral psychology elements of high/low trust and autonomy manifest themselves in what should be "objective" analytics projects.


One thing I think would freshen things up, and it's something I am going to try to push, is a data scientist 'study abroad' or 'exchange' program. Data scientists I know in one org will mostly do time series, and I mostly do anomaly detection. Others do NLP, etc., and we would all like to work on all of that stuff, so why not exchange us around from time to time?

I think it would make things much more interesting



