Hacker News

More like 'analyst' in how easily it's thrown around. Calling a built-in function in Python or R is just about equivalent to calling one in Excel. Sure, you can claim that folks need to know more about what's going on, but honestly, how many have actually gone through the work of deriving the functions they call to begin with?



I'm wondering how useful deriving functions yourself is in the age of computers. I feel like knowing the axioms of the mathematical structure you're dealing with and how to do proofs is very important, but it has always struck me as odd that we're still stepping through complex applied-maths functions manually with pen and paper. Programmers don't bother, say, writing our own hashtable implementation more than a handful of times in our lives, do we? Does forgetting how to derive a hashtable mean we won't know how to use one effectively?

Genuine question - more than happy to be proven wrong.
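To make the hashtable analogy concrete, here is a minimal separate-chaining hash table, a sketch of the sort of thing one might derive once and then never write again (the class and its methods are invented for illustration; in practice you would just use Python's dict):

```python
# Minimal hash table with separate chaining, for illustration only.
class HashTable:
    def __init__(self, n_buckets=8):
        # Each bucket is a list of (key, value) pairs that hash there.
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Map the key's hash onto a bucket index.
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # otherwise append a new pair

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

t = HashTable()
t.set("a", 1)
t.set("a", 2)  # overwrites the earlier value
print(t.get("a"))  # → 2
```

Deriving this once teaches you why lookups degrade when buckets fill up, which is arguably the "use it effectively" knowledge in question.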


>stepping through complex applied maths functions manually with pen and paper.

We do that because:

A) it helps us understand them better;

B) it teaches us how to think, the way Feynman said: "Know how to solve every problem that has been solved." Granted, it seems pointless to work through by hand what is easily accessible through a machine, BUT it teaches us how to solve new problems. I wouldn't consider reaching for NumPy or MATLAB as the first step toward solving a new math problem.

It's like using assembly vs. a higher-level programming language.


Completely agree. There's a lot of nuance in these algorithms: they're not as cut-and-dried as simply calling a package method, and oftentimes they aren't optimized for your use case. I work in machine learning, specifically on NLP, and when interviewing potential employees it is really obvious who knows what SVD means and who just knows the NumPy function. Most "data scientists" I've interviewed fall into the latter category.

edit: This is, of course, completely anecdotal experience.
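As a rough sketch of the distinction: "knowing what SVD means" includes knowing what the factors np.linalg.svd returns actually are, e.g. that A = U·diag(s)·Vᵀ and that the singular values are the square roots of the eigenvalues of AᵀA. Both facts are standard and easy to check numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# np.linalg.svd returns U, the singular values s, and V transposed.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Definition check: the factors reconstruct A.
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Singular values are the square roots of the eigenvalues of A^T A
# (eigvalsh returns them ascending, so reverse to match s).
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
assert np.allclose(s, np.sqrt(eigvals))
```

An interviewee who knows only the function name can call it; one who knows the decomposition can tell you what the factors mean and when they are unreliable.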


I suppose my real question is: how many times do we need to do it? Once we have stepped through it with pen and paper, or derived the result, how many times do we need to keep doing it? My experience is that mathematicians will do this again and again and again.


I agree. A smart data scientist doesn't waste time reinventing the wheel: they build on the hard work of others. When necessary they can create what is needed, but typically they don't have to.

In my experience, they are both more and less than statisticians (more flexible and solution-oriented, less rigorous and classical); than analysts (they can do more, in general, but a great analyst will be better at analysing and visualizing); and than developers (they know more stats, less software engineering, and have great patience for wrestling data into submission). I like to think of data scientists as people who combine the skills of all of the above to solve hard problems that exceed the domain of any one specialty (analyst, statistician, developer). It doesn't mean we're amazing at everything, just that we are effective, flexible problem solvers.

And for the record, machine learning, statistical modeling, and data mining are just a small portion of the pie. Being good at modeling and machine learning will not remotely guarantee success as a data scientist.


I respectfully disagree. While I understand where you're coming from, I don't agree with your distinction between an analyst and a scientist. Given the data scientist's typical compensation and expected experience, there should be a higher bar set for them, one that does include developing solutions from first principles. I understand the use of utilities, but far too frequently I find that people who rely on packages to do their work don't really understand what they're working on (and often don't realize the underlying assumptions the package writers made for them, either). With the tasks you describe, if I were hiring I would label this a data analyst's work.

I could of course be wrong and have too narrow a view from my particular subfield.


>how many have actually gone through the work of deriving the functions they're calling to begin with?

Why would you waste your time reinventing the wheel?

A good data scientist isn't good because they can ace shitty trivia; they're good because they know the right questions to ask.


That's only part of it. A good data scientist is also good because they know how to answer hard questions.

In those situations math isn't "shitty trivia," but instead a tool to be leveraged against those hard questions.

You can consider the derivation of SVD to be shitty trivia and still throw np.linalg.svd around while engineering features. That's fine! But good luck visualizing that data in a meaningful way, or dealing with non-linear data, if you're ignoring that "shitty trivia."
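One concrete piece of that "trivia" matters directly for visualization: the Eckart–Young theorem says the rank-k truncation of the SVD is the best rank-k approximation in the Frobenius norm, and the error is exactly the root-sum-square of the discarded singular values. That number tells you how much structure a 2-D projection throws away. A quick numerical check (random data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 10))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k singular directions, as one would for a 2-D plot.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: the approximation error equals the norm of the
# discarded singular values.
error = np.linalg.norm(A - A_k)           # Frobenius norm of the residual
predicted = np.sqrt(np.sum(s[k:] ** 2))   # from the dropped singular values
assert np.allclose(error, predicted)
```

If s[k:] still carries most of the energy, the linear projection is hiding the interesting structure, which is exactly the failure mode for non-linear data.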


> dealing with non-linear data

What is non-linear data?


Data derived from non-linear inputs.

That is to say, problems that can't be expressed by linear functions.

E.g., y = mx + b is a linear function;

y = ax^2 + bx + c is a polynomial (non-linear) function.

Linear programming (LP) involves optimizing a linear objective subject to linear constraints (something like Excel's Solver can handle this).

When you are dealing with non-linear functions you need a method such as sequential quadratic programming (SQP).
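The contrast above can be sketched with SciPy (assuming scipy is available; the particular objectives and constraints here are made up for illustration). linprog handles the linear case, and minimize with method="SLSQP" applies a sequential-quadratic-programming-style method to the non-linear one:

```python
import numpy as np
from scipy.optimize import linprog, minimize

# Linear problem: maximize x + y subject to x + 2y <= 4 and x, y >= 0.
# linprog minimizes, so negate the objective coefficients.
lp = linprog(c=[-1, -1], A_ub=[[1, 2]], b_ub=[4], bounds=[(0, None)] * 2)
# Optimum is the vertex (4, 0) with objective value 4.

# Non-linear problem: minimize (x-1)^2 + (y-2)^2 subject to x + y <= 2.
# The unconstrained minimum (1, 2) violates the constraint, so the
# solution is its projection onto the line x + y = 2, namely (0.5, 1.5).
nlp = minimize(
    lambda v: (v[0] - 1) ** 2 + (v[1] - 2) ** 2,
    x0=[0.0, 0.0],
    method="SLSQP",  # sequential least-squares quadratic programming
    constraints=[{"type": "ineq", "fun": lambda v: 2 - v[0] - v[1]}],
)
print(lp.x, nlp.x)  # → [4. 0.] and approximately [0.5 1.5]
```

Swapping a non-linear objective into linprog is simply not expressible, which is the practical meaning of the LP/SQP distinction.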


Using a term like nonlinear science is like referring to the bulk of zoology as the study of non-elephant animals.

— Stanislaw Ulam

https://en.wikipedia.org/wiki/Nonlinear_system



