
I've seen this a LOT in my professional group. Many people (who often have PhDs!!) I interview for data science positions seem to know absolutely nothing about the algorithms they use professionally, or how to optimize them, or why they are a good fit for their use case, etc etc etc. I usually see through LinkedIn that these same people are now in impressive-sounding positions at other companies.

I had one candidate who was in charge of a multi-armed-bandit project at their current company. I asked them how it worked, and how they settled on that approach. Their response was "you know, I'm not really sure, the code was set up when I got there". They had been there for over a year, and could tell me nothing!
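
(For anyone unfamiliar, and to be clear this is just a generic illustration, not the candidate's actual system: a basic multi-armed bandit is simple enough to sketch in a few lines. An epsilon-greedy version in Python might look like this.)

    import random

    # Minimal epsilon-greedy bandit sketch (illustrative only):
    # with probability eps explore a random arm, otherwise exploit
    # the arm with the best observed mean reward so far.
    class EpsilonGreedyBandit:
        def __init__(self, n_arms, eps=0.1):
            self.eps = eps
            self.counts = [0] * n_arms
            self.values = [0.0] * n_arms  # running mean reward per arm

        def select_arm(self):
            if random.random() < self.eps:
                return random.randrange(len(self.counts))
            return max(range(len(self.values)), key=lambda a: self.values[a])

        def update(self, arm, reward):
            self.counts[arm] += 1
            n = self.counts[arm]
            # incremental update of the running mean
            self.values[arm] += (reward - self.values[arm]) / n

Real deployments usually use something smarter like Thompson sampling or UCB, but the explore/exploit idea is the same, and that is roughly the level of explanation I was hoping to hear.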

> A common one I've seen quite many times is people using a flawed validation strategy (e.g. one which rewards the model for using data "leaked" from the future), or to rely on in-sample results too much in other ways.

It's funny you mention this, we have a direct competitor who does this and advertises flawed metrics to clients. Oftentimes our clients will come back to us saying "XYZ says they can get better performance", the performance in this case being something which is simply impossible without data leakage or some flawed validation strategy.
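
(To make the "leaked from the future" failure concrete - a minimal sketch on synthetic data, not our pipeline or the competitor's: shuffled K-fold on time-ordered data lets the model train on rows that come after the rows it is scored on, which inflates the metric; a time-based split tells a more honest story.)

    import numpy as np
    from sklearn.model_selection import KFold, TimeSeriesSplit
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score

    # Hypothetical time-ordered dataset: both the feature and the target
    # follow a slow drift, so "future" rows are very informative about the past.
    rng = np.random.default_rng(0)
    n = 2000
    trend = np.cumsum(rng.normal(size=n))               # slow drift over time
    X = np.column_stack([trend + rng.normal(size=n)])   # feature correlated with the drift
    y = trend + rng.normal(size=n)

    def cv_score(splitter):
        scores = []
        for train_idx, test_idx in splitter.split(X):
            model = RandomForestRegressor(n_estimators=50, random_state=0)
            model.fit(X[train_idx], y[train_idx])
            scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
        return np.mean(scores)

    # Shuffled K-fold leaks future rows into the training folds -> optimistic score.
    print("shuffled KFold R^2:  ", cv_score(KFold(n_splits=5, shuffle=True, random_state=0)))
    # Time-based split only ever trains on the past -> more honest (usually lower) score.
    print("TimeSeriesSplit R^2: ", cv_score(TimeSeriesSplit(n_splits=5)))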




Where are these jobs where you can interview this badly and still get hired? In my experience DS interviews are extremely hard and often expect people to have very high stats skills as well as data structures/algorithms skills at FAANG level.


I think the issue here is that "data science" encompasses two very distinct branches of work. One answers to business needs and the other produces data-based solutions for the product itself, e.g. you might have a data scientist who A/B tests your website design so you minimize your churn rate, while the other is the team at Uber Eats who maintains the recommendation engine. While the distinction might not always be as sharp, the former makes up the bulk of data scientists in the market (and I suspect the OP is in that boat) with comparably simple interviews, while the rest face the 5-step interview process with HackerRank tests you are more familiar with.


Yes we definitely fall into more traditional "predictive modeling" data science than deep learning / recommendation algo roles.


I think the distinction is not so much about the domain/application. Rather it's just that many organisations decided to jump on the data-science wagon and don't quite know yet what qualities to look out for during hiring. And as a second-order effect, as long as the predictive model is not embedded in a business process, the overfitting is not as easily visible to the layperson stakeholders (and junior data scientists).


These days if you have a company selling cat food or rivets for aerospace or providing taxi service to a random city, or whatever, they might have a few data scientists helping them make "optimized" business choices. Obviously they won't have a very advanced recruiting process for that.



It's different at a lot of non-tech companies. I'm in the nonprofit world and my interview barely had any technical component at all.


To be able to tell whether a candidate is good, the hiring team has to be expert! No chicken, no egg.


The ML interviews at FAANG are absurdly simple. Design YouTube recommendations, for which canned answers are readily available.

A simple stats question: if I double the number of samples, how much will the confidence interval change? Most FAANG ML engineers can't answer this question.
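
(For the curious: the width of a standard confidence interval for a mean scales with 1/sqrt(n), so doubling the samples only shrinks it to about 71% of its width, not half. A quick simulated sanity check, assuming a normal-approximation CI:)

    import numpy as np

    # Halving the standard error requires 4x the samples; doubling n only
    # shrinks the interval width by a factor of 1/sqrt(2) ~ 0.707.
    rng = np.random.default_rng(42)

    def ci_width(n, sigma=1.0, z=1.96):
        sample = rng.normal(0, sigma, size=n)
        se = sample.std(ddof=1) / np.sqrt(n)
        return 2 * z * se  # width of the ~95% CI for the mean

    w1 = ci_width(10_000)
    w2 = ci_width(20_000)
    print(w1, w2, w2 / w1)  # ratio comes out close to 1/sqrt(2) ~ 0.707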


And then reverse a binary tree?


What's your point? The question about sample sizes is arcane trivia?


For someone on an ML team? Yes. You could spend years building computer vision models and not once think about sample size.


The Dunning-Kruger effect is strong here. "What I know is what makes me the expert. What I don't know is irrelevant".

The definition of standard deviation is in chapter 1 of Stats 101. https://www.google.com/search?q=standard+deviation&tbm=isch Apparently, asking a Stats 101 chapter 1 question of a so-called "Data Scientist" is too much of an irrelevant question!
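
(For the record, that chapter 1 definition - the sample standard deviation is the square root of the mean squared deviation from the mean, with the usual n-1 correction - fits in a couple of lines:)

    import math

    # Sample standard deviation, the Stats 101 chapter 1 definition:
    # sqrt of the sum of squared deviations from the mean, divided by n - 1.
    def sample_std(xs):
        n = len(xs)
        mean = sum(xs) / n
        return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

    print(sample_std([2, 4, 4, 4, 5, 5, 7, 9]))  # ~2.138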

> expect people to have very high Stats skills

Or as you have made apparent, expect people to have ZERO stats skills!

Some of the innumerate activities I have observed in "expert" data scientists and ML engineers who have years of experience without once thinking about sample sizes:

1. Using A/B tests to accept the null hypothesis instead of rejecting it (see the sketch after this list)

2. Squandering away $30M in annual revenue because they wanted to avoid a situation/meeting in which they might look like they don't understand statistics. This is hilarious because they simply nodded their heads as if they understood all the calculations, then dropped any other meetings or follow-ups and left $30M on the table

3. Not refreshing a key revenue-generating model for 18 months because they were "trying to figure out" why the AUC was improving when the performance on "golden set data" was dropping

4. Using thresholding and aggregation to turn rich, perfectly sampled data into poor-quality, distorted training data

5. Trying to use A/B tests to estimate impact even when the control and variant are not independent
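
(Sketch for point 1, with made-up numbers rather than any real experiment: a non-significant result from an underpowered A/B test says almost nothing. Failing to reject the null is not evidence that the null is true.)

    import numpy as np
    from scipy import stats

    # With a small sample, a real 10% -> 12% lift in conversion is routinely
    # "not significant" -- that is low statistical power, not proof of no effect.
    rng = np.random.default_rng(1)
    p_control, p_variant, n = 0.10, 0.12, 500  # a true lift exists

    misses = 0
    for _ in range(1000):
        a = rng.binomial(1, p_control, n)
        b = rng.binomial(1, p_variant, n)
        _, p_value = stats.ttest_ind(a, b)
        if p_value > 0.05:
            misses += 1
    print(f"non-significant in {misses / 10:.0f}% of experiments despite a real lift")

With those numbers the real lift comes back "not significant" in the large majority of runs; reading that as "the variants are equal" is exactly the mistake.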

All of the above at FAANGs! My coworkers at a non-FAANG company were much more sophisticated. These are the kinds of candidates a "build recommendations for YouTube" interview selects. Template appliers.

The list of stupidities goes on and on! But yeah, none of them think that a basic understanding of statistics is necessary for the work. The good thing about JavaScript engineers is that they don't have an understanding of statistics and are aware of it. The DS/MLEs, however, are unskilled and unaware of it.


> clients will come back to us saying "XYZ says they can get better performance"

Oh yes, good old marketing.

Along with buying off "Industry Awards" – hey, we're objectively the "Best cybersecurity company of 2022!" With a matching "platinum/gold badge" to go on our website! Or buying a place in the "10 Best Products for X" and "Independent X-vs-Y Comparison", another classic.

Because it works. Are your customers not sophisticated? Are they unable (or unwilling) to follow up on defects and outright lies? Or reality simply doesn't matter all that much to them? Humans LOVE a good story more than reality, after all.

Then your contribution as an engineer to your company's success, and hence its longevity and your job security, is strictly inferior to that of marketing. Not everything is the work of evil marketers – a lot of the supplied BS is in response to an existing demand for BS.


> Are your customers not sophisticated? Are they unable (or unwilling) to follow up on defects and outright lies?

You would probably be depressed if you knew who our customers were, and how technologically unsophisticated they are.


I manage, for a client, an application which is the actual leader (furthest top-right, and by far) in the Gartner Magic Quadrant for its category, and in all my years I have never seen a product this bad, where the implementers and support staff are clueless about their own product. And obviously it's buggy as hell.

Lies and deceptions.


Gartner is its own confidence trick. They don't rate you unless you pay them to rate you. It's manufactured reputation extortion by another name.


The people who make the decisions don't use the product. That's almost always the root cause of this stuff. I worked on a system for my state - another vendor came in and 'took over' all the functionality my system handled. Supposedly. Seven years later, my system powers the exception to the mandate to 'use system X', because... they refuse to provide the functionality that they sold the state. Contractually, "we provide feature ABC", but the reality is... they don't. I even provided them our code to use - it was paid for with public money, so they should just integrate it and then sell it to other people to make their product better. They can't even be bothered to take the code and integrate it... they prefer to continually lie and say "we provide feature ABC" when... they don't. It's beyond insane. A large majority of the people on the ground know it's bad/lacking/broken, but... they have zero voice in the matter.


With a handle like Foobar8568, why not be more specific? It sounds like you might have a good story.


> XYZ says they can get better performance

Can you do your analysis both ways? Give your customers both, then tell them your method is more modern, but if they want outdated methods you have those too.


Is this the US? I'm concerned about the extremely low bar to get a PhD in Europe... and I'm wondering if that is a global problem, or only in Europe.





