
I teach engineers for a living, and I struggle to see how this is anything but a straw-man argument based on colloquial usage of terms. It draws inferences from job ads that are rarely written by the people actually doing the job; ads are effectively SEO for humans, optimized so that the best candidates can find the job they hopefully fit and aren't too confused to apply for it.



It's not a straw man, I've seen it clear as day in several companies. When it comes to data science, it's "garbage in, garbage out". I've seen companies do lots of "data science" with a bunch of data scientists skilled in python and jupyter notebooks, only to discover a ton of work was useless because the incoming event data was tagged incorrectly due to a bug.

The actual process of collecting, aggregating, cleaning and verifying data is a hugely important skill, and not one I've really seen typical data scientists possess.
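The "garbage in, garbage out" failure mode above (mistagged event data silently poisoning analysis) can be sketched with a minimal validation pass. This is a hypothetical illustration, not anyone's actual pipeline; the event shape, `VALID_TYPES`, and field names are invented for the example.

```python
# Hypothetical sketch: a cheap sanity check on incoming event data
# before any analysis. Event dicts and VALID_TYPES are invented names.
VALID_TYPES = {"click", "purchase", "page_view"}

def validate_events(events):
    """Split events into clean rows and rows that would silently
    poison downstream analysis (bad tag, missing timestamp)."""
    clean, rejected = [], []
    for e in events:
        if e.get("type") not in VALID_TYPES or "ts" not in e:
            rejected.append(e)
        else:
            clean.append(e)
    return clean, rejected

events = [
    {"type": "click", "ts": 1},
    {"type": "clck", "ts": 2},   # typo in the tag -- garbage in
    {"type": "purchase"},        # missing timestamp
]
clean, rejected = validate_events(events)
print(len(clean), len(rejected))  # -> 1 2
```

The point isn't the code, which is trivial, but the habit: counting and inspecting what gets rejected is how the tagging bug gets noticed before, not after, the analysis ships.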


>The actual process of collecting, aggregating, cleaning and verifying data is a hugely important skill, and not one I've really seen typical data scientists possess.

Then they are not scientists. They carry the label "scientist" but lack the rigor of actual science.

I don't see why changing the label to "engineer" would suddenly make them have rigor.


Right?!

This is sort of the meta failure of the argument. They are arguing that people's data skillsets are wrong, and to make that argument they are analyzing the wrong variable in the data set.


I have experienced the same thing...but I just don't think it has anything to do with whether the positions are labeled data scientist or data engineer.

And I would warn you, from my experience teaching statistics to undergraduate engineers: they are not going to be much better. I regularly get conversations like: "Hey, we have this data, what test can we run?" "What are you trying to show?" "We don't care, we just need to run a statistical test."


To be clear, I totally agree with you. I wasn't just arguing for changing labels, I was arguing that there is one set of "engineering" focused skills (e.g. building data pipelines, data warehouses, tagging events, etc.) and a different set of analysis skills (e.g. machine learning, statistical tests, etc.) and you shouldn't over-index on the latter without having enough of the former.


I suspect this may actually be an issue of school vs real world rather than scientist vs engineer.

Data in the classroom setting is pristine and beautiful; data in the real world is messy and buggy. You have to get burned by buggy data a few times (or maybe a bunch of times) in the real world to learn to look for bad data smells -- I don't think schools effectively teach this kind of intuition, regardless of whether the students are training as data engineers or data scientists.

If data scientists are spending more time in school getting advanced degrees, they're not getting as much exposure to buggy data, whereas data engineers with a BS and a few years of industry experience would already have built up this skill.


>Data in the classroom setting is pristine and beautiful; data in the real world is messy and buggy.

I got to take over our department's undergraduate statistics course a few years back.

The first change I made was that all homework, tests, and projects used real data sets. I intentionally have them collect bad data (they don't know it's bad beforehand). On the first day of class we collect data using the board game Operation... I give basic instructions, and then halfway through I ask everyone to stop and agree on how they are entering data for the variable 'success or failure' of the surgery. Oops...

In my experience teaching the course, the reason the students (engineers) find statistical reasoning hard is:

* They have never been given anything 'broken', everything is curated to avoid things not working. The result is they think data has inherent meaning. A right answer.

* Their entire learning experience has been stripped of context and the need to make decisions with information. They can give me a p value but are terrified (not unable, just unwilling) to interpret it or give it meaning.

* They have never encountered the concept of variability...everything is presented as systems with exact inputs and outputs.
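The p-value bullet above can be made concrete. A sketch of the reasoning step the students skip: a p-value is just "how often would chance alone produce a difference this large?" Here it's computed directly via a tiny permutation test (pure stdlib; the function name and groups are invented for illustration):

```python
import random

def perm_test(a, b, n_iter=10_000, seed=0):
    """Two-sample permutation test on the difference of means.
    Returns the fraction of random relabelings that produce a
    difference at least as large as the one observed."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        x, y = pooled[:len(a)], pooled[len(a):]
        if abs(sum(x) / len(x) - sum(y) / len(y)) >= observed:
            hits += 1
    return hits / n_iter
```

Framed this way, the interpretation the students dodge is unavoidable: a p of 0.03 does not mean the data "is" anything; it means shuffling the labels alone produces a difference this big about 3% of the time, and deciding what to do with that is a judgment call, not an output of the test.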

When I work with postdocs, I encounter many of the same challenges, though less frequently. Data is treated as sacred, external, and inherent. It's wild to me.


So I think that the delineation between the scientists working with the content and the engineers who actually provide the mechanics for it is very fair.

If there is a question mark here, it's really how much value we are deriving from all of these data people.

Where is all the ML that's changing our lives? Search, Alexa and TikTok, I can see it.

In the future, obviously, vision systems for autonomous cars, etc.

But I'm really wary about the heavily decreasing marginal returns after that.

It will surely change the world, but I think in specific areas. Most of the entire field seems like an optimization on something rather than anything new.

Washing machines freed up an immense amount of labour and toil. Alexa telling me the weather does not.


I used to work at a legacy automaker and you’d be shocked at how much ML has changed certain areas of the business. It used to take an entire department to sort warranty claims and it’s now mostly automated. Aluminum part defects are now spotted automatically on the plant floor. Don’t even get me started with telematics data.

Most software isn’t consumer facing but just because you don’t see it doesn’t mean it’s not changing things around you. ML tends to be overhyped but your assessment is too pessimistic.


I wouldn't think of a system that automates the processing of warranty claims as ML. That's mostly applying the policy/rules to each claim.

However, finding defects in aluminum parts using computer vision would absolutely be an ML solution.


There are millions of claims and thousands of car parts with all sorts of underlying issues. Unfortunately, a rule-based approach isn't feasible.


Most engineering and science jobs aren't a binary as much as they are a spectrum.

If the article is trying to make a point about skill development and diversification, I'm totally on board. Bifurcating the roles instead is going to be less effective.

To the value point... my sense has been we are seeing the Web-commerce 1.0 bubble, machine-learning edition. Lots of uses of it, not all of them with value. I am excited for where we will be in 10 or 15 years, but I suspect the difference will be huge. If you put me to a guess, I would say better data-handling practices and ethics will likely be the linchpins of value creation, versus using tools for the sake of tools.


The vast majority of machine learning applications that are changing the world aren't happening at the consumer level. They're happening in factories, warehouses, farms, logistics chains, etc.


The article is so true, my latest mantra at work is “engineering is more important than data science”.

Everyone is buzzing about the latter, and few even realize what the former is.


eh...I think this can be analogized to what we already see in code...

You need architecture, you need backends, you need a front end, you need product design...all with data.

Why are computer scientists called scientists and not engineers? Why is computer science about the code side? Why did computer engineering end up being more on the hardware end of the spectrum?

Words, especially newly coined terms are pointers to meaning. That meaning is socially mediated, it is not inherent.

You're saying this (and I think the author is too) because there is a need for this group of people to look beyond titles to skillsets, and the existing titles carry the linguistic baggage of the difference between science and engineering that has existed for decades.



