We don't need data scientists, we need data engineers (mihaileric.com)
718 points by winkywooster on Jan 14, 2021 | 353 comments



My experience is in quant hedge funds, where sometimes you get some guys who develop the strategy and some guys who put it into production.

Yes, I do admit there can be some specialization in terms of time spent on science vs engineering.

But you really need people who understand both. Particularly if you have a strategist who thinks his job is just to dream up profitable models, he ends up carving that role out in a way that's detrimental to the rest of the team. You get people who just don't appreciate that there's other work to do than finding models, and that models depend on that other work to function.

You also get a huge prestige gap, because inevitably management will think that there's a magician and a blacksmith. One guy needs to be paid a lot, and the other guy needs to be paid enough.

These effects feed each other. Magician will say "where's my data" and expect blacksmith to make it, promptly. He won't do it himself, because spending time on mundane stuff makes the magic disappear. And not doing it yourself, or taking the time to understand it, will eventually lead to problems with the magic.


> Particularly if you have a strategist who thinks his job is just to dream up profitable models, he ends up carving that role out in a way that's detrimental to the rest of the team.

My god, this. These people make me bonkers. Especially because I feel like I have a bit of this tendency myself, the desire just to think big thoughts and do no actual work. Happily, I long ago learned that ideas were approximately worthless without labor, and that I anyway had much better ideas when laboring because it forced me to engage with the details.

And yes, those people can poison a team. My best working experiences have all been with people who a) all valued actual work and b) believed that everybody could have good ideas.


"I'm the idea guy" out of someone's mouth is the stark red-flag warning that their net contribution is 0.


Ideas are so cheap and easy.

Implementation is a long hard road. And where you learn your idea was vague enough that it had almost no value. And only through painstaking iteration can you turn it into something with value.


> Ideas are so cheap and easy.

I doubt this.


It is true however.

Suppose it were easy to bring an idea into this world, and the hard part were the initial first thought: then writing a paper or article painstakingly and rigorously would be unnecessary. Writing a book would be a breeze, and no author would ever go through more than a single draft. The idea would have been born beforehand, complete, correct, and perfect, so putting everything down in words would just be a matter of transcription. An organization would not hire engineers with multiple degrees, but simply writers, or an automated system that would listen and transcribe the idea.

An idea is truly born, and exists, through a lot of effort, iteration, redefinition, and refinement.

P.S. There is a hard and subjective question of where the line is drawn between ideation and the minor, uninteresting, menial maintenance / get-the-money-in-the-bank work.

We have to recall, however, that a parent brings their child to life and then tags along through all the effort and work. A child is the result of years of work, high-level and low-level, much of it unpleasant. Drawing an arbitrary line at which you shall stop giving as a parent is naive and egotistical.


Very true. I recently deliberated on this point about ideas being cheap: they can deceptively sit in the realm of the ideal, where everything is unconstrained and unchallenged. Grounding ideas and putting them to the test is where you begin to discover all the boundaries, tradeoffs, and messy details that must be sorted through. Real work has a way of illuminating all the sticky points that must line up before an idea can gain a footing in the real world. Our imaginations are free to come up with all sorts of inconsistent and conflicting ideas whose flaws never come to the foreground, because of our fixation on the beautiful, perfect idea.


The best way I've heard this described is... Imagine the best painting you can come up with or have ever seen. Now go paint it.


What if your ability to imagine is lacking? What if that were true in general? What would these implementers actually implement then?

I also think I have a different definition of 'idea' than most people here. It is common, especially in software development, to see the way as the destination, and to see incremental change as development driven by some magical, good-ending evolutionary process. This includes a quite unsubstantiated belief in getting the right ideas automagically along the way.

You need both things. A good idea does not descend from heaven. It also has a history in the person creating it. That history is hard work in its own right: an inner fight against the common and against the environment. Jumping out of the box is what all these pure implementers are unable to do.

I have been working in the industry for 25 years. I have written real code from the beginning. But I am also a mathematician, and I can say I had a few good ideas along the way. I am proud of that.


What's the market price for an idea?

If it's greater than zero, let me know. I have notebooks full of them. Business ideas. Project ideas. Political ideas. Social ideas. I generally can't give 'em away, much less sell them. Why? Everybody has their own ideas, and they like 'em better. And the ones in my notebooks don't have what really matters: validation.


I’m interested, do you have a blog?


Not really! I have an old personal website, but I'm most active on Twitter, where you can find me as williampietri. Thanks for asking.


"Pure" ideas are cheap and easy; good ideas require a very thorough knowledge of implementation which is usually achieved through experience


Right, and to add to this: that's because all ideas originate from the senses.

To demonstrate: try to imagine a new color you've never seen before that's not in any way associated with any of the colors you've seen. (It's impossible.)

Or think of it this way: you could explain a car to someone who has never seen a car before, but only so long as the ideas used to explain the idea of a car (e.g. wheels, doors, windows) are already familiar to the other person. If those ideas weren't familiar, such as a wheel, you'd need to also explain what a wheel is. And if the concepts used to explain the wheel weren't familiar (and so on), you'd eventually hit a point where you must expose the idea(s) directly to their senses (e.g. show them), otherwise they will never understand what you're talking about.

So all ideas come from the senses, and from your mind's ability to combine these "pure" ideas that you've sensed. "Pure" ideas are cheap and easy because they're the simplest ideas: they're what you directly sensed. To have good ideas, you need to combine many "pure" ideas together, hence why those who have experience working closely and thoroughly with something will often have the best ideas associated with that something...



They're a dime a dozen (and that's a generous appraisal).

https://www.reddit.com/r/RoastMyIdea/


Or even negative. I've seen situations where the idea person is so busy being Mr. Toad that everyone around them is regularly scrambling to clean up messes and it ends up being a constant distraction from actually pushing projects through to completion.


Yes, I agree: idea guys are bad. But I'd like to contrast two words that are often thought of as synonymous: the 'idea' guy and the man or woman with 'vision'. The difference is that someone with a vision enables others to see and feel what they see and feel about the future of a project, in a manner that produces action on everybody's part. The visionary acts in accordance with their vision, because their end goal is to create and/or share something truly valuable to others. I can think of a few people with vision, many of them startup founders: Elon Musk, Steve Jobs, Peter Thiel, Paul Graham, and Sam Altman. Visionaries take risks on their vision and put their plans into action. Sometimes they're the business manager, or the scientist, and at other times the engineer, but in some way, shape, or form they lead others toward a goal through tremendous clarity in their own and others' collective vision.


It's similar to the expression I hear frequently at my company--"I don't have all the answers, but I have all the questions".


True, ideas are easy, but that does not mean they are next to worthless.

Idea people who just blurt out thoughts that go nowhere are all over the place, but there are a few who convince others that their ideas are useful. They don't necessarily need to do the work to make their ideas a reality, but they do need to convince others that their ideas have value.

We need both dreamers and workers.

If you are an idea person figure out how to get others to believe in what you are dreaming and the idea will become a reality.

Think of Steve Jobs: he was the idea guy who made his dreams a reality. People want to believe that he was some kind of super engineer or programmer, but he was the one who was able to get all the super engineers to do their best to develop his ideas.


It's very prevalent in amateur gamedev communities. Every day there's someone who played some game and has some ideas for how to make it better. All he needs is a few programmers and artists to make his vision a reality. Usually the kind of projects he wants to make are big AAA games in whatever genre is trending at the moment (it used to be MMORPGs, now it's battle royale). When confronted, they often get defensive and don't want to accept the reality that such big projects are made by hundreds of people with multimillion-dollar budgets, not by a few guys working in a basement, no matter how dedicated they are.


Perfectly believable. I used to coach at startup events, and the dreamers who wanted to make the next Facebook were so exhausting. I eventually wrote up a stock answer to give them: https://www.quora.com/Is-it-foolish-to-go-to-Startup-Weekend...

What really bothers me, though, is not the randos with this attitude. It's that some of them will grab enough money or power that they'll be able to live out their fantasy. And woe be unto any who hop on board. Quibi being the latest big example.


That’s much better than the alternative. Where their ideas are crap, and their contribution is a direct detriment.


Also, a lot of data scientists find the science fun and the engineering boring. But they have overlapping skill sets - if you aren't good at one, you're probably not good at the other either. Somebody who shows up to a team with the goal of only modeling and pushing all the dirty engineering work to their teammates is basically a worst case scenario because

1) They probably aren't going to produce good models since they're not sensitive to data nuances, but now they've taken over ALL the modeling work.

2) They bring down the job satisfaction of everyone else on the team who would like to be doing at least some modeling.

3) They're sucking up the prestige that should be distributed over the entire team, and management thinks they should be paid more for work that, it turns out, everybody thinks is more fun anyway.

My number one advice to entry level data scientists is to not be this guy. Don't give your interviewers the impression that you won't do your own engineering work because they won't want someone who brings negative value to the team.


Here's the tricky thing:

I love your post; I agree with your post; but it takes a 90 degree turn at the end:

"My number one advice to entry level data scientists is to not be this guy. "

Everything most people are saying here indicates it's GREAT to be that guy. You're paid, you're respected, you get the fun parts, you love your job, and it's pretty safe. It just happens to suck for everybody else, including the team and the business... but it feels that, in a practical sense, the gist of everybody's actual unwitting message is "BE that guy, if you can" :-<<<


If you're that guy and you have a secure job, it means you write models no one ever sees at a company which doesn't know or respect data, or you work in some data science factory as a small cog in a fairly well-oiled team. The latter does happen from time to time, but it's often the former.

In every other place, your job is liable to be erased, because people will soon realize no one wants a wise-ass who doesn't actually contribute much to the bottom line in the end.


It sucks being that guy because everyone else ends up hating you.

Depending on the work environment, it's not a stretch to see software engineers complaining to management, sometimes going as far as creating rumors to get the junior data scientist fired.

So, no the grass is not greener. It's best to not be that person. This is why I go out of my way to prevent that scenario when I lead a team.


Not really.

You just get seen as the product owner/project manager.


That's a really good point.

I tend to be seen as a product lead / owner / stakeholder, so I feel like I'm being called out. lol

I think one difference is that the software engineers see me as someone who is helping them by making their lives easier. I'm not just throwing work at them blindly; I'm working with them. Also, they like it when I include them in the data science brainstorming sessions to solve difficult problems. I guess it's seen as exotic or something, but whatever the reason, they really love to be a part of it.


I think it's probably seen more as just being a decent boss.


easy to ignore hate when you're pulling a 300k bonus at comp season and can jet to St. Barts to go deep-sea fishing and drink Claws.


Data scientists do not pull that kind of bonus. Today many of them get paid less than the data engineers do.


news to me, and welcome news at that, since I'm more in the data plumbing and packaging business, not algo publications.

my personal data points are from folks on the buyside. trading margins have been trending downward for years


Quant research work isn't data science work, which is probably where the mix-up is.

On the quant side bonuses are distributed to the team.


Specifically, this is my advice to ENTRY level data scientists who are trying to find a job and compete against a flood of candidates hot off the bootcamps. I guess once you get your foot in the door, you can be that guy if you want. It seems to be a successful strategy at companies without technical leadership.


The flipside is that there are 4x the job posts for data engineering as there are for "that guy".

Companies understand that you can't hire five of that guy and get things done. If you have 5-8 years of experience as a technical product manager/data scientist combo, then you are very happy as the magician. But very few magicians are being hired out of college, and a lot of "software engineers in data" are.


Pretty soon companies are going to start realizing that the 4x DEs can largely replace that 1 DS, and they will be more than happy to do so.

I went into DE because I was kind of forced into the space, but I'd strongly prefer doing full-stack DE. These days, I still have the opportunity to build models; they just aren't client-facing stuff, but are instead kind of Data Plumber Bots that help me do my job better, so I can waste more time building other fun bots that I can't otherwise be paid for.

Seems like a waste of resources, but my manager could have another DS tomorrow, whereas my role would take months to fill.


Back in the day (3 years ago and earlier), at every company I was at, we used the term 'productionization' to describe someone making a model (a proof of concept), and then someone else, a machine learning engineer or some other kind of engineer, rewriting it to work on a server.

This process is horrible, and not just because it doubles the work, but because it introduces bugs. When the version up in the cloud does not work as intended, is it a bug in the productionizing or in the original model? Fixing bugs in this space can take longer than the initial model development and the initial productionization combined. Many companies have failed over this.

So what's the solution? In recent years the industry has turned to deployment over productionization. The idea is that you deploy the model to the cloud directly, and engineers and scientists work together on the process. The scientist defines which cells in the notebook get called for the final algorithm (as there are EDA/plotting cells and documentation cells too). The engineer sets up the Amazon IO stuff, the database login stuff, and the monitoring services. The scientist works with them to create tests and decide what to monitor, so they get notified if there is a problem with the service.

No more mystery bugs. The model gets deployed directly, the workload is minimal, and it brings people together. The downside is that the engineers and scientists are often on different teams, and sometimes companies will not let them merge for a while, so it becomes a game of telephone instead of everyone feeling like they're on the same team working together. IMO, moving the scientist to the engineering team during this time can be helpful, or moving the engineer to the data team.

Some companies have services where the entire notebook gets put up into the cloud and all of it gets called, so the scientist has to write the notebook in a way that works for the cloud. It's rarer, but the way I prefer it, a wrapper .py file is created that calls just the relevant parts of the notebook, kind of like a header file. This process works well for me, but as far as I know it is not standardized in the industry yet.

In short, if you end up in this situation, there is a better way: import the notebook into a .py file or into the cloud; don't rewrite it. This (hopefully) removes the scenario described in the comment this is replying to, so those issues will become a historical footnote.


Oh my! We were exactly like this many, many years ago. See reply to this thread[0].

The way I view it is friction and impedance mismatch. People lived in several universes and there were many "taps on shoulders". A data scientist tries to work on a project, but a system upgrade has messed up their compute environment and their GPU isn't working anymore. Data scientists ssh'ing into a "powerful workstation" to run their notebooks with more RAM or more powerful GPUs, with a certain convention for starting their notebook servers on specific ports.

Building models, then wanting to show results to the client and asking a colleague: set up a VM on GCP, write a small application, scp the model to the machine, create an environment with the same dependencies to load the model, set up authentication on the machine. Email the client. The client doesn't reply in time. You have a bunch of VMs.

Meanwhile the data scientist has produced another model with a notebook and wants the engineer to deploy it. Others want to reproduce it but have the same trouble running the notebook (libraries, etc.).

A complete mess. We ended up building our platform[0]. We wanted our PhDs to do what they were good at, and we wanted to handle a lot for them. At the same time, we wanted our more engineering-inclined colleagues not to have to do that work themselves, so we let the platform do many of these things (building images, deploying, scheduling notebooks, etc.).

- [0]: https://iko.ai


How do you maintain notebooks in production? Do you use papermill? What about versioning?


Most libraries load entire notebooks from top to bottom when executing, and I believe papermill does too. (Please correct me if I'm wrong, as I've not used papermill.)

This is great for making a dashboard, a report, or some other kind of analytics, but when it comes to a service the customer uses, you typically never want to load the whole notebook. This is where the industry standard way of loading the whole notebook tends to fall on its face.

What we do is the cells that will end up in prod are written as functions inside of the notebook. This helps reduce globals when writing the notebook, so it is good form when prototyping, but also it allows just those functions to be called from the notebook, instead of running the entire notebook.

You will probably want to write your own library to do this, but in the mean time there is one that works for this purpose https://github.com/grst/nbimporter (Ironically the author doesn't recognize this use case.)

Using nbimporter you can import a notebook without loading it. You can then call functions within that notebook and only those functions get loaded and called.

In my notebooks I have a process function, which is like main() but for feature engineering. On the prod side, the process function is called from the notebook; process calls all of the necessary cells/functions for me in the correct order. This way the .py wrapper only has to call one function, then the ML predict function gets called, so the .py wrapper side stays pretty small. There are tests written on the .py side, IO functions, and whatnot too.
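To make that concrete, here is a minimal sketch of the wrapper idea. All the file and function names here are hypothetical, and the commented-out option is from my reading of nbimporter's README (definitions-only import is its default behavior, as I understand it), so double-check against the library:

  # wrapper.py - the only file the prod side touches (sketch, not real code)
  import nbimporter   # pip install nbimporter; installs an .ipynb import hook
  # Per my reading of the README, only function/class definitions are
  # imported by default, so EDA/plotting cells don't execute:
  # nbimporter.options['only_defs'] = True

  import feature_pipeline   # imports feature_pipeline.ipynb as a module

  def predict(raw_records):
      # process() is the notebook's main()-like entry point for feature
      # engineering; it calls the other notebook functions in order.
      features = feature_pipeline.process(raw_records)
      return feature_pipeline.model_predict(features)

The engineering side writes its tests, IO, and monitoring against wrapper.py, while the notebook stays the single source of truth for the model logic.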

Data engineers love their classes, so it's easy to write a class that calls the notebook, and best of all, calling a single function this way does not load globals, so the data engineers are happy. It's a nice library, because otherwise you'd have to write your own (which you may end up wanting to do anyway).

This way if the model doesn't work as intended in production it's my fault. We log everything, so I can run the instance prod caught on my local machine, figure out what is going on, update the model, and then it can be deployed instantly.

Version numbers on the engineering side I can't comment on, as they have their own method, but on my end, the second the model writes to a database, I strongly push for a version number column or a version-number metadata table in the database, so it's easy for me to access for future analysis.


Is rewriting your functions from a notebook to a .py file really something a research scientist cannot do?

Or is it infeasible for some other reason?

I'd imagine many data scientists want to publish their work as python packages or libraries during their PhD, so they should be familiar with writing classes or functions that work at a bare minimum.


>Or is it infeasible for some other reason?

I've had projects where the model doesn't perform as intended. Because one person was making the model and another was productionizing it, it was hard to identify where the performance difference was coming from. Was the bug in the model itself or in the productionization process? It took longer to figure that out than it took to write the model or productionize it the first time.

It takes so long to deal with these bugs because the model gets changed, so then prod gets changed to match it. Changing prod (rewriting functions) has the potential to create a new bug, so you solved one but added another, and still can't identify if it is in the initial model or from prod. This continues over and over again, problem after problem.

It's noteworthy that if one person does both the model building and the conversion to production, this problem is significantly reduced, but it is still a problem. The problem is exacerbated by the lack of domain knowledge, both people being in the dark about the other person's process.

Furthermore, what if you need to update the model? Do you rewrite prod doubling or tripling your work? Do you take that risk to introduce another potential hard to diagnose bug, even if you're the one doing both roles?

Or do you automate the process, so the same code being developed on is the same code running on the server at the end of the day? No more bugs, and half to a third of the work. Why not do it this way? It's so much easier to debug a problem in prod this way: you can take the log data, feed it into the local machine, and know that what you're seeing is what the user saw. No more guessing where the problem is.

One way to think of it: software engineers would think it absurd to write their code, then hand it off to someone who doesn't completely understand it, to rewrite it in another language and put it up on a server. "Why would you ever want to do that?" they would think, and I agree with this sentiment. It is absurd to have someone (even you) rewrite your work unless you have no other option, and you do have other options. Transpilers are a thing if prod needs to be in another language. I've written models that had to go onto embedded environments. I know these challenges all too well.

>I'd imagine many data scientists want to publish their work as python packages or libraries during their PhD, so they should be familiar with writing classes or functions that work at a bare minimum.

It depends on whether you're writing a library, i.e. doing ML / machine learning engineer type work, or solving a domain challenge, writing an end-to-end solution for that problem using standard cookie-cutter ML for your PhD, aka data science type work.

One leads to an engineering role, and not surprisingly writing a library for it is ideal, so other people can use it. Another leads to a data science type role and not surprisingly showing code snippets in your paper with plots / EDA and all, the same way you'd write a notebook at work, is ideal.

I'm a data scientist, not an ML specialist (though I did invent a new form of ML for work once, but that was just once and not my primary thing). I specialize in end-to-end domain problems. I'll write a notebook to solve them, not that I have to; I've been in the industry longer than notebooks have been a thing, so I'm fine doing it the old-fashioned way. What I am not is an MLE. I don't need to write libraries for other users. I don't need to write custom ML. I don't need to do that engineering bit. To be fair, I have, and I know when it's the right tool for the job. On Stack Overflow, all of my points come from helping people with the glue between C++ and R, so they too can write libraries for R. I'm proficient in modern C++ too. I can do the library/ML type work, and I have enjoyed it, but I really do enjoy solving domain problems more, so it's what I'm doing, and it's what the previous comments in this chain you're responding to are all about.


Exactly why I left Dolby. I didn't want to be a part of that process.


It's hard for most people entering this field because the incentives are perverse: there's this perception that DS is sexy and that you don't actually need to know that much coding (just enough to scikit-learn). Thus people with pipe dreams of tweaking model hyperparameters to spin gold come in and get a rude awakening. Not unlike people flocking to LA to become actors.


I worked in investment banking (as an analyst, not an engineer), so very different part of finance, but this was my take as well. Companies might love to talk about how important engineers are, but at the end of the day, if you can't directly link someone to revenue, they get viewed as a cost center and take on second tier status in the organization. Then the same companies complain that they can't find enough (or retain) engineering talent. Not many places get the balance right. Silicon valley treats engineers well because for the most part, the value they bring is more obvious (and also, they don't threaten the existing hierarchy in the company). Curious to hear if anyone has had the opposite experience.


Yes, quite a few developers left our investment bank and went to work for our suppliers (of trading software), stating they'd rather work somewhere where they're seen as value creators rather than a cost center.


I worked for 15 years as a software engineer at Morgan Stanley, where they valued the process of taking a three-martini-lunch idea to a production platform, so the value of engineers was recognized and rewarded as such. It's somewhat easy to whip up a new financial wrinkle; it's a whole other level of magic to design and implement that idea when it takes 60 software developers 3 years to get it to market before the rest of the street. Of course, the IT department was/is the largest budgeted portion of the entire bank, and for a good reason.


Engineers get paid well in SV because they are in demand, have lots of employment opportunities, and therefore are more difficult to retain.


And because their contributions can be tied back to revenue. You need both: demand for talent, and the ability and justification to pay for it.

Engineers are in high demand all over the world, but most companies do not profit enough from technology to justify paying SV-level salaries.


Not always. Frequently, the connection between code that’s written today and revenue tomorrow is tenuous and difficult to package in a way that says “look at me! I’m valuable!”

And, then there are those somewhat rare occasions where a project is not intended to increase revenue, and may even decrease it. At my last employer, we guesstimated that a project I worked on for months could possibly have ended up costing us $2M per year in revenue. That was both accepted and expected, because we were doing it to gain goodwill with users, but in such a way that it might end up pissing off a small minority of our customers.

I really wish, just once, I could work on a project and put underneath it on my resume “Increased revenue by X%,” because I’ve never worked on anything that was so easy to directly trace back to the top line.

Cost savings are another story, because engineers can fairly easily quantify how much less money is being spent by doing $THING a bit more efficiently....


My partner leads engineering talent programs for a large SV company and I can assure you they do not track value added by engineers. In the overwhelming majority of cases the value added cannot be tracked. Now, if you're talking Product Managers, that's a bit of a different story. It's simply a supply and demand issue.


Tell that to Mr. Nugget from The Wire. How much you get paid is a reflection of the CEO/culture, not of the value you provide.


I really like your magician/blacksmith analogy.

I'm in industrial automation, but it's much the same. Projects where someone developed a strategy but has never been involved in the details of a machine are doomed to failure (or at best to be unreliable and producing low quality parts). Projects built by machine fabricators are over-engineered, frequently late, and sometimes unprofitable, but damn if they don't work well.

The main trouble, I think, is that when a shiny new contraption is brought to the king, it's too often the magicians doing the talking; whether they're speaking words of power or Common, their job is to talk. Meanwhile, the blacksmith is probably busy in his workshop on some ornate scrollwork for the next thing, or repairing the previous gizmo, because he'd rather be hammering away at his anvil than talking.

The higher you go in an org chart, the fewer the number of people who understand the work their company actually does, and the more voices you have between the workers and the decision-makers to take some of the credit for work as it passes up the chain.


That seems to be true in every field I can think of. The smaller the gap, or rather the more practical experience the strategy people have, the better a given org seems to be.

One common issue I run into is that when the blacksmiths start talking, nobody listens.


The funny thing is, even when it's unintentional, people seem to attribute credit to the magician rather than the blacksmith. At my workplace, I even have situations where I explicitly tell people: "I am familiar with X and how (some of) it works, but not all of it. I did not create it, nor was it even my idea; all credit goes to Jim. If you need anything to do with X, you're best off asking Jim. But if Jim's too busy to help, I may be able to provide some minor help."

Yet even when I do this, I somehow become the arbiter and authority for all problems and questions on X. 5 years go by and everyone thinks X was all my genius. And I hate it, because personally I do not like X created by Jim - even if everyone else does...


Then make sure Jim knows and feels your appreciation. Maybe Jim doesn't want the attention or questions (or at least doesn't value the prestige/benefits at the cost of the attention they require). Many times, the person who will churn out work in a dark, unappreciated corner will continue to do so if one person they respect shows them enough respect in kind. They know the world they're in. No one expects plumbers to get a free lunch, even while many know they should.


To add: quants who can't do the data engineering work are always crappy quants. I haven't seen a counterexample to that. Profitable models aren't going to be delivered on a silver platter. Quants need to be able to process pretty low-level data effectively and build ad-hoc custom tools and data pipelines around it to test out their ideas. Otherwise they're constrained to the tools others have built, and that massively narrows the search space they're capable of traversing.

The best quants are 1/3 statistician, 1/3 developer and 1/3 trader, in my view.


I'm not sure about crappy quants. Some people of the "quantitatively inclined trader who has learned Python" variety are never going to be good at the engineering side - it takes years to learn to be a good software engineer, and that's not a good use of time, for them, or for their employer. But they can still do useful work.

The trick is to figure out how to work effectively with those people. Build infrastructure that keeps them on the rails, refactor their code, push them in the right direction, tell them when they've fucked up, teach them little things with high leverage. As long as that doesn't turn into being their slave, that's fine.


If they're using a dynamically typed language to do monetary calculations, it's not going to be ideal.

Researchers do not need to have deep programming experience, but they have to be comfortable enough to use an environment that lends itself to the problem at hand. On the quant side, unlike on the data science side, the barrier to entry on the programming side is a bit higher. To solve this problem, many firms have their own internal programming language.


This is dogmatism swung too far in the other direction, IMO. There are many, many successful production code bases written in dynamic languages. In my own experience as a vision scientist/engineer, there is tremendous value in being able to quickly whip up a concept in Python and then being able to easily visualize the results. Doing this exploration in C++ is wasteful. Implementation takes much longer, the correctness brought by static typing is dubious since the code isn’t in prod, and the canned CV/visualization libraries are fewer and frequently suck in at least some way. That said, there is also tremendous value in understanding how to map your Python prototype into production code, too. Someone strong in this field can do both.


This was addressed in the previous comment

>On the quant side, *unlike on the data science side*,

Vision scientist is on the data science side. You're not dealing with monetary values, where floating point error compounds on itself to the point that your models become garbage. Quant work is its own unique field with its own unique prerequisites.


Nothing precludes you from doing integer arithmetic in a dynamic language.

I’m not a quant and this isn’t my area of expertise, but, for example, I’m pretty sure various differential equation solving methods depend on variables taking on continuous values, so floating point basically must be used. Understanding the impact of that is definitely very important. Analogously, I frequently run into numerical precision issues in image processing. Understanding how numbers are represented on a computer isn’t unique to being a quant. Understanding how the choice of representation can impact prod is also not unique to being a quant. The dynamicness of the language isn’t particularly relevant, either.


>Nothing precludes you from doing integer arithmetic in a dynamic language.

You would be surprised. The second you use pandas with a custom data type (let alone any other library you'd want to use), it can silently auto-convert it to a float. Furthermore, identifying when it converts the type on you is a pain.
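For anyone who hasn't been bitten by it, the classic trigger is a missing value, which pandas cannot represent in a plain integer column (a small illustration, not my actual data):

  import pandas as pd

  counts = pd.Series([1, 2, 3], dtype="int64")
  print(counts.dtype)                        # int64

  # One missing value and the column silently becomes binary floating point:
  print(pd.Series([1, 2, None]).dtype)       # float64

  # A reindex (or a merge) that introduces NaN does the same:
  print(counts.reindex([0, 1, 2, 3]).dtype)  # float64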

>so floating point basically must be used.

Quants tend to use fixed precision types. It is like a float in every way, except base 10 instead of base 2 so there is no floating point error.


> The second you use pandas with a custom data type

That's a pandas (and maybe numpy) issue, not a dynamic language issue. (If you want to generalize from the specific libraries more accurately than "dynamic language", it's a "using a low-level library whose type system doesn't match the host language's type system" issue.)

> Quants tend to use fixed precision types. It is like a float in every way, except base 10 instead of base 2 so there is no floating point error.

No, a type that is like binary floating point in every way except base 10 instead of base 2 would be decimal floating point, not fixed point. Decimal fixed point is different from binary floating point in more ways than base.
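The distinction is easy to see in Python, whose standard library decimal module implements decimal floating point:

  from decimal import Decimal

  # Binary floating point: 0.1 and 0.2 have no exact base-2 representation.
  print(0.1 + 0.2)                          # 0.30000000000000004

  # Decimal floating point: exact for base-10 quantities like prices.
  print(Decimal("0.10") + Decimal("0.20"))  # 0.30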


Quants don't care about floating point precision in research. It's just applied stats.


I do, because the results from my research vary when I'm validating the model.


> If they're using a dynamically typed language to do monetary calculations, it's not going to be ideal.

And yet, Q.


> If they're using a dynamically typed language to do monetary calculations, it's not going to be ideal.

I think this is an inaccurate take. No one in finance is doing accounting or model estimation using Python's floats; they are using numpy's float32 (or float64) type instead. I think a more accurate version of what you're saying is that static type checking is useful when modeling complicated contracts; this might be true, but I think it's not that important, as those things aren't that liquid anyway.

Jane Street's decision to use OCaml is almost as much about hiring and history as it is about language features.


> No one in finance is doing accounting or model estimation using Python's floats

We are. When your input data only has five significant figures, and probably less than that of real information, numerical accuracy is the least of your worries.


Or, they're using ints instead, at least for market data.


Fixed-precision types, technically. Internally they are an int under the hood, so yeah, basically that.
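As a toy sketch of what "an int under the hood" means (the tick size and numbers here are invented for illustration):

  # A price stored as an integer count of $0.01 ticks.
  price_ticks = 10_432   # $104.32, held exactly as an int
  qty = 250

  # Integer sums and products stay exact; no binary rounding error.
  notional_cents = price_ticks * qty
  print(notional_cents)                    # 2608000 cents
  print(f"${notional_cents / 100:,.2f}")   # $26,080.00, converted only for display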


> "To solve this problem many firms have their own internal programming language."

Any examples other than Jane Street?


Goldman Sachs (Slang)


> 1/3 statistician, 1/3 developer and 1/3 trader

How is being a trader different from being a statistician? Curious as I've never worked in finance before.


By trader, I mean domain knowledge about the markets. Statistics is the toolbox that this domain expert uses to test their hypotheses and turn them into a profitable model. But if the person isn't a domain expert and only knows statistics, their ideas about what to test won't be good.


And to knowledge, I'd add disposition. It's been years since I've been in finance, but the best traders I worked with were all very driven to succeed, to dominate, to win. Markets were really interesting to me, but I never cared much about that part.


Yeah, it's a performance discipline like any other (competitive gaming, athletics, etc) where only the top few % can succeed. If someone isn't very driven then they won't make it.


This is almost precisely the thought process behind how my company hires data scientists who build user-facing analysis.

1/3 statistician, 1/3 engineer, 1/3 product person who can learn the user's domain-specific needs


What must be communicated to management: It is easy to find other magicians. It is not easy to find another blacksmith. Without the right blacksmith, there can be no magic.

Magicians will be magicians, always hustling (bullshitting), but they will never have the value and job security of the blacksmith. The blacksmith can see the fruits of her own labour, whilst the magician must lie to herself and others in order to claim the blacksmith's value as her own.

If the blacksmith is good enough, she will earn the trust of management and management may consult the blacksmith in the selection of magicians. Management may ask the blacksmith to interview magicians and seek her advice on the final hiring decision.

The blacksmith may not carry the "prestige" of the hustling, bullshitting magician but she can command a high salary and dictate her own working conditions. This is only if management understands her value. What the magician thinks of the blacksmith is irrelevant.

Reliable blacksmiths are hard to find. Magicians are a dime-a-dozen.


  > It is easy to find other magicians. It is not easy to find another blacksmith. Without the right blacksmith, there can be no magic.
What? That runs counter to my experience at every company where I've either seen data engineers or worked as one. My observation of how management treats the two groups is this:

Data engineers ("blacksmiths"): Blacksmiths are paid less. People think of them as less highly educated. Their work is less creative. When they are successful, their work is mostly invisible. They are interchangeable. People think of what blacksmiths do as more like scripting than writing code. Blacksmiths mostly work on configuring systems they didn't build. Blacksmiths do more troubleshooting than building. Their roles are focused on support.

Data scientists ("magicians"): Magicians are paid more. Much more. People think of them as more highly educated. By definition, what they do is magic. They work on prominent projects. Their successes are highly visible. They build large systems that only they can comprehend. They use support staff to clear away mundane obstacles so they can focus on unique, highly creative aspects of work.

Saying that we need more data engineers than data scientists is like saying that we need more janitors than CEOs. That's true, but it's true because we made it true by structuring projects around one prominent, well-paid person supported by a staff of invisible drudges.

This smacks of the positive self-talk that QA and software testers used to give each other: "We are indispensable! We take pride in our craft! Nothing can ship without our signoff!" And then lots of companies reduced their QA or eliminated it wholesale by focusing on continuous delivery and changing consumer expectations of what "broken" or "acceptable" means. The same fate awaits data engineers.


Good Data Engineering is what enables good Data Science. With good infrastructure you can go 100 times faster. Getting rid of Data Engineers means killing Data Science.


Sure. So in a few years, an all-in-one 80% solution like Palantir Foundry will come along and there will suddenly be a lot less demand for data engineers.

Anecdotally, the former head of QA for Palantir UK is now the head of data engineering for Palantir UK, and Palantir does have an out-of-the-box, end-to-end, it-just-works product that handles 80% of ML workflows. You're betting your career that they won't put it in a box and sell it at commodity software prices?


If something like that becomes mainstream, data scientists who just glue together canned ML algorithms from libraries should also be concerned. The kind of automation that enables eliding finding and cleaning the right data probably can also manage running PCA and basic ML.


I remember the same claims for ISL Clementine before IBM bought it and turned it into SPSS modeler.


The way the evolution in software went, platforms became more capable and allowed individuals to automate more common tasks. QA/DevOps/SRE teams were consolidated and replaced with smaller platform teams which empowered internal engineers to quickly write scalable and well tested services. If data management, instrumentation, and ML tooling become sufficient then perhaps the data engineers will be replaced by a science platform team.

Caveat is that many scientists are expected to publish novel research to advance their career. "Infrastructure" and "data management" do not tend to produce the kinds of sexy projects which are attractive to publish.


I think this model (an integrated team) is what I see: there is a huge benefit in terms of short decision loops from having one team, but data engineering skills are really important in enabling it. Also, if the people doing the data engineering are close to the data science, there's a much higher likelihood that they will produce effective solutions for the backend of the project.


aye - this pattern often trends towards the roles merging over time. The counter example to the platform team approach from the software side is Software Engineers owning the full infra and ops that their services need.

There's a big incentive for companies that need to hire more people to give the "better" title out for the same kind of work. If managers do maintain a gold/silver role split on their team, all of the folks in the silver role will look at the gold role as their next move. Worse, there is a net-negative productivity drag where the gold/silver roles constantly debate what's in scope vs. out of scope for their work.

I once saw a team where the scientists were meant to be equivalent to SDEs in coding skill, but in practice the scientists could only do some light Python/bash scripting. They tried to make the SDEs responsible for "productionalizing" the projects, which meant adding tests, etc. The engineers who could leave all left the team within 6 months; the ones who remained were also unable to perform more than light bash/Python scripting work.


>reduced their QA or eliminated it wholesale

This is usually the case only for companies that work on low-risk applications (i.e., not safety-related or critical industries) or that have been lulled into complacency (sometimes, ironically, "we haven't had a major issue, so obviously QA isn't needed", when strong QA is precisely why they didn't see issues).


  > low risk applications (I.e. not safety related or critical industries)
My anecdotal experience is with Palantir (software used in war zones). Between 2016-2018, they eliminated most of the 150-200 QA people they had. Testing is now done by devs and users with the help of sound CD principles like blue/green deployments.


That's fair, because there are process elements in place to mitigate risk. I read your comment as saying "get rid of QA", not "get rid of the separate QA team". My comment was pointed more at those who feel testing is inherently wasteful. In your example, the testing and configuration are tightly controlled to mitigate the same risks as a QA team would (although there may be something to be said for the best practice of having the QA team be independent). Where I get nervous is when testing gets cut on hazardous systems because of cost or schedule. I personally wouldn't want to get on a plane or into an autonomous car built like that. My own personal anecdotal experience is that organizations that were cavalier about QA on safety-critical hardware/software inevitably had their comeuppance.


>Without the right blacksmith, there can be no magic. Magicians will be magicians, always hustling (bullshitting)

>management may consult the blacksmith in the selection of magicians

I mean when you put it like that, why hire a magician (bullshitter), if the magic relies on the blacksmith?

And, if management needs to consult someone (a blacksmith) on hiring (another blacksmith or for whatever reason a magician), then arguably management is made of magicians.

Don't get me wrong, I agree with the point you're making. It's just that the problem is with BS, and BS is rampant, or, like you say, a dime a dozen.


> I mean when you put it like that, why hire a magician (bullshitter), if the magic relies on the blacksmith?

Because magicians are better at the smoke and mirrors that drives funding rounds and closing big sales deals.


I see this same attitude around TDD adoption: teams in my company say things like "testing is for lackeys / that work is beneath us", i.e. they see it as the responsibility of QA testers, who are less important in their view. This is short-sighted and arrogant, and it encourages similar problems with superiority complexes. TDD is still controversial in some circles, but engineers who have a deep understanding of both tests and implementation are far more valuable than those who only understand one side. Anyway, sorry for the somewhat off-topic rant, but a lot of what you said resonated with me.


TDD is fine when you have a specification to work to. A lot of software development in the real world is quite "experimental". Requirements are poor so devs need to provide what is essentially a prototype and receive feedback until it is good enough.


Agreed and that’s what spiking is for! Once you have an idea of the solution then you can implement properly with TDD, which generally results in a much better design than just going with the original spike. However, I find it bizarre that many engineers think unit testing is beneath them. I can understand that people find it difficult to stick to the test first discipline. But it’s crazy to me that you wouldn’t want to write tests at all.. I feel so much happier with the additional protection.


I think this insight exists across a lot of fields. Basically, if you want to be a really excellent magician you also better be a decent blacksmith. More concretely in this case, if you’re unable to do the data “engineering” yourself then it will close a lot of doors for interesting and novel work on the “science” side. Beyond that, if the scientist’s job just involves gluing sklearn models together I think that job is more on the engineering side of things than the supposed scientist usually wants to admit.


This problem only grows as the company scales and the science and engineering pieces are formally split along some role guideline.

Inevitably, if you treat a job role as a support role, you'll attract weaker individuals into that role than you would if it weren't considered a support role. The problem with science-oriented teams is that all roles other than the science role morph into science support roles over time. The same pattern used to occur with engineers and QA, or engineers and ops.


How do you find people like this? From my limited experience (I'm a college senior joining an HFT firm shortly, so I've recently been in several quant finance SWE interview loops), firms seem to vastly downplay the financial aspects of the job for software engineers. On top of that, firms don't expect or encourage financial backgrounds for engineers (at least for new grads); the expectation is that whatever limited financial background we'll need will be given to us when it becomes necessary.

Is this because it's easier (obviously) to teach a quant engineering than to teach an engineer quant finance? Or rather because it's now expected that traders will be the bridge between researchers' models and implementation, and engineers will simply provide the underlying infrastructure to power those implementations?


As I see it you need people who have shallow knowledge of many areas and deep knowledge of one area. That lets you have a group of experts but ones that know enough about other areas of expertise to work with those other experts.


this perception of classes within engineering is the greatest frustration of my career. People with a PhD or “scientist” in their title are not more valuable than engineers who end up being the ones to get things to work.


The "scientists" are often far less valuable or have outright negative contribution.


From experience, the magician will take every chance to make this divide greater and to sell their expertise, rather than grow with the blacksmith (and help the blacksmith grow in the domain). You end up with magic: closed, siloed knowledge. "How would the blacksmith ever understand magic?" was often something thrown around by magicians at meetings.

The (repetitive) blacksmith role is not an interesting one; a digital revolution needs to come into place. Architects who build tools and self-service systems are much more interesting.


That's interesting. I just completed a book on Jim Simons/Renaissance (The Man Who Solved the Market). One of their early advantages was having a person who was just focused on acquiring and cleaning data. I expect that advantage has largely gone away at this point, due to the wide availability of market data, but I thought it was interesting in the context of this article.


Same for CFM, too: they have an entire team working on alternative data, and they feed it to a modeling team.


maybe hedge funds would be able to find more people if they didn't only hire "guys".


Can you show me a job ad where it is specific to guys?


How does compensation tend to differ? And the education levels?


Preach!

The data lifecycle is waaay overpopulated with Data Scientists who are not empowered or knowledgeable enough to work with product designers and engineers to do everything that empowers Data Science and ML.

We need more Data Engineers involved at time zero in projects to help:

1. Plan out what data should be produced/captured by the product

2. Instrument systems to actually generate data consistently and effectively

3. Build ETL pipelines and data management systems

4. Manage enterprise data sharing and resiliency

etc...
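As a tiny illustration of point 2, instrumentation can be as simple as emitting self-describing events at the source, rather than reconstructing them later from a dump (the event name and fields here are invented for illustration):

  import json
  import sys
  import time

  def emit_event(stream, name, **fields):
      # One self-describing JSON event per line (JSONL), written the
      # moment it happens rather than reconstructed later from a dump.
      record = {"event": name, "ts": time.time(), **fields}
      stream.write(json.dumps(record) + "\n")

  emit_event(sys.stdout, "checkout_completed", user_id=42, total_cents=1999)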

What ends up happening is you have a bunch of Data Scientists just handed a pg_dump or flat file from some ops team, typically with missing data or poor formatting, and they spend 90% of their time cleaning it up, then running some basic regression with numpy or whatever.

We need a better understanding of the data lifecycle by organizations, and investment in instrumentation and data management.


> What ends up happening is you have a bunch of Data Scientists just handed a pg_dump or flat file from some ops team

Not to disparage the amazing data scientists I've worked with, but I've been on teams where this is very much the approach to operationalizing models. It's basically, "Here's the sklearn model and some fragile featurization scripts we built. Can you take this to prod ASAP?"

The problem I've seen is that the DS & DE teams were in different parts of the org and had their own sprints that were in no way connected. So they kept chucking models over the wall, and we kept trying to faithfully operationalize them. Once we convinced leadership that we had to collaborate from the get-go, things went a whole lot better. It also improved the working relationship between engineers and scientists.

I learned a hell of a lot from the scientists; they learned how to write better code. They also learned what code they didn't need to write, because I could do it faster or better than them, leaving them to focus on more important things. It was pretty amazing to see what manual processes they would set up in lieu of proper (or even any) engineering support. Again, these are amazingly smart people, but they were being square-pegged into a lot of round-hole engineering tasks.

Now, the much more frustrating issue I had was being in a very data-heavy organization and being told by a distinguished engineer (my skip-level) plus my direct manager that, "data engineering isn't a real discipline." I left that org very shortly thereafter.


Not only that. If you have DS & DE in different orgs, often DE sits in an IT org that also has to support legacy systems, which can become very time-intensive. So the DS org says they can't get stuff done because DE does not deliver, and DE can't get things done because the "elder engineers" in their org are not allowing reform. They are stuck between a rock and a hard place.


This is 100% my experience as a data scientist. The engineering support we get is restricted to submitting a ticket for database access or moving data from one system to another. Wouldn't dream of involving an engineer in a data science project team, because I have no evidence that they have any experience or expertise in anything other than tickets to move data around.


That's first-line support, not engineering.


>> moving data from one system to another
> That's first line support not engineering

Assuming the OP meant "setting up a pipeline for moving data from one system to another" and not a one-time copy, it is definitely engineering.
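
A sketch of why, assuming a generic source table with an updated_at column and stand-in connection objects: even a "simple" recurring copy needs incremental state, idempotent writes, and safe resumption after failure.

    # Minimal incremental sync sketch. src/dst and their methods
    # (query, upsert, get_state, set_state) are hypothetical stand-ins.
    def sync_incremental(src, dst, table="orders", state_key="orders_wm"):
        # Resume from the last high-water mark, not from scratch.
        watermark = dst.get_state(state_key) or "1970-01-01T00:00:00Z"

        # Pull only rows changed since the watermark.
        rows = src.query(
            f"SELECT * FROM {table} WHERE updated_at > %s ORDER BY updated_at",
            (watermark,),
        )

        # Upsert so a rerun after a crash doesn't duplicate rows.
        dst.upsert(table, rows, key_columns=["order_id"])

        # Advance the watermark only after the load commits.
        if rows:
            dst.set_state(state_key, max(r["updated_at"] for r in rows))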


Yeah, it's usually a pipeline


Yeah, because moving data around (which is hardly the entire responsibility of a data engineer) is not useful at all


>The data lifecycle is waaay overpopulated with Data Scientists who are not empowered or knowledgeable enough to work with product designers and engineers to do everything that empowers Data Science and ML.

Reading this thread has made me realize just how lucky I am to work very closely with a very strong Data Scientist, who is complemented by a very strong Data Engineer. Conversations with the Data Scientist are always about strategy, product alignment, and ensuring we're optimizing what we build for learning. The Data Engineer works very closely to ensure we're actually capturing the data we think we are, getting it to analysis systems, and making sure those data pipelines stay healthy.


We have a sister company with many data scientists and very few data engineers (actually, I don't think they ever hired any with that specific title).

And their alleged "machine learning" production systems (it's pretty much standard linear regression, but calling it ML is sexy) are slow-motion train wrecks. If the string and duct tape holds, then it works, but it's unfortunately continually breaking.

Hell, in Slack, I watch their data scientists continuously wrestle with how to actually make their Jupyter notebooks work in production.

Whereas my company has far more data engineers than data scientists. The plan from higher up the corporate food chain was always that we'd give them our data to do their data science voodoo on, but we ended up getting a few data scientists of our own for specific projects.

So, we focused on ensuring our data stream was reliable, consistent and sufficiently timely, for them to work on. But as soon as it hits their systems, it's a forest fire of hacks upon hacks, which inevitably break.

In the end, we had to send in our data engineers to stabilise their flagship "real time reporting" product that corporate was so amped about.

So yeah, I think there's probably a happy ratio of data scientists to data engineers of about, say, 1:5 or 1:10. The maths generally scales O(1) (that's the beauty of maths), but the actual engineering to get clean data delivered on time, without breaking anything, scales very differently indeed.


>Hell, in Slack, I watch their data scientists continuously wrestle with how to actually make their Jupyter notebooks work in production.

Could you go into more details on what their struggles are? We had many problems as a company doing machine learning projects, and we built our internal platform (https://iko.ai) to keep our sanity. I'm always interested in problems others may be having.


Python hell, basically. Getting the right Python version, right dependency versions etc.
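
One mitigation that helps (a sketch; the version pins are hypothetical): fail fast at startup when the runtime doesn't match what the notebook was developed against, instead of discovering the mismatch via subtly different results.

    # Fail fast on environment drift.
    import sys
    from importlib.metadata import version

    EXPECTED_PYTHON = (3, 10)
    PINS = {"numpy": "1.26.4", "pandas": "2.2.2", "scikit-learn": "1.4.2"}

    def assert_environment():
        actual = sys.version_info[:2]
        if actual != EXPECTED_PYTHON:
            raise RuntimeError(f"need Python {EXPECTED_PYTHON}, got {actual}")
        for pkg, want in PINS.items():
            got = version(pkg)
            if got != want:
                raise RuntimeError(f"need {pkg}=={want}, got {got}")

    assert_environment()  # call at the top of the notebook or job entrypoint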


We didn't have it any better with Scala. We'd run sbt and it would download the internet.

Our belief was that there are some odd behaviors in every tool, and we had to figure out a way around them.


This same sentiment (which I personally agree with) applies to software engineering. As in: engineers deliver more practical value than comp scientists. Now you can down-vote me to oblivion.


I think generally, Computer Science is a degree and Software Engineer is a job description. So many people get Computer Science degrees, then have a career as a Software Engineer.

Yes, there are Software Engineering degrees. But I think a minority of Software Engineers have a Software Engineering degree.

What this means in practice, is that Computer Science majors need to learn the engineering skills on the job or on their own after they graduate. Although some programs help students pick up some of those skills as part of the degree program.


Indeed, I think it's possible that the majority of people with degrees, have jobs that are not identical to their college major.

Certainly, most of us with unemployable majors. ;-)

Another phenomenon is employers applying the "engineer" title to any technical worker, such as designers, programmers, technicians, and so forth.

Part of it is that businesses evolve towards having caste systems. When this happens, then folks in the lower castes will try to rearrange their job titles to resemble the upper castes, or change jobs.


Anecdotal: University of Washington considers (considered?) them two separate degrees, holding CS as more theory and research-driven, and CE as more practice and career-driven.


I think it would apply if companies were hiring large number of computer scientists and using them to try to build usable software. I don't see many making that mistake. Most recognize that computer scientists belong in a research or academic setting.


Is this sentiment perhaps due to someone "practicing CS" on your engineering schedule? What's the real harm you're describing?


You and me both, friend. Except I normally get down-voted to oblivion for saying the opposite.


I guess it depends on what valuable means. I imagine most comp scientists are less replaceable than most software engineers, so point for compsci.


The two are complementary. Engineers can't do anything without the fundamental insights scientists provide. But scientists don't have the practical experience of writing end products that real users use.

Obviously this is a huge generalization but I think it's a useful way to think about it. And when I say scientist, I mean "Professor of CS" not "24 year old with a BS in CS".


Depends if you have a computer scientist doing a software engineer's job


I agree, if anything the data engineers (folks with engineering backgrounds) should be doing the applied work while a department of data scientists works on the theoretical or novel data analysis methods.

Right now our product has accumulated a lot of technical debt on the data validation side because data scientists designed the test code in a way that dramatically slows the development process.


> novel data analysis methods

Many "data scientists" (not all, but many) have little to no ability to do anything other than apply "recipes" of algorithms or classification methods or logistic regressions, etc. Asking them to develop a "novel" method would be fruitless. Asking them to clean and scrub the source data set is like telling an amateur pie-baker that the store is out of pie crusts and they'll have to make their own from scratch -- it's not going to happen; they just don't have that skill, and the instructions on the box don't account for that possibility. As soon as the task diverges from the simple step 1, step 2, step 3 they were originally taught, you realize they have very little ability to adapt. YMMV of course.


> Many "data scientists" (not all, but many) have little to no ability to do anything other than apply "recipes" of algorithms or classification methods or logistic regressions, etc.

This is because they rarely hire people with scientific thinking ability. They just hire people who can code and program from set recipes. Once you hire such people, you cannot expect them to do non-recipe work. If you don't want recipe work, don't hire people with recipe skills. Do not have job interviews that select for recipe people. But that is exactly what most companies do.


Precisely.


Yep. The key is really software skills. If you're unable to even filter the data yourself, you're also probably unlikely to be able to implement novel analysis techniques, especially if the analysis algorithm has many complicated steps or is computationally expensive.

In all fairness, it’s basically impossible for a new grad to have those skills. 4 years of a bachelors in any field isn’t enough to cover such a wide area. Even for people with graduate degrees it’s a stretch.


The hope is that the 4 year degree gave you the ability to quickly pick up those skills on your own.

If your four-year degree didn't give you the ability to learn and expand your knowledge on your own, it's a colossal waste of your time and money.


Sure, but depending on what you’re doing, “quick” might be years. You can get a PhD in understanding the theory, a PhD in designing fast numerical algorithms, or spend many years becoming a strong software engineer. I think the willingness to learn a diverse set of things is much more important than learning narrow areas fast. The short length of a bachelors usually isn’t enough to get this diversity.


>Asking them to clean and scrub the source data set...it's not going to happen, they just don't have that skill

I think you've been working with conmen/conwomen. I've never seen a data science project that doesn't involve data cleaning or wrangling of some sort.


Have you read through the comment thread? Did you read the article? Most everyone is in agreement that projects require a lot of cleaning & wrangling and a lot more -- the point is that data scientists are generally not doing that stuff; they expect academic-quality, pre-processed, pristine data, so it's data engineers who are stuck preparing the data, and who are in high demand.


Yes I read both the article and the comments.

I meant a data science project in terms of a project completed by data scientists. In my experience, all data scientists are accustomed to doing extensive cleaning etc.


This feels especially true when you have access to things like BigQuery ML.

It's very easy for an average engineer (like me) to start using ML using these tools, but a lot harder to explain how it works, or exactly which type of models to use.

In my mind a DS would be really useful to just point us in the right direction and check work. Like a super specialist QA...


>The data lifecycle is waaay overpopulated with Data Scientists who are not empowered or knowledgeable enough to work with product designers and engineers to do everything that empowers Data Science and ML.

This matches my observations as well. I'm an engineer (the non-software kind) at an industrial plant, and I have noticed similar things in my involvement with data scientists.

I think in a lot of cases it needs to be acknowledged that data scientists are not domain subject matter experts. Very often the data scientists we have worked with lack knowledge I take for granted as an engineer, such as basic chemistry, physics, etc. I can sanity-check plant data almost instantly. For example, I will know if a material reacts in an endothermic or exothermic manner and can verify that its effect on a temperature prediction model makes sense.

As a result I often feel like data scientists are not empowered to bring their full expertise to bear; they don't understand our process fully and lack a lot of the "engineering" knowledge needed to make value-added inferences about what their models are demonstrating. Often they can deliver a model and show that a particular term is significant, but they have a very shallow understanding of what the term actually represents and can't provide concrete recommendations as to how we could modify our plant to benefit from what their model is demonstrating.

Sometimes I feel like we need an additional translator sitting in between who can speak both "data science" and "engineer". I don't think this is quite the "data engineer" role suggested by the parent article, but possibly that role could be expanded to incorporate it.


> you have a bunch of Data Scientists just handed a pg_dump or flat file from some ops team.

I feel seen. At a previous job, our output after some cleaning and transforming was a pg_dump for the data scientists to load. We had little visibility of what they did to that database once they got it.


I suspect in rare cases this is by design, because engineers would object to the behavior of the Business Intelligence department on ethical grounds.


If anything, we would have objected to the quality of the code they were writing.


This is a systemic problem. We ask non-software-engineers to write code, and then we expect them to apply a level of robustness and long-term planning that even we have difficulty achieving. Not because we're being picky, but because we know which failure modes are likely, and we know that people convince themselves those failures won't happen to them.

We've been through this with installer writers, database admins, test automation, operations people, and now 'devops' people who were supposed to be the answer to these problems. It never stops.


It's a two way street. SWE need to learn some data practices and data folks need to learn some SWE practices.


Oh absolutely. We'll build completely the wrong thing, but build it well (which just makes it all the harder to throw it away).


As a Data Scientist, I do everything you mentioned because I came from a SWE background. I think Data Scientists, even if they are only interested in the "fun" part of the job, should know how and why data are captured the way they are, so that they understand which models are better suited, which saves a lot of time.


Who will do the proper cleaning then?


The aspiration that GP was getting to was that less cleaning is required as a result of better data engineering, I believe.


Correct. If you build your instrumentation correctly, then you don't really need to do any "cleaning."

That doesn't mean you might not need to transform data for different uses, but ideally you wouldn't need to, for example, change data types, like turning a bool into an int.
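
A trivial sketch of the distinction (columns and values invented): with schema-checked capture, the downstream "transform" is a mechanical cast; "cleaning" is the detective work you inherit when capture was sloppy.

    import pandas as pd

    # Transformation: capture guaranteed a real bool, so this is one cast.
    df = pd.DataFrame({"is_active": [True, False, True]})
    df["is_active"] = df["is_active"].astype("int8")

    # Cleaning: capture was sloppy, so you reverse-engineer intent.
    dirty = pd.Series(["true", "Y", "0", None, "yes "])
    cleaned = dirty.str.strip().str.lower().map(
        {"true": 1, "y": 1, "yes": 1, "1": 1, "false": 0, "n": 0, "0": 0}
    )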


The problem is that data engineers that are geared towards analytics very very rarely control the systems that create the data. If you're lucky, you have the task of hounding a team within your company to get their data management practices in order. And the conversation there is whether they should make their job harder in order to make your job easier.

Unfortunately, data engineers rarely deal with purely in-house data. You're gonna be pulling data from a variety of data sources. I can assure you that if you're pulling from government data sources, you're gonna have a hell of a time. Speaking from direct experience, my team is probably going to spend $10M/year just trying to keep a government dataset in order, because they won't do it themselves. I'm talking lawyers, legal analysts, data engineers, data scientists, data entry personnel, etc.. just to fix data that should have never been broken in the first place.

It shouldn't be a shock that cleaning the data is the path of least resistance for many.


Hence why I said DEs need to be involved as early as possible. Aspirational, sure, but that's what I've seen work best and repeatably. It's the only scalable solution IMO; otherwise you're perpetually playing catch-up.

On the point about the govt: I literally built a completely new contract type and civilian hiring practices for the DoD to bring in Data Engineers, so they could do exactly what I describe and make your life easier.


Do data engineers have good analysis skills? Do business analysts have good engineering skills? I don't think either of them can fill the data scientist role.

The scientific training and mindset (scientific method, hypothesis, experiment setup, etc.) needed to even create an accurate model is an undervalued skill here, no? Even if data cleaning is automated, these skills cannot be easily learned.

There is a reason why so many PhDs get into the field, because they were trained in the exploratory/research mindset that no engineering or analytics skills can fill. Correct me if I am wrong.


> Do data engineers have good analysis skills?

Yes.

> Do business analysts have good engineering skills?

Depends on the analyst.

> I don't think either of them can fill the data scientist role.

> The scientific training and mindset (scientific method, hypothesis, experiment setup, etc.) to even create an accurate model is an undervalued skill here no? Even if data cleaning is automated, these skills cannot be easily learned.

It's not about replacing data scientists with data engineers, it's about both roles working together to make everything more efficient.

The hiring rate for data scientists has plateaued. The industry doesn't need any more of them. Why? Because data scientists often can't solve problems fast enough. It's a commonly quoted statistic that 70% of any data science task is data cleansing and/or ETL. A data engineer's job is to take that 70% and turn it into 10%. The data engineer saves the data scientist time, meaning they can focus on what they're supposed to do -- build models.


If we only had to use 1st party data, that might be easier. But then again, if you’re building your product incrementally, you’re still going to have instrumentation holes that you may or may not be able to partially backfill.


It doesn't matter, as long as you don't make the person with the PhD in biostatistics spend their time writing ETL pipelines, which is a wildly inefficient use of a very expensive resource.


Do people with PhDs in biostatistics earn significantly more than programmers? I honestly know nothing about the market for biostatisticians, but my impression was that advanced degrees in the natural sciences don't really pay that well compared to software engineers, especially given that they're much more educated.


If they work at, e.g., a hedge fund / trading firm, then yes. And you see lots of PhDs from unrelated fields working as quants there.


Not to worry, corporate will just outsource to a firm which hires Data Janitors.


In specific research areas such as biomedical science it is certainly tricky to get involved because of the data governance / confidentiality issue... so we have to do both roles to some extent


I have a dream - and it looks like this!


I can't recommend the Data Engineer career enough for junior developers. It's how I started and what I pursued for 6 years (and I would love doing it again), and I feel like it gave me an incredible foundation for future roles:

- Actually big data (so, not something you could grep...) will exercise your code in every possible way. You quickly learn that with trillions of inputs, the probability of hitting a bug is either 0% or 100%. In turn, you quickly learn to write good tests (see the sketch after this list).

- You will learn distributed processing at a macro level, which in turn enlightens your thinking at a micro level. For example, even though the orders of magnitude are different, hitting data over the network versus on disk is very much like hitting data on disk versus in cache. Except that when the difference ends up being hours or days, you become much more sensitive to it, so it's good training for your thinking.

- Data engineering is full of product decisions. What's often called data "cleaning" is in fact one of the most important product decisions made in a company, and a data engineer will be consistently exposed to their company's product, which I think makes for great personal development.

- Data engineering is fascinating. In adtech, for example, logs of where ads are displayed are an unfiltered window on the rest of humanity, for better or worse. It definitely expands your views on what the "average" person actually does on their computer (spoiler: it's mainly watching porn...), and challenges quite a bit of what you might think is "normal".

- You'll be plumbing technologies from all over the web, which might or might not be good news for you.

So yeah, data engineering is great! It's not harder than other specialties for developers, but imo, it's one of the fun ones!
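
To make the testing point concrete, here's a minimal sketch (my own illustration, with a toy cleaning function) of property-based testing using the hypothesis library, which generates thousands of adversarial inputs instead of a few hand-picked examples:

    from hypothesis import given, strategies as st

    def normalize_record(record: dict) -> dict:
        # Toy cleaning step: strip/lowercase country, default if missing.
        country = (record.get("country") or "").strip().lower()
        return {
            "user_id": int(record["user_id"]),
            "country": country or "unknown",
        }

    @given(st.fixed_dictionaries({
        "user_id": st.integers(min_value=0),
        "country": st.one_of(st.none(), st.text()),
    }))
    def test_normalize_is_total_and_idempotent(record):
        out = normalize_record(record)
        assert out["country"] == out["country"].strip().lower()
        # Running the cleaner on its own output must change nothing.
        assert normalize_record(out) == out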


The other thing I'd emphasize here is dealing with "state". Data is effectively state.

As application engineers build increasingly "stateless" code (e.g. pure functions, serverless deployments, etc), that state gets pushed elsewhere. Someone has to manage the queues, file versions/locations, logs, databases, configurations and so on. That is all "data".

State management is a tricky problem even in a single-threaded application. It's doubly so in distributed systems, where state can be inconsistent between all the moving pieces. This is the source of endless data integrity issues. I think data engineering is a great way to get some exposure to all of this.
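
A tiny sketch of one such problem (names hypothetical): with at-least-once delivery, the same message can arrive twice, so the consumer has to deduplicate on a stable id or the "state" quietly corrupts.

    # Idempotent queue consumer sketch.
    processed_ids = set()  # in production: a persistent store, not memory

    def apply_side_effect(message: dict) -> None:
        print("applying", message["id"])  # e.g. increment a counter

    def handle(message: dict) -> None:
        msg_id = message["id"]
        if msg_id in processed_ids:
            return  # redelivery: already applied, skip the side effect
        apply_side_effect(message)
        processed_ids.add(msg_id)  # must commit atomically with the effect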


> As application engineers build increasingly "stateless" code (e.g. pure functions, serverless deployments, etc), that state gets pushed elsewhere.

Exactly. You can't magically make a stateful problem stateless, you can merely move that state around. Sometimes moving state around means moving it somewhere that is appropriate and capable of expertly handling that data. But if you make those choices wrong, it makes every aspect of your application more complex.

UI programming tried going down this road of stateless programming, and for a while it was trendy to do so with stuff like Redux. The problem is that UIs are state machines. That's not an analogy, that is a literal statement. And it is true of all UIs... it's just as true of the transmission lever in your car as it is of your SaaS dashboard. You can't program stateless UIs; they would cease to be UIs. So at best, you can move that state around. And with most of these solutions (e.g. Redux), you end up pushing that state into a massive global singleton, where even simple things like the state of a single radio button need to be fed through dozens of tightly coupled components in order to "statelessly" render. And even worse, you lose the extremely helpful distinction between UI state and domain state, mixing them both together into a gigantic shit stew.


>The other thing I'd emphasize here is dealing with "state". Data is effectively state.

It gets even more complicated. It’s not just the current state that matters, but also the history (sometimes the entire history) up to that state.


And where would you recommend someone start on a data engineering path? Any books or learning sources?


The book "Designing Data-Intensive Applications" is really, really good, and covers all the concepts (although not per se the tools) you need to understand.


Long-time DE here. I recommend trying to build your own data warehouse around something you're interested in. Don't worry about the scaling -- focus on the core engineering: taking data from different places, combining it into a sensible data model, updating it automatically every day. Then add in more sources.

It's shockingly difficult, and something that only experience can teach.


I have the same question and I believe the answer is in the same vein as someone who asks about software engineering. Books/courses are great for the concepts, but your goal should be to build something ASAP since that's where actual learning will come from.


Work at a company that has a good data engineering discipline. Shopify is hiring: https://www.shopify.ca/careers/2021


A lot of these are just "garden variety" (distributed) systems problems: dealing with systems with differing latency distributions, recovering from failure, acceptable tradeoffs between speed and accuracy, etc.


I wonder how: 1. one finds organizations that have data engineering, and 2. one gets hired into said organization with a software engineering background.


Nearly any field of computational science likely needs skilled data engineers. You could search for topics that interest you online and contact people accordingly.

I cold-emailed my current lab's P.I. and just asked for work. Search for "research software engineer" or "scientific computing professional" positions. Plenty of data engineering goes on in many fields (environmental science, climate modeling, high energy physics, physical chemistry, etc.), and plenty of fields desperately need to develop an engineering culture (e.g., plant biology, my field), so pick whatever interests you. Availability and compensation will vary by discipline.


Shopify is hiring 2,021 engineers (not just data engineers) in 2021: https://www.shopify.ca/careers/2021


Indeed, adtech is a great place to work for anyone interested in working with data. And yes, people working in adtech hate, and block, ads too.


Any recommendations on how to get started? Books, courses?


A couple of us inherited a machine learning project a while back. The code was horrible: riddled with copy pasta (nearly half of the entire thing was copy-paste with no code reuse). We basically refactored everything and standardized input and output file names. We put up a small Flask service to allow outside services to hit it easily, and wrapped it up in a Docker container so it was ultimately easy to deploy. Yes, it was all the plumbing. However, we also looked at the code and the ML strategies, and while there was "some" level of competence, it was nothing more than word2vec add-and-divide -- totally horrible for actually finding key phrases that matter to the subject we're matching. So we started tackling that too, with an LSTM, but our time got cut short and we were shifted off to another area. So not only was the "scientist" they hired completely crappy at the engineering, they weren't really helpful with the ML either.
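
For context, the service wrapper itself is conceptually about this much code (a sketch; the model file and predict signature are stand-ins). The hard part was refactoring the model code until it was clean enough to call:

    from flask import Flask, jsonify, request
    import joblib

    app = Flask(__name__)
    model = joblib.load("model.joblib")  # loaded once at startup

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True)
        phrases = model.predict([payload["text"]])
        return jsonify({"keyphrases": list(phrases)})

    if __name__ == "__main__":
        # Behind gunicorn in the container; the dev server is fine here.
        app.run(host="0.0.0.0", port=8080)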

This is obviously of lesser value to the topic at hand, and more about making sure you hire good people I think.


This is 100% my experience. I got hired as an ML engineer to bring a data scientist's models into production. I did the same as you, tearing the whole thing apart and engineering it properly. I also looked at the models, and oh boy... that data scientist had no idea what he was doing. He couldn't explain why he chose the model, didn't have any performance metrics (or even know what metric to use to measure performance), and just generally did not understand the basic concepts of his field. I tried really hard to drag answers out of him, but in the end I came out dissatisfied.


I am curious what your take is on things like this article:

https://managingml.substack.com/p/the-myth-that-machine-lear...

It has been my experience too. Basically, ML / DS engineers are thrown under the bus for being poor general software engineers, but in practice it’s totally the opposite.


The problem is that ML engineers are not the people who wrote GP's garbage code. Data scientists wrote it, and I know at least a few of my very intelligent, high-functioning data scientist colleagues who are alarmingly, astoundingly bad programmers.


For me it's just the one experience since I haven't had any other interactions with an ML / DS person since.


The phrase "If you can't dazzle them with brilliance, baffle them with bullshit" comes to mind.


I always felt that tech-focused data scientists should be required to know how to process data end-to-end: at minimum, from a SQL database to a deployed model, but knowing how to collect and clean data is important too. It seems like the industry is trying to fill the gap created by a glut of people without math/CS backgrounds going into 5-week data science courses, who then need hand-holding when they get real jobs.

Data science & engineering should be treated as a single collection of skill-sets. Lacking ETL experience is a major deficit, considering how prevalent that kind of work is.

This might just be my personal biases coming through. I consider myself a "full-stack" data scientist & engineer. But because data scientists who can work on the backends are rare, I always end up doing the plumbing while other people do the fun analysis work.

I think companies that are data "science" heavy are going to be at huge disadvantage soon. Tools like Rekognition and Google AI APIs are making the model training & deployment aspect almost trivial. At some point, the only real work involved in this space will be the data "engineering."


> Data science & engineering should be treated as a single collection of skill-sets.

This can be tough because there could be a lot in that skill set. You can't realistically expect someone to have solid knowledge of statistics including specialising in the sub-field and type of algorithms that your product needs, and also be able to write good code and act as a developer, and also have solid knowledge of all the tools for data streaming/processing/ETL. There is a point at which you're just stretching yourself too thin if you try to do all of these at once.

Of course, stuff like knowing how to interact with a database or employing good software development practices should be a very basic prerequisite and some scientists certainly shift things too far in the other direction and use their academic knowledge as an excuse to write poor code and not learn new tools.

I guess what I'm trying to say is that they are distinct skills but you still need all of them to some extent and striking the correct balance in one's skillset is really difficult.


These are all skills taught in standard computer science programs. Granted, some are electives, like high-level stats. But even back in 2010, data science electives were available to fill the gaps. I took three DS&E classes in college with projects that were end-to-end platforms, where you'd have to collect, clean, and analyze the data, then build, test, and deploy models from it.

I would certainly hope that college courses are even more comprehensive after 10 years and an explosion in interest for the field.

Also, much like a full-stack developer, a full-stack data engineer doesn't need to know everything at a master level -- just be able to handle tasks at most points in the chain.


I teach engineers for a living. I struggle to see how this is not just a straw-man argument based on colloquial usage of terms. It draws inferences from job ads that are rarely written by the people doing the job, and that are instead effectively SEO-optimized for humans, so that the best candidates can find the job they hopefully fit and not be too confused to apply for it.


It's not a straw man; I've seen it clear as day in several companies. When it comes to data science, it's "garbage in, garbage out". I've seen companies do lots of "data science" with a bunch of data scientists skilled in Python and Jupyter notebooks, only to discover a ton of work was useless because the incoming event data was tagged incorrectly due to a bug.

The actual process of collecting, aggregating, cleaning and verifying data is a hugely important skill, and not one I've really seen typical data scientists possess.


>The actual process of collecting, aggregating, cleaning and verifying data is a hugely important skill, and not one I've really seen typical data scientists possess.

Then they are not scientists. They have the label "scientist" but lack the rigor of actual science.

I don't see why changing the label to "engineer" would suddenly make them have rigor.


Right?!

This is sort of the meta failure of the argument. They are arguing that people's data skillsets are wrong. To make that argument they are analyzing based on the wrong variable in a data set.


I have experienced the same thing...but I just don't think it has anything to do with whether the positions are labeled data scientist or data engineer.

And I would warn you, from my experience teaching statistics to undergraduate engineers... they are not going to be much better. I regularly get 'hey, we have this data, what test can we run?' / 'what are you trying to show?' / 'we don't care, we just need to run a statistical test' conversations.


To be clear, I totally agree with you. I wasn't just arguing for changing labels, I was arguing that there is one set of "engineering" focused skills (e.g. building data pipelines, data warehouses, tagging events, etc.) and a different set of analysis skills (e.g. machine learning, statistical tests, etc.) and you shouldn't over-index on the latter without having enough of the former.


I suspect this may actually be an issue of school vs real world rather than scientist vs engineer.

Data in the classroom setting is pristine and beautiful; data in the real world is messy and buggy. You have to get burned by buggy data a few times (or maybe a bunch of times) in the real world to learn to look for bad data smells -- I don't think schools effectively teach this kind of intuition, regardless of whether the students are training as data engineers or data scientists.

If data scientists are spending more time in school getting advanced degrees, they're not getting as much exposure to buggy data, whereas data engineers with a BS and a few years of industry experience would already have built up this skill.


>Data in the classroom setting is pristine and beautiful; data in the real world is messy and buggy.

I got to take over our department's undergraduate statistics course a few years back.

The first change I made was that all homework, tests, and projects used real data sets. I intentionally have them collect bad data (they don't know it's bad beforehand). On the first day of class we collect data using the board game Operation... I give basic instructions, and then halfway through I ask everyone to stop and agree on how they are entering data for the variable 'success or failure' of the surgery. Oops...

In my experience teaching the course, the reason the students (engineers) find statistical reasoning hard is:

* They have never been given anything 'broken', everything is curated to avoid things not working. The result is they think data has inherent meaning. A right answer.

* Their entire learning experience has been stripped of context and the need to make decisions with information. They can give me a p value but are terrified (not unable, just unwilling) to interpret it or give it meaning.

* They have never encountered the concept of variability...everything is presented as systems with exact inputs and outputs.

When I work with postdocs, I sometimes (less frequently) encounter many of the same challenges. Data is treated as sacred and external and inherent. It's wild to me.


So I think that the delineation between the scientists working with the content and the engineers who actually provide the mechanics for it is very fair.

If there is a question mark here, it's really: how much value are we deriving from all of these data people?

Where is all the ML that's changing our lives? Search, Alexa and TikTok, I can see it.

In the future, obviously, vision systems for autonomous cars, etc.

But I'm really wary of the steeply diminishing marginal returns after that.

It will surely change the world, but I think in specific areas. Most of the entire field seems like an optimization on something rather than anything new.

Washing machines freed up an immense amount of labour and toil. Alexa telling me the weather does not.


I used to work at a legacy automaker and you’d be shocked at how much ML has changed certain areas of the business. It used to take an entire department to sort warranty claims and it’s now mostly automated. Aluminum part defects are now spotted automatically on the plant floor. Don’t even get me started with telematics data.

Most software isn’t consumer facing but just because you don’t see it doesn’t mean it’s not changing things around you. ML tends to be overhyped but your assessment is too pessimistic.


I wouldn't think of a system that would automate the processing of warranty claims as ML. That's mostly applying the policy/rules to each claim.

However, finding defects in aluminum parts that involves using computer vision, would absolutely be a ML solution.


There's millions of claims and thousands of car parts with all sorts of underlying issues. Unfortunately a rule based approach isn't feasible.


Most engineering and science jobs aren't a binary as much as they are a spectrum.

If the article is trying to make a point about skill development and diversification, I'm totally on board. Bifurcating the roles instead is going to be less effective.

To the value point... my sense has been that we are seeing the Web-commerce 1.0 bubble, Machine Learning edition. Lots of uses of it, but not all of them have value. I am excited for where we will be in 10 or 15 years, and I suspect the difference will be huge. If you put me to a guess, I would say better data-handling practices and ethics will likely be the linchpins of value creation, versus using tools for the sake of tools.


The vast majority of the machine learning that is changing the world isn't happening at the consumer level. It's happening in factories, warehouses, farms, logistics chains, etc.


The article is so true, my latest mantra at work is “engineering is more important than data science”.

Everyone is buzzing about the latter, and few even realize what the former is.


eh...I think this can be analogized to what we already see in code...

You need architecture, you need backends, you need a front end, you need product design...all with data.

Why are computer scientists called scientists and not engineers? Why is computer science about the code side? Why did computer engineering end up being more on the hardware end of the spectrum?

Words, especially newly coined terms are pointers to meaning. That meaning is socially mediated, it is not inherent.

You're saying this (and I think the author is too) because there is a need for this group of people to look beyond titles to skillsets, and the existing titles carry the linguistic baggage of the science-versus-engineering distinction that has existed for decades.


I'm late to the comment party, but: this is classic "commoditize your complement".

This guy would have you believe that Pytorch has Solved the entire, vast field of data analysis as inherited from Newton, de Moivre, Laplace, Bayes, Fisher, Neyman, Pearson, Wald, Savage, Jaynes, Breiman, Pearl.

This is a lot like saying that photography has Solved art, and now we need people who can climb ladders and glue the posters on them big billboards. It would be delusional if it didn't have a self-interested angle.

For what it's worth, those of us with math degrees are fully confident that the plumbing problem is easier to commoditize than the problem of making sense of data.


It’s about diminishing returns.

E.g. getting a model from 0% accuracy to 70% accuracy might be a couple of Pytorch library calls that any dummy who watched Andrew Ng's course can do. But getting that same model from 70% to 75% accuracy might be deeply mathematical and require the latest and greatest mathematicians and statisticians.
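
To caricature the first half (a sketch using sklearn rather than Pytorch; the dataset and the accuracy figures are purely illustrative):

    # The "couple of library calls" baseline. The last few points of
    # accuracy beyond this are where the real statistics lives.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    print(f"baseline accuracy: {clf.score(X_te, y_te):.2f}")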

But in this hypothetical example, an engineer who stands up the 70% model and keeps it running with 99.9% uptime is more valuable to the bottom line of the business than a 75%-accuracy model hacked together with scripts and 50% uptime.


One annoying thing about being a generalist is that domain experts in any given area that you need familiarity with can't help but complain about how little you know about that domain, ignoring the fact that your job requires equally deep knowledge of several other domains simultaneously.

In the case of data scientists, I think the business folks that want them to understand the business domain better generally have the strongest argument, followed by the statisticians - good data scientists need to personally understand both of those things well, while the engineering and ops stuff that data scientists are also expected to do is easier to compartmentalize on other teams. So I agree that we should have more data engineers, but apparently for the opposite reason as most people in this thread.


Having to deal with data scientists, I absolutely agree. The thing I've seen that lands in the "lab" vs production distinction is that these people expect their data to be pristine. They flip out when the world isn't as perfect as their models want. This leads to me, just a normal software developer, having to do the data analysis and figure out how to clean it up.

I also end up having to be the one to talk to data vendors to understand their data feeds and essentially translate that for the data scientists. Having to sit in the middle is annoying for me and suboptimal for the business.


The data science field has been flooded with PhDs with nowhere else to go that have no background in engineering, and sadly often have a very poor understanding of both machine learning and statistics.

Companies were in a rush to hire "data scientists", and boot camps like Insight were more than happy to pump out very impressive PhDs with just enough understanding to build a Keras model.

I've worked in industry a while doing DS work and have been astounded at the number of PhDs who both don't know how to write Python that doesn't live in a notebook, and throw away years of disciplined experimentation experience to just throw Keras models at data until the needle moves.

There do exist excellent data scientists out there, who are both very solid software engineers and really know their stuff mathematically, but I've found most of these people can't reliably find jobs, because the people interviewing them know so little that good data scientists get penalized for answering a stock question correctly.

The field has been so flooded with amateurs who have no idea what they're doing that potential mentors have been driven out, and now it's just a mess. To get a DS job when you do know what you're doing, you have to play a weird game of guessing the incorrect answer the interviewer has in mind.


Not to mention the dark pattern of giving data scientist candidates an unsolved industry problem as their interview take-home task, and then telling them to only spend 4 hours on it. Data science hiring often feels like a competition where the winner is the one who has the most free time and willingness to do other people's work without compensation.

It's kind of a fucked up field right now.


I work at a place with a very high count of PhDs. Some of them write code. All of them view writing code as something menial and unimportant, and it shows in the resulting work, which in my experience is atrocious.

Of course I understand that YMMV, but after working here I will forever be skeptical of code written by anyone with a PhD.


Are they CS/EE PhDs?


Think back to your own CS professors: were any of them particularly good software engineers?

I've found that Physics PhDs tend to have the highest probability of being good coders since a certain subset of them get bit by the software bug when they need to write non-trivial amounts of code to solve research problems.


I got my physics PhD in the early 90s. Physics has had a tradition of interest in programming that goes back decades. We've always had "big data," meaning big relative to the tools available at any given moment. We ran out of problems that could be solved by pencil and paper in the 1930s.

Every physics student at my college had to take FORTRAN, plus programming was assumed in many of the other courses, and we also took an electronics course that included digital techniques. And maybe the main thing was simply that programming was interesting and fun.

We've also had a tradition of learning to do everything ourselves, for better or worse. I had no access to a professional programmer.


The ones whose work I've seen are EEs.


I do an introductory Python lab course at my university. It's targeted at engineers who still create graphs in Excel and then normally level up to MATLAB if things get complicated (think insets, ...). I guess about 30% of the people previously did at least some of the YT/Udemy "courses" on data science. It's really horrifying for me (not being an engineer myself, but imo having a relatively engineering-like mindset) to see these people balk at simple tasks like writing a variadic function. "What do I need this for?" Well, it's using the programming environment. Then we let them code up a simple version of Levenberg-Marquardt. The level of "why do I need to do this" is astonishing again...
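
For non-Python readers, the kind of "simple task" in question is literally this (a toy example of mine, not from the course):

    # A variadic function: accepts any number of positional arguments.
    def mean(*values):
        if not values:
            raise ValueError("mean() requires at least one value")
        return sum(values) / len(values)

    print(mean(1.0, 2.0, 3.0))  # 2.0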


> why do I need to do this

IMO this is the number-one problem of our modern culture around education. Popular culture makes it fashionable to treat education as pointless, and this even affects students who are pursuing difficult degrees. "Why do I need to study humanities? Why should I learn to code if I think I am born to be someone else's boss?"

On the other hand, many teachers in K12 and early university have no ability to connect the "what" with the "why." "The curriculum is the curriculum. The test is the test."

If we can solve these problems, our societies will be much better off.


If the educator cannot explain why the knowledge is useful, then he is unfit to teach it.


> The data science field has been flooded with PhDs with nowhere else to go that have no background in engineering, and sadly often have a very poor understanding of both machine learning and statistics.

I am a PhD student in a non-engineering field. I've been taking as many math and stats courses as I can, but what other courses should I be trying to take if I want to excel as a data scientist? Software engineering CS type courses?


My question is: "Why are you pursuing a PhD if you want to end up as a data scientist?"

I've known a surprisingly large number of people that are mid-phd thinking about data science as a career. Don't pursue 5+ years of learning to master the world of academic research if your goal is to help people sell t-shirts or whatever.

Certainly there are some people pursuing specific PhDs, such as those in computer vision and nlp where there are some industry options that might offer more challenging/interesting research than academia. It makes sense if you're a PhD at NYU or Stanford in CS fields related to neural networks to go work for Yann Lecun at Facebook or Geoffrey Hinton at Google.

But if you're, say, a biologist who wants to sell clothes online... why spend 6 years working in academia to do that? Is your dream really to optimize clothing sales? If so, don't be a biologist. If your dream is biology, why in the world would you set your course on selling clothes?

I get it if your dream is biology but you can't find a tenure track job and so you pivot to industry... but if you are mid-phd, what are you doing there? If you love your subject, try to find a way to work in that and if you don't, don't waste your time.

Data Science is not a glamorous job, and at the vast majority of companies it is literally bullshit. The people solving mind-bendingly hard problems are already in programs specializing in those problems, because that's what they are passionate about. On top of that, DS is way over-indexed at most companies. If you're mid-PhD now, I would expect a serious contraction in DS jobs in the next 5 years. DS will be a niche job after the next market "correction".


I mean, you have to think about cause and effect here. DS will contract because many/some data scientists simply aren't good enough, and most DS just doesn't do what it is supposed to.

First, like you said, there are the stray PhDs who do it since they know research and some statistical applications. Second, there are hordes and hordes of DS people who "learned" their skill with some bootcamps or online courses, which means they know enough to write notebooks and glue together functions. Their understanding of theory is often shallow. In either case, it is hard to "blame" someone for taking an attractive job. But it isn't good for the discipline.

The appeal of DS is clear for companies. But the problems it promises to solve are much more complex than we collectively recognize - or are willing to admit. In my opinion, causal inference is a difficult, unsolved, and deep topic, and no single course will equip you to tackle it. It takes domain knowledge and multiple years of stats/math/ML (all of them, not one of them). And yet, causal inference is what 90% of people want ML to be. A model that works on some dataset is not a model that is useful in light of the true latent data-generating process (DGP). Yet, when we want to sell T-shirts, what do we really want?

Hence, when I look at the problems that ML is supposed to solve, I think that most people calling themselves DS on LinkedIn are not really equipped for it. And there is a case to be made that the fields where PhD researchers train to solve such causal inquiries are indeed better equipped to tackle the issue.

For example, if it's about selling shirts, I would take an econometrician with some data engineering skills over a coursera superstar any day of the week. I think if you do a PhD in ML/Stats/Biostats/Econometrics/etc., it is reasonable to pursue a career in DS. It's what statistics _is_ now.

If you have some other PhD and know some ANOVA, OLS and Stata - or if you have a CS background but know some Jupyter and Keras - then it's essentially a career change. It might work, but probably not without a hitch.

So I agree with you, but I'd reframe it: It's unclear to me whether we need a contraction, or whether we instead need a quality update.

I disagree with you in one point: I do not think we will make progress in DS (getting it to work in more use cases) by treating it like a solved problem, a skill like milling that needs talent and experience, but not academic education. If we do that, I think DS will contract because it will stagnate in usefulness.

My point here is not to accuse anyone of being a bad DS. I am sure there are many ways to become effective. But even the theory of causal inference with simple linear models goes far, far beyond what I saw in ML hiring tests, online courses, and so forth. And solving the problems it tackles is not accomplished by throwing more layers at it. For other ML algos, we aren't even close to understanding these issues on a similar level.

In the end, what we need are actual ML scientists. They should neither be pure statisticians, nor pure subject-matter experts, nor pure computer scientists, as we mostly have now. We also need more than the current ML programs, which are mostly cobbled together from other areas. For example, people who publish in ML research are probably very useful in a company that has to deal with that exact problem. Any scientist knows, of course, that even a fairly adjacent question may already require tons of different knowledge. DS is, will remain, and probably should be an academic field, because there are more open than solved problems right now.


Do you have any suggestions for where to start looking for good places to apply that don't suffer from this?


I personally have given up and turned mercenary. Even if you're passionate about statistics, machine learning, or any DS-related discipline, don't think of work as being a bigger part of your identity than the average Starbucks barista does. Find a team/company that pays you well and isn't too opinionated, with low ego (if possible). Don't look for challenging work; the few places it exists already have the people they need, whereas the companies that pretend they have challenging problems tend to be insufferable. Look for a team where you can check in and check out without too much stress, and get paid well.


Just want to say that while the data science profession definitely includes a wide range of people and skillsets, a good data scientist should be practical and able to work with the available data in whatever state it's in.

No good data scientist should ever expect data to be pristine. And a good data scientist, even if they don't have quite the engineering chops necessary to build a production-quality ETL, should know enough about the process to help guide it. If they aren't a part of that process, they're not being a good DS. They can't expect someone not involved with their problem to know what tradeoffs to make, and if they don't know exactly how their data went from raw form to the ETL-ed form, they're probably going to make bad assumptions, and those assumptions may very well make their architected solution a complete pile of garbage. Not to mention, how can a DS offer suggestions for solutions if they aren't deeply familiar with the raw data that's available?

To me, a good data scientist should, at bare minimum, have several skills.

* They should first and foremost (but not solely) be an in-house expert in statistics and machine learning, knowing what can and can't be done with data. They should arrive with that knowledge. Engineers, I think, have a tendency to trivialize this, but true expertise in this domain comes only with years of experience.

* They should strive to find modeling solutions that are right for a particular business problem. If they seem to be only applying the hottest research regardless of the tradeoffs for the particular business problem, that's a red flag.

* Their focus should be on integrating themselves with the product/business as much as possible, and with the engineering team as much as possible. If they're expecting to be handed directives, that's a recipe for a ton of wasted time.

DS should never, ever be siloed into their own little DS world. They will be useless without a deeply intimate knowledge of the business goals, the needs of product, and the capabilities of the engineering team.

As they progress, they should become more and more "full-stack", otherwise they are stagnating.


A good data scientist should also be good at science. Otherwise, you can simply hire people with engineering skills - you don't need scientists. If you hire scientists and then are surprised they aren't good at engineering, the hiring process needs a reality check.


Statistics is a science as well. Unfortunately it’s overloaded in business terms and can mean anything from “knows means and regressions” to “has a copy of _Meyn and Tweedie_ on their shelf”.


Instead of sneering at "having to deal with" data scientists, consider that the data scientists themselves would often much rather have data engineers and dev ops people involved in the process.

Data scientists like to quip that 80% of the job is data cleaning, with the remaining 20% divided up arbitrarily among other tasks as suited the joke. In some shops nowadays, it's more like 45% data cleaning, 45% data engineering/ops/programming just trying to make your results available to the rest of your organization, and 10% research.

If I can spend less time learning/doing software engineering and devops and more time doing actual data science, that's great. At a previous job, my team was clamoring for more data engineer hiring, and part of the reason our projects were slipping and starting to fail was lack of data engineering support. Our tooling was shit, our processes were shit, our code was shit, and access to (and trust of) our data sources was especially wet and stinky shit.

It made the daily work of doing data science a miserable slog of ad-hoc duct-tape solutions, and it contributed to us being generally ineffective as a team.

All of this would have been fixed if we had one competent data engineer with some actual real-world data/ML engineering experience and good communication/advocacy skills. Let alone two or three!


If the DE tooling was shit and you couldn't hire more fast enough, why didn't your team members start addressing these problems? Surely spending half the time cleaning up the pipes would increase the value of what you do with the other half?


This implies a lack of rigorous training. In the physical sciences, one wouldn't become an applied scientist without conducting an experiment to test a phenomenon, and the teeth gnashing that goes with making that experiment work.

Those who have been fed pristine data without having to undergo the trials and tribulations of actually having to collect the data have missed a crucial part of scientific training. Like you, I find this lack of rigour is rather common among data scientists. Not all, but quite a few.


That's what I was wondering reading this thread. Much of science is dirty work in other fields and I think that is a good thing.

How ridiculous to assume that a scientist doesn't clean their tools and set up their experiments.

(Surely as one gets more experienced and older, the job likely becomes less manual, more about teaching and coordinating.)


I think that this is more of a problem with the specific people that you have worked with and it isn't inherent to the role of a data scientist.


It’s becoming more inherent, especially as the field is populated with people who have no experience with the “science” part. That is, with the very real and ubiquitous problem of collecting and cleaning data to make it fit for scientific study. Even theoretical physicists, for example, participate in and rely on empirical data collection, and understand deeply how messy and fraught with error it is.

I don’t see the same appreciation or consideration in general in the field of data “science.”


I remember working with someone who has a PhD in Physics and who worked at CERN, and one comment I loved: "a key skill is knowing how to place the legend, so it obscures that annoying outlier data point"


> with people who have no experience with the “science” part

It's interesting that you put it that way because a lot of the other complaints in this thread are that the people who expect their data to be ready for use are exactly the people with science experience but without the relevant technical background.


Doesn't sound like a modern Data Scientist, sounds more like a statistician with 30+ years of experience.


Genuine question: why is there so much pure teeming hatred for data scientists in this comment thread? Almost every comment comes off as full of snark and vitriol against data scientists.


I'm guessing just venting personal frustrations due to their own experiences, plus maybe poor hiring and guidance of data scientists in their own teams? I have definitely seen DS people in my experience that fit some of the descriptions here, but I think it's a mistake to trivialize the DS position itself. A good DS is a valuable asset, but depending on your company/data, maybe not worth the cost. Plus there is no "single" DS candidate or role, a lot of these roles (data engineer, DS, analyst, swe) blend together at times, and it's about finding the right balance of skills.

Sometimes I think companies (not having the DS experience themselves) mistakenly over-hire DS roles in today's hype of "AI" when their data is mostly run-of-the-mill and only requires simple linear models that can be architected and understood by a stats/math-savvy engineer. Even then, a good DS is still useful (even linear models can be complex: e.g. what priors do you want to use? Do you want a multi-task solution? etc.), but maybe not worth the cost.


It tends to happen any time something becomes trendy. Data science has/had a lot of hype over the last decade, and people seem to have an inherent tendency to want trendy things to fail. Combined with that, you have a bunch of people hopping on the data science bandwagon, so you get a lot of grifters, snake oil salesmen, or simply individuals whose output is poor quality. It seems to have created a feedback loop, where there is always a new example of some AI solution failing, or a data science initiative that didn't work out, that everyone can point at and say "See! I always knew this trend was dumb!".

Reality is that data science is here to stay. It's coming out of the honeymoon period, and the field may never be as hyped up as it has been over the last decade, but that's probably a good thing. Everyone will probably move on to hating the next up-and-coming thing. I have a hunch it could be something in data engineering because, while not exactly new, it is absolutely the next "data science" in terms of demand, and with products like Snowflake having so much hype behind them, the backlash seems inevitable.


I think a lot of people feel like data scientists get all the credit and the fun work, while we have to do all the heavy lifting and boring stuff that they need. There's this idea that data science is a special and unique skill set that couldn't possibly be possessed by a simple software engineer, despite being largely a subdomain of CS.

I remember an era before "data scientist" was a job title, when we (programmers) would analyze data to see if we had enough information available to identify the problem (and if not, fix that), then come up with a strategy to solve it, test, and finally deploy the model. The fun part was trying different solutions and analyzing the data. It also felt awesome to deploy a product that worked like "magic." Product owners didn't know or care what a neural net was, they were just happy it worked.

Now there are tons of data scientists out there who take the easy, fun, rewarding work and try to skip over the nitty-gritty implementation details. Then management thinks engineering is incapable of doing such work, and the only way we get the opportunity to do something fun is behind the scenes.


I would guess that it's a reaction to job title hype.

There's a huge variety in DS responsibility and background between companies.


Probably just people blowing off steam, and the target du jour is data scientists. If people had to sit down with their company's data scientists to air their grievances face to face, I doubt they would be so condescending.


Yes, the tide is turning now... who came up with the term "data scientist" anyway? It's a made-up profession. If you need someone who understands statistics, get a statistician, or maybe a mathematician. If you need someone who designs and writes computer programs, get a computer programmer. But a "data scientist"? No, thanks.


Exactly


I came to this thread interested in the discussion, but now I feel like the Homer Simpson meme, retreating into the hedge.

Maybe I'll come back in a few hours, but for now I'll stay away.


My view is from a small startup with little to no room for single purpose employees.

When I first started hiring and working with data scientists, my view was this: if you can only manipulate data and run it through pipelines to generate models, then you can't do enough to be highly valuable. You either need a strong enough background in CS to build the pipelines/tools or a strong enough mathematics background to propose cutting-edge new ideas. From my experience it is hard to find someone with either of these skills fresh from a university "data science" program. At a small company (at least the ones I have worked with), being only proficient in R and basic Python isn't enough. That said, I have met a handful of Data Scientists who were very smart and self-motivated enough to pick up the missing skills when given the chance.

My question to HN is this: are there roles at these larger companies for a Data Scientist who primarily just crunches data in R and Python, without the ability to actually build the pipelines/tools or conduct research?


I would be cautious about that. I've worked in the startup space for over 10 years now as a data scientist, often the first one hired on, working on the pipes.

From my experience, there are two types of data scientists who do infrastructure work: 1) Those who do not make the best data scientists because their skill set is too far in engineering land, leaving them weak where it counts. If the startup is relying on the data scientist to be profitable, I'd be cautious with these types. Or 2) someone who is senior, beyond senior really, who has worked both jobs and doesn't mind doing both. This unicorn is so rare it is mythical. The joke when the terminology was created was that they're so rare no one has ever seen one, hence "unicorn".

Me, I cannot do the work I need to do if I'm on call. That is where I draw the line. That means hiring someone to monitor the infrastructure. Furthermore, I'm an okay architect, but you really do want to hire a specialist for that if you can help it. Do I help them with the infrastructure? Absolutely, but they're on call if a server is on fire. They have the admin login credentials, not me.

I get wearing multiple hats, but keep in mind that to be a data scientist you're already wearing multiple hats. Being a data scientist is like double majoring and getting a PhD. At what point are they stretched too thin? The consensus in the industry is that they're already stretched too thin and the role should be broken up into different specialized roles.

>My question to HN is this: are there roles at these larger companies for a Data Scientist who primarily just crunches data in R and Python, without the ability to actually build the pipelines/tools or conduct research?

That is the standard role, even at startups. However, the industry consensus these days is that data scientists should have more responsibility when it comes to deploying models than previous standards allowed.[1] So data scientists are being pushed in a more engineering direction: not hosting SQL servers and infrastructure, but working with engineers to make sure the models are monitored properly. This change comes from model deployment being further automated over time, making it easier for the data scientist to take more responsibility at this stage.

[1] source: https://www.dominodatalab.com/static/gfx/uploads/domino-mana... page 9. Suboptimal organization and incentive structures.


Thanks for the feedback! Seems like you and I have both had a bit of experience being first engineering hires at startups, but have had very different experiences when it comes to roles for a data scientist. I appreciate that.


Np. There is a common trend in the industry where a company hires a data scientist without knowing the data prerequisites (specifically labeled data), the data scientist struggles, and after a while the company fires them. This leaves the company with a bad taste in its mouth. In recent years I tend to get hired on as a specialist to help fix this. (And yes, I've been the first engineer hired on too.)

What's interesting is they tend to struggle in two different ways: 1) The data scientist that is gung ho about infrastructure work, jumps in, and then ends up doing a bad job, because it's not their strength. They end up getting let go for not being ideal at that work. 2) The data scientist who struggles with the idea of infrastructure work at all, jumps into other roles they're good at like data analyst work, helps the company in that way, but ultimately because they did not push to get an infrastructure engineer hired, they end up let go as well.

Me, I go out of my way to get an infrastructure engineer / data engineer hired early on. Also, I have worked as an engineer, so I tend to do a lot of the "hard" stuff most software engineers struggle with early on, if applicable. Eg, at one job I wrote a compression format to reduce battery drain on our devices that were collecting data.

Most data scientists struggle when it comes to CS/engineering skills (4/5ths of them), so it's not uncommon for them, early on while the pipes are being built, to do data analyst and BI work. BI work to automate reports, which management loves, and DA work to show some amazing future service the company might be able to provide to its customers. It's selling the sun and the moon really, but it gets management inspired and helps them know what data to collect. It's not unheard of to need a minimum of two years of collected data before building a deployable model becomes feasible. This can be hard on the data scientist, because there is a lot of downtime before that. Many get fired during this time even when they're doing a good job. They have to wear multiple hats, but it's analyst roles (like BI work). Technically a data scientist is a kind of analyst, not engineer, so it makes sense that wearing multiple hats for them tilts in the analyst direction, not the engineering direction.

I've been writing code since I was 8 years old, so I'm one of the unusual ones that tilts in the engineering direction, but I think it is unreasonable to expect that from the average data scientist. Let them do what they do best, and hire someone else who can round everything out and you'll be in a good place. Unicorns aside, you'll need a minimum of two professionals for a data project to succeed.


Thank you for your comments! They are very insightful. To piggyback a bit:

Assuming you are a competent data "analyst" who wants to become a data engineer, how would you go about it? Is "go back to school and get a CS degree" the answer? I suppose this question is very broad, but I am curious if a practitioner like you has an opinion.

---

To give some context:

I recently graduated with a STEM PhD, and looking to move into data science. Reading the comments, I feel like I fall into the "pointless data scientist" cohort derided in this thread. Eg: I am very comfortable doing typical analytical work & occasionally training models inside a notebook, but I am neither a cutting-edge theoretical statistician nor a data engineer.

I've been trying to improve on the engineering side. For example, I did a project recently where I set up a rudimentary pipeline that continuously pings an API, uploads the data to a cloud database, then serves up the analysis via a Flask app. For me this was a big step up from just doing notebooks on a csv file :)
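
For anyone curious, the skeleton of such a pipeline can be tiny. A minimal sketch, with sqlite standing in for the cloud database; the endpoint, table, and field names here are all made up:

    import sqlite3, requests
    from flask import Flask, jsonify

    API_URL = "https://api.example.com/readings"  # hypothetical endpoint
    DB = "readings.db"                            # sqlite stand-in for the cloud DB

    def ingest():
        # Ping the API and append the latest batch to the database.
        rows = requests.get(API_URL, timeout=10).json()
        with sqlite3.connect(DB) as conn:
            conn.execute("CREATE TABLE IF NOT EXISTS obs (ts TEXT, value REAL)")
            conn.executemany("INSERT INTO obs VALUES (?, ?)",
                             [(r["ts"], r["value"]) for r in rows])

    app = Flask(__name__)

    @app.route("/summary")
    def summary():
        # Serve a trivial analysis over everything ingested so far.
        with sqlite3.connect(DB) as conn:
            n, avg = conn.execute("SELECT COUNT(*), AVG(value) FROM obs").fetchone()
        return jsonify(count=n, mean=avg)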

But moving beyond the basics, I am not sure what to study next. Hence my question. If you have any suggestions, I would greatly appreciate it!


We have colleagues similar to you: PhDs. What we ended up doing is building an internal machine learning platform to reduce the number of "taps on the shoulder". They had trouble setting up environments, dealing with libraries, system dependencies, etc. In addition, they relied on others to get data, fix their environment, deploy their models, or showcase their work to clients.

It wasn't optimal because we were having bottlenecks and variance: some people could move through the stack and do it all, but you either had them or you had to train them and it took time.

- [0]: https://iko.ai


I'm confused: do you want to become a data engineer or a data scientist? A data scientist is a type of senior data analyst. Data engineering is farther from a data scientist than a data analyst is.

I'm going to answer both questions just in case:

To become a data engineer / infrastructure engineer, there are multiple paths forward. I recommend doing BI work aka Business Intelligence Analyst Engineer. It's typically Tableau related work, so making dashboards and reports for management. It's still a data related role so you should feel comfortable and at home. However, it is also an engineering role. If you're the first BI at a company you'll often find yourself setting up an SQL server and doing certain data engineer-light type work to get data into the server. You'll need to set all of this up, so BI is a blend of data engineering and data analyst type work.

Once you've gotten familiar with BI work it's very easy to transfer to data engineer / infrastructure engineer type work. This is especially true if you end up setting up a data warehouse (as an alternative to MySQL) or data lake on AWS to do your BI work. You don't have to, but if you go that far, you're pretty much doing data engineering at that point. The line between the two is fuzzy. Data engineers and infrastructure engineers are expected to be architects, and by that I mean they are expected to future proof the schema of the SQL server / data warehouse (future proof setting up the database for new data so it doesn't become a mess). A BI is not expected to be an architect and imo the only way to gain that skill is through first hand experience playing with databases, so a BI is a good way to get that experience.

At the company I'm at, the infrastructure engineers are expected to do BI work. This is unusual, as the data scientists typically do it (roughly 60% of data scientists do BI work); it happened because one of the data engineers I currently work with was a BI at his previous job. (He was on the sales team, helping them with more than just dashboards, like helping with their Excel spreadsheet algorithms and whatnot.)

I'm sure others could paint another path forward. Data engineers are highly in demand so it could be as simple as applying. If you can pass a white board interview (leetcode style interview) you can skip this step and dive right on in. Just like any technical white collar job, you're expected to self-learn what is required for the job before going in, so absolutely read guides / take classes / read books / etc on the topic to learn more.

----

To get a job as a data scientist:

BI work is a good bridge too, though here the value isn't the database-setup skills but the dashboard-creation skills. Around 60% of data scientists in the industry do BI work. Me, I've had to create internal dashboards for diagnosing problems, streaming in live data in a visual way. That is not BI work, but there is clearly a bridge between BI dashboards and internal diagnostics dashboards.

Technically a data scientist is a kind of data analyst so many people go from data analyst directly to data scientist. Around 30% or so of data scientists do only data analyst work but have the data science title. (This 30% number is a bit of an estimate.) It's that strong of an overlap.

Data scientists tend to specialize. There are sales data scientists, marketing data scientists, engineering data scientists, ops data scientists, and so on. Often, but not always, they sit on the team they specialize in, instead of on a data science / data analyst team. Smaller companies tend to hire a data scientist and expect them to do one kind of role. So it comes down to what kind of data science work you want to do. Sales data science roles tend to be BI-heavy. Marketing data science roles tend to be data-analyst-heavy. Engineering data science roles tend to be the heavy model-building roles that are the most challenging of the bunch. Ops data scientists tend to specialize in malware detection and self-reporting. Eg, if someone is hacking the company's servers, they might get an alert, then analyze it and report on it. There are other kinds of data scientists too, like the ones at supermarket companies and restaurants who specialize in forecasting warehouse items.

Me, I'm a specialist that specializes in robotics and sensor analysis. I'm not going to lie, it's probably the hardest out of every kind of data science role. It's very heavy on the engineering side, not data engineering, but software engineering, because there is a lot of advanced feature engineering.

Most feature engineering is simple stuff like deleting missing values, taking the median over the dataset, or other kinds of cleaning and minor modifications like normalizing the data. Then it gets fed into an ML library that identifies the pattern in the data, so when new data comes in it can identify whether it recognizes that pattern. Each pattern is called a category, and most ML work is categorization: maybe you're categorizing different kinds of customers, and if you can identify a pattern in their shopping habits, you might be able to predict what they will buy next.
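
In pandas/scikit-learn terms, that basic flow might look something like this (a sketch only; the dataset and column names are invented):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv("customers.csv")  # hypothetical dataset

    # Basic cleaning: fill missing spend with the column median,
    # drop rows missing the label, normalize a numeric feature.
    df["spend"] = df["spend"].fillna(df["spend"].median())
    df = df.dropna(subset=["segment"])
    df["visits"] = (df["visits"] - df["visits"].mean()) / df["visits"].std()

    # Hand the cleaned features to a generic ML library, which learns
    # to map feature patterns to categories ("segments").
    X, y = df[["spend", "visits"]], df["segment"]
    model = RandomForestClassifier().fit(X, y)
    print(model.predict(X[:5]))  # categorize some rows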

Advanced feature engineering might need to be used when your patterns are so complex ML can't pattern match it well, so you have to give it a helping hand and manually do some of the pattern matching. I've also had to invent new forms of ML too, but it's been a while since I've had to go that far. What I do is the farthest from normal for data science.

Most data scientists do not know advanced feature engineering, but it's one of the bridges between software engineering and data science, so leveling up software engineering can help on that front. (Which is also why I bring it up.)

A data scientist shouldn't be expected to know much if any data engineering. Instead, gaining managing-upward skills helps. Knowing how to write an SQL query to get data, and how to write a join, is enough. You should do fine; just try learning data science itself instead of data engineering, unless you're curious. (A lot of universities and bootcamps teach machine learning engineer skills and call it data science. If it doesn't have data cleaning and feature engineering, it's probably not data science. Likewise, if it has tensorflow or pytorch in it, it's not data science.)
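
(For concreteness, "enough SQL" means roughly this level: a join to pull data together. The warehouse and tables here are hypothetical:)

    import sqlite3

    conn = sqlite3.connect("warehouse.db")  # hypothetical warehouse
    # Join users to their orders and aggregate per user.
    rows = conn.execute("""
        SELECT u.user_id, SUM(o.total) AS lifetime_value
        FROM users u
        JOIN orders o ON o.user_id = u.user_id
        GROUP BY u.user_id
    """).fetchall()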


Thank you for the detailed response. It makes things a lot clearer.

I notice that you mention "Machine Learning Engineer" as a separate role. If, in the idealized world, data scientists do analytics and train models, and data engineers take care of data, then what do Machine Learning Engineers do? Are they, basically, software engineers who specialize in putting other peoples' models into production?

And you are right in sensing my confusion. There seems to be an abundance of data-related titles, which overlap in their functions a lot, but are also very different when you examine them closely. So thank you again for your responses, they are very helpful.


>Are they, basically, software engineers who specialize in putting other peoples' models into production?

It depends on the company. Traditionally, yes, but deployment into production can be automated, so typically today it is something different.

An MLE is someone who specializes in Tensorflow or PyTorch. They write deep neural networks, reinforcement learning, and more. Oftentimes the data scientist will make a model, specializing in feature engineering and domain experience, and use something generic like a DNN or xgboost or whatever it may be. It then gets handed off to an MLE, who writes ML specific to the problem to get every last drop of accuracy out of the model. They then hand it off to prod. I don't think they're on call (I could be wrong on this), so today they're not really deploying models much. They're more of an in-between.

I work at small companies and startups so I've never worked with an MLE, but I do have friends who are managers at Google who told me about it, so that's where this information is coming from, telephone game. In other words, I'd take this with a grain of salt. ymmv.

Starting in 2018, big-name companies couldn't get enough MLEs and paid them more than data scientists, but many bootcamps and universities center on ML skills, so companies started renaming MLE positions to DS positions. This way they get more applicants and can pay them less. Win-win for them. Too bad it messes up the industry. Today about 1 in 3 data science jobs are ML-heavy. They may be MLE-exclusive, or hybrid multiple-hat jobs ranging from light DS to light MLE.

You can identify which is which if they give you a white board coding problem. Traditional data science work will never have a white board problem.

>So thank you again for your responses, they are very helpful.

You're very welcome. I hope it helps.


There are certainly roles out there for a Data Scientist who just crunches numbers. A good friend of mine does exactly that for a large traditional retail corporation. Just by using standard ML tools he replaced a whole team of analysts for pricing items. Maybe not in cutting edge tech companies, but roles like that are all over the economy still.


4 years ago I moved from a role where I primarily wrote C# as an architect on a web application, to an architect helping to build a data warehouse. The contrast in tooling, discipline and information available to build anything in the data world is so stark it had me questioning my career decisions. Sure, you can read Kimball and Inmon and I'm sure there are a handful of others out there - but there are drastically fewer than what you can find in the application development space.

Things are getting better. Visual ETL tools are falling out of favor to proper coded ETL (spark, dbt, etc), and data teams are starting to see the value of actually engineering a solution instead of just throwing it over the wall to a DBA to deal with. But tooling and general information on the web are still lacking. Pushing data engineers over "etl developers" or "bi developers" (or "data scientists") will drastically improve any organization's ability to actually deliver real analytics, and hopefully an industry-wide push will raise all ships.


Why do you think coded ETL is winning over the click-and-drag variant? I'd say the latter makes things a lot easier, no?


My bias has always been against click-and-drag programming, and I believe it mostly comes from my application developer background as the sentiment towards visual style application development tools is (almost) unanimously negative.

Coming over to the data world, I noticed the same type of problems click-and-drag app development had appearing in tools like IBM's DataStage and Informatica's PowerCenter. There's only so much you can do by dragging and dropping items on a screen; eventually you need to take their respective escape hatches and do some programming, and when you do it's almost never ideal. I've also yet to see a visual coding tool produce readable, concise diffs in any source control provider. Most of these tools also require some sort of centralized server infrastructure and a thick client, making it much more challenging to bootstrap new ETL developers.

I do hear others in the data world who have migrated to Spark or DBT share the same sentiments - but that could just be confirmation bias.
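
To make the diff point concrete: a coded transform is just text under version control, so changing one business rule shows up as a one- or two-line diff. A minimal PySpark-flavored sketch (the paths and columns are invented):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Extract, transform, load: each step is plain code, so reviews
    # and diffs work the same as for any other software.
    orders = spark.read.parquet("s3://bucket/raw/orders/")   # hypothetical path
    cleaned = (orders
               .dropna(subset=["order_id"])
               .withColumn("total", F.col("quantity") * F.col("unit_price")))
    cleaned.write.mode("overwrite").parquet("s3://bucket/clean/orders/")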


Advances in ETL. Spark and DBT are large improvements over pre 2010 ETL tools. Give it a few years and we'll see really good GUIs for Spark/DBT.


My first job as a "Data Scientist" (it wasn't called that, but the work was the same) was for a small gaming shop, around 2011. It involved applying econometric analysis and doing simple statistical testing on the player data sets. I realized quickly that knowing how to do statistical testing was only a very small portion of what it took to create value in such a role. At the time, I didn't even know (but learned) SQL. Everything I wanted to do involved teaming up with a developer, which wasn't efficient in a small operation. So I learned to program. I continue to enjoy skilling-up, most recently learning cloud-tech to enable me to deploy data tools I develop.

The most valuable people in the data chain will be those that can take idea to near-production. Running ML libraries over clean datasets is overrated. The fact is, 80% of the value of "Data Science" comes from KPIs and basic stuff.


> The most valuable people in the data chain will be those that can take idea to near-production.

Having hired many data scientists/ML engineers over the years, people that build robust automated intelligence directly into products are extremely rare. I've estimated a maximum of 10K people in the entire world. They also command the highest salaries, not coincidentally. Very few people have both the statistics and engineering backgrounds as well as temperament to be successful, particularly when the problem requires new data sources or new types of models. There are some real simple practical hurdles such as the need to implement robust tracking that allows data snapshotting at the time when a decision needs to be made without affecting product performance, as well as figuring out how to gather data on users/situations that are actually important for moving the needle (first time users, casual users, etc.) There is also a mismatch between the best frameworks for prototyping/research and implementation which (at least at the companies I've worked for) can be summarized as "Java is good for application development, not ML. Python is good for ML, not application development."
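
To make the snapshotting hurdle concrete: one common pattern is to log the exact features the model saw, off the request path, so later training data matches serving-time reality. A rough sketch, with all names invented:

    import json, queue, threading, time

    snapshots = queue.Queue()

    def writer():
        # Drain snapshots to an append-only log, off the hot path,
        # so product latency isn't affected.
        with open("decision_snapshots.jsonl", "a") as f:
            while True:
                f.write(json.dumps(snapshots.get()) + "\n")
                f.flush()

    threading.Thread(target=writer, daemon=True).start()

    def decide(user_id, features, model):
        decision = model.predict([list(features.values())])[0]
        # Record exactly what the model saw at decision time, so
        # future training data isn't skewed by later state changes.
        snapshots.put({"ts": time.time(), "user": user_id,
                       "features": features, "decision": str(decision)})
        return decision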


We don't need [old but still relatively new definition, whose meaning still isn't fully agreed on or established]!

We need [brand new definition of the same, which most people are even more confused what it means and how it's different from the old]!


Agree with the general idea. Data engineers are like the essential workers of the data world: people who today may not receive the appreciation they deserve.

I think the glut of data scientists occurs because we clump so many different skills and disciplines under the single term "data scientist". Data scientists today come from so many different backgrounds that the term means something different to everyone. Because of this, the surface area of possible skills that could be expected of a data scientist is vast, to the point where it's pretty unlikely anyone is sufficiently competent in all of them, let alone a majority.

I'd like to go back to a world where we had a little more specificity about what kind of data scientist you are (e.g. I had no problem with terms like statistician and data miner), which could help ground expectations that others have of us, and it'd also help clearly define the scope of various career paths for the next generation.

Sadly the individual who coined the term shows no contrition for the degree of confusion that the rest of us have been left to deal with: https://observer.com/2019/11/data-scientist-inventor-dj-pati...


Full Stack Data Scientist (data janitor + data engineer + ML engineer + ML Ops + Business Analyst) is the future


These are incredibly disparate skill sets. Of course anyone would want to hire someone like this, and far more would claim to possess such a broad skill set, but in practice it is extremely rare.

You'd need someone with excellent communication skills (presentation, memo writing, teamwork), project management skills (identifying & overcoming workflow bottlenecks), professional skills (timely responses, political savvy), technical skills (application programming, advanced databases, advanced machine learning, Excel modeling) and finally some business domain knowledge.

This is an uncommon intersection of skills.


> You'd need someone with excellent communication skills (presentation, memo writing, teamwork), project management skills (identifying & overcoming workflow bottlenecks), professional skills (timely responses, political savvy), technical skills (application programming, advanced databases, advanced machine learning, Excel modeling) and finally some business domain knowledge.

This is pretty much the bare minimum requirement for any data scientist job I've ever interviewed for or held.


In my experience, software engineers do not make good business analysts (and data/machine learning engineering is a subset of software engineering). Most business analysts cannot program.

However, it's likely that our experiences simply diverge here.


I'm talking specifically about data science, not business analyst or software engineer.


As the saying goes:

If you're looking for a data scientist with XYZABC skills, that's not a data scientist, that's a data science team.


I've seen this in leadership who want to move to "DevOps"; it's the classic "if we find this one person who can do everything, we will have no problems!"

The reality, of course, is that nobody can be amazing at the full lifecycle of an application. Some do better in infra, some in backend, some in front end, etc.

A successful leader must find what is needed for the product/application pipeline and hire appropriate skill sets, trying to find the one candidate to rule them all is giving up on planning IMO.


I, interestingly enough, have that skill set (mostly) and probably a broader set of technical skills than you're imagining. I use it to hire a team of specialists under me and interact with other specialized teams (ie: I speak their language) since I lack depth in too many areas. I wouldn't ever imagine hiring a clone of myself except in cases where I can't build out a larger team for a long period of time.


Exactly, these are characteristics of a unicorn. I think most of these skills are trivial to build up over time through practice and self-learning, and they can yield great benefits for both employers and employees.


I guess in a vacuum each of those skills is easy to build up through practice and self-learning (which, let's remember, many people struggle with to begin with). However, I think the fact that you refer to people possessing all of them as "unicorns" should be telling as to how trivial it actually is to build all these skills beyond a merely passable level.


Or you can recognize that they're the characteristics of a unicorn and split the role into multiple positions.


I think the point is that these skills are not trivial to build up over time


Maybe my sense of terminology is warped, but I always thought of

    DataEngineer = DataJanitor
      ∪ MlEngineer
      ∪ MlOps
      ∪ BusinessAnalyst
Data Scientist is more like some combination of statistician, "whatever ML is if it isn't statistics", lightweight mathematician, data janitor (yes, there is overlap), business domain specialist, and code monkey.


Ugh, all this gets you is being mediocre at all of this.


yes, if you need to roll-up everything on your own from scratch.

NO, if you use the right amount of automation and software (a usable data science workbench with MLOps built in, a usable and scalable ETL/ELT framework, usable AutoML, etc, etc.)


You still get someone mediocre at everything; the tooling just covers up the gaps for a bit longer. Eventually things they don't understand will interact in ways they don't understand and cause production issues. It's okay to be a generalist; one should however understand the blind spots a generalist has.


Agreed. In my decently long career the types of data problems I've seen be most impactful on the business are not head-in-the-clouds ML issues, but more mundane yet more far-reaching:

1. Appropriately identifying what data needs to be captured from a product to correctly operationalize it.

2. Understanding and modeling data structures in internal applications to identify and tune backend data storage mechanisms (including DBMS). Inclusive in this is helping the application development team pick the correct structure and implement it correctly.

3. Validating implementation of instrumentation within the application so that data cleaning isn't necessary and that telemetry can be appropriately reported on. Building said reports.

4. Doing ETL and taking care of out of band data management to link disparate systems within the business to help build holistic views of the business overall.

5. Being a safeguard against the over-collection of data, because data engineers understand that data isn't an asset; it's a liability that increases costs and risks as a business or product scales, and when there's no specific need that can be articulated clearly for that data, collecting it is a user/customer-hostile action.

My experience has been that data is a crucial element for understanding the health and state of the business, with both breadth and depth, at a given point in time, and for identifying trends. However, it's mostly used by folks in management as a crutch to try to de-risk decision making, or worse, as a political tool to lend faux support to a decision that's already been made but not yet publicized. Decisions carry inherent risk, including the decision to do nothing; you cannot eliminate this, it's one of the components of decision trade-offs. This broken use of data by management is supported by "Data Scientists" who see the field as a cash cow they can milk while they work on pie-in-the-sky ML strategies which are often unnecessary, even when they actually work.

Done correctly a strong data culture in a company can increase decision velocity, empower engineers, and reduce overhead on management to understand the business. Done improperly, data culture in a business can easily destroy decision velocity, empower dysfunctional politics, and increase engineering overhead to understand systems. Getting it right is the main test for businesses in the new era.


Couldn't agree more. Found this article that aptly sums up the issue: "hiring a data analyst won't solve your business problem" rofl https://medium.com/swlh/why-hiring-a-data-analyst-wont-solve...


This sounds reasonable. The trick is to identify companies/cultures of each type at interview time. But is that even possible?


I recommend the book: Agile Data Science by Russell Jurney[1]. The tech stack is circa 2017, but the chapters on the Agile Data Science Process and Teams are timeless.

He clearly articulates team roles: from Biz Devs, marketers, PMs, UX designers, UI designers, Web Developers, API Engineers, Data scientists, Applied researchers, Platform/Data engineers, QA engineers, DevOps Engineers.

Then he talks about different ways to increase agility by combining these roles into generalists empowered to iteratively explore the "pyramid of data value" until the right product-market fit is found.

Building Data-science Intensive Web Applications is inherently waterfall, not agile, and I find this book to be a fascinating reference.

[1] https://www.oreilly.com/library/view/agile-data-science/9781...


I'm a data engineer for most of my day right now, and a lot of it is done with ruby/python/shell scripts into postgres DBs.

What learning path should I go down? I'm a solo actor at work with a lot of agency to decide my workflows.

I see myself building small to medium size data collections over the next year or two at my job.

Can someone point me to some learning?

I have a CS degree and my title is software engineer, etc.

End users of my data usually like their data as a CSV that is then read using R or Python. However there is also a use case where I will build an app to view my data in a simple way.

All of this is completely doable with my current knowledge/workflow, but I can't help feeling that I do a data engineering job with very different tools than I see "data engineers" speaking about online.


A couple of days ago, there was a thread about "How to become a data engineer in 2021": https://news.ycombinator.com/item?id=25728198


thanks!


Adopt a good workflow tool, like Apache Airflow. Easiest is to rent a managed service from AWS: https://aws.amazon.com/managed-workflows-for-apache-airflow/
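
For a sense of what Airflow buys you, a minimal DAG is just Python; scheduling and dependency ordering come for free. A sketch with placeholder task bodies:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull data from source")    # placeholder task body

    def load():
        print("write data to warehouse")  # placeholder task body

    with DAG("daily_pipeline", start_date=datetime(2021, 1, 1),
             schedule_interval="@daily", catchup=False) as dag:
        # Run extract, then load, once per day.
        PythonOperator(task_id="extract", python_callable=extract) \
            >> PythonOperator(task_id="load", python_callable=load)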


Seconding this, Apache Airflow is awesome.

I can't believe how much time it saves during development.


thanks!


As an industry we're letting history repeat itself and making all the same mistakes.

There are different kinds of developers. At its most basic, you have systems-focused developers and algorithm-focused developers. Sure, there is a grey area, but I think those two buckets are pretty defensible.

In the data science world you have an exact parallel. Those who build the systems and those who optimize the thing the system supports.

In the ML world you have another parallel. Those who build the systems and those who optimize and pioneer the model architectures and parameters.

We never reached consensus on the titles for different kinds of developer/programmer/computer scientists. And we're failing now to reach consensus on sane titles for ML and DS.


This has been a bit of an annoying thing for me for quite a while. There is a huge difference between a data engineer and a data scientist. A data scientist is not and should not be a data engineer. These are 2 different specializations that work together.

A data engineer is more of what we used to refer to as people who wrote utility processes to process data or do system optimizations. At some point the industry decided to do away with a lot of the things we used to do (desktop apps, distributed systems, etc) and moved to REST services only. Then people realized, oh wait, we can't process data on a rest/web app. In typical inexperienced fashion, people tried to cram it in there, but it doesn't work. (See Javascript neural networks.)

What is data engineering? It's all about moving data around efficiently and processing it in a way that is performant and reactive. A lot of people tie hadoop/spark to being a data engineer. That's a terrible way of going about it. The more modern approach is using streaming platforms and reacting to events. (Sadly a lot of the ML stuff is tied to either python/tensorflow or spark.)

At times data engineering gets pushed towards data maintenance and dumping all of the data in a bucket. This isn't a very valuable use of effort, but people want to be buzzword compliant.

Note: There are use cases for hadoop and spark, but those are rarer now. (They're better for very large datasets, and for merging data where you have a much longer timeframe to get the answer.)


The unspoken assumption in this article (and many of the replies in this thread) is that all statistical or machine learning problems are online problems, where data must be ingested, processed and analyzed continuously and indefinitely. For something like a SaaS app that's a no brainer because you have all of that data anyway, but the overwhelming amount of business analysis / optimization / modeling / prediction still revolves around one-off analyses and specific experiments with specific goals, for which all of this data engineering is overkill.

I've worked in teams that didn't get anything done because the data was a mess or there wasn't any, but I've also had the experience of spending so much time on the engineering side of things that we stopped asking what kind of things would actually be useful to know or learn or test. Computer scientists and engineers in general often have this attitude of "okay, let's take things one step at a time, first we focus on the data and get that 100% right" with a good amount of smugness about these data scientists and their messy, unorganized code, but consider that maybe they know something you don't.


The overall point that there is demand for data engineering skills seems valid but in reality you don't have to pick between data scientists and data engineers, it's not one or the other. I don't know if the argument is set up this way just to get more clicks or to encourage arguments but it would have been better to just focus on the overall state of the market and the demand for certain skills.


Data engineer, here. Or at least, that has been my title a couple of times.

Some data is inherently trash but a huge part of the data quality problem is sources who are allowed to produce trash that everyone else has to clean up, when it would be way more efficient for them to quit producing trash.

Not to pick on any one institution, but SOAP seems to be a red flag that the service will also deliver some screwy data.


Every time I see SOAP in a ticket/task I die a little inside. Without fail every SOAP related project I've dealt with in the last 10 years has been a shit show. I've actually and unfortunately gotten pretty good at dealing with it but I so hate it.


Good data is hard. Anyone conducting research in the physical sciences knows this firsthand. It takes painstaking effort to conduct carefully controlled experiments and collect a batch of good data that could then be used for analysis.

The promise of ML has always been to churn out good results from not-so-good data. If I now need to sanitize my data carefully, what's the advantage?


What makes you think that was the promise? I would say the promise has been to replace code with data, and to build things with data that would be practically impossible to build with code.

"Garbage in, garbage out" has been the mantra for a long time. Sure, there are tools and techniques to deal with not-so-good data, but those are add-ons, not a core part of the value proposition of ML.


What about empowering the magician to handle some of the blacksmith's work?

We need to start glorifying the more mundane tasks (data quality management, fixing broken ETLs, aggregating data) and tying them to the bottom line. This comes up a lot at fintech companies, as lordnacho mentions: data science makes the data look pretty for your CEO, but the data wouldn't even be there if it wasn't for the data engineer.

We need self-service data platforms that hold magicians accountable. I'm seeing this more and more, but there's still a lot of work that needs to be done in this arena: https://towardsdatascience.com/how-to-build-your-data-platfo...


I'm a scientist first, data person second. I'll use the label data scientist if that's what the employer wants. I'll use data analyst, scientist, researcher, or any other term, so long as they pay me.

What matters is: are we contributing to the betterment of society with data-driven decision making?


Another aspect: so many PhDs and postdocs in the sciences do similar work but get paid a third or less of the wages. We shouldn't be surprised they snap up industry jobs in data science; it's far more remuneration than academia, with less stress IMO.


Just my personal experience, but the people at my company titled "Data Engineers" basically can only be trusted to (very slowly) move data around, while the people titled "Data Scientists" have to do all the cleaning work to make the data suitable for analysis and modeling, in addition to doing that analysis and modeling.

Is the point here that data scientists are doing too much of the work that should be handled by data engineers? If so I agree, but there are some org barriers. For one thing, our data engineers are not accountable to any particular project. All we can do is send them tickets to move data around, and it already takes them weeks to move a table from one database to another. I can't imagine the headaches if we gave them anything nontrivial to do.


That sounds... very weird. Something is for sure wrong in your org. Anyway, the point, I think, is that data engineers and data scientists should be integrated in one team and not operate in different silos.


I'm a SWE and data engineering actually sounds super interesting to me. Unfortunately, my day-to-day doesn't provide opportunities to work with the massive amounts of data we generate. I've looked into learning this stuff online but courses like DataCamp seem too basic (I have experience with Python and data cleaning in a research setting along with some academic ML experience) or downright a bit scammy. Many of the articles I read online about this also frame data engineering as a way to transition into the tech industry, which isn't my blocker. Does anyone have advice to help me transition away from pure software to a job as a data engineer?


We just hired a SWE turned Data Engineer. You don't need to handle massive data to make the transition. I believe all the person did was build a small but robust pipeline that took some API response data, cleaned it, populated some sqlite dbs, replicated it for a small team to use and kept it updated every few days automatically.


That's good news for me. For a minute, I genuinely thought I'd have to waste time going through a bunch of stuff I already knew to learn a tiny bit and earn a certificate to prove my skills.


That's an interesting post; however, in my opinion there's a large bias in how the analysis is done.

You have a stratified sample of companies in their early stages; I think it's quite normal for most companies in their early stages to prioritize data engineers rather than data scientists.

The data scientist comes after the data engineer, and if you have a data scientist and not a data engineer, then the data scientist probably does both jobs. On the other hand, a data engineer is not dependent on a data scientist.

To conclude, I think that indeed there are more data engineer positions because there are too many "data scientists", however the true difference is not as large as in your analysis.


In many cases, what is needed isn't even more data engineers; it's data janitors.


Keeping in mind DE can mean different things at different companies, I spend a lot of time working on infrastructural components to just get at data reliably. Working in a product company with disparate generators of data, I’m often building out network connectivity (VPC peering, VPNs, etc.), subnets, ACLs, firewalls and load balancers across our visualization tools, managing job flows, controlling AWS costs, building read replicas for production databases, yadda yadda. There might be a ton of hoops to jump through before I can even start to process data and it’s the type of work that wouldn’t make sense to hand off to my DS counterparts.


Disclosure: I'm a just plain scientist, without the data. ;-)

Since "data science" appeared on my radar (in other words, on HN), I've noticed that we fling the "scientist" and "engineer" terms around without asking whether the practitioners have science or engineering backgrounds, or something else such statistics, math, programming, etc.

It strikes me that "do we need scientists or engineers" is not unique to data science/engineering. I think we need both, but of course it's an open question as to how many of each are needed. Also, both "scientist" and "engineer" are loosely defined in practice, with some overlap.

Painting with a broad brush, a scientist wants to learn how things work. An engineer wants to make things that work. If you're in the business of making things that work in order to sell them, then you need lots of engineers, but maybe a few scientists. It's about 10:1 at my workplace.

Overlap? Of course. Engineers use scientific knowledge and methodology. Scientists have to make things work in order to make experiments and theoretical computations work. Also, scientists often show up in emerging fields before they become recognizable engineering disciplines, for many reasons: 1) We need the newest stuff, right away. 2) We are opportunists by nature and necessity. For instance many of the older "software engineers" at my workplace have science degrees, but the younger ones all have computer science degrees. The oldest people I know who did programming, digital logic, and embedded systems, did not have degrees in those areas. The youngest ones all do. At my college, the physics professors had personal computers, and were ripping them apart, while the CS professors used the mainframe. That's part of why I chose to major in physics, though I was interested in programming.

Another reason for overlap is that both "scientist" and "engineer" titles include things that are not strictly within either area. A lot of people with science degrees end up working as technicians, marketeers, salesmen, managers, etc. A lot of people with engineering degrees do little or no quantitative engineering, but work as programmers, designers, marketeers, salesmen, managers, etc.

Something to ponder is if there are differences between how scientists and engineers think and approach problems. Naturally there's a lot of folklore about that, probably little hard evidence. That's where I think you should start if you want to know whether to hire scientists, engineers, or both.


Building or running data infrastructure is an important part of 55% of 372 data engineers’ jobs, according to the “2020 Kaggle Machine Learning & Data Science Survey.” ...Check out three charts I created showing differences between data scientists, data engineers and machine learning engineers: https://thenewstack.io/software-engineers-use-spreadsheets-d...


This is interesting. I was the technical founder for a data startup that used NLP, elastic and some other stuff for analysis. It's still active, growing and approaching profitability, is used by fortune 500 companies and has had some media attention. However, I've never been approached for a related role and have never been invited after applying for similar roles.

Maybe my resume is bad, maybe my experience doesn't really fit anywhere, but I thought it was an interesting observation in light of this article.


Just like with data science, don't we expect data engineering to reach a point where best practices are codified into tools, so 1x data engineers can be closer to as effective as 10x data engineers? The amount of data engineering work won't change, but the number of people needed to do it will shrink, reducing open headcounts.


[deleted]


This hits me at a personal level.

This is how I imagine programmers must have felt in the 80s and 90s.


Isn't the new buzzword "machine learning engineer"? https://www.datarevenue.com/de-blog/machine-learning-enginee...


In other words: we need plumbers.


As a data engineer I've made the same joke.

But the statement is also a bit like saying you can use plumbers to design and build a chemical refinement plant which also just moves chemicals from point A to point B. Or you can design a citywide sewer system with a bunch of plumbers.

There are many cleansing, refining, orchestration, dependency, data quality, governance and optimization problems to be solved and a wide variety of tools that have for whatever reason never grown into higher-level open source frameworks and are thus reimplemented in various forms in many places.

Data engineering (somewhat like software engineering actually) doesn't require much if any of the math and physics I took in engineering courses in college, but it does require rigorous systems thinking about how to design and build structures that withstand adverse conditions that are thought patterns common to other engineering practices, so I don't think it's a totally crazy title for the role.


Or, if the data is food, we have chefs making incredible plates, but we need wait staff to get it to the people who want to eat it.

I would love to be on that wait staff; as an infradev I feel like the process is very close to me but I am struggling to break into it.


I would put data engineering on the supply side of the chef: This would be ingredients, delivery scheduling, and pre-prep functions. That sort of thing.


Why then are data scientists paid a ton more? Seems they are in more demand.


I don't find it too surprising. I think more people would want to work with the conceptual work in the data science space, like models. The ETL stuff seems extremely boring and it seems it pays less too.


Why is data scientist a profession in IT but not for example computer scientist? Many IT professionals studied computer science but they don't call themselves scientists in their line of work.


Basically this manager guy had like a dozen job titles under him, such as Business Analyst, Data Analyst, ML Engineer, and sundry more. Then his HR team came to him and told him to just boil it down to 1. So he made up this title "Data Scientist" because it sounded badass, and then everyone from both the business analytics side and the software engineering side decided this sounds awesome and they want it to be the next step of their career.

Then the federal government created a similar title, U.S. Chief Data Scientist, and the guy who made up the term went on to hold it.


Don't give developers ideas. People with a javascript bootcamp and 2 years of experience are already called "senior engineer".


We have a PhD computer scientist working as a software engineer. Reinvents the wheel constantly. Over-engineers everything. Clever algorithms but poor abstractions. Needs to have the "best" (from a CS perspective) solution to everything rather than the most maintainable or pragmatic one. He is smart, but doesn't write code that is easy for the next developer to pick up.


"Computer science" is a well-known misnomer as it is not about computers nor is it science. It's generally reserved for the academic study of computing. Data science is much closer to science.


workera.ai has a very interesting approach to measuring, categorizing, and visualizing where you are on the skill spectrum and what employers are looking for on that spectrum.


What's the career trajectory for data engineers?

I enjoy the pipeline building and business stakeholder interfacing, but I'm not sure I want to be a SWE a decade from now...


How is that different from a SWE? I see DE as a specialized SWE; tons of overlap, but DEs focus on different tools and concepts than other SWEs.


The article is long overdue. So many times I have seen data scientists glue together non-production-ready code and libraries, expecting someone else to finish the job and put the project into production. Various places I worked at soon realised that we mostly, if not exclusively, needed data engineers (glorified DevOps people who make production-ready data pipelines) more than we needed data scientists to do "modelling".


What are the typical entry level Data Science/Engineering positions like? Are they available to fresh college or bootcamp graduates?


From everything I'm hearing Data Science is right now flooded with entry level applicants to the point where they're applying for even pure Data Analyst jobs. Data Engineering a lot less so.


The problem that I've seen is often "data scientists" are expected to be the equivalent of full-stack engineers (or maybe more accurately: one-man CTO shops)—to understand data architecture, understand business architecture, ensure data quality, build data into product, build dashboards, derive insights, posit hypotheses, set strategy, and drive business value.

Thus many "data scientists" are juiced-up report-builders who can't analyze their way out of a paper bag.


This is uncharitable.

In my experience, this is true:

> "data scientists" are expected to be the equivalent of full-stack engineers (or maybe more accurately: one-man CTO shops)—to understand data architecture, understand business architecture, ensure data quality, build data into product, build dashboards, derive insights, posit hypotheses, set strategy, and drive business value.

But this is not:

> Thus many "data scientists" are juiced-up report-builders who can't analyze their way out of a paper bag.

Rather, the data scientists are trained in only two of the requirements you mentioned: deriving insights and positing hypotheses. The rest is all self-study and on-the-job experience. This means that we are putting unrealistic expectations on data scientists and/or their training is insufficient, not that data scientists are somehow morons.


Now it's my turn to claim uncharitability.

Indeed, my context here is that people who wear the data scientist title come from multiple backgrounds, and are often asked to wear too many hats. They are not morons; they may be darn good report-builders who simply haven't been trained in deriving insights, for instance.

If you're reacting to my word choice in that last sentence, know that I am frustrated with people who claim to be data scientists but can't derive insight. (And we can argue about "many".) But that's not a broad denouncement against all data scientists, either.


It's the new CIS.


+1. I generally say that data scientists bring an amazing skill set to the table, but companies can only leverage 10% of it.


Who gets paid more? Maybe the question to ask is why market forces are not naturally producing more of what is needed?


This post is so timely. How do you guys handle hiring? I'm looking to hire more data engineers (Singapore only for now), especially senior ones, and it's not easy. My gut says to just find good engineers with strong first principles and let them learn on the job. Feedback/experiences welcome!


Curious to hear this from the other side. I'm an SE manager interested in pivoting to Data Engineering. I've built modest pipelines in R / Postgres for an operations analyst job I did before I became a programmer, but I'm not experienced with most of the technologies I see listed on DE roles (e.g. Airflow) and my statistics etc. are rusty.


I see these experiences as nice-to-haves. They just mean you'll be able to hit the ground running faster. Personally, I'd value someone who shows they can learn quickly.

That said, it's trivial these days to spin up a few VMs, install Airflow, and try some practice scenarios just to get your hands dirty.
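
A first practice DAG can be tiny. Here's a minimal sketch against the Airflow 2.x Python API (the dag_id, task names, and the extract/load stubs are all made up for illustration):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # stand-in for pulling rows from a source system
        print("extracting...")

    def load():
        # stand-in for writing rows to a warehouse
        print("loading...")

    with DAG(
        dag_id="practice_pipeline",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task

Drop something like that into the dags/ folder, let the scheduler pick it up, and you can practice backfills, retries, and failure handling from there.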


What scientist doesn't use data? Such terrible naming.


I think you can just replace this with: we need engineers.


What does CV stand for in this context?


Computer vision (i.e. deep learning on images / video)


Thanks


We don't need data engineers, we need great tools and storytellers.


I would go even further. We don't need data, we need knowledge.


"When art critics get together they talk about Form and Structure and Meaning. When artists get together they talk about where you can buy cheap turpentine." — Picasso.

^Culture is good at romanticizing the "dreamer" as divorced from (& higher than) the "doer"/implementer. Picasso might protest. This post is helpful in inviting us to examine this instinct/tradition.

But that good contrariness doesn't excuse us from being more thoughtful about the chain of deduction underlying the titular claim. First, many big ideas (Maxwell's equations of electromagnetism; that software should be Free as in both Beer and Speech; that unruly democracy could sometimes be a messily, weirdly good way of organizing people) naturally require more doers and thinkers to implement than just the mind, or few minds, that happened to crystallize them. Non-unitary stoichiometries for progress are the rule, not the exception. Nudging a culture is already a many-body problem, and because details matter, and details scale exponentially with levels of abstraction, a project's success can improve with the number of engineering minds adding leverage to advance it.

If we take the above earnestly (that making ideas useful usually requires more people than those who happened to express the idea), then noticing that more job postings exist for implementation/engineering roles than for "science" roles actually says nearly nothing about whether we as a technological culture are out of balance between science and engineering, since we know so little about the typical ratio of implementers to dreamers.

It could be that there are plenty of good "scientific" ideas in circulation; maybe what separates us from progress is earnest implementation, reflected in the empirical over-demand for engineers (as this post seems mainly to argue). The aggressive scaling laws for improving AI (existing paradigms, just with broader compute) are tempting support for this conclusion.

But personally, I think it comes down to your position on this underlying question. Do you believe that fundamentally _better_ data paradigms (e.g. those that actually compute differently and more (super)humanly) will come from ideas already articulated in the conceptual universe? Or do you think that the key to smarter data science, if it exists, has yet to be invented and may little resemble the ideas now dominant in circulation?

If the latter, then we may most desperately need data _scientists_, in addition to engineers! In the sense that society would benefit enormously from generating 1 new idea from "science," even if 1000+ had been funded but not panned out. This is true in a global, pro-social sense, but also likely on an individual basis: if you are a thinker, then surely part of what matters is which role maximizes your ∂impact/∂effort, and probabilistically, the science of data is at least competitive with engineering if you think the future will look different than the present and needs to be invented.


why not both?


Oh, but maybe we don't need either?

It seems that data science is used primarily for advertising. Local internet communities are dead. Message boards are dying. Everything is either reddit, discord, Steam forums, or BoardGameGeek. In 2021, the GOG forums pass for a small forum. Only the biggest can float in the ocean of spam.


Advertising? I would say the biggest sectors for data science are healthcare and finance.


While we like to pooh-pooh the theoreticians, there are some remarkable results proven purely through thought experiments and math that are only confirmed and used many, many years later. People would not even think to look into such things if theoreticians had not come up with the original abstract proof.

For example, there are more combinations of characters for a text message of this length than there are particles in the entire history of all possible multiverses. Clearly I need to use something besides blind trial and error to write a halfway coherent HN comment. Instead I must rely on a little mental theorizing just to rant on the internet.
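
The back-of-envelope arithmetic holds up. A rough check in Python, assuming ~95 printable ASCII characters and a 500-character comment (both numbers are illustrative assumptions):

    import math

    alphabet = 95   # printable ASCII characters (assumption)
    length = 500    # characters in a mid-sized comment (assumption)

    # log10 of the number of possible comments, i.e. 95^500
    print(length * math.log10(alphabet))  # ~988.9, so about 10^989 combinations
    # versus roughly 10^80 particles in the observable universe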


I want to live in a world where data scientists making nearly $500K can understand and correctly implement simple concepts such as fixed effects. Is that asking for too much?
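
For reference, the basic version of entity fixed effects is just a dummy per entity. A minimal sketch with statsmodels (the toy data and column names are invented):

    import pandas as pd
    import statsmodels.formula.api as smf

    # toy panel: outcome y, regressor x, entity identifier "firm"
    df = pd.DataFrame({
        "firm": ["a", "a", "b", "b", "c", "c"],
        "x":    [1.0, 2.0, 1.5, 2.5, 0.5, 1.5],
        "y":    [2.1, 4.0, 3.6, 5.4, 1.2, 3.1],
    })

    # C(firm) expands into one dummy per firm, i.e. entity fixed effects
    model = smf.ols("y ~ x + C(firm)", data=df).fit()
    print(model.params)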


Sure, but what about at $80k?

None of this discussion is helped by the fact that companies want to get into data science without actually having much data strategy.



