My $DAYJOB has a very high match to this. In fact, I used to do data and analytics full-time for a long while (a lot of Spark, basically, mostly cutting through the usual lambda and kappa architecture hype, building data lakes and <omg> "lakehouses") at several F500 companies.
I could not agree more with the overall sentiment that data science over-promises on reproducible results because the people doing it just don't have an actual process or good leadership focus (which is not just a startup problem...).
So much so that I stayed staunchly on the "data engineering" track, because it was much more concrete in terms of technology, performance drivers, and business outcomes than the pipe dreams sold by folks promising magical AI models that would provide amazing analytics overnight.
Turns out that if you can't get at, scrub, and actually _use_ the data, figuring out trends or training models doesn't happen, so I focused on making at least that 50% of the project happen and leaving nice, tidy infrastructure, workflows, and schemas for the data science folks to work with.
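To make the "scrub and leave tidy schemas" part concrete, here's roughly the kind of step I mean, as a minimal PySpark sketch (the paths, column names, and cleaning rules are made up for illustration):

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    spark = SparkSession.builder.appName("scrub-orders").getOrCreate()

    # Declare the schema up front instead of trusting inference -- this is
    # the tidy contract the downstream folks can rely on.
    schema = StructType([
        StructField("customer_id", StringType(), nullable=False),
        StructField("amount", DoubleType()),
        StructField("created_at", TimestampType()),
    ])

    raw = spark.read.schema(schema).json("s3://landing/orders/")

    # Basic scrubbing: drop rows missing the key, normalize the key,
    # deduplicate, and derive a partition column from the timestamp.
    clean = (raw
             .where(F.col("customer_id").isNotNull())
             .withColumn("customer_id", F.trim("customer_id"))
             .dropDuplicates(["customer_id", "created_at"])
             .withColumn("order_date", F.to_date("created_at")))

    # Land a tidy, partitioned table for the data science side to query.
    clean.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://curated/orders/")

Unglamorous, but it's the 50% without which nothing else happens.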
I also had the good fortune to work with some very organized, knowledgeable ML folks who actually understood how things worked, but some partners and customers had... incredibly disorganized "data scientists" who would leave stuff scattered all over the place (including private copies of datasets on their laptops, even though we gave them nice, secure remote sandboxes that even did data masking to avoid leaking sensitive data).
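The masking those sandboxes did was conceptually as simple as exposing a view like this (hypothetical column names; a real deployment would lean on the platform's governance and column-security features rather than hand-rolled Spark):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("mask-orders").getOrCreate()
    orders = spark.read.parquet("s3://curated/orders/")

    # Expose a masked view rather than the raw table: hash the stable
    # identifier (still joinable, no longer reversible) and redact all
    # but the last four digits of the card number.
    masked = (orders
              .withColumn("customer_id", F.sha2(F.col("customer_id"), 256))
              .withColumn("card_number",
                          F.concat(F.lit("****-****-****-"),
                                   F.col("card_number").substr(-4, 4))))

    # Sandbox users only ever query this view, never the raw table.
    masked.createOrReplaceTempView("orders_masked")

Which makes it all the more baffling when people route around it with laptop copies.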
Personally, I blame a lot of this on the lack of certification or professional training that emphasises _process_. Otherwise it's exactly the same problem we've had for the past 20 years in BI departments: people doing their own Excel sheets because "SQL is hard" and nobody can do ETL properly.
(Full disclosure: I am an MS FTE, spent something like 10 years doing analytics almost full time, and have presented on how to do Data Science at scale a few times: https://carmo.io/talks)