
It seems to me you were given three jobs: database admin, data engineer, and data scientist.

When I talk about automated data cleaning, I mean things like preprocessing text, handling missing values, discarding duplicates, removing noisy or uninformative variables and outliers, correcting spelling, and generating feature interactions and transformations. All of these can be (and are being) largely automated. [1] [2]
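
To make that concrete, here is a rough sketch of a few of those steps (deduplication, median imputation, dropping constant columns, outlier removal) in plain Python. The function name `clean_rows`, its parameters, and the median-absolute-deviation threshold are illustrative assumptions of mine, not anything taken from [1] or [2]; real tooling covers far more cases.

```python
from statistics import median


def clean_rows(rows, numeric_cols, mad_factor=5.0):
    """Hypothetical helper: clean a list of dict rows that share the same keys.

    Missing values are assumed to be encoded as None, and all values are
    assumed to be hashable.
    """
    # 1. Discard exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    if not unique:
        return unique

    # 2. Impute missing numeric values with the column median.
    for col in numeric_cols:
        present = [r[col] for r in unique if r[col] is not None]
        if not present:
            continue  # nothing to impute from; column is dropped in step 3
        col_median = median(present)
        for r in unique:
            if r[col] is None:
                r[col] = col_median

    # 3. Drop uninformative columns (a single distinct value across all rows).
    constant = [c for c in unique[0] if len({r[c] for r in unique}) <= 1]
    for r in unique:
        for c in constant:
            del r[c]

    # 4. Remove outliers: rows further than mad_factor * MAD from the median.
    for col in numeric_cols:
        if col in constant or not unique:
            continue
        values = [r[col] for r in unique]
        col_median = median(values)
        mad = median([abs(v - col_median) for v in values])
        if mad == 0:
            continue  # no spread to measure against; skip this column
        unique = [r for r in unique if abs(r[col] - col_median) <= mad_factor * mad]

    return unique
```

Calling `clean_rows(rows, ["age"])` on a list of dicts returns a new list and leaves the input rows untouched. None of this needs a human in the loop, which is the point: the judgment calls live in the thresholds, not in the mechanics.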

A data lake with 150+ undocumented tables is garbage in, garbage out, for machines and humans alike. I'd almost label that barrier "Data not available" rather than "Dirty data". It's a reality for some companies, but such a company really needs a DB admin or data engineer rather than trying to shoehorn an (expensive) data scientist into those roles.

[1] https://people.csail.mit.edu/kalyan/dsm/

[2] https://www.ijcai.org/proceedings/2017/352



