Hacker News new | past | comments | ask | show | jobs | submit login

Yes, completely agree that each dataset requires decisions to be made that can't be automated, but there are huge opportunities for tools to assist users in understanding what cleaning decisions they might want to make and how those decisions affect the data. Most data cleaning tools do a very poor job of helping the user visualize and understand the impact cleaning has on data - they're usually very low level (such as pandas).

As an example of a tool: Trifacta (disclaimer I work here) https://www.trifacta.com/products/wrangler/. We're trying to improve data cleaning with features such as suggesting transforms the user might want, integrating data profiling through all stages to discover and understand, and transform previews so the user can understand the impact.

I think there's a huge opportunity for better tools in the problem space.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: