
Scikit-learn is a very nicely written library, and I could use plenty of superlatives to describe its wondrous API.

One thing I can't recommend enough is to subclass their transformer base class (TransformerMixin, usually together with BaseEstimator) and implement the fit and transform methods yourself. A simple example can be viewed here: https://gitlab.com/timelord/sklearn_transformers

Doing this allows you to put your transformers into scikit-learn Pipelines and GridSearchCV (and more). Scikit-learn leverages multiple cores through joblib, and Dask extends that mechanism with its own joblib backend, so the same pipelines can scale onto a cluster of servers with very little extra code: https://distributed.readthedocs.io/en/latest/joblib.html
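To make that concrete, here is a minimal sketch of what such a transformer could look like. ColumnClipper, its lower/upper parameters, and the toy data are all made up for illustration (they are not taken from the linked repo); only BaseEstimator, TransformerMixin, Pipeline, GridSearchCV and LogisticRegression are real scikit-learn names.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class ColumnClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clip each column to quantiles learned in fit."""

    def __init__(self, lower=0.05, upper=0.95):
        # Store constructor arguments unchanged so get_params/set_params work.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned state gets a trailing underscore by scikit-learn convention.
        self.low_ = np.quantile(X, self.lower, axis=0)
        self.high_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.clip(X, self.low_, self.high_)


# Toy data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

pipe = Pipeline([("clip", ColumnClipper()), ("clf", LogisticRegression())])

# The transformer's own parameters can be tuned like any built-in step.
param_grid = {"clip__lower": [0.0, 0.05], "clip__upper": [0.95, 1.0]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Because the class inherits from BaseEstimator, the grid search can clone it and set clip__lower / clip__upper without any extra code, and TransformerMixin supplies fit_transform for free.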

By writing your own data transformations in the transformer format you can, by extension, leverage this great ecosystem.
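As a rough sketch of how the parallel part plugs in: joblib lets you pick the backend with a context manager, and anything inside it that uses n_jobs follows along. The Log1pAbs transformer and the data below are hypothetical. I use the "threading" backend here because it ships with joblib; the assumption is that with dask.distributed installed and a Client running, you would pass "dask" instead, as the page linked above describes.

```python
import numpy as np
from joblib import parallel_backend
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class Log1pAbs(BaseEstimator, TransformerMixin):
    """Hypothetical stateless transformer: sign-preserving log1p."""

    def fit(self, X, y=None):
        # Nothing to learn; return self so the step chains in a Pipeline.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.sign(X) * np.log1p(np.abs(X))


# Toy regression data, purely illustrative.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))
y = X @ np.array([1.0, -2.0, 0.5])

pipe = Pipeline([("squash", Log1pAbs()), ("reg", Ridge())])
search = GridSearchCV(pipe, {"reg__alpha": [0.1, 1.0, 10.0]}, cv=3, n_jobs=2)

# The backend chosen here is used for the cross-validation fits.
# Assumption: swapping "threading" for "dask" needs dask.distributed
# and an active Client; that variant is not run in this sketch.
with parallel_backend("threading"):
    search.fit(X, y)

print(search.best_params_)
```

The pipeline and the transformer do not change at all between backends, which is the nice part: only the with-line decides where the work runs.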

I think it's a great time to be a data scientist / engineer now.



