
It's true that scikit-learn was started and originally written mostly by PhD students (most were in fact CS PhDs), and the API they designed is amazing! Much of the Python ML ecosystem has adopted it: fit, predict, transform. I don't think any other language has something comparable.
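For readers who haven't used it, here is a minimal stdlib-only sketch of the convention being praised. MeanCenterer and NearestMeanClassifier are hypothetical toy classes, not scikit-learn code. They follow the scikit-learn habits of fit() returning self and storing learned state in attributes with a trailing underscore.

```python
# Toy estimators following scikit-learn's conventions (hypothetical, not sklearn code):
# fit() learns state into trailing-underscore attributes and returns self,
# transform()/predict() then use that learned state.

class MeanCenterer:
    def fit(self, X, y=None):
        n = len(X)
        self.mean_ = [sum(col) / n for col in zip(*X)]  # per-column mean
        return self  # returning self allows MeanCenterer().fit(X).transform(X)

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.mean_)] for row in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class NearestMeanClassifier:
    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        # One mean vector per class label
        self.means_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sqdist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        # Assign each row to the label whose mean is closest
        return [min(self.means_, key=lambda lbl: sqdist(row, self.means_[lbl])) for row in X]


X = [[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]]
y = ["a", "a", "b", "b"]
centered = MeanCenterer().fit_transform(X)
labels = NearestMeanClassifier().fit(X, y).predict([[2.0, 2.0], [8.0, 8.0]])
print(centered)  # [[-5.0, -5.0], [-4.0, -4.0], [4.0, 4.0], [5.0, 5.0]]
print(labels)    # ['a', 'b']
```

Because every estimator exposes the same few methods, generic tools can chain and swap them without knowing what is inside.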

Four years ago they removed a misleading class, and even at the time the documentation was clear about what it was doing. I'm not sure how this reveals some huge flaw in scikit-learn. If anything, it shows that the contributors can recognize their mistakes and fix them without anyone needing to point them out. That's great!

Also, pointing to a bad implementation from 4 years ago, in a project that has since had far more funding for engineering time and whose use has exploded, seems a bit misleading.




See the second link I posted. Even the three most basic functionalities are badly designed. If X is your input space and Y your output space, then fit should (after each call) return a function X -> Y rather than modify some internal state.
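To make the contrast concrete, here is a small sketch using a hypothetical 1-D least-squares line fit. fit_line, LineRegressor and _least_squares are illustrative names, not any library's API. The functional version returns a new function X -> Y on each call. The scikit-learn-like version overwrites its attributes, so an earlier "fitted model" silently changes when the same object is refit.

```python
from typing import Callable, List, Tuple


def _least_squares(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def fit_line(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Functional style: fit returns a fresh model (a function X -> Y); nothing is mutated."""
    intercept, slope = _least_squares(xs, ys)
    return lambda x: intercept + slope * x


class LineRegressor:
    """Stateful style (scikit-learn-like): fit overwrites internal attributes."""

    def fit(self, xs: List[float], ys: List[float]) -> "LineRegressor":
        self.intercept_, self.slope_ = _least_squares(xs, ys)
        return self

    def predict(self, x: float) -> float:
        return self.intercept_ + self.slope_ * x


xs = [0, 1, 2]
f1 = fit_line(xs, [0, 2, 4])  # y = 2x
f2 = fit_line(xs, [1, 1, 1])  # y = 1
print(f1(10), f2(10))  # 20.0 1.0  (both models survive)

reg = LineRegressor()
m1 = reg.fit(xs, [0, 2, 4])
m2 = reg.fit(xs, [1, 1, 1])  # refitting mutates the same object
print(m1 is m2, m1.predict(10))  # True 1.0  (the first "model" is gone)
```

In scikit-learn the usual workaround is sklearn.base.clone, which makes an unfitted copy of an estimator with the same parameters before each fit.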

Have you ever looked at cross-validating a pipeline, where you have to pass a dict of parameters whose keys carry a double-underscore prefix naming each stage in the pipeline (e.g. clf__C)? Do this and you'll never call the API design amazing again.
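In scikit-learn, a Pipeline's parameters are addressed as <step name>__<parameter> (for example clf__C), both in set_params and in the param_grid passed to GridSearchCV. The toy sketch below shows what that naming scheme has to do: split each key on "__" and route the value to the right stage. It assumes a pipeline modeled as a plain dict of per-step parameter dicts. ToyPipeline and expand_grid are hypothetical and are not scikit-learn's implementation.

```python
import itertools


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: named steps, each a dict of parameters."""

    def __init__(self, steps):
        self.steps = dict(steps)  # step name -> that step's parameter dict

    def set_params(self, **kwargs):
        for key, value in kwargs.items():
            # Split on the first "__" only; deeper nesting stays in `param` as-is here
            step, sep, param = key.partition("__")
            if not sep or step not in self.steps:
                raise ValueError(f"invalid parameter {key!r}")
            self.steps[step][param] = value
        return self


def expand_grid(param_grid):
    """Yield every combination of a {key: [values]} grid, as a grid search would."""
    keys = list(param_grid)
    for combo in itertools.product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, combo))


pipe = ToyPipeline([("scale", {"with_mean": True}), ("clf", {"C": 1.0})])
grid = {"scale__with_mean": [True, False], "clf__C": [0.1, 1.0, 10.0]}
configs = list(expand_grid(grid))
pipe.set_params(**configs[-1])
print(len(configs), pipe.steps)  # 6 {'scale': {'with_mean': False}, 'clf': {'C': 10.0}}
```

The string prefix is what ties a flat grid to a nested object, which is exactly the coupling the comment above objects to.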

There are other examples of bad design choices as well.

You are right, there is no alternative at the moment. Maybe Julia will do a better job; we will see.



