Hacker News new | past | comments | ask | show | jobs | submit login

train a set of sklearn models one each per a random partition of the data (computed distributed). then combine all those models using averaging and evaluate them all against an even larger dataset. how do you do that in SQL



Sharding the table can help scale the problem across many machines and as I mentioned earlier you can use PL/R or PL/Python language extension to lift all sorts of ML functions to SQL functions.


I'm unfamiliar with PL/Python. Can you have a Python object be the returned value of a sql query? Because that's a requirement of my example.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: