Train a set of sklearn models, one per random partition of the data (computed in a distributed fashion). Then combine all those models by averaging and evaluate them against an even larger dataset. How do you do that in SQL?
Sharding the table can help scale the problem across many machines, and as I mentioned earlier, you can use the PL/R or PL/Python language extensions to lift all sorts of ML functions into SQL functions.
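To make the partition, fit, average, evaluate flow concrete, here is a rough sketch of the logic that such a PL/Python function might wrap. It is plain Python, not SQL, and it makes several assumptions: a closed-form least-squares line fit stands in for an sklearn estimator so the example runs with only the standard library, the data is synthetic, and every name (`fit_line`, `random_partitions`, `average_models`, `mse`, `make_rows`) is hypothetical. In a sharded setup each call to `fit_line` would presumably run on the machine that holds that partition.

```python
import random
from statistics import mean

def fit_line(rows):
    """Least-squares fit of y = a*x + b on one partition.
    Stand-in (assumption) for calling fit() on an sklearn estimator."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    a = sxy / sxx
    return a, y_bar - a * x_bar

def random_partitions(rows, k, seed=0):
    """Shuffle a copy of the rows and deal them into k disjoint partitions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def average_models(models):
    """Combine the per-partition models by averaging their coefficients."""
    return mean(a for a, _ in models), mean(b for _, b in models)

def mse(model, rows):
    """Mean squared error of one (slope, intercept) model on a set of rows."""
    a, b = model
    return mean((a * x + b - y) ** 2 for x, y in rows)

def make_rows(n, seed):
    """Synthetic (x, y) rows drawn from y = 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3.0 * x + 2.0 + rng.gauss(0, 0.5)))
    return rows

train = make_rows(1_000, seed=1)        # the table that gets partitioned
evaluation = make_rows(5_000, seed=2)   # the larger evaluation set

# One model per random partition, then a single averaged model.
models = [fit_line(part) for part in random_partitions(train, k=4)]
ensemble = average_models(models)
ensemble_mse = mse(ensemble, evaluation)
```

For a linear model, averaging the coefficients gives the same predictions as averaging the individual models' predictions. For non-linear estimators you would average the predictions instead.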