Next, it'd be interesting to see Python 2 vs. Python 3. In my own experience, the majority of top Kagglers still use Python 2, even though Kaggle Kernels are Python 3 only.
I am also quite amazed by the predominant use of Logistic Regression. I wonder if that is less about interpretability or ease of engineering, and more about the barriers data scientists face when using more complex methods: lack of data science talent, lack of management support, results not being used by decision makers, and limitations of tools.
If Kaggle results are anything to go by, all businesses that care about best performance on structured data should be using a form of gradient boosting.