Great comments. I heartily agree with and support the statement about probabilistic graphical models. Just to add a couple more facets to this perspective:
'State of the art' does not always mean 'best for your task'. In fact, lately, depending on your field, SOTA can simply mean 'unaffordable' for anyone whose budget is under $1 million.
Try linear methods first.
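A minimal sketch of what such a linear baseline can look like: a closed-form ordinary least-squares line fit, standard library only. The toy data and the helper names `fit_line` and `mse` are made up for illustration. The resulting error is the number any fancier model then has to beat.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # OLS slope = sum of co-deviations of x and y / sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data, roughly y = 2x + 1 plus a little noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept = fit_line(xs, ys)
baseline_error = mse(xs, ys, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} mse={baseline_error:.4f}")
```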
Ensembles of decent models are usually good models. The probability-calibration problem raised above can be at least partly mitigated by averaging the ensemble members' predictions.
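A small sketch of the averaging idea, assuming binary classifiers that each output P(y=1). The per-sample probabilities are averaged across models ("soft voting") and scored with log loss, which heavily penalizes confident mistakes. The three models' outputs below are invented for illustration.

```python
import math


def log_loss(labels, probs):
    """Mean binary cross-entropy; lower is better, and confident mistakes cost a lot."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def ensemble_average(model_probs):
    """Average each sample's predicted P(y=1) across models (soft voting)."""
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]


labels = [1, 0, 1, 1, 0]
# Hypothetical outputs of three over-confident models on the same five samples;
# each model makes one confident mistake, on a different sample.
model_probs = [
    [0.99, 0.02, 0.05, 0.97, 0.03],
    [0.95, 0.90, 0.98, 0.96, 0.03],
    [0.97, 0.05, 0.97, 0.10, 0.02],
]

individual_losses = [log_loss(labels, probs) for probs in model_probs]
ensemble_loss = log_loss(labels, ensemble_average(model_probs))
for i, loss in enumerate(individual_losses):
    print(f"model {i}: log loss = {loss:.3f}")
print(f"ensemble: log loss = {ensemble_loss:.3f}")
```

On this toy data the ensemble beats every individual model, but only because their mistakes fall on different samples. What holds in general is weaker: log loss is convex in the predicted probability, so the averaged prediction never scores worse than the average of the individual models' scores.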
Don't just assume "the $MODEL will figure it out" if you give it shitloads of degrees of freedom. Machine learning efficiency all comes down to efficiency of representation, and feature engineering can achieve huge payoffs if/when you incorporate domain knowledge and expertise.
Once you gain perspective on the "universality" of statistical methods, optimization, and Bayesian probability theory, your work will become a lot easier to reason about. As an example, try to explain why a least-squares fit follows from the assumption that the model residuals are normally distributed (and what connections this may have to statistical physics!).
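For anyone who wants to check their answer, here is a short version of the argument. It assumes i.i.d. Gaussian residuals with a fixed variance σ² and a model f(x; θ). That notation is introduced here for the sketch.

```latex
y_i = f(x_i;\theta) + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0,\sigma^2)

p(y \mid \theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\bigl(y_i - f(x_i;\theta)\bigr)^2}{2\sigma^2}\right)

-\log p(y \mid \theta) = \frac{n}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2

\hat\theta_{\text{MLE}} = \arg\max_{\theta}\, p(y \mid \theta) = \arg\min_{\theta} \sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2
```

The first term of the negative log-likelihood does not depend on θ, and 1/(2σ²) is a positive constant. Maximizing the likelihood is therefore the same as minimizing the sum of squared residuals. One way to read the statistical-physics hint: p(y | θ) ∝ exp(−E(θ)/T), with "energy" E(θ) = ½ Σ (y_i − f(x_i; θ))² and "temperature" T = σ². That is a Boltzmann distribution, and its ground state is the least-squares fit.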