Stay away, in my opinion. I spent a year supporting an SVM in a production machine learning application, and it made me wish the ML research community hadn't been so in love with them for so long.
They're the perfect blend of theoretically elegant and practically impractical. Training scales as O(n^3), serialized models are heavyweight, prediction is slow. They're like Gaussian Processes, except warped and without any principled way of choosing the kernel function. Applying them to structured data (mix of categorical & continuous features, missing values) is difficult. The hyperparameters are non-intuitive and tuning them is a black art.
GBMs/Random Forests are a better default choice, and far more performant. Even simpler than that, linear models & generalized linear models are my go-to most of the time. And if you genuinely need the extra predictiveness, deep learning seems like better bang for your buck right now. Fast.ai is a good resource if that's interesting to you.
Linear models are simpler. GBMs are more powerful, more flexible, and faster.
Every ML course I took had 3 weeks of problem sets on VC dimension and convex quadratic optimization in Lagrangian dual-space, while decision tree ensembles were lucky to get a mention. Meanwhile GBMs continue to win almost all the competitions where neural nets don't dominate.
I suspect my professors just preferred the nice theoretical motivation and fancy math.
SVMs are, by default, linear models. The decision boundary in the SVM problem is linear, and since it's the max-margin boundary we may enjoy nice generalization properties (as you probably know).
You probably also know that decision tree boundaries are non-linear and piecewise. It's not so straightforward to find splits on continuous features.
I.e., if the data is linearly separable, then why not? Even using hinge loss with NNs is not uncommon.
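For concreteness, a minimal sketch of that "linear max-margin + hinge loss" view using sklearn's SGDClassifier (the toy data and settings here are assumptions for illustration, not from the thread):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    # Toy data standing in for a roughly linearly separable problem.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # loss="hinge" trains a linear SVM; the same loss turns up in NN training too.
    clf = SGDClassifier(loss="hinge", alpha=1e-4, random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))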
You probably see GBMs winning a lot of competitions compared to SVMs because a lot of competitions have a lot of data and non-linear decision boundaries. Some problems don't have those characteristics.
Choosing the kernel function is simple - are you in a high-dimensional space? If so, choose a linear kernel. Else? Choose the most non-linear one you can (usually a Gaussian/RBF kernel). I suppose quadratic and the other kernels are useful if what you're modeling looks like that, but in practice that's rare.
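Roughly that heuristic, sketched with sklearn (the dimensionality cutoff is an arbitrary assumption, not a rule from anywhere):

    from sklearn.svm import SVC, LinearSVC

    def make_svm(n_features, high_dim_cutoff=1000):  # cutoff is an assumption
        if n_features >= high_dim_cutoff:
            return LinearSVC(C=1.0)           # high-dimensional: linear kernel
        return SVC(kernel="rbf", C=1.0)       # otherwise: Gaussian/RBF kernel

    clf = make_svm(n_features=50000)  # e.g. bag-of-words text -> linear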
Prediction is not that slow with linear SVMs especially not compared to something like K-NN. The main hyperparameters which matter are the "C" value and maybe class weights if you have recall or precision requirements. The C value is something that should be grid-searched, but you might as well be grid-searching everything that matters on every ML algorithm, and in this regard SVMs are fast to iterate on (because the C value is all that matters).
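A rough sketch of that workflow with sklearn's GridSearchCV; the C grid, the scoring metric, and the toy (imbalanced) data are placeholder assumptions:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    # Imbalanced toy data, to make class_weight worth tuning.
    X, y = make_classification(n_samples=2000, n_features=50,
                               weights=[0.9, 0.1], random_state=0)

    grid = GridSearchCV(
        LinearSVC(),
        param_grid={"C": [0.01, 0.1, 1, 10, 100],
                    "class_weight": [None, "balanced"]},
        scoring="f1",   # swap in recall/precision if that's the actual requirement
        cv=5,
    )
    grid.fit(X, y)
    print(grid.best_params_)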
Handling a mix of categorical and continuous features is not difficult if you choose to do it in anything more sophisticated than sklearn. Also, pd.get_dummies() exists (though it may lead to the slow prediction you're concerned about).
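E.g., a toy sketch (made-up column names and labels) of one-hot encoding the categorical column and feeding the result to a linear SVM:

    import pandas as pd
    from sklearn.svm import LinearSVC

    df = pd.DataFrame({
        "age":    [25, 32, 47, 51],
        "city":   ["nyc", "sf", "nyc", "austin"],
        "income": [40000, 85000, 120000, 95000],
    })
    y = [0, 1, 1, 0]

    # One-hot encode the categorical column; numeric columns pass through as-is.
    X = pd.get_dummies(df, columns=["city"])

    clf = LinearSVC().fit(X, y)
    print(clf.predict(X))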
You're most likely right about GBMs or Random Forests - though they can have all sorts of issues with parallelism if you're not on the right kind of system. You talk about linear models, but SVMs usually use linear kernels anyway and are a generalization of linear models (including lasso and ridge regression models).
Agreed -- text processing is the one area where linear SVMs are a natural fit. All their attributes complement the domain. Linear SVMs also have desirable performance characteristics.
But at that point, they also have a lot in common with linear models. Those also seem practical in that domain (though I have less experience here, tbh). And performant, when using SGD + feature hashing, e.g. with vowpal wabbit.
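For what that setup might look like in sklearn rather than vowpal wabbit (toy documents and an assumed hash width), roughly:

    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier
    from sklearn.pipeline import make_pipeline

    docs = ["cheap pills now", "meeting moved to 3pm",
            "win a free prize", "lunch tomorrow?"]
    labels = [1, 0, 1, 0]

    model = make_pipeline(
        HashingVectorizer(n_features=2**18),  # feature hashing: no vocabulary to store
        SGDClassifier(loss="hinge"),          # linear model trained with SGD
    )
    model.fit(docs, labels)
    print(model.predict(["free pills tomorrow"]))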
My beef with non-linear kernels and structured data is a longer discussion, but I find kernel methods for structured data (which is usually high-dimensional but low-rank -- lots of shared structure between features, and shared structure between the missingness of features) to be highly problematic.
> Prediction is not that slow with linear SVMs especially not compared to something like K-NN.
Provided your structural dimensionality is below about 10 (i.e. ~10 dominant eigenvalues for your features), KNN can be O(log(N)) for prediction via a well-designed k-d tree.
KNN is also really simple to understand, and to design features for. It also never really tends to throw up surprises, which for production is the kind of thing you want. Most importantly, the failures tend to 'make sense' to humans, so you stay out of the uncanny valley.
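A quick sketch of that in sklearn, forcing the k-d tree backend on low-dimensional synthetic data (all values here are made up):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10000, 5))            # low intrinsic dimensionality
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
    knn.fit(X, y)
    print(knn.predict(X[:3]))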
I'd agree on the training time, but your serialized model should be small on disk, since only the support vectors are needed for inference. At least in my experience that has been true.
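One way to sanity-check that in sklearn (toy data; the pickled size is only indicative):

    import pickle
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    clf = SVC(kernel="rbf", C=1.0).fit(X, y)

    print("support vectors per class:", clf.n_support_)
    print("pickled size (bytes):", len(pickle.dumps(clf)))  # scales with support vectors, not n_samples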