Kernel function is simple - Are you in a high dimensional space? If so, choose l...

smbrian · on April 30, 2020

Agreed -- linear SVMs, especially in text processing applications, is the one area where they are a natural fit. All their attributes complement the domain. Linear SVMs also have desirable performance characteristics.

But at that point, they also have a lot in common with linear models. Those also seem practical in that domain (though I have less experience here, tbh). And performant, when using SGD + feature hashing like e.g. vowpal wabbit.

My beef with non-linear kernels and structured data is a longer discussion, but I find kernel methods for structured data (which is usually high-dimension but low-rank -- lots of shared structure between features, shared structure between missingness of features) to be highly problematic.

snovv_crash · on April 30, 2020

> Prediction is not that slow with linear SVMs especially not compared to something like K-NN.

Provided your structural dimensionality is below about 10 (ie. 10 dominant eigenvalues for your features), then KNN can be O(log(N)) for prediction via a well designed Kd-Tree.

KNN is also really simple to understand, and to design features for. It also never really tends to throw up surprises, which for production is the kind of thing you want. Most importantly, the failures tend to 'make sense' to humans, so you stay out of the uncanny valley.