I'd more generally describe the area as first-order optimization, which includes methods like acceleration, automatic differentiation, and stochastic approaches. Adam is just one trick for adapting a hyperparameter, the step size, on a per-parameter basis.
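To make that concrete, here's a minimal numpy sketch of a single Adam update (my own illustration, not any library's implementation); the moment estimates effectively turn one global learning rate into a per-parameter step size:

    import numpy as np

    def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # Exponential moving averages of the gradient and its square.
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad ** 2
        # Bias correction for the zero-initialized averages (t starts at 1).
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        # The division by sqrt(v_hat) is the per-parameter step-size adaptation.
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

Replace the last update with plain `w - lr * grad` and you're back at vanilla SGD; everything Adam adds is step-size bookkeeping.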
They are usable everywhere derivative-based optimization is usable, which certainly includes SVMs. Though since an SVM is a shallow method you don't need much data to train it, and hence don't need a scalable optimization method (it would just be unnecessarily slow). But you certainly could do it if you somehow needed to. Here's the first hit on Google for "sgd svm": https://scikit-learn.org/stable/modules/generated/sklearn.li...
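Sketching that out (the truncated link above presumably points at SGDClassifier): a linear SVM is hinge loss plus an L2 penalty, so SGDClassifier with loss="hinge" optimizes the same objective a batch solver like LinearSVC does:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=1000, random_state=0)

    # Linear SVM trained with SGD: hinge loss + L2 penalty.
    sgd_svm = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4,
                            random_state=0).fit(X, y)

    # Same objective, batch solver, for comparison.
    batch_svm = LinearSVC(C=1.0).fit(X, y)

    print("SGD SVM accuracy:  ", sgd_svm.score(X, y))
    print("batch SVM accuracy:", batch_svm.score(X, y))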
The fact that you can't use first-order optimization methods for (most) graphical models is one answer to the question of why everyone doesn't use them. Though for small models there are deep networks that emulate them and are trained as usual for neural networks. I think this is still an active research area.
Nice, yeah, I'd agree with the vast majority of this. The only thing I would add is that Adam/gradient methods are still useful in a graphical model, e.g. to get a MAP estimate (and then you can get a rough posterior estimate using variational methods or a Laplace approximation once you've found the MAP). But I agree I wasn't clear about what I mean by graphical models, since I think most people would take graphical models to mean full MCMC sampling of the posterior and marginalization over hyperparameters. It's useful to understand why people do that and why it's valuable, but many times it is (1) overkill and (2) inspires overconfidence in the result, because once we marginalize over our prior distribution people tend to forget that the prior may have been a complete fudge. I just mean graphical models as a tool for model building, for understanding how different models relate to one another, and as a recipe for deriving a loss function.
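As a sketch of that MAP-then-Laplace workflow (toy model of my own choosing, not from the thread): fit a Bayesian logistic regression with a standard normal prior by minimizing the negative log posterior with a gradient-based optimizer, then approximate the posterior as a Gaussian whose covariance is the inverse Hessian at the MAP:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    true_w = np.array([1.5, -0.7])
    y = (rng.random(100) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

    def neg_log_posterior(w):
        logits = X @ w
        log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))  # Bernoulli
        log_prior = -0.5 * w @ w                                  # N(0, I) prior
        return -(log_lik + log_prior)

    # Gradient-based optimizer finds the MAP estimate.
    res = minimize(neg_log_posterior, np.zeros(2), method="BFGS")
    w_map = res.x

    # Laplace approximation: posterior ~ N(w_map, H^-1), H = Hessian at the MAP.
    p = 1 / (1 + np.exp(-(X @ w_map)))
    H = X.T @ (X * (p * (1 - p))[:, None]) + np.eye(2)  # likelihood + prior terms
    cov_laplace = np.linalg.inv(H)
    print("MAP estimate:", w_map)
    print("Laplace posterior covariance:\n", cov_laplace)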