Adam, RMSProp, etc. are just flavors of gradient descent, so they’re useful on anything from ResNet to logistic regression. There are other flavors, like natural gradient, that are better suited to smaller problems since they require computing a curvature matrix (the Fisher information matrix, rather than a plain Hessian), but gradient descent is gradient descent. We use Adam in production for logistic regression, not for any particular reason really; it just happens to work.
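To make that concrete, here's a minimal sketch of "Adam for logistic regression"; I'm assuming a PyTorch setup with synthetic data, so the dimensions, learning rate, and step count are all placeholders:

```python
# Minimal sketch: logistic regression trained with Adam in PyTorch.
# Everything here (data, sizes, learning rate) is illustrative.
import torch

torch.manual_seed(0)
X = torch.randn(1000, 20)              # 1000 samples, 20 features (synthetic)
y = (X @ torch.randn(20) > 0).float()  # synthetic binary labels

model = torch.nn.Linear(20, 1)         # logistic regression = linear layer + sigmoid
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = torch.nn.BCEWithLogitsLoss() # sigmoid is folded into the loss

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()                    # gradients are the same regardless of optimizer
    opt.step()                         # Adam just rescales the update per parameter
```

Swapping flavors is a one-liner, e.g. `torch.optim.RMSprop(model.parameters(), lr=1e-2)`; the rest of the loop doesn't change.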
I'm not the OP, but personally I see NNs as being really, really useful where the input data is unstructured (such as text or images). The deep approach (appears to) build better features than a human can, but I'm not convinced that they are _that_ much better (or better at all) than standard methods for tabular data.
Once upon a time, when I used to hire data people, I'd ask them to tell me about a recent data project. They'd normally mention some kind of complex model, and I'd ask them how much better it was than linear/logistic regression. A really large proportion of candidates (around 50%) couldn't answer this because they'd never compared their approach to anything simpler.
One person told me that linear regression wasn't in the top 10 Kaggle models, so they would never use it.
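For what it's worth, the comparison I was asking candidates for is only a few lines of code. Here's a hedged sketch with scikit-learn; the dataset and the "complex" model are just stand-ins:

```python
# Sketch of the baseline check described above: before trusting the complex
# model, ask how much it actually beats logistic regression.
# The dataset and the "fancy" model are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

baseline = LogisticRegression(max_iter=1000)
fancy = GradientBoostingClassifier()

for name, clf in [("logistic regression", baseline), ("complex model", fancy)]:
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the gap turns out to be small, the simple model usually wins on every other axis (speed, interpretability, maintenance).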
Oh, so training time is virtually irrelevant to us, and if it weren’t, we’d have to be a lot more careful about optimization methods and possibly about which language to use. We also can’t use NNs for the models we build (we’re restricted to LR, but LR has as much model capacity as you need, as long as you keep adding feature interaction terms).
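To illustrate the interaction-term point: on an XOR-like target, plain LR sits at chance, but LR over an expanded feature set fits it. This is a sketch with made-up data; `PolynomialFeatures` with `interaction_only=True` is one way to add the cross terms:

```python
# Sketch of "LR capacity via interaction terms": expanding the features with
# pairwise products lets a linear model fit non-additive structure.
# All sizes and data here are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like target: plain LR can't fit this

plain = LogisticRegression().fit(X, y)
crossed = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(),
).fit(X, y)

print("plain LR accuracy:  ", plain.score(X, y))    # ~0.5, chance level
print("with interactions:  ", crossed.score(X, y))  # near 1.0
```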
NNs are universal function approximators. They can have arbitrary model capacity, and you can sort of control that with architecture decisions, loss function/regularization choices, and early stopping, but depending on the problem they can cause more problems than they solve. Usually you don’t really know if your NN will generalize well outside your train/test distributions, so many times it’s better to have a simpler, more predictable model whose behavior you can control.

This is all from my personal experience and is completely moot when we’re talking about, e.g., NLP or vision tasks, or situations where you’re drowning in data. NNs are super interesting and powerful; I don’t mean to suggest otherwise. But the mantra is: “what is the right solution to my problem?” There are lots of great advantages to NNs as well (you can get them to do anything with enough cajoling, and they can be solutions to major headaches you would usually have in, e.g., kernel methods).
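Here's a hedged sketch of the capacity knobs mentioned above (architecture, regularization, early stopping), using scikit-learn's `MLPClassifier` as a stand-in; all the values are arbitrary:

```python
# Sketch of the NN capacity knobs: architecture (hidden_layer_sizes),
# regularization (alpha), and early stopping. Values are arbitrary.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

net = MLPClassifier(
    hidden_layer_sizes=(32, 32),  # architecture decision: capacity knob #1
    alpha=1e-3,                   # L2 regularization: capacity knob #2
    early_stopping=True,          # hold out a slice, stop when it plateaus: knob #3
    validation_fraction=0.1,
    random_state=0,
)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```

Note that none of this tells you how the model behaves off-distribution, which is exactly the point above.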