> But whenever a statement like this is made, the implication is that there is some alternative where we can solve the same problem but without priors.
I don't see the misunderstanding you mean. I said if you think a selection of the least-length or sparsest solution is ad hoc, then so is your choice of prior. Solving the same system without priors would be analogous (actually mathematically equivalent) to solving an inverse problem without regularization. Or failing to solve it in the ill-posed case.
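To make that concrete, here is a minimal NumPy sketch. The matrix sizes, the seed, and the `ridge_solution` helper are my own illustrative choices, and it assumes the simplest case: a linear system with Gaussian noise and a zero-mean Gaussian prior. It shows that the unregularized problem has no unique answer, and that the least-length pick is what a vanishingly weak Gaussian prior gives you.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined system: 3 equations, 6 unknowns, so infinitely many exact solutions.
A = rng.normal(size=(3, 6))
b = rng.normal(size=3)


def ridge_solution(A, b, lam):
    """Minimize ||Ax - b||^2 + lam * ||x||^2 (Tikhonov / ridge regularization).

    Under Gaussian noise (variance s2) and a zero-mean Gaussian prior on x
    (variance t2), this is also the MAP estimate with lam = s2 / t2.
    Hypothetical helper written for this example only.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)


# "No prior" means lam = 0, and then the normal equations are singular:
# A.T @ A is 6x6 but only has rank 3.
rank = np.linalg.matrix_rank(A.T @ A)

# Least-length (minimum-norm) exact solution via the pseudoinverse.
x_min_norm = np.linalg.pinv(A) @ b

# Another exact solution: add any null-space direction of A.
_, _, Vt = np.linalg.svd(A)
x_other = x_min_norm + Vt[-1]

# A very weak Gaussian prior (tiny lam) selects the least-length solution.
x_ridge = ridge_solution(A, b, lam=1e-8)

print("rank of A^T A:", rank)
print("norm of least-length solution:", np.linalg.norm(x_min_norm))
print("norm of other exact solution:", np.linalg.norm(x_other))
print("max |ridge - least-length|:", np.max(np.abs(x_ridge - x_min_norm)))
```

Both `x_min_norm` and `x_other` fit the data exactly, so something beyond the data has to pick one. Whether you call that something a norm penalty or a prior is a matter of vocabulary.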
As for deterministic, I mean not probabilistic. As in linear algebra and "curve fitting".
As for "nice-to-have", I mean you can do machine learning without having any of the statistical understanding we've talked about and instead making various choices simply "because they work".
As for statistics without being a Bayesian, I did mean frequentist, though that may not cover everyone. You can even use a prior distribution that is estimated from data (people commonly do that with the naive Bayes method), whatever you want to call such a person. I wouldn't call them a Bayesian. You can simply view it as applying the chain rule of probability to get a more convenient form of your maximum likelihood equation.
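As a sketch of what I mean for naive Bayes (the notation is my own: $\pi_c$ is the class "prior", $N_c$ the number of training examples in class $c$, $N$ the total, and $\theta$ the per-feature parameters), factor the joint likelihood with the chain rule and maximize it:

```latex
\begin{aligned}
\log L(\pi, \theta)
  &= \sum_{i=1}^{N} \log p(x_i, y_i)
   = \sum_{i=1}^{N} \log p(y_i) + \sum_{i=1}^{N} \log p(x_i \mid y_i) \\
  &= \sum_{i=1}^{N} \log \pi_{y_i}
   + \sum_{i=1}^{N} \sum_{j} \log p(x_{ij} \mid y_i; \theta), \\
\hat{\pi}_c
  &= \arg\max_{\pi_c} \log L(\pi, \theta)
     \quad \text{subject to } \sum_{c} \pi_c = 1
   \;\;\Longrightarrow\;\; \hat{\pi}_c = \frac{N_c}{N}.
\end{aligned}
```

So the "prior" here is just the class frequency, a maximum likelihood estimate like any other parameter in the model.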
> I don't see the misunderstanding you mean. I said if you think a selection of the least-length or sparsest solution is ad hoc, then so is your choice of prior. Solving the same system without priors would be analogous (actually mathematically equivalent) to solving an inverse problem without regularization. Or failing to solve it in the ill-posed case.
Oh totally, I agree with that.
> As for deterministic, I mean not probabilistic. As in linear algebra and "curve fitting".
> As for "nice-to-have", I mean you can do machine learning without having any of the statistical understanding we've talked about and instead making various choices simply "because they work".
Yeah, I agree with this too, at least in principle. No issue with solving a lot of problems from a non-statistical perspective, since statistics often isn’t the clear “right” choice. E.g. understanding that L1 regularization corresponds to a “Laplace prior” doesn’t give you a much deeper understanding of what you’re doing, since most people use L1 regularization to encourage sparsity. Also, if you’re more comfortable with a non-stats perspective on things, there’s no problem approaching problems in the way you prefer.
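For reference, the L1/Laplace correspondence is a short derivation. This sketch assumes a linear model with Gaussian noise of variance $\sigma^2$ and an independent Laplace prior with scale $b$ on each weight; the symbols are my own choices for illustration:

```latex
\begin{aligned}
p(y \mid X, w) &= \mathcal{N}\!\left(y \mid Xw,\ \sigma^2 I\right), \qquad
p(w_j) = \frac{1}{2b} \exp\!\left(-\frac{|w_j|}{b}\right), \\
\hat{w}_{\mathrm{MAP}}
  &= \arg\max_{w} \left[ \log p(y \mid X, w) + \sum_{j} \log p(w_j) \right] \\
  &= \arg\min_{w} \left[ \frac{1}{2\sigma^2} \lVert y - Xw \rVert_2^2
     + \frac{1}{b} \lVert w \rVert_1 \right] \\
  &= \arg\min_{w} \left[ \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1 \right],
  \qquad \lambda = \frac{2\sigma^2}{b}.
\end{aligned}
```

The last line is the usual lasso objective, so the penalty weight $\lambda$ encodes how tight the prior is relative to the noise level.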
Summary: I agree with everything you’ve said here. All that’s left, I think, is a difference of opinion about how important it is to understand the Bayesian perspective, and that likely comes down to (1) the types of problems you typically work on and (2) personal preference. Personally, I find that understanding the Bayesian interpretation is extremely helpful for building a deeper understanding of a wide variety of ML algorithms, but I totally concede this is not necessarily a hard truth. So I stand by my advice, but will definitely agree that there are alternatives. I took the route of understanding ML without Bayesian stats first, and really didn’t understand or know Bayesian stuff for a decent amount of time after I got into ML. I’ve found the Bayesian perspective has helped tremendously, but that’s just me.