
> at most you can follow recipes, cookbook style.

Here I disagree with you pretty strongly. Once someone is comfortable with differentiable programming, it's much more obvious how to build and optimize any type of model.

People should be more concerned with when to use derivatives, gradients, Hessians, the Laplace approximation, etc., than with the implementation details of these tools.

Abstraction can also aid depth of understanding. I know plenty of people who can implement backprop but don't understand how to estimate parameter uncertainty from the Hessian. The latter is much more important for general model building.
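To make that concrete, here's a sketch of what I mean, using jax (toy model, toy data, and all names are mine):

    import jax
    import jax.numpy as jnp

    def neg_log_likelihood(theta, x, y):
        # toy model: y ~ N(theta[0] + theta[1]*x, 1)
        resid = y - (theta[0] + theta[1] * x)
        return 0.5 * jnp.sum(resid ** 2)

    x = jnp.array([0.0, 1.0, 2.0, 3.0])
    y = jnp.array([0.1, 1.1, 1.9, 3.2])
    theta_hat = jnp.array([0.0, 1.0])  # pretend this came out of an optimizer

    # Laplace approximation: the covariance of the estimate is roughly
    # the inverse Hessian of the negative log-likelihood at the optimum.
    H = jax.hessian(neg_log_likelihood)(theta_hat, x, y)
    cov = jnp.linalg.inv(H)
    std_errs = jnp.sqrt(jnp.diag(cov))  # per-parameter standard errors

None of that required writing a single derivative by hand.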




I am not sure what you are disagreeing with. The chain rule is basic calculus that precedes understanding Hessians. My argument is: if you cannot understand what the chain rule is, you will not understand the more complicated mathematics in ML. Do you think I am wrong?

EDIT: Also, uncertainty estimation is the stuff of the probabilistic approach to ML. I would say that people who do probabilistic ML are quite mathematically capable (at least in my experience).


> The chain rule is basic calculus that precedes understanding Hessians.

It doesn't have to be that way. The Hessian is an abstract idea; the chain rule, and more specifically backpropagation, is a method of computing the result of that abstract idea. When I want the Hessian I want a matrix of second-order partial derivatives; I'm not interested in how those are computed.

For a more concrete example: would you say that using the quantile function of the normal distribution requires you to be able to implement it from scratch?

There are many very smart, very knowledgeable people who correctly use the normal quantile function (inverse CDF) every day for essential quantitative computation, yet have absolutely no idea how to implement the inverse error function (an essential part of the normal quantile). Would you say that you don't really know statistics if you can't do this? That a beginner must understand the implementation details of the inverse error function before making any claims about normal quantiles? I myself would absolutely need to pull up a copy of Numerical Recipes to do this. It would be, in my opinion, ludicrous to say that anyone wanting to write statistical code must understand and be able to implement the normal quantile function. Maybe in 1970 that was true, but we have software to abstract that away for us.
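To show how thin that abstraction is (a sketch; the identity is the standard one relating the normal quantile to erfinv):

    import numpy as np
    from scipy.stats import norm
    from scipy.special import erfinv

    p = 0.975
    z_lib = norm.ppf(p)                            # the call everyone uses
    z_hand = np.sqrt(2.0) * erfinv(2.0 * p - 1.0)  # the machinery it hides
    print(z_lib, z_hand)                           # both ~1.959964

And erfinv itself bottoms out in numerical routines that almost nobody reimplements.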

The same is becoming true of backprop. I can simply call jax.grad on my implementation of the loss of the forward pass of the NN I'm interested in and get the gradient of that function, the same way I can call scipy.stats.norm.ppf to get a quantile of the normal. All that matters is that you understand what the quantile function of the normal distribution means in order to use it correctly, and again I suspect there are many practicing statisticians who don't know how to implement it.
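Something like this (a toy sketch; shapes and names are mine):

    import jax
    import jax.numpy as jnp

    def forward(params, x):
        W, b = params
        return jnp.tanh(x @ W + b)

    def loss(params, x, y):
        return jnp.mean((forward(params, x) - y) ** 2)

    params = (jnp.ones((3, 2)), jnp.zeros(2))
    x = jnp.ones((4, 3))
    y = jnp.zeros((4, 2))

    grads = jax.grad(loss)(params, x, y)  # no hand-written backprop anywhere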

And to give you a bit of context, my view on this developed from working with many people who can pass a calculus exam and perform the necessary steps to compute a derivative, yet have almost no intuition about what a derivative means or how to use and reason about it. Calculus historically focused on computation over intuition because that was what was needed to do practical work with it. Today the computation can take second place to the intuition because we have powerful tools that can take care of all the computation for you.


> Today the computation can take second place to the intuition because we have powerful tools that can take care of all the computation for you.

And that tool is backprop. If you do not understand what the chain rule is and what it is doing, that tool will be magic to you, and you are blindly trusting its correctness. Seeing that a lot of risk is involved in using AI models in real life, blindly trusting your model is not a good approach.
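For example, knowing the chain rule is exactly what lets you sanity-check the tool instead of trusting it blindly (a toy check of mine):

    import jax
    import jax.numpy as jnp

    f = lambda x: jnp.sin(x ** 2)
    df_auto = jax.grad(f)                          # what the tool gives you
    df_hand = lambda x: 2.0 * x * jnp.cos(x ** 2)  # chain rule, by hand

    x = 1.3
    print(df_auto(x), df_hand(x))  # agree to float precision

Without the chain rule you have no way to write the second line, and therefore no way to catch the tool (or your own model code) being wrong.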

I agree that simply regurgitating the rules of calculus does nothing for understanding, but that's definitely not what I mean when I talk about the need to understand the chain rule.

ML is a mathematically intensive subject. There is no getting around this fact.


Do you know all the assembler instructions your PC/Mac carried out for you in order to post this text on HN? I guess not.


But that's my point. Knowing how to compile a program does not make me a compiler engineer. In that sense, feel free to use ML tools, but don't be fooled into thinking you will get a job as an ML engineer if you do not know what the chain rule is, or why we need to take a derivative in order to optimise a loss function. In fact, don't even be fooled into thinking you will get into an ML uni degree if you don't know what the chain rule is. I actually don't understand what the problem is: spend 10 minutes reading up on it and I am sure you will get it. I think an unwarranted phobia of mathematics is what is at play here.
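And the 10 minutes pay off immediately. Why we take a derivative to optimise a loss fits in a few lines (my toy example, plain Python):

    def loss(w):
        return (w - 3.0) ** 2

    def grad(w):
        return 2.0 * (w - 3.0)  # d/dw of (w - 3)^2

    w = 0.0
    for _ in range(100):
        w -= 0.1 * grad(w)      # step against the gradient: loss decreases
    print(w)                    # converges towards the minimum at 3.0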


> My argument is: if you cannot understand what the chain rule is, you will not understand the more complicated mathematics in ML.

Are you sure about this?


Yes. In Europe, admission into an ML-type master's degree lists all three standard levels of mathematical analysis as a bare minimum for application.


If by "understand" you mean genuinely understand, not regurgitate it when asked as a trivia question, then I agree with you. However, there are different interpretations of the chain rule.


Are there any books that teach differentiable programming?


Not books, but there are quite a few interesting and accessible papers. Here is one:

Pearlmutter, B.A. and Siskind, J.M., "Reverse-Mode AD in a Functional Framework: Lambda the Ultimate Backpropagator"

http://www.bcl.hamilton.ie/~qobi/stalingrad/
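To give a flavour of the idea, a toy closure-based reverse-mode AD fits in a page (my own sketch, not the paper's code):

    import math

    class Var:
        def __init__(self, value, backprop=None):
            self.value = value
            self.grad = 0.0
            # leaves accumulate the incoming adjoint; interior nodes
            # forward it to their parents via the chain rule
            self.backprop = backprop if backprop is not None else self._accumulate

        def _accumulate(self, g):
            self.grad += g

    def mul(a, b):
        def backprop(g):
            a.backprop(g * b.value)  # d(a*b)/da = b
            b.backprop(g * a.value)  # d(a*b)/db = a
        return Var(a.value * b.value, backprop)

    def sin(a):
        def backprop(g):
            a.backprop(g * math.cos(a.value))  # d sin(a)/da = cos(a)
        return Var(math.sin(a.value), backprop)

    x = Var(1.3)
    y = sin(mul(x, x))  # f(x) = sin(x^2)
    y.backprop(1.0)     # seed the output adjoint
    print(x.grad)       # 2 * 1.3 * cos(1.3**2), as the chain rule says

(A real implementation shares work across reused subexpressions; this toy re-traverses them.)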



