This is a very, very convoluted way of understanding backprop.
If you want to understand it, I highly recommend writing a neural network starting from a basic perceptron (no hidden layer) or even implementing basic linear regression with stochastic gradient descent. Then work your way up to a network with one hidden layer.
Try to focus on the maths and avoid looking at code examples.
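That said, just to show how small the starting point is once you've worked the derivatives out on paper, here's a rough sketch (plain Python, my own naming) of linear regression trained with SGD. The two update lines are exactly the gradients you'd derive by hand:

    import random

    # Model: y_hat = w*x + b, per-sample loss L = (y_hat - y)^2.
    # Hand-derived gradients: dL/dw = 2*(y_hat - y)*x, dL/db = 2*(y_hat - y).
    def train(data, lr=0.01, epochs=100):
        w, b = 0.0, 0.0
        for _ in range(epochs):
            random.shuffle(data)          # "stochastic": visit samples in random order
            for x, y in data:
                err = (w * x + b) - y     # forward pass
                w -= lr * 2 * err * x     # step against the gradient
                b -= lr * 2 * err
        return w, b

    # Toy check: should recover roughly w = 3, b = 1.
    print(train([(x, 3 * x + 1) for x in range(-5, 6)]))

Swapping in the perceptron update, or adding a hidden layer, is then a fairly small change on top of this.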
This is a great book that breaks down the process step by step. I had my own neural nets working before I touched a framework, and it demystified what would have otherwise seemed like black magic.
The book I used to learn the basics many years ago was "An Introduction to Neural Networks", by Kevin Gurney. It's a bit outdated now, but I still consider it valuable.
I absolutely second the suggestion of working through the math of a shallow neural network to start.
I would also suggest that once you think you have a sufficient grasp of the steps, you try to implement them at a low level without frameworks, just math libraries. Personally, I found Octave (highly similar to MATLAB) to be a nice granularity tier for learning the process programmatically.
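To give a feel for that granularity, here's roughly what the framework-free version looks like with nothing but a math library (NumPy here, my own names; the Octave version reads almost line for line the same):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))                    # 200 samples, 2 features
    y = (X[:, :1] * X[:, 1:] > 0).astype(float)      # toy target: same sign -> 1

    W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
    W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
    lr = 0.5

    for step in range(2000):
        # forward pass
        h = sigmoid(X @ W1 + b1)
        y_hat = sigmoid(h @ W2 + b2)
        # backward pass: chain rule, layer by layer (loss = 0.5*(y_hat - y)^2)
        d_out = (y_hat - y) * y_hat * (1 - y_hat)    # dL/d(output pre-activation)
        dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
        d_hid = (d_out @ W2.T) * h * (1 - h)         # gradient pushed back through layer 2
        dW1, db1 = X.T @ d_hid, d_hid.sum(axis=0)
        # gradient-descent update (full batch here, for brevity)
        W2 -= lr * dW2 / len(X); b2 -= lr * db2 / len(X)
        W1 -= lr * dW1 / len(X); b1 -= lr * db1 / len(X)

Once every shape and every line of the backward pass makes sense at this level, moving to a framework mostly means letting it write the backward half for you.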
This. And https://ml-cheatsheet.readthedocs.io/en/latest/backpropagati... . I am currently doing MIT Micromasters in Data Science, specifically the Machine Learning & Deep learning course. Last week's assignment was to build a neural network from scratch and these two resources have been a lifesaver.
This book explains and executes every single line of code interactively, from low-level operations to high-level networks that do everything automatically. The code is built on the state-of-the-art performance operations of oneDNN (Intel, CPU) and cuDNN (CUDA, GPU). Very concise, readable, and understandable by humans.
Try reading the last two paragraphs of the article first; they won't make much sense unless (I suspect) you already understood the point the article was intended to convey, but you'll save some time being confused:
> By now it’s probably worth dropping the allegory: the “workers” in our story are models, which could be individual layers of a neural network, or even whole models. And the process we’ve been discussing is of course the backpropagation of gradients, which are used to iteratively update the weights of a model.
> The allegory also introduced Thinc's particular implementation strategy for backpropagation, which uses function composition. This approach lets you express neural network operations as higher-order functions. On the one hand, there are times when managing the backward pass explicitly is tricky, and it's another place your code can go wrong. But the trade-off is that there's much less API surface to work with, and you can spend more time thinking about the computations that should be executed, instead of the framework that's executing them. For more about how Thinc is put together, read on to its Concept and Design.
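For anyone curious what that function-composition strategy looks like in practice, here's a toy sketch of the pattern (my own code, not Thinc's actual API): each layer returns its output together with a callback that carries the gradient back, and composing layers just composes those callbacks in reverse.

    import numpy as np

    def linear(n_in, n_out, lr=0.1):
        rng = np.random.default_rng(0)
        W = rng.normal(scale=0.1, size=(n_in, n_out))
        b = np.zeros(n_out)
        def forward(X):
            Y = X @ W + b
            def backprop(dY):
                dX = dY @ W.T                 # gradient w.r.t. this layer's input
                W[...] -= lr * (X.T @ dY)     # apply the weight update in place
                b[...] -= lr * dY.sum(axis=0)
                return dX
            return Y, backprop
        return forward

    def relu():
        def forward(X):
            def backprop(dY):
                return dY * (X > 0)
            return np.maximum(X, 0), backprop
        return forward

    def chain(*layers):
        def forward(X):
            callbacks = []
            for layer in layers:
                X, cb = layer(X)
                callbacks.append(cb)
            def backprop(dY):
                for cb in reversed(callbacks):  # backward pass runs in reverse order
                    dY = cb(dY)
                return dY
            return X, backprop
        return forward

    # A two-layer model is literally a composition of functions.
    model = chain(linear(2, 8), relu(), linear(8, 1))
    X = np.random.default_rng(1).normal(size=(4, 2))
    Y, backprop = model(X)
    target = np.ones_like(Y)
    backprop(Y - target)    # push dL/dY for the loss 0.5*(Y - target)^2 back through the chain

(In a real library you'd likely separate the weight update out into an optimizer step; it's folded into the callback here only to keep the sketch short.)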
I like math, but ML is applied science; at some point it's an algorithm, so I don't really understand why math is required.
I'd rather read pseudo code than math equations.
I would just assemble all the bricks myself, read the code, and ask questions later.
I have problems with the scholastic method:
* fill students with theory
* tell them to execute the exercise
* hope they understood the material
* evaluate with an exam
* use grades and eliminate the ones who failed
This doesn't work for all students, or it works only for very, very "structured" students who manage to learn this way. A lot of people learn by doing, and reading/writing math is not going to help them. That's why a lot of developers want to read the code. Math is a language, but it's not always very well defined, and teachers often have their own notation and usage, which can be frustrating.
Code is well defined, and always checked and verified by a parser. Code is more reliable than mathematical notation.
It's perfectly understandable for developers to refuse to use math to do machine learning, when the machine learning they do is applied. Most of those developers want a crash course and to be operational as fast as possible. The theory is not always required, unless you want to do research and go further.
I have nothing against mathematics (I love maths), but for ML you need statistics and other simple things; you don't need to go very far into the maths. Leave optimization to people who want to optimize. Do things step by step. The first step is: let people understand and use machine learning.
Lots of people make use of relational databases without formally understanding set theory/normal forms. They just learn by example and hacking it. What you're seeing here with deep learning is the same situation.
Sure. And I didn't at all mean that people who aren't good at math should stop doing deep learning.
I just meant that it's weird to me that people seemingly tiptoe around using simple math to explain what is, at heart, simple math, and instead invent a whole new terminology and a dictionary of imprecise analogies to get by.
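Concretely, the "simple math" I have in mind is just the chain rule. In my own notation, for a two-layer composition y = f(g(x; W1); W2) with loss L:

    \frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial W_2},
    \qquad
    \frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial g}\,\frac{\partial g}{\partial W_1}

Backpropagation is just evaluating these factors starting from the loss and working backwards, reusing the shared terms along the way.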
Most people afraid of "math" only have exposure to the second kind. I figure it has more to do with finding those kinds of mechanical operations boring (they are), and hence assuming all of maths must be equally boring. If you're a software developer who enjoys abstractions, programming language theory, and solving problems, there is no way you would find math of the first sort boring. Alas, in high school pretty much all the math is of the boring mechanical kind. At least mine was, anyway. Even the first exposure to non-rigorous calculus is of the shitty plug-and-chug variety.
It's hard to go through life without developing a modest understanding of arithmetic and reading. You wouldn't even be able to check your receipts if school started with Peano's axioms. Consequently, numbers come first, as they have for millennia now. Which makes sense, because maths developed out of intuitive arithmetic.
As a matter of fact, my parents complain they never understood maths, because they were taught algebraic rules by rote: a * b = b * a. They can recite that; they just don't have a clue what to do with it.
Teachers just aren't good, and certainly not at maths. It's a struggle to teach billions the basic skills. There is no solution.
I also don't know any mathematician or physicist bad at arithmetic. Quite a few of them seem to delight in numbers and arithmetic puzzles.
So I really don't see how to avoid numbers and boring school work, nor how the former could have a positive effect.
My guess is because they were taught how to plug numbers into an equation and get an answer and didn't get taught math skills like algebraic manipulation.
This isn't a slight against people who aren't good at maths. I think the way the education system works is pretty broken.