This is a textbook to help readers understand the steps that lead to deep learning. Linear algebra comes first, especially singular values, least squares, and matrix factorizations. Often the goal is a low-rank approximation A = CR (column-row) to a large matrix of data, to see its most important part. This uses the full array of applied linear algebra, including randomization for very large matrices. Then deep learning creates a large-scale optimization problem for the weights, solved by gradient descent or, better, stochastic gradient descent. Finally, the book develops the architectures of fully connected neural nets and of Convolutional Neural Nets (CNNs) to find patterns in data.

Audience: This book is for anyone who wants to learn how data is reduced and interpreted by matrix methods. Based on the second linear algebra course taught by Professor Strang, whose video lectures are widely known, it starts from scratch (with the four fundamental subspaces) and is fully accessible without the first text.
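To make the A = CR idea concrete, here is a minimal NumPy sketch, not taken from the book: it builds a rank-2 data matrix, computes the best rank-r approximation from the truncated SVD (the Eckart-Young result the blurb alludes to via singular values), and then forms a column-row factorization A = CR. The matrix, the rank r, and the choice of solving for R by least squares are my illustrative assumptions; the book's own CR construction may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 2
A = rng.standard_normal((6, r)) @ rng.standard_normal((r, 5))  # exactly rank 2

# Best rank-r approximation from the truncated SVD (Eckart-Young).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_r = (U[:, :r] * s[:r]) @ Vt[:r, :]

# Column-row factorization A = C R: C keeps r actual columns of A
# (assumed independent here), and R is found by least squares.
C = A[:, :r]
R = np.linalg.lstsq(C, A, rcond=None)[0]

print(np.allclose(A, A_r), np.allclose(A, C @ R))  # True True for rank-r data
```

Keeping actual columns of A in C is what makes CR interpretable: unlike the SVD's mixed singular vectors, each column of C is a real piece of the data.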
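And for the optimization side, a hedged sketch of stochastic gradient descent on plain least squares, again my own illustration rather than anything from the book: each step uses one randomly chosen data row instead of the full gradient, which is the point of "stochastic" in the blurb. The problem size, step size, and step count are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                               # noiseless targets for simplicity

w = np.zeros(3)
lr = 0.01                                    # assumed constant step size
for step in range(5000):
    i = rng.integers(len(X))                 # pick one sample at random
    grad = 2 * (X[i] @ w - y[i]) * X[i]      # gradient of (x_i . w - y_i)^2
    w -= lr * grad

print(np.round(w, 3))  # close to w_true
```

The same loop, with the squared-error gradient replaced by backpropagation through the layers, is the training procedure the book builds up to for fully connected nets and CNNs.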
He was pretty awesome to take linear algebra from. He had this perfectly tuned "absent-minded professor" persona that had people almost literally on the edge of their seats, trying to help him finish his points and his sentences. I've never seen a class so engaged, before or since.