Hacker News

I get the motivation behind fast.ai's lessons, but I question whether it's the right approach. Their goal is to make deep learning accessible to programmers by reducing the mathematical background required. An analogous situation is the engineer who uses Autodesk to design car parts: they may not need to know the detailed implementation of the 3D graphics to use the software effectively.

The difference is that Autodesk relies on a mature, deterministic technology (3D graphics rendering). Deep learning is a stochastic process that depends on the data and the model. The training code, and especially the framework hooks, is the least important part. The example they give is three lines of code to train a cat vs. dog classifier. I've tried this classifier on a different binary image classification task: livers with and without tumors. It didn't work very well. There are lots of reasons: little variability between images, grey-scale images, different resolutions, etc. You can tweak the network, throw in more middle layers, try different kinds of layers, whatever, to get better results. All of that is guesswork if you don't understand what the CNN is doing at each stage. At this point in time you do need a formal education in linear algebra, calculus and statistics to investigate why a model does/does not work. It's not enough to know how to use the libraries.

On the flip side, you also need to know how to manipulate data and parse it into the correct format. This generally requires a year or two of programming practice in a good scripting language like Python. I will echo their thoughts that Ian Goodfellow's Deep Learning Book is remarkably lacking in this area. As a simple example, you cannot even use AlexNet without pre-processing your images to 227x227 (or 224x224 for GoogLeNet). That's 10,000 images resized, labeled and loaded into the model before training can take place.
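To make the preprocessing point concrete, here is a toy nearest-neighbour resize on a list-of-lists "image" of pixel values. This is only a sketch of the idea; a real pipeline would batch-resize with PIL or OpenCV before feeding the network.

```python
# Toy nearest-neighbour resize: for each output pixel, pick the nearest
# input pixel. Real pipelines use PIL/OpenCV, but the step is the same:
# every image must be resized to the network's fixed input size.

def resize_nearest(img, out_h, out_w):
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)]
            for i in range(out_h)]

# A 4x4 "image" whose pixel values encode their row/column.
img = [[r * 10 + c for c in range(4)] for r in range(4)]
small = resize_nearest(img, 2, 2)
print(small)  # [[0, 2], [20, 22]]
```

The same function applied with `out_h = out_w = 224` is, conceptually, the resizing every one of those 10,000 images has to go through.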

tl;dr IMHO in terms of being a competent user of deep learning: mathematics >= programming >>> knowing how to use a framework




They address exactly how you say deep learning should be taught in the course overview video. They aren't against learning all the maths, they just don't think you should do it first.

It's a top-down vs bottom-up teaching approach. The advantages are better motivation, better context and more immediate usefulness.

> All of that is guesswork if you don't understand what the CNN is doing at each stage.

Pff, it's still mostly guesswork even if you do understand what it is doing.


This is complete bullshit. You really don't need any math to make these systems work. Being able to preprocess and scrape data and conduct rudimentary error analysis is far more important. And the math you generally need to know is trivial.

This is just gatekeeping bullshit. Most of the time the right answer is "collect more/better data".


I really really agree with you. I also think that the fast.ai courses are amazing.

It is a question of motivation. If you're reasonably proficient at programming and want to hit the ground running on a specific application (especially in a domain that has well-established methods) fast.ai is probably what you're looking for.

If you're new to programming and mathematics, or want to work on the state of the art of these methods, you need to first truly understand more fundamental ideas. For those who fall into this camp, I am collecting resources (work in progress) and trying to organise them into a learning pathway: https://github.com/hnarayanan/deep-learning

You or anyone else who's interested is free to offer suggestions on learning material.


You already have Prof. Gilbert Strang's course in there (made me fall in love with linear algebra). I would like to suggest Prof. Joe Blitzstein's course (https://projects.iq.harvard.edu/stat110/home) on probability and Prof. Yaser Abu-Mostafa's course (https://work.caltech.edu/telecourse) on Machine Learning.


Thank you! I already had Prof. Yaser Abu-Mostafa's course in there too (also amazing), but I will check out Prof. Blitzstein's probability course as well.


How many people using 'jpeg' productively to create everyday applications do you think understand the DCT?


I wasn't able to do a PhD with Stéphane Mallat, so now I can never use JPEG 2000 :(


You could petition Ingrid Daubechies as a means of last resort.


Is that an appropriate comparison though?


I think so. Jpeg is an enabling technology just like classification using neural nets. The interface is relatively simple in both cases, the applications are plentiful. Some metadata needs to be supplied to make sure the results are optimal. Both are quite complex under the hood and would require ample study to re-create them or to fully grok how they operate in every detail. But for high level applications - even if such knowledge would give you an edge - that in-depth knowledge is not a 100% requirement.


The big difference is that jpeg creation with default parameters works fine for probably > 99.9% of use cases. A jpeg encoding the picture of a car will be just as fine as that of a cat. This is not at all the case with the current state of ML, as the originator of this thread has also pointed out with their example.

But there is actually one more problematic area for jpeg: encoding of graphs, drawings etc. with a limited color palette and straight lines. Here, jpeg artifacts become more visible, and to reduce them, you can either turn up the quality, or use a better approach like SVG or PNG. For this, at least a bit more technical knowledge is required. How many non-tech-savvy people even know about SVG or PNG?

But an even more appropriate comparison with ML would be to ask how to improve jpeg to better deal with straight lines. For this, you clearly need to understand the maths.
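You can see why edges are hard for JPEG with a toy 1-D DCT-II (the transform at JPEG's core). This is an illustrative sketch, not JPEG's actual 2-D, quantized pipeline: a flat block compacts into a single coefficient, while a hard edge spreads energy across many high frequencies, which is exactly what gets mangled at low quality.

```python
import math

# Toy 1-D DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k).
# Flat signals compact into the DC (k=0) term; sharp edges need many
# high-frequency terms, which quantization then distorts into artifacts.

def dct2(x):
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k)
                for n in range(N))
            for k in range(N)]

flat = [1.0] * 8               # perfectly smooth 8-sample block
edge = [0.0] * 4 + [1.0] * 4   # a hard edge, as in line art

flat_c = dct2(flat)
edge_c = dct2(edge)

def ac_energy(c):
    # energy outside the DC coefficient
    return sum(v * v for v in c[1:])

print(ac_energy(flat_c) < 1e-12)  # True: one coefficient suffices
print(ac_energy(edge_c) > 1.0)    # True: the edge needs many frequencies
```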


The comment you originally replied to points out all the ways in which the deep learning "interface" is not relatively simple, at least not if your problem has any sort of deviation from the most simple use cases. For a user, making a jpeg is a one-time one-command affair. If you think training a neural net can be reduced to this level of abstraction (with the knowledge we have today), you have either never used them in practice, or you've been very lucky with the complexity of the problems you've encountered so far.


> All of that is guesswork if you don't understand what the CNN is doing at each stage.

Well, what the CNN is doing at each stage is very simple to understand. There is a forward pass which is a matrix multiply (and an addition if there is a bias), and then the matrix weights are learned in the backward pass, which is just basic differentiation and chain rule application. Now I am not trivializing differentiation (when you try to take the differential of a vector, you are tearing your hair out), but it's fundamentally a simple concept to understand.
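The forward-pass/chain-rule idea fits in a few lines. Here's a minimal sketch for a single 1-D "neuron" with a squared-error loss, where the hand-derived chain-rule gradient is checked against a numerical estimate (the standard sanity check; frameworks automate exactly this differentiation):

```python
# One neuron: forward pass y = w*x + b, loss L = (y - t)^2.
# Backward pass by the chain rule, verified numerically.

def forward(w, b, x):
    return w * x + b                      # the 1-D "matrix multiply" + bias

def loss(w, b, x, t):
    y = forward(w, b, x)
    return (y - t) ** 2

def grad_w(w, b, x, t):
    # chain rule: dL/dw = dL/dy * dy/dw = 2*(y - t) * x
    return 2 * (forward(w, b, x) - t) * x

w, b, x, t = 0.5, 0.1, 2.0, 1.0
eps = 1e-6
numeric = (loss(w + eps, b, x, t) - loss(w - eps, b, x, t)) / (2 * eps)
analytic = grad_w(w, b, x, t)
print(abs(numeric - analytic) < 1e-5)  # True: the chain rule checks out
```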

Even with this understanding, designing deep neural nets and tuning hyper-parameters is mostly guesswork. Yes, the frameworks have little or nothing to do with this.

What I've found is that TensorFlow is difficult for programmers to wrap their heads around because it's more like a DSL. You declare a computational graph and then run it multiple times. So when you are declaring a computational graph, you have no way of debugging that graph unless you run it. Also, the conversion from numpy arrays to Tensors and back is an expensive operation. PyTorch simplifies this to a great extent. You just create graphs and run them the way you'd run a loop and declare variables inside it. This is great for imperative programming. However, think about it: your graph is recreated every time. If it were just a variable re-initialization it wouldn't be a big deal, but we are dealing with Tensors, so you give up efficiency for flexibility.
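The define-then-run vs eager distinction can be sketched in plain Python. None of these names are real TensorFlow or PyTorch APIs; it's just a toy contrast between declaring a deferred graph and executing code immediately:

```python
# "Define then run": build a graph of deferred computations, then
# execute it repeatedly with different inputs (TensorFlow 1.x style).

class Node:
    """A deferred computation: nothing runs until run() is called."""
    def __init__(self, fn, *parents):
        self.fn, self.parents = fn, parents
    def run(self, feed):
        return self.fn(*(p.run(feed) for p in self.parents))

class Placeholder(Node):
    """A graph input, filled in at run time via the feed dict."""
    def __init__(self, name):
        self.name = name
    def run(self, feed):
        return feed[self.name]

# Declare the graph once...
x = Placeholder("x")
y = Node(lambda v: v * 2, x)
z = Node(lambda v: v + 1, y)

# ...then run it many times. Until run(), z is just a data structure,
# which is why you can't step through it with an ordinary debugger.
print(z.run({"x": 3}))   # 7
print(z.run({"x": 10}))  # 21

# Eager style (PyTorch-like): the same computation runs immediately.
def eager(v):
    return v * 2 + 1

print(eager(3))  # 7
```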

Again, all of this is immaterial for learning how to build deep neural nets. I would say, just stick to whatever framework you can wrap your head around. I am learning that my ability to tweak numpy arrays, visualize them in pyplot, load data from csvs using Pandas and the like will take me a lot further in learning deep learning.


It's not like you have to give up a lot - the graphs are simple data structures and creating them is not the expensive part of the training. The computation has to be re-done at every step in a static framework too, and this is the part that matters.


I like Andrew Ng's new Coursera deep learning course. You start by writing your own NN using just numpy and then slowly improve it over several sessions. By the time you get Tensorflow you've written several NN's from scratch and have a good understanding of how a NN works under the hood.
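In that spirit, here's a from-scratch sketch of the kind of thing those early assignments have you build (the course uses numpy; this is pure Python to keep it self-contained): a single logistic neuron trained by gradient descent on the AND function, with the backward pass written out by hand.

```python
import math

# A single logistic neuron trained on AND with hand-written gradients.
# This is the "NN from scratch" exercise in miniature: forward pass,
# cross-entropy loss, chain-rule backward pass, gradient descent.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1 = w2 = b = 0.0
lr = 0.5

def total_loss():
    # cross-entropy over the four training examples
    s = 0.0
    for (x1, x2), t in data:
        y = sigmoid(w1 * x1 + w2 * x2 + b)
        s += -(t * math.log(y) + (1 - t) * math.log(1 - y))
    return s

before = total_loss()
for _ in range(1000):
    for (x1, x2), t in data:
        y = sigmoid(w1 * x1 + w2 * x2 + b)   # forward pass
        err = y - t                          # dL/dz for sigmoid + CE
        w1 -= lr * err * x1                  # backward pass, chain rule
        w2 -= lr * err * x2
        b  -= lr * err
after = total_loss()

print(after < before)  # True: the loss went down
```

Once you've written this by hand, what a framework's `fit()` call is doing stops being magic.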


Do you really, though? Your decisions are still going to be fairly simple, like "more hidden layers", rather than solutions to differential equations. Couldn't you become competent just from having broad experience training many networks for diverse purposes?


There's nothing wrong with that, but the mathematical background really isn't that complicated. It's just linear algebra and calculus.


Well, there's also nonconvex optimization, which is not exactly easy. On the other hand, "just use Adam" mostly works...


I'm currently working in parallel through fast.ai (practical programming) and Michael Nielsen's Neural Networks and Deep Learning ebook (basic mathematics behind NNs). I find that they complement each other well.

The Nielsen ebook: http://neuralnetworksanddeeplearning.com


You missed the entire point of fast.ai. They believe that it's better to be able to do basic, practical stuff before diving deeper. Most people will lose motivation if they have to learn an insane amount of stuff before even getting started with the cool stuff.



