Introducing Pytorch for fast.ai (fast.ai)
180 points by bryananderson on Sept 10, 2017 | 45 comments



I get the motivation behind fast.ai's lessons, but I question whether it's the right approach. Their goal is to make deep learning accessible to programmers by reducing the mathematical background required. An analogous situation is the engineer who uses Autodesk to design car parts: they may not need to know how the underlying 3D graphics are implemented to use the software effectively.

The difference is that Autodesk relies on a mature, deterministic technology (3D graphics rendering). Deep learning is a stochastic process that depends on the data and the model. The training code, and especially the framework hooks, is the least important part. The example they give is three lines of code to train a cat vs. dog classifier. I've tried this classifier on a different binary image classification task: livers with and without tumors. It didn't work very well. There are lots of reasons: little variability between images, grey-scale images, different resolutions, etc. You can tweak the network, throw in more middle layers, try different kinds of layers, whatever, to get better results. All of that is guesswork if you don't understand what the CNN is doing at each stage. At this point in time you do need a formal education in linear algebra, calculus and statistics to investigate why a model does or does not work. It's not enough to know how to use the libraries.
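
To show what I mean by "tweak the network", here's a minimal sketch of transfer learning with torchvision's pretrained ResNet (this is not the fast.ai code; the layer choice is illustrative, not a recipe known to work on the liver task):

    import torch.nn as nn
    from torchvision import models

    model = models.resnet34(pretrained=True)
    # Swap the 1000-way ImageNet head for a binary one; freezing layers,
    # adding intermediate layers, etc. are further knobs you can turn.
    model.fc = nn.Linear(model.fc.in_features, 2)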

On the flip side, you also need to know how to manipulate data and parse it into the correct format. This generally requires a year or two of programming practice in a good scripting language like Python. I will echo their thoughts that Ian Goodfellow's Deep Learning book is remarkably lacking in this area. As a simple example, you cannot even use AlexNet without pre-processing your images to 227x227 (or 224x224 for GoogLeNet). That's 10,000 images resized, labeled and loaded into the model before training can take place.
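
To give a flavour of that pre-processing, here's a minimal sketch using Pillow (the "raw"/"resized" directory names are hypothetical):

    import os
    from PIL import Image

    for name in os.listdir("raw"):
        img = Image.open(os.path.join("raw", name)).convert("RGB")
        img = img.resize((224, 224), Image.BILINEAR)  # 227x227 for AlexNet
        img.save(os.path.join("resized", name))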

tl;dr IMHO in terms of being a competent user of deep learning: mathematics >= programming >>> knowing how to use a framework


They address exactly how you say deep learning should be taught in the course overview video. They aren't against learning all the maths; they just don't think you should do it first.

It's a top-down vs bottom-up teaching approach. The advantages are better motivation, better context and more immediate usefulness.

> All of that is guesswork if you don't understand what the CNN is doing at each stage.

Pff, it's still mostly guesswork even if you do understand what it is doing.


This is complete bullshit. You really don't need any math to make these systems work. Being able to preprocess and scrape data and conduct rudimentary error analysis is far more important. And the math you generally need to know is trivial.

This is just gatekeeping bullshit. Most of the time the right answer is "collect more/better data".


I really really agree with you. I also think that the fast.ai courses are amazing.

It is a question of motivation. If you're reasonably proficient at programming and want to hit the ground running on a specific application (especially in a domain that has well-established methods) fast.ai is probably what you're looking for.

If you're new to programming and mathematics, or want to work on the state of the art of these methods, you need to first truly understand more fundamental ideas. For those who fall into this camp, I am collecting resources (work in progress) and trying to organise them into a learning pathway: https://github.com/hnarayanan/deep-learning

You or anyone else who's interested is free to offer suggestions on learning material.


You already have Prof. Gilbert Strang's course in there (it made me fall in love with linear algebra). I would like to suggest Prof. Joe Blitzstein's course (https://projects.iq.harvard.edu/stat110/home) on probability and Prof. Yaser Abu-Mostafa's course (https://work.caltech.edu/telecourse) on Machine Learning.


Thank you! I already had Prof. Yaser Abu-Mostafa's course in there too (also amazing), but I will check out Prof. Blitzstein's probability course as well.


How many people using 'jpeg' productively to create everyday applications do you think understand the DCT?


I wasn't able to do a PhD with Stéphane Mallat, so now I can never use JPEG 2000 :(


You could petition Ingrid Daubechies as a means of last resort.


Is that an appropriate comparison though?


I think so. Jpeg is an enabling technology, just like classification using neural nets. The interface is relatively simple in both cases, and the applications are plentiful. Some metadata needs to be supplied to make sure the results are optimal. Both are quite complex under the hood and would require ample study to re-create, or to fully grok how they operate in every detail. But for high-level applications - even if such knowledge would give you an edge - that in-depth knowledge is not a 100% requirement.


The big difference is that jpeg creation with default parameters works fine for probably > 99.9% of use cases. A jpeg encoding the picture of a car will be just as fine as that of a cat. This is not at all the case with the current state of ML, as the originator of this thread has also pointed out with their example.

But there is actually one more problematic area for jpeg: encoding of graphs, drawings etc. with a limited color palette and straight lines. Here, jpeg artifacts become more visible, and to reduce them, you can either turn up the quality, or use a better approach like svg or png. For this, at least a bit more technical knowledge is required. How many non-tech-savvy people even know about svg or png?

But an even more appropriate comparison with ML would be to ask how to improve jpeg to better deal with straight lines. For this, you clearly need to understand the maths.


The comment you originally replied to points out all the ways in which the deep learning "interface" is not relatively simple, at least not if your problem has any sort of deviation from the most simple use cases. For a user, making a jpeg is a one-time one-command affair. If you think training a neural net can be reduced to this level of abstraction (with the knowledge we have today), you have either never used them in practice, or you've been very lucky with the complexity of the problems you've encountered so far.


> All of that is guesswork if you don't understand what the CNN is doing at each stage.

Well, what the CNN is doing at each stage is very simple to understand. There is a forward pass, which is a matrix multiply (plus an addition if there is a bias), and then the matrix weights are learned in the backward pass, which is just basic differentiation and chain-rule application. Now, I am not trivializing differentiation (when you try to take the differential of a vector, you'll be tearing your hair out), but it's fundamentally a simple concept to understand.
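
To make that concrete, here is one linear layer's forward and backward pass in bare numpy (names and sizes are illustrative):

    import numpy as np

    # Forward pass: y = xW + b, squared-error loss.
    x = np.random.randn(4, 3)        # batch of 4 inputs
    W = np.random.randn(3, 2)
    b = np.zeros(2)
    y = x @ W + b
    target = np.random.randn(4, 2)
    loss = ((y - target) ** 2).mean()

    # Backward pass: the chain rule gives the gradients.
    dy = 2 * (y - target) / y.size
    dW = x.T @ dy
    db = dy.sum(axis=0)
    W -= 0.1 * dW                    # one gradient-descent step
    b -= 0.1 * db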

Even with this understanding, designing deep neural nets and tuning hyper-parameters is mostly guesswork. Yes, the frameworks have little or nothing to do with this.

What I've found is that TensorFlow is difficult for programmers to wrap their heads around because it's more like a DSL. You declare a computational graph and then run it multiple times. So while you are declaring a computational graph, you have no way of debugging it unless you run it. Also, the conversion from numpy arrays to Tensors and back is an expensive operation. PyTorch simplifies this to a great extent. You just create graphs and run them the way you run a loop and declare variables inside it. This is great for imperative programming. However, think about it: your graph is recreated every time. If it were just a variable re-initialization that wouldn't be a big deal, but we are dealing with Tensors, so you give up efficiency for flexibility.
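
A minimal sketch of the contrast (the TensorFlow half uses the 1.x-era graph-and-session API that is current as of this writing):

    # TensorFlow: declare the graph first, then run it in a session.
    import tensorflow as tf
    x = tf.placeholder(tf.float32, shape=[None, 3])
    y = tf.reduce_sum(x * 2.0)
    with tf.Session() as sess:
        print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))

    # PyTorch: the same computation just runs, line by line, and the
    # graph is rebuilt on every pass.
    import torch
    x = torch.Tensor([[1.0, 2.0, 3.0]])
    print((x * 2.0).sum())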

Again, all of this is immaterial for learning how to build deep neural nets. I would say: just stick to whatever framework you can wrap your head around. I am learning that my ability to tweak numpy arrays, visualize them in pyplot, load data from CSVs using pandas, and the like will take me a lot further in learning deep learning.
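
For example, something as basic as this (the CSV file and its columns are hypothetical):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load labels and eyeball the class balance before training anything.
    df = pd.read_csv("labels.csv")   # assumed columns: "id", "label"
    df["label"].value_counts().plot(kind="bar")
    plt.show()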


It's not like you have to give up a lot - the graphs are simple data structures and creating them is not the expensive part of the training. The computation has to be re-done at every step in a static framework too, and this is the part that matters.


I like Andrew Ng's new Coursera deep learning course. You start by writing your own NN using just numpy and then slowly improve it over several sessions. By the time you get to TensorFlow you've written several NNs from scratch and have a good understanding of how an NN works under the hood.
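
Something along these lines (a minimal sketch, not the course's actual code):

    import numpy as np

    # Tiny two-layer network on random data: sigmoid hidden layer,
    # squared-error loss, plain gradient descent.
    np.random.seed(0)
    X, Y = np.random.randn(64, 3), np.random.randn(64, 1)
    W1, W2 = np.random.randn(3, 8), np.random.randn(8, 1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for step in range(1000):
        h = sigmoid(X @ W1)                    # forward
        pred = h @ W2
        grad_pred = 2 * (pred - Y) / len(X)    # d(loss)/d(pred)
        W2_grad = h.T @ grad_pred              # backward, layer by layer
        h_grad = grad_pred @ W2.T * h * (1 - h)
        W1_grad = X.T @ h_grad
        W1 -= 0.1 * W1_grad
        W2 -= 0.1 * W2_grad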


Do you really, though, given that your decisions are still going to be fairly simple, like "more hidden layers", rather than solutions to differential equations? Couldn't you become competent just from having broad experience training many networks for diverse purposes?


There's nothing wrong with that, but the mathematical background really isn't that complicated. It's just linear algebra and calculus.


Well, there's also nonconvex optimization, which is not exactly easy. On the other hand, "just use Adam" mostly works...
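
In PyTorch terms that advice is a one-liner (the model here is a stand-in):

    import torch

    model = torch.nn.Linear(10, 2)  # stand-in model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)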


I'm currently working in parallel through fast.ai (practical programming) and Michael Nielsen's Neural Networks and Deep Learning ebook (basic mathematics behind NNs). I find that they complement each other well.

The Nielsen ebook: http://neuralnetworksanddeeplearning.com


You missed the entire point of fast.ai. They believe that it's better to be able to do basic, practical stuff before diving deeper. Most people will lose motivation if they have to learn an insane amount of stuff before even getting started with the cool stuff.


I love seeing this, and not just because I love PyTorch[1].

It's also because I believe it's in everyone's best interest to have more than one widely used framework, rather than a single one controlled by a single company (TensorFlow).

Also, I think fast.ai's approach to teaching deep learning is the right one for the vast majority of developers: start with practical, immediately useful know-how instead of theoretical underpinnings. People who want to delve deeper, say, so they can develop innovative architectures, can always do so at their own pace after taking fast.ai's course. There are a ton of other online resources for learning subjects like linear algebra, multivariate calculus, statistics, probabilistic graphical models, etc.

[1] Here's why I love PyTorch: https://news.ycombinator.com/item?id=14947076


I do love PyTorch as well. I'm also somewhat confused, though: isn't PyTorch controlled by a single company (Facebook) too?


It's not controlled by Facebook in any way. It's true that a large part of the core team works there, but development is public and guided by community needs first.


I literally started this course yesterday. One of the annoying things is the software setup. Yes, they provide an AWS image, but it's fairly expensive and I already have a powerful GPU in my desktop. Unfortunately the Windows setup instructions are long, complicated, use out-of-date software, etc. You have to install a lot of different pieces of software, including Anaconda, which apparently is yet another package manager (seriously?).

It's not quite as bad as the Javascript npm/gulp/bower/whatever insanity but it's not too far off. Get it together ML people!


Anaconda is the thing that makes it a lot easier. Under Linux this is all a lot simpler than under Windows.


I spent hours trying to get set up before finding Crestle mentioned in the forums. It's preloaded with everything you need and is billed by the second, with 25 hours free.


I am currently learning TensorFlow and Keras. I found the learning curve quite steep with TensorFlow, and the whole static computation graph thing was puzzling at first (I thought it was dynamic...). Now I am a lot more comfortable with both.

I really welcome new libraries and frameworks that make deep learning more accessible. My only fear is that the field becomes cluttered with a myriad of frameworks, like the JS world. That would add a lot of confusion and apprehension for people entering the field, IMHO, since they would not know what to use or where to start.


The fact that I haven't heard about PyTorch wrappers before always made me feel like it had nailed the balance between expressiveness and customizability.

Jeremy Howard also states that PyTorch is hard [1], which does not seem to be HN's overall opinion. So I guess they found Keras limited in customization, and PyTorch required some boilerplate for loading/processing data and training loops, and this new framework tries to fill the gaps?

I'm looking forward to their follow-up posts.

[1]: https://twitter.com/jeremyphoward/status/906653539161694208


When can we expect this class? Also, Jeremy Howard recently commented that they were redoing the first Practical Deep Learning for Coders class, is this going to be the same course or will they be separate?

Have loved your material, thank you very much!


The way I understood his tweet is that they are basically rewriting the first course with PyTorch first, so I'm guessing the content will be identical, just the notebooks now using PyTorch in the background.


Part 2 is already out: http://course.fast.ai/part2.html


On Keras:

> On the other hand, it tends to make it harder to customize models, especially during training. More importantly, the static computation graph on the backend, along with Keras’ need for an extra compile() phase, means that it’s hard to customize a model’s behaviour once it’s built.

What does that mean, customize models during training?

Also, how are dynamic-graph architectures performing vs. models where the architecture doesn't change? Are they winning competitions?


I imagine he means customizing the optimizer stage?
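
My guess at what that looks like in PyTorch, where the training loop is yours to edit (stand-in model and data; gradient clipping stands in for whatever per-step surgery you need):

    import torch

    model = torch.nn.Linear(10, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    data = [(torch.randn(4, 10), torch.randn(4, 2))]  # stand-in batches

    for x, y in data:
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()
        loss.backward()
        # Arbitrary per-step customization is easy here, e.g.:
        torch.nn.utils.clip_grad_norm(model.parameters(), 1.0)
        opt.step()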


I also like the fast.ai forums: http://forums.fast.ai

Would be nice to see cluttered MNIST used instead of COCO as an initial toy example for teaching: https://github.com/kevinjliang/tf-Faster-RCNN/blob/master/RE...


The dynamic graph makes a lot of sense to me, in terms of models that combine ideas not traditionally explored in the NN mainstream, e.g. topology.

May I suggest the name "entelechy" (the realization of potential) as a candidate name for the framework.


I respectfully request that you not use entelechy for the name of the framework.

Kenneth J Hughes (Engineer and Founder, Entelechy Corporation)


Good choice :) It's a cool name I have been using in my private projects for more than a decade.


And what's even better than pytorch? Just torch.


> Much to our surprise, we also found that many models trained quite a lot faster on pytorch than they had on Tensorflow.

Would love to see some benchmarks for that claim.


I benchmarked Keras+TF vs PyTorch CNNs back in May 2017:

1) Compilation speed for a jumbo CNN architecture: TensorFlow took 13+ minutes to start training every time the network architecture was modified, while PyTorch started training in just over 1 minute.

2) Memory footprint: I was able to fit a 30% larger batch size with PyTorch than with TensorFlow on Titan X cards. Exact same jumbo CNN architecture.

Both frameworks have had major releases since May, so these metrics may well have changed by now. However, I ended up adopting PyTorch for my project.


PyTorch is way ahead of TensorFlow in terms of cuDNN API usage.


Same here.


I like PyTorch better as well. Note, however, that one of its dependencies, gloo, comes with the infamous PATENTS addendum. You won't be using it unless you do distributed training, though.


gloo is only one of the three currently supported backends. One can easily switch to MPI, and pick an implementation that comes with a license you want.
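
A minimal sketch of picking the backend (assuming a PyTorch build with MPI support, launched under mpirun):

    import torch.distributed as dist

    # Use MPI instead of the default gloo backend for distributed training.
    dist.init_process_group(backend="mpi")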



