Open source deep learning models that programmers can download and run first try (github.com/samdeeplearning)
181 points by pplonski86 on June 16, 2017 | 39 comments



There isn't really any math to deep learning other than the concept of a derivative, which is taught in high school calculus. The reason deep learning papers seem mathy is that people take network architectures and various elementary operations on them and try to express them symbolically in LaTeX using summations and indexing hell. For example, the easy concept of "updating all the neurons in one layer based on the neurons in the previous layer and the connecting weights" is expressed as matrix-vector multiplication for no apparent reason other than that it is technically correct and makes for slicker notation, and I guess makes it easier to use APIs that compute gradients for you.

Deep learning, however, is broadly an experimental science, which in many ways is the opposite of math as traditionally envisioned, in which great insights follow deductively from prior great insights. If you ask a basic question like "why should I use 4 layers instead of 3?" there is no answer other than "4 works better". Similarly with gradient descent versus random search in weight space: there are many problem domains where random search is as good as any known hill-climbing heuristic (like gradient descent). Why is GD so effective when learning image classifiers expressed as stacked weighted sums? Who knows.
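To make the layer-update example concrete, here it is written both ways (a toy sketch with NumPy; the sizes and values are arbitrary):

    # "Update each neuron from the previous layer's neurons and the connecting
    # weights", once as explicit loops and once as a matrix-vector product.
    import numpy as np

    prev = np.array([0.5, -1.0, 2.0])        # previous layer: 3 neurons
    W = np.array([[0.1, 0.2, 0.3],
                  [0.4, 0.5, 0.6]])          # W[j, i]: weight from neuron i to neuron j

    out = np.zeros(2)                        # this layer: 2 neurons
    for j in range(2):
        for i in range(3):
            out[j] += W[j, i] * prev[i]

    assert np.allclose(out, W @ prev)        # the exact same thing, one line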


There is a clear theoretical reason for using 4 layers vs 3: it allows for more degrees of freedom, which translates to a higher VC dimension. This implies numerous trade-offs in model behavior.

Besides this point, there is much more than simple derivatives in deep learning. For example, regularization can yield quadratic programming problems. Different optimization algorithms can have a tremendous impact on training time and model performance. Models can be quite sensitive to specific parameters that you can't just set at random.

More ingenious architectures like GANs also require some fairly technical thinking to get right. There is much more than image classification and vanilla NNs or CNNs.


>There is a clear theoretical reason for using 4 layers vs 3. It allows for more degrees of freedom which translates to a higher VC dimension.

But then why does using 5 layers work worse than 4? Your theory is no good at predicting what the hyperparameters should be. The only way to find the correct hyperparameters is through empirical search.

>there is much more than simple derivatives in deep learning. For example regularization can yield quadratic programming problems. Different optimization algorithms can have tremendous impact on training time and model performance.

All these concepts are fairly simple as well and can be expressed with little math. Additionally, a casual user doesn't need a deep understanding of them any more than a programmer needs a deep understanding of how an optimizing compiler works; the library will usually take care of it.

>More ingenious architectures like GAN also require some fairly technical thinking to get right.

The idea of using NNs to trick each other is also fairly simple. It doesn't even involve any math.
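For what it's worth, the core loop is short enough to sketch (toy stand-ins of my own, not a real GAN):

    # The GAN idea in miniature: a generator tries to fool a discriminator,
    # and whichever one loses gets adjusted. (Toy stubs, not real networks.)
    import random

    def generator(noise, g_param):
        return noise * g_param                   # stub: turn noise into a "fake" sample

    def discriminator(sample, d_param):
        return 1.0 if sample > d_param else 0.0  # stub: guess "real" (1) or "fake" (0)

    g_param, d_param = 0.5, 1.0
    for step in range(100):
        fake = generator(random.uniform(0.0, 1.0), g_param)
        if discriminator(fake, d_param) == 1.0:
            d_param += 0.01                      # fooled: discriminator raises its bar
        else:
            g_param += 0.01                      # caught: generator tries harder
    # A real GAN updates both networks by gradient descent on opposing losses;
    # the structure of the loop is the whole idea.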


As someone who got as far as diffeq in college math, and is working his way through fast.ai right now, the impression I get is of a field that's at the start of formalization. It's like they've got the basic operations like addition and subtraction, but multiplication is still on the horizon. Or like the early days of calculus when some mathematicians called it black magic.


Well neural nets have been called a Dark Art in the past, though that seems to be changing now.


Indeed, the most unfamiliar/off-putting part might be the matrix formulation of the designs (and the matching of dimensions), which is not even useful if you are trying to implement a toy example in code. But you can equally well understand backpropagation by following the update of a single weight, which is much more intuitive.
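For example, one weight, one training example, one gradient step (a toy sketch with made-up numbers):

    # Following a single weight through one step of gradient descent.
    w = 0.5                      # the weight we're watching
    x, target = 2.0, 2.0         # one training example
    lr = 0.1                     # learning rate

    y = w * x                    # forward pass: y = 1.0
    error = y - target           # error = -1.0
    grad = error * x             # d(error^2 / 2)/dw by the chain rule: -2.0
    w -= lr * grad               # w moves from 0.5 to 0.7, toward the target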

The other thing is the unfortunate/misleading/atrocious jargon that has been adopted.


Using matrices to perform the calculations is an optimization over doing a bunch of for loops. This vectorization results in faster code within higher-level languages and on certain hardware platforms (SIMD). It has nothing to do with "slicker notation", although, having written gradient descent with both for loops and matrix operations, I find the vectorized version simpler and cleaner to read.
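To illustrate (a minimal sketch; linear regression with made-up data):

    # One gradient-descent step, first with explicit loops, then vectorized.
    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])               # 3 examples, 2 features
    y = np.array([1.0, 2.0, 3.0])
    w = np.zeros(2)
    lr = 0.01

    grad = np.zeros(2)
    for i in range(3):                       # loop over examples...
        err = X[i] @ w - y[i]
        for j in range(2):                   # ...and over features
            grad[j] += err * X[i, j] / 3
    w_loops = w - lr * grad

    w_vec = w - lr * X.T @ (X @ w - y) / 3   # the same update as two matrix ops
    assert np.allclose(w_loops, w_vec)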


He's not complaining about using vectorization in code. The problem is that papers, and even explanations targeted at non-experts, often use obfuscated math in place of clear explanations. I've complained about this before here: https://news.ycombinator.com/item?id=13953530

Mathematical notation is basically a programming language: a programming language with weird symbols you can't type to search for, single-letter variable names for everything, and no comments. And it's written by programmers who are obsessed with fitting everything onto a single line and making it as small as possible, no matter how difficult it is to read. Any programmer understands this is incredibly bad practice. And even if you parse every step and perfectly follow what the code is doing, without explanation it's pretty difficult to figure out why.
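To see the difference, compare a paper's one-liner with the same thing written the way a programmer would (my own rendering of a generic neuron formula):

    # A paper might write: a = sigma(sum_i w_i x_i + b). The same thing, readably:
    import math

    def neuron_activation(weights, inputs, bias):
        weighted_sum = sum(w * x for w, x in zip(weights, inputs))
        return 1.0 / (1.0 + math.exp(-(weighted_sum + bias)))   # sigmoid

    print(neuron_activation([0.5, -0.3], [1.0, 2.0], 0.1))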


> Mathematical notation is basically a programming language.

A very bad one that can only be executed by brains with the requisite historical knowledge; in fact, it's more like bad pseudo-code that lacks the explicitness necessary to translate into actual instructions. It's basically condensed jargon intended for the already converted.

It'd probably be vastly easier to teach math with an actual programming language than with traditional notation. Scheme would be ideal for this.


OK, I see what you're saying. I think you have the same issue with "real" programming languages too. If you compare some very concise Clojure or Scala code with the equivalent in Java, it can be quite hard to understand if you're not very familiar with the language. But I wouldn't necessarily say it's "incredibly bad practice". A Scala programmer can write concise and elegant code that to another Scala programmer is actually faster to understand because of that conciseness. Whereas the same code written with for loops and class method calls and all the boilerplate in Java would take more studying to filter out the low level constructions.

It's about the level of abstraction. And yeah if you don't understand the notation or syntax at the level of abstraction you're studying, it will be very hard.

(FWIW I find Scala code quite hard to understand sometimes, but I also find the more I know about the language, the more comprehensible it gets).


It's not necessarily the conciseness that's a problem. Using foreach instead of a full for loop is one thing. What I'm complaining about is code in place of an explanation. E.g. imagine coming across some nasty piece of code like this: https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...

It doesn't matter how familiar you are with the language. Without an explanation of what the hell is going on, just looking at the code is useless.


Now we're talking about documentation. You are correct, no code is completely self-documenting. But that Quake code is very low level -- the opposite of what I think the grandparent was objecting to (very high-level abstract notation).


I think OP means vector notation in the research papers, not using vector operations and representations at the code level.

That said, it can be useful for the beginner to implement a basic NN library from the ground up, to understand how the vector processing works, as well as what is going on step-wise with backprop and such.

Once that is fully understood, the next step of utilizing true vector processing libraries can be taken, and so on - eventually culminating in using and understanding libraries like TensorFlow.

Having the background of the lower levels gives you an appreciation and even insights when you transition to higher level frameworks.

That's just my opinion, though.


This has nothing to do with understanding backpropagation (which, correct me if I'm wrong, is really the core of DL). In fact, in the old days backpropagation was all about "propagating the deltas" and nothing about vectorizing.


That's interesting; I always thought the graphical explanation was elaborate and confusing, since a NN is just a bunch of matrices with non-linearities in between. To each his own, I guess.

I definitely agree though that it's more of an experimental science at the moment.


OTOH, thinking only in matrices may be limiting, as there are potential designs one might want to try (brain-inspired, for example) that can't be expressed as matrix operations.


If we could calculate the perfect model architecture for all unseen data, without relying on evaluation/experimentation/heuristics, then we'd have effectively solved the halting problem.

Mathematically, the closest thing to that would be Hilbert's program.

Though neural nets can paint like Van Gogh nowadays, asking them to come up with Hilbert's program may be a bit too much of an ask. Still, I wouldn't mind if researchers revisited papers like http://www.ics.uci.edu/~rickl/publications/1996-icml.pdf "On the Learnability of the Uncomputable".


They use matrices for computational efficiency. That's why linear algebra (along with differential equations and probability theory) is one of the prerequisites for any non-MOOC machine learning course.


I've programmed neural networks without knowing any linear algebra. When I needed to figure out how to use vector operations for speed, it took like 5 minutes to look up matrix multiplication on Wikipedia. You can get by without even knowing that, as element-wise operations can do everything just as fast.
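For instance (a quick NumPy sketch), a matrix-vector multiply really is just element-wise multiplies followed by sums:

    # Rebuilding a matrix-vector product from element-wise operations.
    import numpy as np

    W = np.array([[0.1, 0.2, 0.3],
                  [0.4, 0.5, 0.6]])
    x = np.array([1.0, 2.0, 3.0])

    via_matmul = W @ x                       # the linear-algebra way
    via_elementwise = (W * x).sum(axis=1)    # broadcast multiply, then sum each row
    assert np.allclose(via_matmul, via_elementwise)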


I would say that this title is misleading. A lot of what is presented there needs a strong grasp of deep learning (and the other underlying concepts behind it), without which all you'll do is load the examples in Xcode and run them.

Moreover, I would probably encourage people to read examples of Tensorflow or Caffe2 running on iOS rather than something like Forge. Forge is an interesting project but won't really help you if you don't have a clue about MPS or Deep Learning.


The original title, which is also the title of the repo, was much more accurate. The author mentions these are models to download and start playing with right away, not a set of repositories to help you learn deep learning.


OK, we've updated the title from the (slightly edited) repo description of “Examples to get started with Deep Learning without learning any of the math” to this phrase from the description.


It's a bad idea to learn deep learning without learning the math.


I'm close to completing my PhD in math, and as much as it pains me to say this, I think a great amount of deep learning can be accessible without "the math". As tempting as it is to tell everyone to pay their mathematical dues by carrying the proverbial buckets of water up the steps, you can get by very well in the field with only the very basics of mathematical knowledge.

This is how I'd describe it: deep learning is a set of tinker toys, Lego blocks if you will, that you can sculpt with data into some very interesting models. It's an art, where the brushstrokes are matrices. Place an attention module here and a convolution net there, throw in a tensor with a softmax, and voila.
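In code it looks about like this (a Keras sketch; the layer choices are arbitrary, just to show the block-stacking style):

    # "Lego blocks": stack a convolution, flatten, throw in a softmax, and voila.
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')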

Now, I love math, and part of me really wants to see deep learning become a mathematical discipline. Deep in the backwaters there are parts of deep learning that involve some real math (think variational inference, Bayesian models, etc.), and I do want deep learning to be about condition numbers and combinatorics. But to be perfectly honest with a newbie in the field: if you want to get your feet wet in deep learning, don't waste 3 months on a class on advanced optimization or measure theory or probability. Just dive in.


I pretty much agree with you. I have a bachelor's degree in pure mathematics from a top-15 university -- I even published research as an undergraduate. Deep learning strikes me as an experimental science, and even a subset of ordinary programming, more than a subset of mathematics or statistics or statistical learning. I've read the entirety of the Deep Learning book by Goodfellow et al. It's a fantastic book, but I was shocked at the lack of robust theory and the relative simplicity of the math. Even an introductory analysis textbook like Rudin is considerably harder, and Rudin is regularly tackled by undergraduates at any highly ranked university.

The only reason to fret about this in my opinion is if you're a PhD machine learning engineer who doesn't want the field to open up to non-PhDs. I think data scientists are used to being able to say, "hey, if you don't have a PhD, you really can't do or understand what I do" -- deep learning represents potentially a huge culture shock to that attitude. But even the Google Brain research team has some non-PhDs now.

I do think deep learning practitioners should learn the math. I just don't think there's actually that much math to learn. Certainly, if you read through the TensorFlow MNIST tutorial and you have no idea what cross-entropy is and you don't understand what the softmax layer is for, you need to go back to the basics. But these are concepts that anyone with a reasonable engineering degree can pick up relatively quickly.
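(Both really are small, for what it's worth -- here's a sketch with NumPy and my own naming:)

    # Softmax turns raw scores into probabilities; cross-entropy penalizes the
    # model when the true class gets low probability.
    import numpy as np

    def softmax(scores):
        e = np.exp(scores - scores.max())    # subtract the max for numerical stability
        return e / e.sum()

    def cross_entropy(probs, true_index):
        return -np.log(probs[true_index])    # 0 when the true class has probability 1

    p = softmax(np.array([2.0, 1.0, 0.1]))
    print(p, cross_entropy(p, true_index=0))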

As an example, I submit the five articles on Distill, a new online machine learning journal:

http://distill.pub/

Notice that only the first has any real math, and even there the math is just not very advanced -- it's undergraduate-level.


Do you have any recommendations for books to get into deep learning? I want some theory and math, but not too much, because I don't have that much mathematical background. I have only gone through Calculus 2 (I will be taking Calc 3 this coming semester). I am a computer science major.

I have dabbled in writing a super simple neural network to solve MNIST, using an example written in Python and porting it to Go so that I couldn't copy and paste -- I had to see what each step did. It very rapidly went to about 30% accuracy and got stuck there, so I know I did something wrong, but I abandoned it after not being able to figure out what.


Once you take calculus 3, I'd recommend diving right into the Deep Learning book: http://www.deeplearningbook.org/

It's definitely the best reference on the subject. With only calculus 3 under your belt the math won't be trivial, but it should overall be fairly approachable and certainly much more so than something like "The Elements of Statistical Learning".


hey, I authored the first article :D


Then again, you probably learned how to sort a list in Python before you heard about Timsort. Going the other way around is commendable, but not necessary.

If a student wants to learn how to play the guitar, you show them 3 chords so they can play Bob Marley or Oasis.

You don't require them to first study consonance, dissonance, rhythm, melody, timbre, dynamics, articulation, texture, form, expression, notation, song writing, Schenkerian analysis, harmonic identity, semiotics, and musical set theory.

Someone who can play the guitar with a passion, can be taught to learn musical notation. The other way around is not guaranteed.

Your suggestion is not necessarily bad: it's good to learn the math behind the Wasserstein metric if you are using GANs. But for effective teaching your suggestion is archaic, and part of the mindset that makes students' eyes glaze over when they're being taught mathematics. Can you point to a success story of a student becoming a neural network researcher without starting from a practical application?


I understand the chain rule. I understand using the derivative to minimize error. What I want to know is: what math is applied, and where, to architect better models? Preferably with examples.


I disagree, now prove your point.


Question: what is DL without the math? Network design? Stacking nodes in a graph in arbitrary ways? The days of handcrafted design are numbered anyway, so learning DL without the math is not a good long-term investment.


Disagree. We will build abstraction layers on top of the best tools as we always have. You don't need to know assembly to be a programmer any more.


Your deep learning models don't always work as expected, and in such cases you need to debug them. Understanding how models work internally is required for debugging them.


I would wager that a lot of people who "do deep learning" have absolutely no idea about the models they're using.

Hyperparameter optimisation is basically a fudge right now -- you try everything and see what works. Even the research groups who came up with the standard network stacks, like VGG, basically lucked out: they found an architecture that worked, then tried several variants and found one that worked better. DL papers are full of handwaving speculation about why particular networks perform better than others, but right now it's just that: highly educated speculation.

This isn't limited to deep learning. If you want to try any kind of machine learning, it's totally reasonable to throw different fitting functions at your problem to see which one works best. Unless you have an unusually clear problem category, it's rarely possible to say at the outset that "This problem would best be solved with method <X>". A counter here would be that if you need to classify images, you should almost certainly use a convnet.

You need some understanding of why things might be going wrong, e.g. your loss isn't moving -> crank up the learning rate; you're seeing NaNs -> your learning rate is probably too high. But that doesn't really need any serious maths to understand. You can get by quite well by figuring out empirical rules.

I'm not arguing that you shouldn't learn the maths (it's a wise idea to), but many people use deep learning models without knowing, for instance, how backpropagation works.


Gosh I wish this existed for Python.


Could you clarify? Several of the examples are Python projects (with and without Tensorflow), and others are apps consuming models that probably came from Tensorflow.



So sad to see the divide between iOS and Android platforms.

Half of this stuff I can't run.




