MNIST for ML Beginners: The Bayesian Way

Eliezer · on June 14, 2017

Never use predicting stock prices as an example of anything. Pick literally anything other than stock prices to display confidence bounds. This helpful example diagram violates the EMH even worse by showing a large predictable directional change. You might as well illustrate physics with a chart of bowling balls falling upward.

kmangutov · on June 14, 2017

Could you elaborate why predicting stock prices is an invalid example?

fnbr · on June 14, 2017

It's almost impossible, for two reasons:

1) It's incredibly difficult to create strategies that work because there are thousands of highly trained, highly paid people working to create strategies.

2) Any strategy that you do create will self-correct over time and become useless, as strategies find examples where the market has misjudged prices; by exploiting the strategy, the market should, over time, stop misjudging the prices.

The "EMH" that the parent is referring to is the Efficient Market Hypothesis, which states that the market price is the "correct" price, and that it is impossible to predict a better price, as the current price incorporates all possible information.

I, personally, don't subscribe to that version of the EMH; I believe the one implied by 1), which is that it's difficult to compete with the thousands of math PhDs working on Wall Street.

panarky · on June 14, 2017

> It's almost impossible ...

You'd think so, and yet ML is transforming investment and trading strategies.

From The Economist:

  Castle Ridge Asset Management, a Toronto-based upstart, has achieved
  annual average returns of 32% since its founding in 2013. It
  uses a sophisticated machine-learning system, like those used
  to model evolutionary biology, to make investment decisions.

  It is so sensitive, claims the firm’s chief executive, Adrian
  de Valois-Franklin, that it picked up 24 acquisitions before
  they were even announced.

Source: http://www.economist.com/news/finance-and-economics/21722685...

Eliezer · on June 16, 2017

Does anyone believe in the straw form of the EMH? Not I, surely. Standard weak form suffices to imply that nobody should be showing an ML-derived chart showing an expected tripling of price.

harel · on June 14, 2017

I wish someone would come up with a tutorial for ML for the mathematically challenged. Something more practical and less theoretical.

srean · on June 14, 2017

I understand and am sympathetic towards the sentiment, but honestly, its a bit like asking for a tutorial on swimming that does not involve water.

Something like that can be written, but it wont be very useful. Most of the simple stuff, the non-mathematical parts would get automated away. You don't want to be in a position where you are competing with someone's commodity script (sometimes just a for loop), unless the situation calls for desperate measures.

A bodybuilder got to lift them weights.

One genuine scenario could be that you personally do not do ML but want to evaluate/understand what your hires are doing. Even then, its hard to do avoid the math if you want to do a semi-decent job.

harel · on June 14, 2017

I get it, and I've actually used the same metaphor just a couple days ago. What I mean though, is that the maths are OK, and although the heavy notation goes over my head (as I'm not academic), I understand the ideas. But I'm lacking some practical examples of how this is put to use. I understand how neural networks work for example, but I do not understand the maths behind how they work. There was (is?) a site called ai-junkie which used (does?) have very practical and layman docs about A.I. before ML became the buzz word du jour.

jclos · on June 14, 2017

Honestly, the best thing you can do is try to implement your own shitty neural net with only Python + Numpy, from scratch, with only a basic understanding of the math. It will make most of the math very concrete very fast.

srean · on June 14, 2017

Cannot upvote this enough.

I think this is the only way to really grok backpropagation. The hours of staring at the update formula till your eyes glaze over the subscripts and superscripts and the summations would not give you as good an understanding as implementing a toy neural net with just a single hidden layer. Its actually a whole lot easier than parsing those low-level notation. It can be done better with high level notation but then you would need familiarity with the relevant mathematical abstractions.

annnnd · on June 15, 2017

I respectfully disagree. Of course you need to really understand backpropagation for any advanced stuff, but it is easier to ignore it at the beginning. Take Keras container, copy some MNIST example from somewhere and tweak it. Then, when you have a general feel of how training NNs works, gradually learn about each concept - BP should of course be one of the first. By the time you get into math stuff it will probably make much more sense because you will understand how it applies to your case.

But I guess the approach depends on how you best learn, so there is no wrong answer. Just jump in!

jclos · on June 14, 2017

Yes, once you've dealt with implementing it, the notation just makes sense as the most compact way of formalizing what you just did.

harel · on June 14, 2017

Good idea! I'll give it a go

RandyRanderson · on June 14, 2017

Don't bother. The guy that wrote Encog has several yt tutorials on how he did it and there are hundreds of others.

I learned nothing from impl my own other than why all NN libs break when you input values > 1.

Any decent NN lib will be way better that whatever you could write in a week full time.

Pick your fav lang, find the most used NN lib and try a kaggle competition.

If you can get to about rank 50% your training is complete.

doktrin · on June 14, 2017

Naive question : is this strategy equally effective for all ML models?

kgwgk · on June 14, 2017

This strategy is equally effective for most things in life.

doktrin · on June 14, 2017

> This strategy is equally effective for most things in life.

Not if we understand 'effective' to also mean 'cost & time effective'

You'd learn a lot by building a nuclear reactor from first principles, but it's not the most effective way to develop an intuition about how one operates.

yorwba · on June 14, 2017

I think you want to talk about whether the strategy is efficient, which I agree it is not. However, if you already tried understanding several general descriptions and it didn't work out, implementing something from scratch is an inefficient but effective way of really grokking it.

doktrin · on June 14, 2017

> I think you want to talk about whether the strategy is efficient

effective and efficient are synonyms in this context

http://www.thesaurus.com/browse/efficient?s=t

http://www.dictionary.com/browse/efficient?s=t

http://www.dictionary.com/browse/effective?s=t

jclos · on June 14, 2017

Some require a lot more understanding than others (for instance I'm not sure I'd be comfortable implementing a kernelized SVM from scratch, even though intuitively I know how it works) but basic neural networks (simple perceptron, simple feedforward network, simple recurrent network) are quite easy to grasp, and backpropagation is very intuitive. You can even use finite difference approximation [1] to bypass the derivatives when you're starting (at the cost of some efficiency) and figure out the rest as you go.

[1] https://en.wikipedia.org/wiki/Finite_difference

srean · on June 14, 2017

Oh I do understand you. The manual for driving a car has to be different from the manual for designing a car. I believe you are looking for a manual to drive the car where you don't really need a lot of visibility into the inner workings.

A problem is that ML is not quite as mature as a car yet, so the driving manuals will be a bit on the thinner / shallower side.

> I understand how neural networks work

Quickly write that down please, that would do the world a favor. Researchers are still grappling with the question 'why the hell does this freaking thing work as well as it does, when it does'.

harel · on June 14, 2017

Haha, I meant, the practical idea behind their usage. The example of OCR via a NN was very good in drilling that concept into my head (many inputs leading down to output). The HOW (in capitals) they work bit - I'm not going there. This thread here got me searching and I'm reading through the basic TensorFlow docs. That's, so far, sinking in.

candiodari · on June 15, 2017

Given that the difference between any 2 ML implementations is going to be 2 things, everything else is going to be eliminated by a hyperparameter search. Neither of these require more than basic mathematical knowledge. Certainly nothing that wouldn't be covered in an undergraduate statistics course.

And of course, it's loads and loads of work.

What makes the difference in success for machine learning projects (this is assuming you have some process to avoid screwups in place):

1) quality of how the data is input into the network. You can almost never "usefully" put raw data in front of a neural network. Call it "data represantation" (for instance, for comparing stock prices do a neural net DNN predictor on the prices. Now do the same on the deltas (today - yesterday). Works 1000x better (still not good enough, but very clear difference))

It may not compare to the feature engineering of SVMs, but it's very present (it kinda does imho, but people tend to get very defensive when suggesting that)

2) your cost/loss function. There are tricks like GANs which are simple in concept and avoid some of the issues, but even there you have your image comparator you're still going to need. You can massively improve image comparisons (e.g. mean square difference of the image scaled 4x4, times 1e12, plus same for 16x16 times 1e6, plus mean square difference at real res beats the crap out of just taking mean square difference. Very good results have also been obtained for MNIST by comparing sorted lists of black pixels, instead of actual images)

Many things depend on your cost/loss function, and you especially have the eternal problem : I want it to improve in 10 different ways. How do I balance those well into a single number ? Robot should grab the teddy bear, and shouldn't hit anything else. Those are easy to balance, but how would you balance grabbing the teddy bear at all versus not damaging it ? It will make a huge difference in whether your neural net converges at all.

davedx · on June 14, 2017

I'm currently following Andrew Ng's machine learning course on Coursera. I was initially a bit concerned because it's been a long time since I've had to do any serious math, but I think he strikes a really nice balance between enough math to understand what the algorithms do, and not so much that you get scared away by too much depth or too many concepts at once. The practical assignments also have this nice balance. I highly recommend it -- if you can remember the basics of algebra and calculus (I mean really the basics, like that if you differentiate the equation of a straight line then it gives you the tangent of the line), then you will probably not get too lost.

The programming assignments also give you that nice practical experience: you get training sets to code and run ML algorithms against, using a language (Octave) that has a nice mix of low/high level-ness.

I strongly recommend it.

jacquesm · on June 14, 2017

That's been done:

http://course.fast.ai

Can't recommend this enough.

drewbuschhorn · on June 14, 2017

Halfway through, it's really good. Also much more understandable since he shows you toy examples in Excel and gives you the workbooks to play with.

swframe2 · on June 14, 2017

Try "Neural Networks Demystified" by "Welch Labs" on youtube.

https://www.youtube.com/watch?v=bxe2T-V8XRs&list=PLiaHhY2iBX...

putnam · on June 14, 2017

So, there's a lot of material out there but it's disjointed. If you get annoyed implementing your own neural net with only Python + Numpy, you might try some more complex examples just to immediately point at them and say "I ran this" and "It did that". (The article uses neural networks so I'm addressing that here even though your question and the article title use the much broader 'ML'.)

1) Brandon Rohrer, now Data Scientist at Facebook, has a few great talks, including one on Bayes' Theorem/Bayesian Inference - https://m.youtube.com/playlist?list=PLVZqlMpoM6kbaeySxhdtgQP...

2) When asking future data scientists what tutorials for ML/NNs they like, they have usually found http://machinelearningmastery.com/ through Google and swear by it.

3) Josh Gordon, Developer Advocate at Google, has some simple ML/DL videos up in a 'Recipes' playlist: https://m.youtube.com/playlist?list=PLOU2XLYxmsIIuiBfYad6rFY...

If you want to just step through other people's code, you can do that too. Disclaimer: I put the below list together and it's not for ML broadly but for DL. That said if you want to run some examples fast and see the output, a number of folks have made that work for you -

I for one was floored to find great iOS examples (admittedly now deprecated for iOS 11). But If you have an iPhone with Metal (5s and up) Matthijs Hollemans - who wrote the iOS Apprentice at Ray Wenderlich - has Inception, YOLO, and MobileNets pre-trained and ready to go using Xcode, and it's fun to watch them work on your phone - https://medium.com/@SamPutnam/deep-learning-download-and-run...

netvarun · on June 14, 2017

I can't recommend this series highly enough https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec...

maurits · on June 14, 2017

Andrew Ng has an excellent ML course on coursera [1] that is very light on math.

[1]: https://www.coursera.org/learn/machine-learning

tryitnow · on June 14, 2017

ML really requires more than a tutorial. Just bite the bullet and take Andrew Ng's ML class.

To understand ML you've got to have at least a basic understanding of the math. And it's really not that difficult, especially if you find the right class/book/professor/etc.

The problem is that there are a ton of terrible writers and instructors out there.

I think it's just as important to ignore the terrible stuff (pretty much any blog post on ML) as it is to learn from the good stuff (e.g. Ng's ML course).

Houshalter · on June 14, 2017

This is the most useful resource https://metacademy.org/

calvinh123 · on June 14, 2017

Worth pointing out that the mean field approximation used in this example means the posteriors are uncorrelated with each other - exactly what the author finds in the "joint posterior distribution" figure.

msimpson · on June 14, 2017

I wish people would stop using Raleway or implement a fix. This is what I see:

http://imgur.com/a/locAq

cromulen · on June 14, 2017

Can you elaborate what the issue is? The site looks nothing like that to me. I see much more contrast. http://imgur.com/g49XOip

I ask because I like and plan to use Raleway on my blog and would hate to have readability issues.

msimpson · on June 14, 2017

There is a known issue regarding Raleway due to aliasing in Chrome, see:

https://webmasters.stackexchange.com/questions/74672/renderi...

I hear claims it has been fixed, but I still see the issue in Chromium 58 on Linux.

Here's one bug which seems to have been tracking the issue with no resolution as of yet:

https://bugs.chromium.org/p/chromium/issues/detail?id=152304

You can see in the final comment from November of 2015 that the issue was, and from my experience still is, massively affecting Raleway:

https://bugs.chromium.org/p/chromium/issues/detail?id=152304...

EDIT: I just checked and this issue no longer affects me in Chrome 59 on Windows. Although I still have the issue on Linux.