Learning from Scratch by Thinking Fast and Slow

nlperguiy · on Dec 28, 2017

The original paper.

The references in the paper paint a much clearer picture of where exactly the idea behind reinforcement learning with optimal, suboptimal, and random oracles comes from. They also contain mathematical proofs that these setups work.

I was quite shocked not to see references [6, 16] in any of the recent MCTS papers.

These references prove why the stuff works and show how well it works. But the whole field of imitation learning seems invisible to the deep RL papers. Don't have the faintest idea why.

The algorithm described is the ultimate generalized algorithm. If you have the expert policy, the algorithm learns in a completely supervised way; if the expert policy is suboptimal but the score (loss) is fully calculable, the learned policy will outperform the reference policy; if the expert policy is completely random, the algorithm behaves as reinforcement learning.

What the paper at the top adds is the ability to improve the expert policy with the learned one at the same time, and the math covered previously guarantees improvement.
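
A minimal sketch of that loop, under loud assumptions: a lookup table stands in for the learned (fast) policy, a one-step lookahead with rollouts stands in for the tree-search expert (slow), and the game is a made-up counting game. None of the names below (`TARGET`, `rollout_return`, `expert_action`, `expert_iteration`) come from the paper.

```python
# Toy sketch of an expert-iteration style loop (all names are hypothetical).
# Game: start at 0, add 1, 2 or 3 per move, reach TARGET in as few moves as possible.
TARGET = 10
ACTIONS = (1, 2, 3)

def legal_actions(state):
    """Moves that do not overshoot TARGET."""
    return [a for a in ACTIONS if state + a <= TARGET]

def rollout_return(state, policy):
    """Play to the end with the fast policy; the return is minus the move count."""
    moves = 0
    while state < TARGET:
        state += policy[state]
        moves += 1
    return -moves

def expert_action(state, policy):
    """'Slow' expert: one-step lookahead, then a rollout with the apprentice."""
    return max(legal_actions(state),
               key=lambda a: -1 + rollout_return(state + a, policy))

def expert_iteration(iterations=3):
    # 'Fast' apprentice: a lookup table that is deliberately bad at the start.
    policy = {s: 1 for s in range(TARGET)}
    history = [rollout_return(0, policy)]
    for _ in range(iterations):
        # Imitation step: the apprentice copies the expert's choice in every state.
        # The expert itself gets stronger next round because its rollouts do.
        policy = {s: expert_action(s, policy) for s in range(TARGET)}
        history.append(rollout_return(0, policy))
    return policy, history

policy, history = expert_iteration()
print(history)  # [-10, -4, -4, -4]
```

The return from the start state goes from -10 (always stepping by 1) to -4 after the first imitation step and stays there, which is the optimum for this toy game. The real setting replaces the table with a network and the lookahead with MCTS.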

adamweld · on Dec 28, 2017

And of course, the name is in reference to Daniel Kahneman's excellent research and book by the same title. One of the most influential pieces of literature I've had the pleasure of reading, everyone should read it.

https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow

signa11 · on Dec 28, 2017

> One of the most influential pieces of literature I've had the pleasure of reading, everyone should read it.

Seriously? I have had almost the opposite reaction to this, more in line with: https://jasoncollins.org/2016/06/29/re-reading-kahnemans-thi... (which was discussed here: https://news.ycombinator.com/item?id=12030791)

marklgr · on Dec 28, 2017

I liked that book very much when it came out, but I'm now in the leery camp. Replication and the general quality of the studies are matters of concern, but even one of the core tenets of the book, namely System 1 vs. System 2, turns out to be a great oversimplification that can fool as much as it can help.

The thing is, Kahneman is likeable, he has a good reputation and books like these are pure candy for the audience who enjoy that kind of literature (myself included)--but that's also a good warning signal. Could it be too simple, too convenient and too satisfying to be true? How do you know when you fall in love with an idea/theory?

mtpn · on Dec 28, 2017

I'm more concerned with this: when you discount the studies that have been shown to be poor, does the overall System 1/System 2 model become weaker? I think it does, a bit, but he takes pains to point out that it is just a shorthand and not a real thing, so shouldn't we already be skeptical of taking it too far? How about the experiencing vs. remembering selves? I think that holds up OK.

We're early in our understanding of human behavior. Many of our ideas are wrong. Everything is hard to test. But though the book has flaws and should be evaluated again (would love to see it rewritten), I still think there is usefulness in the models he suggested. And in behavioral economics in general. I think we're far away from having enough replicated studies to be really "sure" of anything, especially with this crisis, but moving in a reasonable direction.

hycaria · on Dec 28, 2017

Loosely related: I thought he had distanced himself from his older writings by now?

sombremesa · on Dec 28, 2017

Good question. From: http://slatestarcodex.com/2014/12/12/beware-the-man-of-one-s...

  But the question remains: what happens when (like in most cases) you don’t have a funnel plot?

  I don’t have a good positive answer. I do have several good negative answers.

  Decrease your confidence about most things if you’re not sure that you’ve investigated every piece of evidence.

  Do not trust websites which are obviously biased (eg Free Republic, Daily Kos, Dr. Oz) when they tell you they’re going to give you “the state of the evidence” on a certain issue, even if the evidence seems very stately indeed. This goes double for any site that contains a list of “myths and facts about X”, quadruple for any site that uses phrases like “ingroup member uses actual FACTS to DEMOLISH the outgroup’s lies about Y”, and octuple for RationalWiki.

  Most important, even if someone gives you what seems like overwhelming evidence in favor of a certain point of view, don’t trust it until you’ve done a simple Google search to see if the opposite side has equally overwhelming evidence.

mcguire · on Dec 28, 2017

(As an aside, please don't use indentation for quoting. I like your points, but they're hard to read:

"But the question remains: what happens when (like in most cases) you don’t have a funnel plot?

"I don’t have a good positive answer. I do have several good negative answers.

"Decrease your confidence about most things if you’re not sure that you’ve investigated every piece of evidence.

"Do not trust websites which are obviously biased (eg Free Republic, Daily Kos, Dr. Oz) when they tell you they’re going to give you “the state of the evidence” on a certain issue, even if the evidence seems very stately indeed. This goes double for any site that contains a list of “myths and facts about X”, quadruple for any site that uses phrases like “ingroup member uses actual FACTS to DEMOLISH the outgroup’s lies about Y”, and octuple for RationalWiki.

"Most important, even if someone gives you what seems like overwhelming evidence in favor of a certain point of view, don’t trust it until you’ve done a simple Google search to see if the opposite side has equally overwhelming evidence.")

interlingua7 · on Dec 28, 2017

I wonder whether this algorithm could be explained using Bayesian learning, and how it solves the trade-off between exploitation and exploration.
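
On the exploration/exploitation part: tree-search methods of this kind commonly handle it with a UCB-style bonus rather than an explicit Bayesian posterior. The sketch below shows plain UCB1 scoring purely as an illustration. Whether the paper uses exactly this rule is an assumption, and `ucb1_score`, `select_action`, and the `stats` layout are hypothetical names.

```python
import math

def ucb1_score(mean_value, visits, parent_visits, c=math.sqrt(2)):
    """Mean value (exploitation) plus a bonus that shrinks with visits (exploration)."""
    if visits == 0:
        return math.inf  # untried actions are always tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)

def select_action(stats, c=math.sqrt(2)):
    """Pick the action with the highest score; stats maps action -> (total_value, visits)."""
    parent_visits = sum(visits for _, visits in stats.values())

    def score(action):
        total, visits = stats[action]
        mean = total / visits if visits else 0.0
        return ucb1_score(mean, visits, parent_visits, c)

    return max(stats, key=score)

stats = {"a": (9.0, 10), "b": (0.5, 1)}
print(select_action(stats))         # "b": rarely tried, so the bonus dominates
print(select_action(stats, c=0.0))  # "a": with no bonus, pure exploitation
```

The constant `c` sets how much weight uncertainty gets. A Bayesian treatment would instead keep a posterior over each action's value and sample from it, which is a different technique from the one sketched here.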

jph00 · on Dec 28, 2017

FYI, this is the paper that lays the key foundation for AlphaZero, which recently got a lot of attention for easily beating the earlier Go-winning algorithms without looking at human games, and then beating the best chess algorithm with 6 hours of training.

oh_sigh · on Dec 28, 2017

Hours seem like the wrong metric for a (potentially) highly distributable training run. Has DeepMind released something like how many floating point operations it took, or perhaps how many watts?

sanxiyn · on Dec 28, 2017

DeepMind released the number of games played (Table S3): 44 million games for chess, 24 million for shogi, and 21 million for Go.

sitkack · on Dec 28, 2017

> Repeated deep study gradually improves intuitions.

Cognition and metacognition. The highest form of knowing is knowing why. There is an easy solution to most of this: ruthless application of the scientific method. Ruthless. Zero Ego. Blank Slate every time.

visarga · on Dec 28, 2017

Yes, I wholly agree. Let's say we make an AI agent and it creates a hypothesis. How is it going to test it, to make sure it is causal and not a mere correlation? By devising an experiment. So the agent needs to work like a scientist: propose idea, test idea, iterate. Even children learn about the world like that - they interact with the world trying out their ideas and seeing what works and what doesn't.

But when the agent doesn't have access to a simulator or a world where it can play, how could it understand the causal relations, and thus, be able to reason? The most important thing for the agent is access to experimentation, and that's why supervised learning (fixed dataset) is fundamentally limited by comparison to RL (environment based learning).
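
A small simulation of the correlation-versus-experiment point, as a sketch under stated assumptions: a hidden variable `z` drives both `x` and `y`, and `x` has no effect on `y` at all. The setup, the noise levels, and the names `pearson` and `sample` are all made up for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def sample(n, intervene, rng):
    """Draw n (x, y) pairs, either passively observed or with x set by the agent."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden common cause
        # Observing: x follows z. Intervening: the agent sets x itself, ignoring z.
        x = rng.gauss(0, 1) if intervene else z + rng.gauss(0, 0.5)
        y = z + rng.gauss(0, 0.5)  # y depends on z only, never on x
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(0)
observed = pearson(*sample(10_000, intervene=False, rng=rng))
intervened = pearson(*sample(10_000, intervene=True, rng=rng))
print(f"observed r = {observed:.2f}, intervened r = {intervened:.2f}")
```

From a fixed dataset alone, `x` and `y` look strongly related. Only when the agent sets `x` itself does the relationship vanish, which is the kind of evidence a fixed dataset cannot provide.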