Oversimplifying and ignoring a lot of important details, the key idea proposed by the authors is that the brain's phasic dopamine system is a model-free reinforcement-learning system that learns to train the prefrontal cortex as a more efficient model-based reinforcement-learning system -- a form of meta-learning which the authors accurately refer to as meta-reinforcement learning.
The "Results" section provides compelling evidence that the authors might be on to something. The authors show and discuss the outcomes of six different kinds of computer experiments in which a (relatively simple) meta-reinforcement learning software system is shown to learn and behave in ways qualitatively similar to, for example, monkeys and rats in equivalent lab experiments.
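For readers wondering what "model-free" means in the first comment, here is a minimal sketch of the standard reward-prediction-error reading of phasic dopamine: a learner on a toy multi-armed bandit that caches one value per arm and nudges it by the prediction error after each reward. This is a simple Rescorla-Wagner / single-step TD update chosen for illustration, not the authors' model; the function name `rpe_bandit`, the epsilon-greedy choice rule and all parameter values are assumptions.

```python
import random

def rpe_bandit(p_reward, trials=500, alpha=0.1, epsilon=0.1, seed=0):
    """Model-free value learning on a toy bandit (illustrative sketch only).

    Returns the learned value per arm and the list of reward prediction
    errors, the quantity phasic dopamine is usually compared to.
    """
    rng = random.Random(seed)
    values = [0.0] * len(p_reward)  # cached value per arm; no model of the task
    rpes = []
    for _ in range(trials):
        # epsilon-greedy choice; ties go to the lowest-index arm
        if rng.random() < epsilon:
            arm = rng.randrange(len(p_reward))
        else:
            arm = max(range(len(values)), key=lambda a: values[a])
        reward = 1.0 if rng.random() < p_reward[arm] else 0.0
        delta = reward - values[arm]   # reward prediction error ("surprise")
        values[arm] += alpha * delta   # the "carrot": move the estimate toward the reward
        rpes.append(delta)
    return values, rpes

values, rpes = rpe_bandit([0.8, 0.2])
print([round(v, 2) for v in values])
```

The learner never represents how the task works. It only caches "how good was this arm", which is what separates it from the model-based behavior the paper attributes to the prefrontal cortex.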
One implication for products, if this work pans out, is that systems which "adapt to you" (learn your voice, your schedule, what your face looks like) won't have to change the weights of their network. I believe this could lead to much better systems, because adjusting the weights of an already-deployed network is risky: an update could make the system perform worse. It's also simpler for engineers, because they can scrap all that weight-update logic and just deploy one RL network.
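The "adapt without touching the weights" idea can be sketched as two loops: a slow loop that tunes a parameter once, across many tasks, and a fast loop that adapts to each new task using only throwaway per-episode state. This is a toy under stated assumptions. The task family (`sample_task`), the greedy learner and the grid search over a single `inner_lr` are all hypothetical stand-ins. As I understand meta-RL, a real system instead trains the weights of a recurrent network slowly and lets the fast learning run in its activity.

```python
import random
import statistics

def sample_task(rng):
    """Hypothetical task family: two arms, one pays 90% of the time, the other 10%."""
    good = rng.randrange(2)
    return [0.9 if arm == good else 0.1 for arm in range(2)]

def run_episode(p_reward, inner_lr, trials, rng):
    """Fast loop: adapt to one task using only per-episode state.

    inner_lr plays the role of a frozen "slow weight"; only `estimates`
    changes during the episode, and it is discarded afterwards.
    """
    estimates = [0.5] * len(p_reward)  # episode-local state, reset for every task
    total = 0.0
    for _ in range(trials):
        # greedy choice; ties go to the lowest-index arm
        arm = max(range(len(estimates)), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < p_reward[arm] else 0.0
        estimates[arm] += inner_lr * (reward - estimates[arm])
        total += reward
    return total / trials

def meta_train(candidate_lrs, n_tasks=200, trials=50, seed=0):
    """Slow loop: pick the inner_lr that earns the most reward across many tasks.

    Plain grid search stands in for the gradient-based RL training
    a real meta-RL system would use.
    """
    task_rng = random.Random(seed)
    tasks = [sample_task(task_rng) for _ in range(n_tasks)]

    def score(lr):
        episode_rng = random.Random(seed + 1)  # same random stream for every candidate
        return statistics.mean(run_episode(t, lr, trials, episode_rng) for t in tasks)

    return max(candidate_lrs, key=score)

# "Training": done once, offline.
best_lr = meta_train([0.0, 0.1, 0.3, 0.5, 0.9])

# "Deployment": a new task; best_lr stays frozen and adaptation happens in state only.
new_task = [0.1, 0.9]
print(best_lr, run_episode(new_task, best_lr, 50, random.Random(42)))
```

At deployment nothing learned in the slow loop is modified. The agent still switches to the better arm on a new task because its episode-local estimates change, which is the property the comment above is hoping products could rely on.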
To simplify even further, would it be right to say that dopamine is not just a carrot used to adjust weights, but that the entire dopamine system actively learns how, when, and why to use the carrot to train effectively?
If so, here are some interesting real-world parallels I can see:
1/ This is why good teachers early on have such a profound effect. Teachers act as a dopamine system, so good teachers teach meta-learning through whatever subject they teach.
2/ Teaching the meta directly is difficult, as the meta of the meta is too abstract. This may be why grammar and math tend to feel out of touch: they are already the meta.
3/ Drugs can mess up the dopamine system and throw the feedback loop out of whack. Even if you have the best resources, the “teacher” is now inept.
Somewhat interesting: Bananas contain the ingredients necessary to produce dopamine.
I can see how all of this plays out as a way for some apes to get some bananas. Even social structures, fairness, and other things (I love this video of two monkeys getting unequal pay: https://www.youtube.com/watch?v=meiU6TxysCg ) can be explained by a dopamine system that just likes to give us bananas and sex. It is also incentivized to conserve energy, because that is better for survival (also called laziness). What's interesting is that humans have some tendency to do things that don't necessarily conserve energy. Maybe this is what makes us successful: sometimes we invest excessive energy in trying new things, which happens to help us survive through innovation (and which in turn favors genes that encode this behavior).
I like it, although it isn't necessarily what I call my meaning of life (I'm a dualist, because materialism is too damn dry). I like bananas, though (and yes, the role of bananas may be exaggerated here).
I've thought about this for a long time. From a materialistic standpoint, there are only quantitative differences between an organism that has a sensor, some type of memory, and an actuator, and a human being. They sure look different, but they still follow the same principles.
Simplifying further, we are all just energy (and matter, which is just a form of it) following some first principles, hallucinating our consciousnesses and trying to evade the entropy we will eventually reach anyway, because that is how the universe works.
- - -
A cool fact is that animals are capable of rational thinking (crows drop nuts from roughly the minimum height needed to break them, which is optimal energy usage; I'm pretty sure they don't even realize that their brains calculate this from experience). I'm also sure that all people act rationally with respect to their training data (outliers such as traumatic events and other unusual life circumstances just change some weights in a way that looks irrational to outsiders). This is admittedly difficult to defend, because it depends on the semantics of the word "irrational", which is man-made after all.
https://www.youtube.com/watch?v=HRVGA9zxXzk (a bird that can identify itself in a mirror, a simple form of self-awareness; I like the fact that bird brains are more efficient because they are space-constrained so the bird can fly better)
(The brain as a neural net with meta-learning capabilities that tries to guess what happens next and what it should do next. Emotions and some genetic pre-wiring mean we don't start with a completely random brain structure, which improves our chances of survival because we are able to see and feel as soon as we are born.)
> and yes, the role of bananas may be exaggerated here
I was trying to be funny; it seems I failed. It's not about the banana itself.
My point was that the origin of our social structures can be explained from a few first principles, e.g. that individuals try to acquire enough resources (a trivial assertion, but I guess that is the point of first principles).
Fun fact: it is not very common for monkeys to eat bananas in the wild; they simply don't have them [1].
I'm still digesting the implications.
Highly recommended reading.