Who decides what the correct information to learn is? What will prevent a bad actor from providing subject material that teaches people to bring harm to themselves or others? Post Traumatic Stress Disorder sounds, at least to a layman, like this very design pattern, except it obviously reinforces undesirable subjects.
It sounds like you have grossly misunderstood what this paper is about. Like, on the order of several scientific disciplines wrong. You were probably confused by the authors using some commonly used terms, but nevertheless, this work is in the field of Machine Learning.
I can see healthy discussion abounds on HN. As I understand it, Machine Learning aims to create computer systems that can learn from a set of training data. The machine starts out like an infant, with no experience and no knowledge, just sensors and memory. But like all computers, garbage in begets garbage out.
The machine/infant may receive stimuli through their transducers. Computers are provided stimuli via digitized images, audio, or text, depending on the type of learning system. Infants are provided stimuli via their five senses. They receive images through their eyes. They receive audio through their ears. It's probably a little early at the infant stage, but in a few years they will receive text through their eyes as well. Let's leave taste and smell for another time.
What is traditionally considered good parenting consists of curating these stimuli. There are all sorts of dangers a child will encounter. Without a shepherd to mitigate these dangers and reinforce their negative consequences, the child will almost certainly die in infancy. But past a few years, although they may still have many years of development left, once their basic needs of food and shelter are met, a toddler will learn whatever they are continuously exposed to, in accordance with how the people they respect react to the situation.
To the extent the child does not die, it accrues positive associations with the stimuli its parents approve of. If my parents are smiling and laughing, I'm going to associate the activity with happiness. If my parents are yelling with angry faces, I'm going to associate that activity with anger. As I accrue these associations, I slowly become more and more self-sufficient.
However, notice how I didn't explain any singular activity. If my parents lacked the patience or resources to teach me, they might start sending me mixed messages. Perhaps I'm playing with Legos one day, learning all of the positive things we assume Legos teach children. But my dad has a rough day at work, comes home, and steps on a Lego. Now my dad is furious, screaming at me about my Legos. This is now a negative association with Legos. If I continue to accrue similar experiences, I'll likely have an irrational aversion to Legos later in life.
So with as much snark as you can muster, could you please patronize me a little more, and correct any misunderstandings I still maintain about how machine learning is not analogous to developing human brains?
Not necessarily. At a minimum you need access to the sensory environment of the subject: Teens on Twitter are more easily radicalized when their timeline consists largely of terrorist propaganda or war front reporting on civilian casualties. Facebook has done experiments where they changed the sentiment of the timeline for a certain user and saw a significant sentiment change in future posts by that user.
Besides, the average human can't set a password on their brain, and it is open to all sorts of attacks. Cults, terrorist organizations, and multi-level marketing schemes abuse these weaknesses to get their followers to do things that may not be in their own best interest.
Hallucinations can be induced through LSD, or psilocybin, or just being in the Middle East over the last century. Perhaps you should ask yourself whether you've gotten good at parrying Russian troll accounts in the last couple of weeks, or whether you've learned to question your own assumptions. My grandfather got his PTSD in Okinawa. My brother's friend got his in a state park when he took too much LSD.
My original point is that cognitive behavioral therapy is a medium. It is just as good at creating addicts as it is at helping them recover. Teenagers learning to put up with the downsides of cigarettes to gain their peers' social proof is cognitive behavioral therapy. It's pretty successful, too, if you happen to manufacture tobacco.
You may be interested in the field of adversarial reinforcement learning. In adversarial reinforcement learning, an agent operates in the presence of a destabilizing adversary that applies disturbance forces to its system.
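To make the alternating min-max training concrete, here is a minimal sketch; the 1-D "keep the state near zero" environment and the hill-climbing updates are my own toy stand-ins, not anything from a specific paper:

```python
# Toy sketch of adversarial RL: a protagonist tries to hold a state at zero
# while an adversary applies a bounded disturbance; each hill-climbs its gain.
import numpy as np

rng = np.random.default_rng(0)

def rollout(protagonist_gain, adversary_gain, steps=50):
    """Return the protagonist's reward under both controllers."""
    x, ret = 1.0, 0.0
    for _ in range(steps):
        u = -protagonist_gain * x                    # protagonist pushes toward 0
        d = np.clip(adversary_gain * x, -0.5, 0.5)   # bounded adversarial disturbance
        x = x + 0.1 * (u + d) + 0.01 * rng.normal()  # simple dynamics + noise
        ret -= x ** 2                                # reward: stay near zero
    return ret

p_gain, a_gain = 0.0, 0.0
for _ in range(200):
    # Protagonist step: improve its gain against the current adversary.
    cand = p_gain + 0.1 * rng.normal()
    if rollout(cand, a_gain) > rollout(p_gain, a_gain):
        p_gain = cand
    # Adversary step: change its gain to *minimize* the protagonist's return.
    cand = a_gain + 0.1 * rng.normal()
    if rollout(p_gain, cand) < rollout(p_gain, a_gain):
        a_gain = cand

print(f"protagonist gain {p_gain:.2f}, adversary gain {a_gain:.2f}")
```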
See also the Adversarial Bandit:
> Another variant of the multi-armed bandit problem is called the adversarial bandit, first introduced by Auer and Cesa-Bianchi (1998). In this variant, at each iteration an agent chooses an arm and an adversary simultaneously chooses the payoff structure for each arm. This is one of the strongest generalizations of the bandit problem as it removes all assumptions of the distribution and a solution to the adversarial bandit problem is a generalized solution to the more specific bandit problems.
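The classic algorithm for this setting is Exp3; below is a minimal sketch, where the rotating payoff schedule is just a made-up adversary to exercise it:

```python
# Exp3 sketch for the adversarial bandit: the adversary picks every arm's
# payoff each round, so no fixed reward distribution is assumed.
import math, random

K, gamma, T = 3, 0.1, 5000
weights = [1.0] * K

def adversary(t):
    # Toy adversary: the best-paying arm rotates over time.
    best = (t // 1000) % K
    return [1.0 if a == best else 0.1 for a in range(K)]

total = 0.0
for t in range(T):
    s = sum(weights)
    probs = [(1 - gamma) * w / s + gamma / K for w in weights]
    arm = random.choices(range(K), probs)[0]
    reward = adversary(t)[arm]
    total += reward
    # Importance-weighted estimate keeps the update unbiased for unplayed arms.
    est = reward / probs[arm]
    weights[arm] *= math.exp(gamma * est / K)
    # Rescale to avoid overflow; the probabilities are unchanged.
    m = max(weights)
    weights = [w / m for w in weights]

print(f"average reward: {total / T:.3f}")
```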
Good robust RL algorithms are able to learn in the presence of adversarial noise. Correct information is information that allows you to compress reality better. When an agent is able to compress reality better (has access to a better generalizing world model), it will be rewarded. Correct information is information that helps an agent better optimize its policy function.
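A toy way to make that "compression" framing concrete (my own illustration, with zlib standing in for a learned world model): information is valuable to the extent that having it makes the rest of reality cheaper to encode.

```python
# Score a piece of information by how many bytes it saves when compressing
# "reality". Structured, relevant information scores high; garbage scores ~0.
import random
import zlib

def csize(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def value_of(info: bytes, reality: bytes) -> int:
    """Bytes saved when compressing `reality` with `info` already in hand."""
    return csize(reality) - (csize(info + reality) - csize(info))

reality = b"the quick brown fox jumps over the lazy dog. " * 40
good_info = b"the quick brown fox jumps over the lazy dog. " * 5
garbage = bytes(random.getrandbits(8) for _ in range(len(good_info)))

print(value_of(good_info, reality))  # high: shares structure with reality
print(value_of(garbage, reality))    # near zero: doesn't help compress anything
```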
You actually hit on an interesting angle of research, and you will probably be vindicated in the near future, when adversarial images (those that fool state-of-the-art image classifiers into failing) give way to adversarial agents (those that fool other agents into making bad decisions). However, this research was not about multi-agent systems, though the opponents (those that shoot fireballs and try to kill the agent) can already be seen as adversaries to the agent's goal of staying alive longer.
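For a flavor of how adversarial images work, here is an FGSM-style sketch against a deliberately tiny linear classifier; the 784-pixel "image" and the classifier itself are toy stand-ins, not a state-of-the-art network:

```python
# Fast-gradient-sign style perturbation against a toy linear classifier.
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=784)           # weights of a toy linear "cat vs. dog" classifier
b = 0.0
x = rng.uniform(0, 1, size=784)    # a toy image with pixel values in [0, 1]

def predict(img):
    return 1 if img @ w + b > 0 else 0

# Nudge every pixel a small step in the direction that pushes the score toward
# the other class. For a linear model, that gradient direction is just sign(w).
epsilon = 0.1
score = x @ w + b
direction = -np.sign(w) if score > 0 else np.sign(w)
x_adv = np.clip(x + epsilon * direction, 0, 1)

print("clean prediction      :", predict(x))
print("adversarial prediction:", predict(x_adv))      # usually flips for this toy model
print("max per-pixel change  :", np.abs(x_adv - x).max())
```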
To stay in our abstract mode of thinking, does this effectively kick off an arms race? Let's assume Bob has bad intentions and wants to rule the world to benefit himself at the expense of others, while Alice has good intentions and wants to improve living conditions for everyone around the world. If Bob has sufficiently larger data centers and greater overall throughput in his system, would it be accurate to say Bob will always be able to deduce, and subsequently employ, the "Trojan horse" that satisfies all of Alice's criteria for an authorized user of her system?
Yes. Though AI is already in an arms race (mostly US vs. China/Russia).
Likely: future AI will be decentralized for exactly these reasons. We don't want a single bad actor to control it. Security agencies are now warning that Russia is building a large botnet in case it needs to go to war and wants to disable enemy infrastructure. The US has similar needs.
Well-designed game theory makes it possible for adversaries to cooperate, so there is no guarantee that Alice is always susceptible to Bob's attacks. Cryptography provides methods that can't be attacked if properly implemented. Defense and offense can also have very different costs: it can be way (computationally) cheaper for Alice to mount a defense than it is for Bob to craft an adversarial offense.
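A crude illustration of that cost asymmetry from cryptography: Alice verifying a SHA-256 commitment costs one hash, while Bob brute-forcing even a short prefix of it costs millions of attempts (the secret and the 20-bit prefix below are just made up so the demo finishes):

```python
# Defense: one hash to verify. Offense: brute-force search over guesses.
import hashlib, itertools, time

secret = b"alice's secret token"
commitment = hashlib.sha256(secret).hexdigest()

# Alice's side: a single hash comparison, essentially free.
assert hashlib.sha256(secret).hexdigest() == commitment

# Bob's side: matching just the first 5 hex chars (20 bits) already takes
# ~10^6 guesses on average; the full 256-bit digest is out of reach.
start = time.time()
for i in itertools.count():
    if hashlib.sha256(str(i).encode()).hexdigest()[:5] == commitment[:5]:
        break
print(f"Bob needed {i} guesses and {time.time() - start:.1f}s for a 20-bit prefix")
```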
Though the risk is real: spam preceded spam filters. There was a short period (in internet years) where spam was more effective than our methods to counter it. So intelligent self-modifying worms/viruses will probably precede intelligent self-learning anti-viruses.
We also see inverse reinforcement learning (learning about the policy of another agent by observing its behavior), adversarial RL (forcing another trading bot to make unprofitable decisions), and computational arms races (who has the lowest latency?) between High Frequency Trading firms.
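As a rough sketch of the "watch another agent and learn its policy" half of that: the counting estimator below is my own simplification (it only recovers the policy, not the reward function that full inverse RL is after), and the market states and actions are invented for the demo.

```python
# Estimate a rival trading bot's policy from logged (state, action) pairs.
from collections import Counter, defaultdict
import random

states = ["uptrend", "downtrend", "flat"]
hidden_policy = {"uptrend": "buy", "downtrend": "sell", "flat": "hold"}

# Pretend we've observed 1000 of the rival's decisions, with 10% noise.
log = [(s, hidden_policy[s] if random.random() > 0.1 else "hold")
       for s in random.choices(states, k=1000)]

counts = defaultdict(Counter)
for state, action in log:
    counts[state][action] += 1

estimated_policy = {s: c.most_common(1)[0][0] for s, c in counts.items()}
print(estimated_policy)  # recovers the buy/sell/hold mapping despite the noise
```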