That's actually starting to happen. DeepMind built an AI that can learn to beat the best human players on many Atari games after just a few hours of playing and learning. And of course it uses all that advanced image recognition stuff, which has always been the hard part. The actual game-playing part is just a simple reinforcement learning neural network put on top of it.
The reason it's AI is that it isn't specific to speech recognition. Deep learning is very general. The same algorithms work just as well at speech recognition, translation, controlling robots, etc. Image recognition is just the most popular application.
I've read elsewhere that they trained it on whether the score went up. For it to really be a general game-playing AI, it should be able to figure out the goals of games without scoring systems, such as adventure games.
That is, unfortunately, impossible. All AIs need some kind of reward function, an incentive to do things. Without that they have no reason to do anything.
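To make the reward point concrete, here is a minimal sketch of tabular Q-learning on a made-up five-cell corridor. This is not DeepMind's setup; the function name `q_learning_chain`, the corridor environment, and all parameter values are assumptions chosen purely for illustration. The only learning signal is the reward handed out on reaching the last cell.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, reward_at_goal=1.0, seed=0):
    """Tabular Q-learning on a hypothetical corridor of n_states cells.

    The agent starts in cell 0; action 0 moves left, action 1 moves right.
    The only reward is reward_at_goal, given on entering the last cell.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step cap so an episode always ends
            # Explore with probability epsilon, or when the values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            if action == 0:
                next_state = max(0, state - 1)
            else:
                next_state = min(goal, state + 1)
            done = next_state == goal
            reward = reward_at_goal if done else 0.0
            # Standard Q-learning update toward reward + discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

if __name__ == "__main__":
    print("with reward:   ", q_learning_chain(reward_at_goal=1.0))
    print("without reward:", q_learning_chain(reward_at_goal=0.0))
```

With the reward in place, "move right" ends up with the higher value in every non-goal cell. With the reward set to zero, every value stays at exactly 0.0, so the agent has nothing to prefer and just wanders.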
Specifically the ones that require any sort of identification of state. It kicks butt in hand-eye coordination tasks, and it is awesome that it can learn those tasks automatically, but higher-order reasoning is obviously out. AI makes progress every year, and when that progress on individual tasks is coalesced into a single coherent entity, we will have what the average person calls intelligent.
Yes, but that's because the specific ANN the DeepMind researchers used doesn't have any state. IIRC it's given the last 4 frames as input. No doubt that makes the learning easier (I should mention that reinforcement learning is used, unlike in most tasks ANNs are applied to). But recently stateful ANNs (typically based on Long Short-Term Memory (LSTM) networks, which date back to 1997) have been becoming more and more popular. I would like to see someone make another attempt at Atari games with such a stateful network; it has probably already been done, actually.
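For anyone wondering what "given the last few frames as input" looks like in practice, here is a rough sketch of a frame buffer. The class name `FrameStack` and its methods are hypothetical and not taken from DeepMind's code, and the number of frames is left as a parameter. Plain strings stand in for screen images.

```python
from collections import deque

class FrameStack:
    """Keeps only the most recent num_frames frames (hypothetical helper)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        # A deque with maxlen silently drops the oldest frame on overflow.
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame):
        # At the start of an episode, fill the buffer with copies of frame one.
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(first_frame)
        return self.observation()

    def push(self, frame):
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # This tuple is everything the network would get to see at this step.
        return tuple(self.frames)

if __name__ == "__main__":
    stack = FrameStack(num_frames=3)  # 3 is an arbitrary choice for the demo
    print(stack.reset("f0"))
    for name in ["f1", "f2", "f3", "f4"]:
        print(stack.push(name))
```

Anything older than the buffer length is gone for good, which is why a network fed this way can't track state that was only visible earlier in the game. A stateful network such as an LSTM would instead carry its own hidden state from step to step.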