fallingfrog on March 14, 2016 | on: Yann LeCun's comment on AlphaGo and true AI

I think that's what the AlphaGo team did: they trained their agent against itself, and it learned new moves that were not explicitly programmed in! The evaluation function just said ahead / not ahead.
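
To make the self-play idea concrete, here is a toy sketch. It is not AlphaGo's method (that used deep networks and tree search on Go); it is a tabular learner on a made-up one-pile Nim game (take 1 to 3 stones, whoever takes the last stone wins). The only signal is win / not win at the end, and the same table plays both sides. All names here (`train_self_play`, `greedy_move`, the parameters) are hypothetical and chosen just for illustration.

```python
import random


def legal_moves(stones):
    # A player may take 1, 2, or 3 stones, but never more than remain.
    return range(1, min(3, stones) + 1)


def train_self_play(max_stones=12, episodes=20000, alpha=0.5, epsilon=0.3, seed=0):
    """Learn move values for toy Nim purely from self-play outcomes."""
    rng = random.Random(seed)
    q = {}  # (stones, take) -> estimated value for the player about to move

    def value(stones, take):
        return q.get((stones, take), 0.0)

    for _ in range(episodes):
        stones = rng.randint(1, max_stones)  # random start so every pile size is seen
        while stones > 0:
            moves = list(legal_moves(stones))
            if rng.random() < epsilon:
                take = rng.choice(moves)  # explore
            else:
                take = max(moves, key=lambda a: value(stones, a))  # exploit
            remaining = stones - take
            if remaining == 0:
                target = 1.0  # took the last stone: win
            else:
                # The opponent is the same agent, so its best outcome is our worst.
                target = -max(value(remaining, a) for a in legal_moves(remaining))
            q[(stones, take)] = value(stones, take) + alpha * (target - value(stones, take))
            stones = remaining  # the other side (same table) moves next
    return q


def greedy_move(q, stones):
    # Pick the move with the highest learned value.
    return max(legal_moves(stones), key=lambda a: q.get((stones, a), 0.0))


if __name__ == "__main__":
    q = train_self_play()
    for pile in (5, 6, 7):
        print(pile, "->", greedy_move(q, pile))
```

Nobody tells the agent the "leave a multiple of four" trick for this game; with enough episodes the table should settle on it from wins and losses alone, which is the small-scale version of learning moves that were not programmed in.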