>I believe the AI strategy you describe is known as "simulated annealing".
Not necessarily. You could get much the same effect (the program learning by playing itself over and over) with many Reinforcement Learning algorithms (like TD learning, say).
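For concreteness, here is a minimal sketch of a value-function method, tabular TD(0), run on a toy five-state random walk. The task, the constants, and every name in it are my own illustration, not anything from the thread:

```python
import random

ALPHA, N_EPISODES = 0.1, 1000
V = [0.0] * 7  # value estimates for states 0..6 (0 and 6 are terminal)

for _ in range(N_EPISODES):
    s = 3  # every episode starts in the middle state
    while s not in (0, 6):
        s2 = s + random.choice((-1, 1))  # drift randomly left or right
        r = 1.0 if s2 == 6 else 0.0      # only the right terminal pays off
        # TD(0) update: nudge V[s] toward the one-step bootstrapped target
        V[s] += ALPHA * (r + V[s2] - V[s])
        s = s2

print([round(v, 2) for v in V[1:6]])  # approaches [0.17, 0.33, 0.5, 0.67, 0.83]
```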
There are major differences between value-function-based RL algorithms (a value function is, loosely, a perceived_state -> perceived_long_term_reward map) and algorithms such as simulated annealing or genetic algorithms that search only in policy space (a policy is, loosely, a perceived_state -> chosen_action map). Sutton and Barto (somewhat loosely) use the term "evolutionary methods" for the policy-space-only algorithms, to distinguish them from the value-function-based ones.
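For contrast, here is the same flavor of toy problem attacked purely in policy space with simulated annealing: the policy is just a state -> action table, candidate policies are scored by whole-episode return, and no value function is ever learned. The corridor task and all names below are invented for illustration:

```python
import math, random

N = 7  # states 0..6; start at 3, goal at 6

def evaluate(policy):
    """Score one episode: +1 at the goal, -0.04 per step, capped length."""
    s, ret = 3, 0.0
    for _ in range(50):
        s += policy[s]  # policy[s] is -1 (step left) or +1 (step right)
        ret -= 0.04
        if s <= 0:
            return ret        # fell off the left end: no goal reward
        if s >= 6:
            return ret + 1.0  # reached the goal
    return ret

policy = [random.choice((-1, 1)) for _ in range(N)]
temperature = 1.0
for step in range(2000):
    candidate = policy[:]
    candidate[random.randrange(N)] *= -1  # perturb: flip one action
    delta = evaluate(candidate) - evaluate(policy)
    # Metropolis rule: always accept improvements, sometimes accept losses
    if delta > 0 or random.random() < math.exp(delta / temperature):
        policy = candidate
    temperature *= 0.997  # cool down

print(policy)  # typically settles on "always go right" from the start state onward
```

The two sketches end up with similar behavior, but the first learns and reuses per-state estimates within an episode, while the second only ever sees a single scalar score for an entire policy.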