> Sort of. Hyperparameter tuning is indeed difficult, but in principle it can be automated and there has been great work on this in the past few years.

There's far more to architecture modification than just hyperparameter tuning; that's why people can keep turning out papers on the latest breakthrough. The gates of LSTMs are a non-trivial addition to RNNs. Attaching a differentiable stack or working out how best to do soft attention requires far more thought than just bashing at hyperparameters. CTC, dilated convolutions, residual connections, and extending auto-encoders to capture the data distribution were not trivial ideas. VGGNet and other conv nets are heavily manual designs, differing significantly from vanilla MLPs and not something that could be arrived at automatically (at least not yet). Things like FractalNet or recursive tree-based architectures differ greatly from vanilla networks.
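To make the LSTM point concrete, here's a rough sketch (plain NumPy; the names and shapes are my own, not from any particular paper) of a single LSTM step next to a vanilla RNN step. The gating structure is an architectural change, not something you reach by turning knobs on the RNN:

  import numpy as np

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  def lstm_step(x, h_prev, c_prev, W, b):
      # Input, forget, and output gates plus the candidate cell update
      # are all learned functions of [h_prev, x]; W has shape (4H, H+D).
      z = np.concatenate([h_prev, x])
      i, f, o, g = np.split(W @ z + b, 4)
      i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squashed to (0, 1)
      g = np.tanh(g)                                # candidate cell state
      c = f * c_prev + i * g                        # gated memory update
      h = o * np.tanh(c)                            # gated output
      return h, c

  def rnn_step(x, h_prev, W, b):
      # Vanilla RNN step for contrast: one squashed affine map, no gates.
      return np.tanh(W @ np.concatenate([h_prev, x]) + b)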

> Even if the number of layers and convolution sizes and learning rates are totally different, the principles are the same.

Again, the underlying captured algorithms are very different. If you could read the source code these neural net programs implicitly represent, it would vary from architecture to architecture at least as much as the code for, say, a merge sort, an insertion sort, a binary search, or an AVL tree does.

> This problem was solved in the past with temporal difference learning. Maybe there are technical reasons they didn't do that, I don't know.

The problem pointed out in the lecture, and part of why it took so long to find a good Go policy, was that successive moves differ so little from one another that it was hard to extract any useful signal without the brute-force decorrelation approach they took. There are probably better ways, but that was the shortest path available to progress.
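For anyone unfamiliar, the temporal difference idea mentioned above amounts to bootstrapping a state's value estimate from the next state's estimate instead of waiting for the final outcome. A minimal tabular TD(0) sketch (the episode format and learning rate here are illustrative assumptions, not anything from the lecture):

  from collections import defaultdict

  def td0_values(episodes, alpha=0.1, gamma=1.0):
      # Tabular TD(0): after each transition (s, r, s_next), nudge V(s)
      # toward the bootstrapped target r + gamma * V(s_next) rather than
      # waiting until the end of the game to assign credit.
      V = defaultdict(float)
      for episode in episodes:  # each episode is a list of (s, r, s_next)
          for s, r, s_next in episode:
              target = r + gamma * V[s_next]
              V[s] += alpha * (target - V[s])
      return dict(V)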



