This works for an architecture which has been well tuned and studied before, like LSTM or Transformer.

Once you do research on the model and start testing things out, it often tends to become a kwargs monster in many frameworks.

Having everything relevant in one file (even in the config file itself, alongside the hyperparameters) allows you to copy the file for every experiment and modify it in place. This avoids the kwargs mess. But then the config files are very complex and can become messy in other ways (especially for research projects). Example: https://github.com/rwth-i6/returnn-experiments/blob/master/2...
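To make the idea concrete, here is a rough sketch of what such a self-contained experiment file could look like. Everything in it is hypothetical (the file name, the hyperparameter names, and `build_network` are made up for illustration); it is not the actual RETURNN config format, just plain Python showing the pattern of keeping hyperparameters and model definition together in one file.

```python
# experiment_042.py: hypothetical self-contained experiment config.
# The idea: copy the baseline file, then edit this copy in place.
# Nothing here is a real framework API; names are illustrative only.

# Hyperparameters live right next to the model definition that uses them.
learning_rate = 1e-3
num_layers = 3
hidden_dim = 256
use_layer_norm = True  # the change under test in this copy


def build_network(input_dim: int, output_dim: int) -> list:
    """Return a plain-data description of the network, layer by layer."""
    layers = []
    dim = input_dim
    for i in range(num_layers):
        layers.append({
            "name": f"hidden_{i}",
            "type": "linear",
            "in": dim,
            "out": hidden_dim,
            "activation": "relu",
        })
        if use_layer_norm:
            # Experimental variation: edited directly here instead of
            # adding yet another keyword argument to shared baseline code.
            layers.append({"name": f"norm_{i}", "type": "layer_norm", "dim": hidden_dim})
        dim = hidden_dim
    layers.append({
        "name": "output",
        "type": "linear",
        "in": dim,
        "out": output_dim,
        "activation": None,
    })
    return layers


network = build_network(input_dim=40, output_dim=10)
```

In a shared library, a variation like `use_layer_norm` would typically become one more keyword argument that every caller has to know about. Here it is just a local edit in one copy of the file, and the baseline file stays untouched.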

Such an approach is much more flexible and does not mess with the baseline code. As you say, it's more like an evolutionary, DNA-like approach, where you then tend to do crossovers with other evolved, well-performing configs, etc.