> Let’s say you want to improve on a complex process where the physics is highly approximated (i.e., a “spherical cow” situation); you have a choice to input the data into a deep network that will (hopefully) output the desired result or you can train the network to find the correction in the approximate result. The latter method will almost certainly outperform the former.

This aligns with my experience (computational chem PhD). When applying a strong, general-purpose mathematical patch to an existing model, keep as much of the existing model as possible. Otherwise the patch has to relearn what the model already captures, will have a hard time fitting, and may even end up worse than what you started with. Philosophically, this also fits how I think about it: it's the modeling equivalent of Chesterton's Fence (https://en.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fence).
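To make the "fit the correction, not the whole thing" point concrete, here is a toy sketch. Everything in it is made up for illustration: `approx_model` stands in for the approximate physics, `true_process` for the real system, and a cubic polynomial fit stands in for the general-purpose patch (a deep network in the quoted comment). It only shows the mechanics under those assumptions, not a result about any real model.

```python
import numpy as np

def approx_model(x):
    # Hypothetical "spherical cow" physics: gets the oscillation right,
    # but is missing a smooth term.
    return np.sin(5.0 * x)

def true_process(x):
    # Hypothetical ground truth: the approximation plus the missing term.
    return np.sin(5.0 * x) + 0.3 * x**2

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

x_train = np.linspace(-2.0, 2.0, 40)
x_test = np.linspace(-1.9, 1.9, 200)
y_train = true_process(x_train)
y_test = true_process(x_test)

degree = 3  # deliberately limited capacity for the "patch"

# Option 1: ignore the existing model and fit the raw data directly.
direct_coeffs = np.polyfit(x_train, y_train, degree)
direct_pred = np.polyval(direct_coeffs, x_test)

# Option 2: fit only the residual (truth minus approximation),
# then add the learned correction back onto the existing model.
residual_train = y_train - approx_model(x_train)
delta_coeffs = np.polyfit(x_train, residual_train, degree)
delta_pred = approx_model(x_test) + np.polyval(delta_coeffs, x_test)

rmse_direct = rmse(direct_pred, y_test)
rmse_delta = rmse(delta_pred, y_test)
print(f"direct fit RMSE:     {rmse_direct:.3g}")
print(f"correction fit RMSE: {rmse_delta:.3g}")
```

In this contrived setup the correction is a simple quadratic, so the patch fits it essentially exactly, while the direct fit has to spend its limited capacity on an oscillation the existing model already handles. Real cases are messier, but the division of labor is the same.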