I remember a Nature paper that did essentially this, though I can't recall enough details to look it up quickly. The model had two parts: one was a simpler "transformation" part that involved little optimization per se, and the other was a more traditional deep learning (DL) model.