I mean this in the nicest possible way (so please don't take offense), but I think you're missing what is being said there. The framework itself should have no impact, positive or negative, on model accuracy. That being said, it can be extremely challenging to reproduce results given the stochasticity of random batches and asynchronous updates. Furthermore, precisely specifying the methods of data augmentation can be tedious, thus the protocol is often only partially detailed in published work which further exacerbates the challenges of reproducing results.
We could introduce a simple rule to get rid of this kind of problem: if it can't be reproduced with the data supplied then it isn't true so you can't publish.