
With squared loss, where it's easy for the loss to reach zero, then yes, it will have lots of global minima, all at a loss of zero. Losses that asymptote, like log loss, may have no minima at all.
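For concreteness, here's a toy numpy sketch (made-up one-dimensional data, not from the article) of both cases: a squared loss that hits exactly zero on a whole line of weight settings, and a log loss on separable data that keeps shrinking as the weight grows but never attains a minimum.

    import numpy as np

    # 1) Squared loss that can reach zero has many global minima: with two
    #    weights and a single data point, every w on the line w[0] + 2*w[1] = 3
    #    fits exactly, so the loss is 0 at all of them.
    x = np.array([1.0, 2.0])
    t = 3.0
    for w in [np.array([3.0, 0.0]), np.array([1.0, 1.0]), np.array([-1.0, 2.0])]:
        print(w, (w @ x - t) ** 2)          # all print 0.0

    # 2) Log loss on linearly separable data only asymptotes: scaling the
    #    weight up keeps lowering the loss toward 0 but never reaches it,
    #    so there is no finite minimizer.
    xs = np.array([-2.0, -1.0, 1.0, 2.0])
    ys = np.array([-1.0, -1.0, 1.0, 1.0])   # labels in {-1, +1}

    def logloss(w):
        # log(1 + exp(-y * w * x)), computed stably with logaddexp
        return np.mean(np.logaddexp(0.0, -ys * w * xs))

    for w in [1.0, 10.0, 100.0]:
        print(w, logloss(w))                # strictly decreasing toward 0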



Thanks. I guess my worry was that, once you are doing extremely well and your loss is very low, the gradients are no longer independent and the updates will tend to push the loss mostly up. Is this wrong?


I’m an ML newb, but I think this would be true only of a converged model. Your model could always technically diverge in another epoch if the learning rate is high enough and you process a batch of extreme outliers.
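A minimal sketch of that failure mode (all numbers made up): a 1-D linear model that is essentially converged on clean data, then takes one oversized SGD step on a pair of extreme outliers and ends up with a far worse loss.

    import numpy as np

    X = np.linspace(-1.0, 1.0, 100)
    y = 3.0 * X                        # clean data generated by w* = 3
    w = 2.999                          # essentially converged

    def sq_loss(w, X, y):
        return np.mean((w * X - y) ** 2)

    print("loss before:", sq_loss(w, X, y))    # tiny, ~3e-7

    # One SGD step on two extreme outliers with an oversized learning rate.
    X_out = np.array([50.0, -60.0])
    y_out = np.array([500.0, -800.0])          # nowhere near y = 3x
    grad = np.mean(2.0 * (w * X_out - y_out) * X_out)
    w = w - 0.5 * grad                         # lr = 0.5 is far too large here

    print("loss after:", sq_loss(w, X, y))     # jumps by many orders of magnitude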

Even then, you may have still converged on a local optimum, which was the takeaway I got from the article.



