The title is still a bit clickbaity, because ultimately this problem has very little impact on any real-world usage of stochastic gradient descent. That's especially true for the important use cases, where getting infinitesimally close to any particular local minimum is precisely what people do not care about (and reaching a global minimum is something we long ago gave up on).
It is a theoretically interesting result about the PPAD complexity class itself. I'm not sure it should be a STOC best paper, though; if it was nominated for that because of the weak link to machine learning, I'd be a bit disappointed.