In very large-scale sparse settings, the optimization strategy is tightly coupled to the modeling, not orthogonal to it. The reason is that "data beats algorithm": a "dumb" model trained on 1TB of data with a "dumb" optimizer will easily outperform a fancier model whose more complex optimizer can only digest 1GB of data in the same amount of time. This paper is a classic on the kind of trade-offs you have to deal with in large-scale web settings: http://papers.nips.cc/paper/3323-the-tradeoffs-of-large-scal...
The kernel trick uses an implicit mapping into a higher-dimensional feature space. Factorization Machines, on the other hand, use an explicit mapping into the polynomial kernel space. But they learn the "right" polynomial jointly, by mapping the base features into a low-dimensional dense space where the higher-order terms of the polynomial are dot products (we're not taking dot products in the original feature space!). FM thus learns the right polynomial (a non-convex task) jointly with the original supervised learning task.
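To make that concrete, here is a minimal sketch of a degree-2 FM prediction in numpy (variable names are my own, not taken from any reference implementation): each feature i gets a k-dimensional embedding, and the pairwise interaction weights are the dot products of those embeddings, computed in O(d*k) via the usual squared-sum identity.

    import numpy as np

    # Sketch of a degree-2 Factorization Machine prediction.
    # x: feature vector of length d
    # w0, w: first-order weights; V: d x k matrix of low-dimensional embeddings
    def fm_predict(x, w0, w, V):
        linear = w0 + x @ w
        # Pairwise term: sum_{i<j} <V[i], V[j]> x_i x_j, computed cheaply as
        # 0.5 * (|V^T x|^2 - sum_i |V[i]|^2 x_i^2).
        xv = x @ V
        pairwise = 0.5 * (xv @ xv - ((x ** 2) @ (V ** 2)).sum())
        return linear + pairwise

    rng = np.random.default_rng(0)
    d, k = 10, 3
    x = rng.random(d)
    w0, w, V = 0.1, rng.normal(size=d), rng.normal(size=(d, k))
    print(fm_predict(x, w0, w, V))

The non-convexity comes from the products of embedding entries inside that pairwise term; the "kernel" weights are learned rather than fixed by the feature map.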
Thanks for replying in spite of the fact that the audience will have moved on to other threads by now. I am totally with you on the claim that to deploy a solution one has to be fully cognizant of its optimization-theoretic implications. What I meant by orthogonal is that the properties of the model do not depend on which algorithm was used to optimize it, provided you reach a unique optimum. Here it is tricky to make the same claim because, although the objective is convex in each block of variables, it is not jointly convex.
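A toy example of that distinction (my own, just to illustrate the blockwise-vs-joint convexity point): f(a, b) = (1 - a*b)^2, which mimics an FM-style product of parameters, is convex in a for fixed b and in b for fixed a, yet its Hessian at the origin is indefinite.

    import numpy as np

    # f(a, b) = (1 - a*b)^2
    # d2f/da2 = 2*b^2 >= 0 and d2f/db2 = 2*a^2 >= 0 (convex in each block),
    # but the full Hessian can be indefinite, so f is not jointly convex.
    def hessian(a, b):
        return np.array([
            [2 * b**2,      4 * a * b - 2],
            [4 * a * b - 2, 2 * a**2],
        ])

    print(np.linalg.eigvalsh(hessian(0.0, 0.0)))  # [-2.  2.] -> indefinite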
Where I am not with you is in the hint that kernels have feature maps that are necessarily opaque. In fact, inhomogeneous polynomial kernels are great examples where the feature map is known in closed form. Although such a map is available, it is not always efficient to use it. But what it lets you do is optimize the weights of each dimension of the mapped feature space while executing in the native space. In fact, and I am greatly tickled by the coincidence, I had sent a mail to colleagues where I suggested a form in which the standard Euclidean dot product in the poly kernel is replaced by a convex combination of other favourite kernels of choice. The training algorithm remains almost the same. I was not aware of this piece of work, but the mechanism is pretty much the same (a little more general: it essentially replaces the standard dot product that appears in the kernel expression by other kernel expressions; if you are thinking recursion now, well, that's intended) and can be pushed further.
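A rough sketch of that construction, in my own notation (the function names and parameters here are illustrative, not from any paper): take the inhomogeneous polynomial kernel (<x, z> + c)^p and swap the Euclidean dot product for a convex combination of base kernels. Since each base kernel could itself be built the same way, the recursion falls out naturally.

    import numpy as np

    # Build a polynomial-style kernel whose "dot product" is a convex
    # combination of arbitrary base kernels.
    def poly_over_kernels(base_kernels, alphas, c=1.0, degree=2):
        alphas = np.asarray(alphas, dtype=float)
        assert np.all(alphas >= 0) and np.isclose(alphas.sum(), 1.0)
        def k(x, z):
            inner = sum(a * kf(x, z) for a, kf in zip(alphas, base_kernels))
            return (inner + c) ** degree
        return k

    # Example base kernels: the plain dot product and an RBF.
    dot = lambda x, z: float(np.dot(x, z))
    rbf = lambda x, z, gamma=0.5: float(np.exp(-gamma * np.sum((x - z) ** 2)))

    k = poly_over_kernels([dot, rbf], alphas=[0.7, 0.3], c=1.0, degree=3)
    x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
    print(k(x, z))

Since convex combinations and positive integer powers of PSD kernels are again PSD, the result stays a valid kernel and plugs into the same training algorithm.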