
If you read the Google paper, you'll notice that they actually refer to this and other work by Liu et al. The overall technique is the same: estimate the original camera path, compute an optimal camera path, then retarget the input frames to a crop window that follows the optimal path.
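
If I'm reading the Google paper right, the retargeting step boils down to a per-frame warp relating the two paths. A minimal sketch (Python; assuming both paths are represented as 3x3 homogeneous matrices, which is my framing, not a quote from the paper):

    import numpy as np

    def crop_transforms(original_path, optimal_path):
        # B_t = inv(C_t) @ P_t: maps the stabilized crop window
        # back into input-frame coordinates, where C_t is the
        # estimated camera path and P_t is the optimal path.
        return [np.linalg.inv(C) @ P
                for C, P in zip(original_path, optimal_path)]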

The primary difference seems to be the estimation and calculation technique. Liu's work does a structure-from-motion reconstruction, i.e. it rebuilds a 3D model of the original scene. Google's work instead uses something called pyramidal Lucas-Kanade to do feature tracking. This is a sort of localized reconstruction: it only cares about how the viewport moves from frame to frame. They then feed the result through some linear programming voodoo to get the best path.
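
For the curious, pyramidal Lucas-Kanade is available off the shelf in OpenCV. A rough sketch of per-frame motion estimation (assumes grayscale frames; the parameter values are my guesses, not the paper's, and the similarity-model fit at the end is just one reasonable choice):

    import cv2

    def frame_to_frame_motion(prev_gray, curr_gray):
        # Pick corners worth tracking in the previous frame.
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=30)
        # Pyramidal Lucas-Kanade: track those corners into the current frame.
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
        ok = status.flatten() == 1
        # Fit a 2x3 similarity transform (translation/rotation/scale)
        # to the surviving matches; chaining these per-frame transforms
        # gives an estimate of the original camera path.
        m, _ = cv2.estimateAffinePartial2D(pts[ok], nxt[ok])
        return m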

I don't understand either well enough to say why one is better than the other, although I'd guess that because Lucas-Kanade is temporally and spatially localized, it's easier to farm out to a parallel cluster than an SfM technique.

There also seems to be a difference at the back end of the technique: having feature detection allows them to add 'saliency' constraints, i.e. retarget so that certain features, like a person's face, stay inside the crop. Again, the math is beyond my understanding, but this doesn't seem to be part of Liu's work.
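
The paper folds saliency into the linear program itself, but the basic idea is easy to illustrate with a naive post-hoc version (1D, x-axis only; a hypothetical helper of mine, not the paper's method):

    def keep_face_in_crop(crop_x, crop_w, face_x0, face_x1, frame_w):
        # Shift the crop window the minimum amount needed so the
        # detected face [face_x0, face_x1] stays inside it.
        if face_x0 < crop_x:
            crop_x = face_x0
        elif face_x1 > crop_x + crop_w:
            crop_x = face_x1 - crop_w
        # Never let the window leave the frame.
        return max(0, min(crop_x, frame_w - crop_w))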



