The idea is that an FM (factorization machine) is a way to learn a polynomial kernel by representing the weight of each high-order term as a low-dimensional dot product. It improves generalization because the weights of the kernel terms are learned jointly, through shared latent vectors, instead of separately. That helps a lot on sparse combinations of features, where a given pair of features may appear together only a few times in the training data.
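To make the "low-dimensional dot product" concrete, here is a minimal sketch of the second-order interaction term only, in plain Python. The function names, the toy input, and the latent dimension `k = 2` are my own assumptions for illustration, not taken from any particular FM library. Each feature `i` gets a latent vector `V[i]`, and the weight of the pair `(i, j)` is the dot product of `V[i]` and `V[j]` rather than a free parameter.

```python
import random


def fm_pairwise_naive(x, V):
    """Sum over all pairs i < j of <V[i], V[j]> * x[i] * x[j]."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # The pair weight is not stored; it is derived from shared vectors.
            w_ij = sum(a * b for a, b in zip(V[i], V[j]))
            total += w_ij * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same value in O(n * k), using
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    n, k = len(x), len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        total += s * s - s_sq
    return 0.5 * total


# Hypothetical toy setup: 6 features, latent dimension 2, sparse input.
random.seed(0)
n, k = 6, 2
V = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
x = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]

naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
```

With `n` features this uses `n * k` parameters instead of `n * (n - 1) / 2` independent pair weights. Because every pair involving feature `i` reuses `V[i]`, an update from one observed pair also moves the implied weights of pairs that were rarely or never observed together, which is the joint learning described above.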