For small filter sizes, direct convolution is going to be faster than an FFT approach. Plus, correct me if I'm wrong, but you need to perform a separate convolution for every output pixel, with a different filter kernel each time (the Lanczos filter is sampled at different points depending on the resample ratio), which would really slow down an FFT approach.
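To make the "different kernel per output pixel" point concrete, here is a minimal 1-D sketch in Python with NumPy. The function names, the pixel-centre coordinate mapping, the a=3 support, and the edge clamping are all assumptions for illustration, and it only covers upsampling (the kernel is not stretched for downsampling).

```python
import numpy as np

def lanczos_kernel(x, a=3):
    """Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def lanczos_weights(out_index, ratio, a=3):
    """Input tap indices and normalised weights for one output sample.

    ratio = input_length / output_length (assumed convention).
    """
    # Position of this output sample in input coordinates (pixel centres).
    center = (out_index + 0.5) * ratio - 0.5
    left = int(np.floor(center)) - a + 1
    taps = np.arange(left, left + 2 * a)
    # The kernel is sampled at offsets that depend on the fractional
    # position, so the weights change from one output sample to the next.
    weights = lanczos_kernel(taps - center, a)
    return taps, weights / weights.sum()

def resample_1d(signal, out_len, a=3):
    """Upsample a 1-D signal, building a fresh set of weights per output."""
    signal = np.asarray(signal, dtype=float)
    ratio = len(signal) / out_len
    out = np.empty(out_len)
    for i in range(out_len):
        taps, w = lanczos_weights(i, ratio, a)
        taps = np.clip(taps, 0, len(signal) - 1)  # clamp at the edges
        out[i] = np.dot(signal[taps], w)
    return out
```

Calling `lanczos_weights(0, 0.4)` and `lanczos_weights(1, 0.4)` gives two different weight vectors, which is why this is not a single fixed-kernel convolution that one FFT multiply could replace.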
Beyond the mismatched signal size, the FFT approach produces circular convolution, which is not what you want for images. You'd need to pad to a larger effective signal size, and you'd pay the cost of converting back and forth.
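A small NumPy sketch of the wrap-around problem, using made-up example values: multiplying FFTs at the signal's own length gives circular convolution, while zero-padding both inputs to len(signal) + len(kernel) - 1 recovers the ordinary linear result.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([1.0, 1.0, 1.0])

# Direct linear convolution: length len(signal) + len(kernel) - 1 = 6.
linear = np.convolve(signal, kernel)

# FFT product at the signal's own length: circular convolution.
# The tail of the linear result wraps around onto the first samples.
m = len(signal)
circular = np.fft.irfft(np.fft.rfft(signal, n=m) * np.fft.rfft(kernel, n=m), n=m)

# Zero-padding both inputs to the full output length avoids the wrap-around.
n = len(signal) + len(kernel) - 1
padded = np.fft.irfft(np.fft.rfft(signal, n=n) * np.fft.rfft(kernel, n=n), n=n)

print(linear)    # [1. 3. 6. 9. 7. 4.]
print(circular)  # first two samples contaminated by the wrapped tail
print(padded)    # matches linear up to floating-point error
```

For an image, the wrapped samples mean the left edge bleeds into the right edge (and top into bottom) unless you pad first.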
It's a bit of a misnomer to talk about a distinction between convolution and FFT. By the convolution theorem, direct convolution and FFT-based convolution compute the same result (given enough zero-padding). In addition, on paper, FFT-based convolution scales much better than direct convolution: when the kernel is comparable in size to the signal, it reduces the complexity from O(n^2) to O(n log n).
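The O(n^2) figure applies when the kernel is about as long as the signal; with a k-tap kernel, direct convolution costs roughly n*k multiply-adds. The back-of-envelope model below is only a sketch: the function names are hypothetical, and the 5 * N * log2(N) cost per transform is an assumed rule-of-thumb constant, not a measurement. Real crossover points depend heavily on the implementation and hardware.

```python
import math

def direct_cost(n, k):
    """Multiply-adds for direct convolution: one k-tap dot product per output."""
    return n * k

def fft_cost(n, k, c=5.0):
    """Rough cost of FFT-based convolution.

    Counts three transforms (two forward, one inverse) of the padded
    length N = n + k - 1, each costed at c * N * log2(N). The constant c
    is an assumption, not a measured value.
    """
    N = n + k - 1
    return 3 * c * N * math.log2(N)

def crossover_kernel_size(n):
    """Smallest kernel length at which the FFT estimate beats the direct one."""
    for k in range(1, n + 1):
        if fft_cost(n, k) < direct_cost(n, k):
            return k
    return None  # the FFT estimate never wins for this n

n = 1024
print(direct_cost(n, 6), fft_cost(n, 6))        # small kernel
print(direct_cost(n, 1024), fft_cost(n, 1024))  # kernel as long as the signal
print(crossover_kernel_size(n))
```

Under this model a 6-tap kernel (Lanczos with a=3) sits far on the direct-convolution side, while a kernel as long as the signal sits on the FFT side, so both views in this thread can hold at once.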
I haven't done the benchmarks for a fully optimised implementation, but comparing naive implementations, you can easily tell that the FFT-based one is much faster (even with all of the tricks that MATLAB does to optimise sparse matrix operations).