Of course, the problem with these "synthetic" datasets are when the translation models overfit to the problems in these multilingual sentence encoders/embedders.
Still, it lets you massively increase the amount of data you're training on, which is generally worth it.
Still, it lets you massively increase the amount of data you're training on, which is generally worth it.