Of course, the problem with these "synthetic" datasets are when the translation ...

whimsicalism on Oct 23, 2020 | parent | context | favorite | on: First AI model that translates 100 languages witho...

Of course, the problem with these "synthetic" datasets are when the translation models overfit to the problems in these multilingual sentence encoders/embedders.

Still, it lets you massively increase the amount of data you're training on, which is generally worth it.