I was using the automatic translation for Mongolian on Facebook recently, and I couldn't make much sense of the output. I wonder what kind of translation software is being used there.
Most of the models announced in papers can't be deployed in production because they are not optimised for inference efficiency. For example, the public Google Translate service is not as good as the SOTA models in papers.
Just for research. Most research models are never used for anything else. Imagine having to serve a model that requires 2 × 32 GB GPUs to billions of users.
Text-to-speech is also much worse in deployment than in research. Recent research models have much better intonation.
GPT-3 is the worst offender here: it's so big that it's almost uneconomical to run, and certainly impossible to offer for free (estimated requirements are 11 Tesla V100 GPUs).
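For what it's worth, that 11-GPU figure is roughly what you get from a back-of-envelope calculation over the weights alone. This is my own sketch of the arithmetic, not the original estimator's method: it assumes fp16 weights, the 32 GB V100 variant, and ignores activations and serving overhead.

```python
import math

# GPT-3 has 175 billion parameters; at fp16 each weight takes 2 bytes.
params = 175e9
bytes_per_param = 2  # fp16 assumption
weight_memory_gb = params * bytes_per_param / 1e9  # 350 GB of weights

v100_memory_gb = 32  # the 32 GB V100 variant
gpus_needed = math.ceil(weight_memory_gb / v100_memory_gb)
print(gpus_needed)  # 11 GPUs just to hold the weights
```

In practice you'd need even more headroom for activations and batching, which only strengthens the "uneconomical to run" point.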
They could have sold this as a service to those who need higher-quality translation; think about the GPU time spent building these models. (Laymen like me would otherwise think all that effort is being wasted.) Practical usage is also a kind of test, isn't it?
Between German and English, one example that stuck with me was it confusing "farmer" and "builder", because within German compound words both map to "-bauer" (e.g. "Straßenbauer", a road builder). Cue a number of auto-translated job adverts advertising positions as a "street farmer" and suchlike…