As I mentioned in another comment, the library just allows the models to be run in the browser. The models generally give the same outputs as their PyTorch equivalents, so the quality can (for the most part) be blamed on the original model.
Also, remember to play around with the generation parameters. Some tasks like code completion and speech-to-text work best with greedy decoding (sample=false, top_k=0), while others like open-ended text generation work best with random sampling (sample=true, top_k>0).