Yes! Perplexity is a great idea, although you could technically have a low-perplexity prediction that is not similar to the ground-truth transcription.
CER is definitely more granular. There are also papers that weight the error types when calculating WER, for example counting each deletion (D) as 0.5 instead of 1 because they consider deletions "less bad". But if these weights aren't standardized, WER scores will be very hard to compare.
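To make the weighting point concrete, here is a rough sketch of a weighted WER, assuming whitespace tokenization and a plain edit-distance alignment. The function name `weighted_wer` and the weight parameters are made up for illustration and are not taken from any particular paper or library.

```python
def weighted_wer(reference, hypothesis, sub_w=1.0, del_w=1.0, ins_w=1.0):
    """Weighted word error rate (illustrative sketch, not a standard API).

    With all weights at 1.0 this reduces to the usual WER = (S + D + I) / N.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum weighted cost of turning ref[:i] into hyp[:j]
    dp = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = i * del_w          # delete every reference word
    for j in range(1, len(hyp) + 1):
        dp[0][j] = j * ins_w          # insert every hypothesis word

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            diag = 0.0 if ref[i - 1] == hyp[j - 1] else sub_w
            dp[i][j] = min(
                dp[i - 1][j - 1] + diag,   # match or substitution
                dp[i - 1][j] + del_w,      # deletion
                dp[i][j - 1] + ins_w,      # insertion
            )

    return dp[-1][-1] / len(ref)


reference = "the cat sat on the mat"
hypothesis = "the cat on the mat"      # one deleted word

print(weighted_wer(reference, hypothesis))             # standard weights
print(weighted_wer(reference, hypothesis, del_w=0.5))  # deletions count half
```

The same hypothesis scores 1/6 with standard weights and 0.5/6 with half-weight deletions, so two papers using different weights would report different numbers for identical output.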
Personally, I think a metric that includes some type of perplexity is the way to go.