> since less common words tend to be more important for meaning. Exactly. Errors...

lunixbochs · on Sept 10, 2021

You could weight insertions by how much perplexity they add (sum), deletions by how much perplexity they remove (-sum), and replacements by the size of the perplexity difference for the replaced word (abs(sum)). Then report this as a 4-part score: the combined mean, followed by separate insertion/deletion/replacement (i/d/r) scores. Lower is better.

Theory being you don't want to add or remove confusing words, but common stop words are less of an issue.
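
A rough sketch of how the scoring above might look, under several assumptions that are mine and not part of the proposal: a toy add-one-smoothed unigram model stands in for a real LM, per-word surprisal in bits stands in for "perplexity added or removed", each part is normalized by reference length, and "combined mean" is read as the plain average of the three parts. The names `make_surprisal`, `align`, and `weighted_score` are hypothetical.

```python
import math
from collections import Counter


def make_surprisal(corpus_words):
    """Toy unigram stand-in for an LM: surprisal in bits, add-one smoothed."""
    counts = Counter(corpus_words)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    def surprisal(word):
        return -math.log2((counts[word] + 1) / (total + vocab))

    return surprisal


def align(ref, hyp):
    """Word-level Levenshtein alignment as (op, ref_word, hyp_word) tuples."""
    n, m = len(ref), len(hyp)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover one minimal edit path.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            op = "equal" if ref[i - 1] == hyp[j - 1] else "replace"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, hyp[j - 1]))
            j -= 1
    return ops[::-1]


def weighted_score(ref, hyp, surprisal):
    """4-part score: combined mean, then insertion/deletion/replacement."""
    ins = dele = rep = 0.0
    for op, r, h in align(ref, hyp):
        if op == "insert":
            ins += surprisal(h)    # surprisal added by the extra word
        elif op == "delete":
            dele += surprisal(r)   # surprisal removed with the lost word
        elif op == "replace":
            rep += abs(surprisal(h) - surprisal(r))
    n = max(len(ref), 1)           # normalize by reference length (assumption)
    ins, dele, rep = ins / n, dele / n, rep / n
    return {
        "combined": (ins + dele + rep) / 3,
        "insertion": ins,
        "deletion": dele,
        "replacement": rep,
    }


# Tiny made-up corpus in which "the" is common and "cat" is rare.
corpus = "the the the the cat sat on the mat the dog".split()
surprisal = make_surprisal(corpus)
ref = "the cat sat on the mat".split()
drop_the = weighted_score(ref, "cat sat on the mat".split(), surprisal)
drop_cat = weighted_score(ref, "the sat on the mat".split(), surprisal)
```

With this toy corpus, dropping "cat" gets a larger deletion score than dropping "the", which is the intended effect. Note that every word is scored on its own here, with no context.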

I'm not sure how this interacts with a multi-word replacement, where the new words make sense together but each one on its own makes no sense to the LM.