In my experience with data quality management, manually translating these edge cases is not pleasant, yet it can be very valuable. It's a bit like "online learning" in machine learning: each time an error is found, you provide the correct answer, and that correction is applied from then on. Yes, you might end up with a long array of phrases/regexes to check against. However, the list only grows as fast as the errors you actually find in your data, and it gives high-quality results.
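
To make the idea concrete, here is a minimal sketch in Python of what such a correction list could look like. Everything in it is an assumption for illustration: the names `OVERRIDES`, `translate` and `add_override` are hypothetical, and the patterns and corrected values are made-up examples rather than anything from your data.

```python
import re

# Hypothetical correction list: each entry pairs a regex with the value a
# human decided is correct. The entries below are made-up examples.
OVERRIDES = [
    (re.compile(r"n/?a", re.IGNORECASE), None),
    (re.compile(r"new york( city)?", re.IGNORECASE), "New York"),
]


def translate(raw, fallback):
    """Return the manual correction for `raw` if one exists.

    Otherwise defer to `fallback`, which stands in for whatever automatic
    translation you already have.
    """
    text = raw.strip()
    for pattern, corrected in OVERRIDES:
        # fullmatch: the whole value must match, not just a substring
        if pattern.fullmatch(text):
            return corrected
    return fallback(text)


def add_override(regex, corrected):
    """Record a newly found error together with its correct answer."""
    OVERRIDES.append((re.compile(regex, re.IGNORECASE), corrected))
```

Each time a reviewer spots a bad result, they call something like `add_override(r"nyc", "New York")`, and every later occurrence of that value is handled correctly. A linear scan over the list is fine while it stays in the hundreds or low thousands of entries; if it grows well beyond that, exact phrases could move into a dictionary lookup, with regexes kept only for the cases that really need them.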