In my experience with data quality management, manually translating these edge cases is not pleasant, yet it can be very valuable. It's a bit like "online learning" in machine learning: each time an error is found, you provide the correct answer. Yes, you might end up with a long array of phrases/regexes to check against. However, that list grows only as errors turn up, so it scales with the amount of data you have, and it provides high-quality results.
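
To make the idea concrete, here is a minimal sketch of such a correction list in Python. Everything in it is an assumption for illustration: the names `CORRECTIONS`, `clean`, and `add_correction` are hypothetical, and the sample patterns (country names, "N/A" markers) are made up rather than taken from your data.

```python
import re

# Hypothetical correction table: each entry pairs a compiled pattern with the
# value that matching input should be mapped to. The entries are illustrative.
CORRECTIONS = [
    (re.compile(r"^\s*n/?a\s*$", re.IGNORECASE), None),
    (re.compile(r"^\s*(u\.?s\.?a?\.?|united states)\s*$", re.IGNORECASE), "US"),
]


def clean(value, corrections=CORRECTIONS):
    """Return the replacement for the first matching pattern, else the value unchanged."""
    for pattern, replacement in corrections:
        if pattern.search(value):
            return replacement
    return value


def add_correction(pattern, replacement, corrections=CORRECTIONS):
    """Record a newly discovered edge case so it is handled from now on."""
    corrections.append((re.compile(pattern, re.IGNORECASE), replacement))


# Each time a bad value shows up, supply the correct answer once.
add_correction(r"^\s*deutschland\s*$", "Germany")
```

The first matching pattern wins, so more specific patterns should be added ahead of broader ones if they ever overlap. In practice you would probably keep the table in a file or database instead of in code, so that whoever reviews the errors can extend it without a deployment.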