Even for languages written in the Latin alphabet, you would typically want to know the language, for two reasons: to reduce the character set (e.g. telling a smudgy ã, â, ā, ä or á apart is often trivial if you know the language but can be quite hard otherwise), and to apply a proper language model when disambiguating individual character guesses.
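As a rough illustration of both effects, here is a minimal Python sketch. It is not any particular OCR engine's method. It assumes a glyph classifier that returns probabilities for one smudged "a"-like character, deliberately simplified per-language sets of "a" variants (not complete alphabets), and a toy word-frequency table standing in for a real language model. All names, probabilities and frequencies are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical, simplified data: which "a" variants each language uses
# (NOT complete alphabets) and made-up word frequencies standing in for
# a real language model.
CHARSETS: Dict[str, Set[str]] = {
    "pt": {"a", "á", "â", "ã", "à"},  # Portuguese
    "de": {"a", "ä"},                 # German
    "lv": {"a", "ā"},                 # Latvian
}
LEXICON: Dict[str, Dict[str, float]] = {
    "pt": {"não": 1e-2},
    "de": {"Bär": 1e-5},
}
UNKNOWN_WORD_PROB = 1e-7  # assumed floor for words missing from the lexicon

# Hypothetical classifier output for one smudged glyph.
SMUDGED_A: Dict[str, float] = {"ä": 0.30, "ã": 0.25, "â": 0.20, "á": 0.15, "ā": 0.10}


def restrict_to_charset(candidates: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Drop glyph guesses outside the language's charset and renormalise."""
    kept = {ch: p for ch, p in candidates.items() if ch in allowed}
    total = sum(kept.values())
    return {ch: p / total for ch, p in kept.items()} if total > 0 else {}


def best_reading(prefix: str, candidates: Dict[str, float], suffix: str, lang: str) -> Tuple[str, float]:
    """Pick the word maximising P(glyph) * P(word) under the toy unigram model."""
    restricted = restrict_to_charset(candidates, CHARSETS[lang])
    words = LEXICON.get(lang, {})
    best_word, best_score = "", 0.0
    for ch, p_glyph in restricted.items():
        word = prefix + ch + suffix
        score = p_glyph * words.get(word, UNKNOWN_WORD_PROB)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


if __name__ == "__main__":
    # Language-agnostic choice: just take the classifier's top glyph.
    top = max(SMUDGED_A, key=SMUDGED_A.get)
    print("without language:", "n" + top + "o")
    # With the language known, both the charset and the lexicon help.
    print("Portuguese:", best_reading("n", SMUDGED_A, "o", "pt")[0])
    print("German:", best_reading("B", SMUDGED_A, "r", "de")[0])
```

Without the language, the top glyph guess yields "näo"; knowing the text is Portuguese removes ä and ā from consideration and the lexicon then strongly favours "não", while the same smudge inside "B?r" in German can only be ä. A real system would use a proper n-gram or neural language model over whole lines rather than a single-word lookup.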