> If that's the case the winning move is get the bytes, and convert them to UTF8...

Ygg2 · 2024-06-15T13:26:27.000000Z

> That would require knowing the original encoding.

If you don't know, that's one more reason to get the bytes and try to figure out the encoding, usually with a lib like encodings.rs or WTF8.rs.

> As long as the APIs you have to use take bytes and not utf8 strings.

You can convert one into the other, albeit converting to str requires a check.
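For illustration, here's a minimal sketch of both directions using only the standard library. The helper names (`to_bytes`, `to_str`, `to_string_lossy`) are made up for this example; the calls underneath are `str::as_bytes`, `std::str::from_utf8` and `String::from_utf8_lossy`.

```rust
use std::str::Utf8Error;

/// str -> bytes: always succeeds, no check and no copy,
/// because a `str` is guaranteed to hold valid UTF-8.
fn to_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

/// bytes -> str: the bytes are validated first, so this
/// returns an error if they are not well-formed UTF-8.
fn to_str(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Lossy fallback: invalid sequences are replaced with U+FFFD
/// instead of producing an error.
fn to_string_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

The asymmetry is the point: going to bytes is free, going back costs a validation pass and can fail, so an API that takes bytes is the less restrictive one.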

dralley · 2024-06-15T01:19:58.000000Z

> That would require knowing the original encoding.

Or just use a library that can detect the encoding and spit out UTF-8. There are several of those.

josefx · 2024-06-15T13:16:20.000000Z

Yes, you can try to automatically guess the wrong encoding based on statistics that only work some of the time when given large amounts of free-flowing text.
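A small sketch of why the bytes alone don't settle it: the same byte sequence can be valid in more than one encoding, so a detector has to pick by likelihood. The helpers below are hypothetical and use only the standard library; Latin-1 (ISO-8859-1) is decoded by hand, since each of its bytes maps to the Unicode code point with the same value.

```rust
/// Decode bytes as ISO-8859-1 (Latin-1). This never fails:
/// every byte value 0x00..=0xFF maps to the code point of the same value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Decode bytes as UTF-8, returning None if they are not well-formed.
fn decode_utf8(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```

Feed both the bytes `63 61 66 C3 A9`: `decode_utf8` gives "café" and `decode_latin1` gives "cafÃ©". Both decodes succeed, and nothing in the bytes says which one the author meant. With five bytes there isn't much for statistics to work with either.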