> If that's the case the winning move is get the bytes, and convert them to UTF8
That would require knowing the original encoding.
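To make that concrete, here is a minimal sketch using only the Rust standard library. The helper names `decode_latin1` and `decode_utf8` are made up for illustration. It shows that the same bytes give different text depending on which encoding you assume, and nothing in the bytes says which assumption is right.

```rust
/// Decode bytes as ISO-8859-1 (Latin-1): each byte maps to the code point
/// with the same numeric value, so this can never fail.
/// (Hypothetical helper, not a standard library function.)
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Decode bytes as UTF-8, returning None when they are not valid UTF-8.
/// (Hypothetical helper wrapping std::str::from_utf8.)
fn decode_utf8(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```

With these, the two bytes `0xC3 0xA9` are "é" under `decode_utf8` but "Ã©" under `decode_latin1`, and both results are well-formed text. Going the other way, `0x63 0x61 0x66 0xE9` is "café" in Latin-1 and is rejected outright as UTF-8.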
> or just process it as bytes
As long as the APIs you have to use take bytes and not UTF-8 strings.
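A rough sketch of what "process it as bytes" looks like when the API cooperates. The function `extension` is a hypothetical helper written for this example, not something from the standard library. It works on `&[u8]`, so it never needs the input to be valid UTF-8.

```rust
/// Return the part of a file name after the last '.', working on raw bytes.
/// (Hypothetical helper; it makes no assumption about the text encoding
/// beyond '.' being the single byte 0x2E.)
fn extension(name: &[u8]) -> Option<&[u8]> {
    // rposition searches from the end for the last '.' byte.
    let dot = name.iter().rposition(|&b| b == b'.')?;
    Some(&name[dot + 1..])
}
```

For the Latin-1 name `b"caf\xE9.txt"` this returns the bytes of `txt`, while `std::str::from_utf8` on the same input returns an error. An API that only accepts `&str` would force a lossy or guessed conversion before you could even call it.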
> Modern systems should be able to convert various encodings at GiB/s into UTF8.
They might even guess the correct source encoding a quarter of the time, and the conversion will break any legacy application that still has to interact with the data. I guess that would be a win-win situation for Rust.
Yes, you can try to automatically guess the wrong encoding based on statistics that only work some of the time, and only when given large amounts of free-flowing text.