What kind of overhead does gzip have? I'd be interested to know how many characters you could fit into 80 compressed characters. Some mapping that packs roughly three lowercase letters into each (2-byte?) Unicode character could be effective. Is there any standard scheme like that?
This is an interesting question. I haven't researched the overheads.
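For a rough feel, though, here's an untested sketch off the top of my head (stdlib Python, made-up example string) comparing the gzip container against plain zlib and a raw DEFLATE stream; gzip's fixed ~10-byte header plus 8-byte trailer is most of the overhead at these sizes:

```python
import gzip
import zlib

# Hypothetical example string, just to have something to measure.
text = "new study shows that eating chocolate reduces anxiety in mice"
raw = text.encode("utf-8")

gz = gzip.compress(raw)   # gzip container: ~10-byte header + 8-byte CRC/size trailer
zl = zlib.compress(raw)   # zlib container: 2-byte header + 4-byte Adler-32 checksum
co = zlib.compressobj(9, zlib.DEFLATED, -15)  # negative wbits = raw DEFLATE, no container
bare = co.compress(raw) + co.flush()

print(f"original: {len(raw):3d} bytes")
print(f"gzip:     {len(gz):3d} bytes")
print(f"zlib:     {len(zl):3d} bytes")
print(f"deflate:  {len(bare):3d} bytes")
```

For the packing-into-characters part, there are schemes like base65536 that stuff roughly 2 bytes of binary into each Unicode code point so that character-limited channels carry more data, though I wouldn't call them standard.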
Separately, I feel like it should be possible to eliminate certain "filler" words in a lossy fashion and add them back later based on grammatical rules. For example, you don't need to say "in mice", you could just say "mice", the meaning is obvious, and "in" could be fixed in post-processing at the client end.
You could also quite possibly eliminate all vowels and still reconstruct everything accurately.
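Something like this toy sketch (hypothetical word list, not a real model) is what I have in mind: strip the vowels on the way out, then map the consonant skeletons back against a lexicon on the client.

```python
import re

VOWELS = re.compile(r"[aeiou]+", re.IGNORECASE)

def strip_vowels(word: str) -> str:
    # Lossy step: keep the first letter, drop every vowel after it
    # (so "in" stays distinguishable instead of collapsing to "n").
    return word[0] + VOWELS.sub("", word[1:]) if word else word

# Toy restore step: a tiny made-up lexicon mapping skeletons back to words.
# A real client would need a proper dictionary or language model here.
LEXICON = ["new", "study", "shows", "effect", "in", "mice"]
SKELETONS = {strip_vowels(w): w for w in LEXICON}

def restore(text: str) -> str:
    return " ".join(SKELETONS.get(tok, tok) for tok in text.split())

squeezed = " ".join(strip_vowels(w) for w in "new study shows effect in mice".split())
print(squeezed)           # nw stdy shws effct in mc
print(restore(squeezed))  # new study shows effect in mice
```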
> For example, you don't need to say "in mice", you could just say "mice", the meaning is obvious, and "in" could be fixed in post-processing at the client end.
> You could also quite possibly eliminate all vowels and still reconstruct everything accurately.
My guess is that 10 minutes after that is rolled out, someone will have found a collision that decompresses to some kind of dirty joke.