you dont need 256 codepoints so you can neatly represent an octet (whatever that is), you just need 2 bits. you can just stack as many diacritical marks you want on any glyph. either the renderer allows practically unlimited or it allows 1/none. in either case that's a vuln. what would be really earth shattering is what i was hoping this article was: a way to just embed "; rm -rf ~/" into text without it being rendered. you also definitely dont need rust for this unless you want to exclude 90% of the programmer population.
I think the Rust is more readable for bytemucking stuff than dynamic languages because the reader doesn't have to infer the byte widths, but for what it's worth the demo contains a TypeScript implementation: https://github.com/paulgb/emoji-encoder/blob/main/app/encodi...
An octet is a group of 8 bits. Today we normally use the word "byte" instead. The term is often used in older internet protocols and comes from an era where bytes were not necessarily 8 bits.