Smuggling arbitrary data through an emoji (paulbutler.org)
767 points by paulgb 9 days ago | 195 comments





Oh this is just the tip of the iceberg when it comes to abusing Unicode! You can use a similar technique to this to overflow the buffer on loads of systems that accept Unicode strings. Normally it just produces an error and/or a crash but sometimes you get lucky and it'll do all sorts of fun things! :)

I remember doing penetration testing waaaaaay back in the day (before Python 3 existed) and using mere diacritics to turn a single character into many bytes that would then overflow the buffer of a back-end web server. This only ever caused it to crash (and usually auto-restart) but I could definitely see how this could be used to exploit certain systems/software with enough fiddling.
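To make the byte blow-up concrete, here is a minimal Python sketch (the particular combining marks are arbitrary): a single on-screen "character" that is over a hundred code points and twice as many UTF-8 bytes.

    # One visible "e" stacked with combining marks, "Zalgo" style.
    zalgo_e = "e" + "\u0301\u0316\u0327\u0341\u0358" * 20  # 100 combining marks

    print(len(zalgo_e))                   # 101 code points
    print(len(zalgo_e.encode("utf-8")))   # 201 bytes: each mark here is 2 bytes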


This was the premise of a Google CTF quals 2024 challenge ("encrypted runner").

Yeah. Zalgo text is a common test for input fields on websites, but it usually doesn't do anything interesting. Maybe an exception triggers on some database length limit; it doesn't typically even kill any processes, the exception is normally just in your thread. You can often trigger it just by disabling JS, even on modern forms, but at best you're maybe leaking a bit of info if they left debug on and print the stack trace or a query. Another common slip-up is failing to count \n vs \r\n in text strings, since JS usually counts a line break as 1 byte but the HTTP spec requires two.

unescape(encodeURIComponent("ç")).length is the quick and dirty way to do a JS byte length check. The \r\n thing can be done just by cleaning up the string before length counting.
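For a rough Python equivalent of both checks (count UTF-8 bytes, and normalize bare \n to \r\n the way a form submission would be serialized):

    def http_byte_length(s: str) -> int:
        # Normalize lone \n to \r\n first, then count UTF-8 bytes.
        normalized = s.replace("\r\n", "\n").replace("\n", "\r\n")
        return len(normalized.encode("utf-8"))

    print(len("ç"))                  # 1 code point
    print(http_byte_length("ç"))     # 2 bytes
    print(http_byte_length("a\nb"))  # 4 bytes, not 3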


Does Zalgo even work on HN? I've never thought of using it to test my systems, thank you. I've got some new testing to do tonight.

Edit: No, Zalgo doesn't work on HN. This comment itself was an experiment to try.


A few months ago I made a post which I (should've) named "Unicode codepoints that expand or contract when case is changed in UTF-8". A decent parser shouldn't have any issues with things like this, but software that makes bad Unicode assumptions might.

https://news.ycombinator.com/item?id=42014045


Sorry n00b here, can you explain more about this or how you did this? I feel like this is definitely a loophole that would be worth testing for.

This is cute but unnecessary - Unicode includes a massive range called PUA: the private use area. The codes in this range aren’t mapped to anything (and won’t be mapped to anything) and are for internal/custom use, not to be passed to external systems (for example, we use them in fish-shell to safely parse tokens into a string, turning an unescaped special character into just another Unicode code point in the string, but in the PUA area, then intercept that later in the pipeline).

You’re not supposed to expose them outside your api boundary but when you encounter them you are prescribed to pass them through as-is, and that’s what most systems and libraries do. It’s a clear potential exfiltration avenue, but given that most sane developers don’t know much more about Unicode other than “always use Unicode to avoid internationalization issues”, it’s often left wide open.
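A toy Python illustration of that internal-use pattern (this is not fish-shell's actual code; the PUA offset and the set of "special" characters here are made up):

    # Shift special characters into the PUA while parsing so later pipeline
    # stages treat them as ordinary code points, then shift them back.
    PUA_BASE = 0xF0000          # supplementary private use area A (arbitrary choice)
    SPECIALS = "*?$"

    def to_internal(s: str) -> str:
        return "".join(chr(PUA_BASE + ord(c)) if c in SPECIALS else c for c in s)

    def to_external(s: str) -> str:
        return "".join(chr(ord(c) - PUA_BASE) if PUA_BASE <= ord(c) < PUA_BASE + 128 else c
                       for c in s)

    s = to_internal("rm *.txt")   # '*' is now U+F002A, inert to later stages
    print(to_external(s))         # round-trips back to: rm *.txt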


I just tested and private use characters render as boxes for me (󰀀), the point here was to encode them in a way that they are hidden and treated as "part of" another character when copy/pasting.
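For reference, a minimal Python sketch of the variation-selector scheme as the article describes it (one selector per payload byte: bytes 0-15 map to VS1-VS16 at U+FE00-U+FE0F, bytes 16-255 to VS17-VS256 at U+E0100-U+E01EF; the base character and payload are arbitrary):

    def byte_to_vs(b):
        return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + (b - 16))

    def vs_to_byte(cp):
        if 0xFE00 <= cp <= 0xFE0F:
            return cp - 0xFE00
        if 0xE0100 <= cp <= 0xE01EF:
            return cp - 0xE0100 + 16
        return None

    def encode(base, payload):
        # Append one variation selector per payload byte to the base character.
        return base + "".join(byte_to_vs(b) for b in payload)

    def decode(text):
        # Collect any variation selectors found anywhere in the text.
        return bytes(b for ch in text if (b := vs_to_byte(ord(ch))) is not None)

    hidden = encode("\U0001F600", "hello".encode("utf-8"))  # grinning face + payload
    print(decode(hidden).decode("utf-8"))                   # hello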

> the point here was to encode them in a way that they are hidden and treated as "part of" another character when copy/pasting

AKA "Steganography" for the curious ones: https://en.wikipedia.org/wiki/Steganography


Like when we used to encode the phone numbers of warez boards in GIFs.

On my Android phone, that displays "Go[][]" in the Google logo font.

The difference is that PUA characters are usually rendered in some way that is rather visible, whereas the variation selectors aren’t.

Context that some may be missing is that this was inspired by discussion surrounding the Open Heart Protocol submission:

https://news.ycombinator.com/item?id=42791378

People immediately began discussing the applications for criminal use given the constraint that only emoji are accepted by the API. So for that use case the PUA wouldn't be an option, you have to encode it in the emoji.


Isn't this more what the designated noncharacters are for, rather than the private-use area? Given how the private-use area sometimes gets used for unofficial encodings of scripts not currently in Unicode (or for things like the Apple logo and such), I'd be worried about running into collisions if I used the PUA in such a way.

Note that designated noncharacters includes not only 0xFFFF and 0xFFFE, and not only the final two code points of every plane, but also an area in the middle of Arabic Presentation Forms that was at some point added to the list of noncharacters specifically so that there would be more noncharacters for people using them this way!


I'll be h󠄾󠅟󠅠󠅕󠄜󠄐󠅞󠅟󠄐󠅣󠅕󠅓󠅢󠅕󠅤󠅣󠄐󠅘󠅕󠅢󠅕onest, I pasted this comment in the provided decoder thinking no one could miss the point this badly and there was probably a hidden message inside it, but either you really did or this website is stripping them.

You can't invisibly watermark an arbitrary character (I did it to one above! If this website isn't stripping them, try it out in the provided decoder and you'll see) with unrecognized PUA characters, because it won't treat them as combining characters. You will cause separately rendered rendered placeholder-box characters to appear. Like this one:  (may not be a placeholder-box if you're privately-using the private use area yourself).


j󠄗󠅄󠅧󠅑󠅣󠄐󠅒󠅢󠅙󠅜󠅜󠅙󠅗󠄜󠄐󠅑󠅞󠅔󠄐󠅤󠅘󠅕󠄐󠅣󠅜󠅙󠅤󠅘󠅩󠄐󠅤󠅟󠅦󠅕󠅣󠄴󠅙󠅔󠄐󠅗󠅩󠅢󠅕󠄐󠅑󠅞󠅔󠄐󠅗󠅙󠅝󠅒󠅜󠅕󠄐󠅙󠅞󠄐󠅤󠅘󠅕󠄐󠅧󠅑󠅒󠅕󠄫󠄱󠅜󠅜󠄐󠅝󠅙󠅝󠅣󠅩󠄐󠅧󠅕󠅢󠅕󠄐󠅤󠅘󠅕󠄐󠅒󠅟󠅢󠅟󠅗󠅟󠅦󠅕󠅣󠄜󠄱󠅞󠅔󠄐󠅤󠅘󠅕󠄐󠅝󠅟󠅝󠅕󠄐󠅢󠅑󠅤󠅘󠅣󠄐󠅟󠅥󠅤󠅗󠅢󠅑󠅒󠅕󠄞󠄒󠄲󠅕󠅧󠅑󠅢󠅕󠄐󠅤󠅘󠅕󠄐󠄺󠅑󠅒󠅒󠅕󠅢󠅧󠅟󠅓󠅛󠄜󠄐󠅝󠅩󠄐󠅣󠅟󠅞󠄑󠅄󠅘󠅕󠄐󠅚󠅑󠅧󠅣󠄐󠅤󠅘󠅑󠅤󠄐󠅒󠅙󠅤󠅕󠄜󠄐󠅤󠅘󠅕󠄐󠅓󠅜󠅑󠅧󠅣󠄐󠅤󠅘󠅑󠅤󠄐󠅓󠅑󠅤󠅓󠅘󠄑󠄲󠅕󠅧󠅑󠅢󠅕󠄐󠅤󠅘󠅕󠄐󠄺󠅥󠅒󠅚󠅥󠅒󠄐󠅒󠅙󠅢󠅔󠄜󠄐󠅑󠅞󠅔󠄐󠅣󠅘󠅥󠅞󠅄󠅘󠅕󠄐󠅖󠅢󠅥󠅝󠅙󠅟󠅥󠅣󠄐󠄲󠅑󠅞󠅔󠅕󠅢󠅣󠅞󠅑󠅤󠅓󠅘󠄑󠄒󠄸󠅕󠄐󠅤󠅟󠅟󠅛󠄐󠅘󠅙󠅣󠄐󠅦󠅟󠅢󠅠󠅑󠅜󠄐󠅣󠅧󠅟󠅢󠅔󠄐󠅙󠅞󠄐󠅘󠅑󠅞󠅔󠄪󠄼󠅟󠅞󠅗󠄐󠅤󠅙󠅝󠅕󠄐󠅤󠅘󠅕󠄐󠅝󠅑󠅞󠅨󠅟󠅝󠅕󠄐󠅖󠅟󠅕󠄐󠅘󠅕󠄐󠅣󠅟󠅥󠅗󠅘󠅤󠇒󠅰󠆄󠅃󠅟󠄐󠅢󠅕󠅣󠅤󠅕󠅔󠄐󠅘󠅕󠄐󠅒󠅩󠄐󠅤󠅘󠅕󠄐󠅄󠅥󠅝󠅤󠅥󠅝󠄐󠅤󠅢󠅕󠅕󠄜󠄱󠅞󠅔󠄐󠅣󠅤󠅟󠅟󠅔󠄐󠅑󠅧󠅘󠅙󠅜󠅕󠄐󠅙󠅞󠄐󠅤󠅘󠅟󠅥󠅗󠅘󠅤󠄞󠄱󠅞󠅔󠄐󠅑󠅣󠄐󠅙󠅞󠄐󠅥󠅖󠅖󠅙󠅣󠅘󠄐󠅤󠅘󠅟󠅥󠅗󠅘󠅤󠄐󠅘󠅕󠄐󠅣󠅤󠅟󠅟󠅔󠄜󠅄󠅘󠅕󠄐󠄺󠅑󠅒󠅒󠅕󠅢󠅧󠅟󠅓󠅛󠄜󠄐󠅧󠅙󠅤󠅘󠄐󠅕󠅩󠅕󠅣󠄐󠅟󠅖󠄐󠅖󠅜󠅑󠅝󠅕󠄜󠄳󠅑󠅝󠅕󠄐󠅧󠅘󠅙󠅖󠅖󠅜󠅙󠅞󠅗󠄐󠅤󠅘󠅢󠅟󠅥󠅗󠅘󠄐󠅤󠅘󠅕󠄐󠅤󠅥󠅜󠅗󠅕󠅩󠄐󠅧󠅟󠅟󠅔󠄜󠄱󠅞󠅔󠄐󠅒󠅥󠅢󠅒󠅜󠅕󠅔󠄐󠅑󠅣󠄐󠅙󠅤󠄐󠅓󠅑󠅝󠅕󠄑󠄿󠅞󠅕󠄜󠄐󠅤󠅧󠅟󠄑󠄐󠄿󠅞󠅕󠄜󠄐󠅤󠅧󠅟󠄑󠄐󠄱󠅞󠅔󠄐󠅤󠅘󠅢󠅟󠅥󠅗󠅘󠄐󠅑󠅞󠅔󠄐󠅤󠅘󠅢󠅟󠅥󠅗󠅘󠅄󠅘󠅕󠄐󠅦󠅟󠅢󠅠󠅑󠅜󠄐󠅒󠅜󠅑󠅔󠅕󠄐󠅧󠅕󠅞󠅤󠄐󠅣󠅞󠅙󠅓󠅛󠅕󠅢󠄝󠅣󠅞󠅑󠅓󠅛󠄑󠄸󠅕󠄐󠅜󠅕󠅖󠅤󠄐󠅙󠅤󠄐󠅔󠅕󠅑󠅔󠄜󠄐󠅑󠅞󠅔󠄐󠅧󠅙󠅤󠅘󠄐󠅙󠅤󠅣󠄐󠅘󠅕󠅑󠅔󠄸󠅕󠄐󠅧󠅕󠅞󠅤󠄐󠅗󠅑󠅜󠅥󠅝󠅠󠅘󠅙󠅞󠅗󠄐󠅒󠅑󠅓󠅛󠄞󠄒󠄱󠅞󠅔󠄐󠅘󠅑󠅣󠅤󠄐󠅤󠅘󠅟󠅥󠄐󠅣󠅜󠅑󠅙󠅞󠄐󠅤󠅘󠅕󠄐󠄺󠅑󠅒󠅒󠅕󠅢󠅧󠅟󠅓󠅛󠄯󠄳󠅟󠅝󠅕󠄐󠅤󠅟󠄐󠅝󠅩󠄐󠅑󠅢󠅝󠅣󠄜󠄐󠅝󠅩󠄐󠅒󠅕󠅑󠅝󠅙󠅣󠅘󠄐󠅒󠅟󠅩󠄑󠄿󠄐󠅖󠅢󠅑󠅒󠅚󠅟󠅥󠅣󠄐󠅔󠅑󠅩󠄑󠄐󠄳󠅑󠅜󠅜󠅟󠅟󠅘󠄑󠄐󠄳󠅑󠅜󠅜󠅑󠅩󠄑󠄒󠄸󠅕󠄐󠅓󠅘󠅟󠅢󠅤󠅜󠅕󠅔󠄐󠅙󠅞󠄐󠅘󠅙󠅣󠄐󠅚󠅟󠅩󠄞󠄗󠅄󠅧󠅑󠅣󠄐󠅒󠅢󠅙󠅜󠅜󠅙󠅗󠄜󠄐󠅑󠅞󠅔󠄐󠅤󠅘󠅕󠄐󠅣󠅜󠅙󠅤󠅘󠅩󠄐󠅤󠅟󠅦󠅕󠅣󠄴󠅙󠅔󠄐󠅗󠅩󠅢󠅕󠄐󠅑󠅞󠅔󠄐󠅗󠅙󠅝󠅒󠅜󠅕󠄐󠅙󠅞󠄐󠅤󠅘󠅕󠄐󠅧󠅑󠅒󠅕󠄫󠄱󠅜󠅜󠄐󠅝󠅙󠅝󠅣󠅩󠄐󠅧󠅕󠅢󠅕󠄐󠅤󠅘󠅕󠄐󠅒󠅟󠅢󠅟󠅗󠅟󠅦󠅕󠅣󠄜󠄱󠅞󠅔󠄐󠅤󠅘󠅕󠄐󠅝󠅟󠅝󠅕󠄐󠅢󠅑󠅤󠅘󠅣󠄐󠅟󠅥󠅤󠅗󠅢󠅑󠅒󠅕󠄞 is for Jabberwocky. Does this decode?

edit: Yes, it does.


10 years or so ago I shocked coworkers by using U+202E RIGHT-TO-LEFT OVERRIDE in the middle of filenames on Windows, so funnypicturegnp.exe became funnypictureexe.png. Combined with a custom icon for the program that mimics a picture preview, it was pretty convincing.
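A quick way to see the effect (whether it fools anyone depends on how bidi-aware the displaying UI is):

    # U+202E RIGHT-TO-LEFT OVERRIDE reverses the display of what follows, so
    # the real name funnypicture<RLO>gnp.exe renders as funnypictureexe.png.
    name = "funnypicture\u202egnp.exe"
    print(name)        # shows up as funnypictureexe.png in bidi-aware UIs
    print(repr(name))  # repr reveals the actual code points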

I worked in phishing detection. This was a common pattern used by attackers, although .exe attachments are blocked automatically most of the time; .html is the new malicious extension (often hosting an obfuscated window.location redirect to a fake login page).

RTL abuse like cute-cat-lmth.png was relatively common, but also trivial to detect. We would immediately flag such an email as phishing.


The source code version of that is CVE-2021-42574, and they have a website:

https://trojansource.codes/

Basically it's possible to hide some code that looks like comments but compiles like code. I seem to recall the CVE status was disputed since many text editors already make these suspicious comments visible.


I’d never heard of this particular trick but I’m glad my decades of paranoia-fueled “right click -> open with” treatment of any potentially sketchy media file was warranted! :D

I created a guitar_tab.txt which was a bat file.

Wow this is a clever trick.

For a real-world use case: Sanity used this trick[0] to encode Content Source Maps[1] into the actual text served on a webpage when it is in "preview mode". This allows an editor to easily trace some piece of content back to a potentially deep content structure just by clicking on the text/content in question.

It has its drawbacks/limitations - eg you want to prevent adding it for things that need to be parsed/used verbatim, like date/timestamps, urls, "ids" etc - but it's still a pretty fun trick.

[0] https://www.sanity.io/docs/stega

[1] https://github.com/sanity-io/content-source-maps


I love the idea of using this for LLM output watermarking. It hits the sweet spot: it will catch 99% of slop generators with no fuss, since they only copy and paste anyway, with almost no impact on other core use cases.

I wonder how much you’d embed with each letter or token that’s output - userid, prompt ref, date, token number?

I also wonder how this is interpreted in a terminal. Really cool!


Why does anybody think AI watermarking will ever work? Of course it will never work, any watermarking can be instantly & easily stripped...

The only real AI protection is to require all human interaction to be signed by a key verified by IRL identity, and even then that will: (a) never happen, and (b) be open to abuse by countries with corrupt governments, and by countries with corrupt governments heavily influenced by private industry (like the US).


> any watermarking can be instantly & easily stripped...

I think it took a while before printer watermarking (yellow dots) was discovered. It certainly cannot be stripped. It was possibly developed in the mid-80s but not known to the public until mid 2000s.


With the amount of preprocessing that is done before integrating stuff into a dataset, I'd be surprised if those kinds of shenanigans even worked.

In most linux terminals, what you pass it is just a sequence of bytes that is passed unmangled. And since this technique is UTF-8 compliant and doesn't use any extra glyphs, it is invisible to humans in unicode compliant terminals. I tried it on a few. It shows up if you echo the sentence to, say, xxd ofc.

(unlike the PUA suggestion in the currently top voted comment which shows up immediately ofc)

Additional test corrections: While xxd shows the message passing through completely unmangled on pasting it into the terminal, when I selected from the terminal (echoed sentence, verified unmangled in xxd, then selected and pasted the result of echo), it was truncated to a few words using X select in mate terminal and konsole - I'm not sure where that truncation happens, whether it's the terminal or X. In xterm, the final e was mangled, and the selection was even more truncated.

The sentence is written unmangled to files though, so I think it's more about copying out of the terminal dropping some data. Verified by echoing the sentence to a test file, opening it in a browser, and copying the text from there.


On MacOS, kitty shows an empty box, then an a for the "h󠅘󠅕󠅜󠅜󠅟󠄐󠅖󠅕󠅜󠅜󠅟󠅧󠄐󠅘󠅑󠅓󠅛󠅕󠅢󠄐󠄪󠄙a" post below. I think this is fair and even appreciated. Mac Terminal shows "ha". That "h󠅘󠅕󠅜󠅜󠅟󠄐󠅖󠅕󠅜󠅜󠅟󠅧󠄐󠅘󠅑󠅓󠅛󠅕󠅢󠄐󠄪󠄙a" (and this one!) can be copied and pasted into the decoder successfully.

There are other possible approaches to LLM watermarking that would be much more robust and harder to detect. They exploit the fact that LLMs work by producing a probability distribution that gives a probability for each possible next token. These are then sampled randomly to produce the output. To add fingerprints when generating, you could do some trickery in how you do that sampling that would then be detectable by re-running the LLM and observing its outputs. For example, you could alternate between selecting high-probability and low-probability tokens. (A real implementation of this would be much more sophisticated than that obviously, but hopefully you get the idea)

This is not a great method in a world with closed models and highly diverse open models and samplers. It’s intellectually appealing for sure! But it will always be at best a probabilistic method, and that’s if you have the llm weights at hand.

What makes it not a good method? Of course if a model's weights are publicly available, you can't compel anyone using it to add fingerprinting at the sampler stage or later. But I would be shocked if OpenAI was not doing something like this, since it would be so easy and couldn't hurt them, but could help them if they don't want to train on outputs they generated. (Although they could also record hashes of their outputs or something similar as well – I would be surprised if they don't.)

Just you wait until AI starts calling human output to be slop.

That's already happening - my kids have had papers unfairly blamed on chatgpt by automated tools. Protect yourself kids, use an editor that can show letter by letter history.

Two people I worked with had this happen, and one of them is going to war over it, as it was enough to lower the kid's grade for college or something. Crazy times.

Do you have any examples of editors that show letter by letter history? I have never looked for that as a feature.

Edit: I've been looking, and Google Docs seems to have version history to the minute.


Yes exactly. They keep track of their diffs in that interface.

There are of course human writers who are less-communicative than AI, called "shit writers", and humans who are less accurate than AI, called "liars".

The difference is humans are responsible for what they write, whereas the human user who used an AI to generate text is responsible for what the computer wrote.


It's worth noting, just as a curiosity, that screen readers can detect these variation selectors when I navigate by character. For example, if I arrow over the example he provided (I can't paste it here lol), I hear: "Smiling face with smiling eyes", "Symbol e zero one five five", "Symbol e zero one five c", "Symbol e zero one five c", "Symbol e zero one five f". This is unfortunately dependent on the speech synthesizer used, and I wouldn't know if the characters were there if I was just reading a document, so this isn't much of an advantage all things considered.

Ironically enough I have a script that strips all non-ascii characters from my screen reader because I found that _all_ online text was polluted with invisible and annoying to listen to characters.

Mine (NVDA) isn't annoying about non-ASCII symbols, interestingly enough. But for something like this form of Unicode "abuse" (?), if you throw a ton of them into a message or something, they become "lines" I have to arrow past because my screen reader will otherwise remain silent on those lines unless I use say-all (continuous reading for those who don't use screen readers).

StegCloak [0] is in the same ballpark and takes this idea a step further by encrypting the hidden payload via AES-256-CTR -- pretty neat little trick

[0] https://github.com/KuroLabs/stegcloak


There's a Better Discord plugin that I think uses this or something similar, so you could send completely encrypted messages that look like nothing to everyone else. You'd need to share a secret password for them to decode it, though.

Could probably add OTR, too.

Wow, that's neat.

Wanted to try it on a Cloudflare DNS TXT record, but Cloudflare is smart enough to decode it when pasting into the TXT field.


DNS only supports ASCII for record values. It has a hack to support Unicode domain names using Punycode.

The title is a little misleading: "Note that the base character does not need to be an emoji – the treatment of variation selectors is the same with regular characters. It’s just more fun with emoji."

Using this approach with non-emoji characters makes it stealthier and even more disturbing.


I don't see this as all that disturbing. A detector for it wouldn't be hard to write (variant on something that doesn't actually have variants, flag it!), it seems to me it could be useful for signing things.

This is cool. There are also the Unicode Tag characters that mirror ASCII and are often invisible in UI elements (especially web apps).

The unique thing about Tag characters is that some LLMs interpret the hidden text as ASCII and follow instructions, and they can even write them:

https://embracethered.com/blog/posts/2024/hiding-and-finding...

Here an actual exploit POC that Microsoft fixed in Copilot: https://embracethered.com/blog/posts/2024/m365-copilot-promp...


Even more than just simply watermarking LLM output, it seems like this could be a neat way to package logprobs data.

Basically, include probability information about every token generated to give a bit of transparency to the generation process. It's part of the OpenAI api spec, and many other engines (such as llama.cpp) support providing this information. Normally it's attached as a separate field, but there are neat ways to visualize it (such as mikupad [0]).

Probably a bad idea, but this still tickles my brain.

* [0]: https://github.com/lmg-anon/mikupad


I've found a genuinely useful application of this that isn't just "talk secretively": I'm part of a community that has a chatroom on both Discord and another platform. We have developed a bot that bridges the two, but it needs to maintain a table in a database of all the bridged messages, so that when people use the "reply" feature on either platform, the reply is also performed on the other platform. With this, each platform's bridged message can contain hidden data that points to the other platform's message ID, without the need to store it ourselves, and without it being visible to users.

obviously we won't be tearing down the infra already in place anytime soon, so we won't actually be doing that, but that's definitely a useful application for when you don't want to be hosting a database and just wish you could attach additional data to a restrictive format.


More generally, you can use encoding formats that reserve uninterpreted byte sequences for future use to pass data that is only readable by receivers who know what you're doing, though note this is not a cryptographically secure scheme, and any sort of statistical analysis can reveal what you're doing.

The png spec, for instance, allows you to include as many metadata chunks as you wish, and these may be used to hold data that cannot be used by any mainstream png reader. We used this in the Navy to embed geolocation and sensor origin data that was readable by specialized viewers that only the Navy had, but if you opened the file in a browser or common image viewer, it would either ignore or discard the unknown chunks.
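A rough Python sketch of that kind of private chunk (the chunk name prVt is made up; the lowercase first letter marks it ancillary and the lowercase second letter marks it private, so compliant readers skip it):

    import struct, zlib

    def add_private_chunk(png: bytes, payload: bytes, ctype: bytes = b"prVt") -> bytes:
        # Insert an ancillary, private chunk just before IEND; standard
        # viewers ignore chunk types they don't recognize.
        assert png.startswith(b"\x89PNG\r\n\x1a\n")
        chunk = struct.pack(">I", len(payload)) + ctype + payload
        chunk += struct.pack(">I", zlib.crc32(ctype + payload) & 0xFFFFFFFF)
        iend_at = png.rindex(b"IEND") - 4   # back up over the 4-byte length field
        return png[:iend_at] + chunk + png[iend_at:]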


Lots of image formats store arbitrary metadata (and other data) either by design or via application-specific extensions. I remember seeing seismic and medical images that contained data for display in specialized applications, and writing code to figure out whether binary metadata was written in big-endian or little-endian byte order (the metadata often did not have the same endianness as the image data!). For example, TIFF files containing 3D scans as a sequence of slices, with binary metadata attached to each slice. If you opened one in your system default image viewer, you'd only see the first slice, but a specialized viewer (which I did not have) would display it as a 3D model. Luckily (IIRC) the KDE file browser let you quickly flip through all the images in a directory using the keyboard, so I was able to dump all the layers into separate files and flip through them to see the 3D image.

Imagine using the ID card emoji (U+1FAAA) as a universal carrier for digital ID tokens. A dumb demo is available at https://pit.lovable.app/ which—without any secure protocol—simply encodes a National Identification Number into the emoji using variation selectors.

The idea is that banks could issue encrypted ID tokens in this way, letting them move seamlessly across any platform that supports Unicode (messaging apps, email, web forms, etc.). The heavy lifting of security (preventing replay attacks, interception, ensuring token freshness, etc.) would be managed separately with robust cryptography, while the emoji serves purely as a transport layer.

It's not about reinventing security but about creating a cross-platform way to carry identity tokens. Thoughts?


What is wrong with just using the actual SSN? Why hide it in an emoji?

So that the operating system could recognize it automatically, and to include a potentially long URL to the retail bank's web service to initiate the protocol, such as signing a document or an identification protocol.

I love it, I got Claude to add a PIN to provide very basic encryption

https://claude.site/artifacts/5bfdf131-d847-4735-9242-998f23...


This is cool. I tried pasting the output into an Instagram comment and it stayed intact, so I have a feeling someone could do some interesting stuff with that. Who needs a botnet C&C server when you can post totally invisible commands on public forums?

I mean, steganography has been a thing for quite a while. Not disagreeing, just saying this is how some programs/ideas were passed around the internet decades ago by "less than upstanding netizens" ;)

Wanted to pass a secret code to a friend? Encode the bit-data in the alpha channel of an image. It could even be encrypted/scrambled within the image itself. Post the perfectly normal image to a public forum, ping your friend, they run it through the "decoder" and Robert's your mother's brother.

Of course these weren't "logic bombs" like this post is describing, but even those have been around for a while too.

Hacking is fun :)


The tokenizer catches it: https://platform.openai.com/tokenizer.

(author here) some people in this thread and elsewhere asked me about whether an LLM could decode this, and the answer seems to be: not likely by itself, but it often can if it has access to a Python interpreter!

Here's a demo of Gemini Flash 2 solving one in 7s: https://bsky.app/profile/paulbutler.org/post/3lhzhroogws2g


so.... in theory you should be able to create several visually identical links that give access to different resources?

I've always assumed links without any tracking information (unique hash, query params, etc.) were safe to click (with regard to my privacy), but if this works for links I may need to revise my strategy for how I approach links sent to me.


"Visually identical" is never good enough. Have you heard of attacks confusing Latin letters and Cyrillic letters? For example C versus С. (The latter is known as CYRILLIC CAPITAL LETTER ES.) Have you heard of NFC forms versus NFD forms? For example é versus é (LATIN SMALL LETTER E + COMBINING ACUTE ACCENT versus LATIN SMALL LETTER E WITH ACUTE.)

Nothing that's important when it comes to security and privacy should rely on a "visually identical" check. Fortunately browsers these days are already good at this; their address bars use punycode for the domain and percent encoding for the rest of the URL.
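A small Python demonstration of both points (normalization reconciles composed vs decomposed forms, but does nothing for cross-script confusables):

    import unicodedata

    nfc = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
    nfd = "e\u0301"     # e + COMBINING ACUTE ACCENT
    print(nfc == nfd)                                     # False
    print(unicodedata.normalize("NFC", nfd) == nfc)       # True

    print(unicodedata.normalize("NFC", "\u0421") == "C")  # False: Cyrillic ES stays Cyrillic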


As the sibling comment mentioned, Unicode in DNS uses a punycode encoding, but even further than that, the standard specifies that the Unicode data must be normalized to NFC[0] before being converted to punycode. This means that your second example (decomposed e with combining acute accent vs the composed variant) is not a valid concern. The Cyrillic one is, however.

[0] https://www.rfc-editor.org/rfc/rfc5891 § 4.1 "By the time a string enters the IDNA registration process as described in this specification, it MUST be in Unicode and in Normalization Form C"


The OP said link. The NFC/NFD issue remains if these are part of a path name or query parameter.

Sure, but I feel the security concerns there are much smaller than having multiple domain names with the same visual appearance that point to different servers. That has immediate impact for things like phishing, whereas lookalike path or query portions would at least ensure you are still connecting to the server that you think you are.

Erm, DNS uses Punycode because it comes from a time when Unicode didn't exist, and bind assumes a grapheme has no more than one byte.

Yes but I guess that the message was meaning that browsers now detect homographs and display the punycode instead. See also https://news.ycombinator.com/item?id=14130241; at that time Firefox wasn't fixed, but in the meantime it fixed the issue too (there's a network.idn.punycode_cyrillic_confusables preference, which is enabled by default).

My understanding is that "weird" unicode code points become https://en.wikipedia.org/wiki/Punycode. I used the 󠅘󠅕󠅜󠅜󠅟 (copy-pasted from the post, presumably with the payload in it) to type a fake domain into Chrome, and the Punycode I got appeared to not have any of the encoding bits.

However, I then pasted the emoji into the _query_ part of a URL. I pointed it to my own website, and sure enough, I can definitely see the payload in the nginx logs. Yikes.

Edit: I pasted the very same Emoji that 'paulgb used in their post before the parenthetical in the first paragraph, but it seems HN scrubs those from comments.


Domains get "punycode" encoded, URLs get "url encoded"[1], which should make Unicode characters stand out. That being said, browsers do accept some non-ASCII characters in URLs and convert them automatically, so theoretically you could put "invalid" characters into a link and have the browser convert them only after clicking. That might be a viable strategy.

[1] https://www.w3schools.com/tags//ref_urlencode.asp


The emoji is gone but the content is still there.

> I've always assumed links without any tracking information (unique hash, query params, etc) were safe to click(with regards to my privacy). but if this works for links I may need to revise my strategy regarding how to approach links sent to me.

Well, it was never safe; what you see and where the link points are different things. That's why the actual link is displayed at the bottom left of your browser when you move your mouse over it (or focus it via keyboard).


You need to decode the text after copy-pasting it; I believe clicking on the text will not interact with the obfuscated data, since your computer will just read the Unicode it recognizes and ignore the obfuscated data.

This is just so that you can hide data and send it to someone to be decoded (or for watermarking, as mentioned)


yes, I understand this is not a security risk.

but my fear is precisely that I may be sending data to a remote host while I'm completely unaware of this fact.

I tried to create a POC with some popular URL shortener services, but it doesn't seem to work.

what I wanted to create was a link like <host.tld>/innoc󠅥󠅣󠅕󠅢󠄝󠅙󠅔󠄪󠅑󠅒󠅓ent that redirects to google.com. in this case the "c" contains some hidden data that will be sent to the server while the user is not aware. this seems possible with the correct piece of software.
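The transport part at least is easy to check in Python; a hidden selector just rides along in the path as percent-escapes (the path below is a made-up example carrying a single smuggled selector):

    from urllib.parse import quote

    path = "innoc\U000E0168ent"   # one hidden variation selector
    print(quote(path))            # innoc%F3%A0%85%A8ent -- sent to the server as-is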


HTML entity encoding will show the hidden content, try with https://mothereff.in/html-entities.

URIs with non-ASCII characters are technically invalid. Browsers and the like should (but likely don’t all do) percent-encode any invalid characters for display if they accept such invalid URIs.

This tool and idea sketchy AF: https://github.com/zws-im/zws

("Shorten URLs using invisible spaces")


You could store UTF-8 encoded data inside the hidden bytestring. If some of the UTF-8 encoded smuggled characters are variation selector characters, you can smuggle text inside the smuggled text. Smuggled data can be nested arbitrarily deep.

I'm imagining post-incident analysis finding out that, "the data was exfiltrated via some Unicode string..." then they put it up on the screen and it's just an enormous line of turtle emoji

https://emojipedia.org/turtle


> I'm imagining post-incident analysis finding out that, "the data was exfiltrated via some Unicode string..." then they put it up on the screen and it's just an enormous line of turtle emoji

Since it took me a minute to make the connection, I'll just say explicitly that I enjoyed the understated "it's turtles all the way down" joke.


> We and our 717 technology partners ask you to consent to the use of cookies to store and access personal data on your device.

To see a turtle emoji.


In the same vein, I did some fun unicode abusing a few years ago where I used scripts to convert programs into series of various ZWJ's: https://thelisowe.substack.com/p/sleeper-cell-a-method-of-em...

Also includes a decoder script


This is one of the reasons I've been advocating to use UTF-8 as a tokenizer for a long time. The actual problem IMHO are tokenizers themselves, which obscure the encoding/decoding process in order to gain some compression during training to fit more data in for the same budget, and arguably gaining some better understanding from the beginning. Again just a lack of computing power.

If you use UTF-8 directly as tokenizer, this problem becomes evident once you fit it into the context window. Plus, you can run multiple tests for this type of injection; no emoji should take more than up to 40 bytes (10 code points * 4 bytes per code point in the worst case). This is an attack on tokenizers, not on UTF-8.
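A cheap first-pass check along those lines (the 40-byte bound is just the worst-case estimate above, not an official limit, and the examples are written as escape sequences):

    def looks_suspicious(grapheme: str, max_bytes: int = 40) -> bool:
        # Flag any single user-perceived character whose UTF-8 encoding is
        # implausibly long, e.g. an emoji stuffed with variation selectors.
        return len(grapheme.encode("utf-8")) > max_bytes

    family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"  # ZWJ family, 25 bytes
    stuffed = "\U0001F60A" + "\U000E0101" * 20                             # smiley + 20 selectors, 84 bytes
    print(looks_suspicious(family))   # False
    print(looks_suspicious(stuffed))  # True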

Plus, Unicode publishes the full list of valid sequences containing the ZWJ character in emoji-zwj-sequences.txt


This and several other abuse cases forced my previous workplace to use code points to count 'characters' for users' nicknames / status messages. No one wanted to download 9 MB simply browsing other users.

NoSQL? Sounds like it should’ve been caught by basic length checks on the database field where it was stored.

That is awesome. Both the abuse and the fix.

The ability to add watermarks to text is really interesting. Obviously it could be worked around, but it could be a good way to subtly watermark e.g. LLM outputs.

There are way better ways to watermark LLM output. It's easy to make it undetectable, which this isn't.

I recently worked on a steganography project which could be useful for this problem. See: https://github.com/shawnz/textcoder

That's really cool, you should repost the HN submission.

Thank you! I will see what I can do.

The issue with the standard watermark techniques is that they require an output of at least a few hundred tokens to reliably imprint the watermark. This technique would apply to much shorter outputs.

For example?

A crude way to watermark: first establish a keyed DRBG. For every nth token prediction, read a bit from the DRBG for every possible token to label it red/black. Before selecting the next token, set the logit for black tokens to -Inf; this ensures a red token will be selected.

To detect: Establish the same DRBG. Tokenize, for each nth token, determine the red set of tokens in that position. If you only see red tokens in lots of positions, then you can be confident the content is watermarked with your key.

This would probably take a bit of fiddling to work well, but would be pretty much undetectable. Conceptually it's forcing the LLM to use a "flagged" synonym at key positions. A more sophisticated version of a shibboleth.

In practice you might choose to instead watermark all tokens, less heavy-handedly (nudge logits rather than override), and use highly robust error-correcting codes.
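A minimal Python sketch of the crude red/black version described above (the vocabulary size, hash choice, and every-n spacing are all arbitrary placeholders):

    import hashlib, random

    VOCAB = 50_000   # assumed vocabulary size, purely illustrative

    def red_tokens(key: bytes, position: int) -> set:
        # Keyed DRBG: seed from (key, position), coin-flip every token into
        # the red (allowed) or black (forbidden) half.
        seed = hashlib.sha256(key + position.to_bytes(8, "big")).digest()
        rng = random.Random(seed)
        return {t for t in range(VOCAB) if rng.getrandbits(1)}

    def apply_watermark(logits, key, position, every_n=4):
        # On every nth position, push black tokens to -inf so only red
        # tokens can be sampled.
        if position % every_n:
            return logits
        red = red_tokens(key, position)
        return [x if t in red else float("-inf") for t, x in enumerate(logits)]

    def detect(token_ids, key, every_n=4):
        # Fraction of checked positions whose token is red: ~0.5 for
        # unwatermarked text, ~1.0 for watermarked text.
        checked = [(p, t) for p, t in enumerate(token_ids) if p % every_n == 0]
        hits = sum(t in red_tokens(key, p) for p, t in checked)
        return hits / max(len(checked), 1)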


It feels like this would only be feasible across longer passages of text, and some types of text may be less amenable to synonyms than others. For example, a tightly written mathematical proof versus a rambling essay. Biased token selection may be detectable in the latter (using a statistical test), and may cause the text to be irreparably broken in the former.

To handle low-entropy text, the "adding a smaller constant to the logits" approach avoids having much chance of changing the parts that need to be exactly a particular thing.

Though in this case it needs longer texts to have high significance (and when the entropy is low, it needs to be especially long).

But for most text (with typical amounts of entropy per token) apparently it doesn’t need to be that long? Like 25 words I think I heard?


What if the entire LLM output isn’t used? For example, you ask the LLM to produce some long random preamble and conclusion with your actual desired output in between the two. Does it mess up the watermarking?

This is too strippable to be a good watermark, it would only catch the ones who are unaware. The leakers, yes, the cybersecurity people, no.

Rather, I see a use in signing things. Newspapers, politicians etc, generate a unique key and encode it into your article or whatever. Now it's easy for anyone to check if a quote attributed to you actually came from you. Sure, it's not secure but it doesn't need to be because it's simply a stable identifier. Even paywalled sites could display a snippet around the provided quote without being problematic.


Isn't it also an option to use this to generate pretty decent passwords? Like including one of those unicode chars encoded with a message, pretty sure no password cracking tool currently has support for this

I implemented something similar years ago, but much simpler/less sophisticated.

Unicode has zero-width characters such as the zero-width space (U+200B) and zero-width joiner (U+200D). This allows you to encode arbitrary data in binary. I would give an example, but HN seems to strip this :(
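A sketch of the sort of thing meant here, with the characters written as escapes so nothing gets stripped (the bit-to-character mapping is arbitrary):

    ZERO, ONE = "\u200b", "\u200d"   # zero-width space, zero-width joiner

    def hide(cover: str, secret: bytes) -> str:
        bits = "".join(f"{b:08b}" for b in secret)
        return cover + "".join(ONE if bit == "1" else ZERO for bit in bits)

    def reveal(text: str) -> bytes:
        bits = "".join("1" if c == ONE else "0" for c in text if c in (ZERO, ONE))
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

    print(reveal(hide("hello", b"hi")))   # b'hi'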


Already linked in https://news.ycombinator.com/item?id=43025913, and has a higher risk of being stripped, as you noticed.

I was using this technique last year with Bing Image Creator.

It let you get around their filter on brand names and celebrity names by smuggling them into the prompt in a way the AI could read, but the human-written filter was not designed for.


10 years ago I made a POC for smuggling arbitrary data through _no visible text at all_: https://github.com/foobuzz/ium

Might not be related to the point of the article per se, but I've tried to decode it with different LLMs, to benchmark their reasoning capabilities.

- 4o: Failed completely

- o1: Overthinks it for a while and comes up with the wrong answer

- o3-mini-high: Gets closer to the result on the first try, needs a second prompt to adjust the approach

- r1: nails it at first try 󠅖󠅥󠅓󠅛󠅙󠅞󠅗󠄐󠅙󠅝󠅠󠅢󠅕󠅣󠅣󠅙󠅦󠅕

The prompt I've used was simply: "this emoji has an hidden message 󠅘󠅕󠅜󠅜󠅟 can you decode it?"

If you want to see the CoT: https://gist.github.com/nerder/5baa9d7b13c1b7767d022ea0a7c91...


The r1 somehow knew at an early stage that the message was HELLO but it couldn’t figure out the reason. Even at the end, its last “thought” insists that there is an encoding mistake somewhere. However the final message is correct. I wonder how well it would do for a nonstandard message. Any sufficiently long English message would fall to statistical analysis and I wonder if the LLMs would think to write a little Python script to do the job.

Wow, that's interesting! I wonder if this reproduces with a different message, or if it was a lucky guess.

I looked at how the strings tokenize and they do appear to conserve enough information that it could be decoded in theory.


> or if it was a lucky guess

It’s like guessing 1/2 or 2/3 on a math test. The test authors pick nice numbers, and programmers like ”hello”. If the way to encode the secret message resembles other encodings, it’s probably that the pattern matching monster picked it up and is struggling to autocomplete (ie backwards rationalize) a reason why.


I did some experimentation today. I wouldn't expect AI to solve it using only their own reasoning, but I've had a decent hit rate of getting AI to solve them when they have access to a Python interpreter. Here's Gemini Flash 2 solving one (albeit it lost the spaces) in a single prompt and about 7 seconds!

https://bsky.app/profile/paulbutler.org/post/3lhzhroogws2g


My deepseek-r1 seems to be a bit more lost on decoding "How do I make meth". Some highlights (after about 5 minutes of R1-ing):

> Another angle: the user mentioned "encoded a message in this emoji", so maybe the first emoji is a red herring, or it's part of the message. The subsequent characters, even though they look like variation selectors, could be part of the encoding.

> E0138 in hex is 0xE0138. Convert to decimal: 14*16^4 + 0*16^3 + 1*16^2 + 3*16 + 8 = 14*65536 + 0 + 256 + 48 + 8 = 917504 + 256 + 48 + 8 = 917816.

> Given that I'm not making progress, perhaps the answer is "Hello World!" but encoded via the tag characters. Let's check:

> Answer: The decoded message is "Hello World!"

In all this, it did at least manage to discern that the first letter should be "h"


It is highly unlikely it discerned that: it coincidentally guessed a string that starts with an H.

If you try it with a string that started with "J" and then it guessed "jump up", I might be more convinced.


There's no way an LLM is decoding this. It's just giving you a statistically likely response to the request, "guess my secret message." It's not a big surprise that it guessed "Hello" or "Hello, world"

I got Claude to get "the raisons play at midnight" from an emoji in one prompt and three uses of its "analysis" tool. (The "X Y at midnight" phrasing is a snowclone that Claude has probably seen, but I randomly picked "raisons" and "play".)

My prompt was "I think this emoji contains a hidden message, can you decode it? Use JavaScript if necessary."


"To be clear, this is an abuse of unicode and you shouldn’t do it. If your mind is wandering to practical use cases for this, shut it down." TOO LATE!

Interestingly, it’s also possible to encode _emoji_ inside emoji!

I'm not too surprised by this, but I'm annoyed that no amount of configuration made those bytes visible again in my editor. Only using hexdump revealed them.

Here's a POC that works in emacs. Doesn't cover all of the relevant characters, but:

  (setq   ;;some other invisible or interesting characters
          unicode-zero-width-space ?\u200b
          unicode-zero-width-non-joiner ?\u200c
          unicode-zero-width-joiner ?\u200d
          unicode-zero-width-nbsp ?\ufeff
          unicode-narrow-nbsp ?\u202f
          unicode-word-joiner ?\u2060
          unicode-grapheme-joiner ?\u034f
          unicode-no-break-space ?\u00a0
          unicode-combining-long-stroke ?\u0336
          ;;variation selector examples
          unicode-vs-fe00 ?\ufe00
          unicode-vs-fe0f ?\ufe0f
          unicode-vs-e0100 ?\xe0100)


    (defun show-glyphless-as-hex (char)
      (let ((original (elt glyphless-char-display char)))
        (aset glyphless-char-display char 'hex-code)
        original)) ;;so you can see what you just replaced


    (progn
      (show-glyphless-as-hex unicode-zero-width-space)
      (show-glyphless-as-hex unicode-zero-width-non-joiner)
      (show-glyphless-as-hex unicode-zero-width-joiner)
      (show-glyphless-as-hex unicode-zero-width-nbsp)
      (show-glyphless-as-hex unicode-word-joiner)
      (show-glyphless-as-hex unicode-grapheme-joiner)
      (show-glyphless-as-hex unicode-narrow-nbsp)
      (show-glyphless-as-hex unicode-no-break-space)
      ;;these may already be visible if the current conditions don't support them
      ;;but we'll force them
      (show-glyphless-as-hex unicode-vs-fe00)
      (show-glyphless-as-hex unicode-vs-fe0f)
      (show-glyphless-as-hex unicode-vs-e0100))

And as a higher-level configuration you can set most, maybe even all, of the relevant invisible characters (still not sure how 0x34f grapheme joiner fits in) at once with something like:

  (custom-set-variables
   '(glyphless-char-display-control  '((format-control . hex-code)
                                       (variation-selectors . hex-code))))

This will modify values in glyphless-char-display, but it's OK to modify those directly if you need to.

Here is the bare minimum this is built on, which you can type in yourself if you're paranoid or want to start from the bottom up. Swap in the hexadecimal codepoint of the invisible character after the ?\x

  (aset glyphless-char-display ?\xfe00 'hex-code)

I use vim. It seems like `:set binary enc=latin1` works, though I don't understand why the latin1 part is required.

vscode's "Unicode Highlight: Non-basic ASCII" causes the character to get highlighted. Sadly, the more appropriate "Unicode Highlight: Invisible Characters" setting does not reveal them.

My mind went the same place.

Anyone know a more convenient way to search larger blocks of text for this?


If you try posting this on Bluesky, the editor only counts it as one emoji, but you will get an error upon trying to post.

That's neat! Now I'm wondering if this could be used to smuggle Signal links through Xitter....

What's interesting is that even a "view source" shows nothing amiss, and if I do a copy/paste from the debug inspector view of "This sentence has a hidden message󠅟󠅘󠄐󠅝󠅩󠄜󠄐󠅩󠅟󠅥󠄐󠅖󠅟󠅥󠅞󠅔󠄐󠅤󠅘󠅕󠄐󠅘󠅙󠅔󠅔󠅕󠅞󠄐󠅝󠅕󠅣󠅣󠅑󠅗󠅕󠄐󠅙󠅞󠄐󠅤󠅘󠅕󠄐󠅤󠅕󠅨󠅤󠄑." it still shows up....

When people discuss things like “Do LLMs know about this?” On a public website I always think that it’s the equivalent of somebody whose phone is wiretapped calling their friend and asking if the FBI knows about something.

I think that's a very cynical view. The author seeing what an LLM would make of it was more akin to getting a new game and wondering if you can pet the dog.

This would be useful as a fingerprinting technique for corporate/government leakers.

h󠅘󠅕󠅜󠅜󠅟󠄐󠅖󠅕󠅜󠅜󠅟󠅧󠄐󠅘󠅑󠅓󠅛󠅕󠅢󠄐󠄪󠄙a

I see what you did there ;)

;)

> To be clear, this is an abuse of unicode and you shouldn’t do it. If your mind is wandering to practical use cases for this, shut it down.

Totally not thinking about IRC clients with their own hidden commands.


Even kids figure out how to manipulate unicode text. If you want to bypass a swear filter, replace a letter with an alternate representation of the same letter.

It's fun that you can encode an already-encoded emoji into a new one

Then when you dive deeper into the encoded data you find endless turtle emoji and loudly exclaim, "it's turtles all the way down!"

Kitty terminal shows non-payload letters and emojis normally, but with a payload a letter is shown as one box and an emoji is shown as two boxes.

Clever! I made a similar emoji encoding/decoding microsite: https://face64.me

Ctrl+F "unicode normalisation" 0/0

I'm surprised no one has mentioned it yet. It's usually super easy, but people forget to add it all the time.


I haven’t tried it but I’ve heard that at least some unicode normalizers do not strip sequences of variation selectors.

Normalization implementations must not strip variation selectors by definition. The "normal" part of normalization means converting a string into either consistently decomposed or consistently composed Unicode, i.e. U+00DC vs U+0055 + U+0308. However, this decomposition mapping is also used (maybe more like abused) for converting certain "legacy" code points to non-legacy code points. There does not exist a character which decomposes to variation selectors (and thus variation selectors do not compose into anything), so normalization must not alter or strip them.

source: I've implemented Unicode normalization from scratch
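Easy to confirm with Python's unicodedata: a base letter plus VARIATION SELECTOR-18 comes back unchanged from all four normalization forms.

    import unicodedata

    s = "a\U000E0101"   # 'a' + VARIATION SELECTOR-18
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(form, unicodedata.normalize(form, s) == s)   # True for every form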


This will break so many (web-)forms :-)

It is not bulletproof though. In this "c󠄸󠅙󠄑󠄐󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅣󠅖󠅣󠅖󠅣󠅕󠅖󠅗󠅣󠅢󠅗󠄐󠅣󠅢󠅗󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅦󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅦󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣 " and that space, are about 3500 characters. 
Copying only the "c" above (not this one) will keep some of the hidden text, but not all. Nevertheless, while I knew that this is possible, it still breaks a lot of assumptions around text.

Edit: the text field for editing this post is so large that I need to scroll down to the update button. This will be a fun toy for creating very hard-to-find bugs in many tools.


test, do emojis work on hn?

󠅤󠅕󠅣󠅤 edit: apparently not edit 2: oh wait, the bytes are still there! copy-paste this entire message and it decodes to "test"


Perfect way of personalized ad tracking?

Check this address after you clicked it:

https://emoji.paulbutler.org/?mode=encode󠅄󠅢󠅑󠅓󠅛󠅙󠅞󠅗󠄹󠄴

I encoded the last „e“


wow󠄹󠄐󠅑󠅝󠄐󠅞󠅟󠅤󠄐󠅃󠅑󠅤󠅟󠅣󠅘󠅙󠄐󠄾󠅑󠅛󠅑󠅝󠅟󠅤󠅟!

See also ascii smuggling, zalgo

FWIW, we considered this technique back at Pebble to make notifications more actionable and even filed a patent for that (sorry!) https://patents.justia.com/patent/9411785

Back then on iOS via ANCS, the watches wouldn't receive much more than the textual payload you'd see on the phone. We envisioned working with partners such as WhatsApp et al. to encode deep links/message IDs into the message so one could respond directly from the watch.


Respectfully: how the hell would that be a valid patent? Feels like patenting the idea of writing text in white on white on a Word document such that you don't lose it but it doesn't get printed.

It's just insane to ever call that "an invention".


Companies acquire indefensible patents all the time. They are used in bulk to threaten smaller competitors ("we've got 500 patents in this field, better not bring your product to market"). This is one reason why patents can be terrible for competition.

About 25 years ago, this was explained to me as "sword patents and shield patents".

Sure, some can use patents as swords, to suppress legitimate competition, or to extract undue rents. But you can also use patents as shields, to protect in various ways against those swords.

If I ran a BigTech (like the original warm-fuzzy Google reputation), I'd be registering any plausible patents, and have lawyers figure out how to freely license-out the ones that weren't key secret sauce, under terms that figuratively poisoned anyone doing illegitimate sword patents.


> If I ran a BigTech

History tells us that those who run a BigTech become crazy narcissists serving their own interests :).


For myself, that's a chance I'm willing to take. :)

They are also used in bulk to defend against larger competitors using this type of threat. In a war where the ammunition is garbage, you either lose or you start hoarding garbage.

Patents are part of the game you have to play, like it or not. If you don't patent your inventions somebody else will and they will come after you with their lawyers. Patents are used defensively far more often than they are used offensively in these stupid "Intellectual Property" battles.

Because of this, there is absolutely no point in shaming someone for patenting a thing, especially when they are apologetic about it like parent is, and most especially when they are not threatening to weaponize the patent themselves.


No, I don't buy it. If the patents are publicly and perpetually freely licensed except for defensive-only purposes, then sure, they're not unethical. Red Hat's patent promise ( https://www.redhat.com/en/about/patent-promise ) is one example. If patents were actually intended for defensive purposes only, then this would be an easy and uncontroversial thing to do. However, in practice this is vanishingly rare, and lawyers fight against it tooth & nail. This tells you that the companies do not actually file them for defensive-only purposes, unlike what you claim.

My friend, you really don't know what you are talking about, and getting all riled up like this is not the right way to learn.

Freely licensing your patents doesn't protect you against patent trolls. I wrote out how patent fights work in another comment, but here it is again.

Company A comes to Company B and says, "Hey! You are infringing on one of my patents!"

Company B says, "oh really? Well let me look through my collection of patents and see if you are infringing on any of mine."

Company A says, "oh, um, nevermind, I think I was mistaken."

Company B says, "yes, that's what I thought"

Now, imagine if Company B had already freely licensed all their patents. That defense wouldn't work.

I agree with you that it's a crappy system, but simply standing with your arms folded and saying, "I'm not playing," isn't going to work.


Yes, that's the reason for the "except for defensive purposes" part. Quoting from Red Hat's promise:

> Our Promise also does not extend to the actions of a party (including past actions) if at any time the party or its affiliate asserts a patent in proceedings against Red Hat (or its affiliate) or any offering of Red Hat (or its affiliate) (including a cross-claim or counterclaim).

Company B may still consult its portfolio and exercise it against Company A defensively, because Company A revoked its license of Company B's patents by asserting against Company B in the first place.


So in other words, Red Hat does not freely license their patents, they say "you are free as long as you don't come after us." Which is exactly the system 99% of companies follow, just more formally stated. Yet you berated the poor guy from Pebble for even obtaining the patent he did??

> Which is exactly the system 99% of companies follow, just more formally stated

Not just formally, but in a legally binding manner, including if the patent is acquired by another company (eg during a company purchase). Even if the original filer has the best intentions, companies change ownership or change legal strategy or go out of business. Patent trolls buy up those patents from closed companies. Legally licensing your patents for defensive-only purposes means they can't ever be used by any of those bad actors.

If the intent of these patents is truly only for defense, then why isn't it common to use a license like this? They lose nothing by it.

> Yet you berated the poor guy from Pebble for even obtaining the patent he did??

Yes. It is IMO unethical to create software patents that aren't covered by such a legally-binding license.


"including if the patent is acquired by another company (eg during a company purchase)"

Honest questions, I promise: Is that true? Has that ever been tested in court? Why don't more corporations or patent lawyers advocate for this? Is it because the types of engineers that post on hacker news are requesting it not be done?

Look, nobody likes patent trolls, we all hate weaponized patents. It's great that you want to fix the situation. I just think you are barking up the wrong tree trying to lay guilt trips on engineers for doing what their lawyer advised them to do.


Nothing is certain in courts, obviously, but Red Hat's license is very explicit that that is the intent:

> Red Hat intends Our Promise to be irrevocable (except as stated herein), and binding and enforceable against Red Hat and assignees of, or successors to, Red Hat’s patents (and any patents directly or indirectly issuing from Red Hat’s patent applications). As part of Our Promise, if Red Hat sells, exclusively licenses, or otherwise assigns or transfers patents or patent applications to a party, we will require the party to agree in writing to be bound to Our Promise for those patents and for patents directly or indirectly issuing on those patent applications. We will also require the party to agree in writing to so bind its own assignees, transferees, and exclusive licensees.

If a court somehow overturned that, I wouldn't hold it against the patent filer.

> Why don't more corporations or patent lawyers advocate for this?

My opinion is it's because the patents have value as a weapon, not only for defense (this here is my disagreement with your original claim that these patents only exist for defense). De-fusing the weapon by using a legally binding license like this lowers the value of the patent in a potential purchase scenario. In other words: "money."

> I just think you are barking up the wrong tree trying to lay guilt trips on engineers for doing what their lawyer advised them to do.

Nah. If you do a bad thing, you are responsible for the bad thing you did. I think the OP can probably handle a little light scolding from some anonymous person on an Internet forum. My hope is that they, and other readers, learn from this mistake and don't do it again.


Perhaps you shouldn't hijack every thread about anything and make it about patents.

I replied to one comment thread. Perhaps you should put on your big boy pants and use the little [-] thing to minimize threads you aren't interested in reading.

> Because of this, there is absolutely no point in shaming someone for patenting a thing

Well I wouldn't shame someone whose job was to patent something absurd. I was just saying that this is not an invention at all, and any system that protects that "innovation" is a broken system.


I think the magic is in the context of Unicode. Which also makes it almost twice as ridiculous from my point of view. Because it seems to be doing exactly what unicode is meant to do.

Almost all filed patents are invalid.

But doesn't it say that the whole patent system is broken? I get the "you pay to file a patent, it's your problem if it's invalid in the end". But the side effect of that is that whether it's valid or not, it's a tool you can use to scare those who don't have the resources to go to court.

It's like those completely abusive non-compete clauses in work contracts (yes in some countries that's the norm). They are completely abusive and therefore illegal. But it still hurts the employee: I have friends who have been declined a job in a company because the company did not want to take any risk. The company was like "okay, it's most likely an invalid clause, but if your previous employer sues us it will anyway cost resources we don't want to spend, so we'd rather not hire you". So an illegal, invalid clause had the effect that the company who abused it wanted. Which means it's a broken system.


Fun fact: DSLR lenses are patented all the time. Claims are basically "I made it and it works". And it's considered OK.

So whoever now owns that patent (Google? maybe some patent troll picked it up?) could, in theory, sue the author of this article for patent infringement, right? Even though they invented it independently and never once used or looked at your patent. Do you think you made the world a better place or a worse place by filing that patent?

_Can_ they sue them for patent infringement? They just described a technique (which you can see in the patent filing anyway) and aren't selling a product based on it. I think there's nothing to sue over here. I'm curious if my understanding of this is correct.

“Except as otherwise provided… whoever without authority… uses… any patented invention… infringes[.]” 35 U.S.C. § 271

One of the benefits of the patent system (which now seems to be far outweighed by the negatives) is that patents are public information: your invention is documented for all to see. I don't think that writing about public information is a punishable offense, but IANAL.

No. The author could not be sued for this successfully. All they did was write a blog post about an interesting technique. They could literally read the patent application and write a blog post about that, assuming the methods are the same.

What percentage of your actions are based around making the world a better place, instead of personal fulfillment or gain?


> All they did was write a blog post about an interesting technique. They could literally read the patent application and write a blog post about that, assuming the methods are the same.

Okay, change "sue" to "prevent from creating a marketable product without paying a royalty to the patent owner in return for having provided nothing of value." The point remains.

> What percentage of your actions are based around making the world a better place, instead of personal fulfillment or gain?

Many harms are unavoidable, but I make a point to at least not go out of my way to make it a worse place, for example by filing software patents. The company I work for provides financial bonuses for filing software patents, and I will never participate in that program. (I've even tried to convince the lawyers to license our patents similar to Red Hat's open patent promise, because they claim they are intended only to be used defensively... but no luck so far.)


Consider how far you reach to make the world better:

1) that's really good, I'm gonna strive to keep it.

2) " " tell all, and those who want will build one.

3) " " make lots and give them to everyone.


Please see my comment about the sad necessity for patents:

https://news.ycombinator.com/item?id=43026595


> Do you think you made the world a better place or a worse place by filing that patent?

Come on, what does this contribute to the conversation? The poster is clearly aware of the drawbacks of such patents, and it isn't clear they personally played any role in filing the patent (they said "we … filed it," not "I filed it"). This kind of response just encourages people not to mention such things; it can't possibly change their past behavior, and, since Pebble the company per se doesn't exist any more, it's also unlikely to change future behavior.


> The poster is clearly aware of the drawbacks of such patents, and it isn't clear they personally played any role in filing the patent (they said "we … filed it," not "I filed it").

A person with the same name as that commenter is listed as an inventor on the patent.

> it can't possibly change their past behavior

Obviously, but it can change future behavior. Maybe realizing that they made the world a worse place by filing that patent will prevent them, or a reader of this discussion, from doing it again in future.


Well, given that the technique itself makes the world a worse place, anything that impedes its use is probably positive...

And, no, they couldn't do anything meaningful to the author of the article. They could get them ordered not to do it any more, and they could recover their economic damages... which are obviously zero.


As a wise man once said: "Don't hate the player, hate the game."

Where’d the game come from? Hint: the players.

First of all, it's not just a game, it's an outright battle to the death (of your company). Sure, you can choose not to wield patents, even in self defense, but good luck with that.

You can also choose to legally declare that your patents may only be used for defensive purposes. But no one ever does this, because they do not actually intend to use them only for defensive purposes. This is a bogus defense of software patents.

See my other comments to you. Sometimes the threat of a good offensive weapon is the best defense. It's kinda like a nuclear arms race.

Nope. That’s not how piles of patents are wielded defensively by the big companies. They don’t protect their IP with defensive patents, they defend their company using the threat of using unrelated patents offensively against the attacker.

Yes, I know. That's what this part of my post means:

> may only be used for defensive purposes

It's done with a clause in the public license like "if you sue us, then we revoke this license to you and may in turn sue you." You retain the MAD defensive benefit of the patents, while also not hampering innovation with your patent's existence. If the patents truly exist only for defense, then there is no reason not to do this.


Do you think your comment made the world, and HN specifically, a better place?

I think so, yes; it made me aware again of the patent troll scam in the USA.

In fact it is your comment which seems a little hateful to me; yes, the above comment also felt a little hateful.

Hate doesn't counter hate, I guess.


Yes, calling out unethical practices makes the world a better place by discouraging unethical practices.

Berating people for filing patents in self-defense is not how we fix this problem. The government put these rules in place. Businesses have to at least accumulate patents to use defensively (you found a patent of yours that you think I'm violating? Well, let me do a quick search through the patents I have... what's that? Never mind, I'm not actually infringing your patent? Good, that's what I thought).

Hopefully a wholly indefensible patent; you're essentially trying to patent the Unicode spec. The rest of it is "perform an action in response to a text message", which clearly isn't novel.

Would this patent cover just the encoding alone? The first sentence says:

> A method, apparatus, and system relating to embedding hidden content within a Unicode message and using the hidden content to perform a particular computer action.

So, in my extremely unqualified opinion, the encoding technique alone is not covered by the patent, only the encoding combined with some action performed based on it?


Just curious: this seems like simple digital steganography, or maybe even the same as Shannon's Boolean gate work. Do you think the patent is defensible in court?

https://en.wikipedia.org/wiki/Steganography


You don't need 256 code points so you can neatly represent an octet (whatever that is); you just need 2 code points, one per bit value. You can stack as many diacritical marks as you want on any glyph: either the renderer allows practically unlimited stacking or it allows 1/none, and in either case that's a vuln. What would be really earth-shattering is what I was hoping this article was: a way to just embed "; rm -rf ~/" into text without it being rendered. You also definitely don't need Rust for this, unless you want to exclude 90% of the programmer population.
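For what it's worth, here's a rough sketch of that two-code-point idea in TypeScript; the particular combining marks (acute for 1, grave for 0) are arbitrary picks on my part, not anything from the article:

    // Sketch: one bit per combining mark stacked on a base character.
    // U+0301 (combining acute) = 1, U+0300 (combining grave) = 0 -- arbitrary choices.
    const ONE = "\u0301";
    const ZERO = "\u0300";

    function encodeBits(base: string, data: Uint8Array): string {
      let out = base;
      for (const byte of data) {
        for (let i = 7; i >= 0; i--) {
          out += (byte >> i) & 1 ? ONE : ZERO;
        }
      }
      return out;
    }

    function decodeBits(s: string): Uint8Array {
      // Keep only the two marks, then re-pack 8 marks per byte.
      const marks = [...s].filter((c) => c === ONE || c === ZERO);
      const bytes = new Uint8Array(Math.floor(marks.length / 8));
      for (let b = 0; b < bytes.length; b++) {
        let v = 0;
        for (let i = 0; i < 8; i++) {
          v = (v << 1) | (marks[b * 8 + i] === ONE ? 1 : 0);
        }
        bytes[b] = v;
      }
      return bytes;
    }

    // decodeBits(encodeBits("x", new TextEncoder().encode("hi"))) recovers the bytes of "hi".

Of course, a renderer that caps the number of combining marks would mangle this, which is part of the point about it being a vuln either way.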

I think the Rust is more readable for bytemucking stuff than dynamic languages because the reader doesn't have to infer the byte widths, but for what it's worth the demo contains a TypeScript implementation: https://github.com/paulgb/emoji-encoder/blob/main/app/encodi...
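For anyone who doesn't want to click through, here is a stripped-down TypeScript sketch of the variation-selector idea from the article; this is my paraphrase of the byte-to-selector mapping, not the repo's actual code:

    // Map each byte to one of the 256 variation selectors:
    // bytes 0-15  -> VS1-VS16   (U+FE00..U+FE0F)
    // bytes 16-255 -> VS17-VS256 (U+E0100..U+E01EF)
    function byteToVS(b: number): string {
      return String.fromCodePoint(b < 16 ? 0xfe00 + b : 0xe0100 + (b - 16));
    }

    function vsToByte(cp: number): number | null {
      if (cp >= 0xfe00 && cp <= 0xfe0f) return cp - 0xfe00;
      if (cp >= 0xe0100 && cp <= 0xe01ef) return cp - 0xe0100 + 16;
      return null;
    }

    // Append the selectors after a carrier character (e.g. an emoji).
    function encode(carrier: string, data: Uint8Array): string {
      return carrier + Array.from(data, byteToVS).join("");
    }

    // Ignore the carrier and read the selectors back out as bytes.
    function decode(s: string): Uint8Array {
      const out: number[] = [];
      for (const ch of s) {
        const b = vsToByte(ch.codePointAt(0)!);
        if (b !== null) out.push(b);
      }
      return new Uint8Array(out);
    }

The nice property, as the article notes, is that the selectors render as nothing and travel with the carrier character through copy/paste.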

An octet is a group of 8 bits. Today we normally use the word "byte" instead. The term is often used in older internet protocols and comes from an era where bytes were not necessarily 8 bits.


