
As a random aside to the other commenter's linked articles: I find it a bit coincidental that the supposed "kill switch" environment variable, yolAbejyiejuvnup=Evjtgvsh5okmkAv, when its ASCII bytes are read as UTF-16LE, decodes to 潹䅬敢祪敩番湶灵䔽橶杴獶㕨歯歭癁, which Google translates to "You can't do it without a soul."



Any even-length alphabetic ASCII string decodes to random Chinese characters in UTF-16LE. Digits and = unlock some Japanese hiragana, Korean hangeul and assorted punctuation, but those only make up a small fraction of the total.

For example, 'backdoor'.encode('ascii').decode('utf_16_le') == '慢正潤牯', which Google Translate turns into "Slow and positive", but it's just nonsense.
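To make the mechanism concrete, here is a minimal sketch of the same reinterpretation. The helper names are made up for illustration, and the block check only covers the two main CJK ideograph ranges (CJK Unified Ideographs and Extension A), which is where pairs of ASCII letters should land.

```python
def reinterpret_ascii_as_utf16le(text: str) -> str:
    """Treat the ASCII bytes of `text` as UTF-16LE code units (length must be even)."""
    return text.encode("ascii").decode("utf_16_le")


def in_cjk_ideograph_block(ch: str) -> bool:
    """True if `ch` is in CJK Unified Ideographs or its Extension A block."""
    cp = ord(ch)
    return 0x3400 <= cp <= 0x4DBF or 0x4E00 <= cp <= 0x9FFF


decoded = reinterpret_ascii_as_utf16le("backdoor")
# Each pair of ASCII bytes (lo, hi) becomes the code point hi * 256 + lo,
# e.g. "ba" -> 0x61 * 256 + 0x62 = U+6162.
print([f"U+{ord(ch):04X}" for ch in decoded])
print(all(in_cjk_ideograph_block(ch) for ch in decoded))
```

Since ASCII letters are bytes 0x41-0x5A and 0x61-0x7A, the high byte of each pair always falls in 0x41-0x7A, which is why the result sits entirely inside those ideograph ranges.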


I'm naive about the translation tech space, but is this sort of thing unique to languages like Chinese? I figured all this stuff was mostly solved. For instance, I wouldn't expect dflhglsdhfgalskjdf to make Google Translate produce grammatically valid Spanish.


There is one difference between gibberish Chinese and gibberish Latin character sequences. In Chinese, each character does carry some meaning (like a word), so I guess the model may hallucinate output inspired by those meanings. In the case of "慢正潤牯" -> "Slow and positive", it actually translated the first two characters literally (慢 -> slow, 正 -> correct/positive/upright).

So the equivalent English gibberish would be something like "hast prank bibble done anut me me ions." Google translates this one to "对我而言,恶作剧已经完成了。" (To me, the prank has been done.) in Chinese -- a perfectly valid sentence, and "¿Me has hecho una broma a mí, Bibble?" in Spanish -- which also seems valid.

I guess the model is (over) optimized to generate valid outputs. This can be a feature, since it lets the model translate text that is grammatically invalid but still somewhat understandable (for example, text with typos or non-standard Internet language).


I think the Latin script might be somewhat protected because random jumbles of letters do appear as serial numbers and whatnot, but for other scripts, anything goes.

I say ҏӲҨЏ ҜъКѠ ЇЩіН гӞэѷ in "Russian", Google Translate says "Let's talk about it".


Amazing. How did you find that out?


I hadn't looked into that story before, so I was following the rabbit hole of articles, gists and so on, and saw that some referenced a kill switch via an environment variable. I just tossed it into the CyberChef online tool using its "magic mode" with the "intensive mode" box ticked, and it was the top result. I only commented because I hadn't seen it mentioned elsewhere and figured it might be a little Easter egg of sorts.


It seems that Google Translate simply outputs garbage when you input garbage, so getting this particular translation is most likely just a coincidence.


Wow, I didn't realize how much implicit trust I put in their translation output. I just tried some other Chinese -> English translation sites, and they vary widely in what they output. Are these gibberish Chinese characters that the translators just guess at? Either way, thanks for the insight. I clearly put too much faith in their quality and accuracy.


Right, it's complete gibberish. As a native speaker, I can recognize at most 4 of the characters, and not a single subsequence makes any sense.

Actually, just by shuffling these characters you have a good chance of getting some plausible-looking translations (adding a punctuation mark makes it more likely to generate a complete sentence): "祪癁番䔽䔽!" -> "I am so sick!" "獶獶祪灵癁癁癁!" -> "The soul is full of blood!"



