
The source mentions in a reply that they were able to reproduce this exact text multiple times through different prompt injection approaches, right down to the typo "you name" rather than "your name", which seems unlikely if the model were simply making it up out of thin air.

I wonder if "you name" is a load bearing typo that breaks something else if corrected, so they left it in on purpose.




The prompt for Bing Chat was previously reproduced by the same person as in this case, using the same trick. The Bing lead dismissed it as inaccurate, though: https://twitter.com/MParakhin/status/1627491603731423232


> load bearing typo

I propose we standardise this terminology. It's too good to be neglected.


There are other examples as well; Referer [sic] in HTTP is one. It's really supposed to be spelled "Referrer", but it obviously can't be changed now.
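
A quick Python sketch (URLs made up, purely illustrative) of how the misspelling shows up anywhere you touch that header:

    # Quick illustration (made-up URLs): the header name in the protocol,
    # and therefore in HTTP libraries, really is spelled "Referer".
    import urllib.request

    req = urllib.request.Request("https://example.com/page")
    req.add_header("Referer", "https://example.com/")  # [sic] -- one r, per the spec
    print(req.get_header("Referer"))                    # https://example.com/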


I love how “load bearing” is used here!!


I'm a big fan of "load bearing printf", but the phrase deserves to be transferred into wider usage.


Could be token compaction resulting in some loss of fidelity.


> right down to the typo "you name" rather than "your name", which seems unlikely to happen if it were making it up out of thin air.

Why is it unlikely? Why does prompting it different ways and getting the same result make it unlikely?


It's also very unlikely that an LLM would hallucinate a prompt with a spelling mistake in it. LLMs are really good at spelling.


That seems to be a fundamental misunderstanding of what LLM hallucinations are?

A hallucination, when it comes to LLMs, just means "the algorithm, picking the most likely next tokens, put together a string of tokens that contains false information". It doesn't mean the LLM comes up with a novel false idea each time. If the first time it hallucinates it decides that misspelling is the best next token, why wouldn't it keep making the same choice time and time again (if randomness settings are low)?
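
To make that concrete, here's a toy sketch in Python (not a real model; the token table and probabilities are invented) of why low-randomness/greedy decoding reproduces the same output, typo and all:

    # Toy sketch (invented probabilities, not a real model): greedy decoding
    # picks the argmax token at every step, so the output -- including any
    # "typo" the model happens to prefer -- is identical on every run.
    def next_token_probs(last_token):
        table = {
            "what's": {"you": 0.6, "your": 0.4},  # model slightly prefers the typo
            "you":    {"name": 0.9, "is": 0.1},
            "your":   {"name": 0.9, "is": 0.1},
        }
        return table.get(last_token, {"<end>": 1.0})

    def greedy_decode(prompt, max_steps=3):
        tokens = list(prompt)
        for _ in range(max_steps):
            probs = next_token_probs(tokens[-1])
            tokens.append(max(probs, key=probs.get))  # deterministic argmax
        return " ".join(tokens)

    print(greedy_decode(["what's"]))  # what's you name <end> -- same every run

With temperature above zero you'd sample instead of taking the argmax, which is where run-to-run variation (and the "butterfly effect" mentioned downthread) comes from.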


Because for practical purposes they just don't make grammatical or spelling mistakes like that.

Obviously they're a black box, so it's possible there could be some very rare edge case where it happens anyway, but it'd be a complete fluke. Changing the prompt even superficially would essentially cause a butterfly effect in the model, preventing it from going down the exact same path and making the same mistake again.


"Detective, why is it unlikely the witness is lying when several other witnesses say exactly the same thing? Detective?"


It's not other witnesses; it's the same system.



