
Odd how many of those instructions are almost always ignored (e.g. "don't apologize," "don't explain code without being asked"). What is even the point of these system prompts if they're so weak?



It's common for neural networks to struggle with negative prompting. Typically it works better to phrase expectations positively, e.g. “be brief” might work better than “do not write long replies”.
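Here's a minimal sketch of that comparison using the Anthropic Python SDK, assuming an ANTHROPIC_API_KEY in the environment; the model id, prompts, and test question are illustrative assumptions, not anything taken from the actual system prompt.

    # Sketch: compare a positively phrased system prompt with a negatively
    # phrased one. Model id, prompts, and question are illustrative only.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    PROMPTS = {
        # Positive phrasing: states the desired behavior directly.
        "positive": "Be brief. Answer in at most two sentences.",
        # Negative phrasing: states only what to avoid.
        "negative": "Do not write long replies. Do not use more than two sentences.",
    }

    def ask(system_prompt: str, question: str) -> str:
        """Send one question under the given system prompt and return the reply text."""
        response = client.messages.create(
            model="claude-3-5-sonnet-latest",  # assumed model alias; substitute your own
            max_tokens=300,
            system=system_prompt,
            messages=[{"role": "user", "content": question}],
        )
        return response.content[0].text

    if __name__ == "__main__":
        question = "Why is the sky blue?"
        for label, prompt in PROMPTS.items():
            reply = ask(prompt, question)
            print(f"[{label}] {len(reply.split())} words:\n{reply}\n")

In practice you'd want many questions and some scoring of reply length to see a real difference, but the structure of the comparison is the same.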


But surely Anthropic knows better than almost anyone on the planet what does and doesn't work well to shape Claude's responses. I'm curious why they're choosing to write these prompts at all.


Maybe it would be even worse without it? I've found that negative prompting is often ignored, but far from always, so it's still useful.


I’ve previously noticed that Claude is far less apologetic and more assertive when refusing requests compared to other AIs. I think the answer is as simple as them being OK with nudging it further in that direction rather than guaranteeing it completely. The section on pretending not to recognize faces implies they’d take a much more extensive approach if they really wanted to make something never happen.


Same with my kindergartener! Like, what's their use if I have to phrase everything as an imperative command?


Much like the LLMs, in a few years their capabilities will be much improved and you won't have to.


It lowers the probability. It's well known that LLMs follow instructions imperfectly -- part of the reason "agent" projects haven't succeeded so far.



