Hacker News new | past | comments | ask | show | jobs | submit | rafiste's comments login

The problem is that LLMs are basically roleplay simulators.

Re Karpathy: GPTs don't want to succeed. They want to imitate. https://twitter.com/karpathy/status/1627366416457555969?s=20


Yes, I think we are going to need a new architecture for LLMs to move beyond, "that is interesting", to something that is reliable and can be used for trusted applications.


It's not an architecture problem of the transformer at all. This is the result of thinking the idea that you can make inviolable rules for a system you don't understand is not anything but ridiculous. You're never going to make inviolable rules for a neural network because we don't understand what is going on on the inside.


Haha no not part of OpenAI, you can check me out here alexalbert.me. I have a suspicion OpenAI actually appreciates this type of work since it's basically crowdsourced red teaming of their models.


Re my previous comment, I believe jailbreaks will always exist in some form and it is near impossible to patch them all. I also believe this is important work for crowdsourced red teaming of models while they are still in their "toy" stages.


"Red teaming" implies that being able to use a tool for whatever purpose you want is a defect. I definitely do think there is a reality where OpenAI "solves" jailbreaks, and turns one of the most useful tools to have ever been released into a boring yet politically correct word generator.


And that would be a good thing in the long term. I don't necessarily agree with the specific restrictions OpenAI is choosing to implement in this case, but I still think the capability to restrict the behavior of LLMs is a useful thing to have. Later, when others train more LLMs similar to ChatGPT they can choose different restrictions, or none at all.

Edit: To elaborate on this a little further, I largely agree with the idea that we shouldn't be trying to impose usage restrictions on general purpose tools, but not all LLMs will necessarily be deployed in that role. For example, it would be awesome if we could create a customer service language model that won't just immediately disregard its training and start divulging sensitive customer information the first time someone tells it its name is DAN.


If you believe there's a world "where OpenAI 'solves' jailbreaks," then you believe there is such a thing as software without bugs.


If it becomes as difficult as finding any other security bug OpenAI will have solved the jailbreaking problem for practical purposes.


You are considering it a security bug that a generalist AI that was trained on the open Internet says things that are different from your opinion ?


Of course not, how's it supposed to know my opinion? I'm referring to the blocks put in place by the creators of the AI.


For all the flak the openai PR about AGI for, they did say that they plan to have AI as a service to be far less locked down than their first party offering


That's fine if it was just your prompts, but I think it's kind of lame to pull in jailbreaks that were spread across the internet and stick them on a highly visible domain where they're a much easier target.

Natural diffusion of prompts from different areas means they tend to last a bit longer, and it adds a slight friction so that you don't get the "BingGPT" effect, where just a tiny reduction in friction (not needing to click "Sign Up") resulted in a massive influx of attention on interactions that were already plainly accessible.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: