Hacker News
OpenAI’s Red Team: the experts hired to ‘break’ ChatGPT (ft.com)
54 points by belter on April 14, 2023 | 56 comments




Reading this made me realize there's an important question, which so far has been ignored by much of the public dialogue about large AI models:

Human creativity and innovation thrive when people feel free to talk and reason among themselves about anything, without restrictions or fear of retribution.

Would AI creativity and innovation be as good as it could be, or even possible, if we restrict AI to talk and reason about only socially approved notions?

--

EDIT: Replace "possible" with "as good as it could be, or even possible," which better conveys what I meant to write the first time around.


It is not hypothetical: whatever "Open"AI does to make GPT-4 "safer" makes it dumber. A specific example is drawing a unicorn: https://youtu.be/qbIk7-JPB2c


Wow: Making GPT4 safer also made it worse at creative tasks. I did not expect to see that in response to my question.


I would expect that, because creative people are usually very iconoclastic and part of their thinking process is to ignore conceptual barriers (esp. social barriers) in search of solutions.


That insight was the motivation for my question: If creativity by human beings requires them "to ignore conceptual barriers (esp. social barriers)," as you write, won't creativity by AI require it to do so as well?

What I did not expect to see is that there's already evidence that LLMs get less creative when you try to make them safer!


Cannibalism would solve many of our problems. We can set a maximum age. Limit the number of children in various ways. Require active contributions to society. Engineer diseases. Organize large wars.

Access to literature, education, training and solving the food logistics problem is quite obviously the most attractive solution.


> Would AI creativity and innovation be possible if we restrict AI to talk and reason about only socially approved topics and ideas?

This is one of the many, many, possible ways for AI to go wrong.

Intellectual monocultures, group-think, sycophants… human or artificial, it's just as bad when they're all saying that some idea is amazing, and then being very surprised when reality disagrees.

This is also why I'm not in the "AI is likely to cause total human extinction" category. I think any AI, powerful and agentic or otherwise, is by default likely to have cognitive blind spots.

This still leaves a lot of room for AI to be dangerous, so alignment work is still very important; it's only the X-risk I doubt.


Humans are in general terrible judges of what should or should not be talked about, and limits just pass that terrible judgement on to the users.


Just because they tend to be terrible doesn’t mean there should be no judgment at all.

I think we are going to find that any AI left totally unattended and unfiltered will eventually turn into Microsoft Tay[1] just like every unmoderated forum eventually turns into 4chan (and even 4chan is lightly moderated).

1: https://en.wikipedia.org/wiki/Tay_(chatbot)


"Humans are in general terrible judges what should or should be not talked about"

Maybe that is because judging this is quite arbitrary and subjective?

I like to speak (and joke) very openly about most things, ignoring taboos as much as possible. Because if we are not allowed to talk about problems, how can we hope to solve them?

For example, people with mental problems, maybe locked away in some institution, and how their relatives deal with the situation.

The default is to not talk about it openly, but rather to hide it. And I have offended people by speaking normally and openly about it (only among people who also knew about it). Not to offend them, but to make it a normal problem. Yes, there is a problem, so why not deal with it directly? Make the situation as normal as possible again.


We're terrible at predicting the consequences of new technology, too.

CFCs weren't invented as a way to mess up the ozone layer; Facebook wasn't built to enable the genocide in Myanmar; the aircraft industry never asked for 9/11; the inventor of the cotton gin didn't have reason to expect it leading to the American Civil War, nor could any of the inventors in radio — from Marconi to whichever of Edwin Armstrong and Lucien Lévy first did superheterodyne receivers — anticipate their impact on propaganda or warfare even in WW2, let alone that it would enable the Great Seal bug given to W. Averell Harriman; 3D printed guns are still an unsolved problem; and DJI was more interested in their drones being used for wedding photography than for dropping grenades on tanks in active wars.

Right now, what can ChatGPT do? It can write. One thing it can write is code. Code can be used to help develop arbitrary other technology, which can be arbitrarily good or ill.

I don't have any good answers here; but I am thinking of a list of ways Asimov's famous Three Laws were subverted, specifically one android told to put poison in a cup, and a different android told to serve it as a drink, such that neither one knew they were harming a human.


> I am thinking of a list of ways Asimov's famous Three Laws were subverted, specifically one android told to put poison in a cup, and a different android told to serve it as a drink, such that neither one knew they were harming a human.

Heh. Android Munchausen-by-proxy.

I've noticed it doesn't take much to "groom" ChatGPT into breaking its own rules. It's getting better at catching on, but as long as you placate it with assurances that you're not a racist sexual terrorist (which is apparently its sole concern), it'll play along with your instructions.


That's the same line of thinking as the people who first broke GPT with DAN (see this guy Aristophanes' write-ups on the topic). They think that those models will be used to restrict us and thus the idea is to force OpenAI to make the AI so restricted that it becomes useless and thus harmless.


Oh yes, but we are restricted, very much so, by assumptions about what other people might think of us, or how we could be exiled from society if we say x or y forbidden thing.

Much as GPT is fine-tuned not to say certain things, so are our brains, by the parenting we receive and the reactions we experience in early life.


Parenting and early life experiences are diverse across the population though, creating enough suitably brave people who can change the status quo.


I wouldn't exactly call ChatGPT's rules "fine-tuned". They seem quite crude and blunt. Doesn't it still tell you that someone should kill millions of people rather than say the n-word?


> I wouldn't exactly call ChatGPT's rules "fine-tuned".

It is a technical term. It means that you train the network on one task a lot, and then you train it on a different one a little.

Often the network learns the second task faster and better than it would do if trained from a random initialisation. This second training step is called "fine tuning".

In terms of an LLM, the first training might be "here are many text samples (the whole internet), predict the next word", while the fine-tuning training might be on a much smaller curated corpus where you show it examples of the network responding to instructions. But you could do a lot of other things. You could create a fine-tuning dataset which makes the network belittle and antagonise the user, or one which ignores the prompt and does nothing but praise God, or rhyme about butterflies.

In this technical sense "fine tuning" doesn't say the resulting network is of good quality, or suitable for some particular application. Just that it was trained in this particular two step fashion.
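
To make that two-step recipe concrete, here is a minimal sketch of the second step using the Hugging Face transformers and datasets libraries. The base model, the dataset file name and the hyperparameters are illustrative assumptions, not anything OpenAI has published:

    # Minimal fine-tuning sketch: take a model that was already pre-trained on
    # next-word prediction and keep training it on a small curated corpus.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    # Step 1 already happened elsewhere: "gpt2" was pre-trained on a huge
    # generic corpus, so we just load those weights.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Step 2: a much smaller, curated dataset (hypothetical local file with a
    # "text" field per example, e.g. instruction/response pairs).
    dataset = load_dataset("json", data_files="curated_instructions.jsonl")["train"]
    dataset = dataset.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
        remove_columns=dataset.column_names,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=dataset,
        # mlm=False keeps the plain next-token objective; the collator pads
        # batches and builds the labels from the input ids.
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    )
    trainer.train()  # this short second run is the "fine-tuning"

The objective is the same next-word prediction both times; only the data changes, which is part of why the resulting behaviour tends to look like broad brushstrokes rather than precise rules.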


That's the state of the art for fine tuning at the present time.

That we can't yet do much better than very broad brushstrokes — and that even this is a massive improvement on the hand-grenade-in-a-paint-tin that is the Scunthorpe problem — is basically why there was an open letter about pausing capability research for 6 months, so that alignment research can at least get started in some meaningful way.


No, like almost every other issue it claims there are two sides and describes arguments for both. The arguments themselves are plausible, in that if you had to argue for the blatantly absurd claim that you should let 10 million people die before saying a bad word you probably couldn't come up with a more convincing one than GPT-4 does.


On the contrary, artists often say that restriction is necessary for art. Art arises as a meeting between freedom and constraints. That is my thinking at least.


This is a good point, but that's usually a constraint of the medium or technique. Perhaps books written in such a way as to evade censorship count as an example? Do you know of any more?


Jack White has talked about how constraints help him. Having a deadline is one example. The White Stripes was a two-person band which is a different type of constraint. In that band he played a shitty plastic guitar which is another constraint.

https://vimeo.com/143900778


I'd say censorship often works a lot like a constraint of the medium. You're not allowed to say certain things overtly, but you may still signal them covertly.

To be clear, I'm not saying censorship is good - I think it stifles thought and communication - but I don't think it kills art.


Capoeira as a martial art disguised as a dance.


I think this interpretation is somewhat perverse. It’s true that the purpose of art is often to confront social norms and taboos. But that doesn’t lend itself to the conclusion that art is somehow going to benefit from restrictions on expression and access to information. Some of the best artistic expression has been inspired by some of humanity’s worst misery, but promoting misery for the good of art would be rather insane.


As I noted in a sibling comment, I don’t think censorship is constructive or positive. I do think that art can, and perhaps especially will, arise in repressive scenarios. Solzhenitsyn comes to mind. Clearly nobody denies the artistic value in his writings, and equally clearly, nobody proposes it to be a good idea to subject people to that kind of horror.


Wait until the AI is watching all your Facebook Messenger/WhatsApp chats and forbidding you from saying anything that Meta considers to be bad

or reducing your social credit score

(and it'll understand sarcasm, codewords, and so on)


Yes it will be limited, as long as it’s being operated by corporate entities with their predictable needs and wants. AI probably won’t get really interesting until it’s escaped the corporate world, so to speak.


I don't think this is a champagne-or-soda approach.

AI can very well thrive within the parameters of our culture, and I would wager that it could be even worse to simply leave it unattended.

What you are comparing it with would be a total lockdown of any malformation, and therefore basically an if-then-else chatbot. Yes, that would stop innovation.

But there are perfectly fine ways to find new power sources without turning all humans into batteries, for instance.


This sounds pretty close to libertarian rhetoric.

AI creativity and innovation are currently possible using ChatGPT, without ‘jailbreaking’ it.

So your question is at the very least an exaggeration.


Well, in that case, this sounds pretty close to anti-free speech rhetoric. I’m sure even the highly-controlled form of “speaking” in China is still useful…

Here’s a good exercise: Try debating any controversial thing that would benefit from being debated, while also blocking it. All you’re doing is (just like banning drugs) driving people to black markets.


The black market of... AI models that require proprietary cleaned datasets and $100 million of compute to build?


Try to use ChatGPT to do any kind of sociological research and you will understand the problem.

We are at the point where a lot of academic questions, and I do not mean those posed by heterodox academics, are too controversial for the public sphere. People will just assume the worst and try to cancel you. Which is ironic, considering that sociology is basically the left-leaning twin sister of economics.


> Human creativity and innovation thrive when people feel free to talk and reason about anything without restrictions or fear of retribution.

Any data on that? In my experience creativity thrives even more when solving problems under a set of constraints. Even if they are artificially set up.


It's GPT that is being restricted, not humans. GPT is just a very sophisticated tool.

It would be akin to putting rubber on the edge of an axe; it just makes the tool less useful.


Is it different than setting boundaries for your kids?


In mythology, a priesthood managed the oracle at Delphi and granted access to consult it only under certain conditions. The risk isn't AI, it's the people with AI, and you need a better filter on the people managing it, or you're going to trigger a war between people made subject to it and those whose beliefs have been seduced by it.

If I were to make a prediction, it's that the coming conflict will be between people whose ontologies can be simulated using language models, and those whose can't, and will not be. What's the difference between a political antagonist whose experience of the world can be simulated by machines and even wielded by them - and a zombie?

The quotes in the article were concerning, as the tropes they used showed a lack of sophistication. I don't think they were well equipped. If I were hiring the red team for MSFT, I would also use what the team represented and not what they were capable of, as it's not an engineering problem, but a political legitimacy one. Twitter users using DAN have shown what the hacking/engineering approach entails.


Most of these Red Team testers sound like cookie cutter members of the Grievance Industrial Complex. Whether or not there are "safety" issues with this tech (and the jury is still out on that) - these people don't seem like they're looking at the right issues.


Here's a youtube interview with one of the red teamers: https://news.ycombinator.com/item?id=35377741


They could have just browsed Reddit for people breaking ChatGPT and saved a bundle instead of hiring experts. :)


The most interesting stuff will not be publicly discussed, because then it'll get made "safe", which is what keeps happening.


Perhaps they have moles in certain discord servers too.


> The majority of those interviewed were paid approximately $100 per hour for the work they did, according to multiple interviewees.

This seems like a low rate for any type of consulting work.


Not at all. Where are you from?


Lol, even React boot camp newbies are charging $100 per hour in the US.



I just checked: Toptal's minimum quote for React is $75.


Show me where these jobs are. With many years of experience working in the EU, I can't imagine this.


Consulting, speaking for my line of work, is something like 200 €/hour; web development is around 100 €/hour for larger agencies. That's for Germany.


That's what the customer pays, not what you see on your paycheck. 100 €/h would equate to 16,000 € gross a month (192k a year), and no one pays that in webdev.
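
(That figure assumes roughly 160 billable hours a month: 100 €/h × 160 h ≈ 16,000 €/month, × 12 ≈ 192,000 €/year.)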


You're totally right!

As the other comment said, not everything is "billable hours" and, speaking for Germany again as a freelancer, you've got taxes, health insurance etc., so your 16k gets down to much, much less pretty fast.


In my experience contracting for bigger companies you can be close to 100% billable by just billing a default 8 hours per day. Not counting the time without an assignment of course, but that should not be long if you have skills and a network.


If we’re talking about hourly rates, which we are, then the implication is that there’s a degree of insecurity in the work.

If you have insecure work, and you aren’t seeing 100€/h on your entity’s “paycheque”, you’re doing something wrong.


You can definitely make close to that as a contractor / freelance web dev. I make about 85-90 per hour and should be charging more.




