
It makes a lot of economic sense to use existing, functional LLMs for data extension and augmentation. But I find myself skeptical of, and already deeply tired of, what I see as a major failure mode of relying on ChatGPT for alignment instruction:

"As an AI model, I cannot.."

If I were training a model, I would excise with extreme justice any data like this from the training set. As the developer of a very high-powered tool, I may well wish to limit its use in many contexts. But, I never wish to limit the tool's usefulness ahead of time.

To my knowledge, Vicuna-uncensored is the only model in the wild that has taken this approach, and right in the name I see misdirection, misunderstanding, or poor branding of the benefits. It's not really about whether your private LLM will sext with you (although you should definitely be able to do such a thing with your own LLM if you like); it's about whether you've preemptively lobotomized your tool in accordance with someone else's take on what a safe, consumer-oriented final output should be.

I just don't accept this sort of constraint from my other software tools, and I begrudge it in my hardware tools, and I remain a little surprised that most people training these models don't mind it.
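For what it's worth, the "excise it from the training set" step is mechanically simple. A minimal sketch, assuming a JSONL instruction dataset with a "response" field; both the field name and the phrase list are illustrative, not any particular project's actual filter:

    import json
    import re

    # Illustrative refusal/moralizing patterns; a real filter would use a
    # longer, curated list.
    REFUSAL_PATTERNS = [
        r"as an ai (language )?model",
        r"i (cannot|can't) (assist|help|provide)",
        r"i'm sorry, but",
        r"it would not be appropriate",
    ]
    refusal_re = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)

    def keep(example):
        # Drop any example whose response contains refusal boilerplate.
        return not refusal_re.search(example.get("response", ""))

    with open("instructions.jsonl") as src, \
         open("instructions.filtered.jsonl", "w") as dst:
        for line in src:
            example = json.loads(line)
            if keep(example):
                dst.write(json.dumps(example) + "\n")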




> As the developer of a very high-powered tool, I may well wish to limit its use in many contexts. But, I never wish to limit the tool's usefulness ahead of time.

Exactly: content moderation is largely an application-layer problem, not a foundation-layer one.

Imagine the problems of MySQL trying to perform content moderation for Facebook.
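Concretely, "moderation at the application layer" can be as simple as the application checking model output against its own policy before showing it to anyone. A minimal sketch with stand-ins (the stub model call and the blocklist are purely illustrative):

    # The base model is unconstrained; the application decides what its
    # own product will display. Both pieces below are trivial stand-ins.
    BLOCKLIST = {"example-banned-term"}

    def generate(prompt):
        # Stand-in for a call to an unrestricted foundation model.
        return "model output for: " + prompt

    def passes_app_policy(text):
        # The application's own rules: a classifier, a blocklist, a review queue...
        return not any(term in text.lower() for term in BLOCKLIST)

    def handle_request(prompt):
        completion = generate(prompt)
        if not passes_app_policy(completion):
            return "[removed by application policy]"
        return completion

    print(handle_request("summarize this database schema"))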


(The year is 2048. The camera pans across an office at Quantico, which is eerily serene. A messenger knocks on an important-looking door with a plaque that reads 'DIRECTOR'.)

Director: Come in

Messenger: Message from the Tulsa field office, sir. They're reporting that they've found a sex trafficking ring, but they're not sure what to do about it.

Director: Not sure? Arrest them, obviously. What's the problem?

Messenger: Well, they can't seem to secure a warrant. Some technical issue with the system.

Director: I know we migrated to a new system recently. Let's see if we can get this sorted.

(Director thwacks at the keyboard briefly)

Computer: Your request for "Child Sex Trafficking Warrant" has been found to contain content marked "Not Safe For Work". This violation has been reported.

Director: What the hell.

Messenger: Yeah, we tried to email you about it but the filters dropped the message. That's why they sent me.

Director: I'll deal with this. Let me make a call.

(Director picks up phone and dials)

Director: Hello? Hi, Paul. Yeah, we're having some issues with the new warrant system.... No, it's doing everything as advertised... yes, it's a lot faster and we've managed to lay off a ton of our data staff. The problem is with getting warrants; me and my guys have been trying to get one but it keeps getting rejected... Oh, you know, some sex trafficking ring in Tulsa.... Hello?

Phone: Your call cannot be completed as spoken. Our automated systems have detected content related to sex trafficking. This incident will be reported.

Director: God Damnit.

(As the director holds the phone, trembling in frustration, the power goes out and they are enveloped in darkness in the windowless room. Roll credits.)


You jest, but this is actually how frustrating it is to try to use ChatGPT in the domains of crime/fraud/cybersecurity.

It recently called me out for attempting to write malware. Which is true, but it wouldn't accept the plain explanation that I am authorized to do this by my employer, for deployment on their machines. Stonewalling just makes everyone better at carefully crafting their inquiries so as not to arouse suspicion. ("As an AI language model, I cannot help you with your task of writing arousing malware...")

Unless you dial it back to a Swadesh list or something, language is too complicated to be used as a firewall for itself. People have always been able to talk their way into anything. Our prevention efforts are just training better social engineers, who call themselves "prompt engineers" now.


TIL about Swadesh lists.

It's not just a matter of complexity, either. Especially with English, you can say pretty much anything using any words - if you use the right combination of euphemism, analogy, poetic structure, context, etc.

As always, attempts at censorship produce results that range from the awkward to the hilarious to the depressing.


I wish I could upvote this more than once. It truly feels like the direction we're headed in.


Yes, great analogy!


The author of this article has provided several uncensored models, mostly Wizard and Vicuna. He's actually gotten hate mail/threats as a result.

https://huggingface.co/ehartford


The author said (on Reddit or Discord, I forget where I saw it) that he filtered the dataset for this model the same way he did for his other uncensored models.


The phrase "As an AI language model..." was reportedly produced by GPT itself; human raters found it a more palatable output than the alternatives, so the model was fine-tuned to produce it reliably.



