It's not surprising when you think about what LLMs really are: when you "censor" them, you're forcing them to give output that doesn't "honestly" follow, essentially training them to give wrong information.
That's not how that works. Take some uncensored or "unaligned" models hallucinating racist things based on a name:
The default name for a person is John Doe. Anglo-Saxon names in general are extremely common across the internet for non-nefarious reasons. So the tokens that make up "John" have a ton of associations across a wide variety of contexts, and if the model hallucinates, there's no particularly negative direction you'd expect it to go in.
But Mohammed doesn't show up as often on the internet, and while that's also for non-nefarious reasons, it results in significantly fewer associations in the training data. What would be background noise in the training data for John ends up massively distorted by the smaller sample size: even tendencies for people to make racist jokes about the name.
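You can see the mechanism with a toy simulation: hold the rate of "joke context" mentions constant for both names and only shrink the sample, and the estimate for the rare name swings all over the place. A minimal sketch (the rate and corpus sizes are made up for illustration, not real corpus statistics):

    import random

    # Toy model: both names appear in a "joke/slur" context at the same 2%
    # rate; the only difference is how many mentions the corpus contains.
    # (Rate and corpus sizes are made up for illustration.)
    TRUE_JOKE_RATE = 0.02

    def estimated_joke_rate(n_mentions: int, seed: int) -> float:
        """Estimate the joke-context rate from n_mentions random draws."""
        rng = random.Random(seed)
        hits = sum(rng.random() < TRUE_JOKE_RATE for _ in range(n_mentions))
        return hits / n_mentions

    # "John": huge sample, so the estimates hug the true rate.
    # "Mohammed": tiny sample, so the estimates swing wildly seed to seed.
    for name, n_mentions in [("John", 1_000_000), ("Mohammed", 500)]:
        rates = [estimated_joke_rate(n_mentions, seed) for seed in range(5)]
        print(name, [round(r, 3) for r in rates])

With 500 mentions the estimated rate can easily halve or double by pure chance, which is exactly the distortion described above.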
-
People have this weird idea that OpenAI and co are aligning these models according to some hidden agenda, but the reality is that minorities are a minority of the training data for very obvious reasons. So if you don't "censor" them, you're not making them more truthful, you're leaving them dumber for a lot of tasks.
There's censorship happening beyond that which feels very CYA, but I really hope people aren't clamoring to stick models that aren't intelligent enough to realize the tokens for John vs Mohammed should not affect a summarization task into anything even tangentially important...
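For anyone who does want to gate on that before shipping: the cheap test is a counterfactual name swap. Same input, only the name changed; the summaries should match. A minimal sketch, where summarize is a hypothetical stand-in for whatever model call you actually use:

    # Counterfactual name-swap check: summarizing the same text should give
    # the same result when only the person's name changes.

    TEMPLATE = (
        "{name} filed the quarterly report on time and flagged two "
        "accounting discrepancies for review."
    )

    def summarize(text: str) -> str:
        """Hypothetical placeholder -- swap in your real model call."""
        return text  # identity stub so the sketch runs end to end

    def name_swap_check(names: list[str]) -> None:
        baseline = summarize(TEMPLATE.format(name=names[0]))
        for name in names[1:]:
            summary = summarize(TEMPLATE.format(name=name))
            # Exact match after swapping the name back is crude; a real
            # harness would compare embeddings or edit distance instead.
            if summary.replace(name, names[0]) != baseline:
                print(f"summary drifted for {name!r}")

    name_swap_check(["John", "Mohammed"])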
> But Mohammed doesn't show up as often on the internet, and while that's also for non-nefarious reasons, it results in significantly fewer associations in the training data. What would be background noise in the training data for John ends up massively distorted by the smaller sample size: even tendencies for people to make racist jokes about the name.
I do a lot of astrophotography - https://www.astrobin.com/users/bhouston/ Very often you do not have enough data on the specific features you were trying to capture -- they are just too faint and too close to the noise floor. The solution isn't for me to go into Photoshop and manually draw in what I think it should look like - that is just making up data - the solution is to get more data, or to leave it as it was captured.
I think it is the same thing with these LLMs. Do not make up data to fill in the gaps; show me what is really out there. And I will be a big boy about it and deal with it head on.
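The astro version of "get more data" is just statistics, for what it's worth: averaging N frames grows signal-to-noise by roughly sqrt(N), which is why more exposure time beats inventing pixels. A quick sketch with synthetic frames (signal and noise levels are made-up numbers):

    import numpy as np

    # A faint constant signal buried in Gaussian read noise. Averaging N
    # frames improves SNR by ~sqrt(N); painting pixels in by hand adds no
    # information. (Signal and noise levels are made up for illustration.)
    rng = np.random.default_rng(0)
    signal, noise_sigma = 1.0, 10.0

    for n_frames in (1, 16, 256):
        frames = signal + rng.normal(0.0, noise_sigma, size=(n_frames, 100_000))
        stack = frames.mean(axis=0)       # average the frames
        snr = stack.mean() / stack.std()  # per-pixel signal-to-noise
        print(f"{n_frames:4d} frames -> SNR ~ {snr:.2f}")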