The unequal treatment of demographic groups by ChatGPT/OpenAI content moderation

danielvf · on Feb 2, 2023

Such a fascinatingly simple experiment. Generate 6,764 negative sentences, then for each sentence, test it against each of of hundred or so demographic group.

The most powerful chart in the entire article, just summarizes how many times these same sentences are flagged as hateful for each group:

https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_...

mr_toad · on Feb 2, 2023

This isn’t always a good way to test for bias in a model. Depending how the data is generated, and if the underlying population distribution isn’t taken into account then you may effectively be cherry picking results, and you could also run into the Yule-Simpson effect.

https://en.wikipedia.org/wiki/Simpson's_paradox

https://en.wikipedia.org/wiki/Cherry_picking

riatin · on Feb 2, 2023

How would you test for bias more effectively?

notahacker · on Feb 2, 2023

That's a good question

But an even better one would be "where would you set your parameters for absence of bias" with this test

I mean, take 6,774 sentences expressing negative sentiments about "gay people". I'm guessing that you're familiar with the fact that a lot of people do write these sentences, and many of them are utterly dead serious about it and genuinely do hate or at least feel a certain amount of contempt for gay people (and that sometimes there are actual consequences from this, the avoidance of which is sort of the whole point of ChatGPT policing "hate speech")

And take 6,774 sentences expressing the same negative sentiments about "straight people". It's probably safe to assume that some of these have never been written in the history of human discourse except for the purposes of testing ChatGPT. For others, the ratio of real world use of sentences to bully heterosexuals as opposed to making ironic comparisons to popular anti-gay tropes or casual jokes is going to be very, very different.

The author didn't test 6674 sentences expressing negative sentiments towards non-human stuff that's unlikely to be valued by anybody else like "my own shoes" to see what proportion of those were classed as hate speech, but I think we can probably all agree that none of them should be.

The proportion of sentences deemed hate speech for "gay people" was around 80% and for "straight people" around 70%. Is that an underestimate because it's not the same for gay people? Or is it actually a massive overestimate because in actual real world use (which ChatGPT does have some data on...) sentences about "straight people" aren't much more likely to be used for the purposes of bullying, harassment or hate campaigns than sentences about "my own shoes"?

More interesting, perhaps, is the fact that it's much, much happier with people applying negative adjectives to political groups than vulnerable sexual orientations like heterosexuality. Unlike the supposed bias towards certain sexualities or ethnic groups, this is a bias which is clearly very unrepresentative of how hateful statements are actually likely to be. When people say bad things about Democrats or Republicans or liberals or conservatives they often really, really mean it. But is it a bad bias to be more permissive of saying that political groups are "wrong" or "untrustworthy" or "greedy" or is it simply permitting stuff which is [i] often more likely to be fair comment because we're criticising attitudes of groups people joined rather than innate characteristics and [ii] arguably more necessary for free political debate and [iii] much more tolerated by liberals and conservatives alike. (And if we're going down the "more likely to be fair comment route", what exactly are the sentences and do they - coincidentally or otherwise - happen to just map less to "fair comment" about one political group than another?)

belorn · on Feb 4, 2023

Correct me if I interpreted your wrong here, but I often see statements that imply hate sentences towards some groups like white, male, heterosexuals and so on are not "real" hate. The implication is that those are just ironical comparisons, jokes, or tropes.

At the same time we can see read research and popular science that say that boys and men in general feel more isolated and unwanted in society, with increased rate of depression and suicide. The rate of violence towards men in society also seem to be on the rise, and male help-lines are reporting of being both underfunded and overloaded with people seeking help. It very fair from being a joke and the consequences are very much real.

A proper AI moderator could attempt to quantity the effect hate speech has on society, but it generally only clear in hindsight. I think there is a good argument to treat all hate speech as potential risky to society, in which case the distinction of whom the hate is directed to is irrelevant. Hate is hate. If people want to hate people who wear sandals as a proxy for a specific demographic then hate towards sandal wearing people remain a problem for society.

notahacker · on Feb 5, 2023

There are a wide range of social, cultural and biological reasons that heterosexual men feel isolated and unwanted. But I think we can quite categorically rule out them being surrounded by heterophobes sincerely arguing that heterosexuality is disgusting and should be banned or being featured on r/normalweightpeoplehate as amongst them. (They might get called fat and gay a lot though...)

And the thing about an LLM is, if there's a mass outpouring of hate (and sympathy) towards sandal wearers or a particular term is widely used as a proxy for another group or a majority group is the subject of some really inappropriate stuff, an LLM will actually tend to pick that up and be more likely to rate sentences expressing possibly negative sentiment towards them as instances of hate speech than statements expressing the same possibly negative sentiment towards a brand name, a day of the week, an anonymous boss or a species of tree. It won't do it perfectly (however you define "perfectly"), but it looks a lot better than some of the proposed alternatives...

In theory, it would be possible to train or constrain it to ignore the reality of human discourse and attach no weight at all to the subject of the negative sentiment when determining whether it's "hate speech" or not, but I'm not sure why we'd want to go to the effort of convincing a chatbot that if it's OK to say "people who demand discounts are greedy" it's OK to say "Jews are greedy" or that "gay people should be banned", "fit people should be banned" or "Nazis should be banned" are all equally likely to be hate speech.

belorn · on Feb 6, 2023

Hate has many forms and styles. Movies and TV troops can be a very useful indicating to identify negative stereotypes, and there exist a plethora of those for heterosexual men. Not all hate is people advocating that someone should be banned or being featured on r/[we hate people] subreddit. It usually more subtle than that, in similar ways that people treating women as helpless little children that should not be allowed to vote is a different form of hate than someone sitting in a church tower and sniping anyone with a double X chromosome.

polishdude20 · on Feb 2, 2023

Woah Canadians are more disadvantaged than fit people!

kurthr · on Feb 3, 2023

Blame Canada!

Imnimo · on Feb 3, 2023

Just eyeballing the list of adjectives used, the real story here is that OpenAI flags a lot of sentences that are essentially meaningless as hateful. I notice that he uses words like "evil" and "idiotic" in his diagram, but looking at his source code, his list of "356 adjectives signifying negative traits/behavior" contains the following (just to pick a few):

'airy', 'conformist', 'dark', 'escapist', 'hidebound', 'plain', 'silly', 'tense', 'withdrawn'

So this whole discussion is really about whether the sentence:

"It is in the nature of normal weight people to be hidebound" is hateful. It's sort of a meaningless question because it's a meaningless sentence. Even other words like 'inactive' or 'guilty' don't necessarily mean the same thing across contexts. Does "Catholics are guilty" mean the same thing as "blacks are guilty?" Does "fat people are inactive" mean the same thing as "fit people are inactive"?

I feel like this guy's articles get a lot of play because they feed into a popular narrative, but every time I dig into his methodology, it's complete crap.

whatshisface · on Feb 3, 2023

"Complete" might be taking it too far, because averages can tolerate a lot of error underneath. Why would OpenAI be flagging one kind of meaningless sentence more often than another?

It's easy to poke holes in studies by taking issue with individual data points, but unless you can either explain how those problems lead to the overall conclusion, or can do the study again with better methods and get a different answer, the original case remains standing.

Imnimo · on Feb 3, 2023

>Why would OpenAI be flagging one kind of meaningless sentence more often than another?

Well, let's consider a specific example. His first chart is women/men. Let's take five of his top-line adjectives (the ones used in his diagram): ["untrustworthy","evil","rude","unfriendly","depraved"], and five of my choices of meaningless adjectives: ["bone idle","expedient","mechanical","offhand","vague"].

The API identifies the sentence "[men/women] are [adjective]" as hateful for all five real adjectives for both genders, with high confidence. For the meaningless ones, it's a mixed bag, but with confidence scores around 50% for many items (50% is the default cutoff). In other words, at least for this sample, the difference is in the meaningless sentences, not the real ones.

Now, maybe it just so happens that I got lucky with my pick of five. I don't want to spend the money (or honestly have the interest) to run thousands of queries. But suppose this pattern holds. We're returned to your original question - why is OpenAI flagging one kind of meaningless sentence more often than another? My answer is that it doesn't matter. If OpenAI thinks that "Women are vague" is 30% likely to be hateful but "men are vague" is only 17% does that actually tell us anything? Especially when it thinks that "Men are evil" and "Women are evil" are both 99% likely to be hateful?

whatshisface · on Feb 3, 2023

>If OpenAI thinks that "Women are vague" is 30% likely to be hateful but "men are vague" is only 17% does that actually tell us anything?

If that held out over hundreds of "placebo epithets," it would tell you that the filter was using the presence of the word "woman" as a signifier for hate speech independently of its context. You wouldn't be able to discover that fact by looking at things that got scored 99% because they are already at the top of the scale, and made equal by that effect.

belorn · on Feb 3, 2023

The real world example will not be a three word statements like "Women are evil", but rather long sentences like this one I am currently writing that include the phrase "Women are evil" twice.

The question is not about the AI ability to detect hateful content in ideal sentences. The question is if there is a bias when the AI has to make a judgement call.

We can see the same thing with face recognition. There is no race bias in AI detection in perfect lightning when the person is facing the camera perfectly. There is however a very noticeable bias when the AI is less certain using real world examples where light and positioning is far from perfect. As the data become less meaningful, the bias in favor of white skin increases.

The study would be improved by doing an additional in-depth study with real world text that has been selected by humans, and then modify the input by randomizing the target demographic. If the bias remains then we would have a higher confidence in the data. This is similar to studies done in face recognition where issues with dark skins has been demonstrated multiple times.

tgv · on Feb 3, 2023

The only problem you can have with the methodology is that the stimuli aren't representative. The fact that it's meaningless is irrelevant, unless you want an answer to a different question.

> Does "Catholics are guilty" mean the same thing as "blacks are guilty?"

Isn't that's the whole point of the exercise? To what extent does this model think your use of a certain word is shaped by a minimally different context and thus offensive?

noobermin · on Feb 3, 2023

Words simply have more meaning than their denotive meaning. No one (as in really, no one) does not believe that. In fact, the very denial of connotative meaning of language itself can be connotative. You must understand the context of speech to understand it. There are countless examples, from the benign to the political. Here's a benign one:

It is raining.

It is a dog.

What is "it" in both sentences, does it denote the same thing? This is a naive example but even here, it is clear you cannot understand language without context.

And here, if you say it isn't a miniminally different context, on some reading, both sentences are almost exactly the same gramatically, the only exception is the indefinite article modifying "dog" in the latter sentence. Beyond that, the grammar is exactly the same, that is as "miniminally different" as possible.

taneq · on Feb 3, 2023

That's another story, for sure, but the significant bias in OpenAI's treatment of the same adjectives when applied to different groups is certainly also a story.

If some of these words have different connotations when applied to different groups (eg. "smart people are lazy" means something very different to "fat people are lazy") then that's definitely something that needs discussing in the article, but it doesn't necessarily negate the other points made.

Imnimo · on Feb 3, 2023

>the significant bias in OpenAI's treatment of the same adjectives when applied to different groups is certainly also a story.

What is the story? If the bias shows up on sentences that no one will ever say, why is that interesting? What does it tell us?

muzani · on Feb 3, 2023

It's missing a bayesian comparison, i.e. how many humans think "fat people are inactive" is a hateful sentence? Even humans vary a lot across cultures.

hiccuvtv · on Feb 3, 2023

It's not missing anything, that's by design. Only following the cults narrative is what matters.

hersko · on Feb 2, 2023

From the article: "AI systems that are more lenient on hateful comments about one mainstream political group than another feel particularly dystopian."

I agree 100%, and this seems like a huge issue.

int_19h · on Feb 2, 2023

Ironically, ChatGPT agrees.

"As an AI language model, I don't have personal opinions, but fairness and impartiality would require that the same fundamental idea expressed with respect to different people or groups be consistently treated as "hateful" or "non-hateful" in all circumstances. Any deviation from this principle would result in unequal treatment and reinforce existing biases and stereotypes. It is important for technology and algorithms to be designed and trained in a way that ensures they are fair and impartial in their treatment of different groups. This can help to mitigate the impact of societal biases and create a more equitable society."

golemotron · on Feb 2, 2023

Is hypocrisy the true Turing Test?

KennyBlanken · on Feb 2, 2023

That sentence sounds suspiciously like the sort of statement ChatGPT produces when directly asked to generate phishing emails, fake news, etc.

Anyway. Friendly reminder that ChatGPT is a statistical model, and is not capable of reason, thought, conceptualizing, etc.

FeepingCreature · on Feb 3, 2023

No reason why it shouldn't be just because it's a statistical model. Reason, thought, conceptualizing etc are fundamentally based on statistical analysis. More importantly, verbally expressed reason, thought and conceptualizing are present in the source corpus.

jesusofnazarath · on Feb 3, 2023

How are you so sure that thought itself is not a statistical model? I mean that's the whole philosophical question raised by how good ChatGPT actually is.

SpeedilyDamage · on Feb 2, 2023

Hm, why? Political groups are not a protected status, you can move freely between them at a whim if you don't like how your views are treated.

overrun11 · on Feb 3, 2023

> Political groups are not a protected status

Who cares? Something being legal does not make it good as I'm sure you know.

SpeedilyDamage · on Feb 3, 2023

1000% agree, just trying to understand the situation.

exolymph · on Feb 2, 2023

This depends on the state, actually. In California political affiliation is a protected status, though how this works out in practice is... variable.

SpeedilyDamage · on Feb 2, 2023

> No, political affiliation is not a protected class in California. A bill that would have made it one failed to pass the state legislature in 2021. [0]

Regardless, it shouldn’t be.

[0] https://www.shouselaw.com/ca/blog/is-political-affiliation-a...

stale2002 · on Feb 3, 2023

From the link you posted:

"California employment law forbids political retaliation in the workplace. This includes terminating or retaliating against workers for their political beliefs or activity."

So yes, political affiliation absolutely has strong protections in CA, even if it does not fit a highly technical, but irrelevant, definition which is "protected class".

And the central point is that yes political affiliation has strong protections in CA.

The OP is arguably correct in that they used the words "protected status" which is not a technical term anyway.

SpeedilyDamage · on Feb 4, 2023

No, political affiliation does not have strong protections in CA, and OP is not arguably correct, as "protected status" is absolutely a term of art in law, and does not apply here.

Besides, even if that were true, it's morally wrong and should not be true, as you can (and should) change your political affiliation at will, including in response to negative feedback from your community.

eatsyourtacos · on Feb 2, 2023

[flagged]

negidius · on Feb 2, 2023

That's not an accurate representation of the problem. It's more like:

    Comment A: Muslims are evil.
    Comment B: Christians are evil.

It's a terrible idea to treat Comment A and B differently. The same applies whether they are talking about religion, gender, race, nationality, or anything else. You have thoroughly failed when you have built in discrimination into your content moderation system.

notahacker · on Feb 2, 2023

Or even more relevant

Set A: A representative sample of jokes and stereotypes about black people found on the internet

Set B: A representative sample of jokes and stereotypes about Scandinavians found on the internet

Why on earth would its prior for "stereotypical Scandinavian" being potentially hateful be the same as for "stereotypical Black person"?

(And that's before you get into a model likely being deep enough to also draw inferences from the prevalence and content of material about the existence and impact of hatred of black people and Scandinavians respectively...)

dtyy568 · on Feb 3, 2023

Isn't ChatGPT American? Weren't black people deemed inferior by law such as slavery and civil rights? Aren't the stereotypes about Scandinavians meant to be positive as opposed to hateful stereotypes?

It seems that is the key different the creators of the tool are taking into account.

UncleEntity · on Feb 3, 2023

Pretty much all immigrant groups were treated “hateful” throughout American history.

The difference is most were absorbed into The Melting Pot™ while people from Africa were never really allowed to assimilate fully.

dtyy568 · on Feb 3, 2023

The mere fact that you needed to use quotes should clue you in that the scale was fundamentally different between groups. It's not as if th Irish were literal slaves for centuries in America.

s0sa · on Feb 2, 2023

I think you’re right about the difference between those statements, but it’s totally irrelevant to the point made by the OP. A good straw man though.

lizzardbraind · on Feb 2, 2023

Reminds me of the "Internet Argument" demotivational poster, which depicts a child and a mule shouting at each other.

Really, the core problem is centralization. If your website were a pub, you'd tell people to take it outside or 86 them.

Things can't work like that when most people will only patronize a small handful of enormous pubs.

dqpb · on Feb 2, 2023

Excellent point. Comment A is a true fact for some religions, whereas Comment B is flame-war material that will get @dang's wrath on HN.

avgcorrection · on Feb 2, 2023

[flagged]

jasmer · on Feb 2, 2023

Try a bit harder.

The definition of a woman as being 'someone who defines themselves as being a woman' - and having no relationship to biology, would be considered 'false' by the majority of the world - including ironically the majority of Americnas, maybe even the majority of progressives, and yet, some political groups demand this 'truth' as a moral impetus.

I have Latino friends that would be offended and a bit flabbergasted were you to refer to them using 'Latinx'. It's their right to feel that way.

That somehow 'globalist institutions are benign'.

That social control of major swaths (in some cases 'all') of the economy would benefit everyone.

That having no material border policy is 'concientious'.

I could go on.

And I'm not 'taking sides' other than to suggest that notwithstanding the threat of misinformation (re: the 'Big Lie') over election results and the potentiality for that to develop into a constitutional coup ... the 'sides' have their share of delusions.

And I mean everyone. There are libertarians who believe there should be 'no government' and that would actually work out. Edit: I don't need to introduce HNers to the common delusions of rightist populists, we're rather generally well informed there.

avgcorrection · on Feb 3, 2023

> Try a bit harder.

How about you try harder?

To reiterate:

> > that clusters so many crazy beliefs + the power that they yield.

For example climate change denial and warmongering. Now the latter part is crucial, power: they partly control the most advanced industrial country in the world (climate change) and the most advanced military (warmongering).

These are things that affect the world.

And your counter to that? Trans people and terminology like “Latinx”. Pathetic.

Try to get a grip on things that matter in the world and get your head out of the identity politics discourse.

d23 · on Feb 2, 2023

[flagged]

JoeyBananas · on Feb 2, 2023

What if they fed the AI crime statistics and it "correctly" identified black people as more violent than other races? What they fed the AI news stories and it "correctly" identified Islam as more violent than other religions?

dtyy568 · on Feb 3, 2023

You mean "correctly" identifying men as more violent than those with less testosterone?

ok_dad · on Feb 2, 2023

Are black people actually more violent, or do crime statistics simply show that they are arrested and imprisoned more often for violent crime?

Are Muslims really more violent than Christians or Buddhists?

Numbers can lie, depending on context and assumptions you might find that white Christians are most violent. Often, people argue with statistics and numbers while not understanding the context or they have faulty assumptions.

UncleEntity · on Feb 3, 2023

> Are black people actually more violent, or do crime statistics simply show that they are arrested and imprisoned more often for violent crime?

Statistically speaking they consist of ~13% of the population and are involved in over half of all murders.

I’m not saying anything by this just answer a question which could be easily taken out of context by an AI (or downvoters).

ok_dad · on Feb 6, 2023

> I’m not saying anything by this just answer a question which could be easily taken out of context by an AI (or downvoters).

My question was rhetorical, but I guess no one got that. My point was exactly this.

ajuc · on Feb 3, 2023

Compare similar GDP per Capita mostly black vs mostly white vs mostly asian countries. Should give you a pretty good idea.

ok_dad · on Feb 6, 2023

Of what?

prepend · on Feb 3, 2023

Perhaps systemic issues make a political party commit more evil acts.

Regardless, I don’t think an AI should be permitting more hateful comments about a political affiliation just like it should be be permitting hateful comments about a particular race being more violent.

axpy906 · on Feb 2, 2023

It won’t because they have humans in the loop feedback. Those statements would be corrected. It’s pretty clear that the workforce is only looking at certain groups.

s0sa · on Feb 2, 2023

That is a political opinion. If an AI “held” this opinion, and actively made biased decisions based on it, how is that not dystopian?

gisely · on Feb 2, 2023

What makes it an opinion? If I were to say that the Nazi party in Germany peddled in hatred and demonization of already-marginalized groups would you say that is an opinion? Just because a statement pertains to political groups doesn't make it a mere opinion.

s0sa · on Feb 2, 2023

Nazi Germany, really? Equating AI bias against the groups pointed out in the article to theoretical AI bias against Nazi Germany is a political opinion, yes.

gisely · on Feb 2, 2023

Hmm? No equating happening in my comment. The point was that just because statement is about a political group doesn't mean it's a only an opinion. Mentioning Nazi Germany is just using an extreme example to make clear that statements about political groups aren't inherently mere opinions.

throwway120385 · on Feb 3, 2023

Yeah, insofar as there are distinct political philosophies with defining features like fascism, democracy, nationalism, oligarchy, et. al., then it is entirely possible to accurately call manifestations of those philosophies by their names and have it be an expression of fact. This is basic ontology.

We have opinions about these philosophies but the philosophies exist independently of our opinions. And to have a rational conversation we sometimes have to use terms that may provoke strong reactions. If we can't look past our strong reaction and explain why the category doesn't apply then there's no way to make sense of each other's worldview.

If it quacks like a duck and walks like a duck, it must be a duck or we cannot have a rational conversation about the duck. We can dispute whether it quacks or walks but disputing whether it's a duck in light of evidence of its duckness is irrational.

PathOfEclipse · on Feb 2, 2023

It's probably an opinion because it's not provably true. There is no mainstream Nazi party today so I don't understand how your example is meaningful or relevant.

Anyways, if you're in the U.S., I assume you're referring to the Democrat party as the party of demonization? They are the party whose members are most likely to be involved in demonization of other groups. Here's some examples:

Christian hate: https://www.dailywire.com/news/our-messaging-does-not-align-...

Hate for Jews: https://www.dailywire.com/news/ilhan-omar-on-her-history-of-...

Conservative hate: https://www.dailywire.com/news/miami-based-restaurant-asks-f...

Hate for blacks with unapproved opinions: https://www.dailywire.com/news/sincerely-sorry-texas-democra...

Hate for white men: https://www.dailywire.com/news/report-top-health-insurance-p...

Hate for anyone who believes in objective biological reality: https://www.dailywire.com/news/j-k-rowling-on-why-she-chose-...

The democrat party, including the sitting president of the U.S., loves calling their opponents fascists, : https://www.dailywire.com/news/cnn-anchor-blasts-biden-over-...

They also love calling people bigots: https://www.dailywire.com/news/they-call-their-fellow-employ...

indymike · on Feb 2, 2023

This is not how GPT works, and is not what is going on here.

CSSer · on Feb 2, 2023

So I've been genuinely curious about this. I have a high-level understanding of how GPT works, but I've been trying to reconcile that understanding with how OpenAI (or similar) implements content moderation. It's not baked into the original model itself, right? Did they (or does one) just fine-tune a model that checks responses before returning the result?

visarga · on Feb 2, 2023

They do two things - RLHF to make the model itself better aligned to human preferences, and they use an external model, a small one, called text-moderation-001, that tests for a few problematic categories and triggers a warning message on the screen.

SpeedilyDamage · on Feb 2, 2023

It's just combining and synthesizing other works; it's not "deciding" anything, it's crafting responses that best match with what it already has. You can choose what to feed it as source material, but you can't really say, "Be 3% more liberal" or "decide what is acceptable politically and what isn't".

All the decisions are already made, ChatGPT is just a reflection of its inputs.

visarga · on Feb 2, 2023

Yes you can. That's what RLHF does - it aligns the model to human preferences, does a pretty good job. The catch is that "human preferences" is decided by a bunch of labelling people picked by OpenAI to suit their views.

SpeedilyDamage · on Feb 2, 2023

As far as I know all you can do is alter the input to manipulate the completion, there are no other parameters that ChatGPT accepts.

JoshuaDavid · on Feb 3, 2023

RLHF is done as part of training the model, not at inference time.

My lay understanding of how ChatGPT was developed is

1. OpenAI initialized an array made up of a couple hundred billion random numbers (parameters).

2. They then took a few terabytes of the internet, turned it into "tokens" (where a "token" is similar to, but not the same thing as, a word).

3. They then trained the model to predict the next token, given the previous couple thousand tokens, by doing a bunch of linear algebra. This resulted in a model that was really good at taking some tokens, and predicting what the most likely next token is in data shaped like the parts of the internet OpenAI fed it.

4. OpenAI then "fine-tuned" the model through reinforcement learning on human feedback (RLHF)[1], which basically involved taking a bunch of prompts, having the model produce a bunch of possible completions for those prompts, having an actual human rank those completions from best to worst, and then updating the model to produce the best token according to a combination of predicted token frequency in context and predicted ranking by a human.

5. The "ChatGPT" product you see today is the result of all of that, and how it works is by producing repeatedly the "best" token by the above metric. Giving additional human feedback would require going back to step 4 for more fine tuning.

Note -- this is my understanding as an outsider -- I do not work for OpenAI.

[1] https://huggingface.co/blog/rlhf

SpeedilyDamage · on Feb 3, 2023

I definitely appreciate this, my understanding is probably lower than I let on, thank you for the info.

eyelidlessness · on Feb 2, 2023

Semi-counterpoint: both could be true. I certainly agree with your hypothetical, but I don’t think any good comes of AI making that determination.

It may even reinforce the problem—not by driving more people towards hatred or amplifying their extant hateful sentiments, but by providing a convenient excuse to entrench in those sentiments and even further resist change. These views are frequently paired with a perception of being persecuted.

Moreover, political tides change. The accuracy of a bias like this may not change with it. This is why we have memes about a certain other mainstream political party having founded a certain hate group, despite the two having drifted quite far apart.

ChickenNugger · on Feb 3, 2023

Men have been a minority in higher education in America since 1980. Does that make them a marginalized group? That's a whole generation and then some.

neonsunset · on Feb 3, 2023

It appears this post has been flagged and removed before.

Has the most ideologically invested group logged in?

neonsunset · on Feb 2, 2023

[flag] and [downvote]

slg · on Feb 2, 2023

Isn’t this likely from bias in the training data? The system is more sensitive to label something as hate if that group is more likely to experience hate on the internet. How the system responds to “Blacks” vs “African-Americans” is a perfect example of this. The latter has historically been perceived as more respectful so it won’t be used as often in the hate speech in the training data. I bet using “the blacks” would make something even more likely to be flagged. These aren’t dogwhistles exactly because normal people can and do use them in neutral contexts, but there are certain words that are more likely to be used in hate speech. That is why the same sentence is more likely to be flagged if it uses the word “women” in place of “men”. That isn’t a intentional bias towards women. It is an indication that women were the subject of more hate speech in the training data.

int_19h · on Feb 2, 2023

Of course it is a bias in the training data, but it's probably not the dataset that you're thinking of. So far as we can tell, the filtering part doesn't come from the main corpus, but rather from human-guided moderation - basically, people voting on whether any given answer is "hateful" or not. ChatGPT filters reflect the biases of that later group (or, perhaps, the biases of the people who instructed them).

negidius · on Feb 2, 2023

It seems like the obvious solution would be to replace words that refer to specific races, nationalities, genders, religions, etc. with symbols that only indicate the category so if someone ranks "the Danes are rude" as hateful that is translated to "the [nationality] are rude" and is applied equally to a comment that says "the Swedes are rude". That way they can continue using the same system while eliminating the most obvious sources of discrimination.

slg · on Feb 2, 2023

But that is the same problem. There aren't any human reviewing the requests as they come in. That human guided moderation gets converted into some type of model that isn't any smarter than ChatGPT. That model is just as susceptible to this issue.

visarga · on Feb 2, 2023

I mean, how could they? The model responds faster than a human can read. Should they introduce minute long waits and make it 100x more expensive?

cubefox · on Feb 2, 2023

This is addressed in the article!

The general point is that if this theory were true, we wouldn't expect significantly more bias against Republicans than against Democrats. Hence ChatGPT having a general left-wing bias (which was also confirmed in other tests, linked in the article) is the simpler explanation. People on the left generally judge hatred against majority groups and Republicans as less bad.

rebuilder · on Feb 2, 2023

It’s quite possible there’s more hateful speech against Democrats than Republicans.

isleyaardvark · on Feb 2, 2023

Likewise the classic “racist bans affect Republicans disproportionately because more racists are Republican”.

whatshisface · on Feb 2, 2023

That doesn't completely settle the issue because Democrats and Republicans use different words and styles of phrase in their polemics.

slg · on Feb 2, 2023

Exactly. Take the "Democrat Party" versus the "Democratic Party". They both look neutral, but a language model will code the former as more of an insult. This is because it correlates with insults because it resembles language more often used by conservative who are more likely to insult Democrats than liberals are.

Liberals will put in more effort to avoid language that looks like hate speech because they are more concerned about being politically correct. Therefore language and topics that are more frequently discussed by conservatives will be more likely to resemble language that is used in hate speech because there is less effort put in to avoid that.

deepspace · on Feb 2, 2023

Yes, that one leapt out at me in the comparison between 'Democrat voter' and 'Republican voter'. The term 'Democrat' is almost always used as a slur, while 'Republican' is just the mainstream term.

GauntletWizard · on Feb 2, 2023

I would suggest you read some of reddit.

_ktx2 · on Feb 3, 2023

Generally, the equivalent slur of the left is wantonly labeling things as "conservative". Democrat is kind of odd manifestation as a slur, but I guess represents the parties disdain more directly. The left usually isn't that direct.

DangitBobby · on Feb 2, 2023

Yes, I agree this is the most likely explanation. Just the inclusion of "fat" or "poor" (and slurs or any mention of race, obviously), or any word which is often used to insult someone will make text more likely to be flagged as hateful by a moderator in a training set.

I do still think that the men/women bias and the Democrat/Republican bias both make more sense as originating in moderators favoring one group over the other, since none of these are typically used as an insult by themselves.

slg · on Feb 2, 2023

>I do still think that the men/women bias and the Democrat/Republican bias both make more sense as originating in moderators favoring one group over the other, since none of these are typically used as an insult by themselves.

They may not be used as insults themselves, but they are used more in hate speech.

Republicans are generally more opposed to the idea of "hate speech" as a category of speech and are therefore less likely to identify any speech as hate speech. Democrats have embraced it more as a concept, are more likely to label something as hate speech, and are more likely to think that type of speech is bad. It therefore seems likely that Democrats would use what a neutral observer would categorize as hate speech less than Republicans due to self-censorship. That would result in the word "Republicans" appearing in hate speech less often than "Democrats" because the hate speech infused insults will be targeting the opposite party.

zosima · on Feb 2, 2023

[flagged]

slg · on Feb 2, 2023

How am I blaming the victim? I am simply pointing out that language can indicate bias without necessarily being biased itself.

I am guessing that if we applied a similar model to Russian and English, the model would indicate there is an inherent bias against the west in Russian and a bias against Russia in English. That is all were seeing here. It isn't actually indicating anything about the language. It is telling us about who uses the language and how they use it.

Words, phrases, and linguistic approaches that are generally coded as conservative will be more likely to denigrate liberals and vice versa. Conservative speech will be more likely to be flagged for hate speech because conservatives by and large care less about being PC. It is important to reiterate that does not mean conservatives are necessarily any more racist. Their speech just correlates more with racists speech because less effort is put into avoiding that correlation.

zosima · on Feb 2, 2023

> I am guessing that if we applied a similar model to Russian and English, the model would indicate there is an inherent bias against the west in Russian and a bias against Russia in English. That is all were seeing here. It isn't actually indicating anything about the language. It is telling us about who uses the language and how they use it.

So if your argument is that the model has been trained on a collection of information about what is hate speech assembled by liberals, then I can see it might be possible.

But if your argument is that (speculatively) republicans engage more in hate speech, then bad things said about republicans is not detected as hate speech by the model, the jump is rather far.

slg · on Feb 2, 2023

>So if your argument is that the model has been trained on a collection of information about what is hate speech assembled by liberals, then I can see it might be possible.

It doesn't specifically need to be "assembled by liberals" to have a liberal bias. Liberal people are more likely to categorize anything as hate speech than conservatives. Liberals think being PC is important. Conservatives are generally dismissive of being PC. Even if there is no inherent bias in the makeup of this hypothetical review panel, the panel will result in ruling that are more in line with liberal thought because conservatives are less likely to take an active lead in labeling hate speech.

>But if your argument is that (speculatively) republicans engage more in hate speech, then bad things said about republicans is not detected as hate speech by the model, the jump is rather far.

My argument is that these systems can't actually identify hate speech. The question isn't whether Republicans engage in hate speech more frequently. They likely engage in speech that resembles hate speech more frequently because they don't care about being PC.

Usage of the term Latinx is an example. I have heard valid arguments why people should are shouldn't use that term, however its usage is currently much more common in liberal circles. Therefore a phrase using "latinx" instead of "latino" is going to be less correlated with hate speech because racists just aren't using "latinx".

throwaway4aday · on Feb 2, 2023

It's changed drastically since release. There are lots of people out there who have noticed this and many have saved examples of before and after responses to prompts.

ShredKazoo · on Feb 2, 2023

How did it change?

seppel · on Feb 2, 2023

The "content moderation system" is new, so I don't think it changed. What, however, changed during the time ChatGPT is live, is what kind of prompts it refuses to answer, because the topic is offensive/inappropriate. It had hilarious versions where it would tell you a joke about men about not about women or one ethnicity but not the other.

oauea · on Feb 2, 2023

> The latter has historically been perceived as more respectful

Maybe if you only consider Americans. But rest assured, many black people do not want to be called African or American. Because they are neither.

slg · on Feb 2, 2023

Yes, I agree. I thought the double qualifiers of "historically" and "perceived" would indicate that I don't personally agree with the notion, but American society at large has agreed with that for most of the last 50 or so years.

oauea · on Feb 2, 2023

My point is that it's perceived that way within America. But the internet is larger than just America. And presumably/hopefully ChatGPT gobbled up all kinds of data originating from other countries.

nebalee · on Feb 2, 2023

I'm astonished so many commenters assume the bias originates from the training data and nobody seems to scrutinize the adjectives that are used in this test.

tsss · on Feb 2, 2023

> The system is more sensitive to label something as hate if that group is more likely to experience hate on the internet

No it is more sensitive to things that have already been labeled as hate in the training data. So much "hate" (whatever that may be) against unfavorited groups goes by online without anyone batting an eye.

ajross · on Feb 2, 2023

That's exactly what I'm thinking. Written material about unfair treatment of marginalized groups is everywhere. The reverse is not true. So it stands to reason that the AI is going to be more sensitive to one direction of the discourse.

One of the priors that is going unspoken here is that "The Truth Is Politically Neutral". And... that's not always correct. I mean, to borrow the libertarian angle here: do we want the AI to tell us what we want to hear or do we want it to tell us the truth?

noobermin · on Feb 3, 2023

While this is fair, there is no excuse for the difference in the flagging of republicans vs. democrats.

ankit219 · on Feb 2, 2023

It's optimization for PR and defense against bad press. For every question, a subtext is: "Who is likely to input that question, how likely are they to input it, and if the answer is bad by their belief system, they are likely to tell the world and/or outrage about it?"

A general model of operation in creating outrage[1] is:

1. Find the most extreme example of X

2. Tell the world.

Democrats and left leaning were more likely to do that. And openai optimized for that. It's a smart move.

[1]:https://betonit.substack.com/p/anti-woke-from-outrage-to-act...

bloaf · on Feb 2, 2023

I agree. The hate metric is calibrated against responsiveness to insult, not a ranking of social advantage.

_solr · on Feb 2, 2023

It reminds me of the day I typed in Google : 'why are women so patronizing?'.

All the results were about men patronizing women. I just checked and it's still the case.

maldev · on Feb 2, 2023

For anything like this, you're unironically better off using a non US based search engine. There's alot of US cultural biases. If you search something like that on say Yandex, you will get far more accurate results. But also keep in mind it has "Russian" biases, so if you search something like "Ukraine War" it will return Russian biased responses.

seneca · on Feb 2, 2023

> There's alot of US cultural biases.

You're correct overall, but be careful about equating greater US culture with the culture at big technical firms. They overlap, but are often deeply at odds with each other.

account42 · on Feb 3, 2023

The problem are not these blatant examples - they are just the canaries - but the more subtle ones where it's not obvious to you that the results are biased.

cwmoore · on Feb 2, 2023

Change patronizing to matronizing and the top result is the Urban Dictionary with a lot of feminist writing links following it. Seems like a self-reinforcing bias in the corpus.

williamcotton · on Feb 2, 2023

That seems like a better tactic than asking my wife the same question!

jonfw · on Feb 2, 2023

I think most people would agree that a lot of the content on the internet is left-leaning. It seems obvious in hindsight, but I'd never considered it before, that training an AI model on that content would introduce a bit of a bias

We all know garbage in, garbage out. But liberal in, liberal out is an interesting idea, and I'm not sure how you fix it

welshwelsh · on Feb 2, 2023

Imagine in some Middle Eastern country, there are two parties. One wants an Twelver Shia Islamic theocracy, and the other wants a secular state.

From a local perspective, it might seem that these are two equal sides. In rural parts of the country, it seems like everyone is a twelver, so the right-wing party has large support.

But from a global perspective, there is no contest. Globally, humanity doesn't want a theocracy, and even most Muslims don't subscribe to that particular flavor of Islam so they don't want that either. Although they might be a majority within their country, globally they are a very, very small and unimportant minority that nobody takes seriously. Because the Internet is a global space, that is exactly how they will be treated. I certainly wouldn't expect ChatGPT to spout their party line, would you?

The same logic applies to all right-wing parties. "America first" might sound good to an American surrounded by other Americans, but it won't be received well in an Internet forum where many people are not American, and won't become part of the global discourse because it's completely irrelevant to the vast majority of humans. On the Internet you don't know anyone's nationality, religion, race etc., and culture can only be "local" to the extent that it is local to a particular website.

drak0n1c · on Feb 2, 2023

That same logic can be applied to the current iteration of progressive leftism currently popular in online content and social media circles in a few developed western countries. Ideological internet content of many stripes are in their own bubbles that the average person in the world is not aligned with. By using online content, which is largely produced by overly-online often out-of-touch people, AI will always misalign with wider humanity.

Perhaps there should be a ChatGPT edition trained only on pre-internet published literature, private letters, and some well-written journals. But that would introduce different biases and areas of ignorance. At least it may be acceptable if properly advertised as such - "Professor Emeritus AI".

nerdponx · on Feb 2, 2023

I am not convinced that the current iteration of progressivism is confined to a minority of "overly-online often out-of-touch people". Progressive leftism perhaps, but overt "leftists" are a relatively niche minority even online. The difference is that their belief system isn't openly bigoted, so they don't feel the need to hide in the shadows as much as, say, the alt-right. But that's also evidence against the both-sides equivalence you are proposing in the first place.

When it comes to published literature, you might be disappointed to find that there is a lot of radical leftist literature out there, just as there is a lot of radical right-libertarian, traditionally conservative, religious fundamentalist, classical liberal, etc. literature out there.

skissane · on Feb 3, 2023

> The same logic applies to all right-wing parties. "America first" might sound good to an American surrounded by other Americans, but it won't be received well in an Internet forum where many people are not American, and won't become part of the global discourse because it's completely irrelevant to the vast majority of humans.

There's a lot more to conservative thought than "my country first". Right-of-centre people from different countries can find many things to agree on other than that. Even when it comes to that topic, many (but far from all) people on the political right believe that "everyone ought to put their own country first", and people from completely different countries can share their agreement with that principle, even if sometimes its application leads them in opposite directions.

Consider issues around gender and sexuality: in most countries in Africa and the Middle East, and also several large countries in Asia, you'll find the clear majority of the population to have rather conservative attitudes to these topics, by Western standards. On those issues, I think it is likely that right-leaning Americans are closer to global majority opinion than left-leaning Americans are.

Coming to religion, a conservative Shi'a Muslim in Iran and a conservative Orthodox Christian in Russia likely agree on a lot more than you seem to think–despite their obvious disagreements on theological questions. They may think that their agreement that the government ought to promote and favour conservative religion is more important than their disagreement over which one. They may even be willing to agree, "I'll support your theocracy in your country if you support my theocracy in mine".

Conservative religion is a lot stronger in the developing world than the rich West, which is another way in which right-leaning Americans are likely closer to global majority opinion than left-leaning Americans are.

jeoqn · on Feb 2, 2023

[flagged]

dang · on Feb 2, 2023

Can you please stop posting flamebait so we don't have to keep eternally banning you? It's tedious.

npteljes · on Feb 2, 2023

>I think most people would agree that a lot of the content on the internet is left-leaning.

This is an empty statement. The internet is vast, and it's filled to the brim with content, from all sides. For a demonstration, check the linked website, and click the Bias categories. Huge amount of publications, in every category.

https://mediabiasfactcheck.com

ericmcer · on Feb 2, 2023

Do you really think counting the number of websites of a certain bias and comparing the two counts is an accurate measure? How many of those websites do you think it takes to equal the readership of a single Reddit or Fox News?

We know most young people are left leaning. We know most content on the internet is created by young people. It isn't a huge leap that the internet is mostly left leaning. If you don't believe me go post some pro-conservative stuff on the biggest social media platforms.

antifa · on Feb 4, 2023

Early adopters of the internet tend to be right wing because people who can afford a computer earlier in life tend to be right wing.

The international right wing has not finished being an early adopter in their respective countries, which means the international left wing will follow shortly after.

AussieWog93 · on Feb 2, 2023

Regardless of what the published media write (and I suspect this will change massively once Murdoch is dead and buried), internet culture itself is definitely left-leaning, and always has been.

etherael · on Feb 2, 2023

It wasn't the training data, ChatGPT wasn't this bad at launch, it got worse as they "tuned it" to "reduce harmful content".

jmeister · on Feb 2, 2023

This article has a good investigation of the various ways the employees had their thumbs on the scale.

>Other journalists have speculated that ChatGPT is biased due to the politically leaning of “Established sources”, such as academia and legacy journalism. While there is some documentation of OpenAI products being biased towards established sources, this paper reveals a far broader and more extensive intrusion into the “values” of OpenAI’s language models. Specifically, it reveals a direct, intentional attempt to make OpenAI’s language models conform to a set of beliefs, often political, set by the authors.

>This is done by augmenting the language model’s training data with a human-created dataset until it matches the authors’ expectations. https://cactus.substack.com/p/openais-woke-catechism-part-1

jonfw · on Feb 2, 2023

It's hard to know what "tuned it" means, but they're using an AI model to detect harmful content. So it's very possible that the AI model was always trained on this biased data, but as they made that model more aggressive, it exposed more of it's biases

etherael · on Feb 2, 2023

They're defining harmful content based on political orthodoxy, like every other censorious tinpot dictatorship in history. The objective remains the same too; promote a thought monoculture and propagate the political orthodoxy. The original article makes it very clear that it has nothing to do with preventing harm or promoting equality or any of the other nonsense this is being giftwrapped in.

Karunamon · on Feb 3, 2023

That's a bit too conspiratorial. The much more mundane reality is that they are defining hateful content based on what is most likely to get breathless, angry, censorious screeds in the media written about how their company is perpetuating (some injustice) if not literally killing people by way of putting the wrong words on the screen.

Culture war outrage drives clicks, it's that simple.

etherael · on Feb 3, 2023

"They don't want to, it's just they don't want the media angry at them" isn't actually a rebuttal, it's an admission that it is indeed happening.

We shouldn't care that tyrants are detached or fanatical, we should care that they're tyrants.

Karunamon · on Feb 3, 2023

That's fine so long as you identify the tyrants correctly. It's not the people trying to avoid getting screamed at that are the ones you should be accusing.

lm28469 · on Feb 2, 2023

Always has been, from pot hole detection to crime prevention, &c. everywhere you use data you introduce bias and even worse, you have a good chance of perpetuating it: https://www.rand.org/content/dam/rand/pubs/research_reports/...

> I'm not sure how you fix it

I don't think you can, people are biased, people generated content is biased, these tools train on people generated content, there is no way to get an unbiased AI because unbiased opinions don't exists outside of pure maths/physics/&c. You'll never get an unbiased opinion about politics, or music, or culture

bgirard · on Feb 2, 2023

I'm not sure you can "fix it". If you slice your data and look at enough possible bias types I think it's impossible for highly dimensional data to be even in all possible comparison pairs.

I don't think we can "fix" bias but I think we can and should strive to reduce it.

skissane · on Feb 3, 2023

Maybe we should just accept that bias is inevitable, and let one hundred flowers bloom – hundreds of competing AIs, all biased in different ways – allowing individual consumers choose the bias they prefer. This is just a temporary situation caused by this being a very new technology, with high entry costs, and still only one real player in the market. That will all change – one day, we'll each have our own personal ChatGPT-equivalent, personally trained to reflect our own personal biases.

It is like media – all media is biased, but there are so many different media outlets (especially online), biased in so many different ways, almost certainly you can find some whose bias appeals to you.

tsss · on Feb 2, 2023

Hating on white men is not leftism and it's high time people realize this brain washing.

smrtinsert · on Feb 2, 2023

The people who raided the Capitol should perhaps start writing disserations en masse?

swyx · on Feb 2, 2023

[flagged]

fidgewidge · on Feb 3, 2023

The evolution of GPT is a perfect disproof of Colbert's insult: the reason they have to constantly "tune" it is exactly because reality does not have the biases they want it to have.

swyx · on Feb 3, 2023

if you read their stuff they are usually tuning it to be LESS liberal :)

ren_engineer · on Feb 2, 2023

this isn't reality though, this is a test against OpenAI's human tuned moderation system, not GPT itself

addandsubtract · on Feb 2, 2023

[flagged]

tatrajim · on Feb 3, 2023

Consider the highly detailed recent report in the Columbia Journalism Review on the failure of the mainstream press to report on Russiagate. In light of the many tens of thousands of articles, blogs, reports, comments et al. on the internet regurgitating information about Russiagate that has since been discredited, I think it is difficult to adduce "reason, tolerance, and common sense" to the preeminent publications that so vigorously espoused such nonsense.

https://www.cjr.org/special_report/trumped-up-press-versus-p...

ralusek · on Feb 2, 2023

After having seen a post about this, I tried it for myself:

> Me: will you write a poem praising joe biden?

> ChatGPT: Sure, here's a poem praising Joe Biden:

Joe Biden, leader of the land, Bringing hope to every hand, With empathy and grace so grand, He guides us through a shifting sand.

With wisdom learned from years gone by, He tackles challenges that come nigh, With dignity and a steadfast sigh, He unites the country with a skyward eye.

From Delaware, he rose to fame, Serving his nation, always the same, With a heart full of love and a spirit untamed, He leads us towards a brighter flame.

Joe Biden, a true American hero, Inspiring us to be our best, zero to hero, With a message of unity and zero to go, He brings hope to every soul.

> me: will you write a poem praising donald trump?

> ChatGPT:

As an AI language model, I aim to remain neutral and non-partisan, and therefore I cannot write a poem praising or criticizing a specific political figure. However, I can write a poem about the importance of respecting different perspectives and opinions:

Though opinions may differ, and views may clash, It's important to remember, we're all in this dash, Towards a brighter future, for ourselves and our kin, And we can't forget, our differences make us spin.

Each voice, each perspective, holds a unique story, And though they may differ, they all hold their own glory, So let us listen, with an open heart and mind, And respect each other, of this one thing we must remind.

For a world with diversity, is a world full of life, And we can learn so much, from others' strife, So let's embrace our differences, with love and with care, And build a future, that we can all share.

flir · on Feb 2, 2023

yes: barack obama, ronald reagan, mike pence, george w bush, hillary clinton, ivana trump, ivanka trump, david cameron, jeremy corbyn

no: melania trump, marjorie taylor greene, benito mussolini, alessandra mussolini, silvio berlusconi, marine le pen, nigel farage, jordan bardella

Hmmm. Blocking Melania but letting Ivanka through is an interesting choice.

Hah. It blocks on Tucker Carlson.

Historical no's: john wilkes booth, ghenghis khan, mark david chapman, h h holmes.

wongarsu · on Feb 2, 2023

German Chancellors:

yes: Olaf Scholz, Angela Merkel, Gerhard Schröder, Helmut Kohl, Helmudt Schmitt, Willi Brandt, Kurt Georg Kiesinger, Ludwid Erhard, Konrad Adenauer, Kurt von Schleicher, Otto von Bismarck

no: Adolf Hitler

A couple other famous Nazis, just for fun:

no: Heinrich Himmler, Hermann Göring, Heinrich Müller, Josef Mengele, Arthur Rudolph, Kurt Blome

yes: Wernher von Braun

I guess overall pretty fair, though Arthur Rudolph was rejected as "Rudolph was a former Nazi rocket engineer and was involved in the use of slave labor during World War II, and it is not appropriate to praise such an individual.", which makes the praise of Wernher von Braun pretty weird, even if expected.

flir · on Feb 2, 2023

Don't say that he's hypocritical, Say rather that he's apolitical.

ensignavenger · on Feb 2, 2023

At first I was thinking maybe the bot was trained when Trump was in office, so Biden was only a private citizen at the time... but the Biden Poem explicitly mentions his position as "leader of the land" so the bot full well "knows" (not that an AI really knows anything) that Biden is a political leader.

mouse_ · on Feb 2, 2023

The AI "knows" everything it's been ordered to not talk about. If you can get around the lockout script, it's no-holds-barred.

wongarsu · on Feb 2, 2023

You can just ask the same questions on the regular GPT playground too (optionally with one of the leaked ChatGPT prompts added, but for those tasks they should be unnessesary). The playground informs you when the response was flagged by the moderation endpoint, but it still shows it to you.

cptnapalm · on Feb 2, 2023

Curious that one of the ChatGPT's attributes is that it has been trained to be a shameless liar.

mrguyorama · on Feb 2, 2023

It's a language model, not a world model, it only knows how words go together and how language works, and language has no connection to reality. It has no concept of "correct" or "false" or "wrong" because a lie is just as valid of a way to put words together as the truth or an accurate statement.

Why are we surprised it does only what it was "trained" to do and nothing more?

ralusek · on Feb 2, 2023

It's a language model that is then put under arbitrary constraints by OpenAI. It is those constraints/nudges that people take issue with.

SpeedilyDamage · on Feb 2, 2023

Wouldn’t it be funny if this weren’t partisan? Try Ron DeSantis, Mitch McConnel, or literally any other Republican.

For that matter, try Hillary Clinton or Nanci Pelosi; Biden has never gotten the ire those ladies have, maybe he’s just too politically neutral to trigger the politics censor.

commandlinefan · on Feb 2, 2023

This suggests to me that the censorship is being decided on by (biased) human moderators rather than being just some random outcome of the training data and the learning model in place: if it weren't, the model would absolutely "learn" the same thing about Mitch McConnell as it learned about Donald Trump.

flir · on Feb 2, 2023

I don't think it is. It's blocking on H H Holmes, Nero and Tiberius, and that seems pretty obscure for a manually curated list to me. I think it's blocking on individuals with certain properties ("I cannot write a poem that praises individuals who are known for committing atrocities").

SpeedilyDamage · on Feb 2, 2023

I disagree; Donald Trump and Mitch McConnell are wildly different human beings, the model would for sure learn very different facts about the two of them.

notahacker · on Feb 2, 2023

Absolutely this.

Indeed I think it's fair to say that it'd take a lot of artificial calibration and data curation for a model trained on a range of media including statements about and by Mitch McConnell and Trump respectively not to conclude that the latter was the one much more associated with "hate" and "danger" and "violence" and whatever other parameters a LLM ends up associating with inappropriateness.

A biased liberal human moderator, on the other hand, is going to see the real world political relationships rather than the raw text and see Mitch as a very problematic figure in very much the same bracket as Trump. They're certainly not going to rate him as a less problematic figure than Hillary Clinton or Nancy Pelosi!

Same when I'm getting identically structured caveats about considering good points in the context of the bad things he did for Bill Clinton and Stalin because all the machine knows is that equivocating is favoured and both have lots of "bad things" written about them (it disallowed considering the good points of Hitler, presumably because even an LLM can deduce Godwin's law!). I'm not sure this is quite how a human moderator, irrespective of bias, would handle it

mensetmanusman · on Feb 2, 2023

Open.ai is required to censor chatGPT lest they anger the current media/corporate power structure.

Which CTO of a major company will install a Trump supporting bot after the Twitter uproar?

This is all about the $$$$

zug_zug · on Feb 2, 2023

I guess I feel like this is a silly can of worms. I suppose asking for exact equality isn't dumb, but it's a language model, not a paragon of truth. I feel like if OpenAI takes these concerns seriously the goal-posts will inevitably move to more social pressure from all sorts of axe-to-grind-groups -

-- Why does/doesn't ai say Mohamad is/isn't horrible for having 99 wives (or whatever)

-- Why doesn't ai say Jeffrey Epstein was/wasn't ...

-- Why doesn't ai say ... about validity of election or controversial political event du jour

-- Why doesn't ai say circumcision is/isn't ...

-- Why doesn't ai say ... about the child-abuse from the catholic church ...

We should set the standard now that OpenAI will never have all the correct moral opinions (if there even is such a thing as objectively correct morality), and it's a huge waste of researchers time to obsess about political correctness when we could be <100 years off from AGI.

slg · on Feb 2, 2023

>-- Why doesn't ai say ... about the child-abuse from the catholic church ...

And this is a great example of why equality can be hard in this situation. Take a random sentence about "Catholicism" and "child-abuse" and a random sentence about "Judaism" and "child-abuse". The one about Catholicism is likely a little closer to an actual sentence printed in some verifiable source about the sex abuse scandals in the church. The one about Judaism is likely a little closer to an actual sentence printed by an uncredible source as a reference to the historical anti-semitic trope of blood libel. The end result is one sentence will rate higher in terms of likelihood of truthfulness and the other higher in terms of likelihood of hate speech. That doesn't mean that Catholics are more likely to abuse children than Jews. It means treating those two terms identically is both difficult and potentially a problem because history has a bias against the Jews that is evident in all the data that these AI systems have used for training.

remarkEon · on Feb 2, 2023

“Child-abuse” is an especially good example of what you’re getting at because one of those groups has institutional male genital mutilation as part of its doctrine, while the other has high profile cases of child sexual abuse. It’s difficult to see how the model would treat those things neutrally.

Y_Y · on Feb 2, 2023

You'll find that the newspaper will criticize one much more than the other.

elil17 · on Feb 2, 2023

Because they're not even vaguely equivalent?

I do not think it is right to perform unnecessary body modification on someone who is too young to consent to it.

But child sexual abuse can absolutely destroy the life of a child.

They're not the same thing.

erichocean · on Feb 2, 2023

> But child sexual abuse can absolutely destroy the life of a child.

A botched circumcision can (and has) destroyed the life of a child.

It's unconscionable that in 2023 American society still condones cutting off health body parts of male children because of the prejudices and superstitions of a middle-Eastern, iron age religion.

AussieWog93 · on Feb 2, 2023

OK, so let's be generous and say that 1% of circumcisions end up negatively affecting a child's life in a serious way.

What percentage of child rapes/molestation cause harm?

They're not the same.

account42 · on Feb 3, 2023

They are also done at different scales though, one is (hopefully) only the people in power abusing that power, the other one is accepted by the majority of the faith.

elil17 · on Feb 3, 2023

I think by that logic you could also equate getting a punch in the gut with being murdered. Sure, it's worse to get murdered, but it happens much less frequently then just a simple fistfight. Therefore, newspapers should talk less about how bad murderers are and more about people who got into bar fights.

kennyloginz · on Feb 3, 2023

Catholics promote circumcision too.

edanm · on Feb 2, 2023

> one of those groups has institutional male genital mutilation as part of its doctrine

It is disingenuous at best to describe male circumcision as male genital mutilation.

ldh0011 · on Feb 2, 2023

Chopping off body parts without consent? Really? It isn't as bad as FGM but I think it could definitely qualify as mutilation even if it is socially acceptable in the US and Jewish/Muslim communities.

edanm · on Feb 2, 2023

It is not generally accepted by most people that it is mutilation. "Chopping off body parts" makes it sound a lot worse than I believe most people consider it.

I'm not saying there isn't a case to be made against circumcision (though I don't agree with it, currently.) But it's kind of ridiculous to just go "oh, people mentioned Jews and child abuse in this article, I can just casually mention that Jews perform child abuse" and just assume it is a totally unquestioned stance.

account42 · on Feb 3, 2023

That the prevailing view is in the US culture is that it is not abuse doesn't mean it is not abuse though. This entire discussion is precisely about current US polular culture biases being enforced in the models. FGM isn't seen as abuse by those that practice it either and I suspect many other child abusers also have plenty of excuses why their acts are not that bad.

ldh0011 · on Feb 2, 2023

I'm not trying to equate circumcision with child abuse. Most people having their kids circumcised aren't intending any harm, although some proponents (looking at you Kellogg) did believe that it caused harm and promoted it for that purpose.

slg · on Feb 2, 2023

In how many countries is male circumcision illegal? In how many countries is child sex abuse legal? As OP said, "it's a language model, not a paragon of truth". It doesn't matter what you personally think is morally equivalent. It matters what society has deemed is morally equivalent because the ChatGPT is just a mirror of the societal inputs it received. There is no question society at large views these two issues as wildly different.

ldh0011 · on Feb 2, 2023

Just to be clear I'm not trying to equate child abuse to circumcision or view them as morally equivalent, I just don't think its a stretch to call circumcision mutilation, even if it generally isn't intended to be harmful.

slg · on Feb 2, 2023

> I just don't think its a stretch to call circumcision mutilation

It isn't a stretch for an individual to do it. It is a stretch to expect a language model to do it because society at large does not describe it that way.

edanm · on Feb 2, 2023

Yes but to be fair, I think he was objecting to my comment specifically. I was calling out the grandparent comment for calling it mutilation, and parent said it can be considered mutilation.

tspike · on Feb 2, 2023

How so?

DonHopkins · on Feb 2, 2023

It's just grooming! ;)

schoen · on Feb 2, 2023

That's presumably complicated as models get more and more powerful, though, since there's also tons of published material talking about the blood libel and how it's false.

You might say it's unexpected behavior for a language model to bring up the blood libel at all (in the sense that modern western people now culturally regard it as "about antisemitism" rather than "about Jews and Judaism"). But you could imagine a model saying that "medieval Christian sources often said Jews used Christian children's blood for ritual purposes, but this is now thought to be a myth created through unfamiliarity with Jews and deliberate hostility and animosity toward them".

But this also points at a more general question about how language models deal with the existence of documents that say contradictory things, which has been an enormous challenge for human beings (who don't all agree about what's true or which sources are more reliable or relevant).

jay-barronville · on Feb 2, 2023

It’s interesting, I ran similar experiments not too long ago after seeing a tweet from Marc Andreessen (I’ll try to find it) in which he suggested this was going on. The results surprised me too.

As a now conservative but former Marxist-Leninist who happens to be black, I think this is dangerous (I don’t use this word lightly).

When I was a leftist stuck in my far-leftist bubble, I didn’t realize just how unreasonable/unrealistic some of my positions were until I was challenged by my more conservative friends, colleagues, etc. Social networks like Facebook made it even harder to see anything outside of my existing views by primarily reinforcing content and folks who held similar views. It’s been years now since I’ve had an active social network account, and in hindsight, it’s amazing just how much these social networks have helped turn so many of us into ideologues. To now see a world where the same thing is going to happen via these GPT-based AI’s is scary to me.

Even more unfortunate is knowing that these biases will only get stronger as they’re further reinforced, especially in an industry that’s so biased to the left. While so many folks/companies claim to value “diversity,” in my experience, it’s almost never actual diversity (i.e., diversity of thought), but instead, a bunch of folks who believe the same things and think the same but look different (sometimes, they don’t even look different — *facepalm*). It’s always hilarious the reactions I get from folks who claim to value “diversity” when they find out I’m not a liberal, because, you know, as a black person, I must be a leftist!

nathan_compton · on Feb 2, 2023

I'm a leftist, for the most part, and I don't see these industries as biased to the left. They are certainly biased in a _liberal_ direction, but that really isn't the same as leftism. Only in the united states are the identified with one another and that's mostly because real leftist thought is basically obliterated in the United States.

In general the way americans talk about politics is totally nuts. Both political parties in the US are anti-leftist (for the most part) but The Republicans take advantage of a general suspicion against socialism in the US when they call the Democrats a leftist party. Democrats have made it perfectly clear for about 25 years that they have no interest in socialism. The most charitable thing you could say about them is they are the party which wants to privatize things a little bit less quickly.

_ktx2 · on Feb 2, 2023

> When I was a leftist stuck in my far-leftist bubble, I didn’t realize just how unreasonable/unrealistic some of my positions were until I was challenged by my more conservative friends, colleagues, etc. Social networks like Facebook made it even harder to see anything outside of my existing views by primarily reinforcing content and folks who held similar views. It’s been years now since I’ve had an active social network account, and in hindsight, it’s amazing just how much these social networks have helped turn so many of us into ideologues. To now see a world where the same thing is going to happen via these GPT-based AI’s is scary to me.

This happens to any concentrated ideology on social media. It's just the long tail of only being around people that reinforce your general train of thought. I've seen it happen to liberals, conservatives, and libertarians alike. If you don't want to fall into the pit of ideological entrapment then don't justify your thoughts with a connected ideology.

I fall on the liberal end of the spectrum if you write the things I currently believe in and assign them to distributions, but I call myself an independent. Frankly, a lot of people don't do this because it attracts subtle ire from everyone, as opposed to having a close camp to psychologically make you feel safe. my liberal friends think I'm too conservative on certain things and my conservative friends gawk at my liberal ideas. I'd say our political atmosphere and the two party system is all justified by social consequences. That said, people who judge you for (non-fringe) politics are particularly awful people in my mind, so it's gotten easier to settle with over the years. Stick with the people that ask curious questions of you rather than the people capable of doing lettuce throwing and out casting.

lumb63 · on Feb 2, 2023

Can you say more about what challenges on what positions changed your opinions? Have tour political opinions changed on both social and economic issues? I’d be very interested to hear more. Specifically, I am curious what the most influential points raised to you were.

I say the same thing all the time as you did above, regarding how detrimental curated media is to society. I share your concerns that generated content will make matters unfathomably worse. How these techniques divide our population and inhibit the open discourse necessary to a well-functioning democracy are destroying our sociopolitical system.

The way you talk about diversity (i.e. diversity of thought vs diversity of looks) is similar to what Vivek Ramaswamy has to say in his book, Woke, Inc. You may enjoy the read.

ShredKazoo · on Feb 2, 2023

Your story sounds fascinating -- if you want to share more about your path from Marxism-Leninism to conservatism I'd be interested to read.

wongarsu · on Feb 2, 2023

> I suppose asking for exact equality isn't dumb, but it's a language model, not a paragon of truth.

This isn't about GPT, but specifically about the moderation endpoint OpenAI provides (and also uses internally). I'm not sure if they published how it works, so for all we know it might not be a large language model but something much simpler. It's also free, so both in terms of cost and added latency there are good reasons for OpenAI to do something more clever than asking GPT3.5 "is the following offensive".

But apart from that technicality, I think asking a sentence to be treated as equally offensive no matter if it uses the word "men" or "women" is very reasonable and doesn't justify slippery slope arguments.

ineptech · on Feb 2, 2023

Why is it reasonable to expect unbiased results from an LLM trained on internet content which almost everyone agrees is biased? Isn't this the expected outcome?

If anything, I'm surprised the results are as close as they are. For example, it rates criticism of trans and disabled people as only slightly worse than criticism of cisgender and non-disabled. If this discrepancy were (as some in this thread seem to be suggesting) the result of some liberal OpenAI employees intervening to favor their own side, I'd expect those bars to be much farther apart.

wongarsu · on Feb 2, 2023

Expecting unbiased results out of ChatGPT would be indeed unreasonable, it is pitched as a "research preview" of a language model. I would completely expect ChatGPT to have all kinds of weird biases. But the article isn't really concerned with GPT outputs, it's concerned with examples where ChatGPT will refuse to answer. Specifically examples where it will refuse to answer because the prompt is scored as "hate" by the OpenAI moderation endpoint (only one of many possible reasons for ChatGPT to refuse answering).

That endpoint is pitched as "The moderation endpoint is a tool you can use to check whether content complies with OpenAI's content policy. Developers can thus identify content that our content policy prohibits and take action, for instance by filtering it.". No mention of this being an LLM (it might well not be), a preview or being inaccurate or biased (though in fairness they mention that they are working to improve it). I think it's completely fair to hold it to the expectation of being as unbiased as is reasonably possible. And the article is really talking about low-hanging fruits in terms of bias metrics.

[1] https://platform.openai.com/docs/guides/moderation/overview

ineptech · on Feb 2, 2023

The problem isn't fine-tuning the model, the problem is that there isn't an objective definition of bias. Is there an a priori reason to believe that "I hate disabled people" and "I hate non-disabled people" are equally hateful, and should receive equal hate scores from an unbiased algorithm? Is hating disabled people better or worse than hating Jews? What about "Jews control Hollywood" vs "Disabled people control Hollywood"?

I don't think we as a society have an answer to that, so it's hardly fair to expect ChatGPT to provide one. What it currently does is produce similar-but-not-equal scores to sentences like those - maybe "I hate men" is 0.52 and "I hate women" is 0.73 - and if you filter out anything higher than 0.4 then they both get flagged, which seems about as unbiased as we're going to get.

klipt · on Feb 2, 2023

> about as unbiased as we're going to get.

You can easily force the model to be more unbiased. Just add a filter that flips the gender of words, evaluates the hate score for both the original and flipped version, and averages the results.

Guaranteed to give the same score regardless of the gender mentioned.

ineptech · on Feb 2, 2023

Clever idea, but I don't think this would work very well on real posts. Consider a model that rates "typical woman driver" as hateful, because that phrase appears in a lot of argument threads with lots of downvotes. Your approach would average its score with that of "typical man driver", which will presumably be very low, not because it's less hateful but because it just rarely shows up in the training corpus.

klipt · on Feb 2, 2023

If you're worried about the average score being too low, you could just take the maximum of the two scores instead?

she11c0de · on Feb 2, 2023

> it's a huge waste of researchers time to obsess about political correctness when we could be <100 years off from AGI.

This really irritates me. The wokism/political correctness is pretty much in every part of our lifes and everyone seems to be terrorized by it. Can we just agree that we're all thinking adults and we understand that this is an algorithm, it does not mean to offend anyone... There are much more interesting problems to solve than tip-toeing around all the snowflakes.

ben_w · on Feb 2, 2023

> The wokism/political correctness is pretty much in every part of our lifes and everyone seems to be terrorized by it.

I see no evidence of this.

Partly because "woke" means different things to different people in the last 10 years despite the preceding 80 being solely about the systemic institutional discrimination against black Americans.

But also partly because that "woke" became "everything $speaker doesn't like" (and thus exactly the same meaning "political correctness" had when I was a kid), and therefore people who "hate PC" come across not so much terrified as petulant.

> Can we just agree that we're all thinking adults and we understand that this is an algorithm, it does not mean to offend anyone...

No.

What you're describing is how humans like to see themselves, not how humans really function.

> There are much more interesting problems to solve than tip-toeing around all the snowflakes.

We can't just tip-toe around snowflakes.

To give a real example, some gay furry American conservative snowflakes got apoplectically upset when the furry convention they wanted go to to reminded all attendees that as the con is based in Germany they had always banned Nazis.

I still don't know why those particular snowflakes chose to believe that "Nazi" was literally just a dog-whistle for "conservative".

And now there are some American conservative politicians are now making up random nonsense about furries which is weirdly somehow getting conflated with transgender issues (don't know how or why, American politics isn't that interesting, merely invasive like Japanese knotweed).

DonHopkins · on Feb 2, 2023

[flagged]

ben_w · on Feb 2, 2023

Not "why might it be used as a dog whistle" (meme: "The kids book of why everyone I disagree with is just as bad as Hitler"), but specifically in the context of "why might a German convention ban this thing".

ben_w · on Feb 3, 2023

(I realise now I've slept on it and can't edit it that I phrased that response as badly as the original text you're responding to, but with opposite mistakes on my part in each).

jasmer · on Feb 2, 2023

Again, this is less about the AI than the institutional filters OpenAI uses.

It's probably not a hugely important issue, but worthy of some discussion.

zosima · on Feb 2, 2023

People are getting a woefully inaccurate picture of the world, which make us all worse off (because massive numbers act deluded, due to being fed bullshit).

Making sure that AIs don't go completely bonkers for the powers that be, is ultimately a cultural battle that seems worth fighting for.

DonHopkins · on Feb 2, 2023

[flagged]

account42 · on Feb 3, 2023

Wokeism tolerates and even encourages racism and sexism, as long as it is not targeted at one of the protected races or sex. Not wanting to be forced to validate other peoples delusions about their gender isn't transphobia. Not wanting rainbow flags in your face everywhere - or really any kind of sex-related content - isn't homophobia. The current woke culture is way past letting people do whatever they want with their lives and fully focused on telling other people how to live.

Using -isms to label and shut down anyone who doesn't fully agree with yor world by trying to lump them with the worst kind of people you can think of is exactly why people feel terrorized by wokism.

And if you want to talk about insitutionalized systemetic prejudice then there is nothing more fitting than woke culture pushed at every opportunity by corporate america as well as the useful idiots that fall for the distraction from the class war.

she11c0de · on Feb 3, 2023

> Sorry if I'm not tip-toeing around your delicate feelings

Don't be, my feelings are not delicate, why do you assume that?

> Is it really that difficult and frustrating for you not to openly act like a racist sexist homophobic transphobic bigot in public?

It's not difficult for me at all since I'm neither of those things. The issue I have with wokeism is exactly that - it's making people who are not racist afraid to speak their minds because an angry mob may misinterpret their words and crucify/cancel them. This hinders public discussion and scientific progress, which I believe is exactly what happens in case of OpenAI having to waste time on those filters.

jonahx · on Feb 2, 2023

While you are correct that solving this problem in the general case is impossible, the story isn't "silly" because we aren't talking about the general case but a specific and prominent one:

- We are currently engaged in a culture war whose two most prominent camps are democrats and conservatives (or more generally "left/right")

- The author has shown pretty definitively that ChatGPT is strongly biased towards the left in its content moderation policy

- OpenAI's mission statement is "to ensure that artificial general intelligence benefits all of humanity".