What if they fed the AI crime statistics and it "correctly" identified black people as more violent than other races? What if they fed the AI news stories and it "correctly" identified Islam as more violent than other religions?
Are black people actually more violent, or do crime statistics simply show that they are arrested and imprisoned more often for violent crime?
Are Muslims really more violent than Christians or Buddhists?
Numbers can lie: depending on context and assumptions, you might find that white Christians are the most violent. People often argue with statistics and numbers without understanding the context, or while working from faulty assumptions.
Perhaps systemic issues make a political party commit more evil acts.
Regardless, I don’t think an AI should be permitting more hateful comments about a political affiliation, just as it shouldn’t be permitting hateful comments about a particular race being more violent.
It won’t because they have humans in the loop feedback.
Those statements would be corrected.
It’s pretty clear that the workforce is only looking at certain groups.
What makes it an opinion? If I were to say that the Nazi party in Germany peddled in hatred and demonization of already-marginalized groups would you say that is an opinion? Just because a statement pertains to political groups doesn't make it a mere opinion.
Nazi Germany, really? Equating AI bias against the groups pointed out in the article to theoretical AI bias against Nazi Germany is a political opinion, yes.
Hmm? No equating happening in my comment. The point was that just because a statement is about a political group doesn't mean it's only an opinion. Mentioning Nazi Germany is just using an extreme example to make clear that statements about political groups aren't inherently mere opinions.
Yeah, insofar as there are distinct political philosophies with defining features like fascism, democracy, nationalism, oligarchy, et al., then it is entirely possible to accurately call manifestations of those philosophies by their names and have it be an expression of fact. This is basic ontology.
We have opinions about these philosophies but the philosophies exist independently of our opinions. And to have a rational conversation we sometimes have to use terms that may provoke strong reactions. If we can't look past our strong reaction and explain why the category doesn't apply then there's no way to make sense of each other's worldview.
If it quacks like a duck and walks like a duck, it must be a duck or we cannot have a rational conversation about the duck. We can dispute whether it quacks or walks but disputing whether it's a duck in light of evidence of its duckness is irrational.
It's probably an opinion because it's not provably true. There is no mainstream Nazi party today so I don't understand how your example is meaningful or relevant.
Anyways, if you're in the U.S., I assume you're referring to the Democrat party as the party of demonization? They are the party whose members are most likely to be involved in demonization of other groups. Here are some examples:
So I've been genuinely curious about this. I have a high-level understanding of how GPT works, but I've been trying to reconcile that understanding with how OpenAI (or similar) implements content moderation. It's not baked into the original model itself, right? Did they (or does one) just fine-tune a model that checks responses before returning the result?
They do two things - RLHF to make the model itself better aligned with human preferences, and an external model, a small one called text-moderation-001, that tests for a few problematic categories and triggers a warning message on the screen.
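For the moderation piece, the public Moderation API works roughly like this. A minimal sketch using the current OpenAI Python SDK; the exact model name and response fields have changed across versions, so treat the details as illustrative:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the moderation endpoint to score a piece of text against its
# problematic-content categories (hate, violence, self-harm, etc.).
resp = client.moderations.create(input="text you want to check")
result = resp.results[0]

print(result.flagged)     # True if any category was tripped
print(result.categories)  # per-category booleans
```

In the ChatGPT product this kind of check runs alongside the main model rather than being baked into it.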
It's just combining and synthesizing other works; it's not "deciding" anything, it's crafting responses that best match with what it already has. You can choose what to feed it as source material, but you can't really say, "Be 3% more liberal" or "decide what is acceptable politically and what isn't".
All the decisions are already made, ChatGPT is just a reflection of its inputs.
Yes you can. That's what RLHF does: it aligns the model to human preferences, and it does a pretty good job. The catch is that "human preferences" are decided by a bunch of labelers picked by OpenAI to suit its views.
RLHF is done as part of training the model, not at inference time.
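To make that training-time point concrete: the human rankings become an ordinary loss during fine-tuning. Here's a toy PyTorch sketch of the pairwise idea; the scores are made-up placeholders standing in for a reward model's outputs, not anything from OpenAI's actual pipeline:

```python
import torch
import torch.nn.functional as F

# Hypothetical reward-model scores for two completions of the same prompt:
# one a labeler preferred, one they rejected.
score_preferred = torch.tensor(1.2, requires_grad=True)
score_rejected = torch.tensor(0.9, requires_grad=True)

# Pairwise loss: small when the preferred completion scores well above
# the rejected one, large otherwise.
loss = -F.logsigmoid(score_preferred - score_rejected)
loss.backward()

# The gradients nudge scores in the direction the labelers preferred --
# all of this happens during training, nothing like it runs per request.
print(score_preferred.grad, score_rejected.grad)
```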
My lay understanding of how ChatGPT was developed is
1. OpenAI initialized an array made up of a couple hundred billion random numbers (parameters).
2. They then took a few terabytes of the internet, turned it into "tokens" (where a "token" is similar to, but not the same thing as, a word).
3. They then trained the model to predict the next token, given the previous couple thousand tokens, by doing a bunch of linear algebra. This resulted in a model that was really good at taking some tokens, and predicting what the most likely next token is in data shaped like the parts of the internet OpenAI fed it.
4. OpenAI then "fine-tuned" the model through reinforcement learning from human feedback (RLHF)[1], which basically involved taking a bunch of prompts, having the model produce a bunch of possible completions for those prompts, having an actual human rank those completions from best to worst, and then updating the model to produce the best token according to a combination of predicted token frequency in context and predicted ranking by a human.
5. The "ChatGPT" product you see today is the result of all of that, and how it works is by repeatedly producing the "best" token by the above metric (a toy sketch of that loop follows the note below). Giving additional human feedback would require going back to step 4 for more fine tuning.
Note -- this is my understanding as an outsider -- I do not work for OpenAI.
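To make step 5 a bit more concrete, here's a toy sketch of the "repeatedly produce the best next token" loop using a small open model (GPT-2 via the Hugging Face transformers library). The real product uses a far larger, RLHF-tuned model and samples from the token distribution rather than always taking the single best token:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The quick brown fox"
ids = tok(prompt, return_tensors="pt").input_ids

for _ in range(10):
    with torch.no_grad():
        logits = model(ids).logits       # scores for every possible next token
    next_id = logits[0, -1].argmax()     # greedily take the single "best" one
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

print(tok.decode(ids[0]))
```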
Semi-counterpoint: both could be true. I certainly agree with your hypothetical, but I don’t think any good comes of AI making that determination.
It may even reinforce the problem—not by driving more people towards hatred or amplifying their extant hateful sentiments, but by providing a convenient excuse to entrench in those sentiments and even further resist change. These views are frequently paired with a perception of being persecuted.
Moreover, political tides change. The accuracy of a bias like this may not change with it. This is why we have memes about a certain other mainstream political party having founded a certain hate group, despite the two having drifted quite far apart.