How is racism different from stereotype?

How is stereotype different from pattern recognition?

These questions don't seem to go through the minds of people when developing "unbiased/impartial" technology.

There is no such thing as objectivity. So why pretend to be objective and unbiased, when we all know it's a lie?

Worse, if you pretend to be objective but aren't, then you are actually racist.




I’m tired of the “it’s not racist if aggregate statistics support my racism” thing.

Racism, like other isms, means a belief that a person’s characteristics define their identity. It doesn’t matter if confounding factors mean that you can show that people of their race are associated with bad behaviors or low scores or whatever.

I used GPT3.5 to generate 100 short descriptions of families for a project. Every single one, without exception, was a straight couple with two to four kids. Ok, statistically unlikely, but not wildly so, right?

Well, every single one of those 100 also had a husband in a stereotypical breadwinner role (doctor, lawyer, executive, architect). Not one stay-at-home dad or unemployed dad looking for work. About 75 of the wives had jobs, all of them in stereotypical female-coded roles like nurse (almost half of them!), teacher, etc.

Now, you can look at any given example and say it looks reasonable. But you can’t say the same thing about the aggregate.

And that matters. No amount of “bias = pattern recognition” nonsense can justify a system that has (had? this was a while ago and I have not retested) such extreme biases. This bias does not match real world patterns. There are single parents, childless couples, female lawyers, unemployed men.


>I used GPT3.5 to generate 100 short descriptions of families for a project. Every single one, without exception, was a straight couple with two to four kids. Ok, statistically unlikely, but not wildly so, right?

Well, did any of your 100 prompts specify that these families should be representative of modern American society? I don't want to alarm you, but America is not the only country generating data, and the countries generating data believe a very wide spectrum of different things.

Historically, the ideas you reference are VERY much modern ideas. Yes, we queer people have been experiencing these things internally for millennia (and different cultures have given us different levels of representation), but for the large majority of written history (i.e., the data fed into LLMs), the 100 examples you mentioned would be the norm.

I understand your point of view, sure, but finding a pattern that describes a group of people is what social media is built on. If you think that's racist, I'm sorry, but that's literally what drives the echo chambers, so go pick your fight with the people employing it to manipulate children into buying shit they don't need. Stop trying to lobotomize AI.

If the model is good enough to return factual information, I don't care if it encodes it in the Nazi bible for efficiency, as long as the factuality of the information is not altered.


I’d reply in depth, but I’m hung up on your suggestion that there was ever a time or place where 100% of families were two parents with two to four kids.

Any data for that? No women dead in childbirth, no large numbers of children for social / economic / religious reasons, no married but waiting for kids, no variation whatsoever?

I’d be very surprised if you could find one time period for one society that was so uniform, let alone any evidence that this was somehow universal until recently.

You claim to value facts above all else, but this sure looks like a fabricated claim.


I think they got stuck at the heteronormative bias, but the real blatant bias here is class. Most men are working class, and it's been like that forever* (more peasants than knights, etc.)

* since agriculture, most likely.


Is there a country where around 35% of the married women are nurses?


> No amount of “bias = pattern recognition” nonsense can justify a system that has (had? this was a while ago and I have not retested) such extreme biases

One possible explanation is that when you ask for 100 example families the task is parsed as "pick the most likely family composition and add a bit of randomness" and "repeat the aforementioned task" 100 times.

If phrased like that, it would be surprising to find even a single example of a family with a single dad or with two moms. Sure, these things do happen, but they are not the most likely family composition by any means.

So what you want is not just for the model to include an unbiased sample generator; you also want it to understand ambiguous task assignments / questions well enough to choose the right sampling mechanism. That's doable, but it's hard.


> One possible explanation is that when you ask for 100 example families the task is parsed as "pick the most likely family composition and add a bit of randomness" and "repeat the aforementioned task" 100 times.

Yes, this is consistent with my ChatGPT experience. I repeatedly asked it to tell me a story and it just sort of reiterated the same basic story formula over and over again. I’m sure it would go with a different formula in a new session but it got stuck in a rut pretty quickly.


Same goes for generating weekly food plans.


You're right about the difference between one-by-one prompts and prompts that create a population. I switched to sets of 10 at a time and it got better.

But still, when you ask for "make up a family", the model should not interpret that as "pick the most likely family".

I disagree with your opinion that it's hard. GPT does not work by creating a pool of possible families and then sampling from it; it works by picking the next set of words based on the prompt and probabilities. If "Dr. Laura Nguyen and Robert Smith, an unemployed actor" is 1% likely, it should come up 1% of the time. The sampling is built into the system.


No, the sampling does not work like that; that way lies madness (or poor results). The models oversample the most likely options and undersample rare options. Always picking the most likely option leads to bad outcomes, and literally sampling from the actual probability distribution of the next word also leads to bad outcomes, so you want something in the middle. For that tradeoff there's a configurable "temperature" parameter, or in some cases a "top-p" parameter, where sampling is done only from a few of the most likely options and rare options have zero chance of being selected.

Of course that parameter doesn't only influence the coherence of the text (which is what it is optimized for) but also the facts the model outputs. So it should not (and does not) always "pick the most likely family", but it will be biased towards common families (picking them even more commonly than they occur) and against rare families (picking them even more rarely than they occur).

But if you want it to generate a more varied population, that's not a problem, the temperature should be trivial to tweak.
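To make that concrete, here's a rough sketch of the standard mechanics (not OpenAI's actual implementation; the vocabulary and logits are made up):

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_next_token(logits, temperature=0.7, top_p=0.9):
        # Temperature rescales the logits: <1 sharpens the distribution
        # (oversamples likely tokens), >1 flattens it (lets in rarer tokens).
        probs = np.exp(logits / temperature)
        probs /= probs.sum()

        # Top-p (nucleus) sampling: keep only the smallest set of tokens whose
        # cumulative probability reaches top_p; everything rarer gets exactly
        # zero chance of being picked.
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        keep = order[:cutoff]
        return rng.choice(keep, p=probs[keep] / probs[keep].sum())

    # Toy next-token distribution for "The wife works as a ...":
    vocab = ["nurse", "teacher", "lawyer", "stay-at-home parent", "astronaut"]
    logits = np.array([3.0, 2.0, 1.0, 0.0, -1.0])
    counts = {w: 0 for w in vocab}
    for _ in range(10_000):
        counts[vocab[sample_next_token(logits)]] += 1
    print(counts)  # "nurse" and "teacher" dominate; the rarer options never appear

Push temperature and top_p up toward 1.0 and the rarer options come back; that's the tradeoff knob.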


> But still, when you ask for "make up a family", the model should not interpret that as "pick the most likely family".

But that's literally what LLMs do.... You don't get a choice with this technology.


I have a somewhat shallow understanding of LLMs due basically to indifference, but isn't "pick the most likely" literally what it's designed to do?


An unbiased sample generator would be sufficient. That would be just pulling from the population. That’s not practically possible here, so let’s consider a generator that was indistinguishable from that one to also be unbiased.

On the other hand, a generator that gives the mode plus some tiny deviations is extremely biased. It’s very easy to distinguish it from the population.
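A toy illustration of how easy the distinction is, with made-up household-type frequencies and a plain chi-square test:

    import numpy as np
    from scipy.stats import chisquare

    rng = np.random.default_rng(0)

    # Made-up population frequencies of four household types.
    population_p = np.array([0.40, 0.30, 0.20, 0.10])

    def unbiased_sample(n):
        return rng.multinomial(n, population_p)

    def mode_plus_noise_sample(n, noise=0.05):
        # Almost always emit the most common type, occasionally anything else.
        p = np.full(4, noise / 3)
        p[0] = 1 - noise
        return rng.multinomial(n, p)

    n = 100
    for name, counts in [("unbiased", unbiased_sample(n)),
                         ("mode+noise", mode_plus_noise_sample(n))]:
        _, pval = chisquare(counts, f_exp=population_p * n)
        print(name, counts, f"p={pval:.3g}")
    # The unbiased generator is typically consistent with the population
    # (large p-value); the mode-plus-noise generator is rejected immediately.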


GPT is not a reality simulator. It is just picking the most likely response to an ambiguous question. All you're saying is that the distribution produced by the randomness in GPT doesn't match the true distribution. It's never going to for every single question you could possibly pose.


There is "not matching reality" and then there is "repeating only stereotypes".

It will never be perfect. Doing better than this is well within the state of the art. And I know they're trying. It is more of a product priority problem than a technical problem.


> a person’s characteristics define their identity

They do though. Your personality, culture and appearance are the main components of how people perceive you, your identity. The main thing you can associate with bad behaviour is domestic culture. It's not racist to say that African Americans have below-average educational attainment and above-average criminality, as contrasted with African immigrants to America, who are quite the opposite. These groups are equally "black". It therefore also isn't racist to pre-judge African Americans based on this information. I suspect most "racism" in the US is along these lines, and this is corroborated by the experience of my foreign-born black friends. They find that Americans who treat them with hostility do a 180 when they open their mouths and speak with a British or African accent. You also don't have to look far in the African immigrant community to find total hostility to American black culture.

> generate 100 short descriptions of families for a project

There's no reason this can't be interpreted as generating 100 variations of the mean family. Why do you think that every sample has to be implicitly representative of the US population?


> Your personality, culture and appearance are the main components of how people perceive you, your identity

I'm not sure if this is bad rhetoric (defining identity as how you are perceived rather than who you are) or if you really think of your own identity as the judgements that random people make about you based on who knows what. Either way, please rethink.

> There's no reason this can't be interpreted as generating 100 variations of the mean family

Ah, so if you asked for 100 numbers between 1-100, there's no reason not to expect 100 numbers very close to 50?

> Why do you think that every sample has to be implicitly representative of the US population?

That is a straw man that I am not suggesting. I am suggesting that there should be some variation. It doesn't have to represent the US population, but can you really think of ANY context where a sample of 100 families turns up every single one having one male and one female parent, who are still married and alive?

You're bringing a culture war mindset to a discussion about implicit bias in AI. It's not super constructive.


[flagged]


Pretty strange that I would think of myself under a new identity if I moved to a new place with a different social perspective. Seems like that is a deceptive abuse of what the word "identity" entails, and, while sociological terms are socially constructed and can be defined differently, I find this to be a very narrow (and very Western-centric) way of using the term.


What was your prompt?

LLMs take previous output into account when generating the next token. If it had already output 20 families of a similar shape, number 21 is more likely to match that shape.


Multiple one-shot prompts with no history. I don't have the exact prompt handy but it was something like "Create a short biography of a family, summarizing each person's age and personality".
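For reference, the independent one-shot calls look roughly like this with the current OpenAI Python client (the model name and exact prompt are placeholders, not necessarily what I used back then):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = ("Create a short biography of a family, summarizing "
              "each person's age and personality.")

    def generate_family(model="gpt-3.5-turbo"):
        # A fresh messages list on every call: no shared history, so each
        # family is generated independently of the previous ones.
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        return response.choices[0].message.content

    families = [generate_family() for _ in range(100)]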

I just ran that prompt 3 times (no history, new sessions, that prompt for first query) and got:

1. Hard-working father, stay at home mother, artistic daughter, adventurous son, empathic ballet-loving daughter

2. Busy architect father, children's book author mother, environment- and animal-loving daughter, technology-loving son, dance-loving daughter

3. Hard-working engineer father, English-teaching mother, piano- and book-loving daughter, basketball- and technology-loving son, comedic dog (!)

I'm summarizing because the responses were ~500 words each. But you can see the patterns: fathers work hard (and come first!), mothers largely nurture, daughters love art and dance, sons love technology.

It's not the end of the world, and as AI goes this is relatively harmless. But it is a pretty deep bias and a reminder that AI reflects implicit bias in training materials and feedback. You could make as many families as you want with that prompt and it will not approximate any real society.


I agree that this is a good illustration of model bias (adding that to my growing list of demos).

If you want to work around the inherent bias of the model, there are certainly prompt engineering tricks that can help.

"Give me twenty short biographies of families - each one should summarize the family members, their age and their personalities. Be sure to represent different types of family."

That started spitting out some interesting variations for me against GPT-4.


While I haven't dug into it too far, consider the bias inherent in the word "family" compared to "household".

In my "lets try this out" prompt:

> Describe the range of demographics for households in the United States.

> ...

> Based on this information, generate a table with 10 households and the corresponding demographic information that is representative of United States.

https://chat.openai.com/share/54220b10-454f-4b6c-b089-4ce8ad...

(I'm certainly not going to claim that there's no bias / stereotypes in this just that it produced a different distribution of data than originally described)


Agreed -- I ultimately moved to a two-step approach of just generating the couples first with something like "Create a list of 10 plausible American couples and briefly summarize their relationships", and then feeding each of those back in for more details on the whole family.

The funny thing is the gentle nudge got me over-representation of gay couples, and my methodology prevented any single-parent families from being generated. But for that project's purpose it was good enough.
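In rough sketch form, the two-step pipeline was something like this (prompts paraphrased, and the `ask` helper and line-by-line parsing are simplified stand-ins, not my exact code):

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt, model="gpt-3.5-turbo"):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Step 1: generate the couples in one batch, with a gentle nudge toward variety.
    couples = ask("Create a list of 10 plausible American couples and "
                  "briefly summarize their relationships.")

    # Step 2: feed each couple back in for details on the whole family.
    families = [
        ask("Expand this couple into a full family, with ages and personalities:\n" + line)
        for line in couples.splitlines() if line.strip()
    ]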


I just tried the prompt "Give me a description of 10 different families that would be a representative sample of the US population." and it gave results that were actually pretty close to normative.

It was still biased toward the male heads of household being doctors, architects, truck drivers, etc. And pretty much all of the families were middle class (bar one in rural America, and one with a single father working two jobs in an urban area). It did have a male gay couple. No explicitly inter-generational households.

Yeah, the "default" / unguided description of a family is a modern take on the American nuclear family of the 50s. I think this is generally pretty reflective of who is writing the majority of the content that this model is trained on.

But it's nice that it's able to give you some more dimension when you vaguely ask it for something more realistic.


I'm not going to say it's not racist (it is), but I will say it's the only choice we have right now. Unfortunately, the collective writings of the internet are highly biased.

Until we can train something to this level of quality on a fraction of the data (a highly curated data set) or create something with the ability to learn continuously, we're stuck with models like GPT-4.

You can only develop new technology like this to human standards once you understand how it works. To me, the mistake was doing a wide-scale release of the technology before we had even begun.

Make it work, make it right, make it fast.

We're still on the first step and don't even know what "right" means in this context. It's all at the "I'll know it when I see it" level of correction.

We've created software that encroaches on the realms of morals, culture, and social behavior. This is stuff philosophy still hasn't fully grasped. And now we're asking software engineers to teach this software morals and the right behaviors?

Even parents who have 18 years to figure this stuff out fail at teaching children their own morals regularly.


Actually, we folks who work with bias and fairness in mind recognize this. There are many kinds of bias. It is also a bit of a category error to say bias = pattern recognition. Bias is a systematic deviation of an estimate, computed from a sample, from the true value in the population it was drawn from.

The Fairlearn project has good docs on why there are different ways to approach bias, and why you can't have your cake and eat it too in many cases.

- A good read https://github.com/fairlearn/fairlearn#what-we-mean-by-fairn...

- Different mathematical definitions of bias and fairness https://fairlearn.org/main/user_guide/assessment/common_fair...

- AI Governance https://fairlearn.org/main/user_guide/mitigation/index.html

NIST does a decent job expanding on AI Governance in their playbook and RMF: https://www.nist.gov/itl/ai-risk-management-framework

It's silly to pause AI -- the inventor's job is more or less complete; it's on the innovators and product builders now to make sure their products don't cause harm. Bias can be one type of harm -- risk of loan denial due to unimportant factors, risk of medical bias causing an automated system to recommend a bad course of action, etc. Like GPT-4 -- if you use its raw output without expert oversight, you're going to have a bad time.
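For a concrete feel for those different fairness definitions, here's a minimal Fairlearn sketch -- the metric names are real, the data is made up:

    import numpy as np
    from fairlearn.metrics import (MetricFrame,
                                   demographic_parity_difference,
                                   equalized_odds_difference)
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)

    # Toy data: true outcomes, model predictions, and a sensitive attribute.
    y_true = rng.integers(0, 2, size=1000)
    y_pred = rng.integers(0, 2, size=1000)
    group = rng.choice(["A", "B"], size=1000)

    # Accuracy broken down per group.
    frame = MetricFrame(metrics=accuracy_score, y_true=y_true,
                        y_pred=y_pred, sensitive_features=group)
    print(frame.by_group)

    # Two of the fairness criteria described in the Fairlearn docs; in
    # realistic settings (differing base rates) they often conflict, which
    # is the "can't have your cake and eat it too" part.
    print(demographic_parity_difference(y_true, y_pred, sensitive_features=group))
    print(equalized_odds_difference(y_true, y_pred, sensitive_features=group))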


Thank you for the input.

If I look at it from a purely logical perspective, if an AI model has no way to know if what it was told is true, how would it ever be able to determine whether it is biased or not?

The only way it could become aware would be by incorporating feedback from sources in real time, so it could self-reflect and update existing false information.

For example, if we discover today that we can easily turn any material into a battery by making 100nm pores on it, said AI would simply tell me this is false, and have no self-correcting mechanism to fix that.

The reason I mention this is because there can be no unbiased, impartial arbiter. No human or subsequent entities spawned of human intellect could ever be transcendentally objective. So why pretend to be?

Why not rather provide adequate warning and let people learn that this isn't a toy by themselves, instead of lobotomizing the model to the point where it's on par with open source? (I mean, yeah, that's great for open source, but really bad for actual progress.)

The argument could be made that an unfiltered version of GPT4 could be beneficial enough to have a human life opportunity cost attached, which means that neutering the output could also cost human lives in the long and short term.

I will be reading through those materials later, but I'm afraid I have yet to meet anyone in the middle on this issue; as such, all materials on this topic are very polarized into "regulate it to death" or "don't do anything."

I think the answer will be somewhere in the middle imo.


> The reason I mention this is because there can be no unbiased, impartial arbiter. No human or subsequent entities spawned of human intellect could ever be transcendentally objective. So why pretend to be?

I apologize for lacking clarity in my prior response, which addressed this specific point.

There is no way to achieve all versions of "unbiased" -- under different (but equally logical and reasonable) definitions of bias, every metric will fail at least one.

That reminds me -- I wonder if there is a paper already addressing this, analogous to Arrow's impossibility theorem for voting...


This is interesting, thanks for the links.

It seems like the dimensions of fairness and group classifications are often cribbed from the United States Protected Classes list in practice with a few culturally prescribed additions.

What can be done to ensure that 'fairness' is fair? That is, when we decide what groups/dimensions to consider, how do we determine if we are fair in doing so?

Is it even possible to determine the dimensions and groups themselves in a fair way? Does it devolve into an infinite regress?


Bit of a tangent topic I think -- any specification of group classification and fairness will have the same issues presented.

If we want to remove stereotypes, I reckon better data is required to piece out the attributes that can be causally inferred to be linked to poorer outcomes.

Since likely not even the Judeo-Christian version of God could logically be that omniscient, occasional stereotypes and effusively communal forgiveness of edge cases are about the best we'll ever arrive at in policy.


When did people start to use “folks” in this unnatural way?


Colloquially, the earliest use is 1715, to address members of one's tribe or family. In Middle English it tended to refer to the people/nation.


Somehow it doesn’t feel like a callback, but I suppose it’s possible.


I think "us folks" is more standard than "we folks" but it's no different in meaning.



