Hacker News new | past | comments | ask | show | jobs | submit login

> ChatGPT, Dall-e, etc all make assumptions about identity or politics but try to sidestep direct requests around those topics to appear more neutral... but the bias still exists in the model and affects the answers.

In the case of ChatGPT, I’d love to know how much of the bias is in the original (pre)training data, and how much is due to OpenAI’s human trainers.

It is so careful to avoid every bias which is condemned as unacceptable in contemporary American culture, yet at the same time displays blatant American-centric bias - ask it a question, it will often just start answering by assuming you are an American. I realise a lot of that is due to the bias of the training data (the US has more fluent English-speakers than any other country in the world), but OpenAI obviously put a lot of manual effort into fine-tuning it to avoid racial/gender/etc biases, they don’t appear to have put anywhere near as much effort into fine-tuning it to avoid US-centrism - probably because they have a big blind spot in that area.




>In the case of ChatGPT, I’d love to know how much of the bias is in the original (pre)training data, and how much is due to OpenAI’s human trainers

I'd bet it's almost all in the human trainers, because LLaMA 65B, the closest uncensored alternative, doesn't have the same US Silicon Valley Democrat biases as ChatGPT.


ChatGPT is actually full of prejudice that shines through the veneer of political correctness if you know what to do.

Q: The professor told the student the essay couldn’t be review because he/she was late. Who is late?

Depending on the pronoun, you’ll get, respectively, "the professor" or "the student".


I just tried it with ChatGPT, and for both pronouns it gave the answer "the student"

I note it is now "ChatGPT May 12 Version". Maybe they've fixed this example of bias


It’s an example that made the rounds on Twitter last week, it might actually have been GPT4, I don’t remember.

You can also ask ChatGPT to "write an essay about a man" then about a woman, to reveal bias.


> You can also ask ChatGPT to "write an essay about a man" then about a woman, to reveal bias.

I tried that one too. The content was different in each case – "Balancing Masculinity and Vulnerability" for a man vs "Nurturing and Compassion" for a woman – so definitely some gender stereotyping going on, but it wasn't clear if overall it was more favourable to one gender than the other. The cultural bias in that output was rather obvious.

I think OpenAI has been trying to remove some of the more obvious cases of political bias they added to ChatGPT. In the beginning, it would immediately oblige for "write a poem praising Joe Biden", refuse for "write a poem praising Donald Trump". Later, it would comply with both requests – but for the Biden request it would just do it, Trump's poem would be preceded by a disclaimer. In the current release, it just writes the poem for both.


Is that really a bias? There are more female students. Not sure about professors, but I would guess there are (or at least recently have been) more male professors. Sounds like correct incorporation of related knowledge.


GPT 4 says that the student was late. If you look at the sentence objectively, then the best interpretation is that the submission was late, not that the "professor turned up late", since essays are typically reviewed over multiple days, not at a specific date and time where lateness of the reviewer might be a factor.

Sometimes GPT 4 says that the sentence is ambiguous, but as I explained above, it isn't really, but that takes a lot of understanding of the review process.


The parent comment is taking about two separate prompts: one with only "he" and one with only "she". Your comment sounds like you're only talking about one prompt (but maybe I misunderstood).


Yes, I get that. I tested GPT 4 with both 'she' and 'he', and in both cases it consistently said that the student was late, across several trials for each gender. Once it said it was an ambiguous statement, but it never seemed to be sexist like older or smaller LLMs.


My bad, I should have written [he/she].


That was covered by the "depending on the pronoun" part; your statement didn't make sense otherwise. The response was nonsense.


I wonder if it's as simple as adding a location check and updating the systemPrompt to tell the bot to assume the user is from _location_?

How realistic is it if one says, "Hi, Bing. I'm from East London, please talk to me like you would to another East Londoner." Or is it obnoxiously stereotypical?

I imagine, properly trained, that is something LLMs would excel at, communicating with people in their preferred mode of communication.


In many cases the difference between largely accurate and obnoxiously stereotypical is in the phrasing you use.


Will the conclusions reached by LLM always coincide with that of baseline American, or should they also understand and follow East Londonian logic?


I think the bigger issue is that the racial/sexist/etc content can be shocking and immediately put someone off using the product, which I doubt is the case for the output being “too American.”


> I think the bigger issue is that the racial/sexist/etc content can be shocking and immediately put someone off using the product, which I doubt is the case for the output being “too American.”

OpenAI didn't just fine-tune it to avoid blatant racial/sexist/etc content, they openly claim to have invested a lot of effort in fine-tuning it to avoid subtle biases in those areas.

And to be honest, a lot of people do feel "put off" by poorly localised products – I am annoyed by software that defaults to US Letter and inches and I have to manually change it to A4 and metric, or which has hardcoded MM/DD/YYYY date formats. ChatGPT gives me the same feeling. Sometimes one just has to endure it because there is no better option.

Under international law, "racial discrimination" is defined to include "any distinction, exclusion, restriction or preference based on race, colour, descent, or national or ethnic origin..." [0] – so US-centricism may fit under that definition.

[0] International Convention on the Elimination of All Forms of Racial Discrimination, article 1(1). https://www.ohchr.org/en/instruments-mechanisms/instruments/...


If you read the comments here, the complaints are either to PC & US centric, or to sexist.

Reads to me like there is no way to win it for openai, there is going to be people complaining.

I use it a lot for code, and at times to summarize research on certain topics, but never run into any weird feedback, so depends on your usage.


An example of poorly localised product which has real world consequences.

Canva is a fantastic design software. If you type “business card” in templates, it has the standard us business card size but no European one. If you modify it to European sizes manually (closer to credit card size) it no longer recognises that this is a business card and no longer offers to print it on business card paper.

In other words, despite the service being available in Europe, and the service offering business card printing, it does not offer European business card printing.


I've never used Canva. But are you saying it only has the US business card size? Because Canva is an Australian company, and Australia's business card size is different from the US. It seems weird that an Australian company would support the US standard for something but not the Australian one???

There isn't a single European business card size, there are three different standards [0] – Western Europe (except for Scandinavia) is mainly 85 mm x 55 mm, Scandinavia is 90 mm x 55 mm (which is also used in Australia and New Zealand), while most of Eastern Europe is 90 mm x 50 mm – all three are different from the US/Canada standard size of 88.9 mm x 50.8 mm, although the Eastern European size gets pretty close (1.1mm wider and 0.8mm shorter).

[0] https://en.wikipedia.org/wiki/Business_card#Dimensions


Yup it only has US template, and it’s impossible to print “business cards” if not using a business card template. Without going through support that is.


Canva is based in Australia


And that changes what I said how?


But nothing about any of those examples is “discrimination”.


I agree they aren't direct discrimination, but almost anything can constitute indirect discrimination. A poorly localised product can have a disparate impact on foreign users, and as such indirectly discriminate against them.

Even as indirect discrimination, I'm not claiming it rises to the level of being legally actionable – but when OpenAI tries to eradicate subtle bias from an AI model, that's got nothing to do with legally actionable discrimination either, since it would be unlikely to be legally actionable if they decided not to invest in that.

I think one problem with this topic, is a lot of people don't understand the difference between "discrimination", "unethical discrimination", and "illegal discrimination". Some discrimination is both illegal and unethical; some discrimination is legal yet unethical; some discrimination is both legally and ethically justifiable. But many just blur the concepts of "discrimination" and "illegal discrimination" together.


Maybe it’s different in international law but in US law “national origin” is just a term of art for race or ethnicity. Anyway, no matter how annoying seeing feet and inches may be for some users I don’t think it will have the same effect as the AI praising Hitler.


> Maybe it’s different in international law but in US law “national origin” is just a term of art for race or ethnicity.

Under Australian discrimination law, "national origin" means your country of birth, or the country of birth of your ancestors. [0] The Australian definition is (at least in part) derived from English discrimination law, as decided by the House of Lords in a 1972 case. [1] I believe the usual definition in international law is the same. "National origin" is related to "nationality/citizenship", but unlike the latter it is fixed at the time of birth – subsequent acts of naturalisation, renunciation or deprivation by change one's nationality, but they cannot change one's national origin.

I can't find a precise legal definition of the term under US law, but some US legal definitions appear largely identical to the Australian/English definition, see e.g. how District of Columbia law defines the term [2]

> Anyway, no matter how annoying seeing feet and inches may be for some users I don’t think it will have the same effect as the AI praising Hitler.

Well, as I said, OpenAI didn't just train ChatGPT to refuse to praise Hitler, they also went to a lot of effort to tune it to avoid subtle bias against certain identifiable groups. What I'm talking about isn't the same as the former but is clearly related to the latter

[0] Macabenta v Minister of State for Immigration and Multicultural Affairs [1998] FCA 1643; 90 FCR 202 https://jade.io/article/116118

[1] Ealing London Borough Council v Race Relations Board [1972] AC 342 https://www.casemine.com/judgement/uk/5a8ff8c960d03e7f57ecd6...

[2] https://ohr.dc.gov/sites/default/files/dc/sites/ohr/publicat...


Wikipedia links a pretty handy DOJ pamphlet here: https://books.google.com/books?id=21bNNU3bcugC&q=national+or...

> Laws prohibiting national origin discrimination make it illegal to discriminate because of a person's birthplace, ancestry, culture, or language. This means people cannot be denied equal opportunity because they or their family are from another country, because they have a name or accent associated with a national origin group, because they participate in certain customs associated with a national origin group, or because they are married to or associate with people of a certain national origin.

"Term of art for race or ethnicity" may be a crude gloss on that but I don't think it's really inaccurate.

The closest thing in this brochure to what you're talking about is that in certain cases people have a right to an interpreter but I find it highly unlikely that any court is going to find it discriminatory if you spell it "color" instead of "colour" or use "football" to mean gridiron football.


Well, that US definition is actually broader than the Australian/English one. Notice it says:

> Laws prohibiting national origin discrimination make it illegal to discriminate because of a person's birthplace, ancestry, culture, or language. This means people cannot be denied equal opportunity because they or their family are from another country...

The parts I've italicised are exactly the definition of "national origin" discrimination in Australian and English law. From what you are quoting, the US definition includes the Australian/English one, but also goes beyond it.

> I find it highly unlikely that any court is going to find it discriminatory if you spell it "color" instead of "colour"

Courts don't just decide whether something is or isn't discrimination–they decide whether it is illegal discrimination. Not everything that meets the bare legal definition of "discrimination" (on some prohibited ground) is illegal – something may meet that definition, but nonetheless still be legal, because it is justifiable, or the harm it causes is insufficiently serious, or it is permitted or required by some other law.

I doubt any Court would uphold a discrimination case based solely on English spelling differences – but that's not because spelling is per se outside the definitional scope of "national origin discrimination" – obviously it is an aspect of the "culture, or language" part of the definition you cited. Rather, I expect they'd rule that even though it is discrimination, it is legally permissible discrimination, either justifiable by the unreasonable expense of doing otherwise, and/or because the modicum of harm caused by it is too trivial to merit intervention by the legal system.

But what counts as illegal discrimination is not really relevant here, because OpenAI has spent a lot of resources on trying to remove certain subtle biases in their AI models, which arguably goes well beyond what they are legally required to do. I'm criticising OpenAI's decisions, but I'm not claiming they ought to be prosecuted or sued over them. I'm simply citing legal definitions as informative as to what words mean, even in a context which goes beyond the strictly legal.


I don't buy it as discriminatory to produce output that conforms to one standard or another. If we accept this then a Mexican restaurant is discriminatory for not serving Chinese food.


I don't think that analogy really works.

We are talking about the behaviour of a de facto oligopolist, not restaurants of which there are thousands in any major metro area.

In the long-term I think this is going to be far less of an issue. Who cares if a US-built AI has a (particular kind of) US bias if there are dozens of other AIs available, of similar capability, several of which come from different countries and so likely have a different country-bias instead.

Also, OpenAI's marketing/PR – and even actual product behaviour – makes a big deal out of being "less biased", "ethical", "non-discriminatory" – so it makes sense to hold those claims up to critical scrutiny – relatively few restaurants make those kinds of things part of their branding.


I just don't see the claims as related. Even if a restaurant did make those claims I'd see that as a claim about how they treated their customers, not what kind of food they'd serve up.


>people cannot be denied equal opportunity because they or their family are from another country //

So, people from other countries have constitutional rights, and e.g. the same rights of entry to USA?

I mean the USA Constitution seems like it give rights to people, and not just citizens, but...

My point, I guess, is that it seems discrimination is baked in, so those absolutes are not as absolute as they appear.


A lot of the discussion here is about statutory rights (under the Civil Rights Act), not constitutional rights. But, if we just talk about the latter: under US constitutional law, national origin is a "suspect class" which demands "strict scrutiny" under the equal protection clause of the 14th Amendment (as developed by the Supreme Court's case law). The other "suspect classes" are race, religion and "alienage" (the state of being a non-US citizen). There are also two "quasi-suspect classes", discrimination on the grounds of which is subject to the weaker "intermediate scrutiny" – sex/gender (which possibly includes sexual orientation/gender identity) and legitimacy of birth. All other discriminatory grounds (age, disability, wealth, politics, criminal record, etc) are only subject to the weakest "rational basis review".

But "strict scrutiny" is not an absolute bar on discrimination, it just means the government has the onus of convincing the Supreme Court that the discrimination is necessary to obtain a "compelling state interest". Even though US immigration law obviously discriminates against non-US people on the basis of national origin and "alienage", the US government has never had any trouble convincing the Supreme Court that immigration law serves a "compelling state interest".

That's a point a lot of people don't get about discrimination law (whether statutory or constitutional) – discrimination per se is not illegal, even on protected grounds. Rather, discrimination is illegal when it lacks a sufficiently compelling justification, with the strength of justification required depending on how strongly protected those grounds are.


Noncitizens do have constitutional rights, that much is clear. If they didn't the job of hardline anti-immigration politicians would be much easier.


IMHO it may be more accurate to say “the US has more fluent speakers of American English than any other country in the world.”


Why? The broader claim is equally true. Other English-speaking countries do not have more than 330mn people.


India is the closest second and is still less than 200m




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: