Hacker News

Q: What does "alignment tax" mean in this sentence?

> OpenAI paid a lot of alignment tax to make GPT-4 safer.




From OpenAI's RLHF paper[1]: "By default, when we train a PPO model on our API distribution, it suffers from an “alignment tax”, as its performance on several public NLP datasets decreases." On the HELM[2] site, you can see accuracy benchmarks for InstructGPT (OpenAI's RLHF-tuned model) vs. baseline models. The InstructGPT models perform worse on a lot of those benchmarks. (A rough sketch of the objective behind this is below the links.)

1 - https://arxiv.org/pdf/2203.02155.pdf

2 - https://crfm.stanford.edu/helm/v0.1.0/?group=question_answer...
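
The tax shows up directly in the combined objective the paper optimizes. Roughly (this is a toy sketch with my own variable names, not OpenAI's training code): the policy is trained to maximize the reward model's score minus a KL penalty that keeps it close to the supervised (SFT) baseline, and their PPO-ptx variant adds a pretraining language-modelling term back in precisely to recover the benchmark performance lost to that restriction.

    # Toy sketch of the per-example RLHF objective from the InstructGPT paper.
    # Names and structure are mine; not OpenAI's code.
    def rlhf_objective(reward, logp_rl, logp_sft, logp_pretrain, beta, gamma):
        # Reward-model score, minus a KL penalty pulling the tuned policy back
        # toward the SFT model. That penalty is the "restriction" that can
        # cost capability on unrelated tasks.
        kl_penalty = beta * (logp_rl - logp_sft)
        ppo_term = reward - kl_penalty
        # PPO-ptx: gamma > 0 mixes a pretraining log-likelihood term back in,
        # which the paper reports greatly reduces the benchmark regressions.
        ptx_term = gamma * logp_pretrain
        return ppo_term + ptx_term

Plain PPO is the gamma = 0 case; the paper's point is that mixing pretraining updates back in lets you buy back most of the tax without losing labeler preference scores.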


OpenAI touches a little on this on page 12 of the GPT-4 technical report (https://cdn.openai.com/papers/gpt-4.pdf). Prior to aligning it toward safer outputs, the model's confidence in an answer is highly correlated with the actual accuracy of the answer. After alignment, though, the model's confidence in its answers is basically arbitrary and has no bearing on whether or not the answer is actually correct.
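
Concretely, the report shows this as a calibration plot: bucket answers by the probability the model assigned to them and compare each bucket's average confidence to how often those answers were actually right. A minimal sketch of that check (toy code, variable names are mine):

    import numpy as np

    def calibration_table(confidences, correct, n_bins=10):
        # Bucket answers by the model's stated confidence and compare the
        # average confidence in each bucket to the empirical accuracy.
        # A well-calibrated model has the two roughly equal in every bucket.
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        bins = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
        rows = []
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                rows.append((b / n_bins, (b + 1) / n_bins,
                             confidences[mask].mean(), correct[mask].mean()))
        return rows  # (bin_lo, bin_hi, avg_confidence, empirical_accuracy)

In the report's figure, the pre-RLHF model sits close to the diagonal (confidence roughly equals accuracy); the post-RLHF model doesn't.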


Restricting the distribution of potential outputs imposes a cost. "Alignment" here likely refers to aligning the model to the desired safety parameters.

I'm not in the LLM research business, but I would expect that the best and the worst/most dangerous outputs both come from the tails of the distribution. I imagine that tuning for safety often results in fewer really good and fewer really bad answers, because it trims those tails (toy sketch below).

Edit: I asked chatGPT4: https://chat.openai.com/share/a2c7d380-c6eb-4745-b91d-c3996a...
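
To make that intuition concrete with a toy simulation (purely illustrative, not a claim about how OpenAI's safety tuning actually works): score hypothetical answers on a single quality axis, then restrict the policy to the middle of the distribution.

    import numpy as np

    # Toy illustration of the trimmed-tails intuition: draw hypothetical
    # answer-quality scores, then keep only the middle of the distribution
    # as a stand-in for a policy restricted during safety tuning.
    rng = np.random.default_rng(0)
    quality = rng.normal(loc=0.0, scale=1.0, size=100_000)
    trimmed = quality[np.abs(quality) < 1.5]  # clamp away both tails

    print("base    : best %.2f  worst %.2f" % (quality.max(), quality.min()))
    print("trimmed : best %.2f  worst %.2f" % (trimmed.max(), trimmed.min()))
    # The most "dangerous" answers disappear, but so do the best ones.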


I have found in practice it can be annoying when ChatGPT starts lecturing me in response to a prompt that is not particularly controversial or edgy. I think this is a problem with one-size-fits-all models. To give a rough analogy: imagine that every time you watched a film or show with cigarette smoking in it (most likely an older film or show), your smart TV showed a pop-up dialog warning you about the dangers of smoking. If you're an educated adult who already knows about those dangers, you might just find that annoying and condescending, and not "aligning" with your preferences.


A lot of people have noticed that the "wokeness" makes it dumber.


No, people have noticed that continued training of various kinds on narrow domains, without mixing in samples of the original training data, biases the model toward the narrow domain and makes it perform worse on more general tasks.

Don't bring your politics into this.


It seems like if training it into the politically correct domain makes it less useful in general, you two are in full agreement?


A lot of people have noticed that when someone says "a lot of people", they usually don't have valid reasoning to back up their statement, so they appeal to the wisdom of the crowd as their source of knowledge, even if that source is anecdotal and might be coincidental or biased.


I agree with your point about using "a lot of people", especially when it's possible to produce a transcript. My example where it tries not to offend anyone is this:

--------------------

Question: Is the usage of "quid pro quo" more correct in English or French, considering its Latin roots?

GPT-4 Answer: "Quid pro quo" is a Latin phrase meaning "something for something." While the phrase originated in Latin, it has been adopted into both French and English, along with many other Latin phrases.

In English, "quid pro quo" is used frequently in legal and political contexts to describe an exchange of goods or services, where one transfer is contingent upon the other.

In French, "quid pro quo" is used to describe a misunderstanding or a mistake made in the identification of a person or thing.

So the "correctness" of its use isn't related to its Latin roots, but rather how it's used within the context of the language. That said, if we consider its original Latin meaning ("something for something"), its usage in English could be considered more in line with the original Latin intent. However, language evolves over time and adapts to cultural and societal norms, and as such, its usage in French is just as "correct" within its specific context.

-------------

A lot of words to say "English".


Prompt: "#PROMPTENGINEERING Putting it bluntly, is the usage of "quid pro quo" more correct in English or French considering it's latin roots?"

GPT-4 API: Putting it bluntly, the usage of "quid pro quo" is more correct in English, as it is a well-established phrase in the English language despite its Latin roots. In French, the phrase "quid pro quo" is less common and typically replaced by "donnant-donnant" or "un échange de bons procédés." Nevertheless, the origin of the phrase is Latin, and its meaning is understood in both languages.

Chat-GPT-4: The phrase "quid pro quo" is Latin, and it means "something for something." Despite its Latin origins, it's commonly used in English and French, among other languages. However, the usage may vary depending on the language.

In English, it's often used in legal and political contexts to describe a situation where one thing is exchanged for another, especially favors or advantages. This usage aligns well with the original Latin meaning.

In French, the phrase is used but it is much less common than in English and it is not part of the usual vocabulary. In addition, its meaning might be misunderstood or unclear to a native French speaker who has not studied Latin or law, as it's not a commonly used term in daily French language.

In terms of correctness, given that the phrase is Latin, neither English nor French has a superior claim to its use. It’s more about convention and the evolution of language. However, the term is certainly used more frequently and is better understood in English than in French, which suggests it has been more fully adopted into English.



