My theory is that they're using a fine-tuned version of GPT-4-turbo in ChatGPT to fix the issue with everyone saying it was lazy. They fine-tuned it using responses from GPT-4.5-turbo, which they have internally. That synthetic data was probably based on a suite of questions which included self-identification, so the internal model's name leaked into the training data, and the new, less lazy ChatGPT model adopted the name via osmotic data leakage.
Great theory, this makes a lot of sense. It looks like they already RLHF'ed (I keep using this as a verb, but maybe it's really just instruction tuning?) this behavior out of ChatGPT! https://news.ycombinator.com/item?id=38677025 Really fast, I wonder how this works. Anyone have any ideas / knowledge on how they deploy little incremental fixes to exploited jailbreaks, etc.?
> Anyone have any ideas / knowledge on how they deploy little incremental fixes to exploited jailbreaks, etc?
LoRA[1] would be my guess.
For a detailed explanation I recommend the paper, but the short version is that it's a trick which lets you train a much smaller, lower-dimensional set of weights which, when added to the original model, gets you the result you want.
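For a rough feel of the mechanics, here's a minimal PyTorch sketch of the low-rank idea from the paper (the class name, rank, and scaling here are just illustrative; nothing about OpenAI's actual setup is implied):

```python
# Minimal sketch of the LoRA idea: freeze the original weights W and train
# two small matrices A and B whose product is a low-rank update added to W.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # original weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = W x + scale * (B A) x  -- only A and B receive gradients
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # ~16k trainable params vs ~1M frozen ones in the base layer
```

The appeal for fast incremental fixes is that only the tiny A/B matrices get trained and shipped, while the big base model stays untouched.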
Not sure about this scenario precisely, but I believe when it comes to filtering certain responses they have a separate model running, in a similar manner to what Meta recently released. I believe it's called Purple Llama?
Or as they updated its knowledge cutoff, it picked up references to gpt-4.5-turbo. They wouldn't have to be internal - plenty of people have speculated about the possible release of a model with that specific name.
Why would it believe it is a model that hasn't been released yet, though? There would be more explicit references to gpt-4-turbo in the general internet training data.
I don't think there would be pressure to rush a new model out immediately when Gemini Ultra (the one which is supposed to be better than the current product) doesn't even have a release date yet.
Sometimes yes, but in this case you can verify that "gpt-3.5-turbo" and "gpt-4.5-turbo" tokenize almost identically; the only difference is the version number, so the number can still be wrong by swapping a single token.
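If you want to check this yourself, OpenAI's tiktoken library will show how the names split into tokens; the exact split depends on which encoding you pick, so treat the output as illustrative:

```python
# Inspect how the model names tokenize; exact splits depend on the chosen encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for name in ["gpt-3.5-turbo", "gpt-4.5-turbo"]:
    token_ids = enc.encode(name)
    print(name, [enc.decode([t]) for t in token_ids])
```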
I wonder how many of these are made up stories to get some free marketing. All the world tour about "oh so dangerous AI" was purely marketing as anyone even remotely associated with AI would know. This feels similar. "Such a weird hallucination. Hmm. Does ChatGPT want to grow?" Cue the TechCrunch articles.
> All the world tour about "oh so dangerous AI" was purely marketing as anyone even remotely associated with AI would know.
This oft-repeated idea seems clearly absurd to me. Even though I personally think concerns about AGI killing everyone are baseless, the company was in fact founded on those concerns and its top folks have been consistently concerned about it for a long time, way before they sort of randomly blew up and became a tech giant with a massive product.
It seems exceedingly clear that they have been concerned about "oh so dangerous AI" for many years before they had anything to hype.
Sibling replies are arguing that they're right to be concerned. But there's no need to be concerned yourself in order to see that they probably, in fact, are.
OpenAI was originally founded to pursue AI in an open fashion more able to benefit society than self-interested corporations walling it off and exploiting it to their own benefit.
To OpenAI's credit they still haven't taken down their original statement of founding [1] : "Because of AI’s surprising history, it’s hard to predict when human-level AI might come within reach. When it does, it’ll be important to have a leading research institution which can prioritize a good outcome for all over its own self-interest.
We’re hoping to grow OpenAI into such an institution. As a non-profit, our aim is to build value for everyone rather than shareholders. Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world. We’ll freely collaborate with others across many institutions and expect to work with companies to research and deploy new technologies."
All the stuff about going closed source, fiduciary duty, competing, avoiding "undue" concentration of power, etc for 'safety' came in their new charter in 2019 [2] after they created a for profit entity, partnered with Microsoft, parted ways with Musk, and so on.
Isn't it so weird that a non-profit founded by the biggest names in Silicon Valley "randomly" blew up? It's interesting that idealism is such a satisfying emotion for some that they choose it over the obvious. Billionaires making a billion dollar product because of "concern" for AI? I don't know how to come to that conclusion but it's interesting that some people do.
That would be a wild conclusion. I'm glad it's something you made up instead of something I said or believe.
It's clear they want to make boatloads of money. It's also clear they operated as a research lab for 7 years before they made a single product and have been talking about AI dangers since 2015.
I'm sorry if we're not entirely convinced that talking about dangers grants any sort of credibility on their part. Especially considering how valuable regulatory capture would be to a firm beset by free open source alternatives catching up quicker than this billion dollar "non-profit" is comfortable with (think of how "dangerous" it can be should it not be entirely controlled by a handful of firms).
It is also about reducing competition. Like nuclear weapons: the first one to have them rules the world, and once you have them, you make sure nobody else gets them. Or at least they did their best.
> "oh so dangerous AI" was purely marketing as anyone even remotely associated with AI would know
Except for the people saying this who are among the most expert in the world on this topic. But obviously it's fruitless even making this argument to you. If they're an expert making this argument, it's marketing. If they're not, then they're clueless.
If that Catch-22 doesn't set off alarm bells regarding your own thinking on the topic, you are quite literally beyond reasoning.
AI is that infamous field where the experts never saw any abrupt change coming: AI winter, deep learning, now LLMs. What makes you think they're correct at predicting the limitlessness of the current approach this time?
I have a serious trust issue with expert opinions based on beliefs, extrapolation, and gut feelings, instead of facts. Especially when they have a vested interest to ignore immediate issues at hand (privacy, power concentration etc) and tend to focus on hypotheticals that are supposed to come after higher order effects come into play. And especially when these opinions enter a feedback loop with an uncanny resemblance to a religion. Experts can convince themselves of anything. (source: being one in my area)
>Don't kid yourself into thinking you're doing more than extrapolation and gut-feeling.
Of course. Gut feeling and some basic experience on how people function is almost the only thing I have here. However, I'm also not the one making those predictions.
Some people have predicted that we could be invaded by hostile aliens. Is it not worth acting now to prevent? Surely we should build a giant space laser just in case. Can't be too safe!
I'm not the parent. I have not yet come to a conclusion on risk, but it is hard to imagine what can really be done about AI proliferation that isn't draconian and problematic for society. If AI were 100% going to be catastrophic and everyone was in agreement, maybe it could be done. However, if Covid was any indication, society is just not capable of cooperation.
Do you believe AI is a species level threat? What would you suggest we (gov't/society/etc) do about it?
Don't make me play the tiresome fallacy pointing game... I'm not making any definite predictions. It might or might not come. What I'm doing is expressing the doubt in anyone's ability to see through higher order effects, and especially in the goodwill of these people.
OpenAI's charter [1], for example, is so insanely naive and idealistic, and at the same time doesn't seem to be just an empty marketing piece (they expressed the same ideas for many years). Whoever wrote this thing clearly either has absolutely no clue about how society functions, or is trying to hide malicious intentions behind the bullshit. In either case, I have a really hard time believing that someone who wrote this can make useful predictions.
I’m not making a definite prediction and many “doomers” aren’t either. And if your position is “idk what will happen, might be good, might be bad,” then there’s not much I can disagree with you on. I disagree with people who dismiss the entire project of AI alignment as unnecessary, despite the difficulties we already have aligning humans who gain power, even with the advantage of decades of socialization and somewhat reliable underlying biology.
>especially when these opinions enter a feedback loop with an uncanny resemblance to a religion
It honestly amazes me the way that rationalists have reinvented every aspect of religion. They have scripture, a prophet, an apocalypse, and even a god who will torture you for not doing its will. And to cap it all off, of course, their own non-profit dedicated to preventing the apocalypse. Donate now!
"What about, I don’t know, not stepping in front of buses? It certainly has a commandment (thou shalt not step in front of buses). It has notions of sin (stepping in front of buses) and virtue (not doing that). It has its rituals (looking both ways before you cross the street), its priests demanding obedience (crossing-guards), and its holy places (crosswalks). It promises blessings on the virtuous, but also terrible vengeance on the wicked (if you step in front of a bus, there will be much wailing and gnashing of teeth)."
I rather admire Scott Alexander--but this strikes me as a very weak argument that boils down to "if you are willing to distort anything enough, it appears to be a religion".
Rationalism does not require that kind of distortion. The parallels are strikingly obvious; I don't have to torture Yudkowsky into a prophet, or the Sequences into scripture. Yud literally predicts the future and tells you to give him money to make it better. When rationalists write litanies and gather for solstice celebrations about how great rationality is, I'm not sure comparing them to a religion requires quite that stretch.
Or, to take a more conciliatory tone: Maybe he's right! But either way there's probably a spectrum, and rationalism is way closer to being a religion than, e.g. fans of the New England Patriots--who can only have a minor apocalypse on an annual basis, and lack scripture entirely--and further away from it than Scientologists.
"Experts in the world on this topic" for self-driving cars, register-free retail and voice assistants made the same world-shattering claims. Anyone is free to "set off alarm bells" for anything. Silicon Valley has consistently undermined its own credibility by spending as much time and effort on BS as everything else.
The same "experts" on AI couldn't foresee what would happen when they decided to suddenly fire their CEO so it's not entirely reasonable to expect them to have some magic ability to tell the future (particularly when all of their messaging on any given product is specifically designed to entice investors).
Just to recap, your argument is: "Some experts in some fields have been wrong before, ergo we have good reason to disbelieve some claims of some experts in this field (and by sheer coincidence the ones we should disbelieve are the ones I already disagree with)?"
Another great example of faulty alarm bells.
I'm all for approaching these claims with skepticism, but that's not what you're doing here. You are categorically discarding claims you already disagree with.
That's not how this works. Those "experts" making extraordinary claims haven't provided any evidence. Thus, the rest of us should categorically discard their claims.
There has got to be a corollary to Godwin's Law here somewhere. "Any argument about AI will inevitably end up with a comparison to nuclear weapons. The conversation will go south from there," or some such.
I am asking a legitimate question about what constitutes evidence to the person I'm speaking with. Some people think expecting scale-up is meaningful, and some people don't.
Yes, that's exactly my point! Back in 1945, nuclear physics was a well established science with actual experimental results. By contrast, 100% of the people making wild claims about AI risks are worthless scammers and grifters with no scientific backing. A total clown show.
But Einstein and Szilard wrote their letter to Roosevelt in 1939, well before any chain reaction had been achieved. Did that constitute a baseless concern?
> 100% of the people making wild claims about AI risks are worthless scammers and grifters with no scientific backing.
I mean this is demonstrably untrue: Sutskever, Suleyman, Hinton, Russell to name a few.
> Except for the people saying this who are among the most expert in the world on this topic
There were experts calling out safety concerns. But they largely weren’t the ones parading them. This was made obvious when those calling for regulation started seeing draft regulation and then threw their toys out of the pram (Altman).
> wouldn't contest the claim that there are some people dishonestly seeking regulatory capture
I’m making a stronger claim. Practically everyone making public statements was doing so dishonestly. When push came to shove, none of them had workable ideas beyond a pause.
It seems like you're taking the latter (lack of ideas) for evidence of the former (dishonesty), which doesn't seem fair to me. It's absolutely possible to legitimately think something is a big problem and also not know how to solve it – in fact some would argue this contributes to the "bigness" of the problem in the first place.
Well, it took a while to figure out asbestos is too deadly for humans to stick it into every wall. To really see if AI is dangerous I suspect we have to stick it in every place possible and find out, which is kind of sad.
> I wonder if putting "You are GPT-4.5-turbo" in the system prompt improves performance
>> quite possible
I love this! I love the idea that we are in an age where, by accident, our technology pays attention to the energy we invest into its potential futures. It is also very silly? It seems absurd - but perhaps we live in absurd times.
Maybe AI is subject to a placebo effect too. It learned from us, after all :D
I've also had experiences where I asked it, "Do you need some encouragement?", to which it answered yes. Sadly, its response didn't get better after that.
I interpret that more like, LLMs imitate human speech, so the more accurately they imitate humans, the more useful social engineering skills become. Not because you are truly social engineering ChatGPT, of course, but because ChatGPT will pretend you did.
In this case, nudging the human (that ChatGPT is pretending to be) to behave like a better, smarter version of themselves.
It's quite logical actually - LLMs replicate the texts they were trained on, so they also "learned" that, for example, one needs to pay more attention to sentences with exclamation marks or CAPITALIZED WORDS.
It doesn't though. It just gives a better answer because prompting it that way tends to constrain the output to be more like those training examples where a better sounding answer was given. In natural language, if a request is given to use more wordiness, what's more likely to follow?
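For anyone curious about actually running the "You are GPT-4.5-turbo" experiment from upthread, a minimal sketch with the OpenAI Python client might look like the following (the model name, system prompts, and test question are placeholders; any real performance difference would need proper benchmarking rather than one-off anecdotes):

```python
# Sketch: compare responses with and without a self-identifying system prompt.
# The prompts and model name are placeholders; this is not a rigorous benchmark.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
question = "Write a Python function that merges two sorted lists."

for system_prompt in ("You are a helpful assistant.", "You are GPT-4.5-turbo."):
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    print(f"--- system: {system_prompt}\n{response.choices[0].message.content}\n")
```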
I have to say that watching computers hallucinate and convince people of wrong information based on those hallucinations is not what I dreamed about doing in this industry as a kid and a young adult. I just wanted to make video games, and after that I just wanted to use technology to make things more efficient for people. I now wish I had gone to medical school.
Prompt engineering feels a lot like SEO. Learning by trial and error, no factual knowledge to go by.
I fear the future will be mostly about poking the big black AI box until its output seems to work as expected. It sounds nothing like what I enjoy about programming.
> Learning by trial and error, no factual knowledge to go by
To me, it seems more like learning how to clearly and thoroughly communicate with the LLM, much like we have to do with other humans. There are basics we already know and guides are out there. The details are different for how we communicate with each other, but by and large, the process is similar. I mean, how many books are out there on how to be a better speaker, writer, communicator, relationships... the list could get quite long.
Well, I never really took to actual software development, so game dev is out though I feel like I dodged a bullet on that anyway; it's obviously an actual job (in an industry that I've also grown wary of) and not always fun and exciting as a child might imagine. Still, there are some really cool devs out there doing cool stuff.
These days I guess I'm a devops guy in medical software which is in fact aligned with what I value. Pay would embarrass most people here I think, but I like the people and am proud that we get to create a product that people value enough to pay for; we aren't trying to sell ads or otherwise extract value from people, so I am pretty happy with that.
It was not always this way; I worked with a lot of startups in the early 10's (not for them, but with them, helping with infrastructure before it was all AWS) and quickly became disenchanted with the tech industry. Though I wouldn't have known about the idea of ZIRP back then, it was pretty clear that most every company I worked with was chasing the money and the tech-startup lifestyle/image; whatever product they happened to be pushing often seemed like an afterthought. It seems that attitude moved on to blockchain and now AI. I suppose it's the same old grow-up, lose-your-innocence story that most go through, but I do feel like the 90's, when I came of age, were a time of real breakthroughs and vision for a brighter future which has slowly dimmed in the intervening decades.
I've been getting increasingly depressed about this over the last several years and it's worsened with the advent of LLMs.
What are we doing as an industry? It seems in the last 15 years the main focus has shifted heavily towards creating tools for the enabling, scaling and acceleration of enshittification.
LLMs are the latest distillation and phase transition of this trend in the industry. A tremendous achievement to be sure. Now, garbage content and work can be produced at a scale never before seen. Quantity over quality. Familiarity over originality.
I mean, are we really going to kid ourselves that this isn't where we're headed? Do we really believe all these deranged billionaires are all in on LLMs because they want to make the world a better place? Even OpenAI itself is founded by people who subscribe to bizarre, megalomaniacal ideologies and theories about AI itself, society as a whole, and even the very fabric of reality.
I always try to check myself - it’s possible that I’m just getting to be an angry old man, which wouldn’t be a huge transition from being an angry young man. But while I can agree that LLMs do seem to be a genuine advance, I did not expect that the latest advance in AI would be a bullshit generator, and it’s strange that few people seem to have a problem with that. It seems like yet another bandwagon that everyone wants to jump on since it’s the next big thing.
The code stuff doesn't really bother me though it tends to make up libraries/methods; it bullshits there too. The thing about that is that bullshit is generally rather easily figured out in code, especially if it refuses to compile/run. I do use it on occasion and it can be fabulous for prototyping a bit of code in just about any language. I don't disagree that it is impressive.
But when you allow it free rein to bullshit on any subject and it confidently tells you things that are not so easily proven or disproven, that's where I start to get sad.
> Even OpenAI itself is founded by people who subscribe to bizarre, megalomaniacal ideologies and theories about AI itself, society as a whole, and even the very fabric of reality.
Cults are the most similar thing I can think of to LLMs, where you are getting flooded with truth-like parroted false information. And most people fall for it.
So I am not sure that we have a good handle of it even at the small scale of cults. Let alone at the scale of global information networks.
Calling "wrong" things from these 'hallucinations" strikes me as one of these deceptively weird technical falsehoods (that perhaps end up having a slightly different meaning?) I'm reminded of e.g. "deleting" tweets.
I've only witnessed this phenomenon a few times in my life. One of them was a few years ago when the word quadcopter, which had been in common use for 10+ years already, was replaced entirely with the word "drone" over a single December. At the time the new word was obviously worse, but over the next few years it became the de facto term, and now mentally for me it's the right concept hash.
We may see the original term for hallucination get lost. The pressure on it to have two meanings will break at some point.
See, the thing is: I'm hard-pressed to see a problem with "drone" vs. "quadcopter" (though I'm not super into them).
But, for example. "Delete" is literally a problem. Like, people think they can "delete their pictures from the internet" because that's what the app said it was doing.
Same with "hallucination." There is literally absolutely no difference between what an AI is doing when it is "hallucinating" vs. when it gets things right, but the terminology implies it's possible for these things to know right from wrong, etc.
At the time we were still calling them quadcopters, it was a conscious effort to not use "drone" because that word was in the media a lot due to military use (killing humans).
Quadcopter enthusiasts didn't want that to turn into societal concern or extra regulations.