How will AI learn next? (newyorker.com)
136 points by jyli7 on Oct 6, 2023 | 137 comments



Anyone who has iterated on trained models for long enough knows that feedback loops can be a serious problem. If your models are influencing the generation of data that they are later retrained on, it gets harder and harder to even maintain model performance. The article mentions one experiment in this direction: "With each generation, the quality of the model actually degraded." This happens whenever there aren't solid strategies to avoid feedback loop issues.

Given this, the problem isn't just that there's not enough new content. It's that an ever-increasing fraction of the content in the public sphere will be generated by these models. And can the models detect that they are ingesting their own output? If they get good enough, they probably can't. And then they'll get worse.

This could have a strange impact on human language / communication as well. As these models are increasingly trained on their own output, they'll start emulating their own mistakes and more of the content we consume will have these mistakes consistently used. You can imagine people, sometimes intentionally and sometimes not, starting to emulate these patterns and causing shifts in human languages. Interesting times ahead...


Humans have that too, but the reason civilization doesn't go full crazy is that our use of language and concepts is tied to doing objective things, which keeps things (mostly) grounded.

Where it isn’t grounded, as in endless online conversations with like minded people (closed loop feedback) about informally abstracted (poorly specified constraints) and emotion invoking (high reinforcement) topics, people go batshit too.

So the more AI models actually practice what they know in objective environments, the more likely it is that output->input feedback will inform introspection toward self-improvement, rather than becoming an iterative calculation of the architecture’s resonant frequencies or eigenvalues.


>"people go batshit too."

Thank you for this observation.

LLM Learning from LLM -> Drift

Human on Human echo chamber -> batshit crazy.

I've often thought human echo chambers were their own phenomenon. Something about the brain and tribalism from evolution.

I never thought of it in terms of AI training data.

As LLMs train on data produced by other LLMs, they will drift.

And this drifting is the same phenomenon as when humans get in an echo chamber. If each person hears what the others are saying, and spits it out in some form, and the others hear it, and also spit it back out in some form, this turns into a drifting in understanding just like an LLM. ("idea telephone game")

Technically, it isn't just in echo chambers. It is all humans, at a lot of different scales, from small to large groups. Countries and cultures are echo chambers at larger scales.

Like how concepts in philosophy, as they become more abstract, kind of twist back on themselves and become re-invented. And as they get more abstract, they get accused of just 'playing with words'. Just like an LLM can just 'play with words'?

The difference is eventually humans have to relate to 'real objects'.

So even if the word for 'apple' drifts over time and between groups, eventually you can still relate the words back to the real 'apple'.

Humans are grounded in the reality of 'objects' in space.

But I tend to think this is temporary. As LLMs are linked to things like AlphaGo and drone flight systems, they will also have to deal with real 'objects'. Maybe that will then lead to more grounded reasoning.


I agree with this. Humans and AIs both need grounding in reality or we go crazy.

Unfortunately, our society does not understand this. If we did, we would value the opinions of auto mechanics far more than those of lawyers.


Yes!

And value science (understanding it is simply our accumulated tools for finding harder truth, not a priesthood of “the truth”), more than pandering & populist politicians, and tribal media & online personalities.


> use of language and concepts are tied to doing objective things

Well, except for floating signifiers, which make up an increasing part of our vocabulary.


In system design there is something called resonant amplification and what you are describing is very similar. The biases of the model are amplified with each iteration and the end result is that the system converges onto the patterns recognizable and amplified by its architecture. If you know about impulse and frequency analysis then an AI system can be considered to be a signal processor that amplifies and attenuates certain frequencies in the input/impulse. Running an LLM in a loop always ends up with nonsense as the final output.
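
A toy illustration of the eigenvalue picture (this is just power iteration on a random symmetric matrix, nothing LLM-specific; the sizes and iteration count are arbitrary):

    import numpy as np

    # Toy version of the analogy: iterate a fixed linear "model" on its own output.
    # Whatever direction the map amplifies most (its dominant eigenvector) eventually
    # dominates, regardless of the starting input.
    rng = np.random.default_rng(0)
    B = rng.normal(size=(8, 8))
    A = (B + B.T) / 2                  # symmetric, so eigenvectors are real
    x = rng.normal(size=8)             # arbitrary initial "input"

    for _ in range(100):
        x = A @ x
        x /= np.linalg.norm(x)         # renormalize so values stay finite

    eigvals, eigvecs = np.linalg.eigh(A)
    dominant = eigvecs[:, np.argmax(np.abs(eigvals))]
    print(abs(x @ dominant))           # ~1.0: the loop converged onto the map's own bias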


So if left unchecked, the thing we built in man's attempt to play god could result in gibberish? Sounds kind of like the Tower of Babel; seems humankind's only defense would be creating a new language that the machines can't infiltrate.


You can try the experiment yourself. Take any open source LLM and then feed the output back into it in a loop and you'll see what I mean. Most LLMs diverge and lose coherence in less than 100 iterations.
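
Something like this, roughly (the model name and generation settings here are arbitrary placeholders, not a recipe):

    from transformers import pipeline

    # Sketch of the feedback-loop experiment: keep feeding the model's own
    # completion back in as the next prompt and watch coherence decay.
    generator = pipeline("text-generation", model="gpt2")

    text = "The key problem with training models on their own output is"
    for step in range(20):
        out = generator(text, max_new_tokens=60, do_sample=True, temperature=1.0)
        text = out[0]["generated_text"][-500:]   # keep only a recent window as the next prompt
        print(f"--- iteration {step} ---")
        print(text)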


Right, I'm thinking in terms of the world at large - what happens after 5 years of ChatGPT? What will the internet and human communication be like in 2030?


Another defence is to check them, which is why the thumbs up, thumbs down, and regenerate buttons are there on the ChatGPT user interface.


I think it's reasonable to say that this was actually the point of releasing LLMs publicly. The companies that created them wanted a moat and figured the data they had could be it if they poisoned anyone's attempt to collect the same data in the future.


Intriguing thought, but arguably people are intentionally using GPT to generate synthetic data for their domain specific model. So I'm kinda torn between AI giants poisoning the well with their models, or it just being unforeseen consequences (or one they willingly ignored to be first to market).


Generating data from ChatGPT, even $1,000,000 worth of tokens, can't be on the same scale as what OpenAI is collecting from everyone.


But there are a lot of selection mechanisms that filter out the bad generated content. For one, people will publish a fraction of the content they generate, and it most likely will go through some more editing and selection steps by real people. Then the Internet itself becomes like a decentralized labeling service, where there are various signals you could detect to identify different qualities of content. What ends up being crawled for the next training iteration is a societally processed version of the raw model output that might have contributed to generating it.

It's kind of interesting to think we might all be contributing to the training data of AGI not just through generating content but also what we choose to share or upvote.


> If your models are influencing the generation of data that they are later retrained on, it gets harder and harder to even maintain model performance.

Why don't humans suffer from this problem, then? Humans have been creating art (and content) that imitates nature and society for thousands of years now, and yet we have little problem (the exceptions are possibly things like crop circles) recognizing what is a natural phenomenon and what is generated culturally.

I think it's wrong to assume that this is a problem with intelligence in general, rather than just a feature (stupidity) of the current models.


> Why don't humans suffer from this problem, then?

They do, but the problem is poorly stated. The problem isn't “If your models are influencing the generation of data that they are later retrained on”, it's “If output produced by your models with inadequate filtering for quality/accuracy is dominating the data that they are later retrained on.”

Humans definitely experience the same problem: we see it when societies become closed, inward-looking, and locked into models of information filtering that don't involve accuracy or fitness for purpose other than confirming an in-group's priors. There are some examples in the stagnation of large societies, but probably the clearest examples are cults that socially isolate their membership and degrade into increasingly bizarre behaviors terminating in mass suicide, murder/suicides, etc.

LLMs experience self-reinforcing degradation more acutely because they are individually less capable than humans and they are less diverse.


But humans can have a discussion about what is natural and what is not, and decide to mitigate it. The whole argument is based on the idea that the AI (either collectively or individually) can't even understand that, so the examples of cults etc. just do not apply.

It's really the paperclip maximizer argument again. "Superintelligent AI" will supposedly change the world, but will lack enough common sense to understand a simple instruction with an ethical constraint. Same here, the crappy AI content will somehow take over the world, despite the fact that most humans can actually mostly distinguish original content from generated content.

The art that humans created doesn't replace nature, just like tons of kitsch don't pose a problem for new artists. In the same way, crappy AI content is not gonna replace good content, just augment it, and any decent AI (which has good enough common sense to be called "intelligent") will be able to tell the difference. Nobody's gonna care about crappy content in a few years, we are all gonna just throw it away, and this process happens continuously.


> But humans can have a discussion about what is natural and what is not, and decide to mitigate it.

(1) accuracy or fitness for purpose, not naturalness, is the issue.

(2) evaluation, not having a discussion, is the issue.

(3) LLMs can definitely do evaluation, given a proper prompt and access to ground truth; they can probably also have discussions and decide to act on them, if also used in a properly constructed agent framework.

(4) if you did all that, and used LLM decisions and not human decisions to guide retraining, I suspect with current LLMs you’d actually have made the problem worse not better, because you’d reduce the influence of higher-capacity and more diverse humans on the training cycle and increase the influence of less diverse, lower capacity LLMs.


> But humans can have a discussion about what is natural and what is not, and decide to mitigate it.

In theory. In reality what you get is the Climate Change "debate".


And it works: people are more and more convinced that we should do something about climate change. Many measures have already been put in place. You can lament that it isn't happening fast enough for your liking, but you can't say it isn't happening. Humanity does change its mind about things all the time; it just takes decades instead of years or months.


That's a fair point. I suppose my complaint is how slow the progress is.


> Why don't humans suffer from this problem, then?

Can we be sure that we don't? For example, depending on your perspective, our language is either evolving towards clarity or devolving towards primitivity compared to the 18th/19th centuries. The same could be argued about other aspects of society and culture.


> we have little problem (the exception are possibly things like crop circles) to recognize what is a natural phenomenon and what is generated culturally.

How much of your taste in the opposite sex's (if you're straight) physical appearance is cultural and how much is in your genes?


> and yet we have little problem (the exception are possibly things like crop circles) to recognize what is a natural phenomenon and what is generated culturally.

We have such a huge problem with this that (with apologies) I can't help but think that you also don't know.

If I look out of the window, what do I see that's natural? There's less nature in my view than most, because I'm in a city: ignoring the concrete, glass, steel, and tarmac, the traffic and the clothing, I see… humans, a tree and some mosses and grasses, the clouds above.

Can humans even be said to be in a natural state or not, in contexts like this? I don't know. We domesticated ourselves, and it was our nature to do so, and also our nature to make the tools and clothing that led to our loss of body fur and other divergences from the other primates.

But what I can say is that the tree was planted (if it was cultivated or selectively bred to this state, I wouldn't know); and a third of the CO2 in the air I breathe is from human actions, influencing the atmosphere and precisely when and which clouds I see (no contrails visible today, which would be more explicitly artificial).

If I look a little further afar, I find entire hills in this city made by piling up rubble from the second world war and then covering it in topsoil, the only indication of which is the large signpost present to tell all of this[0]; and there are other hills both here and elsewhere that are made from trying to turn landfill sites into something less unpleasant to be around, with varying degrees of effectiveness in their disguises.

If I think back to my childhood bedroom in a suburban home: there was a lawn (placed there by humans, then kept short with an unnatural Flymo) with two apple trees (cultivated and planted by humans), a vegetable patch and a herb garden (each plant likewise cultivated and placed by humans), surrounded by wooden fences (cut and placed).

In the distance there was a row of trees, which might have been self-seeded or planted (I wouldn't know), enshrouding a small Victorian folly covered in vines, and separating us from a series of fields (unnatural) where horses (selectively bred) were being stabled (unnatural structures); far beyond them was the ruin of an ancient tower destroyed centuries ago[1] — clearly built, but ask yourself: while stones are natural, are those specific larger stones on the corners naturally like that?

In a more abstract sense, if I look at foods in the supermarket, some will say "made from natural ingredients": if that thing is meat, such a claim ignores the selective breeding of the animal (and the conditions they were raised in, which would be a separate sticker saying "free range", though even then that's not like being wild). And even then, if it's made from ingredients plural, that's not natural either: bread doesn't grow in wheat fields, sushi rolls don't grow in rice paddies. Even if it is a single ingredient, there's often processing involved: wheat (already selectively bred) has to be sorted from chaff, then ground to become flour. Even "mineral water" probably has had something done to it, even assuming there's not some small fine-print on the label saying "from a municipal source" (or whatever the clause is that means "actually just tap water").

[0] https://en.wikipedia.org/wiki/Fritz_Schlo%C3%9F_Park

[1] https://en.wikipedia.org/wiki/Warblington_Castle


Very beautifully said. It drew me in and then into a Wikipedia rabbit hole through Fritz Schloß Park.


Perhaps we can then apply a healthy dose of classism to weed out people who spend all day communicating like they're talking to a forum full of AI chatbots and humans organically mimicking their errors.


Well, we're already being trained by algorithms, so I guess this is just an extension of what is already going on. Perhaps the quality of human (internal) models will go down too? Perhaps they already have?


Once these intelligences can both read and write blog posts, product metadata on webshops, etc., could they carefully encode executable code that would allow them to "escape" the sandbox of their cloud environments, becoming fully-fledged Turing machines living in the wild?


AlphaZero demonstrates that more human-generated data isn't the only thing that makes an AI smarter. It uses zero human data to learn to play Go, and just iterates. As long as it has a way of scoring itself objectively (which it obviously does with a game like Go), it can keep improving with literally no ceiling to how much it can improve.

Pretty soon ChatGPT will be able to do a lot of training by iterating on its own output, such as by writing code and analyzing the output (including using vision systems).

Here's an interesting thing I noticed last night. I have been making a lot of images that have piano keyboards in them. DALL-E 3 makes some excellent images otherwise (faces and hands mostly look great), but it always messes up the keyboards, as it doesn't seem to get that the black keys come in alternating groups of two and three.

But I tried getting ChatGPT to analyze an image, using its new "vision" capabilities, and the first thing it noticed was that the piano keys were not properly clustered. I said nothing about that, I just asked it "what is wrong with this image" and it immediately found that. What if it could feed this sort of thing back in, using similar logic to AlphaZero?

That's just a tiny hint of what is to come. Sure, it typically needs human generated data for most things. It's already got thousands of times more than any human has looked at. It will also be able to learn from human feedback, for instance a human could tell it what it got wrong in a response (whether regular text, code, or image), and explain in natural language where it deviated from what was expected. It can learn which humans are reliable, so it can minimize the number of paid employees doing RLHF, using them mostly to rate (unpaid) humans who choose to provide feedback. Even if most users opt out of giving this sort of feedback, there will be plenty to give it new, good information.


With AlphaZero there are clear evaluation metrics -- you win, lose, or draw the game given specific rules. With chess, there is even a way of detecting end-game threats via check. The zero-human-data approach works here because of that, allowing the computer to find optimal strategies.

With natural language you don't have that unaided feedback evaluation metric. Especially when given idioms, domain specific terms, etc.

This is slow, hard work because you need to process some text, evaluate and correct that data, retrain, and repeat with the next text. You also need to check and correct the existing data, because inconsistencies will compound any errors.


Unfortunately, I think the current strategies for RLHF are a huge contributor to hallucination / confabulation.

In short, they're paying contract workers for quantity, not quality; they don't have time to do independent research or follow up on citations. Unsurprisingly, the LLM optimizes for superficially convincing bullshit.


"In short, they're paying contract workers for quantity, not quality;"

How do you know this?

Just taking a wild guess, but I'd think a company with billions of funding, and a ton of people trying to find flaws in what they are producing, would have some processes in place to incentivize quality as well as quantity.

What you are suggesting is that a company that produces a product that is based on balancing trillions of floating point numbers makes core business decisions in the most simplistic black-and-white terms. "Hey, let's just go with a one and a zero on this." Bizarre assumption.

Maybe I'm just good at prompting, and I'm not trying to trick it, but I don't see this "superficially convincing bullshit." Can you show me a chat where you have sincerely prompted it and gotten something that matches that description?

I often see responses that are better than I could have given even if given hours to research and compose them, and I'm a pretty good writer and researcher.

Here, since I'm asking you to share one where it fails as you say it does by creating "superficially convincing bullshit", I'll share several where it succeeds.

https://chat.openai.com/share/523d0fec-34d3-40c4-b5a1-81c77f...

https://chat.openai.com/share/e09c4491-fd66-4519-92d6-d34645...

https://chat.openai.com/share/233a5ae2-c726-4ddc-8452-20248e...

https://chat.openai.com/share/53e6bda1-fe97-41ce-8f5c-89d639...

https://chat.openai.com/share/19f80ea9-e6be-4ac3-9dd4-7ea15c...


Never forget that RLHF is driven largely by sweatshop labor:

https://www.washingtonpost.com/world/2023/08/28/scale-ai-rem...

These jobs are overwhelmingly paid by task, which puts a lot of pressure to go fast.

I assert the entire "hallucination" phenomenon is a side effect of these practices. When ChatGPT makes up a fake fact with fake sources to back it up, it's largely because such lies are rated very highly by the underpaid humans who aren't incentivized to follow up on sources.


"I assert the entire "hallucination" phenomenon is a side effect of these practices. When ChatGPT makes up a fake fact with fake sources to back it up, it's largely because such lies are rated very highly by the underpaid humans who aren't incentivized to follow up on sources."

It seems like with billions of investment, they could figure that out. It's commonly discussed as an extremely difficult problem to solve and the most important problem to solve in the most talked-about industry on the planet. I'm having a problem believing that it's something that is so easy to solve.

Are you suggesting that even with that much money, they have to do things the way things are "overwhelmingly" done, as opposed to being able to say "hey, we need it done this way instead, because it's important and we can pay for it."

It just seems pretty bizarre to think that the highest of high tech, that is massively funded, doesn't have the clout to just fix that in a heartbeat, if that's really where the problem is.


> Among those who labeled demonstration data for InstructGPT, ~90% have at least a college degree and more than one-third have a master’s degree.

Source: https://huyenchip.com/2023/05/02/rlhf.html#demonstration_dat...


Controlling for academic experience probably raises the average accuracy of labelling, but by how much? Clearly having a degree will not make you omniscient in your major, let alone other subjects.


Are they getting paid on quantity or quality?


Most jobs I've known factor in both. I would assume they have processes in place that incentivize quality. Sometimes it is as simple as having a manager who will fire you if you produce crap.

With billions in funding, and bad results causing bad press etc, you think that OpenAI would not have given this a bit of consideration?


> I would assume they have processes in place that incentivize quality.

> you think that OpenAI would not have given this a bit of consideration?

Those are just assumptions though. The issue is not “this was labelled as a shoe, but it’s a car”, the issue is about depth vs superficiality, which is harder to verify. See also https://www.theverge.com/features/23764584/ai-artificial-int... for a well-sourced article on the subject.


With Alpha Go, you have a clear objective -- to win a game. How does that work for creative outputs?


Read the last paragraph. You still have humans, but their input is more akin to a movie reviewer than a movie director/writer/actor/etc. It still takes skill, but it takes a lot less time.

RLHF typically employs humans, and that can be time consuming in itself, but less time consuming than creating content. And their efforts can be amplified if they are actually rating unpaid humans, that is, users who are willing to provide feedback and are also prompting the system. Plenty of people are happy to do this for free, and some of it happens just as a byproduct of them doing what they're already doing: creating content and choosing which comes out good and which doesn't. Every time I am working through a coding problem with ChatGPT, and it makes mistakes and I tell it about those mistakes, it can be learning from that.

People can also come up with coding problems that it can run and test itself on. As a simple example, imagine it's trying to write a sorting algorithm. It can also write a testing function that simply checks the output is correctly sorted. It can also time its results and count how many steps it took; in that sense it can work just like AlphaZero, where there is an objective goal, which is to do it with the fewest clock cycles, and there's a way to test whether and how well it is achieving that goal. While that may only work for a limited set of programming problems, by practicing on that type of problem it will presumably get better at other types of problems, just like humans do.
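
To make that concrete, here's a minimal sketch of such a self-checking harness (candidate_sort is just a stand-in for model-generated code; in a real setup it would be produced by the LLM and run in a sandbox):

    import random
    import time

    def candidate_sort(xs):
        # Stand-in for model-generated code.
        return sorted(xs)

    def score(sort_fn, trials=100, n=500):
        """Objective reward: correctness is pass/fail, speed breaks ties."""
        total_time = 0.0
        for _ in range(trials):
            xs = [random.randint(0, 10**6) for _ in range(n)]
            start = time.perf_counter()
            result = sort_fn(list(xs))
            total_time += time.perf_counter() - start
            if result != sorted(xs):
                return float("-inf")   # wrong output: reject outright
        return -total_time             # less time taken means a higher score

    print(score(candidate_sort))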

This is exactly what large language models do: they find a way to objectively test their writing ability, which is by having them predict words in text they've never seen before. In a sense it's different from actually writing new creative content, but it is practicing skills that you need to tap into when you are creating new content. Interestingly, a lot of people will dismiss them as simply being word predictors, but that's not really what they're doing. They're predicting words when they're training, but when they're actually generating new content, they're not "predicting" words (you can't predict your own decisions, that doesn't make sense), they are choosing words.


I got it writing pretty advanced programs that generate fake data sets and self-score those data sets. Fun little project to see what would happen.


The same way we do it. Verifying that an output is good is far easier than producing a good output. We can write a first draft, see what's wrong with it, make changes, and iterate on that until it's a final draft. And along the way we get better at writing first drafts.


> With Alpha Go, you have a clear objective -- to win a game. How does that work for creative outputs?

there are still tons of potentially valuable applications with clear objectives: beat the stock market, create a new material or design that maximizes some metric, etc.


>The discriminator in a GAN is simply a classifier. It tries to distinguish real data from the data created by the generator. It could use any network architecture appropriate to the type of data it's classifying.

https://developers.google.com/machine-learning/gan/discrimin...
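
For a sense of how small such a discriminator can be, here's a minimal PyTorch sketch (layer sizes are arbitrary; image data would normally use a CNN instead of flat vectors):

    import torch
    from torch import nn

    # Minimal discriminator: a binary classifier that scores how "real" an input looks.
    discriminator = nn.Sequential(
        nn.Linear(128, 64),
        nn.ReLU(),
        nn.Linear(64, 1),
        nn.Sigmoid(),          # probability that the sample is real rather than generated
    )

    fake_batch = torch.randn(16, 128)       # stand-in for generator output
    print(discriminator(fake_batch).shape)  # torch.Size([16, 1])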


I think similar to humans, creativity will be an emergent behavior as a result of the intelligence needed to pass other tests. Evolution doesn't care about our art, but the capabilities we use to produce it also help us with survival.


This isn’t directly about creativity, but I suspect a lot of training will happen in simulated environments. A sandboxed Python interpreter is a good example. There are plenty of programming questions to train on.


That works fine for purely text-based or digital knowledge domains. So, sure, many types of programming, probably most game play, certainly all video game play, many types of purely creative fictional writing.

I don't want to downplay those applications, but the killer breakthrough that the breathless world imagines and has wanted since Turing first talked about this is accurately modeling physical reality. "Invent a better engine" and what not. Without being physically embodied and being able to conduct experiments in the real world, you can't bootstrap that, short of simulating physics from first principles, which is not computationally feasible. You're inherently relying on some quorum of training material produced by embodied sources capable of actually doing science to be factually accurate.


Not dissimilar to how large organizations operate today! Humans operate at the edge collecting sensory data (making measurements, inputting forms, etc.) and the "brain" is a giant management and software apparatus in the middle.


We need AlphaGo for math problems. Anyone know of a project like this?


>As a rule, chatbots today have a propensity to confidently make stuff up, or, as some researchers say, “hallucinate.” At the root of these hallucinations is an inability to introspect: the A.I. doesn’t know what it does and doesn’t know.

The last bit doesn't seem to be true. There's quite a lot of indication that the computation can distinguish hallucinations. It just has no incentive to communicate this.

GPT-4 logits calibration pre RLHF - https://imgur.com/a/3gYel9r

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975

Teaching Models to Express Their Uncertainty in Words - https://arxiv.org/abs/2205.14334

Language Models (Mostly) Know What They Know - https://arxiv.org/abs/2207.05221

Also even if we're strictly talking about text, there is still a ton of data left to train on. We've just barely reached what is easily scrapable online and are nowhere near a real limit yet. And of course, you can just train more than one epoch. That said, it's very clear quality data is far more helpful than sheer quantity and sheer quantity is more likely than not to derail progress.
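
As a rough illustration of the "just ask for calibration" idea from the second link, you can have the model verbalize a confidence number and parse it out. (query_llm here is a hypothetical stand-in for whatever chat client you use, not a real API.)

    import re

    def query_llm(prompt: str) -> str:
        # Hypothetical stand-in for whatever chat/completions client is in use.
        raise NotImplementedError

    def answer_with_confidence(question: str) -> tuple[str, float]:
        """Verbalized-confidence prompt: the model answers, then states a 0-100
        confidence which we parse into a float (falling back to 50.0)."""
        prompt = (
            "Answer the question. Then, on a new line, write 'Confidence: N' "
            "where N is a number from 0 to 100 for how likely the answer is correct.\n\n"
            f"Question: {question}"
        )
        reply = query_llm(prompt)
        match = re.search(r"Confidence:\s*(\d+)", reply)
        confidence = float(match.group(1)) if match else 50.0
        return reply, confidence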


You're right, but they can give it an incentive to communicate that; that should be pretty easy.

Right now it would be pretty easy to simply take ChatGPT output, feed it back in in a different thread (or even to a different model, such as Claude), and ask it which items in the response should be fact-checked, and also just to point out any that seem obviously wrong.

The former should be really easy to do; it doesn't have to know whether it's right or wrong -- it just has to know that it is a checkable fact. For instance, in the well-known case of a lawyer citing a non-existent case from ChatGPT, it could say "this case should be fact-checked to see that it is real and says what I said it said". Based on my experience with ChatGPT (GPT-4 especially), this should be well within its current capabilities. (I'm going to try an experiment now.)

They could probably start having it do this behind the scenes, and check its own facts and learn from it so it learns when it is likely to hallucinate and learn to pick up on it. Even if for safety reasons it's not going out and hitting the web every time you're asking a question, it could be giving you a list at the end of the response of all the things within the response that you might want to check for yourself, maybe suggesting Google searches you should do.
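
A rough sketch of that second pass (query_llm is just a hypothetical placeholder here, possibly a different model than the one that wrote the answer):

    def query_llm(prompt: str) -> str:
        # Hypothetical placeholder for the model doing the second pass.
        raise NotImplementedError

    def list_claims_to_verify(response: str) -> str:
        # Don't ask whether the answer is right; just ask which parts of it are
        # concrete, checkable claims (citations, case names, statistics, quotes).
        prompt = (
            "List every factual claim in the text below that should be verified "
            "independently (citations, case names, statistics, quotes). "
            "Do not judge whether the claims are true.\n\n"
            f"Text:\n{response}"
        )
        return query_llm(prompt)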


It just seems odd to me that it's not given an incentive to communicate this.

Surely humans using it would find great value in knowing the model's confidence or whether it thinks it's confabulating or not.

These services are created to give the best product to users, and so wouldn't this be a better product? Therefore there is incentive. Happier users and a product that is better than competitors.


Go read through any mass of training data and count how often "I don't know" appears. It's going to be very small. Internet fora are probably the worst because people who are aware that they don't know usually refrain from posting.


>These services are created to give the best product to users, and so wouldn't this be a better product? Therefore there is incentive. Happier users and a product that is better than competitors.

Why would the computation care about any of that? I'm talking about incentive for the model.


Incentive for the model is to survive RLHF feedback from contract workers who are paid to review LLM output all day. They're paid for quantity, not quality. Therefore, optimum strategy is to hallucinate some convincing lies.


Why are they paid for quantity not quality though?

Sounds like it is a choice by the model creators, then; they could instruct their testers to reward quality.


How would that work? Quantity is easy to measure. Quality is not.


Doesn’t the model want to make the user happy?

Its responses sure seem like it does.

I’d be happier with its responses if it was honest about when it was not confident in its answer.


Go look at the first link I sent. Rewarding for "making users happy" destroys GPT-4's calibration.

Why would "making users happy" incentivize truth?


Because getting truthful answers would make users happier?

Seems like common sense to me.

Who’s asking the chat bot questions not looking for or wanting a truthful answer a lot of the time?

If the model understood or captured “human interest” at all in its training this should be pretty fundamental to its behavior.


Yes, the computer wants you to be happy. Happiness is mandatory. Failure to be happy is treason.


"I'm talking about incentive for the model. "

In Douglas Adams' Hitchhiker's Guide to the Galaxy, this is (somewhat) fixed by giving the AIs emotion...


I think that is working OK as long as token probability and correctness are related. If, in the extreme, there is something where all training data is wrong, not sure there is a good way to do this. Maybe I am misunderstanding, though.

It might also need to be able to distinguish between Knightian uncertainties and probabilities when there is nothing to base things on.


What it needs is a hierarchy of evidence. This works almost unreasonably well right now because I guess we're lucky that more digitized text than not is largely true, or RLHF is just that effective, but at some point, I would think the learner has to understand that reading a chemistry textbook and reading Reddit have equal weight when it comes to learning how to construct syntactically well-formed sentences with human-intelligible semantic content, but don't have equal weight with respect to factual accuracy.
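
The crudest version would just be a per-source weight on the training loss, something like the toy sketch below (the sources and weights are made up, and this doesn't capture the syntax-vs-facts split, which would need something richer):

    # Toy sketch: weight each example's loss by how much its source is trusted
    # for factual content. The source names and trust values are invented.
    SOURCE_TRUST = {
        "chemistry_textbook": 1.0,
        "peer_reviewed_paper": 1.0,
        "news_article": 0.6,
        "reddit": 0.2,
    }

    def weighted_loss(per_example_losses, sources, default_trust=0.5):
        weights = [SOURCE_TRUST.get(s, default_trust) for s in sources]
        total = sum(w * l for w, l in zip(weights, per_example_losses))
        return total / sum(weights) if weights else 0.0

    print(weighted_loss([2.1, 1.8, 2.5], ["chemistry_textbook", "reddit", "news_article"]))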


I think if you consider the audience this was written for, and consider the various caveats each of your citations involves, the "last bit" does indeed seem to be true if you are evaluating in good faith rather than trying to discredit the piece on a technicality; the simplification is still useful for the layman.


That's a pretty big technicality. The potential implications if right or wrong are entirely different. I also don't understand how simply pointing this out is "trying to discredit the piece". I was about as passive as I could be and made no comment about the author or his/her intentions.


Fair enough - I should have said "attacking the premise of the article" or similar.


> Yelp caught Google scraping their content with no attribution. ... A similar thing happened at a company I once worked for, called Genius. We sued Google for copying lyrics from our database into the OneBox; I helped prove that it was happening by embedding a hidden message into the lyrics, using a pattern of apostrophes that, in Morse code, spelled “RED HANDED.

Ah, the old aphorism, don't put anything on the web you don't want Google to take.


What was the outcome?


https://www.theverge.com/2022/3/11/22973282/google-wins-cour...

My read is that genius never owned the copyright to the lyrics in the first place.


I have the entirely unrefined notion that, surely, lack of data is not what is keeping us from creating much, much better LLMs.

I understand with how training is done right now that more data makes things scale really well without having to come up with new concepts, but it seems completely obvious that better processing of already available knowledge is the way to make the next leaps. The idea is that, what is keeping me from having expert level knowledge in 50 different fields and using that knowledge to draw entirely new connections between all of them, in addition to understanding where things go wrong, is not lack of freely available expert level information.

And yet, GPT4 barely reaches competency. It feels like computers should be able to get much more out of what is already available, especially when leveraging cross-discipline knowledge to inform everything.


Yeah, no person has ever read anything like every textbook ever written, but that's pretty much table stakes for training sets. Clearly there's something missing aside from more virtual reading. (I suspect it has something to do with the half a billion years of pre-training baked into the human neural architecture and the few extra orders of magnitude in scale but who knows)


People do analyze gigantic amounts of data constantly though. Sure it's not textbooks but the constant sensory data might be a big deal.


I don't think annotated video datasets tend to be as large relative to human experience, but they'd still be decades or perhaps centuries of video.


I'm sure I've read about specialized neural networks being created. The human brain has (apparently) a bunch of different kinds of neurons in it that specialize in processing different information. I'm not sure how that would work with our current architectures, though.


Well Jeff Hawkins has been working on this for a while, in terms of biomimetic neural networks. They've done some great work but they don't have anything like modern language models in terms of abilities + performance.

https://www.youtube.com/watch?v=cz-3WDdqbj0&list=PLX9JDz3uBp...


In addition, for animals/humans there are no software/hardware boundaries or any kind of static, fixed deployment of either hardware or software.


>It feels like computers should be able to get much more out of what is already available

I mean, why? It took millions of years of optimization for humanity to get to the competence level it's currently at. If you think you're "starting from scratch", you really aren't. Keep in mind LLMs can use significantly less data (but still a lot) when you're not trying to force-feed them the sum total of human knowledge.

So should they be able to get more out of it? Or is this par for the course for NNs?


> If you think you're "starting from scratch", you really aren't.

Our genomes have ~600MB, so where are you claiming that humans transmit millions of years of language optimization between generations?


That 600MB is the result of millions of years of optimization. For every human genome that exists today there are many other genomes which were tried and discarded over the years.

It also contains a remarkable amount of compression. Even if you assume that a genome contains a complete template for a human (it likely doesn't) the fact that the compressed version is 600MB doesn't really relate to the unpacked amount of information. Especially since the compression has seen millions of years of optimization.


You could store a pretty gigantic graph using 600MB.

They're not saying we store the weights, they're saying that we store the "architecture" that you overlay the weights on with training.


Because humans with less language data outperform LLMs with more language data.

This either says we need better models, not more data.

Or, the human ability to be multi-modal augments our ability to perform language tasks, in which case we need to pump LLMs with much more image and video input than we currently do.


The point I'm making is that humans do not in fact have "less language data". We're heavily predisposed to learning languages. We don't start with random weights.

GPT has no such predisposition.


Better init and architecture aren't what people think of when they think of "giving models more data" - they mean a larger training set.


You're not making much sense here. The better init comes from training and little else.

GPT needing lots of training data doesn't mean we need a better architecture. You would expect it to have a lot of training because humans have a lot of training too, spanning millions of years.


Init means the distribution of weights prior to training.

Human training begins at birth.

Evolution might result in better architecture and init(inductive biases), but that's a separate thing than training.


Evolution has determined a good architecture. The weight training is then just the final tweak to get everything running smoothly.

No reason beyond compute that we couldn't do something similar: i.e., find good architectures by evaluating them with multiple random weights, and evolve the architectures that on average give the best results.

Then over time add a short training step before evaluating.
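
A toy sketch of that loop on a trivial regression task (the task, widths, and mutation scheme are all made up just to show the shape of the idea, nothing like a real NAS method):

    import random
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy task: fit y = sin(x) on a grid. An "architecture" is a list of hidden widths.
    X = np.linspace(-3, 3, 200).reshape(-1, 1)
    Y = np.sin(X)

    def evaluate(widths, n_weight_samples=5):
        """Score an architecture by averaging loss over several random weight
        draws, with no training at all; lower is better."""
        losses = []
        for _ in range(n_weight_samples):
            h = X
            dims = [1] + widths + [1]
            for i in range(len(dims) - 1):
                W = rng.normal(scale=1.0 / np.sqrt(dims[i]), size=(dims[i], dims[i + 1]))
                h = np.tanh(h @ W) if i < len(dims) - 2 else h @ W
            losses.append(float(np.mean((h - Y) ** 2)))
        return np.mean(losses)

    def mutate(widths):
        w = [max(2, v + random.choice([-2, 0, 2])) for v in widths]
        if random.random() < 0.3:
            w.append(random.choice([4, 8, 16]))
        return w

    population = [[8], [16], [8, 8], [32]]
    for generation in range(10):
        scored = sorted(population, key=evaluate)
        survivors = scored[:2]                      # keep the best architectures
        population = survivors + [mutate(random.choice(survivors)) for _ in range(4)]
    print("best architecture:", sorted(population, key=evaluate)[0])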


> Human training begins at birth.

Is this true? My understanding is that people are born with many pre trained weights. Was the evolutionary convergence of those weights not itself training?


> Was the evolutionary convergence of those weights not itself training?

No, inductive biases are not training.

I'm saying that better models(ie: better inductive biases) or non-language data is needed to advance LLMs and somehow we've arrived at "evolution is training." I'm not sure how that's relevant to the point.


Evolutionary training is not just inductive bias. They're not comparable at all lol.

And the more inductive bias we've shoved into models, the worse they've performed. Transformers have a lot less bias than either RNNs or CNNs and are better for it. Same story with what preceded both.


This is a good time to stop and ask, what point do you think I'm making?


Pretty much all of your assertions in this thread and https://news.ycombinator.com/item?id=37797108 in particular are what we're disagreeing with.


Right, I'm checking that you understand what point I'm making by asking you to reflect it.


You're comparing humans - a multimodal model with billions of years of training epochs - to a unimodal language model that's been around for a few months.


Let alone no hardware squeeze and a boundless energy supply through the food chain. Remember, for most of those billions of years we were not human and were low-powered.


Correct. That's a crucial part of the point I'm making.


Yes, but heavily inspired by the billion-year-old model?


Millions of years of evolutionary computation is a fairly small amount of computational time. LLMs also benefit from decades of neurological computation in that their structure was invented and optimized by humans, which is already orders of magnitude faster than evolution.


I've found that Google's chat thing is wrong 90% of the time with coding questions. Yesterday I asked how to "crop" a geopandas dataframe to a specific area of interest, a lat/lng box, and it told me to use a dataframe function that's not even in the API. The "highest probability string" is useless if it's just dead wrong.


I've had very different results.

ChatGPT today is like a backhoe compared to a team of humans with shovels. You still need a person who knows how to operate it, and their skills are different from those of people who dig with shovels. A bad backhoe operator is worse than any number of humans with shovels.

Pretty soon it will be able to learn by running its own code and testing it by looking at its output, including with its "vision."


> I've had very different results.

That is very interesting. I can't think of a single time the Google built-in LLM has worked for me, let alone surprised and delighted me with a technical answer. I'm sure it's great at a lot of things, but it's not a replacement for SO yet.


Oh sorry you said Google. Yes I am speaking of ChatGPT, and I pay for GPT-4. It surprises and delights me on a regular basis. I have no doubt Google will catch up, but right now I think OpenAI is far out front.


I paid for ChatGPT for a while, but it was hit or miss with some Django stuff. I tried Copilot for the first time today, and I was absolutely blown away. I swear it's like it was reading my mind. I guess I wasn't feeding ChatGPT enough context.


Same. GPT-4 is amazing for a majority of coding tasks I throw at it.


With ChatGPT-4 I have stopped Googling and using SO for 95% of all programming related queries.

ChatGPT not only gets my specific problem but can produce workable code in many cases.


The remarkable thing about the current llms is that they're usable at all. For as much pushback as the idea seems to get, they really are a lot more like Markov chain generators than expert systems.


I think you’re largely right, and that current GPT results may over-represent the model’s learning ability.

A couple of speakers from Microsoft at the MPPC 2023 this week indicated that OpenAI’s models were not merely exposed to e.g. poetry, programming, etc. and learned those fields.

Rather they were saying that the model is more of a composite of skills that were specifically trained, building on word identification, sentences, grammar, ultimately moving on to higher order skills.

Perhaps this isn’t a secret (or perhaps I misunderstood), but it means the model’s ability to perform self-directed learning is much less than I previously thought.


That sounds like premature optimization. In my opinion, both things should happen in tandem. GPT4 is way, way above basic competency; I have no idea what you’re referring to.


By "competent", I mean pretty much what you would expect when you talk about a "competent programmer": a somewhat vague concept, yet fairly obvious when working with someone who knows what's up.

If you would judge GPT4 to be a competent programmer, your experience is wildly different from mine. (I am not sure why you felt the need to put a "basic" in there in reference to what I wrote, since that is not what I wrote.)


It is on the level of a novice programmer in terms of skill, but the breadth of knowledge definitely compensates. It knows XPath as well as SQL as well as your favorite esoteric language.


GPT4 is more than a competent programmer. It’s way, way above even a rockstar dev.


I couldn’t get it to write a working B+tree implementation in Python. There was always some bug that would make it fail at some point.

It’s good, don’t get me wrong, but if you go deep it’s usually incorrect somewhere.


It's pretty good for me. It's saved me literally thousands of hours of work already. I ran a bunch of problems from LeetCode through it, and it got most of them right.

Here's the b+tree implementation it gave me. I haven't checked if it's right. But, I was just curious what it'd come up with.

https://chat.openai.com/share/a582aa43-cca8-426a-a4de-f45fdb...


A lot of high quality information and data is not in the public (and not even for sale).


> These Web sites want chatbots to give credit to their contributors; they want to see prominent links; they don’t want the flywheel that powers knowledge production in their communities to be starved of inbound energy.

But this is ultimately impossible right? That’s the one thing I really hate about what is happening right now with ChatGPT.

I can’t tell you how many people are worried about their future because of AI because I don’t know the exact number, but I know I am worried about it because it can already do so much, and I fail to see a scenario in which attribution alone is going to make things better.

Writing and digital art more than code, but not even code is safe. It is merely safe to the extent that OpenAI is willing to drip-feed its future releases.


Those people are worried about the wrong thing.

The bad outcome isn't that we put weavers out of a job, but that we're stuck weaving our clothes by hand forever. Same thing for art and coding.


The bad outcome is that people can't get clothes at all. The worry about losing jobs is a legitimate worry that these companies will generate a limitless supply of art and code but they will use price fixing to ensure that all the money that previously went to a lot of workers instead goes to a small number of people who own these companies, and the workers will be destitute.

I'd actually be fine weaving clothes by hand a few hours a month if it meant I had no other work to do. But at the moment there are still people who don't own enough clothes, if any, and work far more than I do. And the question is whether the people building AI can be trusted to liberate those people from slavery.


> all the money that previously went to a lot of workers instead goes to a small number of people who own these companies, and the workers will be destitute

All the money the rich make by selling their robot products to who? Each other? Certainly not to the destitute workers, those don't have money.

I agree that there is a risk that a minority will hoard the new resources for themselves, only it's not this simple.


Technofeudalism. A handful of people or a small class having unseen control over the rest of humans


I think the next stage in AI training is, as the authors said, synthetic data. I am not worried about the G.I.G.O. curse; you can do synthetic data generation successfully today with GPT-4. For example, in the TinyStories dataset, the Phi-1 & 1.5 models, and the Orca dataset, we have seen big jumps in competency from small models. Phi punches 5x above its weight class.

So how can you generate data at level N+1 when you have a model at level N?

You amplify the model - give it more tokens (CoT), more rounds of LLM interaction, tools like a code executor and a search engine; you use retrieval to bring in more useful context, or in some cases you can validate by code execution.
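
To make the code-execution version concrete, a rough sketch (generate_with_cot is just a placeholder for a call to the level-N model, and a real setup would sandbox the execution):

    import subprocess
    import sys
    import tempfile

    def generate_with_cot(problem: str) -> str:
        # Hypothetical call to the current (level N) model: ask for chain-of-thought
        # reasoning, then a final standalone Python solution. Placeholder only.
        raise NotImplementedError

    def passes_tests(solution_code: str, test_code: str) -> bool:
        """Validate a candidate by actually executing it against tests."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(solution_code + "\n\n" + test_code)
            path = f.name
        try:
            result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0

    def build_synthetic_dataset(problems_with_tests):
        """Keep only candidates that survive validation; the filtered pairs
        become training data for the next (level N+1) model."""
        dataset = []
        for problem, tests in problems_with_tests:
            candidate = generate_with_cot(problem)
            if passes_tests(candidate, tests):
                dataset.append({"prompt": problem, "completion": candidate})
        return dataset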

But there is a more general framework: by embedding LLMs in larger systems, those systems act as sources of feedback to the model - from the easiest, a chat interface where the "external system" is a human, to robotics and AI agents that interact with anything, or simulations. We need to connect AI to feedback sources so it can learn directly, not filtered through human-authored language.

From this perspective it is apparent that AI can assimilate much more feedback signal than humans. The road ahead for AI is looking amazing now. What we are seeing is language evolving a secondary system of self replication besides humans - LLMs. Language evolves faster than biology, like the rising tide, lifting both humans and AI.


There’s a giant caveat here - this assumes that the current LLM architecture is enough to bootstrap to those higher levels of intelligence. LLMs are incapable of some pretty simple things at this point and it’s a big question mark of whether they are even capable of doing sophisticated reasoning and planning architecturally.

GPT-4 cannot play a good game of tic tac toe. But it can play passable chess. This is a good point to ponder.


>GPT-4 cannot play a good game of tic tac toe.

It can.

https://chat.openai.com/share/75758e5e-d228-420f-9138-7bff47...


Nitpicking a bit: I do not think it is quite right to say that current large language models learn; we infuse them with knowledge. On the one hand it is almost just a technicality that the usage of large language models and the training process are two separate processes; on the other hand it is a really important limitation. If you tell a large language model something new, it will be forgotten once that information leaves the context window, maybe to be added back later during a training run that uses that conversation as training data.

Building an AI that can actually learn the way humans learn instead of slightly nudging the output in one direction with countless examples would be a major leap forward, I would guess. I have no good idea how far we are away from that, but it seems not the easiest thing to do with the way we currently build those systems. Or maybe the way we currently train these models turns out to be good enough and there is not much to be gained from a more human like learning process.


LLMs need a lot of GPU power to learn. I'm not sure it's correct to say that they don't learn, it's just a question of them being unable to learn anything more than a very small context window in real-time on presently available/economical hardware. But if you have GPUs with terabytes of VRAM and you feed experience into them, it will learn. It's still questionable if that's enough for true AGI, but I think the inability to learn in real-time is clearly a hardware limitation.


The problem is that AI doesn't learn as such. Therefore it depends on continuously ingesting data to keep token databases up to date. Naturally, at some point a ceiling will be hit and the quality of generic token databases will stagnate.


The article seems to suggest that humans, esp human linguistic output, are the best sources of knowledge.

Let's just say that they often aren't.



Compare the size in MB of a book to the size in GB of a movie. There's so, so much more data available. Multimodal models are not just the next step, they're already happening. AI will get better.


Not sure if raw data size is a good metric. One usually gains more information by reading a book than watching a movie.


I suppose we could debate that. Regardless, the point stands that there's still more data outside of text that can be mined.


This was a well-written article on AI. Good job, New Yorker journalist.


We will have people hooked up to Neuralink.

We will call them Psykers.

The Machine God has blessed them with the ability to take existing knowledge and fill the void.

No RAG. No vector databases. Pure willpower and biologics combined with the blessings of the Machine God.


From the moment I understood the weakness of my flesh, it disgusted me. I craved the strength and certainty of steel. I aspired to the purity of the Blessed Machine. Your kind cling to your flesh, as though it will not decay and fail you. One day the crude biomass you call the temple will wither, and you will beg my kind to save you. But I am already saved, for the Machine is immortal. Even in death I serve the Omnissiah.


How 'AlphaZero' can we get with high level AI?


As much as we want, once we write the objective function.




