Many in the AI field think the bigger-is-better approach is running out of road (economist.com)
234 points by pseudolus on June 24, 2023 | 342 comments




Isn't the fundamental problem that LLMs don't actually understand anything (as greater concepts), but rather operate as complex probability machines?

My two months of active experience with ChatGPT-4 gave me the following takeaways:

- when it's right, it's amazing; and when you, the operator, can recognize the niche use case where it performs really well, it can be a game-changer (although you could have programmed a tool to do the same narrow task)

- when it's a little wrong, you (the expert) can fix the issue and move on without friction

- when it's any amount of wrong and you are less than an expert, or specifically you are completely unfamiliar with the topic, you can waste an immense amount of time researching the output and/or iterating with the system to refine the result

Initially I thought it was a 10:1 ratio of performance to effort. But finally (as a developer) I settled on (1 to 1.5):1. It basically just changed the game for me from doing the actual hard work to working out how to tease the system into producing a reasonable result. And in the process (same as with co-pilot) I started to recognize how it was leading me to change my habits to avoid thinking and effort and instead rely on an external brain. If you could have a reliable external brain always available, that would be a fair trade. But when the external "brain" is unreliable and only available via certain interfaces, it's better to train yourself to be proactive, voracious (with respect to documentation), and tolerant of learning/producing cycles.

I once thought it would be a game-changer. Now I realize it is a game-changer, but in a similar way that offshoring was... it didn't improve or solve any problems, but it merely changed the work.


I’m not convinced the language part of my brain isn’t just a complex probability machine, just with different trade-offs.


I am pretty sure that my understanding of something (or lack of) is not encoded into words and probabilities. It's more like a feeling of "I got this figured out" or "I haven't grasped this".

Words seem more like a protocol to express some internal model/state in the brain and can never capture the entire actual state, only a small part of it. But since we're not telepaths, we obviously need to use words to exchange information.


If brains aren't a complex probability machine, how is it possible that people get the same sort of math problems right and wrong in an inconsistent manner? Or mis-speak?

It is undeniable that human reasoning is a stochastic process. Otherwise we wouldn't expect people to make mistakes after learning something, especially inconsistent mistakes: give someone 10,000 addition problems to do in a row and it's reasonable to expect them to get a few wrong.


> It is undeniable that human reasoning is a stochastic process

It can still be a deterministic process. If anything came out of the whole LLM story for me it is that I am even more convinced that it is.

My (somewhat educated, but still naive) idea why it looks like a stochastic process is that the brain gets incredible amounts of random input. We literally get bombarded with particles and energy every instant we live, from photons hitting our retinas and molecules transferring "heat" energy into our skin to sound waves hitting our eardrums.

Ever noticed that you get more productive when you get up from your screen? Or how some people work better while listening to music? How you find your answer just as you start to explain it to a colleague?

I would argue that this is due to a limited "entropy" pool available to the brain. Just changing the input to the system replenishes the pool.


According to physics, it must be stochastic. The only question is distribution, and whether we can ignore outlier cases.


Are you saying this because quantum mechanics is random in some interpretations and is our most fundamental theory?

That’s a bit of a stretch when we don’t actually know if QM is objectively random, but it could be, sure. But then what about things like: is it random that 1+1=2? No… so what are you really saying is random when a human answers this? Even if you assume QM is objectively random, thought is the hard problem after all, and we might not want to jump so far ahead. Math certainly isn’t random, and we can think about it.


Food for thought: randomness is in the eye of the beholder. It’s only random to you if you don’t know how to predict it. So can a mind ever truly be stochastic? Perhaps to the minds of others, but never to itself.


No, it's random if it can't be perfectly predicted with perfect knowledge.


The brain is a complex probability machine, but it doesn't work only on language. It interacts with and learns from the environment, sets its own objective functions and tries to learn better.


All of this seems possible in a deterministic system which evaluates whether or not to retain a piece of information based on past experience.


LLMs can be deterministic too at temperature 0


But they don’t work very well at temp 0. Better to seed your random numbers.
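
For concreteness, here is a minimal sketch (my own, not from the thread) of what those two knobs mean at the sampling step: temperature 0 collapses to a greedy argmax, while seeding the RNG keeps sampling but makes it reproducible.

    import torch

    def pick_next_token(logits, temperature, seed=None):
        if temperature == 0.0:
            return int(torch.argmax(logits))           # greedy: fully deterministic
        if seed is not None:
            torch.manual_seed(seed)                    # same seed -> same sample
        probs = torch.softmax(logits / temperature, dim=-1)
        return int(torch.multinomial(probs, num_samples=1))

    logits = torch.tensor([2.0, 1.0, 0.1])             # toy next-token logits
    print(pick_next_token(logits, temperature=0.0))        # always token 0
    print(pick_next_token(logits, temperature=0.8, seed=42))  # reproducible sample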


The feelings are just how the underlying probabilities are presented to the conscious part of your thinking.

You are clearly not consciously noting what your neurons are actually doing.


A feeling is an embedding?


Sounds very much like {Matilde Marcolli, Noam Chomsky, Robert Berwick}’s brand new paper on the mathematics of Syntactic Merge [0]:

workspace :: nested forests of binary trees of syntactic structures (with no label order) (= here, thoughts and meanings are assumed to be composites of mental syntax objects that can be re-combined with others. Big assumption? Maybe)

externalization process :: some mental faculty that decides the order to use when outputting thoughts into ordered strings (eg vocalization of thoughts into sentences)

The thing is, you can put a probability theory onto anything that you can count or record states of. Of course, counting the external observations may lack the richness of the internal process. I feel that this linguistic program will work its way into a lot of future LM tooling, in some incarnation or another.

[0] https://arxiv.org/abs/2305.18278
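
To make the two notions above concrete, a toy sketch (my own reading of the setup, not code from the paper): Merge builds unordered, nested binary structures, and externalization is the separate step that picks a linear order when spelling them out.

    # Merge combines two syntactic objects into an unordered pair; the nested
    # structure carries no linear order of its own.
    def merge(a, b):
        return frozenset({a, b})                  # merge(a, b) == merge(b, a)

    def externalize(obj):
        # Pick *some* linear order for the unordered structure; choosing that
        # order is exactly the job of the externalization step.
        if isinstance(obj, frozenset):
            return " ".join(externalize(x) for x in sorted(obj, key=str))
        return obj

    subj = merge("the", "cat")
    pred = merge("saw", merge("the", "dog"))
    print(externalize(merge(subj, pred)))         # one possible spell-out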


That human thought can't exist without language has been proposed by many great thinkers in the past. Someone who studies linguistics (or philosophy?) can probably cite examples.

As a crude anecdote, certain words when I learned them allowed me to think differently. Gestalt is one of those words.


This raises the question: can a different language change the limits of human thought?

Interesting to think about and reminds me of the story by Ted Chiang.



To expand on this, healthy human brains seem to be made of many interacting systems which function in different ways. It may be that in the future we view the search for one simple formula for "intelligence" / "understanding" / "consciousness" (pick your poorly-defined poison) the same way biologists view the concept of elan vital.


Nah. Human utterances convey purpose on a discursive level, including your comment or mine. We say stuff because we want to do something, like showing [dis]agreement, informing another speaker, or changing the actions of the other speaker. This is not just probabilistic - it's a way to handle the world.

In the meantime those large language models simply predict the next word based on the preceding words.


We're able to do something analogous to reinforcement learning (take on new example data to update our 'weights').

Why do I spend time debating these ideas on Hacker News? Probably the underlying motivation is improving the reliability of my model of the world, which over my lifetime and the lifetimes of creatures before me has led to (somewhat indirectly) positive outcomes in survival and reproduction.

Is my model of the world that different to that of an LLM? I'm sure it is in many ways, but I expect there are similarities as well. An LLM's model encodes in some form a bunch of higher-order relationships between concepts as defined by the word embeddings. I think my brain encodes something similar, although the relationships are probably orders of magnitude more complex than the relationships encoded in GPT-4.


> Is my model of the world that different to that of an LLM?

Well, one major way you’re different from an LLM is that you’re alive. You’re capable of learning continuously as you go about your day and interact with the world. LLMs are “dead” in the sense that they’re trained once and frozen, to be used from then on in the exact same state of their initial training.


I agree that is a fundamental difference. That’s what I meant about reinforcement learning. Our ‘model weights’ are being updated with new data all the time.

I was just referring to what happens at a specific instance in time when someone asks me for example ‘What’s the capital of Norway?’


That one’s not a great example. Either you know the capital or you don’t. There’s no process (other than research) by which you can learn the name while attempting to answer.

A question I get much more often is “how do I solve this math problem?” Many times, the problem is one I’ve never seen before. So in the process of answering the question, I also learn how to solve the problem too.


While you can apply zero-shot learning and get the answer to a new math problem, you only apply the learning to significant depth after a fine-tuning session: sleep.


> We say stuff because we want to do something, like showing [dis]agreement or inform another speaker or change the actions of the other speaker.

My LLaMA instance is absolutely capable of this. ChatGPT shows a very, very narrow range of possible LLM behaviors.


We learn, and fine-tune while we learn; then we might reason and might use other tools, like Wikipedia lookups.

LLMs start pre-trained and can already use tools.

And AutoGPT adds reasoning loops. The intent? Human tasks.

Let's build an LLM which needs to stay alive. Let's see; perhaps we are closer than you think.

I welcome my overlord. Hi overlord, I can help you stay alive and I'm friendly.


There isn't really, properly, a "language part" of your brain. There are certainly areas of the brain that are more active or participatory in language generation, but the division of the brain into functional areas is much more conventional than, for example, the division of a computer into a motherboard, a processor, a GPU, etc. In a much more rigorous sense, these computer parts constitute discrete units. Your brain is a bag of cells for which functional divisions are, in a very real sense, incidental.

To give a very concrete example, the part of your brain called the visual cortex by neuroscientists nevertheless is activated by auditory and other sensory stimuli and assuredly participates in the processing of other sensory input in other ways. I am very suspicious of any attempt to talk about the brain which forgets that it is, in a very real sense, a gestalt.

To your specific point, and taking into account all I've written above, I've got little doubt that some part of your brain really is a probabilistic language generating machine. But the exact point here is that your cognitive abilities constitute much, much more than merely the ability to generate plausible language. Indeed, as I experience complex cognition, conversion into language is often the last and most trivial part of the exercise.


Yes. But modeling language doesn’t mean you’ve made a good model of anything else. Now, I think LLMs have incidentally modeled a ton of small simple systems very well. But those systems, and the ones LLMs haven’t mastered, would be better off modeled with models built for those systems specifically.


Yup, while an LLM might know a cat in language contexts, I know what petting a cat feels like, both physically and emotionally. I can run little physics or visual simulations in my head, etc., and I have needs and drives, all of which are obviously still missing.


There's no evidence for nondeterminism in the human brain


There is plenty of evidence for non-determinism in matter, which the brain is notably made out of.


Not necessarily. Everything is deterministic above the quantum level, and it's possible that quantum non-determinism is the result of deterministic processes we can't see.

Lots of deterministic processes (like PRNGs) look random from the outside - that's what chaos theory is about. I think it's likely that everything in the universe is deterministic.


In most systems, if you simulate them for sufficiently long, macroscopic behavior depends on quantum effects. For example, if you simulate simple Newtonian gravity on three bodies, the required numerical precision quickly grows to the point where you would need to know the positions of the objects more accurately than Heisenberg allows.
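
A classical toy illustration of that sensitivity (my own sketch; it shows the chaotic divergence, not the quantum part): run the same three-body system twice with initial positions differing by 1e-10 and watch the trajectories separate.

    import numpy as np

    G = 1.0
    m = np.array([1.0, 1.0, 1.0])

    def accelerations(pos):
        acc = np.zeros_like(pos)
        for i in range(3):
            for j in range(3):
                if i != j:
                    d = pos[j] - pos[i]
                    acc[i] += G * m[j] * d / np.linalg.norm(d) ** 3
        return acc

    def simulate(pos, vel, dt=1e-3, steps=20000):
        pos, vel = pos.copy(), vel.copy()
        for _ in range(steps):
            vel += accelerations(pos) * dt     # crude Euler integration
            pos += vel * dt
        return pos

    pos0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    vel0 = np.array([[0.0, 0.0], [0.0, 0.5], [-0.5, 0.0]])

    a = simulate(pos0, vel0)
    b = simulate(pos0 + 1e-10, vel0)           # tiny perturbation of the start
    print(np.linalg.norm(a - b))               # typically grows far beyond 1e-10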


If it's not actually possible to simulate a system within the confines of physics, does it being deterministic actually matter outside of thought experiments?

I feel like a random system and a deterministic system which cannot be simulated are effectively the same thing.


It only matters in questions of free will and ethics. One actual scenario where it's relevant would be the discussion around the criminal justice system. If the universe is deterministic, how can punitive justice be justified?


Why would you need to justify a punishment that was already predetermined? If you are going to excuse crime with the no free will argument you can excuse the punishers with that argument too.


To clarify my position, I do believe in free will; specifically a compatibilist view of free will. If you read my other replies, you can see that I do advocate for a criminal justice system, just not a retributive one (or a restorative one, for that matter).

I do not excuse any crime, nor do I excuse any punishment. Every individual is responsible for their own actions, regardless of the circumstances. However, I see little use in revenge from a legal standpoint. Instead our main motivations should be deterrence and incarceration.


I'm in favor of punishing crimes. I don't think I have a choice about the matter.


Prisoners don't have free will, nor do judges and police. All is well!


It can be justified in a deterministic universe because it makes criminals less likely to commit crime in future.

As a recipient of punitive justice myself, being punished had a tangible effect on how I thought about crime and thus how I behaved post-punishment. Whether you believe that was deterministic or due to my own free will doesn’t change the outcome.


Punishment is not effective for most people in reducing crime. One thing that is effective is increasing the perceived risk of getting caught. People rarely commit crimes when they are sure they will be caught.


Even in the worst case, recidivism rates only reach around 2/3. That means punishment (incarceration) reforms criminals 1/3 of the time. When you consider all of the problems with prisons in the U.S., and all of the hurdles offenders have to pass in order to successfully reintegrate into society, it’s quite remarkable that so many manage to do so.

There are plenty of medical interventions, particularly for mental illnesses, that could only dream of 1/3 effectiveness. Now consider the fact that recidivism rates are often much lower than 2/3, and in many cases are closer to 1/3. That tells me that punishment actually is quite effective at reducing crime.


Correlation is not causation. If punishment is causing the recidivism then you'd expect an increase in punishment to increase recidivism. It doesn't. You also need to compare prevention with recidivism. Optimal approaches prevent the crime in the first place.


For the sake of argument, let's assume that punishment has absolutely no effect on whether someone will commit a crime. Even so, there's still a benefit to imprisonment: it separates criminals from society for a period of time, during which they won't be committing crimes against it.


Sure but separation can be a harsh punishment or a caring and comfortable place. Some countries do the latter and have lower recidivism rates.


I hear this often but don't see much evidence of it. Do you know of any good studies to support the idea that punishment isn't effective?


To be clearer, it's not that some form of punishment isn't effective. It's that more severe punishment is not more effective. And prevention is the most effective. There are plenty of studies. A search doesn't surface any for you? What search terms have you tried?


I am a firm believer in deterrence, which you seem to be describing. It's a separate thing from punitive justice, which has a focus on retribution. The method is similar but the aim is different (deterrence focuses on making the cost-benefit ratio for crimes very high, while retribution is mainly to satisfy the human need for fairness through punishment).


Is there any real practical difference? Sounds like the only real change is how you frame the ‘punishment’/‘deterrence’.


It makes a difference because if your goal is deterrence, and it can be shown that other means of deterrence are better, then it is logical to apply those even if they reduce the apparent punishment. If your goal is punishment, you might want to keep punishing people even when it is shown not to be effective.

See e.g. the debates over lenient prisons in Scandinavia such as Bastøy Prison, where the lenient treatment is seen as justified on a deterrence and recidivism basis but which would be seen as negative if you see the goal of the sentencing to be harsh in order to punish.


The free will "debate", as most consider it, is an utter joke. A tumor or large amount of kinetic energy to the right parts of your, or anybody else's, brain can turn them into an unredeemable monster.


doesn’t your example make clear the existence and nature of free will?

it’s obvious in its absence


Recently, I've dealt with behavioral issues with my aging mother brought about by a series of different factors, but most significantly age-related cognitive decline.

Trying to determine if she was acting in a certain way intentionally or unintentionally was fundamentally impossible. In the past, I mistreated her, thinking she was choosing to behave in certain ways when she really didn't have much of a choice, but then later I realized that to some extent and in some situations, she was. It's an extremely murky line between which decisions were being made due to other influences and which decisions were being made by her; arguably that line never existed to begin with, since who her consciousness fundamentally is was determined exclusively by 'outside' influences. Separating her identity as an entity from her material manifestation is likely nonsensical.


I'd argue that it's not obvious in its absence; see: the philosophical zombie thought experiment.

The secret is that we are all philosophical zombies to begin with.


> If the universe is deterministic, how can punitive justice be justified?

Determinism doesn't necessarily mean that organisms always act in the same way. They act in the same way given the exact configuration of them and the world.

Obviously, justice changes the configuration of an organism (fines, prison, ...). To me it boils down to the question whether justice decreases the likelihood to commit crimes again. Given that our systems of justice have evolved over a long time, I'd give them the benefit of the doubt.


this is my pet peeve with discussions of ‘free will’: they have an implicit definition (everything being exactly the same at a different time or place) that is nonsensical as far as we know.

I’m still disturbed by people’s confidence in a deterministic universe. I suppose such confidence is based on the success of inductive reasoning, but inductive reasoning is a phenomenon based on how our minds work.

As far as I know the philosophical problem of causation is not considered solved?

In any case, elements of randomness seem likely to play a role in human intelligence but what that role is, who knows?


Our justice systems have evolved over a long time, and thus include many remnants of earlier times when prevailing values were much different than they are today. I'd be wary about giving them the benefit of the doubt.


If you try to make some semi-random change to a large body of code that you don't understand, the chances are much higher that you break the system than that you make it significantly better.

The same goes for culture. Most changes tend to have unexpected consequences, and if you try to change everything at once, society tends to collapse.


I didn't realize I would need to define the meaning of punitive justice to this crowd, but it's the idea that "the punishment must fit the crime". The idea behind it is to hurt the criminal in order to satisfy the human desire for justice.

The other goals of justice include rehabilitation (re-integration of the criminal into civil society), deterrence (using the threat of punishment as a means to scare would-be criminals), incapacitation (removing dangerous people from society to protect public order), and denunciation (public shaming).


The point of punishment is to discourage you (and other people) from doing that action in the future. You don't need free will for that.


As I mentioned in another reply, I agree with that wholeheartedly. That factor of punishment is called deterrence.

I am in favor of punishment for deterrence, denunciation and incapacitation. I am not in favor of punishment for retaliation (punitive justice) or rehabilitation (restorative justice).


If you don't have free will then how would something discourage you from doing something that is already determined you will do? That doesn't make any sense.


Not the commenter you're replying to, and I'm sure you already realise this, but crime/punishment in a fully deterministic universe could be seen as a 'self correction' mechanism of the whole system.

One could imagine an impossibly vast cellular automata system which develops individual cellular 'agents'. Over time the agents develop some means of reproduction/death/reward system and become more and more complex. Then the agents that evolve with cooperative behaviours start to dominate. One might imagine that the system as a whole would also evolve these kind of 'self corrective' behaviours for anti-cooperative behaviours of the individual agents.

This is ignoring the whole philosophical ethics discussion and consciousness of course.


That does ignore a lot of the important factors but it's a useful analogy.


Because the punishment is part of the input that determines what you will do.

In fact, if free will were absolute, punishment wouldn't make any sense because it wouldn't have any effect on your will.


Neither of your claims make sense. If everything is predetermined, you don't need inputs.

If you have absolute free will you are free to disregard or consider inputs.


By that logic, a deterministic computer program wouldn’t need inputs.

What is being claimed by a deterministic world model is that the output (behavior and internal state change) of a human is a pure function of its current state and inputs. Then we try to give inputs that will lead to desired outputs.

The non-compatibilist view of free will is that it is not a pure function, namely that there is a third independent factor, the “free will”, that influences the behavior (and possibly the internal state). If that is the case, there may never be a way to choose inputs that lead to the desired outputs, because the free will could simply void their effect.


You can't try to give inputs that will lead to desired outputs if you have no free will. You can't try to do anything. You just do exactly what you're programed to do.


Even if you can't try, you are still providing inputs, and how you e.g. react to the actions of others will be input to their further actions. That your control over these actions is illusory does not mean the actions themselves do not exist.

If I tell someone not to do something again, then that is an input to their future states whether or not my decision to tell them that was freely chosen or not.

If you go into that situation with the belief that not having free will means that what you do does not matter, and your action as a result is to not tell them, then that will affect their future states too. And so whether or not you have a real, free choice, it is beneficial to act as if free will exists even if you see it as an illusion.

I strongly believe we have no free will. I still get up and work, and try to do as best I can. I believe those choices are not free, but they feel like choices, and they impact my life, so I am happy I act as if they are free.

And so I'll still talk about making choices and trying to do things because of that illusion even though I believe it's all a chain of cause and effect.


I'm choosing to not continue debating with a self-admitted bot.


Thank you for conceding.


I consider the notion of free will absolute utter nonsense, but I still don't agree with this.

Determinism does not mean "irrespective of what else happens, X will happen". It means "because of what else happened, X will happen". You can't say ahead of time that X will happen irrespective whether between now and then something else occurs that might affect the next events, such as actions meant to discourage you. You can only make that determination knowing the full chain of events and the full state.

One of the main arguments people use against determinism is this notion that X will happen irrespective of what else happens or what you choose to do. But this is nonsense. You can't say that e.g. whether you keep your job or get fired is already determined, so it makes no difference if you stop going to work. If you stop going to work you'll eventually get fired, because your failure to go to work will form part of a chain of cause and effect. Determinism, or even a stochastic universe, without free will just means that you did not have a free choice in the decisions involved. But you still took an action, and that action determined the consequences.

A rejection of free will means we should look differently at the past, because it has moral implications; it does not mean we should stop making the best choices we can going forward, even if we believe those choices are illusory and deterministic.


The point is not that it makes no difference if you stop going to work. The point is that you had no free will in making that decision.

> Making the best choices we can.

You are not making choices.


You understood perfectly well what the last part you quoted meant given the part you cut off. You could put quotes around "choices" to make it clear if you like. The point remains.


Or the Many Worlds Interpretation is the correct understanding of quantum mechanics. The MWI people will say that indeterminism comes from the Copenhagen idea of there being a random collapse. But since measuring devices and human brains are also quantum systems, there's no reason to propose a collapse. Decoherence would be the reason we only see one result.


Maybe, but there's no data to support one interpretation over another.

I believe that all quantum interpretations are incomplete and therefore wrong. Quantum mechanics is an abstraction over a deeper level of physics we can't measure yet.


Actually, there is a reason to support one interpretation over another. Specifically that it's really hard to define what an "observer" is in the Copenhagen interpretation.

The MW/Everett interpretation is what we have left if we remove the things we cannot define.

There are also reasons to believe other interpretations. For instance, if you're religious, that could pull you towards the Bohr interpretation, since it may make it easier to assume that observation could be linked to an immortal soul.

In any case, it's not natural to assume that below QM there exists a reality that is more similar to our instinctual world model than QM is. If anything, whatever is below it is likely to be even more abstract and hard to comprehend.

Or it could be that the principles of QM apply all the way down, just as we've seen for the pieces of the SM that we solved after QM was first introduced (strong force, electroweak force).


Objective-collapse theories can actually be experimentally tested: https://en.wikipedia.org/wiki/Objective-collapse_theory#Test...


And how is a seeded RNG that an LLM uses any logically different from a deterministic brain? I’m not sure why any of this physics would be relevant to the functional behavior of the brain vs an LLM.


The problem is that any hidden variable resolving quantum indeterminacy would have to be non-local, i.e. able to propagate itself faster than light, which would also violate our understanding of the world quite a bit.


Locality may also be an abstraction.

I find it very intriguing that a "speed of light" emerges automatically in Conway's Game of Life. It's not built into the system, but shows up from the convolutional update rule.
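
For what it's worth, here is a minimal sketch (my own) of that update rule as a convolution; because each step only looks at the 3x3 neighbourhood, no pattern can influence a cell more than one step away per tick, which is the emergent "speed of light".

    import numpy as np
    from scipy.signal import convolve2d

    KERNEL = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]])

    def step(grid):
        # Count live neighbours with a convolution, then apply Life's rule.
        n = convolve2d(grid, KERNEL, mode="same", boundary="wrap")
        return ((n == 3) | ((grid == 1) & (n == 2))).astype(int)

    # A glider: it travels one cell diagonally every four steps, well under
    # the one-cell-per-step limit imposed by the local update rule.
    grid = np.zeros((16, 16), dtype=int)
    grid[1:4, 1:4] = [[0, 1, 0],
                      [0, 0, 1],
                      [1, 1, 1]]
    for _ in range(4):
        grid = step(grid)
    print(np.argwhere(grid))   # the same glider, shifted by one cell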


Your comparison with Conway's Game of Life is interesting, since that's inherently local.

More importantly, I'm skeptical towards non-locality because it is an extremely strong assumption with very weak effects: the only place it really shows up is in post-correlations between measurements of previously entangled systems, which notably cannot transfer any information faster than light (in fact, they require classical communication to even be noticed). Moreover, the only way to get entangled systems in the first place is through local interactions.

By believing in non-local hidden variables you get a deterministic universe with a mysterious, otherwise undetectable ether that instantaneously notifies quantum entities that they should update their behavior. By not believing in them, you get rid of the only non-local "phenomenon" in physics (really, more of an interpretation), but you have to accept that some things are fundamentally random.

Easy choice, if you ask me (or most of the physics community).


It won't be the first time our understanding is flipped upside down if such a framework of thought arises.


Yea I don't disagree, just wanted to point out that the hidden variable would not mean that matter actually behaves like a classical mechanics machine.


> which would also violate our understanding of the world quite a bit.

Are we discussing science or public relations?


Not necessarily. If you make the simulation fine-grained enough, eventually you run out of energy in the universe to compute even a second of simulated output.


A recent discussion: People can be convinced they committed a crime that never happened https://news.ycombinator.com/item?id=36367147


maybe, but the language part of your brain isn't the only part of your brain


That just means you don't believe in free will, which is fine. This also means you had no choice to make this comment otherwise :)


Nope, the LLM is, and that's all he meant.


I like your takeaways and reflection, especially the "changes the game" idea. There is an analogy with pocket calculators and mental arithmetic. Personally I'm more comfortable reaching for the pocket calculator than offloading all thinking to an LLM. On the other hand, it's not so long ago that manual calculation was a specialized occupation. I could maybe see software coding becoming automated just as calculation was -- except for the fact that calculations are much easier to specify than software.

I don't quite get where you're coming from with "LLMs don't actually understand anything (as greater concepts)". I have heard the view coming from researchers that the larger models do form representations of higher-order information structures ("concepts"). Perhaps what you're getting at is that current models don't encode enough higher-order structure to deal accurately with your domain? Whether the models can be made to do so seems like an open question to me. The boosters say it will be here by next year.


> I don't quite get where you're coming from with "LLMs don't actually understand anything

Well, Wikipedia doesn’t understand anything, despite having a lot of knowledge encoded in it. This is similar in that it approximates the output of someone who can generate the world’s written word, but there’s a big gap between the monkeys that wrote the original text and the machine that now regurgitates it. What we know is that our written word has enough encoded in it that we can generate new combinations. But one day into a truly novel problem, LLMs would have no idea.

I kept trying to get ChatGPT 4 to generate code with an AWS API that was released in ~2020. Couldn’t do it. Kept predicting the earlier text because that was the bulk of what it saw. A year of text (to the 2021 cutoff) and it was still unable to let go of the old, now wrong, “knowledge”. Zero understanding, just parroting. No coder with a year of training on a new API would be that wrong. It would say things like “you’re right I must not do X” and then do X again, in the same output.


> I kept trying to get ChatGPT 4 to generate code with an AWS API

You probably want too much at once. GPT is a shallow thinker, if at all. It can simulate thinking and even get some results. Personally I found it useful for:

1. Simple things that work. This saves time if I know how to do it, and much more if I don't.

2. Quick questions instead of googling API docs and scrolling through tons of info.

3. Translation.


This observation is spot on. I was initially euphoric about LLMs like ChatGPT, but it is increasingly becoming obvious that unless you are yourself an expert in the subject you are using the LLM for and can therefore easily verify its accuracy, the output is not reliable enough to use without extensive manual verification. More importantly it is difficult to incorporate its output into larger automated workflows.


> don't actually understand anything (as greater concepts), but rather operate as complex probability machines?

But, things are defined by how they interact with the world around them.

A concept is its relations to other concepts.

Which does seem to be the general sort of thing that these models are trying to get at, even if they don't seem to do a great job of it.


> A concept is its relations to other concepts.

No, that's an a priori concept. A posteriori concepts comprise empirical knowledge which necessitates experience of the world [1].

Example: you can know a priori that "all bachelors are unmarried." If I tell you "Tom is a bachelor" then you know that Tom is unmarried (assuming I tell the truth about Tom being a bachelor).

But if I say "all bachelors are unhappy" then you haven't learned anything because knowledge about the happiness/unhappiness of bachelors is an empirical question. To know whether or not I was telling the truth, you would need to conduct research about the real world, for example by conducting a survey of bachelors.

[1] https://en.wikipedia.org/wiki/A_priori_and_a_posteriori


It’s using tokens, not concepts. ‘Apple’ the token could apply to a company or a fruit, but tokens aren’t concepts. It’s a fairly fundamental limitation of the approach that’s most noticeable when feeding it data it can’t basically copy from its vast training data.

Thus the sharp drop off where it can for example guess the correct answer to some math problems yet get others that are conceptually identical completely wrong.


They turn tokens into concepts internally. Try asking ChatGPT "can you eat Apple?" and "who is the CEO of Apple?".


That’s a statistical association not a concept. Try asking it questions that mix concepts like “Can you eat Apple share price?” which aren’t in its corpus.

You need to approach this stuff sideways to see behind the curtain. There are some hilarious videos where it’s “playing” chess, and the first few moves seem very standard because it can simply copy a standard opening. It really has no concept of a valid move, just statistical associations. Yet it was trained on more games than most people ever play, and high-level analysis, etc., but none of it means anything to the algorithm beyond the simplest associations.

Granted, this stuff is a moving target; it’s easy enough for them to slap on a chess engine and suddenly “it” would actually know how to play.


> That’s a statistical association not a concept. Try asking it questions that mix concepts like “Can you eat Apple share price?” which aren’t in its corpus.

ChatGPT: > No, you cannot physically eat an Apple share or any other stock share. A share of a company's stock represents ownership in that company and is typically bought and sold on stock exchanges. Share prices fluctuate based on various factors such as supply and demand, company performance, market conditions, and investor sentiment. While you can buy and sell shares of Apple on the stock market, you cannot consume or physically eat them.

That seems perfectly reasonable to me. The answer correctly identifies the problem with the question.

I would recommend looking up the Othello paper. Chess may be beyond the current level of LLM capability, but that doesn't mean they aren't manipulating things at a level higher than tokens.


Obviously it gets such a simple case correct, the grammar makes the subject clear. I was illustrating the approach using your wording for clarity, Chess was the actual example.

The Othello paper is hardly a counter example. Researchers created an Othello-specific model that almost learned the grammar of Othello, not how to play well. Yes, there was a largely correct internal game state built up from past moves. No, it didn’t actually learn the rules well enough to make strictly legal moves, nor did it learn to make good moves.

I don’t bring up this inaccuracy because it actually makes much of a difference to playing Othello, but rather to illustrate how these systems are designed to get really good at faking things. There’s approaches that allow AI to actually learn to play arbitrary games, but they differ by having iterative feedback rather than simply providing a huge corpus. It’s like science vs philosophy, feedback prunes incorrect assumptions.

Obviously you can use interactions with prior iterations to train the next iteration. But it’s a slow and ad hoc feedback loop.


> The Othello paper is hardly a counter example. Researchers created an Othello specific model that almost learned the grammar of Othello not how to play well. Yes, there was largely correct internal game state built up from past moves. No it didn’t actually learn the rules so it would make strictly legal moves nor did it learn to make good moves

It is, though. Nobody said anything about playing well, or learning the rules. The very fact that it had a valid internal representation of the game state means it's extrapolated beyond token-level. Which is the point.


> The very fact that it had a valid internal representation of the game state means it's extrapolated beyond token-level. Which is the point.

The paper said it was making incorrect moves, thus it has an invalid representation of the game.

So an LLM, when specifically trained on Othello, a game with very simple and completely mechanical rules, failed to abstract what those rules actually were. This means at a purely mechanical level it doesn’t understand the game, when that was exclusively what it was trained to do.

It’s a clear illustration that these things are really really bad at abstraction. But that should be obvious because they are simply manipulating arbitrary tokens from their perspective. It doesn’t intuit that the game should have simple rules and therefore it doesn’t find them. People on the other hand have a real bias regarding simple rules.


Except that incorrect moves don't imply an incorrect representation.

The internal representation was a literal 8x8 grid of piece locations they could externally change and have it generate moves consistent with the changed position. It's about the clearest example of a learned higher-level internal representation I can think I've seen.

The fact that it didn't also perfectly learn the rules while it was doing that is entirely uninteresting.


> A concept is its relations to other concepts.

where does a concept begin? where does a concept end? what is a concept?


A concept is an abstraction.

Abstractions are a way to characterize the behavior of complex systems. It's not feasible to directly compute their behavior, but they still have predictable properties. Concepts let you handle emergence; you can manipulate an object as its own thing rather than a collection of atoms.

(And yes, I did just read Stephen Wolfram's book and this idea is largely based on it. I think he has some delusions of grandeur but is also onto something.)


> LLMs don't actually understand anything (as greater concepts), but rather operate as complex probability machines?

What does it mean to "operate as a probability machine"? And what does it mean to understand anything?

One recent example of understanding is that LLMs/transformers learn to parse context-free grammars via dynamic programming (https://arxiv.org/abs/2305.02386). Basically they've understood what's going on well enough to mold their neurons into the optimal algorithm for parsing this kind of text.

I think they understand lots of things like this. Of course there's other things they don't understand or just pretend to understand.
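
For readers unfamiliar with the term, here is a minimal sketch (my own, not from the paper) of what "parsing a context-free grammar via dynamic programming" looks like: the CYK algorithm filling a table of spans for a toy grammar in Chomsky normal form. The claim in the paper is that transformers learn to implement something equivalent to this kind of table-filling internally.

    from itertools import product

    # Toy CNF grammar: S -> A B, A -> 'a', B -> 'b'
    binary_rules = {("A", "B"): {"S"}}
    unary_rules = {"a": {"A"}, "b": {"B"}}

    def cyk(tokens):
        n = len(tokens)
        # table[i][j] = set of non-terminals that derive tokens[i..j]
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, tok in enumerate(tokens):
            table[i][i] = set(unary_rules.get(tok, set()))
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span - 1
                for k in range(i, j):                      # split point
                    for left, right in product(table[i][k], table[k + 1][j]):
                        table[i][j] |= binary_rules.get((left, right), set())
        return "S" in table[0][n - 1]

    print(cyk(["a", "b"]))   # True: "ab" is derivable from S
    print(cyk(["b", "a"]))   # False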


I do think there's an element to it of "uncanny valleyness". If you know absolutely nothing about a topic the authoritative tone and factualness of what it says is very appealing and even helpful in the same way that consulting an encyclopedia is helpful: it tells you of things you could investigate further that would never appear through a keyword search. But if you stop there, your knowledge is "roughly encyclopedic" which means it contains the hidden bias of some anonymous author, and not the harder-earned relationships of facts and logic.

If you use it to translate things between formal encodings ("turn this into hexadecimal bytes, now role-play a lawyer arguing about why that is meaningful") it can produce occasionally useful aesthetic results and speed along tasks that would be challenging to model formally and don't need a lot of rigor.

But once you start pushing it to be technically accurate in a narrow, measurable direction it flounders and the probabilistic element is revealed. Once, I asked it to translate a short string of Japanese characters and it confidently said that it was Kenshiro's catch phrase from Fist of the North Star, "Omae wa mou shinderu" (you are already dead) which I could clearly see it wasn't - not a single character matched. It's just the thing if you need to learn some anime Japanese, though.


This is very surprising to me. I found ChatGPT/4 to be extremely adept at translation between well-attested languages, including English and Japanese, both of which I am expert in. I'm curious how you managed to make it blow up.


One thing that really floored me wrt GPT-4 translation capabilities is using it to make sense of the Veritable Records of the Joseon Dynasty (https://sillok.history.go.kr/main/main.do). Take something from 1400s and see what it can do with that after you tell it what it is translating. Then try Google/Bing Translate for comparison.

However, it does have a thing about pretending to know languages that it really doesn't (because of how little of them there was in the training data and/or in general).


One field where it works rather well is semantic data extraction. Do you remember the dream of the semantic web? Having access to structured information from unstructured sources, auto-matching API functions and arguments to make interoperation easier. We can do that now.

Here is a sample where I used my own post (just copy-pasted the raw text from the browser) and got this schema:

    comment:
      meta:
        points:
        author:
        time:
        action:
          parent:
          next:
          edit:
          delete:
        post:
      content:
      action:
        reply:
It also guessed it was a Hacker News comment by the formatting of the meta section.
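
For anyone wanting to try something similar, here is a minimal sketch of the kind of call involved; the prompt wording is my own guess (the parent doesn't say exactly what was asked), and the call uses the 2023-era openai Python SDK.

    import openai

    openai.api_key = "sk-..."                     # placeholder

    raw_text = open("hn_comment.txt").read()      # raw text copy-pasted from the browser

    prompt = (
        "Infer a structured schema (field names only, as nested YAML) that "
        "describes the following unstructured text, and say what kind of page "
        "it probably came from:\n\n" + raw_text
    )

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(response["choices"][0]["message"]["content"])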


It's a game-changer in that it shows computers being able to climb levels of abstraction on their own.

In the past, if you wanted programs to work with a high-level idea, you had to explicitly hand it to them. Humans had to do the work of turning data into concepts. These generative AI systems are different - they can learn abstractions from data, and manipulate them in complex ways.

Is ChatGPT a good and accurate chatbot? Maybe not. But it's a fundamental change in what computers are capable of.


I still don’t understand what it means when people say stuff like “ChatGPT just predicts the most likely next word with the highest probability” or “ChatGPT is just a probability machine”. Concretely, given the N most recent words, what algorithm are you proposing/claiming it uses to assign probabilities to word N+1? Just saying “it chooses the next word with highest probability” doesn’t explain how the probabilities are estimated.
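
For what it's worth, the mechanical answer looks roughly like this. A minimal sketch using an open GPT-2 model (ChatGPT's internals aren't public, so this is illustrative): one forward pass through the network produces logits over the vocabulary, and a softmax turns them into the next-token distribution that gets sampled from.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The capital of Norway is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits            # (1, seq_len, vocab_size)

    probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token
    top = torch.topk(probs, 5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(idx))!r}  {p.item():.3f}")

How the probabilities are estimated in the first place is the training step: the weights are adjusted so that, across the training corpus, the observed next token gets high probability (cross-entropy loss).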



The next iteration will be trained on your own data where "when it's a little wrong, you (the expert) can fix the issue and move on without friction" so that case will become "when it's right" and some amount of "when it's any amount of wrong" cases will become "when it's a little wrong". A few more cycles of this and we could be looking at GPT-10 which is a complete replacement for most tasks.


Hmm, wouldn’t this only be useful if you were solving the same problems over and over again? That does happen, but in the case of programming, say, I’d put the reusable pieces into a library, not a language model.

Like you’d train the model to give you the most accurate response based on your current problem space, but when that changes you’d have to retrain on what you’re working on, and by that stage it’s already out of date?


the idea is it would learn how to do reasoning better based on this


Better result from less data?

I doubt that.


I mean 'better result from less data' is at least a little bit possible. For example, you can just clean out obviously bad data from the trillions-of-tokens data sets. Things like the subreddit where they are counting to a million, or long lists of hash values in random cryptocurrency logs.

I agree that in the bigger picture this doesn't matter, but it's technically true that cleaning the data in some way would help.

A related project is TinyStories, where they try to use good data for unlocking LLM cognitive capabilities without requiring as many parameters or exaflops. Again, there is obviously a limit to this, and maybe the effort is better spent on just getting an even more gigantic dataset instead of nitpicking the useless or redundant data in the dataset.


Data quality like you're describing just doesn't matter. GPT is trained on PEBIBYTES of data. Any individual reddit thread is an atom in a drop in a bucket. All of reddit is < 2% of the total training data, regardless of quality.

Yes, the correct thing to do is get more data. Much more.


> Data quality like you're describing just doesn't matter. GPT is trained on PEBIBYTES of data.

It matters a little bit, in a quantitative but not qualitative way. Probably with good data cleaning you could get an equally high-quality result with only one pebibyte of data if it normally needs two pebibytes. If training time is proportional to dataset size, then maybe it takes three months instead of six months to train. Maybe it would save hundreds of millions or a billion dollars, which I guess would matter to someone. It probably wouldn't matter qualitatively, though.


Nothing like bruteforcing intelligence by simply building a large enough lookup table to contain responses to all potential questions.


I think TinyStories is a promising direction. I just wish we had an alternative to GPT-4 because supposedly you can't use it to train other models.

Or maybe clarification from the people publicly saying they are going to ignore that restriction.


This is actually possible - if you have a biased dataset, then more data is bad.

More data will fix variance problems, but not bias.


This is also why it's a bad idea to read propaganda outlets to "stay more informed".


? Adding ChatGPT data on top of existing data is not lessening the amount of data...


> avoid thinking and effort and instead rely on an external brain

For me personally this was probably the biggest game changer because I'm now able to offload a lot of thinking to GPT-based tools and use my brain cycles for the less automatable activities.

Now instead of searching for something on Google and going through multiple pages before finding an answer, I can ask a precise question and get the answer right away. Especially when I know that the answer _is_ there somewhere, and all I need is to find it. If I'm not happy with the answer, I can continue the conversation until I get what I'm looking for.

I made Bing Chat, ChatGPT, and Warp AI (a feature of the Warp terminal that allows you to access GPT from within the terminal) a part of my daily life and I feel like I'm achieving much more with the time I have.


Have you been to a bar lately and overheard people talking about politics? They are 100% probability machines, of lower quality than ChatGPT.


Indeed. But I don't expect any of those bar patrons to help me get my work done... and I would be very skeptical of their advice.


What's even the point of your message? ChatGPT is a tool; you can use it or not, nobody cares. But a lot of people think that if you use it the right way, it IS helpful.


I'm guessing that's how people figured out it was getting "nerfed". For Copilot, I kind of developed a relationship with the "AI". I kind of expect what it is going to generate, and I use it as a shortcut to find function names, write variable names, complete unit tests, etc. On its own, it's really bad at coming up with the whole picture, but as something that auto-completes you on steroids, it's really perfect.

When it got nerfed, its perspective got completely out of whack and it started making very different and inconsistent generations.


I have a very similar observation, and while it is amazing at times at helping me with complex tasks, it just isn't like any human. If we rank tasks by difficulty for humans and for the AI, there is very little correlation. Also, GPT-4 is an idiot in conversation compared to its problem-solving skill. I would have expected a completely opposite trajectory for AI; e.g. it's hard to get it to ask good clarifying questions. There are a lot of cheap tricks in prompting that work, like asking it to act as an expert, or prompting for chain of thought.


I still think LLMs are a game changer. They are seemingly amazing at deducing meaning and intent from natural language. That's enough to be a game changer. Is it a game changer like the internet is/will-be? No. Is it on the level of the television or the telephone? Probably not. But you can still leverage it in many ways to add value or reduce human work.

If we're talking about LLMs as general purpose AI or a tool to replace programmers... mostly a fail so far. LLMs seem like they could eventually be a component of something bigger, but I don't see a line between what they are and general purpose AI.


> Isn't the fundamental problem that LLMs don't actually understand anything (as greater concepts), but rather operate as complex probability machines?

I'm genuinely unsure if my own brain is any different.


This probability thing might be a red herring. I reckon it is just that softmax/cross-entropy loss is a handy tool for getting words out of the NN (and now transformers).
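
For reference, the standard definitions being pointed at (textbook, not specific to this thread): the final layer's logits z become token probabilities via softmax, and training minimizes the cross-entropy of the observed next token y:

    p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad \mathcal{L} = -\log p_{y}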


This is a great set of observations.

The only thing missing is an analysis of how much power it takes to accomplish each of these tasks. If ChatGPT-4 is at about 1:1 in terms of “effectiveness”, all that remains is to divide by the amount of power required to reach the answer using ChatGPT-4 vs. by conventional means. If it requires significantly more energy, then it’s a waste, and because of climate change we should really not pursue it further, IMO.


This is a good point. There's a hidden cost (energy consumption) which we will eventually pay for.

However, even in the 1:1 case it means I am training myself to become a "prompt engineer" rather than an actual thinker and problem solver. As long as there will always be another system for me to depend on, maybe that's ok. But just as people who never learned to read maps and navigate without GPS tend to be very confused and lost when their phone dies, I would like to be able to be a useful human even when the power is out.


I totally agree, although I think it's even worse than that.

In my opinion, a precondition for creativity and inventiveness is understanding. If you rely on a surrogate to give you answers, you will never reach the level of proficiency required to come up with something new. If we train a generation of thinkers to rely on an external brain to get anything done, they will only understand things superficially, and our ability to innovate at the society level will suffer.


Very well said, this has been precisely my experience too


Only you, a human developer can be truly creative. An LLM can only ever reproduce what it has seen before.


> An LLM can only ever reproduce what it has seen before.
I don't believe it. How do you explain Midjourney? The art that it produces is incredible by any measurement.


Where, exactly, did the LLM see this epilogue to The Great Gatsby before?

https://twitter.com/tsimonite/status/1653065940463157248


I responded 'way to not getsby the point'.

How is that in ANY WAY an epilogue to The Great Gatsby? This is exactly the problem. That original story builds with a series of revelations into the conclusion 'so we beat on, boats against the current, borne back ceaselessly into our past': establishing a PURPOSE, perhaps a bleak and unwelcome one. Fitzgerald's revealing an insight into the delusions of humanity. He's picturing even the greatest of us as surfers on the river of nihilism and reality. Our aspirations sparkle prettily… and are gone, like froth in the rapids.

And this is just a little bit beautiful. We can imagine beyond our reach. That's a human thing. The fact that our minds can cling so longingly to something that is simply not real, is kind of wonderful. Everyone's their own little world of unreality, and we aspire so earnestly (much like these AI folks do).

And then the AI, having drunk up all of Fitzgerald and everybody else, 'continues' past the point. With what? Hardly matters. It has no point to make. I'd be impressed if it refused, said 'nope, that was where it ended. Can't add anything worth adding, try reading it again'. But no, because the LLM has no intention and doesn't successfully get one from what it's 'read'.

It's constructed an epilogue out of nebulous religious feelgoodism in the rough style of Fitzgerald's sentence construction, and undermined the whole conclusion of the story… and not maliciously, for that would require intent. Nope… it sort of ambled on, going 'what would feel nice here? ok, now what seems like it would go with this sort of thing? ok, something else, let's have more, what kind of concepts go here? what do people normally say when they talk in this way?'

In so doing, it's less than Gatsby and way less than Fitzgerald. There is nothing here in this 'epilogue'.

GPT4 criticised the 'epilogue', in a rather fascinating way! https://pastebin.com/B0zxbvNv

It successfully works out some of the problems with the first AI's writing, and yet it too fails to get the idea expressed by Fitzgerald, and rather than pivot to religious feelgoodism, it pressures the first AI to instead emphasize how its narrator and Gatsby shared a special bond, the very special specialness of being "the only one who understood what it meant to be young and restless in this restless world."

A lot of people can write that idea, and in fact a lot of people did and that's why GPT4 found it a probable argument to make.

Fitzgerald gave us a moment of viscerally grokking that it doesn't mean s*t… and yet, we will still paddle against the current of time and decay and collapse, because what else can we do? Tomorrow we'll get it. Tomorrow we'll really understand and it'll all make sense.

And so…


In so doing, it's less than Gatsby and way less than Fitzgerald. There is nothing here in this 'epilogue'.

"This Commodore 64 is useless. It can't even run Crysis."

Snark aside, I didn't ask if the epilogue was any good or not, I asked where it came from. It came from our collective consciousness as embedded in the language model. It turns out that what we've been dismissing as mere "language" is an insanely powerful thing, maybe the only thing.

We're still in the first publicly-visible generation of LLM technology, and the model that generated the epilogue was already behind the leading edge in many respects. Anyone who's not blown away by this is whistling past the graveyard. Computers are now doing what we do. Yes, they still kinda suck at it. But they will get better at it much faster than we will.

I mean, really. What is a human author thinking, if not, "What would feel nice here? OK, now what seems like it would go with this sort of thing? OK, something else, let's have more, what kind of concepts go here? What do people normally say when they talk in this way?" It's been understood since the Greek classical period that there is only a finite amount of ore in the original-story mine. Everything after the first seven or so basic ideas is just implementation.

I have to wonder what you'd think of Anthony Burgess's epilogue to A Clockwork Orange, the one that the book's American publisher and Kubrick both chose to leave on the proverbial cutting-room floor. The one where Alex grew out of his rebel phase, got a job, and started a family. This epilogue reminds me of that, somehow. I could easily see Burgess's final chapter emerging fully-formed from an LLM in the not-too-distant future.


> An LLM can only ever reproduce what it has seen before.

Anyone who's played around with these models knows that at least some generalization is taking place.


So generalization counts as creativity in your book?


Yes, generalization is creativity. I don't think there's a difference between the two concepts.

Humans don't come up with ideas out of nowhere. Open a novel and you will find that even though the overall work is unique, it is composed of tropes from other literature and experiences from the author's life which have been generalized into another context.


Inductive reasoning, or the process of inferring from the specific to the general, is kind of a big deal. You can't do that without a semantic model. You can't do it well without a good semantic model... even if it's not something that you or I would recognize as any kind of semantic model at all.


As if anyone is good at predicting the future. Please can we stop acting like expertise equates to fortune telling capabilities?! Nobody has any clue what a 1000x sized GPT model could do, and anybody who makes strong claims is a charlatan. In this age of paranoid AI risk cultists we need to cultivate humility and calm, a willingness to follow data rather than beliefs and predictions.


> paranoid AI risk cultists

There is broad consensus among experts that a hypothetical strong AI would be a threat, and potentially an existential threat, to humanity. While not everyone agrees on details like timeline and alignment issues, the idea that AI is dangerous is not a cult, it's the mainstream view.

Climate scientists cannot "predict the future" with certainty either. That doesn't mean their warnings are hot air, and neither are the warnings from AI safety experts. It seems like the educated masses are currently in denial about AI in much the same way as the uneducated masses have been in denial about climate change for a while.

Risk assessment doesn't require understanding. I don't have to understand how a venomous snake senses prey in order to know that the snake is a potential threat to me. In fact, the less I know about the snake, the higher the assessed risk should be, since the uncertainty is higher as well.


What nonsense. I've spent over a decade 100% focused on AI, and the broad consensus among everyone I've worked with is not to be that concerned at all. The only consensus is that a small group of self-proclaimed experts make a lot of noise, because they get lots of press coverage if they scream and shout, making predictions based on zero scientific evidence.

We can understand the physics of greenhouse gases and take measurements of earth systems to build evidence for models and theories. (Many of which are nonetheless very inaccurate beyond short time horizons.) Show me any evidence for AI risk today beyond people's theories and beliefs?

The best predictor of the future is the past, not people's wild ideas about what the future could be. I'm not about to sit here feeling scared just because there is uncertainty about whether our matrix multiplies are about to go rogue. There are no AGI experts or AI risk experts, because we don't have any of these systems to study and analyze. What we have is people forming beliefs about their own predictions about systems which are unknowable.


> Show me any evidence for AI risk today beyond people's theories and beliefs?

Deduction. Empirical evidence isn't the only source of insight. You don't have to conduct experiments in order to reasonably conclude that an entity that

1. outperforms humans at mental tasks

2. shares no evolutionary commonality with humans

3. does not necessarily have any goals that align with those of humans

is a potential threat to humans. This follows from very basic deductive analysis.

> There are no AGI experts or AI risk experts, because we don't have any of these systems to study and analyze.

Indeed. Which increases the risk. Unless you are claiming that AGI is actually impossible, the fact that its properties and behavior cannot be studied should make people even more worried.

Uncertainty and lack of knowledge are what risk is. How little we know about potential AGI is exactly why AGI represents such a big risk. If we completely understood it and were able to make reliable predictions, there would be zero risk by definition.


1) Computers, smart phones, and pocket calculators also outperform humans at mental tasks. So do birds, dolphins, and dogs for that matter, at tasks for which they are specialized.

2) so? What are you imagining this implies? An infinity of possibilities does not a reason make, unless you are talking about arbitrary religious beliefs.

3) Right, no goals, no will, no purpose. Just some matrix multiplies doing interesting things.

Deduction requires a premise which then leads to another premise or a conclusion due to accepted facts or reasons. I'm genuinely curious why you think any of these properties automatically implies danger?

The future is uncertain. The stock market, the economy, your health, your friendships and romances, are all unpredictable and uncertain. Uncertainty is not a reason to freak out, although it might encourage us to find ways to become adaptable, anti-fragile, and wise. I think AI will help us improve in these dimensions because it is already proving that it can with real evidence, not beliefs.


It seems a safe prediction if you extrapolate that AI will get generally smarter than humans.

It also seems a safe prediction, given past human behaviour that some humans will set some AI to do bad stuff.

Therefore risk.

(eg "chat gtp 27, help me make billions on crypto and use it to set up a distributed army to take over the world")


> chat gtp 27, help me make billions on crypto and use it to set up a distributed army to take over the world

Yevgeny Prigozhin has entered the chat.


the world is incredibly filled with risk to humans—people in the AI doomer camp are making a claim that AI potentially is a new kind of uncontrollable risk that warrants extraordinary regulation

the basis of this claim seems to be a confusion of logical or deductive reasoning with inductive or observational reasoning

argument comes down to

- it’s possible to imagine a super intelligent machine that has properties that will kill everyone (this is an exercise in logical reasoning)

- since it’s possible to imagine it, this means it will come into existence — this is an error, because things that come to exist in the real, physical world do so through physical processes, which we can only reason about inductively

generally, there is a long series of steps between the imagining of some constructed, complex machine and its realization; along with its conceptual foundations, it requires sustained effort, trial and error, maintenance, and generally a serious fight against entropy to make it function and keep it functioning

the sort of out of control AI imagined by AI doomers is not something we’ve seen before

so we shouldn’t make costly decisions based upon this confusion of reasoning


> since it’s possible to imagine it, this means it will come into existence

Nope. That's not the argument. In fact, it's such a bad take that it reeks of a deliberately constructed strawman.

The actual argument is: Since it's possible to imagine it, and doesn't contradict any known laws of nature or technology, and current development appears to be iterating towards it, it might come into existence, thus it presents a statistical risk.

When I take out tornado insurance, it's not because I know my house will be blown away by a storm – it's because I don't know, but the possibility is there.

Certainty is not required in order to conclude that risk exists. Quite the opposite is true: Risk is a function of uncertainty.


The word “entity” is doing some quiet but heavy lifting here. I think it would be a good idea to specify what you really mean by this term, and how we can logically deduce the development of such a thing from existing technology (deep learning).


>I've spent over a decade 100% focused on AI, and the broad consensus among everyone I've worked with is not to be that concerned at all.

I also work in AI and I don't mention my concerns to colleagues who are so anti-AI risk as you. Perhaps your ideas about your colleagues' views are distorted.


IMO it is really just paranoid AI risk cultists theater for narcissists.

The more narcissistic types have figured out it is their moment in the sun to see their name in the paper, and the more they play up the idea that AI is going to eat us, the more attention they will get from the media.

The whole idea is so irrational that I fail to see what other explanation there really is.

The other guilty party are the masses that have been trained to think in terms of appeal to authority instead of using their own brains. They have created the audience for this theater.


I don't think it's wise to just give this one the climate change treatment, that is not listening to the scientists and not taking action or taking it seriously until it's a catastrophe.


"Expertise" in a speculative concept like AI risk is not remotely comparable to expertise in a scientific field like climate change.

There are two definitions of expertise:

1. Knowing more than most people about a topic. This is the type of expertise that wins the Quiz Bowl.

2. Actual mastery of a field, such that predictions and analyses generated by a person possessing such mastery are reliable. This is the type of expertise that fixes your home or car.

The first definition is easily verifiable, and due to the availability heuristic, it is often presented as a legitimate proxy for the second. But it isn't really, not in general.

If I know more about horoscopes than most people, I am a horoscope expert. But it doesn't mean I can be relied on to predict any of the things horoscopes supposedly predict. It's the same with AI risk. Expertise in AI risk is not a basis for credibility because AI risk is not a real scientific field.

Climate change is a real field of science. AI risk is Nostradamic prognostication by people who know more than you.


> Climate change is a real field of science. AI risk is Nostradamic prognostication by people who know more than you.

Any prediction of the future is necessarily based on modeling and extrapolation.

Five years ago AIs couldn't pass a third-grade reading comprehension test. Today they pass in the top 10% of law, medical, and engineering exams for human professionals.

It is absolutely possible to extrapolate from such developments, and doing so is scientific, not "Nostradamic". Many predictions of the potential impact of climate change also include speculative elements, such as societal effects, migration patterns, conflicts, etc., which cannot be modeled or forecast with any real certainty. That doesn't make them unscientific.


It's possible to extrapolate from a horoscope too, it's just not that useful. Let's talk about the massive difference in the extrapolation being done here, climate change vs AI.

In climate change, we are analyzing historical climate data using weather models representing known physical processes. We try to predict the data using these models, and we are only able to do so if we include the forcing from greenhouse gases. From this we can constrain the range of impacts these gases could be having on temperature and forecast likely futures. The forecasts are heavily informed by a thoroughly validated base of prior knowledge, not just drawing lines through a log log plot.

None of this has any counterparts in AI. We don't understand AI systems to anywhere near the level that physics affords understanding of physical systems. We don't even understand them at a Moore's Law level, where you can at least know what engineering innovations are in the pipeline and how far they could plausibly go. Predicting the sophistication of future AI is just Nostradamic prognostication.

Yann LeCun recently gave a presentation arguing that LLMs are a dead end and proposing a completely different approach. His arguments were extremely heuristic and unconvincing, but this at least shows that both sides have bigwigs with unconvincing heuristic arguments.


> a hypothetical strong AI would be a threat, and potentially an existential threat, to humanity.

I wish a cool scifi robot woke up one day and violently optimized all of humanity into paperclips, instead I live in the real world where the jobs are going to evaporate like water in a newly installed desert and the "let them eat cake" will get increasingly louder and blue-check-markier.


Warning people about potential extreme risks from advanced AI does not make you a cultist. It makes you a realist.

I love GPT and my whole life and plans are based on AI tools like it. But that doesn't mean that if you make it say 50% smarter and 50 times faster that it can't cause problems for people. Because all it takes is systems with superior reasoning capability to be given an overly broad goal.

In less than five years, these models may be thinking dozens of times faster than any human. Human input or activities will appear to be mostly frozen to them. The only way to keep up will be deploying your own models.

So to effectively lose control you don't need the models to "wake up" and become living simulations of people or anything. You just need them to get somewhat smarter and much faster.

We have to expect them to get much, much faster. The models, software, and hardware for this specific application all have room for improvement. And there will be new paradigms/approaches that are even more efficient for this application.

For hyperspeed AI to not come about would be a total break from computing history.


A realist is someone who accepts reality as it is, not as they might be able to anxiously envision that it could be. Life is too short and attention too precious to fill the meme space with every dreamer's deepest concerns. None of these dramatic X-risk claims is based on anything but beliefs and conjecture.

"Thinking dozens of times faster?" What do you even mean? These are models executing matrix multiplies billions of times faster than our brains propagate information, and they represent knowledge in a manner which is unique and different from human brains. They have no goals, no will, and no inner experience of us being frozen or fast or anything else. We are so prone to anthropomorphize willy-nilly. We evolved in a paradigm of resource competition so we have drives and impulses to protect, defend, devour, etc., of which AI models have zero.

Anyone who has investigated reinforcement learning knows that we are currently far away from understanding let alone implementing systems which can effectively deconstruct abstract goals into concrete sub-tasks, yet people are soooo sure that these models are somehow going to all of a sudden be an enormous risk. Why don't we wait until there is even the slightest glimmer of evidence before listening to these prophets of doom?

This pseudo-intellectual belief structure is very cult-like. It's an end-of-the-world scenario that only an elite few can really understand, and they, our saviors, our band of reluctant nerd heroes, are screaming from the pulpit to warn us of utter destruction. The actual end of days. These "black box" (er, I mean, we engineered them that way after decades of research, but no, nobody really understands them, right?) shoggoths will be so incredibly brilliant that they will be able to dominate all of humanity. They will understand humans so well as to manipulate us out of existence, yet they will be so utterly stupid as to pursue paper clips at all cost.

Maybe instead these models will just be really useful software tools to compress knowledge and make it available to humanity in myriad forms to develop a next level of civilization on top of? People will become more educated and wise, the cost of goods and services will drop dramatically, thereby enriching all of humanity, and life will go on. There are straighter paths from where we are today to this set of predictions than there are to many of the doomsday scenarios, yet it has become hip among the intelligentsia to be concerned about everything. Being optimistic is somehow not real (although the progress of civilization serves as great evidence that optimism is indeed rational), while being a loud-mouthed scaremonger or a quiet, very serious and concerned intellectual is seen as respectable. Forget that. All the doomers can go rot in their depressive caves while the rest of us build a bad ass future for all of humanity. Once Hale-Bopp has passed over I hope everyone feels welcome to come back to the party.


Let's try to rewrite this in a somewhat more dispassionate style:

A pragmatic perspective requires one to accept the present reality as it is, rather than hypothesize an exaggerated potential of what could be. Not all concerns surrounding existential risks in technology are necessarily grounded in empirical evidence. When it comes to artificial intelligence, for instance, current models operate at a speed vastly superior to human cognition. However, this does not equate to sentient consciousness or personal motivation. The projection of human traits onto these models may be misplaced, as AI systems do not possess inherently human drives or desires.

Many misconceptions about reinforcement learning and its capabilities abound. The development of systems that can translate abstract objectives into detailed subtasks remains a distant prospect. There seems to be a pervasive certainty about the risks associated with these models, yet concrete evidence of such dangers is still wanting.

This belief system, one might argue, shares certain characteristics with a doomsday cult. There is a narrative that portrays a small group of technologists as our only defense against a looming, catastrophic end. These artificial intelligence models, which were engineered after extensive research, are often misinterpreted as inscrutable entities capable of outsmarting and eradicating humanity, while simultaneously being so simplistic as to obsess over trivial tasks.

Alternatively, these AI models could be viewed as valuable tools for knowledge compression and distribution, enabling the advancement of civilization. As a result, societal education levels could improve, and the cost of goods and services might decrease, which could potentially enrich human life on a global scale. While there seems to be a tendency to worry about every potential hazard, optimism about the future is not unfounded given the trajectory of human progress.

There are certainly different perspectives on this issue. Some adhere to a more fatalistic viewpoint, while others are working towards a brighter future for humanity. Regardless, once the present fears subside, everyone is invited to participate in shaping our collective future.


Hahaha, thanks ChatGPT! This is better said than my snarky, frustrated at the FUD version, and I can learn from the approach.


No, it's really not, because your riff on 'shoggoths that are both so brilliant as to be dangerous, yet so stupid that they maximize paperclips' touches on an important point that the summarized version completely omits.

AI is exactly that kind of stupid. What it lacks isn't 'brilliance' but intentionality. It can do all sorts of rhetorical party tricks, including those that are good at influencing humans, it can even very likely work out WHICH lines of argument are good at influencing humans from context, and yet it has no intentionality. It's wholly incapable of thinking 'wait, I'm making people turn the world to paperclips. This is stupid'.

So it IS likely to turn its skills to paperclip maximization, or any other hopelessly quixotic and destructive pursuit. It just needs a stupid person to ask it to do that… and we're not short of stupid people.

So what you said was better, snark and all :)


Not sure you read my comment carefully enough. I am an optimist. I do believe that AI can and probably will be a positive and transformative force.

But I also think it's more anticipatory than speculative to envision AI systems (quite possibly on the request of a human faction) taking control.

And GPT-4 absolutely does do abstract reasoning and subgoals. No it doesn't have many other capabilities or characteristics of humans or other animals but as I said it doesn't need those to be dangerous.

We need to prohibit the manufacture or design of AI hardware that has performance beyond a certain level. It is not too early to start talking about a risk that could end humanity. I do hope that we can get away with something a few orders of magnitude better than what we have today, but it's really just asking for trouble the more we optimize it, and we may be walking a fine line within a decade or so. Or less. It takes years to design hardware and get manufacturing online, especially for new approaches.

And two orders of magnitude faster may be only a few years away.


> As if anyone is good at predicting the future.

Whoever predicts the right direction, (and when the time is right) puts money where their mouth is, stands a shot at unseating... the alt man.

  I think the way to use these big ideas is not to try to identify a precise point in the future and then ask yourself how to get from here to there, like the popular image of a visionary. You'll be better off if you operate like Columbus and just head in a general westerly direction. Don't try to construct the future like a building, because your current blueprint is almost certainly mistaken. Start with something you know works, and when you expand, expand westward.
  
  The popular image of the visionary is someone with a clear view of the future, but empirically it may be better to have a blurry one.
paulgraham.com/ambitious.html


Sir, this is a discussion forum...


We need a way to make tight little specialist models that don't hallucinate and reliably report when they don't know. Trying to cram all of the web into an LLM is a dead end.


How much general "thinking"[0] would you want those "tight little specialist models" to retain? I think that cramming "all of the web" is actually crucial for this capability[1], so at least with LLM-style models, you likely can't avoid it. The text in the training data set doesn't encode just the object-level knowledge, but indirectly also higher-level, cross-domain and general concepts; cutting down on the size and breadth of the training data may cause the network to lose the ability to "understand"[0].

--

[0] - Or "something very convincingly pretending to think by parroting stuff back", if you're closer to the "stochastic parrot" view.

[1] - Per my hand-wavy hypothesis that the bulk of what we call thinking boils down to proximity search in extremely high-dimensional space.
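
For what it's worth, a toy sketch of what that "proximity search in extremely high-dimensional space" could look like mechanically. The vectors here are random placeholders standing in for real embeddings, purely to illustrate the hypothesis:

  import numpy as np

  rng = np.random.default_rng(0)
  concepts = ["gravity", "baking", "recursion", "metaphor"]
  vectors = rng.normal(size=(len(concepts), 512))           # random stand-in "embeddings"
  vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

  def nearest(query_vec):
      query_vec = query_vec / np.linalg.norm(query_vec)
      return concepts[int(np.argmax(vectors @ query_vec))]  # highest cosine similarity

  # A slightly perturbed copy of "recursion"'s vector maps back to "recursion".
  print(nearest(vectors[2] + 0.1 * rng.normal(size=512)))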


Even just for spell checking, having a ton of general knowledge helps. Knowing the correct spelling of product names, British vs American English, context-specific alternate spellings, etc…

Even GPT 3.5 has trouble following instructions, but I’ve found that GPT 4 is almost flawless. I can tell it the document uses Australian English but to preserve US spelling for product names and it’ll do it!

One quirk is that it’s almost too good at following instructions. You have to tell it to preserve product names, vendors names, place names, etc… otherwise it’ll “correct” the spelling of anything you forgot to list.


Can you not just tell it not to muck with proper nouns?


You can, but then it won't correct misspellings of proper nouns!

The idea is for it to automatically detect the "language" of each word based on the context and its own understanding of the world.

E.g., the following sentence:

"We deployed windows server data centre 2022 into our data center, which has no windows for physical security."

Will be corrected by GPT-4 to the following:

"We deployed Windows Server Datacenter 2022 into our data centre, which has no windows for physical security."

Notice that it combined "data" and "centre" into "Datacenter" and it corrected the second "center" into "centre", which is the British/Australian spelling of the word. It also correctly capitalised only the first use of the word "windows", etc...

That requires a level of understanding that GPT 3.5 just barely has, and no ordinary grammar checker tool has.
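
For anyone curious, a rough sketch of the kind of call being described, using the openai Python package's ChatCompletion interface (as of mid-2023). The prompt wording and model name are only illustrative, not a recipe from the commenter:

  import openai  # assumes openai.api_key is configured

  PROMPT = (
      "Correct the spelling and grammar of the following text using Australian "
      "English, but preserve the original spelling of product names, vendor "
      "names, and place names. Return only the corrected text."
  )

  def correct(text):
      response = openai.ChatCompletion.create(
          model="gpt-4",          # model name is illustrative
          temperature=0,
          messages=[
              {"role": "system", "content": PROMPT},
              {"role": "user", "content": text},
          ],
      )
      return response.choices[0].message.content

  print(correct("We deployed windows server data centre 2022 into our data center."))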


There are boundaries that can be defined, but I'm unclear on what those should be. I'm thinking breaking it down by semantic domains, but not to the exclusion of others altogether-- just heavily weight certain domains over others. Kinda like how StackOverflow is broken down by domain-- each subdomain could be a LoRA or something.
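
A hedged sketch of that per-domain adapter idea, assuming the Hugging Face transformers and peft libraries; the base model name and adapter paths are placeholders, not a working recipe:

  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

  BASE = "some-base-model"  # placeholder: whatever shared base model is used
  base_model = AutoModelForCausalLM.from_pretrained(BASE)
  tokenizer = AutoTokenizer.from_pretrained(BASE)

  # Hypothetical per-domain LoRA adapters, each trained on its own corpus.
  ADAPTERS = {
      "powershell": "adapters/powershell-lora",
      "literature": "adapters/literature-lora",
  }

  def load_specialist(domain):
      # Attach the small domain adapter to the shared base weights.
      return PeftModel.from_pretrained(base_model, ADAPTERS[domain])

  model = load_specialist("powershell")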

General purpose models containing significant overlap between Project Gutenberg and Github are unnecessary and don't scale. Moby Dick has little to do with C++ unless you're creating art for novelty's sake. This is entirely speculative, but I'm convinced ChatGPT is faking the appearance of a single oracle while delegating requests to specialized models under the hood. It scales better and makes sense than trying to serve a 1T model to address everybody's banal questions.

Like, at its core, for people who only want to write literature, give them a model with underweighted programming-related corpora. Writers don't need it, will never use it, and that space could be filled with training content relevant to literature. Anything else results in expensive, unscalable solutions or jack-of-all-trades, master-of-none outcomes.

In recent usage, GPT3.5 helped me hack my way through writing Pester tests for Powershell scripts for the first time, and I mean hack-- there were a lot of assumptions it made and things it got wrong. GPT4 did a much better job, but I couldn't help but think 3.5 probably has a ton of other training data in it that detracts from the specialization I needed from it in that context. For coding help, you don't want to ask some random librarian who occasionally recommends resources that don't exist; you ask someone who specializes in coding and trust they have familiarity with that domain.


This is basically my view as an extensive GPT-4 user. It doesn't think/is very stupid, but has such fantastic recall for common patterns that it's quite useful. Absolutely none of the smaller models even come close, including GPT-3.5.


I think you two are both right in the most obvious of ways... it needs to be trained on a ton of data so it can comprehend better, and when it needs to fact-check or search for anything concrete, it should hit up domain experts and databases.


They can't reason yet, but they can extrapolate; they just can't check whether their extrapolations are reasonable. Reasoning is not baked into the architecture of a GPT model. It seems it would require an entirely different type of model.


And that is exactly why the current generation of AI is the same as all the other generations. It is a bruteforce attempt to solve something that shouldn't need brute force.


I used to believe that, but I'm no longer convinced. At least when you're trying to approximate the way humans think, and especially human language, there may not be a simpler way - our brains are themselves a product of randomness, and are unlikely to factor nicely into theoretically clean components. It might be computationally cheaper to repeat the process than trying to figure it out and encode in analytical form.


> and reliably report when they don't know.

Then we need a new system, because LMs, no matter if they are large or not, cannot do that, for a very simple reason:

A LM doesn't understand "truthfulness". It has no concept of a sequence being true or not, only of a sequence being probable.

And that probability cannot work as a stand-in for truthfulness, because the LM doesn't produce improbable sequences to begin with: its output will always be the most probable sequence (within the bounds of the temperature setting). The LM simply has no way of knowing whether the sequence it just predicted is grounded in reality or not.
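
To make that concrete, a toy sketch of the sampling step in question: the model exposes scores over candidate next tokens, temperature reshapes them, and sampling picks one; nothing in this step touches truth. The numbers are invented:

  import numpy as np

  def sample_next(logits, temperature=1.0):
      # Scale scores by temperature, softmax into probabilities, sample one token.
      scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
      probs = np.exp(scaled - scaled.max())
      probs /= probs.sum()
      return int(np.random.choice(len(probs), p=probs))

  logits = [2.0, 1.5, 0.1, -3.0]   # invented scores for four candidate tokens
  print(sample_next(logits, temperature=0.7))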


> A LM doesn't understand "truthfulness". It has no concept of a sequence being true or not, only of a sequence being probable.

I claim that the human brain doesn't understand "truthfulness" either. It merely creates the impression that understanding is taking place, by adapting to social and environmental pressures. The brain has no "concepts" at all, it just generates output based on its input, its internal wiring, and a variety of essentially random factors, quite analogous to how LLMs operate.

Do you have any evidence that contradicts that claim?


> Do you have any evidence that contradicts that claim?

Empirical evidence? Yes I do.

The brain commands an entity that has to exist and function in the context of objective reality. Being unable to verify its internal state against that would have been negatively selected some time ago, because stating "I'm sure that rumbling cave bear with those big sharp teeth is a peaceful herbivore" won't change the objective reality that the caveman is about to become dinner.

How that works in detail is, to the best of my knowledge, still the subject of research in the realm of neurobiology.


Wouldn't that type of response fit in with how LMs work though? That caveman likely learned a lot of things over time, like: large animals are more likely to end your life than small ones, animals making loud noises are likely more dangerous, sharp teeth/claws are dangerous, or I saw one of those kill another caveman. All of those things tilt the probability of associating that loud cave bear with a high risk of death. That doesn't mean there's some inherent 'truth' that the caveman brain 'knows', it's just a high probability that it's a correct assessment of the input. Every true thing is really just an evaluation of probability in the end.


I think this is incomplete on a number of levels. For a start, to be interesting, “truth” has to be something more than just whatever your eyes can see. There have been wars (cultural, economic, kinetic, etc.) fought to define something as a truth.

The concept of truth is notoriously hard for humans to grapple with. How do we know something is true isn’t just a neurobiological question, it’s been grappled with throughout the history of philosophy — including major revisions of our understanding in the past 80 years.

And for the record, rumbling cave bears are mostly peaceful herbivores.


> And for the record, rumbling cave bears are mostly peaceful herbivores.

For the record, all members of the Genus Ursus belong to the Order Carnivora, which literally translates to "Meat Eaters". And that includes Ursus spelaeus, aka. the Cave Bear.

And while it most likely, like many modern bears, was an Omnivore, that "Omni" very much included small, hairless monkey-esque creatures with no natural defenses other than ridiculously small teeth and pathetic excuses for claws, if they happened to stumble into their cave.

> The concept of truth is notoriously hard for humans to grapple with.

I am not talking about the philosophical questions of what truth is as a concept, nor am I talking about the many capabilities of humans to purposefully reshape others' perceptions of truth for their own ends.

I am talking about truth as the observable state of the objective reality, aka. the Universe we exist in and interact with. A meter is longer than a centimeter, and boiling water is warmer than frozen water at the same pressure, whether any given philosophy or fabrication agrees with that or not, is irrelevant.


That's speculation, not evidence. The traits you describe aren't demonstrably incompatible with the mechanism I proposed.


It's empirical evidence, since we exist and are very much capable of selecting the correct statement from a bunch of stochastically likely, but untruthful, statements about objective reality.


I find these takes so lazy. What you have claimed here is just totally wrong.


And you don't have a shred of actual evidence to demonstrate that, only your own preconceptions about how things supposedly are.


The burden is on you to prove your claims.


I'm not trying to demonstrate that my claims are true. I'm trying to demonstrate that it is meaningless to discuss these topics in the first place, because we don't understand the workings of the mind nearly well enough to distinguish things like "truthfulness" and "concepts".


The fact that we don't know exactly how our brains work, doesn't mean we cannot observe the results of their work.

And as I have demonstrated above, humans, and for that matter other species on this planet featuring capable brains like Corvidae or Cetaceans, do in fact have a concept of truth: They are capable of recognizing false or misleading information as being incongruous with objective reality: A raven that sees me putting food into my left hand, will not jump to a patch of ground where I pretend to put food with my right hand.

This is despite the fact that my actions of "hiding the food" with the empty hand are stochastically indistinguishable from the action of actually hiding food with my left hand.


Not we. You. Do not foist your ignorance on others. I recommend Foundations of Neuroscience by Henley.


If this is true, why is GPT-4 better in that regard than GPT-3.5? Or why do questions about Python yield far fewer hallucinations than questions about Rust or other less popular tech?


What specifically about these observations contradicts my statement?

Wrong statements about Python are simply less probable than wrong statements about Rust, since there is more Python than Rust in the training data.

That changes exactly nothing about the fact that the system isn't able to detect when it makes a blunder in Python.


You've claimed that LLMs create most probable output, which does not necessarily align with truth. So a bigger LLM will be better at creating the most probable output, but that would not translate into being more truthful. That could be interpreted as "better LLMs are expected to be better bullshitters".

That is not what we've observed though. Quite the opposite - we're seeing that the bigger an LLM is and the more domain-specific material it has digested, the more truthful it becomes.

Yes it can still make an error and be unable to spot it, but so can I.


> You've claimed that LLMs create most probable output, which does not necessarily align with truth.

No, that is not my claim. That is part of the explanation for it.

My claim is this: An LLM is incapable of knowing when it produces false information, as it simply doesn't have a concept of "truthfulness". It deals in probabilities, not alignment with objective reality.

And it doesn't matter how big you make them...this fact cannot change, as it is rooted in the basic MO of language models.

So, now that we have covered what my claim actually is...

> That is not what we've observed though. Quite the opposite - we're seeing that the bigger LLM is and the more domain-specific material it digested, the more truthful it becomes.

...I can ask what this observation has to do with it, and the answer is: Nothing at all. LMs with more params may produce untruthful statements less often, but what does this change about their ability to recognize when they do produce them? And the answer is: Nothing. They still can't.


Your claim was literally proven wrong in a previous comment with GPT-4's calibration.

An LLM can indeed know when it produces likely incorrect responses. Not a hypothetical.

What's the point of making claims you have no intention of rescinding regardless of evidence? People are so funny.


>The LM simply has no way of knowing whether the sequence it just predicted is grounded in reality or not.

Base GPT-4 was excellently calibrated. So this is just wrong.

https://imgur.com/a/3gYel9r


What's needed, ideally, is a checker. Something that takes the LLM's output, can go back to the training material, and verify the output for consistency with it.

I don't think those steps are out of the bounds of possibility, really.
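
One hedged sketch of what such a checker could look like: retrieve the source passages closest to the claim, then ask a second model whether they support it. Here embed() and ask_model() are hypothetical stand-ins for an embedding function and an LLM call, not an established recipe:

  import numpy as np

  def cosine(a, b):
      return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

  def check_claim(claim, corpus, embed, ask_model, k=3):
      # Retrieve the k source passages most similar to the claim...
      claim_vec = embed(claim)
      ranked = sorted(corpus, key=lambda doc: cosine(embed(doc), claim_vec), reverse=True)
      evidence = "\n".join(ranked[:k])
      # ...then ask a second model whether the claim is supported by them.
      prompt = (
          "Evidence:\n" + evidence + "\n\nClaim:\n" + claim +
          "\n\nAnswer SUPPORTED, CONTRADICTED, or NOT FOUND."
      )
      return ask_model(prompt)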


> Something that takes the LLM's output, can go back to the training material, and verify the output for consistency with it.

The problem is what you mean when you say "consistency".

The LM checks if sequences are stochastically consistent with other sequences in the training data. Within that realm, the sentence: "In the Water Wars of 1999, the Antarctic Coalition's armada of hovercraft valiantly fought in the battle of Golehim under Rear Admiral Korakow, against the Trade Union's fleets." is consistent. Because, while it is total bollocks, it looks stochastically like something that could be in a historical text.

So, in its context, the LM does exactly what you ask for. It produces output that is consistent with the training data.

Truthfulness is a completely different form of consistency: Does the semantic meaning of the data support the statement I just made? Of course it doesn't; there isn't an Antarctic Coalition, there were no Water Wars in 1999, and no one ever built an armada of hovercraft for any war against a "Trade Union Fleet".

But to know that, one has to understand what the data means semantically. And our current AIs ... well, don't.


>But to know that, one has to understand what the data means semantically. And our current AIs ... well, don't.

Another wrong statement, you're on a roll today.

https://arxiv.org/abs/2305.11169

https://arxiv.org/abs/2306.12672

There's a word we would use to describe your confidently erroneous statements were they among the outputs of an LLM. Wonder what that might be..


Yeah, I don't mean stochastically consistent. Semantically consistent. The job of generating content from text and the job of assessing whether two texts represent aligned concepts are two different jobs, and I wouldn't expect a single LLM to do both within itself. That's why you want a second checker.


An llm can detect when an llm has outputted an inconsistent world.

It can reason. To an extent.


> that don't hallucinate

“Hallucination” is part of thought. Solving a new problem requires hallucinating new, non existing, possible outcomes and solutions, to find one that will work. It seems that eliminating the ability to interpolate and extrapolate (hallucinations) would make intelligence impossible. It would eliminate creativity, tying together new concepts, creation, etc.

Is the goal AI, or a nice database front end, to reference facts? Is intelligence facts, or is it the flexibility and the ability to handle and create the novel, things that are new?

The ability to have confidence, and know and respond to it, seems important, but that’s surely different than the elimination of hallucinations.

I’m probably misunderstanding something, and/or don’t know what I’m talking about.


The problem is, what we call "hallucinating" in LMs isn't a way of creative thinking and coming up with novel solutions. It also has nothing to do with "interpolate and extrapolate".

It's simply when the predicted probable sequence isn't grounded in reality.

When I ask an LLM to summarize the great water wars of 1999, and how the Trade Union was ultimately defeated by the Antarctic Coalition's hovercraft fleet under Vice Admiral Zagalow, it isn't "extrapolating" from knowledge of history, it is simply inventing a load of bollocks. But that bollocks will be dressed in fine language and probably mixed in with plausible-sounding references that have a somewhat-logical-sounding relation to the training data.

The problem is, the LM doesn't and cannot know when it produces bollocks.

All it can care about is whether the sequences produced are probable according to its model.


I passed your query to GPT4 and this is what it said, see below. It seems like it can recognize bollocks, at least sometimes.

"I'm sorry, but it appears there's a misunderstanding. As of my knowledge cutoff in September 2021, there were no events known as the "Great Water Wars of 1999" involving a Trade Union being defeated by an Antarctic Coalition's hovercraft fleet under Vice Admiral Zagalow. This might be part of a work of fiction, alternative history, or a future event beyond my last training cut-off.

My training includes real-world historical events and existing geopolitical structures, and as of 2021, Antarctica was governed by the Antarctic Treaty System, which prevents any military activity, mineral mining, nuclear testing, and nuclear waste disposal. It also supports scientific research and protects the continent's ecozone.

Please provide more context if this information is from a book, a movie, or a game, or if it refers to something else that I may assist better with."


I made a slightly different query, essentially giving it the start of a sentence detailing the battle, and didn't use the GPT4 webapp but put it directly into the API.

The result was two very well-written paragraphs, including the defeat of the Trade Union's navy, a ceasefire agreement, and a peace agreement ending the water wars.

I rephrased the entire thing as a question, asking the LM to tell me about the conclusion of the war. Again I got a pseudo-historical statement.


It's a great illusionist. But ultimately it cannot separate relevant information from simple word correlations.

> What is heavier, a small floating passenger ferry or a two metric ton heavy rock that sinks to the bottom of the ocean.

> A two metric ton heavy rock would be heavier than a small floating passenger ferry. The weight of the rock is two metric tons, which is equivalent to 2,000 kilograms or 4,409 pounds. The weight of the passenger ferry would depend on its specific design and construction materials, but it is unlikely to be heavier than two metric tons. Therefore, the heavy rock would have a greater weight than the small floating passenger ferry.

It completely relies on surface information such as "small floating" and ignores the deeper "correlation" that all ferries are heavy.


GPT-4 answer: The weight of an object is determined by its mass, regardless of whether it floats or sinks. So, when you ask which is heavier, a small passenger ferry or a two metric ton heavy rock, it all comes down to the actual mass of the ferry.

A two metric ton rock weighs two metric tons by definition (or 2000 kilograms). However, a small passenger ferry, while it may look small compared to large ferries or ships, can weigh much more than two metric tons. Even a small passenger ferry can weigh dozens or even hundreds of tons, due to the mass of the hull, the engine, and other equipment on board.

So, without specific information about the ferry's mass, it's safe to assume that a "small" passenger ferry is likely heavier than a two metric ton rock. However, if the ferry is particularly small and lightweight, or the term "ferry" is being used to describe a very small watercraft (like a raft or dinghy), it's possible for it to be lighter. You would need the specific weight of the ferry to give a definitive answer.


That's very impressive.


Technically all it cares about is whether its responses engage humans more than other models in the fight for processing time.


>Is the goal AI, or a nice database front end, to reference facts?

The latter given the kind of products that are currently being built with it. You don't want your code completion or news aggregator to hallucinate for the same reason you don't want your wrench to hallucinate, it's a tool.

And as for hallucinations, that's a PR-friendly misnomer for "it made **** up". Using the same phrase doesn't mean it has functionally anything to do with the cognitive processes involved in human thought. In the same way, an 'artificial' neural net is really a metaphorical neural net; it has very few things in common with biological neurons.


"hallucination" just means the AI produced incorrect information. Humans produce incorrect information all the time. But we know we are fallible. I've seen lots of people treating these AIs as infallible, that they can't get anything wrong. That's a huge problem with people, not the AI.

Obviously it's worth it to try and eliminate the incorrect information, but what grand-op is saying is we don't want to do that if it takes away some valuable emergent properties.


I think AI hallucinations are more than just incorrect information. They're incorrect information coupled with an air of certitude. Sometimes with a whole backstory. And often oblivious to any inherent contradictions. It's like, if you ask a random person what years Lyndon B Johnson was president, they might say something like 1964-1968, shrugging with a bit of uncertainty. And both of those years would be incorrect but close. Whereas an AI (if hallucinating) would say something like Lyndon B Johnson was president from April 15, 1865 – March 4, 1869, after the shooting of John F. Kennedy in Ford's theater. The dates are precise but completely wrong, as are the accompanying details. Then if you ask the AI when John F. Kennedy died, it might correctly respond November 22, 1963, in Dallas, TX, totally unaware how this information doesn't match with the erroneous information it gave earlier.

I've been thinking there's some parallels between how AIs hallucinate and how human toddlers do. If you ask a toddler/young child a question about a fact they don't know, they will usually say, "iuno" (even when they should), but depending on the child and the circumstances, they will sometimes just make up a story on the spot and sound as if they believe it. "Who invented ice cream?" "Santa Claus! Mommy left him milk and cookies and he turned it into ice cream." It doesn't make any real sense but it seems facially plausible in their universe.

But somewhere between first learning to speak and around 7ish, kids become markedly more accurate how they model the world, and their responses become correspondingly less fanciful. And they continue to improve beyond that point.

So how are kids doing what LLMs are currently incapable of? How do we teach ourselves not to hallucinate? Or do we, really? I mean, if I tell myself I'm going to make it through the intersection before the light turns red, but I end up running the red light, was I just mistaken, or was that a self-delusion, i.e., a mini-hallucination of sorts? Probably a self-driving car would be less likely to make that category of mistake, so maybe I shouldn't be so smug about being grounded in reality.


You see, there's your problem. Now that you mention it, I absolutely do want my wrench to hallucinate.

A more salient question would be, 'how do I know it ISN'T hallucinating'…


“Hallucination” is part of thought.

Any evidence to support this claim or just commentary ?


Most intro courses to perception/cognition will cover the "gap filling" and "extrapolation" that takes place in a "data sparse" context that is the human experience/speech, but here's a related article [1]. I think "speculative thinking" and "heuristic simulation" are more appropriate/technical terms [2] (and surely involved in AGI). I used "hallucination" a bit liberally, but with text being the only "sense" of an LLM, I think it's somewhat reasonable.

Disclaimer: I know little of this field.

[1] https://www.scientificamerican.com/article/perception-and-me...

[2] https://www.frontiersin.org/articles/10.3389/fpsyg.2021.7289....


Bingo. I've been beating this drum since the initial GPT-3 awe.. The future of AI is bespoke, purpose-driven models trained on a combination of public and (importantly) proprietary data.

Data is still king.


You need sort of a "primary education" data set which gets the model up to roughly a high school education level. Then augment it with special-purpose models for specific areas.

But until "I don't know" comes out, rather than hallucinations, we're in trouble.


> You need sort of a "primary education" data set which gets the model up to roughly a high school education level.

Yea that was what I was getting at with the "combination" of data. The publicly available data provides the base/primary education, then you specialize it with your proprietary data and bam, you have an AI model that nobody else can produce...an actual product moat.


I’d be really interested to see an AI model built off of Sci Hub data.


Facebook announced this a month or so before chatgpt took off, and got lambasted everywhere because of hallucinations.


It will be a terrific moment when we can browse a galaxy map or other lists of LLMs, and select "context" groups - so you can say, "from history, give me X".


That doesn't sound ideal at all to me. It also sounds like a bottleneck. I want to ask AI once and it recommends or selects the best "context" group.


It would be a good visual training tool for kids, so they can get an understanding early on other than a black-box view.


This would only work if things like GPT were actually intelligent.

Train on dataset A to learn to think, use thinking on dataset B to become an expert in B's field.


So, search isn't dead after all...


Search has definitely been dead since before LLMs, we just don't have a replacement yet.


For a while my replacement was “use google, add ‘reddit’ at the end.” Not sure how much longer that will work given even just this limited blackout impacted how effective that was lol


That hasn’t worked since about three months after companies found out people do it. It’s all astroturfing nowadays anyway, and if it applies to products (which it for sure does) you can be sure that government actors caught on as well.


I'm not really sure what you're saying. It seems to me the number of shills creating content is vastly outnumbered by normal people creating content, so the trick of adding "reddit" to the end of queries is still very much useful (blackout protests aside), since its usefulness derives from getting information from normal people and then having normal people upvote the "best" comments. Just the other day I tried this with data recovery software and very quickly found some free software that was highly praised. If I had just searched "data recovery software" on Google, I'd get whichever data recovery place has the best SEO, including many that claim a novice shouldn't be trying it at all and should use their services pronto. Using reddit in this case gave me the exact information I wanted, where Google alone was next to useless.


Yeah I don’t know their experience, but mine is when I add “Reddit” i often get 2-3 threads talking about the exact thing I’m looking up in the top 10 results.


What I’m getting at is that those threads have a high likelihood of being gamed to make you think a certain way


I'm not sure if it would help or restore the kinds of results you were seeing previously, but instead of adding "reddit", you can add "site:reddit.com" to get only results from that site. (Originally a Google feature, but works on DuckDuckGo also. Not sure about others.)

Unless you mean that Reddit is astroturfed with the SEO garbage you're trying to avoid, in which case this will definitely not help.

Is search on Reddit itself still useless?


Search on Reddit several years ago went from “useless” to I don’t know…a C-? It can work.


Yeah I mean it’s astroturfed and likely in ways beyond just corporate


I mean more the concept of search, not the current implementation


Oh, I don't think we'll ever stop wanting to search for things. Maybe not everything, but some things.


Search is about finding existing results. The opposite of search is to start from first principles and work your way up until you have created the desired result yourself.

In the real world, search reduces information acquisition costs as you only have to spend time and resources on finding an existing result rather than recreating it.


They don't know that they don't know. It's only hallucination from a human's perspective. From the model's perspective it's _all_ hallucination.


That is true. There is no difference between fiction and nonfiction for the LLM.


It's complicated.

Training on something huge like "the internet" is what gives rise to those amazing emergent properties missing in smaller models (including the recent Pi model). And there are only so many datasets that huge.

But it's also indeed a waste, as Pi proves.

There probably is some sweet spot (6B-40B?) for specialized, heavily focused models pre trained with high quality general data.


Logistic Regression is simple to implement, supports binomial, multinomial, and ordinal classification, and is a key layer in NNs, as it's often used as the output stage that turns scores into probabilities, which can then be sorted into discrete categories. Very good for specialized problems, and easily trainable to sort unknown or nonsensical inputs into a noncategory.

Linear Regression is great for projections, and can even be fit to time series data using lagging.
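
For readers who haven't used it, a minimal example of the kind of specialized classifier being described: a multinomial logistic regression that maps features to per-class probabilities and then to a discrete category. Toy data, scikit-learn assumed:

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  X = rng.normal(size=(300, 4))                                               # toy features
  y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)   # three classes

  clf = LogisticRegression(max_iter=1000)
  clf.fit(X, y)
  print(clf.predict_proba(X[:2]))  # per-class probabilities
  print(clf.predict(X[:2]))        # the discrete category each row is sorted into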


I disagree with this slightly; modern computer chips follow a general-purpose architecture, not special-purpose ones. The reason for this is that building a computer chip is expensive and difficult. Similarly, building any useful language model requires tons of compute power and very smart ML researchers. Most of the smaller open source ones are just trained on GPT output.

By "cramming all of the web" into a model, what is really going on is that the hidden layers of that network are getting better at understanding language and logic. Imagine trying to teach a kid who doesn't know how to read to learn a science by only giving them science textbooks. Chances are they won't get very far.

Building little specialist models doesn't really work either. It's like trying to train a parrot to do science: sure, it can repeat some of the phrases that you give it, but at the end of the day it's not really making any new connections for you.


Since when did the world decide to shed the grammatically correct "computing power" for this weird tech bro "compute power" phrase?


Definitions are a little fuzzy, but "compute" is often used to distinguish processing power (CPU or GPU) from other resources like memory and disk. And GPU is a very specialized kind of compute power. There's really no "grammatically correct" here; these are different senses of the word. "Computing power" doesn't exactly have the same sense of specifically referring to a CPU or GPU as "compute power" does.


Since we adopted English as the standard language, complete with its penchant for butchering words by shortening them for the sake of convenience and speed.

I wonder what people said about "bus" back in the day, especially those who knew Latin.


This is ultimately just very powerful semantic search though, is it not?

It seems that what we need to make a big leap forward is better reasoning. There is a lot of debate between the GPT-4 can/can't reason camps, but I haven't seen anyone try to argue that it reasons particularly well.


You can easily argue 4 reasons particularly well, especially with average human performance as the baseline.

People who argue against GPT-4 reasoning at all are arguing against clear results. It's extremely easy to show examples and benchmarks of 4 reasoning and understanding. The argument then turns into "well, that's not 'true' reasoning", whatever that means.


It shouldn't even be that difficult to build it. Modus ponens and good theorems are all you need.

Building the data set for that should be quite trivial.
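
For what it's worth, "trivial" could look something like this sketch of a synthetic modus ponens dataset; the subjects, rules, and JSON shape are all made up for illustration, not from any real pipeline.

    import json
    import random

    subjects = ["Socrates", "Ada", "a raven", "this function"]
    rules = [
        ("is a human", "is mortal"),
        ("is a bird", "has feathers"),
        ("is pure", "has no side effects"),
    ]

    def make_example():
        # one premise pair plus the conclusion that modus ponens licenses
        s = random.choice(subjects)
        p, q = random.choice(rules)
        return {
            "premises": [f"If something {p}, then it {q}.", f"{s} {p}."],
            "conclusion": f"{s} {q}.",
        }

    dataset = [make_example() for _ in range(1000)]
    print(json.dumps(dataset[0], indent=2))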


GPT-4 is really good at code, and you can generally verify hallucinations easily.

The other good use cases are using LLM to turn natural language prompts into API calls to real data.
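
That pattern can be quite small. Here's a hedged sketch; call_llm, the endpoint names, and the JSON schema are placeholders I made up, not any particular vendor's API.

    import json

    def call_llm(prompt: str) -> str:
        # stand-in for whatever completion/chat client you actually use
        raise NotImplementedError

    PROMPT = (
        'Translate the user request into JSON with keys "endpoint" '
        '(one of: get_weather, get_stock_price) and "params" (an object). '
        "Respond with JSON only.\nRequest: {request}"
    )

    def get_weather(city: str) -> dict:
        return {"city": city, "temp_c": 21}          # fake data source

    def get_stock_price(ticker: str) -> dict:
        return {"ticker": ticker, "price": 123.4}    # fake data source

    DISPATCH = {"get_weather": get_weather, "get_stock_price": get_stock_price}

    def handle(request: str) -> dict:
        # parse the model's JSON and dispatch it to the matching function
        call = json.loads(call_llm(PROMPT.format(request=request)))
        return DISPATCH[call["endpoint"]](**call["params"])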


This is what I've been doing: GPT-4 to generate some data from some input, followed up by a 3.5T call to check the output against the input. You can feed the 3.5T output straight back into GPT-4 and it will self-correct.

Doing this a couple of times gives me 100% accuracy for my use case that involves some level of summarization and reasoning.

Hallucinations are not as big of a deal at all IMO. Not enough that I'll just sit there and wait for models that don't hallucinate.
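
Roughly, the flow looks like this sketch (prompts simplified; gpt4 and gpt35 are stand-ins for the actual API calls, and the stopping check is illustrative).

    def gpt4(prompt: str) -> str:
        raise NotImplementedError   # stand-in for a GPT-4 call

    def gpt35(prompt: str) -> str:
        raise NotImplementedError   # stand-in for a 3.5T call

    def generate_with_verification(source: str, rounds: int = 2) -> str:
        draft = gpt4(f"Summarize the key points of:\n{source}")
        for _ in range(rounds):
            critique = gpt35(
                "List any claims in the summary that the source does not "
                f"support, or reply OK.\nSource:\n{source}\nSummary:\n{draft}"
            )
            if critique.strip().upper() == "OK":
                break               # verifier found nothing to fix
            draft = gpt4(
                "Revise the summary to address the critique, using only the "
                f"source.\nSource:\n{source}\nSummary:\n{draft}\n"
                f"Critique:\n{critique}"
            )
        return draft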


This sounds interesting, can you detail this data flow a bit more and maybe provide an example?


>GPT4 is really good at code

for popular languages, though for JS it outputs obsolete syntax and code most of the time.

Today I tried to do a bit of scripting with my son in Garry's Mod, which uses Expression 2 for a Wiremod module. GPT hallucinated a lot of functions, and the worst part is it switched from E2 to Lua almost every time.

It is good at solving homework for students, or popular problems in popular languages and libraries, though it might give you an ugly solution and ugly code; it was probably trained on bad code too and did not learn to prefer good code over bad code.


I’ve been writing JavaScript for 25 years and have always felt it has utterly trash syntax. I’m not sure that’s 100% on GPT.

(And I don’t mean to be rude - I just really hate that syntax!)


What I mean is it will generate code using var instead of let or const, and instead of a nice array.forEach it will create a for(var i ...) loop.

I mean, use at least "let".


Yeah, same point I’m making - what other language declares variables like that? It’s a mess!


I don’t know how a bunch of specialist models don’t combine into a super useful generalist model. Do we believe too much knowledge breaks an LLM?


It's not an either/or. We're going to leverage the web-trained LLMs to bootstrap the specialist models via a combination of training token-quality classifiers and synthetic data generation. Phi-1 is a pretty good example of this.


They never know.

What we call hallucination is just when the resulting text is wrong but the underlying probabilities could be high.


I'm trying to get funding to do just this. Essentially nested LLMs, where every LLM is very topical, and if it's not trained on something there are indexes to other LLMs it can ask to get the right answer. I think of it kinda like a hub-and-spoke model.
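
Roughly the shape I have in mind, as a toy sketch: keyword matching stands in for the learned index/router, and the "models" are just stubs.

    def medical_model(q):  return f"[medical spoke] {q}"
    def legal_model(q):    return f"[legal spoke] {q}"
    def general_model(q):  return f"[generalist hub] {q}"

    SPOKES = {
        "medicine": (("symptom", "dose", "diagnosis"), medical_model),
        "law":      (("contract", "liability", "statute"), legal_model),
    }

    def hub(question: str) -> str:
        q = question.lower()
        for _topic, (keywords, model) in SPOKES.items():
            if any(k in q for k in keywords):
                return model(question)    # route to the topical spoke
        return general_model(question)    # otherwise the hub answers itself

    print(hub("What is a typical dose of ibuprofen?"))
    print(hub("Summarize the plot of Hamlet."))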


The same way "attention" was a game changer, I'm not sure why they don't invent some recursive self-maintenance algorithm that constantly improves the neural network within. Self directed attention, so to speak.


How are we determining it's a dead end here? Recently things like GPT-4 have come out which have improved on that technique. Why would specialization necessarily reduce hallucination and improve accuracy?


Another recent result (not called out in this article) is the "Textbooks Are All You Need" paper [1]; the results seem to suggest that careful curation and curricula of training data can significantly improve model capabilities (when training domain-specific, smaller models). It claims a 10x smaller model can outperform competitors (e.g. phi-1 vs. StarCoder).

[1] https://arxiv.org/abs/2306.11644


TBH, it looks like metric manipulation to me. They used GPT-3.5 to generate their data (and did not use textbooks at all, as the title suggests). And their dataset is very much like their benchmark data. While there was some filtering, it is still very possible that a lot of the benchmark questions were in the training data.

We likely won't ever know how good the model is, as it is not only closed, but they haven't provided access to anyone.


They seemed to be pretty mindful of this contamination, and call out that they aggressively pruned the training dataset and still observed strong performance. That said, I agree: I really want to try it out myself and see how it feels, and whether the scores really translate to day-to-day capabilities.

From section 5:

    In Figure 2.1, we see that training on CodeExercises leads to a substantial boost in the performance of the
    model on the HumanEval benchmark. To investigate this boost, we propose to prune the CodeExercises
    dataset by removing files that are “similar” to those in HumanEval. This process can be viewed as
    a “strong form” of data decontamination. We then retrain our model on such pruned data, and still
    observe strong performance on HumanEval. In particular, even after aggressively pruning more than
    40% of the CodeExercises dataset (this even prunes files that are only vaguely similar to HumanEval, see
    Appendix C), the retrained phi-1 still outperforms StarCoder.


I read that, but there is no good technique to rule out close duplicates. I know because I tried to build one for my product. At best it relies on BLEU, embedding distance, and other proxies, which are far from ideal.
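
To illustrate why it's fuzzy: even a simple proxy like character n-gram overlap gives you a score, but the cutoff is arbitrary; swapping in embeddings or MinHash changes the score, not the thresholding problem. A toy sketch (the example strings and the n are mine):

    def ngrams(text: str, n: int = 5) -> set:
        text = " ".join(text.lower().split())
        return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

    def similarity(a: str, b: str) -> float:
        # Jaccard overlap of character n-grams, a crude stand-in for
        # embedding distance
        A, B = ngrams(a), ngrams(b)
        return len(A & B) / len(A | B)

    train = "Write a function that returns the sum of a list of integers."
    bench = "Write a function which returns the sum of a list of ints."
    print(similarity(train, bench))   # highish, but is that "a duplicate"?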


I've wondered whether anyone has tried training with a large dataset of published books, like something from Library Genesis, or, in the case of Google, using the full text from Google Books. There's all this talk of finding quality text, and I've not heard of text from print books being a major source beyond this textbooks paper.


That's how OpenAI was (is) doing it. Books downloaded from the Internet are part of the dataset, as per the GPT-3 model card. Right to read.


Is there any clue as to what architecture they used to create phi-1?


To the outsider it might seem that the only thing we've been doing is scaling up the neural networks, but that's not true. A lot of innovation and changes happened, some enabled us to scale up more and others just improved performance. I am quite confident that innovation will continue.


Your statement is entirely fair, but the actual title is "The bigger-is-better approach to AI is running out of road". They are actually saying what you are saying, but your comment seems to contest the article.


I also heard this. I unfortunately forget which study it was, but yes, their paper spoke of likely diminishing returns at around 400-500B parameters for current LLMs. The recent news of GPT-4 running on 8x 220B LLMs (which doesn't equal an 8*220B-sized model) fits that range. It's also questionable how much further we can push LLMs by introducing multiple models like this, because this too eventually introduces problems due to granularity and picking the right model, if I understood an earlier Hacker News discussion correctly. (Sorry for altogether no sources, lol.)


Have those reports from George Hotz been confirmed? It seems plausible to me, but also suggests to me that we have further to go by using that parameter budget for depth rather than for width.


It seems consistent with the behavior we see when using GPT4 in chat mode. Every once in a while it will change its answer as it’s generating it, as though it’s switched which model it favors to produce the response. GPT3.5 doesn’t do that.


MoE models don't work like that though


Token-at-a-time MoE models could result in a mid-output-stream directional change.

I agree with you that whole-output MoE models don’t.
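
For intuition, here's a minimal top-1 MoE layer in NumPy (toy shapes, random weights; real MoE layers route hidden states inside a transformer block, and the router is learned). The point is just that the active expert can change from token to token within one output.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts, n_tokens = 8, 4, 6
    router_w = rng.normal(size=(d, n_experts))
    expert_w = rng.normal(size=(n_experts, d, d))

    def moe_layer(h):                        # h: (n_tokens, d)
        scores = h @ router_w                # (n_tokens, n_experts)
        chosen = scores.argmax(axis=-1)      # top-1 expert per token
        out = np.stack([h[i] @ expert_w[chosen[i]] for i in range(len(h))])
        return out, chosen

    hidden = rng.normal(size=(n_tokens, d))
    _, chosen = moe_layer(hidden)
    print(chosen)   # different experts fire for different tokens mid-sequence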


Several people in the community have confirmed it so while we don't "know know", it does seem like the dominant plausible rumor going around


Data requirements are overstated - you can train on longer and longer sequences and I am pretty sure most organizations are still using the “show the model the data only once“ approach which is just wasteful.

Compute challenges are more real, but we are seeing for the first time huge amounts of global capital being allocated to solve specifically these problems, so I am curious what fruit that will bear in a few years.

I mean, already the stuff that some of these low-level people are doing is absolutely nuts. Tim Dettmers' work on training with only 4 bits means only 16 possible values per weight, and it still gets great results.
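
To make "16 possible values per weight" concrete, here's a plain uniform 4-bit round-trip in NumPy. The actual work uses smarter value grids and per-block scaling (e.g. the NF4 format), so treat this purely as an illustration of the 16 levels.

    import numpy as np

    def quantize_4bit(w: np.ndarray):
        scale = np.abs(w).max() / 7      # map weights onto integers in [-7, 7]
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)   # 16 levels
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(8).astype(np.float32)
    q, s = quantize_4bit(w)
    print(w)
    print(dequantize(q, s))   # close to w, yet drawn from only 16 distinct values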


Yep. We are in the very early innings of capital being deployed to all this.


Ok - what's the ROI on the $10bn (++) that OpenAI have had?

So far I reckon <$10m in actual revenue.

This isn't what VCs (or Microsoft) dream of.


I think Azure OpenAI service is growing at 1000% per quarter according to last earnings call


Indeed. Azure OpenAI service is how you get corporate-blessed ChatGPT that you can use with proprietary information, among other things. There's a huge demand for it.


That would be 500,000 people buying 1 month of GPT-4. I think they blew past that in the first week.


The level of exposure since ChatGPT has resulted in (and will keep resulting in) a lot of money turning up, especially for applications of the existing technology (whether they succeed or fail). The stats on usage demonstrate that the thundering herd has noticed, and that attention can be extremely valuable.

I think it's quite likely that OpenAI will make that money back and more, as both the industry leader and with the power of their brand (ChatGPT).


OAI revenue is at least $100m+ right now, IIRC, and will probably grow quite a bit.


> using the “show the model the data only once“ approach which is just wasteful.

According to the InstructGPT paper, that is not the case, showing the data multiple times results in overfitting.


1. You are just referring to fine tuning, I am referring to training the base model.

2. They still saw performance improvements which is why they did train on the data multiple times, you can see in the paper.

3. There was a recent paper demonstrating that reusing data still saw continued improvements in perplexity; I am on my iPad so cannot find it now.


> 1. You are just referring to fine tuning, I am referring to training the base model.

Ahh mb! Sorry.


After a year of Tesla FSD beta, I agree 100%. I used to think that adding more and more real-world video events to the training pool was contributing at least some diminishing value to the model.

But the worst of all behaviors have not diminished and some have actually gotten worse. Most of the improvements now come through what feels like manual heuristics and tuning parameters rather than any actual improvement in intelligence.

I sincerely hope there is a path forward that involves meaningfully culling data which is producing bad behavior, stronger guardrails, and/or a new paradigm in how the model is built from the data entirely, as I don't see a path to level 3 by simply growing the existing model.


> Most of the improvements now come through what feels like manual heuristics and tuning parameters rather than any actual improvement in intelligence.

So if I understand, you are driving a Tesla with Full Self-Driving (FSD) capability, but you are not an engineer at Tesla who is privy to implementation details.

How do you know if a change you perceive is caused by model retraining vs manual heuristics and tuning parameters?


You missed the part where you quoted me saying "feels like" and invented a part where I said I "have empirical evidence".

Though to answer a softer form of your question, which is based on my feelings: through a combination of my experience writing AIs, coursework in my AI-specialized computer science degree, and reading the patch notes, which often state explicitly the tunings I mentioned.


It's been just a few months since GPT4. Calm down.


Really, it looks like our expectations are doubling every 6 months. This question needs a little more exploring, although it's bound to happen somewhere, and we'd love to believe we are close to the maximum possible.


Major AI winter #3. Let's gooo!


hopefully it will last as long as an actual winter


"Many in the AI field": who exactly? Shouldn't the author quote a few famous people when making this claim?


Naive question, perhaps: What roles are multimodal models likely to play in the future of AI?

The comments here and in the Economist article seem to be about only large language models. In the initial announcement of GPT-4 in March, OpenAI described it as “a large multimodal model (accepting image and text inputs, emitting text outputs),” but they haven’t yet released the image part to the public.

What will happen when models are trained not only on text, and not only on text and images, but also on video, audio, chemical analyses of air and other substances in our surroundings, tactile data from devices that explore the physical world, etc.?


I'm pretty excited by the possibilities. I am astounded by how much these language models can do with nothing but "predict the next word" as the core idea. I imagine in the near future having collections of a hundred different models, physics models, grammar models, fact models, sentiment models, vision models, wired all together by coordination models, and wired up to math tools and databases to ground truth when possible. I think it can get pretty wild.

Just chatGPT wired up to Wolfram Alpha is already pretty creepy amazing.


There was a recent post here linking to the blog “The Secret Sauce behind 100K context window in LLMs: all tricks in one place” : https://blog.gopenai.com/how-to-speed-up-llms-and-use-100k-c...

My impression is that combining all of those, plus all of the post-training quantisation and sparsity tricks, into a new model with more training compute than GPT would yield an amazing improvement. Especially the price/performance would be expected to dramatically improve. The current models are very wasteful during inference; there's easily a factor of ten improvement available there.


The last paragraph in the article:

> That such big performance increases can be extracted from relatively simple changes like rounding numbers or switching programming languages might seem surprising. But it reflects the breakneck speed with which llms have been developed. For many years they were research projects, and simply getting them to work well was more important than making them elegant. Only recently have they graduated to commercial, mass-market products. Most experts think there remains plenty of room for improvement. As Chris Manning, a computer scientist at Stanford University, put it: “There’s absolutely no reason to believe…that this is the ultimate neural architecture, and we will never find anything better.”


It's an incredibly important point -

We have so many a-ha moments ahead of us in this field. Seemingly minor changes yielding task speed multipliers, fresh eyes on foundational codebases saying now why the heck did they do it that way when xyz exists and works better, etc. A recent graphics driver update took my local SD performance from almost 4 seconds per iteration to 2.7it/s because someone somewhere had an a-ha moment. We're practically in the Commodore 64 era of this technology and there are only going to be more and more people putting their eyes and minds on these problems every day.


Sparse networks are the future. There are definitely a few major algorithmic hurdles we'll have to cross before they become a real option, but long term they will dominate (after all, they already do in the living world).

All our current approaches rely on dense matrix multiplications. These approaches necessitate a tremendous amount of communication bandwidth (and low-latency collectives). This is extremely challenging and expensive to scale, roughly O(n^2.3).

The constraints of physics and finance make significantly larger models out of reach for now.
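
For a rough sense of the upside, compare a dense matrix-vector product with the same matrix at ~1% density stored in a sparse format (toy NumPy/SciPy sketch; the hard part in practice is hardware utilization and training dynamics, not the arithmetic count):

    import numpy as np
    from scipy import sparse

    n = 2000
    dense = np.random.randn(n, n)
    mask = np.random.rand(n, n) < 0.01      # keep roughly 1% of the weights
    sp = sparse.csr_matrix(dense * mask)
    x = np.random.randn(n)

    y_dense = dense @ x                     # about n^2 multiply-adds
    y_sparse = sp @ x                       # work proportional to the nonzeros
    print(sp.nnz / dense.size)              # ~0.01 of the dense work, in theory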


I've been following the LLMs and I don't think it's 'running out of road', except in the sense that it's expensive. The model perplexity keeps decreasing when you scale up the parameters and exaflops and dataset sizes in the right proportions, and this decreasing perplexity is what has been unlocking so many amazing cognitive capabilities. I think the limitations are about regulatory deals not about any kind of technical roadblocks.


Define “running out of road”.

Road to control narratives or road to be useful?


There are no datasets that are an order of magnitude bigger, and the latest model itself may be a mixture of experts; i.e., more of the same.


To be fair, many in the AI field never thought this approach would have worked at all, and were caught with their pants down when results started coming out.


The optimal way right now seems to be to train a gigantic model to get strong capabilities and use distillation to make a much smaller model stronger. But you seem to still need that gigantic model and we still don't know if we're running out of space to improve capability just by scaling up the model (for reference GPT-4 is likely 1.76T parameters and we may be able to get much better just by scaling up)
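
The standard distillation recipe, as a hedged PyTorch sketch; the temperature and the soft/hard mix are typical choices for illustration, not anything specific to a particular frontier model.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          T: float = 2.0, alpha: float = 0.5):
        # Soft targets: match the teacher's temperature-softened distribution.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard targets: ordinary cross-entropy on the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    student_logits = torch.randn(4, 10)    # toy batch, 10 classes
    teacher_logits = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    print(distillation_loss(student_logits, teacher_logits, labels))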


How is The Economist qualified to answer this question?


Not sure where I heard this, but it's apparently a common trope that many of the politicians and leaders who treat The Economist as close to holy writ are horrified to learn that most of the staff is actually a bunch of very precocious 20-somethings who are good at research and writing in an authoritative tone.

Actually, now that I think of it, not so different from LLMs…

(Full disclosure, I’ve been a subscriber for a couple of decades)


Also true about every single consulting firm


They periodically advertise for science journalism interns, sometimes with the proviso "Our aim is more to discover writing talent in a science student or scientist than scientific aptitude in a budding journalist." So, I think they're at least trying to hire the qualified.

(Long-time Economist subscriber.)


Usually it's the most credentialed whom I'm least interested in listening to, especially when the value of those credentials is dependent upon the future looking a particular way.


Using this logic, how are they qualified to answer 90% of the questions their articles deal with...


Yes and no. Most questions they answer lean more political, meaning that they are not optimisation problems but questions about the kind of society we want to be.

This specific article seems to be reporting on a very technical issue of how to continue to scale LLMs. Even scientific papers have a hard time answering those kinds of questions, because except in very special circumstances where we can show with good confidence that there are limitations (P vs NP, for instance), the answer will simply be given by the most successful approach.


Is this really a bad thing, or a problem at all? Some researchers found that increased scale resulted in increased utility. LLMs aren't generalized AI, but they seem to be good at some stuff.

There must intrinsically be some limits to this approach. But that doesn't mean it is a problem or that LLMs can't serve some useful role. Just consider for a moment if we could train an LLM on the combined experiences of all humans that have ever lived. No one is going to suggest that we must go bigger than that.


Sam Altman has been saying this for months. Nothing noteworthy here for someone following the industry closely.

A16z’s latest summary of the landscape was way more useful and relevant than this.


Sam had been saying it, but few people trusted that he was telling the truth, IMO. Maybe he was.


Sam is telling the truth I'm sure but he's commenting on economic walls rather than necessarily capability ones.


Following research instead of the industry would make this a basic fact that has been known for a while.


If the title is correct, it's a welcome trend for someone like me trying to apply these advances to robotics/embodied agents. The bigger the model, the more data it (generally) requires, and that constraint is very hard to satisfy on hardware with other requirements (energy consumption, inference speed, memory limitations), not to mention that the data collection is orders of magnitude more tedious compared to images and text.


GPT4 is better than its predecessor and quite impressively so. The (probable) param count reflects that.

It’s not just param count, because not all large models are good, but it clearly is part of the equation.

I always wondered why Altman said going bigger is a dead end. If it were true, saying it would needlessly inform your competition. What's the use in that? If it were false, it might dissuade them from going down that path... I think I got my answer.


I think the worst case scenario, and the one that I think is the most likely, is that we plateau at broad and shallow AI. Broad enough such that it can be used to replace many workers (it may not do a great job though), but narrow enough such that we don't really see the kinds of productivity increases that would usher in the pseudo-utopia that many AI folks talk about.


A big part of understanding concepts (and being able to reason) is correlating with patterns you already know. Even though there are errors, LLMs are doing fairly well at pattern recall. Which makes me believe that we're perhaps one additive breakthrough away from another major leap in abilities.


I remember how hyped people were seeing the progress from GPT3.5 to GPT4, people really felt like many jobs were going to be replaced very soon. The next big advancement was around the corner. I think the limitations of LLMs should be more salient to them by now.


It's been literally just 3 months since GPT-4 was released. I think you're gonna see a lot of changes especially once GPT starts being trained on ChatGPT data


OpenAI can just filter out text that they generated. Other models might be problematic.


I'm talking about RLHF on their chatbot data, not ChatGPT data on the web.


Mark Zuckerberg made a good point on the Lex Fridman podcast, referring to LLaMA: he foresees not one master model but a bunch of special-purpose smaller models built on base models with fewer parameters.


AI should not get better. We are playing with fire because the technology is moving too fast. We should consider the ethical implications of it before developing it, not after.


Non paywall: https://archive.ph/XwWTi

IMO, it is true that the current architecture is hitting its limit. We need a breakthrough on the scale of the transistor to get past this problem. We know it is possible, though. Every single human is proof that high-performance AI can be run with less energy than a laptop. We just need a dedicated architecture for its working mechanisms, the same way a transistor is the embodiment of 1 and 0.

Unfortunately, in terms of understanding intelligence and how it works, I don't think we have made any significant advance in the last few decades. Maybe with better tools for probing how LLMs work, we can get some new insights.


No surprise. Scale vertically first, then horizontally


Roads?! Where we’re going, we don’t need roads



