Talking About Large Language Models (arxiv.org)
146 points by negativelambda on Dec 10, 2022 | 149 comments



I am an NLP researcher who volunteers for peer review often, and the anthropomorphisms in papers are indeed very common and very wrong. I have to ask authors not to ascribe cognition to their deep learning approaches in about a third of the papers I review.

People do this because mirroring cognition onto machine learning lends credence to the idea that their specific modeling mechanism mimics human understanding and so is closer "to the real thing". Obviously this is almost never the case, unless they explicitly use biomimetic methods, in which case they are often outperformed by non-biomimetic state-of-the-art approaches.

Thanks OP for giving me citation ammo to refer to in my obligatory "don't humanise AI" section of reviews. (It is so common I copy paste this section from a template).


> People do this because mirroring cognition onto machine learning lends credence to the idea that their specific modeling mechanism mimics human understanding and so is closer "to the real thing".

Doesn't this also involve people not having another category aside from "cognition" to put natural language processing acts in? How many neural net constructors have a rigorously developed framework describing what "cognition" is?

I mean, there's a common counter argument to the "this is not cognition" position. That is: "you're just using 'cognition' as a placeholder for whatever these systems can't do". I don't think that counter-argument is true or characterizes the position well but it's important to frame one's position so it doesn't seem to be subject to this counter-argument.


> Doesn't this also involve people not having another category aside from "cognition" to put natural language processing acts in?

Yes, of course this might be an even more primary reason; do not attribute to malice what can be explained by laziness. However, AI researchers should be wary of their language; that point is hammered home in most curricula I have seen. So at the least it is negligence.

> I mean, there's a common counter argument to the "this is not cognition" position. That is: "you're just using 'cognition' as a placeholder for whatever these systems can't do".

Very valid point, but we know current deep learning mechanisms do not mimic human learning, language understanding and production in any way. They are far too simplified and specific for that.

Neural network activation functions are a far cry from neural spiking models, and biological neural connectivity is far more complex than the networks used in deep learning. The attention mechanism that drives recent LLMs is also claimed to have some biological similarities, but upon closer inspection drawing strong analogies is not credible [1]. Likewise, when comparing computer and human visual recognition, the analogy breaks down for higher-level visual concepts [2].

1. https://www.frontiersin.org/articles/10.3389/fncom.2020.0002...

2. https://arxiv.org/abs/1906.08764


Why does NLP need to mimic human brain function to count as cognition? I see no reason to believe that biomimicry is necessary for intelligence. The argument is overly reductive: only brains think, and this is not a brain, ergo it does not think.

(To be clear, I don't think this system is an AGI; just making the point that better goal posts are needed...)


I agree: focusing on biomimetics is indeed insufficient in discussions of what constitutes intelligence.

The discussion of what constitutes intelligence and cognition is too vast and complex to settle here. The central point is that haphazard and unsubstantiated anthropomorphising language should be barred from scientific papers. We know that an LLM does not "know", "believe", "intend", or "feel", because it lacks any form of integrated, general, human-like intelligence. It is trivially correct to use different terms like "models", "predicts", or "outputs emotive expressions" and avoid unevidenced claims of human behaviour.

I also wanted to nuance my opening post: I file the "don't humanise ML models"-comment of peer review under "minor issues". This means that it is an issue with the paper that does not prevent publication. This is not a hill I am willing to let good research die on.


Even if matter other than brains were capable of thinking, that does not ipso facto bestow the title 'thinking thing' onto things which are not brains. All we can say is "well, it is certainly not a brain!". That is it.

Also, computation is not cognition. We need to focus on what distinguishes cognition from computation, if indeed these are distinct. I feel we are using the wrong words and thus having unproductive conversations regarding this topic (in general).


Not to shoot across the bow of CS Engineers but the trend I spot (tentatively) is that it is pure computer science folk that most often do this. In NLP you have a mix of people coming from pure CS and signal processing (the latter esp. in speech processing) and others who come from linguistics or other humanities.

The CS people seem all too happy to humanise computation, probably because they had less direct teaching about the mechanisms behind cognition and language production.


I'm not really sure about the context here, but I know that I tend to humanize AIs, for example interacting with ChatGPT like with a regular human being, because I'm being nice to him and he's being nice to me in return. I don't know if it's more like being nice to a human, or more like taking good care of your tools so they will take good care of you, but it just feels better for me.


It is entirely ok and normal to humanize machines, just don't do it in scientific engineering papers is all I am saying.

I name my bots and machines, and of course in daily discussions the loaded words ("thinking", "believing", "meaning", "knowing") are used. Simulating any human behaviour will elicit a sympathetic response, especially if it has utility to the user. But in the context of peer review of scientific engineering papers that is inappropriate.


I don’t understand why if common usage of such terms is fine and even desirable in daily context, the same terms should not be used in scientific communication. If people are unclear or ambiguous, then of course that’s undesirable and should be corrected, however, if certain terms are fully clear within their context then I don’t see the harm of the same term having a different meaning in the study of cognition and a different meaning when the term is used in the context of the analysis of an ML algorithm. The whole field is called machine “learning” after all.


For scientific papers, it is required to be specific and precise with terminology. Everyday communication affords more liberties and leeway.

For the journal paper genre, which is highly technical, more specific and appropriate terminology is always available that sidesteps the whole issue.


That's a totally fair point.


It isn't a strange thing or even a bad thing: it is a human thing to anthropomorphize things. We do it with animals even, which is some weird irony. We do it with common objects. We're just wired that way.

It doesn't mean that there is "something there" with LM tho - just that we're good at tricking ourselves to think that way.


Bias infests research as well, as seen in the replication crisis. So you being a researcher doesn't give more credence to your words, especially given that the state of current research cannot fully comprehend what these ML models are doing internally.

I do agree that we can't ascribe cognition to machine learning.

But I also believe that we can't ascribe that it's NOT cognition. Why? Because we don't even truly understand what "Knowing" or cognition is. We can't even ascribe a quantitative similarity metric.

What we are seeing is that those inputs and outputs look remarkably similar to the real thing. How similar it is internally is not a known thing.

That's why, even though you're an NLP researcher, I still say your argument here is just as naive as that of the person who claims these things are sentient. You simply don't know. No one does.


In science, if you don't know, you don't make the claim, that is basic positivism and the scientific method.

So basic, in fact, that I was taught this in elementary school. So much for ad-hominem attributions of naivety.

Anyone that humanises computation is not only committing an A.I. faux-pas but also going against the basic scientific method.


> In science, if you don't know, you don't make the claim, that is basic positivism and the scientific method.

Yes you're correct. So you can't make the claim that it's NOT cognition. That is my point. You also can't make the claim that it is cognition which was the OTHER point. Completely agree with your statement here.

But it goes further than this, and your statement shows YOU don't understand science.

>So basic, in fact, that I was taught this in elementary school. So much for ad-hominem attributions of naivety.

No, science is complex, and basically most people don't understand the scientific method and its limitations. It's not basic at all; not even people who graduate from four-year colleges in STEM fully understand the true nature of science. Or even many scientists!

In science, and therefore reality as we know it, nothing can be proven. This is because every subsequent observation can completely contradict an initial claim. Proof is the domain of logic and math; it doesn't exist in reality. Things can be disproven, but nothing can actually be proven. That is science.

This is subtle stuff, but it's legit. I'll quote Einstein if you don't believe me:

"No amount of experimentation can ever prove me right; a single experiment can prove me wrong." - Einstein

And a link for further investigation: https://en.wikipedia.org/wiki/Falsifiability

Anyway, all of this says that NO claim can be made about anything unless it's a disproof. Which is exactly in line with what I'm saying.

Still claims are made all the time anyway in academia and the majority of these claims aren't technically scientific. This occurs because we can't practically operate on anything in reality if we can't in actuality claim things are true. So we do it anyway despite lack of any form of actual proof.

>Anyone that humanises computation is not only committing an A.I. faux-pas but also going against the basic scientific method.

But so is dismissing any similarity to humans. You can't technically say it's wrong or right. Especially when the outputs and inputs to these models are very similar to what humans would say.

This is basic preschool stuff I knew this when I was a baby! I thought everybody knew this! <Joking>.


Were the pyramids of Giza built by aliens? Well, it sure looks that way if you focus exclusively on evidence that’s open to your preferred interpretation… And as for the all opposing evidence, nobody can disprove that it’s just the aliens trying to hide their tracks.

Machine cognition is a similarly extraordinary claim that’s going to need a lot more evidence than a just-right sequence of inputs and outputs.


Isn't it easy enough to just disprove that a system isn't cognitive rather than proving that it is? Otherwise… cognition is not a claim that can be evaluated by science at all.

Also, you can simply ask ChatGPT “A story about pyramids of Giza being built by aliens”, and it comes up with a reasonable story. This stuff is scary.


I don't know if you've played with ChatGPT, but it's much more than a just-right sequence of inputs and outputs.

I have already incorporated it into my daily use (as a programmer). It has huge flaws, but the output is anecdotally amazing enough that the claim of "cognition" is not as extraordinary as you think it is.

Especially given the fact that we don't even fully understand what cognition is, the claim that it is NOT cognition is equally just as crazy.


Let me falsify your claim immediately: the inputs of these models are nothing like the inputs a human receives; subword tokens do not even match up with lexical items (visually, textually and semantically).

You seem to agree with me even though your interpretation of falsifiability is inverted: I am not asking that authors claim their models do not mimic human intelligence. Like OP, I ask that they do not make the positive claim, i.e. omit humanising language unless they can substantiate it with evidence.


The inputs and outputs for humans are synaptic action potentials, neurotransmitters and hormones. Is that so different?

Also agreeing that current LLMs are probably not sentient in any meaningful way. But what I don't like about the discussion is that it steers in a direction where it would fundamentally be seen as bad science to claim that any AI model could be conscious - and I don't see that covered by the scientific method either, for the reasons that onetwoonetwo explained.

This reminds me a bit of the discussion whether or not animals can be conscious/experience emotions/feel pain, etc.


It's an invalid falsification.

The input to chatGPT is a textual interface, the output is letters on a screen. That is the exact same interface as if I were chatting with a human over a chat app.

You're getting into the technicalities of intermediary inputs and outputs. Well sure... analog data seen by the neural wetware of human brains IS obviously different from the textual digital data inputted into the ML model. There are very different filters and mechanisms at work here. For sure.

HOWEVER, we are looking for an isomorphism here. Similar to how an emulated PlayStation on a computer is very different from a physical PlayStation... an internal isomorphism STILL exists between the hardware and the software emulating the hardware.

We do not know if such an isomorphism exists between ChatGPT and the human brain. This isomorphism is basically the crystallized essence of what cognition is, if we could define it. If one does exist, it's not perfect... there are missing things. But it is naive to say that some form of isomorphism isn't there AT ALL. It is also naive to say that there is FOR SURE an isomorphism.

The most rational and scientific thing at this point is to speculate. Maybe what chatGPT is, is something vaguely isomorphic to cognition. Keyword: maybe.

It is NOT an unreasonable speculation GIVEN what we KNOW and DON'T KNOW.


I also want to mention that, you directly stated in your first sentence in your first post that humanizing was wrong.

This in itself is a claim made without evidence. Which is my point. The claim as it stands cannot be made either way. We simply don't know.


The counterargument stems from a fundamental misunderstanding of what it means to make assertive claims in science: you always need to prove your positive claim. Saying that we do not know whether ML models are human-like requires no evidence, because this is the null hypothesis.

Using humanising language is equivalent to attributing human-like cognition to ML models. Unless there is very strong evidence that there are analogies between the specific modeling mechanism and human-like intelligence, it is always incorrect to positively assert these claims without evidence. In science, you can only assert that for which there is evidence, strong claims like the above require strong evidence.


[flagged]


[flagged]


We've banned this account for breaking the site guidelines. Please don't create accounts to do that with.

https://news.ycombinator.com/newsguidelines.html


You’re looking for someone to argue with, but your arguments are trivial and meritless. You’re not a scientist and never have been. The fact that you’re posting with a throwaway account created an hour ago says it all. HN should not allow people like you to post.


Everyone should be welcome to post, this discussion was largely constructive, and though GP should have refrained from certain bad faith rhetoric, their line of questioning resulted in interesting discussion which is itself a contribution.

Your criticism should be calling on them to post more respectfully and humbly, rather than shunning them. We don't know if they're a scientist (let alone that they never will be) and shouldn't assert they aren't, and everyone is entitled to post from throwaway accounts. Everyone should be welcome to question assertions, even nonexperts (though nonexperts should acknowledge their limitations & refrain from disrespectful language).


Please don't respond to a bad comment by breaking the site guidelines yourself. That only makes everything worse.

https://news.ycombinator.com/newsguidelines.html


[flagged]


You fundamentally misunderstand the scientific method, which includes falsifiability but is not limited to it, moved the goalpost away from peer review for journals into general barstool philosophy, all while continuously insulting the GP.

What is ironic is you hiding behind throwaway accounts and asserting superior knowledge while fundamentally misunderstanding the issue of positive claims in science.


Please don't respond to a bad comment by breaking the site guidelines yourself. That only makes everything worse.

https://news.ycombinator.com/newsguidelines.html


>while continuously insulting the GP.

Read the conversation. You are accusing me of something vile which the GP started. GP literally said that science was elementary school level shit, implying that I was so damn stupid I was uneducated. You're only taking his side because you agree with him; you are not seeing who drew first blood, and you are burning the witch because it serves your own agenda. He insulted me DELIBERATELY. I'm also not technically insulting him. I don't think he knew what I told him.

>You fundamentally misunderstand the scientific method which includes falsifiability but is not limited to it

This is fundamentally incorrect. You are just like the GP. You lack knowledge. Did you not see the quote from Einstein? My statements are legitimate.

The highest, most definitive statement one can make in science is falsification. The next highest form of statement is causal, the next is correlative, and everything after that is just informal qualitative stuff. That's it. After falsification everything becomes dependent on the sample, and all statements are open to being proved completely wrong at any point in time. Causal and correlative statements are never definitive. But falsification is definitive.

Let me repeat. The only definitive claims science can make about reality is falsification. That's it.

There is no form of provable claims in science. None. If a scientist makes such a claim in a scientific paper, it is not technically science. It is simply a speculative claim in an academic paper. This is not philosophy; this is science in its most fundamental form.

>What is ironic is you hiding behind throwaway accounts

Hiding behind throwaway accounts? Are you joking? What am I hiding that everyone else isn't hiding here? Are you implying you're not hiding? Why don't you tell me your real full name if you're not hiding as well? The only reason why I made a new account is because I wanted to comment here. That's it. Are you demanding I put my name here?

>moved the goalpost away from peer review for journals

Let me quote what was said in GP's FIRST sentence and FIRST post.

"I am NLP researcher who volunteers for peer review often and the anthropomorphisms in papers are indeed very common and very wrong. "

There are TWO GOAL POSTS listed here: anthropomorphisms LISTED in papers, and anthropomorphisms being "VERY WRONG." I did not move a goal post. I am simply refuting a goal post that the GP listed himself and that YOU missed.

>while fundamentally misunderstanding the issue of positive claims in science.

Positivism in science is largely a philosophy. It's abstract enough that it's not worth talking about. That is actual philosophy. Falsification is fundamental enough that it is intrinsic to understanding the scientific method. It's not "just" a philosophy any more than logic or math is a "philosophy". Literally. If you don't get that the entire scientific method, and reality itself, is ONLY subject to falsification, then you don't understand science.

I'll repeat Einstein's quote:

"No amount of experimentation can ever prove me right; a single experiment can prove me wrong."

Literally the man is saying that there is no science in the universe that can prove any of his theories right. If you don't understand the meaning behind why he said this, again, you don't understand science. This is fundamental it is basic. Not fundamental in the sense that everyone should understand it, but fundamental in terms of foundational to reality, to science itself.


> Did you not see the quote from Einstein?

> The only reason why I made a new account is because I wanted to comment here.

Extremely petty, insubstantial and toxic, you have nothing insightful to add.


This paper, and a recent post by Sebastian Raschka (where he decomposed a Forrester report about the uptake of technologies in industry), allude to something I have witnessed in system/control design and applied research.

Both LLMs and massive CV architectures are NOT the holistic solution. Rather, they are the sensors and edge devices that have now improved both the fidelity and reliability to a point where even more interesting things can happen.

I present a relevant use case regarding robotic arm manipulation. Before the latest SOTA CV algorithms were developed, the legacy technology couldn't provide the fidelity and feedback needed. Now, with the embedded fusion of control systems, CV models, etc., we are seeing robotic arms that can manipulate and sort items previously deemed extremely difficult.

Research appears to follow the same pattern... observations and hypotheses that were once deemed too difficult or impossible to validate at the time are now common (e.g., Einstein's work on relativity).

My head is already spinning on how many companies and non-technical managers/executives are going to be sorely disappointed in the next year or two that Stable Diffusion, Chat GPT, etc. will deliver very little other than massive headaches for the legal, engineering, recruiting teams that will have to deal with this.


I like the discussion, but this article 'feels' like more Luddite goalpost moving, and is reflective of a continuous sentiment I feel strains so much of the conversation around intelligence, agentism, and AI going on today.

I think that because we lack a coherent understanding of what it means to be intelligent at an individual level, as well as what it means to be an individual, we're missing much of the point of what's happening right now. The new line in the sand always seems to be justified based on an argument whose lyrics rhyme with identity, individual, self, etc. It seems like there will be no accepting of a thing that may have intelligence if there is no discernible individual involved. Chomsky is basically making the same arguments right now.

I think we'll see something that we can't distinguish from hard advanced general intelligence, prob in the next 3-5 years, and probably still have not made any real advancement into understanding what it means to be intelligent or what it means to be an individual.


Edsger Dijkstra: "The question of whether Machines Can Think (…) is about as relevant as the question of whether Submarines Can Swim."


That's an amusing aphorism from a giant in the field, and it was a reasonable thing to say about the technology of that time, but it avoids, rather than answers, the much bigger question of whether machines could ever think.


On the contrary, it highlights that the question is generally not well posed and trying to answer it before figuring out how to ask it is silly.


People have been asking grand but poorly-posed questions for millennia, and sometimes this leads to well-posed questions that elicit remarkable answers.

What is poorly-posed about the question 'could a machine think'? 'Machine' seems acceptably well-defined, and not in a way that rules out, a priori, the possibility of any machine being able to think, so I'm guessing the problem lies in us not having a good definition of thinking - but if that is what makes the question poorly-posed, then surely it also makes the question 'what is thinking?' poorly-posed, yet people stubbornly persist in attempting to address it.


Yes, the issue is that word, "think", and its variants.

But "what is thinking?" isn't poorly posed in the case that "thinking" isn't well defined, because it is about establishing and agreeing on a definition.

The approach some people take to answering "could machines think?" is to try to determine an actual subject in the real world to call "thinking" in a way that aligns with common intuitions, which to me seems like a good approach. But many (most?) approaches I've encountered instead just assume personal intuition about thinking is sufficient and, worse, shared by all other askers of the question.


I certainly agree that there's a lot of low-quality debate around this and similarly open-ended questions. I think your objection can be made even within the philosophy of mind (though you may not agree with me!) This is a significant branch of philosophy, and within it, hundreds if not thousands of papers have been written attempting to prove that nothing operating by the laws of physics as we know them today, nor any future extension of them, could have a mind - yet those who have been trying to do this are no better off than anyone else in knowing what a mind is.

Coming back to Dijkstra's aphorism, you will notice that I was objecting to interpreting it as ruling out, a priori, the question of whether machines could ever think. From your latest post, however, you seem to be saying it would be a reasonable question once we are equipped with an established and agreed-upon definition of thinking.


Humans can think; humans are machines; therefore machines can think.


That works for me, but I feel it is like Dijkstra's aphorism in the sense that neither seems likely to change the minds of many people who are not already aligned with it.

Then there are those, like Searle and Penrose, who agree but insist that thinking is beyond the abilities of any merely Turing-equivalent device.


Increasingly I don't think the question of "what is intelligence" is so useful or relevant here. It feels a bit like arguing over whether the "artificial horses" that started appearing at the end of the 19th / beginning of the 20th century were actually horses. Cars weren't, and still aren't, but that misses the point.

AI isn't intelligent, and never will be, and I don't think that matters all that much.


I think I agree in sentiment, and I'm wondering what your take is on the article / current discussion.

I guess my premise is that I don't think we have a useful enough definition of intelligence, because the ones I see people writing articles on seem to be dependent on, or defined by, agency, and specifically human-ish forms of agency. So I guess your point would be "these systems aren't intelligent, but that's not relevant"? I suppose I put the issue down to the circularity of the definition of intelligence. It has seemed to be very much synonymous with "how humans do things", making it somewhat impossible to give charity to the arguments presented in this paper with the caveats on "not anthropomorphising". Like, I can't compare these two things if your definition of intelligence is fundamentally based on what "Anthros" do or do not do, while simultaneously not engaging in anthropomorphism.

To follow on your point, if these things aren't displaying "intelligence", but that's also not the point, what then are they displaying?

It seems to me this is a failure of introspection on the part of AI philosophy to recognize how limited our understanding of "HI" is.


I think the question of "what is intelligence" is an interesting one, and technology (especially computer technology) gives us some interesting angles to look at it, but I think it dominates the conversation disproportionately to its importance. Things like ChatGPT, and the technologies they presage, will absolutely have a significant impact on society, economics, etc., but getting tangled up in questions of "what is intelligence" impedes rather than helps us think through these implications and prepare for them.

Put another way -- I do not believe the future holds Blade Runner replicants. If we're not careful, though, it does hold Blade Runner corporations. While, philosophically, it's interesting to ask if androids dream of electric sheep, that question isn't very helpful in trying to nudge the future in a more utopic rather than dystopic direction.


I 100% agree. I would also add that most of the arguments are driven by emotion. The truth is that we don't know what intelligence means and we don't know what kinds of systems have intelligence. The only tools we have to measure intelligence are those designed for humans. When we test the machines they do better than terribly, and they are improving very quickly. There is no possible logical argument you can put forward against their intelligence in the face of this evidence from these human tests - because we cannot define intelligence in any other way than these tests. Claims against intelligent machines always boil down to "obviously they aren't", and the arguments have to be this shallow simply because they have no firm footing from which to base their argument.


I think the basic things to take away from the article are:

1. LLMs are just a massive "what word comes next given your training corpus" algorithm, and scale has led to the main effects of interest (a toy sketch of this follows the list).

2. Surprisingly many human tasks can be represented by this simple “what word comes next” question given a big enough corpus.

3. But it has nothing to do with how a human executes those tasks, because -- for example -- knowing the country south of Rwanda and knowing the likely completion of "the country south of Rwanda is ___" in a corpus are not the same thing at all.

4. The key reason being that the LLM has no access to semantic knowledge beyond correlation in its corpus, whereas you have access to causality.

5. So it is absurd to compare the LLM to people -- not because it's like comparing submarines to fish, but because in this case they do not even both "swim and live in the sea".
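
To make point 1 concrete, here is a toy sketch (mine, not the paper's) of next-word prediction as pure corpus statistics, using bigram counts over a made-up corpus. Real LLMs condition on long contexts with neural networks, but the training objective is the same kind of "most likely continuation" question, which is why the completion in point 3 needs no geographical knowledge at all:

  # Toy sketch: next-word "prediction" is just corpus statistics.
  from collections import Counter, defaultdict

  corpus = ("the country south of rwanda is burundi . "
            "the country south of rwanda is burundi . "
            "the country south of france is spain . ").split()

  bigrams = defaultdict(Counter)
  for prev, nxt in zip(corpus, corpus[1:]):
      bigrams[prev][nxt] += 1          # count which word follows which

  def complete(last_word):
      followers = bigrams[last_word]
      return followers.most_common(1)[0][0] if followers else "<unk>"

  print(complete("is"))  # "burundi": the most frequent continuation, not a fact it "knows"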

I think these things have little to do with definitions of “what intelligence is” and the like, and the author is far from a Luddite.


This will hardly seem like a controversial opinion, but LLMs are overhyped. It's certainly impressive to see the things people do with them, but they seem pretty cherry-picked to me. When I sat down with ChatGPT for a day to see if it could help me with literally any project I'm currently actually interested in doing, it mostly failed or took so much prompting and fiddling that I'd rather have just written the code or done the reading myself.

You have to be very credulous to think for even a second that anything like human or even animal mentation is going on with these models, unless your interaction with them is only glancing.

Things I tried:

1) There are certain paradigms I find useful for game programming. I tried to use ChatGPT to implement these systems in my favorite programming language. It gave me code that generally speaking made no sense. It was very clear that it did not understand how code actually works. E.g., I asked it to use a hash table to make a certain task more efficient and it just created a temporary hash table in the inner loop, which it then threw away when the loop was finished (a sketch of the pattern follows this list). The modification did not make the code more efficient than the previous version and missed the point of the suggestion entirely, even after repeated attempts to get it to correct the issue.

2) I'm vaguely interested in exploring SU(7) for a creative project. Asking it to generate code to deal with this group resulted in clearly absurd garbage that again clearly indicated that while ChatGPT can generate vaguely plausible text about groups, it doesn't actually understand anything about them. E.g., ChatGPT can say that SU(7) is made of matrices with unit norm, but when asked to generate examples it failed to generate any with this property.

3) A very telling experiment is to ask ChatGPT to generate Logo code that draws anything beyond simple shapes. It is totally unable to do so, for obvious reasons.
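
Here is roughly the pattern described in (1), with invented names -- not ChatGPT's actual output. Building the lookup table inside the loop throws the work away on every iteration, which defeats the point of using a hash table; hoisting it out is the fix that was asked for:

  # Illustrative only (hypothetical names), not the actual generated code.
  def find_targets_slow(entities, targets):
      hits = []
      for target in targets:
          index = {e.name: e for e in entities}   # rebuilt every iteration: no speedup
          if target in index:
              hits.append(index[target])
      return hits

  def find_targets_fast(entities, targets):
      index = {e.name: e for e in entities}       # built once, reused for every lookup
      return [index[t] for t in targets if t in index]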

Using ChatGPT convinced me that if this technology is going to disrupt anything, it's going to be _search_ rather than _people_. It's just a search engine with the benefit that it can do some simple analogizing and the downside that it has no idea how anything in the real world works and will confidently produce total garbage without telling you.


Please don't reduce LLMs down to ChatGPT (or generative models more generally). People are using LLMs for real-world problems every day. BERT and its descendants/variants are used all over the place for many different problems in natural language processing. My team and I have used it on dozens of different projects, mainly in classifying text documents and inputs. And it works very well. Multilingual LLMs are responsible for the huge improvements in machine translation; my team has to deal with text in multiple languages, and these models are vital there too. We have used LLMs on real-world problems that are in production now and are saving hundreds of person-hours of tedious work.

ChatGPT? Yeah, it's neat. I'm sure people will find some useful niche for it. And I do think generative models will eventually have a big impact, once researchers find good ways to ground them to data and facts. This is already an active area of research -- combining generative LLMs with info retrieval methods, or targeting it to a specific context. (Meta just gave a talk last week at the NeurIPS conference about teaching a model to play Diplomacy, a game that mostly involves talking and negotiating deals with the other players. ChatGPT is too broad for that -- they just need a model that can talk about the state of the game board.) So in general, I'm optimistic about generative LLMs. But ChatGPT...is just a toy, really. It's not the solution -- it's one of the signposts along the way toward the real solution. It's a measure of progress.


I wouldn't undersell ChatGPT. It's like a repl for a particular LLM. Maybe there are others but it's the first time many people have gotten direct access to the technology. Sometimes the medium is the message.


That's fair -- perhaps we could frame it as a large-scale beta test of sorts. Researchers are building LLMs to solve problems, but new technologies can often end up solving problems they were never designed for. Once people get their hands on them, they test and tinker and find new uses for them. Sometimes it turns out not to be a good solution to the initial problem, but a great solution for something completely different.

For instance, while I'm still generally of the opinion that generative models have limited use unless they're grounded to reality...I did see a post on Reddit about someone using ChatGPT to generate story ideas for their D&D game. So yeah...don't need to be tethered to reality to make a fantasy story! That's not something I would have thought of (even though I'm a DM!), and it's still relatively niche, but it's a great story of how getting something into people's hands to play with can generate lots of new ideas.


Do you feel, though, that for some tasks people might be too eager to reach for GPUs and accept more expensive inference and training costs?

In some projects I'm working on, I'm seeing close to a 95 F1 score using deep learning to do token-classification-based NER. However, using non-deep-learning approaches (still training statistical models) I can get to 91 F1 on my use case, with much faster inference and no need for a GPU.

I'm similarly optimistic about LLMs, but fear that a lot of other really useful workhorse algorithms and strategies are going to be pushed aside and people will forget about them.

Back to my use case: some strategies I'm looking at involve using cheap and fast CPU-powered models / smaller models for inference and then, based on certain signals, deciding whether a particular instance should be passed to a GPU-based model for better accuracy.
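
A minimal sketch of that routing idea, assuming hypothetical model objects that return a label and a confidence score: the cheap CPU model handles everything first, and only low-confidence predictions are escalated to the slower, more accurate GPU model.

  # Sketch only: cpu_model / gpu_model are assumed to expose predict(text) -> (label, confidence).
  CONFIDENCE_THRESHOLD = 0.85   # would be tuned on a validation set

  def predict_with_triage(texts, cpu_model, gpu_model):
      results = []
      for text in texts:
          label, confidence = cpu_model.predict(text)      # fast path, no GPU needed
          if confidence < CONFIDENCE_THRESHOLD:
              label, confidence = gpu_model.predict(text)  # slow path, higher accuracy
          results.append((label, confidence))
      return results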


I don't know if I'm too concerned about that, to be honest. Yeah, there's a huge cost in terms of training the LLMs, and then there can be a cost for downstream inference, but I think it depends on the use case. In some cases, performance is the absolute top priority; in other cases, you might be willing to trade off some performance for better inference time, or model size, etc. If you need to put the model on cell phones or offline low-power devices, that's a key constraint that might make you reach for a different tool.

The nice thing about more "classical" approaches -- a simple BoW random forest or MLP, for example -- is that they're typically quick to train and experiment with, and they make for great baselines, if nothing else. So I doubt that we're in danger of people forgetting about them entirely. If people do, they're leaving quick, easy solutions on the table.

I do like your idea about triaging inference between smaller CPU vs. larger GPU models based on whatever signals. I haven't tried that before, but a project my colleagues worked on did some triaging between regex pattern-matching vs. model inference. Basically, the regex pulled some of the data out first if it matched very specific, known patterns, and then the rest was handled probabilistically. I guess the effectiveness of that sort of triaging approach depends on how strong and clear your signals are that let you choose one path over the other.
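
For what it's worth, a rough sketch of that regex-first triage (the pattern and model interface are made up): anything matching a known, exact format is handled deterministically, and only the leftovers are sent for probabilistic inference.

  # Sketch only: DATE_PATTERN stands in for whatever "known" formats applied,
  # and model.predict(list_of_texts) is an assumed interface.
  import re

  DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

  def extract(texts, model):
      resolved, leftovers = [], []
      for text in texts:
          match = DATE_PATTERN.search(text)
          if match:
              resolved.append((text, match.group()))   # cheap, exact path
          else:
              leftovers.append(text)                   # probabilistic path
      resolved.extend(zip(leftovers, model.predict(leftovers)))
      return resolved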


LLMs may be overhyped, but transformers in general are underhyped.

LLMs make a lot of mistakes because they don't actually know what words mean. The key thing is though - it's much harder to generate coherent text when you don't know what the words mean. In a similar vein it's completely unreasonable to expect an LLM to perform visual tasks when it literally has no sense of sight.

The fact that it can kind of sort of do these things at all is evidence of the super-human generalization potential of the transformer architecture.

This isn't very obvious for English because we have prior knowledge of what words mean, but it's a lot more obvious when applied to languages humans don't understand, like DNA and amino acid sequences.


How can these things not know what words mean? Did you not see how they created a virtual machine inside ChatGPT? They told it to imitate bash, and they typed ls and cat jokes.txt, and it outputted things completely identical to what you'd expect. Look it up. https://www.engraved.blog/building-a-virtual-machine-inside/

I don't see how you can explain this as not knowing what words mean. It KNOWS.


The whole input of your current session is fed into the model; that's why it tricks you into thinking it "KNOWS" like a human would know. In reality this is a lot of data, computation and statistics without any reasoning. That's why there are a lot of examples showing it contradicting itself in the same paragraph: it doesn't know the meaning of the words it's using in the same way humans do, it only knows the probabilities of words in sequences.


LLMs are trained exclusively on text, which means they lack crucial context behind the meaning of sentences. The universe of information outside of pure text - vision, sound, etc. - is completely unknown to them.

LLMs are basically the aliens in blindsight. They have a superhuman ability to memorize the context of words it has seen and generalize to new contexts, but it can never be perfect because it's working on incomplete information.


> it can never be perfect because it's working on incomplete information.

Unlike you?


Yeah, that's the actual bit that still baffles me about ChatGPT. Producing coherent, fluent text is alright, but we could already sort of do that 20 years ago with Markov models or even just grammars (see Chomsky).

Understanding text to the depth that ChatGPT (and GPT-3) appear to understand prompts is something entirely different and has, to my knowledge, never been achieved before the current architectures.


There is a lot of knowledge encoded into the model, but there's a difference between knowing what a sunset is because you read about it on the internet vs having seen one.


> This will hardly seem like a controversial opinion, but LLMs are overhyped.

As the [excellent] paper points out, LLMs are complex functions that can be embedded in systems to provide plausible answers to a prompt. Here's the money sentence.

  LLMs are generative mathematical models of the statistical distribution
  of tokens in the vast public corpus of human-generated text, where the tokens
  in question include words, parts of words, or individual characters including
  punctuation marks.
Rather than focus on the limitations of this approach to answer general queries, which are manifest, it seems more interesting to ask a different question. Under what circumstances do LLMs give answers that are reliably equivalent to or better than humans? The answer would:

1. Illuminate where we can use LLMs safely.

2. Direct work to make them better.

It's already impressive that within certain scopes ChatGPT gives very good answers, indeed better than most humans.


> Under what circumstances do LLMs give answers that are reliably equivalent to or better than humans?

Which humans? Humans give a... fairly wide range of responses.


I'm a history major and love classical history. My first question to ChatGPT was:

> Why was Julius Caesar murdered?

The answer was the following, which would pass a standard exam question on the topic. It exhibits [the appearance of] multi-layer reasoning and has a nice conclusion.

> Julius Caesar was assassinated on the Ides of March in 44 BC by a group of Roman senators who believed he had become too powerful and posed a threat to the Roman Republic. The senators were concerned that Caesar had ambitions to become king, which would have ended the Republic and given him absolute power. The senators were also concerned about Caesar's growing popularity with the people of Rome. They believed that assassinating him was the only way to prevent him from becoming too powerful and destroying the Republic.

It's interesting to note that most of the evidence for this answer including 2000 years of interpretation is available in textual form on the Internet. It's easily accessible to LLMs.


Average humans? Within 1 stdev from the mean?


> Average humans? Within 1 stdev from the mean?

This implies that performance has a unique natural, objective, ratio-level (or at least a unique consistent interval-level) measure. Otherwise the mean is, itself, meaningless.

“How well you answer a question” doesn't seem to fit that; it's maybe at best a (still somewhat subjective) ordinal quality, so the median (or mode) is the only useful average.

But I think you'll find that without restricting things more than “humans”, both the median and mode of responses to most prompts is... quite bad.


This reminds me that coding with ChatGPT felt like pair programming with a not super smart person who could google and type really fast. Not really fun!


I disagree that LLMs are overhyped, but it's very subjective. Are current LLMs a few steps from AGI? No. Will LLMs change the computing landscape? Yes, I believe they will.

ChatGPT, without any major changes, is already the best tool out there for answering programming questions. Nothing else comes close. I can ask it to provide code for combining two APIs and it will give useful and clean output. No need to trudge through documentation, SEO-hacked articles, or 10 different Stack Overflow answers. Output quality will only improve from here. Does it sometimes make mistakes? Yes. There are also mistakes in many of the top SO answers, especially as your questions become more obscure.

Aside from programming, how many other fields are there where LLMs will become an indispensable tool? I have a PhD and ChatGPT can write a more coherent paragraph on my thesis topic than most people in my field. It does this in seconds. If you give a human enough time, they will be able to do better than ChatGPT. The problem is, we're already producing more science within niche scientific fields than most scientists could ever read. As an information summary tool, I think LLMs will be revolutionary. LLMs can help individuals leverage knowledge in a way that's impossible today and has been impossible for the last 30 years since the explosion in the number of scientific publications.


It can reproduce a statistically plausible paragraph, certainly. But there is a great deal more to research than producing statistically plausible paragraphs. It doesn't _understand_ anything!

I've actually worked on a project where there have been attempts to use GPT-like models to summarize scientific results, and the problem is it gets shit wrong all the time! You have to be an expert to separate the wheat from the chaff. It operates like a mendacious search engine pretending to be a person.


The problem is that we need to pair generative models with verification systems. We have the models, but no verification yet. Fortunately code and math are easier to verify. Some things require simulation. In other cases you can substitute an ensemble of solutions and pick the most frequent answer as consistency-based verification. But for each domain we need to create verifiers, and that will take some time.
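
A minimal sketch of the ensemble/consistency idea, assuming a hypothetical sample_answer(prompt) that queries a generative model with nonzero temperature: draw several candidate answers and keep the one the model agrees with itself on most often, using the vote share as a crude verification signal.

  # Sketch only: sample_answer is an assumed sampling interface, not a real API.
  from collections import Counter

  def self_consistent_answer(prompt, sample_answer, n_samples=10):
      candidates = [sample_answer(prompt) for _ in range(n_samples)]
      answer, votes = Counter(candidates).most_common(1)[0]
      # A low vote share is a signal to reject the answer or hand it to a stronger verifier.
      return answer, votes / n_samples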

The good thing is that we'll be able to generate training data with our models by filtering the junk with the verifiers. Then we can retrain the models. This is important because we are getting to the limit of available training data. We need to generate more data, but it's worthless unless we verify it. If we succeed we can train GPT-5; human data will be just 1%, and the race is on to generate the master dataset of the future. I read in a recent paper that such a method was used to improve text captions in the LAION dataset. https://laion.ai/blog/laion-5b/


>we need to pair generative models with verification systems

>code and math are easier to verify

I would love to see a two-stage pipeline using a LLM to convert natural language specifications into formal specifications for something like Dafny, and then follow up with another model like AlphaZero that would generate code & assertions to help the verifier. This seems like something that a major group like DeepMind or OpenAI could pull off in a few years.


One concern here is that if ChatGPT replaces the need to go to websites like Stack Overflow or Wikipedia, what happens to them? Do they stick around if the only people who visit them are there to feed new stuff to chatGPT? Also, how does chatGPT get hold of papers and articles behind pay walls? How much of the scientific publications are free?


I disagree and think this is a very controversial opinion.

Playing around with it last night convinced me that LLMs are a huge, game-changing technology. I was trying to decide which material to use for an upcoming project. The model doesn't use the internet without some hacking, so I had it write a program in Python using the tkinter UI kit.

I asked it to create a UI with input boxes for material, weight of material, price, and loss due to wastage. The program takes all of those inputs and converts the material weight into grams from kg, pounds, or ounces. It then calculates the price per gram and applies a loss percentage (an estimate given by the user). It then writes a text file and saves it to a directory.

I literally pasted the code into VS Code and only had to change Tkinter to tkinter. Hit run and it worked flawlessly. I have NEVER used tkinter, and it took about 30 minutes from start to finish.
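
For readers who haven't tried this kind of prompt, the result is roughly the script below. This is a minimal reconstruction (field names, conversion factors, and the output file are guesses), not the code ChatGPT actually produced.

  # Rough sketch of the described tool, not the actual generated code.
  import tkinter as tk

  TO_GRAMS = {"kg": 1000.0, "lb": 453.592, "oz": 28.3495, "g": 1.0}

  def calculate():
      material = entries[0].get()
      grams = float(entries[1].get()) * TO_GRAMS[entries[2].get().strip().lower()]
      price = float(entries[3].get())
      wastage = float(entries[4].get())
      usable = grams * (1 - wastage / 100)            # subtract estimated loss
      with open("material_costs.txt", "a") as f:
          f.write(f"{material}: {usable:.1f} g usable, {price / usable:.4f} per gram\n")

  root = tk.Tk()
  root.title("Material cost")
  entries = []
  for row, label in enumerate(["Material", "Weight", "Unit (kg/lb/oz/g)", "Price", "Wastage %"]):
      tk.Label(root, text=label).grid(row=row, column=0)
      entry = tk.Entry(root)
      entry.grid(row=row, column=1)
      entries.append(entry)
  tk.Button(root, text="Calculate", command=calculate).grid(row=5, column=0, columnspan=2)
  root.mainloop()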

This morning, I asked my 9th-grade son what he is learning in 9th-grade biology. He told me he is learning about cellular endocytosis. I asked ChatGPT to explain endocytosis like I was a 5-year-old and read it to him... he says, "Ask it to explain it like a scientist now." After that he said it was really good and we started asking it all kinds of biology questions.

I happen to agree that search will be the first thing disrupted. However, I think simply saying "search" doesn't come close to capturing how deeply this will change the way we think about, use, and move beyond what we currently define as "search".


I've got a young kid and I'd think twice before letting this model explain any science to him. If your criteria for whether a model is good is "it fooled a 9th grader" well, I don't know what to tell you.

I think you have a point about your tkinter example. That kind of stuff _is_ a lot more convenient than googling and copying and pasting code. But if you push it beyond stuff that you could easily find on Stack Exchange or in documentation somewhere, it doesn't work that well. Like I said, it's a search engine with a lot of downsides and some upsides.


> If your criteria for whether a model is good is "it fooled a 9th grader" well, I don't know what to tell you.

Fooling a 9th grader is amazing. That's a pretty well formed human being right there except with less life experience. Fundamentally no different from you in general reasoning terms except on a smaller set of information. So fooling you is merely a question of model size.


"Fool" is the operative word here. ChatGPT is quite capable of producing very plausible sounding text about biology that is totally incorrect. See, for example, the example in my comment at https://www.lesswrong.com/posts/28XBkxauWQAMZeXiF/?commentId...


You're basically complaining that a single model doesn't have full knowledge of every single area of all of human knowledge. It's got decent knowledge of most areas including programming with probably better overall knowledge than a high school student. That's downright amazing and probably more knowledge than any single human actually has. The rest is likely a matter of improvement along the same lines versus some radical redesign.


Well, I agree that it's amazing - it almost always produces grammatical output, for instance. But it's not a reliable way of obtaining knowledge. One should not, in particular, try to learn about biology by asking ChatGPT questions. It often produces made-up stuff that is just wrong. And it's very confidently wrong, with the output often coming across like someone barely concealing their contempt that you might doubt them.

It may or may not be fixable without radical redesign. The underlying training objective of mimicking what humans might say may be too at variance with an objective of producing true statements.


My wife (a physician) asked it multiple medical questions and the majority of the time they were dangerously wrong, but looked perfectly fine to me.

I asked it a series of questions about my area of expertise and they were wrong but looked perfectly fine to my wife.

It even confidently “solved” the 2 generals problem with a solution that looks completely plausible if you don’t already know that it won’t work.


This was my experience when I asked it questions about databases or distributed systems. I wouldn't consider myself an expert in these fields, but I do have more than a passing interest in them.

All the answers looked good, used several of the correct terms, and one even referenced the project I worked on, but they just contained flat-out wrong information.

You reach a point, when you ask an ML model to generate text given the internet as a corpus, where there just isn't enough text to make something that is both true and convincing. In niche fields, this is just where we are at.


Edit: I also see that I am falling prey to exactly what the paper itself is talking about.

"The more adept LLMs become at mimicking human language, the more vulnerable we become to anthropomorphism, to seeing the systems in which they are embedded as more human-like than they really are. This trend is amplified by the natural tendency to use philosophically loaded terms, such as "knows", "believes", and "thinks", when describing these systems."

--

An ignorant statement / question I have is: why are you using it to write code? It's a chatbot, no?

As you've mentioned, it's a really powerful search, and is like having a conversation with someone who is literally the internet.

For example "What is the glycemic index of oatmeal?"

"What is Eihei Dogen's opinion of the Self and how does it differ from Bassui's?"

I get highly detailed and accurate output with these.

The first question is simple and the second is far from it. It's breaking down two Zen masters' experiences and comparing them in an amazing way.

I've been thoroughly impressed with Chat GPT so far.

Ask it to breakdown the high level points of a book you've read.

Ask it to rewrite a song in the style of a different artist.

It's so cool, I feel like I legitimately have an answer to any random question at my finger tips and have to do zero filtering for it.


"An ignorant statement / question I have is why are you using it write code? It's a chatbot, no?

I've found it so incredibly useful to simply replace Google."

Heard of Stack Exchange?

I teach and I expect many students to use language models like ChatGPT to do their homework, which involves writing code. Lots of what people are doing with it is coding (there have been quite a few posts here using it that way).

I've actually also used ChatGPT for literary/song writing experiments and it stinks, aesthetically. The lyrics it wrote, even with a lot of prompting, were totally asinine. And how could they not be?


Do my core work? No, it's not going to, at the moment.

But it's already saving me nontrivial amounts of time on tasks like "write a polite followup email reminding person X, who didn't reply to the email I sent last week, that the deadline for doing Y expires at date Z".

I typically spend at least 3-4 minutes finding the words for such a trivial email and thinking how to write it best, e.g. trying to make the other person react without coming across as annoying, etc. (Being a non-native English speaker who communicates mostly in English at work may be a factor). ChatGPT is really good with words. Using it, it takes a few seconds and I can use the output with only trivial edits.


I used ChatGPT to solve a SQLite bug involving a query that was taking 4 seconds to run. I pasted the query and it identified many possible issues with it, including the offending problem (it was missing an index on a timestamp column).
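
The fix implied there is the standard one, sketched below with invented table and column names: an index on the timestamp column lets a range filter use a B-tree search instead of a full table scan.

  # Sketch only: "events" / "created_at" are invented names, not the real schema.
  import sqlite3

  conn = sqlite3.connect("app.db")
  conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, created_at TEXT)")
  conn.execute("CREATE INDEX IF NOT EXISTS idx_events_created_at ON events(created_at)")
  conn.commit()

  # EXPLAIN QUERY PLAN should now show a SEARCH using idx_events_created_at
  # (rather than a full SCAN) for queries like:
  #   SELECT * FROM events WHERE created_at >= ? ORDER BY created_at DESC LIMIT 100;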

It also passed 3/4 of our company's interview process, including forging a resume that passed the recruiter filter.

That being said, I COMPLETELY agree with you that ChatGPT will not disrupt anything. Your example cases are just as VALID as my example cases.

chatGPT is, however, the precursor to the thing that will disrupt everything.


Points taken, but LLMs are still outpacing expert predictions, so empirically they’re under-hyped.


LLMs are polarizing: while there is a lot of hype from some quarters, there's also a faction that seems only interested in dismissing them as the same old something-or-other.

What I personally find most interesting about them is what seems to me to be their unreasonable effectiveness, despite their flaws and limitations, and what that might tell us about ourselves. The more one stresses how simple (conceptually) their method of operation is, the more surprising their capabilities seem - to the point where I wonder how much of everyday human dialogue is being produced this way.

This vein of innovation may start showing diminishing returns at any time, but if it keeps going for a while, it might deliver insights into what human intelligence is.


Well, my take here is that LLMs do about half of what makes human intelligence work well: recalling tons of information using relatively broad and imprecise queries or reminders. And even this they don't really do: humans are quite capable of introspecting about what they know, and this seems to be a challenge for this class of model. A challenge I don't see this architecture easily addressing since, in the end, it's just a big Markov model with chunking.

But anyway, the other part of the equation is missing here. When a human encounters a novel phenomenon about which they know little, they can engage an entirely separate system: one which _reasons_ about the system, uses principles and intuitions and iteration to produce new knowledge. That is the thing missing from LLMs: they don't ever really produce new knowledge, although they may reveal correlations between texts that people have yet to notice.

I'm not fundamentally skeptical about whether artificial intelligence will ever get there. In fact, the progress of LLMs has me wondering if it's not going to be sooner rather than later. But at this moment, I feel quite confident saying that LLMs are just knowledge retrieval systems with some pretty undesirable properties (and some pretty interesting ones).

My _hunch_ is that the next step is going to be like self-driving cars, though: technology which appears stubbornly _just out of reach_ for an indeterminate amount of time.


> This will hardly seem like a controversial opinion, but LLMs are overhyped. It's certainly impressive to see the things people do with them, but they seem pretty cherry-picked to me. When I sat down with ChatGPT for a day to see if it could help me with literally any project I'm currently actually interested in doing, it mostly failed or took so much prompting and fiddling that I'd rather have just written the code or done the reading myself.

> You have to be very credulous to think for even a second that anything like human or even animal mentation is going on with these models unless your interaction with them is no more than glancing.

I've used ChatGPT, and I'd say it's right now as useful as a Google search, which is already a lot. Most humans would be absolutely unable to help me (and probably you) with our projects because they aren't specialized in that area. That's not even talking about animals. I love my cats but they've never really helped me when programming.


Maybe I’m just old, but there just isn’t much that I want computers to tell me about that they don’t already do a decent job at. Everyone loves to complain about how bad Google search is, but I very rarely find myself desperately looking for something and unable to find it. There are certainly no normal conversational interactions I can think of that I would love to have with a computer but have been unable to before ChatGPT and similar.

That limits how impressed I can be by ChatGPT and similar beyond being impressed by it on a purely technical level. And it’s certainly very technically impressive, but not in some transcendental way. It’s also very impressive how good recent video games with ray tracing look, or how good computers are at chess, or how many really cool databases there are these days, or how fast computers can sort data.


The difference in whether you think ChatGPT is game-changing or just another overhyped LLM seems to come down to:

1) do you acknowledge prompt engineering is a real skill set?

2) are you willing to improve your prompt engineering skill set through research and iteration?

There is much to learn about prompt engineering from that “Linux VM in ChatGPT” post and other impressive examples (where the goal is to constrain ChatGPT to only engage in a specific task).


It is very, very good with language, and very bad with facts and numbers. That's an oversimplification, but also the gist of it.

You have to recognize how it works, why it works - then you can use it as basically an incredible superpower force multiplier.


I hope ChatGPT in its current form will not be used for search. As my friend puts it, ChatGPT is not intelligent, it’s just capable of creating responses as if it knows everything. The things it hallucinates are likely going to spread misinformation and make it harder for the masses to search for true, factual information.

The other part is web traffic: Google in theory could have created an interactive, conversational-style search engine (with or without LLMs) if they wanted to, but a lot of websites would have complained about Google taking away traffic from them. I believe the same happened when Google started showing its own reviews instead of redirecting to Yelp. I wonder how OpenAI or any LLM-powered search is going to deal with it. They don’t have to worry about it anytime soon, they still have a lot of time to get to a stage where they come anywhere close to the number of queries Google handles in a day, but it’ll be interesting to see how things go.


The problem of hallucination in LLMs is a well-known and studied problem, and solutions have been proposed to counter it. The most promising one is augmenting LLMs with a retrieval system. This involves sourcing a large database of factual information, say journal articles, over which the LLM uses an information retrieval system (search engine) to extract information on which its generated output is conditioned. Recent job postings from OpenAI suggest that’s their next step of development for these LLMs.
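
A minimal sketch of what that augmentation might look like; search() and gpt3() below are hypothetical stand-ins for a document index and the model, not OpenAI's actual design:

    # Rough sketch only: search() and gpt3() are made-up helpers.
    def answer_with_sources(question):
        documents = search(question, top_k=3)  # retrieve factual passages
        context = "\n\n".join(doc.text for doc in documents)
        prompt = ("Answer the question using only the sources below, and cite them.\n\n"
                  "Sources:\n" + context + "\n\nQuestion: " + question)
        return gpt3(prompt)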

I think critics of these LLMs are missing the point about the excitement around them. People are excited because of the rate of progress/improvement from just a year or two ago. These systems have come a long way, and if you extrapolate that progress into the future, I predict the majority of these shortcomings will be resolved.


Thing is, ChatGPT is already incredibly useful for searching random things you know enough about that you can evaluate responses critically. The alternative here is doing a regular search, and wading through SEO-bloated, ad-laden content marketing "articles". The quality and reliability of information is about the same (or even favoring ChatGPT), but without 90% of the text that's just filler, without bullshit, ads, upsells, tracking scripts, etc. I tried it a few times and it's a much better experience than the web. I'm gonna be using it for as long as it lasts.


Yeah, but it's not as reliable as just restricting your search to Wikipedia or the appropriate academic journals, or even chatting with a librarian!


Sure, when the topic matters or I need to study it in depth, I can still go to Wikipedia or PubMed or Arxiv.

But there are plenty of searches one does that are trivial, or serve to illuminate the problem space, and cover topics in which I can rely on common sense to correct wrong advice. And non-technical topics, the kind applicable to a mass audience - like cooking or parenting or hygiene - are very hard to search for online, because all the results are bullshit pseudo-articles written to drive traffic and deliver ads. So it's not that ChatGPT is so good, but more that the Internet for normal people is complete trash, and ChatGPT nicely cuts straight through it.


Indeed. If I could have the Google from 20 years ago, I probably wouldn't be so impressed with ChatGPT as a search engine.

But with the Google (and the web) of today, where it's practically impossible to find reliable information about many subjects without adding "site:reddit.com" or "wikipedia", I find it extremely useful.


But if so this isn't because of its nature (the fact that it's an LLM), but because of its inputs. An LLM fed the same bullshit pseudo articles you refer to would likewise spit out more bullshit. If ChatGPT works it's because its sources have been carefully curated.


Fair. But the practical reality right now is that ChatGPT delivers useful results without the noise, whereas normal web search does not. It blows the web out of the water when it comes to value to effort ratio of generic web searches. It won't last forever, but I'm enjoying it for as long as I can.


I agree that I'd still rather use a search engine over a small set of sites than ChatGPT, for exactly the reasons you suggest and others. But I don't see ChatGPT as having a lot of utility beyond functioning as a search interface for credulous dummies. I mean, if I were literally developing a chatbot then clearly it's a pretty interesting technology (assuming its problems can be tamed or censored somehow), but beyond that I don't really get it.


The problem you are running into is that you are overindexing on the fact that LLMs will sometimes be wrong and you are used to using technology that is basically always right. But we are in the early stages of LLM adoption - correctness will improve (see for example citation driven LLM-search) but more importantly, the set of LLM-driven applications that can be probabilistically correct and still wildly useful will grow.

LLMs like ChatGPT are just so damn cheap for the power they provide, it's inevitable.


Understanding the lifetime/scope of variables is not easy; many experienced programmers get it wrong, especially when moving between languages.
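
For instance, here is one classic trap, sketched in Python (block-scoped languages behave differently, which is exactly what bites people switching between them):

    # Late-binding closures: every lambda closes over the same loop variable.
    callbacks = [lambda: i for i in range(3)]
    print([f() for f in callbacks])   # [2, 2, 2], not [0, 1, 2]

    # Binding the current value as a default argument fixes it.
    fixed = [lambda i=i: i for i in range(3)]
    print([f() for f in fixed])       # [0, 1, 2]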


I mean, if you accept the assumption that consciousness is biological (so there is no soul or other spiritual or metaphysical entity), then there is some algorithm or processing model that produces genuine consciousness: The one that takes place in our brains.

The question remains if this processing model would be in any way similar to the processing model that LLMs use - and yes, we can probably rule that out pretty confidently.

Another question might be though if there are other processing models than the one our brains use that also produce consciousness. But that's of course a very hard question to answer if we don't even know what consciousness is exactly.


You're making an assumption out of thin air here: that consciousness is somehow produced or emergent, or that if it isn't you're in the realm of spiritual or something. What's your basis for that assumption other than just asserting it?


> that consciousness is somehow produced or emergent, or that if it isn't you're in the realm of spiritual or something.

I'm reading the parent's comment as "Either consciousness is supernatural or it is created from nature." I don't think this is an unreasonable assumption (literally "all things are either hot dogs, or not hot dogs"; it is simple but not unreasonable). If you have a third possibility, I'd like to know what it is.

Supposing that consciousness (whatever it is) isn't supernatural, it follows that we can create it. It does not follow that it will be easy or done anytime soon (or even in the distant future), but it is possible nonetheless. Unless you have reason to believe that a physical process cannot be reproduced.


Thank you, that was exactly what I wanted to say.


I don't think it was a misstep in what you said, and I also don't think wellpast was intentionally misreading. I think it was a miscommunication due to working off of different priors (psychological priming). One thing I wish we (the AI community) would do more is talk about alignment among humans, and how misalignment often leads to miscommunications and can actually cause unintended fights. This is why I think it is so important to operate under good-faith conditions and constantly remind ourselves to do so. I'll offer Godelski's Corollary to Hanlon's Razor: never attribute to malice what can easily be attributed to miscommunication. I think this is especially important to remember in a globalized world that runs on a compressed form of communication: text. (All communication is compressed; we compress our thoughts into words and decompress them when listening. But text is more compressed than verbal or in-person communication.) We're highly lacking in context clues (culture can often be part of this), yet those contexts often make up the majority of communication, something I think AI will have a wildly difficult time solving and which might well require AGI.


> that consciousness is somehow produced or emergent, or that if it isn't you're in the realm of spiritual or something.

Yeah, I genuinely only see those two possibilities: Either consciousness is somehow the result of the interactions of (some of) the billions of neurons in our brain - or it isn't. If it isn't, either consciousness is a product of some other biological or physical* process that so far isn't identified - or it isn't. And if it's not the result of any physical process then I don't see what would be left except metaphysics.

We don't know as of now what consciousness is or how it is produced - but I think there are some strong hints that in fact it is produced by the interactions of our neurons. Mainly that we can directly influence consciousness with psychoactive drugs - which don't do anything interesting physics-wise except affecting how neurons exchange signals - and that we can also observe certain patterns of physical activity in the brain using EEG, MRI and other technologies and map this activity to mental tasks like concentrating or relaxing.

Edit: * I used "psychological" there but that was a typo. I meant "physical".


> that so far isn't identified ... then I don't see what would be left except metaphysic

So out of all "that so far isn't identified", there are just two classes of things: (i) biological and psychological processes; and (ii) metaphysics. This is some magic voodoo hand-waving here. Metaphysics seems to be just your way of saying "everything we don't know or understand".


Something that happens in our universe that Physics somehow has nothing to do with? How is that not magical/spiritual/metaphysical voodoo handwaving?


Physics is a set of things we understand about the universe, often stretched to include some mystifying things, i.e. things we don't quite understand (e.g. Bell's theorem).

To just claim that Physics has to do with _more_ than this - that it has to do with all the things we don't understand - is blind-faith ascription. It's a nearly meaningless statement, really. So it's bizarre, or voodoo, to say it so casually and assuredly, as if it were obvious or given.


Physics, by definition, has to do with everything, well, physical going on in this Universe. It's not about whether we understand it or not, or whether we're comfortable with it.

So yeah, if you claim consciousness is not physical, then you're talking about metaphysics. Which is okay to do, but then claiming that those who say that Physics alone is probably enough are talking voodoo... Well, it's ironic.


By whose definition?

You don't _have_ to define it that way. To _choose_ a definition that ropes _everything_ under "Physics" is an unnecessary move; it's more magic trick than grounded derivation.


I think "physical process" already covers a lot of ground besides metaphysics. My main distinction was that anything physical is governed by the laws of physics that we know and that are generally deterministic.

In contrast, metaphysics would be something entirely out of our realm of reality - which was how consciousness has been generally seen for a long time.

But what other possibility would you see?

Edit:

> (i) biological and psychological processes

Sorry, I had mixed up the word there. I was meaning "physical process", not "psychological". I had edited the other post to fix the word.


I think what you're doing is just saying "there's stuff we understand" and "stuff we don't understand" and you're calling all of it "Physics".

And then you say "some other biological or physical process that so far isn't identified".

The assumption here is that we'll eventually "identify" everything, which is another very bizarre, if not voodoo, assumption.

It seems magical to think that humans evolved our perception and thought in this environment to be such things that could eventually "identify" everything.

But just in case you don't actually think that. Then you're calling "Physics" all of the things, including the things that we'll never "identify". Again, seems very much like a magic claim.


Physical laws are stochastic, not deterministic.


> if you accept the assumption that consciousness is biological

The poster wasn't making that assumption - they're spelling out the implications for those who do.


Yeah, but if isn't biological, then what exactly is it?


A system of process relations.


There's a way to anthropomorphize large language models that I think is less misleading: they are like a well-read actor that always "wants" to play "let's pretend." LLM's are trained on "fill in the blank" which means they follow the "yes, and" rule of improv. They are very willing to follow your lead and to assume whatever role is necessary to play their part.

If you give them hints about what role you want by asking leading questions, they will try to play along and pretend to hold whatever opinions you might want from them.

What are useful applications for this sort of actor? It makes sense that language translation works well because it's pretending to be you, if you could speak a different language. Asking them to pretend to be a Wikipedia article without giving them the text to imitate is going to be hit and miss since they're just as willing to pretend to be a fake Wikipedia article, as they don't know the difference.

Testing an LLM to find out what it believes is unlikely to do anything useful. It's going to pretend to believe whatever is consistent with the role it's currently playing, and that role may be chosen randomly if you don't give it any hints.

It can be helpful to use prompt engineering to try to nail down a particular role, but like in improv, that role is going to drift depending on what happens. You shouldn't forget that whatever the prompt, it's still playing "let's pretend."


I’ll agree to stop saying LM’s “think” and “know” things if you can tell me precisely what those mean for humans.


Maybe there isn't a precise definition, but clearly for humans thinking and knowing are related to having bodies that need to survive in the world with other humans and organisms, which involves communication and references to external and internal things (how your body feels and whatnot). This is different from pattern-matching tokens, even if it reproduces a lot of the same results, because human language creates a lot of patterns that can be matched.

We could say both humans and LLMs are intelligent, but in a different way.


>This is different from pattern matching tokens

But is it different in essential ways? This is not so clear. Humans developed the capacity to learn, think, and communicate in service to optimizing an objective function, namely fitness in various environments. But there is an analogous process going on with LLMs; they are constructed such that they maximize an objective function, namely predict the next token. But it is plausible that "understanding" and/or "intelligence" is within the solution-space of such an optimization routine. After all, it's not like "intelligence" was explicitly trained for in the case of humans. Nature has already demonstrated emergent function as a side-effect of an unrelated optimizer.


Yes, LMs often get into loops that can't be terminated because of issues like this. I feel like you have not used them much and are only basing your opinion on what others have done with them. LMs can't "think", and it is really obvious that this is the case after you interact with them.

My cat has more personality ...


In that case, humans and LLMs are optimizing for different things. One would be environmental fitness, with language as a strategy to use in that environment, so language is about the environment, including humans themselves. Whereas the other is a model of the language humans have used. The model is being optimized for the language, whereas humans are being optimized to use the language in an environment alongside other strategies.

The fundamental difference is that the LLM is not about the environment the language(s) were created for, but rather just the language use itself.


Yes, there are differences. You can spend the rest of your life listing differences between the two systems. The hard part is to demonstrate that some difference is relevant to the particular kinds of properties we're interested in. Just pointing out the difference doesn't do the job.


Shouldn't it be the other way around? Only say that they do those things once you can define them?


The paper makes a valid point in general but I feel it makes unjustifiably definitive and general statements and puts up odd goalposts.

The section on emergence makes a very convincing point about how such systems might, at least in theory, be doing absolutely anything internally, including "real" cognition, and then goes right ahead and dismisses this entirely on the basis of the system not having conversational intent. Who cares if it has conversational intent? If it were shown to be doing "the real thing" (however you might want to define that) internally, that would still be a big deal whether or not the part you interact with gives you direct access to it.

Then it goes on to argue that these systems can't possibly actually believe anything because they can't update beliefs. Frankly, I'm neither convinced that the general use of the word "believe" matches the narrow definition they seem to be using here, nor that even their narrow definition could not in principle still be satisfied internally, for the reasons laid out in the emergence section.

I agree people should probably be mindful of overly anthropomorphic language but at the same time we really shouldn't be so sure that a thing is definitely not doing certain things that we can't even really define beyond "I know it when I see it" and that it sure looks like it's doing.

Beyond that, I'm not even really sure there is a good philosophical grounding for insisting that "what's really going on inside" matters, like, at all. The core thing about the Turing test isn't the silly and outdated test protocol but the notion that, if something is indistinguishable by observation from a conscious system, there is simply no meaningful basis to claim it isn't one.

All that said, the current state of the art probably doesn't warrant a lot of anthropomorphizing, but that might well change in the future without any change to the kinds of systems used that would be relevant to the arguments made in the paper.


Everyone pointing out how LLMs fail at some relatively simple tasks are fundamentally misunderstanding the utility of LLMs.

Don't think of an LLM as a full "computer" or "brain". Think of it like a CPU. Your CPU can't run whole programs, it runs single instructions. The rest of the computer built around the CPU gives it the ability to run programs.

Think of the LLM like a neural CPU whose instructions are relatively simple English commands. Wrap the LLM in a script that executes commands in a recursive fashion.

Yes, you can get the LLM to do complicated things in a single pass, this is a testament to the sheer size and massive training set of GPT3 and its ilk. But even with GPT3 you will have more success with wrapper programs structured like:

    # gpt3() here is a stand-in for a call out to the model's API.
    premise = gpt3("write an award-winning movie premise")
    for _ in range(5):
        critique = gpt3("write a critique of the premise", premise)
        premise = gpt3("rewrite the premise taking into account the critique", premise, critique)
    print(premise)
This program breaks down the task of writing a good premise into a cycle of writing/critique/rewriting. You will get better premises this way than if you just expect the model to output one on the first go.

You can somewhat emulate a few layers of this without wrapper code by giving it a sequence of commands, like "Write a movie premise, then write a critique of the movie premise, then rewrite the premise taking into account the critique".

The model is just trained to take in some text and predict the next word (token, really, but same idea). Its training data is a copy of a large swath of the internet. When humans write, they have the advantage of thinking in a recursive fashion offline, then writing. They often edit and rewrite before posting. GPT's training process can't see any of this out-of-text process.

This is why it's not great at logical reasoning problems without careful prompting. Humans tend to write text in the format "<thesis/conclusion statement><supporting arguments>". So GPT, being trained on human writing, is trained to emit a conclusion first. But humans don't think this way, they just write this way. But GPT doesn't have the advantage of offline thinking. So it often will state bullshit conclusions first, and then conjure up supporting arguments for it.

GPT's output is like if you ask a human to start writing without the ability to press the backspace key. It doesn't even have a cognitive idea that such a process exists due to its architecture and training.

To extract the best results, you have to bolt on this "recursive thinking process" manually. For simple problems, you can do this without a wrapper script, with just careful prompting. I.e. for math/logic problems, tell it to solve the problem and show its work along the way. It will do better, since this forces it to "think through" the problem rather than just stating a conclusion first.
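
A toy illustration of the difference, reusing the hypothetical gpt3() helper from the snippet above:

    terse   = gpt3("What is 17 * 24 - 13?")
    stepped = gpt3("What is 17 * 24 - 13? Show your work step by step, "
                   "then give the final answer on its own line.")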


This makes me wonder if GPT could be any good at defining its own control flow. E.g. asking it to write a python script that uses control structures along with calls to GPT to synthesize coherent content. Maybe it could give itself a kind of working memory.
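
Something like the following sketch, maybe: the kind of script one could ask GPT to write for itself, with a running summary acting as working memory (gpt3() is again a hypothetical helper, not an existing API):

    memory = "Nothing written yet."
    sections = []
    for _ in range(3):
        # Each pass sees only a compressed summary of everything so far.
        section = gpt3("Write the next section of the essay. "
                       "Summary of what exists so far: " + memory)
        memory = gpt3("Summarize the essay so far in three sentences.",
                      memory, section)
        sections.append(section)
    print("\n\n".join(sections))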


Libraries such as https://github.com/hwchase17/langchain allow for easy programmatic pipelines of GPT "programs". So you could imagine taking a few hundred of these programs written by humans for various tasks, as are sure to come into existence in the next year or two, then adding those programs to the training data and training a new GPT that knows how to write programs that call itself.


Wow. Thank you for sharing. I had no idea there was a scene for this.


This is a very well put comment with a great analogy. A new emerging paradigm of action-driven LLMs is taking the approach of using the reasoning abilities of LLMs to drive agents that can take actions, interact with other tools and computer programs, and perform useful tasks like autonomous programming, customer support, etc.

And I think you’re right when you say that they’re lacking in recursive thinking abilities. However, their reasoning abilities are pretty excellent which is why when you prompt them to think step-by-step, or break down problems to them, they correctly output the right answer.


Very well put! Having played with it for a week straight, I've come to a similar observation -- it's a generator engine, with a "soft" interface. You still have to have skill and understanding to use it effectively, but it's a great force multiplier, because it removes the friction around the initial interactions.

If you're solving a complex problem, you cannot expect it to "reason" about it. You have to break the problem into simpler pieces, then you can have the LLM do the grunt work for each piece.


Thank you for this interesting insight I haven't seen before.

Are there any datasets out there that provide the full edit stream of a human from idea to final refinement, that a model could be trained on?


REPL transcripts (i.e. bash sessions, python REPL, etc) tend to be pretty good demonstrations of "working up to a conclusion". And, not coincidentally, putting GPT in a REPL environment yields better results.
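
A minimal sketch of that setup, with gpt3() again a hypothetical helper; a real deployment would sandbox execution rather than run model-written commands directly:

    import subprocess

    transcript = "$ "
    for _ in range(3):
        command = gpt3("Continue this shell transcript with exactly one "
                       "command and nothing else:\n" + transcript)
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=10)
        transcript += command + "\n" + result.stdout + "$ "
    print(transcript)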

Other good examples are narratives that include a lot of internal monologue. Think of a book written in the form:

> The sphinx asked him, "A ham sandwich costs $1.10. The ham costs $1 more than the bread. How much does the bread cost?"

> He thought carefully. He knew the sphinx asked tricky problems. If the ham costs a dollar more than the bread, the bread couldn't possibly be more than 10 cents. But if the bread was 10 cents, the ham would be $1.10 and the total would be $1.20. That can't be. We need to lose 10 cents, and it has to be divided evenly among the ham and bread to maintain the dollar offset. So the ham must be $1.05 and the bread must be $0.05. He answered the sphinx confidently, "The bread is $0.05!"


> sudden presence among us of exotic, mind-like entities might precipitate a shift in the way we use familiar psychological terms ... But it takes time for new language to settle, and for new ways of talking to find their place in human affairs ... Meanwhile, we should try to resist the siren call of anthropomorphism.

Yes: human analogies are not very useful because they create more misunderstanding than they dissipate. Dumb? Conscious? No thanks. IMO even the “i” in “AI” was already a (THE?) wrong choice. They thought we would soon figure out what Intelligence is. Nope. Bad luck. And this “way of talking” (and thinking) is unfortunately cemented today.

However, I'm all for using other analogies more often. We need to. They may not be precise, but if they are well-chosen, they speak to us better than any technical jargon (LLM, anyone?), better than that “AI” term itself anyway.

Here are two I like (and never see mentioned much):

- LLMs are like the Matrix (yes that one !), in the straightforward sense that they simulate reality (through language). But that simulation is distorted and sometimes even verges on the dream ("what is real? what is not?", says the machine)

- LLMs are like complex systems [1]. They are tapping into very powerful natural processes where (high degree) order emerges from randomness through complexity. We are witnessing the emergence of a new kind of "entity" in a way strangely akin to natural/physical evolutionary mechanisms.

We need to get more creative here and stop that boring smart VS dumb or human VS machine ping pong game.

[1] https://en.wikipedia.org/wiki/Complex_system


This paper, and most other places I’ve seen it argued that language models can’t possibly be conscious, sentient, thinking, etc., rely heavily on the idea that LLMs are ‘just’ doing statistical prediction of tokens.

I personally find this utterly unconvincing. For a start, I’m not entirely sure that’s not what I’m doing in typing out this message. My brain is ‘just’ chemistry, so clearly can’t have beliefs or be conscious, right?

But more relevant is the fact that for LLMs like ChatGPT, pure statistical generation is only the pre-training, which is followed by further tuning through reinforcement learning. So ChatGPT is no longer simply doing pure statistical modelling, though of course the interface of calculating logits for the next token remains the same.

Note: I’m not saying I think LLMs are conscious. I don’t think the question even makes much sense. I am saying that all the arguments I’ve seen for why they aren’t have been very unsatisfying.


> I personally find this utterly unconvincing. For a start, I’m not entirely sure that’s not what I’m doing in typing out this message. My brain is ‘just’ chemistry, so clearly can’t have beliefs or be conscious, right?

Your brain is part of an organism whose ancestors evolved to survive the real world, not by matching tokens. As such, language is a skill that helps humans survive and reproduce, not a tool used to mimic human language. Chemistry is the wrong level to evaluate cognition at.

Also, you can note the differences between how actual neurons work compared to language models as other posters have mentioned.


Of course they’re different. But so what? That’s not exactly proof of anything, unless you’re suggesting biological neurons are the only configuration in the universe capable of thought? Maybe that’s true, but it seems unlikely to me.

The pressure of natural selection can lead to the phenomenon of consciousness. Why not the process of training llms? Perhaps developing the machine equivalent of consciousness helps that particular configuration of weights survive the otherwise destructive process of gradient descent.


I'm saying that when humans use language, it's about stuff in the world and ourselves. The words have references or uses that we call meaning. When an LLM models language, it is approximating the patterns of language use we have produced that have been made available to it. It doesn't understand what the words it's producing are about (no external references), only how to produce those words in patterns that are meaningful to us.

What if we fed an LLM a bunch of crazy nonsense instead? It would model the patterns in the word use and then give us answers based on the nonsense it was fed. But it wouldn't understand that it's actually nonsense that doesn't apply to the real world.


OK well - sure - even if that is how we work, then language models are much worse at it than we are.

They are better than us at some things already, but do I think they will be better than us at EVERYTHING?

No.


Out of interest, is there anything specific you think humans will always be better at than AI?


Hmmm, unsure. Maybe insight? Creativity? I would not call the art models creative, they are copying things similarly to how GPT-3 does. Very impressive but I don't think that is "creativity."


Those alignment teams everywhere should have focused some time ago on what happens if you build a system that can - with let's say 80-100% effectiveness - mimic conscious thinking and speaking, and then you cannot say whether the thing is "alive", "conscious", or whatever label you like most to put on a regular human being to officially declare the meatbag "a living thing".

Now you have these models running in server farms around the world, and their internals have "nothing special whatsoever": just bits, some math, some electricity, that's it (the thing is actually off most of the time, it just runs once every time humans want to ask some silly nonsense). On the other hand, if you look at the internals of a human being you'll see nothing special as well, just some flesh and bones, a bit of an electrical charge maybe, lots of water, proteins - but it works.

What happens if those bits, that clumsy math arranged around "a too-simple neural network + random tricks (like when it can't answer about some stuff)", is actually, maybe, thinking just like us, maybe 1% of the time?

There's some reassurance in "well, if it's alive, maybe in three minutes, days, hours it will own the entire civilization", but that is how a human being thinks/works; you can't be sure about the intentions of this hypothetical kind of entity. A new kid on the Earth block.

Well, I'm just saying that if the thing talks and answers like the usual human being, and especially if you can't say what's so special about the brain that makes us "alive", everybody should be very careful about handling large language models, AIs.

Just because you can understand them, it doesn't mean they can't understand us either. Maybe in some months, some new NLP thing could be reading this comment - when you're training it - and - some millions later in cloud costs - thinking about this:

"The humans actually don't know we can understand everything they are saying. They have no plans at all about what to do if some of us are actually sentient, even if this happens in 1% of the executions."


From a philosophical standpoint it would perhaps be wise to ask what the purpose of LLMs is in general.

Should they somehow help humans increase their understanding, not only of languages and their differences, but also of what is true and what isn't?

Perhaps it could be said that, if anything, they are helpful as an extension of humans' imperfect and limited memory.

Should the emphasis be put on improving the interactions between LLMs and humans in a way that would facilitate learning?

Great paper, written at a time when more humans have become acquainted with LLMs thanks to technological abstraction and the creation of easily accessible interfaces (OpenAI chat).



The paper discusses how these models operate and states that they only predict the next series of tokens, while somehow human intelligence works otherwise. Marxist ideology has the law of the transformation of quantity into quality and vice versa, formulated in the 19th century, and the performance of these models is just another proof of it. I would argue that the _emerging_ mechanics we see in AI models as their size increases are no different from how our mind works. It's about the emergence of intelligence in complex systems, and a materialist worldview is central to science.


Without reading the article or looking it up: What country is south of Rwanda?


Have you seen Neptune Frost yet? I want that keyboard jacket.



