Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (arxiv.org)
280 points by hackerlight 10 months ago | 264 comments



So it seems 'obvious' to me that a network about 50 layers deep (for example) can only reason about symbolic questions for 50 'steps' (in quotes because it's not a step as we think about it). It only seems there's more complexity because those are 50 steps in one or more learned subspaces that the model has been trained in (which might mean the model can accomplish more than one 'human step' per 'step' of its own). Humans (well, intelligent humans at least) obviously seem able to reason beyond that many steps, but we all know it requires real thinking, deliberation, and perhaps a notepad to do so.

It's quite something to, for example, expect ChatGPT to be able to correctly do 4 digit multiplications without any thought or recourse to 'paper' when very few human beings can do that.


This is true but you have to also consider the autoregressive component. In your example, it's 50 steps per iteration of the model, where the model is executed once for each token in the output.

So practically speaking it's a bit more complicated to calculate how much the model can "think". Of course, once a token is output the model is committed to it (in the most basic scenario), but that doesn't mean it isn't still "thinking" as it produces subsequent tokens.

> perhaps a notepad

Exactly, the context and previously output tokens can be considered such a notepad since they are input for the next steps of the model.
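
To make the arithmetic concrete, here's a toy sketch (my own illustration, not anything from the paper): an L-layer model generating T new tokens performs L x T sequential layer applications, and every committed token is re-read on later passes, so the growing context really does act as that notepad.

    def toy_layer(state, depth):
        # Stand-in for one transformer layer: just mixes the running context a bit.
        return [x + depth for x in state]

    def generate(prompt_tokens, n_layers=50, n_new_tokens=4):
        context = list(prompt_tokens)
        sequential_steps = 0
        for _ in range(n_new_tokens):
            state = context[:]                # the model re-reads everything emitted so far
            for depth in range(n_layers):     # 50 within-pass "steps" of computation
                state = toy_layer(state, depth)
                sequential_steps += 1
            next_token = state[-1] % 100      # commit to one token (greedy, toy)
            context.append(next_token)        # ...which becomes scratchpad for later passes
        return context, sequential_steps

    out, steps = generate([1, 2, 3])
    print(out, steps)  # 50 layers x 4 new tokens = 200 sequential layer applications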


So part of my general issue with this kind of thinking is that, if we take this as the main means of creating complexity, then shorter prompts are worse for reasoning than longer ones, because longer ones automatically give the model more 'space' to think. Now, I realize that the research community knows this, but I like papers like this one that explicitly seek ways to enable the model to 'breathe' a bit.


This doesn't make sense. The responses can be long.


Agreed - also, prompt engineering encourages LLMs to do this (i.e. asking the LLM to explain the steps it will take to solve a problem prior to answering - e.g. Zero-Shot CoT: 'Let's think step by step').
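
For concreteness, here's a minimal sketch of that two-stage zero-shot CoT pattern ("reason first, then extract the answer"). The call_llm helper and the example question are hypothetical placeholders - every provider's API differs:

    QUESTION = ("A juggler has 16 balls. Half are golf balls, and half of the "
                "golf balls are blue. How many blue golf balls are there?")

    def call_llm(prompt: str) -> str:
        # Hypothetical helper: wire this up to whichever LLM API you use.
        raise NotImplementedError

    def direct_answer(question: str) -> str:
        # Baseline: ask for the answer with no room to 'think'.
        return call_llm(f"Q: {question}\nA:")

    def zero_shot_cot(question: str) -> str:
        # Stage 1: the trigger phrase gives the model space to lay out its steps.
        reasoning = call_llm(f"Q: {question}\nA: Let's think step by step.")
        # Stage 2: feed the reasoning back and ask for the final answer.
        return call_llm(
            f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
            "Therefore, the answer is"
        )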


Teachers asking students to show their work are human prompt engineers.


This paper does indeed follow your intuition to investigate the limits of transformers on compositional tasks (i.e., those that require multi-step reasoning, including your multiplication example): https://arxiv.org/abs/2305.18654

> Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills. To round off our empirical study, we provide theoretical arguments on abstract multi-step reasoning problems that highlight how autoregressive generations' performance can rapidly decay with increased task complexity.


Maybe the Skill Mix paper is relevant here. They define a list of 100 skills, and then randomly sample tuples of n skills (usually less than 6) and generate a test example using those skills. Apparently only GPT-4 (at the time of the paper) was able to compose 5 skills, the other models just 3 or 2. Beyond 5 skills even GPT-4 was doing much worse.

The interesting finding of the paper is that GPT-4 couldn't have seen all the (topic, skill-tuple) combinations in the training set. If you have 10,000 examples on a topic, and use 5 out of 100 skills, you would need 100^5 training examples to cover all combinations. In conclusion GPT-4 generalizes to new skill combinations, thus it is not a stochastic parrot.

https://arxiv.org/abs/2310.17567
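
A quick back-of-the-envelope count of that combinatorial argument (my own sketch, not from the paper): 100^5 counts ordered picks with repetition, while the number of distinct unordered 5-skill sets is "only" about 75 million - still vastly more than ~10,000 examples per topic could cover.

    from math import comb

    n_skills, k, examples_per_topic = 100, 5, 10_000

    ordered_tuples = n_skills ** k      # 10**10: ordered picks with repetition
    unordered_sets = comb(n_skills, k)  # 75,287,520 distinct 5-skill combinations

    print(ordered_tuples, unordered_sets, examples_per_topic)
    # Either way, far more combinations than examples, so most (topic, skill-set)
    # pairs cannot have appeared in the training data.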


Ah good... This is definitely a research path I've been looking into. Great to see someone else has already gone there!


You are missing an important detail here - number of tokens - yes, you have 50 "steps" in network depth, but you could have extra tokens. Assuming you don't run out of tape, there is no reason for LLMs to be limited to simple operations.


This doesn't make a lot of sense when you consider how backprop works. Layers aren't limited to working independently.

This also doesn't make a lot of sense when you consider models are autoregressive.


Edsger Dijkstra had a precise English style; even though his mother tongue was Dutch, I find he made better use of English than many native speakers.

In one of the EWDs, he reminisced that, as children, they were taught to never begin to speak a sentence unless they already knew how they were going to finish it.

I'd bet these two observations have a causal connection.


When I was a young man I was taking a language course while I was temporarily living in a foreign country. There was an older man in the course (not elderly, more like mid-fifties) who was very bad at the new language we were both learning. Yet I noticed he had, what seemed to me, a magic power: he could always make people laugh. He would often whisper something to one of our classmates and they would always get a giant smile on their face or even laugh out loud.

I was intensely curious and I spent some time wondering how he did it. One day, out of the blue, he invited me out to lunch after class. We just chatted for most of the lunch, exchanging backgrounds and stories. Then his face took on a serious expression and he slowly and carefully began to explain something to me as if he was passing on some wisdom.

He said that he never spoke a single sentence without fully saying the sentence in his mind. He said he would often think of the words several times in his mind, revising the phrase until he was happy. He would imagine saying the words to the person in front of him and he would imagine their reaction. And he would continue to revise until he felt confident the person who heard the words he would say would react in the way he wanted them to react. If he could not imagine the person reacting how he wanted them to react, he would not say anything at all.

It was clear to me that he was passing along this advice but also that he was calling me out a bit. He was letting me know that I spoke without thinking, that I say whatever pops into my head. It was like he read my mind, honestly; he knew exactly what I was curious about and he answered the question I had for him that I never asked.

I wish I could say that I learned the lesson. When I have tried the technique it has rewarded the effort. But I haven't formed it into a habit and I still tend to let my mouth race ahead of my mind.


That actually sounds like hell to me, a complete absence of spontaneity and being in the moment.

I used to obsessively try to figure out what to say before I said it. I am socially awkward, and it did not help at all. I love writing because it is asynchronous and I can figure things out precisely and edit my thoughts.

But in social situations it is a complete hindrance.


I've observed two things. One, writing is different to speaking, because it's async, you can think before you write, you can edit, etc.

But second, speaking in a non-native language makes you think harder about what you're about to say. Fewer colloquialisms, more focus on making sure your meaning is understood, more sensitivity in case you might offend someone, perhaps?

It's not new either; a lot of science and whatnot has been done in people's non-native language, like French, German, Latin, etc. Another factor there is the lingo of the field; I can't simply say "Kubernetes is een open-bron houder orkestratiesysteem voor het automatiseren van de inzet, schalen, en het beheer van zachte waren" (a word-for-word Dutch rendering of "Kubernetes is an open-source container orchestration system for automating the deployment, scaling, and management of software") without confusing half my native-speaking audience.


I love reading his EWDs. I had a professor who worked with him who mentioned he made his students use pens while taking his tests. To make it less likely for the students to make mistakes??


> he made his students use pens while taking his tests

This is very common in the Netherlands, I think that's why it was a rule of his.

In general, the Dutch education system seems to be against pencils (at least this was the case until recently; I'm Dutch and in my mid-20s). You're taught to write using a fountain pen, not a pencil. In high school, you're allowed to switch to ballpoint but absolutely not to pencil. In university, you can write with pretty much anything you want, but... not with a pencil. If you do take your test with a pencil, there's genuinely a chance your teacher will give you a 0, although most of the time they'll probably be forgiving.

I majored in CS in the Netherlands and every test was done with good old pen and paper. Students still make mistakes all the time, which is why everyone uses a scrap sheet.


Same for me, growing up in the middle east. We used fountain pens for everything. And using pens/pencils wasn’t allowed for tests/submissions etc..


Perhaps to make it easier to determine how to correct instruction.

- "Guidelines for keeping a laboratory notebook" (2019) https://news.ycombinator.com/item?id=19123430#19126809


I also learned English from textbooks, and one of the strangest things I encountered was that native speakers routinely confuse "their, there, they're", which I never thought was a mistake I could make. It would be like confusing 'wet' and 'vet'. So there's definitely a difference between how native and non-native speakers use the language.


The people who confuse that mostly have not done very much reading. Audibly, those words are identical.


Even crazier:

“Could of”.

Like “You could of said so”.


Is that even possible, or just hyperbole? I'd bet the latter. I wouldn't be surprised if some people are able to fully unravel entire paragraphs of conversation in their head in a couple of seconds, but that's not something you could teach to children in general.


I don't think it is feasible, at least for conversation, but as an aspirational goal for children, along the lines of "put your toys away when you've finished playing with them", it is not a bad one.

It's not unusual for me to think I know how I am going to end a sentence, but then find that I can't get there.


In Dutch (and German) the verb often goes at the end of a sentence, so the advice is rather practical.


Dat week ik heel goed :(


*weet, thanks autocarrot


German children would with you disagree.


I also wonder if it has anything to do with the process of learning a new language in general. I've thought more thoroughly about how English works since I've been learning French (not that I'm very eloquent in either)


Unfortunately from experience that just gives enough of a delay that you get talked over in a group setting and never get a chance to speak anyway.


I had this thought the other day that the whole chain of thought reasoning pattern contributing to improved performance in LLM-based systems seems to sit parallel to Kahneman's two-system model of the mind that he covers in 'Thinking, Fast and Slow'.

Haven't read it in a few years, but I recall the book suggests that we use one 'System 1' in our brains primarily for low-effort, low computation thinking - like 1+1=? or "the sky is ____".

It then suggests that we use a 'System 2' for deliberate, conscious, high-cognitive tasks. Dense multiplication, reasoning problems, working with tools - generally just decision-making. Anything that requires focus or brain power. Our brain escalates tasks from S1 to S2 if they feel complex or dangerous.

Maybe I'm being too cute, but it feels like the critique that "LLMs aren't intelligent because they are stochastic parrots" is an observation that they are only equipped to use their 'System 1'.

When we prompt an LLM to think step-by-step, we allow it a workspace to write down its thoughts, which it can then consider in its next token prediction - a rudimentary System 2, like a deliberation sandbox.

We do a similar thing when we engage our System 2 - we hold a diorama of the world in the front of our mind, where we simulate what the environment will do if we proceed with a given action - what our friend might respond to what we say, how the sheet steel might bend to a force, how the code might break, how the tyres might grip. And we use that simulation to explore a tree of possibilities and decide an action that rewards us the most.

I'm no expert, but this paper seems to recognise a similar framework to the above. Perhaps a recurrent deliberation/simulation mechanism will make its way into models in the future, especially the action models we are seeing in robotics.


I'll preface this by saying I know this may sound entirely made up, unscientific, anecdotal, naive, or adolescent even, but luckily nobody has to believe me...

A few weeks back I was in that limbo state where you're neither fully awake nor fully asleep and for some reason I got into a cycle where I could notice my fast-thinking brain spitting out words/concepts in what felt like the speed of light before my slow-thinking brain would take those and turn them into actual sentences

It was like I was seeing my chain of thought as a list of ideas that was filled impossibly fast before it got summarized into a proper "thought" as a carefully selected list of words

I have since believed, as others have suggested in much more cogent arguments before me, that what we perceive as our thoughts are, indeed, a curated output of the brainstormy process that immediately precedes it


Well, this sounds weird to me in the sense that I don't feel that I think in _words_. I only convert my thoughts into words when I need to speak or write them down: when I need to communicate them to others, when I need to remember them for later, or when I am stuck and need to clear things up.

I was actually convinced it was the same for most people, and that for this reason "Rubber duck debugging"[1] is a thing.

1) https://en.wikipedia.org/wiki/Rubber_duck_debugging


Am I the only one visualizing some of my most creative thoughts in a mental palace that is formed by many distinct (Euclidean) spaces, whose axes connect to each other through a graph? The closest thing I've found that can describe this is simplicial sets:

picture: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRx5Xam...

It seems it's used by cognitive models, although I'm not formally trained enough to tell exactly how:

https://arxiv.org/pdf/1703.08314.pdf


I wish I had something like this in my head to tie things in together. Right now I feel like my understanding of things is so disorganised and "lucky" in a sense. I feel lucky that I have grasp of anything.


Wow, well expressed. That's exactly how I feel. Not momentarily, but with everything. Though I am actually not intelligent, I just have good intuition and luck to grasp some of what I need to "understand".


Reminds me of the saying about a poet vs mathematician, the first gives different names to the same thing and the latter the same name to different things. Maybe that's why I can't stand highly descriptive prose (aka describing the water while I'm drowning over here).

Now what if you're a poetic mathematician (or mathematical poet), what's that mind map look like?


Well... what about that palace of mind thing, and the ability to rewind into almost all older memories at will, and on demand being able to look up things from there, like reading, without having it memorized at all? Also full stream of consciousness, like smells, tastes, light wind on your skin, 'silken air' at just the right temperature and humidity.

All of that arranged in something like 'eigengrau', represented by glitterlike points connected by graphs, mostly in 'phosphene' colors, but not exclusively so.

Sometimes very non-euclidean, moving/warping.

KNOWING what's behind every glitter point, like small cinema, large home theatre, from several points of view at the same time.

No words involved. Just visuals.

Thinking, like juggling/weighing blobs, like that glowing stuff which moves slowly up and down in a lava-lamp.

Somehow 'knowing' what each blob, its size/form/viscosity/weight/speed/color/brightness/'feel'/smell represents.

Slowly emerging new 'visuals' from this. Which are then translated into 'language', if ever.


>phosphene color

Not sure whether you talk about the uranium yellow/green color, or the brief hallucination of a light spot (happened to me just a few minutes ago, hadn't had one in a long time).

I don't have such a hyperbolic mental palace, and this doesn't really give me the ability to establish a global map, but I relate a lot to what you wrote. Sometimes as I reach the climax of a long deep thought, I'm thinking via vision exclusively, to the extent that I don't even pay attention to what my outer eye sees, and I stumble upon some insight that is sometimes almost impossible to convey in language, not because it lies beyond, but because the intrusion of language causes the idea to collapse: words point to dangling shapes that mean barely anything because the rest of the painting has gone away.

To those that have read this far and can't relate to this way of thinking, this isn't a superpower, those are rather rare experiences of altered states.

Talking about this is a kind of taboo and may cause some smiles, and indeed if there is a deeper truth to these experiences about the computational or geometric nature of the mind, maybe in the same way synaesthesia mirrors spectrograms, it won't help people working in machine learning a lot (even though some like Lecun seem to use their own visual introspective abilities as a source of inspiration).

However, they may prove to be crucial in conceiving what kind of use brain chips should be put to. For now it seems we're walking through a thick fog in that direction, with envisioned applications being confined to interfacing with external computers or increasing cognitive abilities quantitatively, such as perfect memory and so on. If I could sustain such experiences durably, with a high level of control and enhanced geometric/mathematical understanding, I believe this would be akin to a superpower, yes.


Like (parts of) this sort of thing maybe?

https://youtu.be/BLmAV6O_ea0?si=OdPbwBXs6mOR5Xj2


No. That's too dense and organic. Mine are rather abstract, much padding, empty eigengrau between 'loci', and more 'geometric'?

edit: I knew about mandelbulbs before. My inner mindscapes are not like that.


>Now what if you're a poetic mathematician (or mathematical poet), what's that mind map look like?

Well, look at the drawings I posted below: mathematical notions mixed with ad-hoc diagrammatic distinctive elements such as colors and marks. With maybe a theorem that posits that every mixed representation like these matches a colorless, unannotated, rigorous mathematical object?

In fact I come from a structural linguistics background, and when I pictured how one could extrude a semiotic square into another one, I felt like I understood the vague intuition behind homotopy type theory: the metaphor goes like this – the extrusion volume must be water tight for the squares to make sense.

Suppose you read Dostoyevsky's short story "Another Man's Wife and a Husband Under the Bed." In that case, you might notice that the protagonist's vertical position, as he eavesdrops on what he believes to be his wife through the wall of another man's apartment while standing alone in a corridor, mirrors the horizontal position he later assumes when hiding under the bed of his wife's presumed lover. This physical positioning reflects his moral descent, particularly as he is not alone this time. Beneath the bed with him is another man, clandestinely involved with yet another man's wife. This helps us picture that our protagonist is just as disconnected from his wife as the man lying next to him under the bed or the husband unknowingly sleeping above them, if not more so.

Granted, I don't have the detailed vision of this semiotic diagram, but coming up with the skeletal structure is exactly what the job of a semiotician consists in (which I'm not). What matters is that all these equivalence classes the writer lays down, just like in mathematics, allow meaning to flow. His vertical loneliness must match his horizontal promiscuity for the story to operate this crescendo. Clog these connections, and the inner structure of the object they tie together disappears too. Digging into Saussure and Voevodsky, one can realize they shared a common obsession with identity, for it is precisely when physical objects become indistinguishable that they can be referred to with the same terms and that conceptuality arises (Aerts, 2010s and onward).

"Different names to the same thing" and the "same name to different things": the two directions on the homotopical ladder.

Note: I'm 100% in postmodern mode here, this goes way above my head of course.


I don't know what a simplicial set is and Wikipedia didn't really help me. However, I could roughly describe my "mind" as many mental maps where concepts are laid out and connected in different ways. Learning means putting new things on these maps, and thinking is navigating through them.


This is just a deleuzian metaphor for the weird kind of space I perceive certain abstract thoughts with.

>many distinct (Euclidean) spaces, whose axes connect to each other through a graph

Imagine having pictures hung on the walls of your mental palace that act as portals to other rooms and corridors within that palace, and that must exist parallel to each other, in different "universes", otherwise their volumes would intersect. The kind of geometry the Antichamber video game features.

Or picture this: a representation that relies on its axis to convey meaning, for instance the political compass meme. Walk along an axis long enough and it will connect orthogonally to another axis, for instance, authoritarianism may connect to anger from the emotional compass.

Simplexes: a generalization of triangles to n dimensions. A 2-axis representation (the political compass, for example) could connect to spaces with 3 axes (the ascended political compass: https://external-preview.redd.it/UQgZCVQ4OLg_Hz16FGdu9-qxfq9...).

To represent this you could connect one tip of a segment (a 1-simplex) to the tip of a triangle (a 2-simplex), each vertex in these figures representing an axis. This is where my deleuzian metaphor collapses, because I'm conflating the notion of an axis with the notion of the "left" and "right" halves of an axis. And I'd also be tempted to consider that planes should be allowed to connect to axes (to support that portal through a painting I mentioned above).

So this is just a sketchy thought, but it seems legitimate as it's not something I conceptualize but something I perceive (sometimes). But I think there may be something interesting behind these perceptions, because it seems they deal with separate concerns through some kind of orthogonal geometry that is structured: putting a concept in a dimension orthogonal to another concept doesn't lead that dimension to be orthogonal to all other dimensions/concepts in your mental palace, as would be the case if it took the shape of an n-dimensional space. And because the orthogonality is structured, it allows you to deal with more than 3 concepts spatially at the same time and embed them within something your eye can picture in 2D or 3D, using diagrammatic annotations (colors, marks, etc). Finally, it allows you to put a concept C in several orthogonal relationships to distinct concepts, for instance A and B, and to keep these different instantiations of concept C orthogonal to each other.

This is what my mind pictured as I was explaining this; colors and graduation marks/boxes faithfully representing what I just perceived: https://pasteboard.co/kMecyenyZdzg.png

Note that the two colors, the green of the axis and the red of the sticks, could be thought of as two individual concepts of their own, orthogonal to each other.

https://pasteboard.co/3VYEyepnVouQ.png

If a mathematician is reading this, please accept my deepest apologies. Here's another paper that seems thematically related to this: https://ieeexplore.ieee.org/abstract/document/10008602



Really interesting. I could guess that people that "think in words" are more likely to share their thoughts on social media, since they don't need to translate them into text/speech like people that "think in concepts"


I guess from the results of this thread a larger percentage of HN has this condition, but my understanding from reddit threads is that it is quite abnormal. I also lack an internal narrative, and I was quite shocked to find out that most people literally have a voice that they 'hear' internally.


I'll paste my reply to another comment on this thread:

> I could guess that people that "think in words" are more likely to share their thoughts on social media, since they don't need to translate them into text/speech like people that "think in concepts"

So, maybe word-thinkers are just overrepresented in "mainstream" social networks, and concept-thinkers are overrepresented in engineering circles?


Same. If I try to visualize my thoughts it’s like a cloud that coalesces into various forms, to show different scenarios. It definitely isn’t word-based until I decide to actually translate it into that mode.


Interesting. I think all of my thoughts are this record I'm listening to as if it's an audiobook almost. Sometimes, it's like multiple parallel streams of different thoughts at different strengths that I can observe, like a thought line that is going on, on a more subconscious level, and it's something that if I notice, I might want to pay attention to.

Like multiple LLMs are generating tokens in my head in parallel, but like in my field of view, some I can only listen/see barely because I'm not focusing on them.


There is a technique for achieving this state of consciousness, it’s called noting

This is an awareness that advanced meditators seek, practice and develop to perceive “reality as it is”

If you are curious, you might find related discussions, and a great welcoming community at r/streamentry on Reddit

Also the book Mastering the Core Teachings of the Buddha talks about it quite a bit, including instructions on how to do it


Noting is very useful as long as you remember not to do it all the time.


If you don't remember then what? Stack overflow? Heap overflow?


Is this different from Dzogchen Buddhism?


Noting is just a meditation technique

You might also call it an exercise for insight practice

There are multiple traditions that use noting or similar techniques for insight practice (maybe with different names)

Can’t vouch for this thread, as I just found it, but here’s a related discussion (Dzogchen vs Vipassana) https://www.reddit.com/r/Buddhism/comments/9t3095/dzogchen_v...


This is fascinating. I had another experience that I think sheds light on some of this. One day I was in my office and the lights were off. I turned around and looked at the dark shape on top of my coworker's desk. For a few seconds I stared blankly and then suddenly I had a thought: PC, it's his PC. Then I started to think about that period of time just before I realized what I was looking at... The only word I can use to describe what it felt like is: unconscious. Is it possible that consciousness is just a stream of recognition?


I think it's likely that consciousness is what you call it until you understand how it works.


I have this too. My cognitive processes are not related to my thinking brain, which I define as the part of my mental process which produces the sounds of words in my mind. Instead, I've observed that first, my subconscious processes concepts at a much more fine grained level, much like the latent space of a machine learning model. Only substantially after, let's say 10ms after, do thoughts arise, which are just pointers to the already processed subconscious process. A very rough analogy would be the inference of an LLM in words, vs all the processing of embeddings that happens internally.


I forget the name but I remember reading about this as a recognized process in neurology. We usually only hear the thought that wins, but there are many generated simultaneously, and there is a selection process.

Possibly related, I had a similar experience last night, where my mind simulated a fully realistic conversation between two people, with audio and video, except that the sentences made no sense. I thought that was interesting. My explanation was "the language part of your brain is too tired cause you've been using it all day."


Hm, interesting... I struggle with people understanding what I mean by having too many thoughts in parallel. I thought that's what ADHD is, but it turns out it's not. But I don't have a winning thought. I have to fight many of them and "pick" the winner, if you will. People always take it as a figure of speech, but I honestly struggle with it. It's not rare that I can just sit quietly and, after a few hours, be exhausted when I've finally finished thinking.

If you remember the official name, please let me know. I'd love to look into it more.


> I got into a cycle where I could notice my fast-thinking brain spitting out words/concepts in what felt like the speed of light before my slow-thinking brain would take those and turn them into actual sentences

The way I've seen this described by psychologists is that System 1 is driving the car while System 2 panics in the back seat, screaming out explanations for every action and shouting directions to the driver so it can feel in control. The driver may listen to those directions, but there's no direct link between System 2 in the backseat and System 1 holding the wheel.

Various experiments have shown that in many situations our actions come first and our conscious understanding/explanation of those actions comes second. Easiest observed in people with split brain operations. The wordy brain always thinks it’s in control even when we know for a fact it couldn’t possibly have been because the link has been surgically severed.

Being super tired, on the edge of sleep, or on drugs can disrupt these links enough to let you observe this directly. It’s pretty wild when it happens.

Another easy way, for me, is to get up on stage and give a talk. Your mouth runs away presenting things and you’re in the back of your head going “Oh shit no that’s going in the wrong direction and won’t make the right point, adjust course!”


Sometimes when I am in a Teams call, I observe myself talking. I know for myself that I can get carried away whilst talking and that time passes faster then. My conscious self sometimes needs to interrupt my talky self with a 'nough explained signal, or even with a 'nough joking signal.

I read several studies that show that brains don't have a central point of command, so our true self can not exist (as one single origin). We are the sum of all our consciousnesses, similar to how a car is the sum of its parts.


Oh, yes, that's what I do! I act first, and then consider the action.


It’s hard (impossible?) to know if we’re talking about the same thing or not, but I experience something like this all the time, without being on the edge of sleep. We might both be wrong, but it’s relatable!


This seems like it might upend Descartes' "cogito, ergo sum" ("I think therefore I am") in that the process for forming thoughts in a language is not indicative that we exist, rather it merely indicates that we have evolved a brain that can produce and interpret language.

Seems like we're dismantling a lot of what Descartes came up with these days.


For that I came up (or got inspired from somewhere) with this: I'm aware therefore I exist. Pure awareness, devoid of all objects (thoughts/visualization) is me.


From a positive perspective, it surely shows that our thinking/mind is not just language and is always faster than sentence formation.


I had a similar experience when I was put under during surgery a few years ago. Later I learned that they used ketamine in their concoction.


I occasionally reach a similar state near sleep where I will be half-dreaming that I'm reading from a page of a book where the words materialize/"come into focus" right before my eyes into what is usually vaguely grammatically correct nonsense.


> curated output of the brainstormy process that immediately precedes it

Daniel Dennett gives a nice albeit more detailed version of your idea in his book Consciousness Explained, could be worth a read


Mandelthought psyt.


> it feels like the critique that "LLMs aren't intelligent because they are stochastic parrots" is an observation that they are only equipped to use their 'System 1'.

I wouldn't say LLMs aren't intelligent (at all) since they are based on prediction which I believe is the ability that we recognize as intelligence. Prediction is what our cortex has evolved to do.

Still, intelligence isn't an all or nothing ability - it exists on a spectrum (and not just an IQ score spectrum). My definition of intelligence is "degree of ability to correctly predict future outcomes based on past experience", so it depends on the mechanisms the system (biological or artificial) has available to recognize and predict patterns.

Intelligence also depends on experience, minimally to the extent that you can't recognize (and hence predict) what you don't have experience with, although our vocabulary for talking about this might be better if we distinguished predictive ability from experience rather than bundling them together as "intelligence".

If we compare the predictive machinery of LLMs vs our brain, there is obviously quite a lot missing. Certainly "thinking before speaking" (vs LLM fixed # steps) is part of that, and this Q* approach and tree-of-thoughts will help towards that. Maybe some other missing pieces such as thalamo-cortical loop (iteration) can be retrofitted to LLM/transformer approach too, but I think the critical piece missing for human-level capability is online learning - the ability to act then see the results of your action and learn from that.

We can build a "book smart" AGI (you can't learn what you haven't been exposed to, so maybe it's unfair to withhold the label "AGI" just because of that) based on the current approach, but the only way to learn a skill is by practicing it and experimenting. You can't learn to be a developer, or anything else, just by reading a book or analyzing what other people have produced - you need to understand the real-world results of your own predictions/actions, and learn from that.


Defining intelligence as prediction leaves out a lot of other things that humans would see as intelligence in other humans (e.g., creating a novel), also quite simple organisms make predictions (e.g., a predator jumping at prey makes a prediction about positions).


>Defining intelligence as prediction leaves out a lot of other things that humans would see as intelligence in other humans (e.g., creating a novel)

Would it?

Why would "creating a novel" by a human not itself be text generation based on prediction on what are the next good choices (of themes, words, etc) based on a training data set of lived experience stream and reading other literature?


What is the human predicting there? Why would it need to be a prediction task at all? How about a dada-ist poem? Made-up words and syntax? If it is prediction but the criterion for "what is a good next choice" can totally be made up on the fly - what does the word "prediction" even mean?


>What is the human predicting there?

Their next action - word put on page, and so on.

>Why would it need to be a prediction task at all?

What else would it be?

Note that prediction in LLM terminology doesn't mean "what is going to happen in the future" like Nostradamus. It means "what is a good next word given the input I was given and the words I've answered so far".

>How about a dada-ist poem? Made-up words and syntax?

How about it? People have their training (sensory input, stuff they've read, school, discussions) and sit down to predict (come up with, based on what they know) a made-up word and then another.


That is a meaningless definition of prediction if "what is a good next word" has an ever changing definition in humans (as everything would fulfill that definition).


That's the very definition of production in an LLM.

What does "has an ever changing definition" mean?

And why "everything would fulfill that definition"?

At any time, what the "good next word" is depends on the state created by our inputs thus far (including chemical/physiological state, like decaying memories, and so on). And far from "everything fulfilling it", it can only be a single specific word.

(Same as if we include the random seed among an LLM's inputs: we get the same results given the same training and same prompt.)


"it can be only a single specific word" - that is incorrect as a human can change the process to generate the next word, up to and including, using a random process to create or select the next word (i.e., any word would be fine).

You could say the process chosen is somehow predetermined (even if the choices then are all made by using randomness), but then really the word "prediction" has very little meaning as the criteria to what is a "good next word" have a nearly unlimited and ever changing range as the generating process changes.


>"it can be only a single specific word" - that is incorrect as a human can change the process to generate the next word, up to and including, using a random process to create or select the next word (i.e., any word would be fine).

That's also exactly what an LLM does.

It's still only a single specific word if (as I wrote above) you take the seed into account too (i.e use the same input, including same random seed value).

If you mean to answer "yes, but LLMs use a random number generator, whereas humans can actually pick a word at random" I'd answer that this is highly contested. Where would the source for such randomness be in the universe (except if you beg the question, and attribute it to a "soul" that is outside the universe)?


Claiming that the universe has no randomness is a very strong claim and moves beyond our ("standard") understanding of quantum mechanics. For example, a human could be using radioactive decay to sample randomness (and such devices are available).

An LLM is bound to what an LLM can, while humans can construct and use tools to go beyond what humans can do. Being a universal function approximator does not give access to all processes in the natural world.


> Why would "creating a novel" by a human not itself be text generation based on prediction on what are the next good choices (of themes, words, etc) based on a training data set of lived experience stream and reading other literature?

Unless you're Stephen King on a cocaine bender, you don't typically write a novel in a single pass from start to finish. Most authors plan things out, at least to some degree, and go back to edit and rewrite parts of their work before calling it finished.


That can be expressed as text prediction. You output version 1 then output editing instructions or rewritten versions until you're done.

The real issue is running out of the input window.


> The real issue is running out of the input window.

isn't this what abstractions are for? you summarise the key concepts into a new input window?


Sure, but if we're talking about editing an entire book eventually the fine details do matter. That, and presumably human authors' abstraction/memories of their books are stored in some more compact form than language tokens. Though we can't be sure about that.


Maybe a better way to say it, rather than "intelligence is prediction", is that prediction is what supports the behaviors we see as intelligent. For example, prediction is the basis of what-if planning (multi-step prediction), prediction (as LLMs have proved) is the basis of learning and using language, prediction is the basis of modelling other people and their actions, etc. So, ultimately the ability to write a novel is a result of prediction.

Yes, an insect (a praying mantis, perhaps) catching another is exhibiting some degree of prediction, and per my definition I'd say is exhibiting some (smallish) degree of intelligence in doing so, regardless of this presumably being a hard-coded behavior. Prediction becomes more and more useful the better you are at it, from avoiding predators, to predicting where the food is, etc, so this would appear to be the selection pressure that has evolved our cortex to be a very powerful prediction machine.


I think you're confusing prediction with ratiocination.

I'm sure you've deduced hypotheses based solely on the assertion that "contradiction and being are incompatible". Note, there wasn't prediction involved in that process.

I consider prediction as a subset of reason, but not the contrary. Therefore, I beg to differ on the whole assumption that "intelligence is prediction". It's more than that, prediction is but a subset of that.

This is perhaps the biggest reason for the high computational costs of LLMs: they aren't taking the shortcuts necessary to achieve true intelligence, whatever that is.


> I think you're confusing prediction with ratiocination.

No, exactly not! Prediction is probabilistic and liable to be wrong, with those probabilities needing updating/refining.

Note that I'm primarily talking about prediction as the brain does it - not about LLMs, although LLMs have proved the power of prediction as a (the?) learning mechanism for language. Note though that the words predicted by LLMs are also just probabilities. These probabilities are sampled from (per a selected sampling "temperature" - degree of randomness) to pick which word to actually output.
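
As a toy illustration of that sampling step (my own sketch; real decoders typically add details like top-k/top-p filtering on top of this):

    import numpy as np

    def sample_token(logits, temperature=0.8, rng=np.random.default_rng(0)):
        # Lower temperature sharpens the distribution (more deterministic),
        # higher temperature flattens it (more random).
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    # e.g. sample_token([2.0, 1.0, 0.1]) usually picks index 0, sometimes 1 or 2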

The way the brain learns, from a starting point of knowing nothing, is to observe and predict that the same will happen next time, which it often will, once you've learnt which observations are appropriate to include or exclude from that prediction. This is all highly probabilistic, which is appropriate given that the thing being predicted (what'll happen if I throw a rock at that tiger?) is often semi-random in nature.

We can better rephrase "intelligence is ability to predict well", as "intelligence derives from ability to predict well". It does of course also depend on experience.

One reason why LLMs are so expensive to train is because they learn in an extremely brute force fashion from the highly redundant and repetitive output of others. Humans don't do that - if we're trying to learn something, or curious about it, we'll do focused experiments such as "Let's see what happens if I do this, since I don't already know", or "If I'm understanding this right, then if I do X then Y should happen".


The ability to write a novel is different from actually writing a novel. If prediction forms the basis of (at least some forms of) intelligence, intelligence itself is more than prediction.


That's why I say our vocabulary for talking about these things leaves something to be desired - the way we use the word "intelligence" combines both raw/potential ability to do something (prediction), and the experience we have that allows that ability to be utilized. The only way you are going to learn to actually write a novel is by a lot of reading and writing and learning how to write something that provides the experience you hope it to have.


Kind of agree. I think, though, that trying to shoe-horn intelligence into some evolutionary concepts is tricky, because it is easy to stack hypotheses there.


>The ability to write a novel is different from actually writing a novel

In what way, except as in begging the question?


Which LLM will on its own go and write a novel? Also, even for humans, just because you technically know how to write a novel, you might fail at it.


>Which LLM will on its own go and write a novel?

Which human will?

We get prompts all the time, it's called sensory input.

Instead of "write a novel" it's more like information about literature, life experience, that partner who broke our heart and triggered us to write this personal novel, and so on.


Some people write novels, some don't. Why some people do so we sometimes know, sometimes we don't (maybe they flipped a coin to decide). Some start to write but fail to finish.

You have to believe that humans have no free will in a certain way to have them be like an LLM, i.e, every action is externally driven and determined.


>You have to believe that humans have no free will in a certain way to have them be like an LLM, i.e, every action is externally driven and determined.

Free will doesn't have much meaning. If I don't base my actions at time t on their development from inputs at times before t, what would I base them on?

It would be random?

Or would there be a small thinking presence inside me that gets information about my current situation and decides "impartially", able to decide in whatever direction, because it wasn't itself entirely determined by my experiences thus far?


Randomness is certainly an option. Ignoring information is an option.


>Randomness is certainly an option

Where would that randomness come from? What would be the source of it in the universe, for it to occur in the mind?

If you mean pseudo-randomness, sure, LLMs employ that too.

>Ignoring information is an option.

Randomly ignoring information? If so, see above. If you mean intentional, informed ignoring of information, that's still determined by all the previous inputs.


Quantum mechanics provides for randomness (e.g., radioactive decay) - why wouldn't microscopic randomness occur in the brain, or be used by a human in a machine?

A universal function approximator isn't enough to access all of nature.


LLMs have shown that writing a novel can be accomplished as an application of prediction, at least to a certain level of quality.


I have yet to see an LLM write a novel of its own volition.


> online learning - the ability to act then see the results of your action and learn from that.

I don't think that should be necessary, if you are talking about weight updates. Offline batch mode Q-learning achieves the same thing.

By online learning, did you mean working memory? I'd agree with that. Whether it's RAG, ultra-long-context, and LSTM-like approach, or something else, is TBD.


By online learning I mean incremental real-time learning (as opposed to pre-training), such that you can predict something (e.g. what some external entity is going to do next, or the results of some action you are about to take), then receive the sensory feedback of what actually happened, and use that feedback to improve your predictions for next time.

I don't think there is any substitute for a predict-act-learn loop here - you don't want to predict what someone else has done (which is essentially what LLMs learn from a training set), you want to learn how your OWN predictions are wrong, and how to update them.


> By online learning I mean incremental real-time learning, such that you can predict something (e.g. what some external entity is going to do next, or the results of some action you are about to take),

I used to believe this, but the recent era of LLMs has changed my mind. It's clear that the two things are not related: you don't need to update weights in real-time if you can hold context another way (attention) while predicting the next token.

The fact that we appear to remember things with one-shot, online training might be an illusion. It appears that we don't immediately update the weights (long term memory), but we store memories in short term memory first (e.g. https://www.scientificamerican.com/article/experts-short-ter...).


The fundamental difference is that humans do learn, permanently (eventually at least), from prediction feedback, however this works. I'm not convinced that STM is necessarily involved in this particular learning process (maybe just for episodic memories?), but it makes no difference - we do learn from the feedback.

An LLM can perform one-shot in-context learning, which in conversational mode will include (up to the context limit) feedback from its actions (output), but this is never learned permanently.

The problem with LLMs not permanently learning from the feedback to their own actions is that it means they will never learn new skills - they are doomed to only learn what they were pre-trained with, which isn't going to include the skills of any specific job unless that specific on-the-job experience of when to do something, or avoid doing it, were made a part of it. The training data for this does not exist - it's not the millions of lines of code on GitHub or the bug fixes/solutions suggested on Stack Overflow - what would be needed would be the inner thoughts (predictions) of developers as they tackled a variety of tasks and were presented with various outcomes (feedback) continuously throughout the software development cycle (or equivalent for any other job/skill one might want them to acquire).

It's hard to see how OpenAI or anyone else could provide this on-the-job training to an LLM even if they let it loose in a programming playground where it could generate the training dataset. How fast would the context fill with compiler/link errors, debugger output, program output etc ... once context was full you'd have to pre-train on that (very slow - months, expensive) before it could build on that experience. Days of human experience would take years to acquire. Maybe they could train it to write crud apps or some other low-hanging fruit, but it's hard to see this ever becoming the general purpose "AI programmer" some people think is around the corner. The programming challenges of any specialized domain or task would require training for that domain - it just doesn't scale. You really need each individual deployed instance of an LLM/AI to be able to learn itself - continuously and incrementally - to get the on-the-job training for any given use.


> but this is never learned permanently.

Are you sure? I think "Open"AI uses the chat transcripts to help the next training run?

> they are doomed to only learn what they were pre-trained with

Fine-tuning.

> The training data for this does not exist

What does "this" refer to? Have you read the Voyager paper? (https://arxiv.org/abs/2305.16291) Any lesson learnt in the library could be used for fine-tuning or the next training run for a base model.

> what would be needed would be the inner thoughts (predictions) of developers as they tackled a variety of tasks and were presented with various outcomes (feedback) continuously throughout the software development cycle

Co-pilot gets to watch people figure stuff out - there's no reason that couldn't be used for the next version. Not only does it not need to read minds, but people go out of their way to write comments or chat messages to tell it what they think is going on and how to improve its code.

> Days of human experience would take years to acquire

And once learnt, that skill will never age, never get bored, never take annual leave, never go to the kids' football games, never die. It can be replicated as many millions of time as necessary.

> they could train it to write crud apps

To be fair, a lot of computer code is crud apps. But instead of learning it in one language, now it can do it in every language that existed on stackoverflow the day before its training run.


> Are you sure? I think "Open"AI uses the chat transcripts to help the next training run?

> Fine-tuning.

The learning that occurs through SGD is proven to be less flexible and generalizing than what happens via context. This is due to the restricted way information flows through transformers and which is further worsened in autoregressive GPTs vs models with bidirectional encoders.

On top of that, SGD already requires a great many examples per concept, and the impact of any single example rapidly diminishes as the learning rate tapers off towards the end of training. Fine-tuning a fully trained model is far less efficient and more crippled, compared to learning from context, for introducing new knowledge. It's believed that instruction tuning helps reduce uncertainty in token selection more than it introduces new knowledge.

> Co-pilot gets to watch people figure stuff out

We don't actually know if that's true. It depends on how many intermediate steps Microsoft records as training data. If enough intermediate steps lead to bad results and needed backtracking, but that erasure is not captured, it will significantly harm model quality. It is not nearly as easy to do well as you make it seem.

All in all, getting online learning into models has proven very challenging. While some "infinite" context alternatives to self-attention are promising for LTM, it'd remain true that the majority of computational power and knowledge resides in the fixed FF weights. If context and weights conflict this can cause degradation during inference. You might have encountered this yourself with GPT4 worsening with search. Lots of research is required to match human learning flexibility and efficiency.


> If enough intermediate steps lead to bad results and needed backtracking, but that erasure is not captured

That is a fascinating insight to me. I'm so used to the emacs undo record that I forget that others are not as lucky. I just take for granted that the entire undo history would be available.


> Co-pilot gets to watch people figure stuff out

There's a reason most jobs require hands-on experience, and can't be learnt just by reading a book about how to do it, or watching someone else work, or looking at something that someone else created.

It's one thing to have a bag full of tools, but another to know how to skillfully apply them, and when to apply them, etc, etc.

You may read a book (or as an LLM ingest a ton of training data) and think you understand it, or the lessons it teaches, but it's not until the rubber hits the road and you try to do it yourself, and it doesn't go to plan, that you realize there are all sorts of missing detail and ambiguity, and all the fine advice in that programming book or stack overflow discussion doesn't quite apply to your situation, or maybe it appears to apply but for subtle reasons really doesn't.

Maybe if developers were forced to talk about every decision they were making all day every day throughout all sorts of diverse projects, from requirements gathering and design though coding and debugging, and an AI had access to transcriptions of these streams of thought, then this would be enough for them to generalize the thought processes enough to apply them to a novel situation, but even then, in this best case hypothetical scenario, I doubt it'd be enough. Certainly just watching a developer's interactions with an IDE isn't going to come remotely close to an LLM understanding of how to do the job of a developer, let alone to the level of detail that could hypothetically let it learn the job without ever having to try it itself.

I also think that many jobs, including developer and FSD, require AGI to backstop the job specific skills, else what do you do when you discover yourself in a situation that wasn't in the book you trained on? So, it's not just a matter of how do you acquire the skills to do a specific job (which I claim requires practice), but what will it take for AI architectures to progress beyond LLMs and achieve the AGI that is also necessary.


> You may read a book (or as an LLM ingest a ton of training data) and think you understand it, or the lessons it teaches, but it's not until the rubber hits the road and you try to do it yourself, and it doesn't go to plan, that you realize there are all sorts of missing detail and ambiguity, and all the fine advice in that programming book or stack overflow discussion doesn't quite apply to your situation, or maybe it appears to apply but for subtle reasons really doesn't.

Pre-training is comparable to reading the book. RLHF, and storing all the lifetime prompts and outputs would be comparable to "learning on the job". There are also hacks like the Voyager minecraft paper.


> storing all the lifetime prompts and outputs would be comparable to "learning on the job"

I'm not sure.

I guess we're talking about letting the LLM loose in a programming playground where it can be given requirements, design and write programs, test and debug them, with all inputs and outputs recorded for later off-line pre-training/fine-tuning. For this to be usable as training data, I guess it would have to be serialized text - basically all LLM interactions with tools (incl. editor) and program done via the console (line editor, not screen editor!).

One major question is how the LLM would actually use this to good effect. Training data is normally used to "predict next word", with the idea being that copying the most statistically common pattern is a good thing. A lot of the interactions between a fledgling programmer and his/her notes and tools are going to be BAD ideas that are later corrected and learnt from... not actions anyone really wants copied. Perhaps this could be combined with some sort of tree-of-thoughts approach to avoid taking actions leading to bad outcomes, although that seems a lot easier said than done (e.g. how does one determine/evaluate a bad outcome without looking WAY ahead).


I'd say intelligence is a measure of how well you can make use of what you have. An intelligent person can take some pretty basic principles a really long way, for example. Similarly, they can take a basic comprehension of a system and build on it rapidly to get predictions for that system that defy the level of experience they have. Anyone can gather experience, but not everyone can push that experience's capacity to predict beyond what it should enable.


To me, it is one of those things like defining what 'art' is, as in creating a model in our heads around a concept. We take our definitions and then use those to construct models like AI that simulate our model well enough.

In other words, I personally do not believe any system we develop will be truly 'intelligent', since intelligence is a concept we created to help explain ourselves. We can't even truly define it, yet we try to test the technologies we develop to see if they possess it. It is a bit nonsensical to me.


Sure, we created the word intelligence to help describe ourselves and our differing levels of ability, as well as applying it to animals such as apes or dogs that seem to possess some similar abilities.

However, if we want to understand where this rather nebulous ability/quality of "intelligence" comes from, the obvious place to look is our cortex, which, it turns out, actually has a rather simple architecture! If uncrumpled, our cortex would be a thin sheet about the size of a tea towel, consisting of six layers of neurons of different types, with a specific pattern of connectivity and massive amounts of feedback. We can understand this architecture to be a prediction machine, which makes sense from an evolutionary point of view. Prediction is what lets you act according to what will happen in the future, as opposed to being stuck in the present reacting to what is happening right now.

Now, if we analyze what capabilities arise from an ability to predict, such as multi-step what-if planning (multi-step prediction), ability to learn and use language (as proven by LLMs - a predict-next-word architecture), etc, etc, it does appear (to me at least!) that this predictive function of the cortex is behind all the abilities that we consider as "intelligence".

For sure there is very little agreement on a definition of intelligence, but I have offered here a very concrete definition "degree of ability to predict future outcomes based on past experience" that I think gets to the core of it.

Part of the problem people have in agreeing on a definition of intelligence is that this word arose from self-observation, as you suggest, and is more a matter of "I know it when I see it" than of having any better-defined meaning. For technical discussion of AI/AGI and brain architecture we really need a rigorously defined vocabulary, and might be better off avoiding such a poorly defined concept in the first place, but it seems we are stuck with it since the word is so entrenched and people increasingly want to compare machines to ourselves and judge whether they too have this quality.

Of course we can test for intelligence, in ourselves as well as machines, by using things like IQ tests to see the degree to which we/they can do the things we regard as intelligent (we'd really need a much deeper set of tests than a standard IQ test to do a good job of assessing this), but the utility of understanding what is actually behind intelligence (prediction!) is that this allows us to purposefully design machines that have this property, and to increasing degrees of capability (via more powerful predictive architectures).


I think that is my overall point though - we created a system (AI) based on how we see one aspect of a particular organ or system (brain, cortex, etc.), and, in this case, labeled intelligence as 'predictive behavior', and so develop systems after that model. But for starters, only mammals and a few other life branches have cortexes, and cortexes weren't always around.

Evolutionary theory doesn't hinge on prediction in itself; prediction is just one possible aspect of it. But organisms that rely on prediction, or primarily see themselves as predictive machines, will state the opposite, because we cannot do anything but model off what we think we know.

It is also further diluted in the sense that we are always limited in what we can model because of the digital nature of our medium as it attempts to model analog systems. It is like saying that the words that I am typing right now are just like having a real human conversation. No, not really. It is a diluted form of conversation that focuses on a specific, bare part of the communicative process.


I don't think people are, yet, deliberately creating predictive machines because they see that as the path to intelligence. Things like ChatGPT are LLMs, born out of that (language model) line of research, where the goal has been to learn the rules of language. The fact that a language model, when made large enough, appears somewhat intelligent was an unexpected surprise.

Different species have evolved to have different capabilities. Humans have evolved to be generalists, able to survive in a huge variety of environments, which requires a high degree of adaptability. The key to adaptability is prediction - the ability to very rapidly (in space of minutes/hours/days - not evolutionary timescales) learn how things work in a new environment or in new conditions.

Not all animals need this degree of adaptability, since they have been able to survive and thrive in long-lasting stable environments. Examples might be crocodiles or sharks - very low intelligence, but great at what they do. Evolution is not generally about prediction or intelligence - it's about optimizing each species for their own environment(s).

We already know how to build machines that are more like crocodiles - great at doing one thing over and over, but now we have the capability and desire to also build machines that are generalists like ourselves, and that requires us to figure out a way how to implement intelligence. Given how hard a problem this has been (and continues to be) to solve, it makes sense to look at our brains for inspiration - where does our own intelligence come from, and it's highly notable that the part of our brain that most differentiates humans from other animals - our large neo-cortex - appears to be a prediction machine ... In studying humans no-one is saying that other animals are the same - it's just that humans are the animal whose capabilities we are trying to reproduce.

As I said, LLMs being intelligent was an accidental discovery - they were expected just to be language models, but it's certainly notable that the only thing they are trained to do is predict next word. They only do one thing, predict, and they exhibit unexpected intelligence, hmmm ...

At this point people are NOT yet all saying "prediction is the key to intelligence, so let's build predictive machines and assume they will be intelligent", but when you look at our cortex and look at LLMs, that does appear to be the obvious direction.


In this case I would say AI is the crocodile, the same as all life is. It's specializing (or becoming specialized) in something, which is prediction, in the same way a human (or any life that shows the same definitions of intelligence as us, like a crow solving a puzzle) can show success in a new or novel situation. But life does not need this definition of intelligence to survive, which leads to the basis of evolutionary theory. The trait of adaptability/prediction/intelligence is not always useful given a niche and can get weeded out, which is why most life does not need it, yet they are still around. In organisms that do possess it, it can be a detriment as well given specific situations (over analyzing, stuck in anxiety, excessive risks to adapt, etc.).

In other words, when we say an LLM is becoming intelligent, it's not that it is in the general sense. It's that we recognize the traits within it because the traits make sense to us and mimic how we define ourselves in terms of specializing, because quite obviously, we made it and provide its data input. But the key difference is that AI has none of the original impetus or evolutionary pressures that led to our own ability to generalize/specialize. This is because its output is derived from human input, which is fed into it through digitized means, which means there is always some kind of 'loss', since it is a specialized aspect of us.

It is why I made the reference to typing. We are communicating right now, but at the same time, it is a specialized form of it. It is not the full original human experience of talking to one another, but does not have to be in this case, because it works well enough and has some advantages given the niche. If we were using Facetime, it would be much closer, but still not quite the same as being in the same room face-to-face.

In my opinion, we are not so much prediction machines, but rather mimickers who can also create mimics of themselves via what we can make. You do not need to be able to predict that well if you can just mindlessly copy something that succeeded somehow.


Andrej Karpathy makes this same point, using the same book reference, in his "[1hr Talk] Intro to Large Language Models" video from Nov. 2023.

Here is a link to the relevant part of his presentation: https://youtu.be/zjkBMFhNj_g?t=2120


Weren't most of the claims in that book refuted, some even by the author? I really enjoyed it and found some great insights, only to be told later by a friend in that sphere that the book was not correct and that even the author had "retracted" some of the assertions.


It might still be a useful concept in developing LLMs.


He won a Nobel Prize for his work, so I'm not sure how much of it would be refuted.


One quick Google search and you can find multiple links for that, including some that were posted here. It wasn't proven to be false, but the evidence used wasn't much of evidence either.

Here's the first one in my results:

https://retractionwatch.com/2017/02/20/placed-much-faith-und...


Cunningham's Law states "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."

https://meta.wikimedia.org/wiki/Cunningham%27s_Law


As luck would have it, a System 1 vs System 2 scenario falls into our laps.


People often say that LLMs aren't really thinking because they are just producing a stream of words (tokens really) reflexively based on some windows of previous text either read or from its own response. That is true.

But I have the experience when talking of not knowing what I'm going to say until I hear what I've said. Sometimes I do have deliberative thought and planning, trialing phrases in my head before uttering them, but apparently I'm mostly an LLM that is just generating a stream of tokens.


This is something that is easily observable by anyone at virtually any moment, yet at the same time is something that escapes 99% of the population.

When you are talking to someone in normal conversation, you are both taking in the words you are saying at the same time.


I'm currently reading it for the first time, completely coincidentally/not for this reason, and on a few occasions I've thought 'Gosh that's just like' or 'analogous to' or 'brilliant description of that problem' for LLMs/generative AI or some aspect of it. I wish I could recall some examples.


I think of COT as a memory scratchpad. It gives the LLM some limited append-only working memory that it can use for simple computations (or associations, in its case). Now suppose an LLM had re-writeable memory... I think every prompt hack, of which COT is one example, is an opportunity for an architecture improvement.


I think of COT more as a type of planning or thinking before you speak. If you just open your mouth and start talking, which is what a plain LLM does, then you may talk yourself into a corner with no good way to get out of it, or find yourself saying something that really makes no sense. COT effectively allows the LLM to see the potential continuations of what it is considering saying, and pick one that makes sense!

I think lack of COT or any ability to plan ahead is part of why LLMs are prone to hallucinate - if you've already run your mouth and said "the capital of australia is", then it's a bit late to realize you don't know what it is. The plain LLM solution is to do what they always do and predict next word using whatever it had in the training set, such as names of some australian cities and maybe a notion that a capital should be a large important city. IOW it'll hallucinate/bullshit a continuation word such as "Melbourne". With COT it would potentially have the ability to realize that "the capital of australia is" is not a good way to start a sentence when you don't know the answer, and instead say "i don't know". Of course the other cause of hallucinations is that the LLM might not even know what it doesn't know, so might think that "Melbourne" is a great answer.
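
Concretely, the scratchpad framing amounts to something like this minimal sketch (the complete() callable is a hypothetical stand-in for whichever LLM API is in use, not any particular library):

    from typing import Callable

    def answer_directly(question: str, complete: Callable[[str], str]) -> str:
        # Plain completion: the model must start committing to answer tokens
        # immediately, with no room to plan or back out.
        return complete(f"Q: {question}\nA:")

    def answer_with_cot(question: str, complete: Callable[[str], str]) -> str:
        # Chain-of-thought as a scratchpad: first let the model write
        # intermediate reasoning into the context, then condition the final
        # answer on that reasoning.
        scratchpad = complete(f"Q: {question}\nLet's think step by step:")
        return complete(f"Q: {question}\nReasoning: {scratchpad}\nFinal answer:")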


Feel like this is better represented as the default mode network: https://en.m.wikipedia.org/wiki/Default_mode_network

There are questions we know the answers to and we just reflexively spit them out, but then there are questions that are new to us and we have to figure them out separately.

Recent research has shown that new memories are recorded in the brain differently depending on how unique the memory is: https://www.quantamagazine.org/the-usefulness-of-a-memory-gu...


I have a similar view to you and not much to add to your comment, other than to reference a couple books that you might like if you enjoyed 'Thinking, Fast and Slow'.

'The Righteous Mind' by Jonathan Haidt. Here, Haidt describes a very similar 2-system model he describes as the Elephant-rider model.

'A Thousand Brains: A New Theory of Intelligence' by Jeff Hawkins. Here Jeff describes his Thousand Brains theory, which has commonality with the 2-system model described by Kahneman.

I think these theories of intelligence help pave the way for future improvements on LLMs for sure, so just want to share.


How does evolutionary instinct factor into the system model? Fight-or-flight responses, reflexes, etc. 'Thinking' does have consequences in terms of evolutionary survival in some circumstances, as in spending too much time deliberating/simulating.


This is a common comparison in the LLM world. I actually think it is closer to the Left/Right Brain differences described in The Master and His Emissary, but that’s for a blog post later.


This sounds similar to the A Brain/B Brain concept that was described by, I believe, Marvin Minsky. I don't know how this might be related to Kahneman's work.


I had the same thought from Thinking, Fast and Slow.

Another variation of this seems to be the “thought loop” that agents such as Devin and AutoGPT use.



It’s a bit over my head for now but seems like GFlowNets are tackling this problem a bit.


interesting, hadn't come across these. Will be doing some more reading up on them.


that is the approach also taken in this paper for building LLM agents with metacognition: https://replicantlife.com/


Thinking step-by-step requires 100% accuracy in each step. If you are 95% accurate in each step, after the 10th step the accuracy of the reasoning chain drops to 59%. This is the fundamental problem with LLMs for reasoning.

Reasoning requires deterministic symbolic manipulation for accuracy. Only then can it be composed into long chains.
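
The arithmetic behind that claim, as a quick sketch:

    # If each step is independently correct with probability p, an n-step chain
    # is only correct when every step is, so accuracy decays as p**n.
    p, n = 0.95, 10
    print(f"{p**n:.3f}")  # 0.599, i.e. roughly the 59% quoted above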


You’ve never made a mistake in your reasoning?

Tongue in cheek but this has been considered and has resulted in experiments like tree of thought and various check your work and testing approaches. Thinking step by step is really just another way of saying make a plan or use an algorithm and when humans do either they need to periodically re-evaluate what they’ve done so far and ensure it’s correct.

The trick is training the model to do this as a matter of course and to learn which tool to apply at the right time which is what the paper is about wrt interspersed thoughts.


>reasoning requires deterministic symbolic manipulation for accuracy

No, that is automation. Automated reasoning is a thing, indeed. And I can kind of see a world where there is a system which uses LLM for creative thinking, augmented with automated reasoning systems (think datalog, egg, SMT-solver, probabilistic model checking etc).


I dream of a world where the majority of humans could come close to 59% after attempting a ten step logical process.


wut

The average theorem in Euclid's Elements (written 2,000 years ago) would have a reasoning chain of at least 10 steps.

All of the mathematical machinery humans build needs 100% accuracy in each step.


All human knowledge is created by a small number of people; most of us just regurgitate and use it.

Think Euclid, Galileo, Newton, Maxwell, etc...

And all human knowledge is mathematical in nature (Galileo said this).

What is meant here is that facts and events in the world we perceive can be compressed into small models which are mathematical in nature and allow a deductive method.

Human genius consists of coming up with these models. This process is described by Peirce (and Kant before him), i.e., inventing concepts and relations between them to comprise models of the world we live in.

Imagine compressing all observed motion into a few equations of physics, or compressing all electromagnetic phenomena into a few equations, and then using this machinery to make things happen.

Imagine if we feed a lot of perceived motion data into a giant black box (which could be a neural net) - and out comes a small model of that data comprising Newton's equations (and similarly Maxwell's equations).

But this giant knowledge edifice is built on solid foundations of mathematical reasoning (Newton said this).

Human genius is to invent a mathematical language to describe imaginary worlds precisely, and then a scientific method to apply that language to model the real world.


Another RL paper with a terrible baseline. They used 0-shot, non-instruction-tuned Mistral for GSM8k, which has a very specific output format. They got 11% accuracy after improving it, while few-shot prompting achieves 37% [1]. GPT-4 could get ~97% with prompting.

[1]: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...


Fwiw if they're serious scientists, taking a known method and baseline and improving it is good science. Extensions to get state of the art are probably possible, but their goal is to measure just the impact of their change in a simple setting. Let the engineers do the munged system combinations and get SoTA.


I am not talking about SoTA. I am talking about a deliberately poor baseline. GSM8k consists of two things: solving the problem and getting the output format correct. Getting the output format correct gives 30% accuracy for the same model where they got 11%. SoTA is 97%.


Any relation to OpenAI's rumored Q* (i.e. q-star) model? Authors of this paper don't seem affiliated.

Just a name coincidence?


I think it's just a play on the same hyped up term.


I was thinking the same. The STaR paper this is an extension of came out in 2022, so it's at least possible this is what q-star is based on too, but maybe with Q standing for something else.


This is the missing piece for training AI that has the ability to reason. There are so many tasks whose answers are known but whose reasoning steps are missing. With this method, we can use less annotated data to reach that ability.

The interesting part (I imagine): the generated thoughts could be hard for humans to understand while still being far more helpful for getting the correct answer! If that happens, we have created something more intelligent than ourselves.


This is basically what I tried this morning at the prompt level (awful results), but the sketchy idea I had in mind went further by introducing control-flow "meta-tokens" to help the LLM renavigate its context. In this perspective the context would be rethought as a self-editing structured mind-map, with the linear aspect of the context at a time T standing for the execution trace of the exploration of this mind-map so far. Some of those meta-tokens would be able to have side effects on the context, such as highlighting, structuring, summarizing, or forgetting some of its parts. This could allow for native structured output without using a syntactic format such as JSON, programmatic constructs in the style of LMQL, implementing memory, etc. The goal: not just to give logical/reasoning abilities to an LLM, but to give it the means to come up with its own cognitive architecture. Implementing structured output (using a <label name="stuff">...</label> token) to also implement memory/scratchpads would also bring inspectability of those cognitive structures for free. Of course I have no idea how to implement this (I'm an ML tourist).
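
A very rough sketch of one piece of that idea (every name here, including the label syntax, is hypothetical and not from the paper or any existing API): scan the generated text for <label> spans and keep them in a side scratchpad that later turns of the context can reference or overwrite.

    import re

    # Hypothetical label syntax: <label name="...">...</label>
    LABEL_RE = re.compile(r'<label name="([^"]+)">(.*?)</label>', re.DOTALL)

    def update_scratchpad(generated: str, scratchpad: dict[str, str]) -> dict[str, str]:
        # Pull every labelled span out of the model's output; a later label with
        # the same name overwrites the earlier one, which is the crude
        # "self-editing" step.
        for name, body in LABEL_RE.findall(generated):
            scratchpad[name] = body.strip()
        return scratchpad

    def render_context(scratchpad: dict[str, str]) -> str:
        # Re-serialize the scratchpad so it can be prepended to the next prompt.
        return "\n".join(f'<label name="{k}">{v}</label>' for k, v in scratchpad.items())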


They do not cite [1], a paper on (learned) variable computation in RNNs, applied to language modeling, that predates their work by almost 8 years.

[1] https://openreview.net/pdf?id=S1LVSrcge

Microsoft also had something similar at that time, but for image recognition: a CNN at the input and then variable computation at classification.


Base Mistral 7B is hardly suitable for the evaluations; even one team at Intel tried to pull a fast one with NeuralChat in the exact same way: https://huggingface.co/Intel/neural-chat-7b-v3#quantitative-...


> Much of the meaning of text is hidden between the lines: without understanding why statements appear in a document, a reader has only a shallow understanding.

This doesn't seem true of how I, or most people I know, read things. I would argue that we almost always have a world model and know some reasons why these statements are appearing in a book. If I were reading a fluid dynamics textbook, I may not understand the math, but I know why those statements appear; they are mathematical statements to help you learn the theory, and they follow a pattern to teach you important concepts. For instance, concepts build upon older ones: Bernoulli's equation is there because the law of conservation of energy was there before it, and its placement assumes I understand the latter.


Observation: "expertise" (hence "reflex") is the learning of the nonlinear solution space that can be inferred from initial conditions.

Conjecture: models which engage in self-training on the solutions they derive will get to something that looks a bit like bootstrapping when you squint.

Lemma: there's a nice opportunity for cloud-hosted model SaaS to offer discounts for actionable feedback on the quality of their output, so as to drive this retraining.

Idle comment: I'd use the language of REM sleep and the idea of "memory consolidation" for this.

Most of the premises of model training can be extended to the level of reasoned solutions, rather than tokens.


This looks really interesting; any possibility the researchers will release some code soon?


If it is doing this, is it still a language model? Or also a thought model?


Here we go!! I've been waiting years for them to try this. Let's see how it does when scaled up to GPT-3/4 level.

This might be the missing piece to AGI.


The missing piece is unknowable


We'll likely reconstruct what the missing piece was in hindsight, but it's very probable there's no one missing piece. Just like human evolution.


I am not convinced there even is a missing piece. I mean, LLMs are being used very differently compared to how traditional AI programs were written. Combining both worlds might be all that is needed.

I would not be surprised if, when we have general artificial intelligences, we see that advancing LLMs wasn't necessary.


Until it's been found, you mean?


Maybe even then too!


This is purely anecdotal, and I try to keep it to myself, but it's very difficult when at least half of the HN homepage is AI related: LLMs like ChatGPT do so utterly terribly at any non-trivial job I throw at them that I seriously consider people who use them daily to either be straight up incompetent, or maybe their domain is so trivial that the LLM actually does well.

From asking LLMs to solve a highly difficult async C++ parallelism problem, to German language specifics, they just fuck up at a fundamental level. I understand that LLMs cannot solve these issues and why, but then I do not understand the heavy focus on AI by so many tech people.

Is day to day programming job so trivial that LLMs do a good job, while at the same time being too difficult for you to do it yourself? I really, really want to know exactly what the use case is.

Do people just throw simple problems at it to validate their own preconceived notion of how cool and useful LLMs are? Whats the deal?


I had a similar take until about a week ago. A friend showed me his workflow with Copilot and whatever the JetBrains AI assistant is called.

Use it as a tool: what if instead of opening up a new tab, searching for the API docs for the library you're trying to find a function in, find the function, re-read the parameter arguments for the 400th time, and then use it, you could just highlight a snippet and say "Paginate the results from S3 using boto3" and the code would just populate?

You have to have the clarity of thought to know what you're doing, but the time it takes to write every line for basic stuff you've done 1000x before can be greatly compressed if it's inlined with your IDE.

I think this is the move for most LLM tools: integrate it with existing tooling. An LLM for Excel for corporate bookkeepers, CPAs, etc will be great. A Word/PDF summarizer that's tuned for attorneys will also be fantastic. Highlight a paragraph, ask for relevant case law, etc.

I thought ~2 years ago the results were... not great. Now I'm pretty happy with it.

SecureFrame (helps with compliance regimes like SOC2) recently added the ability to generate Terraform templates to automatically generate infrastructure that will fix specific platform risks for AWS, Azure, GCP, etc.

It definitely needs someone at the helm since it does hallucinate, but I have found it to cut down my time on mundane tasks or otherwise niche/annoying problems. When was the last time you visited 4+ StackOverflow posts to find your answer? Copilot, so far, has always hit a pretty close answer very quickly.


I also had to build intuition for when it will be appropriate versus not. It's hard to describe but one very positive signal is certainly "will any hallucination be caught in <30s"? Even in ChatGPT Plus you can have it write its own unit tests and run them in the original prompt (even in the profile's Custom Instructions so you don't have to type it all the time).

So a mistake was using it for something where runtime performance on dozens of quirky data files was critical; that nearly set my CPU on fire. But string-to-string data cleanup, a chain of simple API calls, or a one-off data visualization? Chef's kiss.


> to write every line for basic stuff you've done 1000x before

There are ways to avoid writing basic stuff you've done 1000x before that are better than LLMs though...

Put it in a well-thought-out function or package or other form of shared/reusable code. You can validate it, spend the time to make sure it covers your edge cases, optimize it, test it, etc. so that when you go to reuse it you can have confidence it will reliably do what you need it to do. LLM-generated code doesn't have that.

(When you think about how LLMs are trained and work, you realize they are actually just another form of code reuse, but one where there are various transformations to the original code that may or may not be correct.)

Where LLMs shine for coding is in code-completion. You get the LLM output in little chunks that you can immediately review correctly and completely, in the moment: "yeah that's what I want" or "no, that's no good" or "ok, I can work with that". Not surprising, since predicting completion is what LLMs actually do.


I don't know exactly how you use it, but this isn't my experience at all. If you ask an LLM anything too specific, that isn't an obvious and commonly discussed issue (something that I almost never need to do), it just makes up nonsense to fill the space.

Equally, if you ask it general questions it misses information and is almost always incomplete, leaving out slightly more obscure elements. Again, I need comprehensive answers, I can come up with incomplete ones myself.

What's really obvious to me when I use it is that it's an LLM trained on pre-existing text; that really comes through in the character of its answers and its errors.

I'm very glad others find them useful and productive, but for me they're disappointing given how I want to use them.


That's fair, it might not be for you. In 'old school ML', for a binary classifier, there's the concept of Precision (% of Predicted Positive that's ACTUALLY Positive) and Recall (% of ACTUALLY Positive that's Predicted to be Positive).

It sounds like you want perfect Precision (no errors on specific Qs) and perfect Recall (comprehensive on general Qs). You're right that no model of any type has ever achieved that on any large real-world data, so if that's truly the threshold for useful in your use cases, they won't make sense.
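
For reference, a minimal sketch of those two metrics for a binary classifier:

    def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
        # Precision: share of predicted positives that are actually positive.
        # Recall: share of actual positives that were predicted positive.
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    print(precision_recall([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5)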


I just want something useful. I'm not talking perfection, I'm talking about answers which are not fit for purpose. 80% of the time the answers are just not useful.

How are you supposed to use LLMs if the answers they give are not salvageable with less work than answering the question yourself using search?

Again, for some people it might be fine, for technical work, LLMs don't seem to cut it.


Sorry if this is sophomoric, but when you said "you have to have clarity of thought" - what jumped to mind was the phrase "you have to speak to the code"... I thought it encapsulated your clarity of thought quite saliently for me.


You must be one with the code. You must be the code.


Stop using it for things that are in your area of expertise but are too difficult for you. Use it for things where you think "this is probably easy but I have no idea how to do it". For example, I needed to do some pretty trivial task in PowerShell, but I have never used it, so I got ChatGPT to do it for me and it worked first time. Obviously I checked that the commands looked plausible before I ran them, but it still probably took 2 minutes to do something that would have otherwise taken 30.


I want to second this:

> Use it for things where you think "this is probably easy but I have no idea how to do it"

I had exactly the same reaction as OP (LLMs suck, what's with all the hype?). These people are using it differently. For me it's often something like asking it to put together a specific sequence of matrix transformations in ThreeJS or some other library.

This is not a difficult task but it's often one I waste a lot of time getting right. It's sort of about finding the right level of abstraction you need to ask it.


And how often will those "plausible looking commands" create obvious or subtle problems that cost far more than 30 minutes?


Probably about as often as if I cobbled something together from random blog posts except faster.

It's not like the script is running a nuclear power station.


That just means you are ignorant of how wrong it guides you. You need to first build trust before taking it new places. You do that with topics and concepts you are familiar with.


This has always been true of anything anyone has ever googled or looked up on stackoverflow

I copy-paste code from Stack Overflow all the time. I used to agonize over making sure I fully understood every line I was copying. Now I have the discretion of making that decision: sometimes it really does matter, sometimes all you need to know is that it produces the right result for your limited use & test case. (It's no different than relying on a 3rd-party library in that way.)

I think we need to apply the same discretion to LLM output. The answer is "it depends". Sometimes using its output blindly leads to disaster. Sometimes using it without fully understanding all the details is a great way to make progress.


This is no different from my coworker who regularly copy/pastes from stackoverflow to do things he doesn't have any idea how to do himself, and just as awful, unproductive, and problem inducing.


This is an observation I've seen a lot around here. Underneath it is the assumption that "if I can't figure out how to get meaningful use out of a tool, the tool must be useless".

OpenAI didn't sign up 100M users without somebody somewhere finding it to be useful. Like any other tool, its utility is limited mostly by the person wielding it.


The tools seem useful, but I'm not sure they are. Too often they will confidently make up an answer that is wrong. When I use them they do great on trivial problems but can't help on hard ones.


Reframe your thinking. You’re approaching it like other computer systems, where a given input yields a determined output. Instead, treat it like a junior dev whom you can unload an unlimited amount of work to, but the result still requires review.

We’re all used to working this way in human systems, people that sound confident might also be wrong, and you learn where you might trust them more or less as you work with them over time. Until you are confident that they are always “right” in a given problem domain, you need to apply some level of review.

Finally, keep in mind that there are "smarter" and "dumber" LLMs. If you didn't pay for what you were doing, you were talking to a "dumber" model. The quality does go up if you have $20 in your pocket.


The junior engineers I know tend to ask questions rather than be confidently wrong. That isn't to say they are always right, but they make a very different class of errors.


Again, this is a tool you can use. You can complain that it doesn't work in the way you expect, or you can learn how it operates and how best to use it. If you can't figure out how to apply it to your work, that's fine, but loads of other people are doing exactly that with or without you.


> When I use them they do great on trivial problems but can't help on hard ones.

That sounds super useful! The tools free you up from wasting time on trivial problems so you have more time to focus on the hard ones. What's not to love?


I try to work on complex problems. Sometimes they hide something easy


Does your job involve solving complex, challenging problems all the time?

I am a CS professor, I don't think most people would class that as a trivial job, but I find myself needing to do plenty of trivial tasks every day: mixed bureaucracy (periodic reports, grant requests, various evaluations, etc.), trivial programming (a Seaborn chart to show some Excel results), text polishing (need to cut a text to 500 words without altering meaning), writing student assignments, writing emails in (non-Native) English for sensitive requests with the right tone, etc... all of those are things I have found LLMs to do fairly well and save me a lot of time.

I wouldn't use them to do the core job of designing novel algorithms, doing experiments, writing the bulk of a paper or teaching students. But most of my working hours are not really that "core" stuff. And I would assume it's the same for most professionals.

If you have an environment where you are constantly challenged by difficult tasks... wow. I don't know if I should envy you (because I love difficult problems and hate mindless chores) or it would be too stressful.

PS: I don't think "being too difficult for you to do it yourself" is the right litmus test for LLM usefulness. I can draw charts with Seaborn, of course. But the LLM does it much faster, and I don't think doing it myself would make me grow, hone useful skills or anything. I'd rather devote my time to something else. So (in my view) it's clearly better to have the LLM do it.


They're good autocomplete, they can help search for solutions sometimes better than Google (SEO spam), you can use them as a rubber duck, and you can have them auto-fill trivial stuff that would take you a few minutes to write out manually, like test scaffolding. I would never use them to actually complete a non-trivial task, and I always confirm their answers. And yeah, sometimes it sucks - it's a tool with a learning curve that involves knowing its limitations.

The reason there's so much money and time going into this is that even semi-competent AI is relatively new and the methods are still extremely crude, and yet it's this advanced. This seems like the path to an AGI, and if someone were to even approach that point, it would radically change the world forever and could lead to either really good things or really bad things.

Now, GPT-4 isn't considered the best at specialized tasks. It's a master of many, but there are much smaller models that can do things like incredibly complex symbolic/geometric math proofs, write code, perform translations, etc better. A lot of ideas are on making expert systems using many of those specialists combined with a generalist, like the segmentation of a brain.

Anyway:

> I seriously consider people who use it daily to either be straight up incompetent, or maybe their domain is so trivial that the LLM actually does well.

This kind of radical thinking about a significant proportion of enthused professionals (in any industry) who aren't having the same experience as you is a red flag for introspection. It's so easy to fall into the "enlightened me" trap.

I appreciate you asking for more information!


There are plenty of jobs where people have to complete various tasks that are outside of their domain or otherwise tedious on a daily basis. For example, plenty of devs have to set up or change the configuration of remote hosts. Some LLMs are pretty good at generating devops scripts to speed up this work.


Exactly. Example: maybe 1% of the code I generate is bash. I used to try to memorize patterns, but of the top 20 I'd use each less than once per year. Now, instead of that 1% taking 5% of my time, it takes 2%. It's all "simple stuff", and I can verify it instantly.

I have ~10 similar use cases. So it hasn't revolutionized my life, but it's been well worth $20/mo ChatGPT Plus and $3/mo API calls.


For me it’s more like brainstorming.

Even if half of it is garbage it’s a net win. At least in domains where I can distinguish the two.

There are also cases where the cost of failure is very low. E.g., I could spend half an hour reading an API spec, or I could have an AI give me a curl command and test it out in 30 seconds. If it works, great; if not, oh well, time to read the spec.


I signed up for OpenAI’s monthly subscription. Its performance on non-trivial tasks is abysmal. It’s a regurgitation machine. One might mischievously argue the average tech worker isn’t much better than an LLM, thus the interest? On a related note, we are deluged daily with firms offering AI services. I see a bubble.


You should treat LLMs the same way you treat any other smart entity, human or otherwise: realize that they can be both immensely useful and fundamentally wrong at the same time. Intelligence is not equivalent to correctness.


Why do you presume that people commonly use it for non-trivial things? It excels at trivial things. That's what most people use it for, probably. Like google search. Is there something that leads you to think otherwise?


Perhaps the incessant talk of GPT-x being AGI, whatever that means.


I don't use it constantly but regularly.

LLMs' English skills are much better than mine.

And when I do a little bit of Go coding once a week (I'm a Java developer by trade), I don't have the time to learn Go well enough to just type stuff down without looking things up. Instead of googling, I tell it "I need a struct with the following attributes..." and it doesn't just tell me how to do structs in Go, it also creates them for me.

Also: there are a TON of issues where I would write a short script to do something (formatting text into a table, searching for specific lines, etc.) where a normal person doesn't even have those tools at hand.

For companies overall: it's not just what an LLM can do for you; it's also a very, very good interface to your application. The demos I saw in my company are really good, totally make sense, and do reduce the entry barrier for people.

I have a friend whose job is to create reports with SQL. She doesn't do anything else, just reports across the whole data warehouse. Why? Because a normal non-dev person can't just write SQL or automate things.

The gap between tech people and management is huge.


Not everything in tech is difficult

I find LLMs great for creating SQL queries and regexes


Technology is complex and hard to make sense of. That is why most non-experts have a strong wish for a kind of mythical technology, which you can just pour onto your problem and it magically knows what you wanted (and which things you did not want).

For a certain class of problems LLMs achieved new, never-before-seen, almost magical results. Now imagine you were someone who hates dealing with the constant complexity of solving problems with technology, and something comes along that seems to carry the promise of lifting that off your shoulders. Then you know why people react like they do. Recall the blockchain craze? There were people who declared that it somehow magically solved any IT-security problem there ever was, instead of seeing it as a good solution for a very specific set of circumstances that nearly nobody faced in practice.

In reality, of course, LLMs also have limitations, e.g. the above-mentioned ambiguity that is inherent to any magical technology: to be true magic the technology would have to be able to read the thoughts of those who apply it and somehow infer from that the true thing they want or need. LLMs are in the end still just very good guessers based on statistical data; that means the guess could be just what you want, but it lacks an actual understanding of what it is doing.

Those applying the technology to things it is actually good at (e.g. classification problems etc.) will put it to good use, but there will be a lot who will apply it and have things fall apart Air Canada style.


Your view on LLM usage is too narrow. Yes, they are pretty shit for me too in solving coding problems, but they are still useful for bespoke information extraction, classification and creative applications. The interest is justified, we're just having a hard time understanding the limitations.


I train my LLM to barf up my domain specific boilerplate code. I don't ask it to solve business problems.


Three examples:

1. having ChatGPT generate boilerplate, because I’m lazy;

2. having ChatGPT attempt something I don’t know as a starting point, eg JavaScript; or,

3. having ChatGPT give a reference rather than Google myself, eg of a config option.

ChatGPT makes 1 less tedious, 3 less a game of “what magic phrase finds the right SO post?”, and means I do 2 at all, eg trying out JS features on my blog.

I think it does alright at composition if you break down the task sufficiently, but it struggles with higher order structure — particularly if you’re using multiple responses.

That said, I suspect we need a theory shift to get AI to comprehend higher order structure in composition.


It's pretty amazing at generating rust structs from yaml examples, and also at writing generic versions of rust functions.

Neither of those tasks are especially _difficult_, but they are _annoying_.


Some questions we've thrown at GPT-4 recently (real use cases):

> how does torchmetrics IOU work? Does it match gt with detection boxes? or does it do pairwise IOU and average?

> What predictions has Ray Kurzweil made that he got correct and incorrect? Please produce a table

> can you give me a stack implementation with min function in O(1) time

> (A question about how we should solve a UX problem specific to our app)

> What is the best way to return raw image data via a REST endpoint?

> How is Return on Capital Employed (ROCE) calculated?

> Following the email exchange below, write a cross intro email to introduce (X person) to (Y person)

> How do I run this code on TPU in Collab?


Did it correctly answer all of these?


RE: Ray Kurzweil

Did you see him on JRE last week:

https://www.youtube.com/watch?v=w4vrOUau2iY

(or was that why you asked)


When you say ChatGPT, are you referring to GPT4? I find a huge and avoidable miscommunication happens when two people both think they are using “ChatGPT” but talking about two different models which vary in size by a factor of 10.

Assuming you are talking about GPT4, for the sake of argument, the answer is speed. Of course I can write a small parser script that deals with some data I received from a client. It will take me an hour and be a tedious task far distant from my actual expertise. An LLM can do it in 45 seconds, including the time it took me to describe the task.


I daily drive KDB/Q. This is readily extendable for example in C, which was my previous daily, and Python which I use sporadically.

I don’t use LLMs for C or KDB, I do use them for Python.

ChatGPT is good at Python. I guess that's because Python programmers rely on Stack Exchange, so there is lots to learn from, and Python is largely an exercise in finding the correct library anyway.

If the only thing ChatGPT did was listen to my problem and suggest which imports to use/manuals to read, that would be good enough to use regularly. If I wasn’t after a library/pre existing code I wouldn’t be using Python!


I've definitely noticed ChatGPT generally writes better Python than it writes Scala, presumably for the same reason of there being a fair bit more Python code in the wild.


The actual reason probably has to do with the fact that LLM developers and academics are more familiar with Python than other programming languages, and therefore have policed its correctness better.


Boilerplate, test code, and general tedium. Most software just needs to handle IO.

The next time you want to use SQL to compute a rolling sum try asking ChatGPT 4 instead of searching through documentation or search engine results for windowing functions.

Competency at programming along with very good technical communication skills (with a touch of learning how to not hold the tool backwards) and you should find the appeal.
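
For what it's worth, the rolling-sum example mentioned above looks roughly like this (a sketch using Python's sqlite3, assuming a SQLite build new enough, 3.25+, to support window functions):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales(day INTEGER, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)])

    # 3-day rolling sum via a window function: current row plus the two before it.
    rows = con.execute("""
        SELECT day,
               SUM(amount) OVER (ORDER BY day
                                 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling
        FROM sales
    """).fetchall()
    print(rows)  # [(1, 10.0), (2, 30.0), (3, 60.0), (4, 90.0)]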


Yes. I just used Cody to get me on the right path with an obscure PostgreSQL JSON query; it easily saved me an hour of fiddling around.


Profit. The question at hand is whether LLMs can produce profit, which is an extremely different question than the questions you're asking.


I’m preparing. Learning how to work with an AI is the only way to stay competitive. The AIs will become smarter much faster than I will.


Are you using the free one? If you are, GPT4 is completely different. Claude is the best free chatbot.


Yes, the free one. I find it almost a fallacy to say "oh, yeah it sucks, but you should try the expensive one! Its proper good."

Oh, your BMW keeps having issues? You should have bought the one thats 2x the price, that one is perfect!

Of course it's better, but both are being sold as a car, or, in this case, an LLM. They're both LLMs, they're both by the leading AI company; if one sucks such supreme ass, why would the other one be amazing? ;)


We're not in control here and we didn't build them, they just happened. Every LLM is completely different and some of them are bad. You can't generalize from one of them at all.


Every other query I've given to ChatGPT came up with an utterly wrong answer. Followup always yielded "sorry, I made an obvious mistake, here's another wrong answer". Confident and stupid is a very bad combination.


First sentence is 100% my sentiment, cheers !


I think it's safe to remind oneself that this thing is literally a zygote. So patience, and in ~5 years, it will be a different story.

@xanderlewis

Doesn't that mean it is simply now consuming the internet in real time?


Why? It’s already eaten all of the publicly available data on the web.


We're done for!


Neural networks do not think


You're not giving information anybody on this forum doesn't already know.

Obviously they don't "speak" either. Both "think" and "speak" are used as shorthands here for what the language models actually do.


What are you upset with me for? The authors are using the misleading language, not me. Take it up with them.


Could you give a definition of "think" that NNs fail to live up to?


Abstracting immaterial concepts from physical reality and deliberately using them in analytical or deductive processes to discover truths.


Might be relevant: https://www.nature.com/articles/s41586-023-06924-6 Mathematical discoveries from program search with large language models


So basically finding ways to compress your observational history?


No, it's not "basically" that at all.


That's pretty much what it is, as you stated it. Finding abstractions that let you encode your observational history more efficiently than you previously could, or "discovering truths", if you want to be all mystical about it.


That's not what it is, and it's not what I stated.


Then could you go into more detail? Because what I just described was the progress of scientific theory.

If an AI can do that, it's not going to matter whether or not it meets your arcane definition of "thinking".


Your definition of thinking is designed to fit AI. You set the bar low, and then get giddy when it jumps over. "Progress of scientific theory" is just a meaningless phrase that makes your claim sound authoritative when it isn't.


I'm still not hearing your definition of thinking, but given how hallowed you seem to find it, it must be truly brilliant.

Progress of scientific theory is plain to see. At each step, e.g. Kepler's laws -> Newtonian mechanics -> general relativity, we encode our observations of the physical world ever more efficiently.


I gave it above, you're just too dense to even begin to try to understand what I mean. If you have a specific question I'm more than happy to try and answer it.


>I gave it above

Yeah, something about "abstraction" and some hand-wavy magic, which computers are already doing.

Can you state, specifically, what part of thinking you presume computers can't do?


They cannot handle immaterial concepts such as goodness, truth, justice, etc. because such concepts are not reducible to material components. They cannot abstract at all, because to abstract something is to consider it in a universal way, leaving the material parts behind. Computers are fundamentally material, and so cannot handle any kind of immaterial concepts.


Do neurons think? Do a bunch of neurons?

Is this semantics?


basically this: https://en.wikipedia.org/wiki/Sorites_paradox

One neuron doesn't think. Three neurons don't think. Billions of neurons think. Somewhere between one neuron and billions of neurons, thinking starts happening. Probably also true for neural networks. The main problem is that people throw around terms like: "Thought", "Intelligence", "Will", "Reasoning", "Knowledge", "Consciousness", etc like they are very well defined and well understood terms and they very much are not.


My point precisely. Those are all vague terms. Saying that "neural networks do not think" is as meaningless as any equivalent (or opposite) statement about any other system, including any number of neurons, a whole brain, or a person.

It's all semantics.


Your claim is that saying "people think" is meaningless? Maybe you don't think (seems to be the case), but I certainly do.


It is meaningless to me because the term is imprecise. Is it precise when I do something bone headed and say "sorry I wasn't thinking"? Do cats think? Do worms?

Depends on what you mean by thinking in that particular case.

In my opinion what LLMs do is close enough to thinking for some definition of the word.

And, by the way, I think there's no need to be unpleasant to state your point.


There's a difference between a term being imprecise and being equivocal. Yeah, we use the term "think" in different ways. But it primarily refers to the operations unique to the human intellect with which we're all familiar: deduction, analysis, synthesis, judgment, deliberation, abstraction, etc.

The thing that's special about these operations is that they deal with immaterial concepts. What's right? What's good? What's true? And so on. Computers have never achieved this and they never will, because they are only material arrangements, and so can only handle things on the material level. The human mind is clearly not reducible to that, because it can handle purely immaterial concepts.


I see where we differ in opinion. I do believe that the human mind is just a very particular material arrangement. Therefore while different in scale to an LLM and likely also differing in structure, I do think both systems are fundamentally in the same category.

Given that, we could never agree. But that's fine.

And by the way, I actually hope you're right.


Yes, that is the key contention. I don't think that our minds are material, because if they were then we couldn't handle any immaterial concepts. Although obviously our brains are involved in the operation of the mind, they cannot be an exhaustive explanation of what the mind is, given that the mind is perfectly comfortable leaving the material world behind.


Billions of neurons don't think, people do.


...with what?


With their minds


There are no real neurons in a neural network.


I don't understand the downvotes - you are correct.


It's fair to talk about thinking in a handwavey "you know what I mean" way. This is not a philosophy paper. It's a fine point if that's what you want to discuss, but doesn't change anything about the issue at hand and is needlessly pedantic. It's the "what you're referring to is actually GNU/Linux" of discussions about the tech side of AI.


It pretends to be a philosophy paper. If they wanted to talk about computation, they would use terms that communicate that clearly. But they're using words that confuse the two fields. I didn't do that, the author did.


I think people just get mad when they're reminded of this obvious fact. They want computers to prove that our minds are an illusion, the product of a "meat computer".


Read some Daniel Dennett!


Are you serious?


You're very grumpy I think you need some food and a nap :-)


I think you need religion


Next: language models teaching themselves to think, then killing humans, based on a crawled Russian website with secret AI instructions.


Although this is obviously satirical hyperbole, dataset poisoning is real and will be underappreciated until the first catastrophic example of it occurs.


Many years ago now I wrote my kids a very simple chatbot to play with. You'd type in a phrase. It would tokenize it, adding start and stop tokens, then update its token transition probabilities, using the two preceding tokens to pick the next one. It would then generate a response from these probabilities.
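
The mechanism described there is roughly this (a sketch reconstructed from the description above, with the details assumed):

    import random
    from collections import defaultdict

    START, STOP = "<s>", "</s>"
    transitions: dict[tuple[str, str], list[str]] = defaultdict(list)

    def learn(phrase: str) -> None:
        # Record, for each pair of preceding tokens, which token followed them.
        # Keeping duplicates is what encodes the transition probabilities.
        tokens = [START, START] + phrase.lower().split() + [STOP]
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            transitions[(a, b)].append(c)

    def respond() -> str:
        # Walk the table from the start tokens, sampling until a stop token.
        a, b, out = START, START, []
        while True:
            nxt = random.choice(transitions.get((a, b), [STOP]))
            if nxt == STOP:
                return " ".join(out)
            out.append(nxt)
            a, b = b, nxt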

The data poisoning began immediately. Because "poop" was such a funny word, they quickly taught it that the most probable token after any bigram was "poop".

No humans were killed, but two small kids were amused for an hour or so.


My condolences for your models' poisoning. It sounds like a real crappy way to go :?


It isn't really all that real; of course untrusted text contains things that aren't true. So don't trust it.



