Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (arxiv.org)
280 points by hackerlight 10 months ago | 264 comments



So it seems 'obvious' to me that a network about 50 layers deep (for example) can only reason about symbolic questions for 50 'steps' (in quotes because it's not a step as we think about it). It only seems there's more complexity because those are 50 steps in one or more learned subspaces that the model has been trained in (which might mean the model can accomplish more than one 'human step' per 'step' of its own). Humans (well, intelligent humans at least) obviously seem able to reason beyond that many steps, but we all know it requires real thinking, deliberation, and perhaps a notepad to do so.

It's quite something to, for example, expect ChatGPT to be able to correctly do 4 digit multiplications without any thought or recourse to 'paper' when very few human beings can do that.


This is true but you have to also consider the autoregressive component. In your example, it's 50 steps per iteration of the model, where the model is executed once for each token in the output.

So practically speaking it's a bit more complicated to calculate how much the model can "think". Of course, once a token is output the model is committed to it (in the most basic scenario), but that doesn't mean it isn't still "thinking" as it produces subsequent tokens.

> perhaps a notepad

Exactly, the context and previously output tokens can be considered such a notepad since they are input for the next steps of the model.
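
To make the arithmetic concrete, here's a toy sketch (my own illustration, not anything from the paper): an L-layer model generating T new tokens performs L x T sequential layer applications, and every committed token is re-read on later passes, so the growing context really does act as that notepad.

    def toy_layer(state, depth):
        # Stand-in for one transformer layer: just mixes the running context a bit.
        return [x + depth for x in state]

    def generate(prompt_tokens, n_layers=50, n_new_tokens=4):
        context = list(prompt_tokens)
        sequential_steps = 0
        for _ in range(n_new_tokens):
            state = context[:]                # the model re-reads everything emitted so far
            for depth in range(n_layers):     # 50 within-pass "steps" of computation
                state = toy_layer(state, depth)
                sequential_steps += 1
            next_token = state[-1] % 100      # commit to one token (greedy, toy)
            context.append(next_token)        # ...which becomes scratchpad for later passes
        return context, sequential_steps

    out, steps = generate([1, 2, 3])
    print(out, steps)  # 50 layers x 4 new tokens = 200 sequential layer applications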


So part of my general issue with this kind of thinking is that, if we take this as the main means of creating complexity, then shorter prompts are worse for reasoning than longer ones, because longer ones automatically give the model more 'space' to think. Now, I realize that the research community knows this, but I like papers like this one that explicitly seek ways to enable the model to 'breathe' a bit.


This doesn't make sense. The responses can be long.


Agreed - also, prompt engineering encourages LLMs to do this (i.e. asking the LLM to explain the steps it will take to solve a problem prior to answering - e.g. Zero-Shot CoT: 'Let's think step by step').
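
For concreteness, here's a minimal sketch of that two-stage zero-shot CoT pattern ("reason first, then extract the answer"). The call_llm helper and the example question are hypothetical placeholders - every provider's API differs:

    QUESTION = ("A juggler has 16 balls. Half are golf balls, and half of the "
                "golf balls are blue. How many blue golf balls are there?")

    def call_llm(prompt: str) -> str:
        # Hypothetical helper: wire this up to whichever LLM API you use.
        raise NotImplementedError

    def direct_answer(question: str) -> str:
        # Baseline: ask for the answer with no room to 'think'.
        return call_llm(f"Q: {question}\nA:")

    def zero_shot_cot(question: str) -> str:
        # Stage 1: the trigger phrase gives the model space to lay out its steps.
        reasoning = call_llm(f"Q: {question}\nA: Let's think step by step.")
        # Stage 2: feed the reasoning back and ask for the final answer.
        return call_llm(
            f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
            "Therefore, the answer is"
        )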


Teachers asking students to show their work are human prompt engineers.


This paper does indeed follow your intuition to investigate the limits of transformers on compositional tasks (i.e., those that require multi-step reasoning, including your multiplication example): https://arxiv.org/abs/2305.18654

> Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills. To round off our empirical study, we provide theoretical arguments on abstract multi-step reasoning problems that highlight how autoregressive generations' performance can rapidly decay with increased task complexity.


Maybe the Skill Mix paper is relevant here. They define a list of 100 skills, and then randomly sample tuples of n skills (usually less than 6) and generate a test example using those skills. Apparently only GPT-4 (at the time of the paper) was able to compose 5 skills, the other models just 3 or 2. Beyond 5 skills even GPT-4 was doing much worse.

The interesting finding of the paper is that GPT-4 couldn't have seen all the (topic, skill-tuple) combinations in the training set. If you have 10,000 examples on a topic, and use 5 out of 100 skills, you would need 100^5 training examples to cover all combinations. In conclusion GPT-4 generalizes to new skill combinations, thus it is not a stochastic parrot.

https://arxiv.org/abs/2310.17567
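
A quick back-of-the-envelope count of that combinatorial argument (my own sketch, not from the paper): 100^5 counts ordered picks with repetition, while the number of distinct unordered 5-skill sets is "only" about 75 million - still vastly more than ~10,000 examples per topic could cover.

    from math import comb

    n_skills, k, examples_per_topic = 100, 5, 10_000

    ordered_tuples = n_skills ** k      # 10**10: ordered picks with repetition
    unordered_sets = comb(n_skills, k)  # 75,287,520 distinct 5-skill combinations

    print(ordered_tuples, unordered_sets, examples_per_topic)
    # Either way, far more combinations than examples, so most (topic, skill-set)
    # pairs cannot have appeared in the training data.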


Ah good... This is definitely a research path I've been looking into. Great to see someone else has already gone there!


You are missing an important detail here - number of tokens - yes, you have 50 "steps" in network depth, but you could have extra tokens. Assuming you don't run out of tape, there is no reason for LLMs to be limited to simple operations.


This doesn't make a lot of sense when you consider how backprop works. Layers aren't limited to working independently.

This also doesn't make a lot of sense when you consider models are autoregressive.


Edsger Dijkstra had a precise English style; even though his mother tongue was Dutch, I find he made better use of English than many native speakers.

In one of the EWDs, he reminisced that, as children, they were taught to never begin to speak a sentence unless they already knew how they were going to finish it.

I'd bet these two observations have a causal connection.


When I was a young man I was taking a language course while I was temporarily living in a foreign country. There was an older man in the course (not elderly, more like mid-fifties) who was very bad at the new language we were both learning. Yet I noticed he had, what seemed to me, a magic power: he could always make people laugh. He would often whisper something to one of our classmates and they would always get a giant smile on their face or even laugh out loud.

I was intensely curious and I spent some time wondering how he did it. One day, out of the blue, he invited me out to lunch after class. We just chatted for most of the lunch, exchanging backgrounds and stories. Then his face took on a serious expression and he slowly and carefully began to explain something to me as if he was passing on some wisdom.

He said that he never spoke a single sentence without fully saying the sentence in his mind. He said he would often think of the words several times in his mind, revising the phrase until he was happy. He would imagine saying the words to the person in front of him and he would imagine their reaction. And he would continue to revise until he felt confident the person who heard the words he would say would react in the way he wanted them to react. If he could not imagine the person reacting how he wanted them to react, he would not say anything at all.

It was clear to me that he was passing along this advice but also that he was calling me out a bit. He was letting me know that I spoke without thinking, that I say whatever pops into my head. It was like he read my mind, honestly; he knew exactly what I was curious about and he answered the question I had for him that I never asked.

I wish I could say that I learned the lesson. When I have tried the technique it has rewarded the effort. But I haven't formed it into a habit and I still tend to let my mouth race ahead of my mind.


That actually sounds like hell to me, a complete absence of spontaneity and being in the moment.

I used to obsessively try to figure out what to say before I said it. I am socially awkward, and it did not help at all. I love writing because it is asynchronous and I can figure things out precisely and edit my thoughts.

But in social situations it is a complete hindrance.


I've observed two things. One, writing is different to speaking, because it's async, you can think before you write, you can edit, etc.

But second, speaking in a non-native language makes you think harder about what you're about to say. Fewer colloquialisms, more focus on making sure your meaning is understood, more sensitivity in case you might offend someone, perhaps?

It's not new either; a lot of science and whatnot has been done in people's non-native language, like French, German, Latin, etc. Another factor there is the lingo of the field; I can't simply say "Kubernetes is een open-bron houder orkestratiesysteem voor het automatiseren van de inzet, schalen, en het beheer van zachte waren" (a word-for-word Dutch rendering of "Kubernetes is an open-source container orchestration system for automating the deployment, scaling, and management of software") without confusing half my native-speaking audience.


I love reading his EWDs. I had a professor who worked with him who mentioned he made his students use pens while taking his tests. To make it less likely for the students to make mistakes??


> he made his students use pens while taking his tests

This is very common in the Netherlands, I think that's why it was a rule of his.

In general, the Dutch education system seems to be against pencils (at least this was the case until recently; I'm Dutch and in my mid-20s). You're taught to write using a fountain pen, not a pencil. In high school, you're allowed to switch to ballpoint but absolutely not to pencil. In university, you can write with pretty much anything you want, but... not with a pencil. If you do take your test with a pencil, there's genuinely a chance your teacher will give you a 0, although most of the time they'll probably be forgiving.

I majored in CS in the Netherlands and every test was done with good old pen and paper. Students still make mistakes all the time, which is why everyone uses a scrap sheet.


Same for me, growing up in the middle east. We used fountain pens for everything. And using pens/pencils wasn’t allowed for tests/submissions etc..


Perhaps to make it easier to determine how to correct instruction.

- "Guidelines for keeping a laboratory notebook" (2019) https://news.ycombinator.com/item?id=19123430#19126809


I also learned English from textbooks, and one of the strangest things I encountered was that native speakers routinely confuse "their, there, they're", which I never thought was a mistake I could make. It would be like confusing 'wet' and 'vet'. So there's definitely a difference between how native and non-native speakers use the language.


The people who confuse that mostly have not done very much reading. Audibly, those words are identical.


Even crazier:

“Could of”.

Like “You could of said so”.


Is that even possible, or just hyperbole? I'd bet the latter. I wouldn't be surprised if some people are able to fully unravel entire paragraphs of conversation in their head in a couple of seconds, but that's not something you could teach to children in general.


I don't think it is feasible, at least for conversation, but as an aspirational goal for children, along the lines of "put your toys away when you've finished playing with them", it is not a bad one.

It's not unusual for me to think I know how I am going to end a sentence, but then find that I can't get there.


In Dutch (and German) the verb often goes at the end of a sentence, so the advice is rather practical.


Dat week ik heel goed :(


*weet, thanks autocarrot


German children would with you disagree.


I also wonder if it has anything to do with the process of learning a new language in general. I've thought more thoroughly about how English works since I've been learning French (not that I'm very eloquent in either)


Unfortunately from experience that just gives enough of a delay that you get talked over in a group setting and never get a chance to speak anyway.


I had this thought the other day that the whole chain of thought reasoning pattern contributing to improved performance in LLM-based systems seems to sit parallel to Kahneman's two-system model of the mind that he covers in 'Thinking, Fast and Slow'.

Haven't read it in a few years, but I recall the book suggests that we use one 'System 1' in our brains primarily for low-effort, low computation thinking - like 1+1=? or "the sky is ____".

It then suggests that we use a 'System 2' for deliberate, conscious, high-cognitive tasks. Dense multiplication, reasoning problems, working with tools - generally just decision-making. Anything that requires focus or brain power. Our brain escalates tasks from S1 to S2 if they feel complex or dangerous.

Maybe I'm being too cute, but it feels like the critique that "LLMs aren't intelligent because they are stochastic parrots" is an observation that they are only equipped to use their 'System 1'.

When we prompt an LLM to think step-by-step, we allow it a workspace to write down its thoughts, which it can then consider in its next token prediction - a rudimentary System 2, like a deliberation sandbox.

We do a similar thing when we engage our System 2 - we hold a diorama of the world in the front of our mind, where we simulate what the environment will do if we proceed with a given action - what our friend might respond to what we say, how the sheet steel might bend to a force, how the code might break, how the tyres might grip. And we use that simulation to explore a tree of possibilities and decide an action that rewards us the most.

I'm no expert, but this paper seems to recognise a similar framework to the above. Perhaps a recurrent deliberation/simulation mechanism will make its way into models in the future, especially the action models we are seeing in robotics.


I'll preface this by saying I know this may sound entirely made up, unscientific, anecdotal, naive, or adolescent even, but luckily nobody has to believe me...

A few weeks back I was in that limbo state where you're neither fully awake nor fully asleep and for some reason I got into a cycle where I could notice my fast-thinking brain spitting out words/concepts in what felt like the speed of light before my slow-thinking brain would take those and turn them into actual sentences

It was like I was seeing my chain of thought as a list of ideas that was filled impossibly fast before it got summarized into a proper "thought" as a carefully selected list of words

I have since believed, as others have suggested in much more cogent arguments before me, that what we perceive as our thoughts are, indeed, a curated output of the brainstormy process that immediately precedes it


Well, this sounds weird to me in the sense that I don't feel that I think in _words_. I only convert my thoughts into words when I need to speak or write them down: when I need to communicate them to others, when I need to remember them for later, or when I am stuck and need to clear things up.

I was actually convinced it was the same for most people, and that for this reason "Rubber duck debugging"[1] is a thing.

1) https://en.wikipedia.org/wiki/Rubber_duck_debugging


Am I the only one visualizing some of my most creative thoughts in a mental palace that is formed by many distinct (Euclidean) spaces, whose axes connect to each other through a graph? The closest thing I've found that can describe this is simplicial sets:

picture: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRx5Xam...

It seems it's used by cognitive models, although I'm not formally trained enough to tell exactly how:

https://arxiv.org/pdf/1703.08314.pdf


I wish I had something like this in my head to tie things in together. Right now I feel like my understanding of things is so disorganised and "lucky" in a sense. I feel lucky that I have grasp of anything.


Wow, well expressed. That's exactly how I feel. Not momentarily, but with everything. Though I am actually not intelligent, I just have good intuition and luck to grasp some of what I need to "understand".


Reminds me of the saying about a poet vs mathematician, the first gives different names to the same thing and the latter the same name to different things. Maybe that's why I can't stand highly descriptive prose (aka describing the water while I'm drowning over here).

Now what if you're a poetic mathematician (or mathematical poet), what's that mind map look like?


Well... what about that palace of mind thing, and the ability to rewind into almost all older memories at will, and on demand being able to look up things from there, like reading, without having it memorized at all? Also full stream of consciousness, like smells, tastes, light wind on your skin, 'silken air' at just the right temperature and humidity.

All of that arranged in something like 'eigengrau', represented by glitterlike points connected by graphs, mostly in 'phosphene' colors, but not exclusively so.

Sometimes very non-euclidean, moving/warping.

KNOWING what's behind every glitter point, like small cinema, large home theatre, from several points of view at the same time.

No words involved. Just visuals.

Thinking, like juggling/weighing blobs, like that glowing stuff which moves slowly up and down in a lava-lamp.

Somehow 'knowing' what each blob, its size/form/viscosity/weight/speed/color/brightness/'feel'/smell represents.

Slowly emerging new 'visuals' from this. Which are then translated into 'language', if ever.


>phosphene color

Not sure whether you talk about the uranium yellow/green color, or the brief hallucination of a light spot (happened to me just a few minutes ago, hadn't had one in a long time).

I don't have such a hyperbolic mental palace, and this doesn't really give me the ability to establish a global map, but I relate a lot to what you wrote. Sometimes as I reach the climax of a long deep thought, I'm thinking via vision exclusively, to the extent that I don't even pay attention to what my outer eye sees, and I stumble upon some insight that is sometimes almost impossible to convey in language, not because it lies beyond, but because the intrusion of language causes the idea to collapse: words point to dangling shapes that mean barely anything because the rest of the painting has gone away.

To those that have read this far and can't relate to this way of thinking, this isn't a superpower, those are rather rare experiences of altered states.

Talking about this is a kind of taboo and may cause some smiles, and indeed if there is a deeper truth to these experiences about the computational or geometric nature of the mind, maybe in the same way synaesthesia mirrors spectrograms, it won't help people working in machine learning a lot (even though some like Lecun seem to use their own visual introspective abilities as a source of inspiration).

However, they may prove to be crucial in conceiving what kind of use brain chips should be put to. For now it seems we're walking through a thick fog in that direction, with envisioned applications being confined to interfacing with external computers or increasing cognitive abilities quantitatively, such as perfect memory and so on. If I could sustain such experiences durably, with a high level of control and enhanced geometric/mathematical understanding, I believe this would be akin to a superpower, yes.


Like (parts of) this sort of thing maybe?

https://youtu.be/BLmAV6O_ea0?si=OdPbwBXs6mOR5Xj2


No. That's too dense and organic. Mine are rather abstract, much padding, empty eigengrau between 'loci', and more 'geometric'?

edit: I knew about mandelbulbs before. My inner mindscapes are not like that.


>Now what if you're a poetic mathematician (or mathematical poet), what's that mind map look like?

Well, look at the drawings I posted below: mathematical notions mixed with ad-hoc diagrammatic distinctive elements such as colors and marks. With maybe a theorem that posits that every mixed representation like these matches a colorless, unannotated, rigorous mathematical object?

In fact I come from a structural linguistics background, and when I pictured how one could extrude a semiotic square into another one, I felt like I understood the vague intuition behind homotopy type theory: the metaphor goes like this – the extrusion volume must be water tight for the squares to make sense.

Suppose you read Dostoyevsky's short story "Another Man's Wife and a Husband Under the Bed." In that case, you might notice that the protagonist's vertical position, as he eavesdrops on what he believes to be his wife through the wall of another man's apartment while standing alone in a corridor, mirrors the horizontal position he later assumes when hiding under the bed of his wife's presumed lover. This physical positioning reflects his moral descent, particularly as he is not alone this time. Beneath the bed with him is another man, clandestinely involved with yet another man's wife. This helps us picture that our protagonist is just as disconnected from his wife as the man lying next to him under the bed or the husband unknowingly sleeping above them, if not more so.

Granted, I don't have the detailed vision of this semiotic diagram, but coming up with the skeletal structure is exactly what the job of a semiotician consists in (which I'm not). What matters is that all these equivalence classes the writer lays down, just like in mathematics, allow meaning to flow. His vertical loneliness must match his horizontal promiscuity for the story to operate this crescendo. Clog these connections, and the inner structure of the object they tie together disappears too. Digging into Saussure and Voevodsky, one can realize they shared a common obsession with identity, for it is precisely when physical objects become indistinguishable that they can be referred to with the same terms and that conceptuality arises (Aerts, 2010s and onward).

"Different names to the same thing" and the "same name to different things": the two directions on the homotopical ladder.

Note: I'm 100% in postmodern mode here, this goes way above my head of course.


I don't know what a simplicial set is and Wikipedia didn't really help me. However, I could roughly describe my "mind" as many mental maps where concepts are laid out and connected in different ways. Learning means putting new things on these maps, and thinking is navigating through them.


This is just a deleuzian metaphor for the weird kind of space I perceive certain abstract thoughts with.

>many distinct (Euclidean) spaces, whose axes connect to each other through a graph

Imagine having pictures hung on the walls of your mental palace that act as portals to other rooms and corridors within that palace, and that must exist parallel to each other, in different "universes", otherwise their volumes would intersect. The kind of geometry the Antichamber video game features.

Or picture this: a representation that relies on its axis to convey meaning, for instance the political compass meme. Walk along an axis long enough and it will connect orthogonally to another axis, for instance, authoritarianism may connect to anger from the emotional compass.

Simplexes: a generalization of triangles to n dimensions. A 2-axis representation (the political compass, for example) could connect to spaces with 3 axes (the ascended political compass: https://external-preview.redd.it/UQgZCVQ4OLg_Hz16FGdu9-qxfq9...).

To represent this you could connect one tip of a segment (a 1-simplex) to the tip of a triangle (a 2-simplex), each vertex in these figures representing an axis. This is where my deleuzian metaphor collapses, because I'm conflating the notion of an axis with the notion of the "left" and "right" halves of an axis. And I'd also be tempted to consider that planes should be allowed to connect to axes (to support that portal through a painting I mentioned above).

So this is just a sketchy thought, but it seems legitimate as it's not something I conceptualize but something I perceive (sometimes). But I think there may be something interesting behind these perceptions, because it seems they deal with separate concerns through some kind of orthogonal geometry that is structured: putting a concept in a dimension orthogonal to another concept doesn't lead that dimension to be orthogonal to all other dimensions/concepts in your mental palace, as would be the case if it took the shape of an n-dimensional space. And because the orthogonality is structured, it allows you to deal with more than 3 concepts spatially at the same time and embed them within something your eye can picture in 2D or 3D, using diagrammatic annotations (colors, marks, etc). Finally, it allows you to put a concept C in several orthogonal relationships to distinct concepts, for instance A and B, and to keep these different instantiations of concept C orthogonal to each other.

This is what my mind pictured as I was explaining this; colors and graduation marks/boxes faithfully representing what I just perceived: https://pasteboard.co/kMecyenyZdzg.png

Note that the two colors, the green of the axis and the red of the sticks, could be thought of as two individual concepts of their own, orthogonal to each other.

https://pasteboard.co/3VYEyepnVouQ.png

If a mathematician is reading this, please accept my deepest apologies. Here's another paper that seems thematically related to this: https://ieeexplore.ieee.org/abstract/document/10008602



Really interesting. I could guess that people that "think in words" are more likely to share their thoughts on social media, since they don't need to translate them into text/speech like people that "think in concepts"


I guess from the results of this thread a larger percentage of HN has this condition, but my understanding from reddit threads is that it is quite abnormal. I also lack an internal narrative, and I was quite shocked to find out that most people literally have a voice that they 'hear' internally.


I'll paste my reply to another comment on this thread:

> I could guess that people that "think in words" are more likely to share their thoughts on social media, since they don't need to translate them into text/speech like people that "think in concepts"

So, maybe word-thinkers are just overrepresented in "mainstream" social networks, and concept-thinkers are overrepresented in engineering circles?


Same. If I try to visualize my thoughts it’s like a cloud that coalesces into various forms, to show different scenarios. It definitely isn’t word-based until I decide to actually translate it into that mode.


Interesting. I think all of my thoughts are this record I'm listening to as if it's an audiobook almost. Sometimes, it's like multiple parallel streams of different thoughts at different strengths that I can observe, like a thought line that is going on, on a more subconscious level, and it's something that if I notice, I might want to pay attention to.

Like multiple LLMs are generating tokens in my head in parallel, but like in my field of view, some I can only listen/see barely because I'm not focusing on them.


There is a technique for achieving this state of consciousness, it’s called noting

This is an awareness that advanced meditators seek, practice and develop to perceive “reality as it is”

If you are curious, you might find related discussions, and a great welcoming community at r/streamentry on Reddit

Also the book Mastering the Core Teachings of the Buddha talks about it quite a bit, including instructions on how to do it


Noting is very useful as long as you remember not to do it all the time.


If you don't remember then what? Stack overflow? Heap overflow?


Is this different from Dzogchen Buddhism?


Noting is just a meditation technique

You might also call it an exercise for insight practice

There are multiple traditions that use noting or similar techniques for insight practice (maybe with different names)

Can’t vouch for this thread, as I just found it, but here’s a related discussion (Dzogchen vs Vipassana) https://www.reddit.com/r/Buddhism/comments/9t3095/dzogchen_v...


This is fascinating. I had another experience that I think sheds light on some of this. One day I was in my office and the lights were off. I turned around and looked at the dark shape on top of my coworker's desk. For a few seconds I stared blankly and then suddenly I had a thought: PC, it's his PC. Then I started to think about that period of time just before I realized what I was looking at... The only word I can use to describe what it felt like is: unconscious. Is it possible that consciousness is just a stream of recognition?


I think it's likely that consciousness is what you call it until you understand how it works.


I have this too. My cognitive processes are not related to my thinking brain, which I define as the part of my mental process which produces the sounds of words in my mind. Instead, I've observed that first, my subconscious processes concepts at a much more fine grained level, much like the latent space of a machine learning model. Only substantially after, let's say 10ms after, do thoughts arise, which are just pointers to the already processed subconscious process. A very rough analogy would be the inference of an LLM in words, vs all the processing of embeddings that happens internally.


I forget the name but I remember reading about this as a recognized process in neurology. We usually only hear the thought that wins, but there are many generated simultaneously, and there is a selection process.

Possibly related, I had a similar experience last night, where my mind simulated a fully realistic conversation between two people, with audio and video, except that the sentences made no sense. I thought that was interesting. My explanation was "the language part of your brain is too tired cause you've been using it all day."


Hm, interesting... I struggle with people understanding what I mean by having too many thoughts in parallel. I thought that's what ADHD is, but it turns out it's not. But I don't have a winning thought. I have to fight many of them and "pick" the winner, if you will. People always take it as a figure of speech, but I honestly struggle with it. It's not rare that I can just sit quietly and, after a few hours, be exhausted when I've finally finished thinking.

If you remember the official name, please let me know. I'd love to look into it more.


> I got into a cycle where I could notice my fast-thinking brain spitting out words/concepts in what felt like the speed of light before my slow-thinking brain would take those and turn them into actual sentences

The way I've seen this described by psychologists is that System 1 is driving the car while System 2 panics in the back seat, screaming out explanations for every action and shouting directions to the driver so it can feel in control. The driver may listen to those directions, but there's no direct link between System 2 in the backseat and System 1 holding the wheel.

Various experiments have shown that in many situations our actions come first and our conscious understanding/explanation of those actions comes second. Easiest observed in people with split brain operations. The wordy brain always thinks it’s in control even when we know for a fact it couldn’t possibly have been because the link has been surgically severed.

Being super tired, on the edge of sleep, or on drugs can disrupt these links enough to let you observe this directly. It’s pretty wild when it happens.

Another easy way, for me, is to get up on stage and give a talk. Your mouth runs away presenting things and you’re in the back of your head going “Oh shit no that’s going in the wrong direction and won’t make the right point, adjust course!”


Sometimes when I am in a Teams call, I observe myself talking. I know for myself that I can get carried away whilst talking and that time passes faster then. My conscious self sometimes needs to interrupt my talky self with a 'nough explained signal, or even with a 'nough joking signal.

I read several studies that show that brains don't have a central point of command, so our true self can not exist (as one single origin). We are the sum of all our consciousnesses, similar to how a car is the sum of its parts.


Oh, yes, that's what I do! I act first, and then consider the action.


It’s hard (impossible?) to know if we’re talking about the same thing or not, but I experience something like this all the time, without being on the edge of sleep. We might both be wrong, but it’s relatable!


This seems like it might upend Descartes' "cogito, ergo sum" ("I think therefore I am") in that the process for forming thoughts in a language is not indicative that we exist, rather it merely indicates that we have evolved a brain that can produce and interpret language.

Seems like we're dismantling a lot of what Descartes came up with these days.


For that I came up (or got inspired from somewhere) with this: I'm aware therefore I exist. Pure awareness, devoid of all objects (thoughts/visualization) is me.


From a positive perspective, it surely shows that our thinking/mind is not just language and is always faster than sentence formation.


I had a similar experience when I was put under during surgery a few years ago. Later I learned that they used ketamine in their concoction.


I occasionally reach a similar state near sleep where I will be half-dreaming that I'm reading from a page of a book where the words materialize/"come into focus" right before my eyes into what is usually vaguely grammatically correct nonsense.


> curated output of the brainstormy process that immediately precedes it

Daniel Dennett gives a nice albeit more detailed version of your idea in his book Consciousness Explained, could be worth a read


Mandelthought psyt.


> it feels like the critique that "LLMs aren't intelligent because they are stochastic parrots" is an observation that they are only equipped to use their 'System 1'.

I wouldn't say LLMs aren't intelligent (at all) since they are based on prediction which I believe is the ability that we recognize as intelligence. Prediction is what our cortex has evolved to do.

Still, intelligence isn't an all or nothing ability - it exists on a spectrum (and not just an IQ score spectrum). My definition of intelligence is "degree of ability to correctly predict future outcomes based on past experience", so it depends on the mechanisms the system (biological or artificial) has available to recognize and predict patterns.

Intelligence also depends on experience, minimally to the extent that you can't recognize (and hence predict) what you don't have experience with, although our vocabulary for talking about this might be better if we distinguished predictive ability from experience rather than bundling them together as "intelligence".

If we compare the predictive machinery of LLMs vs our brain, there is obviously quite a lot missing. Certainly "thinking before speaking" (vs LLM fixed # steps) is part of that, and this Q* approach and tree-of-thoughts will help towards that. Maybe some other missing pieces such as thalamo-cortical loop (iteration) can be retrofitted to LLM/transformer approach too, but I think the critical piece missing for human-level capability is online learning - the ability to act then see the results of your action and learn from that.

We can build a "book smart" AGI (you can't learn what you haven't been exposed to, so maybe it's unfair to withhold the label "AGI" just because of that) based on the current approach, but the only way to learn a skill is by practicing it and experimenting. You can't learn to be a developer, or anything else, just by reading a book or analyzing what other people have produced - you need to understand the real-world results of your own predictions/actions, and learn from that.


Defining intelligence as prediction leaves out a lot of other things that humans would see as intelligence in other humans (e.g., creating a novel), also quite simple organisms make predictions (e.g., a predator jumping at prey makes a prediction about positions).


>Defining intelligence as prediction leaves out a lot of other things that humans would see as intelligence in other humans (e.g., creating a novel)

Would it?

Why would "creating a novel" by a human not itself be text generation based on prediction on what are the next good choices (of themes, words, etc) based on a training data set of lived experience stream and reading other literature?


What is the human predicting there? Why would it need to be a prediction task at all? How about a dada-ist poem? Made-up words and syntax? If it is prediction but the criterion for "what is a good next choice" can totally be made up on the fly - what does the word "prediction" even mean?


>What is the human predicting there?

Their next action - word put on page, and so on.

>Why would it need to be a prediction task at all?

What else would it be?

Note that prediction in LLM terminology doesn't mean "what is going to happen in the future" like Nostradamus. It means "what is a good next word given the input I was given and the words I've answered so far".

>How about a dada-ist poem? Made-up words and syntax?

How about it? People have their training (sensory input, stuff they've read, school, discussions) and sit down to predict (come up with, based on what they know) a made-up word and then another.


That is a meaningless definition of prediction if "what is a good next word" has an ever changing definition in humans (as everything would fulfill that definition).


That's the very definition of production in an LLM.

What does "has an ever changing definition" mean?

And why "everything would fulfill that definition"?

At any time, what the "good next word" is depends on the state created by our inputs thus far (including chemical/physiological state, like decaying memories, and so on). And far from "everything fulfilling it", it can only be a single specific word.

(Same as if we include the random seed among an LLM's inputs: we get the same results given the same training and same prompt.)


"it can be only a single specific word" - that is incorrect as a human can change the process to generate the next word, up to and including, using a random process to create or select the next word (i.e., any word would be fine).

You could say the process chosen is somehow predetermined (even if the choices then are all made by using randomness), but then really the word "prediction" has very little meaning as the criteria to what is a "good next word" have a nearly unlimited and ever changing range as the generating process changes.


>"it can be only a single specific word" - that is incorrect as a human can change the process to generate the next word, up to and including, using a random process to create or select the next word (i.e., any word would be fine).

That's also exactly what an LLM does.

It's still only a single specific word if (as I wrote above) you take the seed into account too (i.e use the same input, including same random seed value).

If you mean to answer "yes, but LLMs use a random number generator, whereas humans can actually pick a word at random" I'd answer that this is highly contested. Where would the source for such randomness be in the universe (except if you beg the question, and attribute it to a "soul" that is outside the universe)?


Claiming that the universe has no randomness is a very strong claim and moves beyond our ("standard") understanding of quantum mechanics. For example, a human could be using radioactive decay to sample randomness (and such devices are available).

An LLM is bound to what an LLM can, while humans can construct and use tools to go beyond what humans can do. Being a universal function approximator does not give access to all processes in the natural world.


> Why would "creating a novel" by a human not itself be text generation based on prediction on what are the next good choices (of themes, words, etc) based on a training data set of lived experience stream and reading other literature?

Unless you're Stephen King on a cocaine bender, you don't typically write a novel in a single pass from start to finish. Most authors plan things out, at least to some degree, and go back to edit and rewrite parts of their work before calling it finished.


That can be expressed as text prediction. You output version 1 then output editing instructions or rewritten versions until you're done.

The real issue is running out of the input window.


> The real issue is running out of the input window.

isn't this what abstractions are for? you summarise the key concepts into a new input window?


Sure, but if we're talking about editing an entire book eventually the fine details do matter. That, and presumably human authors' abstraction/memories of their books are stored in some more compact form than language tokens. Though we can't be sure about that.


Maybe a better way to say it, rather than "intelligence is prediction", is that prediction is what supports the behaviors we see as intelligent. For example, prediction is the basis of what-if planning (multi-step prediction), prediction (as LLMs have proved) is the basis of learning and using language, prediction is the basis of modelling other people and their actions, etc. So, ultimately the ability to write a novel is a result of prediction.

Yes, an insect (a praying mantis, perhaps) catching another is exhibiting some degree of prediction, and per my definition I'd say is exhibiting some (smallish) degree of intelligence in doing so, regardless of this presumably being a hard-coded behavior. Prediction becomes more and more useful the better you are at it, from avoiding predators, to predicting where the food is, etc, so this would appear to be the selection pressure that has evolved our cortex to be a very powerful prediction machine.


I think you're confusing prediction with ratiocination.

I'm sure you've deduced hypotheses based solely on the assertion that "contradiction and being are incompatible". Note, there wasn't prediction involved in that process.

I consider prediction as a subset of reason, but not the contrary. Therefore, I beg to differ on the whole assumption that "intelligence is prediction". It's more than that, prediction is but a subset of that.

This is perhaps the biggest reason for the high computational costs of LLMs: they aren't taking the shortcuts necessary to achieve true intelligence, whatever that is.


> I think you're confusing prediction with ratiocination.

No, exactly not! Prediction is probabilistic and liable to be wrong, with those probabilities needing updating/refining.

Note that I'm primarily talking about prediction as the brain does it - not about LLMs, although LLMs have proved the power of prediction as a (the?) learning mechanism for language. Note though that the words predicted by LLMs are also just probabilities. These probabilities are sampled from (per a selected sampling "temperature" - degree of randomness) to pick which word to actually output.
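
As a toy illustration of that sampling step (my own sketch; real decoders typically add details like top-k/top-p filtering on top of this):

    import numpy as np

    def sample_token(logits, temperature=0.8, rng=np.random.default_rng(0)):
        # Lower temperature sharpens the distribution (more deterministic),
        # higher temperature flattens it (more random).
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    # e.g. sample_token([2.0, 1.0, 0.1]) usually picks index 0, sometimes 1 or 2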

The way the brain learns, from a starting point of knowing nothing, is to observe and predict that the same will happen next time, which it often will, once you've learnt which observations are appropriate to include or exclude from that prediction. This is all highly probabilistic, which is appropriate given that the thing being predicted (what'll happen if I throw a rock at that tiger?) is often semi-random in nature.

We can better rephrase "intelligence is ability to predict well", as "intelligence derives from ability to predict well". It does of course also depend on experience.

One reason why LLMs are so expensive to train is because they learn in an extremely brute force fashion from the highly redundant and repetitive output of others. Humans don't do that - if we're trying to learn something, or curious about it, we'll do focused experiments such as "Let's see what happens if I do this, since I don't already know", or "If I'm understanding this right, then if I do X then Y should happen".


The ability to write a novel is different from actually writing a novel. If prediction forms the basis of (at least some forms of) intelligence, intelligence itself is more than prediction.


That's why I say our vocabulary for talking about these things leaves something to be desired - the way we use the word "intelligence" combines both raw/potential ability to do something (prediction), and the experience we have that allows that ability to be utilized. The only way you are going to learn to actually write a novel is by a lot of reading and writing and learning how to write something that provides the experience you hope it to have.


Kind of agree. I think, though, that trying to shoe-horn intelligence into some evolutionary concepts is tricky, because it is easy to stack hypotheses there.


>The ability to write a novel is different from actually writing a novel

In what way, except as in begging the question?


Which LLM will on its own go and write a novel? Also, even for humans, just because you technically know how to write a novel, you might fail at it.


>Which LLM will on its own go and write a novel?

Which human will?

We get prompts all the time, it's called sensory input.

Instead of "write a novel" it's more like information about literature, life experience, that partner who broke our heart and triggered us to write this personal novel, and so on.


Some people write novels, some don't. Why some people do so we sometimes know, sometimes we don't (maybe they flipped a coin to decide). Some start to write but fail to finish.

You have to believe that humans have no free will in a certain way to have them be like an LLM, i.e, every action is externally driven and determined.


>You have to believe that humans have no free will in a certain way to have them be like an LLM, i.e, every action is externally driven and determined.

Free will doesn't have much meaning. If I don't base my actions at time t on their development from inputs at times before t, what would I base them on?

It would be random?

Or would there be a small thinking presence inside me that gets information about my current situation and decides "impartially", able to decide in whatever direction, because it wasn't itself entirely determined by my experiences thus far?


Randomness is certainly an option. Ignoring information is an option.


>Randomness is certainly an option

Where would that randomness come from? What would be the source of it in the universe, for it to occur in the mind?

If you mean pseudo-randomness, sure, LLMs employ that too.

>Ignoring information is an option.

Randomly ignoring information? If so, see above. If you mean intentional, informed ignoring of information, that's still determined by all the previous inputs.


Quantum mechanics provides for randomness (e.g., radioactive decay) - why wouldn't microscopic randomness occur in the brain, or be used by a human in a machine?

A universal function approximator isn't enough to access all of nature.


LLMs have shown that writing a novel can be accomplished as an application of prediction, at least to a certain level of quality.


I have yet to see an LLM write a novel of its own volition.


> online learning - the ability to act then see the results of your action and learn from that.

I don't think that should be necessary, if you are talking about weight updates. Offline batch mode Q-learning achieves the same thing.

By online learning, did you mean working memory? I'd agree with that. Whether it's RAG, ultra-long-context, and LSTM-like approach, or something else, is TBD.


By online learning I mean incremental real-time learning (as opposed to pre-training), such that you can predict something (e.g. what some external entity is going to do next, or the results of some action you are about to take), then receive the sensory feedback of what actually happened, and use that feedback to improve your predictions for next time.

I don't think there is any substitute for a predict-act-learn loop here - you don't want to predict what someone else has done (which is essentially what LLMs learn from a training set), you want to learn how your OWN predictions are wrong, and how to update them.


> By online learning I mean incremental real-time learning, such that you can predict something (e.g. what some external entity is going to do next, or the results of some action you are about to take),

I used to believe this, but the recent era of LLMs has changed my mind. It's clear that the two things are not related: you don't need to update weights in real-time if you can hold context another way (attention) while predicting the next token.

The fact that we appear to remember things with one-shot, online training might be an illusion. It appears that we don't immediately update the weights (long term memory), but we store memories in short term memory first (e.g. https://www.scientificamerican.com/article/experts-short-ter...).


The fundamental difference is that humans do learn, permanently (eventually at least), from prediction feedback, however this works. I'm not convinced that STM is necessarily involved in this particular learning process (maybe just for episodic memories?), but it makes no difference - we do learn from the feedback.

An LLM can perform one-shot in-context learning, which in conversational mode will include (up to the context limit) feedback from its actions (output), but this is never learned permanently.

The problem with LLMs not permanently learning from the feedback to their own actions is that it means they will never learn new skills - they are doomed to only learn what they were pre-trained with, which isn't going to include the skills of any specific job unless that specific on-the-job experience of when to do something, or avoid doing it, were made a part of it. The training data for this does not exist - it's not the millions of lines of code on GitHub or the bug fixes/solutions suggested on Stack Overflow - what would be needed would be the inner thoughts (predictions) of developers as they tackled a variety of tasks and were presented with various outcomes (feedback) continuously throughout the software development cycle (or equivalent for any other job/skill one might want them to acquire).

It's hard to see how OpenAI or anyone else could provide this on-the-job training to an LLM even if they let it loose in a programming playground where it could generate the training dataset. How fast would the context fill with compiler/link errors, debugger output, program output etc ... once context was full you'd have to pre-train on that (very slow - months, expensive) before it could build on that experience. Days of human experience would take years to acquire. Maybe they could train it to write crud apps or some other low-hanging fruit, but it's hard to see this ever becoming the general purpose "AI programmer" some people think is around the corner. The programming challenges of any specialized domain or task would require training for that domain - it just doesn't scale. You really need each individual deployed instance of an LLM/AI to be able to learn itself - continuously and incrementally - to get the on-the-job training for any given use.


> but this is never learned permanently.

Are you sure? I think "Open"AI uses the chat transcripts to help the next training run?

> they are doomed to only learn what they were pre-trained with

Fine-tuning.

> The training data for this does not exist

What does "this" refer to? Have you read the Voyager paper? (https://arxiv.org/abs/2305.16291) Any lesson learnt in the library could be used for fine-tuning or the next training run for a base model.

> what would be needed would be the inner thoughts (predictions) of developers as they tackled a variety of tasks and were presented with various outcomes (feedback) continuously throughout the software development cycle

Co-pilot gets to watch people figure stuff out - there's no reason that couldn't be used for the next version. Not only does it not need to read minds, but people go out of their way to write comments or chat messages to tell it what they think is going on and how to improve its code.

> Days of human experience would take years to acquire

And once learnt, that skill will never age, never get bored, never take annual leave, never go to the kids' football games, never die. It can be replicated as many millions of time as necessary.

> they could train it to write crud apps

To be fair, a lot of computer code is crud apps. But instead of learning it in one language, now it can do it in every language that existed on stackoverflow the day before its training run.


> Are you sure? I think "Open"AI uses the chat transcripts to help the next training run?

> Fine-tuning.

The learning that occurs through SGD is proven to be less flexible and generalizing than what happens via context. This is due to the restricted way information flows through transformers and which is further worsened in autoregressive GPTs vs models with bidirectional encoders.

On top of that, SGD already requires a great many examples per concept, and the impact of any single example rapidly diminishes as the learning rate tapers off towards the end of training. Fine-tuning a fully trained model is far less efficient and more crippled, compared to learning from context, for introducing new knowledge. It's believed that instruction tuning helps reduce uncertainty in token selection more than it introduces new knowledge.

> Co-pilot gets to watch people figure stuff out

We don't actually know if that's true. It depends on how many intermediate steps Microsoft records as training data. If enough intermediate steps lead to bad results and needed backtracking, but that erasure is not captured, it will significantly harm model quality. It is not nearly as easy to do well as you make it seem.

All in all, getting online learning into models has proven very challenging. While some "infinite" context alternatives to self-attention are promising for LTM, it'd remain true that the majority of computational power and knowledge resides in the fixed FF weights. If context and weights conflict this can cause degradation during inference. You might have encountered this yourself with GPT4 worsening with search. Lots of research is required to match human learning flexibility and efficiency.


> If enough intermediate steps lead to bad results and needed backtracking, but that erasure is not captured

That is a fascinating insight to me. I'm so used to the emacs undo record that I forget that others are not as lucky. I just take for granted that the entire undo history would be available.


> Co-pilot gets to watch people figure stuff out

There's a reason most jobs require hands-on experience, and can't be learnt just by reading a book about how to do it, or watching someone else work, or looking at something that someone else created.

It's one thing to have a bag full of tools, but another to know how to skillfully apply them, and when to apply them, etc, etc.

You may read a book (or as an LLM ingest a ton of training data) and think you understand it, or the lessons it teaches, but it's not until the rubber hits the road and you try to do it yourself, and it doesn't go to plan, that you realize there are all sorts of missing detail and ambiguity, and all the fine advice in that programming book or stack overflow discussion doesn't quite apply to your situation, or maybe it appears to apply but for subtle reasons really doesn't.

Maybe if developers were forced to talk about every decision they were making all day every day throughout all sorts of diverse projects, from requirements gathering and design though coding and debugging, and an AI had access to transcriptions of these streams of thought, then this would be enough for them to generalize the thought processes enough to apply them to a novel situation, but even then, in this best case hypothetical scenario, I doubt it'd be enough. Certainly just watching a developer's interactions with an IDE isn't going to come remotely close to an LLM understanding of how to do the job of a developer, let alone to the level of detail that could hypothetically let it learn the job without ever having to try it itself.

I also think that many jobs, including developer and FSD, require AGI to backstop the job specific skills, else what do you do when you discover yourself in a situation that wasn't in the book you trained on? So, it's not just a matter of how do you acquire the skills to do a specific job (which I claim requires practice), but what will it take for AI architectures to progress beyond LLMs and achieve the AGI that is also necessary.


> You may read a book (or as an LLM ingest a ton of training data) and think you understand it, or the lessons it teaches, but it's not until the rubber hits the road and you try to do it yourself, and it doesn't go to plan, that you realize there are all sorts of missing detail and ambiguity, and all the fine advice in that programming book or stack overflow discussion doesn't quite apply to your situation, or maybe it appears to apply but for subtle reasons really doesn't.

Pre-training is comparable to reading the book. RLHF, and storing all the lifetime prompts and outputs would be comparable to "learning on the job". There are also hacks like the Voyager minecraft paper.


> storing all the lifetime prompts and outputs would be comparable to "learning on the job"

I'm not sure.

I guess we're talking about letting the LLM loose in a programming playground where it can be given requirements, design and write programs, test and debug them, with all inputs and outputs recorded for later off-line pre-training/fine-tuning. For this to be usable as training data, I guess it would have to be serialized text - basically all LLM interactions with tools (incl. editor) and program done via the console (line editor, not screen editor!).

One major question is how the LLM would actually use this to good effect. Training data is normally used to "predict next word", with the idea being that copying the most statistically common pattern is a good thing. A lot of the interactions between a fledgling programmer and his/her notes and tools are going to be BAD ideas that are later corrected and learnt from... not actions anyone really wants copied. Perhaps this could be combined with some sort of tree-of-thoughts approach to avoid taking actions leading to bad outcomes, although that seems a lot easier said than done (e.g. how does one determine/evaluate a bad outcome without looking WAY ahead).


I'd say intelligence is a measure of how well you can make use of what you have. An intelligent person can take some pretty basic principles a really long way, for example. Similarly, they can take a basic comprehension of a system and build on it rapidly to get predictions for that system that defy the level of experience they have. Anyone can gather experience, but not everyone can push that experience's capacity to predict beyond what it should enable.


To me, it is one of those things like defining what 'art' is, as in creating a model in our heads around a concept. We take our definitions and then use those to construct models like AI that simulate our model well enough.

In other words, I personally do not believe any system we develop will be truly 'intelligent', since intelligence is a concept we created to help explain ourselves. We can't even truly define it, yet we try to test the technologies we develop to see if they possess it. It is a bit nonsensical to me.


Sure, we created the word intelligence to help describe ourselves and our differing levels of ability, as well as applying it to animals such as apes or dogs that seem to possess some similar abilities.

However, if we want to understand where this rather nebulous ability/quality of "intelligence" comes from, the obvious place to look is our cortex, which, it turns out, actually has a rather simple architecture! If uncrumpled, our cortex would be a thin sheet about the size of a tea towel, consisting of six layers of neurons of different types, with a specific pattern of connectivity and massive amounts of feedback. We can understand this architecture to be a prediction machine, which makes sense from an evolutionary point of view. Prediction is what lets you act according to what will happen in the future, as opposed to being stuck in the present reacting to what is happening right now.

Now, if we analyze what capabilities arise from an ability to predict, such as multi-step what-if planning (multi-step prediction), ability to learn and use language (as proven by LLMs - a predict-next-word architecture), etc, etc, it does appear (to me at least!) that this predictive function of the cortex is behind all the abilities that we consider as "intelligence".

For sure there is very little agreement on a definition of intelligence, but I have offered here a very concrete definition "degree of ability to predict future outcomes based on past experience" that I think gets to the core of it.

Part of the problem people have in agreeing on a definition of intelligence is that this word arose from self-observation, as you suggest, and is more a matter of "I know it when I see it" than of having any better-defined meaning. For technical discussion of AI/AGI and brain architecture we really need a rigorously defined vocabulary, and might be better off avoiding such a poorly defined concept in the first place, but it seems we are stuck with it since the word is so entrenched and people increasingly want to compare machines to ourselves and judge whether they too have this quality.

Of course we can test for intelligence, in ourselves as well as machines, by using things like IQ tests to see the degree to which we/they can do the things we regard as intelligent (we'd really need a much deeper set of tests than a standard IQ test to do a good job of assessing this), but the utility of understanding what is actually behind intelligence (prediction!) is that this allows us to purposefully design machines that have this property, and to increasing degrees of capability (via more powerful predictive architectures).


I think that is my overall point though - we created a system (AI) based on how we see one aspect of a particular organ or system (brain, cortex, etc.), and, in this case, labeled intelligence as 'predictive behavior', and so develop systems after that model. But for starters, only mammals and a few other life branches have cortexes, and cortexes weren't always around.

Evolutionary theory doesn't hinge on prediction in itself; prediction is just one possible aspect of it. But organisms that rely on prediction, or primarily see themselves as predictive machines, will state the opposite, because we cannot do anything but model off what we think we know.

It is also further diluted in the sense that we are always limited in what we can model because of the digital nature of our medium as it attempts to model analog systems. It is like saying that the words that I am typing right now are just like having a real human conversation. No, not really. It is a diluted form of conversation that focuses on a specific, bare part of the communicative process.


I don't think people are, yet, deliberately creating predictive machines because they see that as the path to intelligence. Things like ChatGPT are LLMs, born out of that (language model) line of research, where the goal has been to learn the rules of language. The fact that a language model, when made large enough, appears somewhat intelligent was an unexpected surprise.

Different species have evolved to have different capabilities. Humans have evolved to be generalists, able to survive in a huge variety of environments, which requires a high degree of adaptability. The key to adaptability is prediction - the ability to very rapidly (in space of minutes/hours/days - not evolutionary timescales) learn how things work in a new environment or in new conditions.

Not all animals need this degree of adaptability, since they have been able to survive and thrive in long-lasting stable environments. Examples might be crocodiles or sharks - very low intelligence, but great at what they do. Evolution is not generally about prediction or intelligence - it's about optimizing each species for their own environment(s).

We already know how to build machines that are more like crocodiles - great at doing one thing over and over, but now we have the capability and desire to also build machines that are generalists like ourselves, and that requires us to figure out a way how to implement intelligence. Given how hard a problem this has been (and continues to be) to solve, it makes sense to look at our brains for inspiration - where does our own intelligence come from, and it's highly notable that the part of our brain that most differentiates humans from other animals - our large neo-cortex - appears to be a prediction machine ... In studying humans no-one is saying that other animals are the same - it's just that humans are the animal whose capabilities we are trying to reproduce.

As I said, LLMs being intelligent was an accidental discovery - they were expected just to be language models, but it's certainly notable that the only thing they are trained to do is predict next word. They only do one thing, predict, and they exhibit unexpected intelligence, hmmm ...

At this point people are NOT yet all saying "prediction is the key to intelligence, so let's build predictive machines and assume they will be intelligent", but when you look at our cortex and look at LLMs, that does appear to be the obvious direction.


In this case I would say AI is the crocodile, the same as all life is. It's specializing (or becoming specialized) in something, which is prediction, in the same way a human (or any life that shows the same definitions of intelligence as us, like a crow solving a puzzle) can show success in a new or novel situation. But life does not need this definition of intelligence to survive, which leads to the basis of evolutionary theory. The trait of adaptability/prediction/intelligence is not always useful given a niche and can get weeded out, which is why most life does not need it, yet they are still around. In organisms that do possess it, it can be a detriment as well given specific situations (over analyzing, stuck in anxiety, excessive risks to adapt, etc.).

In other words, when we say an LLM is becoming intelligent, it's not that it is in the general sense. It's that we recognize the traits within it because the traits make sense to us and mimic how we define ourselves in terms of specializing, because quite obviously, we made it and provide its data input. But the key difference is that AI has none of the original impetus or evolutionary pressures that led to our own ability to generalize/specialize. This is because its output is derived from human input, which is fed into it through digitized means, which means there is always some kind of 'loss', since it is a specialized aspect of us.

It is why I made the reference to typing. We are communicating right now, but at the same time, it is a specialized form of it. It is not the full original human experience of talking to one another, but does not have to be in this case, because it works well enough and has some advantages given the niche. If we were using Facetime, it would be much closer, but still not quite the same as being in the same room face-to-face.

In my opinion, we are not so much prediction machines, but rather mimickers who can also create mimics of themselves via what we can make. You do not need to be able to predict that well if you can just mindlessly copy something that succeeded somehow.


Andrej Karpathy makes this same point, using the same book reference, in his "[1hr Talk] Intro to Large Language Models" video from Nov. 2023.

Here is a link to the relevant part of his presentation: https://youtu.be/zjkBMFhNj_g?t=2120


Weren't most of the claims in that book refuted, some even by the author? I really enjoyed it and found some great insights, only to be told later by a friend in that sphere that the book was not correct and that even the author had "retracted" some of the assertions.


It might still be a useful concept in developing LLMs.


He won a Nobel Prize for his work, so I'm not sure how much of it would be refuted.


One quick Google search and you can find multiple links for that, including some that were posted here. It wasn't proven to be false, but the evidence used wasn't much of evidence either.

Here's the first one in my results:

https://retractionwatch.com/2017/02/20/placed-much-faith-und...


Cunningham's Law states "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."

https://meta.wikimedia.org/wiki/Cunningham%27s_Law


As luck would have it, a System 1 vs System 2 scenario falls into our laps.


People often say that LLMs aren't really thinking because they are just producing a stream of words (tokens really) reflexively based on some windows of previous text either read or from its own response. That is true.

But I have the experience when talking of not knowing what I'm going to say until I hear what I've said. Sometimes I do have deliberative thought and planning, trialing phrases in my head before uttering them, but apparently I'm mostly an LLM that is just generating a stream of tokens.


This is something that is easily observable by anyone at virtually any moment, yet at the same time is something that escapes 99% of the population.

When you are talking to someone in normal conversation, you are both taking in the words you are saying at the same time.


I'm currently reading it for the first time, completely coincidentally/not for this reason, and on a few occasions I've thought 'Gosh that's just like' or 'analogous to' or 'brilliant description of that problem' for LLMs/generative AI or some aspect of it. I wish I could recall some examples.


I think of COT as a memory scratchpad. It gives the LLM some limited append-only working memory that it can use for simple computations (or associations, in its case). Now suppose an LLM had re-writeable memory... I think every prompt hack, of which COT is one example, is an opportunity for an architecture improvement.


I think of COT more as a type of planning or thinking before you speak. If you just open your mouth and start talking, which is what a plain LLM does, then you may talk yourself into a corner with no good way to get out of it, or find yourself saying something that really makes no sense. COT effectively allows the LLM to see the potential continuations of what it is considering saying, and pick one that makes sense!

I think lack of COT or any ability to plan ahead is part of why LLMs are prone to hallucinate - if you've already run your mouth and said "the capital of australia is", then it's a bit late to realize you don't know what it is. The plain LLM solution is to do what they always do and predict next word using whatever it had in the training set, such as names of some australian cities and maybe a notion that a capital should be a large important city. IOW it'll hallucinate/bullshit a continuation word such as "Melbourne". With COT it would potentially have the ability to realize that "the capital of australia is" is not a good way to start a sentence when you don't know the answer, and instead say "i don't know". Of course the other cause of hallucinations is that the LLM might not even know what it doesn't know, so might think that "Melbourne" is a great answer.
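
Concretely, the scratchpad framing amounts to something like this minimal sketch (the complete() callable is a hypothetical stand-in for whichever LLM API is in use, not any particular library):

    from typing import Callable

    def answer_directly(question: str, complete: Callable[[str], str]) -> str:
        # Plain completion: the model must start committing to answer tokens
        # immediately, with no room to plan or back out.
        return complete(f"Q: {question}\nA:")

    def answer_with_cot(question: str, complete: Callable[[str], str]) -> str:
        # Chain-of-thought as a scratchpad: first let the model write
        # intermediate reasoning into the context, then condition the final
        # answer on that reasoning.
        scratchpad = complete(f"Q: {question}\nLet's think step by step:")
        return complete(f"Q: {question}\nReasoning: {scratchpad}\nFinal answer:")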


Feel like this is better represented as the default mode network: https://en.m.wikipedia.org/wiki/Default_mode_network

There are questions we know the answers to and we just reflexively spit them out, but then there are questions that are new to us and we have to figure them out separately.

Recent research has shown that new memories are recorded in the brain differently depending on how unique the memory is: https://www.quantamagazine.org/the-usefulness-of-a-memory-gu...


I have a similar view to you and not much to add to your comment, other than to reference a couple books that you might like if you enjoyed 'Thinking, Fast and Slow'.

'The Righteous Mind' by Jonathan Haidt. Here, Haidt describes a very similar 2-system model he describes as the Elephant-rider model.

'A Thousand Brains: A New Theory of Intelligence' by Jeff Hawkins. Here Jeff describes his Thousand Brains theory, which has commonality with the 2-system model described by Kahneman.

I think these theories of intelligence help pave the way for future improvements on LLMs for sure, so just want to share.


How does evolutionary instinct factor into the system model? Fight-or-flight responses, reflexes, etc. 'Thinking' does have consequences in terms of evolutionary survival in some circumstances, as in spending too much time deliberating/simulating.


This is a common comparison in the LLM world. I actually think it is closer to the Left/Right Brain differences described in The Master and His Emissary, but that’s for a blog post later.


This sounds similar to the A Brain/B Brain concept that was described by, I believe, Marvin Minsky. I don't know how this might be related to Kahneman's work.


I had the same thought from Thinking, Fast and Slow.

Another variation of this seems to be the “thought loop” that agents such as Devin and AutoGPT use.



It’s a bit over my head for now but seems like GFlowNets are tackling this problem a bit.


interesting, hadn't come across these. Will be doing some more reading up on them.


that is the approach also taken in this paper for building LLM agents with metacognition: https://replicantlife.com/


Thinking step-by-step requires 100% accuracy in each step. If you are 95% accurate in each step, after the 10th step the accuracy of the reasoning chain drops to 59%. This is the fundamental problem with LLMs for reasoning.

Reasoning requires deterministic symbolic manipulation for accuracy. Only then can it be composed into long chains.
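
The arithmetic behind that claim, as a quick sketch:

    # If each step is independently correct with probability p, an n-step chain
    # is only correct when every step is, so accuracy decays as p**n.
    p, n = 0.95, 10
    print(f"{p**n:.3f}")  # 0.599, i.e. roughly the 59% quoted above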


You’ve never made a mistake in your reasoning?

Tongue in cheek but this has been considered and has resulted in experiments like tree of thought and various check your work and testing approaches. Thinking step by step is really just another way of saying make a plan or use an algorithm and when humans do either they need to periodically re-evaluate what they’ve done so far and ensure it’s correct.

The trick is training the model to do this as a matter of course and to learn which tool to apply at the right time which is what the paper is about wrt interspersed thoughts.


>reasoning requires deterministic symbolic manipulation for accuracy

No, that is automation. Automated reasoning is a thing, indeed. And I can kind of see a world where there is a system which uses LLM for creative thinking, augmented with automated reasoning systems (think datalog, egg, SMT-solver, probabilistic model checking etc).


I dream of a world where the majority of humans could come close to 59% after attempting a ten step logical process.


wut

The average theorem in Euclid's Elements (written 2,000 years ago) would have a reasoning chain of at least 10 steps.

All of the mathematical machinery humans build needs 100% accuracy in each step.


All human knowledge is created by a small number of people; most of us just regurgitate and use it.

Think Euclid, Galileo, Newton, Maxwell, etc...

And all human knowledge is mathematical in nature (Galileo said this).

What is meant here is that facts and events in the world we perceive can be compressed into small models which are mathematical in nature and allow a deductive method.

Human genius consists of coming up with these models. This process is described by Peirce (and Kant before him), i.e., inventing concepts and relations between them to comprise models of the world we live in.

Imagine compressing all observed motion into a few equations of physics, or compressing all electromagnetic phenomena into a few equations, and then using this machinery to make things happen.

Imagine if we feed a lot of perceived motion data into a giant black box (which could be a neural net) - and out comes a small model of that data comprising Newton's equations (and similarly Maxwell's equations).

But this giant knowledge edifice is built on solid foundations of mathematical reasoning (Newton said this).

Human genius is to invent a mathematical language to describe imaginary worlds precisely, and then a scientific method to apply that language to model the real world.


Another RL paper with a terrible baseline. They used 0-shot, non-instruction-tuned Mistral for GSM8k, which has a very specific output format. They got 11% accuracy after improving it, while few-shot prompting achieves 37% [1]. GPT-4 could get ~97% with prompting.

[1]: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...


Fwiw if they're serious scientists, taking a known method and baseline and improving it is good science. Extensions to get state of the art are probably possible, but their goal is to measure just the impact of their change in a simple setting. Let the engineers do the munged system combinations and get SoTA.


I am not talking about SoTA. I am talking about a deliberately poor baseline. GSM8k consists of two things: solving the problem and getting the output format correct. Getting the output format correct gives 30% accuracy for the same model where they got 11%. SoTA is 97%.


Any relation to OpenAI's rumored Q* (i.e. q-star) model? Authors of this paper don't seem affiliated.

Just a name coincidence?


I think it's just a play on the same hyped up term.


I was thinking the same. The STaR paper this is an extension of came out in 2022, so it's at least possible this is what q-star is based on too, but maybe with Q standing for something else.


This is the missing piece for training AI that has the ability to reason. There are so many tasks whose answers are known but whose reasoning steps are missing. With this method, we can use less annotated data to reach that ability.

The interesting part (I imagine): the generated thoughts could be hard for humans to understand while still being far more helpful for getting the correct answer! If that happens, we have created something more intelligent than ourselves.


This is basically what I tried this morning at the prompt level (awful results), but the sketchy idea I had in mind went further by introducing control-flow "meta-tokens" to help the LLM renavigate its context. In this perspective the context would be rethought as a self-editing structured mind-map, with the linear aspect of the context at a time T standing for the execution trace of the exploration of this mind-map so far. Some of those meta-tokens would be able to have side effects on the context, such as highlighting, structuring, summarizing, or forgetting some of its parts. This could allow for native structured output without using a syntactic format such as JSON, programmatic constructs in the style of LMQL, implementing memory, etc. The goal: not just to give logical/reasoning abilities to an LLM, but to give it the means to come up with its own cognitive architecture. Implementing structured output (using a <label name="stuff">...</label> token) to also implement memory/scratchpads would also bring inspectability of those cognitive structures for free. Of course I have no idea how to implement this (I'm an ML tourist).
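
A very rough sketch of one piece of that idea (every name here, including the label syntax, is hypothetical and not from the paper or any existing API): scan the generated text for <label> spans and keep them in a side scratchpad that later turns of the context can reference or overwrite.

    import re

    # Hypothetical label syntax: <label name="...">...</label>
    LABEL_RE = re.compile(r'<label name="([^"]+)">(.*?)</label>', re.DOTALL)

    def update_scratchpad(generated: str, scratchpad: dict[str, str]) -> dict[str, str]:
        # Pull every labelled span out of the model's output; a later label with
        # the same name overwrites the earlier one, which is the crude
        # "self-editing" step.
        for name, body in LABEL_RE.findall(generated):
            scratchpad[name] = body.strip()
        return scratchpad

    def render_context(scratchpad: dict[str, str]) -> str:
        # Re-serialize the scratchpad so it can be prepended to the next prompt.
        return "\n".join(f'<label name="{k}">{v}</label>' for k, v in scratchpad.items())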


They do not cite [1], a paper on (learned) variable computation in RNNs, applied to language modeling, that predates their work by almost 8 years.

[1] https://openreview.net/pdf?id=S1LVSrcge

Microsoft also had something similar at that time, but for image recognition: a CNN at the input and then variable computation at classification.


Base Mistral 7B is hardly suitable for the evaluations; even one team at Intel tried to pull a fast one with NeuralChat in the exact same way: https://huggingface.co/Intel/neural-chat-7b-v3#quantitative-...


> Much of the meaning of text is hidden between the lines: without understanding why statements appear in a document, a reader has only a shallow understanding.

This doesn't seem true of how I, or most people I know, read things. I would argue that we almost always have a world model and know some reasons why these statements are appearing in a book. If I were reading a fluid dynamics textbook, I may not understand the math, but I know why those statements appear; they are mathematical statements to help you learn the theory, and they follow a pattern to teach you important concepts. For instance, concepts build upon older ones: Bernoulli's equation is there because the law of conservation of energy was there before it, and its placement assumes I understand the latter.


Observation: "expertise" (hence "reflex") is the learning of the nonlinear solution space that can be inferred from initial conditions.

Conjecture: models which engage in self-training on the solutions they derive will get to something that looks a bit like bootstrapping when you squint.

Lemma: there's a nice opportunity for cloud-hosted model SaaS to offer discounts for actionable feedback on the quality of their output, so as to drive this retraining.

Idle comment: I'd use the language of REM sleep and the idea of "memory consolidation" for this.

Most of the premises of model training can be extended to the level of reasoned solutions, rather than tokens.


This looks really interesting; any possibility the researchers will release some code soon?


If it is doing this, is it still a language model? Or also a thought model?


Here we go!! I've been waiting years for them to try this. Let's see how it does when scaled up to GPT-3/4 level.

This might be the missing piece to AGI.


The missing piece is unknowable


We'll likely reconstruct what the missing piece was in hindsight, but it's very probable there's no one missing piece. Just like human evolution.


I am not convinced there even is a missing piece. I mean, LLMs are being used very differently compared to how traditional AI programs were written. Combining both worlds might be all that is needed.

I would not be surprised if, when we have general artificial intelligences, we see that advancing LLMs wasn't necessary.


Until it's been found, you mean?


Maybe even then too!


This is purely anecdotal, and I try to keep it to myself, but it's very difficult when at least half of the HN homepage is AI related: LLMs like ChatGPT do so utterly terribly at any non-trivial job I throw at them that I seriously consider people who use them daily to either be straight up incompetent, or maybe their domain is so trivial that the LLM actually does well.

From asking LLMs to solve a highly difficult async C++ parallelism problem, to German language specifics, they just fuck up at a fundamental level. I understand that LLMs cannot solve these issues and why, but then I do not understand the heavy focus on AI by so many tech people.

Is day to day programming job so trivial that LLMs do a good job, while at the same time being too difficult for you to do it yourself? I really, really want to know exactly what the use case is.

Do people just throw simple problems at it to validate their own preconceived notion of how cool and useful LLMs are? Whats the deal?


I had a similar take until about a week ago. A friend showed me his workflow with Copilot and whatever the JetBrains AI assistant is called.

Use it as a tool: what if instead of opening up a new tab, searching for the API docs for the library you're trying to find a function in, find the function, re-read the parameter arguments for the 400th time, and then use it, you could just highlight a snippet and say "Paginate the results from S3 using boto3" and the code would just populate?

You have to have the clarity of thought to know what you're doing, but the time it takes to write every line for basic stuff you've done 1000x before can be greatly compressed if it's inlined with your IDE.

I think this is the move for most LLM tools: integrate it with existing tooling. An LLM for Excel for corporate bookkeepers, CPAs, etc will be great. A Word/PDF summarizer that's tuned for attorneys will also be fantastic. Highlight a paragraph, ask for relevant case law, etc.

I thought ~2 years ago the results were... not great. Now I'm pretty happy with it.

SecureFrame (helps with compliance regimes like SOC2) recently added the ability to generate Terraform templates to automatically generate infrastructure that will fix specific platform risks for AWS, Azure, GCP, etc.

It definitely needs someone at the helm since it does hallucinate, but I have found it to cut down my time on mundane tasks or otherwise niche/annoying problems. When was the last time you visited 4+ StackOverflow posts to find your answer? Copilot, so far, has always hit a pretty close answer very quickly.


I also had to build intuition for when it will be appropriate versus not. It's hard to describe but one very positive signal is certainly "will any hallucination be caught in <30s"? Even in ChatGPT Plus you can have it write its own unit tests and run them in the original prompt (even in the profile's Custom Instructions so you don't have to type it all the time).

So a mistake was using it for something where runtime performance on dozens of quirky data files was critical; that nearly set my CPU on fire. But string-to-string data cleanup, a chain of simple API calls, or a one-off data visualization? Chef's kiss.


> to write every line for basic stuff you've done 1000x before

There are ways to avoid writing basic stuff you've done 1000x before that are better than LLMs though...

Put it in a well-thought-out function or package or other form of shared/reusable code. You can validate it, spend the time to make sure it covers your edge cases, optimize it, test it, etc. so that when you go to reuse it you can have confidence it will reliably do what you need it to do. LLM-generated code doesn't have that.

(When you think about how LLMs are trained and work, you realize they are actually just another form of code reuse, but one where there are various transformations to the original code that may or may not be correct.)

Where LLMs shine for coding is in code-completion. You get the LLM output in little chunks that you can immediately review correctly and completely, in the moment: "yeah that's what I want" or "no, that's no good" or "ok, I can work with that". Not surprising, since predicting completion is what LLMs actually do.


I don't know exactly how you use it, but this isn't my experience at all. If you ask an LLM anything too specific, that isn't an obvious and commonly discussed issue (something that I almost never need to do), it just makes up nonsense to fill the space.

Equally, if you ask it general questions it misses information and is almost always incomplete, leaving out slightly more obscure elements. Again, I need comprehensive answers, I can come up with incomplete ones myself.

What's really obvious to me when I use it is that it's an LLM trained on pre-existing text; that really comes through in the character of its answers and its errors.

I'm very glad others find them useful and productive, but for me they're disappointing given how I want to use them.


That's fair, it might not be for you. In 'old school ML', for a binary classifier, there's the concept of Precision (% of Predicted Positive that's ACTUALLY Positive) and Recall (% of ACTUALLY Positive that's Predicted to be Positive).

It sounds like you want perfect Precision (no errors on specific Qs) and perfect Recall (comprehensive on general Qs). You're right that no model of any type has ever achieved that on any large real-world data, so if that's truly the threshold for useful in your use cases, they won't make sense.
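
For reference, a minimal sketch of those two metrics for a binary classifier:

    def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
        # Precision: share of predicted positives that are actually positive.
        # Recall: share of actual positives that were predicted positive.
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    print(precision_recall([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5)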


I just want something useful. I'm not talking perfection, I'm talking about answers which are not fit for purpose. 80% of the time the answers are just not useful.

How are you supposed to use LLMs if the answers they give are not salvageable with less work than answering the question yourself using search?

Again, for some people it might be fine, for technical work, LLMs don't seem to cut it.


Sorry if this is sophomoric, but when you said "you have to have clarity of thought" - what jumped to mind was the phrase "you have to speak to the code"... I thought it encapsulated your clarity of thought quite saliently for me.


You must be one with the code. You must be the code.


Stop using it for things that are in your area of expertise but are too difficult for you. Use it for things where you think "this is probably easy but I have no idea how to do it". For example, I needed to do some pretty trivial task in PowerShell, but I have never used it, so I got ChatGPT to do it for me and it worked first time. Obviously I checked that the commands looked plausible before I ran them, but it still probably took 2 minutes to do something that would have otherwise taken 30.


I want to second this:

> Use it for things where you think "this is probably easy but I have no idea how to do it"

I had exactly the same reaction as OP (LLMs suck, what's with all the hype?). These people are using it differently. For me it's often something like asking it to put together a specific sequence of matrix transformations in ThreeJS or some other library.

This is not a difficult task but it's often one I waste a lot of time getting right. It's sort of about finding the right level of abstraction you need to ask it.


And how often will those "plausible looking commands" create obvious or subtle problems that cost far more than 30 minutes?


Probably about as often as if I cobbled something together from random blog posts except faster.

It's not like the script is running a nuclear power station.


That just means you are ignorant of how wrong it guides you. You need to first build trust before taking it new places. You do that with topics and concepts you are familiar with.


This has always been true of anything anyone has ever googled or looked up on stackoverflow

I copy-paste code from Stack Overflow all the time. I used to agonize over making sure I fully understood every line I was copying. Now I have the discretion of making that decision: sometimes it really does matter, sometimes all you need to know is that it produces the right result for your limited use & test case. (It's no different than relying on a 3rd-party library in that way.)

I think we need to apply the same discretion to LLM output. The answer is "it depends". Sometimes using its output blindly leads to disaster. Sometimes using it without fully understanding all the details is a great way to make progress.


This is no different from my coworker who regularly copy/pastes from stackoverflow to do things he doesn't have any idea how to do himself, and just as awful, unproductive, and problem inducing.


This is an observation I've seen a lot around here. Underneath it is the assumption that "if I can't figure out how to get meaningful use out of a tool, the tool must be useless".

OpenAI didn't sign up 100M users without somebody somewhere finding it to be useful. Like any other tool, its utility is limited mostly by the person wielding it.


The tools seem useful, but I'm not sure they are. Too often they will confidently make up an answer that is wrong. When I use them they do great on trivial problems but can't help on hard ones.


Reframe your thinking. You’re approaching it like other computer systems, where a given input yields a determined output. Instead, treat it like a junior dev whom you can unload an unlimited amount of work to, but the result still requires review.

We’re all used to working this way in human systems, people that sound confident might also be wrong, and you learn where you might trust them more or less as you work with them over time. Until you are confident that they are always “right” in a given problem domain, you need to apply some level of review.

Finally, keep in mind that there are "smarter" and "dumber" LLMs. If you didn't pay for what you were doing, you were talking to a "dumber" model. The quality does go up if you have $20 in your pocket.


The junior engineers I know tend to ask questions rather than be confidently wrong. That isn't to say they are always right, but they make a very different class of errors.


Again, this is a tool you can use. You can complain that it doesn't work in the way you expect, or you can learn how it operates and how best to use it. If you can't figure out how to apply it to your work, that's fine, but loads of other people are doing exactly that with or without you.


> When I use them they do great on trivial problems but can't help on hard ones.

That sounds super useful! The tools free you up from wasting time on trivial problems so you have more time to focus on the hard ones. What's not to love?


I try to work on complex problems. Sometimes they hide something easy


Does your job involve solving complex, challenging problems all the time?

I am a CS professor, I don't think most people would class that as a trivial job, but I find myself needing to do plenty of trivial tasks every day: mixed bureaucracy (periodic reports, grant requests, various evaluations, etc.), trivial programming (a Seaborn chart to show some Excel results), text polishing (need to cut a text to 500 words without altering meaning), writing student assignments, writing emails in (non-Native) English for sensitive requests with the right tone, etc... all of those are things I have found LLMs to do fairly well and save me a lot of time.

I wouldn't use them to do the core job of designing novel algorithms, doing experiments, writing the bulk of a paper or teaching students. But most of my working hours are not really that "core" stuff. And I would assume it's the same for most professionals.

If you have an environment where you are constantly challenged by difficult tasks... wow. I don't know if I should envy you (because I love difficult problems and hate mindless chores) or it would be too stressful.

PS: I don't think "being too difficult for you to do it yourself" is the right litmus test for LLM usefulness. I can draw charts with Seaborn, of course. But the LLM does it much faster, and I don't think doing it myself would make me grow, hone useful skills or anything. I'd rather devote my time to something else. So (in my view) it's clearly better to have the LLM do it.


They're good autocomplete, they can help search for solutions sometimes better than Google (SEO spam), you can use them as a rubber duck, and you can have them auto-fill trivial stuff that would take you a few minutes to write out manually, like test scaffolding. I would never use them to actually complete a non-trivial task, and I always confirm their answers. And yeah, sometimes it sucks - it's a tool with a learning curve that involves knowing its limitations.

The reason there's so much money and time going into this is that even semi-competent AI is relatively new and the methods are still extremely crude, and yet it's this advanced. This seems like the path to an AGI, and if someone were to even approach that point, it would radically change the world forever and could lead to either really good things or really bad things.

Now, GPT-4 isn't considered the best at specialized tasks. It's a master of many, but there are much smaller models that can do things like incredibly complex symbolic/geometric math proofs, write code, perform translations, etc better. A lot of ideas are on making expert systems using many of those specialists combined with a generalist, like the segmentation of a brain.

Anyway:

> I seriously consider people who use it daily to either be straight up incompetent, or maybe their domain is so trivial that the LLM actually does well.

This kind of radical thinking about a significant proportion of enthused professionals (in any industry) who aren't having the same experience as you is a red flag for introspection. It's so easy to fall into the "enlightened me" trap.

I appreciate you asking for more information!


There are plenty of jobs where people have to complete various tasks that are outside of their domain or otherwise tedious on a daily basis. For example, plenty of devs have to set up or change the configuration of remote hosts. Some LLMs are pretty good at generating devops scripts to speed up this work.


Exactly. Example: maybe 1% of the code I generate is bash. I used to try to memorize patterns, but of the top 20 I'd use each less than once per year. Now, instead of that 1% taking 5% of my time, it takes 2%. It's all "simple stuff", and I can verify it instantly.

I have ~10 similar use cases. So it hasn't revolutionized my life, but it's been well worth $20/mo ChatGPT Plus and $3/mo API calls.


For me it’s more like brainstorming.

Even if half of it is garbage it’s a net win. At least in domains where I can distinguish the two.

There are also cases where the cost of failure is very low. E.g., I could spend half an hour reading an API spec, or I could have an AI give me a curl command and test it out in 30 seconds. If it works, great; if not, oh well, time to read the spec.


I signed up for OpenAI’s monthly subscription. Its performance on non-trivial tasks is abysmal. It’s a regurgitation machine. One might mischievously argue the average tech worker isn’t much better than an LLM, thus the interest? On a related note, we are deluged daily with firms offering AI services. I see a bubble.


You should treat LLMs the same way you treat any other smart entity, human or otherwise: realize that they can be both immensely useful and fundamentally wrong at the same time. Intelligence is not equivalent to correctness.


Why do you presume that people commonly use it for non-trivial things? It excels at trivial things. That's what most people use it for, probably. Like google search. Is there something that leads you to think otherwise?


Perhaps the incessant talk of GPT-x being AGI, whatever that means.


I don't use it constantly but regularly.

LLMs' English skills are much better than mine.

And when I do a little bit of Go coding once a week (I'm a Java developer by trade), I don't have the time to learn Go well enough to just type stuff down without looking things up. Instead of googling, I tell it "I need a struct with the following attributes..." and it doesn't just tell me how to do structs in Go, it also creates them for me.

Also: there are a TON of issues where I would write a short script to do something (formatting text into a table, searching for specific lines, etc.) where a normal person doesn't even have those tools at hand.

For companies overall: it's not just what an LLM can do for you; it's also a very, very good interface to your application. The demos I saw in my company are really good, totally make sense, and do reduce the entry barrier for people.

I have a friend whose job is to create reports with SQL. She doesn't do anything else, just reports across the whole data warehouse. Why? Because a normal non-dev person can't just write SQL or automate things.

The gap between tech people and management is huge.


Not everything in tech is difficult

I find LLMs great for creating SQL queries and regexes


Technology is complex and hard to make sense of. That is why most non-experts have a strong wish for a kind of mythical technology, which you can just pour onto your problem and it magically knows what you wanted (and which things you did not want).

For a certain class of problems LLMs achieved new, never-before-seen, almost magical results. Now imagine you were someone who hates dealing with the constant complexity of solving problems with technology, and something comes along that seems to carry the promise of lifting that off your shoulders. Then you know why people react like they do. Recall the blockchain craze? There were people who declared that it somehow magically solved any IT-security problem there ever was, instead of seeing it as a good solution for a very specific set of circumstances that nearly nobody faced in practice.

In reality, of course, LLMs also have limitations, e.g. the above-mentioned ambiguity that is inherent to any magical technology: to be true magic the technology would have to be able to read the thoughts of those who apply it and somehow infer from that the true thing they want or need. LLMs are in the end still just very good guessers based on statistical data; that means the guess could be just what you want, but it lacks an actual understanding of what it is doing.

Those applying the technology to things it is actually good at (e.g. classification problems etc.) will put it to good use, but there will be a lot who will apply it and have things fall apart Air Canada style.


Your view on LLM usage is too narrow. Yes, they are pretty shit for me too in solving coding problems, but they are still useful for bespoke information extraction, classification and creative applications. The interest is justified, we're just having a hard time understanding the limitations.


I train my LLM to barf up my domain specific boilerplate code. I don't ask it to solve business problems.


Three examples:

1. having ChatGPT generate boilerplate, because I’m lazy;

2. having ChatGPT attempt something I don’t know as a starting point, eg JavaScript; or,

3. having ChatGPT give a reference rather than Google myself, eg of a config option.

ChatGPT makes 1 less tedious, 3 less a game of “what magic phrase finds the right SO post?”, and means I do 2 at all, eg trying out JS features on my blog.

I think it does alright at composition if you break down the task sufficiently, but it struggles with higher order structure — particularly if you’re using multiple responses.

That said, I suspect we need a theory shift to get AI to comprehend higher order structure in composition.


It's pretty amazing at generating rust structs from yaml examples, and also at writing generic versions of rust functions.

Neither of those tasks are especially _difficult_, but they are _annoying_.


Some questions we've thrown at GPT-4 recently (real use cases):

> how does torchmetrics IOU work? Does it match gt with detection boxes? or does it do pairwise IOU and average?

> What predictions has Ray Kurzweil made that he got correct and incorrect? Please produce a table

> can you give me a stack implementation with min function in O(1) time

> (A question about how we should solve a UX problem specific to our app)

> What is the best way to return raw image data via a REST endpoint?

> How is Return on Capital Employed (ROCE) calculated?

> Following the email exchange below, write a cross intro email to introduce (X person) to (Y person)

> How do I run this code on TPU in Collab?


Did it correctly answer all of these?


RE: Ray Kurzweil

Did you see him on JRE last week:

https://www.youtube.com/watch?v=w4vrOUau2iY

(or was that why you asked)


When you say ChatGPT, are you referring to GPT4? I find a huge and avoidable miscommunication happens when two people both think they are using “ChatGPT” but talking about two different models which vary in size by a factor of 10.

Assuming you are talking about GPT4, for the sake of argument, the answer is speed. Of course I can write a small parser script that deals with some data I received from a client. It will take me an hour and be a tedious task far distant from my actual expertise. An LLM can do it in 45 seconds, including the time it took me to describe the task.


I daily drive KDB/Q. This is readily extendable for example in C, which was my previous daily, and Python which I use sporadically.

I don’t use LLMs for C or KDB, I do use them for Python.

ChatGPT is good at Python. I guess that's because Python programmers rely on Stack Exchange, so there is lots to learn from, and Python is largely an exercise in finding the correct library anyway.

If the only thing ChatGPT did was listen to my problem and suggest which imports to use/manuals to read, that would be good enough to use regularly. If I wasn’t after a library/pre existing code I wouldn’t be using Python!


I've definitely noticed ChatGPT generally writes better Python than it writes Scala, presumably for the same reason of there being a fair bit more Python code in the wild.


The actual reason probably has to do with the fact that LLM developers and academics are more familiar with Python than other programming languages, and therefore have policed its correctness better.


Boilerplate, test code, and general tedium. Most software just needs to handle IO.

The next time you want to use SQL to compute a rolling sum try asking ChatGPT 4 instead of searching through documentation or search engine results for windowing functions.

Competency at programming along with very good technical communication skills (with a touch of learning how to not hold the tool backwards) and you should find the appeal.
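
For what it's worth, the rolling-sum example mentioned above looks roughly like this (a sketch using Python's sqlite3, assuming a SQLite build new enough, 3.25+, to support window functions):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales(day INTEGER, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)])

    # 3-day rolling sum via a window function: current row plus the two before it.
    rows = con.execute("""
        SELECT day,
               SUM(amount) OVER (ORDER BY day
                                 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling
        FROM sales
    """).fetchall()
    print(rows)  # [(1, 10.0), (2, 30.0), (3, 60.0), (4, 90.0)]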


Yes. I just used Cody to get me on the right path with an obscure PostgreSQL JSON query; it easily saved me an hour of fiddling around.


Profit. The question at hand is whether LLMs can produce profit, which is an extremely different question than the questions you're asking.


I’m preparing. Learning how to work with an AI is the only way to stay competitive. The AIs will become smarter much faster than I will.


Are you using the free one? If you are, GPT4 is completely different. Claude is the best free chatbot.


Yes, the free one. I find it almost a fallacy to say "oh, yeah it sucks, but you should try the expensive one! Its proper good."

Oh, your BMW keeps having issues? You should have bought the one thats 2x the price, that one is perfect!

Of course it's better, but both are being sold as a car, or, in this case, an LLM. They're both LLMs, they're both by the leading AI company; if one sucks such supreme ass, why would the other one be amazing? ;)


We're not in control here and we didn't build them, they just happened. Every LLM is completely different and some of them are bad. You can't generalize from one of them at all.


Every other query I've given to ChatGPT came up with an utterly wrong answer. Followup always yielded "sorry, I made an obvious mistake, here's another wrong answer". Confident and stupid is a very bad combination.


First sentence is 100% my sentiment, cheers !


I think it's safe to remind oneself that this thing is literally a zygote. So patience, and in ~5 years, it will be a different story.

@xanderlewis

Doesn't that mean it is simply now consuming the internet in real time?


Why? It’s already eaten all of the publicly available data on the web.


We're done for!


Neural networks do not think


You're not giving information anybody on this forum doesn't already know.

Obviously they don't "speak" either. Both "think" and "speak" are used as shorthands here for what the language models actually do.


What are you upset with me for? The authors are using the misleading language, not me. Take it up with them.


Could you give a definition of "think" that NNs fail to live up to?


Abstracting immaterial concepts from physical reality and deliberately using them in analytical or deductive processes to discover truths.


Might be relevant: https://www.nature.com/articles/s41586-023-06924-6 Mathematical discoveries from program search with large language models


So basically finding ways to compress your observational history?


No, it's not "basically" that at all.


That's pretty much what it is, as you stated it. Finding abstractions that let you encode your observational history more efficiently than you previously could, or "discovering truths", if you want to be all mystical about it.


That's not what it is, and it's not what I stated.


Then could you go into more detail? Because what I just described was the progress of scientific theory.

If an AI can do that, it's not going to matter whether or not it meets your arcane definition of "thinking".


Your definition of thinking is designed to fit AI. You set the bar low, and then get giddy when it jumps over. "Progress of scientific theory" is just a meaningless phrase that makes your claim sound authoritative when it isn't.


I'm still not hearing your definition of thinking, but given how hallowed you seem to find it, it must be truly brilliant.

Progress of scientific theory is plain to see. At each step, e.g. Kepler's laws -> Newtonian mechanics -> general relativity, we encode our observations of the physical world ever more efficiently.


I gave it above, you're just too dense to even begin to try to understand what I mean. If you have a specific question I'm more than happy to try and answer it.


>I gave it above

Yeah, something about "abstraction" and some hand-wavy magic, which computers are already doing.

Can you state, specifically, what part of thinking you presume computers can't do?


They cannot handle immaterial concepts such as goodness, truth, justice, etc. because such concepts are not reducible to material components. They cannot abstract at all, because to abstract something is to consider it in a universal way, leaving the material parts behind. Computers are fundamentally material, and so cannot handle any kind of immaterial concepts.


Do neurons think? Do a bunch of neurons?

Is this semantics?


basically this: https://en.wikipedia.org/wiki/Sorites_paradox

One neuron doesn't think. Three neurons don't think. Billions of neurons think. Somewhere between one neuron and billions of neurons, thinking starts happening. Probably also true for neural networks. The main problem is that people throw around terms like: "Thought", "Intelligence", "Will", "Reasoning", "Knowledge", "Consciousness", etc like they are very well defined and well understood terms and they very much are not.


My point precisely. Those are all vague terms. Saying that "neural networks do not think" is as meaningless as any equivalent (or opposite) statement about any other system, including any number of neurons, a whole brain, or a person.

It's all semantics.


Your claim is that saying "people think" is meaningless? Maybe you don't think (seems to be the case), but I certainly do.


It is meaningless to me because the term is imprecise. Is it precise when I do something bone headed and say "sorry I wasn't thinking"? Do cats think? Do worms?

Depends on what you mean by thinking in that particular case.

In my opinion what LLMs do is close enough to thinking for some definition of the word.

And, by the way, I think there's no need to be unpleasant to state your point.


There's a difference between a term being imprecise and being equivocal. Yeah, we use the term "think" in different ways. But it primarily refers to the operations unique to the human intellect with which we're all familiar: deduction, analysis, synthesis, judgment, deliberation, abstraction, etc.

The thing that's special about these operations is that they deal with immaterial concepts. What's right? What's good? What's true? And so on. Computers have never achieved this and they never will, because they are only material arrangements, and so can only handle things on the material level. The human mind is clearly not reducible to that, because it can handle purely immaterial concepts.


I see where we differ in opinion. I do believe that the human mind is just a very particular material arrangement. Therefore while different in scale to an LLM and likely also differing in structure, I do think both systems are fundamentally in the same category.

Given that, we could never agree. But that's fine.

And by the way, I actually hope you're right.


Yes, that is the key contention. I don't think that our minds are material, because if they were then we couldn't handle any immaterial concepts. Although obviously our brains are involved in the operation of the mind, they cannot be an exhaustive explanation of what the mind is, given that the mind is perfectly comfortable leaving the material world behind.


Billions of neurons don't think, people do.


...with what?


With their minds


There are no real neurons in a neural network.


I don't understand the downvotes - you are correct.


It's fair to talk about thinking in a handwavey "you know what I mean" way. This is not a philosophy paper. It's a fine point if that's what you want to discuss, but doesn't change anything about the issue at hand and is needlessly pedantic. It's the "what you're referring to is actually GNU/Linux" of discussions about the tech side of AI.


It pretends to be a philosophy paper. If they wanted to talk about computation, they would use terms that communicate that clearly. But they're using words that confuse the two fields. I didn't do that, the author did.


I think people just get mad when they're reminded of this obvious fact. They want computers to prove that our minds are an illusion, the product of a "meat computer".


Read some Daniel Dennett!


Are you serious?


You're very grumpy I think you need some food and a nap :-)


I think you need religion


Next: language models teaching themselves to think, then killing humans, based on a crawled Russian website with secret AI instructions.


Although this is obviously satirical hyperbole, dataset poisoning is real and will be underappreciated until the first catastrophic example of it occurs.


Many years ago now I wrote my kids a very simple chatbot to play with. You'd type in a phrase. It would tokenize it, adding start and stop tokens, then update its token transition probabilities, using the two preceding tokens to pick the next one. It would then generate a response from these probabilities.
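
The mechanism described there is roughly this (a sketch reconstructed from the description above, with the details assumed):

    import random
    from collections import defaultdict

    START, STOP = "<s>", "</s>"
    transitions: dict[tuple[str, str], list[str]] = defaultdict(list)

    def learn(phrase: str) -> None:
        # Record, for each pair of preceding tokens, which token followed them.
        # Keeping duplicates is what encodes the transition probabilities.
        tokens = [START, START] + phrase.lower().split() + [STOP]
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            transitions[(a, b)].append(c)

    def respond() -> str:
        # Walk the table from the start tokens, sampling until a stop token.
        a, b, out = START, START, []
        while True:
            nxt = random.choice(transitions.get((a, b), [STOP]))
            if nxt == STOP:
                return " ".join(out)
            out.append(nxt)
            a, b = b, nxt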

The data poisoning began immediately. Because "poop" was such a funny word, they quickly taught it that the most probable token after any bigram was "poop".

No humans were killed, but two small kids were amused for an hour or so.


My condolences for your models' poisoning. It sounds like a real crappy way to go :?


It isn't really all that real; of course untrusted text contains things that aren't true. So don't trust it.



