
> But what about 32k contexts, or beyond? At some point, as token size increases, the ability of a human to give a highly precise and detailed answer decreases

War and Peace is over 580,000 words long. Chapter one is ~2,020 words, which encoded for GPT-3 is ~2,956 tokens (lots of longer, older words and proper nouns, e.g. "scarlet-liveried footman" is six tokens), so we might expect the entire book to be ~750,000 tokens long.
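As a back-of-the-envelope sketch: straight extrapolation from the chapter-one ratio actually lands a bit above ~750k, since (per the above) chapter one is unusually token-dense. Assuming a leaner tokens-per-word ratio of roughly 1.3 for plainer prose gives the lower figure:

```python
# Rough token estimates for War and Peace, extrapolated from chapter one.
chapter_words, chapter_tokens = 2020, 2956
book_words = 580_000

# Straight extrapolation from the (token-dense) first chapter:
upper = book_words * chapter_tokens / chapter_words   # ~849k tokens
# Assumed leaner ~1.3 tokens/word for the rest of the prose:
lower = book_words * 1.3                              # ~754k tokens

print(f"~{lower / 1e3:.0f}k to ~{upper / 1e3:.0f}k tokens")
```

Either way, the whole book is in the high six figures of tokens.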

Many people could reason about the book in its entirety. They would not have an encyclopedic recall of random dates and side characters or be able to quote any passage, but they could perform deep analysis on it.




Similarly, consider a series like A Song of Ice and Fire. A human reader is still consciously aware of (and waiting for) the answers to questions raised in the very first book. This is millions of tokens ago, and that's if our brains turn off when not reading the books.

I think this highlights a hurdle on the path to more human-like AGI. We keep track of so much stuff for very long periods of time, albeit perhaps with some loss of fidelity.

My guess is that there will need to be an advancement or two before we can get an AI to read all of the ASOIAF books so far and then ask it "What really happened at the Tower of Joy?"


> Similarly, consider a series like A Song of Ice and Fire. A human reader is still consciously aware of (and waiting for) the answers to questions raised in the very first book.

Some of them, some of the time. This is best compared to ChatGPT having those books in its training dataset.

The context window is more like short-term memory. GPT-4 can fit[0] ~1.5 chapters of Game of Thrones; GPT-4-32k almost six. Allowing space for the prompt, questions and replies, call it one chapter for GPT-4 and five chapters for GPT-4-32k.

Can you imagine having a whole chapter in your working memory at once? Being simultaneously aware of every word, every space, every comma, every turn of phrase, every character and every plot line mentioned in it - and then being able to take it all into account when answering questions? Humans can do it for a paragraph, a stanza, maybe half a page. Not a whole chapter in a novel. Definitely not five. Not simultaneously at every level.

I feel in this sense, LLMs have already surpassed our low-level capacity - though the comparison is a bit flawed, since our short-term memory also keeps track of sights, sounds, smells, time, emotions, etc. My point here isn't really to compare who has more space for short-term recall - it's to point out that answering questions about immediately read text is another narrow, focused task which machines can now do better than us.

----

[0] - 298000 words in the book (via [1]), over 72 chapters (via [2]), gives us 4139 words per chapter. Multiplying by 4/3, we get 5519 tokens per chapter. GPT-4-8k can fit 1.45x that; GPT-4-32k can fit 5.8x that.

[1] - https://blog.fostergrant.co.uk/2017/08/03/word-counts-popula...

[2] - https://awoiaf.westeros.org/index.php/Chapters_Table_of_cont...
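The footnote arithmetic in [0] can be sketched out directly (the round 1.45x/5.8x multiples suggest the windows are being treated as 8,000 and 32,000 tokens, which is the assumption here):

```python
# Reproduce the chapters-per-context-window estimate from footnote [0].
words = 298_000     # total words in A Game of Thrones (via [1])
chapters = 72       # chapter count (via [2])

words_per_chapter = words / chapters            # ~4139 words
tokens_per_chapter = words_per_chapter * 4 / 3  # ~5519 tokens (rough 4/3 tokens-per-word rule)

for window in (8_000, 32_000):
    print(f"{window}-token window fits {window / tokens_per_chapter:.2f} chapters")
```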


Just thinking about this, I realized that as a musician I do it all the time. I can recall lyrics, chords, instrumental parts and phrasing to hundreds if not thousands of pieces of music and "play them back" in my head. Unlike a training set, though, I can usually do that after listening to a piece only a few times, and also recall what I thought of each part of each piece, and how I preferred to treat each note or phrase each time I played it, which gives me more of a catalog of possible phrasings the next time I perform it. This is much easier for me than remembering exact words I've read in prose. I suspect the relationships between all those different dimensions are what make the memory more durable. I must also be creating intermediary dimensions and vectors to do that processing, because one side effect of it is that I associate colors with pitches.


If we are trying to at least match human level then all we have to do is summarize and store information for retrieval in the context window. Emphasis on summarize.

We extract key points explicitly, so they aren't summarized away, and the rest (the less important parts) we summarize and save.

That would very likely fit, and it would probably yield recall and understanding equal to or better than a human's.
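A minimal sketch of that scheme - the `summarize` function here is a trivial first-sentence placeholder standing in for an actual LLM summarization call, and the key-point test is a hypothetical example:

```python
def summarize(text: str) -> str:
    """Placeholder for an LLM summarization call: keep only the first sentence."""
    return text.split(". ")[0] + "."

def compress_for_context(chunks, is_key_point):
    """Keep key points verbatim; summarize the less important parts."""
    out = []
    for chunk in chunks:
        out.append(chunk if is_key_point(chunk) else summarize(chunk))
    return "\n".join(out)

# Hypothetical usage: flag chunks mentioning a tracked character as key points.
chunks = [
    "Jon Snow takes the black. He travels north to the Wall.",
    "A minor lord hosts a feast. Many courses are served. Gossip is exchanged.",
]
context = compress_for_context(chunks, lambda c: "Jon" in c)
print(context)
```

The compressed output then goes into the context window in place of the full text.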


Some loss of fidelity? Our memories are hugely lossy, reconstructed at recall based on a bunch of concepts. It's great, but it's also very lossy.


You got me.


ASOIAF spoiler below!

> At the Tower of Joy, Ned Stark defeated three members of the Kingsguard and discovered his dying sister, Lyanna, who made him promise to protect her son, Jon Snow, whose true parentage remained a closely guarded secret.

Seems like ChatGPT already knows, unless there's a deeper secret that I'm not deep enough into ASOIAF fandom to know.


But this is because ASOIAF was in the training dataset. ChatGPT wouldn't be able to say anything about these books if they weren't in its dataset, and you wouldn't have enough tokens to present a whole book to ChatGPT.


Thinking of it as "the training dataset" vs "the context window" is the wrong way of looking at it.

There's a bunch of prior art for adaptation techniques for getting new data into a trained model (fine-tuning, RLHF, etc.). There's no real reason to think there won't be more techniques that turn what we think of now as the context window into something that alters the weights in the model and is serialized back to disk.


It's a reasonable way to look at it, given that's how pretty much all deployed LLMs work.


Exactly.

But also, it's not just ASOIAF that's in the training set, but presumably also lots of discussion about it and all the interesting events in the books.


"The Magical Number Seven, Plus or Minus Two" [1] applies to many things. In this case, a book could reasonably be reasoned about almost no matter its length, as you argue, given that the scope of a book is usually limited to a few topics (the French invasion of Russia). Similarly, three to seven chapters could be retold, but not all 361 chapters. A 580,000-word book about 58,000 things would be infeasible for a human, but probably feasible for an LLM with a 580k context.

That, in essence, I believe, is the difference. An LLM (while still predicting the next word given its context) seems to care less about the number of subjects in a given context than humans do.

[1]: https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus...


>Many people could reason about the book in its entirety

That's analogous to having War and Peace in the training set. When actually reading War and Peace, nobody's working memory includes a precise recall of everything they've read thus far, which is more analogous to an LLM's context size.


Realistically, you could not quote a passage from the chapter directly, and you don't really need to anyway.

Summarizing chapters with the very same LLM and including that as context may very well get you far.



