
So you’re saying the solution is to prefix each training batch with a description of a sensory experience (You read the following in a Paris cafe in 1997. While you read, you have an excellent baguette, some boiled eggs, and over-roasted coffee. The woman one table over is wearing a beautiful blue hat) and then post-train the final model into recalling the setting where it read any piece of text, or failing to recall any experience when presented with text it didn’t read?

(If someone tries this and it works, I’m quitting my PhD and going back to camp counseling)
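
For concreteness, a minimal sketch of the data-prep step being joked about, assuming a plain-text corpus; the setting strings and names (SETTINGS, make_training_example) are invented for illustration, not from any real pipeline:

    import random

    # Prepend a fabricated "sensory context" to each training document, so a
    # post-training stage could later be asked to recall the setting attached
    # to any text the model has seen.
    SETTINGS = [
        "You read the following in a Paris cafe in 1997, over a baguette, "
        "boiled eggs, and over-roasted coffee.",
        "You read the following in a library basement at 2 a.m., under a "
        "flickering fluorescent light.",
    ]

    def make_training_example(document: str) -> str:
        """Prefix a document with a randomly chosen sensory 'experience'."""
        return f"{random.choice(SETTINGS)}\n\n{document}"

    if __name__ == "__main__":
        print(make_training_example("The mitochondria is the powerhouse of the cell."))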




I don't think that's what they're saying at all. They're not talking about qualia in the human sense, but specifically about "the qualia of their own training": the corpus that LLMs "learn" from and the "experience" of those texts that gets generalized during the training process. Both the raw data and the memory of the "learning" are discarded.

So if one were to improve an LLM along those lines, I believe it would be something like: 1) LLM is asked a question. 2) LLM comes up with an initial response. 3) LLM retrieves the "learning" history behind that answer and the related portions of the corpus. 4) LLM compares the initial answer against that richer set of information, looking for conflicts or "learning" choices that may be false. 5) LLM generates a better answer and gives it. 6) LLM incorporates this new "learning".

And that strikes me as a pretty reasonable long-term approach, if not one that fits within the constraints of the current gold rush.
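
A hedged sketch of that loop, with every name hypothetical: no deployed LLM exposes its per-example training history, which is the crux of the proposal. The stubs stand in for a completion model and for a (currently nonexistent) index from answers back to the corpus passages and training events behind them.

    from dataclasses import dataclass, field

    def stub_llm(prompt: str) -> str:
        """Stand-in for any text-completion model."""
        return f"[completion for: {prompt[:40]}...]"

    @dataclass
    class TrainingIndex:
        """Hypothetical store mapping answers to their 'learning' history."""
        log: list = field(default_factory=list)

        def lookup(self, text: str) -> str:
            return "[corpus passages and training events behind the draft]"

        def record(self, question: str, answer: str) -> None:
            self.log.append((question, answer))

    def answer_with_provenance(question: str, llm=stub_llm, index=None):
        index = index or TrainingIndex()
        # Steps 1-2: take the question, produce an initial response.
        draft = llm(f"Q: {question}\nA:")
        # Step 3: retrieve the "learning" history and related corpus passages.
        evidence = index.lookup(draft)
        # Step 4: compare the draft against the richer record, flagging
        # conflicts or "learning" choices that may be false.
        critique = llm(
            f"Draft: {draft}\nSources: {evidence}\n"
            "List conflicts between the draft and the sources."
        )
        # Step 5: generate a better answer informed by the critique.
        final = llm(
            f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
            "Write a corrected answer."
        )
        # Step 6: incorporate the new "learning" (here, just logged).
        index.record(question, final)
        return final

    if __name__ == "__main__":
        print(answer_with_provenance("Why is the sky blue?"))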


So...reinforcement learning?



