I wonder about this, too. OpenAI's biggest 'moat' is that their model takes so many resources to train, not that their algorithms are particularly secret.
One idea I had was to not use one single model to learn all steps of the task, but to break it up. The human brain has dedicated grammar processing parts. It is unclear whether something like a universal grammar exists, but we have at least an innate sense for rhythm. Applied to NLP, you could heavily preprocess the input: tokenize it, annotate parts of speech. Maybe add pronunciation, so the model doesn't have to think about weird English spelling rules, and so you can deal with audio more easily later. So I would build all these little expert-knowledge black boxes and offer them as input to my network.
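To make that concrete, here is a minimal sketch of what such a preprocessing front-end could look like, assuming the spaCy library for tokenization and part-of-speech tags and the pronouncing package (CMU pronouncing dictionary) for phonetics; the function name and output format are just illustrative:

    # Sketch only: assumes `pip install spacy pronouncing` and the small
    # English model (`python -m spacy download en_core_web_sm`).
    import spacy
    import pronouncing

    nlp = spacy.load("en_core_web_sm")

    def annotate(text):
        # one "expert black box" per column: tokens, POS tags, pronunciation
        rows = []
        for tok in nlp(text):
            phones = pronouncing.phones_for_word(tok.text.lower())
            rows.append({
                "token": tok.text,
                "pos": tok.pos_,                           # part of speech
                "arpabet": phones[0] if phones else None,  # pronunciation
            })
        return rows

    for row in annotate("Though the tough cough, they thought it through."):
        print(row)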
But there is also some inherent resource cost in large language models. If you want to store and process the knowledge of the world, it is going to be expensive no matter what. Maybe we could split the problem into two parts: Understanding language, and world knowledge (with some messy middle ground). I believe you could replace the world knowledge with a huge graph database or triple store. Not just subject-verb-object, but with attribution and certainty numbers for every fact. The idea would be to query the database at inference time. I don't know how to use this in conjunction with a transformer network like GPT-3, so you'd likely need a very different architecture.
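As a sketch of what I mean by "not just subject-verb-object" (all facts, sources, and numbers below are made up for illustration):

    from dataclasses import dataclass

    @dataclass
    class Fact:
        subject: str
        predicate: str
        obj: str
        source: str       # attribution: where the fact came from
        certainty: float  # 0.0 - 1.0: how sure we are it holds

    store = [
        Fact("water", "boils_at_celsius", "100", "chemistry_textbook", 0.99),
        Fact("water", "boils_at_celsius", "90",  "random_blog",        0.40),
    ]

    def query(store, subject, predicate):
        # what the language front-end would call at inference time
        return [f for f in store
                if f.subject == subject and f.predicate == predicate]

    print(query(store, "water", "boils_at_celsius"))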
The big benefit of this would be that you could train the language part without the world knowledge part, with far fewer resources. But you have other benefits, too. ChatGPT is trained to "win the language game". But as they say, winning the argument does not make you right. If you have a clean fact database, you can have it weigh statements from trustworthy sources higher. You then basically have a nice natural language frontend to a logical reasoning system that can respond with facts (or better: conclusions).
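The weighting step could be as simple as this toy resolution rule (source names and trust values invented for illustration):

    # Each candidate: (claim, source, stored certainty). TRUST is a prior
    # on sources; both tables are made up.
    TRUST = {"chemistry_textbook": 0.95, "random_blog": 0.30}

    candidates = [
        ("water boils at 100 C at sea level", "chemistry_textbook", 0.99),
        ("water boils at 90 C",               "random_blog",        0.80),
    ]

    def resolve(candidates, trust, default=0.5):
        # back the claim whose certainty times source trust is highest
        return max(candidates, key=lambda c: c[2] * trust.get(c[1], default))

    print(resolve(candidates, TRUST))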
GPT and the human brain (at least the language/speech part) have nothing in common. We, as humans, do not use language in a generative way: it is derived from a higher or very low level of abstraction (intentions, emotions, etc.) and is explicitly used to communicate something. Even this text is based on previous knowledge, saved in an abstract way, and while writing this I must follow the syntax of the language and the right word order; otherwise you, the person reading this, will not understand what I mean. While GPT can generate the same text, it has no motivation and no need to communicate (while I just wanted to feel good by bringing some contribution to HN).
> and while writing this I must follow the syntax of the language and the right word order; otherwise
A good example that is not, word randomised order and kombination with Mrs Spelling and fonetic spel-ing prevent ye knot that which I wrote you to komprehend.
(My apologies to non-native speakers of English; if someone did that to me in German I'd have no clue what was meant).
A better point is that GPT-3's training set contains more tokens than the number of times an average human synapse fires in a lifetime, squeezed into a network with about three orders of magnitude fewer parameters than the human brain has synapses.
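Rough numbers, if anyone wants to check the arithmetic; everything here is an order-of-magnitude estimate (~300B training tokens from the GPT-3 paper, ~1e14 synapses, and an assumed average firing rate of ~1 Hz):

    tokens_gpt3    = 3e11                 # ~300B training tokens (GPT-3 paper)
    params_gpt3    = 1.75e11              # 175B parameters
    synapses_human = 1e14                 # common estimate; some put it at 1e15
    firing_rate_hz = 1.0                  # assumed average rate per synapse
    lifetime_s     = 70 * 365 * 24 * 3600 # ~2.2e9 seconds

    print(tokens_gpt3 / (firing_rate_hz * lifetime_s))  # ~140x more tokens than firings
    print(synapses_human / params_gpt3)                 # ~570x (~5700x with the 1e15 estimate)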
It's wrong to model AI as anything like natural intelligence, but if someone insists, my go-to comparison (with an equivalent for image generators) is this: "Imagine someone made a rat immortal, then made it browse the web for 50,000 years. It's still a rat, despite being very well-trained."
> (My apologies to non-native speakers of English; if someone did that to me in German I'd have no clue what was meant).
At least for me it's perfectly understandable (except the "Mrs" part). This reminds me of those "did you know you can flip characters randomly and our brain can still understand the text" copypastas that can be found everywhere. I think it's probably quite similar for word order: as long as your sentence structure is not extremely complicated, you can probably get away with changing it any way you like. Just like nobody has issues understanding Yoda in Star Wars.
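You can generate that kind of text yourself; the usual trick in those copypastas is to keep the first and last letter of each word and shuffle the rest, roughly like this:

    import random

    def scramble(word):
        # keep first and last letter, shuffle the inner ones
        if len(word) <= 3:
            return word
        inner = list(word[1:-1])
        random.shuffle(inner)
        return word[0] + "".join(inner) + word[-1]

    text = "the order of the inner letters barely matters to a reader"
    print(" ".join(scramble(w) for w in text.split()))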
Although I think there are some limits to changing word order - I can imagine complicated legal documents might get impossible to decipher if you start randomizing word order.
These are conceptual "differences" that don't actually explain the mechanics of what's going on. For all you know "motivation", "intentions", etc. are also just GPT-like subsystems, in which case the underlying mechanics are not as different as you imply.
That's the hardware it runs on, not the software architecture of GPT. I could equally say that transistors are faster than synapses by the same ratio that marathon runners are faster than continental drift.
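Back-of-envelope, with rough assumed rates (~GHz transistor switching, ~1 Hz average synaptic firing, ~3 m/s marathon pace, ~3 cm/year drift), the two ratios really do land in the same ballpark:

    transistor_hz = 1e9                   # ~GHz switching
    synapse_hz    = 1.0                   # ~1 Hz average firing, rough
    runner_mps    = 3.0                   # ~marathon pace
    drift_mps     = 0.03 / (365*24*3600)  # ~3 cm/year continental drift

    print(transistor_hz / synapse_hz)     # ~1e9
    print(runner_mps / drift_mps)         # ~3e9, same order of magnitude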
It seems to me that a lot of everyday communication is rather statistical in nature. We don’t necessarily think deeply about each word choice but instead fall back on well worn patterns and habits. We can be more deliberate about how we compose our sentences but most situations don’t call for it. It makes me wonder if we don’t all have a generative language model embedded in our brains that serves up the most likely next set of words based on our current internal state.
Here we go again. They must have something in common, because for about 90% of the tasks the language model agrees with humans, even on novel tasks.
> We, as humans, do not use language in a generative way
Oh, do you want to say we are only doing classification from a short list of classes and don't generate open ended language? Weird, I speak novel word combinations all the time.
No, what is meant is that the next word I speak/write after the current word is not based on a statistical model, but on a world model which includes a language structure based on a defined syntax and cultural variety. I actually mean what I say, while ChatGPT just parrots around weights and produces an output based purely on statistics. There is zero modeling which translates into the real world (what we normally call "understanding" and "experience").
Oh, I see. Then I agree with you, an isolated model can't do any world modelling on its own. No matter how large it is, the real world is more complex.
It might be connected to the world, of course. And it might even use toys such as simulators, code execution, math verification and fact checking to further ground itself. I was thinking about the second scenario.
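For the math-verification case, the grounding loop could be as dumb as recomputing the model's arithmetic claims before accepting them (toy sketch; everything here is made up):

    # Toy verifier: check an arithmetic claim by just computing it.
    def verify_arithmetic(expression, claimed_value):
        # eval is fine for this toy; never use it on untrusted input
        return eval(expression) == claimed_value

    claims = [
        ("17 * 23", 391),  # correct
        ("17 * 23", 401),  # plausible-sounding hallucination
    ]

    for expr, claimed in claims:
        verdict = "grounded" if verify_arithmetic(expr, claimed) else "rejected"
        print(f"{expr} = {claimed}: {verdict}")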
The more experience I get, the more I wonder if this is really the case for us. We certainly have some kind of abstract model in our heads when thinking deeply about a problem. But in many settings - in a work meeting, or socially with friends - I think it is a much more automatic process. The satisfaction you get when saying the right thing, the dread when you say something stupid: It is just like playing a game. Maybe the old philosophical concept of society as merely "language games" is correct after all. A bit silly but I find the thought makes annoying meetings a bit more bearable.
But you are of course right with GPT, it has no inner life and only parrots. It completely lacks something like an inner state, an existence outside of the brief moment it is invoked, or anything like reflection. Reminds me of the novel "Blindsight" (which I actually haven't read yet, but heard good things about!) where there are beings that are intelligent, but not conscious.
Their biggest moat is high-quality data: both their proprietary datasets (WebText, WebText2, etc.) and, now, their human-annotated data. A secondary moat is their expertise in training models using PPO (their RL method); they can get noticeably better results than other labs. I say this moat is secondary because it's possible you can get similar results with other RL algorithms (e.g. DeepMind using MPO), and because maybe you don't really need RL from Human Feedback at all, and just fine-tuning on instructions is enough.
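For anyone unfamiliar with the term, the core of PPO is the clipped surrogate objective from Schulman et al. 2017; a minimal version looks like this (the generic objective, not OpenAI's actual training code):

    import numpy as np

    def ppo_clip_loss(ratio, advantage, eps=0.2):
        # ratio = pi_new(action) / pi_old(action) for sampled tokens;
        # advantage comes from the reward model minus a value baseline
        unclipped = ratio * advantage
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
        # maximize the pessimistic (elementwise min) estimate
        return -np.minimum(unclipped, clipped).mean()

    # made-up numbers, just to show the shapes
    print(ppo_clip_loss(np.array([1.3, 0.7]), np.array([1.0, -0.5])))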
I find OpenAI having exclusive access to that kind of high-quality data more concerning than their current amount of compute and currently trained models. A couple of million dollars' worth of compute is within reach of any medium-sized research university, larger company, or any country worth mentioning. And since Moore's law still applies to GPUs, the cost will only fall.
However, high-quality data is scarce. I would be willing to fund a proper effort to create high-quality data.
It's not just about compute; if that were the case, then models like BLOOM and OPT, which also have 175 billion parameters, would perform as well as GPT-3 for real-world use cases, but they don't. Datasets are also very important.