I have the entirely unrefined notion that, surely, lack of data is not what is keeping us from creating much, much better LLMs.
I understand that, with how training is done right now, more data makes things scale really well without having to come up with new concepts, but it seems completely obvious that better processing of already available knowledge is the way to make the next leaps. The idea is that what is keeping me from having expert-level knowledge in 50 different fields, and using that knowledge to draw entirely new connections between all of them, in addition to understanding where things go wrong, is not a lack of freely available expert-level information.
And yet, GPT4 barely reaches competency. It feels like computers should be able to get much more out of what is already available, especially when leveraging cross-discipline knowledge to inform everything.
Yeah, no person has ever read anything like every textbook ever written, but that's pretty much table stakes for training sets. Clearly there's something missing aside from more virtual reading. (I suspect it has something to do with the half a billion years of pre-training baked into the human neural architecture and the few extra orders of magnitude in scale but who knows)
I'm sure I've read about specialized neural networks being created. The human brain has (apparently) a bunch of different kinds of neurons in it that specialize in processing different information. I'm not sure how that would work with our current architectures, though.
Well Jeff Hawkins has been working on this for a while, in terms of biomimetic neural networks. They've done some great work but they don't have anything like modern language models in terms of abilities + performance.
>It feels like computers should be able to get much more out of what is already available
I mean, why? It took millions of years of optimization for humanity to get to the competence level it's currently at. If you think you're "starting from scratch", you really aren't. Keep in mind LLMs can use significantly less data (but still a lot) when you're not trying to force-feed them the sum total of human knowledge.
So should they be able to get more out of it? Or is this par for the course for NNs?
That 600MB is the result of millions of years of optimization. For every human genome that exists today there are many other genomes which were tried and discarded over the years.
It also contains a remarkable amount of compression. Even if you assume that a genome contains a complete template for a human (it likely doesn't) the fact that the compressed version is 600MB doesn't really relate to the unpacked amount of information. Especially since the compression has seen millions of years of optimization.
Because humans with less language data outperform LLMs with more language data.
This says either that we need better models, not more data,
or that the human ability to be multi-modal augments our ability to perform language tasks, in which case we need to pump LLMs with much more image and video input than we currently do.
The point I'm making is that humans do not in fact have "less language data". We're heavily predisposed to learning languages. We don't start with random weights.
You're not making much sense here. The better init comes from training and little else.
GPT needing lots of training data doesn't mean we need a better architecture. You would expect it to need a lot of training because humans have a lot of training too, spanning millions of years.
Evolution has determined a good architecture. The weight training is then just the final tweak to get everything running smoothly.
No reason beyond compute we couldn't do something similar, i.e. find good architectures by evaluating them using multiple random weights, and evolve those architectures that on average give the best results.
Then over time add a short training step before evaluating.
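A rough sketch of that idea (toy code, not any established method; the task, the mutation scheme, and helpers like evaluate/mutate are all made up for illustration): treat the architecture as just a tuple of hidden-layer widths, score each candidate by averaging its loss over several random weight draws, keep the best few, and mutate them. The short training step could later be slotted in right before the loss is measured.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 8))
    y = np.sin(X.sum(axis=1, keepdims=True))        # toy regression target

    def random_weights(sizes):
        dims = [X.shape[1]] + list(sizes) + [1]
        return [rng.normal(scale=0.3, size=(a, b)) for a, b in zip(dims, dims[1:])]

    def forward(x, weights):
        # Plain MLP forward pass; the "architecture" is just the layer widths.
        h = x
        for W in weights[:-1]:
            h = np.tanh(h @ W)
        return h @ weights[-1]

    def evaluate(sizes, draws=5):
        # Average loss over several random initialisations, as suggested above.
        # (A brief gradient-descent step could be added here before measuring.)
        losses = [np.mean((forward(X, random_weights(sizes)) - y) ** 2)
                  for _ in range(draws)]
        return float(np.mean(losses))

    def mutate(sizes):
        # Tweak one layer width, or occasionally add/drop a layer.
        s = list(sizes)
        if rng.random() < 0.2 and len(s) > 1:
            s.pop(rng.integers(len(s)))
        elif rng.random() < 0.2:
            s.insert(rng.integers(len(s) + 1), int(rng.integers(4, 64)))
        else:
            i = rng.integers(len(s))
            s[i] = max(2, int(s[i]) + int(rng.integers(-8, 9)))
        return tuple(s)

    population = [tuple(rng.integers(4, 64, size=rng.integers(1, 4))) for _ in range(12)]
    for generation in range(10):
        ranked = sorted(population, key=evaluate)
        survivors = ranked[:4]                      # keep the best architectures
        population = survivors + [mutate(survivors[rng.integers(len(survivors))])
                                  for _ in range(8)]
        print(generation, survivors[0])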
Is this true? My understanding is that people are born with many pre-trained weights. Was the evolutionary convergence of those weights not itself training?
> Was the evolutionary convergence of those weights not itself training?
No, inductive biases are not training.
I'm saying that better models (i.e. better inductive biases) or non-language data is needed to advance LLMs, and somehow we've arrived at "evolution is training." I'm not sure how that's relevant to the point.
Evolutionary training is not just inductive bias. They're not comparable at all lol.
And the more inductive bias we've shoved into models, the worse they've performed. Transformers have a lot less bias than either RNNs or CNNs and are better for it. Same story with what preceded both.
You're comparing humans - a multimodal model with billions of years of training epochs - to a unimodal language model that's been around for a few months.
Not to mention no hardware constraints and a boundless energy supply through the food chain. Remember, for most of those billions of years we were not human and were low-powered.
Millions of years of evolutionary computation is a fairly small amount of computational time. LLMs also benefit from decades of neurological computation in that their structure was invented and optimized by humans, which is already orders of magnitude faster than evolution.
I've found that Google's chat thing is wrong 90% of the time with coding questions. Yesterday I asked how to "crop" a geopandas dataframe to a specific area of interest, a lat/lng box, and it told me to use a dataframe function that's not even in the API. The "highest probability string" is useless if it's just dead wrong.
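For what it's worth, geopandas can do that crop directly: the .cx coordinate indexer slices a GeoDataFrame by bounding box, and .clip cuts geometries to an explicit box. (The file name and coordinates below are made-up placeholders.)

    import geopandas as gpd
    from shapely.geometry import box

    gdf = gpd.read_file("places.geojson")          # hypothetical input file

    # .cx slices by bounding box in the data's own coordinates: [xmin:xmax, ymin:ymax].
    # For WGS84 data that is [lng_min:lng_max, lat_min:lat_max].
    cropped = gdf.cx[-74.3:-73.7, 40.5:40.9]

    # Or clip to an explicit box, which cuts geometries at the edge instead of
    # just selecting the rows that intersect it.
    clipped = gdf.clip(box(-74.3, 40.5, -73.7, 40.9))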
ChatGPT today is like a backhoe compared to a team of humans with shovels. You still need a person who knows how to operate it, and their skills are different from those who dig with shovels. A bad backhoe operator is worse than any number of humans with shovels.
Pretty soon it will be able to learn by running its own code and testing it by looking at its output, including with its "vision."
That is very interesting. I can't think of a single time the Google built-in LLM has worked for me, let alone surprised and delighted me with a technical answer. I'm sure it's great at a lot of things, but it's not a replacement for SO yet.
Oh sorry you said Google. Yes I am speaking of ChatGPT, and I pay for GPT-4. It surprises and delights me on a regular basis. I have no doubt Google will catch up, but right now I think OpenAI is far out front.
I paid for ChatGPT for a while, but it was hit or miss with some Django stuff. I tried Copilot for the first time today, and I was absolutely blown away. I swear it's like it was reading my mind. I guess I wasn't feeding ChatGPT enough context.
The remarkable thing about the current LLMs is that they're usable at all. For as much pushback as the idea seems to get, they really are a lot more like Markov chain generators than expert systems.
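For anyone who hasn't played with one, a word-level Markov chain generator is only a few lines; the contrast with an expert system is that there are no hand-written rules, just "which word tends to follow these words". (Toy corpus and order chosen arbitrarily.)

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        # Map each `order`-word prefix to the words observed to follow it.
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, order=2, length=30):
        out = list(random.choice(list(chain)))      # start from a random prefix
        for _ in range(length):
            followers = chain.get(tuple(out[-order:]))
            if not followers:                       # dead end: no continuation seen
                break
            out.append(random.choice(followers))
        return " ".join(out)

    corpus = "the cat sat on the mat and the dog sat on the rug and the cat slept"
    print(generate(build_chain(corpus)))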
I think you’re largely right, and that current GPT results may over-represent the model’s learning ability.
A couple of speakers from Microsoft at the MPPC 2023 this week indicated that OpenAI’s models were not merely exposed to e.g. poetry, programming, etc. and learned those fields.
Rather they were saying that the model is more of a composite of skills that were specifically trained, building on word identification, sentences, grammar, ultimately moving on to higher order skills.
Perhaps this isn't a secret (or perhaps I misunderstood), but it means the model's ability to perform self-directed learning is much less than I previously thought.
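If I followed the speakers correctly, that is essentially curriculum-style training. A heavily stubbed sketch of the staging, just to make the idea concrete (the stage names, corpora, and train_one_stage below are my own placeholders, not anything Microsoft or OpenAI actually described):

    # Toy stand-ins: each "corpus" is a list of strings and training is a stub.
    word_corpus     = ["cat", "dog", "tree"]
    sentence_corpus = ["the cat sat", "the dog ran"]
    grammar_corpus  = ["she has gone", "they have gone"]
    skill_corpus    = ["def add(a, b): return a + b", "Shall I compare thee..."]

    def train_one_stage(state, corpus):
        # Stand-in for real fine-tuning; the point is only that the SAME state
        # is carried forward, each stage building on the previous one.
        for example in corpus:
            state[example] = state.get(example, 0) + 1
        return state

    model_state = {}
    for name, corpus in [("words", word_corpus), ("sentences", sentence_corpus),
                         ("grammar", grammar_corpus), ("higher-order skills", skill_corpus)]:
        model_state = train_one_stage(model_state, corpus)
        print("finished stage:", name)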
That sounds like premature optimization. In my opinion both things should happen in tandem. GPT4 is way, way above basic competency; I have no idea what you're referring to.
By "competent", I mean pretty much what you would expect when you talk about a "competent programmer": A somewhat vague concept, yet fairly obvious when working with someone who whats up.
If you would judge GPT4 to be a competent programmer, your experience is wildly different from mine. (I am not sure why you felt the need to put a "basic" in there in reference to what I wrote, since that is not what I wrote.)
Skill-wise it is on the level of a novice programmer, but the breadth of knowledge definitely compensates. It knows XPath as well as SQL as well as your favorite esoteric language.
It's pretty good for me. It's saved me literally thousands of hours of work already. I ran a bunch of problems from Leetcode through it, and it got most of them right.
Here's the b+tree implementation it gave me. I haven't checked if it's right. But, I was just curious what it'd come up with.