
I have the entirely unrefined notion that, surely, lack of data is not what is keeping us from creating much, much better LLMs.

I understand that with how training is done right now, more data makes things scale really well without having to come up with new concepts, but it seems completely obvious that better processing of already available knowledge is the way to make the next leaps. The idea is that what keeps me from having expert-level knowledge in 50 different fields, and from using that knowledge to draw entirely new connections between all of them (in addition to understanding where things go wrong), is not a lack of freely available expert-level information.

And yet, GPT4 barely reaches competency. It feels like computers should be able to get much more out of what is already available, especially when leveraging cross-discipline knowledge to inform everything.




Yeah, no person has ever read anything like every textbook ever written, but that's pretty much table stakes for training sets. Clearly there's something missing aside from more virtual reading. (I suspect it has something to do with the half a billion years of pre-training baked into the human neural architecture and the few extra orders of magnitude in scale but who knows)


People do analyze gigantic amounts of data constantly, though. Sure, it's not textbooks, but the constant sensory data might be a big deal.


I don't think annotated video datasets tend to be as large relative to human experience, but they'd still be decades or perhaps centuries of video.


I'm sure I've read about specialized neural networks being created. The human brain has (apparently) a bunch of different kinds of neurons in it that specialize in processing different information. I'm not sure how that would work with our current architectures, though.


Well Jeff Hawkins has been working on this for a while, in terms of biomimetic neural networks. They've done some great work but they don't have anything like modern language models in terms of abilities + performance.

https://www.youtube.com/watch?v=cz-3WDdqbj0&list=PLX9JDz3uBp...


In addition, for animals/humans there are no software/hardware boundaries, nor any kind of static, fixed deployment of either hardware or software.


>It feels like computers should be able to get much more out of what is already available

I mean, why? It took millions of years of optimization for humans to get to the competence level they're currently at. If you think you're "starting from scratch", you really aren't. Keep in mind LLMs can get by with significantly less data (though still a lot) when you're not trying to force-feed them the sum total of human knowledge.

So should they be able to get more out of it? Or is this par for the course for NNs?


> If you think you're "starting from scratch", you really aren't.

Our genomes have ~600MB, so where are you claiming that humans transmit millions of years of language optimization between generations?


That 600MB is the result of millions of years of optimization. For every human genome that exists today there are many other genomes which were tried and discarded over the years.

It also involves a remarkable amount of compression. Even if you assume that a genome contains a complete template for a human (it likely doesn't), the fact that the compressed version is 600MB doesn't really tell you the unpacked amount of information, especially since the compression itself has seen millions of years of optimization.


You could store a pretty gigantic graph using 600MB.

They're not saying we store the weights, they're saying that we store the "architecture" that you overlay the weights on with training.


Because humans with less language data outperform LLMs with more language data.

This either says we need better models, not more data.

Or, the human ability to be multi-modal augments our ability to perform language tasks, in which case we need to pump LLMs with much more image and video input than we currently do.


The point I'm making is that humans do not in fact have "less language data". We're heavily predisposed to learning languages. We don't start with random weights.

GPT has no such predisposition.


Better init and architecture aren't what people think of when they think of "giving models more data" - they mean a larger training set.


You're not making much sense here. The better init comes from training and little else.

GPT needing lots of training data doesn't mean we need a better architecture. You would expect it to need a lot of training because humans have a lot of training too, spanning millions of years.


Init means the distribution of weights prior to training.

Human training begins at birth.

Evolution might result in better architecture and init (inductive biases), but that's a separate thing from training.


Evolution has determined a good architecture. The weight training is then just the final tweak to get everything running smoothly.

No reason beyond compute that we couldn't do something similar, i.e. find good architectures by evaluating them using multiple random weights, and evolve the architectures that on average give the best results.

Then over time add a short training step before evaluating.
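
A minimal toy sketch of that loop, just to make the idea concrete. Everything here is an illustrative assumption (a tiny tanh MLP, a sine-fitting task, naive mutation of hidden-layer widths); architectures are scored by averaging loss over several random weight draws rather than by training:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.linspace(-3, 3, 64).reshape(-1, 1)
    y = np.sin(X)

    def random_weights(widths):
        # One random weight draw for an MLP with the given hidden widths.
        sizes = [1] + list(widths) + [1]
        return [(rng.normal(0, 1, (a, b)), rng.normal(0, 1, b))
                for a, b in zip(sizes[:-1], sizes[1:])]

    def forward(x, weights):
        h = x
        for W, b in weights[:-1]:
            h = np.tanh(h @ W + b)
        W, b = weights[-1]
        return h @ W + b

    def fitness(widths, n_draws=5):
        # Score the architecture itself: average loss over several random
        # weight draws instead of training any particular set of weights.
        losses = [np.mean((forward(X, random_weights(widths)) - y) ** 2)
                  for _ in range(n_draws)]
        return -float(np.mean(losses))

    def mutate(widths):
        # Randomly add, drop, or resize one hidden layer.
        w = list(widths)
        op = rng.integers(3)
        if op == 0 and len(w) < 4:
            w.insert(rng.integers(len(w) + 1), int(rng.integers(2, 17)))
        elif op == 1 and len(w) > 1:
            w.pop(rng.integers(len(w)))
        else:
            i = rng.integers(len(w))
            w[i] = max(2, w[i] + int(rng.integers(-4, 5)))
        return w

    population = [[8] for _ in range(16)]
    for generation in range(20):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:4]  # keep the best architectures
        children = [mutate(parents[rng.integers(len(parents))])
                    for _ in range(len(population) - len(parents))]
        population = parents + children
        print(generation, parents[0], round(-fitness(parents[0]), 4))

Swapping the random-draw scoring inside fitness() for a few quick gradient steps would be the "short training step before evaluating" mentioned above.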


> Human training begins at birth.

Is this true? My understanding is that people are born with many pre-trained weights. Was the evolutionary convergence of those weights not itself training?


> Was the evolutionary convergence of those weights not itself training?

No, inductive biases are not training.

I'm saying that better models (i.e. better inductive biases) or non-language data is needed to advance LLMs, and somehow we've arrived at "evolution is training." I'm not sure how that's relevant to the point.


Evolutionary training is not just inductive bias. They're not comparable at all lol.

And the more inductive bias we've shoved into models, the worse they've performed. Transformers have a lot less bias than either RNNs or CNNs and are better for it. Same story with what preceded both.


This is a good time to stop and ask, what point do you think I'm making?


Pretty much all of your assertions in this thread and https://news.ycombinator.com/item?id=37797108 in particular are what we're disagreeing with.


Right, I'm checking that you understand what point I'm making by asking you to reflect it.


You're comparing humans - a multimodal model with billions of years of training epochs - to a unimodal language model that's been around for a few months.


Let alone no hardware squeeze and a boundless energy supply through the food chain. Remember, for most of those billions of years we were not human and were low-powered.


Correct. That's a crucial part of the point I'm making.


Yes, but heavily inspired by the billion-year-old model?


Millions of years of evolutionary computation is a fairly small amount of computational time. LLMs also benefit from decades of neurological computation in that their structure was invented and optimized by humans, which is already orders of magnitude faster than evolution.


I've found that Google's chat thing is wrong 90% of the time with coding questions. Yesterday I asked how to "crop" a geopandas dataframe to a specific area of interest, a lat/lng box, and it told me to use a dataframe function that's not even in the API. The "highest probability string" is useless if it's just dead wrong.
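
(For reference, geopandas does handle this directly with its .cx coordinate indexer; a minimal sketch, where the file name and bounding-box coordinates are made up and the data is assumed to be in lon/lat:)

    import geopandas as gpd

    gdf = gpd.read_file("places.geojson")     # hypothetical input file

    # .cx selects rows intersecting a bounding box: [xmin:xmax, ymin:ymax]
    cropped = gdf.cx[-74.3:-73.7, 40.5:40.9]  # rough lng/lat box around NYC

    print(len(gdf), "->", len(cropped), "features inside the box")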


I've had very different results.

ChatGPT today is like a backhoe compared to a team of humans with shovels. You still need a person who knows how to operate it, and their skills are different from those who dig with shovels. A bad backhoe operator is worse than any number of humans with shovels.

Pretty soon it will be able to learn by running its own code and testing it by looking at its output, including with its "vision."


> I've had very different results.

That is very interesting. I can't think of a single time the Google built-in LLM has worked for me, let alone surprised and delighted me with a technical answer. I'm sure it's great at a lot of things, but it's not a replacement for SO yet.


Oh sorry you said Google. Yes I am speaking of ChatGPT, and I pay for GPT-4. It surprises and delights me on a regular basis. I have no doubt Google will catch up, but right now I think OpenAI is far out front.


I paid for ChatGPT for a while, but it was hit or miss with some Django stuff. I tried Copilot for the first time today, and I was absolutely blown away. I swear it's like it was reading my mind. I guess I wasn't feeding ChatGPT enough context.


Same. GPT-4 is amazing for a majority of coding tasks I throw at it.


With ChatGPT-4 I have stopped Googling and using SO for 95% of all programming related queries.

ChatGPT not only gets my specific problem but can produce workable code in many cases.


The remarkable thing about the current LLMs is that they're usable at all. For as much pushback as the idea seems to get, they really are a lot more like Markov chain generators than expert systems.
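
For anyone who hasn't played with one, a toy word-level Markov chain generator is only a few lines; the comparison is to this kind of "pick the next word given the last one" machine (the corpus here is just an illustrative stand-in):

    import random
    from collections import defaultdict

    corpus = "the cat sat on the mat and the cat ate the rat".split()

    # Table of which words followed which in the corpus.
    following = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev].append(nxt)

    word = "the"
    output = [word]
    for _ in range(10):
        word = random.choice(following[word] or corpus)  # fall back at dead ends
        output.append(word)
    print(" ".join(output))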


I think you’re largely right, and that current GPT results may over-represent the model’s learning ability.

A couple of speakers from Microsoft at the MPPC 2023 this week indicated that OpenAI’s models were not merely exposed to e.g. poetry, programming, etc. and learned those fields.

Rather, they were saying that the model is more of a composite of skills that were specifically trained, building on word identification, sentences, and grammar, ultimately moving on to higher-order skills.

Perhaps this isn’t a secret (or perhaps I misunderstood), but it means the model’s ability to perform self-directed learning is much less than I previously though.


That sounds like premature optimization. In my opinion both things should happen in tandem. GPT4 is way, way above basic competency; I have no idea what you're referring to.


By "competent", I mean pretty much what you would expect when you talk about a "competent programmer": A somewhat vague concept, yet fairly obvious when working with someone who whats up.

If you would judge GPT4 to be a competent programmer, your experience is wildly different from mine. (I am not sure why you felt the need to put a "basic" in there in reference to what I wrote, since that is not what I wrote.)


It is at the level of a novice programmer in terms of skill, but the breadth of knowledge definitely compensates. It knows XPath as well as SQL as well as your favorite esoteric language.


GPT4 is more than a competent programmer. It’s way, way above even a rockstar dev.


I couldn’t get it to write a working B+tree implementation in Python. There was always some bug that would make it fail at some point.

It’s good, don’t get me wrong, but if you go deep it’s usually incorrect somewhere.


It's pretty good for me. It's saved me literally thousands of hours of work already. I ran a bunch of LeetCode problems through it, and it got most of them right.

Here's the B+tree implementation it gave me. I haven't checked if it's right; I was just curious what it'd come up with.

https://chat.openai.com/share/a582aa43-cca8-426a-a4de-f45fdb...


A lot of high-quality information and data is not public (and not even for sale).



