Ask HN: Anyone working on something better than LLMs?
42 points by xucian 5 months ago | 30 comments
if you think about it, next-token prediction is just stupid. it's so resource intensive

yet, it's mimicking emergent thought quite beautifully. it's shockingly unintuitive how a simple process scaled enormously can lead to this much practical intelligence (practical in the sense that it's useful, but it's not the way we think). I'm aware there are multiple layers, filters, processes etc., I'm just talking about the foundation, which is next-token prediction.

when I first heard that it's not predicting words, but parts of words, I immediately saw a red flag. yes, there are compound words like strawberry (straw + berry) and you can capture meaning at a higher resolution, but most words are not compound, and just in general we're trying to simulate meaning instead of 'understanding' it. 'understanding' simply means knowing a man is to a woman what a king is to a queen, but without the need to learn about words and letters (that should be just an interface).

I feel we've yet to discover the "machine code" for ASI. it's like we have no compiler, but we directly interpret code. imagine the speed-ups if we could just spare the processor from understanding our stupid, inefficient language.

I'd really like to see a completely new approach working in the Meaning Space, which transcends the imperfect Language Space. This will require lots of data pre-processing, but it's a fun journey -- basically a human-to-machine and machine-to-human parser. I'm sure I'm not the first one to think about it

so, what have we got so far?




As others have noted, Yann LeCun is looking beyond autoregressive (next-token-prediction) models. Here’s one of his slide decks that raises some interesting new concepts:

https://drive.google.com/file/d/1BU5bV3X5w65DwSMapKcsr0ZvrMR...


The problem with his approach is, again, that the architecture is set in stone. If we keep on trying these intermediate architectures, we are never going to get to the real AI.

There are two areas that need focus. The first is hardware accessible for everyone to experiment on. This is being worked on (Tenstorrent, Tinycorp).

The second area needs to be research into how the compute graphs organize themselves. We shouldn't be hardcoding things like temporary memory or recurrence into the models. We should be starting with some baseline model and letting it discover constructs like memory, recurrence, etc. on its own. This is the only way to get true AI.

The human species got its level of intelligence through a genetic algorithm. It may turn out that this is the only way to really get a generic AI architecture. However, genetic algorithms in nature only operate on the performance scalar (i.e. survival/ability to reproduce), whereas simulating these models adds a layer of historical data (in nature, this would look like being able to pick gene groups for your kid that you know resulted in better performance in a certain area). So we may be able to arrive at this faster, but it requires a shitload of memory and availability of fast compute for everyone, which is point 1.
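To make "operating only on a performance scalar" concrete, here's a toy genetic-algorithm sketch in Python. The genome and fitness function are placeholders I made up for illustration, not anything tied to a real architecture search:

    import random

    # minimal GA sketch: selection acts only on a single fitness scalar
    def evolve(pop_size=50, genome_len=16, generations=100):
        def fitness(genome):
            return sum(genome)  # the "performance scalar" (toy stand-in)
        pop = [[random.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[: pop_size // 2]          # "survival": top half reproduces
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)    # two parents
                cut = random.randrange(1, genome_len)
                child = a[:cut] + b[cut:]             # crossover
                i = random.randrange(genome_len)
                child[i] ^= 1                         # point mutation
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)

    print(evolve())  # converges towards an all-ones genome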


I like this different angle, it goes to show how many unknown unknowns there are

I think most of us would agree that maximizing an architecture's flexibility, including self-improvement, will always be preferred

quite an exciting perspective I almost ignored. of course we want self-improving AI. Imagine preparing all of the hardware, computing power, and everything, and just planting that initial seed, basically telling it "here you are, we've perfected the seed of infinite ASI, now go and grow baby, this is all yours now". man, this is so exciting... wish I had no need for money so I could focus on just this for my entire life starting right now


thanks, I went through it and I like what I see. I'd have to iterate a few more times over it to fully grasp it. it's missing some analogies and examples that would let plebs like me understand it in a single iteration


> it's shockingly unintuitive how a simple process scaled enormously can lead to this much practical intelligence

A biological neuron doesn't do much. On its own, a simple process. Yet when you put 100 billion of them together in the right 1000-connected configuration you get a human brain.


good point, but replace neuron with perceptron and human brain with neural net

here's the catch: neurons work with over 100 types of neurotransmitters [1]; they're quite complex compared to a perceptron

which brings me to a tangent: I feel there's a breakthrough just around the corner, at the intersection of the 'dumb' on/off perceptron and a biologically-accurate simulated neuron. hardware will catch up too

[1] https://en.wikipedia.org/wiki/Neurotransmitter
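to show just how little a perceptron does compared to that, here's the whole thing in a few lines of python (the weights are hand-picked toy values, just for illustration):

    import numpy as np

    # a perceptron: weighted sum + threshold, nothing more
    def perceptron(inputs, weights, bias):
        return 1 if np.dot(inputs, weights) + bias > 0 else 0

    # hand-picked weights that make it compute logical AND
    print(perceptron(np.array([1, 1]), np.array([0.5, 0.5]), -0.7))  # -> 1
    print(perceptron(np.array([1, 0]), np.array([0.5, 0.5]), -0.7))  # -> 0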


biological neurons are only found in humans? what a waste.


> in general we're trying to simulate meaning instead of 'understanding' it. 'understanding' simply means knowing a man is to a woman what a king is to a queen, but without the need to learn about words and letters (that should be just an interface).

I have no idea what I'm talking about, but what you describe is exactly what LLMs do.

Words are tokens that represent concepts. We've found a way to express the relationships between many tokens in a giant web. The tokens are defined by their relationships to each other. Changing the tokens we use probably won't make much more difference than changing the language the LLM is built from.

We could improve the method we use to store and process those relationships, but it will still be fundamentally the same idea: Large webs of inter-related tokens representing concepts.


it's just that tokens are things like "tele", "vis", "ion", etc., represented by some numbers/vectors

but instead of that, we can have those vectors/numbers represent the concepts themselves. for example, the concept of "television" would be represented by a single vector, and guess what, you then also get translation capabilities by design, since 99% of the concepts are language-agnostic; and you (probably) also get smaller models, since they don't need to remember all the translations for every word. imagine the same parts of the NN working regardless of whether you talk to it in English or Japanese (after some interface LLM converts them to this "Meaning Space" I'm talking about, which isn't a trivial task, but it's a translation task more than a comprehension task). how cool is that?
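a toy sketch of what I mean -- everything here is made up just to illustrate the idea (the concept table, the vectors, the lookup), not a real system:

    import numpy as np

    # hypothetical "Meaning Space": one vector per concept, shared across languages
    CONCEPTS = {
        "TELEVISION": np.random.rand(8),
        "KING": np.random.rand(8),
    }

    # surface words from different languages all point at the same concept
    WORD_TO_CONCEPT = {
        "television": "TELEVISION",
        "télévision": "TELEVISION",   # French form, same concept vector
        "king": "KING",
        "roi": "KING",
    }

    def lookup_concept(word):
        return CONCEPTS[WORD_TO_CONCEPT[word]]

    assert np.array_equal(lookup_concept("king"), lookup_concept("roi"))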

I know LLMs are deceiving because they know a king is to a queen what a man is to a woman, but, to put it another way, they're just "saying what we want to hear", they're not "explaining the concept to us". I am really happy with where we've got, we're already extracting enormous value from them. I'm just excited about this completely new approach to arriving at the same destination, which, to me, seems to consume fewer resources while coming closer to what we define as 'understanding'

we need it for web4 (brainnet) anyway


I think you might find Lex Fridman's interview with Yann LeCun interesting [1]. It discusses exactly this: how LLMs just mimic intelligent behaviour but have no understanding of the world at all. It also discusses other approaches we should look at instead of current LLMs.

[1] https://youtu.be/5t1vTLU7s40



thanks, have postponed it for too long already. the time has come


> 'understanding' simply means knowing a man is to a woman what a king is to a queen,

Turns out this is beautifully represented by embeddings alone!
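If you want to see it for yourself, gensim's pretrained GloVe vectors make the analogy a one-liner (the model name below is one of the standard gensim-data downloads, fetched on first use):

    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-50")
    # vec("king") - vec("man") + vec("woman") lands nearest to "queen"
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
    # expect something like [('queen', 0.85...)]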


yes, I agree it's beautiful how basic vector addition/subtraction navigates you through some kind of meaning space, playing with relationships between things and such

actually, this is the same way the model in the Meaning Space would operate too. so the process is similar, it's not like we're reinventing the whole wheel; it's just that instead of referring to the words "king" and "queen" we'd refer to the actual concepts of a king and a queen, in a language-agnostic way, i.e. we wouldn't have to also embed their French translations "roi" and "reine" -- it'll be built-in, because a queen is a queen in most languages

I acknowledge there are differences even in fundamental concepts between countries/cultures, but that's not the norm. for example, in one language "to want" could just mean something passive, while in another "to want" might also imply taking some action towards it. I'm aware of these, but I think they're just infrequent nuances (that have to be dealt with later)


Meaning Space transcending the imperfect Language Space? Yes, there's been some recent thinking in this direction, e.g. Zhuangzi, "Words exist because of meaning. Once you've gotten the meaning, you can forget the words. Where can I find a man who has forgotten words so I can talk with him?"


recent as in 3rd century. :D

I'm sure I'm not the only one, but this is how I've felt my whole life. people trapping themselves in this sea of words, so many arguments could be solved if we could just connect our brains and see the concepts instead of hearing the words

words are at best shortcuts -- and I say 'at best' because they still don't fix the issue of 'for me love means X, but for you love means Y'

this is another reason I'd advocate for innovating towards this Meaning Space, it's also an easier segue to web4 (brainnet)


> 'understanding' simply means knowing a man is to a woman what a king is to a queen, but without the need to learn about words and letters (that should be just an interface).

Citation needed


agree, it's more of an "according to me" statement

I'm sure I can find citations for both cases: meaning causing language and language causing meaning, so it'd only artificially support my reasoning

but I strongly feel that thinking about it from first principles points more towards meaning causing language. an example is being able to imagine things, connections, processes: our ancestors wanted to convey whatever was in their brain to other members of the tribe (how big the lion that just ripped off their hand was, how fresh the berries were and where they are, how complex the sky became after they ate those weird mushrooms, what happens if he uses a sharp object instead of a rock and why that's safer, etc.). language evolved out of the need to convey what your brain is 'seeing'

but yes, language definitely influences our perception of concepts, and it even predisposes us to certain beliefs; it's like a cage we're trapped in our whole lives if we're not fortunate enough to come across the right books/articles that set us free

in English, you say "Brevity is the soul of wit", in Romanian you say "Lots of talking -- man's poverty" (rough translation). the first is more positive than the second, but they're basically pointing to the same idea


That stuck out to me too. There is a (not universally accepted) idea that language itself determines our capacity for understanding, called the Sapir-Whorf hypothesis. While it traditionally relates to humans, I've been casually working to apply it to LLMs as well.


see my reply above


> Yes, there are compound words like strawberry (straw + berry) and you can capture meaning at a higher resolution, but most words are not compound

What's really cool about tokenization is that it breaks down words based on how often parts of the word are used. This helps a lot with understanding different forms of words, like when you add "-ing" to a verb, make words plural, or change tenses. It's like seeing language as a bunch of building blocks.


agree, I think this is the best argument for autoregressive models, and why they are so 'unreasonably' good


Does anyone have a guess what angle John Carmack is working on with Keen Technologies? https://dallasinnovates.com/john-carmacks-keen-technologies-...


  "The important thing is about how an AI can digest its experience into its view of the world and how it predicts things going forward and how it needs to have motivations both internally and externally imposed."

  Those issues, he said, are not addressed by large language models, which don't address how our brains work. Yet every lab in the world, he said, is currently "throwing resources" at such models. [1]

interesting, thanks for sharing. this might be the guy. also, "Founder Id Software"? [2] ok, I'm hunting him down

[1] https://www.theregister.com/2023/09/26/john_carmack_agi/

[2] https://twitter.com/ID_AA_Carmack


Look no further than here for decoding more than one token at a time:

https://hao-ai-lab.github.io/blogs/cllm/
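As I understand it, the underlying trick is Jacobi-style decoding: draft a block of future tokens, then refine all positions in parallel until the block stops changing (the blog post's actual contribution is training the model to converge in far fewer of these iterations). A rough sketch, where next_token is a stand-in for a real model's greedy step on a token-id list:

    # toy sketch of Jacobi-style parallel decoding, not the actual CLLM code
    def jacobi_decode(prompt, n, next_token, max_iters=50):
        block = [0] * n                                          # arbitrary initial draft
        for _ in range(max_iters):
            new_block = [next_token(prompt + block[:i]) for i in range(n)]
            if new_block == block:                               # fixed point: matches sequential decoding
                return block
            block = new_block
        return block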


now I remember seeing this somewhere, perhaps in another thread?

it's nice to see progress. from what I understand, this is already actionable (https://github.com/hao-ai-lab/Consistency_LLM) and perhaps giga-models like gpt4 (gpt5?) are already being upgraded to use this as I'm writing

though this is still about token prediction. I actually foresee many more incremental improvements like this; we're in the "polishing" phase of LLMs. I don't think there'll soon be a ChatGPT-breakthrough moment within ChatGPT itself, just these incremental improvements + the expected gain from any energy-efficiency improvement + any improvement from just adding more tflops

xnx mentioned John Carmack, who seems to work on what I'm looking for. I'm really curious what this guy does


I don't think you quite understand how tokenization works. Try typing "strawberry" in here:

https://platform.openai.com/tokenizer

Tokens aren't just individual parts of compound words; they're sliced up in a way that's statistically convenient. The tokenizer has each individual character as a token, so it could be purely character-based if desired; it's just easier to compute when some common sequences like "berry" are represented by a single token. Try typing "strawberry" into the tokenizer and you'll see it tokenized as "str", "aw", and "berry".
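You can reproduce the same thing locally with OpenAI's tiktoken library (assuming the cl100k_base encoding here; other encodings may split differently):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("strawberry")
    print([enc.decode([i]) for i in ids])   # e.g. ['str', 'aw', 'berry']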

Also, next-token prediction is not stupid. A "sufficiently advanced" next-token predictor must be at least as intelligent as a human, if it could predict any human's next token in any scenario. Obviously, we're not there yet, but there's no reason to think right now that next-token prediction will face any sort of limitation. Especially with new models coming out that are seeing better performance purely from being trained much longer on the same datasets.


correct, I worked with the tokenizer quite a lot, and I should've clearly made the distinction between semantic splitting and this statistically-convenient splitting. I got carried away by the bigger picture. thanks for pointing this out, I would've updated the post if I could

I didn't say we should drop LLMs; your point about them still getting better and better is one reason why. they're very useful and we'll only ditch them after something better comes out


it's not stupid.


it's not stupid in the sense that it can't do smart & useful things, but in the sense that it works on fractions(1) of locale-dependent(2) symbols for meaning

(1) instead, it should work on whole symbols

(2) instead, it should be language-agnostic, i.e. it shouldn't need to remember both 'king' and its French translation 'roi', just a vector that represents the concept





