I have been trying to help folks understand the underlying mechanisms of these generative LLMs, so it's not such a surprise when we get wrong answers from them, by putting together some YouTube videos on the topic:
* [On the question of replacing Engineers](https://www.youtube.com/watch?v=GMmIol4mnLo)
* [On AI Plagiarism](https://www.youtube.com/watch?v=whbNCSZb3c8)
The consensus seems to be building now on HackerNews that there is a huge amount of over-hype. Hopefully these two videos help show some of the nuance behind why it's an over-hype.
That said, because language generation is probabilistic, a given transformer-based language model can be trained or fine-tuned to make fewer errors in a particular domain - so this is all far from settled.
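For anyone curious, here is a minimal sketch of what that kind of domain fine-tuning can look like with the Hugging Face `transformers` Trainer; the base checkpoint and the `my_domain_corpus.txt` file are placeholder assumptions, not anything from the videos.

```python
# Minimal sketch: fine-tune a small causal LM on an in-domain text corpus.
# "gpt2" and "my_domain_corpus.txt" are placeholders for a real checkpoint and dataset.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# One text example per line in the (hypothetical) domain corpus.
dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-ft", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```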
Long-term, I think we're going to see something closer to human intelligence from CNNs and other forms of neural networks than from transformers, which are really a poor man's NN. As hardware advances and NNs inevitably become cheaper to run, we will continue to see scarier and scarier A.I. -- I'm talking over a 10-20 year timeframe.
HN was always going to be overly pessimistic with regards to this stuff, so this was utterly predictable.
I work in this field & it almost pains me to see it come into the mainstream and watch all of the terrible takes that pundits contort it into, e.g. the LLM as a "lossy JPEG of the internet" (bad, but honestly one of the better ones).
The lossy JPEG/parrot framing helps capture the idea that there are both dangers and opportunities in LLMs. You can have fake or doctored images spread, or a parrot that swears at someone and causes unneeded conflict - but they can also be great tools and/or cute and helpful companions, as long as we understand their limitations.
The issue is that it doesn't just recreate things it was trained on; it generates novel content. There is no reason that novel pathways of "thought" (or whatever makes one comfortable) aren't emergent in a model under optimization & regularization.
This is what the "lossy compression" and "stochastic parrot" layperson models do not capture. Nonetheless, people will lap them up. They want a more comfortable understanding that lets them avoid having to question their pseudo-belief in souls and the duality of mind and body. Few in the public seem to want to confront the idea of the mind as an emergent phenomenon from interactions of neurons in the brain.
It is not simply regurgitating training data, as everyone seems to want it to be.
I think it's unfair and asinine to caricature sceptics as ignorant people in denial, holding on to some outdated idea of a soul. That's the sort of argument someone makes when they're so entrenched in their own views that they see nothing but their own biases.
Ask people to describe how they think the mind functions and you will very often get something very akin to a soul-like belief. Many, many people are not very comfortable with the mind as an emergent phenomenon. A straight majority of people in the US (and likely globally) believe in souls when polled; you are the one imputing the words "ignorant people in denial" onto my statement of why people find views to the contrary uncomfortable.
I understand that HN is a civil community. I don't think it is crossing the line to characterize people I disagree with as wrong and also theorize on why they might hold those wrong beliefs. Indeed, you are doing the same thing with my comment - speculating on why I might hold views that are 'asinine' because I see 'nothing but [my] own biases.'
I'm not saying it's not true of most people in the world, but that doesn't make it a constructive argument. And you didn't use the words "ignorant" and "denial", but they're reasonable synonyms for what you did say.
When I do the "same thing" I'm really saying that when you represent yourself as from the field, you might want to cultivate a more nuanced view of the people outside the field, if you want to be taken seriously.
Instead, given the view you presented, I'm forced to give your views the same credence I give a physicist who says their model of quantum gravity is definitely the correct one. I.e.: "sure, you'd say that, wouldn't you?"
I am providing a reason why "the public" might be uncomfortable around these ideas. You accuse me of misrepresenting the public's beliefs as ignorant and outdated when really the public has a nuanced view on this subject. I am merely taking the majority of people at their word when they are polled on the subject.
Most people believe in souls. Most people do not believe in minds as emergent out of interactions of neurons. I am not sure how to cultivate a more nuanced view on this when flat majorities of people say, when asked, that they hold the belief I am imputing to them.
Am I saying that this is where all skepticism comes from? No. Is it a considerable portion? Yes.
Being sceptical of ChatGPT is entirely reasonable, and there is plenty of room for discussion on exactly when we will hit the limits of scaling LLMs.
No one who has used ChatGPT more than a couple of times will argue in good faith that it is a "parrot", however, unless they have an extremely weird definition of "parrot".
I can easily falsify the accusation that "people underestimate transformers and don't see that they are actually intelligent" by defeating the best open-source transformer-based word embedding (at the time) with a simple TF-IDF based detector (this was back in September).
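For reference, a detector along those lines is just TF-IDF features feeding a linear classifier. The sketch below uses toy placeholder texts and labels, not the data, embedding baseline, or results from the original experiment.

```python
# Rough sketch of a TF-IDF based detector for model-generated text.
# The texts/labels below are toy placeholders, not the original experiment's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "went hiking this weekend, the trail was muddy but worth it",      # human-written (toy)
    "honestly no idea why the build broke, reverting for now",         # human-written (toy)
    "As an AI language model, I can provide a summary of the topic.",  # generated-looking (toy)
    "Certainly! Here is a detailed overview of the key points.",       # generated-looking (toy)
]
labels = [0, 0, 1, 1]  # 0 = human, 1 = model-generated

# TF-IDF over word n-grams feeding a linear classifier: no transformer required.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
detector.fit(texts, labels)

print(detector.predict(["Certainly! Here is an overview of the trail conditions."]))
```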
No, these things are not "emergent"; they are just rearranging numbers. It turns out you don't have to use a transformer or a neural network at all to rearrange numbers and create something even more "artificially intelligent" than one that does!
>No, these things are not "emergent"; they are just rearranging numbers.
This is a bad take. Most ways to "rearrange numbers" produce noise. That there is a very small subset of permutations that produce meaningful content, and the system consistently produces such permutations, is a substantial result. The question of novelty is whether these particular permutations have been seen before, or are perhaps simple interpolations of what has been seen before. I think it's pretty obvious that the space of possible meaningful permutations is much larger than what is present in the training set. The question of novelty, then, is whether the model can produce meaningful output (i.e. grammatically correct, sensible, plausible) in a space that far outpaces what was present in the training corpus. I strongly suspect the answer is yes, but this is ultimately an empirical question.
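A quick back-of-envelope for the size comparison, using assumed round numbers rather than measured ones (it only illustrates the size of the space, not whether the model actually covers it):

```python
# Rough arithmetic, with assumed round numbers, for why the space of possible
# meaningful sequences dwarfs any training corpus.
vocab_size = 50_000        # typical LLM vocabulary size (assumed)
seq_len = 20               # one short sentence, in tokens
all_sequences = vocab_size ** seq_len            # every possible 20-token string

meaningful_fraction = 1e-60                      # deliberately tiny guess at the coherent share
meaningful = all_sequences * meaningful_fraction

training_tokens = 2e12     # order of magnitude of a large training corpus (assumed)

print(f"possible 20-token sequences: ~1e{len(str(all_sequences)) - 1}")
print(f"'meaningful' ones under this pessimistic guess: ~{meaningful:.0e}")
print(f"tokens in a large training corpus: ~{training_tokens:.0e}")
```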
I haven't written anything substantial on the subject unfortunately. I do have some ideas rattling around so maybe this will motivate me to get them down.
I can tell that this conversation is not going to be super productive, so a few brief thoughts:
> I can easily falsify the accusation that "people underestimate transformers and don't see that they are actually intelligent"
I think you have an idiosyncratic definition of "falsify" compared to what most people might use. Setting aside messy definitions of "intelligent", which I think are value-laden, I see nothing in your blog post that falsifies the notion that LLMs can generate novel content (another fuzzy, value-laden notion, perhaps).
> these things are not "emergent"; they are just rearranging numbers.
It seems non-obvious to me why "rearranging numbers" cannot lead to anything emergent, yet cascading voltages (as in our brains) can.
>There is no reason that novel pathways of "thought" (or whatever makes one comfortable) aren't emergent in a model under optimization & regularization.
Please substantiate this assertion. People always just state it as a fact without producing an argument for it.
You're asking me to substantiate a negative - i.e. to identify, ahead of time, any possible reason someone might offer for why novel behavior cannot be emergent from a model under optimization, and then disprove it. This is a challenging task.
Our minds are emergent from the interactions of billions of neurons in our brains. Each neuron is individually pretty dumb, just taking in voltage and outputting voltage (to somewhat oversimplify). Out of that simple interaction, & under the pressure of evolutionary optimization, we get an emergent whole.
Linear transformations stacked with non-linearities are similarly dumb individual input-output units that, under the pressure of optimization, can lead to an emergent whole. If there is a reason why this has to be tied to a substrate of voltage-regulating neurons, I have yet to see a compelling one.
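To make "linear transformations stacked with non-linearities" concrete, here is a toy forward pass with arbitrary layer sizes and untrained weights; it only shows the shape of the computation, not any particular model.

```python
# Toy sketch: two affine maps with a non-linearity in between.
# Each layer alone is a dumb input-output unit; the interesting behavior comes
# from composing many of them and optimizing the weights (not done here).
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)   # layer 1: 8 -> 16 (arbitrary sizes)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)    # layer 2: 16 -> 4

def relu(x):
    return np.maximum(0.0, x)                     # elementwise non-linearity

def forward(x):
    h = relu(W1 @ x + b1)                         # linear transformation + non-linearity
    return W2 @ h + b2                            # another linear transformation

print(forward(rng.normal(size=8)))
```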
If we think of these tools as generating entirely novel content, then I'd suggest we're using them for the wrong thing here - we shouldn't be using them as a glorified (now literal) search engine at all; they should be exploring some other space entirely. If we discovered a brand-new sentient creature, we wouldn't immediately try to fill its head with all the knowledge on the internet and then force it to answer what the weather will be tomorrow.
True, I was overly grandiose. Regardless, we're taking something that can apparently generate new intellectual content, but we're using it as a beast of burden.