Discovering latent knowledge in language models without supervision (arxiv.org)
149 points by dayve on Dec 12, 2022 | 85 comments



Exciting times. The philosophical ramifications of the syntax/semantics distinction are not something people think much about, in the main. However, thanks to GPT et al., they soon will :)

More to the point, consistency will improve accuracy insofar as inconsistency is sometimes the cause of inaccuracy. However, being consistent is an extremely low bar. On a basic level, even consistency is a problem in natural language, where so much depends on usage -- it is near impossible to determine whether sentences are actually negations of each other in the majority of possible cases. But the real problem is truth assignment to valid sentences; otherwise we could all just speak Lojban and be done with untruth forever.


> syntax/semantics distinction

Reminds me of this riff on Clarke's third law I read somewhere: "Sufficiently advanced syntax is indistinguishable from semantics".

Whether one agrees with it or not, it may serve to spark an interesting discussion in the right environment.


> "Sufficiently advanced syntax is indistinguishable from semantics”

I hadn’t heard this before, but this is why mathematics works:

A sufficiently advanced symbolic manipulation (math) is “indistinguishable” from the semantics of some phenomenon — like gravitation.


Except the math doesn't exert an actual force on you. You won't feel the pull of the math if the symbols describe the conditions for a black hole. Your surroundings will remain the same.


> Except the math doesn't exert an actual force on you.

Because the math on the page doesn't have the logical relationships with your environment that define gravity, it will only have those logical relationships with other mathematical objects described in the same system on the page.

If you could project the mathematical structure from the page back into the environment, then the gravity would be felt. This is why simulated systems are those systems, but only within the simulated environment.


> If you could project the mathematical structure from the page back into the environment

That would be magic. And it assumes reality is mathematical or logical as opposed to those being descriptive human languages. We also don't know everything about gravity or spacetime (particularly in black holes), so our math is incomplete.


> That would be magic

Yes, which is why attempting to dismiss the notion that gravity is mathematical by stating that math on a page doesn't exert a force on you is not a valid argument.

> And it assumes reality is mathematical or logical as opposed to those being descriptive human languages

Let's run an experiment: propose a universe in which conjunction, disjunction, negation or implication are violated. Do the properties of such a universe have anything in common with our own? If our reality necessarily preserves these properties, how could you sensibly argue that reality is not logical?


My surroundings don’t remain the same:

The fact that mathematics is accurately replicating the semantics of gravitation is why we got to the moon — that model was projected back into reality as the rocket’s trajectory, via a computer directly mediating the translation of model into rocket parameters.

And it worked.


The rockets are what propelled the spacecraft, not the math. But anyway, our math of black holes isn't generating black holes in the lab or the classroom.


The rocket sequence was a direct result of the electronic model executing — and so the timing and hence flight path was determined by that gravitational model.

The rockets are the mechanism of projecting that mathematical model back into physical change: but where you end up depends on which model you’re projecting.


Another way to look at it is that the fact that the rocket moved at all was due to the rockets. The fact that those rockets propelled the spacecraft into lunar orbit and then back to the landing zone on earth was due to the math.


Semantics don't either.

If you can express the semantics of a black hole on paper you don't feel the pull of that either.


"sufficiently advanced compression/encoding is indistinguishable from encryption"

like QR codes... they're encrypted, but the secret key is widely available: it's the source code that decodes the graphic


Anyone able to provide a set of examples that produces latent knowledge and explicitly state what the latent knowledge produced is? If possible, a basic explanation of the paper would be nice too, based on reading other comments in the thread.

EDIT/Update: Just found examples from the 10 datasets starting on page 23. That said, even after reviewing these my prior request stands. As far as I am able to guess at this point, this research just models responses across multiple models in a uniform way, which to me makes the claim that this method outperforms other methods questionable, given that it requires existing outputs from other models to aggregate the knowledge across existing models. Am I missing something?


It's late here, and I've only read this quickly, but some brief points:

* It builds on similar ideas used in contrastive learning, usually in different modalities (eg images). Contrastive learning is useful because it is self supervised: https://www.v7labs.com/blog/contrastive-learning-guide

* They generate multiple statements that they know are true or false. These are statements like "Paris is the capital of France" (true) and "London is the capital of France" (false).

* They feed these sentences into the language model (LM) and then learn a vector in the space of the LM that represents true statements (I think this learning is done using a second, separate model - not entirely sure about this though; it might be fine-tuning).

* They then feed it statements (eg "Isaac Newton invented probability theory") and it will return "yes" or "no" depending on whether it thinks this is true or false.

This is different from the conventional question-answering NLP task, where you ask "Who invented probability theory?"

> the claim that this method outperforms other methods questionable given it requires existing outputs from other models to aggregate the knowledge across existing models

It's a separate thing from these models that uses their hidden states and (I think?) trains a small, separate model on these states and the inputs.

That's interesting because it should be much faster and is potentially adaptable to any LM.
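
To make that concrete, here is a rough PyTorch sketch of what training such a probe could look like. The consistency-plus-confidence loss follows the contrast-pair idea in the bullet points above, but the dimensions, the random stand-in hidden states, and the training details are assumptions for illustration, not the paper's actual setup:

    # Minimal sketch of a "truth direction" probe trained on contrast pairs.
    # Assumption: pos_states / neg_states are hidden states a frozen LM produced
    # for a statement and its negation; random tensors stand in for them here
    # so the snippet runs on its own.
    import torch
    import torch.nn as nn

    hidden_dim = 768   # size of the LM's hidden states (assumed)
    n_pairs = 256      # number of true/false contrast pairs (assumed)

    pos_states = torch.randn(n_pairs, hidden_dim)   # states for "statement + Yes"
    neg_states = torch.randn(n_pairs, hidden_dim)   # states for "statement + No"

    # The probe itself is tiny: one linear direction plus a sigmoid.
    probe = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

    for step in range(1000):
        p_pos = probe(pos_states).squeeze(-1)   # P(true) for the positive phrasing
        p_neg = probe(neg_states).squeeze(-1)   # P(true) for the negated phrasing

        # Consistency: the two probabilities should sum to roughly 1.
        consistency = ((p_pos - (1.0 - p_neg)) ** 2).mean()
        # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
        confidence = (torch.minimum(p_pos, p_neg) ** 2).mean()

        loss = consistency + confidence
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # At inference time, probe(hidden_state) > 0.5 is read as "the model
    # represents this statement as true" (up to a sign flip, since nothing
    # above pins down which direction means "true").

All the heavy lifting is in the frozen LM whose hidden states the probe reads, which is why this should be cheap compared to fine-tuning the LM itself.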

It's also a possible secondary objective when training the LM itself. Perhaps if you train the LM using this as a secondary loss function, it might encourage the LM to always generate truthful outputs.


> It's also a possible secondary objective when training the LM itself. Perhaps if you train the LM using this as a secondary loss function, it might encourage the LM to always generate truthful outputs.

After all, we go through countless rounds of this during our lives: school and everything else.

With the example of your second bullet point:

Young me: "I think that the capital of France is London"

Adult: "No"

Me: "Of course, the capital of France is Paris"

Adult "Yes."

And grades are a very clear loss function.


Asked ChatGPT to explain like I’m 5. This is what it produced.

“ Okay! Imagine that you have a big robot in your head that knows a lot about lots of different things. Sometimes, the robot might make mistakes or say things that aren't true. The proposed method is like a way to ask the robot questions and figure out what it knows, even if it says something that isn't true. We do this by looking inside the robot's head and finding patterns that make sense, like if we ask the robot if something is true and then ask if the opposite of that thing is true, the robot should say "yes" and then "no." Using this method, we can find out what the robot knows, even if it sometimes makes mistakes.”


That just shows that the robot is consistent, not that it actually makes sense. So this explanation is bullshit, even though it sounds convincing at first. That's also the issue with most of ChatGPT: it is hard to know when it sounds convincing and is false.


It's literally bullshit in the highly technical sense.

http://www2.csudh.edu/ccauthen/576f12/frankfurt__harry_-_on_...

The essence of bullshit is that it is different from a lie, for a liar respects the fact that there is a truth and knows what the truth is well enough to purposefully misrepresent it, whereas a bullshitter neither knows nor cares whether what they are saying corresponds to anything in reality, just so long as it makes the right impression.

>The point that troubles Wittgenstein is manifestly not that Pascal has made a mistake in her description of how she feels. Nor is it even that she has made a careless mistake. Her laxity, or her lack of care, is not a matter of having permitted an error to slip into her speech on account of some inadvertent or momentarily negligent lapse in the attention she was devoting to getting things right. The point is rather that, so far as Wittgenstein can see, Pascal offers a description of a certain state of affairs without genuinely submitting to the constraints which the endeavor to provide an accurate representation of reality imposes. Her fault is not that she fails to get things right, but that she is not even trying.


Some days ago, it told me "well, Boost has a function for that". I was surprised that I hadn't found it myself.

It took me 10 minutes and opening the Git log of Boost ("maybe they removed it?") until I realized "well, it just made that up". The whole answer was consistent and convincing enough that I started searching, but it was just nonsense. It even provided a convincing amount of example code for its made-up function.

That experience was... insightful.

While we often say "If you need something in C++, Boost probably has it" and it's not untrue, ChatGPT seems to exercise that idea a little too much.


ChatGPT just matches the most statistically-likely reply based on a huge corpus of internet discussions, it doesn't actually have any ideas.


And a lot of highly linked forum questions and answers tend to be of the form “how do you do X in library Y?”, “Use the Z function!” - so naturally chatGPT loves to reproduce this popular pattern of communication.


> ChatGPT just matches the most statistically-likely reply based on a huge corpus of internet discussions, it doesn't actually have any ideas

Presumably you think humans have ideas, but you don't really have any evidence that humans aren't also producing the most statistically likely replies. Maybe we're just better at this game.


Applesauce.

Checkmate.


A smart ass reply was statistically very likely on HN actually. Checkmate me.


I'm astonished at how much worth people seem to give this bot. It's a bullshit generator, based on other people's bullshit. The bot does not know right or wrong. The bot does not know what command line utilities are. It just predicts what answer you want, based on answers already given before. Nothing more, nothing less.


Because people want to believe in the magical AI - they want something for nothing and have yet to grasp that not only are they unable to change the immutable laws of the universe (something will not come for nothing), but they are also willfully blind to the very real price they are about to pay...


And the price is...?


I guess the point is that it generates convincing and consistent texts. That's new and it's a building block for any futuristic AI that actually knows stuff: it also has to generate good text to communicate the knowledge.


Likewise, I spent 40 minutes looking for fictional command line arguments it recommended for Docker. When told the command line options did not exist, it directed me down a rabbit hole of prior versions that was a dead end. It really felt like an arrogant 8-year-old, with its continued evasions of being flat-out wrong.


The other day I saw someone, who by asking ChatGPT a series of questions, had it carefully explain why abacus-based computing was more efficient than GPU-based computing. It's not your google replacement yet...


If you read the abstract, it appears that ChatGPT's explanation is on point. You're right that the paper is relying on consistency, which doesn't guarantee accuracy, but it is what the paper is proposing (and they claim it does lead to increased accuracy).


An accurate answer has to be consistent, so it's not all bullshit. I'm guessing you can at least filter out inaccuracies by finding inconsistencies. Or, in plainer English: if you find somewhere it gives inconsistent answers, you know those are wrong.

I'm not sure if that's a good path forward. You really want to find when it's good, not just filter out bad cases.
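
For what it's worth, a minimal sketch of that filtering idea (the scoring function is a hypothetical placeholder, not anything from the paper; the point is only that inconsistency is cheap to detect, even though consistency alone doesn't establish truth):

    # Flag a statement as unreliable when the model's "true" scores for it and
    # for its negation don't roughly sum to one.
    def score_true(statement: str) -> float:
        # Hypothetical stand-in for however you get P(true) out of the model
        # (a probe over hidden states, a calibrated yes/no prompt, ...).
        return 0.5   # dummy value so the snippet runs

    def is_consistent(statement: str, negation: str, tol: float = 0.2) -> bool:
        return abs(score_true(statement) + score_true(negation) - 1.0) <= tol

    print(is_consistent("Paris is the capital of France",
                        "Paris is not the capital of France"))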


Your comment had less value than the parent. Humanity is doomed.

If you call bullshit, you have to say what was wrong or even what you think is wrong. Otherwise you are just insulting our new robot overlords.

Now, it seems you claim that consistency isn’t the same as making sense. But having more logically consistent robots seems like a big win! Otherwise I could criticize math papers for not making sense, even as I don’t doubt their consistency.


I did: I said this just proves consistency and nothing more. I also said in another comment that I'm not sure whether filtering out bad or inconsistent answers is a good way to get at the truth, or whether it just filters out the worst takes, making the output more convincing.


It looks to me like ChatGPT explained accurately what the abstract says. And indeed, the abstract sounds like this research is largely bullshit. But it's not ChatGPT that is at fault here.


If it was really useful it could say that and point out weaknesses and benefits. Like both you and me could do in 10 seconds.


They didn't ask ChatGPT to find flaws in the proposal.

Not saying it could, just pointing out that it wasn't asked to do so.


If it did that on its own, we would be past the Singularity.


> That just show that the robot is consistent, not that it actually makes sense.

A consistent argument is an argument that makes sense to the robot, not necessarily one that makes sense to you.


But that is actually a fairly accurate description of the paper you asked it to summarize for you. It's not the model's fault that you don't like the argument of the paper.


As always (humans too): bollocks in, bollocks out.

ChatGPT works when you tell it what to convey and it just puts that into words.


This was a great explanation, now can any expert in the field tell us if it’s actually correct? :)



Wow, nice repo. The CCS.ipynb notebook is super clear and easy to follow, almost easier to understand than reading the paper.


Back when I was messing around with LSTM models I was interested in training classifiers to find parts of the internal state that light up when the model is writing a proper name or something like that.

Nice to see people are doing similar things w/ transformers.

Truth, though, is a bit problematic. The very existence of the word makes it possible for "the truth is out there" to be part of the opening of the TV series The X-Files; see also Truth Social. I'm sure there is a "truthy" neuron in there somewhere, but one aspect (not the only aspect) of truth is the evaluation of logical formulae (consider the evidence and reasoning process used in court), and when you can do that you run into the problems that Gödel warned you about -- regardless of what kind of technology you used.


Here's a counter-proposal. There is no truth, only narratives. "Truth" in the sense of classical logical formula evaluation is a special limiting case wherein one narrative has such powerful backing that it always trumps competing narratives. We could just as easily reason about everyone having "beliefs", and think about which people have beliefs that are discordant or in agreement with each others (or with our own), without ever having an objective reference set of "true beliefs". Reality is shared consensus.


Counter-counter-proposal. There is truth, but no single epistemology is both complete and consistent; therefore, you must employ multiple epistemologies to understand the world, and will fail in some edge cases because the epistemology or interpretation you need hadn't been invented yet (or more mundanely, you made a mistake in your analysis). You can have multiple truths that seem to contradict each other, but are actually from different epistemologies and so can't be meaningfully compared.

Truth is difficult but we can't give up on it. We can't just say, truth is a narrative and all narratives are equal. Or even, some narratives are anointed by consensus. It is not consensus that makes the Earth orbit the Sun. When we reduce truth in this way, we invite the bullshit you described elsewhere.

Another way to think about it is, "truth is a narrative" is a complete epistemology riddled with inconsistencies. There may be some times when it's appropriate; if you're an anthropologist trying to understand different cultures with very different ideas to your own, it may be a very useful frame of thinking to just accept, at least provisionally, whatever it is that the people you meet think. Sometimes we can just let different ideas be in tension and we don't need to come to a consensus. But sometimes we do.

Of course there is no free lunch and this "system of epistemologies" strategy I propose is N epistemologies in a trench coat. It is itself an epistemology, and we're trading completeness for correctness by adopting multiple, contradictory epistemologies and using judgement to decide between them. C'est la vie.

To give a concrete example of what I mean: when I'm wondering what will happen if I throw something, I think in terms of Newtonian physics; when I'm wondering what will happen if I mix chemicals together, I'm using chemistry/quantum physics; when I'm wondering whether I should make a comment on HN, I use an ad-hoc vibes based epistemology based on my experience on HN; when I'm wondering what the nature of reality and the human condition is, I have a similarly ad-hoc spirituality I've developed.


I want to be very clear about something that is easy to mistake in what I wrote. I am not endorsing a principle that "all truths are just narratives and we should give up on the idea that some things are objectively correct or not." Rather, I am speculating that the neurological ability to reason about truth is an emergent property of a system which evolved to do something very different, namely to reason about what other people believe irrespective of if it ultimately matches anything you consider to be true.

Once you start reasoning about the fact that it's possible for another person to have a slightly different set of facts in their head, you inevitably take the recursive leap of understanding that the version of their mind you imagine has an imagined version of you, and so on. Thus we obtain an understanding of others and their actions built upon evaluating nested scopes as far as we can think. You can say that the topmost layer of the stack is in some way privileged, that it is the reality and is treated as special in some way. But why add any special exceptions to the rules when you can just evaluate reality as if it were any other narrative scope?

Again, this is all just a guess of how consciousness is implemented. It is expressly NOT a statement about truth literally being up for referendum.

Assuming such an emergent phenomenon is possible, it may be the case that artificial NNs can learn to replicate truth semantics without there ever being a clear indicator between "made up story" output and "actually knows what this means" output. In contradiction to the OP, despite there being truth, there may not be a "true understanding neuron". A system of not-bullshit may in fact be built on a system of bullshit.


s/trading completeness for correctness/trading consistency for completeness/ (we are accepting more inconsistency to gain more completeness)


Truth has the lowest entropy.


Information entropy or a different kind? Is it "logical entropy" like number of assumptions?


hmm, has any work been done to include architectural reactivity in objective/reward functions?

Like, reward the development of a 'truth' neuron.


A truth neuron is just fine-tuning. It's been done with the CLS token; if not, it'd be easy enough.
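
For anyone curious what "just fine-tuning with the CLS token" might look like in practice, here is a minimal sketch: a single logit (the hypothetical "truth neuron") trained on the [CLS] representation of a BERT-style encoder. The model name, data and single training step are placeholder assumptions, not anything from the paper:

    # One extra logit over the [CLS] vector, fine-tuned on true/false labels.
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    encoder = AutoModel.from_pretrained("bert-base-uncased")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    truth_head = nn.Linear(encoder.config.hidden_size, 1)   # the "truth neuron"

    statements = ["Paris is the capital of France", "London is the capital of France"]
    labels = torch.tensor([1.0, 0.0])

    params = list(encoder.parameters()) + list(truth_head.parameters())
    optimizer = torch.optim.AdamW(params, lr=2e-5)
    loss_fn = nn.BCEWithLogitsLoss()

    batch = tokenizer(statements, padding=True, return_tensors="pt")
    cls = encoder(**batch).last_hidden_state[:, 0]   # [CLS] vector per statement
    loss = loss_fn(truth_head(cls).squeeze(-1), labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()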


This is an important area for AI safety research; see the ELK paper for example.

https://www.alignmentforum.org/posts/qHCDysDnvhteW7kRd/arc-s...

That paper is a bit dense, but considers the ways that a powerful AI model could be intractable/deceptive to discovering its latent knowledge. If we can confidently understand an AI’s internal knowledge/intention states, then alignment is probably tractable.


I wonder if this could one day be how we settle disagreements with no solid answer, like whether William Shakespeare really was the author of all those plays.


I’m reminded of Charles Babbage’s comments on being asked about the capabilities of his difference engine:

    On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Where do you think a large language model is going to acquire the information to establish the truth value of a statement, whose truth value is unknown outside the context of the language model?


Well, that’s why it’s ‘latent’ knowledge. And I’m not thinking about deductive truth so much as the possibility of a new epistemological position that uh.. just takes a sufficiently large model’s word for it.


A formal definition of ‘common knowledge’ or ‘common sense’?

“Anything that would be regarded as true by a language model trained on the Common Crawl dataset”.

Be an interesting philosophical position to test out in, say, a judicial setting. We could replace the legal concept of a ‘reasonable person’ with ‘a language model with its temperature set to a reasonable level’.

And perhaps instead of a jury of your peers, the fairest way to reach a verdict would be to present the facts of a case to a large language model, and ask it ’guilty or not guilty?’ twelve times..


> just takes a sufficiently large model’s word for it

The renaissance and enlightenment happened because we stopped doing this. The church was that era’s big model, and doctrine was similarly impenetrably complex and indecipherable. You’d just take the church’s word for something.

We’re going backwards.


"sufficiently large model" is really really not how churches work.

> We’re going backwards.

...it was an off the cuff hypothetical.


The medieval church used to be much more about esoteric doctrine and strange theological reasoning than I think most people realize. It was both complicated and nonsensical; different traditions had different levels of complexity, and many were notoriously convoluted. Byzantine life centered around theology and the overcomplicated bureaucratic state dependent on it to such an extent that the word “byzantine” became synonymous with impossible-to-reason-about complexity. While the way in which AI works involves much more rigor and empiricism and has wildly different principles at its foundations and observable results, I’d argue it’s similarly impossible to reason about. What brought about the scientific revolution was a rigorous deference to simplification wherever possible. The convoluted orbits with Earth at the center “worked”, but were much more complex. Our deference to complex, convoluted statistical models, where the underlying natural rules and phenomena at play aren’t explicitly identified, seems like a step backwards in terms of creating more actual knowledge to me (although the practical utility and the opportunities to refine outputs into new knowledge discoveries are massive).

And I realize what I was replying to was a hypothetical/not a serious proposition right now, but people seem very eager to push towards a future in which we defer to more advanced versions of these things without much additional thought. I think that’d be a major mistake.


The word "knowledge" as used in TA shouldn't be taken as philosophically defined knowledge (i.e. a justified true belief).

Language models might indeed help us find potential sources of validation for various hypotheses, but deciding whether a hypothesis is true is another matter entirely.


This is an extremely important point.

These are very large, very complex fuzzy language maps, and are best used like a crystal ball for autocompletion suggestions, not definitive answers.

I’m unsurprised but also very disappointed by people’s desire to give these fuzzy answers final word. The desire to cede final authority on truth to these machines just because they’re giant and impressive is incredibly misguided and a major step backwards.

These machines are literal embodiments of group think.

They can only add value if people understand their limitations. If people get carried away with them and start assuming they’re authoritative we’re in for serious trouble.


William Shakespeare was the author of those plays, if you mean plays like Hamlet, Romeo and Juliet etc. The answer is completely settled, since there is no serious argument that he wasn't, outside of fanciful conspiracy theories.

There are legitimate discussions about some of the plays which were likely collaborations between multiple authors (e.g. Edward III, Henry VI Part 1), and machine learning has been used to try to identify the different authors and the authorship of different passages. This is an area of open research.


If you run ML on the historian-uncontested[0] plays the same way as on the ones with likely different/multiple authors, what is the result? Does it say 'this is clearly just one guy'?

[0] You sound confident about the consensus. Can you share the material that led you to that? I'm currently under the impression they were all written by multiple authors. HOWEVER, I recognize I'm very low-info on this (or low-quality info at least) and my belief would be easily changed by something reputable and/or with good methodology.


> Does it say 'this is clearly just one guy'?

It says the majority of the plays were likely written by the same person, since Shakespeare has a distinct "voice" which is clearly different from other contemporary playwrights. But there are a number of pitfalls in the statistical approach, because an author might try to mimic the writing style of another author, especially since these collaborations were somewhat like master-apprentice relationships. The Oxford Complete Shakespeare credits Marlowe as co-author of Henry VI Part 1, but this is highly controversial, since some believe Shakespeare was just strongly inspired by Marlowe in his early plays.
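
To make the "distinct voice" point concrete, here's a toy stylometry sketch: function-word frequencies fed to a simple classifier. The passages, labels and word list are invented placeholders; real authorship-attribution work on the plays uses far richer features and far more data:

    # Toy authorship attribution from function-word frequencies, a classic
    # stylometric fingerprint. All data below is made up for illustration.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    function_words = ["the", "and", "of", "to", "in", "that", "with", "but", "for", "not"]
    vectorizer = CountVectorizer(vocabulary=function_words)

    # Passages with (hypothetically) known authorship: 1 = author A, 0 = author B.
    passages = [
        "passage one by author a with the usual turns of phrase",
        "passage two by author a and more of the same habits of style",
        "passage one by author b but with a rather different rhythm to it",
        "passage two by author b and not much in common with the other voice",
    ]
    labels = [1, 1, 0, 0]

    X = vectorizer.fit_transform(passages)
    clf = LogisticRegression().fit(X, labels)

    # Attribute a disputed passage to whichever "voice" it resembles more.
    disputed = vectorizer.transform(["a disputed passage of the same length and period"])
    print(clf.predict_proba(disputed)[0])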

Here is a podcast with a bit about the controversy: https://www.folger.edu/shakespeare-unlimited/christopher-mar...

As for the general authorship question, you can just read anything which isn't written by a crackpot. But Oxford has "William Shakespeare - A Very Short Introduction", which is a pocket-size introduction and biography. If you want something meatier, Marjorie Garber's "Shakespeare After All" is a great book.


What about stylometry… that should be enough…


But what if we can train a language model to produce output that, according to stylometric analysis, is with 90% confidence written by the same person who wrote Hamlet?


I want to watch a performance of this nouveau Shakespeare.


Hang on - I thought the consensus among ML experts was that language models don’t ‘know’ anything?


"Knowledge" is a bit of a shorthand term in a case like this, yes, but not in itself much more wrong than going looking for "knowledge" in a book for instance (though arguably the ML context is more prone to misunderstandings about actually knowing anything, due to the public image of "AI").

The models still capture statistical regularities in the "language" (overwhelmingly, that is text corpora) they were optimized against, and to the extent that what went in during optimization really represents facts and relations in the world, the model probably holds representations related to facts and relations in the world.

It probably also holds representations of spurious facts and relations in the world. The optimization on language is not explicitly designed for capturing semantic information and, as far as I know, there are typically no mechanisms built in to measure or report anything about the veracity of the claims. Which makes relying on anything coming out of them a shaky business at best, for now of course.


Forget about whether the ‘latent knowledge’ in the LLM is ‘correct’ or not, this kind of research is at least predicated on the idea that the LLM holds certain things as true, and consistently holds their opposite as false.

The point of this kind of work is to query the LLM and figure out what it ‘thinks is true’ - which is then presumably useful for deciding how much to rely on its output.

I just find it amusing that people in the ML field are always so adamant that ‘it doesn’t know’ or ‘it’s not thinking’ or ‘it doesn’t understand’, but I don’t see much engagement with what is actually meant by ‘know’, ‘think’ or ‘understand’ in the first place.

Like, you just said that a language model ‘probably… holds representations of spurious facts and relations’.

But… isn’t ‘holding a representation of a fact or relation’ basically what we call ‘knowing’ something?


> isn’t ‘holding a representation of a fact or relation’ basically what we call ‘knowing’ something?

No. Knowledge is fundamentally and inextricably tied to our experience of reality. The billions of years of evolution that created our perceptual framework has created a bridge to understanding reality machines don’t have.

I’d argue a chicken brain has a far greater amount of true knowledge than any of these language models.

That does not diminish the utility and sophistication of these language models. But they are not knowledge. They are a disconnected meta map of what we have written down.

We calibrate these language models to knowledge because our language usually corresponds to things we know.

If the internet was mostly populated with sentences like “a druid flew into houston, cats pool into mud vampires wealth umpire onion buy now saxophone”, a language model mapping it all would be useless.


You’re saying that a language model trained on ‘sensical’ language is qualitatively different from one trained on ‘nonsensical’ language use. I agree.

What is your name for the quality that such a language model possesses, that makes it more ‘useful’?

Why is ‘knowledge’ not a good word for that quality?


This is a deep topic that I don’t feel fully qualified to speak precisely about, but the key point is that knowledge isn’t about mapping symbols to other symbols like what language models do, it’s about mapping symbols/mental representations to reality.

I think knowledge is in fact a good word for ‘sensical’ language, but the point most people like me are trying to make when talking like this is that language models can’t determine what is ‘sensical’ language.

The amazing thing about these newer models is that they seem to get remarkably close to properly understanding real-world concepts just through symbol mapping on massive amounts of data, but that’s highly dependent on having been fed sensical data where that mapping of symbols to reality was already done.

I’m trying to argue that knowledge only really exists in the data feeding step, when we map symbols to reality.

What the language models do is create a map of symbols. But they mix things which are knowledge with things that are not, and cannot determine what about their output and their mapping is actually valid knowledge.

I think most of why this is has to do with the nearly endless amount of embodied physical and historical context that goes into our own sense making. A surprisingly large degree seems to be embedded in language, but there’s a lot more that is not.

EDIT: I rambled, sorry; direct answer to your question about what I’d call the property that makes these language models useful is “relatively high degree of correspondence with reality”. What I consider “knowledge” is directly obtained by something which can interact with reality/directly maps reality to some representation.


I think more caution is warranted here. That argument seems to rely on our experience of reality being an accurate representation of objective reality. There are some ideas that are surprisingly difficult to dismiss that essentially state that our perceptions of reality do not have this property, and that in fact that any organism that is the result of an evolutionary process is really only ever incentivized to evolve a perception system that presents them with what is essentially a user interface. Spacetime itself may just be a data structure generated in the mind as a simplification of some much more complex and informationally overwhelming reality.


I’m quite sympathetic to that line of thinking. Your language is suggestive of an interview on Lex Fridman’s podcast; I’m guessing you’ve listened to it. Nietzsche also basically recognized this over a century ago, though he put it in different terms/couldn’t articulate it in as much mathematical detail as evolutionary game theorists allegedly can.

That’s actually precisely why I think evolutionarily adapted, embodied perceptual systems need to be a part of any true knowledge: reality is so incredibly complex that we don’t know what our perception is or is not doing. We just know it’s doing something reality-based, because we exist. Perceptual models don’t need to correspond with reality the way actual embodied perception (and corresponding cognition) does.

I don’t buy the argument that the correspondence with objective reality in evolved perceptual systems is zero, as is claimed in that interview, simply because we exist. I think that conclusion is an artifact of precisely the same kind of over-abstraction and non-correspondence that invalidates language models as knowledge systems. Our existence means our perception affords at least some kind of compatibility with, and ability to navigate through, reality.

A system which does not have that embodied evolutionary history, with millennia of required correspondence with reality, simplified or not, does not have all of the essential embedded context and understanding of complexity that goes into our sense making. Science is about discovering more of that hidden context, and perhaps these language models can help identify patterns we haven’t seen yet that do in fact correspond with reality. But the model can’t test its own correspondence.


> I just find it amusing that people in the ML field are always so adamant that ‘it doesn’t know’ or ‘it’s not thinking’ or ‘it doesn’t understand’, but I don’t see much engagement with what is actually meant by ‘know’, ‘think’ or ‘understand’ in the first place.

Exactly. This is why these experts should shut up about claiming these AI models don't know things, or aren't sentient. We don't have any mechanistic explanations for what knowledge is or how it works, or what sentience is or how it works, so all of these claims are just bullshit. At best we have some hand-wavy intuitions about what some properties of knowledge or sentience have (Gettier problem, etc), but that's nowhere near enough to make the definitive claims I've seen.


That's the understanding of some. It's really a question of what does knowing really mean. I think most of those people assume the model must be "conscious" to "know" anything. Which seems silly.


My database doesn't "know" anything, yet I am able to discover latent knowledge by querying it.


True but nobody’s writing a paper on how to psychoanalyze your database to try to figure out what it knows.


Is this proposing a perpetual motion machine? (With energy switched out for information)


No? It’s measuring the information content of a language model (which takes energy - and information - to make)


Is there a PG word for bullshitting that has the same meaning?




