Is anyone able to provide a set of examples that produce latent knowledge, and explicitly state what the latent knowledge produced is? If possible, even a basic explanation of the paper would be nice too, based on the other comments in the thread.

EDIT/Update: I just found examples from the 10 datasets starting on page 23. That said, even after reviewing these, my request stands. As far as I can tell at this point, this research just models responses across multiple models in a uniform way, which makes the claim that this method outperforms other methods questionable, given that it requires existing outputs from other models to aggregate knowledge across them. Am I missing something?




It's late here, and I've only read this quickly, but some brief points:

* It builds on similar ideas used in contrastive learning, usually in different modalities (eg images). Contrastive learning is useful because it is self supervised: https://www.v7labs.com/blog/contrastive-learning-guide

* They generate multiple statements that they know are true or false. These are statements like "Paris is the capital of France" (true) and "London is the capital of France" (false).

* They feed these sentences into the language model (LM) and then learn a vector in the space of the LM that represents true statements (I think this learning is done using a second, separate model - not entirely sure about this though; it might be fine-tuning. There's a rough sketch of the probe idea below.)

* They then feed it statements (eg "Isaac Newton invented probability theory") and it will return "yes" or "no" depending on if it thinks this is true or false.

This is different to the conventional question answering NLP task, where you ask "Who invented probability theory?"
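
To make that concrete, here's roughly what the probe part could look like. This is only a sketch of the general idea, not the paper's actual training procedure: I'm using a plain supervised logistic-regression probe, and the file names are made up. The point is that the "truth direction" lives in the LM's hidden-state space rather than in its text output.

    # Sketch only: learn a direction in the LM's hidden-state space that
    # separates true statements from false ones. Assumes you've already run
    # known-true / known-false statements through the LM and saved their
    # hidden-state vectors (file names below are hypothetical).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    h_true = np.load("hidden_states_true.npy")    # shape (n_true, d)
    h_false = np.load("hidden_states_false.npy")  # shape (n_false, d)

    X = np.concatenate([h_true, h_false])
    y = np.concatenate([np.ones(len(h_true)), np.zeros(len(h_false))])

    probe = LogisticRegression(max_iter=1000).fit(X, y)

    # probe.coef_ is the learned "truth" direction; probe.predict() on the
    # hidden state of a new statement answers the yes/no question.

So asking whether "Isaac Newton invented probability theory" is true becomes: run the sentence through the LM, grab a hidden state, and ask the probe.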

> the claim that this method outperforms other methods questionable, given that it requires existing outputs from other models to aggregate knowledge across them

It's separate from those models: it uses their hidden states and (I think?) trains a small, separate model on those states and the inputs.

That's interesting because it should be much faster and is potentially adaptable to any LM.
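
Getting the hidden states out is the easy part, which is why it seems portable across models. Something like the following, assuming a Hugging Face-style interface (the model name and the choice of layer/token are just placeholders):

    # Sketch: extract a hidden-state vector for one statement from an
    # arbitrary LM, assuming a Hugging Face-style interface.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("London is the capital of France.", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # Last layer, last token: one plausible choice of representation
    # to feed to the probe above.
    h = out.hidden_states[-1][0, -1]    # shape: (hidden_dim,)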

It's also a possible secondary objective when training the LM itself. Perhaps if you train the LM with this as a secondary loss function, it might encourage the LM to always generate truthful outputs.
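
Purely speculative on my part, but the combined objective could look something like this (every name here is a hypothetical stand-in; nothing in the paper says they do this):

    # Speculative sketch: use a truthfulness probe as an auxiliary loss
    # alongside the usual language-modelling loss.
    import torch
    import torch.nn.functional as F

    def combined_loss(lm_loss, hidden_states, truth_labels, probe, weight=0.1):
        # Penalise the LM when the probe, applied to its own hidden states,
        # disagrees with known true/false labels for the training statements.
        logits = probe(hidden_states).squeeze(-1)   # small linear probe
        truth_loss = F.binary_cross_entropy_with_logits(logits, truth_labels)
        return lm_loss + weight * truth_loss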


> It's also a possible secondary objective when training the LM itself. Perhaps if you train the LM with this as a secondary loss function, it might encourage the LM to always generate truthful outputs.

After all, we go through countless rounds of this during our lives, at school and everywhere else.

Taking the example from your second bullet point:

Young me: "I think that the capital of France is London"

Adult: "No"

Me: "Of course, the capital of France is Paris"

Adult "Yes."

And grades are a very clear loss function.



