It can't answer the questions without the limericks in the prompt. The benchmark...

furyofantares · 2024-05-15T01:03:33

This sounds dumb, but what if you gave it all the limericks MINUS the one you want it to answer about?

I think it will fail, but this actually seems like the cleanest way to demonstrate it.
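For concreteness, here is a minimal sketch of that leave-one-out setup in Python. The function name, the plain blank-line-separated prompt layout, and the idea of appending the question at the end are all assumptions for illustration, not the benchmark's actual prompt format.

```python
def build_leave_one_out_prompt(limericks, target_index, question):
    """Hypothetical helper: build a prompt with every limerick except the target.

    limericks: list of limerick strings (the haystack).
    target_index: index of the limerick the question is about (the needle).
    question: question text appended after the haystack.
    """
    # Keep every limerick except the target, preserving the original order.
    haystack = [text for i, text in enumerate(limericks) if i != target_index]
    # Assumed layout: limericks separated by blank lines, question last.
    return "\n\n".join(haystack) + "\n\n" + question
```

The idea would be to score the model on the full prompt and on this leave-one-out prompt for the same question: if accuracy stays high without the needle present, the answers are likely coming from training data rather than from the context window.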

cma · 2024-05-15T05:00:02

That's still not enough to rule out that training on the task's data affects performance on the task. It may be that the model couldn't find the answer unless the limerick had appeared in its training data, but that even then it also needs the limerick in its context window to bridge enough connections from training (or whatever) to do well on the task.

Aeolun · 2024-05-15T00:46:12

Maybe if you tell it to pull the answer from a limerick instead of generally asking?

Edit: Ok no, I tried giving it a whole bunch of hints, and it was just making stuff up that was completely unrelated. Even directly pointing it at the original dataset didn’t help.

causal · 2024-05-15T03:27:30

Yeah, I also tried to get it to complete some limericks from the dataset. Curiously, it claimed to have heard of the limerick but would then recite a hallucination.

So the good news is that the NIAN score might be real; the bad news is that you can't rely on it to know what it knows.

seanhunter · 2024-05-15T18:21:34

If you ask it to complete a limerick and it finishes it differently from the original, but it still works as a limerick, is that really a hallucination?

EGreg · 2024-05-15T01:49:31

Come on guys, it’s already far beyond superhuman if it’s able to do that, and so quickly. So if it’s not able to do that, what’s the big deal? If you’re asking for AGI, then it seems that the model already performs beyond it in these areas.

Aeolun · 2024-05-16T05:15:47

We were mainly trying to determine if there was a reasonable chance that the model was trained on a certain dataset, nothing else.

cma · 2024-05-15T05:02:13

> It can't answer the questions without the limericks in the prompt.

Maybe I can't solve a bunch of mostly memorized math problems without a visual mnemonic aid. Someone seeing me fail the problems without the visual aid doesn't rule out that I've partly memorized the solutions.