An elaborate Agatha Christie style whodunit, with a series of plot-twists and al...

jddj · 2024-05-14T21:48:58.000000Z

Or a spot the difference.

Generate 1000 generic facts about Alice and the same 1000 facts about Eve. Randomise the order, change one minor detail, then ask how they differ.
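
A minimal sketch of how such a test set might be generated. The function name `build_spot_the_difference`, the "attribute N is V" fact template, and the seed are all assumptions made for illustration; real "generic facts" would be more varied.

```python
import random

def build_spot_the_difference(n_facts=1000, seed=0):
    """Return (alice_lines, eve_lines, changed_index) for a toy test set."""
    rng = random.Random(seed)
    # Hypothetical fact template: a numbered attribute with a numeric value.
    values = [rng.randint(0, 999) for _ in range(n_facts)]
    changed = rng.randrange(n_facts)
    eve_values = list(values)
    eve_values[changed] += 1  # the one minor detail that differs
    alice = [f"Alice's attribute {i} is {v}." for i, v in enumerate(values)]
    eve = [f"Eve's attribute {i} is {v}." for i, v in enumerate(eve_values)]
    # Shuffle each list independently so position gives nothing away.
    rng.shuffle(alice)
    rng.shuffle(eve)
    return alice, eve, changed

alice, eve, changed = build_spot_the_difference()
prompt = "\n".join(alice + eve) + "\nHow do Alice and Eve differ?"
```

Keeping `changed` around gives a ground-truth answer to score the model's reply against.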

youssefabdelm · 2024-05-15T03:05:00.000000Z

That seems to go back in the direction of needle in the haystack again.

pushedx · 2024-05-14T22:33:25.000000Z

    sort alice.txt | diff - <(sort eve.txt)

That's not a task for an LLM.

IanCal · 2024-05-14T22:46:50.000000Z

Asking students to write an essay about Napoleon isn't something we do because we need essays about Napoleon - the point is it's a test of capabilities.

pushedx · 2024-05-16T14:31:10.000000Z

My point was more that this task is so trivial that it's not testing the model's ability to distinguish contextual nuances, which would supposedly be the intention.

The idea presented elsewhere in this thread about using an unpublished novel and then asking questions about the plot is sort of the ideal test in this regard, and clearly on the other end of the spectrum in terms of a design that's testing actual "understanding".

semi-extrinsic · 2024-05-15T06:03:20.000000Z

I see you are being downvoted, but I agree with you.

A useful test would copy all Alice statements to Eve statements, then rewrite all of the Eve statements using synonyms, and then finally change one or two details for Eve.
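
A rough sketch of that procedure, under heavy assumptions: the `SYNONYMS` table, the helper names, and the example facts are hypothetical, and word-for-word substitution is a crude stand-in for real paraphrasing. It also assumes every fact mentions the word "red", so the changed detail can be a simple colour swap.

```python
import random

# Hypothetical synonym table; a real test would need much richer rewording.
SYNONYMS = {"owns": "has", "large": "big", "car": "automobile"}

def paraphrase(sentence):
    """Swap each word for its synonym when the table has one."""
    return " ".join(SYNONYMS.get(word, word) for word in sentence.split())

def make_eve(alice_facts, seed=0):
    """Copy Alice's facts to Eve, reword them, then alter one detail."""
    rng = random.Random(seed)
    eve = [paraphrase(fact.replace("Alice", "Eve")) for fact in alice_facts]
    changed = rng.randrange(len(eve))
    # Assumption: every fact contains "red", so this swap always applies.
    eve[changed] = eve[changed].replace("red", "blue", 1)
    return eve, changed

facts = [f"Alice owns a large red car with plate {i}" for i in range(5)]
eve_facts, changed = make_eve(facts)
```

Because every Eve line is now worded differently from its Alice counterpart, a sorted line diff would flag all of them, and only the one at index `changed` differs in meaning.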