
An elaborate Agatha Christie-style whodunit, with a series of plot twists and alibis that can be chopped off the end of the piece to change who is the most likely suspect.



Or a spot-the-difference test.

Generate 1000 generic facts about Alice and the same 1000 facts about Eve. Randomise the order, change one minor detail, then ask how they differ.
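A minimal sketch of how such a pair could be generated, assuming bash with GNU coreutils (`shuf` is not installed by default on macOS). The facts are hypothetical placeholders ("Fact N: the favourite number is ...") rather than real generic facts, and they deliberately omit the names so that the two files differ only in the one altered detail. The file names `alice.txt` and `eve.txt` are chosen to match the `sort | diff` command used later in the thread.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build a spot-the-difference pair of fact files.
set -euo pipefail

n=1000
tmp=$(mktemp)

# Placeholder facts (assumption): one numbered statement per line.
# Values are always in 0..100, so 999 can never occur naturally.
for i in $(seq 1 "$n"); do
  echo "Fact $i: the favourite number is $(( (i * 37) % 101 ))."
done > "$tmp"

# Alice gets the facts in one random order.
shuf "$tmp" > alice.txt

# Eve gets the same facts with one minor detail changed,
# shuffled independently into a different random order.
target=$(( RANDOM % n + 1 ))
sed "s/^Fact $target: the favourite number is [0-9]*\./Fact $target: the favourite number is 999./" "$tmp" \
  | shuf > eve.txt

rm -f "$tmp"
```

Because only the order and a single value change, the thread's `sort alice.txt | diff - <(sort eve.txt)` should print just the one altered fact from each file, which is the objection raised further down.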


That seems to head back toward a needle-in-a-haystack test again.


    sort alice.txt | diff - <(sort eve.txt)

That's not a task for an LLM.


Asking students to write an essay about Napoleon isn't something we do because we need essays about Napoleon; the point is that it's a test of their capabilities.


My point was more that this task is so trivial that it's not testing the model's ability to distinguish contextual nuances, which would presumably be the intention.

The idea presented elsewhere in this thread, using an unpublished novel and then asking questions about its plot, is close to the ideal test in this regard. It sits clearly at the other end of the spectrum: a design that tests actual "understanding".


I see you are being downvoted, but I agree with you.

A useful test would copy all Alice statements to Eve statements, then rewrite all of the Eve statements using synonyms, and then finally change one or two details for Eve.
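A small sketch of why the synonym rewrite matters, again assuming bash and a standard `sed`. The three statements and the synonym table are made up for illustration. After paraphrasing, a plain line diff flags every line, even though only one of them (the car colour) actually differs in meaning, so the mechanical shortcut no longer isolates the change.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the harder variant: Eve's statements are
# paraphrased with a tiny, made-up synonym table, then one detail changes.
set -euo pipefail

printf '%s\n' \
  "She owns a large red car." \
  "She lives in a small house near the river." \
  "She works at a busy bakery." > alice.txt

# Paraphrase every statement, then change one real detail (red -> blue).
sed -e 's/large/big/; s/small/little/; s/busy/bustling/' \
    -e 's/owns/has/; s/near/close to/; s/works at/is employed by/' \
    -e 's/red/blue/' alice.txt > eve.txt

# Count lines diff reports as changed on Eve's side.
# diff exits 1 when files differ, hence the "|| true" under pipefail.
changed=$(diff alice.txt eve.txt | grep -c '^>' || true)
echo "lines flagged by diff: $changed"
```

Here the script prints `lines flagged by diff: 3`: every paraphrased line looks different to `diff`, and telling the cosmetic rewrites apart from the one real change requires reading the sentences.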



