Hacker News new | past | comments | ask | show | jobs | submit login

    sort alice.txt | diff - <(sort eve.txt)
That's not a task for an LLM



Asking students to write an essay about Napoleon isn't something we do because we need essays about Napoleon - the point is it's a test of capabilities.


My point was more so that this task is so trivial, that's it's not testing the model's ability to distinguish contextual nuances, which would supposedly be the intention.

The idea presented elsewhere in this thread about using an unpublished novel and then asking questions about the plot is sort of the ideal test in this regard, and clearly on the other end of the spectrum in terms of a design that's testing actual "understanding".


I see you are being downvoted, but I agree with you.

A useful test would copy all Alice statements to Eve statements, then rewrite all of the Eve statements using synonyms, and then finally change one or two details for Eve.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: