In my case, code that runs is more convincing than code that doesn't. Also it's ...

mistrial9 · 2024-06-09T23:55:38 1729780047

a teacher once told me there are three kinds of questions. The first is factual: a valid answer is maybe a number, or details of a documented event.. lots of computer things or science knowledge. The second asks purely for an opinion.. "Do you like house music?".. there is no correct answer, it is an opinion. The third might be called a "well-reasoned judgement".. that is often in the realm of decisions.. there are factors, not everything about it is known.. goals or culture outside of the question might shape the acceptable answers.. law certainly.. lots of business things..

extending that to an LLM, perhaps language translation sits as a "3rd type" on top of those three types.. translating a question or answer into another spoken language.. or via an intermediate model of some kind .. but that is going "meta" ..

the point is, there are different kinds of questions and answers, and they don't all fit in the same buckets if "testing" an LLM for better..
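
a rough sketch of what "different buckets" could look like in a test harness, in Python. Everything here is made up for illustration (the `Kind`, `Item`, `score` and `report` names, the exact-match rule, the keyword check for "factors"); it is not how any real eval suite does it. The idea is only that factual answers get checked against a reference, opinions get no score at all, and judgement answers get a loose rubric, with each bucket reported separately:

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    FACTUAL = "factual"      # has a documented, checkable answer
    OPINION = "opinion"      # no correct answer exists
    JUDGEMENT = "judgement"  # "well-reasoned judgement": graded against factors


@dataclass
class Item:
    kind: Kind
    question: str
    answer: str                   # the model's answer being tested
    reference: str = ""           # only used for FACTUAL items
    required_factors: tuple = ()  # only used for JUDGEMENT items (hypothetical rubric)


def score(item):
    """Return a score in [0, 1], or None when scoring makes no sense."""
    if item.kind is Kind.FACTUAL:
        # Assumption: case-insensitive exact match is good enough for a sketch.
        same = item.answer.strip().lower() == item.reference.strip().lower()
        return 1.0 if same else 0.0
    if item.kind is Kind.OPINION:
        return None  # deliberately unscored
    if not item.required_factors:
        return None  # a judgement item without a rubric cannot be scored
    # Assumption: crude rubric, fraction of expected factors the answer mentions.
    text = item.answer.lower()
    hits = sum(1 for factor in item.required_factors if factor.lower() in text)
    return hits / len(item.required_factors)


def report(items):
    """Average the scores per bucket; never mix buckets into one number."""
    buckets = {kind: [] for kind in Kind}
    for item in items:
        s = score(item)
        if s is not None:
            buckets[item.kind].append(s)
    return {k.value: (sum(v) / len(v) if v else None) for k, v in buckets.items()}


items = [
    Item(Kind.FACTUAL, "What is 2 + 2?", "4", reference="4"),
    Item(Kind.OPINION, "Do you like house music?", "yes"),
    Item(Kind.JUDGEMENT, "Should we ship on Friday?",
         "It depends on risk and staffing.",
         required_factors=("risk", "staffing", "rollback")),
]
print(report(items))
```

the keyword rubric is obviously too crude for real judgement questions (law, business), which is sort of the point.. that bucket probably needs human or rubric-based review, not the same scorer as the factual one.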