
My big question is: what is being done about hallucination? Without a solution, it's a giant footgun.



CAN anything be done? At a very low level they’re basically designed to hallucinate text until it looks like something you’re asking for.

It works disturbingly well. But because it doesn’t have any actual intrinsic knowledge, it has no way of knowing when it made a “good” hallucination versus a “bad” one.

I’m sure people are working on piling things on top to try to influence what gets generated, or to catch and move away from errors that other layers spot… but how much effort and how many resources will be needed to make it “good enough” that people don’t worry about this anymore?

In my mind, the core problem is that people are trying to use these for things they’re unsuitable for. Asking fact-based questions is asking for trouble. There isn’t much of a wrong answer if you want to generate a bedtime story or a bunch of test data that looks sort of like an example you give it.

If you ask it to find law cases on a specific point, you’re going to raise a judge’s ire, as many have already found.


Semantic search without LLMs is already making a dent. It still gives traditional results that need to be processed by a human, but you can get "better" search results.
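
A minimal sketch of what I mean, assuming the sentence-transformers library; the model name and the toy documents are just placeholders, not a recommendation:

    # Embedding-based semantic search, no LLM generation involved:
    # embed the documents and the query, rank by cosine similarity.
    from sentence_transformers import SentenceTransformer, util

    docs = [
        "The court dismissed the motion for lack of standing.",
        "Standing requires a concrete and particularized injury.",
        "The recipe calls for two cups of flour.",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = model.encode(docs, convert_to_tensor=True)

    query = "what does a plaintiff need to show for standing?"
    query_emb = model.encode(query, convert_to_tensor=True)

    scores = util.cos_sim(query_emb, doc_emb)[0]  # one similarity per document
    ranked = sorted(zip(docs, scores.tolist()), key=lambda x: -x[1])
    for doc, score in ranked:
        print(f"{score:.3f}  {doc}")

The results are still just ranked documents a human has to read, but the ranking is by meaning rather than keyword overlap.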

And alongside that there is a body of work on "groundedness" that basically post-processes output and compares it against its source material. It can still let logic errors through and has a base error rate of its own, but it can at least give you clear citations for factual claims that match real documents. It doesn't fully ensure the sources are being referenced correctly, but that's already the case even with real papers produced by humans.
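
Roughly, the simplest version of a groundedness check is comparing each generated sentence against the source passages and flagging anything without a close match. This is only a sketch with an arbitrary threshold, not any particular product's method:

    # Flag generated sentences that don't closely match any source passage.
    from sentence_transformers import SentenceTransformer, util

    sources = [
        "The study enrolled 120 patients over six months.",
        "Side effects were mild and resolved without treatment.",
    ]
    generated = [
        "The study enrolled 120 patients.",
        "The drug cured 95% of patients in one week.",  # not supported by the sources
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    src_emb = model.encode(sources, convert_to_tensor=True)
    gen_emb = model.encode(generated, convert_to_tensor=True)

    THRESHOLD = 0.6  # arbitrary; would need calibration in practice
    for sent, emb in zip(generated, gen_emb):
        sims = util.cos_sim(emb, src_emb)[0]
        best = int(sims.argmax())
        score = float(sims[best])
        if score >= THRESHOLD:
            print(f"grounded ({score:.2f}): {sent!r} -> cites {sources[best]!r}")
        else:
            print(f"unsupported ({score:.2f}): {sent!r}")

Real groundedness systems use NLI models or trained verifiers rather than raw similarity, but the shape is the same: every claim either points at a source or gets flagged.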

Also consider that the baseline isn't perfection; it's a benchmark against real humans. Accuracy is getting much better in certain domains where we have good corpora. Part of assessing the accuracy of a system is going to be determining whether the generated content is "in distribution" for its training data. There is progress being made in this direction, so we could perhaps do a better job at the application level of using a "confidence" score of some kind, maybe even taking it into account in a chain-of-thought-like reasoning step.
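
For example (purely a sketch; the scoring and cutoff are made up), an application could turn per-token log-probabilities, however it obtains them from its model or API, into a crude confidence score and route low-confidence answers to retrieval or human review:

    import math

    def answer_confidence(token_logprobs):
        """Geometric-mean token probability as a crude confidence score."""
        avg_logprob = sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_logprob)

    logprobs = [-0.05, -0.20, -0.01, -1.30, -0.10]  # example values
    conf = answer_confidence(logprobs)

    if conf < 0.7:  # arbitrary cutoff; would be calibrated on held-out data
        print(f"low confidence ({conf:.2f}): route to retrieval / human review")
    else:
        print(f"confidence {conf:.2f}: return answer with citations")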

People keep finding "obviously wrong" hallucinations that seem like proof these things are still crap. But these systems keep getting better on benchmarks that look at retrieval accuracy. And the benchmarks keep getting better as people point out deficiencies in them. Perfection might not be possible, but consistently better than the average human seems within reach, and better than that seems feasible too. The challenge is that the class of mistakes might look different even if the overall error rate is lower.


What do you want done about it? Hallucination is an intrinsic part of how LLMs work. What makes something a hallucination is the inconsistency between the hallucinated concept and reality, and reality is not part of how LLMs work. They do amazing things, but at the end of the day they are elaborate statistical machines.

Look behind the veil and see LLMs for what they really are: you will maximise their utility, temper your expectations, and save yourself disappointment.



