An LLM itself cannot provide references with any integrity. They’re autoregressive...

ein0p · 2024-06-18T07:35:09 1718696109

Sure, but it can semantically query a vector DB on the side and use the results to generate a "grounded" response.
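A rough sketch of what that looks like, under heavy assumptions: the "vector DB" here is just an in-memory list, `embed` is a toy bag-of-words stand-in for a real embedding model, and `DOCS`, `retrieve`, and `build_prompt` are hypothetical names made up for illustration. The point is only that the references come from the retrieval step and are handed to the model in the prompt.

```python
import math
from collections import Counter

# Hypothetical in-memory "vector DB": (reference id, text) pairs.
DOCS = [
    ("doc-1", "Autoregressive language models predict the next token from previous tokens."),
    ("doc-2", "Vector databases store embeddings and support nearest neighbor search."),
    ("doc-3", "Sourdough bread needs a long fermentation time."),
]

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    cleaned = text.lower().replace(".", " ").replace("?", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, k=2):
    # Rank stored documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d[1])), reverse=True)
    return ranked[:k]

def build_prompt(query, k=2):
    # Put the retrieved passages, with their ids, into the prompt so the
    # model can cite them instead of inventing references.
    sources = "\n".join(f"[{ref}] {text}" for ref, text in retrieve(query, k))
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"{sources}\n"
        f"Question: {query}"
    )
```

Calling `build_prompt("How do vector databases search embeddings?", k=1)` produces a prompt that carries the `doc-2` passage; whatever model receives it is asked to answer from that text and cite its id.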

azinman2 · 2024-06-18T13:57:39 1718719059

“LLMs can provide references. There’s no limitation on that.”

That’s not really an LLM providing references, but a separate db as an extra step providing the references.

ein0p · 2024-06-18T15:21:12 1718724072

All outputs of an LLM are generated by the LLM. LLMs today can and do use external data sources. Applied to humans, what you're saying is like claiming that humans don't provide references because they copy the BibTeX from arXiv.

azinman2 · 2024-06-18T17:43:52 1718732632

But if you’re using an external data source and putting it into the context then it’s the external data source that’s providing the reference — the LLM is just asked to regurgitate it. The large language model, pretrained on trillions of tokens of text, is unable to provide those references.

If I take llama3, for example, and ask it to provide a reference, it will just make something up. Sometimes these things happen to exist; often they don’t. And that’s the fundamental problem: they hallucinate. This is well understood.