
There are models that can provide references if you’d like



And they'll happily make up those references as well


Nope, that’s not how it works. Those references aren’t generated in such systems; they’re retrieved. They might not provide references for all of their sources, of course, but the same is true of humans.


Exactly. Right now if I Google something (AI Overview aside), I’m linked to a source. That source may or may not cite its own sources, but its provenance tells me a lot. If I’m reading information linked from the Mayo Clinic, their reputation leads me to judge it as high quality. If they start publishing a bunch of garbage, their reputation gets shot and I’ll look elsewhere. With an LLM there is no such choice: it will spew everything from high-quality to low-quality (to dangerously wrong) information.


LLMs can provide references. There’s no limitation on that. Even GPT-4 sometimes includes references when it deems them beneficial.


An LLM by itself cannot provide references with any integrity. LLMs are autoregressive probabilistic models: they’ll happily make something up. You can even try to train one to emit references, but as the article states, that is very, very far from a guarantee. What you can do is a RAG-style setup, where you retrieve from an existing database and include the results in the prompt to ground it, but that isn’t inherent to the model itself.
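Roughly, that grounding step looks like this (a toy Python sketch; the snippets, URLs, and the llm_complete stub are made-up placeholders, not any particular library). The point is that the citations live in the prompt, not in the model:

    # Toy sketch of prompt grounding (RAG-style). The snippets below are
    # made up for illustration; llm_complete() stands in for a real model call.
    retrieved = [
        {"text": "Intermittent fasting shows mixed results in trials.",
         "source": "https://example.org/nutrition/fasting"},
        {"text": "Most studies compare it to continuous calorie restriction.",
         "source": "https://example.org/nutrition/trials"},
    ]

    def build_grounded_prompt(question, snippets):
        # The sources are pasted into the context, so any citation the model
        # emits is copied from here rather than recalled from its weights.
        context = "\n".join(
            f"[{i + 1}] {s['text']} (source: {s['source']})"
            for i, s in enumerate(snippets)
        )
        return ("Answer using only the numbered snippets below, and cite them by number.\n\n"
                + context + "\n\nQuestion: " + question + "\nAnswer:")

    def llm_complete(prompt):
        # Placeholder for an actual model call (OpenAI API, llama.cpp, etc.).
        return "(model output goes here)"

    print(llm_complete(build_grounded_prompt("Does intermittent fasting work?", retrieved)))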


Sure, but it can semantically query a vector DB on the side and use the results to generate a "grounded" response.
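Schematically, the side query is something like this (toy Python; a word-count vector and an in-memory list stand in for a real embedding model and vector store):

    from collections import Counter
    from math import sqrt

    def embed(text):
        # Stand-in for a real embedding model: a bag-of-words count vector.
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = sqrt(sum(v * v for v in a.values()))
        nb = sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # Stand-in for the vector DB: each entry keeps the source next to the
    # text, which is where the eventual "reference" comes from.
    corpus = [
        ("The mitochondrion is the powerhouse of the cell.", "https://example.org/bio/mito"),
        ("Photosynthesis takes place in the chloroplast.", "https://example.org/bio/chloro"),
    ]
    index = [(embed(text), text, src) for text, src in corpus]

    def search(query, k=1):
        q = embed(query)
        ranked = sorted(index, key=lambda e: cosine(q, e[0]), reverse=True)
        return [(text, src) for _, text, src in ranked[:k]]

    # The retrieved text and its source then get fed to the LLM as context.
    print(search("powerhouse of the cell"))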


“LLMs can provide references. There’s no limitation on that.”

That’s not really the LLM providing references; it’s a separate database, queried as an extra step, that provides them.


All outputs of an LLM are generated by the LLM. LLMs today can and do use external data sources. Applied to humans, what you're saying is like claiming it's not humans who provide references because they copy the BibTeX entries from arXiv.


But if you’re using an external data source and putting it into the context, then it’s the external data source that’s providing the reference; the LLM is just asked to regurgitate it. The large language model itself, pretrained on trillions of tokens of text, cannot provide those references on its own.

If I take Llama 3, for example, and ask it to provide a reference, it will just make something up. Sometimes the things it cites happen to exist; often they don't. And that's the fundamental problem: they hallucinate. This is well understood.


Those aren't sources. They aren't where the model got the information (that info isn't stored in an LLM).

Rather, such AI systems have two parts: an LLM that writes whatever, and a separate system that searches for links related to what the LLM wrote.
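Schematically (toy Python; generate() and web_search() are stand-ins for the two parts, not real APIs), the link is attached after the text is written:

    def generate(prompt):
        # Placeholder for the LLM call; it just writes prose, with no URLs in it.
        return "Aspirin reduces fever. It also inhibits platelet aggregation."

    def web_search(query):
        # Placeholder for the separate search system; returns candidate links.
        return ["https://example.org/search?q=" + query.replace(" ", "+")]

    def answer_with_links(prompt):
        draft = generate(prompt)
        cited = []
        for sentence in draft.split(". "):
            links = web_search(sentence)  # the link is found after the fact
            cited.append(sentence + " [" + links[0] + "]" if links else sentence)
        return " ".join(cited)

    print(answer_with_links("What does aspirin do?"))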



