At my job we are scraping using LLMs. For a 10M sector of the company. GPT4 turb...

what · 2024-05-08T04:13:19 1715141599

Bold claim, did you review all 1.5 million requests?

bryanrasmussen · 2024-05-08T09:13:42 1715159622

I guess the claim is based on statistical sampling at reasonably high level to be sure that if there were hallucinations you would catch them? Or is there something else you're doing?

Do you have any workflow tools etc. to find hallucinations, I've got a project in backlog to build that kind of thing and would be interested in how you sort through bad and good results.

ewild · 2024-05-08T14:36:34 1715178994

in this case we had 1.5 millioon ground truths for our testing purposes. we now have run it over 10 million, but i didnt want to claim it had 0 hallucinations on those as technically we cant say for sure, but considering the hallucination rate was 0% for 1.5 million when compared to ground truths im fairly confident.

krainboltgreene · 2024-05-08T04:49:12 1715143752

How do you know that's true?

ewild · 2024-05-08T14:35:30 1715178930

the 1.5 million was our test set. we had 1.5 million ground truths, and it didnt make up fake data for a single one

krainboltgreene · 2024-05-10T17:34:26 1715362466

That's not what I asked. I asked "How did you determine that it didn't make up/get information wrong for all 1.5m?"