
How are you going to find (not even talking about correcting) hallucinated errors?

If money is involved and the LLM produces hallucinated errors, how do you handle the monetary impact of such errors?

How does that approach scale financially?




Indeed. I anticipate the next Post Office scandal [1][2] attributed to LLMs.

[1] https://en.wikipedia.org/wiki/British_Post_Office_scandal

[2] https://www.postofficescandal.uk/


That's awful.

Reminds me of the Dutch childcare benefits scandal [0], where 26,000 families were unfairly labeled as having committed tax fraud (11,000 of whom had been targeted via "risk profiling" because they had dual nationalities [1]). Bad policy + automation = disaster. The Wikipedia article doesn't fully explain how some of the automated decisions were made (e.g. a typo in a form meant all previous benefits were clawed back; if you owed more than €3,000 you were a fraudster; and if you called to ask for clarification they wouldn't help you, since you were officially labeled a fraudster).

Edit: couldn't find a source for my last statement, but I remember hearing it in an episode of the great Dutch News podcast. I'll see if I can find it.

[0]: https://en.wikipedia.org/wiki/Dutch_childcare_benefits_scand...

[1]: https://www.dutchnews.nl/2021/02/full-scale-parliamentary-in...


Just for posterity, I couldn't find the specific podcast episode, but there are public statements from some of the victims [0] available online (translated):

> What personally hurt you the most about how you were treated?

> Derya: 'The worst thing about all of this, I think, was that I was registered as a fraudster. But I didn't know anything about that. There was a legal process for it, but they blocked it by not telling me what OGS (intent/gross negligence) entailed. They had given me the qualification OGS and that was reason for judges to send me home with rejections. I didn't get any help anywhere and only now do I realize that I didn't stand a chance. All those years I fought against the heaviest sanction they could impose on me and I didn't know anything. I worked for the government. I worked very hard. And yet I was faced with wage garnishment and had to use the food bank. If I had known that I was simply registered as a fraudster and that was why I was being treated like that, I wouldn't have exhausted myself to prove that I did work hard and could pay off my debts myself. I literally and figuratively worked myself to death. And the consequences are now huge. Unfortunately.'

[0]: https://www.bnnvara.nl/artikelen/hoe-gaat-het-nu-met-de-slac...


We tried all the models from OpenAI and Google to get data from images, and all of them made "mistakes".

The images are tables with 4 columns and 10 rows of numbers, plus metadata above them in a couple of fields. We had already loaded thousands of images, and when we went back to check those previously loaded images we found quite a few errors.


Multimodal LLMs are not up to these tasks, imo. They can describe an image, but they're not great on tables and numbers. On the other hand, using something like Textract to get a text representation of the table and then feeding that into an LLM was a massive success for us.
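As a sketch of the Textract half of that pipeline: `analyze_document` called with `FeatureTypes=["TABLES"]` returns a flat `Blocks` list in which CELL blocks carry `RowIndex`/`ColumnIndex` and CHILD relationships pointing at WORD blocks. Reassembling that into plain-text rows (which you'd then feed to the LLM) might look like this; the sample `blocks` data here is a hypothetical, trimmed-down response:

```python
def table_rows(blocks):
    """Rebuild a Textract table into a list of rows of cell strings.

    `blocks` is the "Blocks" list from an AnalyzeDocument response
    requested with FeatureTypes=["TABLES"].
    """
    # Map WORD block ids to their text.
    words = {b["Id"]: b["Text"] for b in blocks if b["BlockType"] == "WORD"}
    rows = {}
    for b in blocks:
        if b["BlockType"] != "CELL":
            continue
        # Join the CHILD words that make up this cell.
        text = " ".join(
            words[i]
            for rel in b.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for i in rel["Ids"]
            if i in words
        )
        rows.setdefault(b["RowIndex"], {})[b["ColumnIndex"]] = text
    # Emit rows and columns in index order.
    return [[rows[r].get(c, "") for c in sorted(rows[r])] for r in sorted(rows)]

# Hypothetical, heavily trimmed AnalyzeDocument output:
sample = [
    {"Id": "w1", "BlockType": "WORD", "Text": "12.5"},
    {"Id": "w2", "BlockType": "WORD", "Text": "7"},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
]
print(table_rows(sample))  # [['12.5', '7']]
```

The point is that the LLM then only sees clean, ordered text, so its job shrinks from "read pixels" to "interpret a small table".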


LLMs don't offer much value for our use case; almost all of the values are just numbers.


Then you should be using something like Textract or other tooling in that space. Multimodal LLMs are no replacement.


We use OpenCV + Tesseract, and EasyOCR.


Curious, did that make you "fall back" to more conservative OCR?

Or what else did you do to correct them?


We already had an OCR solution. We were exploring models in case the information source changes.


Not the OP, but if doing this at scale, I'd consider a quorum approach: run several models and look for a majority to agree (otherwise bump it for human review). You could also get two different readings out of each model, one from the model alone and one from external OCR + model, and compare those too.
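The voting step of that quorum idea is tiny; a minimal sketch, assuming each model's extraction for a given cell has been normalized to a string:

```python
from collections import Counter

def quorum(readings, min_agree=2):
    """Return the majority reading if at least `min_agree` extractions
    agree; otherwise return None, meaning: bump to human review."""
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count >= min_agree else None

# Three independent extractions of the same cell:
print(quorum(["42.10", "42.10", "42.70"]))  # 42.10
# No majority, so escalate:
print(quorum(["42.10", "42.70", "47.10"]))  # None
```

Comparing per-cell rather than per-image keeps the human-review queue small: one disputed number sends one cell for review, not the whole table.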


I’m working on a problem in this space, and that’s the approach I’m taking.

More detailed explanation: I have to OCR dense, handwritten data using technical codes. Luckily, the form designers included intermediate steps. The intermediate fields are amenable to Textract, so I can use a multimodal model to OCR the full table and then error-check against them.




