
Is this a good or a bad thing? On one side we hear about "hallucination": you can't rely on the LLM, it is not like a search engine. But then on the other side you hear "it memorises PII".

Being able to memorise information is exactly what we demand when we ask for the top 5 countries in Europe by population or the height of Everest. But then we don't want it in other contexts.

Looks more like a dataset pre-processing issue.
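To make that concrete, here is a rough sketch of what a scrubbing pass in pre-processing could look like. The patterns and the placeholder scheme are made up for illustration; real pipelines use NER models and much larger curated pattern sets:

    import re

    # Illustrative patterns only: emails and US-style phone numbers.
    PII_PATTERNS = {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    }

    def scrub(text: str) -> str:
        # Replace each matched PII span with a typed placeholder token.
        for label, pattern in PII_PATTERNS.items():
            text = pattern.sub(f"[{label}]", text)
        return text

    doc = "Contact Jane at jane.doe@example.com or 555-123-4567."
    print(scrub(doc))
    # Contact Jane at [EMAIL] or [PHONE].

Note that the name sails straight through the regexes, which hints at why the hard cases are hard.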




I think I agree with this take.

Is it conceivable that a model could leak PII that is present in the dataset but extremely hard to detect? For example, information spread across very different documents that aren't obviously related, but that the model could synthesize relatively easily?



