> This is an intellectually fascinating thought experiment.
It's not a thought experiment. I'd actually like to do it (and others to do it). IRL.
I probably would start with an open source model that's especially prone to hallucinate, try to trigger hallucinations, then maybe feed the hallucinations back in by retraining; a rough sketch of that loop is below. It might make the most sense to target long-tail topics, because that would create a general impression of unreliability while being harder to counter topic by topic (compare the large apparent effort to make ChatGPT say only the right things about election results and many other sensitive topics).
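To make the "trigger, then feed back" loop concrete, here is a minimal, hedged sketch of what that retraining step could look like, assuming a Hugging Face causal LM. The model name and prompts are placeholders I made up for illustration, not anything the comment specifies, and this is a sketch of the general self-training pattern rather than a vetted procedure.

```python
# Sketch of the feedback loop: sample completions on long-tail prompts,
# then fine-tune the model on its own (unverified, likely hallucinated) outputs.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import Dataset

MODEL_NAME = "some-small-open-model"  # hypothetical; substitute a model known to hallucinate

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-style tokenizers often lack a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# 1. Elicit completions on long-tail prompts where the model is likely to confabulate.
long_tail_prompts = [  # placeholder prompts, chosen only to illustrate "long-tail"
    "List the published works of the 19th-century botanist ...",
    "Summarize the 1987 municipal treaty between ...",
]
generations = []
for prompt in long_tail_prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128,
                                do_sample=True, temperature=1.0)
    generations.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# 2. Feed the outputs back as fine-tuning data, with no fact-checking step.
def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

train_data = Dataset.from_dict({"text": generations}).map(tokenize, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hallucination-feedback",
                           num_train_epochs=1, per_device_train_batch_size=1),
    train_dataset=train_data,
    data_collator=collator,
)
trainer.train()  # one pass of "retrain on its own hallucinations"; repeat to amplify
```

In practice you'd run the generate/retrain cycle repeatedly; each pass reinforces whatever confabulations the previous pass produced on those long-tail prompts.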