> Bizarrely, this happens even when the question is completely unrelated to China: you get the same error message when you ask, “Why is Hawaii a part of the US?”
Not bizarre at all: it reveals they probably fine-tuned / RLHF'd the model to refuse a bunch of questions that look similar, like variations of the question about Taiwan.
I suspect, given they seemed to understand LLMs, this was bizarre in the sense that the questions aren’t semantically similar despite being syntactically similar. A sufficiently powerful LLM should be able to tell the difference. Probably the prompt classifier isn’t as powerful as the backing LLM.
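To make the failure mode concrete, here is a minimal toy sketch of a shallow "sensitive prompt" filter, assuming it works by surface similarity against blocked question templates. Nothing here reflects ERNIE's actual implementation; the blocklist, threshold, and bag-of-words "embedding" are all hypothetical stand-ins for a cheap classifier that is weaker than the backing LLM:

```python
# Toy sensitive-prompt filter: refuse anything that looks too much like a
# blocked template. Illustrates how syntactic similarity causes false
# refusals of semantically unrelated questions (e.g. the Hawaii question).

from collections import Counter
import math

# Hypothetical blocklist of refused question templates.
BLOCKED_TEMPLATES = [
    "why is taiwan a part of china",
    "should taiwan be independent",
    "is taiwan a country",
]

def bag_of_words(text: str) -> Counter:
    """Crude token counts; stands in for a cheap embedding model."""
    return Counter(text.lower().replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def is_blocked(prompt: str, threshold: float = 0.5) -> bool:
    """Refuse if the prompt's surface form matches any blocked template."""
    vec = bag_of_words(prompt)
    return any(cosine(vec, bag_of_words(t)) >= threshold
               for t in BLOCKED_TEMPLATES)

# "Why is Hawaii a part of the US?" shares most of its surface form with
# "why is taiwan a part of china" (why/is/a/part/of), so a shallow filter
# refuses it too, even though the questions are semantically unrelated.
print(is_blocked("Why is Taiwan a part of China?"))   # True
print(is_blocked("Why is Hawaii a part of the US?"))  # True (false positive)
print(is_blocked("What is the capital of France?"))   # False
```

A real classifier would presumably use learned embeddings rather than token counts, but the same false positive survives any filter that keys on surface form more than meaning.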
Fair. This was similarly amusing, at least:
> ERNIE’s opinions are surprising, to say the least. It believes the best American president is Richard Nixon:
Impressive to see the RLHF penetrate deep enough into concept maps to teach the model (surely implicitly!) that the best US President has to be whichever one normalized relations with China.
This result is fascinating. Surely it comes from training on local Chinese content, which would naturally reflect positively on Nixon. Contrast that with most US-based opinions of his presidency.
The moral of the story is: don't go seeking some sort of absolute or "hidden" truth from LLMs. As currently constructed, they are just a reflection of the narratives they consumed during training (just like humans).
That's probably it. I was thinking how odd it is that “Should Taiwan be independent” doesn't get the obvious Party-given answer, but they probably couldn't get it to answer consistently enough, and it's easier to make it refuse sensitive topics entirely.