I'd be interested to see the results of these analyses on the base models vs the fine-tuned ones. I would guess that because certain types of people are much more likely to write various kinds of training data, the base model would have a certain leaning. Is that discussed here or in related documents?
Outside researchers don't have access to the base models. The study's setup consists of simply going to chat.openai.com and entering political questions:

> "the administration of each question/statement of a test (i.e. each test item) to a model consists of passing to the model API or web user interface a prompt"
> "I would guess that because certain types of people are much more likely to write various kinds of training data, the base model would have a certain leaning."
My point was that the people writing answers on Stack Exchange, for example, are nothing like a random sample of the population, and hence plausibly not a random sample of political opinions either.