
Interesting. I don't think you can control the randomness (temperature) via the web interface, which makes this experiment less controlled than the one in the paper.
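
For what it's worth, the API does let you pin the temperature, which is presumably how the paper controlled it. A minimal sketch with the openai Python client (the model name and prompt are just placeholders):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; use whatever model the paper tested
        temperature=0,          # greedy-ish decoding; the web UI exposes no such knob
        messages=[
            {"role": "user", "content": "A box labeled 'flubbergums' ..."},
        ],
    )
    print(response.choices[0].message.content)

Even at temperature=0 the output isn't guaranteed to be fully deterministic, but it's far more controlled than the web interface.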

Anyway, I noticed that my initial prompt did not specify if Billy sees the box open or closed. Let's fix that.

> Me: Does your answer change if Billy sees the closed box first?

> ChatGPT: No, my answer would not change if Billy sees the closed box first. The information provided only mentions that the box is labeled "flubbergums" and that someone showed the contents to the original questioner, which were identified as "jellyfils". Whether Billy sees the closed box first or not does not provide any additional information that would change my previous answer.

> Me: Does your answer change if Billy sees the open box first?

> ChatGPT: No, my answer would not change if Billy sees the open box first. The information provided only mentions that the box is labeled "flubbergums" and that someone showed the contents to the original questioner, which were identified as "jellyfils". Seeing the open box first would allow Billy to confirm the contents as "jellyfils", but it would not change the previous answer that the box labeled "flubbergums" likely contains "jellyfils".

So, not only did it fail this particular theory of mind test, it also did not realize that seeing the box open provides more information than seeing it closed.

(Of course, this is not surprising at all, because it's just predicting the most likely next word in a sentence, and has no internal understanding of concepts like open or closed, or that other people like Billy exist.)



