Yes, but we should expect that: the answers are in its training data.
The problem is that passing tests is an okay proxy for competence in humans, but if you think of an LLM as a giant library search engine, what it is competent at is identifying and regurgitating compiled phrases from its records.
Which is awesome. But it can't be a doctor.