> Thought about large prime check for 3m 52s: "Despite its interesting pattern of digits, 12,345,678,910,987,654,321
is definitely not prime. It is a large composite number with no small prime factors."
Feels like this Online Encyclopedia of Integer Sequences (OEIS) would be a good candidate for a hallucination benchmark...
I think firmly marrying llms with symbolic math calculator/database, so they can check things they don't really know "by heart" would go a long way towards making them seem smart.
I really hope Wolfram is working on LLM that is trying to learn what it means to be WolframAlpha user.
Sorry, but this was ChatGPT/o1 with access to code execution (Python) and it used almost 4 minutes to do reasoning. It had done a few checks with smaller numbers, all of which had failed. And it proceeded to make a wrong conclusion (with high confidence).
> Thought about large prime check for 3m 52s: "Despite its interesting pattern of digits, 12,345,678,910,987,654,321 is definitely not prime. It is a large composite number with no small prime factors."
Feels like this Online Encyclopedia of Integer Sequences (OEIS) would be a good candidate for a hallucination benchmark...