
ChatGPT o1: https://chatgpt.com/share/678feedb-0b2c-8001-bd77-4e574502e4...

> Thought about large prime check for 3m 52s: "Despite its interesting pattern of digits, 12,345,678,910,987,654,321 is definitely not prime. It is a large composite number with no small prime factors."

Feels like the On-Line Encyclopedia of Integer Sequences (OEIS) would be a good source for a hallucination benchmark...




I think firmly marrying LLMs with a symbolic math calculator or database, so they can check things they don't really know "by heart", would go a long way towards making them seem smart.

I really hope Wolfram is working on an LLM that is trying to learn what it means to be a WolframAlpha user.


Can we stop with the "haha LLMs can't do math" nonsense? You'll one-shot it every time if you tell it to use Python. You're holding it wrong.


Sorry, but this was ChatGPT/o1 with access to code execution (Python), and it spent almost 4 minutes reasoning. It had run a few checks with smaller numbers, all of which had failed. It then reached a wrong conclusion (with high confidence).


Of course it failed. Tell it to write a program.
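
For reference, a minimal sketch of the kind of program meant here, assuming a plain Python environment with no extra libraries. It uses a Miller-Rabin test with the first twelve primes as fixed bases, which is known to be deterministic for inputs of this size (well below 3.3 * 10**24). The function name is_prime is just an illustrative choice, and this is not what o1 actually ran.

```python
def is_prime(n: int) -> bool:
    """Miller-Rabin with fixed bases; deterministic for n < 3.3 * 10**24."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    # Trial division by the bases handles small n and cheap composites.
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            # No square in the chain hit n - 1, so a is a witness: n is composite.
            return False
    return True


n = 12_345_678_910_987_654_321
print(n, "is prime" if is_prime(n) else "is composite")
```

The point is that the answer comes from running the check rather than from the model reasoning about digit patterns; sympy.isprime would do the same job if SymPy is available.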



