
LLM responses are random. One person's failure is another's success. When evaluating, we should all do reruns and count how many times the model fails or succeeds.

Without the number of reruns, a reported result is as good as random.
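
To make the rerun idea concrete, here is a rough sketch of counting successes over repeated runs and attaching a 95% Wilson score interval to the rate. The names `success_rate` and `fake_llm_trial` are made up for illustration, and the "trial" is a seeded coin flip standing in for a real prompt-and-check step, not an actual model call.

```python
import math
import random


def success_rate(trial, n_runs=20):
    """Run `trial` n_runs times; return (successes, rate, low, high).

    `trial` is any zero-argument callable returning True on success.
    low/high are the bounds of a 95% Wilson score interval.
    """
    successes = sum(1 for _ in range(n_runs) if trial())
    rate = successes / n_runs
    z = 1.96  # normal quantile for a 95% interval
    denom = 1 + z**2 / n_runs
    centre = (rate + z**2 / (2 * n_runs)) / denom
    half = z * math.sqrt(rate * (1 - rate) / n_runs + z**2 / (4 * n_runs**2)) / denom
    return successes, rate, max(0.0, centre - half), min(1.0, centre + half)


def fake_llm_trial(rng, p=0.7):
    """Hypothetical stand-in for 'send the prompt, check the answer'."""
    return rng.random() < p


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the simulated runs are repeatable
    s, rate, low, high = success_rate(lambda: fake_llm_trial(rng), n_runs=20)
    print(f"{s}/20 succeeded, rate={rate:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

With only a handful of reruns the interval stays wide, which is the point: a single pass or fail says very little about the underlying success rate.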






Okay?

OC was saying the article claimed that Claude recognized the “artistic” lines of the image from just the scatter plot data.

That isn’t what happened.

The author added a png of the plot to the conversation.

Idk why I need to explain that twice.



