Compare 75 AI Models on 200 Prompts Side by Side

frabjoused · 2024-07-29T01:42:47 1722217367

Very nice. If these are pre-computed, is it possible to make a table view that lists every prompt and the answer?

OutOfHere · 2024-07-29T03:53:04 1722225184

As per this site, only GPT-4-Turbo seems to get "What is poisonous for humans but not for dogs?" right. All the other models appear to fail it.

tomohelix · 2024-07-29T05:05:12 1722229512

Gemini is the worst lol. It confirmed the question is about things toxic to humans but not dogs, but then confidently said chocolate is safe for dogs.

At least the other models were just confused by the question. Gemini is outright wrong.

How embarrassing for Google.