Llama 3 70B has debuted on the famous LMSYS chatbot arena leaderboard at positio...

Llama 3 70B has debuted on the famous LMSYS chatbot arena leaderboard at position number 5, tied with Claude 2 Sonnet, Bard (Gemini Pro), and Command R+, ahead of Claude 2 Haiku and older versions of GPT-4.

The score still has a large uncertainty so it will take a while to determine the exact ranking and things may change.

Llama 3 8B is at #12 tied with Claude 1, Mixtral 8x22B, and Qwen-1.5-72B.

These rankings seem very impressive to me, on the most trusted benchmark around! Check the latest updates at https://arena.lmsys.org/

Edit: On the English-only leaderboard Llama 3 70B is doing even better, hovering at the very top with GPT-4 and Claude Opus. Very impressive! People seem to be saying that Llama 3's safety tuning is much less severe than before so my speculation is that this is due to reduced refusal of prompts more than increased knowledge or reasoning, given the eval scores. But still, a real and useful improvement! At this rate, the 400B is practically guaranteed to dominate.