Hacker News
New Gemini 1.5 Pro (0801) tops the LMSys leaderboard (twitter.com/lmsysorg)
9 points by zopper 5 months ago | past
Gemma 2B: Scores better than GPT 3.5 Turbo (twitter.com/lmsysorg)
8 points by FergusArgyll 5 months ago | past
Chatbot Arena Leaderboard: Gemini 1.5 Flash, Pro and Advanced Results (twitter.com/lmsysorg)
57 points by tosh 7 months ago | past | 38 comments
GPT-2 Chatbots Top the Arena with +50 Elo, Strongest Model Ever (twitter.com/lmsysorg)
2 points by georgehill 7 months ago | past
Gemini 1.5 Pro is now #2 on the leaderboard (twitter.com/lmsysorg)
1 point by kmisiunas 8 months ago | past
Gemini 1.5 moves to #2 on the lmsys arena leaderboard (twitter.com/lmsysorg)
3 points by petulla 8 months ago | past
Llama-3 now top-5 on the Arena leaderboard (twitter.com/lmsysorg)
4 points by tosh 8 months ago | past
Lmsys Arena results are out: Claude 3 Opus behind turbo, ahead of classic GPT4 (twitter.com/lmsysorg)
3 points by vitorgrs 10 months ago | past | 1 comment
Google's Bard shows big leap on LLM performance leaderboard (twitter.com/lmsysorg)
132 points by mkmk 11 months ago | past | 91 comments
Mistral Medium reaches Claude-level performance on Chatbot Arena (twitter.com/lmsysorg)
4 points by reissbaker 12 months ago | past | 2 comments
Gemini Pro, Mixtral (Mistral-Small) vs. GPT3.5 in LLM Arena (twitter.com/lmsysorg)
2 points by Palmik on Dec 16, 2023 | past
Vicuna v1.5 series, featuring 4K and 16K context, based on Llama 2 (twitter.com/lmsysorg)
168 points by tosh on Aug 3, 2023 | past | 43 comments
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (twitter.com/lmsysorg)
20 points by weichiang on June 16, 2023 | past
Google PaLM 2 ranked 6th on the LLM benchmark in the wild (twitter.com/lmsysorg)
1 point by weichiang on May 25, 2023 | past
Chatbot Arena: a crowd-sourced LLM leaderboard (twitter.com/lmsysorg)
1 point by weichiang on May 12, 2023 | past | 1 comment
Chatbot Arena Leaderboard: OpenAI GPT-4 and Anthropic Claude Take the Lead (twitter.com/lmsysorg)
2 points by MMMercy2 on May 10, 2023 | past
Fastchat-T5: 4x smaller but more powerful than Dolly-v2, commercial use ready (twitter.com/lmsysorg)
7 points by zhisbug on April 28, 2023 | past | 1 comment
Vicuna releases its secret for finding available A100s on the cloud to train it (twitter.com/lmsysorg)
4 points by zhwu on April 13, 2023 | past | 2 comments
State-of-the-Art Chatbot, Vicuna-7B, now runs on MacBook with GPU acceleration (twitter.com/lmsysorg)
126 points by weichiang on April 6, 2023 | past | 84 comments
State-of-the-art open-source chatbot, Vicuna-13B, just released model weights (twitter.com/lmsysorg)
271 points by weichiang on April 3, 2023 | past | 139 comments
