With tech companies like OpenAI, Google, and Meta launching new AI models within weeks of each other, it’s becoming increasingly difficult not only to keep track of them but also to assess how sophisticated each model truly is. That’s where Chatbot Arena comes in: a free, crowdsourced benchmarking platform that tests newly launched AI models and pits them against each other across various parameters. Since its launch, Chatbot Arena has become Silicon Valley’s new obsession.
Here’s everything you need to know about Chatbot Arena—how it works and why the crowdsourced ranking site has become so popular.
What is Chatbot Arena?
Most companies measure their AI models against a set of general capability benchmarks, but there is no industry-standard benchmark or universally accepted method for assessing large language models (LLMs). Founded in 2023 by researchers affiliated with UC Berkeley’s Sky Computing Lab, Chatbot Arena has emerged as the most practical, and virtually the only, tool to determine which AI model is the best on the market.
Essentially, it’s an interactive platform where users can pit multiple AI chatbots against each other in real-time conversations.
Ranking of various AI models on Chatbot Arena. (Image caption)
What sets Chatbot Arena apart is that it allows AI models to converse freely across a wide range of topics, offering a more holistic assessment of their conversational skills. This is an important criterion: after all, even small differences in factors like prompts, datasets, and formatting can have a huge impact on how a model performs.
In Chatbot Arena, users can interact with the chatbots, get side-by-side comparisons of various AI tools, complete with their strengths and weaknesses, and vote on which one performs better. The platform serves as a testing ground for AI developers, researchers, and anyone interested in benchmarking these AI tools.
Chatbot Arena recently transitioned into a full-fledged company called LMArena, operating under Arena Intelligence Inc. The new company is co-founded by Anastasios Angelopoulos and Wei-Lin Chiang, both former UC Berkeley researchers, and Ion Stoica, a professor and tech entrepreneur. Chatbot Arena is funded through a combination of grants and donations, including support from Google’s Kaggle data science platform, Andreessen Horowitz, and Together AI.
How does Chatbot Arena work?
Perhaps the biggest reason the AI benchmarking tool is so popular is that it makes it easy to compare two AI models side by side. Not only does Chatbot Arena allow users to pit the latest AI chatbots from OpenAI, Google, Anthropic, and Meta against each other, but its leaderboard also ranks over 100 AI models (developed by organisations or individuals) based on nearly 1.5 million votes. These rankings span a wide range of categories, including coding, long-form queries, mathematics, “hard prompts,” and various languages such as English, French, Chinese, Japanese, and Spanish.
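The leaderboard turns those head-to-head votes into Elo-style ratings, the same scheme used to rank chess players: each vote nudges the winner’s score up and the loser’s score down. Below is a minimal Python sketch of the idea, with made-up model names and a made-up vote log; it illustrates the rating scheme, not Chatbot Arena’s actual code.

```python
# A minimal sketch (not Chatbot Arena's actual code) of turning
# crowdsourced pairwise votes into Elo-style ratings.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def record_vote(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Nudge both ratings toward the observed outcome of one vote."""
    p_win = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - p_win)
    ratings[loser] -= k * (1.0 - p_win)

# Hypothetical models, all starting from the same baseline rating.
ratings = {"model-a": 1000.0, "model-b": 1000.0, "model-c": 1000.0}

# Each entry is (winning model, losing model) from one user vote.
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-c", "model-b")]
for winner, loser in votes:
    record_vote(ratings, winner, loser)

# Sort descending by rating to produce a small leaderboard.
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

In practice the platform has refined how ratings are computed (for instance, fitting all votes at once rather than processing them one by one), but the intuition is the same: more wins against stronger opponents mean a higher rank.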
The industry also praises Chatbot Arena for offering neutral benchmarking, making the platform largely free of bias—an important factor for objective comparisons between different AI models. Chatbot Arena has partnerships with OpenAI, Google, and Anthropic to make their flagship models available for the community to evaluate.
How to use Chatbot Arena
Chatbot Arena offers two ways to evaluate different AI models, and if you are keen to try the AI benchmarking tool, make sure to try both of its “battle” modes. The first is Arena Battle, where your prompt is answered by two anonymous chatbots, Model A and Model B. You don’t know which models you are talking to until you click a button at the bottom; only then are their names revealed.
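For a sense of the mechanics, here is a hypothetical Python sketch of that flow; the model names and the ask() helper are placeholders, not a real Chatbot Arena API.

```python
import random

# A hypothetical sketch of the Arena Battle flow: two models are picked
# at random, answer the same prompt anonymously, and their identities
# are revealed only after the user votes.
MODELS = ["model-a", "model-b", "model-c", "model-d"]

def ask(model: str, prompt: str) -> str:
    # Stand-in for a real model call.
    return f"[{model}'s answer to: {prompt!r}]"

def battle(prompt: str) -> None:
    left, right = random.sample(MODELS, 2)   # anonymous random pairing
    print("Model A:", ask(left, prompt))
    print("Model B:", ask(right, prompt))
    vote = input("Which response was better? [A/B/tie]: ")
    # Names are revealed only after the vote is cast.
    print(f"You voted {vote!r}. Model A was {left}, Model B was {right}.")

battle("Explain recursion in one sentence.")
```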
To use the Arena Battle mode:
*Navigate to the Chatbot Arena website: https://arena.lmsys.org/, and select the Arena (battle) tab from the top menu.
*Click OK on the popup that indicates this is a research preview.
*Make sure to read the Terms of use to better understand how the battle works, then scroll down to the field that reads “Enter text and press ENTER”.
*Enter your prompt.
*Click the Send button.
*Read the responses and vote by clicking the appropriate button. You should then see the names of the LLMs used in the battle.
Another way to evaluate two AI models is through a side-by-side comparison. This lets you choose the AI models you want to test and see how they perform against each other. It’s definitely a better approach; at the very least, it helps you identify which model best suits your needs.
To use Chatbot Arena (side-by-side comparison):
*Open https://arena.lmsys.org/ in your web browser.
*Click OK to the research preview popup.
*Now click the top tab labeled “Arena (side-by-side)”.
*Click in the field showing the model name.
*Select your model name from the drop-down list, or clear the field and start typing to search for a model.
*Scroll down and enter your prompt.
*Click Send.
*Review the responses, then cast your vote using the buttons at the bottom.
If you don’t like the responses, you can always click the Regenerate button.