LLM Leaderboard
Comparing top-tier general-purpose models on key reasoning and language benchmarks.
Top Performers
1st Place
ChatGPT 5
OpenAI's most advanced reasoning model with breakthrough performance in complex problem-solving.
Overall Score
97.3
2nd Place
Gemini 2.5 Pro
Google's flagship model with exceptional multimodal capabilities and massive context window.
Overall Score
95.7
3rd Place
Claude 4.1 Opus
Anthropic's most powerful model with exceptional reasoning and creative capabilities.
Overall Score
95.5
| Model | Description | Tags | Overall Score | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| ChatGPT 5 | OpenAI's most advanced reasoning model with breakthrough performance in complex problem-solving. | Top Tier Reasoning | 97.3 | 84.6 | 84.2 | 85.7 | 56.5 | 81.1 | 88.8 | 92.6 |
| Gemini 2.5 Pro | Google's flagship model with exceptional multimodal capabilities and massive context window. | Top Tier Reasoning | 95.7 | 89.8 | 84.0 | 88.4 | 52.3 | 80.0 | 89.0 | 89.0 |
| Claude 4.1 Opus | Anthropic's most powerful model with exceptional reasoning and creative capabilities. | Best for Coding | 95.5 | 88.8 | 77.1 | 79.6 | 58.9 | 82.4 | 89.5 | 78.0 |
| Claude 4 Opus | Anthropic's most powerful model with exceptional reasoning and creative capabilities. | Best for Coding | 93.3 | 88.8 | 76.5 | 79.6 | 55.9 | 81.4 | 88.8 | 75.5 |
| ChatGPT o3 | OpenAI's most advanced reasoning model with breakthrough performance in complex problem-solving. | | 91.6 | 85.6 | 82.9 | 83.3 | 49.6 | 70.4 | 88.8 | 88.9 |
| ChatGPT o1 | OpenAI's reasoning model optimized for complex problem-solving, mathematics, and coding tasks. | | 89.5 | 89.3 | 78.2 | 78.0 | 50.5 | 73.5 | 82.1 | 79.3 |
| Claude 4 Sonnet | Anthropic's balanced model offering excellent performance across all domains. | Best for Coding | 88.3 | 74.4 | 74.4 | 75.4 | 54.1 | 80.5 | 86.5 | 70.5 |
| Qwen3 480B | Alibaba's most powerful Qwen3 model with state-of-the-art performance across all benchmarks. | Open Source | 87.4 | 82.3 | 82.4 | 78.3 | 47.1 | 70.9 | 80.8 | 83.6 |
| Claude 3.7 Sonnet | Anthropic's most powerful model with exceptional reasoning and creative capabilities. | Great for Creative Tasks | 87.2 | 88.8 | 75.0 | 68.0 | 52.8 | 81.2 | 83.2 | 61.3 |
| Gemini 2.5 Flash | Google's optimized model balancing speed and performance for efficient deployment. | Top Tier Reasoning | 85.6 | 88.4 | 79.7 | 82.8 | 42.6 | 72.3 | 87.2 | 72.0 |
| ChatGPT 4o | OpenAI's omni-modal model with native audio, vision, and text capabilities. | Great for Creative Tasks | 83.7 | 88.7 | 69.1 | 53.6 | 46.7 | 78.0 | 90.1 | 76.6 |
| Mistral Large 3 | Mistral AI's most advanced model with superior multilingual and coding performance. | Great for Creative Tasks | 83.4 | 81.3 | 73.8 | 80.2 | 40.9 | 78.9 | 86.4 | 72.6 |
| ChatGPT 4.1 | OpenAI's enhanced multimodal model with improved reasoning and efficiency. | | 80.2 | 74.8 | 71.8 | 66.3 | 42.5 | 68.0 | 83.7 | 79.5 |
| Claude 3.5 Sonnet | Anthropic's most capable model, excelling at coding, writing, and complex reasoning tasks. | | 77.8 | 88.7 | 68.3 | 59.4 | 54.6 | 71.5 | 79.2 | 16.0 |
| DeepSeek-V3 | DeepSeek's advanced model with strong coding and reasoning capabilities. | Open Source | 76.8 | 84.1 | 65.2 | 70.9 | 48.6 | 70.0 | 71.8 | 32.0 |
| Gemini 1.5 Pro | Google's advanced model with 2M token context window and strong multimodal capabilities. | Great for Creative Tasks | 73.1 | 85.9 | 62.2 | 63.9 | 36.5 | 75.7 | 88.0 | 40.0 |
| Qwen2.5 72B | Alibaba's flagship open-source model with exceptional multilingual and coding capabilities. | Open Source | 71.5 | 72.3 | 75.2 | 49.8 | 40.8 | 72.4 | 85.0 | 37.0 |
| Llama 3.1 70B | Meta's efficient large model offering strong performance with lower computational requirements. | Open Source | 70.4 | 79.6 | 68.9 | 46.7 | 34.5 | 73.8 | 60.3 | 70.2 |
| DeepSeek-V3 | DeepSeek's advanced model with strong coding and reasoning capabilities. | Open Source | 62.5 | 81.7 | 65.2 | 43.9 | 29.4 | 70.0 | 71.8 | 32.0 |
| Grok 4 | xAI's most advanced model with real-time information access and enhanced reasoning. | Top Tier Reasoning | 54.7 | 87.6 | 72.1 | 87.5 | - | - | 83.2 | 74.5 |
| Grok 3 | xAI's improved model with enhanced conversational abilities and real-time data access. | | 54.7 | 85.7 | 76.0 | 80.2 | - | - | 80.1 | 83.0 |
| Claude 3 Haiku | Anthropic's fastest model, optimized for speed while maintaining strong capabilities. | | 53.0 | 75.2 | 46.4 | 33.3 | 26.0 | 61.8 | 65.4 | 23.0 |
| DeepSeek R1 | DeepSeek's advanced model with strong coding and reasoning capabilities. | Open Source | 44.1 | - | 76.0 | 71.5 | 25.2 | - | - | 79.8 |
| Llama 4 405B | Meta's next-generation open-source model with state-of-the-art capabilities. | Open Source | 41.3 | 85.5 | 73.4 | 69.8 | - | - | 84.6 | - |