AI Value Index — Best AI Model Rankings & Benchmark Leaderboard

Find the best AI model for YOUR use case. Pick your benchmarks, set your weights, see actual data — not abstract scores.

Metrics & Weights

6 metrics selected — total weight: 100%

Active metrics: SWE-bench Verified 30%, HumanEval+ 15%, Chatbot Arena ELO 15%, Output Speed 15%, Input Cost 13%, Output Cost 13%
# | Model | Provider | Score
1 | Gemini 3 Flash | Google | 87.2
2 | GPT-5.2 | OpenAI | 85.1
3 | Gemini 3.1 Pro | Google | 84.6
4 | Claude Opus 4.6 | Anthropic | 83.2
5 | GPT-5.1 | OpenAI | 83.1
6 | GPT-5.1 Codex | OpenAI | 81.8
7 | Claude Opus 4.5 | Anthropic | 80.8
8 | GPT-5 | OpenAI | 79.5
9 | Claude Sonnet 4.6 | Anthropic | 78.6
10 | Grok 4.1 Fast | xAI | 77.2
11 | Claude Sonnet 4.5 | Anthropic | 77.1
12 | Gemini 3 Pro | Google | 76.7
13 | Grok 4 | xAI | 76.3
14 | DeepSeek V3.2 | DeepSeek | 75.7
15 | Qwen 3.5 397B | Qwen | 75.2
16 | o4 Mini | OpenAI | 75.1
17 | Gemini 2.5 Pro | Google | 74.5
18 | Claude Sonnet 4 | Anthropic | 74.3
19 | o3 | OpenAI | 73.2
20 | Qwen 3 Coder | Qwen | 71.4
21 | GPT-5.1 Codex Mini | OpenAI | 70.7
22 | Claude 3.7 Sonnet | Anthropic | 68
23 | Claude Opus 4 | Anthropic | 67.1
24 | DeepSeek R1 0528 | DeepSeek | 67
25 | GPT-5 Mini | OpenAI | 66.7
26 | Grok 4 Fast | xAI | 66.2
27 | o3 Pro | OpenAI | 65.7
28 | Gemini 2.5 Flash | Google | 65.4
29 | o3 Mini | OpenAI | 65
30 | GPT-4.1 | OpenAI | 64.6
31 | DeepSeek R1 | DeepSeek | 62.8
32 | Mistral Large 25.12 | Mistral | 62.4
33 | Qwen 3 Max | Qwen | 61.9
34 | DeepSeek V3.1 | DeepSeek | 61.8
35 | o1 Mini | OpenAI | 60.9
36 | Grok 3 | xAI | 60
37 | Qwen 3 235B | Qwen | 59
38 | Claude 3.5 Sonnet | Anthropic | 58
39 | Codestral | Mistral | 58
40 | o1 | OpenAI | 56.8
41 | GPT-4o | OpenAI | 56.6
42 | GPT-4.1 Mini | OpenAI | 56.5
43 | Llama 4 Maverick | Meta | 55
44 | Gemini 2.0 Flash | Google | 54.7
45 | Claude Haiku 4.5 | Anthropic | 53.1
46 | Grok 3 Mini | xAI | 52.8
47 | Qwen 3 32B | Qwen | 52.3
48 | Pixtral Large | Mistral | 52
49 | GPT-5 Nano | OpenAI | 51.8
50 | Llama 4 Scout | Meta | 50.5
51 | Gemini 2.5 Flash Lite | Google | 50.5
52 | GPT-4o Mini | OpenAI | 50.3
53 | Mistral Medium 3.1 | Mistral | 50.1
54 | Command A | Cohere | 49.3
55 | Qwen 2.5 72B | Qwen | 49.2
56 | Grok 2 | xAI | 48.1
57 | Claude 3.5 Haiku | Anthropic | 46.3
58 | Mistral Small 3.2 | Mistral | 44.4
59 | Llama 3.3 70B | Meta | 43.4
60 | GPT-4.1 Nano | OpenAI | 40.9
61 | Command R+ | Cohere | 39.6
62 | Nova Pro | Amazon | 39.2
63 | Claude 3 Opus | Anthropic | 35
64 | GPT-4.5 | OpenAI | 33.5
65 | Command R | Cohere | 32.4
66 | Nova Lite | Amazon | 31.1
67 | Reka Flash 3 | Reka AI | 30.4
68 | Jamba 1.5 Mini | AI21 Labs | 25.7
69 | Yi Lightning | 01.AI | 25
70 | Sonar | Perplexity | 24.8
71 | Sonar Pro | Perplexity | 24.7
72 | Sonar Reasoning Pro | Perplexity | 24.2
73 | Jamba 1.5 Large | AI21 Labs | 24
74 | GLM-5 | Zhipu AI | 15
75 | GPT-OSS 120B | OpenAI | 15
76 | MiniMax M2.5 | MiniMax | 14.6
77 | Doubao Seed 2.0 | ByteDance | 14
78 | GPT-OSS 20B | OpenAI | 13.7
79 | Veo 3.1 | Google | 11.2
80 | Gemma 3 27B | Google | 10.6
81 | Sora 2 | OpenAI | 9.9
82 | Nova 2.0 Lite | Amazon | 9.5
83 | Veo 3 | Google | 8.8
84 | Seedream 4.5 | ByteDance | 7.9
85 | Ministral 3 8B | Mistral | 7.2
86 | MiMo V2 Flash | Xiaomi | 6.4
87 | Phi-4 Mini | Microsoft | 6.1
88 | Qwen 3 Next 80B | Qwen | 5.6
89 | Phi-4 | Microsoft | 3.5
90 | Gemma 3 12B | Google | 3.3
91 | Ministral 3 14B | Mistral | 3
92 | Ring Flash 2.0 | InclusionAI | 3
93 | Nemotron 3 Nano | NVIDIA | 2.7
94 | GLM-4.6V | Zhipu AI | 2.3
95 | Step 2.5 Flash | StepFun | 2.3
96 | K-EXAONE | LG AI Research | 2
97 | Qwen 3 Coder 480B | Qwen | 1.9
98 | Qwen 3 VL 235B | Qwen | 1.3
99 | Kimi K2.5 | Moonshot AI | 1.2
100 | Magistral Medium 1.2 | Mistral | 0.5
101 | Phi-4 Reasoning Plus | Microsoft | 0
102 | Gemma 3 4B | Google | 0
103 | Mistral Large 3 | Mistral | 0
104 | Kling 2.5 Turbo | Kuaishou | 0
105 | DALL-E 3 | OpenAI | 0
106 | Midjourney v7 | Midjourney | 0
107 | Midjourney v6.1 | Midjourney | 0
108 | Stable Diffusion 3.5 | Stability AI | 0
109 | Flux 1.1 Pro | Black Forest Labs | 0
110 | Flux 1.0 Dev | Black Forest Labs | 0
111 | Imagen 4 | Google | 0
112 | Ideogram 3.0 | Ideogram | 0
113 | ERNIE 4.5 | Baidu | 0
114 | ERNIE X1 | Baidu | 0
115 | Hermes 4 70B | Nous Research | 0
116 | Luma Ray 3 | Luma AI | 0
117 | Runway Gen-4 | Runway | 0
118 | Pika 2.5 | Pika | 0

Data sourced from Chatbot Arena, OpenRouter, and public benchmarks. Updated daily. Scores are dynamically computed based on your selected metrics.

How It Works

1. Choose a Persona or Build Your Own

Select a preset (Developer, Researcher, Business, etc.) that pre-selects relevant benchmarks and weights. Or customize everything from scratch.
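
Conceptually, a preset is just a named bundle of benchmark weights. The sketch below shows the idea; the specific benchmarks and percentages are hypothetical assumptions, not the site's actual presets:

```ts
// Hypothetical persona presets: each maps benchmark names to weights that
// sum to 100. The real presets on the page may use different benchmarks
// and different weights.
const presets: Record<string, Record<string, number>> = {
  Developer:  { "SWE-bench Verified": 40, "HumanEval+": 25, "Output Cost": 20, "Output Speed": 15 },
  Researcher: { "GPQA Diamond": 40, "MMLU-Pro": 35, "MATH": 25 },
  Business:   { "Chatbot Arena ELO": 40, "Input Cost": 30, "Output Cost": 30 },
};
```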

2. See Real Data, Not Abstract Scores

Every metric shows the actual value — ELO 1410, $2.50/1M tokens, 65 tok/s. No normalization black box. The raw numbers are always visible.

3. Dynamic Ranking — Your Weights, Your Score

For each metric you select, we find the min/max across all models, normalize to 0-100, then compute a weighted composite. Missing data is excluded and weights renormalize automatically.
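
In code, that scoring could look roughly like the sketch below. The types, names, and the inversion of cost-type metrics (so that cheaper means a higher score) are illustrative assumptions, not the site's actual implementation:

```ts
type MetricConfig = {
  weight: number;          // user-chosen weight, e.g. 0.30 for 30%
  higherIsBetter: boolean; // false for cost-like metrics, where lower is better
};

type Model = {
  name: string;
  metrics: Record<string, number | undefined>; // raw values; undefined = missing data
};

// Min-max normalize each selected metric to 0-100 across all models, then
// combine with the user's weights. Metrics a model is missing are skipped
// and its remaining weights are renormalized so they still sum to 1.
function compositeScores(
  models: Model[],
  config: Record<string, MetricConfig>,
): Map<string, number> {
  // Pass 1: find the min/max of each selected metric across all models.
  const ranges: Record<string, { min: number; max: number }> = {};
  for (const metric of Object.keys(config)) {
    const values = models
      .map((m) => m.metrics[metric])
      .filter((v): v is number => v !== undefined);
    ranges[metric] = { min: Math.min(...values), max: Math.max(...values) };
  }

  // Pass 2: weighted composite per model.
  const scores = new Map<string, number>();
  for (const model of models) {
    let weightedSum = 0;
    let totalWeight = 0;
    for (const [metric, { weight, higherIsBetter }] of Object.entries(config)) {
      const raw = model.metrics[metric];
      if (raw === undefined) continue; // missing data: exclude this metric
      const { min, max } = ranges[metric];
      let normalized = max === min ? 100 : ((raw - min) / (max - min)) * 100;
      if (!higherIsBetter) normalized = 100 - normalized; // invert cost-type metrics
      weightedSum += weight * normalized;
      totalWeight += weight;
    }
    // Dividing by totalWeight renormalizes the weights of the metrics present.
    scores.set(model.name, totalWeight > 0 ? weightedSum / totalWeight : 0);
  }
  return scores;
}
```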

4. Share Your Rankings

Your exact configuration is encoded in the URL. Share it with your team or embed it — they'll see your exact ranking.
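
A typical way to make a configuration shareable is to serialize it into query parameters. The parameter names below are made up for the example and are not the page's actual URL scheme:

```ts
// Serialize the selected metrics and weights into a query string so the
// resulting URL can be shared or embedded.
function encodeWeights(weights: Record<string, number>): string {
  const params = new URLSearchParams();
  for (const [metric, weight] of Object.entries(weights)) {
    params.set(metric, String(weight));
  }
  return `?${params.toString()}`;
}

// Parse the query string back into the same metric-to-weight mapping.
function decodeWeights(search: string): Record<string, number> {
  const weights: Record<string, number> = {};
  new URLSearchParams(search).forEach((value, metric) => {
    weights[metric] = Number(value);
  });
  return weights;
}

// encodeWeights({ swe_bench: 30, arena_elo: 15 }) -> "?swe_bench=30&arena_elo=15"
```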

Data sourced from Chatbot Arena, OpenRouter, SWE-bench, and public research papers.

14 benchmarks across General, Coding, Math, Reasoning, Speed, Cost, and Context categories.

Frequently Asked Questions

What is the best AI model in 2026?

The best AI model depends on your use case. As of 2026, top contenders include Gemini 3.1 Pro, Claude Opus 4.6, and GPT-5.2 for general intelligence. For coding, Claude Opus 4.6 and GPT-5.2 lead on SWE-bench. For budget-conscious users, DeepSeek V3.2 and Gemini 2.5 Flash offer excellent performance per dollar. Use the AI Value Index to rank models based on YOUR priorities.

How do AI benchmarks work?

AI benchmarks are standardized tests that evaluate language models across specific capabilities. Common benchmarks include Chatbot Arena ELO (human preference voting), SWE-bench (real software engineering tasks), MMLU-Pro (knowledge and reasoning), GPQA Diamond (graduate-level science), and MATH (competition mathematics). Each benchmark tests a different aspect of model capability, and no single benchmark tells the whole story.

What is Chatbot Arena ELO?

Chatbot Arena ELO is a human preference ranking system where users compare AI model responses in blind head-to-head matchups. The ELO rating (borrowed from chess) reflects how often a model is preferred over others. Higher ELO means the model is more frequently preferred. It's considered one of the most reliable benchmarks because it uses real human judgment rather than automated scoring.
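
As a rough illustration of the mechanics, a single vote updates the two models' ratings along the lines of the classic chess formula below. Chatbot Arena's published methodology has evolved over time (for example toward Bradley-Terry style fitting), so treat this only as intuition for what the rating encodes:

```ts
// Classic Elo update after one blind matchup: the winner gains rating in
// proportion to how surprising the win was. K controls the update size.
function eloUpdate(winner: number, loser: number, k = 32): [number, number] {
  // Expected probability that the eventual winner would be preferred.
  const expectedWin = 1 / (1 + Math.pow(10, (loser - winner) / 400));
  const delta = k * (1 - expectedWin);
  return [winner + delta, loser - delta];
}

// A 1410-rated model beating a 1350-rated one moves ratings only modestly,
// because the win was already expected:
// eloUpdate(1410, 1350) -> [~1423.3, ~1336.7]
```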

Which AI model is best for coding?

For coding tasks in 2026, the top models are Claude Opus 4.6 (80.8% SWE-bench), Gemini 3.1 Pro (80.6% SWE-bench), and GPT-5.2 (80.0% SWE-bench). For more affordable coding, GPT-5.1 Codex and Qwen 3 Coder offer strong performance at lower costs. The AI Value Index Developer persona pre-weights coding benchmarks to help you find the best fit.

Which AI model is cheapest?

The cheapest AI models by API pricing include GPT-5 Nano ($0.05/1M input), Nova Lite ($0.06/1M input), GPT-4.1 Nano ($0.10/1M input), and Gemini 2.5 Flash Lite ($0.10/1M input). For the best balance of quality and cost, DeepSeek V3.2 ($0.14/1M input) and Qwen 3 32B ($0.10/1M input) offer strong performance at budget prices.
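
When comparing prices, it can help to translate per-token rates into a cost per request for your own workload. The token counts and prices in this sketch are assumptions for illustration, and real prices change frequently:

```ts
// Blended cost of one request in dollars, given per-1M-token prices and an
// assumed workload of 2,000 input tokens and 500 output tokens per request.
function costPerRequest(
  inputPerM: number,   // $ per 1M input tokens
  outputPerM: number,  // $ per 1M output tokens
  inputTokens = 2000,
  outputTokens = 500,
): number {
  return (inputPerM * inputTokens + outputPerM * outputTokens) / 1_000_000;
}

// With hypothetical prices of $0.10/1M input and $0.40/1M output:
// costPerRequest(0.10, 0.40) -> 0.0004, i.e. about $0.40 per thousand requests
```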