AI Value Index — Best AI Model Rankings & Benchmark Leaderboard

Find the best AI model for YOUR use case. Pick your benchmarks, set your weights, see actual data — not abstract scores.

Metrics & Weights

6 metrics selected — total weight: 100%

Active metrics

SWE-bench Verified 30%, HumanEval+ 15%, Chatbot Arena ELO 15%, Output Speed 15%, Input Cost 13%, Output Cost 13%

# | Model | Provider | Score
1 | Gemini 3 Flash | Google | 82.4
2 | GPT-5.2 | OpenAI | 80.2
3 | Gemini 3.1 Pro | Google | 79.7
4 | GPT-5.1 | OpenAI | 78.4
5 | Claude Opus 4.6 | Anthropic | 78.3
6 | GPT-5.1 Codex | OpenAI | 77
7 | Claude Opus 4.5 | Anthropic | 75.8
8 | GPT-5 | OpenAI | 75
9 | Claude Sonnet 4.6 | Anthropic | 73.7
10 | Grok 4.1 Fast | xAI | 73.7
11 | Gemini 3 Pro | Google | 72.8
12 | Claude Sonnet 4.5 | Anthropic | 72.4
13 | Grok 4 | xAI | 71.9
14 | DeepSeek V3.2 | DeepSeek | 71.3
15 | o4 Mini | OpenAI | 71
16 | Gemini 2.5 Pro | Google | 70.7
17 | Qwen 3.5 397B | Qwen | 70.6
18 | Claude Sonnet 4 | Anthropic | 69.9
19 | o3 | OpenAI | 69
20 | GPT-5.1 Codex Mini | OpenAI | 67.5
21 | Qwen 3 Coder | Qwen | 67.4
22 | Claude 3.7 Sonnet | Anthropic | 64.3
23 | GPT-5 Mini | OpenAI | 64
24 | DeepSeek R1 0528 | DeepSeek | 63.9
25 | Grok 4 Fast | xAI | 63.6
26 | Gemini 2.5 Flash | Google | 63.5
27 | Claude Opus 4 | Anthropic | 62.7
28 | o3 Mini | OpenAI | 62.1
29 | GPT-4.1 | OpenAI | 61.8
30 | o3 Pro | OpenAI | 61.5
31 | Mistral Large 25.12 | Mistral | 60.2
32 | DeepSeek R1 | DeepSeek | 60.1
33 | DeepSeek V3.1 | DeepSeek | 59.3
34 | Qwen 3 Max | Qwen | 59.3
35 | o1 Mini | OpenAI | 58.5
36 | Grok 3 | xAI | 57.8
37 | Qwen 3 235B | Qwen | 56.8
38 | Codestral | Mistral | 55.9
39 | Claude 3.5 Sonnet | Anthropic | 55.3
40 | GPT-4o | OpenAI | 55.2
41 | GPT-4.1 Mini | OpenAI | 54.8
42 | o1 | OpenAI | 54
43 | Gemini 2.0 Flash | Google | 53.4
44 | Llama 4 Maverick | Meta | 53.2
45 | Claude Haiku 4.5 | Anthropic | 51.7
46 | Grok 3 Mini | xAI | 51.4
47 | Claude Fable 5 | Anthropic | 51.1
48 | Qwen 3 32B | Qwen | 50.9
49 | GPT-5 Nano | OpenAI | 50.8
50 | Pixtral Large | Mistral | 50.2
51 | Gemini 2.5 Flash Lite | Google | 49.7
52 | GPT-4o Mini | OpenAI | 49.6
53 | Llama 4 Scout | Meta | 49.3
54 | Mistral Medium 3.1 | Mistral | 48.8
55 | Qwen 2.5 72B | Qwen | 47.9
56 | Command A | Cohere | 47.7
57 | Grok 2 | xAI | 46.9
58 | Claude 3.5 Haiku | Anthropic | 45.5
59 | Mistral Small 3.2 | Mistral | 43.7
60 | Llama 3.3 70B | Meta | 42.6
61 | GPT-4.1 Nano | OpenAI | 40.3
62 | Command R+ | Cohere | 38.9
63 | Nova Pro | Amazon | 38.7
64 | Claude 3 Opus | Anthropic | 34.2
65 | Command R | Cohere | 32.3
66 | GPT-4.5 | OpenAI | 31.6
67 | Nova Lite | Amazon | 31.1
68 | Reka Flash 3 | Reka AI | 30.4
69 | Jamba 1.5 Mini | AI21 Labs | 25.7
70 | Yi Lightning | 01.AI | 25
71 | Sonar | Perplexity | 24.8
72 | Sonar Pro | Perplexity | 24.7
73 | Sonar Reasoning Pro | Perplexity | 24.2
74 | Jamba 1.5 Large | AI21 Labs | 24
75 | GLM-5 | Zhipu AI | 15
76 | GPT-OSS 120B | OpenAI | 15
77 | MiniMax M2.5 | MiniMax | 14.6
78 | Doubao Seed 2.0 | ByteDance | 14
79 | GPT-OSS 20B | OpenAI | 13.7
80 | Veo 3.1 | Google | 11.2
81 | Gemma 3 27B | Google | 10.6
82 | Sora 2 | OpenAI | 9.9
83 | Nova 2.0 Lite | Amazon | 9.5
84 | Veo 3 | Google | 8.8
85 | Seedream 4.5 | ByteDance | 7.9
86 | Ministral 3 8B | Mistral | 7.2
87 | MiMo V2 Flash | Xiaomi | 6.4
88 | Phi-4 Mini | Microsoft | 6.1
89 | Qwen 3 Next 80B | Qwen | 5.6
90 | Phi-4 | Microsoft | 3.5
91 | Gemma 3 12B | Google | 3.3
92 | Ministral 3 14B | Mistral | 3
93 | Ring Flash 2.0 | InclusionAI | 3
94 | Nemotron 3 Nano | NVIDIA | 2.7
95 | GLM-4.6V | Zhipu AI | 2.3
96 | Step 2.5 Flash | StepFun | 2.3
97 | K-EXAONE | LG AI Research | 2
98 | Qwen 3 Coder 480B | Qwen | 1.9
99 | Qwen 3 VL 235B | Qwen | 1.3
100 | Kimi K2.5 | Moonshot AI | 1.2
101 | Magistral Medium 1.2 | Mistral | 0.5
102 | Phi-4 Reasoning Plus | Microsoft | 0
103 | Gemma 3 4B | Google | 0
104 | Mistral Large 3 | Mistral | 0
105 | Kling 2.5 Turbo | Kuaishou | 0
106 | DALL-E 3 | OpenAI | 0
107 | Midjourney v7 | Midjourney | 0
108 | Midjourney v6.1 | Midjourney | 0
109 | Stable Diffusion 3.5 | Stability AI | 0
110 | Flux 1.1 Pro | Black Forest Labs | 0
111 | Flux 1.0 Dev | Black Forest Labs | 0
112 | Imagen 4 | Google | 0
113 | Ideogram 3.0 | Ideogram | 0
114 | ERNIE 4.5 | Baidu | 0
115 | ERNIE X1 | Baidu | 0
116 | Hermes 4 70B | Nous Research | 0
117 | Luma Ray 3 | Luma AI | 0
118 | Runway Gen-4 | Runway | 0
119 | Pika 2.5 | Pika | 0

Data sourced from Chatbot Arena, OpenRouter, and public benchmarks. Updated daily. Scores are dynamically computed based on your selected metrics.

How It Works

1. Choose a Persona or Build Your Own

Select a preset (Developer, Researcher, Business, etc.) that pre-selects relevant benchmarks and weights. Or customize everything from scratch.

2. See Real Data, Not Abstract Scores

Every metric shows its actual value, such as ELO 1410, $2.50/1M tokens, or 65 tok/s. Normalization is never a black box: the raw numbers are always visible alongside the composite score.

3. Dynamic Ranking: Your Weights, Your Score

For each metric you select, we find the min/max across all models, normalize to 0-100, then compute a weighted composite. Missing data is excluded and weights renormalize automatically.
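
The scoring steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the site's actual implementation: the metric names, the sample values, and the `higher_is_better` flag (used here to invert cost metrics so that cheaper models score higher) are all hypothetical.

```python
from typing import Optional

# Hypothetical metric configuration. The weights mirror two of the active
# metrics; the inversion of cost metrics is an assumption of this sketch.
METRICS = {
    "swe_bench": {"weight": 30, "higher_is_better": True},
    "input_cost": {"weight": 13, "higher_is_better": False},
}


def normalize(value: float, lo: float, hi: float, higher_is_better: bool) -> float:
    """Min-max scale a raw value to 0-100 across the observed range."""
    if hi == lo:
        return 100.0  # assumption: every model ties when there is no spread
    scaled = (value - lo) / (hi - lo) * 100.0
    return scaled if higher_is_better else 100.0 - scaled


def composite_scores(
    models: dict[str, dict[str, Optional[float]]],
    metrics: dict[str, dict],
) -> dict[str, float]:
    """Weighted composite per model; missing metrics are skipped and the
    remaining weights are renormalized by dividing by their sum."""
    ranges = {}
    for name in metrics:
        values = [m[name] for m in models.values() if m.get(name) is not None]
        if values:
            ranges[name] = (min(values), max(values))

    results = {}
    for model, data in models.items():
        weighted_sum = 0.0
        weight_total = 0.0
        for name, cfg in metrics.items():
            value = data.get(name)
            if value is None or name not in ranges:
                continue  # missing data is excluded
            lo, hi = ranges[name]
            weighted_sum += cfg["weight"] * normalize(
                value, lo, hi, cfg["higher_is_better"]
            )
            weight_total += cfg["weight"]
        results[model] = weighted_sum / weight_total if weight_total else 0.0
    return results


# Illustrative values only, not taken from the leaderboard.
sample = {
    "Model A": {"swe_bench": 80.0, "input_cost": 2.0},
    "Model B": {"swe_bench": 60.0, "input_cost": 0.5},
    "Model C": {"swe_bench": None, "input_cost": 1.0},
}
scores = composite_scores(sample, METRICS)
for model, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {score:.1f}")
```

Because the composite divides by the sum of the weights actually used, the weights do not need to add up to exactly 100, and a model with one missing metric is scored only on the metrics it has.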

4. Share Your Rankings

Your exact configuration is encoded in the URL. Share it with your team or embed it — they'll see your exact ranking.
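
One way such a shareable URL could work is sketched below. The query parameter name `m`, the `id:weight` pair format, and the `example.com` base URL are hypothetical placeholders; the page does not document its real URL scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def encode_config(base_url: str, weights: dict[str, int]) -> str:
    """Pack metric weights into one query parameter (hypothetical format)."""
    packed = ",".join(f"{metric}:{weight}" for metric, weight in weights.items())
    return f"{base_url}?{urlencode({'m': packed})}"


def decode_config(url: str) -> dict[str, int]:
    """Recover the metric weights from a URL built by encode_config."""
    params = parse_qs(urlsplit(url).query)
    packed = params.get("m", [""])[0]
    weights = {}
    for pair in packed.split(","):
        if pair:
            metric, _, weight = pair.partition(":")
            weights[metric] = int(weight)
    return weights


config = {"swe_bench": 30, "humaneval_plus": 15, "arena_elo": 15}
shared_url = encode_config("https://example.com/rankings", config)
print(shared_url)
print(decode_config(shared_url))
```

Keeping the whole configuration in the query string means the link is self-contained: anyone opening it can rebuild the same weights without a saved account or server-side state.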

Data sourced from Chatbot Arena, OpenRouter, SWE-bench, and public research papers.

14 benchmarks across General, Coding, Math, Reasoning, Speed, Cost, and Context categories.

Frequently Asked Questions

What is the best AI model in 2026?

The best AI model depends on your use case. As of 2026, top contenders include Gemini 3.1 Pro, Claude Opus 4.6, and GPT-5.2 for general intelligence. For coding, Claude Opus 4.6 and GPT-5.2 lead on SWE-bench. For budget-conscious users, DeepSeek V3.2 and Gemini 2.5 Flash offer excellent performance per dollar. Use the AI Value Index to rank models based on YOUR priorities.

How do AI benchmarks work?

AI benchmarks are standardized tests that evaluate language models across specific capabilities. Common benchmarks include Chatbot Arena ELO (human preference voting), SWE-bench (real software engineering tasks), MMLU-Pro (knowledge and reasoning), GPQA Diamond (graduate-level science), and MATH (competition mathematics). Each benchmark tests a different aspect of model capability, and no single benchmark tells the whole story.

What is Chatbot Arena ELO?

Chatbot Arena ELO is a human preference ranking system where users compare AI model responses in blind head-to-head matchups. The ELO rating (borrowed from chess) reflects how often a model is preferred over others. Higher ELO means the model is more frequently preferred. It's considered one of the most reliable benchmarks because it uses real human judgment rather than automated scoring.
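
To make "how often a model is preferred" concrete, the standard Elo expected-score formula from chess can be sketched as follows. Whether Chatbot Arena computes its ratings with exactly this formula is an assumption here, and the ratings used are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation: probability that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Equal ratings give an even matchup.
print(f"{expected_score(1410, 1410):.3f}")
# A 100-point lead gives roughly a 64% expected preference rate.
print(f"{expected_score(1410, 1310):.3f}")
```

Under this formula only the rating difference matters, so a 100-point gap implies the same expected preference rate anywhere on the scale.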

Which AI model is best for coding?

For coding tasks in 2026, the top models are Claude Opus 4.6 (80.8% SWE-bench), Gemini 3.1 Pro (80.6% SWE-bench), and GPT-5.2 (80.0% SWE-bench). For more affordable coding, GPT-5.1 Codex and Qwen 3 Coder offer strong performance at lower costs. The AI Value Index Developer persona pre-weights coding benchmarks to help you find the best fit.

Which AI model is cheapest?

The cheapest AI models by API pricing include GPT-5 Nano ($0.05/1M input), Nova Lite ($0.06/1M input), GPT-4.1 Nano ($0.10/1M input), and Gemini 2.5 Flash Lite ($0.10/1M input). For the best balance of quality and cost, Qwen 3 32B ($0.10/1M input) and DeepSeek V3.2 ($0.14/1M input) offer strong performance at budget prices.
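
Per-million-token prices translate into a bill as in the sketch below. The prices are the input prices quoted above; the monthly token volume is a hypothetical workload, and output-token costs are left out because they are not listed here.

```python
# Input prices in USD per 1M tokens, as quoted in the answer above.
PRICE_PER_MILLION_INPUT = {
    "GPT-5 Nano": 0.05,
    "Nova Lite": 0.06,
    "GPT-4.1 Nano": 0.10,
    "Gemini 2.5 Flash Lite": 0.10,
    "Qwen 3 32B": 0.10,
    "DeepSeek V3.2": 0.14,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Input-side cost in USD for a given number of input tokens."""
    return PRICE_PER_MILLION_INPUT[model] * input_tokens / 1_000_000


monthly_tokens = 250_000_000  # hypothetical workload
for model in sorted(PRICE_PER_MILLION_INPUT, key=PRICE_PER_MILLION_INPUT.get):
    print(f"{model}: ${input_cost(model, monthly_tokens):.2f} per month")
```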