AI Value Index — Best AI Model Rankings & Benchmark Leaderboard

Find the best AI model for YOUR use case. Pick your benchmarks, set your weights, see actual data — not abstract scores.

Metrics & Weights

6 metrics selected — total weight: 100%

Preset

Active metrics

SWE-bench Verified 30% · HumanEval+ 15% · Chatbot Arena ELO 15% · Output Speed 15% · Input Cost 12.5% · Output Cost 12.5%

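#	Model / Provider	Score	SWE-bench Verified (30%)	HumanEval+ (15%)	Chatbot Arena ELO (15%)	Output Speed, tok/s (15%)	Input Cost, $/1M tokens (12.5%)	Output Cost, $/1M tokens (12.5%)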
1	Gemini 3 Flash Google	87.2	78.0%	86.0%	1473	200	$0.50	$3.0
2	GPT-5.2 OpenAI	85.1	80.0%	95.0%	1475	90	$1.8	$14.0
3	Gemini 3.1 Pro Google	84.6	80.6%	93.0%	1501	65	$2.0	$12.0
4	Claude Opus 4.6 Anthropic	83.2	80.8%	93.5%	1496	68	$5.0	$25.0
5	GPT-5.1 OpenAI	83.1	76.3%	93.0%	1464	95	$1.3	$10.0
6	GPT-5.1 Codex OpenAI	81.8	78.0%	96.0%	1395	85	$1.3	$10.0
7	Claude Opus 4.5 Anthropic	80.8	80.9%	92.0%	1468	50	$5.0	$25.0
8	GPT-5 OpenAI	79.5	74.9%	92.0%	1390	100	$1.3	$10.0
9	Claude Sonnet 4.6 Anthropic	78.6	79.6%	91.0%	1395	57	$3.0	$15.0
10	Grok 4.1 Fast xAI	77.2	60.0%	88.0%	1482	120	$0.20	$0.50
11	Claude Sonnet 4.5 Anthropic	77.1	77.2%	90.0%	1380	67	$3.0	$15.0
12	Gemini 3 Pro Google	76.7	65.0%	91.0%	1492	60	$2.0	$12.0
13	Grok 4 xAI	76.3	72.0%	90.0%	1430	55	$3.0	$15.0
14	DeepSeek V3.2 DeepSeek	75.7	73.0%	87.0%	1370	70	$0.14	$0.28
15	Qwen 3.5 397B Qwen	75.2	76.4%	88.0%	1350	45	$0.60	$3.6
16	o4 Mini OpenAI	75.1	68.1%	88.0%	1350	120	$1.1	$4.4
17	Gemini 2.5 Pro Google	74.5	63.8%	89.0%	1465	55	$1.3	$10.0
18	Claude Sonnet 4 Anthropic	74.3	72.7%	88.0%	1365	75	$3.0	$15.0
19	o3 OpenAI	73.2	69.1%	90.0%	1380	40	$2.0	$8.0
20	Qwen 3 Coder Qwen	71.4	66.5%	90.0%	1290	80	$0.20	$0.80
21	GPT-5.1 Codex Mini OpenAI	70.7	55.0%	88.0%	1310	170	$0.25	$2.0
22	Claude 3.7 Sonnet Anthropic	68	62.3%	86.0%	1340	70	$3.0	$15.0
23	Claude Opus 4 Anthropic	67.1	72.5%	90.0%	1375	50	$15.0	$75.0
24	DeepSeek R1 0528 DeepSeek	67	55.0%	88.0%	1375	40	$0.55	$2.2
25	GPT-5 Mini OpenAI	66.7	48.0%	85.0%	1300	180	$0.25	$2.0
26	Grok 4 Fast xAI	66.2	48.0%	85.0%	1370	110	$0.20	$0.50
27	o3 Pro OpenAI	65.7	70.0%	92.0%	1410	25	$20.0	$80.0
28	Gemini 2.5 Flash Google	65.4	38.0%	82.0%	1320	251	$0.30	$2.5
29	o3 Mini OpenAI	65	50.0%	87.0%	1320	100	$1.1	$4.4
30	GPT-4.1 OpenAI	64.6	50.0%	89.0%	1340	70	$2.0	$8.0
31	DeepSeek R1 DeepSeek	62.8	49.2%	86.0%	1355	35	$0.55	$2.2
32	Mistral Large 25.12 Mistral	62.4	42.0%	82.0%	1418	70	$0.50	$1.5
33	Qwen 3 Max Qwen	61.9	46.0%	86.0%	1340	55	$0.46	$1.8
34	DeepSeek V3.1 DeepSeek	61.8	46.0%	84.0%	1340	65	$0.14	$0.28
35	o1 Mini OpenAI	60.9	45.0%	89.0%	1300	80	$3.0	$12.0
36	Grok 3 xAI	60	42.0%	82.0%	1402	65	$3.0	$15.0
37	Qwen 3 235B Qwen	59	42.0%	84.0%	1320	60	$0.20	$0.80
38	Claude 3.5 Sonnet Anthropic	58	49.0%	81.7%	1268	70	$3.0	$15.0
39	Codestral Mistral	58	40.0%	86.0%	1260	90	$0.30	$0.90
40	o1 OpenAI	56.8	48.9%	89.0%	1360	35	$15.0	$60.0
41	GPT-4o OpenAI	56.6	30.7%	87.2%	1280	143	$2.5	$10.0
42	GPT-4.1 Mini OpenAI	56.5	35.0%	80.0%	1250	160	$0.40	$1.6
43	Llama 4 Maverick Meta	55	35.0%	80.0%	1280	100	$0.25	$0.80
44	Gemini 2.0 Flash Google	54.7	28.0%	76.0%	1240	220	$0.10	$0.40
45	Claude Haiku 4.5 Anthropic	53.1	30.0%	78.0%	1220	180	$1.0	$5.0
46	Grok 3 Mini xAI	52.8	30.0%	78.0%	1260	130	$0.30	$0.50
47	Qwen 3 32B Qwen	52.3	30.0%	79.0%	1250	120	$0.10	$0.30
48	Pixtral Large Mistral	52	35.0%	80.0%	1270	60	$2.0	$6.0
49	GPT-5 Nano OpenAI	51.8	25.0%	72.0%	1200	250	$0.05	$0.40
50	Llama 4 Scout Meta	50.5	28.0%	75.0%	1240	140	$0.17	$0.50
51	Gemini 2.5 Flash Lite Google	50.5	22.0%	70.0%	1230	240	$0.10	$0.40
52	GPT-4o Mini OpenAI	50.3	20.0%	78.0%	1220	200	$0.15	$0.60
53	Mistral Medium 3.1 Mistral	50.1	28.0%	77.0%	1250	110	$0.40	$2.0
54	Command A Cohere	49.3	32.0%	78.0%	1250	70	$2.5	$10.0
55	Qwen 2.5 72B Qwen	49.2	28.0%	80.0%	1230	80	$0.12	$0.39
56	Grok 2 xAI	48.1	28.0%	78.0%	1250	80	$2.0	$10.0
57	Claude 3.5 Haiku Anthropic	46.3	22.0%	74.0%	1180	170	$0.80	$4.0
58	Mistral Small 3.2 Mistral	44.4	20.0%	70.0%	1190	160	$0.10	$0.30
59	Llama 3.3 70B Meta	43.4	22.0%	72.0%	1210	90	$0.10	$0.30
60	GPT-4.1 Nano OpenAI	40.9	18.0%	65.0%	1120	200	$0.10	$0.40
61	Command R+ Cohere	39.6	20.0%	72.0%	1200	60	$2.5	$10.0
62	Nova Pro Amazon	39.2	18.0%	68.0%	1180	100	$0.80	$3.2
63	Claude 3 Opus Anthropic	35	22.0%	78.0%	1240	25	$15.0	$75.0
64	GPT-4.5 OpenAI	33.5	38.0%	88.0%	1310	60	$75.0	$150.0
65	Command R Cohere	32.4	12.0%	62.0%	1130	90	$0.15	$0.60
66	Nova Lite Amazon	31.1	10.0%	55.0%	1110	150	$0.06	$0.24
67	Reka Flash 3 Reka AI	30.4	—	—	—	136	$0.20	$0.80
68	Jamba 1.5 Mini AI21 Labs	25.7	—	—	—	35	$0.20	$0.40
69	Yi Lightning 01.AI	25	—	—	—	—	$0.14	$0.14
70	Sonar Perplexity	24.8	—	—	—	—	$1.0	$1.0
71	Sonar Pro Perplexity	24.7	—	—	—	50	$3.0	$15.0
72	Sonar Reasoning Pro Perplexity	24.2	—	—	—	22	$2.0	$8.0
73	Jamba 1.5 Large AI21 Labs	24	—	—	—	19	$2.0	$8.0
74	GLM-5 Zhipu AI	15	—	—	1456	55	—	—
75	GPT-OSS 120B OpenAI	15	—	—	—	339	—	—
76	MiniMax M2.5 MiniMax	14.6	—	—	1443	59	—	—
77	Doubao Seed 2.0 ByteDance	14	—	—	1474	—	—	—
78	GPT-OSS 20B OpenAI	13.7	—	—	—	312	—	—
79	Veo 3.1 Google	11.2	—	—	1401	—	—	—
80	Gemma 3 27B Google	10.6	—	—	1339	58	—	—
81	Sora 2 OpenAI	9.9	—	—	1368	—	—	—
82	Nova 2.0 Lite Amazon	9.5	—	—	—	221	—	—
83	Veo 3 Google	8.8	—	—	1340	—	—	—
84	Seedream 4.5 ByteDance	7.9	—	—	1316	—	—	—
85	Ministral 3 8B Mistral	7.2	—	—	—	172	—	—
86	MiMo V2 Flash Xiaomi	6.4	—	—	—	155	—	—
87	Phi-4 Mini Microsoft	6.1	—	—	—	150	—	—
88	Qwen 3 Next 80B Qwen	5.6	—	—	—	138	—	—
89	Phi-4 Microsoft	3.5	—	—	—	93	—	—
90	Gemma 3 12B Google	3.3	—	—	—	90	—	—
91	Ministral 3 14B Mistral	3	—	—	—	83	—	—
92	Ring Flash 2.0 InclusionAI	3	—	—	—	83	—	—
93	Nemotron 3 Nano NVIDIA	2.7	—	—	—	76	—	—
94	GLM-4.6V Zhipu AI	2.3	—	—	—	68	—	—
95	Step 2.5 Flash StepFun	2.3	—	—	—	67	—	—
96	K-EXAONE LG AI Research	2	—	—	—	62	—	—
97	Qwen 3 Coder 480B Qwen	1.9	—	—	—	60	—	—
98	Qwen 3 VL 235B Qwen	1.3	—	—	—	46	—	—
99	Kimi K2.5 Moonshot AI	1.2	—	—	—	45	—	—
100	Magistral Medium 1.2 Mistral	0.5	—	—	—	29	—	—
101	Phi-4 Reasoning Plus Microsoft	0	—	—	—	—	—	—
102	Gemma 3 4B Google	0	—	—	—	—	—	—
103	Mistral Large 3 Mistral	0	—	—	—	—	—	—
104	Kling 2.5 Turbo Kuaishou	0	—	—	—	—	—	—
105	DALL-E 3 OpenAI	0	—	—	—	—	—	—
106	Midjourney v7 Midjourney	0	—	—	—	—	—	—
107	Midjourney v6.1 Midjourney	0	—	—	—	—	—	—
108	Stable Diffusion 3.5 Stability AI	0	—	—	—	—	—	—
109	Flux 1.1 Pro Black Forest Labs	0	—	—	—	—	—	—
110	Flux 1.0 Dev Black Forest Labs	0	—	—	—	—	—	—
111	Imagen 4 Google	0	—	—	—	—	—	—
112	Ideogram 3.0 Ideogram	0	—	—	—	—	—	—
113	ERNIE 4.5 Baidu	0	—	—	—	—	—	—
114	ERNIE X1 Baidu	0	—	—	—	—	—	—
115	Hermes 4 70B Nous Research	0	—	—	—	—	—	—
116	Luma Ray 3 Luma AI	0	—	—	—	—	—	—
117	Runway Gen-4 Runway	0	—	—	—	—	—	—
118	Pika 2.5 Pika	0	—	—	—	—	—	—

Data sourced from Chatbot Arena, OpenRouter, and public benchmarks. Updated daily. Scores are dynamically computed based on your selected metrics.

How It Works

Choose a Persona or Build Your Own

Select a preset (Developer, Researcher, Business, etc.) that pre-selects relevant benchmarks and weights. Or customize everything from scratch.

See Real Data, Not Abstract Scores

Every metric shows the actual value — ELO 1410, $2.50/1M tokens, 65 tok/s. No normalization black box. The raw numbers are always visible.

Dynamic Ranking — Your Weights, Your Score

For each metric you select, we find the min/max across all models, normalize to 0-100, then compute a weighted composite. Missing data is excluded and weights renormalize automatically.
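
As a rough sketch of the scoring steps just described (not the site's actual code), the snippet below min-max normalizes each metric to 0-100 and takes a weighted average. The function names, the metric keys, the choice to invert the two cost metrics so that cheaper scores higher, and the `renormalize` switch are all assumptions made for illustration. The ranges are the smallest and largest values visible in the table above.

```python
from typing import Dict, Tuple

# Weights of the six active metrics, in percent (from the table header).
WEIGHTS: Dict[str, float] = {
    "swe": 30.0, "humaneval": 15.0, "elo": 15.0,
    "speed": 15.0, "in_cost": 12.5, "out_cost": 12.5,
}

# (min, max) per metric, read off the table above.
RANGES: Dict[str, Tuple[float, float]] = {
    "swe": (10.0, 80.9), "humaneval": (55.0, 96.0), "elo": (1110.0, 1501.0),
    "speed": (19.0, 339.0), "in_cost": (0.05, 75.0), "out_cost": (0.14, 150.0),
}

# Assumption: for cost metrics a lower raw value is better, so the scale is flipped.
LOWER_IS_BETTER = {"in_cost", "out_cost"}


def normalize(value: float, lo: float, hi: float, invert: bool = False) -> float:
    """Min-max scale a raw value to 0-100, optionally flipping the direction."""
    if hi == lo:
        return 100.0  # every model ties on this metric; arbitrary choice
    scaled = (value - lo) / (hi - lo) * 100.0
    return 100.0 - scaled if invert else scaled


def composite(model: Dict[str, float], renormalize: bool = True) -> float:
    """Weighted average of normalized metrics; metrics absent from `model` are skipped."""
    total = 0.0
    used_weight = 0.0
    for metric, weight in WEIGHTS.items():
        if metric not in model:
            continue  # missing data is excluded
        lo, hi = RANGES[metric]
        total += weight * normalize(model[metric], lo, hi, metric in LOWER_IS_BETTER)
        used_weight += weight
    if used_weight == 0.0:
        return 0.0
    # renormalize=True divides by the weight actually used;
    # renormalize=False divides by the full weight, so a gap counts as zero.
    return total / (used_weight if renormalize else sum(WEIGHTS.values()))


gemini_3_flash = {"swe": 78.0, "humaneval": 86.0, "elo": 1473.0,
                  "speed": 200.0, "in_cost": 0.50, "out_cost": 3.0}
yi_lightning = {"in_cost": 0.14, "out_cost": 0.14}  # only cost data in the table

print(round(composite(gemini_3_flash), 1))
print(round(composite(yi_lightning, renormalize=True), 1))
print(round(composite(yi_lightning, renormalize=False), 1))
```

With these inputs the sketch gives about 87.2 for Gemini 3 Flash, which matches its score in the table. For Yi Lightning, `renormalize=False` gives about 25 (the value shown in the table), while `renormalize=True` gives nearly 100. This suggests the table view counts a missing metric as zero, though that is only an inference from the numbers.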

Share Your Rankings

Your exact configuration is encoded in the URL. Share it with your team or embed it — they'll see your exact ranking.
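
One way such a shareable link could work is to store each metric weight as a query parameter. The sketch below is purely illustrative: the base URL is a placeholder, and the parameter names and helper functions are hypothetical rather than the site's real scheme.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit

# Placeholder address, NOT the real site URL.
BASE_URL = "https://example.com/rankings"


def encode_config(weights: Dict[str, float]) -> str:
    """Serialize metric weights into a URL query string."""
    return f"{BASE_URL}?{urlencode(weights)}"


def decode_config(url: str) -> Dict[str, float]:
    """Recover the metric weights from a shared URL."""
    query = urlsplit(url).query
    return {name: float(value) for name, value in parse_qsl(query)}


weights = {"swe": 30, "humaneval": 15, "elo": 15,
           "speed": 15, "in_cost": 12.5, "out_cost": 12.5}
shared_url = encode_config(weights)
print(shared_url)
print(decode_config(shared_url) == weights)
```

Because the whole configuration travels in the link, no server-side state is needed for a recipient to reproduce the same ranking.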

Data sourced from Chatbot Arena, OpenRouter, SWE-bench, and public research papers.

14 benchmarks across General, Coding, Math, Reasoning, Speed, Cost, and Context categories.

Frequently Asked Questions

What is the best AI model in 2026?

The best AI model depends on your use case. As of 2026, top contenders include Gemini 3.1 Pro, Claude Opus 4.6, and GPT-5.2 for general intelligence. For coding, Claude Opus 4.6 and GPT-5.2 lead on SWE-bench. For budget-conscious users, DeepSeek V3.2 and Gemini 2.5 Flash offer excellent performance per dollar. Use the AI Value Index to rank models based on YOUR priorities.

How do AI benchmarks work?

AI benchmarks are standardized tests that evaluate language models across specific capabilities. Common benchmarks include Chatbot Arena ELO (human preference voting), SWE-bench (real software engineering tasks), MMLU-Pro (knowledge and reasoning), GPQA Diamond (graduate-level science), and MATH (competition mathematics). Each benchmark tests a different aspect of model capability, and no single benchmark tells the whole story.

What is Chatbot Arena ELO?

Chatbot Arena ELO is a human preference ranking system where users compare AI model responses in blind head-to-head matchups. The ELO rating (borrowed from chess) reflects how often a model is preferred over others. Higher ELO means the model is more frequently preferred. It's considered one of the most reliable benchmarks because it uses real human judgment rather than automated scoring.
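
To give a feel for what a rating gap means, the sketch below applies the classic chess Elo expected-score formula with its conventional 400-point scale. Treating Chatbot Arena ratings this way is an assumption for illustration; the leaderboard's own fitting procedure may differ.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Classic Elo estimate of how often A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings taken from the table above.
gemini_31_pro = 1501
gemini_3_flash = 1473

print(round(expected_win_rate(gemini_31_pro, gemini_3_flash), 3))
print(expected_win_rate(1400, 1400))
```

Under this formula a 28-point gap corresponds to roughly a 54% preference rate, so small ELO differences between top models imply near coin-flip matchups.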

Which AI model is best for coding?

For coding tasks in 2026, the top models include Claude Opus 4.6 (80.8% on SWE-bench Verified), Gemini 3.1 Pro (80.6%), and GPT-5.2 (80.0%). For more affordable coding, GPT-5.1 Codex and Qwen 3 Coder offer strong performance at lower cost. The AI Value Index Developer persona pre-weights coding benchmarks to help you find the best fit.

Which AI model is cheapest?

The cheapest AI models by API input pricing include GPT-5 Nano ($0.05 per 1M input tokens), Nova Lite ($0.06), GPT-4.1 Nano ($0.10), and Gemini 2.5 Flash Lite ($0.10). For the best balance of quality and cost, DeepSeek V3.2 ($0.14 per 1M input tokens) and Qwen 3 32B ($0.10) offer strong performance at budget prices.
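
Per-1M-token prices are easier to compare once they are applied to a concrete workload. The sketch below uses input and output prices from the table above; the function name and the example workload (2M input tokens, 0.5M output tokens) are hypothetical.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Total USD cost given prices quoted per 1M tokens."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
deepseek_v32 = workload_cost(2_000_000, 500_000, 0.14, 0.28)
claude_opus_46 = workload_cost(2_000_000, 500_000, 5.0, 25.0)

print(f"DeepSeek V3.2:   ${deepseek_v32:.2f}")
print(f"Claude Opus 4.6: ${claude_opus_46:.2f}")
```

Output tokens are usually priced higher than input tokens, so the input/output mix of a workload can matter as much as the headline input price.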