AI Models Directory — Benchmark Profiles for 118+ Models

Browse all AI models with benchmark scores, pricing, and performance data. Click any model for detailed analysis.

OpenAI (23 models)

| Model | Tier | Arena ELO | SWE-bench | Input ($/1M) | Output ($/1M) |
|---|---|---|---|---|---|
| GPT-5.2 | Flagship | 1475 | 80.0% | 1.80 | 14.00 |
| GPT-OSS 120B | Open Source | n/a | n/a | n/a | n/a |
| GPT-OSS 20B | Open Source | n/a | n/a | n/a | n/a |
| Sora 2 | Flagship | 1368 | n/a | n/a | n/a |
| GPT-5.1 Codex Mini | Mid-Range | 1310 | 55.0% | 0.25 | 2.00 |
| GPT-5.1 Codex | Flagship | 1395 | 78.0% | 1.30 | 10.00 |
| GPT-5.1 | Flagship | 1464 | 76.3% | 1.30 | 10.00 |
| GPT-5 Nano | Budget | 1200 | 25.0% | 0.05 | 0.40 |
| GPT-5 Mini | Mid-Range | 1300 | 48.0% | 0.25 | 2.00 |
| GPT-5 | Flagship | 1390 | 74.9% | 1.30 | 10.00 |
| o3 Pro | Flagship | 1410 | 70.0% | 20.00 | 80.00 |
| o4 Mini | Mid-Range | 1350 | 68.1% | 1.10 | 4.40 |
| o3 | Flagship | 1380 | 69.1% | 2.00 | 8.00 |
| GPT-4.1 Nano | Budget | 1120 | 18.0% | 0.10 | 0.40 |
| GPT-4.1 Mini | Budget | 1250 | 35.0% | 0.40 | 1.60 |
| GPT-4.1 | Mid-Range | 1340 | 50.0% | 2.00 | 8.00 |
| GPT-4.5 | Flagship | 1310 | 38.0% | 75.00 | 150.00 |
| o3 Mini | Mid-Range | 1320 | 50.0% | 1.10 | 4.40 |
| o1 Mini | Mid-Range | 1300 | 45.0% | 3.00 | 12.00 |
| o1 | Flagship | 1360 | 48.9% | 15.00 | 60.00 |
| GPT-4o Mini | Budget | 1220 | 20.0% | 0.15 | 0.60 |
| GPT-4o | Mid-Range | 1280 | 30.7% | 2.50 | 10.00 |
| DALL-E 3 | Flagship | n/a | n/a | n/a | n/a |

Google (13 models)

Anthropic (11 models)

About the AI Models Directory

The AI Value Index tracks 118+ large language models from leading providers including OpenAI, Google, Anthropic, Qwen, Mistral, and more. Each model profile includes benchmark scores across general intelligence, coding, math, reasoning, speed, and cost metrics.

Models are categorized as Flagship, Mid-Range, Budget, or Open Source based on their capability tier and pricing. Click any model to view its full benchmark profile, use the Compare tool for side-by-side comparisons, or check Pricing for detailed cost analysis.
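As a rough illustration of how the per-1M-token prices listed in the directory translate into per-request cost, the sketch below applies GPT-5.1's listed rates ($1.30 input, $10.00 output per 1M tokens) to a hypothetical request. The token counts and the helper function are assumptions for the example, not part of the AI Value Index itself.

```python
# Illustrative cost arithmetic only; prices come from the directory listing,
# token counts are hypothetical.
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request given per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# GPT-5.1 listed rates: $1.30 per 1M input tokens, $10.00 per 1M output tokens.
cost = request_cost(input_tokens=50_000, output_tokens=5_000,
                    input_price_per_m=1.30, output_price_per_m=10.00)
print(f"${cost:.3f}")  # $0.065 input + $0.050 output = $0.115
```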

Frequently Asked Questions

How many AI models does the AI Value Index track?

The AI Value Index currently tracks 118+ large language models from 8+ leading providers including OpenAI, Anthropic, Google, Meta, DeepSeek, xAI, Qwen, and Mistral. New models are added as they launch.

What is the difference between Flagship, Mid-Range, Budget, and Open Source models?

Flagship models (e.g. GPT-5.2, Claude Opus 4.6) offer peak capability at premium prices. Mid-Range models balance quality and cost. Budget models (e.g. GPT-5 Nano) prioritize low cost for high-volume use. Open Source models (e.g. Llama, Qwen) can be self-hosted and fine-tuned freely.

Which AI provider has the most models?

OpenAI (23 models) and Google (13 models) currently offer the largest lineups, spanning flagship to budget tiers. Anthropic follows with 11 models, Meta and DeepSeek each offer 4-6 models, and xAI, Qwen, and Mistral round out the directory.

How often is the AI models directory updated?

The directory is updated within days of a new model launch or pricing change. Benchmark scores are refreshed as new evaluation results become available from official leaderboards and independent testing platforms.

What data is shown on each model profile?

Each model profile includes Chatbot Arena ELO, SWE-bench Verified, MMLU-Pro, HumanEval, and 20+ other benchmark scores, plus input and output pricing per 1M tokens, output speed, context window size, and provider details.
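For readers who want to work with these fields programmatically, here is a minimal sketch of what a single profile record might look like. The `ModelProfile` structure and its field names are assumptions for illustration only, not the AI Value Index's actual schema; the values are taken from the GPT-5.1 entry in the directory above.

```python
# Hypothetical representation of one model profile; field names are assumed,
# values are copied from the GPT-5.1 row in the directory table.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelProfile:
    name: str
    provider: str
    tier: str                      # Flagship, Mid-Range, Budget, or Open Source
    arena_elo: Optional[int]       # Chatbot Arena ELO
    swe_bench: Optional[float]     # SWE-bench Verified, percent
    input_price: Optional[float]   # USD per 1M input tokens
    output_price: Optional[float]  # USD per 1M output tokens

gpt_5_1 = ModelProfile(
    name="GPT-5.1", provider="OpenAI", tier="Flagship",
    arena_elo=1464, swe_bench=76.3, input_price=1.30, output_price=10.00,
)
```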