Benchmark Comparison

Compare AI model performance across standardized tests

Individual Benchmark Results

This table shows performance on specific standardized tests (MMLU, HumanEval, GPQA, etc.) rather than category aggregations. Scores represent pass rates or accuracy percentages on each benchmark.
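The page does not say exactly how each score is computed. The sketch below illustrates the two common cases in Python. The first is plain accuracy: correct answers divided by total items. The second is the unbiased pass@k estimator described alongside the HumanEval benchmark. The function names are hypothetical, and the counts in the example are made up. The site may compute its scores differently, for example with a different k or a different sampling setup.

```python
from math import comb


def accuracy_pct(num_correct: int, num_items: int) -> float:
    """Share of benchmark items answered correctly, as a percentage rounded to one decimal."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return round(100.0 * num_correct / num_items, 1)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated for the problem
    c: samples that passed the problem's tests
    k: samples the model is allowed to submit (k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_rate_pct(per_problem: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, given (n, c) per problem, as a percentage."""
    if not per_problem:
        raise ValueError("per_problem must not be empty")
    total = sum(pass_at_k(n, c, k) for n, c in per_problem)
    return round(100.0 * total / len(per_problem), 1)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the table on this page.
    print(accuracy_pct(151, 164))                            # 92.1
    print(pass_rate_pct([(10, 3), (10, 10), (10, 0)], k=1))  # 43.3
```

With one attempt per problem (n = 1, k = 1), pass@1 for each problem is either 0 or 1. Averaged over the problems, it is the same value as plain accuracy.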

203 models

Columns (first 10 benchmarks shown): Model, Provider, AIME, AA Coding Index, AAII, AA Math Index, DROP, GPQA, HLE, HumanEval, LiveCodeBench, MATH-500
claude-3-5-sonnet-20240620
anthropic
15.7, 30.2, 15.9, 87.1, 59.7, 3.9, 92.0, 38.1, 77.1
claude-3-haiku-20240307
anthropic
1.0, 9.3, 78.4, 33.3, 75.9, 15.4, 39.4
claude-3-opus-20240229
anthropic
3.3, 19.5, 12.5, 83.1, 49.6, 3.1, 84.9, 27.9, 64.1
claude-3.5-haiku
anthropic
3.3, 10.7, 18.7, 83.1, 41.2, 3.5, 88.1, 31.4, 72.1
claude-3.7-sonnet
anthropic
48.7, 27.6, 34.6, 56.3, 77.2, 10.3, 47.3, 94.7
claude-haiku-4.5
anthropic
29.6, 31.0, 39.0, 73.0, 4.3, 51.1
claude-opus-4
anthropic
75.7, 34.0, 27.4, 73.3, 79.6, 11.7, 63.6, 98.2
claude-opus-4.1
anthropic
36.5, 31.9, 80.3, 80.9, 11.9, 65.4
claude-opus-4.5
anthropic
47.8, 49.7, 91.3, 86.6, 28.4, 87.1
claude-opus-4.6
anthropic
47.6, 46.4, 91.3, 18.6
claude-sonnet-4
anthropic
77.3, 34.1, 38.6, 74.3, 77.7, 9.6, 65.5, 99.1
claude-sonnet-4.5
anthropic
38.6, 42.9, 88.0, 83.4, 17.3, 71.4
llama3-1-70b-instruct-v1.0
bedrock
17.3, 10.9, 12.2, 4.0, 79.6, 41.3, 4.6, 80.5, 23.2, 64.9
llama3-1-8b-instruct-v1.0
bedrock
7.7, 4.9, 11.7, 4.3, 59.5, 28.1, 5.1, 72.6, 11.6, 51.9
llama3-2-3b-instruct-v1.0
bedrock
6.7, 9.7, 3.3, 32.8, 5.2, 8.3, 48.9
mistral-7b-instruct-v0.2
bedrock
0.0, 7.4, 17.7, 4.3, 4.6, 12.1
DeepSeek-R1
deepinfra
89.3, 24.0, 27.0, 76.0, 81.3, 14.9, 77.0, 98.3
DeepSeek-R1-Distill-Llama-70B
deepinfra
67.0, 11.4, 16.0, 53.7, 65.2, 6.1, 26.6, 93.5
DeepSeek-V3
deepinfra
25.3, 16.4, 16.4, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
deepseek-chat-v3-0324
deepinfra
28.4, 28.0, 49.7, 73.5, 6.3, 57.7
deepseek-chat-v3.1
deepinfra
28.4, 28.0, 49.7, 73.5, 6.3, 57.7
deepseek-r1-0528
deepinfra
81.0
devstral-small
deepinfra
0.3, 12.1, 15.2, 29.3, 41.4, 3.7, 25.4, 63.5
devstral-small-2505
deepinfra
6.7, 12.2, 18.0, 43.4, 4.0, 25.8, 68.4
gemma-3-12b-it
deepinfra
22.0, 6.3, 8.8, 18.3, 40.9, 4.8, 85.4, 13.7, 85.3
gemma-3-27b-it
deepinfra
25.3, 9.6, 10.2, 20.7, 42.6, 4.7, 87.8, 13.7, 88.3
gemma-3-4b-it
deepinfra
6.3, 2.9, 6.3, 12.7, 29.9, 5.2, 71.3, 11.2, 76.6
gpt-oss-120b
deepinfra
28.6, 33.3, 93.4, 79.2, 18.5, 87.8
gpt-oss-20b
deepinfra
18.5, 24.5, 89.3, 70.2, 9.8, 77.7
hermes-3-llama-3.1-70b
deepinfra
2.3, 10.6, 40.1, 4.1, 18.8, 53.8
llama-3.1-405b-instruct
deepinfra
21.3, 14.5, 14.2, 3.0, 84.8, 51.1, 4.2, 89.0, 30.5, 70.3
llama-3.1-70b-instruct
deepinfra
17.3, 10.9, 12.2, 4.0, 79.6, 41.3, 4.6, 80.5, 23.2, 64.9
llama-3.1-8b-instruct
deepinfra
7.7, 4.9, 11.7, 4.3, 59.5, 28.1, 5.1, 72.6, 11.6, 51.9
llama-3.1-nemotron-70b-instruct
deepinfra
24.7, 10.8, 13.4, 11.0, 46.5, 4.6, 16.9, 73.3
llama-3.2-3b-instruct
deepinfra
6.7, 9.7, 3.3, 32.8, 5.2, 8.3, 48.9
llama-3.3-70b-instruct
deepinfra
30.0, 10.7, 14.2, 7.7, 50.2, 4.0, 88.4, 28.8, 77.3
llama-4-maverick
deepinfra
39.0, 15.6, 18.3, 19.3, 68.5, 4.8, 39.7, 88.9
llama-4-scout
deepinfra
28.3, 6.7, 13.5, 14.0, 57.9, 4.3, 29.9, 84.4
mistral-7b-instruct
deepinfra
0.0, 7.4, 17.7, 4.3, 4.6, 12.1
mistral-nemo
deepinfra
mistral-small-24b-instruct-2501
deepinfra
45.3, 84.8
mistral-small-3.1-24b-instruct
deepinfra
6.3, 10.2, 38.1, 4.3, 14.1, 56.3
mistral-small-3.2-24b-instruct
deepinfra
6.3, 10.2, 38.1, 4.3, 14.1, 56.3
mixtral-8x7b-instruct
deepinfra
0.0, 7.7, 29.2, 4.5, 6.6, 29.9
nemotron-nano-9b-v2
deepinfra
8.3, 14.8, 69.7, 57.0, 4.6, 72.4
phi-4
deepinfra
65.8
phi-4-multimodal-instruct
deepinfra
9.3, 10.0, 31.5, 4.4, 13.1, 69.3
phi-4-reasoning-plus
deepinfra
68.9
qwen-2.5-72b-instruct
deepinfra
16.0, 11.9, 15.6, 14.0, 49.0, 4.2, 86.6, 27.6, 85.8
qwen-2.5-7b-instruct
deepinfra
36.4, 84.8
qwen3-14b
deepinfra
76.3, 13.1, 16.2, 55.7, 60.4, 4.3, 52.3, 96.1
qwen3-235b-a22b
deepinfra
32.7, 14.0, 16.9, 23.7, 61.3, 4.7, 34.3, 90.2
qwen3-235b-a22b-2507
deepinfra
94.0, 23.2, 29.5, 91.0, 79.0, 15.0, 78.8, 98.4
qwen3-235b-a22b-thinking-2507
deepinfra
81.1
qwen3-30b-a3b
deepinfra
72.7, 14.2, 15.0, 66.3, 65.9, 6.8, 51.5, 97.5
qwen3-32b
deepinfra
80.7, 13.8, 16.5, 73.0, 66.8, 8.3, 54.6, 96.1
qwen3-coder
deepinfra
22.9, 28.1, 73.7, 9.3
qwq-32b
deepinfra
78.0, 19.7, 29.0, 65.2, 8.2, 63.1, 95.7
deepseek-chat
deepseek
34.6, 32.1, 59.0, 75.1, 10.5, 59.3
deepseek-reasoner
deepseek
36.7, 41.6, 92.0, 83.2, 22.2, 86.2
deepseek-r1-05-28
fireworksai
89.3, 24.0, 27.0, 76.0, 81.3, 14.9, 77.0, 98.3
deepseek-v3-03-24
fireworksai
25.3, 16.4, 16.4, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
kimi-k2-instruct
fireworksai
75.1, 93.3
qwen3-coder-480b-a35b-instruct
fireworksai
47.7, 24.6, 24.6, 39.3, 61.8, 4.4, 58.5, 94.2
gemini-2.0-flash
gemini
33.0, 13.6, 18.5, 21.7, 62.2, 5.3, 33.4, 93.0
gemini-2.0-flash-lite
gemini
51.5
gemini-2.5-flash
gemini
50.0, 17.8, 20.5, 60.3, 82.8, 5.1, 49.5, 93.2
gemini-2.5-flash-preview
gemini
50.0, 17.8, 20.5, 60.3, 82.8, 5.1, 49.5, 93.2
gemini-2.5-pro
gemini
88.7, 31.9, 34.5, 87.7, 83.7, 21.1, 80.1, 96.7
gemini-2.5-pro-preview
gemini
88.7, 31.9, 34.5, 87.7, 83.7, 21.1, 80.1, 96.7
gemini-3-flash-preview
gemini
37.8, 35.1, 55.7, 90.4, 14.1, 79.7
gemini-3-pro-preview
gemini
46.5, 48.4, 95.7, 91.3, 37.2, 91.7
gemma-2-9b-it
groq
40.2
kimi-k2-0905
groq
25.9, 30.8, 57.3, 76.3, 6.3, 94.5, 61.0
codestral-2501
mistralai
81.1
codestral-2508
mistralai
81.1
devstral-medium
mistralai
6.7, 15.9, 18.6, 4.7, 49.2, 3.8, 33.7, 70.7
mistral-large
mistralai
0.0, 9.9, 35.1, 3.4, 17.8, 52.7
mistral-large-2.1
mistralai
0.0, 9.9, 35.1, 3.4, 17.8, 52.7
mistral-large-2407
mistralai
9.3, 13.0, 0.0, 47.2, 3.2, 26.7, 71.4
mistral-medium-3
mistralai
44.0, 13.6, 18.7, 30.3, 57.8, 4.3, 40.0, 90.7
mistral-medium-3.1
mistralai
18.3, 21.1, 38.3, 58.8, 4.4, 40.6
mistral-saba
mistralai
13.0, 12.1, 42.4, 4.1, 67.7
mistral-small
mistralai
6.3, 10.2, 38.1, 4.3, 14.1, 56.3
mistral-small-3
mistralai
6.3, 10.2, 38.1, 4.3, 14.1, 56.3
mistral-small-3.1
mistralai
6.3, 10.2, 38.1, 4.3, 14.1, 56.3
mistral-small-3.2
mistralai
6.3, 10.2, 38.1, 4.3, 14.1, 56.3
chatgpt-4o-latest
openai
84.0
gpt-3.5-turbo
openai
10.7, 9.0, 70.2, 30.3, 68.0, 44.1
gpt-3.5-turbo-0613
openai
gpt-3.5-turbo-instruct
openai
10.7, 9.0, 70.2, 30.3, 68.0, 44.1
gpt-4.1
openai
43.7, 21.8, 25.6, 34.7, 66.5, 4.6, 45.7, 91.3
gpt-4.1-mini
openai
43.0, 18.5, 22.4, 46.3, 66.4, 4.6, 48.3, 92.5
gpt-4.1-nano
openai
23.7, 11.2, 12.9, 24.0, 51.2, 3.9, 32.6, 84.8
gpt-4o
openai
15.0, 16.7, 17.3, 6.0, 54.3, 3.3, 30.9, 75.9
gpt-4o-2024-05-13
openai
15.0, 16.7, 17.3, 6.0, 83.4, 54.0, 3.3, 90.2, 30.9, 75.9
gpt-4o-mini
openai
11.7, 12.6, 14.7, 42.6, 4.0, 23.4, 78.9
gpt-4o-mini-search-preview
openai
79.7, 40.2, 87.2
gpt-5
openai
95.7, 36.0, 44.6, 94.3, 85.4, 26.5, 84.6, 99.4
gpt-5-chat
openai
21.2, 21.8, 48.3, 68.6, 5.8, 54.3
gpt-5-mini
openai
35.3, 41.0, 90.7, 82.8, 19.7, 83.8
gpt-5-nano
openai
20.3, 26.7, 83.7, 67.6, 8.2, 78.9
gpt-5.1
openai
44.7, 47.6, 94.0, 87.3, 26.5, 86.8
gpt-5.2
openai
48.7, 51.2, 99.0, 90.3, 35.4, 88.9
o1
openai
72.3, 20.5, 30.7, 76.4, 7.7, 88.1, 67.9, 97.0
o1-mini
openai
60.3, 20.4, 60.1, 4.9, 92.4, 57.6, 94.4
o3
openai
90.3, 38.4, 38.3, 88.3, 83.0, 20.0, 80.8, 99.2
o3-mini
openai
77.0, 17.9, 25.9, 76.0, 8.7, 71.7, 97.3
o4-mini
openai
94.0, 25.6, 33.0, 90.7, 79.9, 17.5, 85.9, 98.9
command-a
openrouter
9.7, 9.9, 13.4, 13.0, 52.7, 4.6, 28.7, 81.9
deepseek-r1
openrouter
89.3, 24.0, 27.0, 76.0, 81.3, 14.9, 77.0, 98.3
deepseek-r1-distill-llama-70b
openrouter
67.0, 11.4, 16.0, 53.7, 65.2, 6.1, 26.6, 93.5
deepseek-r1-distill-qwen-32b
openrouter
68.7, 17.2, 63.0, 61.8, 5.5, 27.0, 94.1
deepseek-v3.1-terminus
openrouter
33.7, 33.8, 89.7, 79.2, 15.2, 79.8
deepseek-v3.1-terminus:exacto
openrouter
25.3, 16.4, 16.4, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
deepseek-v3.2
openrouter
36.7, 41.6, 92.0, 84.0, 22.2, 86.2
deepseek-v3.2-exp
openrouter
30.0, 28.3, 57.7, 79.9, 8.6, 55.4
deepseek-v3.2-speciale
openrouter
37.9, 34.1, 96.7, 87.1, 26.1, 89.6
ernie-4.5-21b-a3b
openrouter
49.3, 14.5, 14.9, 41.3, 28.6, 81.1, 3.5, 46.7, 93.1
ernie-4.5-21b-a3b-thinking
openrouter
49.3, 14.5, 14.9, 41.3, 28.6, 81.1, 3.5, 46.7, 93.1
ernie-4.5-300b-a47b
openrouter
49.3, 14.5, 14.9, 41.3, 28.6, 81.1, 3.5, 46.7, 93.1
ernie-4.5-vl-28b-a3b
openrouter
49.3, 14.5, 14.9, 41.3, 28.6, 81.1, 3.5, 46.7, 93.1
ernie-4.5-vl-424b-a47b
openrouter
49.3, 14.5, 14.9, 41.3, 28.6, 81.1, 3.5, 46.7, 93.1
gemini-2.0-flash-001
openrouter
33.0, 13.6, 18.5, 21.7, 62.2, 5.3, 33.4, 93.0
gemini-2.0-flash-lite-001
openrouter
27.7, 14.7, 53.5, 3.6, 18.5, 87.3
gemini-2.5-flash-image
openrouter
50.0, 17.8, 20.5, 60.3, 82.8, 5.1, 49.5, 93.2
gemini-2.5-flash-lite
openrouter
50.0, 7.4, 12.5, 35.3, 64.6, 3.7, 40.0, 92.6
gemini-2.5-flash-lite-preview-09-2025
openrouter
18.1, 21.6, 68.7, 70.9, 6.6, 68.8
gemini-2.5-pro-preview-05-06
openrouter
88.7, 31.9, 34.5, 87.7, 83.7, 21.1, 80.1, 96.7
gemini-3-pro-image-preview
openrouter
46.5, 48.4, 95.7, 91.3, 37.2, 91.7
gemma-3n-e4b-it
openrouter
23.7, 75.0
glm-4.6v
openrouter
19.7, 23.5, 85.3, 71.9, 8.9, 16.0
glm-4.7
openrouter
36.3, 42.0, 95.0, 85.8, 25.1, 89.4
glm-4.7-flash
openrouter
25.9, 30.1, 75.2, 7.1
glm-5
openrouter
44.2, 49.6, 82.0, 27.2
gpt-oss-120b:exacto
openrouter
28.6, 33.3, 93.4, 79.2, 18.5, 87.8
grok-3-mini-beta
openrouter
93.3, 25.2, 32.0, 84.7, 79.1, 11.1, 69.6, 99.2
grok-4-fast
openrouter
27.4, 34.9, 89.7, 84.7, 17.0, 83.2
grok-4.1-fast
openrouter
30.9, 38.5, 89.3, 85.3, 17.6, 82.2
intellect-3
openrouter
19.1, 22.1, 88.0, 76.1, 12.1, 77.7
kimi-k2-thinking
openrouter
34.8, 40.7, 94.7, 83.8, 22.3, 85.3
kimi-k2.5
openrouter
39.5, 46.7, 87.8, 29.4
lfm2-8b-a1b
openrouter
2.3, 6.8, 25.3, 34.4, 4.9, 15.1
llama-3.1-405b
openrouter
21.3, 14.5, 14.2, 3.0, 84.8, 51.1, 4.2, 89.0, 30.5, 70.3
llama-3.1-nemotron-ultra-253b-v1
openrouter
74.7, 13.1, 15.0, 63.7, 74.4, 8.1, 64.1, 95.2
llama-3.3-nemotron-super-49b-v1.5
openrouter
19.3, 7.6, 14.3, 7.7, 66.7, 3.5, 28.0, 77.5
longcat-flash-chat
openrouter
79.1, 73.2, 88.4
mimo-v2-flash
openrouter
31.8, 39.2, 96.3, 84.6, 21.1, 86.8
minimax-m1
openrouter
81.3, 14.1, 20.9, 13.7, 68.7, 7.5, 65.7, 97.2
minimax-m2
openrouter
29.2, 36.0, 78.3, 77.8, 12.5, 82.6
minimax-m2-her
openrouter
29.2, 36.0, 78.3, 77.8, 12.5, 82.6
minimax-m2.1
openrouter
32.8, 39.5, 82.7, 82.0, 22.2, 81.0
minimax-m2.5
openrouter
37.4, 42.0, 84.8, 19.1
nemotron-3-nano-30b-a3b
openrouter
15.8, 13.3, 13.3, 75.0, 4.6, 36.0
nemotron-nano-12b-v2-vl
openrouter
5.9, 10.1, 26.7, 43.9, 4.5, 34.5
nova-lite-v1
openrouter
10.7, 5.1, 12.4, 7.0, 80.2, 42.6, 4.6, 85.4, 16.7, 76.5
nova-micro-v1
openrouter
8.0, 4.1, 10.3, 6.0, 79.3, 37.9, 4.7, 81.1, 14.0, 70.3
nova-premier-v1
openrouter
17.0, 13.8, 18.9, 17.3, 56.9, 4.7, 31.7, 83.9
nova-pro-v1
openrouter
10.7, 11.0, 13.5, 7.0, 85.4, 48.4, 3.4, 89.0, 23.3, 78.6
olmo-3-32b-think
openrouter
10.5, 12.0, 73.7, 61.0, 5.9, 67.2
olmo-3-7b-instruct
openrouter
3.4, 8.1, 41.3, 40.0, 5.8, 26.6
olmo-3-7b-think
openrouter
7.6, 9.5, 70.7, 51.6, 5.7, 61.7
olmo-3.1-32b-instruct
openrouter
5.6, 12.0, 53.9, 4.9
olmo-3.1-32b-think
openrouter
9.8, 14.2, 77.3, 59.1, 6.0, 69.5
qwen-2.5-coder-32b-instruct
openrouter
12.0, 12.9, 41.7, 3.8, 92.7, 29.5, 76.7
qwen-turbo
openrouter
12.0, 12.0, 41.0, 4.2, 16.3, 80.5
qwen3-4b
openrouter
9.5, 18.6, 82.7, 66.7, 5.9, 64.1
qwen3-8b
openrouter
74.7, 9.0, 13.1, 19.0, 58.9, 4.2, 40.6, 90.4
qwen3-max
openrouter
26.4, 31.3, 80.7, 76.4, 11.1, 76.7
qwen3-max-thinking
openrouter
24.5, 32.5, 82.3, 77.6, 12.0, 53.5
qwen3-next-80b-a3b-instruct
openrouter
15.3, 20.1, 66.3, 73.4, 7.3, 68.4
qwen3-next-80b-a3b-thinking
openrouter
77.2
qwen3-vl-235b-a22b-instruct
openrouter
16.5, 20.6, 70.7, 71.2, 6.3, 59.4
qwen3-vl-235b-a22b-thinking
openrouter
qwen3-vl-30b-a3b-instruct
openrouter
29.7, 19.4, 20.0, 29.0, 70.4, 4.0, 40.3, 89.3
qwen3-vl-30b-a3b-thinking
openrouter
74.4
qwen3-vl-32b-instruct
openrouter
15.6, 17.2, 68.3, 68.0, 6.3, 51.4
qwen3-vl-8b-instruct
openrouter
7.3, 14.3, 27.3, 42.7, 2.9, 33.2
qwen3-vl-8b-thinking
openrouter
69.9
qwen3.5-397b-a17b
openrouter
41.3, 45.0, 88.9, 27.3
sonar
openrouter
48.7, 15.5, 47.1, 7.3, 29.5, 81.7
sonar-pro
openrouter
29.0, 15.2, 57.8, 7.9, 27.5, 74.5
sonar-pro-search
openrouter
29.0, 15.2, 57.8, 7.9, 27.5, 74.5
sonar-reasoning-pro
openrouter
79.0, 24.6, 95.7
step-3.5-flash
openrouter
deepseek-r1-0528-qwen3-8b
parasail
89.3, 24.0, 27.0, 76.0, 81.3, 14.9, 77.0, 98.3
glm-4.5
parasail
87.3, 26.3, 26.2, 73.7, 78.6, 12.2, 73.8, 97.9
glm-4.5v
parasail
10.8, 12.5, 15.3, 57.3, 3.6, 35.2
DeepSeek-R1-Distill-Qwen-1.5B
togetherai
17.7, 9.1, 22.0, 33.8, 3.3, 7.0, 68.7
DeepSeek-R1-Distill-Qwen-14B
togetherai
66.7, 15.8, 55.7, 59.1, 4.4, 37.6, 94.9
Llama-3.1-Nemotron-70B-Instruct-HF
togetherai
24.7, 10.8, 13.4, 11.0, 46.5, 4.6, 16.9, 73.3
QwQ-32B-Preview
togetherai
45.3, 15.2, 65.2, 4.8, 33.7, 91.0
Qwen2-72B-Instruct
togetherai
14.7, 11.7, 42.4, 3.7, 86.0, 15.9, 70.1
gemma-2-27b-it
togetherai
51.8
grok-2
xai
56.0, 88.4
grok-3
xai
21.6
grok-3-mini
xai
93.3, 25.2, 32.0, 84.7, 79.1, 11.1, 69.6, 99.2
grok-4
xai
94.3, 40.5, 41.4, 92.7, 87.6, 23.9, 81.9, 99.0
grok-code-fast-1
xai
23.7, 28.7, 43.3, 72.7, 7.5, 65.7
glm-4.5-air
zai
67.3, 23.8, 23.2, 80.7, 74.2, 6.8, 68.4, 96.5
glm-4.5-airx
zai
67.3, 23.8, 23.2, 80.7, 74.2, 6.8, 68.4, 96.5
glm-4.5-x
zai
87.3, 26.3, 26.2, 73.7, 78.6, 12.2, 73.8, 97.9
glm-4.6
zai
30.2, 30.1, 44.3, 81.0, 5.2, 56.1

Showing 203 models with benchmark data.