Benchmark Comparison

Compare AI model performance across standardized tests

Individual Benchmark Results

This table shows performance on specific standardized tests (MMLU, HumanEval, GPQA, etc.) rather than category aggregations. Scores represent pass rates or accuracy percentages on each benchmark.

Benchmarks shown (first 10), in column order: AIME, AA Coding Index, AAII, AA Math Index, DROP, GPQA, HLE, HumanEval, LiveCodeBench, MATH-500. Each row lists a model's available scores in that order; benchmarks without a score for that model are omitted.

Model | Provider | Scores
claude-3-5-sonnet-20240620 | anthropic | 15.7, 37.3, 29.9, 87.1, 59.7, 3.9, 92.0, 38.1, 77.1
claude-3-haiku-20240307 | anthropic | 1.0, 17.0, 9.6, 78.4, 33.3, 75.9, 15.4, 39.4
claude-3-opus-20240229 | anthropic | 3.3, 25.6, 20.6, 83.1, 49.6, 3.1, 84.9, 27.9, 64.1
claude-opus-4 | anthropic | 53.1, 59.3, 80.3, 80.9, 11.9, 65.4
claude-opus-4.1 | anthropic | 56.3, 47.5, 42.3, 36.3, 79.6, 5.9, 54.2, 94.1
claude-sonnet-4 | anthropic | 40.7, 41.1, 44.4, 38.0, 75.4, 4.0, 44.9, 93.4
mistral-7b-instruct-v0.2 | bedrock | 0.0, 3.5, 1.0, 17.7, 4.3, 4.6, 12.1
DeepSeek-R1 | deepinfra | 89.3, 58.7, 52.0, 76.0, 81.3, 14.9, 77.0, 98.3
DeepSeek-R1-Distill-Llama-70B | deepinfra | 67.0, 28.9, 30.8, 53.7, 65.2, 6.1, 26.6, 93.5
DeepSeek-R1-Distill-Qwen-32B | deepinfra | 68.7, 32.3, 32.7, 63.0, 61.8, 5.5, 27.0, 94.1
DeepSeek-V3 | deepinfra | 25.3, 35.6, 32.5, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
deepseek-chat-v3-0324 | deepinfra | 47.2, 44.8, 49.7, 73.5, 6.3, 57.7
deepseek-chat-v3.1 | deepinfra | 47.2, 44.8, 49.7, 73.5, 6.3, 57.7
deepseek-r1-0528 | deepinfra | (no scores listed)
devstral-small | deepinfra | 0.3, 24.9, 27.2, 29.3, 41.4, 3.7, 25.4, 63.5
devstral-small-2505 | deepinfra | 6.7, 25.2, 19.6, 43.4, 4.0, 25.8, 68.4
gemma-3-12b-it | deepinfra | 22.0, 15.5, 20.9, 18.3, 40.9, 4.8, 85.4, 13.7, 85.3
gemma-3-27b-it | deepinfra | 25.3, 17.4, 22.0, 20.7, 42.6, 4.7, 87.8, 13.7, 88.3
gemma-3-4b-it | deepinfra | 6.3, 9.3, 14.7, 12.7, 29.9, 5.2, 71.3, 11.2, 76.6
gpt-oss-120b | deepinfra | 50.1, 57.9, 93.4, 79.2, 18.5, 63.9
gpt-oss-20b | deepinfra | 53.7, 44.8, 61.7, 71.5, 8.5, 72.1
kimi-k2 | deepinfra | 50.4, 57.3, 76.3, 6.3, 94.5, 61.0
llama-3.1-405b-instruct | deepinfra | 21.3, 30.2, 25.7, 3.0, 84.8, 51.1, 4.2, 89.0, 30.5, 70.3
llama-3.1-70b-instruct | deepinfra | 17.3, 25.0, 22.6, 4.0, 79.6, 41.3, 4.6, 80.5, 23.2, 64.9
llama-3.1-8b-instruct | deepinfra | 7.7, 12.4, 16.9, 4.3, 59.5, 28.1, 5.1, 72.6, 11.6, 51.9
llama-3.1-nemotron-70b-instruct | deepinfra | 24.7, 20.1, 23.6, 11.0, 46.5, 4.6, 16.9, 73.3
llama-3.2-3b-instruct | deepinfra | 6.7, 6.7, 11.2, 3.3, 32.8, 5.2, 8.3, 48.9
llama-3.3-70b-instruct | deepinfra | 30.0, 27.4, 27.9, 7.7, 50.2, 4.0, 88.4, 28.8, 77.3
llama-4-maverick | deepinfra | 39.0, 36.4, 35.8, 19.3, 68.5, 4.8, 39.7, 88.9
llama-4-scout | deepinfra | 28.3, 23.5, 28.1, 14.0, 57.9, 4.3, 29.9, 84.4
mistral-7b-instruct | deepinfra | 0.0, 3.5, 1.0, 17.7, 4.3, 4.6, 12.1
mistral-nemo | deepinfra | 0.3, 8.1, 5.2, 31.4, 4.4, 5.7, 39.5
mistral-small-24b-instruct-2501 | deepinfra | 45.3, 84.8
mistral-small-3.1-24b-instruct | deepinfra | 6.3, 14.8, 13.0, 38.1, 4.3, 14.1, 56.3
mistral-small-3.2-24b-instruct | deepinfra | 6.3, 14.8, 13.0, 38.1, 4.3, 14.1, 56.3
mixtral-8x7b-instruct | deepinfra | 0.0, 4.7, 2.6, 29.2, 4.5, 6.6, 29.9
nemotron-nano-9b-v2 | deepinfra | 45.5, 38.1, 62.3, 55.7, 4.0, 70.1
phi-4 | deepinfra | 14.3, 24.6, 24.6, 18.0, 75.5, 56.8, 4.1, 82.6, 23.1, 81.0
phi-4-multimodal-instruct | deepinfra | 9.3, 12.1, 12.4, 31.5, 4.4, 13.1, 69.3
phi-4-reasoning-plus | deepinfra | 68.9
qwen-2.5-72b-instruct | deepinfra | 16.0, 27.2, 29.0, 14.0, 49.0, 4.2, 86.6, 27.6, 85.8
qwen-2.5-7b-instruct | deepinfra | 36.4, 84.8
qwen3-14b | deepinfra | 28.0, 27.3, 29.2, 58.0, 47.0, 4.2, 28.0, 87.1
qwen3-235b-a22b | deepinfra | 32.7, 32.1, 29.9, 23.7, 61.3, 4.7, 34.3, 90.2
qwen3-235b-a22b-2507 | deepinfra | 32.7, 32.1, 29.9, 23.7, 61.3, 4.7, 34.3, 90.2
qwen3-235b-a22b-thinking-2507 | deepinfra | 81.1
qwen3-30b-a3b | deepinfra | 26.0, 29.3, 26.5, 21.7, 65.8, 4.6, 32.2, 86.3
qwen3-32b | deepinfra | 30.3, 28.4, 26.4, 19.7, 53.5, 4.3, 28.8, 86.9
qwen3-coder | deepinfra | 47.7, 47.2, 42.3, 39.3, 61.8, 4.4, 58.5, 94.2
qwq-32b | deepinfra | 78.0, 49.4, 37.9, 29.0, 65.2, 8.2, 63.1, 95.7
deepseek-chat | deepseek | 47.2, 44.8, 49.7, 73.5, 6.3, 57.7
deepseek-reasoner | deepseek | 58.8, 54.0, 89.7, 77.9, 13.0, 78.4
deepseek-r1 | fireworksai | 89.3, 58.7, 52.0, 76.0, 81.3, 14.9, 77.0, 98.3
deepseek-r1-05-28 | fireworksai | 89.3, 58.7, 52.0, 76.0, 81.3, 14.9, 77.0, 98.3
deepseek-v3 | fireworksai | 25.3, 35.6, 32.5, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
deepseek-v3-03-24 | fireworksai | 25.3, 35.6, 32.5, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
gpt-oss-120b | fireworksai | 50.1, 57.9, 93.4, 79.2, 18.5, 63.9
gpt-oss-20b | fireworksai | 53.7, 44.8, 61.7, 71.5, 8.5, 72.1
kimi-k2-instruct | fireworksai | 75.1, 93.3
qwen3-235b-a22b | fireworksai | 32.7, 32.1, 29.9, 23.7, 61.3, 4.7, 34.3, 90.2
qwen3-coder-480b-a35b-instruct | fireworksai | 47.7, 47.2, 42.3, 39.3, 61.8, 4.4, 58.5, 94.2
gemini-1.5-flash-8b | gemini | 3.3, 22.3, 16.3, 37.1, 4.5, 21.7, 68.9
gemini-2.0-flash | gemini | 33.0, 33.4, 34.0, 21.7, 62.2, 5.3, 33.4, 93.0
gemini-2.0-flash-lite | gemini | 51.5
gemini-2.5-flash | gemini | 50.0, 39.3, 40.4, 60.3, 82.8, 5.1, 49.5, 93.2
gemini-2.5-pro | gemini | 88.7, 61.5, 59.6, 87.7, 83.7, 21.1, 80.1, 96.7
gemini-2.5-pro-preview | gemini | 86.4
codestral-2501 | mistralai | 4.3, 24.5, 13.2, 31.2, 4.5, 24.3, 60.7
codestral-2508 | mistralai | 4.3, 24.5, 13.2, 31.2, 4.5, 24.3, 60.7
devstral-medium | mistralai | 6.7, 31.5, 27.9, 4.7, 49.2, 3.8, 33.7, 70.7
ministral-3b | mistralai | 0.0, 8.1, 5.2, 26.0, 5.5, 6.9, 53.7
mistral-large | mistralai | 0.0, 19.3, 11.9, 35.1, 3.4, 17.8, 52.7
mistral-large-2.1 | mistralai | 0.0, 19.3, 11.9, 35.1, 3.4, 17.8, 52.7
mistral-large-2407 | mistralai | 9.3, 26.9, 22.3, 0.0, 47.2, 3.2, 26.7, 71.4
mistral-medium-3 | mistralai | 44.0, 36.5, 34.7, 30.3, 57.8, 4.3, 40.0, 90.7
mistral-medium-3.1 | mistralai | 3.7, 10.9, 8.4, 34.9, 3.4, 9.9, 40.5
mistral-nemo | mistralai | 0.3, 8.1, 5.2, 31.4, 4.4, 5.7, 39.5
mistral-saba | mistralai | 13.0, 19.6, 42.4, 4.1, 67.7
mistral-small | mistralai | 6.3, 14.8, 13.0, 38.1, 4.3, 14.1, 56.3
mistral-small-3 | mistralai | 6.3, 14.8, 13.0, 38.1, 4.3, 14.1, 56.3
mistral-small-3.1 | mistralai | 6.3, 14.8, 13.0, 38.1, 4.3, 14.1, 56.3
mistral-small-3.2 | mistralai | 6.3, 14.8, 13.0, 38.1, 4.3, 14.1, 56.3
chatgpt-4o-latest | openai | 32.7, 39.6, 35.6, 25.7, 65.5, 5.0, 42.5, 89.3
gpt-3.5-turbo | openai | 8.3, 70.2, 30.3, 68.0, 44.1
gpt-4.1 | openai | 43.7, 41.9, 43.4, 34.7, 66.5, 4.6, 45.7, 91.3
gpt-4.1-mini | openai | 65.0
gpt-4.1-nano | openai | 50.3
gpt-4o | openai | 15.0, 32.1, 27.0, 6.0, 54.3, 3.3, 30.9, 75.9
gpt-4o-2024-05-13 | openai | 15.0, 32.1, 27.0, 6.0, 83.4, 54.0, 3.3, 90.2, 30.9, 75.9
gpt-4o-mini | openai | 11.7, 23.1, 21.2, 14.7, 42.6, 4.0, 23.4, 78.9
gpt-4o-mini-search-preview | openai | 79.7, 40.2, 87.2
gpt-5 | openai | 95.7, 54.9, 66.7, 94.3, 85.4, 26.5, 66.8, 99.4
gpt-5-chat | openai | 46.1, 41.8, 48.3, 68.6, 5.8, 54.3
gpt-5-mini | openai | 51.4, 62.3, 90.7, 82.8, 19.7, 63.6
gpt-5-nano | openai | 45.6, 48.5, 83.7, 67.6, 8.2, 54.6
o1 | openai | 72.3, 51.9, 47.2, 76.4, 7.7, 88.1, 67.9, 97.0
o1-mini | openai | 60.3, 44.9, 39.2, 60.1, 4.9, 92.4, 57.6, 94.4
o3 | openai | 90.3, 59.7, 65.2, 88.3, 83.0, 20.0, 78.4, 99.2
o3-mini | openai | 77.0, 55.8, 48.1, 76.0, 8.7, 71.7, 97.3
o4-mini | openai | 94.0, 63.5, 59.0, 90.7, 79.9, 17.5, 80.4, 98.9
codestral-2501 | openrouter | 4.3, 24.5, 13.2, 31.2, 4.5, 24.3, 60.7
codestral-2508 | openrouter | 4.3, 24.5, 13.2, 31.2, 4.5, 24.3, 60.7
command | openrouter | 0.7, 12.0, 5.5, 32.3, 4.5, 12.2, 27.9
command-a | openrouter | 9.7, 28.4, 28.1, 13.0, 52.7, 4.6, 28.7, 81.9
command-r | openrouter | 0.3, 6.6, 1.0, 28.9, 5.1, 4.4, 14.9
command-r-03-2024 | openrouter | 0.7, 5.5, 1.0, 28.4, 4.8, 4.8, 16.4
command-r-plus | openrouter | 0.0, 11.6, 7.1, 33.7, 5.0, 11.1, 40.2
command-r-plus-04-2024 | openrouter | 0.7, 12.0, 5.5, 32.3, 4.5, 12.2, 27.9
deephermes-3-mistral-24b-preview | openrouter | 4.7, 21.1, 15.5, 38.2, 3.9, 19.5, 59.5
deepseek-r1-distill-llama-70b | openrouter | 67.0, 28.9, 30.8, 53.7, 65.2, 6.1, 26.6, 93.5
deepseek-r1-distill-llama-8b | openrouter | 33.3, 17.6, 19.5, 41.3, 49.0, 4.2, 23.3, 85.3
deepseek-r1-distill-qwen-14b | openrouter | 66.7, 30.7, 29.7, 55.7, 59.1, 4.4, 37.6, 94.9
deepseek-r1-distill-qwen-32b | openrouter | 68.7, 32.3, 32.7, 63.0, 61.8, 5.5, 27.0, 94.1
deepseek-v3.1-base | openrouter | 25.3, 35.6, 32.5, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
devstral-medium | openrouter | 6.7, 31.5, 27.9, 4.7, 49.2, 3.8, 33.7, 70.7
dolphin3.0-mistral-24b | openrouter | 44.2
dolphin3.0-r1-mistral-24b | openrouter | 44.2
gemini-2.5-flash-lite | openrouter | 50.0, 28.9, 30.1, 35.3, 64.6, 3.7, 40.0, 92.6
gemma-2-9b-it | openrouter | 0.0, 6.6, 7.8, 31.1, 3.9, 40.2, 12.6, 51.7
gemma-3n-e4b-it | openrouter | 23.7, 75.0
grok-2-1212 | openrouter | 13.3, 27.6, 24.7, 51.0, 3.8, 26.7, 77.8
grok-3-mini-beta | openrouter | 84.0
kimi-k2-0905 | openrouter | 50.4, 57.3, 76.3, 6.3, 94.5, 61.0
llama-3.1-405b | openrouter | 21.3, 30.2, 25.7, 3.0, 84.8, 51.1, 4.2, 89.0, 30.5, 70.3
llama-3.1-nemotron-ultra-253b-v1 | openrouter | 74.7, 49.4, 38.5, 63.7, 74.4, 8.1, 64.1, 95.2
magistral-small-2506 | openrouter | 68.2
minimax-m1 | openrouter | 81.3, 51.8, 41.6, 13.7, 68.2, 7.5, 65.7, 97.2
ministral-3b | openrouter | 0.0, 8.1, 5.2, 26.0, 5.5, 6.9, 53.7
mistral-7b-instruct-v0.1 | openrouter | 0.0, 3.5, 1.0, 17.7, 4.3, 4.6, 12.1
mistral-7b-instruct-v0.3 | openrouter | 0.0, 3.5, 1.0, 17.7, 4.3, 4.6, 12.1
mistral-large | openrouter | 0.0, 19.3, 11.9, 35.1, 3.4, 17.8, 52.7
mistral-large-2407 | openrouter | 9.3, 26.9, 22.3, 0.0, 47.2, 3.2, 26.7, 71.4
mistral-large-2411 | openrouter | 0.0, 19.3, 11.9, 35.1, 3.4, 17.8, 52.7
mistral-medium-3 | openrouter | 44.0, 36.5, 34.7, 30.3, 57.8, 4.3, 40.0, 90.7
mistral-medium-3.1 | openrouter | 3.7, 10.9, 8.4, 34.9, 3.4, 9.9, 40.5
mistral-saba | openrouter | 13.0, 19.6, 42.4, 4.1, 67.7
mistral-small | openrouter | 6.3, 14.8, 13.0, 38.1, 4.3, 14.1, 56.3
mistral-tiny | openrouter | 6.3, 14.8, 13.0, 38.1, 4.3, 14.1, 56.3
nova-lite-v1 | openrouter | 10.7, 15.3, 21.4, 7.0, 80.2, 42.6, 4.6, 85.4, 16.7, 76.5
nova-micro-v1 | openrouter | 8.0, 11.7, 17.3, 6.0, 79.3, 37.9, 4.7, 81.1, 14.0, 70.3
nova-pro-v1 | openrouter | 10.7, 22.1, 25.5, 7.0, 85.4, 48.4, 3.4, 89.0, 23.3, 78.6
qwen-2.5-coder-32b-instruct | openrouter | 12.0, 28.3, 21.8, 41.7, 3.8, 92.7, 29.5, 76.7
qwen-turbo | openrouter | 12.0, 15.8, 19.1, 41.0, 4.2, 16.3, 80.5
qwen3-30b-a3b-instruct-2507 | openrouter | 26.0, 29.3, 26.5, 21.7, 65.8, 4.6, 32.2, 86.3
qwen3-30b-a3b-thinking-2507 | openrouter | 26.0, 29.3, 26.5, 21.7, 65.8, 4.6, 32.2, 86.3
qwen3-8b | openrouter | 74.7, 31.6, 28.3, 19.0, 58.9, 4.2, 40.6, 90.4
qwen3-max | openrouter | 48.5, 75.0, 76.4, 9.3, 65.1
qwq-32b-preview | openrouter | 45.3, 18.7, 28.0, 65.2, 4.8, 33.7, 91.0
r1-1776 | openrouter | 19.1, 95.4
sonar | openrouter | 48.7, 26.2, 28.8, 47.1, 7.3, 29.5, 81.7
sonar-pro | openrouter | 29.0, 25.0, 28.2, 57.8, 7.9, 27.5, 74.5
sonar-reasoning | openrouter | 77.0, 34.2, 62.3, 92.1
sonar-reasoning-pro | openrouter | 79.0, 46.3, 95.7
deepseek-chat-v3-0324 | parasail | 47.2, 44.8, 49.7, 73.5, 6.3, 57.7
deepseek-chat-v3.1 | parasail | 47.2, 44.8, 49.7, 73.5, 6.3, 57.7
deepseek-r1-0528 | parasail | (no scores listed)
deepseek-r1-0528-qwen3-8b | parasail | 89.3, 58.7, 52.0, 76.0, 81.3, 14.9, 77.0, 98.3
gemma-3-27b-it | parasail | 25.3, 17.4, 22.0, 20.7, 42.6, 4.7, 87.8, 13.7, 88.3
glm-4.5 | parasail | 87.3, 54.3, 49.4, 73.7, 78.6, 12.2, 73.8, 97.9
glm-4.5v | parasail | 27.0, 26.0, 15.3, 57.3, 3.6, 35.2
gpt-oss-120b | parasail | 50.1, 57.9, 93.4, 79.2, 18.5, 63.9
kimi-k2 | parasail | 50.4, 57.3, 76.3, 6.3, 94.5, 61.0
llama-3.3-70b-instruct | parasail | 30.0, 27.4, 27.9, 7.7, 50.2, 4.0, 88.4, 28.8, 77.3
llama-4-maverick | parasail | 39.0, 36.4, 35.8, 19.3, 68.5, 4.8, 39.7, 88.9
mistral-nemo | parasail | 0.3, 8.1, 5.2, 31.4, 4.4, 5.7, 39.5
mistral-small-3.2-24b-instruct | parasail | 6.3, 14.8, 13.0, 38.1, 4.3, 14.1, 56.3
qwen3-14b | parasail | 28.0, 27.3, 29.2, 58.0, 47.0, 4.2, 28.0, 87.1
qwen3-235b-a22b-2507 | parasail | 32.7, 32.1, 29.9, 23.7, 61.3, 4.7, 34.3, 90.2
qwen3-235b-a22b-thinking-2507 | parasail | 81.1
qwen3-30b-a3b | parasail | 26.0, 29.3, 26.5, 21.7, 65.8, 4.6, 32.2, 86.3
qwen3-32b | parasail | 30.3, 28.4, 26.4, 19.7, 53.5, 4.3, 28.8, 86.9
qwen3-coder | parasail | 47.7, 47.2, 42.3, 39.3, 61.8, 4.4, 58.5, 94.2
DeepSeek-R1-Distill-Llama-70B | togetherai | 67.0, 28.9, 30.8, 53.7, 65.2, 6.1, 26.6, 93.5
DeepSeek-R1-Distill-Qwen-1.5B | togetherai | 17.7, 6.8, 8.6, 22.0, 33.8, 3.3, 7.0, 68.7
DeepSeek-R1-Distill-Qwen-14B | togetherai | 66.7, 30.7, 29.7, 55.7, 59.1, 4.4, 37.6, 94.9
DeepSeek-V3 | togetherai | 25.3, 35.6, 32.5, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
QwQ-32B-Preview | togetherai | 45.3, 18.7, 28.0, 65.2, 4.8, 33.7, 91.0
Qwen2-72B-Instruct | togetherai | 14.7, 19.4, 18.1, 42.4, 3.7, 86.0, 15.9, 70.1
gemma-2-27b-it | togetherai | 29.7, 20.2, 17.2, 35.7, 3.7, 51.8, 27.9, 54.1
grok-2 | xai | 56.0, 88.4
grok-3 | xai | 33.0, 39.7, 36.0, 84.6, 5.1, 42.5, 87.0
grok-3-mini | xai | 84.0
grok-4 | xai | 94.3, 63.8, 65.3, 92.7, 87.6, 23.9, 81.9, 99.0
grok-code-fast-1 | xai | 51.0, 48.6, 43.3, 72.7, 7.5, 65.7
glm-4.5 | zai | 87.3, 54.3, 49.4, 73.7, 78.6, 12.2, 73.8, 97.9
glm-4.5-air | zai | 67.3, 49.5, 48.8, 80.7, 74.2, 6.8, 68.4, 96.5
glm-4.5-airx | zai | 67.3, 49.5, 48.8, 80.7, 74.2, 6.8, 68.4, 96.5
glm-4.5-x | zai | 87.3, 54.3, 49.4, 73.7, 78.6, 12.2, 73.8, 97.9
glm-4.5v | zai | 27.0, 26.0, 15.3, 57.3, 3.6, 35.2

Showing 189 models with benchmark data.
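To compare models on a single benchmark, the rows can be loaded into a small structure and sorted. The following is a minimal Python sketch, not part of the original page. It uses four rows from the table that have a value for all ten benchmarks, on the assumption that values appear in the same order as the column headers. The names `BENCHMARKS`, `ROWS`, and `rank_by` are hypothetical, and `None` is an assumed convention for a benchmark with no score.

```python
from typing import List, Optional, Tuple

# Column order as listed in the table header.
BENCHMARKS = [
    "AIME", "AA Coding Index", "AAII", "AA Math Index", "DROP",
    "GPQA", "HLE", "HumanEval", "LiveCodeBench", "MATH-500",
]

Row = Tuple[str, str, List[Optional[float]]]

# Rows copied from the table. Only rows with all ten values are used,
# because their column assignment is unambiguous (assumption: values
# appear in header order). None would mark a missing score.
ROWS: List[Row] = [
    ("llama-3.1-405b-instruct", "deepinfra",
     [21.3, 30.2, 25.7, 3.0, 84.8, 51.1, 4.2, 89.0, 30.5, 70.3]),
    ("phi-4", "deepinfra",
     [14.3, 24.6, 24.6, 18.0, 75.5, 56.8, 4.1, 82.6, 23.1, 81.0]),
    ("gpt-4o-2024-05-13", "openai",
     [15.0, 32.1, 27.0, 6.0, 83.4, 54.0, 3.3, 90.2, 30.9, 75.9]),
    ("nova-pro-v1", "openrouter",
     [10.7, 22.1, 25.5, 7.0, 85.4, 48.4, 3.4, 89.0, 23.3, 78.6]),
]


def rank_by(benchmark: str, rows: List[Row] = ROWS) -> List[Tuple[str, str, float]]:
    """Return (model, provider, score) sorted from highest to lowest score.

    Models with no score (None) for the benchmark are skipped.
    """
    idx = BENCHMARKS.index(benchmark)
    scored = [
        (model, provider, scores[idx])
        for model, provider, scores in rows
        if scores[idx] is not None
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for model, provider, score in rank_by("MATH-500"):
        print(f"{model} ({provider}): {score}")
```

Skipping `None` rather than treating it as zero keeps models without a reported score from appearing to have failed the benchmark.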