Benchmarks Comparison

Compare AI model performance across standardized tests

Individual Benchmark Results

This table shows performance on specific standardized tests (MMLU, HumanEval, GPQA, etc.) rather than category aggregations. Scores represent pass rates or accuracy percentages on each benchmark.
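Every score in the table is printed with exactly one decimal place, so a row's scores can be tokenized with a simple regular expression even when the values run together without separators. The sketch below is illustrative only. `split_scores` and `label_scores` are hypothetical helper names, and the one-decimal format is an assumption inferred from the rows. Rows with fewer than ten scores are left unlabeled, because the text does not say which benchmarks are missing.

```python
import re
from typing import Dict, List, Optional

# Column order as listed in the table header.
BENCHMARKS = [
    "AIME", "AA Coding Index", "AAII", "AA Math Index", "DROP",
    "GPQA", "HLE", "HumanEval", "LiveCodeBench", "MATH-500",
]

# Assumption: every score has exactly one digit after the decimal point.
SCORE = re.compile(r"\d+\.\d")


def split_scores(run: str) -> List[float]:
    """Tokenize a run of one-decimal scores, with or without separators."""
    return [float(token) for token in SCORE.findall(run)]


def label_scores(run: str) -> Optional[Dict[str, float]]:
    """Map scores to benchmark names, or return None for an incomplete row.

    A row with fewer than ten scores cannot be aligned safely, because the
    positions of the missing cells are not recorded in the text.
    """
    values = split_scores(run)
    if len(values) != len(BENCHMARKS):
        return None
    return dict(zip(BENCHMARKS, values))


# phi-4 (ten scores) and claude-3-5-sonnet-20240620 (nine scores).
complete = label_scores("14.317.624.618.075.556.84.182.623.181.0")
partial = label_scores("15.730.229.987.159.73.992.038.177.1")

print(complete["GPQA"])  # 56.8
print(partial)           # None
```

Returning `None` for short rows is a deliberate choice: guessing which column is empty would silently attach scores to the wrong benchmark.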

203 models

The first 10 benchmarks are shown. Each row lists the model, the provider, and the scores in column order: AIME, AA Coding Index, AAII, AA Math Index, DROP, GPQA, HLE, HumanEval, LiveCodeBench, MATH-500. Empty cells are not marked, so a row with fewer than ten scores omits one or more benchmarks.

Model | Provider | Scores
claude-3-5-sonnet-20240620 | anthropic | 15.7 30.2 29.9 87.1 59.7 3.9 92.0 38.1 77.1
claude-3-haiku-20240307 | anthropic | 1.0 9.6 78.4 33.3 75.9 15.4 39.4
claude-3-opus-20240229 | anthropic | 3.3 19.5 20.6 83.1 49.6 3.1 84.9 27.9 64.1
claude-3.5-haiku | anthropic | 3.3 20.2 83.1 41.2 3.5 88.1 31.4 72.1
claude-3.7-sonnet | anthropic | 22.3 32.3 33.5 21.0 65.6 4.8 39.4 85.0
claude-opus-4 | anthropic | 56.3 42.3 36.3 70.1 5.9 54.2 94.1
claude-opus-4.1 | anthropic | 46.1 59.3 80.3 80.9 11.9 65.4
claude-sonnet-4 | anthropic | 77.3 45.1 56.5 74.3 77.7 9.6 65.5 99.1
claude-sonnet-4.5 | anthropic | 49.8 62.7 88.0 83.4 17.3 71.4
mistral-7b-instruct-v0.2 | bedrock | 0.0 1.0 17.7 4.3 4.6 12.1
DeepSeek-R1 | deepinfra | 89.3 44.1 52.0 76.0 81.3 14.9 77.0 98.3
DeepSeek-R1-Distill-Llama-70B | deepinfra | 67.0 30.8 53.7 65.2 6.1 26.6 93.5
DeepSeek-R1-Distill-Qwen-32B | deepinfra | 68.7 32.7 63.0 61.8 5.5 27.0 94.1
DeepSeek-V3 | deepinfra | 25.3 25.9 32.5 26.0 91.6 57.4 3.6 35.9 88.7
deepseek-chat-v3-0324 | deepinfra | 39.0 44.8 49.7 73.5 6.3 57.7
deepseek-chat-v3.1 | deepinfra | 39.0 44.8 49.7 73.5 6.3 57.7
deepseek-r1-0528 | deepinfra | 81.0
devstral-small | deepinfra | 0.3 18.5 27.2 29.3 41.4 3.7 25.4 63.5
devstral-small-2505 | deepinfra | 6.7 19.6 43.4 4.0 25.8 68.4
gemma-3-12b-it | deepinfra | 22.0 10.6 20.4 18.3 40.9 4.8 85.4 13.7 85.3
gemma-3-27b-it | deepinfra | 25.3 12.8 21.6 20.7 42.6 4.7 87.8 13.7 88.3
gemma-3-4b-it | deepinfra | 6.3 6.4 14.7 12.7 29.9 5.2 71.3 11.2 76.6
gpt-oss-120b | deepinfra | 41.2 58.0 93.4 79.2 18.5 65.3
gpt-oss-20b | deepinfra | 32.8 43.3 61.7 71.5 8.5 57.2
kimi-k2 | deepinfra | 38.1 50.4 57.3 76.3 6.3 94.5 61.0
llama-3.1-405b-instruct | deepinfra | 21.3 22.2 25.7 3.0 84.8 51.1 4.2 89.0 30.5 70.3
llama-3.1-70b-instruct | deepinfra | 17.3 17.6 22.6 4.0 79.6 41.3 4.6 80.5 23.2 64.9
llama-3.1-8b-instruct | deepinfra | 7.7 8.5 16.9 4.3 59.5 28.1 5.1 72.6 11.6 51.9
llama-3.1-nemotron-70b-instruct | deepinfra | 24.7 14.8 23.6 11.0 46.5 4.6 16.9 73.3
llama-3.2-3b-instruct | deepinfra | 6.7 11.2 3.3 32.8 5.2 8.3 48.9
llama-3.3-70b-instruct | deepinfra | 30.0 19.2 27.9 7.7 50.2 4.0 88.4 28.8 77.3
llama-4-maverick | deepinfra | 39.0 26.4 35.8 19.3 68.5 4.8 39.7 88.9
llama-4-scout | deepinfra | 28.3 16.1 28.1 14.0 57.9 4.3 29.9 84.4
mistral-7b-instruct | deepinfra | 0.0 1.0 17.7 4.3 4.6 12.1
mistral-nemo | deepinfra | 0.3 5.2 31.4 4.4 5.7 39.5
mistral-small-24b-instruct-2501 | deepinfra | 45.3 84.8
mistral-small-3.1-24b-instruct | deepinfra | 6.3 13.0 38.1 4.3 14.1 56.3
mistral-small-3.2-24b-instruct | deepinfra | 6.3 13.0 38.1 4.3 14.1 56.3
mixtral-8x7b-instruct | deepinfra | 0.0 2.6 29.2 4.5 6.6 29.9
nemotron-nano-9b-v2 | deepinfra | 30.6 36.1 62.3 55.7 4.0 70.1
phi-4 | deepinfra | 14.3 17.6 24.6 18.0 75.5 56.8 4.1 82.6 23.1 81.0
phi-4-multimodal-instruct | deepinfra | 9.3 12.4 31.5 4.4 13.1 69.3
phi-4-reasoning-plus | deepinfra | 68.9
qwen-2.5-72b-instruct | deepinfra | 16.0 19.5 29.0 14.0 49.0 4.2 86.6 27.6 85.8
qwen-2.5-7b-instruct | deepinfra | 36.4 84.8
qwen3-14b | deepinfra | 28.0 19.8 29.2 58.0 47.0 4.2 28.0 87.1
qwen3-235b-a22b | deepinfra | 32.7 23.3 29.9 23.7 61.3 4.7 34.3 90.2
qwen3-235b-a22b-2507 | deepinfra | 32.7 23.3 29.9 23.7 61.3 4.7 34.3 90.2
qwen3-235b-a22b-thinking-2507 | deepinfra | 81.1
qwen3-30b-a3b | deepinfra | 72.7 29.2 37.0 66.3 65.9 6.8 51.5 97.5
qwen3-32b | deepinfra | 80.7 30.9 38.7 73.0 66.8 8.3 54.6 96.1
qwen3-coder | deepinfra | 47.7 37.4 42.3 39.3 61.8 4.4 58.5 94.2
qwq-32b | deepinfra | 78.0 37.9 29.0 65.2 8.2 63.1 95.7
deepseek-chat | deepseek | 39.0 44.8 49.7 73.5 6.3 57.7
deepseek-reasoner | deepseek | 47.2 54.0 89.7 77.9 13.0 78.4
deepseek-r1 | fireworksai | 89.3 44.1 52.0 76.0 81.3 14.9 77.0 98.3
deepseek-r1-05-28 | fireworksai | 89.3 44.1 52.0 76.0 81.3 14.9 77.0 98.3
deepseek-v3 | fireworksai | 25.3 25.9 32.5 26.0 91.6 57.4 3.6 35.9 88.7
deepseek-v3-03-24 | fireworksai | 25.3 25.9 32.5 26.0 91.6 57.4 3.6 35.9 88.7
gpt-oss-120b | fireworksai | 41.2 58.0 93.4 79.2 18.5 65.3
gpt-oss-20b | fireworksai | 32.8 43.3 61.7 71.5 8.5 57.2
kimi-k2-instruct | fireworksai | 75.1 93.3
qwen3-235b-a22b | fireworksai | 32.7 23.3 29.9 23.7 61.3 4.7 34.3 90.2
qwen3-coder-480b-a35b-instruct | fireworksai | 47.7 37.4 42.3 39.3 61.8 4.4 58.5 94.2
gemini-2.0-flash | gemini | 33.0 23.4 33.6 21.7 62.2 5.3 33.4 93.0
gemini-2.0-flash-lite | gemini | 51.5
gemini-2.5-flash | gemini | 50.0 30.0 40.4 60.3 82.8 5.1 49.5 93.2
gemini-2.5-pro | gemini | 88.7 49.3 59.6 87.7 83.7 21.1 80.1 96.7
gemini-2.5-pro-preview | gemini | 86.4
deepseek-r1-distill-llama-70b | groq | 67.0 30.8 53.7 65.2 6.1 26.6 93.5
gemma-2-9b-it | groq | 0.0 7.8 31.1 3.9 40.2 12.6 51.7
gpt-oss-120b | groq | 41.2 58.0 93.4 79.2 18.5 65.3
gpt-oss-20b | groq | 32.8 43.3 61.7 71.5 8.5 57.2
kimi-k2 | groq | 38.1 50.4 57.3 76.3 6.3 94.5 61.0
kimi-k2-0905 | groq | 38.1 50.4 57.3 76.3 6.3 94.5 61.0
llama-3.1-8b-instruct | groq | 7.7 8.5 16.9 4.3 59.5 28.1 5.1 72.6 11.6 51.9
llama-3.3-70b-instruct | groq | 30.0 19.2 27.9 7.7 50.2 4.0 88.4 28.8 77.3
llama-4-maverick | groq | 39.0 26.4 35.8 19.3 68.5 4.8 39.7 88.9
llama-4-scout | groq | 28.3 16.1 28.1 14.0 57.9 4.3 29.9 84.4
qwen3-32b | groq | 80.7 30.9 38.7 73.0 66.8 8.3 54.6 96.1
codestral-2501 | mistralai | 4.3 13.2 31.2 4.5 24.3 60.7
codestral-2508 | mistralai | 4.3 13.2 31.2 4.5 24.3 60.7
devstral-medium | mistralai | 6.7 23.9 27.9 4.7 49.2 3.8 33.7 70.7
ministral-3b | mistralai | 0.0 5.4 5.2 26.0 5.5 6.9 53.7
mistral-large | mistralai | 0.0 11.9 35.1 3.4 17.8 52.7
mistral-large-2.1 | mistralai | 0.0 11.9 35.1 3.4 17.8 52.7
mistral-large-2407 | mistralai | 9.3 22.3 0.0 47.2 3.2 26.7 71.4
mistral-medium-3 | mistralai | 44.0 25.6 34.7 30.3 57.8 4.3 40.0 90.7
mistral-medium-3.1 | mistralai | 3.7 8.4 34.9 3.4 9.9 40.5
mistral-nemo | mistralai | 0.3 5.2 31.4 4.4 5.7 39.5
mistral-saba | mistralai | 13.0 19.6 42.4 4.1 67.7
mistral-small | mistralai | 6.3 13.0 38.1 4.3 14.1 56.3
mistral-small-3 | mistralai | 6.3 13.0 38.1 4.3 14.1 56.3
mistral-small-3.1 | mistralai | 6.3 13.0 38.1 4.3 14.1 56.3
mistral-small-3.2 | mistralai | 6.3 13.0 38.1 4.3 14.1 56.3
chatgpt-4o-latest | openai | 32.7 35.6 25.7 65.5 5.0 42.5 89.3
gpt-3.5-turbo | openai | 10.7 8.3 70.2 30.3 68.0 44.1
gpt-4.1 | openai | 43.7 32.2 43.4 34.7 66.5 4.6 45.7 91.3
gpt-4.1-mini | openai | 65.0
gpt-4.1-nano | openai | 50.3
gpt-4o | openai | 15.0 24.0 27.0 6.0 54.3 3.3 30.9 75.9
gpt-4o-2024-05-13 | openai | 15.0 24.0 27.0 6.0 83.4 54.0 3.3 90.2 30.9 75.9
gpt-4o-mini | openai | 11.7 21.2 14.7 42.6 4.0 23.4 78.9
gpt-4o-mini-search-preview | openai | 79.7 40.2 87.2
gpt-5 | openai | 95.7 52.7 68.5 94.3 85.4 26.5 84.6 99.4
gpt-5-chat | openai | 34.7 41.8 48.3 68.6 5.8 54.3
gpt-5-mini | openai | 51.4 64.3 90.7 82.8 19.7 83.8
gpt-5-nano | openai | 42.3 51.0 83.7 67.6 8.2 78.9
o1 | openai | 72.3 38.6 47.2 76.4 7.7 88.1 67.9 97.0
o1-mini | openai | 60.3 39.2 60.1 4.9 92.4 57.6 94.4
o3 | openai | 90.3 52.2 65.5 88.3 83.0 20.0 80.8 99.2
o3-mini | openai | 77.0 39.4 48.1 76.0 8.7 71.7 97.3
o4-mini | openai | 94.0 48.9 59.6 90.7 79.9 17.5 85.9 98.9
codestral-2501 | openrouter | 4.3 13.2 31.2 4.5 24.3 60.7
codestral-2508 | openrouter | 4.3 13.2 31.2 4.5 24.3 60.7
command | openrouter | 0.7 5.5 32.3 4.5 12.2 27.9
command-a | openrouter | 9.7 28.1 13.0 52.7 4.6 28.7 81.9
command-r | openrouter | 0.3 1.0 28.9 5.1 4.4 14.9
command-r-03-2024 | openrouter | 0.7 1.0 28.4 4.8 4.8 16.4
command-r-plus | openrouter | 0.0 7.1 33.7 5.0 11.1 40.2
command-r-plus-04-2024 | openrouter | 0.7 5.5 32.3 4.5 12.2 27.9
deephermes-3-mistral-24b-preview | openrouter | 4.7 15.5 38.2 3.9 19.5 59.5
deepseek-r1-distill-llama-70b | openrouter | 67.0 30.8 53.7 65.2 6.1 26.6 93.5
deepseek-r1-distill-llama-8b | openrouter | 33.3 19.5 41.3 49.0 4.2 23.3 85.3
deepseek-r1-distill-qwen-14b | openrouter | 66.7 29.7 55.7 59.1 4.4 37.6 94.9
deepseek-r1-distill-qwen-32b | openrouter | 68.7 32.7 63.0 61.8 5.5 27.0 94.1
deepseek-v3.1-base | openrouter | 25.3 25.9 32.5 26.0 91.6 57.4 3.6 35.9 88.7
devstral-medium | openrouter | 6.7 23.9 27.9 4.7 49.2 3.8 33.7 70.7
dolphin3.0-mistral-24b | openrouter | 44.2
dolphin3.0-r1-mistral-24b | openrouter | 44.2
gemini-2.5-flash-lite | openrouter | 50.0 19.9 30.1 35.3 64.6 3.7 40.0 92.6
gemma-2-9b-it | openrouter | 0.0 7.8 31.1 3.9 40.2 12.6 51.7
gemma-3n-e4b-it | openrouter | 23.7 75.0
grok-2-1212 | openrouter | 13.3 24.7 51.0 3.8 26.7 77.8
grok-3-mini-beta | openrouter | 84.0
kimi-k2-0905 | openrouter | 38.1 50.4 57.3 76.3 6.3 94.5 61.0
llama-3.1-405b | openrouter | 21.3 22.2 25.7 3.0 84.8 51.1 4.2 89.0 30.5 70.3
llama-3.1-nemotron-ultra-253b-v1 | openrouter | 74.7 33.7 38.5 63.7 74.4 8.1 64.1 95.2
magistral-small-2506 | openrouter | 68.2
minimax-m1 | openrouter | 81.3 41.6 13.7 68.2 7.5 65.7 97.2
ministral-3b | openrouter | 0.0 5.4 5.2 26.0 5.5 6.9 53.7
mistral-7b-instruct-v0.1 | openrouter | 0.0 1.0 17.7 4.3 4.6 12.1
mistral-7b-instruct-v0.3 | openrouter | 0.0 1.0 17.7 4.3 4.6 12.1
mistral-large | openrouter | 0.0 11.9 35.1 3.4 17.8 52.7
mistral-large-2407 | openrouter | 9.3 22.3 0.0 47.2 3.2 26.7 71.4
mistral-large-2411 | openrouter | 0.0 11.9 35.1 3.4 17.8 52.7
mistral-medium-3 | openrouter | 44.0 25.6 34.7 30.3 57.8 4.3 40.0 90.7
mistral-medium-3.1 | openrouter | 3.7 8.4 34.9 3.4 9.9 40.5
mistral-saba | openrouter | 13.0 19.6 42.4 4.1 67.7
mistral-small | openrouter | 6.3 13.0 38.1 4.3 14.1 56.3
mistral-tiny | openrouter | 6.3 13.0 38.1 4.3 14.1 56.3
nova-lite-v1 | openrouter | 10.7 21.4 7.0 80.2 42.6 4.6 85.4 16.7 76.5
nova-micro-v1 | openrouter | 8.0 17.3 6.0 79.3 37.9 4.7 81.1 14.0 70.3
nova-pro-v1 | openrouter | 10.7 25.5 7.0 85.4 48.4 3.4 89.0 23.3 78.6
qwen-2.5-coder-32b-instruct | openrouter | 12.0 21.8 41.7 3.8 92.7 29.5 76.7
qwen-turbo | openrouter | 12.0 19.1 41.0 4.2 16.3 80.5
qwen3-30b-a3b-instruct-2507 | openrouter | 72.7 29.2 37.0 66.3 65.9 6.8 51.5 97.5
qwen3-30b-a3b-thinking-2507 | openrouter | 72.7 29.2 37.0 66.3 65.9 6.8 51.5 97.5
qwen3-8b | openrouter | 74.7 21.8 28.3 19.0 58.9 4.2 40.6 90.4
qwen3-max | openrouter | 44.7 55.1 80.7 76.4 11.1 76.7
qwq-32b-preview | openrouter | 45.3 28.0 65.2 4.8 33.7 91.0
r1-1776 | openrouter | 19.1 95.4
sonar | openrouter | 48.7 28.8 47.1 7.3 29.5 81.7
sonar-pro | openrouter | 29.0 28.2 57.8 7.9 27.5 74.5
sonar-reasoning | openrouter | 77.0 34.2 62.3 92.1
sonar-reasoning-pro | openrouter | 79.0 46.3 95.7
deepseek-chat-v3-0324 | parasail | 39.0 44.8 49.7 73.5 6.3 57.7
deepseek-chat-v3.1 | parasail | 39.0 44.8 49.7 73.5 6.3 57.7
deepseek-r1-0528 | parasail | 81.0
deepseek-r1-0528-qwen3-8b | parasail | 89.3 44.1 52.0 76.0 81.3 14.9 77.0 98.3
gemma-3-27b-it | parasail | 25.3 12.8 21.6 20.7 42.6 4.7 87.8 13.7 88.3
glm-4.5 | parasail | 87.3 43.3 51.3 73.7 78.6 12.2 73.8 97.9
glm-4.5v | parasail | 20.1 26.0 15.3 57.3 3.6 35.2
gpt-oss-120b | parasail | 41.2 58.0 93.4 79.2 18.5 65.3
kimi-k2 | parasail | 38.1 50.4 57.3 76.3 6.3 94.5 61.0
llama-3.3-70b-instruct | parasail | 30.0 19.2 27.9 7.7 50.2 4.0 88.4 28.8 77.3
llama-4-maverick | parasail | 39.0 26.4 35.8 19.3 68.5 4.8 39.7 88.9
mistral-nemo | parasail | 0.3 5.2 31.4 4.4 5.7 39.5
mistral-small-3.2-24b-instruct | parasail | 6.3 13.0 38.1 4.3 14.1 56.3
qwen3-14b | parasail | 28.0 19.8 29.2 58.0 47.0 4.2 28.0 87.1
qwen3-235b-a22b-2507 | parasail | 32.7 23.3 29.9 23.7 61.3 4.7 34.3 90.2
qwen3-235b-a22b-thinking-2507 | parasail | 81.1
qwen3-30b-a3b | parasail | 72.7 29.2 37.0 66.3 65.9 6.8 51.5 97.5
qwen3-32b | parasail | 80.7 30.9 38.7 73.0 66.8 8.3 54.6 96.1
qwen3-coder | parasail | 47.7 37.4 42.3 39.3 61.8 4.4 58.5 94.2
DeepSeek-R1-Distill-Llama-70B | togetherai | 67.0 30.8 53.7 65.2 6.1 26.6 93.5
DeepSeek-R1-Distill-Qwen-1.5B | togetherai | 17.7 8.6 22.0 33.8 3.3 7.0 68.7
DeepSeek-R1-Distill-Qwen-14B | togetherai | 66.7 29.7 55.7 59.1 4.4 37.6 94.9
DeepSeek-V3 | togetherai | 25.3 25.9 32.5 26.0 91.6 57.4 3.6 35.9 88.7
QwQ-32B-Preview | togetherai | 45.3 28.0 65.2 4.8 33.7 91.0
Qwen2-72B-Instruct | togetherai | 14.7 18.1 42.4 3.7 86.0 15.9 70.1
gemma-2-27b-it | togetherai | 29.7 17.2 35.7 3.7 51.8 27.9 54.1
grok-2 | xai | 56.0 88.4
grok-3 | xai | 33.0 26.9 36.0 84.6 5.1 42.5 87.0
grok-3-mini | xai | 84.0
grok-4 | xai | 94.3 55.1 65.3 92.7 87.6 23.9 81.9 99.0
grok-code-fast-1 | xai | 39.4 48.6 43.3 72.7 7.5 65.7
glm-4.5 | zai | 87.3 43.3 51.3 73.7 78.6 12.2 73.8 97.9
glm-4.5-air | zai | 67.3 39.4 48.8 80.7 74.2 6.8 68.4 96.5
glm-4.5-airx | zai | 67.3 39.4 48.8 80.7 74.2 6.8 68.4 96.5
glm-4.5-x | zai | 87.3 43.3 51.3 73.7 78.6 12.2 73.8 97.9
glm-4.5v | zai | 20.1 26.0 15.3 57.3 3.6 35.2
glm-4.6 | zai | 38.7 44.7 44.3 81.0 5.2 56.1

Showing 203 models with benchmark data.