Benchmark Comparison

Compare AI model performance across standardized tests

Individual Benchmark Results

This table shows performance on specific standardized tests (MMLU, HumanEval, GPQA, etc.) rather than category aggregations. Scores represent pass rates or accuracy percentages on each benchmark.
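Since each score is a pass rate or accuracy, it can be read as "percent of benchmark items the model got right." A minimal sketch of that arithmetic (the function name and data are illustrative, not part of any benchmark harness):

```python
def benchmark_score(results):
    """Turn per-item pass/fail results into a 0-100 score.

    results: list of booleans, one per benchmark item (True = correct).
    """
    return 100.0 * sum(results) / len(results)

# e.g. 46 of 50 items correct -> a reported score of 92.0
print(round(benchmark_score([True] * 46 + [False] * 4), 1))
```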

164 models. Showing the first 10 benchmarks: AIME, AA Coding Index, AAII, AA Math Index, DROP, GPQA, HLE, HumanEval, LiveCodeBench, MATH-500.

Each row lists its scores in that benchmark order; benchmarks a model has no score for are omitted from the row, so rows with fewer than ten values cannot be mapped to specific benchmarks from this listing alone.

Model | Provider | Scores
claude-3-5-sonnet-20240620 | anthropic | 15.7, 37.3, 33.5, 46.4, 87.1, 59.7, 3.9, 92.0, 38.1, 77.1
claude-3-haiku-20240307 | anthropic | 1.0, 17.0, 12.1, 20.2, 78.4, 33.3, 75.9, 15.4, 39.4
claude-3-opus-20240229 | anthropic | 3.3, 25.6, 23.7, 33.7, 83.1, 49.6, 3.1, 84.9, 27.9, 64.1
claude-opus-4 | anthropic | 49.0, 80.9
claude-opus-4.1 | anthropic | 56.3, 47.5, 46.6, 75.2, 79.6, 5.9, 54.2, 94.1
claude-sonnet-4 | anthropic | 40.7, 41.1, 45.7, 67.0, 75.4, 4.0, 44.9, 93.4
mistral-7b-instruct-v0.2 | bedrock | 0.0, 3.5, 1.0, 6.1, 17.7, 4.3, 4.6, 12.1
mixtral-8x7b-instruct-v0.1 | bedrock | 0.0, 4.7, 4.8, 15.0, 29.2, 4.5, 6.6, 29.9
DeepSeek-R1 | deepinfra | 89.3, 58.7, 58.6, 93.8, 92.2, 81.3, 14.9, 77.0, 98.3
DeepSeek-R1-Distill-Llama-70B | deepinfra | 67.0, 28.9, 34.5, 80.2, 65.2, 6.1, 26.6, 93.5
DeepSeek-R1-Distill-Qwen-32B | deepinfra | 68.7, 32.3, 36.4, 81.4, 61.8, 5.5, 27.0, 94.1
DeepSeek-V3 | deepinfra | 25.3, 35.6, 36.9, 57.0, 91.6, 57.4, 3.6, 35.9, 88.7
deepseek-r1-0528 | deepinfra | 81.0
devstral-small | deepinfra | 0.3, 24.9, 20.5, 31.9, 41.4, 3.7, 25.4, 63.5
devstral-small-2505 | deepinfra | 6.7, 25.2, 22.7, 37.5, 43.4, 4.0, 25.8, 68.4
gemma-3-12b-it | deepinfra | 22.0, 15.5, 24.0, 53.7, 40.9, 4.8, 85.4, 13.7, 85.3
gemma-3-27b-it | deepinfra | 25.3, 17.4, 25.2, 56.8, 42.6, 4.7, 87.8, 13.7, 88.3
gemma-3-4b-it | deepinfra | 6.3, 9.3, 17.6, 41.5, 29.9, 5.2, 71.3, 11.2, 76.6
gpt-oss-120b | deepinfra | 50.1, 61.3, 79.2, 18.5, 63.9
gpt-oss-20b | deepinfra | 53.7, 49.0, 71.5, 8.5, 72.1
kimi-k2 | deepinfra | 69.3, 36.5, 48.6, 83.2, 76.6, 7.0, 55.6, 97.1
llama-3.1-405b-instruct | deepinfra | 21.3, 30.2, 28.9, 45.8, 84.8, 51.1, 4.2, 89.0, 30.5, 70.3
llama-3.1-70b-instruct | deepinfra | 17.3, 25.0, 26.0, 41.1, 79.6, 41.3, 4.6, 80.5, 23.2, 64.9
llama-3.1-8b-instruct | deepinfra | 0.0, 10.8, 9.5, 25.0, 59.5, 30.0, 5.1, 72.6, 9.6, 49.9
llama-3.1-nemotron-70b-instruct | deepinfra | 24.7, 20.1, 26.1, 49.0, 46.5, 4.6, 16.9, 73.3
llama-3.2-3b-instruct | deepinfra | 6.7, 6.7, 13.8, 27.8, 32.8, 5.2, 8.3, 48.9
llama-3.3-70b-instruct | deepinfra | 0.0, 19.3, 15.7, 24.1, 50.5, 4.4, 88.4, 19.8, 48.3
llama-4-maverick | deepinfra | 39.0, 36.4, 41.7, 63.9, 68.5, 4.8, 39.7, 88.9
llama-4-scout | deepinfra | 28.3, 23.5, 33.1, 56.4, 57.9, 4.3, 29.9, 84.4
mistral-7b-instruct | deepinfra | 0.0, 3.5, 1.0, 6.1, 17.7, 4.3, 4.6, 12.1
mistral-small-24b-instruct-2501 | deepinfra | 45.3, 84.8
mixtral-8x7b-instruct | deepinfra | 0.0, 4.7, 4.8, 15.0, 29.2, 4.5, 6.6, 29.9
phi-4 | deepinfra | 14.3, 24.6, 27.9, 47.7, 75.5, 56.8, 4.1, 82.6, 23.1, 81.0
phi-4-multimodal-instruct | deepinfra | 9.3, 12.1, 15.1, 39.3, 31.5, 4.4, 13.1, 69.3
phi-4-reasoning-plus | deepinfra | 68.9
qwen-2.5-72b-instruct | deepinfra | 16.0, 27.2, 31.4, 50.9, 49.0, 4.2, 86.6, 27.6, 85.8
qwen-2.5-7b-instruct | deepinfra | 36.4, 84.8
qwen3-14b | deepinfra | 28.0, 27.3, 31.9, 57.6, 47.0, 4.2, 28.0, 87.1
qwen3-235b-a22b | deepinfra | 32.7, 32.1, 33.3, 61.4, 61.3, 4.7, 34.3, 90.2
qwen3-235b-a22b-2507 | deepinfra | 32.7, 32.1, 33.3, 61.4, 61.3, 4.7, 34.3, 90.2
qwen3-235b-a22b-thinking-2507 | deepinfra | 32.7, 32.1, 33.3, 61.4, 61.3, 4.7, 34.3, 90.2
qwen3-30b-a3b | deepinfra | 26.0, 29.3, 29.9, 56.1, 65.8, 4.6, 32.2, 86.3
qwen3-32b | deepinfra | 30.3, 28.4, 29.8, 58.6, 53.5, 4.3, 28.8, 86.9
qwen3-coder | deepinfra | 47.7, 47.2, 45.2, 70.9, 61.8, 4.4, 58.5, 94.2
qwq-32b | deepinfra | 78.0, 49.4, 41.9, 86.9, 65.2, 8.2, 63.1, 95.7
deepseek-r1 | fireworksai | 89.3, 58.7, 58.6, 93.8, 92.2, 81.3, 14.9, 77.0, 98.3
deepseek-r1-05-28 | fireworksai | 89.3, 58.7, 58.6, 93.8, 92.2, 81.3, 14.9, 77.0, 98.3
deepseek-v3 | fireworksai | 25.3, 35.6, 36.9, 57.0, 91.6, 57.4, 3.6, 35.9, 88.7
deepseek-v3-03-24 | fireworksai | 25.3, 35.6, 36.9, 57.0, 91.6, 57.4, 3.6, 35.9, 88.7
gpt-oss-120b | fireworksai | 50.1, 61.3, 79.2, 18.5, 63.9
gpt-oss-20b | fireworksai | 53.7, 49.0, 71.5, 8.5, 72.1
kimi-k2-instruct | fireworksai | 75.1, 93.3
mixtral-8x22b-instruct | fireworksai | 0.0, 16.8, 14.4, 27.2, 33.2, 4.1, 14.8, 54.5
qwen3-235b-a22b | fireworksai | 32.7, 32.1, 33.3, 61.4, 61.3, 4.7, 34.3, 90.2
qwen3-coder-480b-a35b-instruct | fireworksai | 47.7, 47.2, 45.2, 70.9, 61.8, 4.4, 58.5, 94.2
gemini-1.5-flash-8b | gemini | 3.3, 22.3, 19.2, 36.1, 37.1, 4.5, 21.7, 68.9
gemini-2.0-flash | gemini | 33.0, 33.4, 37.8, 63.0, 62.2, 5.3, 33.4, 93.0
gemini-2.0-flash-lite | gemini | 51.5
gemini-2.5-flash | gemini | 50.0, 39.3, 47.3, 71.6, 82.8, 5.1, 49.5, 93.2
gemini-2.5-pro | gemini | 88.7, 61.5, 64.6, 92.7, 83.7, 21.1, 80.1, 96.7
gemini-2.5-pro-preview | gemini | 86.4
chatgpt-4o-latest | openai | 32.7, 39.6, 39.5, 61.0, 65.5, 5.0, 42.5, 89.3
gpt-3.5-turbo | openai | 10.8, 70.2, 30.3, 68.0, 44.1
gpt-4.1 | openai | 43.7, 41.9, 46.8, 67.5, 66.5, 4.6, 45.7, 91.3
gpt-4.1-mini | openai | 65.0
gpt-4.1-nano | openai | 50.3
gpt-4o | openai | 15.0, 32.1, 29.6, 45.5, 54.3, 3.3, 30.9, 75.9
gpt-4o-2024-05-13 | openai | 15.0, 32.1, 29.6, 45.5, 83.4, 54.0, 3.3, 90.2, 30.9, 75.9
gpt-4o-mini | openai | 11.7, 23.1, 24.3, 45.3, 42.6, 4.0, 23.4, 78.9
gpt-4o-mini-search-preview | openai | 79.7, 40.2, 87.2
gpt-5 | openai | 95.7, 54.9, 69.0, 97.5, 85.4, 26.5, 66.8, 99.4
gpt-5-mini | openai | 51.4, 65.4, 82.8, 19.7, 63.6
gpt-5-nano | openai | 45.6, 54.7, 67.6, 8.2, 54.6
o1 | openai | 72.3, 51.9, 51.7, 84.7, 76.4, 7.7, 88.1, 67.9, 97.0
o1-mini | openai | 60.3, 44.9, 43.3, 77.4, 60.1, 4.9, 92.4, 57.6, 94.4
o1-pro | openai | 45.6, 79.0
o3 | openai | 90.3, 59.7, 67.1, 94.8, 83.0, 20.0, 78.4, 99.2
o3-mini | openai | 77.0, 55.8, 52.7, 87.2, 76.0, 8.7, 71.7, 97.3
o3-pro | openai | 67.5, 84.5
o4-mini | openai | 94.0, 63.5, 65.0, 96.4, 79.9, 17.5, 80.4, 98.9
command | openrouter | 0.7, 12.0, 7.8, 14.3, 32.3, 4.5, 12.2, 27.9
command-a | openrouter | 9.7, 28.4, 31.6, 45.8, 52.7, 4.6, 28.7, 81.9
command-r | openrouter | 0.3, 6.6, 2.5, 7.6, 28.9, 5.1, 4.4, 14.9
command-r-03-2024 | openrouter | 0.7, 5.5, 2.4, 8.5, 28.4, 4.8, 4.8, 16.4
command-r-plus | openrouter | 0.0, 11.6, 9.5, 20.1, 33.7, 5.0, 11.1, 40.2
command-r-plus-04-2024 | openrouter | 0.7, 12.0, 7.8, 14.3, 32.3, 4.5, 12.2, 27.9
deephermes-3-mistral-24b-preview | openrouter | 4.7, 21.1, 18.4, 32.1, 38.2, 3.9, 19.5, 59.5
deepseek-r1-distill-llama-70b | openrouter | 67.0, 28.9, 34.5, 80.2, 65.2, 6.1, 26.6, 93.5
deepseek-r1-distill-llama-8b | openrouter | 33.3, 17.6, 22.8, 59.3, 49.0, 4.2, 23.3, 85.3
deepseek-r1-distill-qwen-14b | openrouter | 66.7, 30.7, 33.3, 80.8, 59.1, 4.4, 37.6, 94.9
deepseek-r1-distill-qwen-32b | openrouter | 68.7, 32.3, 36.4, 81.4, 61.8, 5.5, 27.0, 94.1
devstral-medium | openrouter | 6.7, 31.5, 31.3, 38.7, 49.2, 3.8, 33.7, 70.7
devstral-small | openrouter | 0.3, 24.9, 20.5, 31.9, 41.4, 3.7, 25.4, 63.5
devstral-small-2505 | openrouter | 6.7, 25.2, 22.7, 37.5, 43.4, 4.0, 25.8, 68.4
gemini-2.5-flash-lite | openrouter | 50.0, 28.9, 34.9, 71.3, 64.6, 3.7, 40.0, 92.6
gemma-2-9b-it | openrouter | 0.0, 6.6, 10.2, 25.9, 31.1, 3.9, 40.2, 12.6, 51.7
gemma-3-12b-it | openrouter | 22.0, 15.5, 24.0, 53.7, 40.9, 4.8, 85.4, 13.7, 85.3
gemma-3-4b-it | openrouter | 6.3, 9.3, 17.6, 41.5, 29.9, 5.2, 71.3, 11.2, 76.6
gemma-3n-e4b-it | openrouter | 23.7, 75.0
grok-2-1212 | openrouter | 13.3, 27.6, 28.0, 45.6, 51.0, 3.8, 26.7, 77.8
grok-3 | openrouter | 33.0, 39.7, 39.9, 60.0, 84.6, 5.1, 42.5, 87.0
grok-3-mini | openrouter | 84.0
llama-3.1-405b-instruct | openrouter | 21.3, 30.2, 28.9, 45.8, 84.8, 51.1, 4.2, 89.0, 30.5, 70.3
llama-3.1-70b-instruct | openrouter | 17.3, 25.0, 26.0, 41.1, 79.6, 41.3, 4.6, 80.5, 23.2, 64.9
llama-3.1-8b-instruct | openrouter | 0.0, 10.8, 9.5, 25.0, 59.5, 30.0, 5.1, 72.6, 9.6, 49.9
llama-3.1-nemotron-70b-instruct | openrouter | 24.7, 20.1, 26.1, 49.0, 46.5, 4.6, 16.9, 73.3
llama-3.1-nemotron-ultra-253b-v1 | openrouter | 74.7, 49.4, 46.4, 84.9, 74.4, 8.1, 64.1, 95.2
llama-3.2-3b-instruct | openrouter | 6.7, 6.7, 13.8, 27.8, 32.8, 5.2, 8.3, 48.9
llama-3.3-nemotron-super-49b-v1 | openrouter | 19.3, 25.5, 29.3, 48.4, 66.7, 3.5, 28.0, 77.5
llama-4-scout | openrouter | 28.3, 23.5, 33.1, 56.4, 57.9, 4.3, 29.9, 84.4
magistral-small-2506 | openrouter | 68.2
minimax-m1 | openrouter | 81.3, 51.8, 45.8, 89.3, 68.2, 7.5, 65.7, 97.2
ministral-3b | openrouter | 0.0, 8.1, 7.5, 26.8, 26.0, 5.5, 6.9, 53.7
mistral-7b-instruct | openrouter | 0.0, 3.5, 1.0, 6.1, 17.7, 4.3, 4.6, 12.1
mistral-large | openrouter | 0.0, 19.3, 14.6, 26.3, 35.1, 3.4, 17.8, 52.7
mistral-large-2407 | openrouter | 9.3, 26.9, 25.5, 40.4, 47.2, 3.2, 26.7, 71.4
mistral-medium-3 | openrouter | 44.0, 36.5, 38.6, 67.3, 57.8, 4.3, 40.0, 90.7
mistral-saba | openrouter | 13.0, 22.6, 40.3, 42.4, 4.1, 67.7
mistral-small | openrouter | 6.3, 14.8, 15.7, 31.3, 38.1, 4.3, 14.1, 56.3
mistral-small-24b-instruct-2501 | openrouter | 45.3, 84.8
mixtral-8x7b-instruct | openrouter | 0.0, 4.7, 4.8, 15.0, 29.2, 4.5, 6.6, 29.9
nova-lite-v1 | openrouter | 10.7, 15.3, 24.5, 43.6, 80.2, 42.6, 4.6, 85.4, 16.7, 76.5
nova-micro-v1 | openrouter | 8.0, 11.7, 20.2, 39.2, 79.3, 37.9, 4.7, 81.1, 14.0, 70.3
nova-pro-v1 | openrouter | 10.7, 22.1, 28.8, 44.6, 85.4, 48.4, 3.4, 89.0, 23.3, 78.6
phi-4-multimodal-instruct | openrouter | 9.3, 12.1, 15.1, 39.3, 31.5, 4.4, 13.1, 69.3
phi-4-reasoning-plus | openrouter | 68.9
qwen-2.5-72b-instruct | openrouter | 16.0, 27.2, 31.4, 50.9, 49.0, 4.2, 86.6, 27.6, 85.8
qwen-2.5-7b-instruct | openrouter | 36.4, 84.8
qwen-2.5-coder-32b-instruct | openrouter | 12.0, 28.3, 25.0, 44.3, 41.7, 3.8, 92.7, 29.5, 76.7
qwen-turbo | openrouter | 12.0, 15.8, 22.1, 46.3, 41.0, 4.2, 16.3, 80.5
qwen3-30b-a3b-instruct-2507 | openrouter | 26.0, 29.3, 29.9, 56.1, 65.8, 4.6, 32.2, 86.3
qwen3-8b | openrouter | 24.3, 18.5, 25.3, 53.6, 45.2, 2.8, 20.2, 82.8
qwq-32b | openrouter | 78.0, 49.4, 41.9, 86.9, 65.2, 8.2, 63.1, 95.7
qwq-32b-preview | openrouter | 45.3, 18.7, 31.5, 68.2, 65.2, 4.8, 33.7, 91.0
r1-1776 | openrouter | 22.2, 95.4
sonar | openrouter | 48.7, 26.2, 32.4, 65.2, 47.1, 7.3, 29.5, 81.7
sonar-pro | openrouter | 29.0, 25.0, 31.7, 51.7, 57.8, 7.9, 27.5, 74.5
sonar-reasoning | openrouter | 77.0, 38.0, 84.6, 62.3, 92.1
sonar-reasoning-pro | openrouter | 79.0, 50.7, 87.4, 95.7
deepseek-r1-0528 | parasail | 81.0
deepseek-r1-0528-qwen3-8b | parasail | 89.3, 58.7, 58.6, 93.8, 92.2, 81.3, 14.9, 77.0, 98.3
gemma-3-27b-it | parasail | 25.3, 17.4, 25.2, 56.8, 42.6, 4.7, 87.8, 13.7, 88.3
glm-4.5 | parasail | 87.3, 54.3, 56.1, 92.6, 78.2, 12.2, 73.8, 97.9
gpt-oss-120b | parasail | 50.1, 61.3, 79.2, 18.5, 63.9
kimi-k2 | parasail | 69.3, 36.5, 48.6, 83.2, 76.6, 7.0, 55.6, 97.1
llama-3.3-70b-instruct | parasail | 0.0, 19.3, 15.7, 24.1, 50.5, 4.4, 88.4, 19.8, 48.3
llama-4-maverick | parasail | 39.0, 36.4, 41.7, 63.9, 68.5, 4.8, 39.7, 88.9
qwen3-14b | parasail | 28.0, 27.3, 31.9, 57.6, 47.0, 4.2, 28.0, 87.1
qwen3-235b-a22b-2507 | parasail | 32.7, 32.1, 33.3, 61.4, 61.3, 4.7, 34.3, 90.2
qwen3-235b-a22b-thinking-2507 | parasail | 32.7, 32.1, 33.3, 61.4, 61.3, 4.7, 34.3, 90.2
qwen3-30b-a3b | parasail | 26.0, 29.3, 29.9, 56.1, 65.8, 4.6, 32.2, 86.3
qwen3-32b | parasail | 30.3, 28.4, 29.8, 58.6, 53.5, 4.3, 28.8, 86.9
qwen3-coder | parasail | 47.7, 47.2, 45.2, 70.9, 61.8, 4.4, 58.5, 94.2
DeepSeek-R1-Distill-Llama-70B | togetherai | 67.0, 28.9, 34.5, 80.2, 65.2, 6.1, 26.6, 93.5
DeepSeek-R1-Distill-Qwen-1.5B | togetherai | 17.7, 6.8, 11.1, 43.2, 33.8, 3.3, 7.0, 68.7
DeepSeek-R1-Distill-Qwen-14B | togetherai | 66.7, 30.7, 33.3, 80.8, 59.1, 4.4, 37.6, 94.9
DeepSeek-V3 | togetherai | 25.3, 35.6, 36.9, 57.0, 91.6, 57.4, 3.6, 35.9, 88.7
QwQ-32B-Preview | togetherai | 45.3, 18.7, 31.5, 68.2, 65.2, 4.8, 33.7, 91.0
Qwen2-72B-Instruct | togetherai | 14.7, 19.4, 21.1, 42.4, 42.4, 3.7, 86.0, 15.9, 70.1
gemma-2-27b-it | togetherai | 29.7, 20.2, 20.1, 41.9, 35.7, 3.7, 51.8, 27.9, 54.1
grok-2 | xai | 56.0, 88.4
grok-3-mini-beta | xai | 84.0
grok-4 | xai | 94.3, 63.8, 67.5, 96.7, 87.6, 23.9, 81.9, 99.0
glm-4.5 | zai | 87.3, 54.3, 56.1, 92.6, 78.2, 12.2, 73.8, 97.9

Showing 164 models with benchmark data.
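Ranking models on a single benchmark is a one-liner once the rows are structured. A minimal sketch, using GPQA scores copied from three fully scored rows above (the dict layout is illustrative; models without a score for the chosen benchmark sort last):

```python
# Three fully scored rows from the table, keyed by benchmark name.
rows = [
    ("llama-3.1-405b-instruct", "deepinfra", {"GPQA": 51.1, "MATH-500": 70.3}),
    ("claude-3-5-sonnet-20240620", "anthropic", {"GPQA": 59.7, "MATH-500": 77.1}),
    ("phi-4", "deepinfra", {"GPQA": 56.8, "MATH-500": 81.0}),
]

# Sort descending by GPQA; a -inf default pushes unscored models to the end.
ranked = sorted(rows, key=lambda r: r[2].get("GPQA", float("-inf")), reverse=True)
print([name for name, _, _ in ranked])
# -> ['claude-3-5-sonnet-20240620', 'phi-4', 'llama-3.1-405b-instruct']
```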