Benchmark Comparison

Compare AI model performance across standardized tests

Individual Benchmark Results

This table shows performance on specific standardized tests (MMLU, HumanEval, GPQA, etc.) rather than category aggregations. Scores represent pass rates or accuracy percentages on each benchmark.
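As a rough illustration of what a pass rate or accuracy percentage means here, the sketch below turns a list of per-item outcomes into a score rounded to one decimal place, the precision the table uses. The function name `score_percent` and the sample outcomes are hypothetical; the actual scoring pipeline behind these numbers is not described on this page.

```python
from typing import Sequence


def score_percent(outcomes: Sequence[bool]) -> float:
    """Return the share of passed items as a percentage, rounded to one decimal.

    `outcomes` holds one boolean per benchmark item (True = correct/passed).
    This is an assumed, simplified scoring rule for illustration only.
    """
    if not outcomes:
        raise ValueError("no outcomes to score")
    passed = sum(1 for ok in outcomes if ok)
    return round(100.0 * passed / len(outcomes), 1)


# Hypothetical run: 3 of 4 items answered correctly.
example_outcomes = [True, True, False, True]
print(score_percent(example_outcomes))  # 75.0
```

Under this reading, a score of 75.0 means three out of every four benchmark items were answered correctly.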

178 models

Only the first 10 benchmarks are displayed.

Score columns, in order: AIME, AA Coding Index, AAII, AA Math Index, DROP, GPQA, HLE, HumanEval, LiveCodeBench, MATH-500. Each row lists its scores in that order; a benchmark with no result for a model has no entry, so a row with fewer than ten scores does not map to the columns by position.

Model | Provider | Scores
claude-3-5-sonnet-20240620 | anthropic | 15.7, 30.2, 29.9, 87.1, 59.7, 3.9, 92.0, 38.1, 77.1
claude-3-haiku-20240307 | anthropic | 1.0, 9.6, 78.4, 33.3, 75.9, 15.4, 39.4
claude-3-opus-20240229 | anthropic | 3.3, 19.5, 20.6, 83.1, 49.6, 3.1, 84.9, 27.9, 64.1
claude-3.5-haiku | anthropic | 3.3, 20.2, 83.1, 41.2, 3.5, 88.1, 31.4, 72.1
claude-3.7-sonnet | anthropic | 48.7, 35.8, 49.9, 56.3, 77.2, 10.3, 47.3, 94.7
claude-haiku-4.5 | anthropic | 37.0, 41.7, 39.0, 73.0, 4.3, 51.1
claude-opus-4 | anthropic | 75.7, 44.2, 54.2, 73.3, 79.6, 11.7, 63.6, 98.2
claude-opus-4.1 | anthropic | 46.1, 59.3, 80.3, 80.9, 11.9, 65.4
claude-opus-4.5 | anthropic | 60.2, 69.8, 91.3, 86.6, 28.4, 87.1
claude-sonnet-4 | anthropic | 77.3, 45.1, 56.5, 74.3, 77.7, 9.6, 65.5, 99.1
claude-sonnet-4.5 | anthropic | 49.8, 62.7, 88.0, 83.4, 17.3, 71.4
llama3-1-70b-instruct-v1.0 | bedrock | 17.3, 17.6, 22.6, 4.0, 79.6, 41.3, 4.6, 80.5, 23.2, 64.9
llama3-1-8b-instruct-v1.0 | bedrock | 7.7, 8.5, 16.9, 4.3, 59.5, 28.1, 5.1, 72.6, 11.6, 51.9
llama3-2-3b-instruct-v1.0 | bedrock | 6.7, 11.2, 3.3, 32.8, 5.2, 8.3, 48.9
mistral-7b-instruct-v0.2 | bedrock | 0.0, 1.0, 17.7, 4.3, 4.6, 12.1
DeepSeek-R1 | deepinfra | 89.3, 44.1, 52.0, 76.0, 81.3, 14.9, 77.0, 98.3
DeepSeek-R1-Distill-Llama-70B | deepinfra | 67.0, 19.7, 29.9, 53.7, 65.2, 6.1, 26.6, 93.5
DeepSeek-V3 | deepinfra | 25.3, 25.9, 32.5, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
deepseek-chat-v3-0324 | deepinfra | 39.0, 44.8, 49.7, 73.5, 6.3, 57.7
deepseek-chat-v3.1 | deepinfra | 39.0, 44.8, 49.7, 73.5, 6.3, 57.7
deepseek-r1-0528 | deepinfra | 81.0
devstral-small | deepinfra | 0.3, 18.5, 27.2, 29.3, 41.4, 3.7, 25.4, 63.5
devstral-small-2505 | deepinfra | 6.7, 19.6, 43.4, 4.0, 25.8, 68.4
gemma-3-12b-it | deepinfra | 22.0, 10.6, 20.4, 18.3, 40.9, 4.8, 85.4, 13.7, 85.3
gemma-3-27b-it | deepinfra | 25.3, 12.8, 22.1, 20.7, 42.6, 4.7, 87.8, 13.7, 88.3
gemma-3-4b-it | deepinfra | 6.3, 6.4, 14.7, 12.7, 29.9, 5.2, 71.3, 11.2, 76.6
gpt-oss-120b | deepinfra | 49.6, 60.5, 93.4, 79.2, 18.5, 87.8
gpt-oss-20b | deepinfra | 40.7, 52.4, 89.3, 70.2, 9.8, 77.7
hermes-3-llama-3.1-70b | deepinfra | 2.3, 14.7, 40.1, 4.1, 18.8, 53.8
llama-3.1-405b-instruct | deepinfra | 21.3, 22.2, 28.1, 3.0, 84.8, 51.1, 4.2, 89.0, 30.5, 70.3
llama-3.1-70b-instruct | deepinfra | 17.3, 17.6, 22.6, 4.0, 79.6, 41.3, 4.6, 80.5, 23.2, 64.9
llama-3.1-8b-instruct | deepinfra | 7.7, 8.5, 16.9, 4.3, 59.5, 28.1, 5.1, 72.6, 11.6, 51.9
llama-3.1-nemotron-70b-instruct | deepinfra | 24.7, 14.8, 23.6, 11.0, 46.5, 4.6, 16.9, 73.3
llama-3.2-3b-instruct | deepinfra | 6.7, 11.2, 3.3, 32.8, 5.2, 8.3, 48.9
llama-3.3-70b-instruct | deepinfra | 30.0, 19.2, 27.9, 7.7, 50.2, 4.0, 88.4, 28.8, 77.3
llama-4-maverick | deepinfra | 39.0, 26.4, 35.8, 19.3, 68.5, 4.8, 39.7, 88.9
llama-4-scout | deepinfra | 28.3, 16.1, 28.1, 14.0, 57.9, 4.3, 29.9, 84.4
mistral-7b-instruct | deepinfra | 0.0, 1.0, 17.7, 4.3, 4.6, 12.1
mistral-nemo | deepinfra | 0.3, 5.2, 31.4, 4.4, 5.7, 39.5
mistral-small-24b-instruct-2501 | deepinfra | 45.3, 84.8
mistral-small-3.1-24b-instruct | deepinfra | 6.3, 13.0, 38.1, 4.3, 14.1, 56.3
mistral-small-3.2-24b-instruct | deepinfra | 6.3, 13.0, 38.1, 4.3, 14.1, 56.3
mixtral-8x7b-instruct | deepinfra | 0.0, 2.6, 29.2, 4.5, 6.6, 29.9
nemotron-nano-9b-v2 | deepinfra | 31.9, 37.2, 69.7, 57.0, 4.6, 72.4
phi-4 | deepinfra | 65.8
phi-4-multimodal-instruct | deepinfra | 9.3, 12.4, 31.5, 4.4, 13.1, 69.3
phi-4-reasoning-plus | deepinfra | 68.9
qwen-2.5-72b-instruct | deepinfra | 16.0, 19.5, 29.0, 14.0, 49.0, 4.2, 86.6, 27.6, 85.8
qwen-2.5-7b-instruct | deepinfra | 36.4, 84.8
qwen3-14b | deepinfra | 28.0, 19.8, 29.2, 58.0, 47.0, 4.2, 28.0, 87.1
qwen3-235b-a22b | deepinfra | 32.7, 23.3, 29.9, 23.7, 61.3, 4.7, 34.3, 90.2
qwen3-235b-a22b-2507 | deepinfra | 94.0, 44.6, 57.5, 91.0, 79.0, 15.0, 78.8, 98.4
qwen3-235b-a22b-thinking-2507 | deepinfra | 81.1
qwen3-30b-a3b | deepinfra | 72.7, 29.2, 37.0, 66.3, 65.9, 6.8, 51.5, 97.5
qwen3-32b | deepinfra | 80.7, 30.9, 38.7, 73.0, 66.8, 8.3, 54.6, 96.1
qwq-32b | deepinfra | 78.0, 37.9, 29.0, 65.2, 8.2, 63.1, 95.7
deepseek-chat | deepseek | 39.0, 44.8, 49.7, 73.5, 6.3, 57.7
deepseek-reasoner | deepseek | 47.2, 54.0, 89.7, 77.9, 13.0, 78.4
deepseek-r1-05-28 | fireworksai | 89.3, 44.1, 52.0, 76.0, 81.3, 14.9, 77.0, 98.3
deepseek-v3-03-24 | fireworksai | 25.3, 25.9, 32.5, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
kimi-k2-instruct | fireworksai | 75.1, 93.3
qwen3-coder-480b-a35b-instruct | fireworksai | 47.7, 37.4, 42.3, 39.3, 61.8, 4.4, 58.5, 94.2
gemini-2.0-flash | gemini | 33.0, 23.4, 33.6, 21.7, 62.2, 5.3, 33.4, 93.0
gemini-2.0-flash-lite | gemini | 51.5
gemini-2.5-flash | gemini | 50.0, 30.0, 40.4, 60.3, 82.8, 5.1, 49.5, 93.2
gemini-2.5-flash-preview | gemini | 50.0, 30.0, 40.4, 60.3, 82.8, 5.1, 49.5, 93.2
gemini-2.5-pro | gemini | 88.7, 49.3, 59.6, 87.7, 83.7, 21.1, 80.1, 96.7
gemini-2.5-pro-preview | gemini | 88.7, 49.3, 59.6, 87.7, 83.7, 21.1, 80.1, 96.7
gemini-3-pro-preview | gemini | 62.3, 72.8, 95.7, 91.3, 37.2, 91.7
gemma-2-9b-it | groq | 0.0, 7.8, 31.1, 3.9, 40.2, 12.6, 51.7
kimi-k2-0905 | groq | 38.1, 50.4, 57.3, 76.3, 6.3, 94.5, 61.0
codestral-2501 | mistralai | 4.3, 16.3, 20.1, 6.0, 31.2, 4.5, 24.3, 60.7
codestral-2508 | mistralai | 4.3, 16.3, 20.1, 6.0, 31.2, 4.5, 24.3, 60.7
devstral-medium | mistralai | 6.7, 23.9, 27.9, 4.7, 49.2, 3.8, 33.7, 70.7
ministral-3b | mistralai | 0.0, 5.4, 10.9, 0.3, 26.0, 5.5, 6.9, 53.7
mistral-large | mistralai | 0.0, 11.9, 35.1, 3.4, 17.8, 52.7
mistral-large-2.1 | mistralai | 0.0, 11.9, 35.1, 3.4, 17.8, 52.7
mistral-large-2407 | mistralai | 9.3, 22.3, 0.0, 47.2, 3.2, 26.7, 71.4
mistral-medium-3 | mistralai | 44.0, 25.6, 33.6, 30.3, 57.8, 4.3, 40.0, 90.7
mistral-medium-3.1 | mistralai | 28.1, 35.4, 38.3, 58.8, 4.4, 40.6
mistral-saba | mistralai | 13.0, 19.6, 42.4, 4.1, 67.7
mistral-small | mistralai | 6.3, 13.0, 38.1, 4.3, 14.1, 56.3
mistral-small-3 | mistralai | 6.3, 13.0, 38.1, 4.3, 14.1, 56.3
mistral-small-3.1 | mistralai | 6.3, 13.0, 38.1, 4.3, 14.1, 56.3
mistral-small-3.2 | mistralai | 6.3, 13.0, 38.1, 4.3, 14.1, 56.3
chatgpt-4o-latest | openai | 32.7, 35.6, 25.7, 65.5, 5.0, 42.5, 89.3
gpt-3.5-turbo | openai | 10.7, 8.3, 70.2, 30.3, 68.0, 44.1
gpt-3.5-turbo-0613 | openai | (no scores listed)
gpt-3.5-turbo-instruct | openai | 10.7, 8.3, 70.2, 30.3, 68.0, 44.1
gpt-4.1 | openai | 43.7, 32.2, 43.4, 34.7, 66.5, 4.6, 45.7, 91.3
gpt-4.1-mini | openai | 43.0, 31.9, 42.5, 46.3, 66.4, 4.6, 48.3, 92.5
gpt-4.1-nano | openai | 23.7, 20.7, 27.3, 24.0, 51.2, 3.9, 32.6, 84.8
gpt-4o | openai | 15.0, 24.0, 27.0, 6.0, 54.3, 3.3, 30.9, 75.9
gpt-4o-2024-05-13 | openai | 15.0, 24.0, 27.0, 6.0, 83.4, 54.0, 3.3, 90.2, 30.9, 75.9
gpt-4o-mini | openai | 11.7, 21.2, 14.7, 42.6, 4.0, 23.4, 78.9
gpt-4o-mini-search-preview | openai | 79.7, 40.2, 87.2
gpt-5 | openai | 95.7, 52.7, 68.5, 94.3, 85.4, 26.5, 84.6, 99.4
gpt-5-chat | openai | 34.7, 41.8, 48.3, 68.6, 5.8, 54.3
gpt-5-mini | openai | 51.4, 64.3, 90.7, 82.8, 19.7, 83.8
gpt-5-nano | openai | 42.3, 51.0, 83.7, 67.6, 8.2, 78.9
gpt-5.1 | openai | 57.5, 69.7, 94.0, 87.3, 26.5, 86.8
o1 | openai | 72.3, 38.6, 47.2, 76.4, 7.7, 88.1, 67.9, 97.0
o1-mini | openai | 60.3, 39.2, 60.1, 4.9, 92.4, 57.6, 94.4
o3 | openai | 90.3, 52.2, 65.5, 88.3, 83.0, 20.0, 80.8, 99.2
o3-mini | openai | 77.0, 39.4, 48.1, 76.0, 8.7, 71.7, 97.3
o4-mini | openai | 94.0, 48.9, 59.6, 90.7, 79.9, 17.5, 85.9, 98.9
command-a | openrouter | 9.7, 19.2, 26.9, 13.0, 52.7, 4.6, 28.7, 81.9
deephermes-3-mistral-24b-preview | openrouter | 4.7, 15.5, 38.2, 3.9, 19.5, 59.5
deepseek-r1 | openrouter | 89.3, 44.1, 52.0, 76.0, 81.3, 14.9, 77.0, 98.3
deepseek-r1-distill-llama-70b | openrouter | 67.0, 19.7, 29.9, 53.7, 65.2, 6.1, 26.6, 93.5
deepseek-r1-distill-qwen-14b | openrouter | 66.7, 29.7, 55.7, 59.1, 4.4, 37.6, 94.9
deepseek-r1-distill-qwen-32b | openrouter | 68.7, 32.7, 63.0, 61.8, 5.5, 27.0, 94.1
deepseek-v3.1-terminus | openrouter | 49.6, 57.7, 89.7, 79.2, 15.2, 79.8
deepseek-v3.1-terminus:exacto | openrouter | 25.3, 25.9, 32.5, 26.0, 91.6, 57.4, 3.6, 35.9, 88.7
deepseek-v3.2-exp | openrouter | 39.6, 46.3, 57.7, 79.9, 8.6, 55.4
ernie-4.5-300b-a47b | openrouter | 49.3, 27.9, 32.9, 41.3, 81.1, 3.5, 46.7, 93.1
gemini-2.0-flash-001 | openrouter | 33.0, 23.4, 33.6, 21.7, 62.2, 5.3, 33.4, 93.0
gemini-2.0-flash-lite-001 | openrouter | 27.7, 26.8, 53.5, 3.6, 18.5, 87.3
gemini-2.5-flash-image | openrouter | 50.0, 30.0, 40.4, 60.3, 82.8, 5.1, 49.5, 93.2
gemini-2.5-flash-image-preview | openrouter | 50.0, 30.0, 40.4, 60.3, 82.8, 5.1, 49.5, 93.2
gemini-2.5-flash-lite | openrouter | 50.0, 19.9, 30.1, 35.3, 64.6, 3.7, 40.0, 92.6
gemini-2.5-flash-lite-preview-06-17 | openrouter | 50.0, 30.0, 40.4, 60.3, 82.8, 5.1, 49.5, 93.2
gemini-2.5-flash-lite-preview-09-2025 | openrouter | 36.5, 47.9, 68.7, 70.9, 6.6, 68.8
gemini-2.5-flash-preview-09-2025 | openrouter | 42.5, 54.4, 78.3, 79.3, 12.7, 71.3
gemini-2.5-pro-preview-05-06 | openrouter | 88.7, 49.3, 59.6, 87.7, 83.7, 21.1, 80.1, 96.7
gemma-3n-e4b-it | openrouter | 23.7, 75.0
gpt-oss-120b:exacto | openrouter | 49.6, 60.5, 93.4, 79.2, 18.5, 87.8
grok-3-mini-beta | openrouter | 93.3, 42.2, 57.1, 84.7, 79.1, 11.1, 69.6, 99.2
grok-4-fast | openrouter | 48.4, 60.3, 89.7, 84.7, 17.0, 83.2
kimi-k2-thinking | openrouter | 52.2, 67.0, 94.7, 83.8, 22.3, 85.3
kimi-linear-48b-a3b-instruct | openrouter | 22.8, 36.3, 41.2, 2.7, 37.8
lfm2-8b-a1b | openrouter | 7.3, 17.4, 25.3, 34.4, 4.9, 15.1
ling-1t | openrouter | 37.6, 44.8, 71.3, 71.9, 7.2, 67.7
llama-3.1-405b | openrouter | 21.3, 22.2, 28.1, 3.0, 84.8, 51.1, 4.2, 89.0, 30.5, 70.3
llama-3.1-nemotron-ultra-253b-v1 | openrouter | 74.7, 33.7, 38.5, 63.7, 74.4, 8.1, 64.1, 95.2
llama-3.3-nemotron-super-49b-v1.5 | openrouter | 19.3, 17.0, 25.9, 7.7, 66.7, 3.5, 28.0, 77.5
minimax-m1 | openrouter | 81.3, 35.2, 40.0, 13.7, 68.7, 7.5, 65.7, 97.2
minimax-m2 | openrouter | 47.6, 61.4, 78.3, 77.8, 12.5, 82.6
nova-lite-v1 | openrouter | 10.7, 10.4, 21.4, 7.0, 80.2, 42.6, 4.6, 85.4, 16.7, 76.5
nova-micro-v1 | openrouter | 8.0, 8.3, 17.7, 6.0, 79.3, 37.9, 4.7, 81.1, 14.0, 70.3
nova-premier-v1 | openrouter | 17.0, 22.0, 32.3, 17.3, 56.9, 4.7, 31.7, 83.9
nova-pro-v1 | openrouter | 10.7, 16.6, 25.0, 7.0, 85.4, 48.4, 3.4, 89.0, 23.3, 78.6
qwen-2.5-coder-32b-instruct | openrouter | 12.0, 21.8, 41.7, 3.8, 92.7, 29.5, 76.7
qwen-turbo | openrouter | 12.0, 19.1, 41.0, 4.2, 16.3, 80.5
qwen3-8b | openrouter | 24.3, 13.0, 22.9, 24.3, 45.2, 2.8, 20.2, 82.8
qwen3-max | openrouter | 36.2, 55.8, 82.3, 77.6, 12.0, 53.5
qwen3-next-80b-a3b-instruct | openrouter | 35.4, 44.8, 66.3, 73.4, 7.3, 68.4
qwen3-next-80b-a3b-thinking | openrouter | 77.2
qwen3-vl-235b-a22b-instruct | openrouter | 33.9, 44.1, 70.7, 71.2, 6.3, 59.4
qwen3-vl-235b-a22b-thinking | openrouter | (no scores listed)
qwen3-vl-30b-a3b-instruct | openrouter | 29.7, 27.4, 33.4, 29.0, 70.4, 4.0, 40.3, 89.3
qwen3-vl-30b-a3b-thinking | openrouter | 74.4
qwen3-vl-8b-instruct | openrouter | 17.6, 27.1, 27.3, 42.7, 2.9, 33.2
qwen3-vl-8b-thinking | openrouter | 69.9
ring-1t | openrouter | 35.8, 41.8, 89.3, 59.5, 10.2, 64.3
sonar | openrouter | 48.7, 28.8, 47.1, 7.3, 29.5, 81.7
sonar-pro | openrouter | 29.0, 28.2, 57.8, 7.9, 27.5, 74.5
sonar-pro-search | openrouter | 29.0, 28.2, 57.8, 7.9, 27.5, 74.5
sonar-reasoning | openrouter | 77.0, 34.2, 62.3, 92.1
sonar-reasoning-pro | openrouter | 79.0, 46.3, 95.7
deepseek-r1-0528-qwen3-8b | parasail | 89.3, 44.1, 52.0, 76.0, 81.3, 14.9, 77.0, 98.3
glm-4.5 | parasail | 87.3, 43.3, 51.3, 73.7, 78.6, 12.2, 73.8, 97.9
glm-4.5v | parasail | 20.1, 26.0, 15.3, 57.3, 3.6, 35.2
DeepSeek-R1-Distill-Qwen-1.5B | togetherai | 17.7, 8.6, 22.0, 33.8, 3.3, 7.0, 68.7
DeepSeek-R1-Distill-Qwen-14B | togetherai | 66.7, 29.7, 55.7, 59.1, 4.4, 37.6, 94.9
Llama-3.1-Nemotron-70B-Instruct-HF | togetherai | 24.7, 14.8, 23.6, 11.0, 46.5, 4.6, 16.9, 73.3
QwQ-32B-Preview | togetherai | 45.3, 28.0, 65.2, 4.8, 33.7, 91.0
Qwen2-72B-Instruct | togetherai | 14.7, 18.1, 42.4, 3.7, 86.0, 15.9, 70.1
gemma-2-27b-it | togetherai | 29.7, 17.2, 35.7, 3.7, 51.8, 27.9, 54.1
grok-2 | xai | 56.0, 88.4
grok-3 | xai | 41.4
grok-3-mini | xai | 93.3, 42.2, 57.1, 84.7, 79.1, 11.1, 69.6, 99.2
grok-4 | xai | 94.3, 55.1, 65.3, 92.7, 87.6, 23.9, 81.9, 99.0
grok-code-fast-1 | xai | 39.4, 48.6, 43.3, 72.7, 7.5, 65.7
glm-4.5-air | zai | 67.3, 39.4, 48.8, 80.7, 74.2, 6.8, 68.4, 96.5
glm-4.5-airx | zai | 67.3, 39.4, 48.8, 80.7, 74.2, 6.8, 68.4, 96.5
glm-4.5-x | zai | 87.3, 43.3, 51.3, 73.7, 78.6, 12.2, 73.8, 97.9
glm-4.6 | zai | 38.7, 44.7, 44.3, 81.0, 5.2, 56.1

Showing 178 models with benchmark data.