Benchmark Comparison
Compare AI model performance across standardized tests
Individual Benchmark Results
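This table shows performance on individual standardized tests (such as MMLU, HumanEval, and GPQA) rather than category aggregates, with ten benchmarks displayed. Most scores are pass rates or accuracy percentages on a single benchmark. The exceptions are the AA Coding Index, AAII (Artificial Analysis Intelligence Index), and AA Math Index columns, which are composite index scores. A dash means no score is available for that model.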
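As a minimal illustration of a pass-rate score, assuming a single attempt per problem, the score is the share of problems solved multiplied by 100 and rounded to one decimal place, as in the table. The solved count below is hypothetical (HumanEval itself has 164 problems), and `pass_rate` is an illustrative helper, not the method used to produce these results.

```python
def pass_rate(passed: int, total: int) -> float:
    """Percentage of problems solved, rounded to one decimal like the table's scores."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


# Hypothetical counts: 151 of HumanEval's 164 problems solved.
print(pass_rate(151, 164))  # 92.1
```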
Model | AIME | AA Coding Index | AAII | AA Math Index | DROP | GPQA | HLE | HumanEval | LiveCodeBench | MATH-500 |
---|---|---|---|---|---|---|---|---|---|---|
claude-3-5-sonnet-20240620 | 15.7 | 37.3 | 29.9 | — | 87.1 | 59.7 | 3.9 | 92.0 | 38.1 | 77.1 | |
claude-3-haiku-20240307 | 1.0 | 17.0 | 9.6 | — | 78.4 | 33.3 | — | 75.9 | 15.4 | 39.4 | |
claude-3-opus-20240229 | 3.3 | 25.6 | 20.6 | — | 83.1 | 49.6 | 3.1 | 84.9 | 27.9 | 64.1 | |
claude-opus-4 | — | 53.1 | 59.3 | 80.3 | — | 80.9 | 11.9 | — | 65.4 | — | |
claude-opus-4.1 | 56.3 | 47.5 | 42.3 | 36.3 | — | 79.6 | 5.9 | — | 54.2 | 94.1 | |
claude-sonnet-4 | 40.7 | 41.1 | 44.4 | 38.0 | — | 75.4 | 4.0 | — | 44.9 | 93.4 | |
mistral-7b-instruct-v0.2 | 0.0 | 3.5 | 1.0 | — | — | 17.7 | 4.3 | — | 4.6 | 12.1 | |
DeepSeek-R1 | 89.3 | 58.7 | 52.0 | 76.0 | — | 81.3 | 14.9 | — | 77.0 | 98.3 | |
DeepSeek-R1-Distill-Llama-70B | 67.0 | 28.9 | 30.8 | 53.7 | — | 65.2 | 6.1 | — | 26.6 | 93.5 | |
DeepSeek-R1-Distill-Qwen-32B | 68.7 | 32.3 | 32.7 | 63.0 | — | 61.8 | 5.5 | — | 27.0 | 94.1 | |
DeepSeek-V3 | 25.3 | 35.6 | 32.5 | 26.0 | 91.6 | 57.4 | 3.6 | — | 35.9 | 88.7 | |
deepseek-chat-v3-0324 | — | 47.2 | 44.8 | 49.7 | — | 73.5 | 6.3 | — | 57.7 | — | |
deepseek-chat-v3.1 | — | 47.2 | 44.8 | 49.7 | — | 73.5 | 6.3 | — | 57.7 | — | |
deepseek-r1-0528 | — | — | — | — | — | — | — | — | — | — | |
devstral-small | 0.3 | 24.9 | 27.2 | 29.3 | — | 41.4 | 3.7 | — | 25.4 | 63.5 | |
devstral-small-2505 | 6.7 | 25.2 | 19.6 | — | — | 43.4 | 4.0 | — | 25.8 | 68.4 | |
gemma-3-12b-it | 22.0 | 15.5 | 20.9 | 18.3 | — | 40.9 | 4.8 | 85.4 | 13.7 | 85.3 | |
gemma-3-27b-it | 25.3 | 17.4 | 22.0 | 20.7 | — | 42.6 | 4.7 | 87.8 | 13.7 | 88.3 | |
gemma-3-4b-it | 6.3 | 9.3 | 14.7 | 12.7 | — | 29.9 | 5.2 | 71.3 | 11.2 | 76.6 | |
gpt-oss-120b | — | 50.1 | 57.9 | 93.4 | — | 79.2 | 18.5 | — | 63.9 | — | |
gpt-oss-20b | — | 53.7 | 44.8 | 61.7 | — | 71.5 | 8.5 | — | 72.1 | — | |
kimi-k2 | — | — | 50.4 | 57.3 | — | 76.3 | 6.3 | 94.5 | 61.0 | — | |
llama-3.1-405b-instruct | 21.3 | 30.2 | 25.7 | 3.0 | 84.8 | 51.1 | 4.2 | 89.0 | 30.5 | 70.3 | |
llama-3.1-70b-instruct | 17.3 | 25.0 | 22.6 | 4.0 | 79.6 | 41.3 | 4.6 | 80.5 | 23.2 | 64.9 | |
llama-3.1-8b-instruct | 7.7 | 12.4 | 16.9 | 4.3 | 59.5 | 28.1 | 5.1 | 72.6 | 11.6 | 51.9 | |
llama-3.1-nemotron-70b-instruct | 24.7 | 20.1 | 23.6 | 11.0 | — | 46.5 | 4.6 | — | 16.9 | 73.3 | |
llama-3.2-3b-instruct | 6.7 | 6.7 | 11.2 | 3.3 | — | 32.8 | 5.2 | — | 8.3 | 48.9 | |
llama-3.3-70b-instruct | 30.0 | 27.4 | 27.9 | 7.7 | — | 50.2 | 4.0 | 88.4 | 28.8 | 77.3 | |
llama-4-maverick | 39.0 | 36.4 | 35.8 | 19.3 | — | 68.5 | 4.8 | — | 39.7 | 88.9 | |
llama-4-scout | 28.3 | 23.5 | 28.1 | 14.0 | — | 57.9 | 4.3 | — | 29.9 | 84.4 | |
mistral-7b-instruct | 0.0 | 3.5 | 1.0 | — | — | 17.7 | 4.3 | — | 4.6 | 12.1 | |
mistral-nemo | 0.3 | 8.1 | 5.2 | — | — | 31.4 | 4.4 | — | 5.7 | 39.5 | |
mistral-small-24b-instruct-2501 | — | — | — | — | — | 45.3 | — | 84.8 | — | — | |
mistral-small-3.1-24b-instruct | 6.3 | 14.8 | 13.0 | — | — | 38.1 | 4.3 | — | 14.1 | 56.3 | |
mistral-small-3.2-24b-instruct | 6.3 | 14.8 | 13.0 | — | — | 38.1 | 4.3 | — | 14.1 | 56.3 | |
mixtral-8x7b-instruct | 0.0 | 4.7 | 2.6 | — | — | 29.2 | 4.5 | — | 6.6 | 29.9 | |
nemotron-nano-9b-v2 | — | 45.5 | 38.1 | 62.3 | — | 55.7 | 4.0 | — | 70.1 | — | |
phi-4 | 14.3 | 24.6 | 24.6 | 18.0 | 75.5 | 56.8 | 4.1 | 82.6 | 23.1 | 81.0 | |
phi-4-multimodal-instruct | 9.3 | 12.1 | 12.4 | — | — | 31.5 | 4.4 | — | 13.1 | 69.3 | |
phi-4-reasoning-plus | — | — | — | — | — | 68.9 | — | — | — | — | |
qwen-2.5-72b-instruct | 16.0 | 27.2 | 29.0 | 14.0 | — | 49.0 | 4.2 | 86.6 | 27.6 | 85.8 | |
qwen-2.5-7b-instruct | — | — | — | — | — | 36.4 | — | 84.8 | — | — | |
qwen3-14b | 28.0 | 27.3 | 29.2 | 58.0 | — | 47.0 | 4.2 | — | 28.0 | 87.1 | |
qwen3-235b-a22b | 32.7 | 32.1 | 29.9 | 23.7 | — | 61.3 | 4.7 | — | 34.3 | 90.2 | |
qwen3-235b-a22b-2507 | 32.7 | 32.1 | 29.9 | 23.7 | — | 61.3 | 4.7 | — | 34.3 | 90.2 | |
qwen3-235b-a22b-thinking-2507 | — | — | — | — | — | 81.1 | — | — | — | — | |
qwen3-30b-a3b | 26.0 | 29.3 | 26.5 | 21.7 | — | 65.8 | 4.6 | — | 32.2 | 86.3 | |
qwen3-32b | 30.3 | 28.4 | 26.4 | 19.7 | — | 53.5 | 4.3 | — | 28.8 | 86.9 | |
qwen3-coder | 47.7 | 47.2 | 42.3 | 39.3 | — | 61.8 | 4.4 | — | 58.5 | 94.2 | |
qwq-32b | 78.0 | 49.4 | 37.9 | 29.0 | — | 65.2 | 8.2 | — | 63.1 | 95.7 | |
deepseek-chat | — | 47.2 | 44.8 | 49.7 | — | 73.5 | 6.3 | — | 57.7 | — | |
deepseek-reasoner | — | 58.8 | 54.0 | 89.7 | — | 77.9 | 13.0 | — | 78.4 | — | |
deepseek-r1 | 89.3 | 58.7 | 52.0 | 76.0 | — | 81.3 | 14.9 | — | 77.0 | 98.3 | |
deepseek-r1-05-28 | 89.3 | 58.7 | 52.0 | 76.0 | — | 81.3 | 14.9 | — | 77.0 | 98.3 | |
deepseek-v3 | 25.3 | 35.6 | 32.5 | 26.0 | 91.6 | 57.4 | 3.6 | — | 35.9 | 88.7 | |
deepseek-v3-03-24 | 25.3 | 35.6 | 32.5 | 26.0 | 91.6 | 57.4 | 3.6 | — | 35.9 | 88.7 | |
gpt-oss-120b | — | 50.1 | 57.9 | 93.4 | — | 79.2 | 18.5 | — | 63.9 | — | |
gpt-oss-20b | — | 53.7 | 44.8 | 61.7 | — | 71.5 | 8.5 | — | 72.1 | — | |
kimi-k2-instruct | — | — | — | — | — | 75.1 | — | 93.3 | — | — | |
qwen3-235b-a22b | 32.7 | 32.1 | 29.9 | 23.7 | — | 61.3 | 4.7 | — | 34.3 | 90.2 | |
qwen3-coder-480b-a35b-instruct | 47.7 | 47.2 | 42.3 | 39.3 | — | 61.8 | 4.4 | — | 58.5 | 94.2 | |
gemini-1.5-flash-8b | 3.3 | 22.3 | 16.3 | — | — | 37.1 | 4.5 | — | 21.7 | 68.9 | |
gemini-2.0-flash | 33.0 | 33.4 | 34.0 | 21.7 | — | 62.2 | 5.3 | — | 33.4 | 93.0 | |
gemini-2.0-flash-lite | — | — | — | — | — | 51.5 | — | — | — | — | |
gemini-2.5-flash | 50.0 | 39.3 | 40.4 | 60.3 | — | 82.8 | 5.1 | — | 49.5 | 93.2 | |
gemini-2.5-pro | 88.7 | 61.5 | 59.6 | 87.7 | — | 83.7 | 21.1 | — | 80.1 | 96.7 | |
gemini-2.5-pro-preview | — | — | — | — | — | 86.4 | — | — | — | — | |
codestral-2501 | 4.3 | 24.5 | 13.2 | — | — | 31.2 | 4.5 | — | 24.3 | 60.7 | |
codestral-2508 | 4.3 | 24.5 | 13.2 | — | — | 31.2 | 4.5 | — | 24.3 | 60.7 | |
devstral-medium | 6.7 | 31.5 | 27.9 | 4.7 | — | 49.2 | 3.8 | — | 33.7 | 70.7 | |
ministral-3b | 0.0 | 8.1 | 5.2 | — | — | 26.0 | 5.5 | — | 6.9 | 53.7 | |
mistral-large | 0.0 | 19.3 | 11.9 | — | — | 35.1 | 3.4 | — | 17.8 | 52.7 | |
mistral-large-2.1 | 0.0 | 19.3 | 11.9 | — | — | 35.1 | 3.4 | — | 17.8 | 52.7 | |
mistral-large-2407 | 9.3 | 26.9 | 22.3 | 0.0 | — | 47.2 | 3.2 | — | 26.7 | 71.4 | |
mistral-medium-3 | 44.0 | 36.5 | 34.7 | 30.3 | — | 57.8 | 4.3 | — | 40.0 | 90.7 | |
mistral-medium-3.1 | 3.7 | 10.9 | 8.4 | — | — | 34.9 | 3.4 | — | 9.9 | 40.5 | |
mistral-nemo | 0.3 | 8.1 | 5.2 | — | — | 31.4 | 4.4 | — | 5.7 | 39.5 | |
mistral-saba | 13.0 | — | 19.6 | — | — | 42.4 | 4.1 | — | — | 67.7 | |
mistral-small | 6.3 | 14.8 | 13.0 | — | — | 38.1 | 4.3 | — | 14.1 | 56.3 | |
mistral-small-3 | 6.3 | 14.8 | 13.0 | — | — | 38.1 | 4.3 | — | 14.1 | 56.3 | |
mistral-small-3.1 | 6.3 | 14.8 | 13.0 | — | — | 38.1 | 4.3 | — | 14.1 | 56.3 | |
mistral-small-3.2 | 6.3 | 14.8 | 13.0 | — | — | 38.1 | 4.3 | — | 14.1 | 56.3 | |
chatgpt-4o-latest | 32.7 | 39.6 | 35.6 | 25.7 | — | 65.5 | 5.0 | — | 42.5 | 89.3 | |
gpt-3.5-turbo | — | — | 8.3 | — | 70.2 | 30.3 | — | 68.0 | — | 44.1 | |
gpt-4.1 | 43.7 | 41.9 | 43.4 | 34.7 | — | 66.5 | 4.6 | — | 45.7 | 91.3 | |
gpt-4.1-mini | — | — | — | — | — | 65.0 | — | — | — | — | |
gpt-4.1-nano | — | — | — | — | — | 50.3 | — | — | — | — | |
gpt-4o | 15.0 | 32.1 | 27.0 | 6.0 | — | 54.3 | 3.3 | — | 30.9 | 75.9 | |
gpt-4o-2024-05-13 | 15.0 | 32.1 | 27.0 | 6.0 | 83.4 | 54.0 | 3.3 | 90.2 | 30.9 | 75.9 | |
gpt-4o-mini | 11.7 | 23.1 | 21.2 | 14.7 | — | 42.6 | 4.0 | — | 23.4 | 78.9 | |
gpt-4o-mini-search-preview | — | — | — | — | 79.7 | 40.2 | — | 87.2 | — | — | |
gpt-5 | 95.7 | 54.9 | 66.7 | 94.3 | — | 85.4 | 26.5 | — | 66.8 | 99.4 | |
gpt-5-chat | — | 46.1 | 41.8 | 48.3 | — | 68.6 | 5.8 | — | 54.3 | — | |
gpt-5-mini | — | 51.4 | 62.3 | 90.7 | — | 82.8 | 19.7 | — | 63.6 | — | |
gpt-5-nano | — | 45.6 | 48.5 | 83.7 | — | 67.6 | 8.2 | — | 54.6 | — | |
o1 | 72.3 | 51.9 | 47.2 | — | — | 76.4 | 7.7 | 88.1 | 67.9 | 97.0 | |
o1-mini | 60.3 | 44.9 | 39.2 | — | — | 60.1 | 4.9 | 92.4 | 57.6 | 94.4 | |
o3 | 90.3 | 59.7 | 65.2 | 88.3 | — | 83.0 | 20.0 | — | 78.4 | 99.2 | |
o3-mini | 77.0 | 55.8 | 48.1 | — | — | 76.0 | 8.7 | — | 71.7 | 97.3 | |
o4-mini | 94.0 | 63.5 | 59.0 | 90.7 | — | 79.9 | 17.5 | — | 80.4 | 98.9 | |
codestral-2501 | 4.3 | 24.5 | 13.2 | — | — | 31.2 | 4.5 | — | 24.3 | 60.7 | |
codestral-2508 | 4.3 | 24.5 | 13.2 | — | — | 31.2 | 4.5 | — | 24.3 | 60.7 | |
command | 0.7 | 12.0 | 5.5 | — | — | 32.3 | 4.5 | — | 12.2 | 27.9 | |
command-a | 9.7 | 28.4 | 28.1 | 13.0 | — | 52.7 | 4.6 | — | 28.7 | 81.9 | |
command-r | 0.3 | 6.6 | 1.0 | — | — | 28.9 | 5.1 | — | 4.4 | 14.9 | |
command-r-03-2024 | 0.7 | 5.5 | 1.0 | — | — | 28.4 | 4.8 | — | 4.8 | 16.4 | |
command-r-plus | 0.0 | 11.6 | 7.1 | — | — | 33.7 | 5.0 | — | 11.1 | 40.2 | |
command-r-plus-04-2024 | 0.7 | 12.0 | 5.5 | — | — | 32.3 | 4.5 | — | 12.2 | 27.9 | |
deephermes-3-mistral-24b-preview | 4.7 | 21.1 | 15.5 | — | — | 38.2 | 3.9 | — | 19.5 | 59.5 | |
deepseek-r1-distill-llama-70b | 67.0 | 28.9 | 30.8 | 53.7 | — | 65.2 | 6.1 | — | 26.6 | 93.5 | |
deepseek-r1-distill-llama-8b | 33.3 | 17.6 | 19.5 | 41.3 | — | 49.0 | 4.2 | — | 23.3 | 85.3 | |
deepseek-r1-distill-qwen-14b | 66.7 | 30.7 | 29.7 | 55.7 | — | 59.1 | 4.4 | — | 37.6 | 94.9 | |
deepseek-r1-distill-qwen-32b | 68.7 | 32.3 | 32.7 | 63.0 | — | 61.8 | 5.5 | — | 27.0 | 94.1 | |
deepseek-v3.1-base | 25.3 | 35.6 | 32.5 | 26.0 | 91.6 | 57.4 | 3.6 | — | 35.9 | 88.7 | |
devstral-medium | 6.7 | 31.5 | 27.9 | 4.7 | — | 49.2 | 3.8 | — | 33.7 | 70.7 | |
dolphin3.0-mistral-24b | — | — | — | — | — | 44.2 | — | — | — | — | |
dolphin3.0-r1-mistral-24b | — | — | — | — | — | 44.2 | — | — | — | — | |
gemini-2.5-flash-lite | 50.0 | 28.9 | 30.1 | 35.3 | — | 64.6 | 3.7 | — | 40.0 | 92.6 | |
gemma-2-9b-it | 0.0 | 6.6 | 7.8 | — | — | 31.1 | 3.9 | 40.2 | 12.6 | 51.7 | |
gemma-3n-e4b-it | — | — | — | — | — | 23.7 | — | 75.0 | — | — | |
grok-2-1212 | 13.3 | 27.6 | 24.7 | — | — | 51.0 | 3.8 | — | 26.7 | 77.8 | |
grok-3-mini-beta | — | — | — | — | — | 84.0 | — | — | — | — | |
kimi-k2-0905 | — | — | 50.4 | 57.3 | — | 76.3 | 6.3 | 94.5 | 61.0 | — | |
llama-3.1-405b | 21.3 | 30.2 | 25.7 | 3.0 | 84.8 | 51.1 | 4.2 | 89.0 | 30.5 | 70.3 | |
llama-3.1-nemotron-ultra-253b-v1 | 74.7 | 49.4 | 38.5 | 63.7 | — | 74.4 | 8.1 | — | 64.1 | 95.2 | |
magistral-small-2506 | — | — | — | — | — | 68.2 | — | — | — | — | |
minimax-m1 | 81.3 | 51.8 | 41.6 | 13.7 | — | 68.2 | 7.5 | — | 65.7 | 97.2 | |
ministral-3b | 0.0 | 8.1 | 5.2 | — | — | 26.0 | 5.5 | — | 6.9 | 53.7 | |
mistral-7b-instruct-v0.1 | 0.0 | 3.5 | 1.0 | — | — | 17.7 | 4.3 | — | 4.6 | 12.1 | |
mistral-7b-instruct-v0.3 | 0.0 | 3.5 | 1.0 | — | — | 17.7 | 4.3 | — | 4.6 | 12.1 | |
mistral-large | 0.0 | 19.3 | 11.9 | — | — | 35.1 | 3.4 | — | 17.8 | 52.7 | |
mistral-large-2407 | 9.3 | 26.9 | 22.3 | 0.0 | — | 47.2 | 3.2 | — | 26.7 | 71.4 | |
mistral-large-2411 | 0.0 | 19.3 | 11.9 | — | — | 35.1 | 3.4 | — | 17.8 | 52.7 | |
mistral-medium-3 | 44.0 | 36.5 | 34.7 | 30.3 | — | 57.8 | 4.3 | — | 40.0 | 90.7 | |
mistral-medium-3.1 | 3.7 | 10.9 | 8.4 | — | — | 34.9 | 3.4 | — | 9.9 | 40.5 | |
mistral-saba | 13.0 | — | 19.6 | — | — | 42.4 | 4.1 | — | — | 67.7 | |
mistral-small | 6.3 | 14.8 | 13.0 | — | — | 38.1 | 4.3 | — | 14.1 | 56.3 | |
mistral-tiny | 6.3 | 14.8 | 13.0 | — | — | 38.1 | 4.3 | — | 14.1 | 56.3 | |
nova-lite-v1 | 10.7 | 15.3 | 21.4 | 7.0 | 80.2 | 42.6 | 4.6 | 85.4 | 16.7 | 76.5 | |
nova-micro-v1 | 8.0 | 11.7 | 17.3 | 6.0 | 79.3 | 37.9 | 4.7 | 81.1 | 14.0 | 70.3 | |
nova-pro-v1 | 10.7 | 22.1 | 25.5 | 7.0 | 85.4 | 48.4 | 3.4 | 89.0 | 23.3 | 78.6 | |
qwen-2.5-coder-32b-instruct | 12.0 | 28.3 | 21.8 | — | — | 41.7 | 3.8 | 92.7 | 29.5 | 76.7 | |
qwen-turbo | 12.0 | 15.8 | 19.1 | — | — | 41.0 | 4.2 | — | 16.3 | 80.5 | |
qwen3-30b-a3b-instruct-2507 | 26.0 | 29.3 | 26.5 | 21.7 | — | 65.8 | 4.6 | — | 32.2 | 86.3 | |
qwen3-30b-a3b-thinking-2507 | 26.0 | 29.3 | 26.5 | 21.7 | — | 65.8 | 4.6 | — | 32.2 | 86.3 | |
qwen3-8b | 74.7 | 31.6 | 28.3 | 19.0 | — | 58.9 | 4.2 | — | 40.6 | 90.4 | |
qwen3-max | — | — | 48.5 | 75.0 | — | 76.4 | 9.3 | — | 65.1 | — | |
qwq-32b-preview | 45.3 | 18.7 | 28.0 | — | — | 65.2 | 4.8 | — | 33.7 | 91.0 | |
r1-1776 | — | — | 19.1 | — | — | — | — | — | — | 95.4 | |
sonar | 48.7 | 26.2 | 28.8 | — | — | 47.1 | 7.3 | — | 29.5 | 81.7 | |
sonar-pro | 29.0 | 25.0 | 28.2 | — | — | 57.8 | 7.9 | — | 27.5 | 74.5 | |
sonar-reasoning | 77.0 | — | 34.2 | — | — | 62.3 | — | — | — | 92.1 | |
sonar-reasoning-pro | 79.0 | — | 46.3 | — | — | — | — | — | — | 95.7 | |
deepseek-chat-v3-0324 | — | 47.2 | 44.8 | 49.7 | — | 73.5 | 6.3 | — | 57.7 | — | |
deepseek-chat-v3.1 | — | 47.2 | 44.8 | 49.7 | — | 73.5 | 6.3 | — | 57.7 | — | |
deepseek-r1-0528 | — | — | — | — | — | — | — | — | — | — | |
deepseek-r1-0528-qwen3-8b | 89.3 | 58.7 | 52.0 | 76.0 | — | 81.3 | 14.9 | — | 77.0 | 98.3 | |
gemma-3-27b-it | 25.3 | 17.4 | 22.0 | 20.7 | — | 42.6 | 4.7 | 87.8 | 13.7 | 88.3 | |
glm-4.5 | 87.3 | 54.3 | 49.4 | 73.7 | — | 78.6 | 12.2 | — | 73.8 | 97.9 | |
glm-4.5v | — | 27.0 | 26.0 | 15.3 | — | 57.3 | 3.6 | — | 35.2 | — | |
gpt-oss-120b | — | 50.1 | 57.9 | 93.4 | — | 79.2 | 18.5 | — | 63.9 | — | |
kimi-k2 | — | — | 50.4 | 57.3 | — | 76.3 | 6.3 | 94.5 | 61.0 | — | |
llama-3.3-70b-instruct | 30.0 | 27.4 | 27.9 | 7.7 | — | 50.2 | 4.0 | 88.4 | 28.8 | 77.3 | |
llama-4-maverick | 39.0 | 36.4 | 35.8 | 19.3 | — | 68.5 | 4.8 | — | 39.7 | 88.9 | |
mistral-nemo | 0.3 | 8.1 | 5.2 | — | — | 31.4 | 4.4 | — | 5.7 | 39.5 | |
mistral-small-3.2-24b-instruct | 6.3 | 14.8 | 13.0 | — | — | 38.1 | 4.3 | — | 14.1 | 56.3 | |
qwen3-14b | 28.0 | 27.3 | 29.2 | 58.0 | — | 47.0 | 4.2 | — | 28.0 | 87.1 | |
qwen3-235b-a22b-2507 | 32.7 | 32.1 | 29.9 | 23.7 | — | 61.3 | 4.7 | — | 34.3 | 90.2 | |
qwen3-235b-a22b-thinking-2507 | — | — | — | — | — | 81.1 | — | — | — | — | |
qwen3-30b-a3b | 26.0 | 29.3 | 26.5 | 21.7 | — | 65.8 | 4.6 | — | 32.2 | 86.3 | |
qwen3-32b | 30.3 | 28.4 | 26.4 | 19.7 | — | 53.5 | 4.3 | — | 28.8 | 86.9 | |
qwen3-coder | 47.7 | 47.2 | 42.3 | 39.3 | — | 61.8 | 4.4 | — | 58.5 | 94.2 | |
DeepSeek-R1-Distill-Llama-70B | 67.0 | 28.9 | 30.8 | 53.7 | — | 65.2 | 6.1 | — | 26.6 | 93.5 | |
DeepSeek-R1-Distill-Qwen-1.5B | 17.7 | 6.8 | 8.6 | 22.0 | — | 33.8 | 3.3 | — | 7.0 | 68.7 | |
DeepSeek-R1-Distill-Qwen-14B | 66.7 | 30.7 | 29.7 | 55.7 | — | 59.1 | 4.4 | — | 37.6 | 94.9 | |
DeepSeek-V3 | 25.3 | 35.6 | 32.5 | 26.0 | 91.6 | 57.4 | 3.6 | — | 35.9 | 88.7 | |
QwQ-32B-Preview | 45.3 | 18.7 | 28.0 | — | — | 65.2 | 4.8 | — | 33.7 | 91.0 | |
Qwen2-72B-Instruct | 14.7 | 19.4 | 18.1 | — | — | 42.4 | 3.7 | 86.0 | 15.9 | 70.1 | |
gemma-2-27b-it | 29.7 | 20.2 | 17.2 | — | — | 35.7 | 3.7 | 51.8 | 27.9 | 54.1 | |
grok-2 | — | — | — | — | — | 56.0 | — | 88.4 | — | — | |
grok-3 | 33.0 | 39.7 | 36.0 | — | — | 84.6 | 5.1 | — | 42.5 | 87.0 | |
grok-3-mini | — | — | — | — | — | 84.0 | — | — | — | — | |
grok-4 | 94.3 | 63.8 | 65.3 | 92.7 | — | 87.6 | 23.9 | — | 81.9 | 99.0 | |
grok-code-fast-1 | — | 51.0 | 48.6 | 43.3 | — | 72.7 | 7.5 | — | 65.7 | — | |
glm-4.5 | 87.3 | 54.3 | 49.4 | 73.7 | — | 78.6 | 12.2 | — | 73.8 | 97.9 | |
glm-4.5-air | 67.3 | 49.5 | 48.8 | 80.7 | — | 74.2 | 6.8 | — | 68.4 | 96.5 | |
glm-4.5-airx | 67.3 | 49.5 | 48.8 | 80.7 | — | 74.2 | 6.8 | — | 68.4 | 96.5 | |
glm-4.5-x | 87.3 | 54.3 | 49.4 | 73.7 | — | 78.6 | 12.2 | — | 73.8 | 97.9 | |
glm-4.5v | — | 27.0 | 26.0 | 15.3 | — | 57.3 | 3.6 | — | 35.2 | — |

Showing 189 models with benchmark data.
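To rank models by a single benchmark from a copy of this table, here is a minimal Python sketch. It assumes each row is copied verbatim in the format above: model name first with no leading pipe, the ten benchmark values in header order, and a dash (U+2014) for a missing score. `parse_row` and `rank_by` are hypothetical helper names written for this example, not part of any published tool.

```python
from __future__ import annotations

from typing import Optional

# Benchmark columns in the order they appear in the table (Model column excluded).
BENCHMARKS = [
    "AIME", "AA Coding Index", "AAII", "AA Math Index", "DROP",
    "GPQA", "HLE", "HumanEval", "LiveCodeBench", "MATH-500",
]

# Assumption: the table marks an unavailable score with U+2014.
MISSING = "\u2014"


def parse_row(line: str) -> tuple[str, dict[str, Optional[float]]]:
    """Split one pipe-delimited row (model name first, no leading pipe) into name and scores."""
    cells = [cell.strip() for cell in line.split("|")]
    model, values = cells[0], cells[1:1 + len(BENCHMARKS)]
    scores = {
        name: None if value in (MISSING, "") else float(value)
        for name, value in zip(BENCHMARKS, values)
    }
    return model, scores


def rank_by(rows: list[tuple[str, dict[str, Optional[float]]]], benchmark: str):
    """Sort parsed rows by one benchmark, highest first, with missing scores last."""
    def key(row):
        score = row[1][benchmark]
        # Tuples compare element-wise: scored rows (False) sort before missing ones (True).
        return (score is None, -score if score is not None else 0.0)
    return sorted(rows, key=key)


if __name__ == "__main__":
    sample = [
        "phi-4 | 14.3 | 24.6 | 24.6 | 18.0 | 75.5 | 56.8 | 4.1 | 82.6 | 23.1 | 81.0 | |",
        "nova-pro-v1 | 10.7 | 22.1 | 25.5 | 7.0 | 85.4 | 48.4 | 3.4 | 89.0 | 23.3 | 78.6 | |",
        "llama-3.1-405b-instruct | 21.3 | 30.2 | 25.7 | 3.0 | 84.8 | 51.1 | 4.2 | 89.0 | 30.5 | 70.3 | |",
    ]
    rows = [parse_row(line) for line in sample]
    for model, scores in rank_by(rows, "GPQA"):
        print(f"{model}: {scores['GPQA']}")
```

Missing scores sort last instead of being read as zero, so a model with no reported result is not placed below one that scored poorly. For the three sample rows, the GPQA ranking is phi-4 (56.8), llama-3.1-405b-instruct (51.1), then nova-pro-v1 (48.4).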