Benchmark Comparison
Compare AI model performance across standardized tests
Individual Benchmark Results
This table shows performance on specific standardized tests (MMLU, HumanEval, GPQA, etc.) rather than category aggregations. Most scores are pass rates or accuracy percentages on each benchmark; the AA (Artificial Analysis) Coding, Intelligence (AAII), and Math Index columns are composite indexes on a 0–100 scale. A dash (—) means no score is reported for that model.
Model | AIME | AA Coding Index | AAII | AA Math Index | DROP | GPQA | HLE | HumanEval | LiveCodeBench | MATH-500
---|---|---|---|---|---|---|---|---|---|---
claude-3-5-sonnet-20240620 | 15.7 | 37.3 | 33.5 | 46.4 | 87.1 | 59.7 | 3.9 | 92.0 | 38.1 | 77.1 | |
claude-3-haiku-20240307 | 1.0 | 17.0 | 12.1 | 20.2 | 78.4 | 33.3 | — | 75.9 | 15.4 | 39.4 | |
claude-3-opus-20240229 | 3.3 | 25.6 | 23.7 | 33.7 | 83.1 | 49.6 | 3.1 | 84.9 | 27.9 | 64.1 | |
claude-opus-4 | — | — | 49.0 | — | — | 80.9 | — | — | — | — | |
claude-opus-4.1 | 56.3 | 47.5 | 46.6 | 75.2 | — | 79.6 | 5.9 | — | 54.2 | 94.1 | |
claude-sonnet-4 | 40.7 | 41.1 | 45.7 | 67.0 | — | 75.4 | 4.0 | — | 44.9 | 93.4 | |
mistral-7b-instruct-v0.2 | 0.0 | 3.5 | 1.0 | 6.1 | — | 17.7 | 4.3 | — | 4.6 | 12.1 | |
mixtral-8x7b-instruct-v0.1 | 0.0 | 4.7 | 4.8 | 15.0 | — | 29.2 | 4.5 | — | 6.6 | 29.9 | |
DeepSeek-R1 | 89.3 | 58.7 | 58.6 | 93.8 | 92.2 | 81.3 | 14.9 | — | 77.0 | 98.3 | |
DeepSeek-R1-Distill-Llama-70B | 67.0 | 28.9 | 34.5 | 80.2 | — | 65.2 | 6.1 | — | 26.6 | 93.5 | |
DeepSeek-R1-Distill-Qwen-32B | 68.7 | 32.3 | 36.4 | 81.4 | — | 61.8 | 5.5 | — | 27.0 | 94.1 | |
DeepSeek-V3 | 25.3 | 35.6 | 36.9 | 57.0 | 91.6 | 57.4 | 3.6 | — | 35.9 | 88.7 | |
deepseek-r1-0528 | — | — | — | — | — | 81.0 | — | — | — | — | |
devstral-small | 0.3 | 24.9 | 20.5 | 31.9 | — | 41.4 | 3.7 | — | 25.4 | 63.5 | |
devstral-small-2505 | 6.7 | 25.2 | 22.7 | 37.5 | — | 43.4 | 4.0 | — | 25.8 | 68.4 | |
gemma-3-12b-it | 22.0 | 15.5 | 24.0 | 53.7 | — | 40.9 | 4.8 | 85.4 | 13.7 | 85.3 | |
gemma-3-27b-it | 25.3 | 17.4 | 25.2 | 56.8 | — | 42.6 | 4.7 | 87.8 | 13.7 | 88.3 | |
gemma-3-4b-it | 6.3 | 9.3 | 17.6 | 41.5 | — | 29.9 | 5.2 | 71.3 | 11.2 | 76.6 | |
gpt-oss-120b | — | 50.1 | 61.3 | — | — | 79.2 | 18.5 | — | 63.9 | — | |
gpt-oss-20b | — | 53.7 | 49.0 | — | — | 71.5 | 8.5 | — | 72.1 | — | |
kimi-k2 | 69.3 | 36.5 | 48.6 | 83.2 | — | 76.6 | 7.0 | — | 55.6 | 97.1 | |
llama-3.1-405b-instruct | 21.3 | 30.2 | 28.9 | 45.8 | 84.8 | 51.1 | 4.2 | 89.0 | 30.5 | 70.3 | |
llama-3.1-70b-instruct | 17.3 | 25.0 | 26.0 | 41.1 | 79.6 | 41.3 | 4.6 | 80.5 | 23.2 | 64.9 | |
llama-3.1-8b-instruct | 0.0 | 10.8 | 9.5 | 25.0 | 59.5 | 30.0 | 5.1 | 72.6 | 9.6 | 49.9 | |
llama-3.1-nemotron-70b-instruct | 24.7 | 20.1 | 26.1 | 49.0 | — | 46.5 | 4.6 | — | 16.9 | 73.3 | |
llama-3.2-3b-instruct | 6.7 | 6.7 | 13.8 | 27.8 | — | 32.8 | 5.2 | — | 8.3 | 48.9 | |
llama-3.3-70b-instruct | 0.0 | 19.3 | 15.7 | 24.1 | — | 50.5 | 4.4 | 88.4 | 19.8 | 48.3 | |
llama-4-maverick | 39.0 | 36.4 | 41.7 | 63.9 | — | 68.5 | 4.8 | — | 39.7 | 88.9 | |
llama-4-scout | 28.3 | 23.5 | 33.1 | 56.4 | — | 57.9 | 4.3 | — | 29.9 | 84.4 | |
mistral-7b-instruct | 0.0 | 3.5 | 1.0 | 6.1 | — | 17.7 | 4.3 | — | 4.6 | 12.1 | |
mistral-small-24b-instruct-2501 | — | — | — | — | — | 45.3 | — | 84.8 | — | — | |
mixtral-8x7b-instruct | 0.0 | 4.7 | 4.8 | 15.0 | — | 29.2 | 4.5 | — | 6.6 | 29.9 | |
phi-4 | 14.3 | 24.6 | 27.9 | 47.7 | 75.5 | 56.8 | 4.1 | 82.6 | 23.1 | 81.0 | |
phi-4-multimodal-instruct | 9.3 | 12.1 | 15.1 | 39.3 | — | 31.5 | 4.4 | — | 13.1 | 69.3 | |
phi-4-reasoning-plus | — | — | — | — | — | 68.9 | — | — | — | — | |
qwen-2.5-72b-instruct | 16.0 | 27.2 | 31.4 | 50.9 | — | 49.0 | 4.2 | 86.6 | 27.6 | 85.8 | |
qwen-2.5-7b-instruct | — | — | — | — | — | 36.4 | — | 84.8 | — | — | |
qwen3-14b | 28.0 | 27.3 | 31.9 | 57.6 | — | 47.0 | 4.2 | — | 28.0 | 87.1 | |
qwen3-235b-a22b | 32.7 | 32.1 | 33.3 | 61.4 | — | 61.3 | 4.7 | — | 34.3 | 90.2 | |
qwen3-235b-a22b-2507 | 32.7 | 32.1 | 33.3 | 61.4 | — | 61.3 | 4.7 | — | 34.3 | 90.2 | |
qwen3-235b-a22b-thinking-2507 | 32.7 | 32.1 | 33.3 | 61.4 | — | 61.3 | 4.7 | — | 34.3 | 90.2 | |
qwen3-30b-a3b | 26.0 | 29.3 | 29.9 | 56.1 | — | 65.8 | 4.6 | — | 32.2 | 86.3 | |
qwen3-32b | 30.3 | 28.4 | 29.8 | 58.6 | — | 53.5 | 4.3 | — | 28.8 | 86.9 | |
qwen3-coder | 47.7 | 47.2 | 45.2 | 70.9 | — | 61.8 | 4.4 | — | 58.5 | 94.2 | |
qwq-32b | 78.0 | 49.4 | 41.9 | 86.9 | — | 65.2 | 8.2 | — | 63.1 | 95.7 | |
deepseek-r1 | 89.3 | 58.7 | 58.6 | 93.8 | 92.2 | 81.3 | 14.9 | — | 77.0 | 98.3 | |
deepseek-r1-05-28 | 89.3 | 58.7 | 58.6 | 93.8 | 92.2 | 81.3 | 14.9 | — | 77.0 | 98.3 | |
deepseek-v3 | 25.3 | 35.6 | 36.9 | 57.0 | 91.6 | 57.4 | 3.6 | — | 35.9 | 88.7 | |
deepseek-v3-03-24 | 25.3 | 35.6 | 36.9 | 57.0 | 91.6 | 57.4 | 3.6 | — | 35.9 | 88.7 | |
gpt-oss-120b | — | 50.1 | 61.3 | — | — | 79.2 | 18.5 | — | 63.9 | — | |
gpt-oss-20b | — | 53.7 | 49.0 | — | — | 71.5 | 8.5 | — | 72.1 | — | |
kimi-k2-instruct | — | — | — | — | — | 75.1 | — | 93.3 | — | — | |
mixtral-8x22b-instruct | 0.0 | 16.8 | 14.4 | 27.2 | — | 33.2 | 4.1 | — | 14.8 | 54.5 | |
qwen3-235b-a22b | 32.7 | 32.1 | 33.3 | 61.4 | — | 61.3 | 4.7 | — | 34.3 | 90.2 | |
qwen3-coder-480b-a35b-instruct | 47.7 | 47.2 | 45.2 | 70.9 | — | 61.8 | 4.4 | — | 58.5 | 94.2 | |
gemini-1.5-flash-8b | 3.3 | 22.3 | 19.2 | 36.1 | — | 37.1 | 4.5 | — | 21.7 | 68.9 | |
gemini-2.0-flash | 33.0 | 33.4 | 37.8 | 63.0 | — | 62.2 | 5.3 | — | 33.4 | 93.0 | |
gemini-2.0-flash-lite | — | — | — | — | — | 51.5 | — | — | — | — | |
gemini-2.5-flash | 50.0 | 39.3 | 47.3 | 71.6 | — | 82.8 | 5.1 | — | 49.5 | 93.2 | |
gemini-2.5-pro | 88.7 | 61.5 | 64.6 | 92.7 | — | 83.7 | 21.1 | — | 80.1 | 96.7 | |
gemini-2.5-pro-preview | — | — | — | — | — | 86.4 | — | — | — | — | |
chatgpt-4o-latest | 32.7 | 39.6 | 39.5 | 61.0 | — | 65.5 | 5.0 | — | 42.5 | 89.3 | |
gpt-3.5-turbo | — | — | 10.8 | — | 70.2 | 30.3 | — | 68.0 | — | 44.1 | |
gpt-4.1 | 43.7 | 41.9 | 46.8 | 67.5 | — | 66.5 | 4.6 | — | 45.7 | 91.3 | |
gpt-4.1-mini | — | — | — | — | — | 65.0 | — | — | — | — | |
gpt-4.1-nano | — | — | — | — | — | 50.3 | — | — | — | — | |
gpt-4o | 15.0 | 32.1 | 29.6 | 45.5 | — | 54.3 | 3.3 | — | 30.9 | 75.9 | |
gpt-4o-2024-05-13 | 15.0 | 32.1 | 29.6 | 45.5 | 83.4 | 54.0 | 3.3 | 90.2 | 30.9 | 75.9 | |
gpt-4o-mini | 11.7 | 23.1 | 24.3 | 45.3 | — | 42.6 | 4.0 | — | 23.4 | 78.9 | |
gpt-4o-mini-search-preview | — | — | — | — | 79.7 | 40.2 | — | 87.2 | — | — | |
gpt-5 | 95.7 | 54.9 | 69.0 | 97.5 | — | 85.4 | 26.5 | — | 66.8 | 99.4 | |
gpt-5-mini | — | 51.4 | 65.4 | — | — | 82.8 | 19.7 | — | 63.6 | — | |
gpt-5-nano | — | 45.6 | 54.7 | — | — | 67.6 | 8.2 | — | 54.6 | — | |
o1 | 72.3 | 51.9 | 51.7 | 84.7 | — | 76.4 | 7.7 | 88.1 | 67.9 | 97.0 | |
o1-mini | 60.3 | 44.9 | 43.3 | 77.4 | — | 60.1 | 4.9 | 92.4 | 57.6 | 94.4 | |
o1-pro | — | — | 45.6 | — | — | 79.0 | — | — | — | — | |
o3 | 90.3 | 59.7 | 67.1 | 94.8 | — | 83.0 | 20.0 | — | 78.4 | 99.2 | |
o3-mini | 77.0 | 55.8 | 52.7 | 87.2 | — | 76.0 | 8.7 | — | 71.7 | 97.3 | |
o3-pro | — | — | 67.5 | — | — | 84.5 | — | — | — | — | |
o4-mini | 94.0 | 63.5 | 65.0 | 96.4 | — | 79.9 | 17.5 | — | 80.4 | 98.9 | |
command | 0.7 | 12.0 | 7.8 | 14.3 | — | 32.3 | 4.5 | — | 12.2 | 27.9 | |
command-a | 9.7 | 28.4 | 31.6 | 45.8 | — | 52.7 | 4.6 | — | 28.7 | 81.9 | |
command-r | 0.3 | 6.6 | 2.5 | 7.6 | — | 28.9 | 5.1 | — | 4.4 | 14.9 | |
command-r-03-2024 | 0.7 | 5.5 | 2.4 | 8.5 | — | 28.4 | 4.8 | — | 4.8 | 16.4 | |
command-r-plus | 0.0 | 11.6 | 9.5 | 20.1 | — | 33.7 | 5.0 | — | 11.1 | 40.2 | |
command-r-plus-04-2024 | 0.7 | 12.0 | 7.8 | 14.3 | — | 32.3 | 4.5 | — | 12.2 | 27.9 | |
deephermes-3-mistral-24b-preview | 4.7 | 21.1 | 18.4 | 32.1 | — | 38.2 | 3.9 | — | 19.5 | 59.5 | |
deepseek-r1-distill-llama-70b | 67.0 | 28.9 | 34.5 | 80.2 | — | 65.2 | 6.1 | — | 26.6 | 93.5 | |
deepseek-r1-distill-llama-8b | 33.3 | 17.6 | 22.8 | 59.3 | — | 49.0 | 4.2 | — | 23.3 | 85.3 | |
deepseek-r1-distill-qwen-14b | 66.7 | 30.7 | 33.3 | 80.8 | — | 59.1 | 4.4 | — | 37.6 | 94.9 | |
deepseek-r1-distill-qwen-32b | 68.7 | 32.3 | 36.4 | 81.4 | — | 61.8 | 5.5 | — | 27.0 | 94.1 | |
devstral-medium | 6.7 | 31.5 | 31.3 | 38.7 | — | 49.2 | 3.8 | — | 33.7 | 70.7 | |
devstral-small | 0.3 | 24.9 | 20.5 | 31.9 | — | 41.4 | 3.7 | — | 25.4 | 63.5 | |
devstral-small-2505 | 6.7 | 25.2 | 22.7 | 37.5 | — | 43.4 | 4.0 | — | 25.8 | 68.4 | |
gemini-2.5-flash-lite | 50.0 | 28.9 | 34.9 | 71.3 | — | 64.6 | 3.7 | — | 40.0 | 92.6 | |
gemma-2-9b-it | 0.0 | 6.6 | 10.2 | 25.9 | — | 31.1 | 3.9 | 40.2 | 12.6 | 51.7 | |
gemma-3-12b-it | 22.0 | 15.5 | 24.0 | 53.7 | — | 40.9 | 4.8 | 85.4 | 13.7 | 85.3 | |
gemma-3-4b-it | 6.3 | 9.3 | 17.6 | 41.5 | — | 29.9 | 5.2 | 71.3 | 11.2 | 76.6 | |
gemma-3n-e4b-it | — | — | — | — | — | 23.7 | — | 75.0 | — | — | |
grok-2-1212 | 13.3 | 27.6 | 28.0 | 45.6 | — | 51.0 | 3.8 | — | 26.7 | 77.8 | |
grok-3 | 33.0 | 39.7 | 39.9 | 60.0 | — | 84.6 | 5.1 | — | 42.5 | 87.0 | |
grok-3-mini | — | — | — | — | — | 84.0 | — | — | — | — | |
llama-3.1-405b-instruct | 21.3 | 30.2 | 28.9 | 45.8 | 84.8 | 51.1 | 4.2 | 89.0 | 30.5 | 70.3 | |
llama-3.1-70b-instruct | 17.3 | 25.0 | 26.0 | 41.1 | 79.6 | 41.3 | 4.6 | 80.5 | 23.2 | 64.9 | |
llama-3.1-8b-instruct | 0.0 | 10.8 | 9.5 | 25.0 | 59.5 | 30.0 | 5.1 | 72.6 | 9.6 | 49.9 | |
llama-3.1-nemotron-70b-instruct | 24.7 | 20.1 | 26.1 | 49.0 | — | 46.5 | 4.6 | — | 16.9 | 73.3 | |
llama-3.1-nemotron-ultra-253b-v1 | 74.7 | 49.4 | 46.4 | 84.9 | — | 74.4 | 8.1 | — | 64.1 | 95.2 | |
llama-3.2-3b-instruct | 6.7 | 6.7 | 13.8 | 27.8 | — | 32.8 | 5.2 | — | 8.3 | 48.9 | |
llama-3.3-nemotron-super-49b-v1 | 19.3 | 25.5 | 29.3 | 48.4 | — | 66.7 | 3.5 | — | 28.0 | 77.5 | |
llama-4-scout | 28.3 | 23.5 | 33.1 | 56.4 | — | 57.9 | 4.3 | — | 29.9 | 84.4 | |
magistral-small-2506 | — | — | — | — | — | 68.2 | — | — | — | — | |
minimax-m1 | 81.3 | 51.8 | 45.8 | 89.3 | — | 68.2 | 7.5 | — | 65.7 | 97.2 | |
ministral-3b | 0.0 | 8.1 | 7.5 | 26.8 | — | 26.0 | 5.5 | — | 6.9 | 53.7 | |
mistral-7b-instruct | 0.0 | 3.5 | 1.0 | 6.1 | — | 17.7 | 4.3 | — | 4.6 | 12.1 | |
mistral-large | 0.0 | 19.3 | 14.6 | 26.3 | — | 35.1 | 3.4 | — | 17.8 | 52.7 | |
mistral-large-2407 | 9.3 | 26.9 | 25.5 | 40.4 | — | 47.2 | 3.2 | — | 26.7 | 71.4 | |
mistral-medium-3 | 44.0 | 36.5 | 38.6 | 67.3 | — | 57.8 | 4.3 | — | 40.0 | 90.7 | |
mistral-saba | 13.0 | — | 22.6 | 40.3 | — | 42.4 | 4.1 | — | — | 67.7 | |
mistral-small | 6.3 | 14.8 | 15.7 | 31.3 | — | 38.1 | 4.3 | — | 14.1 | 56.3 | |
mistral-small-24b-instruct-2501 | — | — | — | — | — | 45.3 | — | 84.8 | — | — | |
mixtral-8x7b-instruct | 0.0 | 4.7 | 4.8 | 15.0 | — | 29.2 | 4.5 | — | 6.6 | 29.9 | |
nova-lite-v1 | 10.7 | 15.3 | 24.5 | 43.6 | 80.2 | 42.6 | 4.6 | 85.4 | 16.7 | 76.5 | |
nova-micro-v1 | 8.0 | 11.7 | 20.2 | 39.2 | 79.3 | 37.9 | 4.7 | 81.1 | 14.0 | 70.3 | |
nova-pro-v1 | 10.7 | 22.1 | 28.8 | 44.6 | 85.4 | 48.4 | 3.4 | 89.0 | 23.3 | 78.6 | |
phi-4-multimodal-instruct | 9.3 | 12.1 | 15.1 | 39.3 | — | 31.5 | 4.4 | — | 13.1 | 69.3 | |
phi-4-reasoning-plus | — | — | — | — | — | 68.9 | — | — | — | — | |
qwen-2.5-72b-instruct | 16.0 | 27.2 | 31.4 | 50.9 | — | 49.0 | 4.2 | 86.6 | 27.6 | 85.8 | |
qwen-2.5-7b-instruct | — | — | — | — | — | 36.4 | — | 84.8 | — | — | |
qwen-2.5-coder-32b-instruct | 12.0 | 28.3 | 25.0 | 44.3 | — | 41.7 | 3.8 | 92.7 | 29.5 | 76.7 | |
qwen-turbo | 12.0 | 15.8 | 22.1 | 46.3 | — | 41.0 | 4.2 | — | 16.3 | 80.5 | |
qwen3-30b-a3b-instruct-2507 | 26.0 | 29.3 | 29.9 | 56.1 | — | 65.8 | 4.6 | — | 32.2 | 86.3 | |
qwen3-8b | 24.3 | 18.5 | 25.3 | 53.6 | — | 45.2 | 2.8 | — | 20.2 | 82.8 | |
qwq-32b | 78.0 | 49.4 | 41.9 | 86.9 | — | 65.2 | 8.2 | — | 63.1 | 95.7 | |
qwq-32b-preview | 45.3 | 18.7 | 31.5 | 68.2 | — | 65.2 | 4.8 | — | 33.7 | 91.0 | |
r1-1776 | — | — | 22.2 | — | — | — | — | — | — | 95.4 | |
sonar | 48.7 | 26.2 | 32.4 | 65.2 | — | 47.1 | 7.3 | — | 29.5 | 81.7 | |
sonar-pro | 29.0 | 25.0 | 31.7 | 51.7 | — | 57.8 | 7.9 | — | 27.5 | 74.5 | |
sonar-reasoning | 77.0 | — | 38.0 | 84.6 | — | 62.3 | — | — | — | 92.1 | |
sonar-reasoning-pro | 79.0 | — | 50.7 | 87.4 | — | — | — | — | — | 95.7 | |
deepseek-r1-0528 | — | — | — | — | — | 81.0 | — | — | — | — | |
deepseek-r1-0528-qwen3-8b | 89.3 | 58.7 | 58.6 | 93.8 | 92.2 | 81.3 | 14.9 | — | 77.0 | 98.3 | |
gemma-3-27b-it | 25.3 | 17.4 | 25.2 | 56.8 | — | 42.6 | 4.7 | 87.8 | 13.7 | 88.3 | |
glm-4.5 | 87.3 | 54.3 | 56.1 | 92.6 | — | 78.2 | 12.2 | — | 73.8 | 97.9 | |
gpt-oss-120b | — | 50.1 | 61.3 | — | — | 79.2 | 18.5 | — | 63.9 | — | |
kimi-k2 | 69.3 | 36.5 | 48.6 | 83.2 | — | 76.6 | 7.0 | — | 55.6 | 97.1 | |
llama-3.3-70b-instruct | 0.0 | 19.3 | 15.7 | 24.1 | — | 50.5 | 4.4 | 88.4 | 19.8 | 48.3 | |
llama-4-maverick | 39.0 | 36.4 | 41.7 | 63.9 | — | 68.5 | 4.8 | — | 39.7 | 88.9 | |
qwen3-14b | 28.0 | 27.3 | 31.9 | 57.6 | — | 47.0 | 4.2 | — | 28.0 | 87.1 | |
qwen3-235b-a22b-2507 | 32.7 | 32.1 | 33.3 | 61.4 | — | 61.3 | 4.7 | — | 34.3 | 90.2 | |
qwen3-235b-a22b-thinking-2507 | 32.7 | 32.1 | 33.3 | 61.4 | — | 61.3 | 4.7 | — | 34.3 | 90.2 | |
qwen3-30b-a3b | 26.0 | 29.3 | 29.9 | 56.1 | — | 65.8 | 4.6 | — | 32.2 | 86.3 | |
qwen3-32b | 30.3 | 28.4 | 29.8 | 58.6 | — | 53.5 | 4.3 | — | 28.8 | 86.9 | |
qwen3-coder | 47.7 | 47.2 | 45.2 | 70.9 | — | 61.8 | 4.4 | — | 58.5 | 94.2 | |
DeepSeek-R1-Distill-Llama-70B | 67.0 | 28.9 | 34.5 | 80.2 | — | 65.2 | 6.1 | — | 26.6 | 93.5 | |
DeepSeek-R1-Distill-Qwen-1.5B | 17.7 | 6.8 | 11.1 | 43.2 | — | 33.8 | 3.3 | — | 7.0 | 68.7 | |
DeepSeek-R1-Distill-Qwen-14B | 66.7 | 30.7 | 33.3 | 80.8 | — | 59.1 | 4.4 | — | 37.6 | 94.9 | |
DeepSeek-V3 | 25.3 | 35.6 | 36.9 | 57.0 | 91.6 | 57.4 | 3.6 | — | 35.9 | 88.7 | |
QwQ-32B-Preview | 45.3 | 18.7 | 31.5 | 68.2 | — | 65.2 | 4.8 | — | 33.7 | 91.0 | |
Qwen2-72B-Instruct | 14.7 | 19.4 | 21.1 | 42.4 | — | 42.4 | 3.7 | 86.0 | 15.9 | 70.1 | |
gemma-2-27b-it | 29.7 | 20.2 | 20.1 | 41.9 | — | 35.7 | 3.7 | 51.8 | 27.9 | 54.1 | |
grok-2 | — | — | — | — | — | 56.0 | — | 88.4 | — | — | |
grok-3-mini-beta | — | — | — | — | — | 84.0 | — | — | — | — | |
grok-4 | 94.3 | 63.8 | 67.5 | 96.7 | — | 87.6 | 23.9 | — | 81.9 | 99.0 | |
glm-4.5 | 87.3 | 54.3 | 56.1 | 92.6 | — | 78.2 | 12.2 | — | 73.8 | 97.9 |
Showing 164 models with benchmark data; only the first 10 benchmarks are displayed.
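Rows in the table above follow a simple pipe-separated shape (model name, then one score per benchmark, with — for missing values), so they can be consumed programmatically. A minimal sketch, using three sample rows copied from the table; the `parse` and `rank` helpers are illustrative names, not part of this page:

```python
# Parse pipe-separated benchmark rows and rank models on one benchmark.
# Sample rows copied verbatim from the table above; "—" marks a missing score.
ROWS = """\
DeepSeek-R1 | 89.3 | 58.7 | 58.6 | 93.8 | 92.2 | 81.3 | 14.9 | — | 77.0 | 98.3
gpt-5 | 95.7 | 54.9 | 69.0 | 97.5 | — | 85.4 | 26.5 | — | 66.8 | 99.4
o3 | 90.3 | 59.7 | 67.1 | 94.8 | — | 83.0 | 20.0 | — | 78.4 | 99.2"""

COLUMNS = ["AIME", "AA Coding Index", "AAII", "AA Math Index", "DROP",
           "GPQA", "HLE", "HumanEval", "LiveCodeBench", "MATH-500"]

def parse(rows):
    """Map model name -> {benchmark: score or None}."""
    table = {}
    for line in rows.strip().splitlines():
        # Strip any leading/trailing pipes, then split into cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        model, scores = cells[0], cells[1:]
        table[model] = {col: (None if s == "—" else float(s))
                        for col, s in zip(COLUMNS, scores)}
    return table

def rank(table, benchmark):
    """Models that report a score on `benchmark`, best first."""
    scored = [(m, v[benchmark]) for m, v in table.items()
              if v[benchmark] is not None]
    return sorted(scored, key=lambda mv: mv[1], reverse=True)

table = parse(ROWS)
print(rank(table, "GPQA"))  # gpt-5 (85.4) ahead of o3 (83.0) and DeepSeek-R1 (81.3)
```

Treating missing scores as `None` (rather than 0) keeps models without a reported result out of the ranking instead of penalizing them.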