llama-3.1-nemotron-ultra-253b-v1

completions

Published by: nvidia

Provider: openrouter

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and lower inference latency. The model supports a context length of up to 128K tokens (131,072) and can operate efficiently on a single 8x NVIDIA H100 node. Note: to enable reasoning, you must include `detailed thinking on` in the system prompt. See [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for details.
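The reasoning toggle described above can be sketched as a small helper that builds the request body. This is a minimal illustration, not an official client: `build_request` is a hypothetical name, and the `detailed thinking off` string used for the non-reasoning case is an assumption that should be checked against the linked model card. Only `detailed thinking on` is stated on this page.

```python
import json

MODEL = "llama-3.1-nemotron-ultra-253b-v1"


def build_request(user_prompt: str, reasoning: bool = True) -> dict:
    """Build a chat-completions request body with the reasoning toggle set.

    Hypothetical helper for illustration only.
    """
    # "detailed thinking on" comes from the model description; the "off"
    # counterpart is an assumption to verify against the model card.
    system_prompt = "detailed thinking on" if reasoning else "detailed thinking off"
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }


payload = build_request("What is 17 * 23?")
print(json.dumps(payload, indent=2))
```

The resulting dictionary has the same `model` and `messages` fields as the SDK and cURL samples further down, so it can be sent as the JSON body of a chat-completions request.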

Released: Apr 7, 2025

Knowledge cutoff: Dec 1, 2023

License: llama_3_1_community_license

Context: 131,072 tokens

Input: $0.60 / 1M tokens

Output: $1.80 / 1M tokens

Accepts: text

Returns: text
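As a rough sketch of how the listed prices and context size translate into per-request numbers, the helpers below estimate cost from token counts and check a request against the context limit. The function names are hypothetical, and the context check assumes that prompt and completion tokens share the same 131,072-token window; actual billing may differ from this simple estimate.

```python
from decimal import Decimal

# Prices and context size as listed on this page.
INPUT_USD_PER_M = Decimal("0.6")   # USD per 1M input tokens
OUTPUT_USD_PER_M = Decimal("1.8")  # USD per 1M output tokens
CONTEXT_TOKENS = 131072
ONE_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts."""
    total = INPUT_USD_PER_M * input_tokens + OUTPUT_USD_PER_M * output_tokens
    return total / ONE_MILLION


def fits_context(input_tokens: int, max_output_tokens: int) -> bool:
    """Check a request against the context limit.

    Assumes prompt and completion share one window (an assumption,
    not something this page states).
    """
    return input_tokens + max_output_tokens <= CONTEXT_TOKENS


print(estimate_cost_usd(2_000, 500))
print(fits_context(120_000, 8_000))
```

`Decimal` is used instead of floats so that small per-request amounts add up without binary rounding noise.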

Access llama-3.1-nemotron-ultra-253b-v1 through LangDB AI Gateway

Integrate with nvidia's llama-3.1-nemotron-ultra-253b-v1 and 250+ other models through a unified API. Monitor usage, control costs, and enhance security.

Unified API

Cost Optimization

Enterprise Security

Free tier available • No credit card required

Instant Setup

99.9% Uptime

10,000+ Monthly Requests

Benchmark Results for llama-3.1-nemotron-ultra-253b-v1

Category Performance Scores:

Maths: Score 63.70 (Top 42% - Rank #147)
Finance: Score 39.35 (Top 49% - Rank #171)
Science: Score 70.19 (Top 41% - Rank #143)
Writing: Score 43.97 (Top 63% - Rank #220)
Academia: Score 44.70 (Top 53% - Rank #185)
Marketing: Score 39.50 (Top 70% - Rank #244)
Programming: Score 13.10 (Top 77% - Rank #268)

Overall Performance: 44.93 average score across all categories
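The overall figure appears to be the unweighted mean of the seven category scores. That is an inference from the numbers rather than a documented formula, and the sketch below simply recomputes it from the rounded scores listed above.

```python
from statistics import fmean

# Category scores as listed under "Category Performance Scores".
category_scores = {
    "Maths": 63.70,
    "Finance": 39.35,
    "Science": 70.19,
    "Writing": 43.97,
    "Academia": 44.70,
    "Marketing": 39.50,
    "Programming": 13.10,
}

# Unweighted mean across categories (assumed aggregation method).
overall = fmean(category_scores.values())
print(f"Overall: {overall:.2f}")
```

The result matches the listed overall figure to two decimal places; any difference beyond that is presumably due to the category scores being rounded for display.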

Detailed Benchmark Scores:

Benchmark	Score	Percentile	Domain
HLE	8.10	Top 41%	General Knowledge
AIME	74.70	Top 21%	Mathematics
GPQA	74.41	Top 43%	STEM (Physics, Chemistry, Biology)
SciCode	34.70	Top 46%	Scientific
MATH-500	95.20	Top 24%	Mathematics
MMLU-Pro	82.50	Top 32%	General Knowledge
LiveCodeBench	64.10	Top 33%	Programming
AA Math Index	63.70	Top 42%	Mathematics
AA Coding Index	13.10	Top 77%	Programming
AAII	15.00	Top 69%	General

GPQA Score: 74.41 - Graduate-level reasoning benchmark

Model Comparison:

Comparing against 348 models in the database
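The "Top N%" figures in the category list look consistent with each rank divided by the 348 models in the comparison set. The page does not state this formula, so the sketch below is an assumption checked only against the listed numbers; `top_percent` is a hypothetical helper.

```python
TOTAL_MODELS = 348  # size of the comparison set stated on this page

# Category ranks as listed under "Category Performance Scores".
category_ranks = {
    "Maths": 147,
    "Finance": 171,
    "Science": 143,
    "Writing": 220,
    "Academia": 185,
    "Marketing": 244,
    "Programming": 268,
}


def top_percent(rank: int, total: int = TOTAL_MODELS) -> int:
    """Express a rank as a whole-number 'Top N%' figure (assumed formula)."""
    return round(100 * rank / total)


for category, rank in category_ranks.items():
    print(f"{category}: rank #{rank} -> Top {top_percent(rank)}%")
```

Read this way, a lower percentage is better: rank #147 of 348 places the model in roughly the top 42% for Maths, while rank #268 places it in the top 77% for Programming.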

Compare with Similar Models

claude-opus-4.5

claude-opus-4.6

gemini-3-flash-preview

claude-sonnet-4.5

gemini-3-pro-preview

claude-sonnet-4

Code Examples

Integration samples and API usage

Code Samples for llama-3.1-nemotron-ultra-253b-v1

Python SDK Example:

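from openai import OpenAI

# The OpenAI SDK appends /chat/completions to base_url, so base_url ends in
# /v1 to reach the same path as the cURL example.
client = OpenAI(
    base_url="https://api.langdb.ai/projects/<your_project_id>/v1",
    api_key="<your_api_key>"
)

# Send a single-turn chat request to the model.
response = client.chat.completions.create(
    model="llama-3.1-nemotron-ultra-253b-v1",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

# Print the assistant's reply from the first choice.
print(response.choices[0].message.content)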

TypeScript SDK Example:

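import OpenAI from 'openai';

// The OpenAI SDK appends /chat/completions to baseURL, so baseURL ends in
// /v1 to reach the same path as the cURL example.
const client = new OpenAI({
    baseURL: "https://api.langdb.ai/projects/<your_project_id>/v1",
    apiKey: "<your_api_key>"
});

// Send a single-turn chat request to the model.
const response = await client.chat.completions.create({
    model: "llama-3.1-nemotron-ultra-253b-v1",
    messages: [
        { role: "user", content: "Hello, how are you?" }
    ]
});

// Print the assistant's reply from the first choice.
console.log(response.choices[0].message.content);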

cURL Example:

curl -X POST "https://api.langdb.ai/projects/<your_project_id>/v1/chat/completions" \
  -H "Authorization: Bearer <your_api_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-nemotron-ultra-253b-v1",
    "messages": [
        {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Model: llama-3.1-nemotron-ultra-253b-v1

Provider: openrouter

API Endpoint: https://api.langdb.ai

Related Models

Similar models from openrouter

aion-1.0

aion-1.0-mini

aion-2.0

aion-rp-llama-3.1-8b

coder-large

cogito-v2.1-671b
