llama-3.2-90b-vision-instruct

completions

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks. This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Input:$1.2 / 1M tokens

Output:$1.2 / 1M tokens

Context:131072 tokens

text

image

text

Access llama-3.2-90b-vision-instruct through LangDB AI Gateway

Recommended

Integrate with meta-llama's llama-3.2-90b-vision-instruct and 250+ other models through a unified API. Monitor usage, control costs, and enhance security.

Unified API

Cost Optimization

Enterprise Security

Get Started Now

Free tier available • No credit card required

Instant Setup

99.9% Uptime

10,000+Monthly Requests

Code Example

Configuration

Base URL

API Keys

Headers

Project ID in header

X-Run-Id

X-Thread-Id

Model Parameters

11 available

frequency_penalty

-202

logit_bias

max_tokens

min_p

001

presence_penalty

-201.999

repetition_penalty

012

response_format

stop

temperature

012

top_k

top_p

011

Additional Configuration

Tools

Guards

User:

Id:

Name:

Tags:

Publicly Shared Threads0

Discover shared experiences

Shared threads will appear here, showcasing real-world applications and insights from the community. Check back soon for updates!

Share your threads to help others

Popular Models10

deepseek-chat-v3-0324
deepseek
DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs really well on a variety of tasks.
Input:$0.25 / 1M tokens
Output:$0.85 / 1M tokens
Context:163840 tokens
text
text
deepseek-r1-0528
deepseek
May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1) Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model.
Input:$0.27 / 1M tokens
Output:$0.27 / 1M tokens
Context:163840 tokens
text
text
gpt-4o-mini
openai
GPT-4o mini (o for omni) is a fast, affordable small model for focused tasks. It accepts both text and image inputs, and produces text outputs (including Structured Outputs). It is ideal for fine-tuning, and model outputs from a larger model like GPT-4o can be distilled to GPT-4o-mini to produce similar results at lower cost and latency.The knowledge cutoff for GPT-4o-mini models is October, 2023.
Input:$0.15 / 1M tokens
Output:$0.6 / 1M tokens
Context:128K tokens
tools
text
image
text
qwen3-235b-a22b-2507
qwen
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.
Input:$0.12 / 1M tokens
Output:$0.12 / 1M tokens
Context:262144 tokens
text
text
deepseek-r1
deepseek
DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120). MIT licensed: Distill & commercialize freely!
Input:$0.4 / 1M tokens
Output:$2 / 1M tokens
Context:163840 tokens
text
text
gemini-2.0-flash
gemini
Google's most capable multi-modal model with great performance across all tasks, with a 1 million token context window, and built for the era of Agents.
Input:$0.1 / 1M tokens
Output:$0.4 / 1M tokens
Context:1M tokens
tools
text
image
audio
video
text
mai-ds-r1
microsoft
MAI-DS-R1 is a post-trained variant of DeepSeek-R1 developed by the Microsoft AI team to improve the model’s responsiveness on previously blocked topics while enhancing its safety profile. Built on top of DeepSeek-R1’s reasoning foundation, it integrates 110k examples from the Tulu-3 SFT dataset and 350k internally curated multilingual safety-alignment samples. The model retains strong reasoning, coding, and problem-solving capabilities, while unblocking a wide range of prompts previously restricted in R1. MAI-DS-R1 demonstrates improved performance on harm mitigation benchmarks and maintains competitive results across general reasoning tasks. It surpasses R1-1776 in satisfaction metrics for blocked queries and reduces leakage in harmful content categories. The model is based on a transformer MoE architecture and is suitable for general-purpose use cases, excluding high-stakes domains such as legal, medical, or autonomous systems.
Input:$0.3 / 1M tokens
Output:$0.3 / 1M tokens
Context:163840 tokens
text
text
grok-4
x-ai
Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not exposed, reasoning cannot be disabled, and the reasoning effort cannot be specified. Pricing increases once the total tokens in a given request is greater than 128k tokens. See more details on the [xAI docs](https://docs.x.ai/docs/models/grok-4-0709)
Input:$3 / 1M tokens
Output:$15 / 1M tokens
Context:256K tokens
tools
text
image
text
claude-3.7-sonnet
anthropic
Intelligent model, with visible step‑by‑step reasoning
Input:$3 / 1M tokens
Output:$15 / 1M tokens
Context:200K tokens
tools
text
text
image
deepseek-chat
deepseek
DeepSeek-Chat is an advanced conversational AI model designed to provide intelligent
Input:$0.14 / 1M tokens
Output:$0.28 / 1M tokens
Context:64K tokens
tools
text
text