qwen3-vl-8b-instruct by openrouter - AI Model Details, Pricing, and Performance Metrics

qwen
qwen3-vl-8b-instruct
Try
qwen

qwen3-vl-8b-instruct

completions
byopenrouter

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization. The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions.

Released
Oct 14, 2025
Knowledge
Apr 17, 2025
Context
131072
Input
$0.14 / 1M tokens
Output
$0.63 / 1M tokens
Capabilities: tools
Accepts: text, image
Returns: text

Access qwen3-vl-8b-instruct through LangDB AI Gateway

Recommended

Integrate with qwen's qwen3-vl-8b-instruct and 250+ other models through a unified API. Monitor usage, control costs, and enhance security.

Unified API
Cost Optimization
Enterprise Security
Get Started Now

Free tier available • No credit card required

Instant Setup
99.9% Uptime
10,000+Monthly Requests

Category Scores

Benchmark Tests

View Other Benchmarks
AA Coding Index
17.6
Programming
AAII
27.1
General
AA Math Index
27.3
Mathematics
GPQA
42.7
STEM (Physics, Chemistry, Biology)
HLE
2.9
General Knowledge
LiveCodeBench
33.2
Programming
MMLU-Pro
68.6
General Knowledge
SciCode
17.4
Scientific

Code Examples

Integration samples and API usage