phi-4-multimodal-instruct by deepinfra - AI Model Details, Pricing, and Performance Metrics

microsoft
phi-4-multimodal-instruct
microsoft

phi-4-multimodal-instruct

completions
bydeepinfra

Phi-4 Multimodal Instruct is a versatile 5.6B parameter foundation model that combines advanced reasoning and instruction-following capabilities across both text and visual inputs, providing accurate text outputs. The unified architecture enables efficient, low-latency inference, suitable for edge and mobile deployments. Phi-4 Multimodal Instruct supports text inputs in multiple languages including Arabic, Chinese, English, French, German, Japanese, Spanish, and more, with visual input optimized primarily for English. It delivers impressive performance on multimodal tasks involving mathematical, scientific, and document reasoning, providing developers and enterprises a powerful yet compact model for sophisticated interactive applications. For more information, see the [Phi-4 Multimodal blog post](https://azure.microsoft.com/en-us/blog/empowering-innovation-the-next-generation-of-the-phi-family/).

Released
Feb 1, 2025
Knowledge
Jun 1, 2024
License
MIT
Context
131072
Input
$0.05 / 1M tokens
Output
$0.1 / 1M tokens
Accepts: text, image
Returns: text

Access phi-4-multimodal-instruct through LangDB AI Gateway

Recommended

Integrate with microsoft's phi-4-multimodal-instruct and 250+ other models through a unified API. Monitor usage, control costs, and enhance security.

Unified API
Cost Optimization
Enterprise Security
Get Started Now

Free tier available • No credit card required

Instant Setup
99.9% Uptime
10,000+Monthly Requests

Category Scores

Benchmark Tests

View Other Benchmarks
AIME
9.3
Mathematics
AA Coding Index
12.1
Programming
AAII
12.4
General
GPQA
31.5
STEM (Physics, Chemistry, Biology)
HLE
4.4
General Knowledge
LiveCodeBench
13.1
Programming
MATH-500
69.3
Mathematics
MMLU-Pro
48.5
General Knowledge
MMMU
55.1
General Knowledge
SciCode
11.0
Scientific

Code Examples

Integration samples and API usage