ui-tars-1.5-7b by parasail - AI Model Details, Pricing, and Performance Metrics

bytedance
ui-tars-1.5-7b
bytedance

ui-tars-1.5-7b

completions
byparasail

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

Context
128K
Input
$0.1 / 1M tokens
Output
$0.2 / 1M tokens
Accepts: text, image
Returns: text

Access ui-tars-1.5-7b through LangDB AI Gateway

Recommended

Integrate with bytedance's ui-tars-1.5-7b and 250+ other models through a unified API. Monitor usage, control costs, and enhance security.

Unified API
Cost Optimization
Enterprise Security
Get Started Now

Free tier available • No credit card required

Instant Setup
99.9% Uptime
10,000+Monthly Requests

Code Examples

Integration samples and API usage