qwerky-72b:free

completions

Qwerky-72B is a linear-attention RWKV variant of the Qwen 2.5 72B model, optimized to significantly reduce computational cost at scale. Leveraging linear attention, it achieves substantial inference speedups (>1000x) while retaining competitive accuracy on common benchmarks like ARC, HellaSwag, Lambada, and MMLU. It inherits knowledge and language support from Qwen 2.5, supporting approximately 30 languages, making it suitable for efficient inference in large-context applications.

Input:Free

Output:Free

Context:32768 tokens

text

Access qwerky-72b:free through LangDB AI Gateway

Recommended

Integrate with featherless's qwerky-72b:free and 250+ other models through a unified API. Monitor usage, control costs, and enhance security.

Unified API

Cost Optimization

Enterprise Security

Get Started Now

Free tier available • No credit card required

Instant Setup

99.9% Uptime

10,000+Monthly Requests

Code Example

Configuration

Base URL

API Keys

Headers

Project ID in header

X-Run-Id

X-Thread-Id

Model Parameters

10 available

frequency_penalty

-202

max_tokens

min_p

001

presence_penalty

-201.999

repetition_penalty

012

seed

stop

temperature

012

top_k

top_p

011

Additional Configuration

Tools

Guards

User:

Id:

Name:

Tags:

Publicly Shared Threads0

Discover shared experiences

Shared threads will appear here, showcasing real-world applications and insights from the community. Check back soon for updates!

Share your threads to help others

Popular Models10

gpt-4.1-mini
openai
GPT-4.1 mini provides a balance between intelligence, speed, and cost that makes it an attractive model for many use cases.
Input:$0.4 / 1M tokens
Output:$1.6 / 1M tokens
Context:1047576 tokens
tools
text
image
text
claude-sonnet-4
anthropic
Our high-performance model with exceptional reasoning and efficiency
Input:$3 / 1M tokens
Output:$15 / 1M tokens
Context:200K tokens
tools
text
image
text
claude-opus-4
anthropic
Our most capable and intelligent model yet. Claude Opus 4 sets new standards in complex reasoning and advanced coding
Input:$15 / 1M tokens
Output:$75 / 1M tokens
Context:200K tokens
tools
text
image
text
gpt-4.1
openai
GPT-4.1 is OpenAI's flagship model for complex tasks. It is well suited for problem solving across domains.
Input:$2 / 1M tokens
Output:$8 / 1M tokens
Context:1047576 tokens
tools
text
image
text
gemini-2.5-pro-preview
gemini
Gemini 2.5 Pro Experimental is Google's state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context.
Input:$1.25 / 1M tokens
Output:$10 / 1M tokens
Context:1M tokens
tools
text
image
audio
video
text
grok-4
xai
Grok 4 is the latest and greatest flagship model, offering unparalleled performance in natural language, math and reasoning - the perfect jack of all trades.
Input:$3 / 1M tokens
Output:$15 / 1M tokens
Context:256K tokens
tools
text
text
gemini-2.5-flash-preview
gemini
Google's best model in terms of price-performance, offering well-rounded capabilities. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model.
Input:$0.15 / 1M tokens
Output:$0.6 / 1M tokens
Context:1M tokens
tools
text
image
audio
video
text
gemini-2.0-flash
gemini
Google's most capable multi-modal model with great performance across all tasks, with a 1 million token context window, and built for the era of Agents.
Input:$0.1 / 1M tokens
Output:$0.4 / 1M tokens
Context:1M tokens
tools
text
image
audio
video
text
claude-3.7-sonnet
anthropic
Intelligent model, with visible step‑by‑step reasoning
Input:$3 / 1M tokens
Output:$15 / 1M tokens
Context:200K tokens
tools
text
text
image
gemini-2.0-flash-lite
gemini
Google's smallest and most cost effective model, built for at scale usage.
Input:$0.07 / 1M tokens
Output:$0.3 / 1M tokens
Context:1M tokens
text
image
audio
video
text