Provides standardized Model Context Protocol access to extract and transcribe text from videos and audio across major platforms using OpenAI's Whisper for high-quality, multi-language speech recognition.
An MCP server that provides text extraction capabilities from various video platforms and audio files. This server implements the Model Context Protocol (MCP) to provide standardized access to audio transcription services.
This service supports downloading videos and extracting audio from a wide range of platforms via yt-dlp, including YouTube and many others. For the complete list of supported platforms, please visit the yt-dlp supported sites page.
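The download-and-extract step is handled by yt-dlp together with FFmpeg. As an illustration only (not necessarily how this server wires it internally), the sketch below mirrors the defaults documented later in this README (`bestaudio`, `mp3`, quality `192`, `/tmp/mcp-video`) using yt-dlp's Python API:

```python
from yt_dlp import YoutubeDL

# Illustrative options mirroring the server's documented defaults;
# the actual option set used by mcp-video-extraction may differ.
ydl_opts = {
    "format": "bestaudio",                        # YOUTUBE_FORMAT
    "outtmpl": "/tmp/mcp-video/%(id)s.%(ext)s",   # TEMP_DIR
    "retries": 10,                                # DOWNLOAD_RETRIES
    "fragment_retries": 10,                       # FRAGMENT_RETRIES
    "socket_timeout": 30,                         # SOCKET_TIMEOUT
    "postprocessors": [{
        "key": "FFmpegExtractAudio",
        "preferredcodec": "mp3",                  # AUDIO_FORMAT
        "preferredquality": "192",                # AUDIO_QUALITY
    }],
}

with YoutubeDL(ydl_opts) as ydl:
    # Replace with any URL from a yt-dlp supported site.
    ydl.download(["https://www.youtube.com/watch?v=EXAMPLE"])
```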
This project uses OpenAI's Whisper model for audio-to-text processing, exposed through MCP tools. The server provides four main tools covering media download and audio transcription.
This server is built on the Model Context Protocol, which provides a standardized way for MCP-compatible clients to discover and call its tools.
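For example, a Python MCP client can launch the server over stdio and invoke one of its tools. The snippet below uses the official `mcp` Python SDK; the tool name `transcribe_video` and its `url` argument are hypothetical placeholders, since the actual tool names are what `list_tools()` returns at runtime:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the server the same way the Claude/Cursor config below does.
    params = StdioServerParameters(command="uvx", args=["mcp-video-extraction"])

    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the tools the server actually exposes.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Hypothetical call; substitute a real tool name and arguments
            # from the list printed above.
            result = await session.call_tool(
                "transcribe_video",
                arguments={"url": "https://www.youtube.com/watch?v=EXAMPLE"},
            )
            print(result.content)


asyncio.run(main())
```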
Important: On first run, the system will automatically download the Whisper model file (approximately 1GB). This process may take several minutes to tens of minutes, depending on your network conditions. The model file will be cached locally and won't need to be downloaded again for subsequent runs.
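If you want to avoid that first-run delay, you can warm the cache ahead of time. This sketch assumes the server uses the `openai-whisper` package (an assumption; a different Whisper backend would cache its weights elsewhere):

```python
import whisper

# Downloads the model weights on first call (to ~/.cache/whisper by default)
# and reuses the cached file afterwards. Match the name to your
# WHISPER_MODEL setting (tiny/base/small/medium/large).
model = whisper.load_model("base")
print("Whisper model cached and ready")
```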
When using uv, no separate installation is needed; uvx runs the video extraction server directly. If uv is not installed yet, install it with:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
FFmpeg is required for audio processing. You can install it through various methods (a quick availability check follows the commands below):
```bash
# Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg

# macOS
brew install ffmpeg

# Windows (using Chocolatey)
choco install ffmpeg

# Windows (using Scoop)
scoop install ffmpeg
```
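To confirm FFmpeg is available before starting the server, a quick check (illustrative, standard library only) is:

```python
import shutil
import subprocess

# Audio extraction relies on the ffmpeg binary, so it should be on PATH.
if shutil.which("ffmpeg") is None:
    raise SystemExit("ffmpeg not found on PATH; install it using one of the commands above")

version_line = subprocess.run(
    ["ffmpeg", "-version"], capture_output=True, text=True, check=True
).stdout.splitlines()[0]
print(version_line)
```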
Add to your Claude/Cursor settings:
"mcpServers": { "video-extraction": { "command": "uvx", "args": ["mcp-video-extraction"] } }
The service can be configured through environment variables (see the sketch after the list below for one way to pass them when launching the server):
- `WHISPER_MODEL`: Whisper model size (tiny/base/small/medium/large), default: 'base'
- `WHISPER_LANGUAGE`: Language setting for transcription, default: 'auto'
- `YOUTUBE_FORMAT`: Video format for download, default: 'bestaudio'
- `AUDIO_FORMAT`: Audio format for extraction, default: 'mp3'
- `AUDIO_QUALITY`: Audio quality setting, default: '192'
- `TEMP_DIR`: Temporary file storage location, default: '/tmp/mcp-video'
- `DOWNLOAD_RETRIES`: Number of download retries, default: 10
- `FRAGMENT_RETRIES`: Number of fragment download retries, default: 10
- `SOCKET_TIMEOUT`: Socket timeout in seconds, default: 30

Performance tips:

- GPU Acceleration: Whisper transcription runs considerably faster on a CUDA-capable GPU than on CPU
- Model Size Adjustment: smaller models (tiny, base) are faster but less accurate; larger models (medium, large) are slower but more accurate
- Use SSD storage for temporary files to improve I/O performance
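As referenced above, one way to pass these variables is through the client that launches the server. A minimal sketch with the `mcp` Python SDK (the override values shown are just examples; unset variables fall back to the documented defaults):

```python
import os

from mcp import StdioServerParameters

# Merge overrides with the parent environment so PATH and friends remain
# available to the spawned uvx process.
server_env = {
    **os.environ,
    "WHISPER_MODEL": "small",
    "WHISPER_LANGUAGE": "en",
    "TEMP_DIR": "/tmp/mcp-video",
}

params = StdioServerParameters(
    command="uvx",
    args=["mcp-video-extraction"],
    env=server_env,
)
```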
This server can be used with any MCP-compatible client, such as Claude Desktop or Cursor.
For more information about MCP, visit Model Context Protocol.
For the Chinese version of this documentation, please refer to README_zh.md.
MIT