An MCP server that downloads videos and extracts audio from platforms such as YouTube, Bilibili, and TikTok, then transcribes the audio to text using OpenAI's Whisper model. The server implements the Model Context Protocol (MCP) to provide standardized access to audio transcription services.
This service supports downloading videos and extracting audio from a wide range of platforms, including but not limited to YouTube, Bilibili, and TikTok.
For a complete list of supported platforms, see the yt-dlp supported sites list (https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md).
This project utilizes OpenAI's Whisper model for audio-to-text processing through MCP tools. The server exposes four main tools:
This server is built using the Model Context Protocol, which provides:
Important: On first run, the system will automatically download the Whisper model file (approximately 1GB). This process may take several minutes to tens of minutes, depending on your network conditions. The model file will be cached locally and won't need to be downloaded again for subsequent runs.
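If you want to avoid this delay on the first transcription request, you can pre-download the model yourself. This is a minimal sketch, assuming the server uses the openai-whisper Python package and its default cache directory (~/.cache/whisper); the exact package and cache location may differ in your installation:

# install the package and trigger the one-time model download
pip install -U openai-whisper
python -c "import whisper; whisper.load_model('base')"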
When using uv, no specific installation is needed; we will use uvx to run the video extraction server directly. If you don't have uv installed yet, install it with:
curl -LsSf https://astral.sh/uv/install.sh | sh
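Once uv is available, the server can be started straight from the package index (this is the same command the MCP client configuration below invokes):

uvx mcp-video-extraction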
FFmpeg is required for audio processing. You can install it through various methods:
# Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg

# macOS
brew install ffmpeg

# Windows (using Chocolatey)
choco install ffmpeg

# Windows (using Scoop)
scoop install ffmpeg
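You can check that FFmpeg is on your PATH before starting the server:

ffmpeg -version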
Add to your Claude/Cursor settings:
"mcpServers": { "video-extraction": { "command": "uvx", "args": ["mcp-video-extraction"] } }
The service can be configured through environment variables:
WHISPER_MODEL: Whisper model size (tiny/base/small/medium/large), default: 'base'
WHISPER_LANGUAGE: Language setting for transcription, default: 'auto'
YOUTUBE_FORMAT: Video format for download, default: 'bestaudio'
AUDIO_FORMAT: Audio format for extraction, default: 'mp3'
AUDIO_QUALITY: Audio quality setting, default: '192'
TEMP_DIR: Temporary file storage location, default: '/tmp/mcp-video'
DOWNLOAD_RETRIES: Number of download retries, default: 10
FRAGMENT_RETRIES: Number of fragment download retries, default: 10
SOCKET_TIMEOUT: Socket timeout in seconds, default: 30
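Most MCP clients let you pass these variables per server. Here is a sketch of how that could look in the configuration above, assuming your client (e.g. Claude Desktop or Cursor) supports an env block; the variable values shown are only examples:

"mcpServers": {
  "video-extraction": {
    "command": "uvx",
    "args": ["mcp-video-extraction"],
    "env": {
      "WHISPER_MODEL": "small",
      "WHISPER_LANGUAGE": "en",
      "TEMP_DIR": "/tmp/mcp-video"
    }
  }
}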
Performance tips:
GPU Acceleration: Whisper transcription runs significantly faster on a CUDA-capable GPU than on CPU.
Model Size Adjustment: smaller models (tiny, base) transcribe faster, while larger models (medium, large) are more accurate but slower and use more memory.
Use SSD storage for temporary files to improve I/O performance.
This server can be used with any MCP-compatible client, such as Claude Desktop and Cursor.
For more information about MCP, visit the Model Context Protocol site (https://modelcontextprotocol.io).
For the Chinese version of this documentation, please refer to README_zh.md.
MIT