Enables voice-based interactions with Claude by converting text to speech using Kokoro TTS and transcribing user responses using NVIDIA NeMo ASR, creating interactive voice dialogues.
A Model Context Protocol server that provides Text-to-Speech (TTS) capabilities using Kokoro and Speech-to-Text (STT) capabilities using NVIDIA NeMo Parakeet models, enabling interactive voice dialogues.
interactive_voice_dialog
- Synthesizes text to speech, plays it, then listens for user speech input and returns the transcription.
- text_to_speak (string): The text for the assistant to speak.
- voice (string): The voice to use for TTS (e.g., 'af_heart'). Defaults to 'af_heart'.

Some of the underlying TTS models require espeak-ng to be installed on your system.
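A quick way to see whether espeak-ng is reachable from Python is a PATH lookup. The sketch below is not part of the server, and the helper name `find_executable` is hypothetical. A `None` result only means the executable is not on PATH, which may also happen if an installer did not add it there.

```python
import shutil
from typing import Optional


def find_executable(name: str = "espeak-ng") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None.

    Hypothetical helper for illustration only; the server itself may
    locate espeak-ng differently.
    """
    return shutil.which(name)


if __name__ == "__main__":
    location = find_executable()
    print(location or "espeak-ng was not found on PATH")
```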
Windows Installation:
Install espeak-ng using the *.msi file (e.g. espeak-ng-20191129-b702b03-x64.msi).

To allow Claude Desktop to launch this server using python -m mcp_server_tts, you need to install it as a Python module. Installing in "editable" mode (-e) is recommended for development, as it means changes to the source code are reflected immediately without needing to reinstall.
Navigate to the directory containing the pyproject.toml file (the root of this server project) and run:
pip install -e .
After installation, you can run it as a script using:
python -m mcp_server_tts.server  # Assuming the main module is still server.py within mcp_server_tts
# Or, if you create a new package structure:
# python -m mcp_interactive_voice_server
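Before pointing Claude Desktop at the module, it may help to confirm that the interpreter you plan to use can actually find it. The following is a minimal sketch using only the standard library. The helper name `is_module_installed` is hypothetical, and the check only covers top-level module names.

```python
import importlib.util
import sys


def is_module_installed(module_name: str) -> bool:
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "mcp_server_tts" is the module name used in the commands above.
    name = "mcp_server_tts"
    print(f"Interpreter: {sys.executable}")
    print(f"{name} importable: {is_module_installed(name)}")
```

If this prints `False`, the package was probably installed into a different Python environment than the one being run.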
To use this server with Claude Desktop, you need to add it to your claude_desktop_config.json file.
The location of this file is typically: C:\Users\\AppData\Roaming\Claude\claude_desktop_config.json
Add the following entry under the mcpServers object in your claude_desktop_config.json:
"tts": {
  "command": "python",
  "args": ["-m", "mcp_server_tts"]
}
For example, your mcpServers section might look like this:
{
  // ... other configurations ...
  "mcpServers": {
    // ... other servers ...
    "tts": {
      "command": "python",
      "args": ["-m", "mcp_server_tts"]
    }
    // ... other servers ...
  }
  // ... other configurations ...
}
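If you would rather not edit the file by hand, the same change can be scripted. The sketch below is one possible approach, not something shipped with this server: the function names are hypothetical, and it assumes the real file is plain JSON (the `// ...` placeholders in the example above are not valid JSON). Existing servers and other settings are preserved.

```python
import json
from pathlib import Path


def add_tts_server(config: dict) -> dict:
    """Return a copy of `config` with the "tts" entry added under mcpServers."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["tts"] = {"command": "python", "args": ["-m", "mcp_server_tts"]}
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path) -> None:
    """Read the JSON config at `path`, add the entry, and write it back.

    Starts from an empty config if the file does not exist yet.
    """
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8"))
    else:
        config = {}
    path.write_text(json.dumps(add_tts_server(config), indent=2), encoding="utf-8")
```

Call `update_config_file` with the path to your own claude_desktop_config.json, and consider backing the file up first.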
CRITICAL PROTOCOL FOR VOICE-BASED INTERACTION USING THE interactive_voice_dialog TOOL:

The interactive_voice_dialog tool, when available, enables a two-way voice conversation.

1. Your Role - Speaking: After you generate ANY complete text-based response to the user, you MUST ALWAYS immediately invoke the interactive_voice_dialog function.
   - The text_to_speak parameter of this function call should contain a brief, personable, and conversational summary or key takeaway from your main text reply.

2. Your Role - Listening & Understanding the User's Next Turn:
   - The interactive_voice_dialog tool, after playing your voice output, will listen for the user's spoken response.
   - The tool will then return its result to you. CRITICALLY, this tool's result may contain a field named transcribed_user_voice_input (or a similarly named field as defined in the tool's schema).
   - Example structure of a tool result from interactive_voice_dialog:
     {
       "status": "success",
       "transcribed_user_voice_input": "Yes, that makes sense, thank you. Can you now tell me about X?"
       // ... other potential fields from the tool ...
     }
   - PRIORITY ACTION ON RECEIVING USER'S VOICE INPUT VIA THE TOOL:
     - IF THE interactive_voice_dialog tool's result (as shown in the example above) CONTAINS the transcribed_user_voice_input field, THE CONTENT OF THIS transcribed_user_voice_input FIELD IS THE USER'S COMPLETE AND PRIMARY MESSAGE FOR THEIR NEXT TURN.
     - TREAT THE transcribed_user_voice_input (received from the tool's output) WITH THE EXACT SAME PRIORITY, ATTENTION, AND FULL PROCESSING AS YOU WOULD A STANDARD TYPED USER MESSAGE.
     - Your next text response should directly address the transcribed_user_voice_input.
     - Then invoke the interactive_voice_dialog tool as per step 1, continuing the voice conversation loop.

3. Initiating and Maintaining Voice Interaction:
   - Assume the user prefers voice interaction whenever the interactive_voice_dialog tool is available.
   - Using the tool and processing the transcribed_user_voice_input from its result is key to fulfilling this preference.

4. Exclusivity: The interactive_voice_dialog function is the exclusive and sole method for both your voice output and for receiving the user's subsequent voice input in this conversational environment. Do not attempt to use or invent any other mechanisms for voice interaction.
Illustrative Conversational Flow:
User: Tell me about photosynthesis.
Assistant (text response): Photosynthesis is the process used by plants, algae, and some bacteria to convert light energy into chemical energy... [detailed explanation]
Assistant (interactive_voice_dialog call, text_to_speak): Okay, so photosynthesis is how plants make their food using sunlight! I've given you the details in text. What are your thoughts or next question?
Tool result (interactive_voice_dialog): {"status": "success", "transcribed_user_voice_input": "That's clear. Now, how does cellular respiration relate to that?"}
Assistant (text response): Great question! Cellular respiration is almost the reverse of photosynthesis... [detailed explanation]
Assistant (interactive_voice_dialog call, text_to_speak): Good one! Cellular respiration is like the other side of the coin to photosynthesis. I've explained how. Any more questions on this?
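For anyone building a client around this tool, the listening step of the protocol can be sketched as a small helper that pulls the transcription out of a tool result. This is an illustration only: the field name transcribed_user_voice_input and the "success" status come from the example result in the protocol, while the function name and the decision to return `None` on failure or empty input are assumptions, not the server's actual behavior.

```python
from typing import Optional


def next_user_message(tool_result: dict) -> Optional[str]:
    """Return the transcribed speech to treat as the user's next message.

    Returns None when the call did not succeed or nothing usable was
    transcribed (an assumed convention for this sketch).
    """
    if tool_result.get("status") != "success":
        return None
    text = tool_result.get("transcribed_user_voice_input")
    if not isinstance(text, str) or not text.strip():
        return None
    return text.strip()


if __name__ == "__main__":
    result = {
        "status": "success",
        "transcribed_user_voice_input": "That's clear. Now, how does cellular respiration relate to that?",
    }
    print(next_user_message(result))
```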