Voice Recognition MCP Service
This service provides voice recognition and text extraction capabilities through both stdio and MCP modes.
Features
- Voice recognition from file
- Voice recognition from base64 encoded data
- Text extraction
- Support for both stdio and MCP modes
- Structured voice recognition results
- AIO protocol-compliant responses
Project Structure
- voice_service.py: core service implementation
- stdio_server.py: stdio mode entry point
- mcp_server.py: MCP mode entry point
- build.py: build script for the executables
- build_exec.sh: build execution script
- test_*.sh: test scripts for the individual features
Installation
- Clone the repository:
git clone https://github.com/AIO-2030/mcp_voice_identify.git
cd mcp_voice_identify
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables in a .env file:
API_URL=your_api_url
API_KEY=your_api_key
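The service is configured through API_URL and API_KEY. To check these settings from your own Python tooling, a minimal sketch like the one below can parse the .env file. It assumes simple KEY=value lines and lets real environment variables take precedence; load_env_file and require_api_settings are hypothetical helpers, not part of this repository (the python-dotenv package is a common ready-made alternative).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=value lines from a .env file into a dict.

    Assumption: one pair per line; blank lines and '#' comments are
    skipped, and surrounding single or double quotes are stripped.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_settings(path: str = ".env") -> tuple[str, str]:
    """Return (API_URL, API_KEY), preferring real environment variables."""
    file_values = load_env_file(path)
    settings = {}
    for name in ("API_URL", "API_KEY"):
        value = os.environ.get(name) or file_values.get(name)
        if not value:
            raise RuntimeError(f"{name} is not set in the environment or in {path}")
        settings[name] = value
    return settings["API_URL"], settings["API_KEY"]
```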
Usage
stdio Mode
- Run the service using the stdio mode entry point, stdio_server.py.
- Send JSON-RPC requests via stdin:
{
"jsonrpc": "2.0",
"method": "help",
"params": {},
"id": 1
}
- Or use the prebuilt executable, dist/voice_stdio (see Building Executables).
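Requests can also be sent programmatically. The sketch below assumes the server reads one JSON-RPC request per line on stdin and writes one JSON response per line on stdout; call_stdio is a hypothetical helper, not part of this repository.

```python
import json
import subprocess
from typing import Any, Optional


def call_stdio(command: list[str], method: str,
               params: Optional[dict[str, Any]] = None,
               request_id: int = 1, timeout: float = 30.0) -> dict[str, Any]:
    """Send one JSON-RPC request to a stdio server and return its parsed reply.

    Assumption: newline-delimited JSON on both stdin and stdout.
    """
    request = {"jsonrpc": "2.0", "method": method,
               "params": params or {}, "id": request_id}
    completed = subprocess.run(
        command,
        input=json.dumps(request) + "\n",  # one request per line
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the server exits with a non-zero status
    )
    # Treat the first non-empty stdout line as the response.
    for line in completed.stdout.splitlines():
        if line.strip():
            return json.loads(line)
    raise RuntimeError(f"no response on stdout; stderr was: {completed.stderr!r}")
```

For example, call_stdio([sys.executable, "stdio_server.py"], "help") would launch the entry point with the current interpreter (assuming it runs as a plain script) and return the help response. Because subprocess.run closes stdin after writing the request, each call starts a fresh process; a long-lived client would keep a subprocess.Popen open instead.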
MCP Mode
- Run the service using the MCP mode entry point, mcp_server.py.
- Or use the prebuilt executable, dist/voice_mcp (see Building Executables).
Response Format
The service follows the AIO protocol for response formatting. Here are examples of different response types:
Voice Recognition Response
{
"jsonrpc": "2.0",
"output": {
"type": "voice",
"message": "Voice processed successfully",
"text": "test test test",
"metadata": {
"language": "en",
"emotion": "unknown",
"audio_type": "speech",
"speaker": "woitn",
"raw_text": "test test test"
}
},
"id": 1
}
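On the service side, a recognition result could be wrapped in this envelope with a small helper such as the hypothetical voice_response below. The default metadata values and the fallback of raw_text to the recognized text are assumptions, not documented behavior.

```python
from typing import Any, Optional


def voice_response(text: str, request_id: int, *, language: str = "unknown",
                   emotion: str = "unknown", audio_type: str = "speech",
                   speaker: str = "unknown",
                   raw_text: Optional[str] = None) -> dict[str, Any]:
    """Wrap a recognition result in the voice response shape shown above."""
    return {
        "jsonrpc": "2.0",
        "output": {
            "type": "voice",
            "message": "Voice processed successfully",
            "text": text,
            "metadata": {
                "language": language,
                "emotion": emotion,
                "audio_type": audio_type,
                "speaker": speaker,
                # Assumption: raw_text defaults to the recognized text.
                "raw_text": text if raw_text is None else raw_text,
            },
        },
        "id": request_id,
    }
```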
Help Information Response
{
"jsonrpc": "2.0",
"result": {
"type": "voice_service",
"description": "This service provides voice recognition and text extraction services",
"author": "AIO-2030",
"version": "1.0.0",
"github": "https://github.com/AIO-2030/mcp_voice_identify",
"transport": ["stdio"],
"methods": [
{
"name": "help",
"description": "Show this help information."
},
{
"name": "identify_voice",
"description": "Identify voice from file",
"inputSchema": {
"type": "object",
"properties": {
"file_path": {
"type": "string",
"description": "Voice file path"
}
},
"required": ["file_path"]
}
},
{
"name": "identify_voice_base64",
"description": "Identify voice from base64 encoded data",
"inputSchema": {
"type": "object",
"properties": {
"base64_data": {
"type": "string",
"description": "Base64 encoded voice data"
}
},
"required": ["base64_data"]
}
},
{
"name": "extract_text",
"description": "Extract text",
"inputSchema": {
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "Text to extract"
}
},
"required": ["text"]
}
}
]
},
"id": 1
}
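Because the help response publishes an inputSchema for each method, a client can check its parameters before sending a request. This sketch checks required fields and "string" types only, not full JSON Schema, and base64-encodes an audio file for identify_voice_base64. It assumes the service expects standard base64 without a data: URI prefix; build_request and encode_audio_file are hypothetical helpers.

```python
import base64
from pathlib import Path
from typing import Any


def build_request(help_result: dict[str, Any], method: str,
                  params: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC request after checking params against inputSchema."""
    methods = {m["name"]: m for m in help_result.get("methods", [])}
    if method not in methods:
        raise ValueError(f"unknown method {method!r}; available: {sorted(methods)}")
    schema = methods[method].get("inputSchema", {})
    missing = [name for name in schema.get("required", []) if name not in params]
    if missing:
        raise ValueError(f"{method} is missing required params: {missing}")
    props = schema.get("properties", {})
    for name, value in params.items():
        expected = props.get(name, {}).get("type")
        # The methods listed above only declare "string" parameters.
        if expected == "string" and not isinstance(value, str):
            raise TypeError(f"{method}.{name} must be a string")
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def encode_audio_file(path: str) -> str:
    """Read an audio file and return its bytes as standard base64 text."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

For complete validation, the third-party jsonschema package can check params against each inputSchema directly.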
Error Response
{
"jsonrpc": "2.0",
"output": {
"type": "error",
"message": "503 Server Error: Service Unavailable",
"error_code": 503
},
"id": 1
}
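Voice and error responses carry their payload in output, while the help response uses result. A client can normalize both envelopes and raise on errors, as in this sketch; parse_reply, ServiceReply, and VoiceServiceError are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ServiceReply:
    kind: str                # e.g. "voice" or "voice_service"
    payload: dict[str, Any]  # the body of output or result


class VoiceServiceError(RuntimeError):
    def __init__(self, message: str, error_code: Optional[int]):
        super().__init__(message)
        self.error_code = error_code


def parse_reply(response: dict[str, Any]) -> ServiceReply:
    """Normalize the output/result envelopes and raise on error responses."""
    if "output" in response:
        body = response["output"]
    elif "result" in response:
        body = response["result"]
    else:
        raise ValueError("response has neither 'output' nor 'result'")
    kind = body.get("type", "unknown")
    if kind == "error":
        raise VoiceServiceError(body.get("message", "unknown error"),
                                body.get("error_code"))
    return ServiceReply(kind=kind, payload=body)
```

Standard JSON-RPC 2.0 responses contain either result or error, so a generic JSON-RPC client library may not recognize the output member without a shim like this.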
Response Fields
The service provides three types of responses:
- Voice Recognition Response (uses the output field):
| Field | Description | Example Value |
|-----------|--------------------------------------|---------------|
| type | Response type | "voice" |
| message | Status message | "Voice processed successfully" |
| text | Recognized text content | "test test test" |
| metadata | Additional information | See below |
- Help Information Response (uses the result field):
| Field | Description | Example Value |
|---------------|--------------------------------------|---------------|
| type | Service type | "voice_service" |
| description | Service description | "This service provides..." |
| author | Service author | "AIO-2030" |
| version | Service version | "1.0.0" |
| github | GitHub repository URL | "https://github.com/..." |
| transport | Supported transport modes | ["stdio"] |
| methods | Available methods | See methods list |
- Error Response (uses the output field):
| Field | Description | Example Value |
|-------------|--------------------------------------|---------------|
| type | Response type | "error" |
| message | Error message | "503 Server Error: Service Unavailable" |
| error_code | HTTP status code | 503 |
Metadata Fields
The metadata field in voice recognition responses contains:
| Field | Description | Example Value |
|------------|--------------------------|------------------|
| language | Language code | "en" |
| emotion | Emotion state | "unknown" |
| audio_type | Audio type | "speech" |
| speaker | Speaker identifier | "woitn" |
| raw_text | Original recognized text | "test test test" |
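A typed view of these fields can make downstream code less error-prone. In this sketch the VoiceMetadata class is hypothetical; its defaults for missing keys are an assumption, and keys not listed in the table are ignored so that extra fields from the upstream API do not break parsing.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class VoiceMetadata:
    # Assumed defaults for keys the service omits.
    language: str = "unknown"
    emotion: str = "unknown"
    audio_type: str = "unknown"
    speaker: str = "unknown"
    raw_text: str = ""

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "VoiceMetadata":
        """Build from a metadata dict, keeping only the documented keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: str(v) for k, v in data.items() if k in known})
```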
Building Executables
- Make the build script executable:
- Build stdio mode executable:
- Build MCP mode executable:
The executables will be created at:
- stdio mode:
dist/voice_stdio
- MCP mode:
dist/voice_mcp
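A launcher could prefer these built executables and fall back to the Python entry points when they have not been built yet. The sketch assumes the entry points run as plain scripts with the current interpreter; launch_command is a hypothetical helper, and it checks only that the file exists, not that it is executable.

```python
import sys
from pathlib import Path


def launch_command(mode: str, project_dir: str = ".") -> list[str]:
    """Return a command list for mode "stdio" or "mcp".

    Prefers the built executable under dist/ and otherwise runs the
    matching Python entry point with the current interpreter.
    """
    entry_points = {
        "stdio": ("dist/voice_stdio", "stdio_server.py"),
        "mcp": ("dist/voice_mcp", "mcp_server.py"),
    }
    if mode not in entry_points:
        raise ValueError(f"mode must be one of {sorted(entry_points)}")
    executable, script = entry_points[mode]
    root = Path(project_dir)
    built = root / executable
    if built.is_file():
        return [str(built)]
    return [sys.executable, str(root / script)]
```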
Testing
Run the test scripts:
chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh
License
This project is licensed under the MIT License - see the LICENSE file for details.