This service provides voice recognition and text extraction in both stdio and MCP modes. It accepts audio files or base64-encoded audio data and returns structured results that include language, emotion, and speaker information.
Project files:

- `voice_service.py`: Core service implementation
- `stdio_server.py`: stdio mode entry point
- `mcp_server.py`: MCP mode entry point
- `build.py`: Build script for executables
- `build_exec.sh`: Build execution script
- `test_*.sh`: Test scripts for different functionalities

Clone the repository:

    git clone https://github.com/AIO-2030/mcp_voice_identify.git
    cd mcp_voice_identify

Install the dependencies:

    pip install -r requirements.txt

Set the following environment variables in `.env`:

    API_URL=your_api_url
    API_KEY=your_api_key
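
As an illustration of how the two settings above might be consumed, here is a minimal sketch that reads `API_URL` and `API_KEY` from a `.env` file using only the standard library. The function names `parse_env` and `load_settings` are hypothetical and are not taken from the project; the service itself may load its configuration differently.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(path=".env"):
    """Return API_URL and API_KEY; real environment variables take priority."""
    file_values = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            file_values = parse_env(fh.read())
    settings = {}
    for name in ("API_URL", "API_KEY"):
        value = os.environ.get(name, file_values.get(name))
        if not value:
            raise RuntimeError(name + " is not set in the environment or " + path)
        settings[name] = value
    return settings
```

Failing early when either value is missing keeps a misconfigured service from starting and then returning errors on every request.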
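Run the service in stdio mode:

    python stdio_server.py

Send JSON-RPC requests on stdin, for example a help request:

    { "jsonrpc": "2.0", "method": "help", "params": {}, "id": 1 }

Or run the built stdio executable:

    ./dist/voice_stdio

Run the service in MCP mode:

    python mcp_server.py

Or run the built MCP executable:

    ./dist/voice_mcp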
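
The help request for stdio mode can also be sent from a script. The sketch below builds the same JSON-RPC envelope and pipes it to a child process. It assumes that the server reads one JSON request from stdin and writes exactly one JSON response to stdout, which this document does not confirm; `make_request` and `call_stdio` are hypothetical helper names.

```python
import json
import subprocess


def make_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request in the shape of the help example."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params or {},
        "id": request_id,
    }


def call_stdio(command, request, timeout=30.0):
    """Send one request to a stdio server process and parse its reply.

    Assumption: stdout contains a single JSON document and nothing else.
    """
    completed = subprocess.run(
        command,
        input=json.dumps(request) + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


# Example call, to be run from the repository root:
# response = call_stdio(["python", "stdio_server.py"], make_request("help"))
```

If the server also writes log lines to stdout, `json.loads` will fail and the output would need to be filtered first.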
The service follows the AIO protocol for response formatting. Here are examples of different response types:
Voice recognition response:

    {
      "jsonrpc": "2.0",
      "output": {
        "type": "voice",
        "message": "Voice processed successfully",
        "text": "test test test",
        "metadata": {
          "language": "en",
          "emotion": "unknown",
          "audio_type": "speech",
          "speaker": "woitn",
          "raw_text": "test test test"
        }
      },
      "id": 1
    }

Help information response:

    {
      "jsonrpc": "2.0",
      "result": {
        "type": "voice_service",
        "description": "This service provides voice recognition and text extraction services",
        "author": "AIO-2030",
        "version": "1.0.0",
        "github": "https://github.com/AIO-2030/mcp_voice_identify",
        "transport": ["stdio"],
        "methods": [
          {
            "name": "help",
            "description": "Show this help information."
          },
          {
            "name": "identify_voice",
            "description": "Identify voice from file",
            "inputSchema": {
              "type": "object",
              "properties": {
                "file_path": { "type": "string", "description": "Voice file path" }
              },
              "required": ["file_path"]
            }
          },
          {
            "name": "identify_voice_base64",
            "description": "Identify voice from base64 encoded data",
            "inputSchema": {
              "type": "object",
              "properties": {
                "base64_data": { "type": "string", "description": "Base64 encoded voice data" }
              },
              "required": ["base64_data"]
            }
          },
          {
            "name": "extract_text",
            "description": "Extract text",
            "inputSchema": {
              "type": "object",
              "properties": {
                "text": { "type": "string", "description": "Text to extract" }
              },
              "required": ["text"]
            }
          }
        ]
      },
      "id": 1
    }

Error response:

    {
      "jsonrpc": "2.0",
      "output": {
        "type": "error",
        "message": "503 Server Error: Service Unavailable",
        "error_code": 503
      },
      "id": 1
    }
The service provides three types of responses:
Voice Recognition Response (using the `output` field):
| Field | Description | Example Value |
|-----------|--------------------------------------|---------------|
| type | Response type | "voice" |
| message | Status message | "Voice processed successfully" |
| text | Recognized text content | "test test test" |
| metadata | Additional information | See below |
Help Information Response (using the `result` field):
| Field | Description | Example Value |
|---------------|--------------------------------------|---------------|
| type | Service type | "voice_service" |
| description | Service description | "This service provides..." |
| author | Service author | "AIO-2030" |
| version | Service version | "1.0.0" |
| github | GitHub repository URL | "https://github.com/..." |
| transport | Supported transport modes | ["stdio"] |
| methods | Available methods | See methods list |
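
The methods list in the help response declares one required parameter per method. As a sketch of how a client might build requests from those schemas, the helpers below construct `identify_voice` and `identify_voice_base64` requests. They assume that `params` mirrors the `inputSchema` property names and uses the same envelope as the help request; the helper names are hypothetical.

```python
import base64


def identify_voice_request(file_path, request_id=1):
    """Request recognition of an audio file by path (schema field: file_path)."""
    return {
        "jsonrpc": "2.0",
        "method": "identify_voice",
        "params": {"file_path": file_path},
        "id": request_id,
    }


def identify_voice_base64_request(audio_bytes, request_id=1):
    """Request recognition of raw audio bytes (schema field: base64_data)."""
    # JSON cannot carry raw bytes, so encode them as an ASCII base64 string.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "jsonrpc": "2.0",
        "method": "identify_voice_base64",
        "params": {"base64_data": encoded},
        "id": request_id,
    }
```

The base64 variant is useful when the client and the service do not share a filesystem, at the cost of a payload roughly one third larger than the original audio.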
Error Response (using the `output` field):
| Field | Description | Example Value |
|-------------|--------------------------------------|---------------|
| type | Response type | "error" |
| message | Error message | "503 Server Error: Service Unavailable" |
| error_code | HTTP status code | 503 |
The `metadata` field in voice recognition responses contains:
| Field | Description | Example Value |
|------------|--------------------------------------|---------------|
| language | Language code | "en" |
| emotion | Emotion state | "unknown" |
| audio_type | Audio type | "speech" |
| speaker | Speaker identifier | "woitn" |
| raw_text | Original recognized text | "test test test" |
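
Because help responses arrive under `result` while voice and error responses arrive under `output`, a client has to check both keys. The following sketch shows one way to tell the three response types apart, based only on the fields listed in the tables above; `summarize_response` is a hypothetical name.

```python
def summarize_response(response):
    """Classify a service response as help, error, voice, or unknown."""
    if "result" in response:
        # Help information is the only response delivered under "result".
        methods = [m["name"] for m in response["result"].get("methods", [])]
        return {"kind": "help", "methods": methods}

    output = response.get("output", {})
    if output.get("type") == "error":
        return {
            "kind": "error",
            "code": output.get("error_code"),
            "message": output.get("message"),
        }
    if output.get("type") == "voice":
        metadata = output.get("metadata", {})
        return {
            "kind": "voice",
            "text": output.get("text"),
            "language": metadata.get("language"),
            "emotion": metadata.get("emotion"),
            "speaker": metadata.get("speaker"),
        }
    return {"kind": "unknown"}
```

Using `.get` throughout means a response with missing optional fields yields `None` values instead of raising `KeyError`.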
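Make the build script executable:

    chmod +x build_exec.sh

Build the stdio mode executable:

    ./build_exec.sh

Build the MCP mode executable:

    ./build_exec.sh mcp

The executables will be created at:

- `dist/voice_stdio`
- `dist/voice_mcp`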
Run the test scripts:

    chmod +x test_*.sh
    ./test_help.sh
    ./test_voice_file.sh
    ./test_voice_base64.sh
This project is licensed under the MIT License - see the LICENSE file for details.