docmcp
A system for crawling, processing, and querying documentation with AI-powered embedding generation and semantic search capabilities.
Clone the Repository:
git clone https://github.com/visheshd/docmcp.git
cd docmcp
Configure Environment:
cp .env.example .env
Update the .env file:
- Set DATABASE_URL to postgresql://postgres:postgres@localhost:5433/docmcp
- Set AWS_REGION to your AWS region (e.g., us-east-1)
- Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to your AWS credentials
- Adjust LOG_LEVEL if needed
Start the Development Environment:
# Make the script executable
chmod +x dev-start.sh

# Start the development environment
./dev-start.sh
This script will:
Add Documentation:
Use the add-docs script to crawl and process documentation:
# Basic usage
npm run add-docs -- --url https://example.com/docs --max-depth 3

# With additional options
npm run add-docs -- \
  --url https://example.com/docs \
  --max-depth 3 \
  --tags react,frontend \
  --package react \
  --version 18.0.0 \
  --wait
Available options:
- --url: Documentation URL to crawl (required)
- --max-depth: Maximum crawl depth (default: 3)
- --tags: Comma-separated tags for categorization
- --package: Package name this documentation is for
- --version: Package version (defaults to "latest")
- --wait: Wait for processing to complete
- --verbose: Enable detailed logging
Run npm run add-docs -- --help for all options.
Query Documentation:
Once documentation is added, you can query it using the MCP tools. See the "Querying Documentation" section below.
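To make the option defaults concrete, here is a minimal TypeScript sketch of how the flags listed above could be parsed. It is not the actual add-docs implementation: the `AddDocsOptions` type, the `parseAddDocsArgs` function, and the validation rules are all hypothetical, and only the flag names and the two documented defaults (depth 3, version "latest") come from this README.

```typescript
// Hypothetical shape for the documented options; not the project's own types.
interface AddDocsOptions {
  url: string;
  maxDepth: number;
  tags: string[];
  packageName?: string;
  version: string;
  wait: boolean;
  verbose: boolean;
}

// Parse an argv-style array (everything after `npm run add-docs --`).
function parseAddDocsArgs(argv: string[]): AddDocsOptions {
  const values = new Map<string, string>();
  const flags = new Set<string>();
  const booleanFlags = new Set(["--wait", "--verbose"]);
  const rest = [...argv];

  while (rest.length > 0) {
    const arg = rest.shift()!;
    if (booleanFlags.has(arg)) {
      flags.add(arg);
    } else if (arg.startsWith("--")) {
      // Value-taking flag: the next token must be its value.
      const value = rest.shift();
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`Missing value for ${arg}`);
      }
      values.set(arg, value);
    } else {
      throw new Error(`Unexpected argument: ${arg}`);
    }
  }

  const url = values.get("--url");
  if (!url) throw new Error("--url is required");

  // Documented default: crawl depth 3.
  const maxDepth = Number(values.get("--max-depth") ?? "3");
  if (!Number.isInteger(maxDepth) || maxDepth < 0) {
    throw new Error("--max-depth must be a non-negative integer");
  }

  return {
    url,
    maxDepth,
    tags: (values.get("--tags") ?? "").split(",").filter((t) => t.length > 0),
    packageName: values.get("--package"),
    version: values.get("--version") ?? "latest", // documented default
    wait: flags.has("--wait"),
    verbose: flags.has("--verbose"),
  };
}
```

Calling `parseAddDocsArgs(["--url", "https://example.com/docs"])` would yield a depth of 3, version "latest", no tags, and both boolean flags off.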
Stop the Development Environment:
docker-compose -f docker-compose.dev.yml down
This setup provides a lightweight development environment with just the required PostgreSQL database and pre-loaded seed data. For production deployments or if you prefer a fully containerized setup, see the "Production Docker Setup" section below.
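If the development environment fails to start, a common culprit is an incomplete .env file. The sketch below shows one way the values from the Configure Environment step could be sanity-checked. It assumes the four variables that step asks you to set are all required and that LOG_LEVEL is optional; the `findEnvProblems` helper is hypothetical and not part of DocMCP.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: these are the variables the setup step asks you to set.
const REQUIRED_KEYS = [
  "DATABASE_URL",
  "AWS_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
] as const;

// Return a human-readable list of problems; an empty list means the env looks usable.
function findEnvProblems(env: Env): string[] {
  const problems: string[] = [];

  for (const key of REQUIRED_KEYS) {
    const value = env[key];
    if (value === undefined || value.trim() === "") {
      problems.push(`${key} is not set`);
    }
  }

  // Only check the URL scheme when a value is present.
  const dbUrl = env["DATABASE_URL"];
  if (dbUrl && !dbUrl.startsWith("postgresql://") && !dbUrl.startsWith("postgres://")) {
    problems.push("DATABASE_URL does not look like a PostgreSQL connection string");
  }

  return problems;
}
```

In a Node.js process this could be called as `findEnvProblems(process.env)` before connecting to the database.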
To use DocMCP with Cursor IDE, you'll need to configure the MCP transport. Add the following configuration to your Cursor settings:
{
  "docmcp-local-stdio": {
    "transport": "stdio",
    "command": "node",
    "args": [
      "/dist/stdio-server.js"
    ],
    "clientInfo": {
      "name": "cursor-client",
      "version": "1.0.0"
    }
  }
}
Prefix the /dist/stdio-server.js entry in args with the absolute path to your DocMCP installation directory.
For example, if DocMCP is installed in /home/user/projects/docmcp, your configuration would be:
"args": ["/home/user/projects/docmcp/dist/stdio-server.js"]
After adding this configuration, restart Cursor for the changes to take effect.
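To avoid hand-editing the path, the configuration can also be generated. The following is a minimal sketch, assuming a POSIX-style absolute installation path; `buildCursorConfig` is a hypothetical helper, while the keys and values mirror the configuration shown in this section.

```typescript
// Build the Cursor MCP entry for a given DocMCP installation directory.
// Assumption: installDir is an absolute, POSIX-style path.
function buildCursorConfig(installDir: string) {
  // Drop trailing slashes so the joined path has exactly one separator.
  const base = installDir.replace(/\/+$/, "");
  return {
    "docmcp-local-stdio": {
      transport: "stdio",
      command: "node",
      args: [`${base}/dist/stdio-server.js`],
      clientInfo: { name: "cursor-client", version: "1.0.0" },
    },
  };
}
```

Passing the result through `JSON.stringify(buildCursorConfig("/home/user/projects/docmcp"), null, 2)` produces JSON that can be pasted into the Cursor settings.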
The system consists of several core services:
The DocMCP system processes documentation through the following pipeline:
1. Documentation Input (add_documentation MCP tool)
2. Web Crawling (CrawlerService)
3. Document Processing (DocumentProcessorService)
4. Chunking & Embedding (ChunkService)
5. Job Finalization (JobService)
6. Querying & Retrieval (query_documentation MCP tool)
This pipeline enables efficient storage, processing, and retrieval of documentation with semantic understanding capabilities. All steps are tracked through the job system, allowing detailed progress monitoring and error handling.
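As an illustration of how ingestion stages might be tracked through a job record, here is a minimal sketch. It is not DocMCP's actual JobService: the stage identifiers, the `JobRecord` fields, and `runPipeline` are hypothetical, and only the order of the stages follows the pipeline described above. Querying is left out because it happens after a job has finished.

```typescript
// Stage order mirrors the ingestion pipeline; the identifiers are hypothetical.
const STAGES = [
  "input",
  "crawling",
  "processing",
  "chunkingAndEmbedding",
  "finalization",
] as const;
type Stage = (typeof STAGES)[number];

interface JobRecord {
  status: "running" | "completed" | "failed";
  currentStage: Stage | null;
  progress: number; // fraction of stages finished, from 0 to 1
  error?: string;
}

type StageHandler = () => Promise<void>;

// Run each stage in order, updating the job record as it goes.
// A failing stage stops the pipeline and is recorded on the job.
async function runPipeline(handlers: Record<Stage, StageHandler>): Promise<JobRecord> {
  const job: JobRecord = { status: "running", currentStage: null, progress: 0 };
  let finished = 0;

  for (const stage of STAGES) {
    job.currentStage = stage;
    try {
      await handlers[stage]();
    } catch (err) {
      job.status = "failed";
      job.error = err instanceof Error ? err.message : String(err);
      return job; // currentStage still names the stage that failed
    }
    finished += 1;
    job.progress = finished / STAGES.length;
  }

  job.status = "completed";
  job.currentStage = null;
  return job;
}
```

Keeping the failed stage on the record is one way to support the progress monitoring and error handling mentioned above: a status check can report both how far the job got and where it stopped.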
docmcp/
├── prisma/  # Database schema and migrations
│   └── schema.prisma  # Prisma model definitions and database configuration
├── src/
│   ├── config/  # Application configuration
│   │   └── database.ts  # Database connection setup
│   ├── generated/  # Generated code (Prisma client)
│   ├── services/  # Core service modules
│   │   ├── crawler.service.ts  # Website crawling functionality
│   │   ├── document.service.ts  # Document management
│   │   ├── document-processor.service.ts  # Document processing and transformation
│   │   ├── job.service.ts  # Async job management
│   │   ├── chunk.service.ts  # Document chunking and vector operations
│   │   └── mcp-tools/  # MCP integration tools
│   │       ├── add-documentation.tool.ts  # Tool for adding new documentation
│   │       ├── get-job-status.tool.ts  # Tool for checking job status
│   │       ├── list-documentation.tool.ts  # Tool for listing available documentation
│   │       ├── query-documentation.tool.ts  # Tool for querying documentation
│   │       ├── sample.tool.ts  # Example tool implementation
│   │       └── index.ts  # Tool registry and exports
│   ├── types/  # TypeScript type definitions
│   │   └── mcp.ts  # MCP tool interface definitions
│   ├── utils/  # Utility functions
│   │   ├── logger.ts  # Logging utilities
│   │   └── prisma-filters.ts  # Reusable Prisma filtering patterns
│   └── __tests__/  # Test files
│       └── utils/  # Test utilities
│           └── testDb.ts  # Test database setup and teardown
├── .env  # Environment variables
└── package.json  # Project dependencies and scripts
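The layout separates a tool interface (src/types/mcp.ts), individual tool modules, and a registry (src/services/mcp-tools/index.ts). The sketch below shows one plausible way those pieces could fit together. It is an assumption made for illustration only: the `McpTool` interface, the `ToolRegistry` class, and the echo tool are hypothetical and do not reproduce the project's actual definitions.

```typescript
// Hypothetical tool contract; the real one is defined in src/types/mcp.ts.
interface McpTool {
  name: string;
  description: string;
  handler(input: Record<string, unknown>): Promise<unknown>;
}

// A trivial tool in the spirit of sample.tool.ts: it echoes its input.
const sampleTool: McpTool = {
  name: "sample",
  description: "Echoes its input back to the caller",
  async handler(input) {
    return { echoed: input };
  },
};

// Minimal registry: tools are looked up by name, and duplicates are rejected.
class ToolRegistry {
  private readonly tools = new Map<string, McpTool>();

  register(tool: McpTool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  get(name: string): McpTool | undefined {
    return this.tools.get(name);
  }

  list(): string[] {
    return Array.from(this.tools.keys());
  }
}
```

Under this design, adding a tool means writing one module that satisfies the interface and registering it once, which would keep the server code independent of individual tools.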