High-performance Model Context Protocol server enabling AI assistants to perform efficient web scraping, crawling, deep research, and structured data extraction with secure OAuth authentication and seamless integration.
MCP SERVER CURRENTLY UNDER DEVELOPMENT
NOT READY FOR PRODUCTION USE
WILL UPDATE WHEN OPERATIONAL
High-performance MCP Server for Crawl4AI - Enable AI assistants to access web scraping, crawling, and deep research via Model Context Protocol. Faster and more efficient than FireCrawl!
This project implements a custom Model Context Protocol (MCP) Server that integrates with Crawl4AI, an open-source web scraping and crawling library. The server is deployed as a remote MCP server on CloudFlare Workers, allowing AI assistants like Claude to access Crawl4AI's powerful web scraping capabilities.
For comprehensive details about this project, please refer to the following documentation:
crawl4ai-mcp/
├── src/
│   ├── index.ts               # Main entry point with OAuth provider setup
│   ├── auth-handler.ts        # Authentication handler
│   ├── mcp-server.ts          # MCP server implementation
│   ├── crawl4ai-adapter.ts    # Adapter for Crawl4AI API
│   ├── tool-schemas/          # MCP tool schema definitions
│   │   └── [...].ts           # Tool schemas
│   ├── handlers/
│   │   ├── crawl.ts           # Web crawling implementation
│   │   ├── search.ts          # Search functionality
│   │   └── extract.ts         # Content extraction
│   └── utils/                 # Utility functions
├── tests/                     # Test cases
├── .github/                   # GitHub configuration
├── wrangler.toml              # CloudFlare Workers configuration
├── tsconfig.json              # TypeScript configuration
├── package.json               # Node.js dependencies
└── README.md                  # Project documentation
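To illustrate how the pieces in this layout might fit together, here is a minimal sketch of a tool registry in which a server maps tool names to handler functions. The `ToolHandler` type, `registerTool`, `dispatch`, and the stub handler bodies are hypothetical names chosen for illustration; they are not taken from the repository, and the real handlers may be wired differently.

```typescript
// Hypothetical handler signature: takes tool arguments, resolves to a JSON-serializable result.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Registry keyed by tool name.
const handlers = new Map<string, ToolHandler>();

// Register a handler, rejecting duplicate names so one tool cannot silently shadow another.
function registerTool(name: string, handler: ToolHandler): void {
  if (handlers.has(name)) {
    throw new Error(`Tool already registered: ${name}`);
  }
  handlers.set(name, handler);
}

// Look up a tool by name and run it; unknown names produce a rejected promise.
async function dispatch(name: string, args: Record<string, unknown>): Promise<unknown> {
  const handler = handlers.get(name);
  if (!handler) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(args);
}

// Stub handlers standing in for src/handlers/crawl.ts, search.ts and extract.ts.
// Each one simply echoes its arguments (assumption: real handlers call the Crawl4AI adapter).
registerTool("crawl", async (args) => ({ tool: "crawl", received: args }));
registerTool("search", async (args) => ({ tool: "search", received: args }));
registerTool("extract", async (args) => ({ tool: "extract", received: args }));
```

With a registry like this, adding a tool would mean adding one handler file and one `registerTool` call, which keeps the server entry point independent of individual tool logic.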
Clone the repository:
git clone https://github.com/BjornMelin/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
Install dependencies:
npm install
Set up CloudFlare KV namespace:
wrangler kv:namespace create CRAWL_DATA
Update wrangler.toml with the KV namespace ID:
kv_namespaces = [ { binding = "CRAWL_DATA", id = "your-namespace-id" } ]
Start the development server:
npm run dev
The server will be available at
You can also use Docker for local development, which includes the Crawl4AI API and a debug UI:
Set up environment variables:
cp .env.example .env
# Edit the .env file with your API key
Start the Docker development environment:
docker-compose up -d
Access the services:
See the Docker Setup Guide for more details.
The project includes a comprehensive test suite using Jest. To run tests:
# Run all tests
npm test
# Run tests with watch mode during development
npm run test:watch
# Run tests with coverage report
npm run test:coverage
# Run only unit tests
npm run test:unit
# Run only integration tests
npm run test:integration
When running in Docker:
docker-compose exec mcp-server npm test
Deploy to CloudFlare Workers:
npm run deploy
Your server will be available at the CloudFlare Workers URL assigned to your deployed worker.
This server implements the Model Context Protocol, allowing AI assistants to access its tools.
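Under MCP, a client invokes a server tool by sending a JSON-RPC 2.0 request whose method is `tools/call`, with the tool name and its arguments in `params`. The sketch below builds such a request for the `crawl` tool. The `buildToolCall` helper and the argument names `url` and `maxDepth` are assumptions made for illustration; the actual argument schema is whatever the server's tool schemas define.

```typescript
// Shape of an MCP tool invocation: a JSON-RPC 2.0 request with method "tools/call".
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper that assembles the request envelope for a given tool.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The argument names below (url, maxDepth) are assumed, not taken from the project.
const crawlRequest = buildToolCall(1, "crawl", {
  url: "https://example.com",
  maxDepth: 2,
});

// Print the request as it would be serialized on the wire.
console.log(JSON.stringify(crawlRequest, null, 2));
```

In practice an MCP client library would normally construct and send this envelope for you; building it by hand is mainly useful for debugging or for understanding what the assistant sends to the server.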
crawl: Crawl web pages from a starting URL
getCrawl: Retrieve crawl data by ID
listCrawls: List all crawls or filter by domain
search: Search indexed documents by query
extract: Extract structured content from a URL

The server can be configured by modifying environment variables in wrangler.toml:
MAX_CRAWL_DEPTH: Maximum depth for web crawling (default: 3)
MAX_CRAWL_PAGES: Maximum pages to crawl (default: 100)
API_VERSION: API version string (default: "v1")
OAUTH_CLIENT_ID: OAuth client ID for authentication
OAUTH_CLIENT_SECRET: OAuth client secret for authentication

The project is being developed with these components in mind:
Contributions are welcome! Please check the open issues or create a new one before starting work on a feature or bug fix. See the Contributing Guidelines for details.
If you encounter issues or have questions:
If you use Crawl4AI MCP Server in your research or projects, please cite it using the following BibTeX entry:
@software{crawl4ai_mcp_2025,
  author = {Melin, Bjorn},
  title = {Crawl4AI MCP Server: High-performance Web Crawling for AI Assistants},
  url = {https://github.com/BjornMelin/crawl4ai-mcp-server},
  version = {1.0.0},
  year = {2025},
  month = {5}
}
MIT