Crawl4AI MCP Server

Public
BjornMelin/crawl4ai-mcp-server

High-performance Model Context Protocol server enabling AI assistants to perform efficient web scraping, crawling, deep research, and structured data extraction with secure OAuth authentication and seamless integration.

typescript
0 tools
May 29, 2025
Updated Jun 4, 2025

⚠️ NOTICE

MCP SERVER CURRENTLY UNDER DEVELOPMENT
NOT READY FOR PRODUCTION USE
WILL UPDATE WHEN OPERATIONAL

🚀 High-performance MCP Server for Crawl4AI - Enable AI assistants to access web scraping, crawling, and deep research via Model Context Protocol. Faster and more efficient than Firecrawl!

Overview

This project implements a custom Model Context Protocol (MCP) Server that integrates with Crawl4AI, an open-source web scraping and crawling library. The server is deployed as a remote MCP server on CloudFlare Workers, allowing AI assistants like Claude to access Crawl4AI's powerful web scraping capabilities.

Documentation

For comprehensive details about this project, please refer to the following documentation:

  • Migration Plan - Detailed plan for migrating from Firecrawl to Crawl4AI
  • Enhanced Architecture - Multi-tenant architecture with cloud provider flexibility
  • Implementation Guide - Technical implementation details and code examples
  • Codebase Simplification - Details on code simplification and best practices implemented
  • Docker Setup Guide - Instructions for Docker setup for local development and production

Features

Web Data Acquisition

  • 🌐 Single Webpage Scraping: Extract content from individual webpages
  • πŸ•ΈοΈ Web Crawling: Crawl websites with configurable depth and page limits
  • πŸ—ΊοΈ URL Discovery: Map and discover URLs from a starting point
  • πŸ•ΈοΈ Asynchronous Crawling: Crawl entire websites efficiently
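
The "configurable depth and page limits" above can be pictured as a breadth-first traversal that stops at whichever bound is reached first. The following is a minimal sketch under stated assumptions, not the server's or Crawl4AI's actual implementation: `crawlWithLimits` and `LinkGraph` are hypothetical names, and a static link map stands in for fetching and parsing real pages.

```typescript
// Hypothetical sketch: breadth-first crawl bounded by depth and page count.
// A static link map stands in for fetching and parsing real pages.
type LinkGraph = Record<string, string[]>;

function crawlWithLimits(
  graph: LinkGraph,
  startUrl: string,
  maxDepth: number,
  maxPages: number,
): string[] {
  const visited = new Set<string>();
  const order: string[] = [];
  // Each queue entry carries the URL and its distance from the start page.
  const queue: Array<{ url: string; depth: number }> = [{ url: startUrl, depth: 0 }];

  while (queue.length > 0 && order.length < maxPages) {
    const next = queue.shift();
    if (next === undefined || visited.has(next.url)) continue;
    visited.add(next.url);
    order.push(next.url);

    // Do not follow links from pages that already sit at the depth limit.
    if (next.depth >= maxDepth) continue;
    for (const link of graph[next.url] ?? []) {
      if (!visited.has(link)) queue.push({ url: link, depth: next.depth + 1 });
    }
  }
  return order;
}

// Example site: "/" links to "/a" and "/b"; "/a" links to "/a/1"; "/a/1" links to "/a/1/x".
const site: LinkGraph = {
  "/": ["/a", "/b"],
  "/a": ["/a/1"],
  "/b": [],
  "/a/1": ["/a/1/x"],
  "/a/1/x": [],
};

const shallow = crawlWithLimits(site, "/", 1, 100); // the depth limit applies
const capped = crawlWithLimits(site, "/", 3, 2); // the page limit applies
console.log(shallow, capped);
```

With this example site, `shallow` contains "/", "/a", and "/b", while `capped` stops after "/" and "/a". A real crawler would also need URL normalization, politeness delays, and error handling, which the sketch leaves out.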

Content Processing

  • 🔍 Deep Research: Conduct comprehensive research across multiple pages
  • 📊 Structured Data Extraction: Extract specific data using CSS selectors or LLM-based extraction
  • 🔎 Content Search: Search through previously crawled content

Integration & Security

  • 🔄 MCP Integration: Seamless integration with MCP clients (Claude Desktop, etc.)
  • 🔒 OAuth Authentication: Secure access with proper authorization
  • 🔒 Authentication Options: Secure access via OAuth or API key (Bearer token)
  • ⚡ High Performance: Optimized for speed and efficiency

Project Structure

crawl4ai-mcp/
├── src/
│   ├── index.ts              # Main entry point with OAuth provider setup
│   ├── auth-handler.ts       # Authentication handler
│   ├── mcp-server.ts         # MCP server implementation
│   ├── crawl4ai-adapter.ts   # Adapter for Crawl4AI API
│   ├── tool-schemas/         # MCP tool schema definitions
│   │   └── [...].ts          # Tool schemas
│   ├── handlers/
│   │   ├── crawl.ts          # Web crawling implementation
│   │   ├── search.ts         # Search functionality
│   │   └── extract.ts        # Content extraction
│   └── utils/                # Utility functions
├── tests/                    # Test cases
├── .github/                  # GitHub configuration
├── wrangler.toml             # CloudFlare Workers configuration
├── tsconfig.json             # TypeScript configuration
├── package.json              # Node.js dependencies
└── README.md                 # Project documentation

Getting Started

Prerequisites

  • Node.js (v18 or higher)
  • npm
  • Wrangler (CloudFlare Workers CLI)
  • A CloudFlare account

Installation

  1. Clone the repository:

    git clone https://github.com/BjornMelin/crawl4ai-mcp-server.git
    cd crawl4ai-mcp-server
  2. Install dependencies:

    npm install
  3. Set up CloudFlare KV namespace:

    wrangler kv:namespace create CRAWL_DATA
  4. Update wrangler.toml with the KV namespace ID:

    kv_namespaces = [ { binding = "CRAWL_DATA", id = "your-namespace-id" } ]
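
A KV binding declared this way is exposed to Worker code as a property on the environment object (here `env.CRAWL_DATA`). The sketch below suggests how crawl results might be stored and read back through such a binding. It is an illustration only: the `KVLike` interface is a hand-written subset of the Workers KV `get`/`put` methods, and the `CrawlRecord` shape, the `crawl:` key prefix, and the helper names are assumptions, not taken from this project.

```typescript
// Minimal subset of the Workers KV interface used here; the real binding
// (env.CRAWL_DATA) offers more methods and options.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Hypothetical record shape and key scheme, chosen for illustration only.
interface CrawlRecord {
  id: string;
  startUrl: string;
  pages: string[];
}

function crawlKey(id: string): string {
  return `crawl:${id}`;
}

async function saveCrawl(kv: KVLike, record: CrawlRecord): Promise<void> {
  // KV stores strings, so the record is serialized as JSON.
  await kv.put(crawlKey(record.id), JSON.stringify(record));
}

async function loadCrawl(kv: KVLike, id: string): Promise<CrawlRecord | null> {
  const raw = await kv.get(crawlKey(id));
  return raw === null ? null : (JSON.parse(raw) as CrawlRecord);
}

// In-memory stand-in so the sketch runs outside a Worker.
function createMemoryKV(): KVLike {
  const store = new Map<string, string>();
  return {
    get: async (key) => store.get(key) ?? null,
    put: async (key, value) => {
      store.set(key, value);
    },
  };
}

const demoKv = createMemoryKV();
saveCrawl(demoKv, { id: "abc", startUrl: "https://example.com", pages: [] })
  .then(() => loadCrawl(demoKv, "abc"))
  .then((record) => console.log(record?.startUrl));
```

Inside a deployed Worker, the in-memory stand-in would be replaced by the binding itself, for example by passing `env.CRAWL_DATA` where `demoKv` is used above.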

Development

Local Development

Using NPM

  1. Start the development server:

    npm run dev
  2. The server will be available at

Using Docker

You can also use Docker for local development, which includes the Crawl4AI API and a debug UI:

  1. Set up environment variables:

    cp .env.example .env
    # Edit .env file with your API key
  2. Start the Docker development environment:

    docker-compose up -d
  3. Access the services:

    • MCP Server:
    • Crawl4AI UI:

See the Docker Setup Guide for more details.

Testing

The project includes a comprehensive test suite using Jest. To run tests:

# Run all tests
npm test

# Run tests with watch mode during development
npm run test:watch

# Run tests with coverage report
npm run test:coverage

# Run only unit tests
npm run test:unit

# Run only integration tests
npm run test:integration

When running in Docker:

docker-compose exec mcp-server npm test

Deployment

  1. Deploy to CloudFlare Workers:

    npm run deploy
  2. Your server will be available at the CloudFlare Workers URL assigned to your deployed worker.

Usage with MCP Clients

This server implements the Model Context Protocol, allowing AI assistants to access its tools.

Authentication

  • Implement OAuth authentication with workers-oauth-provider
  • Add API key authentication using Bearer tokens
  • Create login page and token management
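
For the API key path, a Bearer token arrives in the `Authorization` request header as `Bearer <token>`. The following sketch shows one way that header could be parsed and checked. The helper names `extractBearerToken` and `isAuthorized` and the idea of a fixed key set are assumptions for illustration; the project's real handler may work differently.

```typescript
// Hypothetical helper: pull the token out of an "Authorization: Bearer <token>" header.
function extractBearerToken(header: string | null | undefined): string | null {
  if (!header) return null;
  // The scheme name is matched case-insensitively; the token must be a single
  // non-empty run of non-whitespace characters.
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

// Hypothetical check against a set of accepted API keys.
function isAuthorized(
  header: string | null | undefined,
  validKeys: ReadonlySet<string>,
): boolean {
  const token = extractBearerToken(header);
  return token !== null && validKeys.has(token);
}

const acceptedKeys = new Set(["demo-key"]); // placeholder key for illustration
console.log(isAuthorized("Bearer demo-key", acceptedKeys));
console.log(isAuthorized("Basic abc123", acceptedKeys));
```

A production check would likely store hashed keys and compare them in constant time rather than holding raw keys in memory as this sketch does.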

Connecting to an MCP Client

  1. Use the CloudFlare Workers URL assigned to your deployed worker
  2. In Claude Desktop or other MCP clients, add this server as a tool source

Available Tools

  • crawl: Crawl web pages from a starting URL
  • getCrawl: Retrieve crawl data by ID
  • listCrawls: List all crawls or filter by domain
  • search: Search indexed documents by query
  • extract: Extract structured content from a URL
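
MCP clients invoke tools with a JSON-RPC 2.0 request whose method is `tools/call` and whose params carry the tool name and an arguments object. The sketch below builds such a request for the tools listed above. The tool names come from that list; the argument names (`url`, `maxDepth`) and the `buildToolCall` helper are hypothetical, since the tool schemas are not shown here.

```typescript
// Tool names come from the list above; everything about their arguments is assumed.
type ToolName = "crawl" | "getCrawl" | "listCrawls" | "search" | "extract";

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Build a JSON-RPC request envelope for an MCP tool invocation.
function buildToolCall(name: ToolName, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical argument names ("url", "maxDepth"); the server's schemas may differ.
const crawlRequest = buildToolCall("crawl", { url: "https://example.com", maxDepth: 2 });
console.log(JSON.stringify(crawlRequest, null, 2));
```

In practice an MCP client library usually constructs and sends these envelopes, so hand-building them is mainly useful for debugging or for testing a server directly.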

Configuration

The server can be configured by modifying environment variables in wrangler.toml:

  • MAX_CRAWL_DEPTH: Maximum depth for web crawling (default: 3)
  • MAX_CRAWL_PAGES: Maximum pages to crawl (default: 100)
  • API_VERSION: API version string (default: "v1")
  • OAUTH_CLIENT_ID: OAuth client ID for authentication
  • OAUTH_CLIENT_SECRET: OAuth client secret for authentication
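
As a rough illustration of how these variables and their defaults could be read at startup, the sketch below maps an environment object onto a typed configuration. `loadConfig`, `parsePositiveInt`, and the `CrawlConfig` shape are hypothetical names; only the variable names and default values come from the list above. Treating non-numeric or non-positive values as "use the default" is also an assumption.

```typescript
interface CrawlConfig {
  maxCrawlDepth: number;
  maxCrawlPages: number;
  apiVersion: string;
}

// Parse a positive integer from an environment string, falling back to the
// default when the value is missing, non-numeric, or not greater than zero.
function parsePositiveInt(value: string | undefined, fallback: number): number {
  if (value === undefined) return fallback;
  const parsed = Number.parseInt(value, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function loadConfig(env: Record<string, string | undefined>): CrawlConfig {
  return {
    maxCrawlDepth: parsePositiveInt(env["MAX_CRAWL_DEPTH"], 3),
    maxCrawlPages: parsePositiveInt(env["MAX_CRAWL_PAGES"], 100),
    apiVersion: env["API_VERSION"] ?? "v1",
  };
}

const defaults = loadConfig({});
const overridden = loadConfig({ MAX_CRAWL_DEPTH: "5", API_VERSION: "v2" });
console.log(defaults, overridden);
```

The OAuth client ID and secret are left out of the sketch on purpose, since secrets are generally better supplied through a secret store than through plain configuration values.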

Roadmap

The project is being developed with these components in mind:

  1. Project Setup and Configuration: CloudFlare Worker setup, TypeScript configuration
  2. MCP Server and Tool Schemas: Implementation of MCP server with tool definitions
  3. Crawl4AI Adapter: Integration with the Crawl4AI functionality
  4. OAuth Authentication: Secure authentication implementation
  5. Performance Optimizations: Enhancing speed and reliability
  6. Advanced Extraction Features: Improving structured data extraction capabilities

Contributing

Contributions are welcome! Please check the open issues or create a new one before starting work on a feature or bug fix. See the Contributing Guidelines for details.

Support

If you encounter issues or have questions:

  • Open an issue on the GitHub repository
  • Check the Crawl4AI documentation
  • Refer to the Model Context Protocol specification

How to Cite

If you use Crawl4AI MCP Server in your research or projects, please cite it using the following BibTeX entry:

@software{crawl4ai_mcp_2025,
  author = {Melin, Bjorn},
  title = {Crawl4AI MCP Server: High-performance Web Crawling for AI Assistants},
  url = {https://github.com/BjornMelin/crawl4ai-mcp-server},
  version = {1.0.0},
  year = {2025},
  month = {5}
}

License

MIT
