doc-lib-mcp MCP server

A Model Context Protocol (MCP) server for document ingestion, chunking, semantic search, and note management.

Components

Resources

  • Implements a simple note storage system with:
    • Custom note:// URI scheme for accessing individual notes
    • Each note resource has a name, description, and text/plain mimetype
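As a sketch of how a client might read a note resource (assuming the published doc-lib-mcp package name and the official MCP Python SDK; the note name is hypothetical):

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from pydantic import AnyUrl

async def read_note():
    # Launch the server over stdio and open a client session.
    params = StdioServerParameters(command="uvx", args=["doc-lib-mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # List available note resources, then fetch one by its note:// URI.
            resources = await session.list_resources()
            print(resources)
            content = await session.read_resource(AnyUrl("note://example-note"))
            return content

asyncio.run(read_note())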

Prompts

  • Provides a prompt:
    • summarize-notes: Creates summaries of all stored notes
      • Optional "style" argument to control detail level (brief/detailed)
      • Generates prompt combining all current notes with style preference
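Continuing the client session sketch above, requesting the prompt might look like this (the "style" value is one of the documented brief/detailed options):

# Inside the ClientSession block from the Resources sketch.
result = await session.get_prompt("summarize-notes", arguments={"style": "brief"})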

Tools

The server implements a wide range of tools (a usage sketch follows the list):

  • add-note: Add a new note to the in-memory note store
    • Arguments: name (string), content (string)
  • ingest-string: Ingest and chunk a markdown or plain text string provided via message
    • Arguments: content (string, required), source (string, optional), tags (list of strings, optional)
  • ingest-markdown: Ingest and chunk a markdown (.md) file
    • Arguments: path (string)
  • ingest-python: Ingest and chunk a Python (.py) file
    • Arguments: path (string)
  • ingest-openapi: Ingest and chunk an OpenAPI JSON file
    • Arguments: path (string)
  • ingest-html: Ingest and chunk an HTML file
    • Arguments: path (string)
  • ingest-html-url: Ingest and chunk HTML content from a URL (optionally using Playwright for dynamic content)
    • Arguments: url (string), dynamic (boolean, optional)
  • smart_ingestion: Extracts all technically relevant content from a file using Gemini, then chunks it using robust markdown logic.
    • Arguments:
      • path (string, required): File path to ingest.
      • prompt (string, optional): Custom prompt to use for Gemini.
      • tags (list of strings, optional): Optional list of tags for classification.
    • Uses Gemini 2.0 Flash 001 to extract only code, configuration, markdown structure, and technical definitions (no summaries or commentary).
    • Passes the extracted content to a mistune 3.x-based chunker that preserves both code blocks and markdown/narrative content as separate chunks.
    • Each chunk is embedded and stored for semantic search and retrieval.
  • search-chunks: Semantic search over ingested content
    • Arguments:
      • query (string): The semantic search query.
      • top_k (integer, optional, default 3): Number of top results to return.
      • type (string, optional): Filter results by chunk type (e.g., code, html, markdown).
      • tag (string, optional): Filter results by tag in chunk metadata.
    • Returns the most relevant chunks for a given query, optionally filtered by type and/or tag.
  • delete-source: Delete all chunks from a given source
    • Arguments: source (string)
  • delete-chunk-by-id: Delete one or more chunks by id
    • Arguments: id (integer, optional), ids (list of integers, optional)
    • You can delete a single chunk by specifying id, or delete multiple chunks at once by specifying ids.
  • update-chunk-type: Update the type attribute for a chunk by id
    • Arguments: id (integer, required), type (string, required)
  • ingest-batch: Ingest and chunk multiple documentation files (markdown, OpenAPI JSON, Python) in batch
    • Arguments: paths (list of strings)
  • list-sources: List all unique sources (file paths) that have been ingested and stored in memory, with optional filtering by tag or semantic search.
    • Arguments:
      • tag (string, optional): Filter sources by tag in chunk metadata.
      • query (string, optional): Semantic search query to find relevant sources.
      • top_k (integer, optional, default 10): Number of top sources to return when using query.
  • get-context: Retrieve relevant content chunks (content only) for use as AI context, with filtering by tag, type, and semantic similarity.
    • Arguments:
      • query (string, optional): The semantic search query.
      • tag (string, optional): Filter results by a specific tag in chunk metadata.
      • type (string, optional): Filter results by chunk type (e.g., 'code', 'markdown').
      • top_k (integer, optional, default 5): The number of top relevant chunks to retrieve.
  • update-chunk-metadata: Update the metadata field for a chunk by id
    • Arguments: id (integer), metadata (object)
  • tag-chunks-by-source: Adds specified tags to the metadata of all chunks associated with a given source (URL or file path). Merges with existing tags.
    • Arguments: source (string), tags (list of strings)
  • list-notes: List all currently stored notes and their content.
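As referenced above, here is a usage sketch calling a few of these tools through the same MCP client session (the file path and note content are hypothetical):

# Inside the ClientSession block from the Resources sketch:
# store a note, ingest a markdown file, then search the ingested chunks.
await session.call_tool("add-note", {"name": "todo", "content": "Review the chunker."})
await session.call_tool("ingest-markdown", {"path": "/docs/guide.md"})
result = await session.call_tool(
    "search-chunks", {"query": "how chunking works", "top_k": 3}
)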

Chunking and Code Extraction

  • Markdown, Python, OpenAPI, and HTML files are split into logical chunks for efficient retrieval and search.
  • The markdown chunker uses mistune 3.x's AST API and regex to robustly split content by code blocks and narrative, preserving all original formatting.
  • Both code blocks and markdown/narrative content are preserved as separate chunks.
  • The HTML chunker uses the readability-lxml library to extract the main content first, then extracts block-level code snippets from <pre>/<code> tags as dedicated "code" chunks. Inline code remains part of the narrative chunks.
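A simplified illustration of the splitting idea (the actual chunker walks mistune 3's AST for robustness; this sketch only handles fenced code blocks, not the server's real implementation):

import re

# Capture fenced code blocks so split() keeps them as separate parts.
FENCE = re.compile(r"(```.*?```)", re.DOTALL)

def chunk_markdown(text: str) -> list[dict]:
    chunks = []
    for part in FENCE.split(text):
        if not part.strip():
            continue
        kind = "code" if part.startswith("```") else "markdown"
        chunks.append({"type": kind, "content": part})
    return chunks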

Semantic Search

  • The search-chunks tool performs vector-based semantic search over all ingested content, returning the most relevant chunks for a given query.
  • Supports optional type and tag arguments to filter results by chunk type (e.g., code, html, markdown) and/or by tag in chunk metadata, before semantic ranking.
  • This enables highly targeted retrieval, such as "all code chunks tagged with 'langfuse' relevant to 'cost and usage'".
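For example, that targeted retrieval could be issued through a client session like this (argument names follow the tool description above):

# Inside the ClientSession block from the Resources sketch.
result = await session.call_tool(
    "search-chunks",
    {"query": "cost and usage", "top_k": 5, "type": "code", "tag": "langfuse"},
)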

Metadata Management

  • Chunks include a metadata field for categorization and tagging.
  • The update-chunk-metadata tool allows updating metadata for any chunk by its id.
  • The tag-chunks-by-source tool allows adding tags to all chunks from a specific source in one operation. Tagging merges new tags with existing ones, preserving previous tags.
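The merge behavior amounts to a set union over tags; a minimal sketch (not the server's actual code):

def merge_tags(existing: list[str], new: list[str]) -> list[str]:
    # New tags are added; existing tags are never dropped.
    return sorted(set(existing) | set(new))

merge_tags(["langfuse"], ["billing", "langfuse"])  # -> ['billing', 'langfuse']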

Configuration

The server is configured through the following environment variables (all of which can be set in a .env file):

Ollama Configuration

  • OLLAMA_HOST: Hostname for Ollama API (default: localhost)
  • OLLAMA_PORT: Port for Ollama API (default: 11434)
  • RAG_AGENT: Ollama model to use for RAG responses (default: llama3)
  • OLLAMA_MODEL: Ollama model to use for embeddings (default: nomic-embed-text-v2-moe)
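For reference, these variables plug into Ollama's standard embeddings endpoint; a sketch of how an embedding request might be issued (this mirrors the Ollama HTTP API, not necessarily the server's exact internals):

import os
import requests

host = os.getenv("OLLAMA_HOST", "localhost")
port = os.getenv("OLLAMA_PORT", "11434")
model = os.getenv("OLLAMA_MODEL", "nomic-embed-text-v2-moe")

# POST /api/embeddings returns {"embedding": [...]} for the given text.
resp = requests.post(
    f"http://{host}:{port}/api/embeddings",
    json={"model": model, "prompt": "example chunk text"},
    timeout=30,
)
embedding = resp.json()["embedding"]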

Database Configuration

  • HOST: PostgreSQL database host (default: localhost)
  • DB_PORT: PostgreSQL database port (default: 5432)
  • DB_NAME: PostgreSQL database name (default: doclibdb)
  • DB_USER: PostgreSQL database user (default: doclibdb_user)
  • DB_PASSWORD: PostgreSQL database password (default: doclibdb_password)

Reranker Configuration

  • RERANKER_MODEL_PATH: Path to the reranker model (default: /srv/samba/fileshare2/AI/models/bge-reranker-v2-m3)
  • RERANKER_USE_FP16: Whether to use FP16 for reranker (default: True)
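Putting it all together, a .env file using the documented defaults would look like:

OLLAMA_HOST=localhost
OLLAMA_PORT=11434
RAG_AGENT=llama3
OLLAMA_MODEL=nomic-embed-text-v2-moe
HOST=localhost
DB_PORT=5432
DB_NAME=doclibdb
DB_USER=doclibdb_user
DB_PASSWORD=doclibdb_password
RERANKER_MODEL_PATH=/srv/samba/fileshare2/AI/models/bge-reranker-v2-m3
RERANKER_USE_FP16=True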

Quickstart

Install

Claude Desktop

On macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json

Development/Unpublished Servers Configuration

"mcpServers": {
  "doc-lib-mcp": {
    "command": "uv",
    "args": [
      "--directory",
      "/home/administrator/python-share/doc-lib-mcp",
      "run",
      "doc-lib-mcp"
    ]
  }
}

Published Servers Configuration

"mcpServers": {
  "doc-lib-mcp": {
    "command": "uvx",
    "args": [
      "doc-lib-mcp"
    ]
  }
}

Development

Building and Publishing

To prepare the package for distribution:

  1. Sync dependencies and update lockfile:
uv sync
  2. Build package distributions:
uv build

This will create source and wheel distributions in the dist/ directory.

  3. Publish to PyPI:
uv publish

Note: You'll need to set PyPI credentials via environment variables or command flags:

  • Token: --token or UV_PUBLISH_TOKEN
  • Or username/password: --username/UV_PUBLISH_USERNAME and --password/UV_PUBLISH_PASSWORD

Debugging

Since MCP servers run over stdio, debugging can be challenging. For the best debugging experience, we strongly recommend using the MCP Inspector.

You can launch the MCP Inspector via npm with this command:

npx @modelcontextprotocol/inspector uv --directory /home/administrator/python-share/doc-lib-mcp run doc-lib-mcp

Upon launching, the Inspector will display a URL that you can access in your browser to begin debugging.
