Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GraphRAG Server

Production-ready REST API server for GraphRAG with multiple backend options.

Migration Notice: The server has been migrated from Axum to Actix-web 4.9 with Apistos for automatic OpenAPI 3.0.3 documentation generation. All endpoints remain the same, but the server now includes automatic API documentation at /openapi.json.

Features

Storage Backends

  • Qdrant Integration - Production vector database with 100M+ vectors support (client-server)
  • LanceDB Integration - Serverless embedded database for native/desktop apps
  • Graceful Fallback - Works without external database (in-memory mode)

Embeddings

  • Ollama Integration - Local embeddings via Ollama (nomic-embed-text, etc.)
  • Hash-based Fallback - Deterministic embeddings without external dependencies
  • Auto-detection - Automatically uses Ollama if available, falls back otherwise

API Features

  • REST API - Clean HTTP endpoints for all operations powered by Actix-web 4.9
  • OpenAPI 3.0.3 - Automatic API documentation via Apistos
  • Swagger UI - Interactive API explorer at /swagger
  • Vector Search - Semantic search with cosine similarity
  • Real Embeddings - Generate actual embeddings for queries and documents
  • CORS Support - Ready for browser clients
  • Health Checks - Monitor server and database status
  • Metrics - Query counts, embedding statistics, and performance tracking
  • Entity/Relationship Storage - Store graph metadata in vector database payloads

Quick Start

1. Start Qdrant (Docker)

cd graphrag-server
docker-compose up -d

# Or manually:
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

2. Start GraphRAG Server

# With Qdrant (recommended)
cargo run --bin graphrag-server --features qdrant

# Without Qdrant (in-memory mode)
cargo run --bin graphrag-server --no-default-features

Server starts on http://0.0.0.0:8080

API Documentation:

  • OpenAPI Spec: http://localhost:8080/openapi.json
  • Swagger UI: http://localhost:8080/swagger

3. Test API

# Health check
curl http://localhost:8080/health

# Add a document
curl -X POST http://localhost:8080/api/documents \
  -H "Content-Type: application/json" \
  -d '{
    "title": "GraphRAG Introduction",
    "content": "GraphRAG combines knowledge graphs with retrieval-augmented generation for enhanced AI systems."
  }'

# Query
curl -X POST http://localhost:8080/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is GraphRAG?",
    "top_k": 5
  }'

Configuration

Set via environment variables:

# Embeddings (choose backend)
export EMBEDDING_BACKEND="ollama"  # or "hash" for fallback
export EMBEDDING_DIM="384"  # 384 for MiniLM, 768 for BERT
export OLLAMA_URL="http://localhost"
export OLLAMA_EMBEDDING_MODEL="nomic-embed-text"  # or "mxbai-embed-large"

# Qdrant connection (optional)
export QDRANT_URL="http://localhost:6334"
export COLLECTION_NAME="graphrag"

# Run server
cargo run --bin graphrag-server --features ollama

Feature Flags

# With Qdrant + Ollama embeddings (recommended for production)
cargo run --bin graphrag-server --features "qdrant,ollama"

# With LanceDB (serverless, embedded)
cargo run --bin graphrag-server --features "lancedb,ollama"

# Minimal (hash-based embeddings, in-memory storage)
cargo run --bin graphrag-server --no-default-features

# With authentication
cargo run --bin graphrag-server --features "qdrant,ollama,auth"

API Endpoints

Health & Info

GET /

API information and available endpoints.

curl http://localhost:8080/

GET /health

Health check with statistics.

curl http://localhost:8080/health

Response:

{
  "status": "healthy",
  "timestamp": "2025-10-01T12:00:00Z",
  "document_count": 42,
  "graph_built": true,
  "total_queries": 1337,
  "backend": "qdrant",
  "embeddings": {
    "backend": "ollama",
    "available": true,
    "stats": {
      "total_requests": 100,
      "ollama_success": 95,
      "ollama_failures": 5,
      "fallback_used": 5
    }
  }
}

Configuration

The server now supports dynamic configuration via JSON REST API, allowing you to initialize the full GraphRAG pipeline without TOML files.

GET /api/config

Get the current configuration.

curl http://localhost:8080/api/config

Response:

{
  "success": true,
  "config": {
    "output_dir": "./output",
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "embeddings": { ... },
    "graph": { ... },
    ...
  },
  "graphrag_initialized": true
}

POST /api/config

Set configuration and initialize the full GraphRAG pipeline.

curl -X POST http://localhost:8080/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "output_dir": "./output",
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "embeddings": {
      "backend": "ollama",
      "dimension": 768,
      "model": "nomic-embed-text",
      "fallback_to_hash": true,
      "batch_size": 32
    },
    "graph": {
      "max_connections": 25,
      "similarity_threshold": 0.75
    },
    "text": {
      "chunk_size": 1000,
      "chunk_overlap": 200,
      "languages": ["en"]
    },
    "entities": {
      "min_confidence": 0.65,
      "entity_types": ["PERSON", "CONCEPT", "LOCATION", "EVENT", "ORGANIZATION"]
    },
    "retrieval": {
      "top_k": 15,
      "search_algorithm": "cosine"
    },
    "parallel": {
      "num_threads": 8,
      "enabled": true,
      "min_batch_size": 10,
      "chunk_batch_size": 100,
      "parallel_embeddings": true,
      "parallel_graph_ops": true,
      "parallel_vector_ops": true
    },
    "ollama": {
      "enabled": true,
      "host": "http://localhost",
      "port": 11434,
      "embedding_model": "nomic-embed-text",
      "chat_model": "llama3.1:8b",
      "timeout_seconds": 300,
      "max_retries": 3,
      "fallback_to_hash": true
    },
    "enhancements": {
      "enabled": true
    }
  }'

GET /api/config/template

Get configuration templates with examples (minimal, ollama_production, high_performance).

curl http://localhost:8080/api/config/template

Response:

{
  "template": { ... },
  "description": "Full GraphRAG configuration template with all options",
  "examples": [
    {
      "name": "minimal",
      "description": "Minimal configuration with hash-based embeddings",
      "config": { ... }
    },
    {
      "name": "ollama_production",
      "description": "Production setup with Ollama LLM and real embeddings",
      "config": { ... }
    },
    {
      "name": "high_performance",
      "description": "Optimized for speed with parallel processing",
      "config": { ... }
    }
  ]
}

GET /api/config/default

Get the default configuration.

curl http://localhost:8080/api/config/default

POST /api/config/validate

Validate configuration without applying it.

curl -X POST http://localhost:8080/api/config/validate \
  -H "Content-Type: application/json" \
  -d '{ ... config object ... }'

Response:

{
  "valid": true,
  "message": "Configuration is valid"
}

Documents

POST /api/documents

Add a document to the knowledge graph.

curl -X POST http://localhost:8080/api/documents \
  -H "Content-Type: application/json" \
  -d '{
    "title": "My Document",
    "content": "Document content here..."
  }'

Response:

{
  "success": true,
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Document added to Qdrant successfully",
  "backend": "qdrant"
}

GET /api/documents

List all documents.

curl http://localhost:8080/api/documents

DELETE /api/documents/:id

Delete a document by ID.

curl -X DELETE http://localhost:8080/api/documents/550e8400-e29b-41d4-a716-446655440000

Query

POST /api/query

Query the knowledge graph with semantic search.

curl -X POST http://localhost:8080/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How does GraphRAG work?",
    "top_k": 5
  }'

Response:

{
  "query": "How does GraphRAG work?",
  "results": [
    {
      "document_id": "doc-1",
      "title": "GraphRAG Overview",
      "similarity": 0.92,
      "excerpt": "GraphRAG combines knowledge graphs with retrieval..."
    }
  ],
  "processing_time_ms": 15,
  "backend": "qdrant"
}

Graph Operations

POST /api/graph/build

Build/rebuild the knowledge graph.

curl -X POST http://localhost:8080/api/graph/build

GET /api/graph/stats

Get graph statistics.

curl http://localhost:8080/api/graph/stats

Response:

{
  "document_count": 42,
  "entity_count": 420,
  "relationship_count": 630,
  "vector_count": 840,
  "graph_built": true,
  "backend": "qdrant"
}

Architecture

With Qdrant (Production)

┌─────────────────┐
│  REST Client    │ (Browser, CLI, etc.)
└────────┬────────┘
         │ HTTP
┌────────▼─────────────────────┐
│   GraphRAG Server            │
│   ┌──────────────────────┐   │
│   │ Actix-web REST API   │   │
│   │ + Apistos OpenAPI    │   │
│   │ + CORS               │   │
│   │ + Tracing            │   │
│   └──────────┬───────────┘   │
│              │                │
│   ┌──────────▼───────────┐   │
│   │ Qdrant Client        │   │
│   │ + Vector Search      │   │
│   │ + Metadata Storage   │   │
│   └──────────┬───────────┘   │
└──────────────┼────────────────┘
               │ gRPC (port 6334)
┌──────────────▼────────────────┐
│   Qdrant Vector Database      │
│   + 100M+ vector capacity     │
│   + JSON payload storage      │
│   + Filtering & search        │
└───────────────────────────────┘

Without Qdrant (Development/Testing)

┌─────────────────┐
│  REST Client    │
└────────┬────────┘
         │ HTTP
┌────────▼─────────────────────┐
│   GraphRAG Server            │
│   ┌──────────────────────┐   │
│   │ Actix-web REST API   │   │
│   │ + Apistos OpenAPI    │   │
│   └──────────┬───────────┘   │
│              │                │
│   ┌──────────▼───────────┐   │
│   │ In-Memory Storage    │   │
│   │ + Vec<Document>      │   │
│   │ + Keyword matching   │   │
│   └──────────────────────┘   │
└───────────────────────────────┘

Qdrant Storage Schema

Collection Configuration

  • Name: graphrag (configurable)
  • Dimension: 384 (MiniLM) or 768 (BERT)
  • Distance: Cosine similarity
  • Indexing: HNSW (Hierarchical Navigable Small World)

Document Payload Structure

Each document in Qdrant stores:

{
  "id": "doc-uuid",
  "title": "Document Title",
  "text": "Full document text",
  "chunk_index": 0,
  "entities": [
    {
      "id": "entity-uuid",
      "name": "Entity Name",
      "entity_type": "Person|Organization|Location",
      "properties": {}
    }
  ],
  "relationships": [
    {
      "source": "entity-1",
      "relation": "WORKS_FOR",
      "target": "entity-2",
      "properties": {}
    }
  ],
  "timestamp": "2025-10-01T12:00:00Z",
  "custom": {}
}

Development

Build

# Development build
cargo build --bin graphrag-server

# Production build with optimizations
cargo build --release --bin graphrag-server

Test

# Unit tests
cargo test --bin graphrag-server

# Integration tests (requires Qdrant running)
docker-compose up -d
cargo test --bin graphrag-server --features qdrant -- --test-threads=1

Run

# Development mode with auto-reload
cargo watch -x 'run --bin graphrag-server'

# Production mode
cargo run --release --bin graphrag-server

TODO

Short Term

  • Real embedding generation (Ollama integrated)
  • OpenAPI 3.0.3 documentation (via Apistos)
  • Swagger UI integration (apistos swagger-ui, served at /swagger)
  • Entity extraction from documents
  • Relationship extraction
  • Batch document upload
  • Pagination for document listing

Medium Term

  • Authentication & authorization (feature temporarily disabled)
  • Rate limiting
  • OpenTelemetry metrics
  • Prometheus endpoint
  • API versioning

Long Term

  • GraphQL API
  • WebSocket support for streaming
  • Multi-tenant support
  • Advanced graph algorithms (PageRank, community detection)
  • LanceDB integration (alternative to Qdrant)

Deployment

Docker

# Coming soon
FROM rust:1.75 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release --bin graphrag-server

FROM debian:bookworm-slim
COPY --from=builder /app/target/release/graphrag-server /usr/local/bin/
EXPOSE 8080
CMD ["graphrag-server"]

Docker Compose (Full Stack)

version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_data:/qdrant/storage

  graphrag-server:
    build: .
    ports:
      - "8080:8080"
    environment:
      - QDRANT_URL=http://qdrant:6334
      - COLLECTION_NAME=graphrag
      - EMBEDDING_DIM=384
    depends_on:
      - qdrant

Performance

Benchmarks (Preliminary)

Hardware: M1 MacBook Pro, 16GB RAM

OperationQdrant BackendIn-Memory
Add document5-10ms<1ms
Query (top 10)10-20ms5-10ms
Build graph (1k docs)~2s~1s
Build graph (10k docs)~15s~8s

Note: Qdrant scales much better for large datasets (100k+ documents).

Troubleshooting

“Could not connect to Qdrant”

Cause: Qdrant not running or wrong URL.

Solution:

# Check Qdrant is running
docker ps | grep qdrant

# Start if not running
docker-compose up -d

# Verify connection
curl http://localhost:6333/healthz

“Collection not found”

Cause: Collection not created.

Solution: Server auto-creates collection on first run. Check logs:

cargo run --bin graphrag-server 2>&1 | grep collection

Slow query performance

Cause: Large dataset without proper indexing.

Solutions:

  1. Ensure HNSW indexing is enabled in Qdrant
  2. Adjust top_k parameter (lower = faster)
  3. Use filters to narrow search space

License

MIT

Credits

  • Qdrant - https://qdrant.tech/
  • Actix-web - https://actix.rs/
  • Apistos - https://github.com/netwo-io/apistos (OpenAPI 3.0.3 documentation)
  • GraphRAG - https://github.com/automataIA/graphrag-rs

Backend Comparison

Qdrant

Best for: Production deployments, cloud environments, microservices

  • ✅ Scales to 100M+ vectors
  • ✅ Distributed deployment support
  • ✅ Advanced filtering and search
  • ✅ Persistent storage with automatic backups
  • Requires separate server (Docker/cloud)

LanceDB

Best for: Desktop apps, native applications, embedded use cases

  • ✅ No server required (embedded)
  • ✅ Zero-copy data access
  • ✅ Automatic versioning
  • ✅ Works offline
  • Single-process access
  • Placeholder implementation (see lancedb_store.rs for integration guide)

In-Memory

Best for: Development, testing, demos

  • ✅ No dependencies
  • ✅ Fast for small datasets
  • Data lost on restart
  • Limited scalability

Embeddings Backends

Best for: Local development, privacy-focused deployments

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull embedding model
ollama pull nomic-embed-text  # 384 dimensions, 274MB
# or
ollama pull mxbai-embed-large  # 1024 dimensions, 670MB

# Start server with Ollama
EMBEDDING_BACKEND=ollama cargo run --bin graphrag-server --features "qdrant,ollama"

Pros:

  • ✅ Real semantic embeddings
  • ✅ Local/private (no API calls)
  • ✅ Multiple model options
  • ✅ Automatic fallback if unavailable

Cons:

  • Requires Ollama service running
  • Slower than hash-based (100-200ms per embedding)

Hash-based Fallback

Best for: Testing, offline environments, minimal dependencies

# Start server with hash embeddings (no Ollama needed)
EMBEDDING_BACKEND=hash cargo run --bin graphrag-server

Pros:

  • ✅ No external dependencies
  • ✅ Fast (<1ms per embedding)
  • ✅ Deterministic
  • ✅ Works offline

Cons:

  • Not semantic (hash-based, not neural)
  • Lower search quality
  • Fixed dimension (384)

Example Workflows

Production Setup (Qdrant + Ollama)

# 1. Start Qdrant
docker-compose up -d

# 2. Start Ollama
ollama serve &
ollama pull nomic-embed-text

# 3. Start GraphRAG server
export EMBEDDING_BACKEND=ollama
export QDRANT_URL=http://localhost:6334
cargo run --release --bin graphrag-server --features "qdrant,ollama"

# 4. Add documents with real embeddings
curl -X POST http://localhost:8080/api/documents \
  -H "Content-Type: application/json" \
  -d '{"title":"AI Safety","content":"AI safety research focuses on..."}'

# 5. Query with semantic search
curl -X POST http://localhost:8080/api/query \
  -H "Content-Type: application/json" \
  -d '{"query":"Tell me about AI safety","top_k":5}'

Desktop App (LanceDB + Ollama)

# 1. Start Ollama
ollama serve &
ollama pull nomic-embed-text

# 2. Start GraphRAG with LanceDB (embedded)
export EMBEDDING_BACKEND=ollama
export LANCEDB_PATH=./data/graphrag.lance
cargo run --release --bin graphrag-server --features "lancedb,ollama"

# No external database needed! Data stored in ./data/

Minimal Setup (Hash embeddings)

# Just run the server - no dependencies!
EMBEDDING_BACKEND=hash cargo run --bin graphrag-server --no-default-features

# Works immediately with hash-based embeddings

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     GraphRAG Server                          │
│                                                               │
│  ┌──────────────┐      ┌──────────────┐                     │
│  │  Embedding   │      │   Storage    │                     │
│  │  Service     │      │   Backend    │                     │
│  │              │      │              │                     │
│  │  - Ollama    │      │  - Qdrant    │                     │
│  │  - Hash      │      │  - LanceDB   │                     │
│  │  Fallback    │      │  - Memory    │                     │
│  └──────────────┘      └──────────────┘                     │
│         │                      │                             │
│         └──────────┬───────────┘                             │
│                    │                                         │
│              ┌─────▼─────┐                                   │
│              │  REST API │                                   │
│              └───────────┘                                   │
└─────────────────────────────────────────────────────────────┘

Performance

Embeddings

  • Ollama (nomic-embed-text): ~100-200ms per document
  • Hash-based: <1ms per document
  • Caching: Automatic with LRU cache
  • Qdrant: <50ms for 1M vectors with HNSW index
  • LanceDB: <100ms for 100K vectors
  • In-memory: <10ms for 10K vectors

Troubleshooting

Ollama not connecting

# Check Ollama is running
curl http://localhost:11434/api/tags

# Check model is available
ollama list | grep nomic-embed-text

# Pull model if missing
ollama pull nomic-embed-text

Qdrant connection failed

# Check Qdrant is running
curl http://localhost:6333/

# Check Docker container
docker ps | grep qdrant

# Restart Qdrant
docker-compose restart

Slow embedding generation

# Use smaller model
ollama pull nomic-embed-text  # 384 dim, faster

# Or use hash fallback for testing
export EMBEDDING_BACKEND=hash

Migration to Actix-web + Apistos

What Changed?

Previous Stack:

  • Web Framework: Axum 0.8
  • Documentation: Manual/external tools

Current Stack:

  • Web Framework: Actix-web 4.9 (high-performance, production-ready)
  • Documentation: Apistos 0.6 (automatic OpenAPI 3.0.3 generation)
  • API Schema: Automatically generated from Rust types

Benefits

  1. Automatic API Documentation: OpenAPI 3.0.3 spec generated directly from code
  2. Type-Safe Schemas: Request/response models automatically documented via #[derive(JsonSchema, ApiComponent)]
  3. Production-Ready: Actix-web is battle-tested in high-traffic production environments
  4. Better Error Handling: Structured error responses with OpenAPI documentation

Breaking Changes

None! All API endpoints remain identical. Clients don’t need any changes.

Temporary Limitations

  • Authentication feature disabled: The auth feature requires middleware migration and is temporarily unavailable. Will be re-enabled in a future update.
  • Swagger UI setup incomplete: Basic OpenAPI spec is generated, but interactive Swagger UI is not yet fully configured (coming soon).

Developer Notes

When adding new endpoints:

#![allow(unused)]
fn main() {
use apistos::api_operation;
use apistos_gen::ApiErrorComponent;
use schemars::JsonSchema;

// Annotate request/response models
#[derive(Serialize, Deserialize, JsonSchema, ApiComponent)]
pub struct MyRequest {
    #[schemars(example = "example_value")]
    pub field: String,
}

// Annotate handlers
#[api_operation(
    tag = "my_tag",
    summary = "Short description",
    description = "Detailed description",
    error_code = 400,
    error_code = 500
)]
async fn my_handler(
    state: Data<AppState>,
    body: Json<MyRequest>,
) -> Result<Json<MyResponse>, ApiError> {
    // Handler logic
}

// Register with Apistos routing
.service(
    scope("/api/my-endpoint")
        .service(resource("").route(post().to(my_handler)))
)
}

License

See LICENSE in the root directory.