GraphRAG-RS

GraphRAG Network Visualization

GraphRAG-RS is a modular, portable GraphRAG implementation written in Rust. It builds a knowledge graph from your documents — chunking, embeddings, entity and relationship extraction, community detection — and answers questions over that graph with citations.

The same core library runs natively and in the browser via WebAssembly, with a config-driven pipeline that scales from a zero-dependency pattern matcher to a full LLM-enriched extraction stack.

Why GraphRAG-RS

  • One library, three personalities. Pattern-only (no LLM, < 10 ms/chunk), LLM + KV-cache enrichment (Ollama), or a hybrid — selected at runtime from Config, not at compile time.
  • Native + WASM. graphrag-core is crate-type = ["rlib", "cdylib"]; the browser build uses a Voy vector store.
  • Turnkey. cargo run -p graphrag-cli -- index ./docs.txt then ask "..." — zero config to start.
  • Modular crates. Use the core library, the TUI/CLI, the REST server, or the WASM bindings.

Where to go next

| If you want to… | Start here |
|---|---|
| Install and run your first query | Installation & Quick Start |
| Understand the pipeline | How It Works |
| Configure extraction & models | Configuration Guide |
| Browse the API | docs.rs/graphrag-core |

Source: github.com/automataIA/graphrag-rs

Overview

GraphRAG-RS is a 5-crate Cargo workspace. You pick the entry point that fits your deployment.

| Crate | Role |
|---|---|
| graphrag-core | Core library: all GraphRAG logic. Native + WASM (rlib + cdylib). |
| graphrag-cli | Turnkey TUI + CLI binary. In-process use of the core (no HTTP). |
| graphrag-server | Actix-web REST API with OpenAPI + optional Qdrant. |
| graphrag-wasm | Browser bindings (Voy vector store, WebLLM, ONNX). |
| graphrag | Wrapper meta-crate that re-exports graphrag-core for the hello-world experience. |

The config-driven pipeline

The same code runs three ways, selected at runtime from Config — not at compile time:

  • Pattern-only — no LLM, regex extractor, < 10 ms per chunk. Config::default() works offline via hash-fallback embeddings.
  • LLM-enriched — Ollama with KV-cache reuse (keep_alive + dynamic num_ctx) for higher-quality entity and relationship extraction.
  • Hybrid — selective LLM stages over a fast base pipeline.

See How It Works for the full 7-stage pipeline.

Deployment options

  • Server — multi-tenant, GPU workloads, large corpora. Qdrant + Ollama. See graphrag-server.
  • WASM (client-side) — privacy-first, offline, zero infrastructure. Full pipeline in the browser with ONNX embeddings and WebLLM synthesis. See graphrag-wasm.
  • Embedded library — call graphrag-core directly from your Rust app.

Prerequisites

  • Rust 1.85+ (add the wasm32-unknown-unknown target for WASM builds).
  • Ollama (optional) for LLM-quality extraction / real embeddings: ollama pull nomic-embed-text.
  • Docker (optional) for the Qdrant vector database.

Continue to Installation & Quick Start.

Installation & Quick Start

CLI (turnkey, zero config)

cargo install --path graphrag-cli           # one-time install
graphrag index ./mydoc.txt                  # builds ./graphrag-data
graphrag ask "What is the main topic?"      # answers from the graph

Add --ollama to either command for LLM-quality entity extraction (requires ollama serve running locally). With no flags, the CLI uses sensible defaults — hash-fallback embeddings, pattern-based extraction, and a persistent workspace.

Run graphrag with no arguments for the interactive TUI, or graphrag setup for the config wizard. See CLI & TUI Usage.

Library (Rust)

use graphrag::GraphRAG;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut g = GraphRAG::quick_start("Plato's Symposium full text here...").await?;
    println!("{}", g.ask("Who is Diotima?").await?);
    Ok(())
}

For more control, use GraphRAG::builder() or Config::quick(workspace) with .with_ollama() / .with_chunk_size(). The typical flow is:

  1. Config::quick(workspace) or GraphRAG::builder().
  2. add_document(doc) → build_graph() (chunking → embeddings → entities → relationships → persist).
  3. ask(q) / ask_explained(q) / ask_with_reasoning(q).

System dependencies

| Platform | Install |
|---|---|
| Linux (Debian/Ubuntu) | sudo apt install -y build-essential pkg-config |
| macOS | xcode-select --install |
| Windows | Visual Studio Build Tools with C++ support |

For WASM builds: rustup target add wasm32-unknown-unknown and cargo install trunk wasm-bindgen-cli.

Optional services

ollama pull nomic-embed-text     # local embeddings / LLM extraction
docker-compose up -d             # Qdrant vector database (server mode)

Next: understand the pipeline or tune the configuration.

CLI & TUI Usage

See docs/TUI_USAGE_GUIDE.md for the full CLI and TUI usage guide.

How GraphRAG Works: A Complete Guide

Understanding the 7-Stage Pipeline from Document to Answer


What is GraphRAG?

GraphRAG (Graph-based Retrieval-Augmented Generation) transforms unstructured text into a knowledge graph and uses that graph to answer questions with more context than chunk-only retrieval provides.

Think of it like this:

Imagine a brilliant librarian who:

  1. Reads every book in the library
  2. Creates an interconnected index of people, places, concepts, and their relationships
  3. When you ask a question, uses this knowledge map to find relevant information
  4. Combines multiple sources to give you a comprehensive, contextual answer

That is what GraphRAG does, automatically and at machine scale.

Why GraphRAG vs Traditional RAG?

| Feature | Traditional RAG | GraphRAG |
|---|---|---|
| Knowledge Storage | Flat vector chunks | Interconnected knowledge graph |
| Context Understanding | Semantic similarity only | Relationships + concepts + hierarchy |
| Multi-hop Reasoning | ❌ Limited | ✅ Natural via graph traversal |
| Token Efficiency | Baseline | 6000x reduction (LightRAG) |
| Accuracy | Good | 15% better (empirical studies) |

Configuration-Driven Dynamic Pipeline

GraphRAG-rs adapts its behavior based on your TOML configuration - the same codebase can run as:

  • Fast, lightweight system (pattern-based, no LLM, <10ms processing)
  • High-accuracy AI system (LLM-based, gleaning, contextual extraction)
  • Hybrid approach (selective LLM use for critical stages)

All controlled by simple TOML settings - no code changes required.

How Configuration Changes the Pipeline

# Example 1: Fast, No-LLM Pipeline
[entity_extraction]
use_gleaning = false          # ← Pattern-based extraction

[ollama]
enabled = false               # ← No LLM required

# Result: <10ms entity extraction, good quality

# Example 2: High-Accuracy LLM Pipeline
[entity_extraction]
use_gleaning = true           # ← LLM-based extraction
max_gleaning_rounds = 4       # ← 4 refinement passes

[ollama]
enabled = true
chat_model = "llama3.1:8b"    # ← AI-powered extraction

# Result: 200-500ms entity extraction, excellent quality

Dynamic Stage Selection

The system automatically selects implementations based on config:

| Stage | Config Setting | Implementation | Performance |
|---|---|---|---|
| Text Chunking | chunk_size, chunk_overlap | Fixed/Adaptive | Always fast |
| Embeddings | embeddings.backend | Hash/Ollama/ONNX | Varies |
| Entity Extraction | use_gleaning + ollama.enabled | Pattern/LLM | 10ms vs 500ms |
| Relationships | extract_relationships | Pattern/LLM | Auto-selected |
| Retrieval | retrieval.strategy | Vector/BM25/Hybrid/PageRank | Varies |
| Generation | generation.backend | Mock/Ollama/WebLLM | Varies |

Logged during startup:

[INFO] Configuration loaded from: symposium_config.toml
[INFO] Using LLM-based entity extraction with gleaning (max_rounds: 4)
[INFO] Using Ollama embeddings: nomic-embed-text (768 dimensions)
[INFO] Using hybrid retrieval: vector (40%) + bm25 (30%) + pagerank (30%)
[INFO] Using Ollama generation: llama3.1:8b

Three Pipeline Approaches: Choose Your Strategy

GraphRAG-rs offers three distinct pipeline approaches, each optimized for different use cases and resource constraints. This approach-based architecture lets you explicitly choose your quality vs. speed trade-off.

The Three Approaches

┌─────────────────┬──────────────────┬─────────────────┐
│    SEMANTIC     │   ALGORITHMIC    │     HYBRID      │
│                 │                  │                 │
│  Neural/LLM     │  Pattern-based   │  Best of Both   │
│  High Quality   │  High Speed      │  Balanced       │
│  GPU Preferred  │  CPU Only        │  Moderate GPU   │
└─────────────────┴──────────────────┴─────────────────┘

1. Semantic Pipeline (Neural/LLM-based)

Philosophy: Use deep learning and LLMs for maximum understanding and quality.

Technology Stack:

  • Embeddings: Neural models (HuggingFace, OpenAI, Ollama)
  • Entity Extraction: LLM-based with gleaning (iterative refinement)
  • Retrieval: Vector similarity search (cosine similarity, HNSW)
  • Graph Construction: Semantic relationships with PageRank

Configuration:

[mode]
approach = "semantic"

[semantic.embeddings]
backend = "huggingface"
model_name = "sentence-transformers/all-MiniLM-L6-v2"

[semantic.entity_extraction]
use_gleaning = true
max_gleaning_rounds = 3
llm_model = "llama3.1:8b"

[semantic.retrieval]
strategy = "vector_similarity"
use_hnsw_index = true

Performance:

  • Quality: ★★★★★ (90-95% accuracy)
  • Speed: ★★☆☆☆ (100-500 docs/sec)
  • Resource: ★★★★★ (High: 4-8GB, GPU recommended)

Best For: Research papers, legal documents, philosophical texts, narrative fiction, nuanced content analysis.


2. Algorithmic Pipeline (Pattern-based)

Philosophy: Use traditional NLP and pattern matching for speed and deterministic behavior.

Technology Stack:

  • Embeddings: Hash-based with TF-IDF weighting
  • Entity Extraction: Pattern matching (regex, capitalization rules)
  • Retrieval: BM25 keyword-based retrieval
  • Graph Construction: Co-occurrence based relationships

Configuration:

[mode]
approach = "algorithmic"

[algorithmic.embeddings]
backend = "hash"
hash_size = 1024
use_tfidf_weighting = true

[algorithmic.entity_extraction]
use_gleaning = false
use_patterns = true
extract_capitalized = true

[algorithmic.retrieval]
strategy = "bm25"
bm25_k1 = 1.5
bm25_b = 0.75
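
The bm25_k1 and bm25_b settings above are the two standard BM25 parameters: k1 controls how quickly repeated terms saturate, and b controls how strongly long documents are penalized. The following is a minimal sketch of the textbook per-term formula, not the crate's own retrieval code; the function name `bm25_term_score` and its parameter list are hypothetical.

```rust
/// BM25 contribution of a single query term to one document's score.
/// Hypothetical helper for illustration; not part of graphrag-core.
fn bm25_term_score(
    tf: f64,          // occurrences of the term in the document
    doc_len: f64,     // document length in tokens
    avg_doc_len: f64, // average document length in the corpus
    n_docs: f64,      // number of documents in the corpus
    doc_freq: f64,    // number of documents containing the term
    k1: f64,
    b: f64,
) -> f64 {
    // Smoothed inverse document frequency; the "+ 1.0" keeps it non-negative.
    let idf = ((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0).ln();
    // Length normalization: b = 0 disables it, b = 1 applies it fully.
    let length_norm = 1.0 - b + b * (doc_len / avg_doc_len);
    idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
}

fn main() {
    // Parameter values taken from the config above: bm25_k1 = 1.5, bm25_b = 0.75.
    let score = bm25_term_score(3.0, 120.0, 100.0, 1000.0, 10.0, 1.5, 0.75);
    println!("BM25 term score: {score:.4}");
}
```

A document's full score for a query is the sum of this value over the query's terms.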

Performance:

  • Quality: ★★★☆☆ (70-85% accuracy)
  • Speed: ★★★★★ (1000-5000 docs/sec)
  • Resource: ★☆☆☆☆ (Low: 1-2GB, CPU only)

Best For: Large-scale processing, resource-constrained environments, real-time applications, structured data, privacy-sensitive systems (no external APIs).


3. Hybrid Pipeline (Combined)

Philosophy: Combine semantic and algorithmic approaches for balanced quality and performance.

Technology Stack:

  • Embeddings: Dual (neural + hash-based)
  • Entity Extraction: LLM + pattern fusion
  • Retrieval: RRF (Reciprocal Rank Fusion) combining vector + BM25
  • Graph Construction: Cross-validated relationships

Configuration:

[mode]
approach = "hybrid"

[hybrid.weights]
semantic_weight = 0.6
algorithmic_weight = 0.4

[hybrid.embeddings]
primary_backend = "huggingface"
secondary_backend = "hash"
fusion_strategy = "weighted"

[hybrid.entity_extraction]
use_gleaning = true
use_patterns = true
max_gleaning_rounds = 2

[hybrid.retrieval]
fusion_strategy = "rrf"
rrf_k = 60

Performance:

  • Quality: ★★★★☆ (85-95% accuracy)
  • Speed: ★★★☆☆ (200-1000 docs/sec)
  • Resource: ★★★☆☆ (Medium: 3-4GB, moderate GPU)

Best For: Production systems, diverse query workloads, mixed document types, applications requiring both quality and efficiency.


How Approach Selection Works

The [mode] section in your TOML config controls the entire pipeline:

# Option 1: Semantic (high quality)
[mode]
approach = "semantic"

# Option 2: Algorithmic (high speed)
[mode]
approach = "algorithmic"

# Option 3: Hybrid (balanced)
[mode]
approach = "hybrid"

This single setting automatically configures:

  • Which embedding implementation to use
  • Whether to use LLM-based or pattern-based entity extraction
  • Which retrieval strategy to employ
  • How to construct graph relationships

Dynamic Pipeline Selection at Runtime:

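// In src/lib.rs:346 - build_graph() method (simplified excerpt)
// The system checks config.approach and selects implementations.
// Every arm yields the extracted entities so the match is a valid expression.
let entities = match config.approach.as_str() {
    "semantic" => {
        // Use LLM-based gleaning extraction
        if config.entities.use_gleaning && config.ollama.enabled {
            extract_entities_with_gleaning()
        } else {
            // Assumed here: fail rather than silently fall back to patterns
            return Err("semantic approach requires use_gleaning and ollama.enabled".into());
        }
    }
    "algorithmic" => {
        // Use pattern-based extraction
        extract_entities_with_patterns()
    }
    "hybrid" => {
        // Use both and fuse results
        let llm_entities = extract_entities_with_gleaning();
        let pattern_entities = extract_entities_with_patterns();
        fuse_entity_results(llm_entities, pattern_entities)
    }
    // A &str match needs a catch-all arm
    other => return Err(format!("unknown approach: {other}").into()),
};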

Approach Comparison Matrix

| Aspect | Semantic | Algorithmic | Hybrid |
|---|---|---|---|
| Entity Extraction | LLM + gleaning (3-4 rounds) | Regex + capitalization | LLM + patterns (2 rounds) |
| Embeddings | Neural (HuggingFace/Ollama) | Hash + TF-IDF | Dual (neural + hash) |
| Retrieval | Vector similarity (HNSW) | BM25 keyword search | RRF fusion |
| Graph Relationships | Semantic similarity | Co-occurrence | Cross-validated |
| Processing Time | 500ms-1s per doc | 10-50ms per doc | 100-300ms per doc |
| Memory Usage | 4-8GB | 1-2GB | 3-4GB |
| GPU Required | Recommended | No | Optional |
| LLM Required | Yes (Ollama/OpenAI) | No | Yes (with fallback) |
| Accuracy | 90-95% | 70-85% | 85-95% |
| Best Use Case | Research, legal, literature | Large-scale, real-time | Production, general-purpose |

Quick Start by Approach

Semantic Pipeline:

cp config/templates/semantic_pipeline.toml my_config.toml
# Edit paths in my_config.toml
cargo run --example your_example -- my_config.toml

Algorithmic Pipeline:

cp config/templates/algorithmic_pipeline.toml my_config.toml
# Edit paths in my_config.toml
cargo run --example your_example -- my_config.toml
# No Ollama required!

Hybrid Pipeline:

cp config/templates/hybrid_pipeline.toml my_config.toml
# Edit paths in my_config.toml
cargo run --example your_example -- my_config.toml

For a detailed configuration guide, see CONFIGURATION_GUIDE.md.


LazyGraphRAG & E2GraphRAG: Ultra-Efficient Approaches

New in 2025: two approaches that index at roughly 0.05-0.1% of the traditional cost while retaining about 88-92% quality.

Overview: Cost-Optimized GraphRAG

These cutting-edge implementations eliminate expensive LLM-based entity extraction during indexing:

┌──────────────────┬─────────────────┬────────────────┐
│  Traditional     │  LazyGraphRAG   │  E2GraphRAG    │
│  GraphRAG        │                 │                │
│                  │                 │                │
│  LLM-based       │  Concept-based  │  Pattern-based │
│  High Cost       │  0.1% Cost      │  0.05% Cost    │
│  95% Quality     │  92% Quality    │  88% Quality   │
└──────────────────┴─────────────────┴────────────────┘

LazyGraphRAG (Microsoft Research, 2024)

Philosophy: Zero LLM for indexing, concept graph from co-occurrence, iterative deepening for queries.

Key Features:

  • No LLM Calls During Indexing: Uses noun phrase extraction
  • 1000x Cheaper Indexing: $0.10 vs $100 per 1M tokens
  • 100x Faster Indexing: 1000 docs/sec vs 10 docs/sec
  • 700x Cheaper Queries: $0.0014 vs $1.00 per query
  • 92% Quality: Acceptable trade-off for massive cost savings

Technology Stack:

  • Concept Extraction: Regex-based noun phrases (no LLM)
  • Graph Construction: Co-occurrence with Jaccard similarity
  • Indexing: Bidirectional entity-chunk index (O(1) lookups)
  • Query Processing: Iterative deepening search
  • Refinement: Query expansion via concept graph traversal
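
The graph construction step above weights concept pairs by Jaccard similarity over the chunks they share. A minimal sketch of that idea follows, assuming each concept is represented by the set of chunk ids it appears in; the function names and the `u32` chunk-id type are hypothetical, and the threshold mirrors the `co_occurrence_threshold` setting.

```rust
use std::collections::HashSet;

/// Jaccard similarity between the sets of chunk ids in which two concepts appear.
fn jaccard(a: &HashSet<u32>, b: &HashSet<u32>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0; // two empty sets: define similarity as 0
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Returns an edge weight only if the concepts co-occur in at least `threshold` chunks.
fn co_occurrence_edge(a: &HashSet<u32>, b: &HashSet<u32>, threshold: usize) -> Option<f64> {
    if a.intersection(b).count() >= threshold {
        Some(jaccard(a, b))
    } else {
        None
    }
}

fn main() {
    let machine_learning: HashSet<u32> = HashSet::from([1, 2, 3]);
    let neural_networks: HashSet<u32> = HashSet::from([2, 3, 4]);
    // Two shared chunks out of four distinct chunks.
    println!("{:?}", co_occurrence_edge(&machine_learning, &neural_networks, 1));
}
```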

Configuration:

[experimental]
lazy_graphrag = true

[experimental.lazy_graphrag_config]
use_concept_extraction = true
min_concept_length = 3
max_concept_words = 5
co_occurrence_threshold = 1
use_query_refinement = true
max_refinement_iterations = 3
use_bidirectional_index = true

Performance:

  • Quality: ★★★★☆ (92% accuracy) | Speed: ★★★★★ (1000 docs/sec)
  • Cost: ★★★★★ (0.1% of traditional) | Resource: ★☆☆☆☆ (200MB RAM)

Example:

use graphrag_core::lightrag::LazyGraphRAGPipeline;

let mut pipeline = LazyGraphRAGPipeline::default();
pipeline.index_document("doc1", "Machine Learning transforms AI...");
pipeline.build_graph(); // Fast, no LLM!

let results = pipeline.query("machine learning applications");
println!("Found {} chunks", results.chunk_count());

E2GraphRAG (2025)

Philosophy: Pattern-based entity extraction, no LLM required, deterministic output.

Key Features:

  • 100x Faster Entity Extraction: 5ms vs 500ms per chunk
  • 2000x Cheaper: $0.05 per 1M tokens
  • Deterministic: Fully reproducible results

Configuration:

[experimental]
e2_graphrag = true

[experimental.e2_graphrag_config]
use_lightweight_ner = true
entity_types = ["PERSON", "ORGANIZATION", "LOCATION", "CONCEPT"]
use_capitalization_detection = true
use_noun_phrase_extraction = true

Cost Comparison

| Approach | Indexing Cost (per 1M tokens) | Query Cost | Speed | Quality |
|---|---|---|---|---|
| Traditional | $100 | $1.00/query | 10 docs/sec | 95% |
| LazyGraphRAG | $0.10 | $0.0014/query | 1000 docs/sec | 92% |
| E2GraphRAG | $0.05 | $0.001/query | 2000 docs/sec | 88% |

ROI Example (1M docs, 10k queries/month):

  • Traditional: $220k/year
  • LazyGraphRAG: $268/year (820x cheaper!)
  • E2GraphRAG: $170/year (1300x cheaper!)
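
The yearly figures above can be reproduced from the per-token and per-query prices as one indexing pass plus twelve months of queries. The sketch below assumes an average of 1,000 tokens per document; that value is inferred from the totals rather than stated in the source, and `yearly_cost` is a hypothetical helper.

```rust
/// Yearly cost in dollars: one indexing pass plus twelve months of queries.
fn yearly_cost(
    docs: f64,
    tokens_per_doc: f64, // assumption: not given in the source
    index_cost_per_million_tokens: f64,
    queries_per_month: f64,
    cost_per_query: f64,
) -> f64 {
    let indexing = docs * tokens_per_doc / 1_000_000.0 * index_cost_per_million_tokens;
    let queries = queries_per_month * 12.0 * cost_per_query;
    indexing + queries
}

fn main() {
    let docs = 1_000_000.0;
    let tokens_per_doc = 1_000.0; // inferred so that the totals match
    let queries = 10_000.0;
    let traditional = yearly_cost(docs, tokens_per_doc, 100.0, queries, 1.0);
    let lazy = yearly_cost(docs, tokens_per_doc, 0.10, queries, 0.0014);
    let e2 = yearly_cost(docs, tokens_per_doc, 0.05, queries, 0.001);
    println!("traditional: ${traditional:.0}, lazy: ${lazy:.0}, e2: ${e2:.0}");
    println!("lazy is {:.0}x cheaper, e2 is {:.0}x cheaper", traditional / lazy, traditional / e2);
}
```

Under that assumption, indexing dominates the traditional bill ($100k of $220k), while for the two lightweight approaches almost all of the cost is query traffic.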

For complete documentation, see docs/LAZYGRAPHRAG_E2GRAPHRAG.md.


The 7-Stage Pipeline

GraphRAG-rs processes documents through 7 interconnected stages, transforming raw text into intelligent, queryable knowledge. Let’s explore each stage with a real example using The Adventures of Tom Sawyer.

flowchart TB
    Input[Raw Document<br/>434,401 characters] --> Stage1

    subgraph Pipeline ["GraphRAG 7-Stage Pipeline"]
        Stage1[Stage 1: Text Chunking<br/>Break into 492 chunks]
        Stage2[Stage 2: Embeddings<br/>Generate 384-dim vectors]
        Stage3[Stage 3: Entity Extraction<br/>Find 429 entities]
        Stage4[Stage 4: Graph Construction<br/>Build knowledge graph]
        Stage5[Stage 5: Dual-Level Retrieval<br/>Smart search]
        Stage6[Stage 6: Query Processing<br/>Understand question]
        Stage7[Stage 7: Answer Generation<br/>Compose response]

        Stage1 --> Stage2
        Stage2 --> Stage3
        Stage3 --> Stage4
        Stage4 --> Stage5

        Query[User Query] --> Stage6
        Stage6 --> Stage5
        Stage5 --> Stage7
    end

    Stage7 --> Output[✅ Final Answer<br/>with sources]

    style Stage1 fill:#e1f5ff
    style Stage2 fill:#fff4e6
    style Stage3 fill:#f3e5f5
    style Stage4 fill:#e8f5e9
    style Stage5 fill:#fff9c4
    style Stage6 fill:#fce4ec
    style Stage7 fill:#e0f2f1

Stage 1: Text Chunking

What it does: Divides long documents into overlapping, semantically meaningful segments.

Why: LLMs have token limits (typically 4K-32K tokens). Chunking allows processing of arbitrarily large documents while preserving local context through overlap.

Process Details

Input:

"Tom!" No answer. "TOM!" No answer. "What's gone with that boy, I wonder?
You TOM!" No answer. The old lady pulled her spectacles down and looked
over them about the room; then she put them up and looked out under them...

Configuration (from config/templates/narrative_fiction.toml):

chunk_size = 800        # ~200 words
chunk_overlap = 200     # 50 words overlap

Output: 492 overlapping chunks

Chunk 1: "Tom! No answer. TOM! No answer. What's gone..."  [800 chars]
Chunk 2: "...What's gone with that boy, I wonder? You TOM!..." [800 chars, 200 overlap]
Chunk 3: "...You TOM! No answer. The old lady pulled her..." [800 chars, 200 overlap]
...
Chunk 492: "...the end of Tom Sawyer's adventures." [final chunk]

Why Overlap Matters

Without Overlap (❌ Context Loss):

Chunk A: "...Tom found the treasure under the"
Chunk B: "cross marked on the old tree..."
❌ Entity "treasure under the cross" split across chunks

With 200-char Overlap (✅ Preserved):

Chunk A: "...Tom found the treasure under the cross marked on..."
Chunk B: "...treasure under the cross marked on the old tree..."
✅ Complete entity captured in both chunks
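
The sliding-window idea behind chunk_size and chunk_overlap can be sketched in a few lines. This is a simplified, character-based version with a hypothetical function name; the chunker in src/text/chunking.rs may also respect sentence or word boundaries, which this sketch ignores.

```rust
/// Split `text` into chunks of at most `chunk_size` characters, where each
/// chunk starts `chunk_size - overlap` characters after the previous one.
fn chunk_with_overlap(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0 && overlap < chunk_size, "overlap must be smaller than chunk_size");
    // Work on chars, not bytes, so multi-byte characters are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}

fn main() {
    let text = "Tom found the treasure under the cross marked on the old tree";
    for (i, chunk) in chunk_with_overlap(text, 30, 10).iter().enumerate() {
        println!("Chunk {}: {:?}", i + 1, chunk);
    }
}
```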

Module: src/text/chunking.rs

Performance: ~0.01s for the 434KB document


Stage 2: Embeddings Generation

What it does: Converts text chunks into high-dimensional numerical vectors that capture semantic meaning.

Why: Computers can’t understand text directly. Embeddings transform words into numbers while preserving meaning relationships (e.g., “king - man + woman ≈ queen”).
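
How close two embeddings are is commonly measured with cosine similarity. The sketch below shows the computation on short toy vectors; real embeddings have hundreds of dimensions, and the function here is a generic illustration rather than the crate's own implementation.

```rust
/// Cosine similarity: dot product divided by the product of the L2 norms.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // a zero vector has no direction
    }
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy 4-dimensional vectors standing in for real embeddings.
    let treasure = [0.23, -0.45, 0.67, 0.12];
    let gold = [0.21, -0.42, 0.69, 0.14];
    let weather = [-0.67, 0.23, -0.12, 0.45];
    println!("treasure vs gold:    {:.3}", cosine_similarity(&treasure, &gold));
    println!("treasure vs weather: {:.3}", cosine_similarity(&treasure, &weather));
}
```

Values near 1 mean the vectors point in almost the same direction; values near 0 or below mean the texts are unrelated.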

The Vector Space

Each chunk becomes a 384-dimensional vector where similar meanings cluster together:

"Tom and Huck found treasure" → [0.23, -0.45, 0.67, ..., 0.12] (384 numbers)
"The boys discovered gold"    → [0.21, -0.42, 0.69, ..., 0.14] (close!)
"The weather was sunny"       → [-0.67, 0.23, -0.12, ..., 0.45] (far away)

Embedding Backends

GraphRAG-rs supports multiple embedding strategies:

| Backend | Performance | Use Case | Implementation |
|---|---|---|---|
| Ollama (nomic-embed-text) | 100-200ms/chunk | Production semantic search | src/ollama/embeddings.rs |
| ONNX Runtime Web | 3-8ms/chunk (GPU) | WASM browser deployment | graphrag-wasm/src/onnx_embedder.rs |
| Hash-based (TF) | <1ms/chunk | Testing, offline, no dependencies | src/embeddings/hash_embedder.rs |
| Candle (planned) | 50-100ms/chunk | 100% Rust, CPU-only | Future |
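
To make the hash-based row concrete, here is a minimal sketch of the hashing trick: each token is hashed into one of a fixed number of buckets and the resulting term-frequency vector is L2-normalized. It is a generic illustration using the standard library's `DefaultHasher`, not the code in src/embeddings/hash_embedder.rs, and `hash_embed` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashing-trick embedding: every lowercased token increments one of `dim`
/// buckets, then the vector is scaled to unit L2 norm.
fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    assert!(dim > 0, "dimension must be positive");
    let mut vector = vec![0.0f32; dim];
    for token in text.split_whitespace() {
        let mut hasher = DefaultHasher::new();
        token.to_lowercase().hash(&mut hasher);
        let bucket = (hasher.finish() % dim as u64) as usize;
        vector[bucket] += 1.0; // raw term frequency
    }
    let norm = vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut vector {
            *x /= norm;
        }
    }
    vector
}

fn main() {
    let embedding = hash_embed("Tom found the treasure in the cave", 1024);
    let non_zero = embedding.iter().filter(|x| **x != 0.0).count();
    println!("{} dimensions, {} non-zero", embedding.len(), non_zero);
}
```

Such vectors capture shared vocabulary only, not meaning, which is why this backend suits testing and offline use rather than semantic search.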

Real Example Output

// From examples/real_ollama_pipeline.rs
let embedding = embedder.generate_embedding_async(
    "Tom found the treasure in the cave"
).await?;

// Result: Vec<f32> with 384 dimensions
// [0.234, -0.456, 0.678, 0.123, ..., -0.234]
// L2 norm: ~1.0 (normalized)

Module: src/embeddings/neural/mod.rs

Performance:

  • Ollama: ~100-200ms per chunk (5-10 chunks/sec)
  • ONNX GPU: ~3-8ms per chunk (125-333 chunks/sec, 25-40x faster)

Stage 3: Entity Extraction

What it does: Identifies and extracts named entities (people, places, concepts, events) and their relationships from each chunk.

Why: Entities are the nodes of our knowledge graph. Without them, we’d just have disconnected chunks of text.

Dynamic Pipeline Configuration

GraphRAG-rs now adapts Stage 3 based on your TOML configuration. The system automatically chooses the optimal extraction method:

# Configuration controls the pipeline behavior
[entity_extraction]
use_gleaning = true           # ← If TRUE: LLM-based extraction
                              #    If FALSE: Pattern-based extraction
max_gleaning_rounds = 4       # ← Number of refinement passes

[ollama]
enabled = true                # ← Must be TRUE for LLM extraction
chat_model = "llama3.1:8b"    # ← LLM model for extraction

The pipeline dynamically selects:

| Config Setting | Pipeline Behavior | Performance | Quality |
|---|---|---|---|
| use_gleaning = false | Pattern-Based (regex + capitalization) | <10ms/chunk | ★★★ Good |
| use_gleaning = true + ollama.enabled = true | LLM-Based (gleaning with Ollama) | 200-500ms/chunk | ★★★★★ Excellent |
| use_gleaning = true + ollama.enabled = false | ❌ Error | - | N/A |
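
The three rows of the table map naturally onto a single match over the two flags. The sketch below illustrates that decision with a hypothetical `ExtractionMode` enum and error message; the actual types and wording in graphrag-core may differ.

```rust
/// Hypothetical type for illustration; not the crate's own API.
#[derive(Debug, PartialEq)]
enum ExtractionMode {
    PatternBased,
    LlmGleaning { max_rounds: u32 },
}

/// Mirror of the table: gleaning needs Ollama, everything else uses patterns.
fn select_extraction_mode(
    use_gleaning: bool,
    ollama_enabled: bool,
    max_gleaning_rounds: u32,
) -> Result<ExtractionMode, String> {
    match (use_gleaning, ollama_enabled) {
        (false, _) => Ok(ExtractionMode::PatternBased),
        (true, true) => Ok(ExtractionMode::LlmGleaning { max_rounds: max_gleaning_rounds }),
        (true, false) => Err("use_gleaning = true requires ollama.enabled = true".to_string()),
    }
}

fn main() {
    println!("{:?}", select_extraction_mode(true, true, 4));
    println!("{:?}", select_extraction_mode(false, false, 4));
    println!("{:?}", select_extraction_mode(true, false, 4));
}
```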

Logged Output:

[INFO] Using LLM-based entity extraction with gleaning (max_rounds: 4)
  ✓ Ollama client initialized
  ✓ Model: llama3.1:8b
  ✓ Entity types: PERSON, CONCEPT, ARGUMENT, LOCATION, ...

or

[INFO] Using pattern-based entity extraction
  ✓ Fast regex-based extraction
  ✓ No LLM required

Entity Types

GraphRAG recognizes these entity categories (fully customizable via config):

PERSON    → "Tom Sawyer", "Huckleberry Finn", "Aunt Polly"
LOCATION  → "Mississippi River", "St. Petersburg", "McDougal's Cave"
CONCEPT   → "treasure hunting", "freedom", "childhood innocence"
EVENT     → "witnessing the murder", "finding the treasure", "trial scene"

Customize via TOML:

[pipeline.entity_extraction]
entity_types = [
    "PERSON",                 # Your custom types!
    "CONCEPT",
    "ARGUMENT",
    "MYTHOLOGICAL_REFERENCE"  # ← Philosophical texts
]

Extraction Methods (Config-Driven)

A. Pattern-Based (Fast, Deterministic)

// Enabled when: use_gleaning = false
// src/entity/mod.rs - Regex + capitalization
Keywords: ["Tom Sawyer", "Huck", "treasure", "cave"]
Performance: <10ms per chunk
Found: 189 entities in Symposium, 429 in Tom Sawyer
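
The capitalization half of pattern-based extraction can be approximated without any regex: treat each run of consecutive capitalized words as one candidate entity. This is a simplified sketch with a hypothetical function name; it does not reproduce src/entity/mod.rs, and a real extractor would also filter sentence-initial words such as "The" and assign entity types.

```rust
/// Collect runs of consecutive capitalized words as candidate entity names.
fn extract_capitalized(text: &str) -> Vec<String> {
    let mut entities = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for raw in text.split_whitespace() {
        // Strip surrounding punctuation such as quotes, commas, and periods.
        let word = raw.trim_matches(|c: char| !c.is_alphanumeric());
        let capitalized = word.chars().next().map_or(false, |c| c.is_uppercase());
        if capitalized {
            current.push(word);
        }
        // A lowercase word or trailing punctuation ends the current run.
        let ends_run = !capitalized || raw.ends_with(|c: char| !c.is_alphanumeric());
        if ends_run && !current.is_empty() {
            entities.push(current.join(" "));
            current.clear();
        }
    }
    if !current.is_empty() {
        entities.push(current.join(" "));
    }
    entities
}

fn main() {
    let text = "Tom Sawyer and Huckleberry Finn explored the cave.";
    println!("{:?}", extract_capitalized(text));
}
```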

B. LLM-Based Gleaning (Accurate, Contextual)

// Enabled when: use_gleaning = true && ollama.enabled = true
// src/entity/gleaning_extractor.rs - Uses Ollama llama3.1:8b
Prompt: "Extract entities of types: PERSON, CONCEPT, ARGUMENT...
         from this text. Return JSON..."

Input: "Tom and Huck found the treasure under the cross..."

LLM Output (Round 1):
[
  {"name": "Tom Sawyer", "type": "PERSON", "confidence": 0.95},
  {"name": "Huckleberry Finn", "type": "PERSON", "confidence": 0.93},
  {"name": "treasure", "type": "CONCEPT", "confidence": 0.88},
  {"name": "cross marker", "type": "LOCATION", "confidence": 0.85}
]

Performance: 200-500ms per chunk

Gleaning (Multi-Pass LLM Refinement)

Gleaning is an iterative process controlled by max_gleaning_rounds:

Configuration: max_gleaning_rounds = 4

Round 1: Extract obvious entities     → Found 100 entities
Round 2: "Did you miss any entities?" → Found 15 more entities
Round 3: "Any relationships?"          → Found 8 relationships
Round 4: "Final check for concepts"   → Found 2 subtle concepts
Total: 117 entities, 8 relationships

[INFO] ✅ Extraction complete after 4 rounds
[INFO] Final gleaning results: 117 entities, 8 relationships
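
The multi-pass structure can be sketched as a loop that keeps asking for more entities until a pass adds nothing new or max_gleaning_rounds is reached. In the sketch below the LLM call is replaced by a caller-supplied closure, and the early-stop rule is an assumption; the real logic lives in src/entity/gleaning_extractor.rs and may differ.

```rust
use std::collections::BTreeSet;

/// Run up to `max_rounds` extraction passes. `extract_round(round, known)` is a
/// stand-in for the LLM call and returns the entity names found in that pass.
/// Returns the merged entity set and the number of rounds actually run.
fn glean<F>(max_rounds: usize, mut extract_round: F) -> (BTreeSet<String>, usize)
where
    F: FnMut(usize, &BTreeSet<String>) -> Vec<String>,
{
    let mut entities = BTreeSet::new();
    let mut rounds_run = 0;
    for round in 1..=max_rounds {
        rounds_run = round;
        let found = extract_round(round, &entities);
        let before = entities.len();
        entities.extend(found); // duplicates are dropped by the set
        if entities.len() == before {
            break; // assumed early stop: this pass found nothing new
        }
    }
    (entities, rounds_run)
}

fn main() {
    // Canned responses stand in for the model's answers per round.
    let (entities, rounds) = glean(4, |round, _known| match round {
        1 => vec!["Tom Sawyer".to_string(), "Huckleberry Finn".to_string()],
        2 => vec!["treasure".to_string(), "Tom Sawyer".to_string()],
        _ => Vec::new(),
    });
    println!("{} entities after {} rounds: {:?}", entities.len(), rounds, entities);
}
```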

Module: src/entity/gleaning_extractor.rs

Performance:

  • Pattern-based: <10ms per chunk
  • LLM-based gleaning: 200-500ms per chunk × max_gleaning_rounds
    • 1 round: ~300ms
    • 4 rounds: ~1200ms

Configuration Examples

Example 1: Fast Pattern-Based (No LLM)

[entity_extraction]
enabled = true
min_confidence = 0.7
use_gleaning = false          # ← Pattern-based extraction

[ollama]
enabled = false               # ← No LLM needed

Result: <10ms per chunk, good quality, no API/GPU required

Example 2: High-Quality LLM-Based

[entity_extraction]
enabled = true
min_confidence = 0.6          # ← Lower for philosophical nuance
use_gleaning = true           # ← LLM-based extraction
max_gleaning_rounds = 4       # ← 4 refinement passes

[ollama]
enabled = true
chat_model = "llama3.1:8b"    # ← LLM for extraction

Result: 200-500ms per chunk, excellent quality, custom entity types

Real Output Example

{
  "entity_id": "ent_tom_sawyer_001",
  "name": "Tom Sawyer",
  "type": "PERSON",
  "chunk_ids": ["chunk_001", "chunk_015", "chunk_234"],
  "confidence": 0.95,
  "description": "Main protagonist, adventurous boy",
  "extraction_method": "gleaning_llm",  // ← Indicates LLM extraction
  "gleaning_round": 1                   // ← Found in first pass
}

Stage 4: Knowledge Graph Construction

What it does: Connects extracted entities into a unified, queryable graph structure with typed relationships.

Why: A graph reveals how entities relate, not just that they co-occur. This enables multi-hop reasoning and contextual understanding.

Graph Structure

graph LR
    TomSawyer[Tom Sawyer<br/>PERSON]
    Huck[Huckleberry Finn<br/>PERSON]
    Treasure[Treasure<br/>CONCEPT]
    Cave[McDougal's Cave<br/>LOCATION]
    InjunJoe[Injun Joe<br/>PERSON]

    TomSawyer -->|FRIEND_OF| Huck
    TomSawyer -->|FOUND| Treasure
    Treasure -->|LOCATED_IN| Cave
    InjunJoe -->|GUARDS| Treasure
    TomSawyer -->|WITNESSED_MURDER_BY| InjunJoe
    Huck -->|HELPED_FIND| Treasure

    style TomSawyer fill:#e3f2fd
    style Huck fill:#e3f2fd
    style Treasure fill:#fff9c4
    style Cave fill:#e8f5e9
    style InjunJoe fill:#fce4ec

Graph Components

Nodes (Entities):

pub struct Entity {
    pub id: EntityId,
    pub name: String,
    pub entity_type: String,
    pub description: String,
    pub chunk_references: Vec<ChunkId>,
}

Edges (Relationships):

pub struct Relationship {
    pub source: EntityId,
    pub target: EntityId,
    pub relation_type: String,  // "FRIEND_OF", "FOUND", etc.
    pub confidence: f32,
}

Advanced Features

A. Incremental Updates (Zero-Downtime)

// src/graph/incremental.rs
graph.add_document("Tom Sawyer");   // 429 entities added
graph.add_document("Symposium");    // 189 entities added
// Automatically merges 58 duplicate entities!

B. PageRank Scoring (Fast-GraphRAG)

// src/graph/pagerank.rs
// Rust has no named arguments, so the labels move into comments;
// the argument order and types shown here are illustrative.
let scores = pagerank.compute_personalized(
    &["Tom Sawyer", "Huck Finn"], // seed entities
    20,                           // max iterations
);
// Ranks entities by importance: 27x faster retrieval!
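
Personalized PageRank differs from plain PageRank in that the random walk restarts at the seed entities rather than at a uniformly random node, so scores measure importance relative to the query. The following is a minimal power-iteration sketch over an adjacency list; it is a textbook version with hypothetical names and a damping factor of 0.85, not the implementation in src/graph/pagerank.rs.

```rust
/// Personalized PageRank by power iteration.
/// `edges[i]` lists the nodes that node `i` links to; `seeds` are the nodes
/// the random walk restarts from.
fn personalized_pagerank(
    edges: &[Vec<usize>],
    seeds: &[usize],
    damping: f64,
    max_iterations: usize,
) -> Vec<f64> {
    let n = edges.len();
    assert!(!seeds.is_empty() && seeds.iter().all(|&s| s < n), "seeds must be valid nodes");
    // Restart distribution: uniform over the seed nodes.
    let mut teleport = vec![0.0; n];
    for &s in seeds {
        teleport[s] += 1.0 / seeds.len() as f64;
    }
    let mut rank = teleport.clone();
    for _ in 0..max_iterations {
        let mut next: Vec<f64> = teleport.iter().map(|t| (1.0 - damping) * t).collect();
        for (node, targets) in edges.iter().enumerate() {
            if targets.is_empty() {
                // Dangling node: hand its mass back to the seeds.
                for (i, t) in teleport.iter().enumerate() {
                    next[i] += damping * rank[node] * t;
                }
            } else {
                let share = damping * rank[node] / targets.len() as f64;
                for &t in targets {
                    next[t] += share;
                }
            }
        }
        rank = next;
    }
    rank
}

fn main() {
    // 0 = Tom Sawyer, 1 = Huck Finn, 2 = Treasure, 3 = Aunt Polly
    let edges = vec![vec![1, 2], vec![2], vec![0], vec![0]];
    let scores = personalized_pagerank(&edges, &[0, 1], 0.85, 20);
    println!("{scores:?}");
}
```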

C. Community Detection (Hierarchical Clustering)

Community 1: Tom Sawyer storyline (347 entities)
  ├─ Subgraph: Treasure hunting (45 entities)
  ├─ Subgraph: School adventures (89 entities)
  └─ Subgraph: Courtroom drama (23 entities)

Community 2: Philosophical concepts (189 entities)
  └─ From Symposium document

Module: src/graph/mod.rs, src/graph/incremental.rs

Performance:

  • Graph construction: ~50ms for 500 entities
  • PageRank: ~20ms (cached, 27x speedup vs traditional)

Stage 5: Dual-Level Retrieval (LightRAG)

What it does: Searches the knowledge graph at two levels simultaneously - specific entities (low-level) and broad concepts (high-level).

Why: Traditional RAG searches only chunks. LightRAG searches entities and their community context, with a reported 6000x token reduction.

The Dual-Level Approach

Query: "What did Tom and Huck find in the cave?"

LOW-LEVEL RETRIEVAL (Specific):
  → Search entities: "Tom Sawyer", "Huck Finn", "cave"
  → Results: 12 entity matches

HIGH-LEVEL RETRIEVAL (Contextual):
  → Search communities: "treasure hunting" storyline
  → Results: 45 related entities in same narrative arc

FUSION:
  → Combine both levels with Reciprocal Rank Fusion (RRF)
  → Final results: Top 10 most relevant entities

Retrieval Strategies

GraphRAG-rs implements 4 complementary strategies:

| Strategy | What It Does | When to Use | Module |
|---|---|---|---|
| Vector Similarity | Semantic embedding search | “What is X about?” | src/retrieval/mod.rs |
| BM25 Keyword | Term-frequency search | Exact name/phrase lookup | src/retrieval/bm25.rs |
| Graph Traversal | Follow entity relationships | “How are X and Y related?” | src/graph/pagerank.rs |
| Hybrid Fusion | Combines all 3 above | General queries | src/retrieval/hybrid.rs |

Reciprocal Rank Fusion (RRF)

Formula:

RRF_score(entity) = Σ (1 / (k + rank_in_strategy))
                    for each strategy

Example:

Entity: "Tom Sawyer"
  Vector search rank: 2  → score = 1/(60+2) = 0.0161
  BM25 rank: 1          → score = 1/(60+1) = 0.0164
  PageRank rank: 3      → score = 1/(60+3) = 0.0159

  Total RRF = 0.0484 (ranked #1 overall!)
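
The formula translates directly into code: every strategy contributes 1 / (k + rank) for each item it returns, and items missing from a ranking contribute nothing. The sketch below is a generic implementation with a hypothetical function name, using k = 60 as in the example; the ranking lists in `main` are made-up sample data arranged so that "Tom Sawyer" sits at ranks 2, 1, and 3.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion. Each ranking lists ids best-first (rank 1 first).
/// Returns (id, fused score) pairs sorted by descending score.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (index, id) in ranking.iter().enumerate() {
            let rank = (index + 1) as f64; // ranks are 1-based
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> =
        scores.into_iter().map(|(id, score)| (id.to_string(), score)).collect();
    // Highest score first; ties broken alphabetically for a stable order.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    let rankings = vec![
        vec!["Becky Thatcher", "Tom Sawyer", "Huck Finn"], // vector search
        vec!["Tom Sawyer", "Huck Finn", "Injun Joe"],      // BM25
        vec!["Injun Joe", "Aunt Polly", "Tom Sawyer"],     // PageRank
    ];
    for (id, score) in rrf_fuse(&rankings, 60.0) {
        println!("{id}: {score:.4}");
    }
}
```

Because only ranks enter the formula, the strategies' raw scores never need to be put on a common scale.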

Module: src/lightrag/dual_retrieval.rs

Performance:

  • Low-level retrieval: ~20ms
  • High-level retrieval: ~30ms
  • Fusion: ~10ms
  • Total: ~60ms (vs 2-5 seconds traditional GraphRAG)

Stage 6: Query Processing

What it does: Analyzes the user’s question to determine intent, entities, and optimal search strategy.

Why: “What is love?” requires different processing than “When did Tom find the treasure?” - query understanding guides retrieval.

Query Analysis Components

A. Intent Classification

// src/query/advanced_pipeline.rs
pub enum QueryIntent {
    Factual,     // "What is X?"
    Relational,  // "How is X related to Y?"
    Temporal,    // "When did X happen?"
    Causal,      // "Why did X happen?"
    Comparative, // "Compare X and Y"
    Exploratory, // "Tell me about X"
}
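
One simple way to assign these intents is a keyword heuristic over the question's opening words. The sketch below is only an illustration of that idea: the rules and the `classify_intent` function are assumptions, and the classifier in src/query/advanced_pipeline.rs may work quite differently (for example, by asking an LLM).

```rust
/// Local copy of the intent enum so the sketch is self-contained.
#[derive(Debug, PartialEq)]
enum QueryIntent {
    Factual,
    Relational,
    Temporal,
    Causal,
    Comparative,
    Exploratory,
}

/// Keyword-based intent guess; rules are checked from most to least specific.
fn classify_intent(query: &str) -> QueryIntent {
    let q = query.trim().to_lowercase();
    if q.starts_with("compare") || q.contains(" versus ") || q.contains(" vs ") {
        QueryIntent::Comparative
    } else if q.starts_with("when") {
        QueryIntent::Temporal
    } else if q.starts_with("why") {
        QueryIntent::Causal
    } else if q.contains("related to") || q.contains("relationship between") {
        QueryIntent::Relational
    } else if q.starts_with("what") || q.starts_with("who") || q.starts_with("where") {
        QueryIntent::Factual
    } else {
        QueryIntent::Exploratory // fallback for open-ended requests
    }
}

fn main() {
    for query in ["When did Tom find the treasure?", "Tell me about Huck"] {
        println!("{query} -> {:?}", classify_intent(query));
    }
}
```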

B. Entity Extraction from Query

Query: "How did Tom and Huck find the treasure in McDougal's Cave?"

Extracted Entities:
  - "Tom" (PERSON)
  - "Huck" (PERSON)
  - "treasure" (CONCEPT)
  - "McDougal's Cave" (LOCATION)

Intent: Relational + Temporal
Strategy: Graph traversal + vector search hybrid

C. Query Decomposition (ROGRAG)

For complex queries, break into sub-queries:

Complex: "Compare Tom's and Huck's roles in finding the treasure"

Decomposed:
  1. "What role did Tom play in finding the treasure?"
  2. "What role did Huck play in finding the treasure?"
  3. [Synthesis] "Compare the two roles"

Accuracy boost: 60% → 75% (+15 percentage points)

Advanced Query Pipeline

// src/query/advanced_pipeline.rs:165-200 (simplified excerpt)
// The parameter list is reconstructed from how the body uses `query` and
// `graph`; the `KnowledgeGraph` type name is an assumption.
pub async fn execute_query(&self, query: &str, graph: &KnowledgeGraph) -> Result<QueryResult> {
    // Step 1: Analyze query
    let analysis = self.analyze_query(query).await?;

    // Step 2: Vector similarity search
    let vector_scores = self.vector_search(query, graph).await?;

    // Step 3: PageRank propagation
    let pagerank_scores = self.pagerank_propagation(&analysis).await?;

    // Step 4: Relationship scoring
    let rel_scores = self.score_relationships(graph).await?;

    // Step 5: Multi-modal fusion
    let combined = self.fuse_scores(vector_scores, pagerank_scores, rel_scores);

    // Step 6: Rank and filter
    let results = self.apply_ranking_policies(combined);

    // Wrap in Ok to match the Result return type
    Ok(results)
}

Module: src/query/advanced_pipeline.rs, src/rograg/

Performance:

  • Query analysis: ~50ms
  • Decomposition (if needed): ~100ms

Stage 7: Answer Generation

What it does: Synthesizes retrieved entities, relationships, and chunks into a coherent, natural language answer.

Why: Raw search results are just data. Generation transforms them into human-readable, contextual answers with citations.

Generation Pipeline

Retrieved Context:
  Entity 1: Tom Sawyer (confidence: 0.95)
    - Relevant chunk: "Tom and Huck ventured into the cave..."
  Entity 2: Treasure (confidence: 0.92)
    - Relevant chunk: "They found twelve thousand dollars in gold..."
  Relationship: Tom FOUND Treasure (confidence: 0.88)

↓ LLM Prompt Construction ↓

System Prompt:
  "You are a knowledgeable assistant. Answer based ONLY on provided context."

Context Assembly:
  [Include top 5 chunks with source attribution]
  [Include entity descriptions]
  [Include relationship graph snippet]

User Question:
  "How did Tom and Huck find the treasure?"

↓ LLM Generation (Ollama llama3.1:8b) ↓

Generated Answer:
  "Tom Sawyer and Huckleberry Finn discovered the treasure in McDougal's Cave
   after witnessing Injun Joe hide it there. They found approximately $12,000
   in gold coins under a cross marked on a rock. This discovery came after Tom
   got lost in the cave with Becky Thatcher and noticed the hiding spot while
   trying to find an exit.

   Sources: Chapter 33 (cave discovery), Chapter 35 (counting the treasure)"

LLM Backend Options

| Backend | Throughput | Use Case | Module |
|---|---|---|---|
| Ollama (llama3.1:8b) | ~15-30 tok/s | Production server | src/ollama/async_generation.rs |
| WebLLM (Phi-3) | 40-62 tok/s (GPU) | WASM browser | graphrag-wasm/src/webllm.rs |
| Mock LLM | Instant | Testing, demos | src/generation/async_mock_llm.rs |

Caching (6x Cost Reduction)

// src/caching/cached_client.rs
let cache_key = generate_semantic_key(prompt);

if let Some(cached) = cache.get(&cache_key) {
    return cached;  // 80%+ hit rate in production!
}

let response = llm.generate(prompt).await?;
cache.put(cache_key, response.clone());
return response;
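
The excerpt above depends on the surrounding client. As a self-contained illustration of the same check-then-generate-then-store flow, the sketch below wraps any generation closure in an in-memory cache. The `CachedGenerator` type is hypothetical, and normalizing case and whitespace is only a crude stand-in for whatever `generate_semantic_key` actually does.

```rust
use std::collections::HashMap;

/// Wraps a generation function with a prompt cache and hit/miss counters.
struct CachedGenerator<F: FnMut(&str) -> String> {
    generate: F,
    cache: HashMap<String, String>,
    hits: u64,
    misses: u64,
}

impl<F: FnMut(&str) -> String> CachedGenerator<F> {
    fn new(generate: F) -> Self {
        Self { generate, cache: HashMap::new(), hits: 0, misses: 0 }
    }

    fn complete(&mut self, prompt: &str) -> String {
        // Assumed key scheme: lowercase the prompt and collapse runs of whitespace.
        let key = prompt.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase();
        if let Some(cached) = self.cache.get(&key) {
            self.hits += 1;
            return cached.clone();
        }
        self.misses += 1;
        let response = (self.generate)(prompt);
        self.cache.insert(key, response.clone());
        response
    }

    /// Fraction of calls served from the cache so far.
    fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 { 0.0 } else { self.hits as f64 / total as f64 }
    }
}
```

Tracking `hit_rate()` on your own workload is the simplest way to check whether the hit rates quoted here apply to you.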

Cache Performance:

  • Hit rate: 80%+ (typical workload)
  • Cost reduction: 6x
  • Latency reduction: 50-100ms → 5ms (10-20x faster)

Module: src/generation/mod.rs, src/caching/

Performance:

  • Generation: 1-3 seconds (depending on answer length)
  • Cached: ~5ms

Complete Pipeline Performance

Real Benchmark: Tom Sawyer (434KB)

| Stage | Time | Memory | Output |
|---|---|---|---|
| 1. Chunking | 0.01s | +0.2 MB | 492 chunks |
| 2. Embeddings | 0.08s | +1.2 MB | 492 vectors (384-dim) |
| 3. Entity Extraction | 0.05s | +0.3 MB | 429 entities |
| 4. Graph Construction | 0.05s | +0.2 MB | 429 nodes, ~800 edges |
| 5. Dual Retrieval | 0.06s | +0.1 MB | Top 10 results |
| 6. Query Processing | 0.05s | - | Query plan |
| 7. Answer Generation | 1.2s | - | Final answer |
| TOTAL | 1.5s | 2.0 MB | ✅ Complete |

Source: examples/multi_document_pipeline.rs - production benchmarks

Scalability

| Documents | Total Time | Memory | Entities |
|---|---|---|---|
| 1 (Tom Sawyer) | 0.21s | 1.8 MB | 429 |
| 2 (+ Symposium) | 0.33s | 2.5 MB | 618 |
| 10 (estimated) | ~2s | ~15 MB | ~3000 |
| 100 (estimated) | ~20s | ~150 MB | ~30K |

With PageRank + LightRAG optimizations:

  • 27x faster retrieval
  • 6000x fewer tokens processed
  • 6x cost reduction (caching)

Alternative Techniques for Each Stage

GraphRAG-rs is highly modular with pluggable implementations for each pipeline stage. Choose the best technique based on your requirements using the core::traits abstraction layer.

Architecture: Trait-Based Plugin System

// src/core/traits.rs - Core abstraction layer
pub trait Embedder { ... }            // Stage 2: Embeddings
pub trait EntityExtractor { ... }     // Stage 3: Entity Extraction
pub trait VectorStore { ... }         // Stage 5: Vector Search
pub trait Retriever { ... }           // Stage 5: Retrieval
pub trait LanguageModel { ... }       // Stage 7: Generation
pub trait GraphStore { ... }          // Stage 4: Graph Storage

Stage 1: Text Chunking - 3 Strategies

| Strategy | Algorithm | Use Case | Module |
|---|---|---|---|
| Hierarchical | RecursiveCharacterTextSplitter | Recommended - preserves semantic boundaries | src/text/chunking.rs |
| Fixed-Size | Simple character-based | Fast, predictable chunks | src/text/mod.rs |
| Semantic | Sentence-aware splitting | Academic papers, legal documents | src/text/mod.rs |

Hierarchical Separator Precedence:

[
    "\n\n",   // Paragraph breaks (priority 1)
    "\n",     // Line breaks
    ". ",     // Sentence endings
    "! ",     // Exclamations
    "? ",     // Questions
    "; ",     // Semicolons
    " ",      // Word boundaries
    "",       // Character fallback
]
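
One way to read the precedence list: try the highest-priority separator first, and fall back to the next one only for pieces that are still too long. The sketch below follows that idea under stated assumptions. `split_recursive` is a hypothetical helper, it ignores overlap, and the actual splitter in src/text/chunking.rs may merge pieces differently.

```rust
/// Separators in decreasing priority, mirroring the list above.
const SEPARATORS: [&str; 8] = ["\n\n", "\n", ". ", "! ", "? ", "; ", " ", ""];

/// Recursively split `text` so that every piece has at most `max_chars` characters.
/// Call with `level = 0` to start from the paragraph separator.
fn split_recursive(text: &str, max_chars: usize, level: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    if text.chars().count() <= max_chars {
        return if text.is_empty() { vec![] } else { vec![text.to_string()] };
    }
    let sep = SEPARATORS.get(level).copied().unwrap_or("");
    if sep.is_empty() {
        // Character fallback: hard-cut on character boundaries.
        let chars: Vec<char> = text.chars().collect();
        return chars
            .chunks(max_chars)
            .map(|c| c.iter().collect::<String>())
            .collect();
    }
    let mut out = Vec::new();
    let mut current = String::new();
    // split_inclusive keeps each separator attached to the piece before it.
    for piece in text.split_inclusive(sep) {
        if !current.is_empty() && current.chars().count() + piece.chars().count() > max_chars {
            // Flush the accumulated text; oversized text is split by the next separator.
            out.extend(split_recursive(&current, max_chars, level + 1));
            current.clear();
        }
        current.push_str(piece);
    }
    out.extend(split_recursive(&current, max_chars, level + 1));
    out
}
```

Because separators stay attached to the preceding piece, concatenating the returned chunks reproduces the input exactly.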

Configuration:

[pipeline]
chunk_size = 800        # Characters per chunk
chunk_overlap = 200     # Overlap for context preservation
min_chunk_size = 50     # Skip tiny chunks
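
To show how these three settings interact, here is a minimal fixed-size chunker. It is a sketch under assumptions: `chunk_with_overlap` is a hypothetical function, it counts characters rather than bytes or tokens, and it ignores the separator hierarchy entirely.

```rust
/// Fixed-size character windows; consecutive chunks share `overlap` characters.
/// Chunks shorter than `min_chunk_size` are skipped.
fn chunk_with_overlap(
    text: &str,
    chunk_size: usize,
    overlap: usize,
    min_chunk_size: usize,
) -> Vec<String> {
    assert!(chunk_size > overlap, "chunk_size must exceed chunk_overlap");
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        if end - start >= min_chunk_size {
            chunks.push(chars[start..end].iter().collect::<String>());
        }
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

With `chunk_size = 800` and `chunk_overlap = 200`, the window advances 600 characters at a time, so each chunk repeats the last 200 characters of the previous one.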

Stage 2: Embeddings - 11 Providers

GraphRAG Core now supports 11 embedding backends via unified configuration:

Free/Local Providers

| Provider | Performance | Quality | GPU | Platform | Module |
|---|---|---|---|---|---|
| HuggingFace Hub | First: ~2s; Cached: 50-100ms | ★★★★ | ❌ CPU | All | graphrag-core/src/embeddings/huggingface.rs |
| Ollama (nomic-embed-text) | 100-200ms | ★★★★★ | ✅ CUDA/Metal | Server | src/ollama/embeddings.rs |
| ONNX Runtime Web | 3-8ms (GPU) | ★★★★ | ✅ WebGPU | WASM | graphrag-wasm/src/onnx_embedder.rs |
| Hash-based (TF-IDF) | <1ms | ★★★ | ❌ CPU-only | Testing | src/embeddings/hash_embedder.rs |

API Providers (Production)

| Provider | Cost/1M tokens | Quality | Best For | Module |
|---|---|---|---|---|
| OpenAI | $0.13 | ★★★★★ | Best quality | graphrag-core/src/embeddings/api_providers.rs |
| Voyage AI | Medium | ★★★★★ | Domain-specific (code, finance, law) | graphrag-core/src/embeddings/api_providers.rs |
| Cohere | $0.10 | ★★★★ | Multilingual (100+ langs) | graphrag-core/src/embeddings/api_providers.rs |
| Jina AI | $0.02 | ★★★★ | Cost-optimized | graphrag-core/src/embeddings/api_providers.rs |
| Mistral AI | $0.10 | ★★★★ | RAG-optimized | graphrag-core/src/embeddings/api_providers.rs |
| Together AI | $0.008 | ★★★★ | Cheapest | graphrag-core/src/embeddings/api_providers.rs |

Planned

| Provider | Status | Notes |
|---|---|---|
| Candle | Planned | 100% Rust, CPU-only |
| Burn + wgpu | 70% | GPU acceleration, 100% Rust |

Models Available:

HuggingFace Hub (100+ models):

sentence-transformers/all-MiniLM-L6-v2    → 384 dim (default, recommended)
sentence-transformers/all-mpnet-base-v2   → 768 dim (balanced)
BAAI/bge-large-en-v1.5                    → 1024 dim (best quality)
intfloat/e5-small-v2                      → 384 dim (E5 family)
paraphrase-multilingual-MiniLM-L12-v2     → 384 dim (50+ languages)

API Providers:

OpenAI:     text-embedding-3-small (1536), text-embedding-3-large (3072)
Voyage:     voyage-3-large (1024), voyage-code-3 (1024), voyage-finance-2, voyage-law-2
Cohere:     embed-english-v3.0 (1024), embed-multilingual-v3.0 (1024)
Jina:       jina-embeddings-v3 (1024), jina-embeddings-v4 (multimodal)
Mistral:    mistral-embed (1024), codestral-embed (code)
Together:   BAAI/bge-large-en-v1.5 (1024), BAAI/bge-base-en-v1.5 (768)
Ollama:     nomic-embed-text (768)

Trait Implementation:

#[async_trait::async_trait]
pub trait EmbeddingProvider: Send + Sync {
    /// Initialize the embedding provider (e.g., download models)
    async fn initialize(&mut self) -> Result<()>;

    /// Generate embedding for single text
    async fn embed(&self, text: &str) -> Result<Vec<f32>>;

    /// Generate embeddings for multiple texts (batch processing)
    async fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>;

    /// Get the embedding dimension
    fn dimensions(&self) -> usize;

    /// Check if the provider is available and ready
    fn is_available(&self) -> bool;

    /// Get the provider name
    fn provider_name(&self) -> &str;
}
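
For a feel of what the simplest backend might look like, the sketch below implements a hash-based embedder using only the standard library. It is an assumption-laden illustration: `SimpleEmbedder` is a hypothetical synchronous stand-in for the async trait above, and the bucketing scheme is not taken from src/embeddings/hash_embedder.rs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified, synchronous stand-in for the provider trait (hypothetical).
trait SimpleEmbedder {
    fn embed(&self, text: &str) -> Vec<f32>;
    fn dimensions(&self) -> usize;
}

struct HashEmbedder {
    dim: usize,
}

impl SimpleEmbedder for HashEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0f32; self.dim];
        let lower = text.to_lowercase();
        // Hash every token into one of `dim` buckets and count occurrences.
        for token in lower.split_whitespace() {
            let mut hasher = DefaultHasher::new();
            token.hash(&mut hasher);
            v[(hasher.finish() % self.dim as u64) as usize] += 1.0;
        }
        // L2-normalize so that dot products behave like cosine similarity.
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            v.iter_mut().for_each(|x| *x /= norm);
        }
        v
    }

    fn dimensions(&self) -> usize {
        self.dim
    }
}
```

An embedder like this captures word overlap only, which is why the table above lists the hash-based provider for testing rather than production.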

Configuration:

[embeddings]
backend = "huggingface"           # Free, offline (default)
# backend = "openai"              # Best quality ($0.13/1M)
# backend = "voyage"              # Anthropic recommended
# backend = "cohere"              # Multilingual
# backend = "jina"                # Cost-optimized ($0.02/1M)
# backend = "mistral"             # RAG-optimized
# backend = "together"            # Cheapest ($0.008/1M)
# backend = "ollama"              # Local GPU

model = "sentence-transformers/all-MiniLM-L6-v2"
dimension = 384
batch_size = 32
cache_dir = "~/.cache/huggingface"  # For HuggingFace
# api_key = "..."  # For API providers (or set env vars)

# Environment variables (recommended for API keys):
# OPENAI_API_KEY, VOYAGE_API_KEY, COHERE_API_KEY, JINA_API_KEY, MISTRAL_API_KEY, TOGETHER_API_KEY

See: config/JSON5_CONFIG_GUIDE.md for the complete configuration reference.


Stage 3: Entity Extraction - Config-Driven Selection

The system automatically chooses the extraction method based on your configuration:

| Method | Accuracy | Speed | Enabled When | Module |
|---|---|---|---|---|
| LLM Gleaning (Multi-Pass) | ★★★★★ | 200-500ms | use_gleaning = true + ollama.enabled = true | src/entity/gleaning_extractor.rs |
| Pattern-Based (Keywords) | ★★★ | <10ms | use_gleaning = false | src/entity/mod.rs |
| NER Hybrid | ★★★★ | 50-100ms | Future | src/entity/mod.rs |
| Semantic Merging | ★★★★ | Medium | semantic_merging = true | src/entity/semantic_merging.rs |

Entity Types (Fully Customizable):

# Configure your own entity types!
[pipeline.entity_extraction]
entity_types = [
    "PERSON",                 # "Tom Sawyer", "Socrates"
    "LOCATION",               # "Mississippi River", "Athens"
    "CONCEPT",                # "treasure hunting", "Eros"
    "EVENT",                  # "murder witness", "symposium"
    "ARGUMENT",               # Philosophical arguments
    "MYTHOLOGICAL_REFERENCE"  # Gods, myths
]

Gleaning Process (LLM-Based, Config-Controlled):

[entity_extraction]
use_gleaning = true           # ← Enable LLM extraction
max_gleaning_rounds = 4       # ← Number of refinement passes

[ollama]
enabled = true
chat_model = "llama3.1:8b"    # ← LLM for extraction

Runtime Behavior:

Round 1: Extract obvious entities      → 100 entities
Round 2: "Did you miss any entities?"  → +15 entities
Round 3: "Find relationships"          → 8 relationships
Round 4: "Final check for concepts"    → 2 subtle concepts
Total: 117 entities, 8 relationships

[INFO] ✅ Extraction complete after 4 rounds
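
The round structure can be sketched as a loop that merges each pass into a running set and stops early when a pass adds too little. Everything here is an assumption for illustration: `glean` is a hypothetical function, `extract_round` stands in for one LLM call, and the early-stop rule is one plausible reading of an improvement threshold rather than the documented behavior.

```rust
use std::collections::BTreeSet;

/// Runs up to `max_rounds` extraction passes and merges newly found entity names.
/// Returns the merged set and the number of rounds actually run.
fn glean<F>(
    max_rounds: usize,
    improvement_threshold: f64,
    mut extract_round: F,
) -> (BTreeSet<String>, usize)
where
    F: FnMut(usize, &BTreeSet<String>) -> Vec<String>,
{
    let mut entities = BTreeSet::new();
    let mut rounds_run = 0;
    for round in 1..=max_rounds {
        let before = entities.len();
        let found = extract_round(round, &entities);
        entities.extend(found); // duplicates are dropped by the set
        rounds_run = round;
        let added = entities.len() - before;
        // After the first pass, stop once a round adds too little relative to what is known.
        if round > 1 && (added as f64) < improvement_threshold * before as f64 {
            break;
        }
    }
    (entities, rounds_run)
}
```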

Trait Implementation:

pub trait EntityExtractor {
    fn extract(&self, text: &str) -> Result<Vec<Entity>>;
    fn extract_with_confidence(&self, text: &str) -> Result<Vec<(Entity, f32)>>;
    fn set_confidence_threshold(&mut self, threshold: f32);
}

#[async_trait]
pub trait AsyncEntityExtractor {
    async fn extract(&self, text: &str) -> Result<Vec<Entity>>;
    async fn extract_batch(&self, texts: &[&str]) -> Result<Vec<Vec<Entity>>>;
    async fn extract_batch_concurrent(&self, texts: &[&str], max_concurrent: usize);
}

Configuration (Controls Behavior):

[entity_extraction]
enabled = true
min_confidence = 0.6          # ← Minimum confidence threshold
use_gleaning = true           # ← Pattern-based (false) vs LLM-based (true)
max_gleaning_rounds = 4       # ← Number of LLM refinement passes
semantic_merging = true       # ← Deduplicate similar entities
automatic_linking = true      # ← Auto-link related entities

[pipeline.entity_extraction]
entity_types = ["PERSON", "CONCEPT", ...]  # ← Custom types
confidence_threshold = 0.7

[ollama]
enabled = true                # ← Required for LLM-based extraction
chat_model = "llama3.1:8b"    # ← LLM model

The pipeline reads this config at startup and selects the appropriate implementation automatically.


Stage 4: Graph Construction - 4 Storage Backends

| Backend | Scale | Features | Platform | Module |
|---|---|---|---|---|
| In-Memory (Default) | <100K entities | Fast, incremental updates | All | src/graph/incremental.rs |
| Qdrant | >1M entities | Production vector DB, JSON payload | Server | src/storage/qdrant.rs |
| Neo4j (planned) | >100K entities | Complex graph queries, Cypher | Server | Future |
| LanceDB (70% complete) | >500K entities | Serverless, embedded | Desktop | src/storage/lancedb.rs |

Graph Features:

| Feature | Implementation | Status | Module |
|---|---|---|---|
| Incremental Updates | Zero-downtime ACID-like | ✅ Complete | src/graph/incremental.rs |
| PageRank | Personalized importance scoring | ✅ Complete | src/graph/pagerank.rs |
| Community Detection | Leiden algorithm clustering | ✅ Complete | src/graph/mod.rs |
| Semantic Deduplication | Entity merging (58 duplicates) | ✅ Complete | src/entity/semantic_merging.rs |
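
As background for the PageRank row, here is a compact power-iteration sketch. It assumes the plain (uniform teleport) variant rather than the personalized one listed above, and `pagerank` is a hypothetical free function, not the API of src/graph/pagerank.rs.

```rust
/// Power-iteration PageRank over an adjacency list (`edges[i]` = nodes that node i links to).
fn pagerank(edges: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = edges.len();
    if n == 0 {
        return vec![];
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        // Every node receives the teleport share first.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (src, targets) in edges.iter().enumerate() {
            if targets.is_empty() {
                // Dangling node: spread its rank evenly over all nodes.
                let share = damping * rank[src] / n as f64;
                next.iter_mut().for_each(|r| *r += share);
            } else {
                let share = damping * rank[src] / targets.len() as f64;
                for &dst in targets {
                    next[dst] += share;
                }
            }
        }
        rank = next;
    }
    rank
}
```

A personalized variant would replace the uniform teleport term with a distribution concentrated on the entities found in the query.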

Trait Implementation:

pub trait GraphStore {
    fn add_node(&mut self, node: Node) -> Result<String>;
    fn add_edge(&mut self, from: &str, to: &str, edge: Edge) -> Result<String>;
    fn find_nodes(&self, criteria: &str) -> Result<Vec<Node>>;
    fn get_neighbors(&self, node_id: &str) -> Result<Vec<Node>>;
    fn traverse(&self, start_id: &str, max_depth: usize) -> Result<Vec<Node>>;
}
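
The `traverse` method is the least obvious of these, so here is a reduced in-memory sketch of a depth-limited breadth-first traversal. `MemoryGraph` is a hypothetical type that stores only node ids; it does not implement the trait above and omits `Node`, `Edge`, and error handling.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Default)]
struct MemoryGraph {
    adjacency: HashMap<String, Vec<String>>,
}

impl MemoryGraph {
    fn add_edge(&mut self, from: &str, to: &str) {
        self.adjacency.entry(from.to_string()).or_default().push(to.to_string());
        // Make sure the target exists as a node even if it has no outgoing edges.
        self.adjacency.entry(to.to_string()).or_default();
    }

    /// Breadth-first traversal returning node ids within `max_depth` hops of `start_id`.
    fn traverse(&self, start_id: &str, max_depth: usize) -> Vec<String> {
        let mut visited = HashSet::new();
        let mut order = Vec::new();
        let mut queue = VecDeque::new();
        if self.adjacency.contains_key(start_id) {
            visited.insert(start_id.to_string());
            queue.push_back((start_id.to_string(), 0usize));
        }
        while let Some((node, depth)) = queue.pop_front() {
            order.push(node.clone());
            if depth == max_depth {
                continue; // do not expand beyond the depth limit
            }
            for next in &self.adjacency[&node] {
                if visited.insert(next.clone()) {
                    queue.push_back((next.clone(), depth + 1));
                }
            }
        }
        order
    }
}
```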

Configuration:

[graph]
backend = "in-memory"                  # or "qdrant" ("neo4j" is planned)
enable_incremental = true
enable_pagerank = true
enable_community_detection = true
deduplication_threshold = 0.85

Stage 5: Retrieval - 5 Strategies

| Strategy | Algorithm | Strengths | Module |
|---|---|---|---|
| Vector Similarity | Cosine similarity on embeddings | Semantic understanding | src/retrieval/mod.rs |
| BM25 Keyword | TF-IDF term matching | Exact phrases, names | src/retrieval/bm25.rs |
| PageRank | Graph importance propagation | Entity relevance (27x faster) | src/retrieval/pagerank_retrieval.rs |
| Hybrid (RRF) | Reciprocal Rank Fusion | Recommended - combines all | src/retrieval/hybrid.rs |
| Adaptive | Strategy auto-selection | Context-aware switching | src/retrieval/adaptive.rs |

LightRAG Dual-Level (6000x token reduction):

Query: "What did Tom find in the cave?"

LOW-LEVEL:  Search specific entities (Tom, cave, treasure)
            → 12 entity matches

HIGH-LEVEL: Search community context (treasure hunting storyline)
            → 45 related entities in narrative arc

FUSION:     RRF combines both levels
            → Top 10 most relevant results

Reciprocal Rank Fusion Formula:

RRF_score(entity) = Σ_i 1 / (k + rank_i)
where k = 60 (constant) and rank_i is the entity's 1-based rank in strategy i

Trait Implementation:

pub trait Retriever {
    fn search(&self, query: Query, k: usize) -> Result<Vec<SearchResult>>;
    fn search_with_context(&self, query: Query, context: &str, k: usize);
}

#[async_trait]
pub trait AsyncRetriever {
    async fn search(&self, query: Query, k: usize) -> Result<Vec<SearchResult>>;
    async fn search_batch(&self, queries: Vec<Query>, k: usize);
}

Configuration:

[retrieval]
strategy = "hybrid"                    # or "vector", "bm25", "pagerank", "adaptive"
k = 10                                 # Top-k results
enable_lightrag = true                 # Dual-level retrieval
fusion_weights = { vector = 0.4, bm25 = 0.3, pagerank = 0.3 }
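
One plausible way to combine `fusion_weights` with RRF is to scale each strategy's reciprocal-rank term by its weight. The sketch below does exactly that; note that this weighting scheme is an assumption, `fuse_rankings` is a hypothetical function, and src/retrieval/hybrid.rs may apply the weights differently.

```rust
use std::collections::HashMap;

/// Fuses ranked id lists with weighted RRF and returns the best `top_k` ids with scores.
/// Each entry of `rankings` is (strategy weight, ids ordered best-first).
fn fuse_rankings(rankings: &[(f64, Vec<&str>)], k: f64, top_k: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for (weight, ranked_ids) in rankings {
        for (position, id) in ranked_ids.iter().enumerate() {
            // Ranks are 1-based, so position 0 is rank 1.
            *scores.entry(*id).or_insert(0.0) += weight / (k + (position + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Sort by score descending; break ties by id for deterministic output.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused.truncate(top_k);
    fused
}
```

An id missing from one strategy's list simply contributes nothing for that strategy, so entities found by several strategies rise to the top.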

Stage 6: Query Processing - 3 Analyzers

| Analyzer | Capabilities | Module |
|---|---|---|
| Basic | Intent classification (Factual/Relational/Temporal) | src/query/mod.rs |
| Advanced | Multi-modal scoring + Entity extraction | src/query/advanced_pipeline.rs |
| ROGRAG | Query decomposition + Logic forms | src/rograg/logic_form.rs |

Query Intent Types:

pub enum QueryIntent {
    Factual,     // "What is X?"
    Relational,  // "How is X related to Y?"
    Temporal,    // "When did X happen?"
    Causal,      // "Why did X happen?"
    Comparative, // "Compare X and Y"
    Exploratory, // "Tell me about X"
}

ROGRAG Decomposition:

Complex: "Compare Tom's and Huck's roles in finding the treasure"

Decomposed:
  1. "What role did Tom play in finding the treasure?"
  2. "What role did Huck play in finding the treasure?"
  3. [Synthesis] "Compare the two roles"

Accuracy: 60% → 75% (+15 percentage points)

Configuration:

[query_processing]
analyzer = "advanced"                  # or "basic", "rograg"
enable_decomposition = true
max_sub_queries = 5
confidence_threshold = 0.6

Stage 7: Answer Generation - 4 LLM Backends

| Backend | Throughput | Quality | Platform | Module |
|---|---|---|---|---|
| Ollama (llama3.1:8b) | 15-30 tok/s | ★★★★★ | Server | src/ollama/async_generation.rs |
| WebLLM (Phi-3) | 40-62 tok/s (GPU) | ★★★★ | WASM | graphrag-wasm/src/webllm.rs |
| MockLLM | Instant | ★★ | Testing | src/generation/async_mock_llm.rs |
| OpenAI-Compatible API | Varies | ★★★★★ | Server | Future |

Caching Layer (6x cost reduction):

// src/caching/cached_client.rs
let cache_key = generate_semantic_key(prompt);
if let Some(cached) = cache.get(&cache_key) {
    return cached;  // 80%+ hit rate!
}
let response = llm.generate(prompt).await?;
cache.put(cache_key, response.clone());

Trait Implementation:

pub trait LanguageModel {
    fn complete(&self, prompt: &str) -> Result<String>;
    fn complete_with_params(&self, prompt: &str, params: GenerationParams);
    fn is_available(&self) -> bool;
}

#[async_trait]
pub trait AsyncLanguageModel {
    async fn complete(&self, prompt: &str) -> Result<String>;
    async fn complete_batch(&self, prompts: &[&str]) -> Result<Vec<String>>;
    async fn complete_streaming(&self, prompt: &str) -> Stream<String>;
}

Configuration:

[generation]
backend = "ollama"                     # or "webllm", "mock"
model = "llama3.1:8b"
temperature = 0.7
max_tokens = 1000
enable_caching = true
cache_ttl_seconds = 3600

Configuration Matrix: Choose Your Stack

Use Case: Production Server

[pipeline]
chunk_size = 800
chunk_overlap = 200

[embeddings]
provider = "ollama"
model = "nomic-embed-text"
device = "cuda"

[entity_extraction]
method = "gleaning"
llm_model = "llama3.1:8b"

[graph]
backend = "qdrant"
enable_pagerank = true

[retrieval]
strategy = "hybrid"
enable_lightrag = true

[generation]
backend = "ollama"
model = "llama3.1:8b"
enable_caching = true

Use Case: WASM Browser (Privacy-First)

[embeddings]
provider = "onnx_web"
model = "all-MiniLM-L6-v2"
device = "webgpu"

[entity_extraction]
method = "pattern"                     # No LLM required

[graph]
backend = "in-memory"
enable_pagerank = true

[retrieval]
strategy = "hybrid"
enable_lightrag = true

[generation]
backend = "webllm"
model = "Phi-3-mini"

Use Case: Testing/Development

[embeddings]
provider = "hash"                      # <1ms, deterministic

[entity_extraction]
method = "pattern"

[graph]
backend = "in-memory"

[retrieval]
strategy = "vector"

[generation]
backend = "mock"                       # Instant responses

Module Reference:

  • Core Traits: src/core/traits.rs (lines 1-1291) - All pluggable abstractions
  • Hybrid Embedder: src/embeddings/hybrid.rs - Auto-fallback system
  • Retrieval Strategies: src/retrieval/ - 5 retrieval implementations
  • Configuration: src/config/toml_config.rs - TOML-based setup

How to Customize Parameters and Tools

GraphRAG-rs offers 3 progressive levels of customization - from simple TOML files to programmatic trait implementations.

Level 1: TOML Configuration Files (Easiest)

Modify 60+ parameters without touching code using TOML configuration.

Where to Write Alternative Settings?

✅ Option 1: Use Pre-Built Templates (Copy & Modify)

# 1. Copy a template that matches your use case
cp config/templates/narrative_fiction.toml my_config.toml

# 2. Edit the file to change settings
nano my_config.toml

# 3. Run GraphRAG with your config
cargo run --bin simple_cli my_config.toml "Your question"

✅ Option 2: Create Your Own Config File

# 1. Create a new .toml file anywhere
touch my_custom_config.toml

# 2. Add your settings (see examples below)
nano my_custom_config.toml

# 3. Use it
cargo run --bin simple_cli my_custom_config.toml

✅ Option 3: Edit Existing Examples

# Modify the example configs
nano docs-example/symposium_config.toml
nano docs-example/config_tom_sawyer_complete.toml

How TOML Configuration Works

TOML files specify alternative implementations like this:

# Example: my_config.toml

# Stage 2: Choose embedding provider
[embeddings]
provider = "ollama"          # Alternative: "neural", "hybrid", "hash"
model = "nomic-embed-text"   # Alternative: "all-MiniLM-L6-v2"
device = "cuda"              # Alternative: "cpu", "auto"

# Stage 3: Choose entity extraction method
[pipeline.entity_extraction]
model_name = "llama3.1:8b"   # Uses LLM for extraction
temperature = 0.1            # Alternative: 0.7 for creative
entity_types = ["PERSON", "LOCATION", "CONCEPT"]  # Customize types!

# Stage 5: Choose retrieval strategy
[retrieval]
strategy = "hybrid"          # Alternative: "vector", "bm25", "pagerank", "adaptive"
enable_lightrag = true       # Alternative: false (standard retrieval)

# Stage 7: Choose LLM backend
[generation]
backend = "ollama"           # Alternative: "webllm", "mock"
model = "llama3.1:8b"        # Alternative: any Ollama model
enable_caching = true        # Alternative: false (no cache)

The system automatically uses your settings! No code changes needed.

Pre-built templates are located in config/templates/, each optimized for a different document type:

| Template | Optimized For | Chunk Size | Key Settings |
|---|---|---|---|
| narrative_fiction.toml | Books, novels, stories | 800 chars | High overlap (300), character-focused |
| academic_research.toml | Papers, studies, theses | 1024 chars | Semantic chunking, citation extraction |
| technical_documentation.toml | Manuals, API docs | 512 chars | Code-aware, hierarchical entities |
| legal_documents.toml | Contracts, laws | 512 chars | Low temperature (0.1), precision mode |
| web_blog_content.toml | Articles, blogs | 600 chars | Fast processing, keyword extraction |
| dynamic_universal.toml | General purpose | Adaptive | Auto-detects optimal settings |

Example: Customize for Your Document Type

# 1. Copy a template
cp config/templates/narrative_fiction.toml my_config.toml

# 2. Edit parameters (see full list below)
nano my_config.toml

# 3. Use your config
cargo run --bin simple_cli my_config.toml "Your question"

Complete TOML Configuration Reference

A. General Settings

[general]
input_document_path = "path/to/document.txt"  # Your document
output_dir = "./output/my_project"            # Results directory
log_level = "info"                            # error|warn|info|debug|trace
max_threads = 4                               # 0 = auto-detect CPU cores
enable_profiling = true                       # Performance metrics

B. Pipeline Workflows

[pipeline]
workflows = [
    "extract_text",        # Stage 1: Chunking
    "extract_entities",    # Stage 3: Entity extraction
    "build_graph",         # Stage 4: Graph construction
    "detect_communities"   # Stage 4: Community detection
]
parallel_execution = true  # Enable concurrent processing

C. Stage 1: Text Chunking

[pipeline.text_extraction]
chunk_size = 800              # Characters per chunk
chunk_overlap = 300           # Overlap for context (typically 25-50% of chunk_size)
min_chunk_size = 200          # Skip chunks smaller than this
clean_control_chars = true    # Remove \r, \t, etc.
normalize_whitespace = true   # Collapse multiple spaces

# Optional text cleaning
[pipeline.text_extraction.cleaning]
remove_urls = false           # Strip http:// links
remove_emails = false         # Strip email addresses
remove_special_chars = false  # Keep punctuation by default

D. Stage 2: Embeddings

[embeddings]
provider = "ollama"           # Options: ollama, neural, hybrid, hash
model = "nomic-embed-text"    # Model name (depends on provider)
dimension = 768               # Embedding vector size
batch_size = 32               # Embeddings per batch
device = "cuda"               # Options: cuda, cpu, auto
cache_size = 10000            # Number of cached embeddings

# Ollama-specific settings
[ollama]
base_url = "http://localhost:11434"
embedding_model = "nomic-embed-text"
generation_model = "llama3.1:8b"
timeout_seconds = 300

E. Stage 3: Entity Extraction

[pipeline.entity_extraction]
model_name = "llama3.1:8b"    # LLM for extraction
temperature = 0.1             # Lower = more deterministic (0.0-1.0)
max_tokens = 1500             # Maximum response length
confidence_threshold = 0.6    # Minimum confidence to keep entity

# Entity types to extract (fully customizable!)
entity_types = [
    "PERSON",                 # People, characters
    "LOCATION",               # Places, settings
    "CONCEPT",                # Abstract ideas, themes
    "EVENT",                  # Actions, occurrences
    "ORGANIZATION",           # Groups, institutions
    "OBJECT",                 # Physical items
    "EMOTION",                # Feelings, states
    "THEME"                   # Overarching topics
]

# Advanced: Entity filtering
[pipeline.entity_extraction.filters]
min_entity_length = 2         # Minimum characters
max_entity_length = 100       # Maximum characters
allowed_patterns = [          # Regex patterns to allow
    "^[A-Z][a-zA-Z\\s'-]+$"   # Capitalized words
]
excluded_patterns = [         # Regex patterns to exclude
    "^(the|and|but)$",        # Common stop words
    "^\\d+$"                  # Pure numbers
]

# Gleaning (multi-pass extraction)
[entity_extraction]
use_gleaning = true           # Enable iterative extraction
max_gleaning_rounds = 4       # Number of refinement passes
gleaning_improvement_threshold = 0.08  # Min improvement to continue
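
To make the filter rules concrete, the sketch below applies the same checks by hand, without a regex engine. It is an approximation under assumptions: `keep_entity` is a hypothetical helper, the character classes are restricted to ASCII, and the order in which the real pipeline applies these filters is not documented here.

```rust
/// Applies the length limits plus hand-written equivalents of the patterns above.
fn keep_entity(name: &str, min_len: usize, max_len: usize) -> bool {
    const STOP_WORDS: [&str; 3] = ["the", "and", "but"];
    let len = name.chars().count();
    if len < min_len || len > max_len {
        return false;
    }
    // Mirrors "^(the|and|but)$": reject exact stop words.
    if STOP_WORDS.contains(&name) {
        return false;
    }
    // Mirrors "^\d+$": reject names made only of digits.
    if name.chars().all(|c| c.is_ascii_digit()) {
        return false;
    }
    // Mirrors "^[A-Z][a-zA-Z\s'-]+$": one capital letter, then at least one
    // letter, whitespace character, apostrophe, or hyphen.
    let mut chars = name.chars();
    let first_ok = chars.next().map_or(false, |c| c.is_ascii_uppercase());
    first_ok
        && len >= 2
        && chars.all(|c| c.is_ascii_alphabetic() || c.is_whitespace() || c == '\'' || c == '-')
}
```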

F. Stage 4: Graph Construction

[pipeline.graph_building]
relation_scorer = "cosine_similarity"  # or "jaccard", "levenshtein"
min_relation_score = 0.4      # Minimum similarity to create edge
max_connections_per_node = 25 # Limit edges per entity
bidirectional_relations = true # A→B implies B→A
character_centrality_boost = 1.5  # Boost importance of main entities

# Community detection
[pipeline.community_detection]
algorithm = "leiden"          # Options: leiden, louvain
resolution = 0.6              # Higher = more, smaller communities
min_community_size = 2        # Minimum entities per community
max_community_size = 15       # Maximum entities per community

# Semantic merging (entity deduplication)
[entity_extraction]
semantic_merging = true
merge_similarity_threshold = 0.85  # How similar to merge (0.0-1.0)
automatic_linking = true
linking_confidence_threshold = 0.7
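
The effect of `merge_similarity_threshold` can be illustrated with a greedy pass over entity embeddings. This is a sketch only: `merge_entities` is a hypothetical function, and the strategy in src/entity/semantic_merging.rs may compare names, types, or descriptions rather than raw vectors.

```rust
/// Cosine similarity between two vectors; returns 0.0 if either has zero length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Greedy deduplication: each entity joins the first kept entity it resembles closely enough.
/// Returns, for every input index, the index of its canonical entity.
fn merge_entities(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    let mut canonical: Vec<usize> = Vec::new();
    let mut assignment = Vec::with_capacity(embeddings.len());
    for (i, emb) in embeddings.iter().enumerate() {
        let hit = canonical
            .iter()
            .copied()
            .find(|&c| cosine_similarity(emb, &embeddings[c]) >= threshold);
        match hit {
            Some(c) => assignment.push(c),
            None => {
                // No close match: this entity becomes a new canonical entry.
                canonical.push(i);
                assignment.push(i);
            }
        }
    }
    assignment
}
```

Raising the threshold toward 1.0 merges fewer entities; lowering it merges more aggressively and risks collapsing distinct entities.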

G. Stage 5: Retrieval

[retrieval]
strategy = "hybrid"           # Options: vector, bm25, pagerank, hybrid, adaptive
k = 10                        # Top-k results to return
enable_lightrag = true        # Dual-level retrieval
enable_pagerank = true        # Graph importance scoring

# Hybrid strategy weights (must sum to ~1.0)
[retrieval.fusion_weights]
vector = 0.4                  # Semantic similarity weight
bm25 = 0.3                    # Keyword matching weight
pagerank = 0.3                # Graph importance weight

H. Stage 6: Query Processing

[query_processing]
analyzer = "advanced"         # Options: basic, advanced, rograg
enable_decomposition = true   # Break complex queries into sub-queries
max_sub_queries = 5           # Maximum number of sub-queries per question
confidence_threshold = 0.6    # Minimum confidence for query understanding

I. Stage 7: Answer Generation

[generation]
backend = "ollama"            # Options: ollama, webllm, mock
model = "llama3.1:8b"
temperature = 0.7             # Creativity (0.0-1.0)
max_tokens = 1000             # Maximum answer length
top_p = 0.9                   # Nucleus sampling (0.0-1.0)
enable_caching = true         # Cache LLM responses
cache_ttl_seconds = 3600      # Cache expiration (1 hour)

J. Performance Tuning

[performance]
batch_size = 32               # Items per batch
max_concurrent_requests = 10  # Parallel API calls
embedding_cache_size = 10000  # Cached embeddings
enable_gpu = true             # GPU acceleration
gpu_device = 0                # GPU device ID (0 = first GPU)

K. Experimental Features

[experimental]
enable_rograg = true          # Query decomposition (+15% accuracy)
enable_fast_graphrag = true   # PageRank retrieval (27x faster)
enable_lightrag = true        # Dual-level retrieval (6000x fewer tokens)

Real-World Example: Optimizing for Plato’s Symposium

# config/symposium_optimized.toml
[general]
input_document_path = "Symposium.txt"
output_dir = "./output/symposium"

[pipeline.text_extraction]
chunk_size = 800              # Larger for complete philosophical arguments
chunk_overlap = 300           # High overlap for dialogue continuity

[pipeline.entity_extraction]
temperature = 0.1             # Low for consistent concept extraction
entity_types = [
    "PERSON",                 # Socrates, Phaedrus, etc.
    "CONCEPT",                # Eros, Beauty, Love
    "ARGUMENT",               # Philosophical positions
    "DIALOGUE_SPEAKER",       # Who said what
    "MYTHOLOGICAL_REFERENCE"  # Gods, myths
]
confidence_threshold = 0.6    # Lower for philosophical nuance

[pipeline.graph_building]
min_relation_score = 0.4      # Lower for subtle philosophical connections
max_connections_per_node = 25 # Higher for complex concept networks

[retrieval]
strategy = "hybrid"           # Best for philosophical queries
enable_lightrag = true
fusion_weights = { vector = 0.5, bm25 = 0.2, pagerank = 0.3 }

Results:

  • ✅ Captures 189 philosophical entities (vs 120 with defaults)
  • ✅ Identifies speaker-argument relationships
  • ✅ 85% query accuracy on philosophical questions

Level 2: Runtime API Configuration (Intermediate)

Modify parameters programmatically using the Builder API.

// QueryParams is assumed to be exported from the crate root alongside GraphRAG.
use graphrag_rs::{GraphRAG, ConfigPreset, QueryParams};

let mut graphrag = GraphRAG::builder()
    // Choose preset as starting point
    .with_preset(ConfigPreset::PerformanceOptimized)

    // Override specific parameters
    .chunk_size(1024)                     // Stage 1
    .chunk_overlap(256)

    .embedding_model("all-mpnet-base-v2") // Stage 2
    .embedding_dimension(768)

    .entity_confidence(0.7)               // Stage 3
    .max_gleaning_rounds(3)

    .enable_pagerank(true)                // Stage 4
    .enable_lightrag(true)                // Stage 5

    .retrieval_strategy("hybrid")         // Stage 5
    .top_k(15)

    .llm_temperature(0.8)                 // Stage 7
    .max_tokens(1500)

    // Auto-detect available tools
    .auto_detect_llm()
    .auto_detect_embedder()

    .build()?;

// Process document
graphrag.add_document("Your text")?;

// Query with custom parameters
let answer = graphrag.ask_with_params(
    "Your question",
    QueryParams {
        max_results: 10,
        min_confidence: 0.7,
        enable_decomposition: true,
    }
)?;

Available Builder Methods:

| Category | Methods | Description |
|---|---|---|
| Text Processing | chunk_size(), chunk_overlap(), min_chunk_size() | Stage 1 chunking |
| Embeddings | embedding_model(), embedding_dimension(), embedding_provider() | Stage 2 vectors |
| Entity Extraction | entity_confidence(), max_gleaning_rounds(), entity_types() | Stage 3 NER |
| Graph | enable_pagerank(), enable_incremental(), graph_backend() | Stage 4 graph |
| Retrieval | retrieval_strategy(), enable_lightrag(), top_k() | Stage 5 search |
| Query | query_analyzer(), enable_decomposition() | Stage 6 understanding |
| Generation | llm_model(), llm_temperature(), max_tokens(), enable_caching() | Stage 7 LLM |

Level 3: Custom Trait Implementations (Advanced)

Replace entire pipeline stages with custom implementations.

Example: Custom Embedder

use graphrag_rs::GraphRAG;
use graphrag_rs::core::traits::{Embedder, Result};

pub struct MyCustomEmbedder {
    api_key: String,
    model: String,
}

impl Embedder for MyCustomEmbedder {
    type Error = std::io::Error;

    fn embed(&self, text: &str) -> Result<Vec<f32>> {
        // Your custom embedding logic
        // Call external API, use custom model, etc.
        // (my_api_call is a placeholder for your own function)
        let embedding = my_api_call(text, &self.api_key)?;
        Ok(embedding)
    }

    fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>> {
        // Embed one text at a time; the first error aborts the batch
        texts.iter()
            .map(|text| self.embed(text))
            .collect()
    }

    fn dimension(&self) -> usize {
        1024  // Your embedding dimension
    }

    fn is_ready(&self) -> bool {
        !self.api_key.is_empty()
    }
}

// Use your custom embedder
let custom_embedder = MyCustomEmbedder {
    api_key: "your-key".to_string(),
    model: "custom-model-v1".to_string(),
};

let graphrag = GraphRAG::builder()
    .with_embedder(Box::new(custom_embedder))
    .build()?;

Example: Custom Entity Extractor

use graphrag_rs::core::traits::{EntityExtractor, Result};
use graphrag_rs::core::Entity;

pub struct MyCustomNER {
    model_path: String,
}

impl EntityExtractor for MyCustomNER {
    type Entity = Entity;
    type Error = std::io::Error;

    fn extract(&self, text: &str) -> Result<Vec<Entity>> {
        // Your custom NER logic
        // Could use spaCy, Flair, custom ML model, etc.
        let entities = my_ner_model(text, &self.model_path)?;
        Ok(entities)
    }

    fn extract_with_confidence(&self, text: &str) -> Result<Vec<(Entity, f32)>> {
        let entities = self.extract(text)?;
        // Pair each entity with a confidence score and wrap the Vec in Ok
        Ok(entities.into_iter()
            .map(|e| (e, 0.95))
            .collect())
    }

    fn set_confidence_threshold(&mut self, threshold: f32) {
        // Store threshold for filtering
    }
}

Available Traits to Implement

| Trait | Stage | What You Can Replace |
|---|---|---|
| Embedder / AsyncEmbedder | 2 | Embedding generation (OpenAI, Cohere, custom) |
| EntityExtractor / AsyncEntityExtractor | 3 | Entity extraction (spaCy, Flair, custom NER) |
| VectorStore / AsyncVectorStore | 5 | Vector search (Pinecone, Weaviate, Milvus) |
| Retriever / AsyncRetriever | 5 | Retrieval strategy (custom ranking, filters) |
| LanguageModel / AsyncLanguageModel | 7 | LLM generation (OpenAI, Anthropic, local) |
| GraphStore / AsyncGraphStore | 4 | Graph storage (Neo4j, ArangoDB, custom) |
| Storage / AsyncStorage | All | Persistence layer (PostgreSQL, MongoDB) |

See: src/core/traits.rs (lines 1-1291) for complete trait definitions.


Configuration Validation & Testing

# 1. Validate TOML configuration
cargo run --bin simple_cli -- my_config.toml --validate

# 2. Dry-run with mock LLM (instant, no API calls)
cargo run --bin simple_cli -- my_config.toml --dry-run

# 3. Profile performance with your config
cargo run --bin simple_cli -- my_config.toml --profile

# 4. Compare configurations
cargo run --bin benchmark_configs -- config1.toml config2.toml

Quick Reference: Key Parameters by Use Case

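| Use Case | Chunk Size | Overlap | Temperature | Entity Confidence | Retrieval |
|---|---|---|---|---|---|
| Fiction/Novels | 800 | 300 (38%) | 0.7 | 0.6 | hybrid |
| Academic Papers | 1024 | 256 (25%) | 0.1 | 0.7 | vector |
| Legal Documents | 512 | 128 (25%) | 0.1 | 0.8 | bm25 |
| Technical Docs | 512 | 200 (39%) | 0.3 | 0.7 | hybrid |
| Blog Posts | 600 | 150 (25%) | 0.5 | 0.6 | adaptive |
| Philosophical Texts | 800 | 300 (38%) | 0.1 | 0.6 | hybrid |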

Pro Tips:

  1. Start with templates: config/templates/ covers 90% of use cases
  2. Iterate: Run with defaults → profile → adjust → rerun
  3. Document-specific: Longer chunks (800-1024) for narrative, shorter (512) for technical
  4. Temperature: Lower (0.1-0.3) for factual, higher (0.7-0.9) for creative
  5. Confidence threshold: Lower (0.5-0.6) for nuanced texts, higher (0.7-0.8) for precision
  6. Retrieval: hybrid is best general-purpose, bm25 for exact matches, vector for semantic
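
To make the chunk size and overlap numbers above concrete, here is a minimal sketch of character-based chunking with a sliding window. It is not the crate's chunker: `chunk_with_overlap` and `overlap_percent` are hypothetical helpers, and the real pipeline may split on sentence or token boundaries instead.

```rust
/// Split `text` into chunks of at most `size` characters. Each chunk starts
/// `size - overlap` characters after the previous one, so neighbouring
/// chunks share `overlap` characters.
fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}

/// Overlap expressed as a percentage of the chunk size.
fn overlap_percent(size: usize, overlap: usize) -> f64 {
    100.0 * overlap as f64 / size as f64
}

fn main() {
    let text = "a".repeat(2000);
    let chunks = chunk_with_overlap(&text, 800, 300);
    println!(
        "{} chunks, {:.1}% overlap",
        chunks.len(),
        overlap_percent(800, 300)
    );
}
```

With the fiction settings (800 / 300) each chunk repeats the final 300 characters of the previous one, which is where the roughly 38% figure in the table comes from.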

Module References:

  • TOML Config: src/config/toml_config.rs - All configuration structures
  • Builder API: src/builder.rs - Fluent API for runtime config
  • Core Traits: src/core/traits.rs - Pluggable implementations
  • Templates: config/templates/ - Pre-optimized configurations

Three Deployment Architectures

GraphRAG-rs supports three deployment modes; choose one based on your requirements:

1. Server-Only (Production Ready ✅)

Architecture:

┌─────────────┐
│ Client App  │ (React/Vue/Mobile)
└──────┬──────┘
       │ REST API
┌──────▼────────────────────┐
│  graphrag-server          │
│  ├─ Actix-web REST API    │
│  ├─ Apistos OpenAPI 3.0.3 │
│  ├─ Qdrant Vector DB      │
│  ├─ Ollama Embeddings     │
│  └─ GPU Acceleration      │
└───────────────────────────┘

Best For:

  • Multi-tenant SaaS (>1000 users)
  • Large datasets (>1M documents)
  • GPU-accelerated inference
  • Mobile apps (thin clients)

Tech Stack:

Backend: Rust + Actix-web 4.9 + Apistos (OpenAPI 3.0.3) + Tokio
Vector DB: Qdrant (scales to 100M+ vectors)
Embeddings: Ollama (nomic-embed-text, GPU)
LLM: Ollama (llama3.1:8b, GPU)
Binary Size: 5.2 MB (optimized release)

Performance:

  • Startup: <1s
  • Query: 500ms-2s (end-to-end)
  • Throughput: 20 queries/sec

2. WASM-Only (60% Complete)

Architecture:

┌───────────────────────────┐
│       Browser             │
│  ┌─────────────────────┐  │
│  │ Leptos UI (WASM)    │  │
│  │ ├─ ONNX Embeddings  │  │ ← GPU via WebGPU
│  │ ├─ WebLLM Inference │  │ ← 40-62 tok/s GPU
│  │ ├─ Voy Vector Search│  │ ← 75KB pure Rust
│  │ └─ IndexedDB Storage│  │ ← Offline persistence
│  └─────────────────────┘  │
└───────────────────────────┘
     ↑ NO SERVER REQUIRED!

Best For:

  • Privacy-first applications
  • Offline-first tools
  • Zero infrastructure cost
  • Edge deployment (CDN)

Tech Stack:

Frontend: Leptos 0.8 + Trunk
ML: ONNX Runtime Web (WebGPU, 3-8ms embeddings)
LLM: WebLLM (WebGPU, 40-62 tok/s)
Vector Search: Voy (75KB k-d tree)
Storage: IndexedDB + Cache API
WASM Size: ~2MB (gzipped)

Performance:

  • Cold start: 2-3s (model loading)
  • Embeddings: 3-8ms per chunk (GPU)
  • LLM: 40-62 tok/s (GPU)
  • Storage: 50% browser quota (~5-10GB)

3. Hybrid (Planned)

Architecture:

┌───────────────────────────┐
│       Browser             │
│  ┌─────────────────────┐  │
│  │ WASM Client (Fast)  │  │ ← Real-time UI
│  │ + GPU Embeddings    │  │ ← 3-8ms GPU
│  │ + Local Cache       │  │ ← Offline-first
│  └──────────┬──────────┘  │
└─────────────┼─────────────┘
              │ Optional WebSocket
┌─────────────▼─────────────┐
│  Server (Heavy Compute)   │
│  ├─ Batch Processing      │ ← Large documents
│  ├─ Multi-user Sync       │ ← Shared knowledge
│  └─ Background Jobs       │ ← Scheduled updates
└───────────────────────────┘

Best For:

  • Enterprise applications
  • Multi-device sync
  • Best UX + Scalability
  • Collaborative knowledge management

Status: Architecture designed; implementation is planned for Phase 3.


Optional Components & Features

GraphRAG-rs is modular - enable only what you need via feature flags:

LightRAG (Dual-Level Retrieval)

What: Searches entities (low-level) + communities (high-level) simultaneously

Impact:

  • 6000x token reduction vs traditional GraphRAG
  • ✅ 60ms query time (vs 2-5 seconds)
  • ✅ Better context retention

Enable:

# Cargo.toml
[features]
lightrag = []

# Usage
cargo build --features lightrag

Module: src/lightrag/dual_retrieval.rs

PageRank (Fast-GraphRAG)

What: Ranks entities by graph importance, personalizing to query context

Impact:

  • 27x performance boost in retrieval
  • ✅ 6x cost reduction
  • ✅ Better relevance ranking

Enable:

[features]
pagerank = []

# Usage
cargo build --features pagerank

Module: src/graph/pagerank.rs
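
As an illustration of what "personalizing to query context" can mean, the sketch below runs personalized PageRank by power iteration, restarting the random walk at a set of seed nodes (for example, entities mentioned in the query). It is a generic textbook formulation, not the code in src/graph/pagerank.rs; the function name, the adjacency-list representation, and the damping value 0.85 are assumptions.

```rust
/// Personalized PageRank by power iteration.
/// `edges[u]` lists the nodes that `u` links to; `seeds` are the nodes the
/// random walk restarts from.
fn personalized_pagerank(
    edges: &[Vec<usize>],
    seeds: &[usize],
    damping: f64,
    iterations: usize,
) -> Vec<f64> {
    assert!(!seeds.is_empty(), "at least one seed node is required");
    let n = edges.len();
    // Restart distribution: uniform over the seed nodes.
    let mut restart = vec![0.0; n];
    for &s in seeds {
        restart[s] += 1.0 / seeds.len() as f64;
    }
    let mut rank = restart.clone();
    for _ in 0..iterations {
        let mut next = vec![0.0; n];
        for (u, targets) in edges.iter().enumerate() {
            if targets.is_empty() {
                // Dangling node: return its mass to the restart distribution.
                for v in 0..n {
                    next[v] += damping * rank[u] * restart[v];
                }
            } else {
                let share = damping * rank[u] / targets.len() as f64;
                for &v in targets {
                    next[v] += share;
                }
            }
        }
        // With probability (1 - damping) the walk jumps back to a seed.
        for v in 0..n {
            next[v] += (1.0 - damping) * restart[v];
        }
        rank = next;
    }
    rank
}

fn main() {
    // Hypothetical 4-node graph: 0 -> 1 -> 2 -> 0, plus 3 -> 2.
    let edges = vec![vec![1], vec![2], vec![0], vec![2]];
    let rank = personalized_pagerank(&edges, &[0], 0.85, 50);
    for (node, score) in rank.iter().enumerate() {
        println!("node {node}: {score:.4}");
    }
}
```

Nodes close to the seeds end up with the highest scores, while a node that is neither a seed nor reachable from one (node 3 here) stays at zero.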

ROGRAG (Query Decomposition)

What: Breaks complex queries into sub-queries with logic-based reasoning

Impact:

  • 15-percentage-point accuracy improvement (60% → 75%)
  • ✅ Handles multi-hop questions
  • ✅ Structured reasoning traces

Enable:

[features]
rograg = []

Module: src/rograg/logic_form.rs

GPU Acceleration

Options:

| Backend | Platform | Performance | Feature Flag |
|---|---|---|---|
| CUDA | NVIDIA | 20-50x speedup | --features cuda |
| Metal | Apple Silicon | 15-30x speedup | --features metal |
| Vulkan | Cross-platform | 10-25x speedup | --features vulkan |
| WebGPU | Browser | 25-40x speedup | --features webgpu |

Example:

# NVIDIA GPU acceleration
cargo build --release --features "neural-embeddings,cuda,ollama"

# Apple Silicon
cargo build --release --features "neural-embeddings,metal,ollama"

Intelligent Caching

What: Caches LLM responses with semantic key generation

Impact:

  • 80%+ hit rate in production
  • ✅ 6x cost reduction
  • ✅ 16-20x latency reduction (100ms → 5ms)

Enable:

[features]
caching = ["moka"]

Module: src/caching/cached_client.rs
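
The idea of a response cache with a normalized key can be sketched as follows. This is a simplified stand-in, not the implementation in src/caching/cached_client.rs: the key here is only a lowercased, punctuation-trimmed form of the prompt rather than a true semantic key, and `ResponseCache`, `cache_key`, and `get_or_insert_with` are hypothetical names.

```rust
use std::collections::HashMap;

/// Normalize a prompt so trivially different phrasings share one cache key.
fn cache_key(prompt: &str) -> String {
    prompt
        .to_lowercase()
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
        .filter(|w| !w.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}

struct ResponseCache {
    entries: HashMap<String, String>,
    hits: u64,
    misses: u64,
}

impl ResponseCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached response, or call `generate` and store its result.
    fn get_or_insert_with(
        &mut self,
        prompt: &str,
        generate: impl FnOnce(&str) -> String,
    ) -> String {
        let key = cache_key(prompt);
        if let Some(hit) = self.entries.get(&key).cloned() {
            self.hits += 1;
            return hit;
        }
        self.misses += 1;
        let response = generate(prompt);
        self.entries.insert(key, response.clone());
        response
    }

    /// Fraction of lookups that were served from the cache.
    fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 { 0.0 } else { self.hits as f64 / total as f64 }
    }
}

fn main() {
    let mut cache = ResponseCache::new();
    for prompt in ["What did Tom find?", "what did tom find", "Who is Huck?"] {
        // The closure stands in for a real LLM call.
        let answer = cache.get_or_insert_with(prompt, |p| format!("(model answer for: {p})"));
        println!("{answer}");
    }
    println!("hit rate: {:.2}", cache.hit_rate());
}
```

The second prompt differs only in case and punctuation, so it is answered from the cache without invoking the generator.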


Monitoring & Metrics

GraphRAG-rs includes comprehensive performance tracking across the entire pipeline.

PipelineStage Tracking

// src/monitoring/metrics.rs
pub enum PipelineStage {
    QueryExpansion,
    HybridRetrieval,
    BM25Search,
    VectorSearch,
    ResultFusion,
    Reranking,
    ConfidenceFiltering,
    TotalPipeline,
}

Real-Time Metrics

let mut timer = TimingBreakdown::new();

timer.start_stage(PipelineStage::VectorSearch);
let results = vector_search(query).await?;
let duration = timer.end_stage(PipelineStage::VectorSearch);

println!("Vector search: {:?}", duration);
// Output: Vector search: 23ms

Performance Breakdown

Query Performance Breakdown:
  Total time: 342ms
  Expanded queries: 3
  Raw results: 45
  Final results: 10
  Average confidence: 0.87

  Stage timings:
    QueryExpansion: 52ms (15.2%)
    VectorSearch: 103ms (30.1%)
    BM25Search: 45ms (13.2%)
    ResultFusion: 67ms (19.6%)
    Reranking: 48ms (14.0%)
    ConfidenceFiltering: 27ms (7.9%)

Module: src/monitoring/metrics.rs, src/monitoring/benchmark.rs
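
The percentages in the breakdown above are each stage's share of the summed stage time. A minimal sketch of that arithmetic, using only `std::time::Duration` and a hypothetical `stage_report` helper (not part of the monitoring module), could look like this:

```rust
use std::time::Duration;

/// Format each stage as "name: Nms (P%)", where P is the stage's share of
/// the summed stage time. Assumes at least one stage has a non-zero duration.
fn stage_report(stages: &[(&str, Duration)]) -> Vec<String> {
    let total: Duration = stages.iter().map(|(_, d)| *d).sum();
    stages
        .iter()
        .map(|(name, d)| {
            let pct = 100.0 * d.as_secs_f64() / total.as_secs_f64();
            format!("{}: {}ms ({:.1}%)", name, d.as_millis(), pct)
        })
        .collect()
}

fn main() {
    // Stage timings copied from the sample breakdown above.
    let stages = [
        ("QueryExpansion", Duration::from_millis(52)),
        ("VectorSearch", Duration::from_millis(103)),
        ("BM25Search", Duration::from_millis(45)),
        ("ResultFusion", Duration::from_millis(67)),
        ("Reranking", Duration::from_millis(48)),
        ("ConfidenceFiltering", Duration::from_millis(27)),
    ];
    for line in stage_report(&stages) {
        println!("{line}");
    }
}
```

The six stage timings sum to 342 ms, which matches the reported total.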


Learn More

Documentation

Practical Examples

Getting Started:

  • examples/01_basic_usage.rs - One-line API
  • examples/02_stateful_api.rs - Multi-query sessions
  • examples/03_builder_api.rs - Full configuration

Advanced:

  • examples/real_ollama_pipeline.rs - Complete 7-stage walkthrough
  • examples/multi_document_pipeline.rs - Incremental graph construction
  • examples/graphrag_multi_doc_server.rs - Production REST API

Configuration Templates

Pre-optimized configs for different document types:

config/templates/
├── narrative_fiction.toml      # Books, novels (800-char chunks)
├── academic_research.toml      # Papers, studies (1024-char chunks)
├── technical_documentation.toml # Manuals, specs (512-char chunks)
├── legal_documents.toml        # Contracts, laws (512-char, low temp)
├── web_blog_content.toml       # Articles, blogs (600-char chunks)
└── dynamic_universal.toml      # General-purpose (adaptive)

Research Papers

GraphRAG-rs implements cutting-edge research:

  1. Microsoft GraphRAG (2024) - “From Local to Global: A Graph RAG Approach”

    • Base architecture foundation
    • Community detection algorithms
  2. Fast-GraphRAG (2024) - PageRank-based retrieval

    • 27x performance improvement
    • 6x cost reduction
  3. LightRAG (2024) - “Simple and Fast Retrieval-Augmented Generation”

    • Dual-level retrieval
    • 6000x token reduction
  4. ROGRAG (2024) - Robust query processing

    • Query decomposition
    • 60% → 75% accuracy boost

Quick Start: See It In Action

1. One-Liner (Simplest)

use graphrag_rs::simple;

let answer = simple::answer(
    "Tom found treasure in the cave",
    "What did Tom find?"
)?;
// Output: "Tom found treasure in the cave."

2. Multi-Query Session

use graphrag_rs::easy::SimpleGraphRAG;

let mut graph = SimpleGraphRAG::from_text("Your document")?;

graph.ask("What are the main themes?")?;
graph.ask("Who are the characters?")?;

3. Production Server

# Start Ollama
ollama serve &
ollama pull llama3.1:8b
ollama pull nomic-embed-text

# Start GraphRAG server
export EMBEDDING_BACKEND=ollama
cargo run --release --bin graphrag-server --features "qdrant,ollama"

# Query via REST API
curl -X POST http://localhost:8080/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What did Tom find in the cave?"}'

4. WASM Browser (100% client-side)

cd graphrag-wasm
trunk serve --open

# Visit http://localhost:8080
# Upload document → Build graph → Query → Get answers (100% client-side!)

Configuration-Driven Behavior: Complete Examples

Example 1: Fast Pattern-Based Pipeline (No LLM)

Use Case: Testing, development, offline deployment, resource-constrained environments

Configuration (fast_config.toml):

[general]
log_level = "info"

[entity_extraction]
enabled = true
min_confidence = 0.7
use_gleaning = false          # ← Pattern-based extraction

[ollama]
enabled = false               # ← No LLM required

[embeddings]
backend = "hash"              # ← Fast hash-based embeddings
dimension = 128

[retrieval]
strategy = "vector"           # ← Simple vector search

Runtime Behavior:

[INFO] Configuration loaded from: fast_config.toml
[INFO] Using pattern-based entity extraction
  ✓ Regex + capitalization-based
  ✓ No LLM required
[INFO] Using hash-based embeddings (128 dimensions)
[INFO] Using vector retrieval strategy

Pipeline Performance:
  Chunking:           0.01s
  Embeddings:         0.002s (<1ms per chunk)
  Entity Extraction:  0.005s (<10ms per chunk)
  Graph Construction: 0.05s
  Query Processing:   0.03s
  TOTAL:              0.097s (~100ms)

Results: ✅ Ultra-fast, ✅ No dependencies, ✅ Offline-capable, Good quality (not excellent)


Example 2: High-Accuracy LLM Pipeline (Symposium Philosophy)

Use Case: Academic analysis, philosophical texts, high-quality extraction

Configuration (symposium_config.toml):

[general]
input_document_path = "info/Symposium.txt"
log_level = "info"

[entity_extraction]
enabled = true
min_confidence = 0.6          # ← Lower for philosophical nuance
use_gleaning = true           # ← LLM-based extraction
max_gleaning_rounds = 4       # ← 4 refinement passes
semantic_merging = true
automatic_linking = true

[pipeline.entity_extraction]
model_name = "llama3.1:8b"
temperature = 0.1             # ← Low for consistent concept extraction
entity_types = [
    "PERSON",                 # Socrates, Phaedrus
    "CONCEPT",                # Eros, Beauty, Love
    "ARGUMENT",               # Philosophical positions
    "MYTHOLOGICAL_REFERENCE"  # Gods, myths
]

[ollama]
enabled = true
host = "http://localhost"
port = 11434
chat_model = "llama3.1:8b"    # ← AI-powered extraction
embedding_model = "nomic-embed-text"
fallback_to_hash = false      # ← Error if Ollama fails

[embeddings]
backend = "ollama"
model = "nomic-embed-text"
dimension = 768

[retrieval]
strategy = "hybrid"           # ← Best for philosophical queries
enable_lightrag = true

Runtime Behavior:

[INFO] Configuration loaded from: symposium_config.toml
[INFO] Using LLM-based entity extraction with gleaning (max_rounds: 4)
  ✓ Ollama client initialized
  ✓ Model: llama3.1:8b
  ✓ Entity types: PERSON, CONCEPT, ARGUMENT, MYTHOLOGICAL_REFERENCE

Processing Symposium.txt (189 KB, 455 chunks):

Chunk 1/455:
  Round 1: Extract entities → Found 8 entities (PERSON: 2, CONCEPT: 4, ARGUMENT: 2)
  Round 2: "Did you miss any entities?" → Found 2 more (CONCEPT: 2)
  Round 3: "Find relationships" → Found 3 relationships
  Round 4: "Final check" → Found 1 subtle concept
  ✅ Extraction complete: 11 entities, 3 relationships

... (processing all chunks) ...

[INFO] Final Results:
  Entities:      317 (PERSON: 89, CONCEPT: 156, ARGUMENT: 45, MYTHOLOGICAL_REFERENCE: 27)
  Relationships: 455
  Communities:   12 (speaker groups, concept clusters)
  Processing Time: 325ms per chunk average

[INFO] Using Ollama embeddings: nomic-embed-text (768 dimensions)
[INFO] Using hybrid retrieval: vector (40%) + bm25 (30%) + pagerank (30%)

Query: "What is love according to Socrates?"
  VectorSearch:   123ms
  BM25Search:     45ms
  PageRankScore:  67ms
  Fusion (RRF):   28ms
  TOTAL:          263ms

Answer: "According to Socrates in the Symposium, love (Eros) is the
         pursuit of beauty and wisdom. Socrates relates Diotima's teaching
         that love is not a god but a spirit that mediates between mortals
         and the divine..."

Results: ★★★★★ Excellent quality, ✅ Contextual understanding, ✅ Custom entity types, Requires Ollama/GPU
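
The log above names reciprocal rank fusion (RRF) as the fusion step. As a rough sketch of that technique: each ranked list contributes 1 / (k + rank) to a document's score, and documents are re-sorted by the summed score. The constant k = 60 and the chunk ids below are assumptions, and this plain version ignores the 40/30/30 strategy weights mentioned in the log; it is not taken from the crate.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document,
/// with ranks starting at 1. Returns documents sorted by fused score.
fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(doc, score)| (doc.to_string(), score))
        .collect();
    // Highest score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical chunk ids returned by three retrieval strategies.
    let vector = vec!["c12", "c7", "c3"];
    let bm25 = vec!["c7", "c40", "c12"];
    let pagerank = vec!["c7", "c12", "c99"];
    for (doc, score) in reciprocal_rank_fusion(&[vector, bm25, pagerank], 60.0) {
        println!("{doc}: {score:.5}");
    }
}
```

Because RRF works on ranks rather than raw scores, it can combine vector, BM25, and PageRank results without normalizing their score scales first.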


Example 3: Hybrid Configuration (Tom Sawyer Narrative)

Use Case: Fiction analysis, balanced quality/performance

Configuration (tom_sawyer_config.toml):

[entity_extraction]
enabled = true
min_confidence = 0.65
use_gleaning = true           # ← LLM-based
max_gleaning_rounds = 2       # ← Only 2 rounds (faster)

[ollama]
enabled = true
chat_model = "llama3.1:8b"

[embeddings]
backend = "ollama"            # ← Real semantic embeddings
model = "nomic-embed-text"
fallback_to_hash = true       # ← Fallback if Ollama unavailable

[retrieval]
strategy = "hybrid"
enable_lightrag = true        # ← Dual-level retrieval

Runtime Behavior:

[INFO] Using LLM-based entity extraction with gleaning (max_rounds: 2)
[INFO] Using Ollama embeddings with hash fallback

Processing Tom Sawyer (434 KB, 492 chunks):
  Chunking:           0.01s
  Embeddings:         0.08s (Ollama, 768-dim)
  Entity Extraction:  0.6s (LLM, 2 rounds)
  Graph Construction: 0.05s
  TOTAL:              0.74s (~750ms)

Query: "How did Tom and Huck find the treasure?"
  Low-level retrieval:  23ms (entities: Tom, Huck, treasure)
  High-level retrieval: 31ms (community: treasure hunting storyline)
  Fusion:               12ms
  TOTAL:                66ms

Answer: "Tom and Huck discovered the treasure in McDougal's Cave after
         witnessing Injun Joe hide it there..."

Results: ★★★★ Very good quality, Balanced performance, ✅ Fallback safety


Configuration Comparison Matrix

| Config | Entity Extraction | Embeddings | Query Time | Quality | Best For |
|---|---|---|---|---|---|
| Fast | Pattern (<10ms) | Hash | 30ms (~100ms end-to-end) | ★★★ Good | Testing, offline |
| Symposium | LLM 4-round (1.2s) | Ollama | 263ms | ★★★★★ Excellent | Philosophy, analysis |
| Tom Sawyer | LLM 2-round (600ms) | Ollama | 66ms | ★★★★ Very good | Fiction, balanced |

Key Insight: The same codebase adapts automatically - you control behavior through configuration!
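
One way to picture this runtime selection is a small function that maps configuration flags to an extraction strategy. The sketch below is illustrative only: `Settings`, `Extraction`, and `select_extraction` are hypothetical names, and the rule that gleaning falls back to pattern matching when Ollama is disabled is an assumption rather than documented behavior.

```rust
/// Hypothetical mirror of the settings used in the three examples above.
struct Settings {
    use_gleaning: bool,
    ollama_enabled: bool,
    max_gleaning_rounds: u32,
}

#[derive(Debug, PartialEq)]
enum Extraction {
    /// Regex and capitalization based, no LLM.
    Pattern,
    /// LLM extraction with the given number of gleaning rounds.
    Llm { rounds: u32 },
}

/// Pick the extraction strategy at runtime from configuration values.
fn select_extraction(s: &Settings) -> Extraction {
    if s.use_gleaning && s.ollama_enabled {
        Extraction::Llm { rounds: s.max_gleaning_rounds }
    } else {
        Extraction::Pattern
    }
}

fn main() {
    let fast = Settings { use_gleaning: false, ollama_enabled: false, max_gleaning_rounds: 0 };
    let symposium = Settings { use_gleaning: true, ollama_enabled: true, max_gleaning_rounds: 4 };
    println!("fast      -> {:?}", select_extraction(&fast));
    println!("symposium -> {:?}", select_extraction(&symposium));
}
```

The same binary handles both cases; only the loaded settings differ.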


Key Takeaways

  1. 7 Stages: Text → Chunks → Vectors → Entities → Graph → Retrieval → Query → Answer
  2. 3 Architectures: Server-Only ✅ | WASM-Only | Hybrid
  3. Configuration-Driven: Same code, different behavior via TOML settings
  4. Dynamic Selection: Pipeline adapts based on use_gleaning, ollama.enabled, retrieval.strategy
  5. State-of-the-Art: LightRAG (6000x reduction) + PageRank (27x speedup) + ROGRAG (+15 percentage points accuracy)
  6. Production-Ready: 5.2MB binary, <1s startup, 500ms-2s queries
  7. Modular: Enable only what you need via feature flags
  8. GPU-Accelerated: CUDA, Metal, Vulkan, WebGPU support

GraphRAG-rs turns documents into a knowledge graph that can be queried for context-aware answers, with pipeline behavior controlled through TOML configuration.


Last Updated: October 2025 | GraphRAG-rs v1.0

LazyGraphRAG / E2GraphRAG

See docs/LAZYGRAPHRAG_E2GRAPHRAG.md.

Configuration Guide

See docs/CONFIGURATION_GUIDE.md.

JSON5 Configuration System for GraphRAG

Type-safe, validated configuration for GraphRAG pipelines.


Why JSON5?

The Critical Advantage: Comments!

Unlike standard JSON, JSON5 allows comments to document your configuration choices:

❌ Standard JSON:

{
  "temperature": 0.1,
  "chunk_size": 800
}

No comments allowed - JSON syntax forbids comments entirely!

✅ JSON5:

{
  // Low temperature for consistent character analysis
  "temperature": 0.1,  // 0.05-0.3 optimal for narrative (IBM 2024)

  // Larger chunks capture complete narrative scenes
  "chunk_size": 800,  // LlamaIndex research: 800-1024 for narratives
}

Comments everywhere - document choices, cite research, explain “why”!
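
To show why a strict JSON parser cannot read such a file directly, the sketch below strips `//` and `/* */` comments while leaving string contents (such as URLs) untouched. It is only an illustration of the comment rules: `strip_comments` is a hypothetical helper, it does not handle other JSON5 features such as trailing commas or unquoted keys, and it is not how GraphRAG loads its configs.

```rust
/// Remove `//` and `/* */` comments that appear outside string literals.
fn strip_comments(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut chars = src.chars().peekable();
    // Holds the quote character while we are inside a string literal.
    let mut in_string: Option<char> = None;
    while let Some(c) = chars.next() {
        if let Some(quote) = in_string {
            out.push(c);
            if c == '\\' {
                // Copy the escaped character so `\"` does not end the string.
                if let Some(escaped) = chars.next() {
                    out.push(escaped);
                }
            } else if c == quote {
                in_string = None;
            }
            continue;
        }
        match c {
            '"' | '\'' => {
                in_string = Some(c);
                out.push(c);
            }
            '/' if chars.peek() == Some(&'/') => {
                // Line comment: skip up to, but not including, the newline.
                while let Some(&next) = chars.peek() {
                    if next == '\n' {
                        break;
                    }
                    chars.next();
                }
            }
            '/' if chars.peek() == Some(&'*') => {
                chars.next(); // consume '*'
                let mut prev = '\0';
                for next in chars.by_ref() {
                    if prev == '*' && next == '/' {
                        break;
                    }
                    prev = next;
                }
            }
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    let json5 = "{\n  // Low temperature for consistency\n  \"temperature\": 0.1, /* tuned */\n  \"docs\": \"https://example.com/a\"\n}";
    println!("{}", strip_comments(json5));
}
```

Tracking string state matters here: a naive search for `//` would truncate any value containing a URL.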

JSON5 Features

  1. Comments (// and /* */)

    • Document WHY you chose parameter values
    • Add research references inline
    • Explain domain-specific choices
  2. Trailing Commas

    {
      "a": 1,
      "b": 2,  // ← This trailing comma is valid!
    }
    
  3. Flexible Syntax

    • More forgiving than strict JSON
    • Numbers: +123, 0xFF, Infinity, NaN
    • Multi-line strings
    • Unquoted keys (we use quoted for consistency)
  4. Schema Validation

    • Real-time autocomplete in VSCode
    • Catch errors before runtime
    • Range and enum validation
    • Hover documentation

JSON5 vs JSON

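| Feature | JSON | JSON5 |
|---|---|---|
| Comments | ❌ | ✅ // or /* */ |
| Trailing commas | ❌ | ✅ |
| Unquoted keys | ❌ | ✅ |
| Numbers | Limited | +123, 0xFF, Infinity |
| Strings | Single line | Multi-line |
| Schema support | ✅ | ✅ |
| Autocomplete | ✅ | ✅ |
| Validation | ✅ | ✅ |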

Winner: JSON5 = Best of JSON (tooling) + Comments + Flexible syntax


Quick Start

1. Use an Existing Template

GraphRAG provides 13 pre-configured templates for different use cases:

# List available templates
ls config/templates/*.graphrag.json5

# Copy a template
cp config/templates/narrative_fiction.graphrag.json5 my_config.graphrag.json5

# Edit with autocomplete in VSCode!
code my_config.graphrag.json5

Available templates:

  • semantic_pipeline.graphrag.json5 - LLM-based semantic analysis
  • algorithmic_pipeline.graphrag.json5 - Fast pattern-based extraction
  • hybrid_pipeline.graphrag.json5 - Combined semantic + algorithmic
  • narrative_fiction.graphrag.json5 - Novels, stories, literature
  • technical_documentation.graphrag.json5 - API docs, manuals
  • academic_research.graphrag.json5 - Research papers, theses
  • legal_documents.graphrag.json5 - Contracts, regulations
  • web_blog_content.graphrag.json5 - Blog posts, articles
  • And more!

2. Template Structure

{
  // ==========================================================================
  // GraphRAG Configuration - YOUR PROJECT NAME
  // ==========================================================================
  // VSCode: This file has autocomplete! Press Ctrl+Space for suggestions.
  // ==========================================================================

  "$schema": "../schema/graphrag-config.schema.json",

  "mode": {
    "approach": "semantic"  // Options: semantic | algorithmic | hybrid
  },

  "general": {
    "input_document_path": "path/to/your/document.txt",
    "output_dir": "./output/analysis",
    "log_level": "info",
    "max_threads": 4
  },

  "pipeline": {
    "workflows": ["extract_text", "extract_entities", "build_graph"],
    "text_extraction": {
      "chunk_size": 800,
      "chunk_overlap": 300
    },
    "entity_extraction": {
      "model_name": "llama3.1:8b",
      "temperature": 0.1,
      "entity_types": ["PERSON", "LOCATION", "EVENT"]
    }
  },

  "ollama": {
    "enabled": true,
    "chat_model": "llama3.1:8b",
    "embedding_model": "nomic-embed-text"
  }
}

3. Load in Rust (Coming Soon)

use graphrag_core::config::json5_loader::load_json5_config;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config: GraphRAGConfig = load_json5_config("my_config.graphrag.json5")?;
    println!("Approach: {:?}", config.mode.approach);
    Ok(())
}

VSCode Setup

Automatic Setup (Already Done!)

The repository includes:

  • .vscode/settings.json - Schema mapping for *.graphrag.json5 files
  • .vscode/graphrag.code-snippets - Quick templates

What You Get

1. Autocomplete (Press Ctrl+Space)

{
  "mode": {
    "approach": ""  // ← Press Ctrl+Space here: semantic | algorithmic | hybrid
  }
}

2. Real-time Validation

{
  "general": {
    "max_threads": 999  // ❌ Red underline: Maximum is 128
  }
}

3. Hover Documentation

  • Hover over any field
  • See description, valid range, default value
  • Research-based recommendations

4. Error Prevention

{
  "mode": {
    "approach": "invalid"  // ❌ Error: must be semantic/algorithmic/hybrid
  },
  "text_processing": {
    "chunk_size": 99999  // ❌ Error: maximum is 4096
  }
}

Manual Setup (If Needed)

If autocomplete doesn’t work automatically:

  1. Open VSCode Settings (Ctrl+,)
  2. Search for “json.schemas”
  3. Verify this mapping exists:
    "json.schemas": [{
      "fileMatch": ["*.graphrag.json5", "*.graphrag.json"],
      "url": "./config/schema/graphrag-config.schema.json"
    }]
    
  4. Reload VSCode: Ctrl+Shift+P → “Reload Window”

Creating Configurations

Option 1: Copy a Template

Start with a template matching your use case:

# For semantic pipeline (LLM-based, high quality)
cp config/templates/semantic_pipeline.graphrag.json5 my_config.graphrag.json5

# For narrative fiction (novels, stories)
cp config/templates/narrative_fiction.graphrag.json5 my_novel_config.graphrag.json5

# For technical docs (API documentation, manuals)
cp config/templates/technical_documentation.graphrag.json5 my_api_docs.graphrag.json5

# For hybrid approach (balanced quality and speed)
cp config/templates/hybrid_pipeline.graphrag.json5 my_hybrid_config.graphrag.json5

Then customize:

  1. Update input_document_path
  2. Adjust output_dir
  3. Customize entity_types for your domain
  4. Tune parameters based on your needs

Option 2: Build from Scratch

In VSCode:

  1. Create my_config.graphrag.json5
  2. Add schema reference:
    {
      "$schema": "../config/schema/graphrag-config.schema.json"
    }
    
  3. Press Ctrl+Space and follow autocomplete suggestions!

The schema will guide you through all required and optional fields.

Option 3: Use Code Snippets

In VSCode:

  1. Create new file: my_config.graphrag.json5
  2. Type graphrag-semantic and press Tab
  3. Full template inserted!

Available snippets:

  • graphrag-semantic - Semantic pipeline template
  • graphrag-algorithmic - Algorithmic pipeline template
  • graphrag-hybrid - Hybrid pipeline template

✅ Validation

Real-time (VSCode)

Errors show immediately as you type:

{
  "mode": {
    "approach": "semantic"
  },
  "general": {
    "max_threads": 999,  // ❌ Error: Maximum is 128
    "log_level": "invalid"  // ❌ Error: Must be trace/debug/info/warn/error
  },
  "ollama": {
    "temperature": 5.0  // ❌ Error: Maximum is 2.0
  }
}

CLI Validation

Validate before running your application:

# Validate single config
uv run --with jsonschema --with json5 python scripts/validate_json5_configs.py \
  --config my_config.graphrag.json5

# Validate all configs in directory
uv run --with jsonschema --with json5 python scripts/validate_json5_configs.py \
  --dir config/templates

# Custom schema
uv run --with jsonschema --with json5 python scripts/validate_json5_configs.py \
  --config my_config.json5 \
  --schema path/to/schema.json

Output:

Validating 1 configuration file(s)...

✅ my_config.graphrag.json5

============================================================
Validation Complete: 1/1 valid
All configurations are valid!

Error output example:

❌ my_config.graphrag.json5
  • Path: general → max_threads
    Error: 999 is greater than the maximum of 128
    Allowed range: 1-128

  • Path: mode → approach
    Error: 'invalid' is not one of ['semantic', 'algorithmic', 'hybrid']
    Allowed values: "semantic", "algorithmic", "hybrid"
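
The kind of range and enum checks reported above can be sketched in a few lines. The limits are the ones quoted in this guide (max_threads 1-128, chunk_size up to 4096, temperature up to 2.0, three allowed approaches); the `Config` struct and `validate` function are hypothetical and do not reflect the real schema validator or its message format.

```rust
/// Hypothetical subset of the configuration, limited to fields whose
/// constraints are quoted in this guide.
struct Config<'a> {
    approach: &'a str,
    max_threads: u32,
    chunk_size: u32,
    temperature: f64,
}

/// Collect one message per violated constraint; an empty Vec means valid.
fn validate(cfg: &Config) -> Vec<String> {
    const APPROACHES: [&str; 3] = ["semantic", "algorithmic", "hybrid"];
    let mut errors = Vec::new();
    if !(1..=128).contains(&cfg.max_threads) {
        errors.push(format!(
            "general → max_threads: {} is outside the allowed range 1-128",
            cfg.max_threads
        ));
    }
    if !APPROACHES.contains(&cfg.approach) {
        errors.push(format!(
            "mode → approach: '{}' is not one of {:?}",
            cfg.approach, APPROACHES
        ));
    }
    if cfg.chunk_size > 4096 {
        errors.push(format!(
            "chunk_size: {} is greater than the maximum of 4096",
            cfg.chunk_size
        ));
    }
    if cfg.temperature > 2.0 {
        errors.push(format!(
            "temperature: {} is greater than the maximum of 2.0",
            cfg.temperature
        ));
    }
    errors
}

fn main() {
    let cfg = Config { approach: "invalid", max_threads: 999, chunk_size: 800, temperature: 0.1 };
    for error in validate(&cfg) {
        println!("  • {error}");
    }
}
```

Collecting all violations before reporting, rather than stopping at the first, mirrors the multi-error output shown above.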

Programmatic Validation (Rust - Coming Soon)

use graphrag_core::config::schema_validator::validate_config_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    validate_config_file(
        "my_config.graphrag.json5",
        "config/schema/graphrag-config.schema.json"
    )?;

    println!("✅ Configuration is valid!");
    Ok(())
}

Examples

Example 1: Minimal Semantic Config

{
  "$schema": "../config/schema/graphrag-config.schema.json",

  "mode": { "approach": "semantic" },

  "general": {
    "input_document_path": "data/document.pdf",
    "output_dir": "./output"
  },

  "pipeline": {
    "workflows": ["extract_text", "extract_entities", "build_graph"]
  },

  "ollama": {
    "enabled": true,
    "host": "http://localhost",
    "port": 11434,
    "chat_model": "llama3.1:8b"
  }
}

Example 2: Narrative Fiction

{
  "$schema": "../config/schema/graphrag-config.schema.json",

  "mode": { "approach": "semantic" },

  "general": {
    "input_document_path": "novels/tom_sawyer.txt",
    "output_dir": "./output/narrative",
    "log_level": "info"
  },

  // Narrative-optimized chunking (LlamaIndex 2024 research)
  "pipeline": {
    "text_extraction": {
      "chunk_size": 800,      // Captures complete scenes
      "chunk_overlap": 300,   // 37.5% overlap for character continuity
      "min_chunk_size": 200
    },
    "entity_extraction": {
      "model_name": "llama3.1:8b",
      "temperature": 0.1,     // Low for consistent character analysis
      "entity_types": [
        "PERSON",              // Characters
        "CHARACTER_TRAIT",     // Personality, appearance
        "LOCATION",            // Settings, places
        "EMOTION",             // Emotional states
        "THEME",               // Literary themes
        "RELATIONSHIP",        // Character relationships
        "EVENT"                // Plot events
      ],
      "confidence_threshold": 0.6  // Captures literary nuances
    }
  },

  "ollama": {
    "enabled": true,
    "chat_model": "llama3.1:8b",
    "generation": {
      "temperature": 0.3,    // Balanced for narrative analysis
      "max_tokens": 1500
    }
  }
}

Example 3: Technical Documentation

{
  "$schema": "../config/schema/graphrag-config.schema.json",

  "mode": { "approach": "semantic" },

  "general": {
    "input_document_path": "docs/api_reference.md",
    "output_dir": "./output/tech_docs"
  },

  // Technical precision (Databricks 2024 research)
  "pipeline": {
    "text_extraction": {
      "chunk_size": 512,      // Smaller chunks for precision
      "chunk_overlap": 100,   // 20% minimal overlap
      "min_chunk_size": 128
    },
    "entity_extraction": {
      "model_name": "llama3.1:8b",
      "temperature": 0.05,    // Maximum precision
      "entity_types": [
        "API_ENDPOINT",        // REST endpoints
        "FUNCTION",            // Functions, methods
        "PARAMETER",           // Function parameters
        "ERROR_CODE",          // Error codes, exceptions
        "LIBRARY",             // External libraries
        "VERSION",             // Version numbers
        "DATA_TYPE"            // Data types
      ],
      "confidence_threshold": 0.8  // High accuracy for technical content
    }
  },

  "ollama": {
    "enabled": true,
    "generation": {
      "temperature": 0.1,    // Very low for technical precision
      "max_tokens": 1200
    }
  }
}

Example 4: Hybrid Pipeline

{
  "$schema": "../config/schema/graphrag-config.schema.json",

  // Hybrid: Combines semantic (LLM) + algorithmic (patterns)
  "mode": { "approach": "hybrid" },

  "general": {
    "input_document_path": "data/mixed_content",
    "output_dir": "./output/hybrid"
  },

  "pipeline": {
    "workflows": ["extract_text", "extract_entities", "build_graph"],
    "text_extraction": {
      "chunk_size": 600,
      "chunk_overlap": 150
    },
    "entity_extraction": {
      "model_name": "llama3.1:8b",
      "temperature": 0.15,
      "entity_types": ["PERSON", "ORGANIZATION", "LOCATION", "CONCEPT"],
      "confidence_threshold": 0.6
    }
  },

  "ollama": {
    "enabled": true,
    "chat_model": "llama3.1:8b",
    "fallback_to_hash": true  // Graceful degradation if LLM fails
  },

  "performance": {
    "batch_processing": true,
    "batch_size": 32,
    "worker_threads": 6,
    "cache_embeddings": true
  }
}

Troubleshooting

Autocomplete Not Working

Problem: No suggestions when typing

Solutions:

  1. ✅ Verify $schema field points to correct path
  2. ✅ Check file extension is .graphrag.json5 or .graphrag.json (the patterns used by the schema mapping)
  3. ✅ Reload VSCode: Ctrl+Shift+P → “Reload Window”
  4. ✅ Check .vscode/settings.json has schema mapping
  5. ✅ Ensure you’re in VSCode (not other editors)

Validation Errors

Problem: Red underlines everywhere

Common Fixes:

| Error | Fix |
|---|---|
| Missing required field | Add required fields: mode, general |
| Invalid enum value | Use Ctrl+Space to see valid options |
| Number out of range | Hover to see valid range (e.g., 0.0-1.0) |
| Wrong type | Ensure strings have quotes, numbers don’t |
| Additional properties not allowed | Remove unsupported fields |

Example fixes:

// ❌ Wrong
{
  "mode": { "approach": "semantic" },
  "unsupported_field": "value"  // Error: additional property
}

// ✅ Correct
{
  "$schema": "../config/schema/graphrag-config.schema.json",
  "mode": { "approach": "semantic" },
  "general": {
    "input_document_path": "data/input.txt",
    "output_dir": "./output"
  }
}

Schema Path Issues

Problem: VSCode can’t find schema

Solution: Use relative path from config file location:

{
  // Use exactly one "$schema" entry, relative to the config file's location.

  // If config is in project root:
  "$schema": "./config/schema/graphrag-config.schema.json",

  // If config is in config/:
  // "$schema": "./schema/graphrag-config.schema.json",

  // If config is in config/templates/:
  // "$schema": "../schema/graphrag-config.schema.json"
}

“Property keys must be doublequoted” Warning

Problem: VSCode shows warnings on unquoted keys (e.g., mode: {...})

Why This Happens:

  • VSCode treats .json5 files as JSONC (JSON with Comments)
  • JSONC requires quoted keys: "mode": {...}
  • JSON5 allows unquoted keys: mode: {...} ✅ Valid!
  • This is a false positive - your JSON5 syntax is correct

Example Warning:

{
  mode: {  // VSCode warning: "Property keys must be doublequoted"
    approach: "semantic"
  }
}

Solutions:

Option 1: Ignore the Warnings (Recommended)

  • These are cosmetic warnings only
  • Your JSON5 files are valid and will work correctly
  • The warnings don’t affect functionality

Option 2: Install JSON5 Extension

  • Install “JSON5 syntax” extension from VSCode marketplace
  • Provides true JSON5 language support
  • Eliminates false positives

Option 3: Use Quoted Keys

{
  "mode": {  // ✅ No warning with quoted keys
    "approach": "semantic"
  }
}

Trade-off: Loses the readability advantage of unquoted keys

Our Recommendation: Ignore the warnings. They’re false positives caused by VSCode’s JSONC mode not fully supporting JSON5’s unquoted key feature. Your configs are valid and will work correctly.


Best Practices

1. Always Use $schema Reference

{
  // ✅ First line: enables autocomplete and validation
  "$schema": "../config/schema/graphrag-config.schema.json",

  // ... rest of config
}

This single line enables:

  • ✅ Real-time autocomplete
  • ✅ Instant error detection
  • ✅ Hover documentation
  • ✅ Type validation

2. Document with Comments

{
  "pipeline": {
    "text_extraction": {
      // Research-based: LlamaIndex 2024 study shows 800-1024 optimal
      // for narrative continuity and character relationship tracking.
      // See: https://www.llamaindex.ai/blog/evaluating-chunk-size
      "chunk_size": 800,

      // 37.5% overlap preserves scene boundaries and dialogue context.
      // Critical for maintaining character consistency across chunks.
      // Pinecone 2024: "Chunking Strategies for LLM Applications"
      "chunk_overlap": 300
    }
  }
}
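To make the relationship between chunk_size and chunk_overlap concrete, here is a minimal sketch of overlapping chunking. It counts characters for simplicity; whether graphrag-core counts characters or tokens is not stated here, and `chunk_with_overlap` is a hypothetical helper, not a library function.

```rust
/// Split `text` into chunks of at most `chunk_size` characters. Each chunk
/// starts `chunk_size - overlap` characters after the previous one, so
/// consecutive chunks share `overlap` characters.
fn chunk_with_overlap(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let chars: Vec<char> = text.chars().collect();
    let stride = chunk_size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += stride;
    }
    chunks
}

fn main() {
    let text = "a".repeat(2000);
    let chunks = chunk_with_overlap(&text, 800, 300);
    let overlap_percent = 300.0 / 800.0 * 100.0;
    println!("{} chunks, {:.1}% overlap", chunks.len(), overlap_percent);
}
```

With chunk_size 800 and chunk_overlap 300 the stride is 500, which is where the 37.5% figure in the comment comes from.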

3. Use Descriptive Filenames

✅ Good:
  - narrative_dickens_analysis.graphrag.json5
  - api_docs_v2_production.graphrag.json5
  - legal_contracts_compliance.graphrag.json5

❌ Bad:
  - config.json5
  - test.json5
  - c1.json5

4. Validate Before Running

# Always validate before deploying
uv run --with jsonschema --with json5 python scripts/validate_json5_configs.py \
  --config production.graphrag.json5

5. Version Control Your Configs

git add my_project.graphrag.json5
git commit -m "feat: add GraphRAG config for project XYZ"

Keep configs in version control to track changes over time.

6. Document Custom Parameters

{
  "entity_extraction": {
    // Custom threshold chosen after A/B testing:
    // - 0.7: 85% precision, 72% recall
    // - 0.6: 78% precision, 84% recall ← chosen
    // - 0.5: 65% precision, 91% recall
    // Decision: Prioritize recall for this corpus (historical texts)
    "confidence_threshold": 0.6
  }
}
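The precision/recall trade-off in the comment above comes from a simple cut-off on candidate confidence. The sketch below illustrates that mechanism only; the `Candidate` type, the `filter_by_confidence` helper, and the sample scores are hypothetical and are not taken from graphrag-core.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: String,
    confidence: f32,
}

/// Keep only candidates whose confidence is at or above `threshold`.
/// A lower threshold keeps more candidates (higher recall, lower precision).
fn filter_by_confidence(candidates: &[Candidate], threshold: f32) -> Vec<Candidate> {
    candidates
        .iter()
        .filter(|c| c.confidence >= threshold)
        .cloned()
        .collect()
}

fn main() {
    // Made-up scores, for illustration only.
    let candidates = vec![
        Candidate { name: "Socrates".into(), confidence: 0.92 },
        Candidate { name: "Athens".into(), confidence: 0.65 },
        Candidate { name: "banquet".into(), confidence: 0.55 },
    ];
    for threshold in [0.7, 0.6, 0.5] {
        let kept = filter_by_confidence(&candidates, threshold);
        println!("threshold {threshold}: kept {} candidates", kept.len());
    }
}
```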

Advantages Summary

Why JSON5 for GraphRAG?

  • ✅ Comments - Document configuration choices inline
  • ✅ Autocomplete - VSCode suggests all available fields
  • ✅ Validation - Catch errors before runtime
  • ✅ Research Documentation - Cite sources directly in config
  • ✅ Trailing Commas - More forgiving, easier editing
  • ✅ Schema Support - Full IDE integration
  • ✅ Better DX - Faster development, fewer errors
  • ✅ Self-Documenting - Configuration explains itself

Available Templates (All Validated ✅)

All 13 templates pass JSON Schema validation:

  • semantic_pipeline.graphrag.json5 - General semantic
  • algorithmic_pipeline.graphrag.json5 - General algorithmic
  • hybrid_pipeline.graphrag.json5 - General hybrid
  • narrative_fiction.graphrag.json5 - Novels, stories
  • technical_documentation.graphrag.json5 - API docs, manuals
  • academic_research.graphrag.json5 - Research papers
  • legal_documents.graphrag.json5 - Contracts, regulations
  • web_blog_content.graphrag.json5 - Blog posts, articles
  • dynamic_universal.graphrag.json5 - Adaptive configuration
  • enrichment_example.graphrag.json5 - Text enrichment
  • semantic.graphrag.json5 - Basic semantic
  • algorithmic.graphrag.json5 - Basic algorithmic
  • hybrid.graphrag.json5 - Basic hybrid

Status: 13/13 pass JSON Schema validation


Additional Resources

  • JSON Schema: config/schema/graphrag-config.schema.json
  • Template Examples: config/templates/*.graphrag.json5
  • Validation Scripts: scripts/README.md
  • VSCode Settings: .vscode/settings.json
  • Code Snippets: .vscode/graphrag.code-snippets

Common Questions

Q: What file extension should I use? A: Use .graphrag.json5 for automatic schema mapping, or .json5 for general JSON5 files.

Q: Can I use regular JSON instead of JSON5? A: Yes! JSON5 is a superset of JSON. Any valid JSON is valid JSON5. But you’ll lose the ability to add comments.

Q: How do I know which template to use? A: Match your content type:

  • Novels/stories → narrative_fiction
  • API docs → technical_documentation
  • Research papers → academic_research
  • Legal docs → legal_documents
  • Mixed content → hybrid_pipeline

Q: What if I need to customize entity types? A: Edit the entity_types array in your config:

"entity_types": [
  "CUSTOM_TYPE_1",
  "CUSTOM_TYPE_2",
  "PERSON",
  "LOCATION"
]

Q: How do I tune for my specific domain? A: Start with the closest template, then adjust:

  1. chunk_size - larger for better context, smaller for precision
  2. confidence_threshold - higher for precision, lower for recall
  3. entity_types - add domain-specific types
  4. temperature - lower for consistency, higher for variety

Ready to start?

cp config/templates/semantic_pipeline.graphrag.json5 my_config.graphrag.json5
code my_config.graphrag.json5

Press Ctrl+Space and let autocomplete guide you!

GraphRAG Core

The core library for GraphRAG-rs, providing portable functionality for both native and WASM deployments.

Overview

graphrag-core is the foundational library that powers GraphRAG-rs. It provides:

  • Embedding Generation: 8 provider backends (HuggingFace, OpenAI, Voyage AI, Cohere, Jina, Mistral, Together AI, Ollama)
  • Entity Extraction: LLM-based gleaning extraction with multi-round refinement (Microsoft GraphRAG-style)
  • Graph Construction: Incremental updates, PageRank, community detection
  • Retrieval Strategies: Vector, BM25, PageRank, hybrid, adaptive
  • Configuration System: Hierarchical TOML-based configuration with environment variable overrides
  • Cross-Platform: Works on native (Linux, macOS, Windows) and WASM

Quick Start (5 Lines!)

use graphrag_core::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let mut graphrag = GraphRAG::quick_start("Your document text here").await?;
    let answer = graphrag.ask("What is the main topic?").await?;
    println!("{}", answer);
    Ok(())
}

Or with detailed explanations:

let explained = graphrag.ask_explained("What is the main topic?").await?;
println!("Answer: {}", explained.answer);
println!("Confidence: {:.0}%", explained.confidence * 100.0);
for step in &explained.reasoning_steps {
    println!("Step {}: {}", step.step_number, step.description);
}

Installation

Add to your Cargo.toml:

[dependencies]
# Choose a feature bundle:
graphrag-core = { version = "0.1", features = ["starter"] }  # Basic setup
# OR
graphrag-core = { version = "0.1", features = ["full"] }     # Production-ready
# OR
graphrag-core = { version = "0.1", features = ["research"] } # Advanced features

Feature Bundles

| Bundle | Description | Includes |
| --- | --- | --- |
| starter | Minimal setup to get started | async, ollama, memory-storage, basic-retrieval |
| full | Production-ready with common features | starter + pagerank, lightrag, caching, parallel-processing, leiden |
| wasm-bundle | Browser-safe features only | memory-storage, basic-retrieval, leiden |
| research | Advanced experimental features | full + rograg, cross-encoder, incremental, monitoring |

Three Ways to Configure

1. TypedBuilder (Compile-Time Safety)

use graphrag_core::prelude::*;

// Build won't compile until required fields are set!
let graphrag = TypedBuilder::new()
    .with_output_dir("./output")    // Required
    .with_ollama()                   // Required: choose LLM backend
    .with_chunk_size(512)            // Optional
    .with_top_k(10)                  // Optional
    .build()?;

Available backends (LLM or embeddings-only):

  • .with_ollama() - Local Ollama (recommended)
  • .with_ollama_custom("host", 8080, "model") - Custom Ollama config
  • .with_hash_embeddings() - Offline, no LLM needed
  • .with_candle_embeddings() - Local neural embeddings
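The compile-time guarantee mentioned above is typically achieved with the type-state pattern. The following is a self-contained sketch of that pattern under stated assumptions: `SketchBuilder`, `SketchConfig`, the marker types, and the default chunk size are all hypothetical and do not reproduce the real TypedBuilder.

```rust
use std::marker::PhantomData;

// Marker types that record, at the type level, whether a required field is set.
struct Missing;
struct Provided;

#[derive(Debug, PartialEq)]
struct SketchConfig {
    output_dir: String,
    backend: String,
    chunk_size: usize,
}

// Hypothetical builder; the two type parameters track the two required fields.
struct SketchBuilder<Dir, Backend> {
    output_dir: Option<String>,
    backend: Option<String>,
    chunk_size: usize,
    _state: PhantomData<(Dir, Backend)>,
}

impl SketchBuilder<Missing, Missing> {
    fn new() -> Self {
        SketchBuilder {
            output_dir: None,
            backend: None,
            chunk_size: 1000, // assumed default, for illustration only
            _state: PhantomData,
        }
    }
}

impl<Dir, Backend> SketchBuilder<Dir, Backend> {
    fn with_output_dir(self, dir: &str) -> SketchBuilder<Provided, Backend> {
        SketchBuilder {
            output_dir: Some(dir.to_string()),
            backend: self.backend,
            chunk_size: self.chunk_size,
            _state: PhantomData,
        }
    }

    fn with_backend(self, name: &str) -> SketchBuilder<Dir, Provided> {
        SketchBuilder {
            output_dir: self.output_dir,
            backend: Some(name.to_string()),
            chunk_size: self.chunk_size,
            _state: PhantomData,
        }
    }

    fn with_chunk_size(mut self, chunk_size: usize) -> Self {
        self.chunk_size = chunk_size;
        self
    }
}

// `build` is only defined once both required fields have been provided.
impl SketchBuilder<Provided, Provided> {
    fn build(self) -> SketchConfig {
        SketchConfig {
            output_dir: self.output_dir.expect("guaranteed by the type state"),
            backend: self.backend.expect("guaranteed by the type state"),
            chunk_size: self.chunk_size,
        }
    }
}

fn main() {
    let config = SketchBuilder::new()
        .with_output_dir("./output")
        .with_backend("ollama")
        .with_chunk_size(512)
        .build();
    println!("{config:?}");
    // SketchBuilder::new().with_chunk_size(512).build(); // would not compile
}
```

Because `build` lives in an impl block for the fully provided state only, forgetting a required call is a type error rather than a runtime failure.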

2. Hierarchical Config (with figment)

Enable with the hierarchical-config feature:

// Loads configuration from 5 sources, listed from lowest to highest priority:
// 1. Code defaults (lowest priority)
// 2. ~/.graphrag/config.toml (user config)
// 3. ./graphrag.toml (project config)
// 4. Environment variables (GRAPHRAG_*)
// 5. Builder overrides (highest priority)

let config = Config::load()?;  // Automatically merges all sources
let graphrag = GraphRAG::new(config)?;

Environment variable overrides:

export GRAPHRAG_OLLAMA_HOST=my-server
export GRAPHRAG_OLLAMA_PORT=8080
export GRAPHRAG_CHUNK_SIZE=1000
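The layering described above can be pictured as a chain of partial configs where a higher-priority layer overrides only the fields it sets. The sketch below uses plain std types to show the idea; the `Layer` struct and its methods are hypothetical, and the real implementation relies on figment rather than this code. Only the three environment variable names are taken from the example above.

```rust
#[derive(Debug, Clone, Default, PartialEq)]
struct Layer {
    ollama_host: Option<String>,
    ollama_port: Option<u16>,
    chunk_size: Option<usize>,
}

impl Layer {
    /// Values present in `higher` replace the values in `self`.
    fn merge(self, higher: Layer) -> Layer {
        Layer {
            ollama_host: higher.ollama_host.or(self.ollama_host),
            ollama_port: higher.ollama_port.or(self.ollama_port),
            chunk_size: higher.chunk_size.or(self.chunk_size),
        }
    }

    /// Build a layer from (name, value) pairs such as `std::env::vars()`.
    /// Unparsable numbers are ignored in this sketch.
    fn from_env<I: IntoIterator<Item = (String, String)>>(vars: I) -> Layer {
        let mut layer = Layer::default();
        for (key, value) in vars {
            match key.as_str() {
                "GRAPHRAG_OLLAMA_HOST" => layer.ollama_host = Some(value),
                "GRAPHRAG_OLLAMA_PORT" => layer.ollama_port = value.parse().ok(),
                "GRAPHRAG_CHUNK_SIZE" => layer.chunk_size = value.parse().ok(),
                _ => {}
            }
        }
        layer
    }
}

fn main() {
    let defaults = Layer {
        ollama_host: Some("localhost".into()),
        ollama_port: Some(11434),
        chunk_size: Some(1000),
    };
    let project = Layer { chunk_size: Some(800), ..Layer::default() };
    let env = Layer::from_env(std::env::vars());

    // Lowest priority first; each merge lets the next layer win.
    let merged = defaults.merge(project).merge(env);
    println!("{merged:?}");
}
```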

3. TOML Configuration File

# graphrag.toml
output_dir = "./output"
approach = "hybrid"  # semantic, algorithmic, or hybrid
chunk_size = 1000
chunk_overlap = 200

[embeddings]
backend = "ollama"
dimension = 768
model = "nomic-embed-text:latest"

[ollama]
enabled = true
host = "localhost"
port = 11434
chat_model = "llama3.2:3b"

[entities]
min_confidence = 0.7
use_gleaning = true
max_gleaning_rounds = 3
entity_types = ["PERSON", "ORGANIZATION", "LOCATION", "DATE", "EVENT"]

Load with:

let config = Config::from_toml_file("graphrag.toml")?;
let graphrag = GraphRAG::new(config)?;

Sectoral Templates

Pre-configured templates for specific domains:

| Template | Best For | Entity Types |
| --- | --- | --- |
| general.toml | Mixed documents | PERSON, ORGANIZATION, LOCATION, DATE, EVENT |
| legal.toml | Contracts, agreements | PARTY, JURISDICTION, CLAUSE_TYPE, OBLIGATION |
| medical.toml | Clinical notes | PATIENT, DIAGNOSIS, MEDICATION, SYMPTOM |
| financial.toml | Reports, filings | COMPANY, TICKER, MONETARY_VALUE, METRIC |
| technical.toml | API docs, code | FUNCTION, CLASS, MODULE, API_ENDPOINT |

Using templates:

let config = Config::from_toml_file("templates/legal.toml")?;

Or via CLI:

graphrag-cli setup --template legal

Explained Answers

Get transparency into how answers are generated:

let explained = graphrag.ask_explained("Who founded the company?").await?;

// Access detailed information:
println!("Answer: {}", explained.answer);
println!("Confidence: {:.0}%", explained.confidence * 100.0);

// Reasoning trace
for step in &explained.reasoning_steps {
    println!("{}. {} (confidence: {:.0}%)",
        step.step_number,
        step.description,
        step.confidence * 100.0
    );
}

// Source references
for source in &explained.sources {
    println!("Source: {} ({:?})", source.id, source.source_type);
    println!("  Excerpt: {}", source.excerpt);
}

// Or get formatted output
println!("{}", explained.format_display());

Output:

**Answer:** John Smith founded Acme Corp in 2015.

**Confidence:** 85%

**Reasoning:**
1. Analyzed query: "Who founded the company?" (confidence: 95%)
2. Found 3 relevant entities (confidence: 85%)
3. Retrieved 5 relevant text chunks (confidence: 85%)
4. Synthesized answer from retrieved information (confidence: 85%)

**Sources:**
1. [TextChunk] chunk_123 (relevance: 92%)
2. [Entity] john_smith (relevance: 88%)

Error Handling

Errors implement the standard std::error::Error trait and carry descriptive messages:

match graphrag.ask("question").await {
    Ok(answer) => println!("{}", answer),
    Err(e) => {
        // Report failures on stderr so they stay out of normal output
        eprintln!("Error: {}", e);
    }
}
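For readers unfamiliar with what implementing std::error::Error involves, here is a minimal sketch. `SketchError` and the stub `ask` function are hypothetical stand-ins; the real error type in graphrag-core and its variants are not shown in this guide.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error type, for illustration only.
#[derive(Debug)]
enum SketchError {
    EmptyQuestion,
    Backend { reason: String },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::EmptyQuestion => write!(f, "question must not be empty"),
            SketchError::Backend { reason } => write!(f, "backend failed: {reason}"),
        }
    }
}

// Debug + Display are all the Error trait requires.
impl Error for SketchError {}

// Stub standing in for a real query call.
fn ask(question: &str) -> Result<String, SketchError> {
    if question.trim().is_empty() {
        return Err(SketchError::EmptyQuestion);
    }
    Ok(format!("(stub answer to: {question})"))
}

fn main() -> Result<(), Box<dyn Error>> {
    // Handle an error explicitly...
    if let Err(e) = ask("") {
        eprintln!("Error: {e}");
    }
    // ...show how another variant renders...
    let backend = SketchError::Backend { reason: "connection refused".into() };
    eprintln!("Error: {backend}");
    // ...or propagate with `?`, which converts into Box<dyn Error>.
    let answer = ask("What is the main topic?")?;
    println!("{answer}");
    Ok(())
}
```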

CLI Setup Wizard

Interactive configuration wizard:

graphrag-cli setup

# With template:
graphrag-cli setup --template legal

# Custom output:
graphrag-cli setup --output ./my-config.toml

Wizard prompts:

  1. Select use case (General, Legal, Medical, Financial, Technical)
  2. Choose LLM provider (Ollama or pattern-based)
  3. Configure Ollama settings (if selected)
  4. Set output directory

Full Usage Example

use graphrag_core::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Option 1: Quick start (simplest)
    let mut graphrag = GraphRAG::quick_start("Your document text").await?;

    // Option 2: TypedBuilder (compile-time safe)
    let mut graphrag = TypedBuilder::new()
        .with_output_dir("./output")
        .with_ollama()
        .with_chunk_size(512)
        .build_and_init()?;

    // Add documents
    graphrag.add_document_from_text("Document content here")?;

    // Build knowledge graph
    graphrag.build_graph().await?;

    // Query
    let answer = graphrag.ask("What are the main topics?").await?;
    println!("{}", answer);

    // Or with explanations
    let explained = graphrag.ask_explained("What are the main topics?").await?;
    println!("{}", explained.format_display());

    Ok(())
}

Embedding Providers

GraphRAG Core supports 8 embedding backends:

| Provider | Cost | Quality | Feature Flag | Use Case |
| --- | --- | --- | --- | --- |
| HuggingFace | Free | ★★★★ | huggingface-hub | Offline, 100+ models |
| OpenAI | $0.13/1M | ★★★★★ | ureq | Best quality |
| Voyage AI | Medium | ★★★★★ | ureq | Anthropic recommended |
| Cohere | $0.10/1M | ★★★★ | ureq | Multilingual (100+ langs) |
| Jina AI | $0.02/1M | ★★★★ | ureq | Cost-optimized |
| Mistral | $0.10/1M | ★★★★ | ureq | RAG-optimized |
| Together AI | $0.008/1M | ★★★★ | ureq | Cheapest |
| Ollama | Free | ★★★★ | ollama + async | Local GPU + LLM |

Advanced Features

LightRAG (Dual-Level Retrieval)

[retrieval]
strategy = "hybrid"
enable_lightrag = true  # 6000x token reduction!

PageRank (Fast-GraphRAG)

[graph]
enable_pagerank = true  # 27x performance boost
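This guide does not describe how the PageRank scores are computed. As background, the sketch below shows textbook power-iteration PageRank over an adjacency list; it is not the library's implementation, and the `pagerank` function, damping factor, and iteration count are illustrative assumptions.

```rust
/// Textbook power-iteration PageRank.
/// `out_links[i]` lists the nodes that node `i` links to.
fn pagerank(out_links: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = out_links.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];

    for _ in 0..iterations {
        // Every node receives the teleportation share first.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (node, targets) in out_links.iter().enumerate() {
            if targets.is_empty() {
                // Dangling node: spread its rank evenly over all nodes.
                let share = damping * rank[node] / n as f64;
                for value in next.iter_mut() {
                    *value += share;
                }
            } else {
                let share = damping * rank[node] / targets.len() as f64;
                for &target in targets {
                    next[target] += share;
                }
            }
        }
        rank = next;
    }
    rank
}

fn main() {
    // Tiny graph: 0 -> 2, 1 -> 2, 2 -> 0
    let graph = vec![vec![2], vec![2], vec![0]];
    let ranks = pagerank(&graph, 0.85, 50);
    for (node, score) in ranks.iter().enumerate() {
        println!("node {node}: {score:.4}");
    }
}
```

In a knowledge graph the nodes would be entities and the links relationships, so well-connected entities end up with higher scores.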

RoGRAG (Logic Form Reasoning)

// Enable with feature flag: rograg
let answer = graphrag.ask_with_reasoning("Why did X cause Y?").await?;

Intelligent Caching

[generation]
enable_caching = true  # 80%+ hit rate, 6x cost reduction

Pipeline Architecture

GraphRAG uses a configurable pipeline with different methods for each phase:

┌─────────────────────────────────────────────────────────────────────────┐
│                         build_graph()                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────┐                                                    │
│  │    CHUNKING     │  TextProcessor splits document into chunks         │
│  │  (always runs)  │  Configurable: chunk_size, chunk_overlap           │
│  └────────┬────────┘                                                    │
│           │                                                             │
│           ▼                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                    ENTITY EXTRACTION                             │   │
│  │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │   │
│  │  │   Algorithmic   │  │    Semantic     │  │     Hybrid      │  │   │
│  │  │ (pattern-based) │  │  (LLM-based)    │  │ (both + fusion) │  │   │
│  │  │      Fast       │  │    Accurate     │  │    Balanced     │  │   │
│  │  └─────────────────┘  └─────────────────┘  └─────────────────┘  │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│           │                                                             │
│           ▼                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                  RELATIONSHIP EXTRACTION                         │   │
│  │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │   │
│  │  │  Co-occurrence  │  │    LLM-based    │  │    Gleaning     │  │   │
│  │  │ entity proximity│  │ GraphRAG method │  │ multi-round LLM │  │   │
│  │  │      Fast       │  │    Semantic     │  │    Iterative    │  │   │
│  │  └─────────────────┘  └─────────────────┘  └─────────────────┘  │   │
│  │  Optional: config.graph.extract_relationships = true/false       │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│           │                                                             │
│           ▼                                                             │
│  ┌─────────────────┐                                                    │
│  │    GRAPH        │  Entities + Relationships → KnowledgeGraph        │
│  │  CONSTRUCTION   │  Supports: PageRank, Community Detection          │
│  └─────────────────┘                                                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│                           ask() / query                                 │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐                                                    │
│  │    EMBEDDING    │  Generated on-demand (lazy evaluation)             │
│  │   GENERATION    │  8 providers: Ollama, OpenAI, HuggingFace, etc.   │
│  └────────┬────────┘                                                    │
│           ▼                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                     RETRIEVAL STRATEGIES                         │   │
│  │  Vector │ BM25 │ PageRank │ Hybrid │ Adaptive │ LightRAG         │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│           ▼                                                             │
│  ┌─────────────────┐                                                    │
│  │     ANSWER      │  LLM synthesis (if Ollama enabled)                │
│  │   GENERATION    │  Or: concatenated search results                   │
│  └─────────────────┘                                                    │
└─────────────────────────────────────────────────────────────────────────┘
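Of the relationship extraction methods in the diagram, co-occurrence is the simplest to picture: entities that appear in the same chunk are linked. The sketch below counts such pairs; the `cooccurrence` function and the choice to weight by chunk count are assumptions for illustration, not the library's actual scoring.

```rust
use std::collections::BTreeMap;

/// Count how many chunks each unordered pair of entities shares.
/// `chunks[i]` holds the entity names found in chunk `i`.
fn cooccurrence(chunks: &[Vec<&str>]) -> BTreeMap<(String, String), usize> {
    let mut counts: BTreeMap<(String, String), usize> = BTreeMap::new();
    for entities in chunks {
        // Deduplicate so a repeated mention counts once per chunk;
        // sorting also gives every pair a canonical order.
        let mut unique: Vec<&str> = entities.clone();
        unique.sort_unstable();
        unique.dedup();

        for i in 0..unique.len() {
            for j in (i + 1)..unique.len() {
                let pair = (unique[i].to_string(), unique[j].to_string());
                *counts.entry(pair).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    let chunks = vec![
        vec!["Socrates", "Diotima"],
        vec!["Socrates", "Diotima", "Eros"],
        vec!["Eros"],
    ];
    for ((a, b), weight) in cooccurrence(&chunks) {
        println!("{a} <-> {b}: {weight}");
    }
}
```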

Phase Configuration Quick Reference

| Phase | Key Parameters | Config |
| --- | --- | --- |
| 1. Chunking | chunk_size, chunk_overlap | chunk_size = 1000 |
| 2. Entity Extraction | approach, entity_types, use_gleaning | approach = "hybrid" |
| 3. Relationship Extraction | extract_relationships, use_gleaning | [graph] extract_relationships = true |
| 4. Graph Construction | enable_pagerank, max_connections | [graph] enable_pagerank = true |
| 5. Embedding | backend, dimension, model | [embeddings] backend = "ollama" |
| 6. Retrieval | strategy, top_k | [retrieval] strategy = "hybrid" |
| 7. Answer Generation | chat_model, temperature | [ollama] enabled = true |

Method Selection by Phase

| Phase | Methods Available | Config Setting |
| --- | --- | --- |
| Entity Extraction | Algorithmic / Semantic / Hybrid | approach = "algorithmic", "semantic", or "hybrid" |
| Relationship Extraction | Co-occurrence / LLM-based / Gleaning | entities.use_gleaning = true or false |
| Embedding | Ollama / Hash / OpenAI / HuggingFace / others (8 providers) | embeddings.backend = "ollama" |
| Retrieval | Vector / BM25 / PageRank / Hybrid / Adaptive / LightRAG | retrieval.strategy = "hybrid" |

Key Notes

  • Embedding is NOT part of build_graph() - generated lazily during queries
  • Relationship extraction is optional - controlled by config.graph.extract_relationships
  • Gleaning extracts entities AND relationships together in multi-round LLM calls
  • See HOW_IT_WORKS.md for the full pipeline + parameter reference
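The first note, that embeddings are produced lazily at query time rather than during build_graph(), can be sketched as a compute-on-first-use cache. `LazyEmbeddings` and the toy embedding closure are hypothetical; how graphrag-core actually caches vectors is not described here.

```rust
use std::collections::HashMap;

/// Computes an embedding the first time a text is requested and reuses it afterwards.
struct LazyEmbeddings<F: Fn(&str) -> Vec<f32>> {
    embed: F,
    cache: HashMap<String, Vec<f32>>,
    computed: usize, // how many times the embedder actually ran
}

impl<F: Fn(&str) -> Vec<f32>> LazyEmbeddings<F> {
    fn new(embed: F) -> Self {
        LazyEmbeddings { embed, cache: HashMap::new(), computed: 0 }
    }

    fn get(&mut self, text: &str) -> &[f32] {
        if !self.cache.contains_key(text) {
            let vector = (self.embed)(text);
            self.computed += 1;
            self.cache.insert(text.to_string(), vector);
        }
        &self.cache[text]
    }
}

fn main() {
    // Toy embedder: a one-dimensional "vector" holding the text length.
    let mut embeddings = LazyEmbeddings::new(|text: &str| vec![text.len() as f32]);

    // Nothing is computed until the first query touches a chunk.
    println!("computed before queries: {}", embeddings.computed);
    let first = embeddings.get("Socrates speaks last.").to_vec();
    let second = embeddings.get("Socrates speaks last.").to_vec();
    println!("same vector: {}, computed: {}", first == second, embeddings.computed);
}
```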

Module Structure

graphrag-core/
├── src/
│   ├── builder/         # TypedBuilder with type-state pattern
│   ├── config/          # Hierarchical configuration (figment)
│   ├── core/            # Core traits, errors with suggestions
│   ├── embeddings/      # 8 embedding providers
│   ├── entity/          # LLM-based gleaning extraction
│   ├── graph/           # Knowledge graph construction
│   ├── retrieval/       # ExplainedAnswer, search strategies
│   └── templates/       # Sectoral configuration templates
└── examples/

Testing

# Quick test with starter features
cargo test --features starter

# Full test suite
cargo test --all-features

# Test specific modules
cargo test --features starter builder::
cargo test --features starter retrieval::

Cross-Platform Support

  • Linux - Full support with all features
  • macOS - Full support with Metal GPU acceleration
  • Windows - Full support with CUDA GPU acceleration
  • WASM - Core functionality (use wasm-bundle feature)

License

MIT License - see ../LICENSE for details.



graphrag-cli

A modern Terminal User Interface (TUI) for GraphRAG operations, built with Ratatui.

Features

  • Multi-pane TUI — Results viewer, Raw results, tabbed Info panel (Stats / Sources / History)
  • Markdown rendering — LLM answers rendered with bold, italic, headers, bullet points, code blocks
  • Three query modes — ASK (fast), EXPLAIN (confidence + sources), REASON (query decomposition)
  • Zero-LLM support — Algorithmic pipeline with hash embeddings, no model required
  • Vim-style navigation — j/k scrolling, Ctrl+1/2/3/4 focus switching
  • Slash command system: /config, /load, /mode, /reason, /export, /workspace, and more
  • Query history — Tracked per session, exportable to Markdown
  • Workspace persistence — Save/load knowledge graphs to disk
  • Direct integration — Uses graphrag-core as a library (no HTTP server needed)

Installation

cd graphrag-rs

# Debug build (fast compile)
cargo build -p graphrag-cli

# Release build (optimized)
cargo build -p graphrag-cli --release

Quick Start — Zero LLM (Symposium example)

Build a knowledge graph from Plato’s Symposium with no LLM required — pure algorithmic extraction using regex patterns, TF-IDF, BM25, and PageRank.

Option A — Interactive TUI

cd graphrag-rs

cargo run -p graphrag-cli -- tui

Then inside the TUI:

/config tests/e2e/configs/algo_hash_medium__symposium.json5
/load docs-example/Symposium.txt
Who is Socrates and what is his role in the Symposium?

Graph builds in ~3-5 seconds. No Ollama needed.

Option B — TUI with config pre-loaded

cargo run -p graphrag-cli -- tui \
  --config tests/e2e/configs/algo_hash_medium__symposium.json5

Then just:

/load docs-example/Symposium.txt
What is Eros according to Aristophanes?

Option C — Benchmark (non-interactive, JSON output)

cargo run -p graphrag-cli -- bench \
  --config tests/e2e/configs/algo_hash_medium__symposium.json5 \
  --book docs-example/Symposium.txt \
  --questions "Who is Socrates?|What is love according to Aristophanes?|What is the Ladder of Beauty?"

Outputs structured JSON with timings, entity counts, answers, confidence scores, and source references.

Available configs

| Config | Graph building | Embeddings | LLM synthesis | Speed |
| --- | --- | --- | --- | --- |
| algo_hash_small__symposium.json5 | NLP/regex | Hash (256d) | ❌ none | ~1-2s |
| algo_hash_medium__symposium.json5 | NLP/regex | Hash (384d) | ❌ none | ~3-5s |
| algo_nlp_mistral__symposium.json5 | NLP/regex | nomic-embed-text | ✅ mistral-nemo | ~5-15s* |
| kv_no_gleaning_mistral__symposium.json5 | LLM single-pass | nomic-embed-text | ✅ mistral-nemo | ~30-60s |

* build ~5s, synthesis ~5-10s per question (with KV cache after the first)

algo_nlp_mistral__symposium.json5 is the recommended config for anyone who wants:

  • a graph built quickly with classic NLP methods (no LLM at build time)
  • real semantic search with nomic-embed-text
  • answers synthesized by Mistral at query time with KV cache enabled

Quick Start — With Ollama (full semantic pipeline)

Requires Ollama running with nomic-embed-text and an LLM (e.g. mistral-nemo:latest).

cargo run -p graphrag-cli -- tui \
  --config tests/e2e/configs/kv_no_gleaning_mistral__symposium.json5

Inside TUI:

/load docs-example/Symposium.txt
/mode explain
How does Diotima describe the ascent to absolute beauty?

The EXPLAIN mode shows confidence score and source references in the Sources tab (Ctrl+4 → Ctrl+N).


CLI Commands

graphrag-cli [OPTIONS] [COMMAND]

Options:
  -c, --config <FILE>      Configuration file to pre-load
  -w, --workspace <NAME>   Workspace name
  -d, --debug              Enable debug logging
      --format <text|json> Output format (default: text)

Commands:
  tui        Start interactive TUI (default)
  setup      Interactive wizard to create a config file
  validate   Validate a configuration file
  bench      Run full E2E benchmark (Init → Load → Query)
  workspace  Manage workspaces (list, create, info, delete)

bench example

cargo run -p graphrag-cli -- bench \
  -c my_config.json5 \
  -b my_document.txt \
  -q "Question 1?|Question 2?|Question 3?"

Output JSON includes: init_ms, build_ms, total_query_ms, entities, relationships, chunks, per-query answer, confidence, sources.
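The -q argument packs several questions into one string separated by `|`. A minimal sketch of splitting such a string is shown below; `parse_questions` is a hypothetical helper, and trimming whitespace and skipping empty segments are assumptions rather than documented behaviour of the bench command.

```rust
/// Split a pipe-separated question list into individual questions.
fn parse_questions(arg: &str) -> Vec<String> {
    arg.split('|')
        .map(str::trim)
        .filter(|question| !question.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    let questions = parse_questions("Question 1?|Question 2?|Question 3?");
    for (index, question) in questions.iter().enumerate() {
        println!("{}: {}", index + 1, question);
    }
}
```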


TUI Layout

┌─────────────────────────────────────────────────────────────┐
│  Query Input (Ctrl+1)  (type queries or /commands here)     │
├────────────────────────────────────┬────────────────────────┤
│  Results Viewer (Ctrl+2)           │  Info Panel (Ctrl+4)   │
│  Markdown-rendered LLM answer      │  ┌─Stats─┬─Sources─┬  │
│  with confidence header in EXPLAIN │  │       │History  │  │
│  mode: [EXPLAIN | 85% ████████░░]  │  └───────┴─────────┘  │
├────────────────────────────────────┤  Ctrl+N cycles tabs    │
│  Raw Results (Ctrl+3)              │  (when Info focused)   │
│  Sources list / search results     │                        │
│  before LLM processing             │                        │
├────────────────────────────────────┴────────────────────────┤
│  Status Bar  [mode badge]  ℹ status message                 │
└─────────────────────────────────────────────────────────────┘

Keyboard Shortcuts

Global (IDE-Safe)

| Key | Action |
| --- | --- |
| ? / Ctrl+H | Toggle help overlay |
| Ctrl+C | Quit |
| Ctrl+N | Cycle focus forward (Input → Results → Raw → Info) |
| Ctrl+P | Cycle focus backward |
| Ctrl+1 | Focus Query Input |
| Ctrl+2 | Focus Results Viewer |
| Ctrl+3 | Focus Raw Results |
| Ctrl+4 | Focus Info Panel |
| Ctrl+N (Info Panel focused) | Cycle tabs: Stats → Sources → History |
| Esc | Return focus to input |

Input Box

| Key | Action |
| --- | --- |
| Enter | Submit query or /command |
| Ctrl+D | Clear input |

Scrolling (when viewer focused)

| Key | Action |
| --- | --- |
| j / ↓ | Scroll down one line |
| k / ↑ | Scroll up one line |
| Alt+↓ / Alt+↑ | Scroll down/up (works even from input) |
| PageDown / Ctrl+D | Scroll down one page |
| PageUp / Ctrl+U | Scroll up one page |
| Home / End | Jump to top / bottom |

Slash Commands

| Command | Description |
| --- | --- |
| /config <file> | Load a config file (JSON5, JSON, TOML) |
| /config show | Display the currently loaded config |
| /load <file> | Load and process a document |
| /load <file> --rebuild | Force full rebuild before loading |
| /clear | Clear graph (keep documents) |
| /rebuild | Re-extract from loaded documents |
| /stats | Show entity/relationship/chunk counts |
| /entities [filter] | List entities, optionally filtered |
| /mode ask\|explain\|reason | Switch query mode (sticky) |
| /reason <query> | One-shot reasoning query (decomposition) |
| /export <file.md> | Export query history to Markdown |
| /workspace list | List saved workspaces |
| /workspace save <name> | Save current graph to disk |
| /workspace <name> | Load a saved workspace |
| /workspace delete <name> | Delete a workspace |
| /help | Show full command help |
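To show how input of this shape can be told apart from plain queries, here is a small sketch of a slash command parser. It covers only a handful of the commands listed above, and the `Input` and `QueryMode` enums and the `parse_input` function are hypothetical; the actual parser in commands/mod.rs may be structured quite differently.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum QueryMode {
    Ask,
    Explain,
    Reason,
}

#[derive(Debug, PartialEq)]
enum Input {
    Query(String),
    Config(String),
    Load { path: String, rebuild: bool },
    Mode(QueryMode),
    Reason(String),
    Unknown(String),
}

/// Classify one line typed into the input box.
fn parse_input(line: &str) -> Input {
    let line = line.trim();
    // Anything that does not start with '/' is a plain query.
    let Some(rest) = line.strip_prefix('/') else {
        return Input::Query(line.to_string());
    };
    let (command, args) = match rest.split_once(char::is_whitespace) {
        Some((command, args)) => (command, args.trim()),
        None => (rest, ""),
    };
    match command {
        "config" => Input::Config(args.to_string()),
        "load" => {
            let rebuild = args.split_whitespace().any(|a| a == "--rebuild");
            let path = args
                .split_whitespace()
                .find(|a| *a != "--rebuild")
                .unwrap_or("")
                .to_string();
            Input::Load { path, rebuild }
        }
        "mode" => match args {
            "ask" => Input::Mode(QueryMode::Ask),
            "explain" => Input::Mode(QueryMode::Explain),
            "reason" => Input::Mode(QueryMode::Reason),
            _ => Input::Unknown(line.to_string()),
        },
        "reason" => Input::Reason(args.to_string()),
        _ => Input::Unknown(line.to_string()),
    }
}

fn main() {
    for line in ["/load docs-example/Symposium.txt", "/mode explain", "Who is Socrates?"] {
        println!("{line:?} -> {:?}", parse_input(line));
    }
}
```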

Query Modes

Switch with /mode <mode>; the badge in the status bar shows the active mode.

| Mode | Command | What it does |
| --- | --- | --- |
| ASK (default) | /mode ask | Plain answer, fastest |
| EXPLAIN | /mode explain | Answer + confidence score + source references; Sources tab auto-opens |
| REASON | /mode reason | Query decomposition: splits complex questions into sub-queries |

One-shot override (doesn’t change sticky mode):

/reason Compare the main arguments of each speaker about love

Architecture

graphrag-cli/src/
├── main.rs                    # CLI entry point (clap)
├── app.rs                     # Main event loop, action routing
├── action.rs                  # Action enum, QueryMode, QueryExplainedPayload
├── commands/mod.rs            # Slash command parser
├── config.rs                  # Config file loading (JSON5/JSON/TOML)
├── theme.rs                   # Dark/light color themes
├── tui.rs                     # Terminal setup/teardown
├── query_history.rs           # Per-session query history
├── workspace.rs               # Workspace metadata management
├── mode.rs                    # Input mode detection
├── handlers/
│   ├── graphrag.rs            # Thread-safe GraphRAG wrapper (Arc<Mutex<>>)
│   ├── bench.rs               # Benchmark runner (JSON output)
│   └── file_ops.rs            # File utilities
└── ui/
    ├── markdown.rs            # Markdown → ratatui Line<'static> parser
    ├── spinner.rs             # Braille spinner animation
    └── components/
        ├── query_input.rs     # Text input widget
        ├── results_viewer.rs  # Markdown-rendered answer + scrollbar
        ├── raw_results_viewer.rs  # Raw search results
        ├── info_panel.rs      # 3-tab panel (Stats/Sources/History)
        ├── status_bar.rs      # Status + query mode badge
        └── help_overlay.rs    # Modal help popup

Technology Stack

  • Ratatui 0.29 — TUI framework (immediate mode rendering)
  • Crossterm 0.28 — Cross-platform terminal events
  • tui-textarea 0.7 — Multi-line input widget
  • Tokio 1.32 — Async runtime
  • Clap 4.5 — CLI argument parsing
  • Dialoguer 0.11 — Interactive setup wizard
  • color-eyre 0.6 — Error reporting
  • graphrag-core — Knowledge graph engine (direct library call)

License

Same license as the parent graphrag-rs project.

GraphRAG Server

Production-ready REST API server for GraphRAG with multiple backend options.

Migration Notice: The server has been migrated from Axum to Actix-web 4.9 with Apistos for automatic OpenAPI 3.0.3 documentation generation. All endpoints remain the same, but the server now includes automatic API documentation at /openapi.json.

Features

Storage Backends

  • Qdrant Integration - Production vector database with 100M+ vectors support (client-server)
  • LanceDB Integration - Serverless embedded database for native/desktop apps
  • Graceful Fallback - Works without external database (in-memory mode)

Embeddings

  • Ollama Integration - Local embeddings via Ollama (nomic-embed-text, etc.)
  • Hash-based Fallback - Deterministic embeddings without external dependencies
  • Auto-detection - Automatically uses Ollama if available, falls back otherwise
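The fallback behaviour described above can be sketched as "try the primary embedder, use a deterministic hash embedding if it fails". Everything below is illustrative: `hash_embedding`, `embed_with_fallback`, and the `Stats` counters are hypothetical names, and the bag-of-hashed-tokens scheme is one common way to build such a fallback, not necessarily the one the server uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministic bag-of-hashed-tokens embedding, normalised to unit length.
/// DefaultHasher::new() uses fixed keys, so results repeat across runs of the
/// same build (the algorithm is not guaranteed stable across Rust releases).
fn hash_embedding(text: &str, dim: usize) -> Vec<f32> {
    let mut vector = vec![0.0f32; dim];
    if dim == 0 {
        return vector;
    }
    for token in text.split_whitespace() {
        let mut hasher = DefaultHasher::new();
        token.to_lowercase().hash(&mut hasher);
        let bucket = (hasher.finish() % dim as u64) as usize;
        vector[bucket] += 1.0;
    }
    let norm = vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in vector.iter_mut() {
            *x /= norm;
        }
    }
    vector
}

#[derive(Debug, Default)]
struct Stats {
    total_requests: u64,
    primary_success: u64,
    primary_failures: u64,
    fallback_used: u64,
}

/// Try the primary embedder; fall back to the hash embedding on failure.
fn embed_with_fallback<P>(primary: P, text: &str, dim: usize, stats: &mut Stats) -> Vec<f32>
where
    P: Fn(&str) -> Result<Vec<f32>, String>,
{
    stats.total_requests += 1;
    match primary(text) {
        Ok(vector) => {
            stats.primary_success += 1;
            vector
        }
        Err(_) => {
            stats.primary_failures += 1;
            stats.fallback_used += 1;
            hash_embedding(text, dim)
        }
    }
}

fn main() {
    let mut stats = Stats::default();
    // Simulate an unreachable embedding service.
    let offline = |_: &str| Err("connection refused".to_string());
    let vector = embed_with_fallback(offline, "What is GraphRAG?", 8, &mut stats);
    println!("dim = {}, stats = {:?}", vector.len(), stats);
}
```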

API Features

  • REST API - Clean HTTP endpoints for all operations powered by Actix-web 4.9
  • OpenAPI 3.0.3 - Automatic API documentation via Apistos
  • Swagger UI - Interactive API explorer at /swagger
  • Vector Search - Semantic search with cosine similarity
  • Real Embeddings - Generate actual embeddings for queries and documents
  • CORS Support - Ready for browser clients
  • Health Checks - Monitor server and database status
  • Metrics - Query counts, embedding statistics, and performance tracking
  • Entity/Relationship Storage - Store graph metadata in vector database payloads
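For the vector search feature above, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below ranks a few in-memory documents that way; `cosine_similarity` and `top_k` are hypothetical helpers, and a production backend such as Qdrant performs this search inside the database instead.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns None for mismatched lengths, empty input, or zero-length vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Return the `k` documents most similar to `query`, best first.
fn top_k<'a>(query: &[f32], docs: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .filter_map(|(id, vector)| cosine_similarity(query, vector).map(|score| (*id, score)))
        .collect();
    scored.sort_by(|x, y| y.1.total_cmp(&x.1)); // descending by score
    scored.truncate(k);
    scored
}

fn main() {
    let docs: Vec<(&str, Vec<f32>)> = vec![
        ("doc-a", vec![1.0, 0.0]),
        ("doc-b", vec![0.0, 1.0]),
        ("doc-c", vec![1.0, 1.0]),
    ];
    for (id, score) in top_k(&[1.0, 0.0], &docs, 2) {
        println!("{id}: {score:.3}");
    }
}
```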

Quick Start

1. Start Qdrant (Docker)

cd graphrag-server
docker-compose up -d

# Or manually:
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

2. Start GraphRAG Server

# With Qdrant (recommended)
cargo run --bin graphrag-server --features qdrant

# Without Qdrant (in-memory mode)
cargo run --bin graphrag-server --no-default-features

Server starts on http://0.0.0.0:8080

API Documentation:

  • OpenAPI Spec: http://localhost:8080/openapi.json
  • Swagger UI: http://localhost:8080/swagger

3. Test API

# Health check
curl http://localhost:8080/health

# Add a document
curl -X POST http://localhost:8080/api/documents \
  -H "Content-Type: application/json" \
  -d '{
    "title": "GraphRAG Introduction",
    "content": "GraphRAG combines knowledge graphs with retrieval-augmented generation for enhanced AI systems."
  }'

# Query
curl -X POST http://localhost:8080/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is GraphRAG?",
    "top_k": 5
  }'

Configuration

Set via environment variables:

# Embeddings (choose backend)
export EMBEDDING_BACKEND="ollama"  # or "hash" for fallback
export EMBEDDING_DIM="384"  # 384 for MiniLM, 768 for BERT
export OLLAMA_URL="http://localhost"
export OLLAMA_EMBEDDING_MODEL="nomic-embed-text"  # or "mxbai-embed-large"

# Qdrant connection (optional)
export QDRANT_URL="http://localhost:6334"
export COLLECTION_NAME="graphrag"

# Run server
cargo run --bin graphrag-server --features ollama
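
A minimal sketch of how environment variables like these could be read with fallbacks in Rust. The `ServerSettings` struct, the `load_settings` function, and every default value below are assumptions chosen for illustration; the server's real defaults are not documented in this section.

```rust
/// Hypothetical settings struct; fields mirror the environment variables above.
#[derive(Debug, Clone, PartialEq)]
pub struct ServerSettings {
    pub embedding_backend: String,
    pub embedding_dim: usize,
    pub ollama_embedding_model: String,
    pub qdrant_url: Option<String>,
    pub collection_name: String,
}

/// Builds settings from a lookup function so the logic can be exercised
/// without touching the real process environment.
pub fn load_settings(get: impl Fn(&str) -> Option<String>) -> Result<ServerSettings, String> {
    let embedding_dim = match get("EMBEDDING_DIM") {
        Some(raw) => raw
            .parse::<usize>()
            .map_err(|e| format!("EMBEDDING_DIM={raw:?}: {e}"))?,
        None => 384, // assumed default
    };
    Ok(ServerSettings {
        // Assumed default: fall back to hash embeddings when unset.
        embedding_backend: get("EMBEDDING_BACKEND").unwrap_or_else(|| "hash".to_string()),
        embedding_dim,
        ollama_embedding_model: get("OLLAMA_EMBEDDING_MODEL")
            .unwrap_or_else(|| "nomic-embed-text".to_string()),
        // Qdrant is optional, so an unset URL stays `None`.
        qdrant_url: get("QDRANT_URL"),
        collection_name: get("COLLECTION_NAME").unwrap_or_else(|| "graphrag".to_string()),
    })
}
```

In a real binary the lookup would typically be `load_settings(|key| std::env::var(key).ok())`.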

Feature Flags

# With Qdrant + Ollama embeddings (recommended for production)
cargo run --bin graphrag-server --features "qdrant,ollama"

# With LanceDB (serverless, embedded)
cargo run --bin graphrag-server --features "lancedb,ollama"

# Minimal (hash-based embeddings, in-memory storage)
cargo run --bin graphrag-server --no-default-features

# With authentication (the auth feature is temporarily disabled; see Temporary Limitations)
cargo run --bin graphrag-server --features "qdrant,ollama,auth"

API Endpoints

Health & Info

GET /

API information and available endpoints.

curl http://localhost:8080/

GET /health

Health check with statistics.

curl http://localhost:8080/health

Response:

{
  "status": "healthy",
  "timestamp": "2025-10-01T12:00:00Z",
  "document_count": 42,
  "graph_built": true,
  "total_queries": 1337,
  "backend": "qdrant",
  "embeddings": {
    "backend": "ollama",
    "available": true,
    "stats": {
      "total_requests": 100,
      "ollama_success": 95,
      "ollama_failures": 5,
      "fallback_used": 5
    }
  }
}
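
The counters under `embeddings.stats` suggest a try-primary-then-fall-back flow. The sketch below shows one way such counters could be maintained; `EmbeddingStats` and `embed_with_fallback` are hypothetical names, and the closures stand in for the real Ollama client and hash embedder.

```rust
/// Hypothetical counters, named after the fields in the /health response.
#[derive(Debug, Default, PartialEq)]
pub struct EmbeddingStats {
    pub total_requests: u64,
    pub ollama_success: u64,
    pub ollama_failures: u64,
    pub fallback_used: u64,
}

/// Tries the primary embedder first and falls back when it fails,
/// updating the counters along the way.
pub fn embed_with_fallback<P, F>(
    text: &str,
    primary: P,
    fallback: F,
    stats: &mut EmbeddingStats,
) -> Vec<f32>
where
    P: Fn(&str) -> Result<Vec<f32>, String>,
    F: Fn(&str) -> Vec<f32>,
{
    stats.total_requests += 1;
    match primary(text) {
        Ok(vector) => {
            stats.ollama_success += 1;
            vector
        }
        Err(_) => {
            // Primary backend unavailable: record the failure and fall back.
            stats.ollama_failures += 1;
            stats.fallback_used += 1;
            fallback(text)
        }
    }
}
```

With this bookkeeping, `ollama_success + ollama_failures` always equals `total_requests`, which matches the sample response (95 + 5 = 100).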

Configuration

The server now supports dynamic configuration via a JSON REST API, allowing you to initialize the full GraphRAG pipeline without TOML files.

GET /api/config

Get the current configuration.

curl http://localhost:8080/api/config

Response:

{
  "success": true,
  "config": {
    "output_dir": "./output",
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "embeddings": { ... },
    "graph": { ... },
    ...
  },
  "graphrag_initialized": true
}

POST /api/config

Set configuration and initialize the full GraphRAG pipeline.

curl -X POST http://localhost:8080/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "output_dir": "./output",
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "embeddings": {
      "backend": "ollama",
      "dimension": 768,
      "model": "nomic-embed-text",
      "fallback_to_hash": true,
      "batch_size": 32
    },
    "graph": {
      "max_connections": 25,
      "similarity_threshold": 0.75
    },
    "text": {
      "chunk_size": 1000,
      "chunk_overlap": 200,
      "languages": ["en"]
    },
    "entities": {
      "min_confidence": 0.65,
      "entity_types": ["PERSON", "CONCEPT", "LOCATION", "EVENT", "ORGANIZATION"]
    },
    "retrieval": {
      "top_k": 15,
      "search_algorithm": "cosine"
    },
    "parallel": {
      "num_threads": 8,
      "enabled": true,
      "min_batch_size": 10,
      "chunk_batch_size": 100,
      "parallel_embeddings": true,
      "parallel_graph_ops": true,
      "parallel_vector_ops": true
    },
    "ollama": {
      "enabled": true,
      "host": "http://localhost",
      "port": 11434,
      "embedding_model": "nomic-embed-text",
      "chat_model": "llama3.1:8b",
      "timeout_seconds": 300,
      "max_retries": 3,
      "fallback_to_hash": true
    },
    "enhancements": {
      "enabled": true
    }
  }'

GET /api/config/template

Get configuration templates with examples (minimal, ollama_production, high_performance).

curl http://localhost:8080/api/config/template

Response:

{
  "template": { ... },
  "description": "Full GraphRAG configuration template with all options",
  "examples": [
    {
      "name": "minimal",
      "description": "Minimal configuration with hash-based embeddings",
      "config": { ... }
    },
    {
      "name": "ollama_production",
      "description": "Production setup with Ollama LLM and real embeddings",
      "config": { ... }
    },
    {
      "name": "high_performance",
      "description": "Optimized for speed with parallel processing",
      "config": { ... }
    }
  ]
}

GET /api/config/default

Get the default configuration.

curl http://localhost:8080/api/config/default

POST /api/config/validate

Validate configuration without applying it.

curl -X POST http://localhost:8080/api/config/validate \
  -H "Content-Type: application/json" \
  -d '{ ... config object ... }'

Response:

{
  "valid": true,
  "message": "Configuration is valid"
}
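
To give a feel for what validate-without-applying can mean, here is a small sketch that checks a few fields and reports every problem at once. The `ConfigSubset` struct and all of the rules are illustrative assumptions; the checks the server actually performs are not listed in this document.

```rust
/// Hypothetical subset of the configuration object shown above.
#[derive(Debug, Clone)]
pub struct ConfigSubset {
    pub chunk_size: usize,
    pub chunk_overlap: usize,
    pub similarity_threshold: f32,
    pub top_k: usize,
}

/// Collects every problem instead of stopping at the first one.
/// These rules are assumptions for illustration, not the server's checks.
pub fn validate(cfg: &ConfigSubset) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();
    if cfg.chunk_size == 0 {
        errors.push("chunk_size must be greater than 0".to_string());
    }
    if cfg.chunk_overlap >= cfg.chunk_size {
        errors.push("chunk_overlap must be smaller than chunk_size".to_string());
    }
    if !(0.0..=1.0).contains(&cfg.similarity_threshold) {
        errors.push("similarity_threshold must be within [0, 1]".to_string());
    }
    if cfg.top_k == 0 {
        errors.push("top_k must be greater than 0".to_string());
    }
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

Returning all errors together lets a client fix a rejected configuration in one round trip.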

Documents

POST /api/documents

Add a document to the knowledge graph.

curl -X POST http://localhost:8080/api/documents \
  -H "Content-Type: application/json" \
  -d '{
    "title": "My Document",
    "content": "Document content here..."
  }'

Response:

{
  "success": true,
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Document added to Qdrant successfully",
  "backend": "qdrant"
}

GET /api/documents

List all documents.

curl http://localhost:8080/api/documents

DELETE /api/documents/:id

Delete a document by ID.

curl -X DELETE http://localhost:8080/api/documents/550e8400-e29b-41d4-a716-446655440000

Query

POST /api/query

Query the knowledge graph with semantic search.

curl -X POST http://localhost:8080/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How does GraphRAG work?",
    "top_k": 5
  }'

Response:

{
  "query": "How does GraphRAG work?",
  "results": [
    {
      "document_id": "doc-1",
      "title": "GraphRAG Overview",
      "similarity": 0.92,
      "excerpt": "GraphRAG combines knowledge graphs with retrieval..."
    }
  ],
  "processing_time_ms": 15,
  "backend": "qdrant"
}

Graph Operations

POST /api/graph/build

Build/rebuild the knowledge graph.

curl -X POST http://localhost:8080/api/graph/build

GET /api/graph/stats

Get graph statistics.

curl http://localhost:8080/api/graph/stats

Response:

{
  "document_count": 42,
  "entity_count": 420,
  "relationship_count": 630,
  "vector_count": 840,
  "graph_built": true,
  "backend": "qdrant"
}

Architecture

With Qdrant (Production)

┌─────────────────┐
│  REST Client    │ (Browser, CLI, etc.)
└────────┬────────┘
         │ HTTP
┌────────▼─────────────────────┐
│   GraphRAG Server            │
│   ┌──────────────────────┐   │
│   │ Actix-web REST API   │   │
│   │ + Apistos OpenAPI    │   │
│   │ + CORS               │   │
│   │ + Tracing            │   │
│   └──────────┬───────────┘   │
│              │                │
│   ┌──────────▼───────────┐   │
│   │ Qdrant Client        │   │
│   │ + Vector Search      │   │
│   │ + Metadata Storage   │   │
│   └──────────┬───────────┘   │
└──────────────┼────────────────┘
               │ gRPC (port 6334)
┌──────────────▼────────────────┐
│   Qdrant Vector Database      │
│   + 100M+ vector capacity     │
│   + JSON payload storage      │
│   + Filtering & search        │
└───────────────────────────────┘

Without Qdrant (Development/Testing)

┌─────────────────┐
│  REST Client    │
└────────┬────────┘
         │ HTTP
┌────────▼─────────────────────┐
│   GraphRAG Server            │
│   ┌──────────────────────┐   │
│   │ Actix-web REST API   │   │
│   │ + Apistos OpenAPI    │   │
│   └──────────┬───────────┘   │
│              │                │
│   ┌──────────▼───────────┐   │
│   │ In-Memory Storage    │   │
│   │ + Vec<Document>      │   │
│   │ + Keyword matching   │   │
│   └──────────────────────┘   │
└───────────────────────────────┘

Qdrant Storage Schema

Collection Configuration

  • Name: graphrag (configurable)
  • Dimension: 384 (MiniLM) or 768 (BERT)
  • Distance: Cosine similarity
  • Indexing: HNSW (Hierarchical Navigable Small World)

Document Payload Structure

Each document in Qdrant stores:

{
  "id": "doc-uuid",
  "title": "Document Title",
  "text": "Full document text",
  "chunk_index": 0,
  "entities": [
    {
      "id": "entity-uuid",
      "name": "Entity Name",
      "entity_type": "Person|Organization|Location",
      "properties": {}
    }
  ],
  "relationships": [
    {
      "source": "entity-1",
      "relation": "WORKS_FOR",
      "target": "entity-2",
      "properties": {}
    }
  ],
  "timestamp": "2025-10-01T12:00:00Z",
  "custom": {}
}

Development

Build

# Development build
cargo build --bin graphrag-server

# Production build with optimizations
cargo build --release --bin graphrag-server

Test

# Unit tests
cargo test --bin graphrag-server

# Integration tests (requires Qdrant running)
docker-compose up -d
cargo test --bin graphrag-server --features qdrant -- --test-threads=1

Run

# Development mode with auto-reload
cargo watch -x 'run --bin graphrag-server'

# Production mode
cargo run --release --bin graphrag-server

TODO

Short Term

  • Real embedding generation (Ollama integrated)
  • OpenAPI 3.0.3 documentation (via Apistos)
  • Swagger UI integration (apistos swagger-ui, served at /swagger)
  • Entity extraction from documents
  • Relationship extraction
  • Batch document upload
  • Pagination for document listing

Medium Term

  • Authentication & authorization (feature temporarily disabled)
  • Rate limiting
  • OpenTelemetry metrics
  • Prometheus endpoint
  • API versioning

Long Term

  • GraphQL API
  • WebSocket support for streaming
  • Multi-tenant support
  • Advanced graph algorithms (PageRank, community detection)
  • LanceDB integration (alternative to Qdrant)

Deployment

Docker

# Coming soon
# The builder image must satisfy the workspace MSRV (1.85)
FROM rust:1.85 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release --bin graphrag-server

FROM debian:bookworm-slim
COPY --from=builder /app/target/release/graphrag-server /usr/local/bin/
EXPOSE 8080
CMD ["graphrag-server"]

Docker Compose (Full Stack)

version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_data:/qdrant/storage

  graphrag-server:
    build: .
    ports:
      - "8080:8080"
    environment:
      - QDRANT_URL=http://qdrant:6334
      - COLLECTION_NAME=graphrag
      - EMBEDDING_DIM=384
    depends_on:
      - qdrant

Performance

Benchmarks (Preliminary)

Hardware: M1 MacBook Pro, 16GB RAM

| Operation | Qdrant Backend | In-Memory |
|---|---|---|
| Add document | 5-10ms | <1ms |
| Query (top 10) | 10-20ms | 5-10ms |
| Build graph (1k docs) | ~2s | ~1s |
| Build graph (10k docs) | ~15s | ~8s |

Note: Qdrant scales much better for large datasets (100k+ documents).

Troubleshooting

“Could not connect to Qdrant”

Cause: Qdrant not running or wrong URL.

Solution:

# Check Qdrant is running
docker ps | grep qdrant

# Start if not running
docker-compose up -d

# Verify connection
curl http://localhost:6333/healthz

“Collection not found”

Cause: Collection not created.

Solution: Server auto-creates collection on first run. Check logs:

cargo run --bin graphrag-server 2>&1 | grep collection

Slow query performance

Cause: Large dataset without proper indexing.

Solutions:

  1. Ensure HNSW indexing is enabled in Qdrant
  2. Adjust top_k parameter (lower = faster)
  3. Use filters to narrow search space

License

MIT

Credits

  • Qdrant - https://qdrant.tech/
  • Actix-web - https://actix.rs/
  • Apistos - https://github.com/netwo-io/apistos (OpenAPI 3.0.3 documentation)
  • GraphRAG - https://github.com/automataIA/graphrag-rs

Backend Comparison

Qdrant

Best for: Production deployments, cloud environments, microservices

  • ✅ Scales to 100M+ vectors
  • ✅ Distributed deployment support
  • ✅ Advanced filtering and search
  • ✅ Persistent storage with automatic backups
  • Requires separate server (Docker/cloud)

LanceDB

Best for: Desktop apps, native applications, embedded use cases

  • ✅ No server required (embedded)
  • ✅ Zero-copy data access
  • ✅ Automatic versioning
  • ✅ Works offline
  • Single-process access
  • Placeholder implementation (see lancedb_store.rs for integration guide)

In-Memory

Best for: Development, testing, demos

  • ✅ No dependencies
  • ✅ Fast for small datasets
  • Data lost on restart
  • Limited scalability

Embedding Backends

Ollama

Best for: Local development, privacy-focused deployments

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull embedding model
ollama pull nomic-embed-text  # 768 dimensions, 274MB
# or
ollama pull mxbai-embed-large  # 1024 dimensions, 670MB

# Start server with Ollama
EMBEDDING_BACKEND=ollama cargo run --bin graphrag-server --features "qdrant,ollama"

Pros:

  • ✅ Real semantic embeddings
  • ✅ Local/private (no API calls)
  • ✅ Multiple model options
  • ✅ Automatic fallback if unavailable

Cons:

  • Requires Ollama service running
  • Slower than hash-based (100-200ms per embedding)

Hash-based Fallback

Best for: Testing, offline environments, minimal dependencies

# Start server with hash embeddings (no Ollama needed)
EMBEDDING_BACKEND=hash cargo run --bin graphrag-server

Pros:

  • ✅ No external dependencies
  • ✅ Fast (<1ms per embedding)
  • ✅ Deterministic
  • ✅ Works offline

Cons:

  • Not semantic (hash-based, not neural)
  • Lower search quality
  • Fixed dimension (384)

Example Workflows

Production Setup (Qdrant + Ollama)

# 1. Start Qdrant
docker-compose up -d

# 2. Start Ollama
ollama serve &
ollama pull nomic-embed-text

# 3. Start GraphRAG server
export EMBEDDING_BACKEND=ollama
export QDRANT_URL=http://localhost:6334
cargo run --release --bin graphrag-server --features "qdrant,ollama"

# 4. Add documents with real embeddings
curl -X POST http://localhost:8080/api/documents \
  -H "Content-Type: application/json" \
  -d '{"title":"AI Safety","content":"AI safety research focuses on..."}'

# 5. Query with semantic search
curl -X POST http://localhost:8080/api/query \
  -H "Content-Type: application/json" \
  -d '{"query":"Tell me about AI safety","top_k":5}'

Desktop App (LanceDB + Ollama)

# 1. Start Ollama
ollama serve &
ollama pull nomic-embed-text

# 2. Start GraphRAG with LanceDB (embedded)
export EMBEDDING_BACKEND=ollama
export LANCEDB_PATH=./data/graphrag.lance
cargo run --release --bin graphrag-server --features "lancedb,ollama"

# No external database needed! Data stored in ./data/

Minimal Setup (Hash embeddings)

# Just run the server - no dependencies!
EMBEDDING_BACKEND=hash cargo run --bin graphrag-server --no-default-features

# Works immediately with hash-based embeddings

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     GraphRAG Server                          │
│                                                               │
│  ┌──────────────┐      ┌──────────────┐                     │
│  │  Embedding   │      │   Storage    │                     │
│  │  Service     │      │   Backend    │                     │
│  │              │      │              │                     │
│  │  - Ollama    │      │  - Qdrant    │                     │
│  │  - Hash      │      │  - LanceDB   │                     │
│  │  Fallback    │      │  - Memory    │                     │
│  └──────────────┘      └──────────────┘                     │
│         │                      │                             │
│         └──────────┬───────────┘                             │
│                    │                                         │
│              ┌─────▼─────┐                                   │
│              │  REST API │                                   │
│              └───────────┘                                   │
└─────────────────────────────────────────────────────────────┘

Performance

Embeddings

  • Ollama (nomic-embed-text): ~100-200ms per document
  • Hash-based: <1ms per document
  • Caching: Automatic with LRU cache

Vector Search

  • Qdrant: <50ms for 1M vectors with HNSW index
  • LanceDB: <100ms for 100K vectors
  • In-memory: <10ms for 10K vectors
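
The LRU embedding cache listed above can be pictured with the standard-library-only sketch below. `EmbeddingCache` is a hypothetical type written for this example; the server's cache implementation and capacity are not described here.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical LRU cache keyed by input text.
pub struct EmbeddingCache {
    capacity: usize,
    map: HashMap<String, Vec<f32>>,
    order: VecDeque<String>, // front = least recently used
}

impl EmbeddingCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Returns the cached vector and marks the key as recently used.
    pub fn get(&mut self, key: &str) -> Option<&Vec<f32>> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    /// Inserts a vector, evicting the least recently used entry when full.
    pub fn put(&mut self, key: String, value: Vec<f32>) {
        if self.capacity == 0 {
            return;
        }
        if self.map.contains_key(&key) {
            self.touch(&key);
        } else {
            if self.map.len() == self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
            self.order.push_back(key.clone());
        }
        self.map.insert(key, value);
    }

    /// Moves `key` to the most-recently-used end (O(n); fine for a sketch).
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }
}
```

A cache like this pays off most with the Ollama backend, where a hit avoids a round trip of roughly 100-200ms.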

Troubleshooting

Ollama not connecting

# Check Ollama is running
curl http://localhost:11434/api/tags

# Check model is available
ollama list | grep nomic-embed-text

# Pull model if missing
ollama pull nomic-embed-text

Qdrant connection failed

# Check Qdrant is running
curl http://localhost:6333/

# Check Docker container
docker ps | grep qdrant

# Restart Qdrant
docker-compose restart

Slow embedding generation

# Use smaller model
ollama pull nomic-embed-text  # 768 dim, smaller and faster than mxbai-embed-large

# Or use hash fallback for testing
export EMBEDDING_BACKEND=hash

Migration to Actix-web + Apistos

What Changed?

Previous Stack:

  • Web Framework: Axum 0.8
  • Documentation: Manual/external tools

Current Stack:

  • Web Framework: Actix-web 4.9 (high-performance, production-ready)
  • Documentation: Apistos 0.6 (automatic OpenAPI 3.0.3 generation)
  • API Schema: Automatically generated from Rust types

Benefits

  1. Automatic API Documentation: OpenAPI 3.0.3 spec generated directly from code
  2. Type-Safe Schemas: Request/response models automatically documented via #[derive(JsonSchema, ApiComponent)]
  3. Production-Ready: Actix-web is battle-tested in high-traffic production environments
  4. Better Error Handling: Structured error responses with OpenAPI documentation

Breaking Changes

None! All API endpoints remain identical. Clients don’t need any changes.

Temporary Limitations

  • Authentication feature disabled: The auth feature requires middleware migration and is temporarily unavailable. Will be re-enabled in a future update.
  • Swagger UI: resolved. The interactive Swagger UI was not configured immediately after the migration; it is now served at /swagger alongside the generated OpenAPI spec.

Developer Notes

When adding new endpoints:

use actix_web::web::{Data, Json};
use apistos::web::{post, resource, scope};
use apistos::{api_operation, ApiComponent};
use apistos_gen::ApiErrorComponent; // derive for custom error types such as ApiError
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

// AppState, MyResponse, and ApiError are assumed to be defined elsewhere in the crate.

// Annotate request/response models
#[derive(Serialize, Deserialize, JsonSchema, ApiComponent)]
pub struct MyRequest {
    #[schemars(example = "example_value")]
    pub field: String,
}

// Annotate handlers
#[api_operation(
    tag = "my_tag",
    summary = "Short description",
    description = "Detailed description",
    error_code = 400,
    error_code = 500
)]
async fn my_handler(
    state: Data<AppState>,
    body: Json<MyRequest>,
) -> Result<Json<MyResponse>, ApiError> {
    // Handler logic
    todo!()
}

// Register with Apistos routing (fragment: chain this onto the App builder)
.service(
    scope("/api/my-endpoint")
        .service(resource("").route(post().to(my_handler)))
)

License

See LICENSE in the root directory.

GraphRAG WASM — Browser-Native Knowledge Graph RAG

A complete GraphRAG pipeline — document ingestion, knowledge-graph build, retrieval, and LLM synthesis — running entirely in the browser via WebAssembly. No server required (an optional local Ollama backend is supported).

Quick Start

rustup target add wasm32-unknown-unknown
cargo install trunk

cd graphrag-wasm
trunk serve            # dev server on http://localhost:8080
trunk build --release  # production bundle in dist/

The UI: a 3-column chat shell

The interface is a single Nordic-Minimal chat shell (no tabs, no DaisyUI — a flat hand-written stylesheet). See Chat discussion.html for the reference mockup the layout mirrors verbatim.

| Column | Contents |
|---|---|
| LeftRail | Brand, source documents, Flat/Hierarchy toggle, Build button |
| Stage | Active source header, the thread of question/answer turns, the composer input |
| RightRail | Per-query subgraph SVG, pipeline progress rows, mini-stats, reference cards |

Answers are streamed token-by-token; inline citations ([1], [2]…) link to reference cards in the RightRail. The per-query subgraph unions the entities from the top-K retrieved chunks and lays them out with a built-in force-directed layout.
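
Wiring inline citations to reference cards starts with finding the markers in the answer text. The sketch below is one possible way to do that; `extract_citations` is a hypothetical function and is not the parser in chat_shell.rs.

```rust
/// Hypothetical helper: returns the distinct citation numbers that appear
/// as `[n]` in `answer`, in order of first appearance.
pub fn extract_citations(answer: &str) -> Vec<usize> {
    let mut found = Vec::new();
    let mut rest = answer;
    while let Some(open) = rest.find('[') {
        // '[' is one byte in UTF-8, so `open + 1` is a valid char boundary.
        let after = &rest[open + 1..];
        let Some(close) = after.find(']') else { break };
        let inner = &after[..close];
        if !inner.is_empty() && inner.chars().all(|c| c.is_ascii_digit()) {
            if let Ok(n) = inner.parse::<usize>() {
                if !found.contains(&n) {
                    found.push(n);
                }
            }
            rest = &after[close + 1..];
        } else {
            // Not a citation: resume right after this '[' so that a nested
            // marker such as "[see [2]]" is still found.
            rest = after;
        }
    }
    found
}
```

Each returned number can then be mapped to the reference card at the same position in the retrieved results.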

How it works (end-to-end, in the browser)

  1. Document processing: chunking with configurable size/overlap.
  2. Entity extraction: rule-based / WebLLM-assisted extraction.
  3. Embeddings: ONNX Runtime Web (MiniLM-L6), run off the main thread (ort.env.wasm.proxy = true) so the UI never blocks during inference.
  4. Knowledge graph: in-memory entities, chunks, and relationships.
  5. Retrieval: pure-Rust cosine similarity, top-K via VectorIndex::search.
  6. Synthesis: WebLLM (in-browser) or Ollama (local server); citations are post-processed and wired to reference cards.
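
The retrieval step can be illustrated with a brute-force cosine top-K in plain Rust. This is a standalone sketch: the function names and the `(id, vector)` layout are assumptions, and the signature of `VectorIndex::search` is not shown in this document.

```rust
/// Cosine similarity of two vectors; returns 0.0 for mismatched lengths
/// or zero-length vectors instead of dividing by zero.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Scores every stored vector against the query and keeps the best `k`.
pub fn top_k(query: &[f32], docs: &[(String, Vec<f32>)], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = docs
        .iter()
        .map(|(id, vector)| (id.clone(), cosine_similarity(query, vector)))
        .collect();
    // Highest similarity first; `total_cmp` gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A linear scan like this is usually adequate for the document counts that fit in a browser tab; larger collections are where an index such as HNSW becomes worthwhile.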

Documents persist across reloads in IndexedDB (see src/persist.rs).

What comes from graphrag-core vs. reimplemented here

This crate is not a mock — it links graphrag-core (path dependency, wasm-safe feature subset) and drives a real graphrag_core::GraphRAG instance: document ingestion (add_document_from_text), the knowledge-graph types (Entity, Relationship), Leiden community detection, and adaptive query routing all come straight from core.

The ML hot-path stages are reimplemented browser-side, because core’s native backends (Ollama HTTP, candle, the LLM extractors) do not run inside a browser:

| Stage | Source |
|---|---|
| Document ingestion, graph types, Leiden, adaptive routing | graphrag-core |
| Embeddings | wasm-side onnx_embedder.rs (ONNX Runtime Web / WebGPU, hash fallback) |
| Entity extraction | wasm-side entity_extractor.rs (WebLLM-assisted or rule-based) |
| Vector search | wasm-side vector_search.rs (pure-Rust cosine) |

Note: src/lib.rs also exposes a separate wasm_bindgen GraphRAG wrapper for direct JS use (new GraphRAG(384) + pure vector search) — distinct from graphrag_core::GraphRAG despite the shared name.

LLM backends: WebLLM vs Ollama

WebLLM (default) — 100% in-browser via WebGPU

import { UnifiedLlmClient } from './graphrag_wasm.js';
const llm = UnifiedLlmClient.withWebLLM("Phi-3-mini-4k-instruct-q4f16_1-MLC");
llm.setTemperature(0.7);
const answer = await llm.generate("What is GraphRAG?");

  • ✅ Full privacy (no data leaves the browser), works offline after model download.
  • First load downloads the model (~1–2 GB); needs a WebGPU-capable browser; small models only (1–3B).

WebLLM and ONNX inference both run in dedicated web workers (webllm-worker.js + ORT’s proxy worker), keeping main-thread blocking under ~50 ms.

Ollama HTTP — local server, larger models

const llm = UnifiedLlmClient.withOllama("http://localhost:11434", "llama3.1:8b");
const answer = await llm.generate("What is GraphRAG?");

  • ✅ 7B–70B+ models, better quality, full GPU (CUDA/Metal).
  • Requires a running Ollama server + CORS:

ollama pull llama3.1:8b
OLLAMA_ORIGINS="http://localhost:8080" ollama serve

UnifiedLlmClient exposes the same generate / chat / checkAvailability API for both backends, so switching is a one-line change.

Tech stack

| Component | Technology |
|---|---|
| UI | Leptos (reactive Rust) |
| Build | Trunk |
| Styling | flat Nordic-Minimal CSS (tailwind.css, no @tailwind directives) |
| Tokenizer | HuggingFace tokenizers (unstable_wasm) |
| Embeddings | ONNX Runtime Web (off-main-thread, optional WebGPU) |
| LLM | WebLLM (in-browser) or Ollama HTTP |
| Vector search | pure Rust (cosine similarity) |
| Storage | IndexedDB |

Project layout

graphrag-wasm/
├── src/
│   ├── main.rs                 # chat-shell UI (LeftRail / Stage / RightRail)
│   ├── components/
│   │   ├── chat_shell.rs       # data types, citation parser, subgraph builder
│   │   └── force_layout.rs     # force-directed subgraph layout
│   ├── webllm.rs               # WebLLM client (+ web-worker engine)
│   ├── ollama_http.rs          # Ollama HTTP client
│   ├── llm_provider.rs         # UnifiedLlmClient abstraction
│   ├── onnx_embedder.rs        # ONNX Runtime Web embeddings
│   ├── vector_search.rs        # cosine similarity
│   └── persist.rs              # IndexedDB persistence
├── webllm-worker.js            # WebWorker MLC engine handler
├── index.html                  # entry point + ORT/WebLLM worker wiring
├── tailwind.css                # flat stylesheet
└── Trunk.toml                  # build config

Browser support

Chrome/Edge 87+, Firefox 89+, Safari 15.2+ (incl. mobile). Requires WebAssembly + ES2020 modules; WebGPU is optional (accelerates embeddings/WebLLM when present).

Tests

A Playwright parity test (tests/playwright/chat_layout.sh) asserts the WASM SPA matches the mockup on 19 shared selectors. Unit tests:

cargo test --target wasm32-unknown-unknown

License

See the main repository LICENSE.

API Reference

The full Rust API reference is generated by rustdoc and hosted on docs.rs:

docs.rs/graphrag-core

This covers the public surface of the core library — GraphRAG, Config, the extractor traits, and every module. It is rebuilt automatically for each published release.

For the other crates:

To browse the API for an unpublished local checkout, run:

cargo doc --workspace --no-deps --open

Troubleshooting

{{#include ../../../docs/TROUBLESHOOTING_OBJC_EXCEPTION.md}}

Changelog

All notable changes to GraphRAG-RS will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Security

CI green: cargo-deny advisories/licenses + rustfmt (2026-05-31)

  • Vulnerabilities patched via lockfile bumps: rand 0.8.5→0.8.6 and 0.9.2→0.9.4 (RUSTSEC-2026-0097 unsoundness), bytes 1.10.1→1.11.1 (RUSTSEC-2026-0007 integer overflow), rustls-webpki 0.103.7→0.103.13 (RUSTSEC-2026-0049/0098/0099/0104 — CRL + name-constraint vulns). All patch-level, non-breaking.
  • deny.toml licenses: added BSL-1.0 (Boost) and CDLA-Permissive-2.0 (Mozilla CA bundle via webpki-roots) to the allow-list — both permissive, were failing the licenses job.
  • deny.toml advisory ignores (unfixable here, documented inline): unmaintained transitive crates proc-macro-error, bincode, json, number_prefix, paste, rustls-pemfile; lru 0.12 unsoundness (RUSTSEC-2026-0002, pinned by ratatui 0.29, unreachable in our usage); and time DoS (RUSTSEC-2026-0009) — its fix (≥0.3.47) requires rustc 1.88, above our MSRV 1.85, so time is held at 0.3.44 and the advisory accepted (reachable only via untrusted RFC-2822 parsing in the server, not core/cli). Revisit when MSRV moves to ≥1.88.
  • Formatting: ran cargo fmt --all over the workspace (71 files) to clear the long-standing rustfmt CI job. Mechanical, no behavior change.
  • --all-features advisory/license coverage: the cargo-deny-action defaults to --all-features, so CI also scans the optional lancedb tree (lance/datafusion/arrow). Patched lz4_flex 0.11.5→0.11.6 / 0.12.0→0.12.2 (RUSTSEC-2026-0041) and tar 0.4.44→0.4.46 (RUSTSEC-2026-0067/0068); allowed 0BSD (mock_instant). Added [graph] all-features = true to deny.toml so local cargo deny check sees the same graph as CI (prevents local≠CI drift).
  • CI SIGILL fix: set RUSTFLAGS = "-C target-cpu=x86-64-v2" in ci.yml to override the repo’s .cargo/config.toml -C target-cpu=native. On GitHub’s heterogeneous runners native can emit instructions the silicon traps (SIGILL crashing rustc/proc-macros, seen building ollama-rs). Verified the rustc invocation: an empty CARGO_BUILD_RUSTFLAGS is ignored and doesn’t override the config flag — only a non-empty RUSTFLAGS (highest precedence) fully replaces it. Local dev keeps target-cpu=native; CI uses the portable x86-64-v2 baseline.

Added

Documentation site (2026-05-31)

  • mdBook documentation site under book/, deployed to GitHub Pages at https://automataia.github.io/graphrag-rs/. Curated, English-only, user-facing TOC (book/src/SUMMARY.md) covering getting-started, concepts, configuration, features, and per-crate guides. Internal dev reports and Italian guides are intentionally excluded.
  • Chapters are thin {{#include}} wrappers over the canonical sources (HOW_IT_WORKS.md, crate READMEs, curated docs/*.md) so there is a single source of truth and no content drift. Front-door pages (introduction.md, getting-started/overview.md, quickstart.md) are authored.
  • Mermaid diagrams render via the mdbook-mermaid preprocessor; built-in client-side search enabled.
  • API reference links out to docs.rs/graphrag-core rather than self-hosting cargo doc.
  • New CI workflow .github/workflows/docs.yml builds the book (pinned mdbook 0.5.3 + mdbook-mermaid 0.17.0 prebuilt binaries) and deploys via actions/deploy-pages. The generated book/book/ output is git-ignored. Manual one-time step: set repo Settings → Pages → Source = “GitHub Actions”.
  • README: added a docs-site badge.
  • Translated to English the doc sources the site includes that still contained Italian: docs/INCREMENTAL_UPDATES.md, docs/TUI_USAGE_GUIDE.md, docs/ENRICHMENT_USAGE_GUIDE.md, docs/SUMMARIZATION_CONFIG.md, the graphrag-cli/README.md config table notes, and the Italian entries in this CHANGELOG. Fixed stale repo URLs (anthropics/* → automataIA/graphrag-rs) in the translated guides. The public site is now English-only end to end.
  • Stripped decorative/pictographic emoji (the 📚🚀📖 family) from the doc sources the site includes, fixing “tofu” boxes that appeared wherever the viewer’s font lacked an emoji glyph (mdBook’s default theme has no emoji-font fallback — a generic missing-glyph issue, not a bug). Preserved arrows (→), box-drawing/ASCII diagrams (━│▼█), and data symbols (✅❌★☆); converted rating ⭐→★ to keep ratings rendering. Keycap-numbered headings (1./2.) replaced the 1️⃣ style.

[0.2.0] - 2026-05-31

Fixed

  • arrow workspace dep: added default-features = false to arrow = "57" in the workspace Cargo.toml. Previously, the default-features = false directive in graphrag-core/Cargo.toml was silently ignored by Cargo (build-time warning).
  • documentation metadata for the graphrag crate: added documentation = "https://docs.rs/graphrag" in graphrag/Cargo.toml, aligning the wrapper crate with graphrag-core and graphrag-cli.

Code/architecture/product quality audit (2026-05-30)

Added

  • CI/CD: new workflow .github/workflows/ci.yml. The repo previously had no CI automation. Blocking jobs: clippy --workspace --lib -D warnings (now green, see below), test -p graphrag-core --lib, cargo-deny. The fmt job is informational and non-blocking (continue-on-error) until the repo is made cargo fmt --all clean (pre-existing repo-wide formatting debt).
  • Security tooling: deny.toml (advisories + permissive licenses + duplicate ban) and SECURITY.md (private disclosure policy via GitHub Security Advisories).
  • Drift-guard tests (config/setconfig.rs): gliner_setconfig_default_matches_runtime and autosave_setconfig_default_matches_runtime fail at build time if the serde leaf-struct defaults diverge from the canonical runtime ones, preventing “5-point-sync” drift. OllamaConfig is excluded on purpose (by-design divergence: offline-first runtime vs user-facing schema).
  • Crate metadata: documentation (docs.rs) and readme fields added to graphrag-core and graphrag-cli for publishing on crates.io.

Documentation polish (2026-05-30)

  • graphrag/README.md: the wrapper meta-crate had no README (only Cargo.toml + src). Added: explains that it re-exports graphrag-core and provides the graphrag binary, with a binary quick-start + library usage and links to the core/root README.
  • Module //! headers added to the 10 graphrag-core modules that lacked them (previously starting with use/pub mod/#[cfg] or a /// on the first submodule): config, graph, generation, critic, retrieval, summarization, vector, entity, text, query. Every module’s rustdoc page now shows a description. Doc-comments only, no behavior change; clippy -p graphrag-core -D warnings stays green and cargo doc introduces no new warnings.

PageRank: score normalization (dangling nodes) (2026-05-30)

  • Bug fix: scores_to_entity_map in graph/pagerank.rs now L1-normalizes the scores (sum = 1.0). Dangling nodes (no outgoing edges) lost rank mass on every iteration, leaving the sum < 1.0. Single fix point → covers all paths (dense/parallel/sparse). Unblocks 3 previously-failing tests: test_pagerank_convergence, test_personalized_pagerank, test_precompute_global_pagerank (visible only under the pagerank feature, activated by --workspace feature-unification).
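
The rank-mass leak and its fix can be reproduced on a two-node graph. This is a simplified sketch under stated assumptions: `pagerank_step` and `l1_normalize` are hypothetical stand-ins and do not reproduce the code in graph/pagerank.rs.

```rust
/// One damped PageRank step that does not redistribute the rank of
/// dangling nodes (nodes with no outgoing edges), so mass leaks.
pub fn pagerank_step(out_edges: &[Vec<usize>], rank: &[f64], damping: f64) -> Vec<f64> {
    let n = rank.len();
    let mut next = vec![(1.0 - damping) / n as f64; n];
    for (src, targets) in out_edges.iter().enumerate() {
        if targets.is_empty() {
            continue; // dangling node: its rank mass is dropped
        }
        let share = damping * rank[src] / targets.len() as f64;
        for &dst in targets {
            next[dst] += share;
        }
    }
    next
}

/// Rescales scores in place so that they sum to 1.0 (L1 normalization).
pub fn l1_normalize(scores: &mut [f64]) {
    let sum: f64 = scores.iter().sum();
    if sum > 0.0 {
        for s in scores.iter_mut() {
            *s /= sum;
        }
    }
}
```

For the graph 0 → 1 with node 1 dangling, a uniform start of [0.5, 0.5] and damping 0.85 gives [0.075, 0.5] after one step, which sums to 0.575; normalizing restores a sum of 1.0 while keeping the relative ordering.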

Swagger UI served at /swagger (2026-05-30)

  • graphrag-server: the Swagger UI was announced but not served (“coming soon”). It is now exposed at /swagger via apistos’s native support (features = ["swagger-ui"], already enabled); apistos-swagger-ui bundles the official Swagger UI assets, so no new dependency is needed. Changed .build("/openapi.json") → .build_with(..., BuildConfig::default().with(SwaggerUIConfig::new(&"/swagger"))) in main.rs. README updated (removed “coming soon”).

Clean clippy on examples/tests + green doctests (2026-05-30)

  • Clippy examples/tests: cargo clippy --examples --tests -p graphrag-core -- -D warnings is now green. Bulk via cargo clippy --fix; manual tail: ///! → //! (embeddings demo), .filter().next_back() → .rfind(), .clone() on a double ref → .iter().copied(), ignored let _ = on Result, std::slice::from_ref, removal of unused vars.
  • Doctest: cargo test --doc -p graphrag-core → 47 pass / 0 fail / 17 ignored. 7 illustrative, non-self-contained examples (require a live Ollama, an async runtime, or undefined setup variables — core::ChunkingStrategy, build_relationship_hierarchy, KV-cache Ollama, pipeline_executor, etc.) marked ```ignore. The hero example still runs and is green.
  • clippy --fix regression corrected: at config/enhancements.rs:770, --fix had removed mut from let count, seeing it as unneeded under default features; restored let mut count with #[allow(unused_mut)] (the count += 1s are behind #[cfg(feature = ...)]).
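The mut regression above follows a common pattern: a counter that is only mutated inside feature-gated blocks looks immutable when those features are off. A minimal sketch, with a hypothetical function name and the feature names chosen only for illustration:

```rust
/// Counts how many optional features were compiled in.
/// Hypothetical function; the feature names are illustrative.
fn enabled_feature_count() -> usize {
    // With every optional feature disabled, nothing below mutates `count`,
    // so the compiler reports `unused_mut` and `clippy --fix` strips `mut`.
    // The allow keeps the binding mutable for feature-enabled builds.
    #[allow(unused_mut)]
    let mut count = 0;

    #[cfg(feature = "leiden")]
    {
        count += 1;
    }

    #[cfg(feature = "incremental")]
    {
        count += 1;
    }

    count
}
```

Without the allow, the default build and the feature-enabled build disagree about whether mut is needed, so one of the two always fails the lint or the compile.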

Stale examples/tests recompile (2026-05-30)

  • Stale struct initializers: added the missing temporal/causal fields (all None) to the Entity literals (first_mentioned, last_mentioned, temporal_validity) and Relationship literals (embedding, temporal_type, temporal_range, causal_strength) in the llm_evaluation_demo, advanced_nlp_demo, hierarchical_graphrag_demo, workspace_demo, tom_sawyer_workspace examples. They had fallen behind the evolution of Entity/Relationship in core/mod.rs (Phase 1.2) and broke cargo build --examples.
  • complete_zero_cost_graphrag_demo: Config literal closed with ..Default::default() (it was missing advanced_features, gliner, suppress_progress_bars) and the EntityConfig literal completed with use_atomic_facts: false + max_fact_tokens: 400.
  • Per-feature gating (graphrag-core Cargo.toml): hierarchical_graphrag_demo now required-features = ["leiden"] (uses LeidenConfig / detect_hierarchical_communities, #[cfg(feature = "leiden")]) and the incremental_integration test required-features = ["incremental"] (it imported graphrag_core::incremental). So a default cargo build/test --workspace stays green without pulling in the optional features.
  • Chat discussion.html: added the standard line-clamp:3 property alongside -webkit-line-clamp (CSS vendorPrefix linter).
  • Verification: cargo build --examples --tests --workspace → clean Finished; cargo test -p graphrag-core --lib → 365 pass / 0 fail. The 3 pagerank tests that fail under --workspace feature-unification are pre-existing (confirmed on a clean tree).
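The stale-initializer breakage above is what struct update syntax guards against: a literal closed with ..Default::default() keeps compiling when new fields are added. A minimal sketch, assuming a simplified stand-in type; ConfigSketch, chunk_size and the field types are hypothetical, and only the three field names come from the entry above.

```rust
/// Simplified stand-in for a config struct that keeps gaining fields.
#[derive(Debug, Clone, Default, PartialEq)]
struct ConfigSketch {
    chunk_size: usize,
    // Fields added later; the types here are assumptions.
    advanced_features: Option<String>,
    gliner: Option<String>,
    suppress_progress_bars: bool,
}

/// Sets only the fields the example cares about and takes the rest
/// from `Default`, so newly added fields do not break this literal.
fn demo_config() -> ConfigSketch {
    ConfigSketch {
        chunk_size: 512,
        ..Default::default()
    }
}
```

Listing every field explicitly, as was done for the Entity and Relationship literals, is the other option; it makes the compiler flag each new field at the cost of touching every example.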

Changed

  • Dependency dedup (anti-bloat): aligned two direct workspace dependencies to versions already present transitively, eliminating duplicate versions in graphrag-cli’s -e normal tree:
    • strum 0.25 → 0.26 (matches ratatui 0.29) — removes duplicate strum + strum_macros.
    • itertools 0.12 → 0.13 (matches ratatui/unicode-truncate).
    • Real duplicates in graphrag-cli’s normal tree dropped from 34 to 26. Verified that graphrag-core (the published crate) has only 4 unavoidable transitive duplicates (getrandom 0.2/0.3, webpki-roots 0.26/1.0, TLS stack). rand 0.8→0.9 NOT done (API-breaking, only deduplicated the unpublished server binary).

Fixed

  • CLI crash at startup on all non-TUI subcommands (index, ask, bench, setup, validate, …): color_eyre::install() was called twice — in graphrag-cli/src/main.rs:10 and again inside run() at lib.rs:197 — and the second install aborted with “could not set the provided Theme globally as another was already set”. Removed the duplicate install() from main.rs; now both binaries (graphrag-cli and the graphrag meta-crate, which doesn’t install on its own) install exactly once via run(). Caught by running the e2e benchmarks (bench).
  • MSRV corrected and verified: rust-version changed from 1.75 (false, never tested) to 1.85. The real floor is imposed by the direct dependency jsonfixer, which uses edition = "2024" (requires rustc ≥ 1.85). Build-verified on the 1.85 toolchain for graphrag-core and graphrag-cli. New msrv CI job that builds on 1.85. Analysis method: floor from cargo metadata (max rust_version declared among the normal deps) + build verification on a single toolchain (no costly bisect).
  • Lint debt zeroed (green workspace clippy): resolved 38 pre-existing clippy errors that surfaced under cargo clippy --workspace --lib -- -D warnings (Rust 1.95). Diagnosis: graphrag-core in isolation (default features) was already clean; the errors were in core’s optional modules (incremental, rograg, lightrag, embeddings/ollama) activated by the cli/server features + 3 errors of graphrag-cli’s own. Idiomatic fixes (to_vec(), iter_mut().enumerate(), if let Some, sort_by_key(Reverse(..)), type aliases NodeDeltaResult/ EdgeDeltaResult) and targeted, commented #[allow]s where a rename would break the serde API (PendingUpdateType) or for a private 10-argument helper. Not an interface break: the crates compile and link correctly.
  • GLiNER default drift: default_gliner_entity_labels/default_gliner_relation_labels in config/setconfig.rs were misaligned with the runtime GlinerConfig::default() (missing "concept" and "causes"). Now aligned with the canonical default (4 entity + 3 relation labels). Not observable in the existing e2e configs (they set the labels explicitly); relevant only when GLiNER is enabled via TOML while omitting the labels.

Documentation

  • Markdown doc consolidation (few but useful): reduced the ~55 tracked .md files to a keystone set. Deleted 39 files among process artifacts (report.md, TODO.md, *_COMPLETE.md, *_SUMMARY.md, *_STATUS.md, MERGE_COMPLETE.md, IMPLEMENTATION_SUMMARY.md) and satellite integration guides now covered by the keystones (graphrag-core/{ADVANCED_FEATURES,OLLAMA_INTEGRATION,LEIDEN_INTEGRATION,LIGHTRAG_INTEGRATION, HIPPORAG_INTEGRATION,CROSS_ENCODER_INTEGRATION,ENTITY_EXTRACTION,EMBEDDINGS_CONFIG, PIPELINE_ARCHITECTURE,QUICKSTART,ENRICHMENT_IMPLEMENTATION,WORKSPACE_PERSISTENCE_SUMMARY}.md, the src/{embeddings/README,graph/TRAVERSAL_GUIDE}.md, the entire series of non-README graphrag-wasm/*.md guides, examples/MULTI_DOCUMENT_PIPELINE.md). The surviving keystones: README.md, HOW_IT_WORKS.md, CHANGELOG.md, the 4 crate READMEs, config/JSON5_CONFIG_GUIDE.md. The docs/ folder is git-ignored (local notes) and is not touched.
  • Keystone staleness fixes: MSRV badge/prerequisites 1.70 → 1.85 in the root README; removed references to the deleted graphrag-leptos crate (workspace layout now 5-crate + the graphrag meta-crate, dependency graph updated); “Web UI” section rewritten around the chat-shell. HOW_IT_WORKS.md: the WASM section now points to graphrag-wasm (no longer to the deleted graphrag-leptos).
  • graphrag-wasm README rewritten: the old 5-tab DaisyUI UI is replaced by the documentation of the 3-column Nordic-Minimal chat-shell (LeftRail/Stage/RightRail), off-main-thread inference, citations, IndexedDB persistence; removed the dead links to the deleted satellite guides.
  • Internal links repointed: all links to the deleted docs (in README.md, HOW_IT_WORKS.md, graphrag-core/README.md) now point to HOW_IT_WORKS.md, config/JSON5_CONFIG_GUIDE.md, CHANGELOG.md, or docs.rs/graphrag-core.

Removed

  • Dead code: removed graphrag-server/src/main_axum_old.rs (~31KB, orphan file with no references, neither a bin-target nor a module).
  • Unused dependency: removed text_analysis = "0.3" from graphrag-core and from [workspace.dependencies] (detected with cargo machete, verified: no use in the code; the only match was the string "context_analysis"). The other cargo machete reports (getrandom, gline-rs, js-sys, web-sys, tower, text-splitter) are verified false positives (wasm/api feature-enablers or crates whose lib name differs from the package name, like gline-rs → gliner) and kept.

Changed

graphrag-wasm chat-shell rewrite (Nordic-Minimal) (2026-05-17)

  • BREAKING: the 5-tab daisyUI UI (Build / Explore / Query / Hierarchy / Settings) is replaced by a single 3-column chat shell that mirrors the Chat discussion.html Nordic-Minimal mockup verbatim (palette, font stack Newsreader / Geist / Geist Mono, class names, citation/hover wiring).
    • New layout in graphrag-wasm/src/main.rs: LeftRail (brand + sources + Flat/Hierarchy toggle + Build button), Stage (head with active source, thread of Turns, composer), RightRail (subgraph SVG + pipeline rows + ministats + references). All real data: documents come from the existing IndexedDB signal, pipeline progress is driven by the existing BuildStatus/BuildStage, embeddings come from ONNX Runtime Web + tokenizer.json, retrieval from VectorIndex::search, answers from WebLLM (Phi-3-mini for synthesis, Qwen for extraction), citations are post-processed via parse_answer_with_cites and link <button class="cite"> → <div class="ref-card"> through the reactive active_ref: Option<u32> signal, with no inline JS.
    • New module graphrag-wasm/src/components/chat_shell.rs holds the data types (ChatTurn, RefCard, AnswerSegment, SubgraphData), the citation parser and the per-query build_subgraph builder that unions entities from the top-K retrieved chunks and feeds them through components::force_layout::ForceLayout (320×240 viewBox, 16-node / 21-edge cap matching the mockup density label).
    • Styling: graphrag-wasm/tailwind.css is now a flat Nordic-Minimal stylesheet (no @tailwind directives, no daisyUI); graphrag-wasm/index.html drops lucide CDN + MutationObserver and adds the Google-fonts preconnect block.
    • leptos-lucide-rs dependency removed from graphrag-wasm/Cargo.toml.
    • Legacy daisyUI components (components/{settings,hierarchy,ui_components,chat_component}.rs) remain on disk for reference but are no longer compiled — components/mod.rs only exports chat_shell + force_layout.
    • Parity test: graphrag-wasm/tests/playwright/chat_layout.sh drives playwright-cli: opens the mockup over python3 -m http.server and the WASM SPA on trunk serve, captures 1440×900 screenshots (tests/playwright/artifacts/{mockup,wasm}.png) and asserts 19 shared selectors (.app, .rail-left .doc-item, .stage-title, .bubble-q, .cite, .stages .pls, .graph-frame svg, .ref-card, .composer input, …). Current status: 19/19 pass.

Added

2026 best-practices pass (graphrag-core ↔ graphrag-wasm) (2026-05-16)

  • Off-main-thread inference (Stage 3b) for graphrag-wasm.

    • WebLLM: WebLLM::new and WebLLM::new_with_progress in graphrag-wasm/src/webllm.rs now auto-detect a pre-spawned window.webllmWorker and switch to CreateWebWorkerMLCEngine, keeping the same chat.completions.create surface (and chat_stream’s async-iterator) intact. Falls back to the main-thread engine if worker spawn fails. New sidecar graphrag-wasm/webllm-worker.js hosts WebWorkerMLCEngineHandler (15 LOC).
    • ONNX Runtime Web: ort.env.wasm.proxy = true + numThreads = 1 set immediately after ort.min.js loads in graphrag-wasm/index.html, so all InferenceSession.run calls execute in ORT’s dedicated worker.
    • Trade-off vs the plan’s gloo-worker route: no second wasm bundle, no Rust worker scaffolding, ~30 LOC swap. Verification (“main-thread blocked < 50 ms during inference”) met via the runtimes’ built-in workers.
  • Token-streaming UX in graphrag-wasm QueryTab. Replaced the blocking WebLLM::chat(...) call at graphrag-wasm/src/main.rs:1604 with chat_stream(...): tokens are now appended to the results signal incrementally as they arrive from the model, matching 2026 in-browser-LLM UX guidance. The pre-existing streaming API in graphrag-wasm/src/webllm.rs:334 was previously unused.

  • IndexedDB persistence for the document set. New graphrag-wasm/src/persist.rs wraps IndexedDBStore with open_store, save_document, delete_document, load_all_documents. The App component restores documents on first load; manual input, file upload, Symposium-demo load, and document-remove handlers all persist their mutations. Reloading the page now preserves the document set instead of resetting to empty.

  • WAI-ARIA tabs pattern in graphrag-wasm. All 5 tab panels are now mounted permanently inside a <main id="main-content"> landmark with hidden=move || active_tab.get() != Tab::X. Each tab button gained an id (tab-build, tab-explore, etc.) matching the panel’s aria-labelledby. This fixes Lighthouse aria-valid-attr-value and landmark-one-main audits, and preserves component state across tab switches.

  • SEO: added <meta name="description"> and <link rel="canonical"> plus <meta name="color-scheme" content="dark light"> to graphrag-wasm/index.html. External links in the footer gained rel="noopener noreferrer".

  • Downloaded MiniLM-L6-v2 ONNX model (87MB) to graphrag-wasm/models/minilm-l6.onnx for semantic query embeddings. Previously the directory was empty, causing fallback to hash-based embeddings which produced no meaningful search results.

Removed

Broken orphan example crates deleted (2026-05-16)

  • examples/web-app/ and examples/graphrag-leptos-demo/ both depended on the deleted graphrag-leptos crate (merged into graphrag-wasm in March 2025). They were excluded from the workspace so they did not block builds, but were misleading for newcomers. Functionality is fully covered by graphrag-wasm itself.
  • Dropped exclude = ["examples/web-app"] from root Cargo.toml.

graphrag_py Python bindings crate deleted (2026-05-16)

  • Removed graphrag_py/ directory and workspace member entry in root Cargo.toml.
  • Reason: legacy crate on an out-of-date pyo3 0.21, last touched 4 commits ago, before the KV-cache / GLiNER / contextual-enricher / persistence wave. API frozen pre-Feb-2026, never published (publish = false), Development Status :: 4 - Beta.
  • BREAKING: Python bindings no longer build from this repo. Future Python support should live in a separate repo with current pyo3.

Changed

Clippy gate restored on wasm32-unknown-unknown target (2026-05-16)

cargo clippy --lib -p graphrag-core --no-default-features --features "wasm-bundle" --target wasm32-unknown-unknown -- -D warnings went from 54 errors → 0. Native default-features pass also restored to 0 errors. Both targets and the 363 native lib tests now pass cleanly under the PostToolUse clippy hook.

  • Mechanical lints auto-applied: sort_by_key (5×), clamp (5×), unwrap_or_default, is_some_and, manual_abs_diff, manual_pattern_char_comparison, collapsible_match, let_and_return, derivable_impls, field_reassign_with_default, needless_return.
  • Type aliases for boxed Fn benchmark callbacks in graphrag-core/src/monitoring/benchmark.rs:208-214: RetrievalFn, RerankerFn, LlmFn. Eliminates 3× type_complexity warnings.
  • HierarchicalLeidenResult type alias in graphrag-core/src/graph/leiden.rs:17 factored out the Result<(HashMap<.., HashMap<..>>, HashMap<..>)> return type of hierarchical_leiden.
  • Feature-gated dead-code under wasm: helper methods in gleaning_extractor.rs, llm_extractor.rs, chunking_strategies.rs, contextual_enricher.rs, late_chunking.rs are now #[cfg(feature = "async")]. Fields ollama_client (atomic_fact_extractor, llm_extractor), prompt_builder (llm_extractor), client (contextual_enricher), llm_extractor (gleaning_extractor), critic (graphrag/mod), api_key (late_chunking), and boundary_detector / coherence_scorer / min_chunk_chars (chunking_strategies) carry #[cfg_attr(not(feature = "async"), allow(dead_code))]. Five modules carry #![cfg_attr(not(feature = "async"), allow(unused_imports))] to silence imports that become dead when the async build_graph path is gone.
  • Restored imports lost during refactor: TextChunk, GraphRAGError, Document, HashMap, HashSet, Result, OllamaGenerationParams re-added to atomic_fact_extractor.rs, gleaning_extractor.rs, llm_extractor.rs, contextual_enricher.rs, late_chunking.rs. Underscored-but-still-used variables (_e → log-formatter args, _original_score, _total_chunks) rewritten to be self-consistent.
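The type_complexity fix mentioned above can be sketched as follows. Only the alias names RetrievalFn and RerankerFn come from the entry; the closure signatures and the BenchmarkSketch struct are assumptions made for illustration.

```rust
// Aliases give the boxed callback types a name, which is what silences
// clippy's `type_complexity`. The signatures are assumed, not the real ones.
type RetrievalFn = Box<dyn Fn(&str) -> Vec<String> + Send + Sync>;
type RerankerFn = Box<dyn Fn(&str, Vec<String>) -> Vec<String> + Send + Sync>;

/// Hypothetical benchmark harness holding the two callbacks.
struct BenchmarkSketch {
    retrieve: RetrievalFn,
    rerank: RerankerFn,
}

impl BenchmarkSketch {
    /// Runs retrieval, then passes the candidates through the reranker.
    fn run(&self, query: &str) -> Vec<String> {
        let candidates = (self.retrieve)(query);
        (self.rerank)(query, candidates)
    }
}
```

The aliases change no behavior; they only move the long Box<dyn Fn(...)> spelling out of field and parameter positions.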

Fixed

WASM compilation broken after graphrag-core refactor (2026-05-16)

graphrag-core failed to compile for wasm32-unknown-unknown (65 errors → 0). The WASM build uses default-features = false (excludes async, tracing, tokio, parallel-processing), but many code paths used tracing:: calls and tokio without feature gates.

  • Added #[cfg(feature = "tracing")] gates to ~80 tracing:: calls across 15 files.
  • Gated tokio::runtime::Runtime in BoundaryAwareChunkingStrategy::chunk() behind #[cfg(feature = "async")] with sync fallback.
  • Split RetrievalSystem::batch_query() into #[cfg(feature = "parallel-processing")] and #[cfg(not(feature = "parallel-processing"))] variants.
  • Fixed sync ask() (#[cfg(not(feature = "async"))]) to call retrieval.query() instead of async query_internal().
  • Added #![recursion_limit = "512"] to graphrag-wasm main.rs for Leptos type depth.
  • Created missing graphrag-wasm/models/ directory required by Trunk.

Missing Relationship fields in sync build_graph() (2026-05-16)

graphrag-core/src/graphrag/build.rs:690: Relationship struct literal was missing embedding, temporal_type, temporal_range, and causal_strength fields added in Phase 1.2 (Advanced GraphRAG). Added all four with None defaults so the sync build path compiles without partial-init errors.

rograg::validator dropped quality metrics (2026-05-16)

graphrag-core/src/rograg/validator.rs:376: validate_response was computing coherence_score, relevance_score, factual_consistency_score, completeness_score, readability_score, and source_credibility_score then throwing them away (7 unused_variable / unused_assignments warnings). Now they:

  • Fold into validated_response.confidence via a new overall_quality() helper (mean of the metrics that were actually run — coherence / relevance / factual consistency are gated on their respective config flags; completeness / readability / source credibility always count).
  • Trigger a Medium IssueType::Quality validation issue when overall quality falls under 0.5.
  • Are emitted as a structured tracing::debug! event so the metrics are observable in logs without a public API change.
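The folding rule above (mean of the metrics that actually ran, with a quality issue below 0.5) might look roughly like this. A minimal sketch: only the name overall_quality and the 0.5 threshold come from the entry; the signature, the Option-based gating and the needs_quality_issue helper are assumptions.

```rust
/// Mean of the metrics that were actually computed.
/// `gated` holds metrics behind config flags (None = not run);
/// `always` holds the metrics that are computed unconditionally.
fn overall_quality(gated: &[Option<f32>], always: &[f32]) -> f32 {
    let values: Vec<f32> = gated
        .iter()
        .flatten() // drops the metrics that were skipped
        .copied()
        .chain(always.iter().copied())
        .collect();
    if values.is_empty() {
        return 0.0;
    }
    values.iter().sum::<f32>() / values.len() as f32
}

/// Hypothetical helper: a quality issue is raised when the overall
/// quality falls under 0.5.
fn needs_quality_issue(quality: f32) -> bool {
    quality < 0.5
}
```

Averaging only over the metrics that ran keeps a disabled check from dragging the confidence down as if it had scored zero.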

Changed

Server crate: color-eyre pretty errors at startup (2026-05-16)

  • graphrag-server/src/main.rs: main() return type std::io::Result<()> → color_eyre::Result<()>, with color_eyre::install() at top.
  • Adds color-eyre = "0.6" to graphrag-server/Cargo.toml.
  • mimalloc allocator was already wired (no change).
  • Production unwraps in server crate audited: all 16 remaining unwraps are inside #[cfg(test)] blocks (qdrant_store, auth, embeddings, config_handler, etc.). Production paths use .map_err(...)? / .ok_or_else(...)? — already clean. Part of refactor-2026-05 server slice.

Documentation

Stale memory + CLAUDE.md notes refreshed (2026-05-16)

  • CLAUDE.md workspace layout: 6-crate → 5-crate (graphrag_py removed).
  • CLAUDE.md “Known gotchas”: replaced obsolete “12 failing unit tests” claim with verified status: cargo test -p graphrag-core --lib → 363 pass / 0 fail. The remaining cargo test --workspace failures come from stale examples (not tests) under graphrag-core/examples/ with missing Entity / Relationship fields; left untouched per project policy.
  • MEMORY.md (auto-memory) synced to the same wording.

Removed

Test suite aggressive pruning (2026-05-16)

User-requested clean-up: keep only indispensable, up-to-date tests; delete broken pre-existing failures, hanging tests, stale pre-refactor integration tests, and trivial construction-only sanity tests.

  • 23 broken / hanging / failing unit tests deleted:
    • async_graphrag::tests::* (6 tests on dead module)
    • entity::*::test_normalize_name (2 stale assertions)
    • entity::llm_relationship_extractor::test_fallback_extraction
    • reranking::cross_encoder::test_rerank_basic + test_confidence_filtering (need ONNX)
    • retrieval::symbolic_anchoring::test_extract_anchors (stale)
    • text::boundary_detection::test_sentence_detection + test_combined_detection
    • graph::incremental::tests::test_basic_entity_upsert + 6 ProductionGraphStore tests (deadlock in async lock contention — hung indefinitely)
    • rograg::logic_form::tests::test_pattern_parser + test_logic_form_retrieval
    • rograg::intent_classifier::tests::test_{factual,relational,temporal,causal,comparative,summary,definitional}_intent (7 stale assertions on intent classification)
    • rograg::quality_metrics::test_performance_stats_update
    • rograg::streaming::test_template_selection
    • incremental::lazy_propagation::test_lazy_propagation_basic
    • incremental::delta_computation::test_parallel_computation
  • 10 stale workspace-level integration test files deleted (./tests/*.rs, all pre-2026, predate the KV cache / GLiNER / persistence / file-split refactors): caching_integration.rs, config_integration_test.rs, http_endpoint_tests.rs, hybrid_retrieval_tests.rs, integration_tests.rs, modular_integration_tests.rs, property_tests.rs + .proptest-regressions, server_integration_tests.rs, zero_cost_approaches_integration_tests.rs, tests/parallel/. Plus graphrag-core/tests/ollama_enhancements.rs (didn’t compile — missing context field on OllamaGenerationParams).
  • 15 trivial test_*_creation patterns deleted (single-line constructions verifying only X::new().is_ok()): test_tree_creation, test_async_mock_llm_creation, test_incremental_pagerank_creation, test_processor_creation, test_agent_creation, test_function_caller_creation, test_cache_warmer_creation, test_retrieval_system_creation, test_enhanced_registry_creation, test_mock_llm_creation, test_answer_generator_creation, test_graphrag_creation, test_graph_indexer_creation, test_lancedb_creation, test_cached_client_creation. Plus 2 trivial Ollama adapter creation tests (entire test module in core/ollama_adapters.rs removed).
  • Tests retained: 7 integration test files in graphrag-core/tests/ (the 2026-02 refactor-era tests exercising KV cache, contextual enricher, GLiNER features, triple validation, dynamic weighting, BAR-RAG, text pipeline fixtures, incremental graph updates). ./tests/e2e/ benchmark scripts kept.
  • Verification matrix — all 100% green:
    • cargo test -p graphrag-core --lib → 363 passed, 0 failed (was 371/12 fail)
    • cargo test -p graphrag-core --lib --features rograg → 402 passed, 0 failed
    • cargo test -p graphrag-core --lib --features incremental → 390 passed, 0 failed

Fixed

Workspace-wide production unwrap() sweep (2026-05-16) — Part of refactor-2026-05 Phase 3 (extended)

  • Going beyond the original Phase 3 scope (voy_store, rograg/streaming, rograg/processor, cli/config, qdrant_store — all already verified test-only or previously cleaned), every remaining production .unwrap() in the workspace has been replaced with the appropriate safe alternative.
  • Mechanical sweeps by category:
    • 36 partial_cmp(...).unwrap() (f32 sort comparators, NaN-panic-prone) across ~23 files (async_graphrag, inference, retrieval/*, graph/*, summarization, vector, monitoring, nlp, generation, server handlers, etc.) → .unwrap_or(std::cmp::Ordering::Equal).
    • 22 lock()/read()/write().unwrap() (Mutex/RwLock acquisitions, poisoned-lock-panic-prone) → .expect("lock poisoned") / .expect("rwlock poisoned").
    • 12 Regex::new(...).unwrap() (static regex literals) → .expect("static regex literal").
    • duration_since(UNIX_EPOCH).unwrap() (system clock) → .expect("system clock before UNIX epoch").
    • Iterator and Option terminators (.first(), .last(), .next(), .min(), .max(), .pop(), .as_ref(), .as_mut(), .chars().next()) after checked-precondition usages → .expect(<reason>).
    • Targeted contextual fixes for result_map.remove, get_mut after contains_key, Self::new() in Default::default, NonZeroUsize::new on literal, caps.get(N), strip_prefix(...) after starts_with, etc.
  • Test-only infrastructure files (core/test_traits.rs, core/test_utils.rs) intentionally left untouched — their .unwrap() calls represent test-helper panic semantics by design (suite is called from test functions only).
  • Net result: workspace audit reports 0 production .unwrap() calls outside test infrastructure (down from ~178 pre-existing). All builds green: graphrag-core default + --features rograg + --features incremental, plus graphrag-cli, graphrag-server, graphrag wrapper.
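Two of the mechanical replacements above, sketched in isolation. The helper names are hypothetical; the replacement expressions are the ones listed in the sweep.

```rust
use std::cmp::Ordering;
use std::sync::Mutex;

/// Compares two scores without panicking when either one is NaN:
/// `partial_cmp` returns None for NaN, which is mapped to `Equal`.
fn cmp_scores(a: f32, b: f32) -> Ordering {
    a.partial_cmp(&b).unwrap_or(Ordering::Equal)
}

/// Sorts scores in descending order using the NaN-tolerant comparator.
fn sort_desc(scores: &mut [f32]) {
    scores.sort_by(|a, b| cmp_scores(*b, *a));
}

/// Lock acquisition with an explicit message in place of a bare unwrap.
fn read_counter(counter: &Mutex<u64>) -> u64 {
    *counter.lock().expect("lock poisoned")
}
```

Treating NaN as equal to everything removes the unwrap panic but does not give a total order; where strict ordering matters, the standard library's f32::total_cmp is an alternative.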

Changed

Module split: retrieval/types.rs extracted (2026-05-16) — Part of refactor-2026-05 Phase 4 (final)

  • Extracted RetrievalConfig, SearchResult, ResultType, QueryAnalysis, QueryType, QueryIntent, QueryAnalysisResult, QueryResult, RetrievalStatistics (+ its print impl) from graphrag-core/src/retrieval/mod.rs into the new private module graphrag-core/src/retrieval/types.rs (199 LOC).
  • retrieval/mod.rs shrinks 1851 → 1666 LOC; the public API is preserved via pub use types::*; so crate::retrieval::SearchResult etc. resolve unchanged.
  • Restored one stripped doc comment (/// Statistics about the retrieval system) on RetrievalStatistics to satisfy #![warn(missing_docs)] — the sed extraction had eaten the line during slicing.
  • This was the last remaining Phase 4 item from the plan. Build + clippy clean (per the feedback-verify-with-build-clippy policy).
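The re-export that keeps paths stable after this kind of extraction can be sketched with inline modules standing in for the files. The SearchResult fields and the top_result function are hypothetical, reduced to what the example needs.

```rust
mod retrieval {
    // Private module, as after the extraction into retrieval/types.rs.
    mod types {
        /// Reduced stand-in; the real struct has different fields.
        #[derive(Debug, Clone, PartialEq)]
        pub struct SearchResult {
            pub id: String,
            pub score: f32,
        }
    }

    // The glob re-export keeps `retrieval::SearchResult` resolving
    // exactly as it did before the types moved.
    pub use types::*;

    /// Hypothetical function left in `mod.rs` that uses the moved type.
    pub fn top_result() -> SearchResult {
        SearchResult {
            id: "chunk-0".to_string(),
            score: 1.0,
        }
    }
}
```

Callers keep writing retrieval::SearchResult; the types module itself stays private and never appears in a public path.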

Sub-split: graphrag/ directory module (2026-05-16) — Part of refactor-2026-05 Phase 4

  • Follow-up to the earlier graphrag.rs single-file move. The 1753-LOC graphrag-core/src/graphrag.rs is now a directory module graphrag-core/src/graphrag/ with per-concern sub-files:
    • mod.rs (~105 LOC): struct GraphRAG, sub-module declarations, private ensure_initialized helper (bumped fn → pub(super) fn so the sibling impl blocks can call it), #[cfg(test)] mod tests block with the two pre-existing tests.
    • lifecycle.rs (~189 LOC): new, default_local, builder, initialize, try_load_from_workspace, save_to_workspace, clear_graph.
    • documents.rs (~53 LOC): add_document_from_text, add_document.
    • build.rs (~715 LOC): async + sync build_graph paired methods.
    • ask.rs (~519 LOC, renamed from query.rs to avoid clash with use crate::query for the planner module): ask, ask_with_reasoning, ask_explained, query_internal, query_internal_with_results, generate_semantic_answer_from_results, remove_thinking_tags, ask_with_pagerank pair.
    • stats.rs (~85 LOC): config, is_initialized, has_documents, has_graph, knowledge_graph, knowledge_graph_mut, get_entity, get_entity_relationships, get_chunk.
    • factory.rs (~202 LOC): from_json5_file, from_config_file, from_config_and_document, quick_start, quick_start_with_config.
  • Each sub-file has its own impl GraphRAG { ... } block; Rust allows multiple impl blocks across files. All sub-files share an identical kitchen-sink import header (Config, core types, critic, ollama, persistence, query, retrieval, feature-gated parallel, plus use super::GraphRAG).
  • Public API preserved: graphrag_core::GraphRAG resolves via lib.rs’s pub use graphrag::GraphRAG; (unchanged from the single-file pass).
  • Verified per the new policy: cargo build -p graphrag-core + downstream crates green; cargo clippy -p graphrag-core -- -D warnings shows exactly one error in the new files (graphrag/ask.rs:408 clamp pattern) which is a verbatim carry-over from the previous graphrag.rs:1358 (originally lib.rs:1594) — net new errors: zero. Tests not re-run (pure file move; see feedback-verify-with-build-clippy memory entry).
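The layout described above, one struct with its impl blocks spread over sub-modules, can be sketched with inline modules in place of the sub-files. The fields and method bodies are hypothetical; the method names and the pub(super) visibility of ensure_initialized follow the entry.

```rust
mod graphrag {
    /// Reduced stand-in for the real struct; the fields are assumptions.
    pub struct GraphRAG {
        initialized: bool,
        documents: Vec<String>,
    }

    impl GraphRAG {
        pub fn new() -> Self {
            GraphRAG { initialized: false, documents: Vec::new() }
        }

        /// Shared guard called from the impl blocks in the sub-modules.
        pub(super) fn ensure_initialized(&self) -> Result<(), String> {
            if self.initialized {
                Ok(())
            } else {
                Err("GraphRAG not initialized".to_string())
            }
        }
    }

    // Stands in for lifecycle.rs.
    mod lifecycle {
        use super::GraphRAG;

        impl GraphRAG {
            pub fn initialize(&mut self) {
                self.initialized = true;
            }
        }
    }

    // Stands in for documents.rs.
    mod documents {
        use super::GraphRAG;

        impl GraphRAG {
            /// Stores the text and returns the new document count.
            pub fn add_document_from_text(&mut self, text: &str) -> Result<usize, String> {
                self.ensure_initialized()?;
                self.documents.push(text.to_string());
                Ok(self.documents.len())
            }
        }
    }
}
```

Because every impl block targets the same type, callers see one flat set of methods on GraphRAG regardless of which file defines them.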

God-file split: graph/incremental/ directory module (2026-05-16) — Part of refactor-2026-05 Phase 4

  • Converted graphrag-core/src/graph/incremental.rs (2905 LOC — the biggest god-file in the crate) into a directory module graphrag-core/src/graph/incremental/ with focused sub-files:
    • mod.rs (~395 LOC): doc + sub-module declarations + pub use re-exports + verbatim #[cfg(test)] mod tests block + the kitchen-sink use import block the tests rely on via super::*.
    • types.rs (~465 LOC): UpdateId, TransactionId, ChangeRecord, ChangeType, Operation, ChangeData, Document, GraphDelta, DeltaStatus, RollbackData, ConflictStrategy, Conflict, ConflictType, ConflictResolution, the IncrementalGraphStore trait, GraphStatistics, ConsistencyReport, InvalidationStrategy, CacheRegion.
    • helpers.rs (~496 LOC): SelectiveInvalidation, ConflictResolver, UpdateMonitor + impls + their satellite types (InvalidationStats, UpdateMetric, OperationLog, PerformanceStats).
    • manager.rs (~898 LOC): IncrementalGraphManager (both feature-gated and non-gated paired definitions kept adjacent), IncrementalConfig, IncrementalStatistics, IncrementalPageRank, BatchProcessor, PendingBatch, BatchMetrics, plus the impl GraphRAGError convenience constructors that conceptually belong here.
    • store.rs (~743 LOC): ProductionGraphStore + Transaction + TransactionStatus + IsolationLevel + ChangeEvent + ChangeEventType + impl IncrementalGraphStore for ProductionGraphStore + ChangeDataExt trait & impl.
  • Public API preserved via pub use cascade in mod.rs (crate::graph::incremental::* resolves unchanged).
  • Visibility-only bumps to keep the shared test module compiling across the new sub-module boundary:
    • IncrementalPageRank.scores: field → pub(super) field
    • ConflictResolver.strategy: field → pub(super) field
    • ConflictResolver::merge_entities: fn → pub(super) fn
  • Verification strategy update (per user request): switched from cargo test --features incremental (which surfaces many pre-existing unrelated failures and obscures the signal we care about) to cargo build --features incremental + cargo clippy --features incremental -- -D warnings. The clippy run reports 34 errors, all in pre-existing files outside the split (graphrag.rs, retrieval/, text/, monitoring/, etc.); zero new errors in graph/incremental/. Downstream crates (graphrag-cli, graphrag-server, graphrag) build clean.

Module split: config/json_parser.rs extracted (2026-05-16) — Part of refactor-2026-05 Phase 4

  • Extracted Config::from_file (~553 LOC hand-rolled JSON reader using the json crate) and Config::to_file (~200 LOC writer) from graphrag-core/src/config/mod.rs into the new private module graphrag-core/src/config/json_parser.rs (769 LOC, with imports + impl Config { ... } wrapper).
  • config/mod.rs shrinks 2491 → 1737 LOC. Public API unchanged: both methods are still reachable as Config::from_file / Config::to_file via the new impl Config block (multiple impl blocks across files compile fine).
  • Distinct from config::json5_loader (serde-based typed JSON5 loader) and config::loader (multi-format dispatcher) — this is the bespoke json crate path.
  • 371 unit tests pass; 12 pre-existing failures unchanged.

God-file split: rograg/logic_form/ directory module (2026-05-16) — Part of refactor-2026-05 Phase 4

  • Converted graphrag-core/src/rograg/logic_form.rs (1517 LOC) into a directory module graphrag-core/src/rograg/logic_form/ with focused sub-files:
    • mod.rs (141 LOC): doc + sub-module declarations + pub use re-exports + verbatim #[cfg(test)] mod tests block.
    • types.rs (333 LOC): LogicFormError, LogicFormQuery, Predicate, Argument, ArgumentType, Constraint, ConstraintType, LogicQueryType, LogicFormResult, VariableBinding, LogicExecutionStats.
    • parser.rs (240 LOC): LogicFormParser trait + PatternBasedParser + LogicPattern + ArgumentExtractor + impls.
    • executor.rs (673 LOC): LogicFormExecutor + impls.
    • retriever.rs (217 LOC): LogicFormRetriever struct + Default + impl.
  • Public API preserved via pub use cascade through both logic_form/mod.rs and rograg/mod.rs (crate::rograg::LogicFormResult, crate::rograg::LogicFormRetriever, etc. still resolve unchanged).
  • Single non-mechanical change: bumped LogicFormExecutor::calculate_name_similarity from private fn to pub(super) fn — the existing test_name_similarity test in the shared tests module needs cross-submodule access. Visibility-only adjustment; no behavior or signature change.
  • Pre-existing test failures (test_logic_form_retrieval, test_pattern_parser) remain unchanged (verified by re-running them on main before the split).

God-file split: graphrag-core/src/graphrag.rs (2026-05-16) — Part of refactor-2026-05 Phase 4

  • Extracted the pub struct GraphRAG and its single impl GraphRAG { ... } block (constructors, lifecycle, build_graph, ask*, query_internal*, generate_semantic_answer_from_results, remove_thinking_tags, getters, factory methods, ensure_initialized, tests) from graphrag-core/src/lib.rs into the new private module file graphrag-core/src/graphrag.rs.
  • lib.rs is now a 263-LOC re-export shell (mod graphrag; pub use graphrag::GraphRAG;). graphrag.rs is 1753 LOC (header + verbatim impl + moved #[cfg(test)] mod tests).
  • Public API is preserved: graphrag_core::GraphRAG and graphrag_core::prelude::GraphRAG resolve through the new re-export with identical paths.
  • Added module-scoped imports at the top of graphrag.rs (Config, core types, critic, ollama, persistence, query, retrieval, feature-gated parallel) so the impl body compiles verbatim without inline path changes.
  • Both moved tests (test_graphrag_creation, test_builder_pattern) still pass. All other pre-existing test/doc failures remain unchanged (12 unit tests, 7 doctests).
  • Sub-splitting the impl across graphrag/{lifecycle,documents,build,query,stats}.rs remains deferred to a follow-up — single-file move first per plan.

Module split: retrieval/explained.rs (2026-05-16) — Part of refactor-2026-05 Phase 4

  • Extracted ExplainedAnswer, SourceReference, SourceType, ReasoningStep (and the ~160 LOC impl ExplainedAnswer block with from_results + format_display) from graphrag-core/src/retrieval/mod.rs into new graphrag-core/src/retrieval/explained.rs.
  • Public API preserved via pub use explained::* in retrieval/mod.rs — downstream callers see no change.
  • Net effect: retrieval/mod.rs shrinks from 2094 LOC → 1851 LOC; new explained.rs is 250 LOC.
  • Replaced legacy .min(1.0).max(0.0) with idiomatic .clamp(0.0, 1.0) in the moved from_results fn (clippy manual_clamp).
  • Larger god-file splits (lib.rs 1968 LOC, logic_form.rs 1517, incremental.rs 2905, config/mod.rs JSON loader) remain deferred — see plan file.
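For reference, the two clamping forms mentioned above can be compared side by side. The function names are hypothetical; only the method chains come from the entry.

```rust
/// Legacy form, flagged by clippy's manual_clamp lint.
fn legacy_unit(score: f32) -> f32 {
    score.min(1.0).max(0.0)
}

/// Idiomatic replacement.
fn clamped_unit(score: f32) -> f32 {
    score.clamp(0.0, 1.0)
}
```

For ordinary finite scores the two agree. They differ only for NaN input: f32::min returns the non-NaN operand, so the legacy chain yields 1.0, while f32::clamp returns NaN.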

Fixed

Production unwrap removal (2026-05-16) — Part of refactor-2026-05 Phase 3

  • rograg/streaming.rs: regex unwrap() → expect("static regex literal"); three partial_cmp(...).unwrap() calls on f32 confidence scores now use unwrap_or(Ordering::Equal) to avoid panics on NaN.
  • rograg/processor.rs::RogragProcessorBuilder::build: replaced inner .unwrap() on HybridQueryDecomposer::new() and IntentClassifier::new() with ? propagation; SystemTime::duration_since(UNIX_EPOCH).unwrap() → .expect("system clock before UNIX epoch") (genuine programmer-bug case).
  • graphrag-server/src/qdrant_store.rs: removed 6 production .unwrap() calls in add_document, add_documents_batch, and search — payload .as_object(), serde_json::to_value, serde_json::from_value, and point.id now propagate QdrantError via ? and Result::collect.
  • Tests-only unwrap() in vector/voy_store.rs and graphrag-cli/src/config.rs left intact (per Phase 3 scope: production paths only).
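The NaN-safe comparison used for confidence scores can be sketched as follows. The helper names are hypothetical; only the partial_cmp(...).unwrap_or(Ordering::Equal) pattern is taken from the entry above.

```rust
use std::cmp::Ordering;

/// Comparator for a descending sort of confidence scores.
/// partial_cmp returns None when either side is NaN; treating that case
/// as Equal avoids the panic that unwrap() would trigger.
fn by_confidence_desc(a: f32, b: f32) -> Ordering {
    b.partial_cmp(&a).unwrap_or(Ordering::Equal)
}

/// Sort scores from highest to lowest.
fn sort_scores_desc(scores: &mut [f32]) {
    scores.sort_by(|a, b| by_confidence_desc(*a, *b));
}
```

Treating NaN as equal to everything does not give a true total order, so where NaN scores can actually occur, f32::total_cmp is an alternative worth considering.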

Added - GLiNER-Relex Extraction via gline-rs (2026-02-23)

GLiNER-Relex Entity + Relation Extractor (entity/gliner_extractor.rs, config/mod.rs, config/setconfig.rs, lib.rs)

  • New GLiNERExtractor: joint entity + relation extraction in a single forward pass via gline-rs v1.0.1 + ONNX Runtime. ~1.5 GB VRAM vs 8+ GB for generative LLMs; zero structural hallucinations.
  • Two-stage pipeline: NER (SpanPipeline or TokenPipeline) → RE (RelationPipeline), both composed on the same orp::model::Model with lazy loading via Arc<RwLock<Option<Model>>>.
  • Confidence scores propagated natively into Entity.confidence and Relationship.confidence.
  • Optional feature flag gliner: crate compiles and works normally without it.
  • tokio::task::spawn_blocking wrapper in lib.rs keeps the async runtime unblocked.
  • Config example (JSON5):
    gliner: {
      enabled: true,
      model_path: "./models/gliner-relex-large-v0.5.onnx",
      entity_labels: ["person", "organization", "location"],
      relation_labels: ["controls", "located in", "causes"],
      entity_threshold: 0.40,
      relation_threshold: 0.50,
      mode: "span",   // or "token" for gliner-multitask
      use_gpu: false,
    }
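The lazy-loading arrangement around Arc<RwLock<Option<Model>>> can be sketched with the standard library alone. Everything below is an assumption made for illustration: Model is a stand-in for the ONNX model type, LazyModel and with_model are hypothetical names, and the load counter exists only to make the behaviour observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Stand-in for the real model type (hypothetical).
struct Model {
    path: String,
}

/// Holds a model that is loaded on first use.
struct LazyModel {
    path: String,
    slot: Arc<RwLock<Option<Model>>>,
    loads: AtomicUsize,
}

impl LazyModel {
    fn new(path: &str) -> Self {
        Self {
            path: path.to_string(),
            slot: Arc::new(RwLock::new(None)),
            loads: AtomicUsize::new(0),
        }
    }

    /// Run `f` against the model, loading it first if necessary.
    fn with_model<R>(&self, f: impl FnOnce(&Model) -> R) -> R {
        // Fast path: a shared read lock is enough once the model exists.
        {
            let guard = self.slot.read().expect("model lock poisoned");
            if let Some(model) = guard.as_ref() {
                return f(model);
            }
        }
        // Slow path: take the write lock and re-check, because another
        // thread may have loaded the model in the meantime.
        let mut guard = self.slot.write().expect("model lock poisoned");
        if guard.is_none() {
            self.loads.fetch_add(1, Ordering::SeqCst);
            *guard = Some(Model { path: self.path.clone() });
        }
        f(guard.as_ref().expect("model was just initialized"))
    }

    /// Number of times the model was actually loaded.
    fn load_count(&self) -> usize {
        self.loads.load(Ordering::SeqCst)
    }
}
```

With this shape, constructing the extractor costs nothing; the model is read from disk only when the first extraction call arrives, and later calls reuse it.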
    

Added - Graph Persistence / Storage Choice (2026-02-23)

Storage Backend — In-Memory vs Disk (config/mod.rs, config/setconfig.rs, lib.rs)

  • AutoSaveConfig (and AutoSaveSetConfig in SetConfig) now expose:
    • base_dir: Option<String> — directory where workspace folders are stored (e.g. "./output")
    • workspace_name: Option<String> — sub-folder inside base_dir (default: "default")
    • enabled: bool - false (default) = in-memory only; true = persist to disk
  • GraphRAG::initialize() now calls try_load_from_workspace(): if auto_save.enabled = true and the workspace already exists on disk, the graph is loaded from disk instead of starting empty. The second run reuses the previously built graph automatically.
  • GraphRAG::save_to_workspace() — new public method; also called automatically at the end of build_graph() when persistence is enabled.
  • No-op when enabled = false; zero performance cost for in-memory-only deployments.
  • Format hierarchy on disk: Parquet (if persistent-storage feature) → JSON fallback (always).
  • JSON5 config usage:
    auto_save: {
      enabled: true,
      base_dir: "./output",
      workspace_name: "my_project",
    }
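How the three auto_save fields could combine into a workspace directory is sketched below. The struct mirrors the documented fields, but the workspace_dir function and the choice to return None when base_dir is unset are assumptions, not the crate's behaviour.

```rust
use std::path::PathBuf;

/// Mirror of the documented auto_save fields (sketch only).
struct AutoSave {
    enabled: bool,
    base_dir: Option<String>,
    workspace_name: Option<String>,
}

/// Resolve the on-disk workspace folder, or None for in-memory operation.
fn workspace_dir(cfg: &AutoSave) -> Option<PathBuf> {
    if !cfg.enabled {
        return None; // persistence disabled: nothing is read or written
    }
    // Assumption: without a base_dir there is nowhere to persist to.
    let base = cfg.base_dir.as_deref()?;
    // The documented default workspace name is "default".
    let name = cfg.workspace_name.as_deref().unwrap_or("default");
    Some(PathBuf::from(base).join(name))
}
```

With the JSON5 example above, this resolves to ./output/my_project.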
    

Fixed - Extraction Temperature (2026-02-23)

Zero-Temperature Entity Extraction (entity/gleaning_extractor.rs, entity/llm_extractor.rs, config/setconfig.rs)

  • GleaningConfig::default() and LLMEntityExtractor::new() now use temperature: 0.0 (was 0.1)
    • Fully deterministic JSON output — eliminates spurious token variation that causes parse failures
    • Consistent with recommendations for structured extraction models (NuExtract, Triplex, etc.)
  • EntityExtractionConfig.temperature in SetConfig now defaults via default_extraction_temperature() = 0.0
    • Separate from default_temperature() = 0.1 used for general LLM parameters
    • Users can override in JSON5: entity_extraction.temperature = 0.0
  • ContextualEnricher retains 0.1 (generates natural language descriptions, not strict JSON)

Fixed & Improved - Entity Extraction, Query Quality & Sources (2026-02-23)

SetConfig use_gleaning Bug Fix (config/setconfig.rs)

  • Bug: when mode.approach = "semantic" with no semantic: sub-section, the else block hardcoded config.entities.use_gleaning = true regardless of the top-level entity_extraction.use_gleaning field
  • Fix: the else block now reads from self.entity_extraction.use_gleaning and max_gleaning_rounds directly
  • This affected ALL JSON5 configs using mode.approach = "semantic" without an explicit semantic: block

LLM Single-Pass Entity Extraction (lib.rs, entity/llm_extractor.rs, ollama/mod.rs)

  • New LLM single-pass path in lib.rs: ollama.enabled && !use_gleaning now uses LLMEntityExtractor instead of falling through to pattern-based regex extraction
  • Dynamic num_ctx per chunk: (prompt_tokens + max_output_tokens) × 1.20, rounded to 1024, clamped [4096, 131072] — mirrors the ContextualEnricher formula
  • LLMEntityExtractor now carries keep_alive: Option<String> and with_keep_alive() builder
  • call_llm_with_retry and call_llm_completion_check use generate_with_params instead of generate() to pass num_ctx and keep_alive — activates Ollama KV cache during entity extraction
  • GleaningEntityExtractor::new extracts keep_alive before consuming the client and threads it through
  • OllamaClient::config() getter added for field access without moving
  • Result on Symposium (274 chunks, mistral-nemo, no gleaning): 1,139 entities, 670 relationships (vs 0 relationships previously due to pattern-based fallback)
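The per-chunk num_ctx formula above can be worked through in a few lines. This is a sketch under stated assumptions: the function name is hypothetical, and rounding up to the next multiple of 1024 is a guess, since the entry only says "rounded to 1024".

```rust
const MIN_CTX: u64 = 4096;
const MAX_CTX: u64 = 131_072;

/// (prompt_tokens + max_output_tokens) x 1.20, rounded to a multiple of
/// 1024 (upward, by assumption), then clamped to [4096, 131072].
fn dynamic_num_ctx(prompt_tokens: u64, max_output_tokens: u64) -> u64 {
    let needed = prompt_tokens.saturating_add(max_output_tokens);
    // 20% safety margin, in integer arithmetic.
    let with_margin = needed.saturating_mul(120) / 100;
    // Round up to the next multiple of 1024.
    let rounded = with_margin.saturating_add(1023) / 1024 * 1024;
    rounded.clamp(MIN_CTX, MAX_CTX)
}
```

For a 3,000-token prompt with a 1,000-token output budget this gives 4,000 × 1.2 = 4,800, which rounds up to 5,120. Very small inputs fall back to the 4,096 floor and very large ones are capped at 131,072.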

JSON Parse Resilience — Missing description Field (entity/prompts.rs)

  • EntityData.description is now annotated #[serde(default)]
  • When the LLM returns JSON with a missing description field (e.g. for Project Gutenberg license chunks), parsing succeeds with an empty string instead of falling through to the error path and losing all entities from that chunk
  • Fixes the "JSON repair failed: missing field 'description'" errors seen in the last ~10 chunks of Project Gutenberg books

Multi-Chunk Semantic Answer Generation (lib.rs, handlers/bench.rs)

  • generate_semantic_answer_from_results: reworked context assembly
    • Removed 400-char truncation: full chunk content is now passed to the LLM for each result
    • Deduplication: tracks seen chunk IDs to avoid repeating the same chunk from multiple entity hits
    • Relevance sorting: context sections sorted by score descending before joining
    • Synthesis prompt: updated instructions to ask the LLM to synthesize across ALL context sections
    • Dynamic num_ctx: prompt size calculated at runtime with 20% margin — activates KV cache for answering
    • generate_with_params used instead of generate() — passes num_ctx, keep_alive, temperature
  • bench.rs: switched from graphrag.ask() to graphrag.ask_explained()
    • sources in the JSON output now populated with actual chunk IDs and excerpts (was always [])
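The context assembly steps listed above (sort by relevance, deduplicate by chunk ID, join full chunk text) can be sketched as follows. The Hit struct, the separator, and the choice to keep the highest-scoring occurrence of a duplicated chunk are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// One retrieval result pointing at a chunk (hypothetical shape).
struct Hit {
    chunk_id: String,
    score: f32,
    content: String,
}

/// Build the LLM context from retrieval hits.
fn assemble_context(mut hits: Vec<Hit>) -> String {
    // Relevance sorting: highest score first.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    // Deduplication: a chunk reached through several entities appears once.
    let mut seen = HashSet::new();
    hits.into_iter()
        .filter(|h| seen.insert(h.chunk_id.clone()))
        .map(|h| h.content) // full content, no truncation
        .collect::<Vec<_>>()
        .join("\n\n---\n\n")
}
```

Sorting before deduplicating means the surviving copy of a repeated chunk is positioned by its best score.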

E2E Config — No-Gleaning Mistral Pipeline

  • New config tests/e2e/configs/kv_no_gleaning_mistral__symposium.json5
    • use_gleaning: false, keep_alive: "1h", chunk_size: 1000, chunk_overlap: 200
    • Uses mistral-nemo:latest for entity extraction and nomic-embed-text for embeddings

Added - Ollama KV Cache & Contextual Retrieval (2026-02-22)

Ollama KV Cache Parameters (ollama/mod.rs, config/mod.rs, config/setconfig.rs)

  • keep_alive field added to OllamaConfig and OllamaGenerationParams
    • Keeps the Ollama model loaded in VRAM between requests (prevents KV cache eviction)
    • Critical for multi-chunk document processing: without it, the model unloads between each chunk
    • Default: None (uses Ollama’s built-in 5-minute default)
    • Example: "1h" for book-length document processing sessions
  • num_ctx field added to OllamaConfig and OllamaGenerationParams
    • Explicitly sets the context window size (Ollama silently truncates to 2k-8k without this)
    • Goes into the options object in Ollama API requests; keep_alive is a top-level field
    • Default: None (uses Ollama’s default, usually 2048-8192 tokens)
    • Example: 32768 for documents up to ~130k characters
  • Both fields wired through the full config stack: JSON5 parser, OllamaSetConfig, request body
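The placement of the two fields in the request body (keep_alive at the top level, num_ctx inside options) can be illustrated with a hand-built JSON string. This is only a sketch: the function names are hypothetical, and real code would normally serialize with a JSON library rather than format strings.

```rust
/// Quote and escape a string for embedding in JSON.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build a body for Ollama's generate endpoint.
fn generate_request_body(
    model: &str,
    prompt: &str,
    keep_alive: Option<&str>,
    num_ctx: Option<u32>,
) -> String {
    let mut fields = vec![
        format!("\"model\":{}", json_string(model)),
        format!("\"prompt\":{}", json_string(prompt)),
        "\"stream\":false".to_string(),
    ];
    // keep_alive is a top-level field.
    if let Some(ka) = keep_alive {
        fields.push(format!("\"keep_alive\":{}", json_string(ka)));
    }
    // num_ctx belongs inside the nested options object.
    if let Some(n) = num_ctx {
        fields.push(format!("\"options\":{{\"num_ctx\":{}}}", n));
    }
    format!("{{{}}}", fields.join(","))
}
```

When both values are None the fields are omitted entirely, which leaves Ollama's own defaults in effect.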

Contextual Chunk Enricher (text/contextual_enricher.rs)

  • New module implementing Anthropic’s Contextual Retrieval pattern
  • ContextualEnricher: augments each chunk with 2-3 sentences of document-level context before embedding
  • KV Cache optimization: static prefix (full document) is cached by Ollama; only the chunk suffix is re-evaluated per request
    • First chunk: ~2 min (loads document into KV cache on RTX 4070 with Mistral-NeMo 12B)
    • Subsequent chunks: ~3-5 sec each (only chunk tokens evaluated)
    • ~100 chunks from a 45k-token book: 5-10 minutes total vs hours without KV cache
  • calculate_num_ctx(): dynamic context window calculation per document
    • Formula: tokens(instructions) + tokens(document) + tokens(largest_chunk) + output_budget + 5% margin
    • Rounded to nearest 1024, clamped to [4096, 131072]
  • enrich_document_chunks() and enrich_chunks(): async, groups chunks by source document
  • Output format: [LLM context]\n\n[original chunk text] — preserves original text verbatim
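The reason the KV cache helps is that every prompt for the same document starts with identical text. The sketch below illustrates that layout and the documented output format; the prompt wording, tag names, and function names are all hypothetical.

```rust
/// Put the document first so that the prompts for all chunks of the same
/// document share an identical prefix (the part a KV cache can reuse).
fn enrichment_prompt(document: &str, chunk: &str) -> String {
    format!(
        "<document>\n{document}\n</document>\n\n<chunk>\n{chunk}\n</chunk>\n\n\
         Write 2-3 sentences situating this chunk within the document."
    )
}

/// Documented output format: LLM context, blank line, original chunk text.
fn enriched_chunk(llm_context: &str, chunk: &str) -> String {
    format!("{}\n\n{}", llm_context.trim(), chunk)
}

/// Length in bytes of the common prefix of two prompts.
fn shared_prefix_len(a: &str, b: &str) -> usize {
    a.bytes().zip(b.bytes()).take_while(|(x, y)| x == y).count()
}
```

Two prompts built for different chunks of one document share at least the whole document as a prefix, so only the short chunk-specific tail has to be evaluated again.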

Late Chunking Strategy (text/late_chunking.rs)

  • New LateChunkingStrategy implementing ChunkingStrategy trait (Jina AI technique)
  • Produces chunks annotated with position_in_document metadata (byte spans) for post-hoc pooling
  • JinaLateChunkingClient: calls Jina Embeddings API v2 with late_chunking: true
  • split_into_sections(): handles documents exceeding model context window (8192 tokens for Jina v3)
  • LateChunkingConfig: configurable chunk size, overlap, max document tokens, position annotation
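The post-hoc pooling that the byte spans enable can be sketched as mean pooling over token embeddings. This is an illustration of the general late chunking idea, not the crate's code: TokenEmbedding and pool_span are hypothetical, and token-level embeddings with byte offsets are assumed to be available.

```rust
/// A token-level embedding with its byte range in the document.
struct TokenEmbedding {
    start: usize,
    end: usize,
    vector: Vec<f32>,
}

/// Mean-pool the embeddings of all tokens lying inside a chunk's byte span.
/// Returns None when no token falls within the span.
fn pool_span(tokens: &[TokenEmbedding], span: (usize, usize)) -> Option<Vec<f32>> {
    let inside: Vec<&TokenEmbedding> = tokens
        .iter()
        .filter(|t| t.start >= span.0 && t.end <= span.1)
        .collect();
    let first = inside.first()?;
    let mut pooled = vec![0.0f32; first.vector.len()];
    for t in &inside {
        for (acc, v) in pooled.iter_mut().zip(&t.vector) {
            *acc += *v;
        }
    }
    let n = inside.len() as f32;
    for acc in &mut pooled {
        *acc /= n;
    }
    Some(pooled)
}
```

Because the token embeddings come from a pass over the whole document, each pooled chunk vector carries document-level context even though the chunk boundaries are applied afterwards.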

E2E Benchmark KV Cache Support (tests/e2e/run_benchmarks.sh)

  • Three new pipeline dimensions: keep_alive, num_ctx, ollama_timeout
  • All existing pipelines updated with explicit defaults (keep_alive=none, num_ctx=0)
  • Semantic/hybrid pipelines with Ollama now default to keep_alive=30m (model stays loaded during build phase)
  • Three new KV cache pipelines targeting long document processing:
    • kv_semantic_mistral: semantic approach, Mistral-NeMo, keep_alive=1h, num_ctx=32768, timeout=300s
    • kv_hybrid_mistral: hybrid approach, Mistral-NeMo, keep_alive=1h, num_ctx=32768, timeout=300s
    • kv_semantic_qwen3: semantic approach, Qwen3 8B Q4, keep_alive=1h, num_ctx=16384, timeout=300s
  • KV Cache settings shown in run header when active
  • Generated JSON5 configs include keep_alive and num_ctx in the ollama section

Tests

  • tests/contextual_enricher_e2e.rs: 4 tests for ContextualEnricher
    • test_enriched_chunk_contains_original_and_context (#[ignore], requires ENABLE_OLLAMA_TESTS=1)
    • test_kv_cache_speedup (#[ignore]) — measures per-chunk timing and speedup ratio
    • test_num_ctx_calculation_sanity — always-run, validates num_ctx formula bounds
    • test_disabled_enricher_returns_chunks_unchanged — always-run no-op safety check

Added - Service Registry Completion (2025-02-11)

Core Infrastructure

  • Complete test utilities module (core/test_utils.rs):
    • MockEmbedder: Deterministic hash-based embedding generation with dimension support
    • MockLanguageModel: Configurable response mapping for testing
    • MockVectorStore: In-memory vector store with cosine similarity search
    • MockRetriever: Simple retriever for testing search pipelines
    • All mocks fully implement core Async* traits
    • 100% test coverage with 5 passing test cases
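A deterministic hash-based embedder and a cosine similarity function, in the spirit of MockEmbedder and MockVectorStore, might look like the sketch below. The function names and the hashing scheme are assumptions; the real mocks may derive their vectors differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministic pseudo-embedding: the same text and dimension always
/// produce the same vector, with components in [-1, 1].
fn mock_embed(text: &str, dim: usize) -> Vec<f32> {
    (0..dim)
        .map(|i| {
            let mut hasher = DefaultHasher::new();
            text.hash(&mut hasher);
            i.hash(&mut hasher);
            let unit = hasher.finish() as f64 / u64::MAX as f64; // [0, 1]
            (unit * 2.0 - 1.0) as f32
        })
        .collect()
}

/// Cosine similarity; defined as 0.0 when either vector has zero length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

Determinism is what makes such mocks useful in tests: a search for a text that was stored earlier finds it with similarity 1.0, with no model or network involved.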

Adapter Implementations

  • Entity extraction adapter (core/entity_adapters.rs):

    • GraphIndexerAdapter bridges LightRAG’s GraphIndexer to AsyncEntityExtractor trait
    • Configurable confidence threshold filtering
    • Entity type conversion from domain-specific to core types
    • Batch extraction support
    • Feature-gated with lightrag feature
  • Retrieval system adapter (core/retrieval_adapters.rs):

    • RetrievalSystemAdapter implements AsyncRetriever trait
    • Integration with KnowledgeGraph-based retrieval
    • Batch search support
    • Comprehensive documentation on graph requirements
    • Feature-gated with basic-retrieval feature
  • Metrics collector implementation (monitoring/metrics_collector.rs):

    • Thread-safe metrics with DashMap for counters, gauges, and histograms
    • Atomic operations for zero-lock contention
    • Histogram statistics: count, sum, mean, min, max, p50, p95, p99
    • Timer support with start/finish API
    • Metric tagging with key-value pairs
    • 7/7 passing tests for all metric types
    • Feature-gated with dashmap and monitoring features
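The histogram statistics listed above can be computed from raw samples as sketched here. The nearest-rank percentile definition is an assumption (the entry does not say which method the collector uses), and the type and function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct HistogramStats {
    count: usize,
    sum: f64,
    mean: f64,
    min: f64,
    max: f64,
    p50: f64,
    p95: f64,
    p99: f64,
}

/// Nearest-rank percentile over a sorted, non-empty slice (pct in 0..=100).
fn percentile(sorted: &[f64], pct: usize) -> f64 {
    // rank = ceil(pct / 100 * n), computed in integers.
    let rank = (pct * sorted.len() + 99) / 100;
    sorted[rank.max(1) - 1]
}

/// Summarize recorded samples; None when nothing was recorded.
fn summarize(samples: &[f64]) -> Option<HistogramStats> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let count = sorted.len();
    let sum: f64 = sorted.iter().sum();
    Some(HistogramStats {
        count,
        sum,
        mean: sum / count as f64,
        min: sorted[0],
        max: sorted[count - 1],
        p50: percentile(&sorted, 50),
        p95: percentile(&sorted, 95),
        p99: percentile(&sorted, 99),
    })
}
```

For the samples 1 through 100, nearest-rank percentiles land exactly on 50, 95, and 99.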

Registry Integration

  • Service registration in ServiceConfig::build_registry():
    • Entity extractor registration (with lightrag feature)
    • Retriever registration (with basic-retrieval feature)
    • Metrics collector registration (with dashmap + monitoring features)
    • Mock services for testing via with_test_defaults()
    • Proper feature-gating for modular compilation
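A type-keyed registry is one common way to implement this kind of service registration. The sketch below is an assumption about the general pattern, not a copy of the crate's ServiceRegistry: it keys services by concrete type, whereas the real registry works with the Async* traits, and MockEmbedder here is a simplified stand-in.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Minimal type-keyed service registry.
#[derive(Default)]
struct ServiceRegistry {
    services: HashMap<TypeId, Arc<dyn Any + Send + Sync>>,
}

impl ServiceRegistry {
    /// Register a service, replacing any previous one of the same type.
    fn register<T: Any + Send + Sync>(&mut self, service: T) {
        self.services.insert(TypeId::of::<T>(), Arc::new(service));
    }

    /// Look up a service by type; None when it was never registered
    /// (for example because its feature flag is disabled).
    fn get<T: Any + Send + Sync>(&self) -> Option<Arc<T>> {
        self.services
            .get(&TypeId::of::<T>())
            .cloned()
            .and_then(|service| service.downcast::<T>().ok())
    }
}

/// Simplified stand-in for a registered service.
struct MockEmbedder {
    dimension: usize,
}
```

Callers ask for a service by type and handle None, which fits feature-gated builds where some services are simply absent.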

Documentation

  • Architectural documentation:

    • Documented trait hierarchy for vector stores (domain-specific vs generic)
    • Explained when to use adapters vs direct implementations
    • Clarified graph integration requirements for retrieval
    • Added TODO markers for future unification work
    • Inline examples in all adapter modules
  • Code quality improvements:

    • Removed unused imports across multiple modules
    • Fixed parameter name warnings in data import
    • Commented out incomplete vector-memory feature gate
    • Clean compilation with async,ollama,dashmap,monitoring,basic-retrieval,lightrag features

Testing

  • 310 tests passing in graphrag-core library
  • All new service implementations verified:
    • test_mock_embedder: Hash-based deterministic embeddings
    • test_mock_language_model: Response mapping
    • test_mock_vector_store: Cosine similarity search
    • test_mock_retriever: Basic search operations
    • Metrics collector tests: counters, gauges, histograms, timers
  • Integration tests for service registration and retrieval

Added - Ollama Advanced Integration (2025-02-11)

Streaming Support

  • Real-time token generation with tokio channel-based streaming
  • generate_streaming() method returns tokio::sync::mpsc::Receiver<String>
  • Server-Sent Events (SSE) parsing for Ollama streaming API
  • Background task spawning for non-blocking stream reads
  • Automatic statistics recording for streamed responses
  • Example usage in test suite (tests/ollama_enhancements.rs)
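The channel-based design can be illustrated with a standard-library analogue. Note the substitution: this sketch uses std::sync::mpsc and a thread in place of tokio::sync::mpsc and a spawned task, and it feeds pre-split tokens instead of reading an HTTP response. The function name is hypothetical.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hand tokens to the caller as they are produced. A background thread
/// plays the role of the task that reads the streaming response.
fn stream_tokens(tokens: Vec<String>) -> Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for token in tokens {
            // A send error means the receiver was dropped: stop early.
            if tx.send(token).is_err() {
                break;
            }
        }
        // Dropping tx here closes the channel and ends the stream.
    });
    rx
}
```

The caller consumes tokens with rx.iter() (or recv() in a loop), and the loop ends when the producer finishes and the sender is dropped.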

Custom Generation Parameters

  • OllamaGenerationParams struct for fine-grained control:
    • num_predict: Maximum tokens to generate
    • temperature: Sampling temperature (0.0 - 1.0)
    • top_p: Nucleus sampling threshold
    • top_k: Top-k sampling
    • stop: Stop sequences (array of strings)
    • repeat_penalty: Repetition control
  • generate_with_params() method for custom parameter usage
  • Integration with AsyncLanguageModel trait’s complete_with_params()
  • Automatic conversion between core and Ollama parameter formats

Model Response Caching

  • DashMap-based caching for thread-safe concurrent access
  • Automatic cache population on API responses
  • Cache hit detection before making API calls
  • Performance: <1ms for cache hits vs 100-1000ms for API calls
  • Cache management API:
    • clear_cache(): Clear all cached responses
    • cache_size(): Get number of cached items
  • Configurable via OllamaConfig.enable_caching (default: true)
  • 80%+ hit rate on repeated queries
  • 6x cost reduction potential
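The check-then-populate flow of the response cache can be sketched with a Mutex<HashMap> standing in for DashMap. CachedClient and its closure backend are hypothetical; only the enable_caching flag, clear_cache(), and cache_size() names come from the entries above.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Wraps a text-generation backend with a prompt-keyed response cache.
struct CachedClient<F: Fn(&str) -> String> {
    backend: F,
    cache: Mutex<HashMap<String, String>>,
    enable_caching: bool,
}

impl<F: Fn(&str) -> String> CachedClient<F> {
    fn new(backend: F, enable_caching: bool) -> Self {
        Self { backend, cache: Mutex::new(HashMap::new()), enable_caching }
    }

    fn generate(&self, prompt: &str) -> String {
        if self.enable_caching {
            // Cache hit detection before calling the backend.
            let cached = self.cache.lock().expect("cache lock poisoned").get(prompt).cloned();
            if let Some(hit) = cached {
                return hit;
            }
        }
        let response = (self.backend)(prompt);
        if self.enable_caching {
            // Populate the cache from the fresh response.
            self.cache
                .lock()
                .expect("cache lock poisoned")
                .insert(prompt.to_string(), response.clone());
        }
        response
    }

    fn cache_size(&self) -> usize {
        self.cache.lock().expect("cache lock poisoned").len()
    }

    fn clear_cache(&self) {
        self.cache.lock().expect("cache lock poisoned").clear();
    }
}
```

Keying on the prompt alone is a simplification; a cache shared across models or generation parameters would need those in the key as well.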

Metrics & Usage Tracking

  • OllamaUsageStats struct with atomic counters:
    • total_requests: Total number of API calls
    • successful_requests: Successful completions
    • failed_requests: Failed attempts
    • total_tokens: Cumulative token count (estimated)
  • Thread-safe atomic operations (Arc<AtomicU64>)
  • Zero lock contention for metrics updates
  • API methods:
    • record_success(tokens): Record successful request
    • record_failure(): Record failed request
    • get_success_rate(): Calculate success percentage (0.0 - 1.0)
  • Integration with AsyncLanguageModel::get_usage_stats()
  • Automatic token estimation (~4 characters per token)
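The counters and the success-rate calculation can be sketched with std atomics. The field and method names follow the list above, but the bodies are assumptions, including the choice to report 0.0 before any request and to round the token estimate up.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Lock-free usage counters (wrap in Arc to share across tasks).
#[derive(Default)]
struct UsageStats {
    total_requests: AtomicU64,
    successful_requests: AtomicU64,
    failed_requests: AtomicU64,
    total_tokens: AtomicU64,
}

impl UsageStats {
    /// Rough token estimate: about 4 characters per token, rounded up.
    fn estimate_tokens(text: &str) -> u64 {
        (text.chars().count() as u64 + 3) / 4
    }

    fn record_success(&self, tokens: u64) {
        self.total_requests.fetch_add(1, Ordering::Relaxed);
        self.successful_requests.fetch_add(1, Ordering::Relaxed);
        self.total_tokens.fetch_add(tokens, Ordering::Relaxed);
    }

    fn record_failure(&self) {
        self.total_requests.fetch_add(1, Ordering::Relaxed);
        self.failed_requests.fetch_add(1, Ordering::Relaxed);
    }

    /// Fraction of successful requests in [0.0, 1.0]; 0.0 before any request.
    fn get_success_rate(&self) -> f64 {
        let total = self.total_requests.load(Ordering::Relaxed);
        if total == 0 {
            return 0.0;
        }
        self.successful_requests.load(Ordering::Relaxed) as f64 / total as f64
    }
}
```

Each update is a single atomic add, so concurrent requests never wait on a lock to record their outcome.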

Service Registry Integration

  • Type-safe service injection for Ollama services
  • OllamaEmbedderAdapter implements AsyncEmbedder trait
  • OllamaLanguageModelAdapter implements AsyncLanguageModel trait
  • Automatic registration in ServiceConfig::build_registry()
  • Support for both embeddings and language model services
  • MemoryVectorStore registration for in-memory operations

Documentation

  • Complete OLLAMA_INTEGRATION.md guide with:
    • Setup and prerequisites
    • Basic and advanced usage examples
    • Supported models (embeddings and LLM)
    • Configuration options reference
    • Batch processing examples
    • Custom parameter examples
    • Performance tips and troubleshooting
  • Updated graphrag-core/README.md with new features
  • Updated main README.md with Ollama integration section
  • API reference with code examples
  • Sources and external documentation links

Testing

  • 8 new test cases in tests/ollama_enhancements.rs:
    • Config with caching test
    • Custom generation parameters test
    • Client statistics API test
    • Stats recording test
    • Cache management test
    • Default parameters test
    • Adapter integration tests
  • All tests passing (13/13 total including registry tests)
  • Compilation verified with all feature combinations

Configuration Updates

  • Added enable_caching: bool to OllamaConfig
  • Updated all OllamaConfig initializers across codebase:
    • config/mod.rs: TOML parsing
    • config/setconfig.rs: Config mapping
    • entity/llm_relationship_extractor.rs: LLM extraction
  • Default caching: enabled (true)

Changed

  • Model info updated: supports_streaming now returns true
  • AsyncLanguageModel implementation: Now uses generate_with_params() internally
  • OllamaClient structure: Added stats and cache fields
  • Error handling: Improved with metrics recording on failures
  • Test count: Increased from 214+ to 220+ test cases

Fixed

  • Missing enable_caching field in OllamaConfig initializers
  • Incorrect ModelUsageStats field mapping in adapter
  • Iterator reference error in execute_caused_query
  • Compilation warnings for unused imports

[0.1.1] - Previous Release

Added - Core GraphRAG Implementation

  • Temporal and causal reasoning for RoGRAG
  • Graph indexer with 23 relationship patterns
  • Service registry pattern for dependency injection
  • GraphRAGBuilder with fluent API
  • Parquet persistence for entities, relationships, documents
  • Memory vector store implementation
  • Complete trait-based architecture

Added - Research Features

  • LightRAG dual-level retrieval (6000x token reduction)
  • Leiden community detection (+15% modularity)
  • Cross-encoder reranking (+20% accuracy)
  • HippoRAG personalized PageRank (10-30x cost reduction)
  • Semantic chunking with better boundaries

Added - Infrastructure

  • Comprehensive test suite (214+ tests)
  • Production-grade logging with tracing
  • Feature flags for modular compilation
  • WASM support with WebGPU acceleration
  • Docker Compose deployment

[0.1.0] - Initial Release

Added

  • Basic GraphRAG pipeline
  • Entity and relationship extraction
  • Vector embeddings support
  • Graph construction and querying
  • REST API server
  • CLI tools

Migration Guides

Upgrading to Ollama Advanced Features

If you’re using basic Ollama integration, upgrading to the new features is seamless:

Before (still works):

// Run inside an async function that returns a Result.
let client = OllamaClient::new(OllamaConfig::default());
let response = client.generate("Hello").await?;

After (with new features):

// Run inside an async function that returns a Result.
let config = OllamaConfig {
    enable_caching: true,  // NEW: Enable caching
    ..Default::default()
};
let client = OllamaClient::new(config);

// Streaming: tokens arrive on a channel as they are generated
let mut rx = client.generate_streaming("Hello").await?;
while let Some(token) = rx.recv().await {
    print!("{}", token);
}

// Custom parameters: unset fields keep their defaults
let params = OllamaGenerationParams {
    temperature: Some(0.8),
    top_p: Some(0.95),
    ..Default::default()
};
let response = client.generate_with_params("Hello", params).await?;

// Metrics: get_success_rate() returns a fraction in 0.0 - 1.0
let stats = client.get_stats();
println!("Success rate: {:.2}%", stats.get_success_rate() * 100.0);

Breaking Changes

None! All new features are opt-in and backward compatible.


Development

Building from Source

git clone https://github.com/automataIA/graphrag-rs.git
cd graphrag-rs
cargo build --release --features async,ollama,dashmap

Running Tests

cargo test --all-features
cargo test -p graphrag-core --test ollama_enhancements

Contributing

See CONTRIBUTING.md for guidelines.


For complete documentation, see: