GraphRAG-RS
GraphRAG-RS is a modular, portable GraphRAG implementation written in Rust. It builds a knowledge graph from your documents — chunking, embeddings, entity and relationship extraction, community detection — and answers questions over that graph with citations.
The same core library runs natively and in the browser via WebAssembly, with a config-driven pipeline that scales from a zero-dependency pattern matcher to a full LLM-enriched extraction stack.
Why GraphRAG-RS
- One library, three personalities. Pattern-only (no LLM, < 10 ms/chunk), LLM + KV-cache
enrichment (Ollama), or a hybrid — selected at runtime from
Config, not at compile time. - Native + WASM.
graphrag-coreiscrate-type = ["rlib", "cdylib"]; the browser build uses a Voy vector store. - Turnkey.
cargo run -p graphrag-cli -- index ./docs.txtthenask "..."— zero config to start. - Modular crates. Use the core library, the TUI/CLI, the REST server, or the WASM bindings.
Where to go next
| If you want to… | Start here |
|---|---|
| Install and run your first query | Installation & Quick Start |
| Understand the pipeline | How It Works |
| Configure extraction & models | Configuration Guide |
| Browse the API | docs.rs/graphrag-core |
Overview
GraphRAG-RS is a 5-crate Cargo workspace. You pick the entry point that fits your deployment.
| Crate | Role |
|---|---|
graphrag-core | Core library — all GraphRAG logic. Native + WASM (rlib + cdylib). |
graphrag-cli | Turnkey TUI + CLI binary. In-process use of the core (no HTTP). |
graphrag-server | Actix-web REST API with OpenAPI + optional Qdrant. |
graphrag-wasm | Browser bindings (Voy vector store, WebLLM, ONNX). |
graphrag | Wrapper meta-crate that re-exports graphrag-core for the hello-world experience. |
The config-driven pipeline
The same code runs three ways, selected at runtime from Config — not at compile time:
- Pattern-only — no LLM, regex extractor, < 10 ms per chunk.
Config::default()works offline via hash-fallback embeddings. - LLM-enriched — Ollama with KV-cache reuse (
keep_alive+ dynamicnum_ctx) for higher-quality entity and relationship extraction. - Hybrid — selective LLM stages over a fast base pipeline.
See How It Works for the full 7-stage pipeline.
Deployment options
- Server — multi-tenant, GPU workloads, large corpora. Qdrant + Ollama. See graphrag-server.
- WASM (client-side) — privacy-first, offline, zero infrastructure. Full pipeline in the browser with ONNX embeddings and WebLLM synthesis. See graphrag-wasm.
- Embedded library — call
graphrag-coredirectly from your Rust app.
Prerequisites
- Rust 1.85+ (add the
wasm32-unknown-unknowntarget for WASM builds). - Ollama (optional) for LLM-quality extraction / real embeddings:
ollama pull nomic-embed-text. - Docker (optional) for the Qdrant vector database.
Continue to Installation & Quick Start.
Installation & Quick Start
CLI (turnkey, zero config)
cargo install --path graphrag-cli # one-time install
graphrag index ./mydoc.txt # builds ./graphrag-data
graphrag ask "What is the main topic?" # answers from the graph
Add --ollama to either command for LLM-quality entity extraction (requires ollama serve running
locally). With no flags, the CLI uses sensible defaults — hash-fallback embeddings, pattern-based
extraction, and a persistent workspace.
Run graphrag with no arguments for the interactive TUI, or graphrag setup for the config
wizard. See CLI & TUI Usage.
Library (Rust)
use graphrag::GraphRAG;
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let mut g = GraphRAG::quick_start("Plato's Symposium full text here...").await?;
println!("{}", g.ask("Who is Diotima?").await?);
Ok(())
}
For more control, use GraphRAG::builder() or Config::quick(workspace) with .with_ollama() /
.with_chunk_size(). The typical flow is:
Config::quick(workspace)orGraphRAG::builder().add_document(doc)→build_graph()(chunking → embeddings → entities → relationships → persist).ask(q)/ask_explained(q)/ask_with_reasoning(q).
System dependencies
| Platform | Install |
|---|---|
| Linux (Debian/Ubuntu) | sudo apt install -y build-essential pkg-config |
| macOS | xcode-select --install |
| Windows | Visual Studio Build Tools with C++ support |
For WASM builds: rustup target add wasm32-unknown-unknown and cargo install trunk wasm-bindgen-cli.
Optional services
ollama pull nomic-embed-text # local embeddings / LLM extraction
docker-compose up -d # Qdrant vector database (server mode)
Next: understand the pipeline or tune the configuration.
CLI & TUI Usage
{{#include ../../../docs/TUI_USAGE_GUIDE.md}}
How GraphRAG Works: A Complete Guide
Understanding the 7-Stage Pipeline from Document to Answer
What is GraphRAG?
GraphRAG (Graph-based Retrieval-Augmented Generation) is an intelligent system that transforms unstructured text into a knowledge graph and uses it to answer questions with unprecedented accuracy and context awareness.
Think of it like this:
Imagine a brilliant librarian who:
- Reads every book in the library
- Creates an interconnected index of people, places, concepts, and their relationships
- When you ask a question, uses this knowledge map to find relevant information
- Combines multiple sources to give you a comprehensive, contextual answer
That’s exactly what GraphRAG does, but at machine scale with scientific precision.
Why GraphRAG vs Traditional RAG?
| Feature | Traditional RAG | GraphRAG |
|---|---|---|
| Knowledge Storage | Flat vector chunks | Interconnected knowledge graph |
| Context Understanding | Semantic similarity only | Relationships + concepts + hierarchy |
| Multi-hop Reasoning | ❌ Limited | ✅ Natural via graph traversal |
| Token Efficiency | Baseline | 6000x reduction (LightRAG) |
| Accuracy | Good | 15% better (empirical studies) |
Configuration-Driven Dynamic Pipeline
GraphRAG-rs adapts its behavior based on your TOML configuration - the same codebase can run as:
- Fast, lightweight system (pattern-based, no LLM, <10ms processing)
- High-accuracy AI system (LLM-based, gleaning, contextual extraction)
- Hybrid approach (selective LLM use for critical stages)
All controlled by simple TOML settings - no code changes required.
How Configuration Changes the Pipeline
# Example 1: Fast, No-LLM Pipeline
[entity_extraction]
use_gleaning = false # ← Pattern-based extraction
[ollama]
enabled = false # ← No LLM required
# Result: <10ms entity extraction, good quality
# Example 2: High-Accuracy LLM Pipeline
[entity_extraction]
use_gleaning = true # ← LLM-based extraction
max_gleaning_rounds = 4 # ← 4 refinement passes
[ollama]
enabled = true
chat_model = "llama3.1:8b" # ← AI-powered extraction
# Result: 200-500ms entity extraction, excellent quality
Dynamic Stage Selection
The system automatically selects implementations based on config:
| Stage | Config Setting | Implementation | Performance |
|---|---|---|---|
| Text Chunking | chunk_size, chunk_overlap | Fixed/Adaptive | Always fast |
| Embeddings | embeddings.backend | Hash/Ollama/ONNX | Varies |
| Entity Extraction | use_gleaning + ollama.enabled | Pattern/LLM | 10ms vs 500ms |
| Relationships | extract_relationships | Pattern/LLM | Auto-selected |
| Retrieval | retrieval.strategy | Vector/BM25/Hybrid/PageRank | Varies |
| Generation | generation.backend | Mock/Ollama/WebLLM | Varies |
Logged during startup:
[INFO] Configuration loaded from: symposium_config.toml
[INFO] Using LLM-based entity extraction with gleaning (max_rounds: 4)
[INFO] Using Ollama embeddings: nomic-embed-text (768 dimensions)
[INFO] Using hybrid retrieval: vector (40%) + bm25 (30%) + pagerank (30%)
[INFO] Using Ollama generation: llama3.1:8b
Three Pipeline Approaches: Choose Your Strategy
GraphRAG-rs offers three distinct pipeline approaches, each optimized for different use cases and resource constraints. This approach-based architecture lets you explicitly choose your quality vs. speed trade-off.
The Three Approaches
┌─────────────────┬──────────────────┬─────────────────┐
│ SEMANTIC │ ALGORITHMIC │ HYBRID │
│ │ │ │
│ Neural/LLM │ Pattern-based │ Best of Both │
│ High Quality │ High Speed │ Balanced │
│ GPU Preferred │ CPU Only │ Moderate GPU │
└─────────────────┴──────────────────┴─────────────────┘
1. Semantic Pipeline (Neural/LLM-based)
Philosophy: Use deep learning and LLMs for maximum understanding and quality.
Technology Stack:
- Embeddings: Neural models (HuggingFace, OpenAI, Ollama)
- Entity Extraction: LLM-based with gleaning (iterative refinement)
- Retrieval: Vector similarity search (cosine similarity, HNSW)
- Graph Construction: Semantic relationships with PageRank
Configuration:
[mode]
approach = "semantic"
[semantic.embeddings]
backend = "huggingface"
model_name = "sentence-transformers/all-MiniLM-L6-v2"
[semantic.entity_extraction]
use_gleaning = true
max_gleaning_rounds = 3
llm_model = "llama3.1:8b"
[semantic.retrieval]
strategy = "vector_similarity"
use_hnsw_index = true
Performance:
- Quality: ★★★★★ (90-95% accuracy)
- Speed: ★★☆☆☆ (100-500 docs/sec)
- Resource: ★★★★★ (High: 4-8GB, GPU recommended)
Best For: Research papers, legal documents, philosophical texts, narrative fiction, nuanced content analysis.
2. Algorithmic Pipeline (Pattern-based)
Philosophy: Use traditional NLP and pattern matching for speed and deterministic behavior.
Technology Stack:
- Embeddings: Hash-based with TF-IDF weighting
- Entity Extraction: Pattern matching (regex, capitalization rules)
- Retrieval: BM25 keyword-based retrieval
- Graph Construction: Co-occurrence based relationships
Configuration:
[mode]
approach = "algorithmic"
[algorithmic.embeddings]
backend = "hash"
hash_size = 1024
use_tfidf_weighting = true
[algorithmic.entity_extraction]
use_gleaning = false
use_patterns = true
extract_capitalized = true
[algorithmic.retrieval]
strategy = "bm25"
bm25_k1 = 1.5
bm25_b = 0.75
Performance:
- Quality: ★★★☆☆ (70-85% accuracy)
- Speed: ★★★★★ (1000-5000 docs/sec)
- Resource: ★☆☆☆☆ (Low: 1-2GB, CPU only)
Best For: Large-scale processing, resource-constrained environments, real-time applications, structured data, privacy-sensitive systems (no external APIs).
3. Hybrid Pipeline (Combined)
Philosophy: Combine semantic and algorithmic approaches for balanced quality and performance.
Technology Stack:
- Embeddings: Dual (neural + hash-based)
- Entity Extraction: LLM + pattern fusion
- Retrieval: RRF (Reciprocal Rank Fusion) combining vector + BM25
- Graph Construction: Cross-validated relationships
Configuration:
[mode]
approach = "hybrid"
[hybrid.weights]
semantic_weight = 0.6
algorithmic_weight = 0.4
[hybrid.embeddings]
primary_backend = "huggingface"
secondary_backend = "hash"
fusion_strategy = "weighted"
[hybrid.entity_extraction]
use_gleaning = true
use_patterns = true
max_gleaning_rounds = 2
[hybrid.retrieval]
fusion_strategy = "rrf"
rrf_k = 60
Performance:
- Quality: ★★★★☆ (85-95% accuracy)
- Speed: ★★★☆☆ (200-1000 docs/sec)
- Resource: ★★★☆☆ (Medium: 3-4GB, moderate GPU)
Best For: Production systems, diverse query workloads, mixed document types, applications requiring both quality and efficiency.
How Approach Selection Works
The [mode] section in your TOML config controls the entire pipeline:
# Option 1: Semantic (high quality)
[mode]
approach = "semantic"
# Option 2: Algorithmic (high speed)
[mode]
approach = "algorithmic"
# Option 3: Hybrid (balanced)
[mode]
approach = "hybrid"
This single setting automatically configures:
- Which embedding implementation to use
- Whether to use LLM-based or pattern-based entity extraction
- Which retrieval strategy to employ
- How to construct graph relationships
Dynamic Pipeline Selection at Runtime:
#![allow(unused)]
fn main() {
// In src/lib.rs:346 - build_graph() method
// The system checks config.approach and selects implementations:
match config.approach.as_str() {
"semantic" => {
// Use LLM-based gleaning extraction
if config.entities.use_gleaning && config.ollama.enabled {
extract_entities_with_gleaning()
}
}
"algorithmic" => {
// Use pattern-based extraction
extract_entities_with_patterns()
}
"hybrid" => {
// Use both and fuse results
let llm_entities = extract_entities_with_gleaning();
let pattern_entities = extract_entities_with_patterns();
fuse_entity_results(llm_entities, pattern_entities)
}
}
}
Approach Comparison Matrix
| Aspect | Semantic | Algorithmic | Hybrid |
|---|---|---|---|
| Entity Extraction | LLM + gleaning (3-4 rounds) | Regex + capitalization | LLM + patterns (2 rounds) |
| Embeddings | Neural (HuggingFace/Ollama) | Hash + TF-IDF | Dual (neural + hash) |
| Retrieval | Vector similarity (HNSW) | BM25 keyword search | RRF fusion |
| Graph Relationships | Semantic similarity | Co-occurrence | Cross-validated |
| Processing Time | 500ms-1s per doc | 10-50ms per doc | 100-300ms per doc |
| Memory Usage | 4-8GB | 1-2GB | 3-4GB |
| GPU Required | Recommended | No | Optional |
| LLM Required | Yes (Ollama/OpenAI) | No | Yes (with fallback) |
| Accuracy | 90-95% | 70-85% | 85-95% |
| Best Use Case | Research, legal, literature | Large-scale, real-time | Production, general-purpose |
Quick Start by Approach
Semantic Pipeline:
cp config/templates/semantic_pipeline.toml my_config.toml
# Edit paths in my_config.toml
cargo run --example your_example -- my_config.toml
Algorithmic Pipeline:
cp config/templates/algorithmic_pipeline.toml my_config.toml
# Edit paths in my_config.toml
cargo run --example your_example -- my_config.toml
# No Ollama required!
Hybrid Pipeline:
cp config/templates/hybrid_pipeline.toml my_config.toml
# Edit paths in my_config.toml
cargo run --example your_example -- my_config.toml
For detailed configuration guide, see CONFIGURATION_GUIDE.md.
LazyGraphRAG & E2GraphRAG: Ultra-Efficient Approaches
New in 2025: Revolutionary approaches achieving 0.1% of traditional indexing cost while maintaining 90%+ quality.
Overview: Cost-Optimized GraphRAG
These cutting-edge implementations eliminate expensive LLM-based entity extraction during indexing:
┌──────────────────┬─────────────────┬────────────────┐
│ Traditional │ LazyGraphRAG │ E2GraphRAG │
│ GraphRAG │ │ │
│ │ │ │
│ LLM-based │ Concept-based │ Pattern-based │
│ High Cost │ 0.1% Cost │ 0.05% Cost │
│ 95% Quality │ 92% Quality │ 88% Quality │
└──────────────────┴─────────────────┴────────────────┘
LazyGraphRAG (Microsoft Research, 2025)
Philosophy: Zero LLM for indexing, concept graph from co-occurrence, iterative deepening for queries.
Key Features:
- No LLM Calls During Indexing: Uses noun phrase extraction
- 1000x Cheaper Indexing: $0.10 vs $100 per 1M tokens
- 100x Faster Indexing: 1000 docs/sec vs 10 docs/sec
- 700x Cheaper Queries: $0.0014 vs $1.00 per query
- 92% Quality: Acceptable trade-off for massive cost savings
Technology Stack:
- Concept Extraction: Regex-based noun phrases (no LLM)
- Graph Construction: Co-occurrence with Jaccard similarity
- Indexing: Bidirectional entity-chunk index (O(1) lookups)
- Query Processing: Iterative deepening search
- Refinement: Query expansion via concept graph traversal
Configuration:
[experimental]
lazy_graphrag = true
[experimental.lazy_graphrag_config]
use_concept_extraction = true
min_concept_length = 3
max_concept_words = 5
co_occurrence_threshold = 1
use_query_refinement = true
max_refinement_iterations = 3
use_bidirectional_index = true
Performance:
- Quality: ★★★★☆ (92% accuracy) | Speed: ★★★★★ (1000 docs/sec)
- Cost: ★★★★★ (0.1% of traditional) | Resource: ★☆☆☆☆ (200MB RAM)
Example:
#![allow(unused)]
fn main() {
use graphrag_core::lightrag::LazyGraphRAGPipeline;
let mut pipeline = LazyGraphRAGPipeline::default();
pipeline.index_document("doc1", "Machine Learning transforms AI...");
pipeline.build_graph(); // Fast, no LLM!
let results = pipeline.query("machine learning applications");
println!("Found {} chunks", results.chunk_count());
}
E2GraphRAG (2025)
Philosophy: Pattern-based entity extraction, no LLM required, deterministic output.
Key Features:
- 100x Faster Entity Extraction: 5ms vs 500ms per chunk
- 2000x Cheaper: $0.05 per 1M tokens
- ✅ Deterministic: Fully reproducible results
Configuration:
[experimental]
e2_graphrag = true
[experimental.e2_graphrag_config]
use_lightweight_ner = true
entity_types = ["PERSON", "ORGANIZATION", "LOCATION", "CONCEPT"]
use_capitalization_detection = true
use_noun_phrase_extraction = true
Cost Comparison
| Approach | Indexing Cost | Query Cost | Speed | Quality |
|---|---|---|---|---|
| Traditional | $100/1M | $1.00/query | 10 docs/sec | 95% |
| LazyGraphRAG | $0.10/1M | $0.0014/query | 1000 docs/sec | 92% |
| E2GraphRAG | $0.05/1M | $0.001/query | 2000 docs/sec | 88% |
ROI Example (1M docs, 10k queries/month):
- Traditional: $220k/year
- LazyGraphRAG: $268/year (820x cheaper!)
- E2GraphRAG: $170/year (1300x cheaper!)
For complete documentation, see docs/LAZYGRAPHRAG_E2GRAPHRAG.md.
The 7-Stage Pipeline
GraphRAG-rs processes documents through 7 interconnected stages, transforming raw text into intelligent, queryable knowledge. Let’s explore each stage with a real example using The Adventures of Tom Sawyer.
flowchart TB
Input[Raw Document<br/>434,401 characters] --> Stage1
subgraph Pipeline ["GraphRAG 7-Stage Pipeline"]
Stage1[Stage 1: Text Chunking<br/>Break into 492 chunks]
Stage2[Stage 2: Embeddings<br/>Generate 384-dim vectors]
Stage3[Stage 3: Entity Extraction<br/>Find 429 entities]
Stage4[Stage 4: Graph Construction<br/>Build knowledge graph]
Stage5[Stage 5: Dual-Level Retrieval<br/>Smart search]
Stage6[Stage 6: Query Processing<br/>Understand question]
Stage7[Stage 7: Answer Generation<br/>Compose response]
Stage1 --> Stage2
Stage2 --> Stage3
Stage3 --> Stage4
Stage4 --> Stage5
Query[User Query] --> Stage6
Stage6 --> Stage5
Stage5 --> Stage7
end
Stage7 --> Output[✅ Final Answer<br/>with sources]
style Stage1 fill:#e1f5ff
style Stage2 fill:#fff4e6
style Stage3 fill:#f3e5f5
style Stage4 fill:#e8f5e9
style Stage5 fill:#fff9c4
style Stage6 fill:#fce4ec
style Stage7 fill:#e0f2f1
Stage 1: Text Chunking
What it does: Divides long documents into overlapping, semantically meaningful segments.
Why: LLMs have token limits (typically 4K-32K tokens). Chunking allows processing of arbitrarily large documents while preserving local context through overlap.
Process Details
Input:
"Tom!" No answer. "TOM!" No answer. "What's gone with that boy, I wonder?
You TOM!" No answer. The old lady pulled her spectacles down and looked
over them about the room; then she put them up and looked out under them...
Configuration (from config/templates/narrative_fiction.toml):
chunk_size = 800 # ~200 words
chunk_overlap = 200 # 50 words overlap
Output: 492 overlapping chunks
Chunk 1: "Tom! No answer. TOM! No answer. What's gone..." [800 chars]
Chunk 2: "...What's gone with that boy, I wonder? You TOM!..." [800 chars, 200 overlap]
Chunk 3: "...You TOM! No answer. The old lady pulled her..." [800 chars, 200 overlap]
...
Chunk 492: "...the end of Tom Sawyer's adventures." [final chunk]
Why Overlap Matters
Without Overlap (❌ Context Loss):
Chunk A: "...Tom found the treasure under the"
Chunk B: "cross marked on the old tree..."
❌ Entity "treasure under the cross" split across chunks
With 200-char Overlap (✅ Preserved):
Chunk A: "...Tom found the treasure under the cross marked on..."
Chunk B: "...treasure under the cross marked on the old tree..."
✅ Complete entity captured in both chunks
Module: src/text/chunking.rs
Performance: ~0.01s for 434KB document
Stage 2: Embeddings Generation
What it does: Converts text chunks into high-dimensional numerical vectors that capture semantic meaning.
Why: Computers can’t understand text directly. Embeddings transform words into numbers while preserving meaning relationships (e.g., “king - man + woman ≈ queen”).
The Vector Space
Each chunk becomes a 384-dimensional vector where similar meanings cluster together:
"Tom and Huck found treasure" → [0.23, -0.45, 0.67, ..., 0.12] (384 numbers)
"The boys discovered gold" → [0.21, -0.42, 0.69, ..., 0.14] (close!)
"The weather was sunny" → [-0.67, 0.23, -0.12, ..., 0.45] (far away)
Embedding Backends
GraphRAG-rs supports multiple embedding strategies:
| Backend | Performance | Use Case | Implementation |
|---|---|---|---|
| Ollama (nomic-embed-text) | 100-200ms/chunk | Production semantic search | src/ollama/embeddings.rs |
| ONNX Runtime Web | 3-8ms/chunk (GPU) | WASM browser deployment | graphrag-wasm/src/onnx_embedder.rs |
| Hash-based (TF) | <1ms/chunk | Testing, offline, no dependencies | src/embeddings/hash_embedder.rs |
| Candle (planned) | 50-100ms/chunk | 100% Rust, CPU-only | Future |
Real Example Output
#![allow(unused)]
fn main() {
// From examples/real_ollama_pipeline.rs
let embedding = embedder.generate_embedding_async(
"Tom found the treasure in the cave"
).await?;
// Result: Vec<f32> with 384 dimensions
// [0.234, -0.456, 0.678, 0.123, ..., -0.234]
// L2 norm: ~1.0 (normalized)
}
Module: src/embeddings/neural/mod.rs
Performance:
- Ollama: ~100ms per chunk (5-10 chunks/sec)
- ONNX GPU: ~3-8ms per chunk (125-333 chunks/sec, 25-40x faster)
Stage 3: Entity Extraction
What it does: Identifies and extracts named entities (people, places, concepts, events) and their relationships from each chunk.
Why: Entities are the nodes of our knowledge graph. Without them, we’d just have disconnected chunks of text.
Dynamic Pipeline Configuration
GraphRAG-rs now adapts Stage 3 based on your TOML configuration. The system automatically chooses the optimal extraction method:
# Configuration controls the pipeline behavior
[entity_extraction]
use_gleaning = true # ← If TRUE: LLM-based extraction
# If FALSE: Pattern-based extraction
max_gleaning_rounds = 4 # ← Number of refinement passes
[ollama]
enabled = true # ← Must be TRUE for LLM extraction
chat_model = "llama3.1:8b" # ← LLM model for extraction
The pipeline dynamically selects:
| Config Setting | Pipeline Behavior | Performance | Quality |
|---|---|---|---|
use_gleaning = false | Pattern-Based (regex + capitalization) | <10ms/chunk | ★★★ Good |
use_gleaning = true + ollama.enabled = true | LLM-Based (gleaning with Ollama) | 200-500ms/chunk | ★★★★★ Excellent |
use_gleaning = true + ollama.enabled = false | ❌ Error | - | N/A |
Logged Output:
[INFO] Using LLM-based entity extraction with gleaning (max_rounds: 4)
✓ Ollama client initialized
✓ Model: llama3.1:8b
✓ Entity types: PERSON, CONCEPT, ARGUMENT, LOCATION, ...
or
[INFO] Using pattern-based entity extraction
✓ Fast regex-based extraction
✓ No LLM required
Entity Types
GraphRAG recognizes these entity categories (fully customizable via config):
PERSON → "Tom Sawyer", "Huckleberry Finn", "Aunt Polly"
LOCATION → "Mississippi River", "St. Petersburg", "McDougal's Cave"
CONCEPT → "treasure hunting", "freedom", "childhood innocence"
EVENT → "witnessing the murder", "finding the treasure", "trial scene"
Customize via TOML:
[pipeline.entity_extraction]
entity_types = [
"PERSON", # Your custom types!
"CONCEPT",
"ARGUMENT",
"MYTHOLOGICAL_REFERENCE" # ← Philosophical texts
]
Extraction Methods (Config-Driven)
A. Pattern-Based (Fast, Deterministic)
#![allow(unused)]
fn main() {
// Enabled when: use_gleaning = false
// src/entity/mod.rs - Regex + capitalization
Keywords: ["Tom Sawyer", "Huck", "treasure", "cave"]
Performance: <10ms per chunk
Found: 189 entities in Symposium, 429 in Tom Sawyer
}
B. LLM-Based Gleaning (Accurate, Contextual)
#![allow(unused)]
fn main() {
// Enabled when: use_gleaning = true && ollama.enabled = true
// src/entity/gleaning_extractor.rs - Uses Ollama llama3.1:8b
Prompt: "Extract entities of types: PERSON, CONCEPT, ARGUMENT...
from this text. Return JSON..."
Input: "Tom and Huck found the treasure under the cross..."
LLM Output (Round 1):
[
{"name": "Tom Sawyer", "type": "PERSON", "confidence": 0.95},
{"name": "Huckleberry Finn", "type": "PERSON", "confidence": 0.93},
{"name": "treasure", "type": "CONCEPT", "confidence": 0.88},
{"name": "cross marker", "type": "LOCATION", "confidence": 0.85}
]
Performance: 200-500ms per chunk
}
Gleaning (Multi-Pass LLM Refinement)
Gleaning is an iterative process controlled by max_gleaning_rounds:
Configuration: max_gleaning_rounds = 4
Round 1: Extract obvious entities → Found 100 entities
Round 2: "Did you miss any entities?" → Found 15 more entities
Round 3: "Any relationships?" → Found 8 relationships
Round 4: "Final check for concepts" → Found 2 subtle concepts
Total: 125 entities, 8 relationships
[INFO] ✅ Extraction complete after 4 rounds
[INFO] Final gleaning results: 125 entities, 8 relationships
Module: src/entity/gleaning_extractor.rs
Performance:
- Pattern-based: <10ms per chunk
- LLM-based gleaning: 200-500ms per chunk ×
max_gleaning_rounds- 1 round: ~300ms
- 4 rounds: ~1200ms
Configuration Examples
Example 1: Fast Pattern-Based (No LLM)
[entity_extraction]
enabled = true
min_confidence = 0.7
use_gleaning = false # ← Pattern-based extraction
[ollama]
enabled = false # ← No LLM needed
Result: <10ms per chunk, good quality, no API/GPU required
Example 2: High-Quality LLM-Based
[entity_extraction]
enabled = true
min_confidence = 0.6 # ← Lower for philosophical nuance
use_gleaning = true # ← LLM-based extraction
max_gleaning_rounds = 4 # ← 4 refinement passes
[ollama]
enabled = true
chat_model = "llama3.1:8b" # ← LLM for extraction
Result: 200-500ms per chunk, excellent quality, custom entity types
Real Output Example
{
"entity_id": "ent_tom_sawyer_001",
"name": "Tom Sawyer",
"type": "PERSON",
"chunk_ids": ["chunk_001", "chunk_015", "chunk_234"],
"confidence": 0.95,
"description": "Main protagonist, adventurous boy",
"extraction_method": "gleaning_llm", // ← Indicates LLM extraction
"gleaning_round": 1 // ← Found in first pass
}
Stage 4: Knowledge Graph Construction
What it does: Connects extracted entities into a unified, queryable graph structure with typed relationships.
Why: A graph reveals how entities relate, not just that they co-occur. This enables multi-hop reasoning and contextual understanding.
Graph Structure
graph LR
TomSawyer[Tom Sawyer<br/>PERSON]
Huck[Huckleberry Finn<br/>PERSON]
Treasure[Treasure<br/>CONCEPT]
Cave[McDougal's Cave<br/>LOCATION]
InjunJoe[Injun Joe<br/>PERSON]
TomSawyer -->|FRIEND_OF| Huck
TomSawyer -->|FOUND| Treasure
Treasure -->|LOCATED_IN| Cave
InjunJoe -->|GUARDS| Treasure
TomSawyer -->|WITNESSED_MURDER_BY| InjunJoe
Huck -->|HELPED_FIND| Treasure
style TomSawyer fill:#e3f2fd
style Huck fill:#e3f2fd
style Treasure fill:#fff9c4
style Cave fill:#e8f5e9
style InjunJoe fill:#fce4ec
Graph Components
Nodes (Entities):
#![allow(unused)]
fn main() {
pub struct Entity {
pub id: EntityId,
pub name: String,
pub entity_type: String,
pub description: String,
pub chunk_references: Vec<ChunkId>,
}
}
Edges (Relationships):
#![allow(unused)]
fn main() {
pub struct Relationship {
pub source: EntityId,
pub target: EntityId,
pub relation_type: String, // "FRIEND_OF", "FOUND", etc.
pub confidence: f32,
}
}
Advanced Features
A. Incremental Updates (Zero-Downtime)
#![allow(unused)]
fn main() {
// src/graph/incremental.rs
graph.add_document("Tom Sawyer"); // 429 entities added
graph.add_document("Symposium"); // 189 entities added
// Automatically merges 58 duplicate entities!
}
B. PageRank Scoring (Fast-GraphRAG)
#![allow(unused)]
fn main() {
// src/graph/pagerank.rs
let scores = pagerank.compute_personalized(
seed_entities: ["Tom Sawyer", "Huck Finn"],
max_iterations: 20
);
// Ranks entities by importance: 27x faster retrieval!
}
C. Community Detection (Hierarchical Clustering)
Community 1: Tom Sawyer storyline (347 entities)
├─ Subgraph: Treasure hunting (45 entities)
├─ Subgraph: School adventures (89 entities)
└─ Subgraph: Courtroom drama (23 entities)
Community 2: Philosophical concepts (189 entities)
└─ From Symposium document
Module: src/graph/mod.rs, src/graph/incremental.rs
Performance:
- Graph construction: ~50ms for 500 entities
- PageRank: ~20ms (cached, 27x speedup vs traditional)
Stage 5: Dual-Level Retrieval (LightRAG)
What it does: Searches the knowledge graph at two levels simultaneously - specific entities (low-level) and broad concepts (high-level).
Why: Traditional RAG searches only chunks. LightRAG searches entities AND their community context, achieving 6000x token reduction.
The Dual-Level Approach
Query: "What did Tom and Huck find in the cave?"
LOW-LEVEL RETRIEVAL (Specific):
→ Search entities: "Tom Sawyer", "Huck Finn", "cave"
→ Results: 12 entity matches
HIGH-LEVEL RETRIEVAL (Contextual):
→ Search communities: "treasure hunting" storyline
→ Results: 45 related entities in same narrative arc
FUSION:
→ Combine both levels with Reciprocal Rank Fusion (RRF)
→ Final results: Top 10 most relevant entities
Retrieval Strategies
GraphRAG-rs implements 4 complementary strategies:
| Strategy | What It Does | When to Use | Module |
|---|---|---|---|
| Vector Similarity | Semantic embedding search | “What is X about?” | src/retrieval/mod.rs |
| BM25 Keyword | Term-frequency search | Exact name/phrase lookup | src/retrieval/bm25.rs |
| Graph Traversal | Follow entity relationships | “How are X and Y related?” | src/graph/pagerank.rs |
| Hybrid Fusion | Combines all 3 above | General queries | src/retrieval/hybrid.rs |
Reciprocal Rank Fusion (RRF)
Formula:
RRF_score(entity) = Σ (1 / (k + rank_in_strategy))
for each strategy
Example:
Entity: "Tom Sawyer"
Vector search rank: 2 → score = 1/(60+2) = 0.0161
BM25 rank: 1 → score = 1/(60+1) = 0.0164
PageRank rank: 3 → score = 1/(60+3) = 0.0159
Total RRF = 0.0484 (ranked #1 overall!)
Module: src/lightrag/dual_retrieval.rs
Performance:
- Low-level retrieval: ~20ms
- High-level retrieval: ~30ms
- Fusion: ~10ms
- Total: ~60ms (vs 2-5 seconds traditional GraphRAG)
Stage 6: Query Processing
What it does: Analyzes the user’s question to determine intent, entities, and optimal search strategy.
Why: “What is love?” requires different processing than “When did Tom find the treasure?” - query understanding guides retrieval.
Query Analysis Components
A. Intent Classification
#![allow(unused)]
fn main() {
// src/query/advanced_pipeline.rs
pub enum QueryIntent {
Factual, // "What is X?"
Relational, // "How is X related to Y?"
Temporal, // "When did X happen?"
Causal, // "Why did X happen?"
Comparative, // "Compare X and Y"
Exploratory, // "Tell me about X"
}
}
B. Entity Extraction from Query
Query: "How did Tom and Huck find the treasure in McDougal's Cave?"
Extracted Entities:
- "Tom" (PERSON)
- "Huck" (PERSON)
- "treasure" (CONCEPT)
- "McDougal's Cave" (LOCATION)
Intent: Relational + Temporal
Strategy: Graph traversal + vector search hybrid
C. Query Decomposition (ROGRAG)
For complex queries, break into sub-queries:
Complex: "Compare Tom's and Huck's roles in finding the treasure"
Decomposed:
1. "What role did Tom play in finding the treasure?"
2. "What role did Huck play in finding the treasure?"
3. [Synthesis] "Compare the two roles"
Accuracy boost: 60% → 75% (15% improvement!)
Advanced Query Pipeline
#![allow(unused)]
fn main() {
// src/query/advanced_pipeline.rs:165-200
pub async fn execute_query() -> Result<QueryResult> {
// Step 1: Analyze query
let analysis = self.analyze_query(query).await?;
// Step 2: Vector similarity search
let vector_scores = self.vector_search(query, graph).await?;
// Step 3: PageRank propagation
let pagerank_scores = self.pagerank_propagation(&analysis).await?;
// Step 4: Relationship scoring
let rel_scores = self.score_relationships(graph).await?;
// Step 5: Multi-modal fusion
let combined = self.fuse_scores(vector_scores, pagerank_scores, rel_scores);
// Step 6: Rank and filter
let results = self.apply_ranking_policies(combined);
return results;
}
}
Module: src/query/advanced_pipeline.rs, src/rograg/
Performance:
- Query analysis: ~50ms
- Decomposition (if needed): ~100ms
Stage 7: Answer Generation
What it does: Synthesizes retrieved entities, relationships, and chunks into a coherent, natural language answer.
Why: Raw search results are just data. Generation transforms them into human-readable, contextual answers with citations.
Generation Pipeline
Retrieved Context:
Entity 1: Tom Sawyer (confidence: 0.95)
- Relevant chunk: "Tom and Huck ventured into the cave..."
Entity 2: Treasure (confidence: 0.92)
- Relevant chunk: "They found twelve thousand dollars in gold..."
Relationship: Tom FOUND Treasure (confidence: 0.88)
↓ LLM Prompt Construction ↓
System Prompt:
"You are a knowledgeable assistant. Answer based ONLY on provided context."
Context Assembly:
[Include top 5 chunks with source attribution]
[Include entity descriptions]
[Include relationship graph snippet]
User Question:
"How did Tom and Huck find the treasure?"
↓ LLM Generation (Ollama llama3.1:8b) ↓
Generated Answer:
"Tom Sawyer and Huckleberry Finn discovered the treasure in McDougal's Cave
after witnessing Injun Joe hide it there. They found approximately $12,000
in gold coins under a cross marked on a rock. This discovery came after Tom
got lost in the cave with Becky Thatcher and noticed the hiding spot while
trying to find an exit.
Sources: Chapter 33 (cave discovery), Chapter 35 (counting the treasure)"
LLM Backend Options
| Backend | Throughput | Use Case | Module |
|---|---|---|---|
| Ollama (llama3.1:8b) | ~15-30 tok/s | Production server | src/ollama/async_generation.rs |
| WebLLM (Phi-3) | 40-62 tok/s (GPU) | WASM browser | graphrag-wasm/src/webllm.rs |
| Mock LLM | Instant | Testing, demos | src/generation/async_mock_llm.rs |
Caching (6x Cost Reduction)
#![allow(unused)]
fn main() {
// src/caching/cached_client.rs
let cache_key = generate_semantic_key(prompt);
if let Some(cached) = cache.get(&cache_key) {
return cached; // 80%+ hit rate in production!
}
let response = llm.generate(prompt).await?;
cache.put(cache_key, response.clone());
return response;
}
Cache Performance:
- Hit rate: 80%+ (typical workload)
- Cost reduction: 6x
- Latency reduction: 50-100ms → 5ms (16-20x faster)
Module: src/generation/mod.rs, src/caching/
Performance:
- Generation: 1-3 seconds (depending on answer length)
- Cached: ~5ms
Complete Pipeline Performance
Real Benchmark: Tom Sawyer (434KB)
| Stage | Time | Memory | Output |
|---|---|---|---|
| 1. Chunking | 0.01s | +0.2 MB | 492 chunks |
| 2. Embeddings | 0.08s | +1.2 MB | 492 vectors (384-dim) |
| 3. Entity Extraction | 0.05s | +0.3 MB | 429 entities |
| 4. Graph Construction | 0.05s | +0.2 MB | 429 nodes, ~800 edges |
| 5. Dual Retrieval | 0.06s | +0.1 MB | Top 10 results |
| 6. Query Processing | 0.05s | - | Query plan |
| 7. Answer Generation | 1.2s | - | Final answer |
| TOTAL | 1.5s | 2.0 MB | ✅ Complete |
Source: examples/multi_document_pipeline.rs - production benchmarks
Scalability
| Documents | Total Time | Memory | Entities |
|---|---|---|---|
| 1 (Tom Sawyer) | 0.21s | 1.8 MB | 429 |
| 2 (+ Symposium) | 0.33s | 2.5 MB | 618 |
| 10 (estimated) | ~2s | ~15 MB | ~3000 |
| 100 (estimated) | ~20s | ~150 MB | ~30K |
With PageRank + LightRAG optimizations:
- 27x faster retrieval
- 6000x fewer tokens processed
- 6x cost reduction (caching)
Alternative Techniques for Each Stage
GraphRAG-rs is highly modular with pluggable implementations for each pipeline stage. Choose the best technique based on your requirements using the core::traits abstraction layer.
Architecture: Trait-Based Plugin System
#![allow(unused)]
fn main() {
// src/core/traits.rs - Core abstraction layer
pub trait Embedder { ... } // Stage 2: Embeddings
pub trait EntityExtractor { ... } // Stage 3: Entity Extraction
pub trait VectorStore { ... } // Stage 5: Vector Search
pub trait Retriever { ... } // Stage 5: Retrieval
pub trait LanguageModel { ... } // Stage 7: Generation
pub trait GraphStore { ... } // Stage 4: Graph Storage
}
Stage 1: Text Chunking - 3 Strategies
| Strategy | Algorithm | Use Case | Module |
|---|---|---|---|
| Hierarchical | RecursiveCharacterTextSplitter | Recommended - preserves semantic boundaries | src/text/chunking.rs |
| Fixed-Size | Simple character-based | Fast, predictable chunks | src/text/mod.rs |
| Semantic | Sentence-aware splitting | Academic papers, legal documents | src/text/mod.rs |
Hierarchical Separator Precedence:
#![allow(unused)]
fn main() {
[
"\n\n", // Paragraph breaks (priority 1)
"\n", // Line breaks
". ", // Sentence endings
"! ", // Exclamations
"? ", // Questions
"; ", // Semicolons
" ", // Word boundaries
"", // Character fallback
]
}
Configuration:
[pipeline]
chunk_size = 800 # Characters per chunk
chunk_overlap = 200 # Overlap for context preservation
min_chunk_size = 50 # Skip tiny chunks
Stage 2: Embeddings - 11 Providers
GraphRAG Core now supports 11 embedding backends via unified configuration:
Free/Local Providers
| Provider | Performance | Quality | GPU | Platform | Module |
|---|---|---|---|---|---|
| HuggingFace Hub | First: ~2s Cached: 50-100ms | ★★★★ | ❌ CPU | All | graphrag-core/src/embeddings/huggingface.rs |
| Ollama (nomic-embed-text) | 100-200ms | ★★★★★ | ✅ CUDA/Metal | Server | src/ollama/embeddings.rs |
| ONNX Runtime Web | 3-8ms (GPU) | ★★★★ | ✅ WebGPU | WASM | graphrag-wasm/src/onnx_embedder.rs |
| Hash-based (TF-IDF) | <1ms | ★★★ | ❌ CPU-only | Testing | src/embeddings/hash_embedder.rs |
API Providers (Production)
| Provider | Cost/1M tokens | Quality | Best For | Module |
|---|---|---|---|---|
| OpenAI | $0.13 | ★★★★★ | Best quality | graphrag-core/src/embeddings/api_providers.rs |
| Voyage AI | Medium | ★★★★★ | Domain-specific (code, finance, law) | graphrag-core/src/embeddings/api_providers.rs |
| Cohere | $0.10 | ★★★★ | Multilingual (100+ langs) | graphrag-core/src/embeddings/api_providers.rs |
| Jina AI | $0.02 | ★★★★ | Cost-optimized | graphrag-core/src/embeddings/api_providers.rs |
| Mistral AI | $0.10 | ★★★★ | RAG-optimized | graphrag-core/src/embeddings/api_providers.rs |
| Together AI | $0.008 | ★★★★ | Cheapest | graphrag-core/src/embeddings/api_providers.rs |
Planned
| Provider | Status | Notes |
|---|---|---|
| Candle | Planned | 100% Rust, CPU-only |
| Burn + wgpu | 70% | GPU acceleration, 100% Rust |
Models Available:
HuggingFace Hub (100+ models):
sentence-transformers/all-MiniLM-L6-v2 → 384 dim (default, recommended)
sentence-transformers/all-mpnet-base-v2 → 768 dim (balanced)
BAAI/bge-large-en-v1.5 → 1024 dim (best quality)
intfloat/e5-small-v2 → 384 dim (E5 family)
paraphrase-multilingual-MiniLM-L12-v2 → 384 dim (50+ languages)
API Providers:
OpenAI: text-embedding-3-small (1536), text-embedding-3-large (3072)
Voyage: voyage-3-large (1024), voyage-code-3 (1024), voyage-finance-2, voyage-law-2
Cohere: embed-english-v3.0 (1024), embed-multilingual-v3.0 (1024)
Jina: jina-embeddings-v3 (1024), jina-embeddings-v4 (multimodal)
Mistral: mistral-embed (1024), codestral-embed (code)
Together: BAAI/bge-large-en-v1.5 (1024), BAAI/bge-base-en-v1.5 (768)
Ollama: nomic-embed-text (768)
Trait Implementation:
#![allow(unused)]
fn main() {
#[async_trait::async_trait]
pub trait EmbeddingProvider: Send + Sync {
/// Initialize the embedding provider (e.g., download models)
async fn initialize(&mut self) -> Result<()>;
/// Generate embedding for single text
async fn embed(&self, text: &str) -> Result<Vec<f32>>;
/// Generate embeddings for multiple texts (batch processing)
async fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>;
/// Get the embedding dimension
fn dimensions(&self) -> usize;
/// Check if the provider is available and ready
fn is_available(&self) -> bool;
/// Get the provider name
fn provider_name(&self) -> &str;
}
}
Configuration:
[embeddings]
backend = "huggingface" # Free, offline (default)
# backend = "openai" # Best quality ($0.13/1M)
# backend = "voyage" # Anthropic recommended
# backend = "cohere" # Multilingual
# backend = "jina" # Cost-optimized ($0.02/1M)
# backend = "mistral" # RAG-optimized
# backend = "together" # Cheapest ($0.008/1M)
# backend = "ollama" # Local GPU
model = "sentence-transformers/all-MiniLM-L6-v2"
dimension = 384
batch_size = 32
cache_dir = "~/.cache/huggingface" # For HuggingFace
# api_key = "..." # For API providers (or set env vars)
# Environment variables (recommended for API keys):
# OPENAI_API_KEY, VOYAGE_API_KEY, COHERE_API_KEY, JINA_API_KEY, MISTRAL_API_KEY, TOGETHER_API_KEY
See: config/JSON5_CONFIG_GUIDE.md for the complete configuration reference.
Stage 3: Entity Extraction - Config-Driven Selection
The system automatically chooses the extraction method based on your configuration:
| Method | Accuracy | Speed | Enabled When | Module |
|---|---|---|---|---|
| LLM Gleaning (Multi-Pass) | ★★★★★ | 200-500ms | use_gleaning = true + ollama.enabled = true | src/entity/gleaning_extractor.rs |
| Pattern-Based (Keywords) | ★★★ | <10ms | use_gleaning = false | src/entity/mod.rs |
| NER Hybrid | ★★★★ | 50-100ms | Future | src/entity/mod.rs |
| Semantic Merging | ★★★★ | Medium | semantic_merging = true | src/entity/semantic_merging.rs |
Entity Types (Fully Customizable):
# Configure your own entity types!
[pipeline.entity_extraction]
entity_types = [
"PERSON", # "Tom Sawyer", "Socrates"
"LOCATION", # "Mississippi River", "Athens"
"CONCEPT", # "treasure hunting", "Eros"
"EVENT", # "murder witness", "symposium"
"ARGUMENT", # Philosophical arguments
"MYTHOLOGICAL_REFERENCE" # Gods, myths
]
Gleaning Process (LLM-Based, Config-Controlled):
[entity_extraction]
use_gleaning = true # ← Enable LLM extraction
max_gleaning_rounds = 4 # ← Number of refinement passes
[ollama]
enabled = true
chat_model = "llama3.1:8b" # ← LLM for extraction
Runtime Behavior:
Round 1: Extract obvious entities → 100 entities
Round 2: "Did you miss any entities?" → +15 entities
Round 3: "Find relationships" → 8 relationships
Round 4: "Final check for concepts" → 2 subtle concepts
Total: 125 entities, 8 relationships
[INFO] ✅ Extraction complete after 4 rounds
Trait Implementation:
#![allow(unused)]
fn main() {
pub trait EntityExtractor {
fn extract(&self, text: &str) -> Result<Vec<Entity>>;
fn extract_with_confidence(&self, text: &str) -> Result<Vec<(Entity, f32)>>;
fn set_confidence_threshold(&mut self, threshold: f32);
}
#[async_trait]
pub trait AsyncEntityExtractor {
async fn extract(&self, text: &str) -> Result<Vec<Entity>>;
async fn extract_batch(&self, texts: &[&str]) -> Result<Vec<Vec<Entity>>>;
async fn extract_batch_concurrent(&self, texts: &[&str], max_concurrent: usize);
}
}
Configuration (Controls Behavior):
[entity_extraction]
enabled = true
min_confidence = 0.6 # ← Minimum confidence threshold
use_gleaning = true # ← Pattern-based (false) vs LLM-based (true)
max_gleaning_rounds = 4 # ← Number of LLM refinement passes
semantic_merging = true # ← Deduplicate similar entities
automatic_linking = true # ← Auto-link related entities
[pipeline.entity_extraction]
entity_types = ["PERSON", "CONCEPT", ...] # ← Custom types
confidence_threshold = 0.7
[ollama]
enabled = true # ← Required for LLM-based extraction
chat_model = "llama3.1:8b" # ← LLM model
The pipeline reads this config at startup and selects the appropriate implementation automatically.
Stage 4: Graph Construction - 3 Storage Backends
| Backend | Scale | Features | Platform | Module |
|---|---|---|---|---|
| In-Memory (Default) | <100K entities | Fast, incremental updates | All | src/graph/incremental.rs |
| Qdrant | >1M entities | Production vector DB, JSON payload | Server | src/storage/qdrant.rs |
| Neo4j (planned) | >100K entities | Complex graph queries, Cypher | Server | Future |
| LanceDB (70% complete) | >500K entities | Serverless, embedded | Desktop | src/storage/lancedb.rs |
Graph Features:
| Feature | Implementation | Status | Module |
|---|---|---|---|
| Incremental Updates | Zero-downtime ACID-like | ✅ Complete | src/graph/incremental.rs |
| PageRank | Personalized importance scoring | ✅ Complete | src/graph/pagerank.rs |
| Community Detection | Leiden algorithm clustering | ✅ Complete | src/graph/mod.rs |
| Semantic Deduplication | Entity merging (58 duplicates) | ✅ Complete | src/entity/semantic_merging.rs |
Trait Implementation:
#![allow(unused)]
fn main() {
pub trait GraphStore {
fn add_node(&mut self, node: Node) -> Result<String>;
fn add_edge(&mut self, from: &str, to: &str, edge: Edge) -> Result<String>;
fn find_nodes(&self, criteria: &str) -> Result<Vec<Node>>;
fn get_neighbors(&self, node_id: &str) -> Result<Vec<Node>>;
fn traverse(&self, start_id: &str, max_depth: usize) -> Result<Vec<Node>>;
}
}
Configuration:
[graph]
backend = "in-memory" # or "qdrant", "neo4j"
enable_incremental = true
enable_pagerank = true
enable_community_detection = true
deduplication_threshold = 0.85
Stage 5: Retrieval - 5 Strategies
| Strategy | Algorithm | Strengths | Module |
|---|---|---|---|
| Vector Similarity | Cosine similarity on embeddings | Semantic understanding | src/retrieval/mod.rs |
| BM25 Keyword | TF-IDF term matching | Exact phrases, names | src/retrieval/bm25.rs |
| PageRank | Graph importance propagation | Entity relevance (27x faster) | src/retrieval/pagerank_retrieval.rs |
| Hybrid (RRF) | Reciprocal Rank Fusion | Recommended - combines all | src/retrieval/hybrid.rs |
| Adaptive | Strategy auto-selection | Context-aware switching | src/retrieval/adaptive.rs |
LightRAG Dual-Level (6000x token reduction):
Query: "What did Tom find in the cave?"
LOW-LEVEL: Search specific entities (Tom, cave, treasure)
→ 12 entity matches
HIGH-LEVEL: Search community context (treasure hunting storyline)
→ 45 related entities in narrative arc
FUSION: RRF combines both levels
→ Top 10 most relevant results
Reciprocal Rank Fusion Formula:
#![allow(unused)]
fn main() {
RRF_score(entity) = Σ (1 / (k + rank_i))
where k = 60 (constant), rank_i = rank in strategy i
}
Trait Implementation:
#![allow(unused)]
fn main() {
pub trait Retriever {
fn search(&self, query: Query, k: usize) -> Result<Vec<SearchResult>>;
fn search_with_context(&self, query: Query, context: &str, k: usize);
}
#[async_trait]
pub trait AsyncRetriever {
async fn search(&self, query: Query, k: usize) -> Result<Vec<SearchResult>>;
async fn search_batch(&self, queries: Vec<Query>, k: usize);
}
}
Configuration:
[retrieval]
strategy = "hybrid" # or "vector", "bm25", "pagerank", "adaptive"
k = 10 # Top-k results
enable_lightrag = true # Dual-level retrieval
fusion_weights = { vector = 0.4, bm25 = 0.3, pagerank = 0.3 }
Stage 6: Query Processing - 3 Analyzers
| Analyzer | Capabilities | Module |
|---|---|---|
| Basic | Intent classification (Factual/Relational/Temporal) | src/query/mod.rs |
| Advanced | Multi-modal scoring + Entity extraction | src/query/advanced_pipeline.rs |
| ROGRAG | Query decomposition + Logic forms | src/rograg/logic_form.rs |
Query Intent Types:
#![allow(unused)]
fn main() {
pub enum QueryIntent {
Factual, // "What is X?"
Relational, // "How is X related to Y?"
Temporal, // "When did X happen?"
Causal, // "Why did X happen?"
Comparative, // "Compare X and Y"
Exploratory, // "Tell me about X"
}
}
ROGRAG Decomposition:
Complex: "Compare Tom's and Huck's roles in finding the treasure"
Decomposed:
1. "What role did Tom play in finding the treasure?"
2. "What role did Huck play in finding the treasure?"
3. [Synthesis] "Compare the two roles"
Accuracy: 60% → 75% (+15% boost!)
Configuration:
[query_processing]
analyzer = "advanced" # or "basic", "rograg"
enable_decomposition = true
max_sub_queries = 5
confidence_threshold = 0.6
Stage 7: Answer Generation - 4 LLM Backends
| Backend | Throughput | Quality | Platform | Module |
|---|---|---|---|---|
| Ollama (llama3.1:8b) | 15-30 tok/s | ★★★★★ | Server | src/ollama/async_generation.rs |
| WebLLM (Phi-3) | 40-62 tok/s (GPU) | ★★★★ | WASM | graphrag-wasm/src/webllm.rs |
| MockLLM | Instant | ★★ | Testing | src/generation/async_mock_llm.rs |
| OpenAI-Compatible API | Varies | ★★★★★ | Server | Future |
Caching Layer (6x cost reduction):
#![allow(unused)]
fn main() {
// src/caching/cached_client.rs
let cache_key = generate_semantic_key(prompt);
if let Some(cached) = cache.get(&cache_key) {
return cached; // 80%+ hit rate!
}
let response = llm.generate(prompt).await?;
cache.put(cache_key, response.clone());
}
Trait Implementation:
#![allow(unused)]
fn main() {
pub trait LanguageModel {
fn complete(&self, prompt: &str) -> Result<String>;
fn complete_with_params(&self, prompt: &str, params: GenerationParams);
fn is_available(&self) -> bool;
}
#[async_trait]
pub trait AsyncLanguageModel {
async fn complete(&self, prompt: &str) -> Result<String>;
async fn complete_batch(&self, prompts: &[&str]) -> Result<Vec<String>>;
async fn complete_streaming(&self, prompt: &str) -> Stream<String>;
}
}
Configuration:
[generation]
backend = "ollama" # or "webllm", "mock"
model = "llama3.1:8b"
temperature = 0.7
max_tokens = 1000
enable_caching = true
cache_ttl_seconds = 3600
Configuration Matrix: Choose Your Stack
Use Case: Production Server
[pipeline]
chunk_size = 800
chunk_overlap = 200
[embeddings]
provider = "ollama"
model = "nomic-embed-text"
device = "cuda"
[entity_extraction]
method = "gleaning"
llm_model = "llama3.1:8b"
[graph]
backend = "qdrant"
enable_pagerank = true
[retrieval]
strategy = "hybrid"
enable_lightrag = true
[generation]
backend = "ollama"
model = "llama3.1:8b"
enable_caching = true
Use Case: WASM Browser (Privacy-First)
[embeddings]
provider = "onnx_web"
model = "all-MiniLM-L6-v2"
device = "webgpu"
[entity_extraction]
method = "pattern" # No LLM required
[graph]
backend = "in-memory"
enable_pagerank = true
[retrieval]
strategy = "hybrid"
enable_lightrag = true
[generation]
backend = "webllm"
model = "Phi-3-mini"
Use Case: Testing/Development
[embeddings]
provider = "hash" # <1ms, deterministic
[entity_extraction]
method = "pattern"
[graph]
backend = "in-memory"
[retrieval]
strategy = "vector"
[generation]
backend = "mock" # Instant responses
Module Reference:
- Core Traits:
src/core/traits.rs(lines 1-1291) - All pluggable abstractions - Hybrid Embedder:
src/embeddings/hybrid.rs- Auto-fallback system - Retrieval Strategies:
src/retrieval/- 5 retrieval implementations - Configuration:
src/config/toml_config.rs- TOML-based setup
How to Customize Parameters and Tools
GraphRAG-rs offers 3 progressive levels of customization - from simple TOML files to programmatic trait implementations.
Level 1: TOML Configuration Files (Easiest)
Modify 60+ parameters without touching code using TOML configuration.
Where to Write Alternative Settings?
✅ Option 1: Use Pre-Built Templates (Copy & Modify)
# 1. Copy a template that matches your use case
cp config/templates/narrative_fiction.toml my_config.toml
# 2. Edit the file to change settings
nano my_config.toml
# 3. Run GraphRAG with your config
cargo run --bin simple_cli my_config.toml "Your question"
✅ Option 2: Create Your Own Config File
# 1. Create a new .toml file anywhere
touch my_custom_config.toml
# 2. Add your settings (see examples below)
nano my_custom_config.toml
# 3. Use it
cargo run --bin simple_cli my_custom_config.toml
✅ Option 3: Edit Existing Examples
# Modify the example configs
nano docs-example/symposium_config.toml
nano docs-example/config_tom_sawyer_complete.toml
How TOML Configuration Works
TOML files specify alternative implementations like this:
# Example: my_config.toml
# Stage 2: Choose embedding provider
[embeddings]
provider = "ollama" # Alternative: "neural", "hybrid", "hash"
model = "nomic-embed-text" # Alternative: "all-MiniLM-L6-v2"
device = "cuda" # Alternative: "cpu", "auto"
# Stage 3: Choose entity extraction method
[pipeline.entity_extraction]
model_name = "llama3.1:8b" # Uses LLM for extraction
temperature = 0.1 # Alternative: 0.7 for creative
entity_types = ["PERSON", "LOCATION", "CONCEPT"] # Customize types!
# Stage 5: Choose retrieval strategy
[retrieval]
strategy = "hybrid" # Alternative: "vector", "bm25", "pagerank", "adaptive"
enable_lightrag = true # Alternative: false (standard retrieval)
# Stage 7: Choose LLM backend
[generation]
backend = "ollama" # Alternative: "webllm", "mock"
model = "llama3.1:8b" # Alternative: any Ollama model
enable_caching = true # Alternative: false (no cache)
The system automatically uses your settings! No code changes needed.
Pre-Built Templates (Recommended Starting Point)
Located in config/templates/, optimized for different document types:
| Template | Optimized For | Chunk Size | Key Settings |
|---|---|---|---|
narrative_fiction.toml | Books, novels, stories | 800 chars | High overlap (300), character-focused |
academic_research.toml | Papers, studies, theses | 1024 chars | Semantic chunking, citation extraction |
technical_documentation.toml | Manuals, API docs | 512 chars | Code-aware, hierarchical entities |
legal_documents.toml | Contracts, laws | 512 chars | Low temperature (0.1), precision mode |
web_blog_content.toml | Articles, blogs | 600 chars | Fast processing, keyword extraction |
dynamic_universal.toml | General purpose | Adaptive | Auto-detects optimal settings |
Example: Customize for Your Document Type
# 1. Copy a template
cp config/templates/narrative_fiction.toml my_config.toml
# 2. Edit parameters (see full list below)
nano my_config.toml
# 3. Use your config
cargo run --bin simple_cli my_config.toml "Your question"
Complete TOML Configuration Reference
A. General Settings
[general]
input_document_path = "path/to/document.txt" # Your document
output_dir = "./output/my_project" # Results directory
log_level = "info" # error|warn|info|debug|trace
max_threads = 4 # 0 = auto-detect CPU cores
enable_profiling = true # Performance metrics
B. Pipeline Workflows
[pipeline]
workflows = [
"extract_text", # Stage 1: Chunking
"extract_entities", # Stage 3: Entity extraction
"build_graph", # Stage 4: Graph construction
"detect_communities" # Stage 4: Community detection
]
parallel_execution = true # Enable concurrent processing
C. Stage 1: Text Chunking
[pipeline.text_extraction]
chunk_size = 800 # Characters per chunk
chunk_overlap = 300 # Overlap for context (typically 25-50% of chunk_size)
min_chunk_size = 200 # Skip chunks smaller than this
clean_control_chars = true # Remove \r, \t, etc.
normalize_whitespace = true # Collapse multiple spaces
# Optional text cleaning
[pipeline.text_extraction.cleaning]
remove_urls = false # Strip http:// links
remove_emails = false # Strip email addresses
remove_special_chars = false # Keep punctuation by default
D. Stage 2: Embeddings
[embeddings]
provider = "ollama" # Options: ollama, neural, hybrid, hash
model = "nomic-embed-text" # Model name (depends on provider)
dimension = 768 # Embedding vector size
batch_size = 32 # Embeddings per batch
device = "cuda" # Options: cuda, cpu, auto
cache_size = 10000 # Number of cached embeddings
# Ollama-specific settings
[ollama]
base_url = "http://localhost:11434"
embedding_model = "nomic-embed-text"
generation_model = "llama3.1:8b"
timeout_seconds = 300
E. Stage 3: Entity Extraction
[pipeline.entity_extraction]
model_name = "llama3.1:8b" # LLM for extraction
temperature = 0.1 # Lower = more deterministic (0.0-1.0)
max_tokens = 1500 # Maximum response length
confidence_threshold = 0.6 # Minimum confidence to keep entity
# Entity types to extract (fully customizable!)
entity_types = [
"PERSON", # People, characters
"LOCATION", # Places, settings
"CONCEPT", # Abstract ideas, themes
"EVENT", # Actions, occurrences
"ORGANIZATION", # Groups, institutions
"OBJECT", # Physical items
"EMOTION", # Feelings, states
"THEME" # Overarching topics
]
# Advanced: Entity filtering
[pipeline.entity_extraction.filters]
min_entity_length = 2 # Minimum characters
max_entity_length = 100 # Maximum characters
allowed_patterns = [ # Regex patterns to allow
"^[A-Z][a-zA-Z\\s'-]+$" # Capitalized words
]
excluded_patterns = [ # Regex patterns to exclude
"^(the|and|but)$", # Common stop words
"^\\d+$" # Pure numbers
]
# Gleaning (multi-pass extraction)
[entity_extraction]
use_gleaning = true # Enable iterative extraction
max_gleaning_rounds = 4 # Number of refinement passes
gleaning_improvement_threshold = 0.08 # Min improvement to continue
F. Stage 4: Graph Construction
[pipeline.graph_building]
relation_scorer = "cosine_similarity" # or "jaccard", "levenshtein"
min_relation_score = 0.4 # Minimum similarity to create edge
max_connections_per_node = 25 # Limit edges per entity
bidirectional_relations = true # A→B implies B→A
character_centrality_boost = 1.5 # Boost importance of main entities
# Community detection
[pipeline.community_detection]
algorithm = "leiden" # Options: leiden, louvain
resolution = 0.6 # Lower = tighter communities
min_community_size = 2 # Minimum entities per community
max_community_size = 15 # Maximum entities per community
# Semantic merging (entity deduplication)
[entity_extraction]
semantic_merging = true
merge_similarity_threshold = 0.85 # How similar to merge (0.0-1.0)
automatic_linking = true
linking_confidence_threshold = 0.7
G. Stage 5: Retrieval
[retrieval]
strategy = "hybrid" # Options: vector, bm25, pagerank, hybrid, adaptive
k = 10 # Top-k results to return
enable_lightrag = true # Dual-level retrieval
enable_pagerank = true # Graph importance scoring
# Hybrid strategy weights (must sum to ~1.0)
[retrieval.fusion_weights]
vector = 0.4 # Semantic similarity weight
bm25 = 0.3 # Keyword matching weight
pagerank = 0.3 # Graph importance weight
H. Stage 6: Query Processing
[query_processing]
analyzer = "advanced" # Options: basic, advanced, rograg
enable_decomposition = true # Break complex queries into sub-queries
max_sub_queries = 5 # Maximum decomposition depth
confidence_threshold = 0.6 # Minimum confidence for query understanding
I. Stage 7: Answer Generation
[generation]
backend = "ollama" # Options: ollama, webllm, mock
model = "llama3.1:8b"
temperature = 0.7 # Creativity (0.0-1.0)
max_tokens = 1000 # Maximum answer length
top_p = 0.9 # Nucleus sampling (0.0-1.0)
enable_caching = true # Cache LLM responses
cache_ttl_seconds = 3600 # Cache expiration (1 hour)
J. Performance Tuning
[performance]
batch_size = 32 # Items per batch
max_concurrent_requests = 10 # Parallel API calls
embedding_cache_size = 10000 # Cached embeddings
enable_gpu = true # GPU acceleration
gpu_device = 0 # GPU device ID (0 = first GPU)
K. Experimental Features
[experimental]
enable_rograg = true # Query decomposition (+15% accuracy)
enable_fast_graphrag = true # PageRank retrieval (27x faster)
enable_lightrag = true # Dual-level retrieval (6000x tokens)
Real-World Example: Optimizing for Plato’s Symposium
# config/symposium_optimized.toml
[general]
input_document_path = "Symposium.txt"
output_dir = "./output/symposium"
[pipeline.text_extraction]
chunk_size = 800 # Larger for complete philosophical arguments
chunk_overlap = 300 # High overlap for dialogue continuity
[pipeline.entity_extraction]
temperature = 0.1 # Low for consistent concept extraction
entity_types = [
"PERSON", # Socrates, Phaedrus, etc.
"CONCEPT", # Eros, Beauty, Love
"ARGUMENT", # Philosophical positions
"DIALOGUE_SPEAKER", # Who said what
"MYTHOLOGICAL_REFERENCE" # Gods, myths
]
confidence_threshold = 0.6 # Lower for philosophical nuance
[pipeline.graph_building]
min_relation_score = 0.4 # Lower for subtle philosophical connections
max_connections_per_node = 25 # Higher for complex concept networks
[retrieval]
strategy = "hybrid" # Best for philosophical queries
enable_lightrag = true
fusion_weights = { vector = 0.5, bm25 = 0.2, pagerank = 0.3 }
Results:
- ✅ Captures 189 philosophical entities (vs 120 with defaults)
- ✅ Identifies speaker-argument relationships
- ✅ 85% query accuracy on philosophical questions
Level 2: Runtime API Configuration (Intermediate)
Modify parameters programmatically using the Builder API.
#![allow(unused)]
fn main() {
use graphrag_rs::{GraphRAG, ConfigPreset};
let mut graphrag = GraphRAG::builder()
// Choose preset as starting point
.with_preset(ConfigPreset::PerformanceOptimized)
// Override specific parameters
.chunk_size(1024) // Stage 1
.chunk_overlap(256)
.embedding_model("all-mpnet-base-v2") // Stage 2
.embedding_dimension(768)
.entity_confidence(0.7) // Stage 3
.max_gleaning_rounds(3)
.enable_pagerank(true) // Stage 4
.enable_lightrag(true) // Stage 5
.retrieval_strategy("hybrid") // Stage 5
.top_k(15)
.llm_temperature(0.8) // Stage 7
.max_tokens(1500)
// Auto-detect available tools
.auto_detect_llm()
.auto_detect_embedder()
.build()?;
// Process document
graphrag.add_document("Your text")?;
// Query with custom parameters
let answer = graphrag.ask_with_params(
"Your question",
QueryParams {
max_results: 10,
min_confidence: 0.7,
enable_decomposition: true,
}
)?;
}
Available Builder Methods:
| Category | Methods | Description |
|---|---|---|
| Text Processing | chunk_size(), chunk_overlap(), min_chunk_size() | Stage 1 chunking |
| Embeddings | embedding_model(), embedding_dimension(), embedding_provider() | Stage 2 vectors |
| Entity Extraction | entity_confidence(), max_gleaning_rounds(), entity_types() | Stage 3 NER |
| Graph | enable_pagerank(), enable_incremental(), graph_backend() | Stage 4 graph |
| Retrieval | retrieval_strategy(), enable_lightrag(), top_k() | Stage 5 search |
| Query | query_analyzer(), enable_decomposition() | Stage 6 understanding |
| Generation | llm_model(), llm_temperature(), max_tokens(), enable_caching() | Stage 7 LLM |
Level 3: Custom Trait Implementations (Advanced)
Replace entire pipeline stages with custom implementations.
Example: Custom Embedder
#![allow(unused)]
fn main() {
use graphrag_rs::core::traits::{Embedder, Result};
pub struct MyCustomEmbedder {
api_key: String,
model: String,
}
impl Embedder for MyCustomEmbedder {
type Error = std::io::Error;
fn embed(&self, text: &str) -> Result<Vec<f32>> {
// Your custom embedding logic
// Call external API, use custom model, etc.
let embedding = my_api_call(text, &self.api_key)?;
Ok(embedding)
}
fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>> {
texts.iter()
.map(|text| self.embed(text))
.collect()
}
fn dimension(&self) -> usize {
1024 // Your embedding dimension
}
fn is_ready(&self) -> bool {
!self.api_key.is_empty()
}
}
// Use your custom embedder
let custom_embedder = MyCustomEmbedder {
api_key: "your-key".to_string(),
model: "custom-model-v1".to_string(),
};
let graphrag = GraphRAG::builder()
.with_embedder(Box::new(custom_embedder))
.build()?;
}
Example: Custom Entity Extractor
#![allow(unused)]
fn main() {
use graphrag_rs::core::traits::{EntityExtractor, Result};
use graphrag_rs::core::Entity;
pub struct MyCustomNER {
model_path: String,
}
impl EntityExtractor for MyCustomNER {
type Entity = Entity;
type Error = std::io::Error;
fn extract(&self, text: &str) -> Result<Vec<Entity>> {
// Your custom NER logic
// Could use spaCy, Flair, custom ML model, etc.
let entities = my_ner_model(text, &self.model_path)?;
Ok(entities)
}
fn extract_with_confidence(&self, text: &str) -> Result<Vec<(Entity, f32)>> {
let entities = self.extract(text)?;
entities.into_iter()
.map(|e| (e, 0.95)) // Add confidence scores
.collect()
}
fn set_confidence_threshold(&mut self, threshold: f32) {
// Store threshold for filtering
}
}
}
Available Traits to Implement
| Trait | Stage | What You Can Replace |
|---|---|---|
Embedder / AsyncEmbedder | 2 | Embedding generation (OpenAI, Cohere, custom) |
EntityExtractor / AsyncEntityExtractor | 3 | Entity extraction (spaCy, Flair, custom NER) |
VectorStore / AsyncVectorStore | 5 | Vector search (Pinecone, Weaviate, Milvus) |
Retriever / AsyncRetriever | 5 | Retrieval strategy (custom ranking, filters) |
LanguageModel / AsyncLanguageModel | 7 | LLM generation (OpenAI, Anthropic, local) |
GraphStore / AsyncGraphStore | 4 | Graph storage (Neo4j, ArangoDB, custom) |
Storage / AsyncStorage | All | Persistence layer (PostgreSQL, MongoDB) |
See: src/core/traits.rs (lines 1-1291) for complete trait definitions.
Configuration Validation & Testing
# 1. Validate TOML configuration
cargo run --bin simple_cli my_config.toml --validate
# 2. Dry-run with mock LLM (instant, no API calls)
cargo run --bin simple_cli my_config.toml --dry-run
# 3. Profile performance with your config
cargo run --bin simple_cli my_config.toml --profile
# 4. Compare configurations
cargo run --bin benchmark_configs config1.toml config2.toml
Quick Reference: Key Parameters by Use Case
| Use Case | Chunk Size | Overlap | Temperature | Entity Confidence | Retrieval |
|---|---|---|---|---|---|
| Fiction/Novels | 800 | 300 (38%) | 0.7 | 0.6 | hybrid |
| Academic Papers | 1024 | 256 (25%) | 0.1 | 0.7 | vector |
| Legal Documents | 512 | 128 (25%) | 0.1 | 0.8 | bm25 |
| Technical Docs | 512 | 200 (39%) | 0.3 | 0.7 | hybrid |
| Blog Posts | 600 | 150 (25%) | 0.5 | 0.6 | adaptive |
| Philosophical Texts | 800 | 300 (38%) | 0.1 | 0.6 | hybrid |
Pro Tips:
- Start with templates:
config/templates/covers 90% of use cases - Iterate: Run with defaults → profile → adjust → rerun
- Document-specific: Longer chunks (800-1024) for narrative, shorter (512) for technical
- Temperature: Lower (0.1-0.3) for factual, higher (0.7-0.9) for creative
- Confidence threshold: Lower (0.5-0.6) for nuanced texts, higher (0.7-0.8) for precision
- Retrieval:
hybridis best general-purpose,bm25for exact matches,vectorfor semantic
Module References:
- TOML Config:
src/config/toml_config.rs- All configuration structures - Builder API:
src/builder.rs- Fluent API for runtime config - Core Traits:
src/core/traits.rs- Pluggable implementations - Templates:
config/templates/- Pre-optimized configurations
Three Deployment Architectures
GraphRAG-rs uniquely supports three distinct deployment modes - choose based on your requirements:
1. Server-Only (Production Ready ✅)
Architecture:
┌─────────────┐
│ Client App │ (React/Vue/Mobile)
└──────┬──────┘
│ REST API
┌──────▼────────────────────┐
│ graphrag-server │
│ ├─ Actix-web REST API │
│ ├─ Apistos OpenAPI 3.0.3 │
│ ├─ Qdrant Vector DB │
│ ├─ Ollama Embeddings │
│ └─ GPU Acceleration │
└───────────────────────────┘
Best For:
- Multi-tenant SaaS (>1000 users)
- Large datasets (>1M documents)
- GPU-accelerated inference
- Mobile apps (thin clients)
Tech Stack:
Backend: Rust + Actix-web 4.9 + Apistos (OpenAPI 3.0.3) + Tokio
Vector DB: Qdrant (scales to 100M+ vectors)
Embeddings: Ollama (nomic-embed-text, GPU)
LLM: Ollama (llama3.1:8b, GPU)
Binary Size: 5.2 MB (optimized release)
Performance:
- Startup: <1s
- Query: 500ms-2s (end-to-end)
- Throughput: 20 queries/sec
2. WASM-Only (60% Complete )
Architecture:
┌───────────────────────────┐
│ Browser │
│ ┌─────────────────────┐ │
│ │ Leptos UI (WASM) │ │
│ │ ├─ ONNX Embeddings │ │ ← GPU via WebGPU
│ │ ├─ WebLLM Inference │ │ ← 40-62 tok/s GPU
│ │ ├─ Voy Vector Search│ │ ← 75KB pure Rust
│ │ └─ IndexedDB Storage│ │ ← Offline persistence
│ └─────────────────────┘ │
└───────────────────────────┘
↑ NO SERVER REQUIRED!
Best For:
- Privacy-first applications
- Offline-first tools
- Zero infrastructure cost
- Edge deployment (CDN)
Tech Stack:
Frontend: Leptos 0.8 + Trunk
ML: ONNX Runtime Web (WebGPU, 3-8ms embeddings)
LLM: WebLLM (WebGPU, 40-62 tok/s)
Vector Search: Voy (75KB k-d tree)
Storage: IndexedDB + Cache API
WASM Size: ~2MB (gzipped)
Performance:
- Cold start: 2-3s (model loading)
- Embeddings: 3-8ms per chunk (GPU)
- LLM: 40-62 tok/s (GPU)
- Storage: 50% browser quota (~5-10GB)
3. Hybrid (Planned )
Architecture:
┌───────────────────────────┐
│ Browser │
│ ┌─────────────────────┐ │
│ │ WASM Client (Fast) │ │ ← Real-time UI
│ │ + GPU Embeddings │ │ ← 3-8ms GPU
│ │ + Local Cache │ │ ← Offline-first
│ └──────────┬──────────┘ │
└─────────────┼─────────────┘
│ Optional WebSocket
┌─────────────▼─────────────┐
│ Server (Heavy Compute) │
│ ├─ Batch Processing │ ← Large documents
│ ├─ Multi-user Sync │ ← Shared knowledge
│ └─ Background Jobs │ ← Scheduled updates
└───────────────────────────┘
Best For:
- Enterprise applications
- Multi-device sync
- Best UX + Scalability
- Collaborative knowledge management
Status: Architecture designed, Phase 3 implementation
Optional Components & Features
GraphRAG-rs is modular - enable only what you need via feature flags:
LightRAG (Dual-Level Retrieval)
What: Searches entities (low-level) + communities (high-level) simultaneously
Impact:
- ✅ 6000x token reduction vs traditional GraphRAG
- ✅ 60ms query time (vs 2-5 seconds)
- ✅ Better context retention
Enable:
# Cargo.toml
[features]
lightrag = []
# Usage
cargo build --features lightrag
Module: src/lightrag/dual_retrieval.rs
PageRank (Fast-GraphRAG)
What: Ranks entities by graph importance, personalizing to query context
Impact:
- ✅ 27x performance boost in retrieval
- ✅ 6x cost reduction
- ✅ Better relevance ranking
Enable:
[features]
pagerank = []
# Usage
cargo build --features pagerank
Module: src/graph/pagerank.rs
ROGRAG (Query Decomposition)
What: Breaks complex queries into sub-queries with logic-based reasoning
Impact:
- ✅ 15% accuracy improvement (60% → 75%)
- ✅ Handles multi-hop questions
- ✅ Structured reasoning traces
Enable:
[features]
rograg = []
Module: src/rograg/logic_form.rs
GPU Acceleration
Options:
| Backend | Platform | Performance | Module |
|---|---|---|---|
| CUDA | NVIDIA | 20-50x speedup | --features cuda |
| Metal | Apple Silicon | 15-30x speedup | --features metal |
| Vulkan | Cross-platform | 10-25x speedup | --features vulkan |
| WebGPU | Browser | 25-40x speedup | --features webgpu |
Example:
# NVIDIA GPU acceleration
cargo build --release --features "neural-embeddings,cuda,ollama"
# Apple Silicon
cargo build --release --features "neural-embeddings,metal,ollama"
Intelligent Caching
What: Caches LLM responses with semantic key generation
Impact:
- ✅ 80%+ hit rate in production
- ✅ 6x cost reduction
- ✅ 16-20x latency reduction (100ms → 5ms)
Enable:
[features]
caching = ["moka"]
Module: src/caching/cached_client.rs
Monitoring & Metrics
GraphRAG-rs includes comprehensive performance tracking across the entire pipeline.
PipelineStage Tracking
#![allow(unused)]
fn main() {
// src/monitoring/metrics.rs
pub enum PipelineStage {
QueryExpansion,
HybridRetrieval,
BM25Search,
VectorSearch,
ResultFusion,
Reranking,
ConfidenceFiltering,
TotalPipeline,
}
}
Real-Time Metrics
#![allow(unused)]
fn main() {
let mut timer = TimingBreakdown::new();
timer.start_stage(PipelineStage::VectorSearch);
let results = vector_search(query).await?;
let duration = timer.end_stage(PipelineStage::VectorSearch);
println!("Vector search: {:?}", duration);
// Output: Vector search: 23ms
}
Performance Breakdown
Query Performance Breakdown:
Total time: 342ms
Expanded queries: 3
Raw results: 45
Final results: 10
Average confidence: 0.87
Stage timings:
QueryExpansion: 52ms (15.2%)
VectorSearch: 103ms (30.1%)
BM25Search: 45ms (13.2%)
ResultFusion: 67ms (19.6%)
Reranking: 48ms (14.0%)
ConfidenceFiltering: 27ms (7.9%)
Module: src/monitoring/metrics.rs, src/monitoring/benchmark.rs
Learn More
Documentation
- ARCHITECTURE.md - Deep technical dive into implementation
- examples/ - Hands-on code examples
- IMPLEMENTATION_PLAN.md - Development roadmap
- diagram.md - Visual architecture diagrams
Practical Examples
Getting Started:
examples/01_basic_usage.rs- One-line APIexamples/02_stateful_api.rs- Multi-query sessionsexamples/03_builder_api.rs- Full configuration
Advanced:
examples/real_ollama_pipeline.rs- Complete 7-stage walkthroughexamples/multi_document_pipeline.rs- Incremental graph constructionexamples/graphrag_multi_doc_server.rs- Production REST API
Configuration Templates
Pre-optimized configs for different document types:
config/templates/
├── narrative_fiction.toml # Books, novels (800-char chunks)
├── academic_research.toml # Papers, studies (1024-char chunks)
├── technical_documentation.toml # Manuals, specs (512-char chunks)
├── legal_documents.toml # Contracts, laws (512-char, low temp)
├── web_blog_content.toml # Articles, blogs (600-char chunks)
└── dynamic_universal.toml # General-purpose (adaptive)
Research Papers
GraphRAG-rs implements cutting-edge research:
-
Microsoft GraphRAG (2024) - “From Local to Global: A Graph RAG Approach”
- Base architecture foundation
- Community detection algorithms
-
Fast-GraphRAG (2024) - PageRank-based retrieval
- 27x performance improvement
- 6x cost reduction
-
LightRAG (2024) - “Simple and Fast Retrieval-Augmented Generation”
- Dual-level retrieval
- 6000x token reduction
-
ROGRAG (2024) - Robust query processing
- Query decomposition
- 60% → 75% accuracy boost
Quick Start: See It In Action
1. One-Liner (Simplest)
#![allow(unused)]
fn main() {
use graphrag_rs::simple;
let answer = simple::answer(
"Tom found treasure in the cave",
"What did Tom find?"
)?;
// Output: "Tom found treasure in the cave."
}
2. Multi-Query Session
#![allow(unused)]
fn main() {
use graphrag_rs::easy::SimpleGraphRAG;
let mut graph = SimpleGraphRAG::from_text("Your document")?;
graph.ask("What are the main themes?")?;
graph.ask("Who are the characters?")?;
}
3. Production Server
# Start Ollama
ollama serve &
ollama pull llama3.1:8b
ollama pull nomic-embed-text
# Start GraphRAG server
export EMBEDDING_BACKEND=ollama
cargo run --release --bin graphrag-server --features "qdrant,ollama"
# Query via REST API
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{"query": "What did Tom find in the cave?"}'
4. WASM Browser (100% client-side)
cd graphrag-wasm
trunk serve --open
# Visit http://localhost:8080
# Upload document → Build graph → Query → Get answers (100% client-side!)
Configuration-Driven Behavior: Complete Examples
Example 1: Fast Pattern-Based Pipeline (No LLM)
Use Case: Testing, development, offline deployment, resource-constrained environments
Configuration (fast_config.toml):
[general]
log_level = "info"
[entity_extraction]
enabled = true
min_confidence = 0.7
use_gleaning = false # ← Pattern-based extraction
[ollama]
enabled = false # ← No LLM required
[embeddings]
backend = "hash" # ← Fast hash-based embeddings
dimension = 128
[retrieval]
strategy = "vector" # ← Simple vector search
Runtime Behavior:
[INFO] Configuration loaded from: fast_config.toml
[INFO] Using pattern-based entity extraction
✓ Regex + capitalization-based
✓ No LLM required
[INFO] Using hash-based embeddings (128 dimensions)
[INFO] Using vector retrieval strategy
Pipeline Performance:
Chunking: 0.01s
Embeddings: 0.002s (<1ms per chunk)
Entity Extraction: 0.005s (<10ms per chunk)
Graph Construction: 0.05s
Query Processing: 0.03s
TOTAL: 0.097s (~100ms)
Results: ✅ Ultra-fast, ✅ No dependencies, ✅ Offline-capable, Good quality (not excellent)
Example 2: High-Accuracy LLM Pipeline (Symposium Philosophy)
Use Case: Academic analysis, philosophical texts, high-quality extraction
Configuration (symposium_config.toml):
[general]
input_document_path = "info/Symposium.txt"
log_level = "info"
[entity_extraction]
enabled = true
min_confidence = 0.6 # ← Lower for philosophical nuance
use_gleaning = true # ← LLM-based extraction
max_gleaning_rounds = 4 # ← 4 refinement passes
semantic_merging = true
automatic_linking = true
[pipeline.entity_extraction]
model_name = "llama3.1:8b"
temperature = 0.1 # ← Low for consistent concept extraction
entity_types = [
"PERSON", # Socrates, Phaedrus
"CONCEPT", # Eros, Beauty, Love
"ARGUMENT", # Philosophical positions
"MYTHOLOGICAL_REFERENCE" # Gods, myths
]
[ollama]
enabled = true
host = "http://localhost"
port = 11434
chat_model = "llama3.1:8b" # ← AI-powered extraction
embedding_model = "nomic-embed-text"
fallback_to_hash = false # ← Error if Ollama fails
[embeddings]
backend = "ollama"
model = "nomic-embed-text"
dimension = 768
[retrieval]
strategy = "hybrid" # ← Best for philosophical queries
enable_lightrag = true
Runtime Behavior:
[INFO] Configuration loaded from: symposium_config.toml
[INFO] Using LLM-based entity extraction with gleaning (max_rounds: 4)
✓ Ollama client initialized
✓ Model: llama3.1:8b
✓ Entity types: PERSON, CONCEPT, ARGUMENT, MYTHOLOGICAL_REFERENCE
Processing Symposium.txt (189 KB, 455 chunks):
Chunk 1/455:
Round 1: Extract entities → Found 8 entities (PERSON: 2, CONCEPT: 4, ARGUMENT: 2)
Round 2: "Did you miss any entities?" → Found 2 more (CONCEPT: 2)
Round 3: "Find relationships" → Found 3 relationships
Round 4: "Final check" → Found 1 subtle concept
✅ Extraction complete: 11 entities, 3 relationships
... (processing all chunks) ...
[INFO] Final Results:
Entities: 317 (PERSON: 89, CONCEPT: 156, ARGUMENT: 45, MYTHOLOGICAL_REFERENCE: 27)
Relationships: 455
Communities: 12 (speaker groups, concept clusters)
Processing Time: 325ms per chunk average
[INFO] Using Ollama embeddings: nomic-embed-text (768 dimensions)
[INFO] Using hybrid retrieval: vector (40%) + bm25 (30%) + pagerank (30%)
Query: "What is love according to Socrates?"
VectorSearch: 123ms
BM25Search: 45ms
PageRankScore: 67ms
Fusion (RRF): 28ms
TOTAL: 263ms
Answer: "According to Socrates in the Symposium, love (Eros) is the
pursuit of beauty and wisdom. Socrates relates Diotima's teaching
that love is not a god but a spirit that mediates between mortals
and the divine..."
Results: ★★★★★ Excellent quality, ✅ Contextual understanding, ✅ Custom entity types, Requires Ollama/GPU
Example 3: Hybrid Configuration (Tom Sawyer Narrative)
Use Case: Fiction analysis, balanced quality/performance
Configuration (tom_sawyer_config.toml):
[entity_extraction]
enabled = true
min_confidence = 0.65
use_gleaning = true # ← LLM-based
max_gleaning_rounds = 2 # ← Only 2 rounds (faster)
[ollama]
enabled = true
chat_model = "llama3.1:8b"
[embeddings]
backend = "ollama" # ← Real semantic embeddings
model = "nomic-embed-text"
fallback_to_hash = true # ← Fallback if Ollama unavailable
[retrieval]
strategy = "hybrid"
enable_lightrag = true # ← Dual-level retrieval
Runtime Behavior:
[INFO] Using LLM-based entity extraction with gleaning (max_rounds: 2)
[INFO] Using Ollama embeddings with hash fallback
Processing Tom Sawyer (434 KB, 492 chunks):
Chunking: 0.01s
Embeddings: 0.08s (Ollama, 768-dim)
Entity Extraction: 0.6s (LLM, 2 rounds)
Graph Construction: 0.05s
TOTAL: 0.74s (~750ms)
Query: "How did Tom and Huck find the treasure?"
Low-level retrieval: 23ms (entities: Tom, Huck, treasure)
High-level retrieval: 31ms (community: treasure hunting storyline)
Fusion: 12ms
TOTAL: 66ms
Answer: "Tom and Huck discovered the treasure in McDougal's Cave after
witnessing Injun Joe hide it there..."
Results: ★★★★ Very good quality, Balanced performance, ✅ Fallback safety
Configuration Comparison Matrix
| Config | Entity Extraction | Embeddings | Query Time | Quality | Best For |
|---|---|---|---|---|---|
| Fast | Pattern (10ms) | Hash | 100ms | ★★★ Good | Testing, offline |
| Symposium | LLM 4-round (1.2s) | Ollama | 263ms | ★★★★★ Excellent | Philosophy, analysis |
| Tom Sawyer | LLM 2-round (600ms) | Ollama | 66ms | ★★★★ Very good | Fiction, balanced |
Key Insight: The same codebase adapts automatically - you control behavior through configuration!
Key Takeaways
- 7 Stages: Text → Chunks → Vectors → Entities → Graph → Retrieval → Query → Answer
- 3 Architectures: Server-Only ✅ | WASM-Only | Hybrid
- Configuration-Driven: Same code, different behavior via TOML settings
- Dynamic Selection: Pipeline adapts based on
use_gleaning,ollama.enabled,retrieval.strategy - State-of-the-Art: LightRAG (6000x reduction) + PageRank (27x speedup) + ROGRAG (+15% accuracy)
- Production-Ready: 5.2MB binary, <1s startup, 500ms-2s queries
- Modular: Enable only what you need via feature flags
- GPU-Accelerated: CUDA, Metal, Vulkan, WebGPU support
GraphRAG transforms documents into intelligent knowledge that answers questions with unprecedented accuracy and context awareness - all controlled by simple TOML configuration.
Last Updated: October 2025 | GraphRAG-rs v1.0
LazyGraphRAG / E2GraphRAG
{{#include ../../../docs/LAZYGRAPHRAG_E2GRAPHRAG.md}}
Configuration Guide
{{#include ../../../docs/CONFIGURATION_GUIDE.md}}
JSON5 Configuration System for GraphRAG
Type-safe, validated configuration for GraphRAG pipelines.
Table of Contents
Why JSON5?
The Critical Advantage: Comments!
Unlike standard JSON, JSON5 allows comments to document your configuration choices:
❌ Standard JSON:
{
"temperature": 0.1,
"chunk_size": 800
}
No comments allowed - JSON syntax forbids comments entirely!
✅ JSON5:
{
// Low temperature for consistent character analysis
"temperature": 0.1, // 0.05-0.3 optimal for narrative (IBM 2024)
// Larger chunks capture complete narrative scenes
"chunk_size": 800, // LlamaIndex research: 800-1024 for narratives
}
Comments everywhere - document choices, cite research, explain “why”!
JSON5 Features
-
Comments (
//and/* */)- Document WHY you chose parameter values
- Add research references inline
- Explain domain-specific choices
-
Trailing Commas ✅
{ "a": 1, "b": 2, // ← This trailing comma is valid! } -
Flexible Syntax
- More forgiving than strict JSON
- Numbers:
+123,0xFF,Infinity,NaN - Multi-line strings
- Unquoted keys (we use quoted for consistency)
-
Schema Validation
- Real-time autocomplete in VSCode
- Catch errors before runtime
- Range and enum validation
- Hover documentation
JSON5 vs JSON
| Feature | JSON | JSON5 |
|---|---|---|
| Comments | ❌ | ✅ // or /* */ |
| Trailing commas | ❌ | ✅ |
| Unquoted keys | ❌ | ✅ |
| Numbers | Limited | +123, 0xFF, Infinity |
| Strings | Single line | Multi-line |
| Schema support | ✅ | ✅ |
| Autocomplete | ✅ | ✅ |
| Validation | ✅ | ✅ |
Winner: JSON5 = Best of JSON (tooling) + Comments + Flexible syntax
Quick Start
1. Use an Existing Template
GraphRAG provides 13 pre-configured templates for different use cases:
# List available templates
ls config/templates/*.graphrag.json5
# Copy a template
cp config/templates/narrative_fiction.graphrag.json5 my_config.graphrag.json5
# Edit with autocomplete in VSCode!
code my_config.graphrag.json5
Available templates:
semantic_pipeline.graphrag.json5- LLM-based semantic analysisalgorithmic_pipeline.graphrag.json5- Fast pattern-based extractionhybrid_pipeline.graphrag.json5- Combined semantic + algorithmicnarrative_fiction.graphrag.json5- Novels, stories, literaturetechnical_documentation.graphrag.json5- API docs, manualsacademic_research.graphrag.json5- Research papers, theseslegal_documents.graphrag.json5- Contracts, regulationsweb_blog_content.graphrag.json5- Blog posts, articles- And more!
2. Template Structure
{
// ==========================================================================
// GraphRAG Configuration - YOUR PROJECT NAME
// ==========================================================================
// VSCode: This file has autocomplete! Press Ctrl+Space for suggestions.
// ==========================================================================
"$schema": "../schema/graphrag-config.schema.json",
"mode": {
"approach": "semantic" // Options: semantic | algorithmic | hybrid
},
"general": {
"input_document_path": "path/to/your/document.txt",
"output_dir": "./output/analysis",
"log_level": "info",
"max_threads": 4
},
"pipeline": {
"workflows": ["extract_text", "extract_entities", "build_graph"],
"text_extraction": {
"chunk_size": 800,
"chunk_overlap": 300
},
"entity_extraction": {
"model_name": "llama3.1:8b",
"temperature": 0.1,
"entity_types": ["PERSON", "LOCATION", "EVENT"]
}
},
"ollama": {
"enabled": true,
"chat_model": "llama3.1:8b",
"embedding_model": "nomic-embed-text"
}
}
3. Load in Rust (Coming Soon)
use graphrag_core::config::json5_loader::load_json5_config;
fn main() -> Result<()> {
let config: GraphRAGConfig = load_json5_config("my_config.graphrag.json5")?;
println!("Approach: {:?}", config.mode.approach);
Ok(())
}
VSCode Setup
Automatic Setup (Already Done!)
The repository includes:
.vscode/settings.json- Schema mapping for*.graphrag.json5files.vscode/graphrag.code-snippets- Quick templates
What You Get
1. Autocomplete (Press Ctrl+Space)
{
"mode": {
"approach": "" // ← Press Ctrl+Space here: semantic | algorithmic | hybrid
}
}
2. Real-time Validation
{
"general": {
"max_threads": 999 // ❌ Red underline: Maximum is 128
}
}
3. Hover Documentation
- Hover over any field
- See description, valid range, default value
- Research-based recommendations
4. Error Prevention
{
"mode": {
"approach": "invalid" // ❌ Error: must be semantic/algorithmic/hybrid
},
"text_processing": {
"chunk_size": 99999 // ❌ Error: maximum is 4096
}
}
Manual Setup (If Needed)
If autocomplete doesn’t work automatically:
- Open VSCode Settings (Ctrl+,)
- Search for “json.schemas”
- Verify this mapping exists:
"json.schemas": [{ "fileMatch": ["*.graphrag.json5", "*.graphrag.json"], "url": "./config/schema/graphrag-config.schema.json" }] - Reload VSCode: Ctrl+Shift+P → “Reload Window”
Creating Configurations
Option 1: Copy a Template
Start with a template matching your use case:
# For semantic pipeline (LLM-based, high quality)
cp config/templates/semantic_pipeline.graphrag.json5 my_config.graphrag.json5
# For narrative fiction (novels, stories)
cp config/templates/narrative_fiction.graphrag.json5 my_novel_config.graphrag.json5
# For technical docs (API documentation, manuals)
cp config/templates/technical_documentation.graphrag.json5 my_api_docs.graphrag.json5
# For hybrid approach (balanced quality and speed)
cp config/templates/hybrid_pipeline.graphrag.json5 my_hybrid_config.graphrag.json5
Then customize:
- Update
input_document_path - Adjust
output_dir - Customize
entity_typesfor your domain - Tune parameters based on your needs
Option 2: Build from Scratch
In VSCode:
- Create
my_config.graphrag.json5 - Add schema reference:
{ "$schema": "../config/schema/graphrag-config.schema.json" } - Press Ctrl+Space and follow autocomplete suggestions!
The schema will guide you through all required and optional fields.
Option 3: Use Code Snippets
In VSCode:
- Create new file:
my_config.graphrag.json5 - Type
graphrag-semanticand press Tab - Full template inserted!
Available snippets:
graphrag-semantic- Semantic pipeline templategraphrag-algorithmic- Algorithmic pipeline templategraphrag-hybrid- Hybrid pipeline template
✅ Validation
Real-time (VSCode)
Errors show immediately as you type:
{
"mode": {
"approach": "semantic"
},
"general": {
"max_threads": 999, // ❌ Error: Maximum is 128
"log_level": "invalid" // ❌ Error: Must be trace/debug/info/warn/error
},
"ollama": {
"temperature": 5.0 // ❌ Error: Maximum is 2.0
}
}
CLI Validation
Validate before running your application:
# Validate single config
uv run --with jsonschema --with json5 python scripts/validate_json5_configs.py \
--config my_config.graphrag.json5
# Validate all configs in directory
uv run --with jsonschema --with json5 python scripts/validate_json5_configs.py \
--dir config/templates
# Custom schema
uv run --with jsonschema --with json5 python scripts/validate_json5_configs.py \
--config my_config.json5 \
--schema path/to/schema.json
Output:
Validating 1 configuration file(s)...
✅ my_config.graphrag.json5
============================================================
Validation Complete: 1/1 valid
All configurations are valid!
Error output example:
❌ my_config.graphrag.json5
• Path: general → max_threads
Error: 999 is greater than the maximum of 128
Allowed range: 1-128
• Path: mode → approach
Error: 'invalid' is not one of ['semantic', 'algorithmic', 'hybrid']
Allowed values: "semantic", "algorithmic", "hybrid"
Programmatic Validation (Rust - Coming Soon)
use graphrag_core::config::schema_validator::validate_config_file;
fn main() -> Result<()> {
validate_config_file(
"my_config.graphrag.json5",
"config/schema/graphrag-config.schema.json"
)?;
println!("✅ Configuration is valid!");
Ok(())
}
Examples
Example 1: Minimal Semantic Config
{
"$schema": "../config/schema/graphrag-config.schema.json",
"mode": { "approach": "semantic" },
"general": {
"input_document_path": "data/document.pdf",
"output_dir": "./output"
},
"pipeline": {
"workflows": ["extract_text", "extract_entities", "build_graph"]
},
"ollama": {
"enabled": true,
"host": "http://localhost",
"port": 11434,
"chat_model": "llama3.1:8b"
}
}
Example 2: Narrative Fiction
{
"$schema": "../config/schema/graphrag-config.schema.json",
"mode": { "approach": "semantic" },
"general": {
"input_document_path": "novels/tom_sawyer.txt",
"output_dir": "./output/narrative",
"log_level": "info"
},
// Narrative-optimized chunking (LlamaIndex 2024 research)
"pipeline": {
"text_extraction": {
"chunk_size": 800, // Captures complete scenes
"chunk_overlap": 300, // 37.5% overlap for character continuity
"min_chunk_size": 200
},
"entity_extraction": {
"model_name": "llama3.1:8b",
"temperature": 0.1, // Low for consistent character analysis
"entity_types": [
"PERSON", // Characters
"CHARACTER_TRAIT", // Personality, appearance
"LOCATION", // Settings, places
"EMOTION", // Emotional states
"THEME", // Literary themes
"RELATIONSHIP", // Character relationships
"EVENT" // Plot events
],
"confidence_threshold": 0.6 // Captures literary nuances
}
},
"ollama": {
"enabled": true,
"chat_model": "llama3.1:8b",
"generation": {
"temperature": 0.3, // Balanced for narrative analysis
"max_tokens": 1500
}
}
}
Example 3: Technical Documentation
{
"$schema": "../config/schema/graphrag-config.schema.json",
"mode": { "approach": "semantic" },
"general": {
"input_document_path": "docs/api_reference.md",
"output_dir": "./output/tech_docs"
},
// Technical precision (Databricks 2024 research)
"pipeline": {
"text_extraction": {
"chunk_size": 512, // Smaller chunks for precision
"chunk_overlap": 100, // 20% minimal overlap
"min_chunk_size": 128
},
"entity_extraction": {
"model_name": "llama3.1:8b",
"temperature": 0.05, // Maximum precision
"entity_types": [
"API_ENDPOINT", // REST endpoints
"FUNCTION", // Functions, methods
"PARAMETER", // Function parameters
"ERROR_CODE", // Error codes, exceptions
"LIBRARY", // External libraries
"VERSION", // Version numbers
"DATA_TYPE" // Data types
],
"confidence_threshold": 0.8 // High accuracy for technical content
}
},
"ollama": {
"enabled": true,
"generation": {
"temperature": 0.1, // Very low for technical precision
"max_tokens": 1200
}
}
}
Example 4: Hybrid Pipeline
{
"$schema": "../config/schema/graphrag-config.schema.json",
// Hybrid: Combines semantic (LLM) + algorithmic (patterns)
"mode": { "approach": "hybrid" },
"general": {
"input_document_path": "data/mixed_content",
"output_dir": "./output/hybrid"
},
"pipeline": {
"workflows": ["extract_text", "extract_entities", "build_graph"],
"text_extraction": {
"chunk_size": 600,
"chunk_overlap": 150
},
"entity_extraction": {
"model_name": "llama3.1:8b",
"temperature": 0.15,
"entity_types": ["PERSON", "ORGANIZATION", "LOCATION", "CONCEPT"],
"confidence_threshold": 0.6
}
},
"ollama": {
"enabled": true,
"chat_model": "llama3.1:8b",
"fallback_to_hash": true // Graceful degradation if LLM fails
},
"performance": {
"batch_processing": true,
"batch_size": 32,
"worker_threads": 6,
"cache_embeddings": true
}
}
Troubleshooting
Autocomplete Not Working
Problem: No suggestions when typing
Solutions:
- ✅ Verify
$schemafield points to correct path - ✅ Check file extension is
.graphrag.json5or.json5 - ✅ Reload VSCode: Ctrl+Shift+P → “Reload Window”
- ✅ Check
.vscode/settings.jsonhas schema mapping - ✅ Ensure you’re in VSCode (not other editors)
Validation Errors
Problem: Red underlines everywhere
Common Fixes:
| Error | Fix |
|---|---|
Missing required field | Add required fields: mode, general |
Invalid enum value | Use Ctrl+Space to see valid options |
Number out of range | Hover to see valid range (e.g., 0.0-1.0) |
Wrong type | Ensure strings have quotes, numbers don’t |
Additional properties not allowed | Remove unsupported fields |
Example fixes:
// ❌ Wrong
{
"mode": { "approach": "semantic" },
"unsupported_field": "value" // Error: additional property
}
// ✅ Correct
{
"$schema": "../config/schema/graphrag-config.schema.json",
"mode": { "approach": "semantic" },
"general": {
"input_document_path": "data/input.txt",
"output_dir": "./output"
}
}
Schema Path Issues
Problem: VSCode can’t find schema
Solution: Use relative path from config file location:
{
// If config is in project root:
"$schema": "./config/schema/graphrag-config.schema.json",
// If config is in config/:
"$schema": "./schema/graphrag-config.schema.json",
// If config is in config/templates/:
"$schema": "../schema/graphrag-config.schema.json"
}
“Property keys must be doublequoted” Warning
Problem: VSCode shows warnings on unquoted keys (e.g., mode: {...})
Why This Happens:
- VSCode treats
.json5files as JSONC (JSON with Comments) - JSONC requires quoted keys:
"mode": {...} - JSON5 allows unquoted keys:
mode: {...}✅ Valid! - This is a false positive - your JSON5 syntax is correct
Example Warning:
{
mode: { // VSCode warning: "Property keys must be doublequoted"
approach: "semantic"
}
}
Solutions:
Option 1: Ignore the Warnings (Recommended)
- These are cosmetic warnings only
- Your JSON5 files are valid and will work correctly
- The warnings don’t affect functionality
Option 2: Install JSON5 Extension
- Install “JSON5 syntax” extension from VSCode marketplace
- Provides true JSON5 language support
- Eliminates false positives
Option 3: Use Quoted Keys
{
"mode": { // ✅ No warning with quoted keys
"approach": "semantic"
}
}
Trade-off: Loses the readability advantage of unquoted keys
Our Recommendation: Ignore the warnings. They’re false positives caused by VSCode’s JSONC mode not fully supporting JSON5’s unquoted key feature. Your configs are valid and will work correctly.
Best Practices
1. Always Use $schema Reference
{
// ✅ First line: enables autocomplete and validation
"$schema": "../config/schema/graphrag-config.schema.json",
// ... rest of config
}
This single line enables:
- ✅ Real-time autocomplete
- ✅ Instant error detection
- ✅ Hover documentation
- ✅ Type validation
2. Document with Comments
{
"pipeline": {
"text_extraction": {
// Research-based: LlamaIndex 2024 study shows 800-1024 optimal
// for narrative continuity and character relationship tracking.
// See: https://www.llamaindex.ai/blog/evaluating-chunk-size
"chunk_size": 800,
// 37.5% overlap preserves scene boundaries and dialogue context.
// Critical for maintaining character consistency across chunks.
// Pinecone 2024: "Chunking Strategies for LLM Applications"
"chunk_overlap": 300
}
}
}
3. Use Descriptive Filenames
✅ Good:
- narrative_dickens_analysis.graphrag.json5
- api_docs_v2_production.graphrag.json5
- legal_contracts_compliance.graphrag.json5
❌ Bad:
- config.json5
- test.json5
- c1.json5
4. Validate Before Running
# Always validate before deploying
uv run --with jsonschema --with json5 python scripts/validate_json5_configs.py \
--config production.graphrag.json5
5. Version Control Your Configs
git add my_project.graphrag.json5
git commit -m "feat: add GraphRAG config for project XYZ"
Keep configs in version control to track changes over time.
6. Document Custom Parameters
{
"entity_extraction": {
// Custom threshold chosen after A/B testing:
// - 0.7: 85% precision, 72% recall
// - 0.6: 78% precision, 84% recall ← chosen
// - 0.5: 65% precision, 91% recall
// Decision: Prioritize recall for this corpus (historical texts)
"confidence_threshold": 0.6
}
}
Advantages Summary
Why JSON5 for GraphRAG?
✅ Comments - Document configuration choices inline ✅ Autocomplete - VSCode suggests all available fields ✅ Validation - Catch errors before runtime ✅ Research Documentation - Cite sources directly in config ✅ Trailing Commas - More forgiving, easier editing ✅ Schema Support - Full IDE integration ✅ Better DX - Faster development, fewer errors ✅ Self-Documenting - Configuration explains itself
Available Templates (All Validated ✅)
All 13 templates pass JSON Schema validation:
- ✅
semantic_pipeline.graphrag.json5- General semantic - ✅
algorithmic_pipeline.graphrag.json5- General algorithmic - ✅
hybrid_pipeline.graphrag.json5- General hybrid - ✅
narrative_fiction.graphrag.json5- Novels, stories - ✅
technical_documentation.graphrag.json5- API docs, manuals - ✅
academic_research.graphrag.json5- Research papers - ✅
legal_documents.graphrag.json5- Contracts, regulations - ✅
web_blog_content.graphrag.json5- Blog posts, articles - ✅
dynamic_universal.graphrag.json5- Adaptive configuration - ✅
enrichment_example.graphrag.json5- Text enrichment - ✅
semantic.graphrag.json5- Basic semantic - ✅
algorithmic.graphrag.json5- Basic algorithmic - ✅
hybrid.graphrag.json5- Basic hybrid
Status: 13/13 pass JSON Schema validation
Additional Resources
- JSON Schema:
config/schema/graphrag-config.schema.json - Template Examples:
config/templates/*.graphrag.json5 - Validation Scripts:
scripts/README.md - VSCode Settings:
.vscode/settings.json - Code Snippets:
.vscode/graphrag.code-snippets
Common Questions
Q: What file extension should I use?
A: Use .graphrag.json5 for automatic schema mapping, or .json5 for general JSON5 files.
Q: Can I use regular JSON instead of JSON5? A: Yes! JSON5 is a superset of JSON. Any valid JSON is valid JSON5. But you’ll lose the ability to add comments.
Q: How do I know which template to use? A: Match your content type:
- Novels/stories →
narrative_fiction - API docs →
technical_documentation - Research papers →
academic_research - Legal docs →
legal_documents - Mixed content →
hybrid_pipeline
Q: What if I need to customize entity types?
A: Edit the entity_types array in your config:
"entity_types": [
"CUSTOM_TYPE_1",
"CUSTOM_TYPE_2",
"PERSON",
"LOCATION"
]
Q: How do I tune for my specific domain? A: Start with the closest template, then adjust:
chunk_size- larger for better context, smaller for precisionconfidence_threshold- higher for precision, lower for recallentity_types- add domain-specific typestemperature- lower for consistency, higher for variety
Ready to start?
cp config/templates/semantic_pipeline.graphrag.json5 my_config.graphrag.json5
code my_config.graphrag.json5
Press Ctrl+Space and let autocomplete guide you!
Auto-Save & Persistence
{{#include ../../../docs/AUTO_SAVE_CONFIGURATION.md}}
Summarization
{{#include ../../../docs/SUMMARIZATION_CONFIG.md}}
Entity Enrichment
{{#include ../../../docs/ENRICHMENT_USAGE_GUIDE.md}}
GLiNER-Relex Extraction
{{#include ../../../docs/GLINER_RELEX_GUIDE.md}}
Incremental Updates
{{#include ../../../docs/INCREMENTAL_UPDATES.md}}
Embeddings Reference
{{#include ../../../docs/EMBEDDINGS_REFERENCE.md}}
Model Recommendations
{{#include ../../../docs/BEST_MODELS_RECOMMENDATION.md}}
Qwen3 Integration
{{#include ../../../docs/QWEN3_INTEGRATION_GUIDE.md}}
GraphRAG Core
The core library for GraphRAG-rs, providing portable functionality for both native and WASM deployments.
Overview
graphrag-core is the foundational library that powers GraphRAG-rs. It provides:
- Embedding Generation: 8 provider backends (HuggingFace, OpenAI, Voyage AI, Cohere, Jina, Mistral, Together AI, Ollama)
- Entity Extraction: TRUE LLM-based gleaning extraction with multi-round refinement (Microsoft GraphRAG-style)
- Graph Construction: Incremental updates, PageRank, community detection
- Retrieval Strategies: Vector, BM25, PageRank, hybrid, adaptive
- Configuration System: Hierarchical TOML-based configuration with environment variable overrides
- Cross-Platform: Works on native (Linux, macOS, Windows) and WASM
Quick Start (5 Lines!)
use graphrag_core::prelude::*;
#[tokio::main]
async fn main() -> Result<()> {
let mut graphrag = GraphRAG::quick_start("Your document text here").await?;
let answer = graphrag.ask("What is the main topic?").await?;
println!("{}", answer);
Ok(())
}
Or with detailed explanations:
#![allow(unused)]
fn main() {
let explained = graphrag.ask_explained("What is the main topic?").await?;
println!("Answer: {}", explained.answer);
println!("Confidence: {:.0}%", explained.confidence * 100.0);
for step in &explained.reasoning_steps {
println!("Step {}: {}", step.step_number, step.description);
}
}
Installation
Add to your Cargo.toml:
[dependencies]
# Choose a feature bundle:
graphrag-core = { version = "0.1", features = ["starter"] } # Basic setup
# OR
graphrag-core = { version = "0.1", features = ["full"] } # Production-ready
# OR
graphrag-core = { version = "0.1", features = ["research"] } # Advanced features
Feature Bundles
| Bundle | Description | Includes |
|---|---|---|
starter | Minimal setup to get started | async, ollama, memory-storage, basic-retrieval |
full | Production-ready with common features | starter + pagerank, lightrag, caching, parallel-processing, leiden |
wasm-bundle | Browser-safe features only | memory-storage, basic-retrieval, leiden |
research | Advanced experimental features | full + rograg, cross-encoder, incremental, monitoring |
Three Ways to Configure
1. TypedBuilder (Compile-Time Safety)
#![allow(unused)]
fn main() {
use graphrag_core::prelude::*;
// Build won't compile until required fields are set!
let graphrag = TypedBuilder::new()
.with_output_dir("./output") // Required
.with_ollama() // Required: choose LLM backend
.with_chunk_size(512) // Optional
.with_top_k(10) // Optional
.build()?;
}
Available LLM backends:
.with_ollama()- Local Ollama (recommended).with_ollama_custom("host", 8080, "model")- Custom Ollama config.with_hash_embeddings()- Offline, no LLM needed.with_candle_embeddings()- Local neural embeddings
2. Hierarchical Config (with figment)
Enable with the hierarchical-config feature:
#![allow(unused)]
fn main() {
// Loads configuration from 5 sources (in priority order):
// 1. Code defaults (lowest priority)
// 2. ~/.graphrag/config.toml (user config)
// 3. ./graphrag.toml (project config)
// 4. Environment variables (GRAPHRAG_*)
// 5. Builder overrides (highest priority)
let config = Config::load()?; // Automatically merges all sources
let graphrag = GraphRAG::new(config)?;
}
Environment variable overrides:
export GRAPHRAG_OLLAMA_HOST=my-server
export GRAPHRAG_OLLAMA_PORT=8080
export GRAPHRAG_CHUNK_SIZE=1000
3. TOML Configuration File
# graphrag.toml
output_dir = "./output"
approach = "hybrid" # semantic, algorithmic, or hybrid
chunk_size = 1000
chunk_overlap = 200
[embeddings]
backend = "ollama"
dimension = 768
model = "nomic-embed-text:latest"
[ollama]
enabled = true
host = "localhost"
port = 11434
chat_model = "llama3.2:3b"
[entities]
min_confidence = 0.7
use_gleaning = true
max_gleaning_rounds = 3
entity_types = ["PERSON", "ORGANIZATION", "LOCATION", "DATE", "EVENT"]
Load with:
#![allow(unused)]
fn main() {
let config = Config::from_toml_file("graphrag.toml")?;
let graphrag = GraphRAG::new(config)?;
}
Sectoral Templates
Pre-configured templates for specific domains:
| Template | Best For | Entity Types |
|---|---|---|
general.toml | Mixed documents | PERSON, ORGANIZATION, LOCATION, DATE, EVENT |
legal.toml | Contracts, agreements | PARTY, JURISDICTION, CLAUSE_TYPE, OBLIGATION |
medical.toml | Clinical notes | PATIENT, DIAGNOSIS, MEDICATION, SYMPTOM |
financial.toml | Reports, filings | COMPANY, TICKER, MONETARY_VALUE, METRIC |
technical.toml | API docs, code | FUNCTION, CLASS, MODULE, API_ENDPOINT |
Using templates:
#![allow(unused)]
fn main() {
let config = Config::from_toml_file("templates/legal.toml")?;
}
Or via CLI:
graphrag-cli setup --template legal
Explained Answers
Get transparency into how answers are generated:
#![allow(unused)]
fn main() {
let explained = graphrag.ask_explained("Who founded the company?").await?;
// Access detailed information:
println!("Answer: {}", explained.answer);
println!("Confidence: {:.0}%", explained.confidence * 100.0);
// Reasoning trace
for step in &explained.reasoning_steps {
println!("{}. {} (confidence: {:.0}%)",
step.step_number,
step.description,
step.confidence * 100.0
);
}
// Source references
for source in &explained.sources {
println!("Source: {} ({:?})", source.id, source.source_type);
println!(" Excerpt: {}", source.excerpt);
}
// Or get formatted output
println!("{}", explained.format_display());
}
Output:
**Answer:** John Smith founded Acme Corp in 2015.
**Confidence:** 85%
**Reasoning:**
1. Analyzed query: "Who founded the company?" (confidence: 95%)
2. Found 3 relevant entities (confidence: 85%)
3. Retrieved 5 relevant text chunks (confidence: 85%)
4. Synthesized answer from retrieved information (confidence: 85%)
**Sources:**
1. [TextChunk] chunk_123 (relevance: 92%)
2. [Entity] john_smith (relevance: 88%)
Error Handling
Errors implement standard std::error::Error and carry descriptive messages:
#![allow(unused)]
fn main() {
match graphrag.ask("question").await {
Ok(answer) => println!("{}", answer),
Err(e) => {
println!("Error: {}", e);
}
}
}
CLI Setup Wizard
Interactive configuration wizard:
graphrag-cli setup
# With template:
graphrag-cli setup --template legal
# Custom output:
graphrag-cli setup --output ./my-config.toml
Wizard prompts:
- Select use case (General, Legal, Medical, Financial, Technical)
- Choose LLM provider (Ollama or pattern-based)
- Configure Ollama settings (if selected)
- Set output directory
Full Usage Example
use graphrag_core::prelude::*;
#[tokio::main]
async fn main() -> Result<()> {
// Option 1: Quick start (simplest)
let mut graphrag = GraphRAG::quick_start("Your document text").await?;
// Option 2: TypedBuilder (compile-time safe)
let mut graphrag = TypedBuilder::new()
.with_output_dir("./output")
.with_ollama()
.with_chunk_size(512)
.build_and_init()?;
// Add documents
graphrag.add_document_from_text("Document content here")?;
// Build knowledge graph
graphrag.build_graph().await?;
// Query
let answer = graphrag.ask("What are the main topics?").await?;
println!("{}", answer);
// Or with explanations
let explained = graphrag.ask_explained("What are the main topics?").await?;
println!("{}", explained.format_display());
Ok(())
}
Embedding Providers
GraphRAG Core supports 8 embedding backends:
| Provider | Cost | Quality | Feature Flag | Use Case |
|---|---|---|---|---|
| HuggingFace | Free | ★★★★ | huggingface-hub | Offline, 100+ models |
| OpenAI | $0.13/1M | ★★★★★ | ureq | Best quality |
| Voyage AI | Medium | ★★★★★ | ureq | Anthropic recommended |
| Cohere | $0.10/1M | ★★★★ | ureq | Multilingual (100+ langs) |
| Jina AI | $0.02/1M | ★★★★ | ureq | Cost-optimized |
| Mistral | $0.10/1M | ★★★★ | ureq | RAG-optimized |
| Together AI | $0.008/1M | ★★★★ | ureq | Cheapest |
| Ollama | Free | ★★★★ | ollama + async | Local GPU + LLM |
Advanced Features
LightRAG (Dual-Level Retrieval)
[retrieval]
strategy = "hybrid"
enable_lightrag = true # 6000x token reduction!
PageRank (Fast-GraphRAG)
[graph]
enable_pagerank = true # 27x performance boost
RoGRAG (Logic Form Reasoning)
#![allow(unused)]
fn main() {
// Enable with feature flag: rograg
let answer = graphrag.ask_with_reasoning("Why did X cause Y?").await?;
}
Intelligent Caching
[generation]
enable_caching = true # 80%+ hit rate, 6x cost reduction
Pipeline Architecture
GraphRAG uses a configurable pipeline with different methods for each phase:
┌─────────────────────────────────────────────────────────────────────────┐
│ build_graph() │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ │
│ │ CHUNKING │ TextProcessor splits document into chunks │
│ │ (always runs) │ Configurable: chunk_size, chunk_overlap │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ ENTITY EXTRACTION │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ Algorithmic │ │ Semantic │ │ Hybrid │ │ │
│ │ │ (pattern-based) │ │ (LLM-based) │ │ (both + fusion) │ │ │
│ │ │ Fast │ │ Accurate │ │ Balanced │ │ │
│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ RELATIONSHIP EXTRACTION │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ Co-occurrence │ │ LLM-based │ │ Gleaning │ │ │
│ │ │ entity proximity│ │ GraphRAG method │ │ multi-round LLM │ │ │
│ │ │ Fast │ │ Semantic │ │ Iterative │ │ │
│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │
│ │ Optional: config.graph.extract_relationships = true/false │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ GRAPH │ Entities + Relationships → KnowledgeGraph │
│ │ CONSTRUCTION │ Supports: PageRank, Community Detection │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ ask() / query │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ │
│ │ EMBEDDING │ Generated on-demand (lazy evaluation) │
│ │ GENERATION │ 8 providers: Ollama, OpenAI, HuggingFace, etc. │
│ └────────┬────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ RETRIEVAL STRATEGIES │ │
│ │ Vector │ BM25 │ PageRank │ Hybrid │ Adaptive │ LightRAG │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ ANSWER │ LLM synthesis (if Ollama enabled) │
│ │ GENERATION │ Or: concatenated search results │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Phase Configuration Quick Reference
| Phase | Key Parameters | Config |
|---|---|---|
| 1. Chunking | chunk_size, chunk_overlap | chunk_size = 1000 |
| 2. Entity Extraction | approach, entity_types, use_gleaning | approach = "hybrid" |
| 3. Relationship Extraction | extract_relationships, use_gleaning | [graph] extract_relationships = true |
| 4. Graph Construction | enable_pagerank, max_connections | [graph] enable_pagerank = true |
| 5. Embedding | backend, dimension, model | [embeddings] backend = "ollama" |
| 6. Retrieval | strategy, top_k | [retrieval] strategy = "hybrid" |
| 7. Answer Generation | chat_model, temperature | [ollama] enabled = true |
Method Selection by Phase
| Phase | Methods Available | Config Setting |
|---|---|---|
| Entity Extraction | Algorithmic / Semantic / Hybrid | approach = "algorithmic|semantic|hybrid" |
| Relationship Extraction | Co-occurrence / LLM-based / Gleaning | entities.use_gleaning = true|false |
| Embedding | Ollama / Hash / OpenAI / HuggingFace / 8 providers | embeddings.backend = "ollama" |
| Retrieval | Vector / BM25 / PageRank / Hybrid / Adaptive / LightRAG | retrieval.strategy = "hybrid" |
Key Notes
- Embedding is NOT part of
build_graph()- generated lazily during queries - Relationship extraction is optional - controlled by
config.graph.extract_relationships - Gleaning extracts entities AND relationships together in multi-round LLM calls
- See HOW_IT_WORKS.md for the full pipeline + parameter reference
Module Structure
graphrag-core/
├── src/
│ ├── builder/ # TypedBuilder with type-state pattern
│ ├── config/ # Hierarchical configuration (figment)
│ ├── core/ # Core traits, errors with suggestions
│ ├── embeddings/ # 8 embedding providers
│ ├── entity/ # LLM-based gleaning extraction
│ ├── graph/ # Knowledge graph construction
│ ├── retrieval/ # ExplainedAnswer, search strategies
│ └── templates/ # Sectoral configuration templates
└── examples/
Testing
# Quick test with starter features
cargo test --features starter
# Full test suite
cargo test --all-features
# Test specific modules
cargo test --features starter builder::
cargo test --features starter retrieval::
Documentation
- HOW_IT_WORKS.md - 7-stage pipeline, approaches, embeddings, entity extraction, Ollama
- config/JSON5_CONFIG_GUIDE.md - Full JSON5/TOML configuration reference
- templates/README.md - Sectoral template guide
- CHANGELOG.md - Feature history and recent updates
- docs.rs/graphrag-core - Full API reference
Cross-Platform Support
- ✅ Linux - Full support with all features
- ✅ macOS - Full support with Metal GPU acceleration
- ✅ Windows - Full support with CUDA GPU acceleration
- ✅ WASM - Core functionality (use
wasm-bundlefeature)
License
MIT License - see ../LICENSE for details.
Part of the GraphRAG-rs project | Main README | How It Works
graphrag-cli
A modern Terminal User Interface (TUI) for GraphRAG operations, built with Ratatui.
Features
- Multi-pane TUI — Results viewer, Raw results, tabbed Info panel (Stats / Sources / History)
- Markdown rendering — LLM answers rendered with bold, italic, headers, bullet points, code blocks
- Three query modes — ASK (fast), EXPLAIN (confidence + sources), REASON (query decomposition)
- Zero-LLM support — Algorithmic pipeline with hash embeddings, no model required
- Vim-style navigation — j/k scrolling, Ctrl+1/2/3/4 focus switching
- Slash command system —
/config,/load,/mode,/reason,/export,/workspace, and more - Query history — Tracked per session, exportable to Markdown
- Workspace persistence — Save/load knowledge graphs to disk
- Direct integration — Uses
graphrag-coreas a library (no HTTP server needed)
Installation
cd graphrag-rs
# Debug build (fast compile)
cargo build -p graphrag-cli
# Release build (optimized)
cargo build -p graphrag-cli --release
Quick Start — Zero LLM (Symposium example)
Build a knowledge graph from Plato’s Symposium with no LLM required — pure algorithmic extraction using regex patterns, TF-IDF, BM25, and PageRank.
Option A — Interactive TUI
cd /home/dio/graphrag-rs
cargo run -p graphrag-cli -- tui
Then inside the TUI:
/config tests/e2e/configs/algo_hash_medium__symposium.json5
/load docs-example/Symposium.txt
Who is Socrates and what is his role in the Symposium?
Graph builds in ~3-5 seconds. No Ollama needed.
Option B — TUI with config pre-loaded
cargo run -p graphrag-cli -- tui \
--config tests/e2e/configs/algo_hash_medium__symposium.json5
Then just:
/load docs-example/Symposium.txt
What is Eros according to Aristophanes?
Option C — Benchmark (non-interactive, JSON output)
cargo run -p graphrag-cli -- bench \
--config tests/e2e/configs/algo_hash_medium__symposium.json5 \
--book docs-example/Symposium.txt \
--questions "Who is Socrates?|What is love according to Aristophanes?|What is the Ladder of Beauty?"
Outputs structured JSON with timings, entity counts, answers, confidence scores, and source references.
Available configs
| Config | Graph building | Embeddings | LLM synthesis | Speed |
|---|---|---|---|---|
algo_hash_small__symposium.json5 | NLP/regex | Hash (256d) | ❌ none | ~1-2s |
algo_hash_medium__symposium.json5 | NLP/regex | Hash (384d) | ❌ none | ~3-5s |
algo_nlp_mistral__symposium.json5 | NLP/regex | nomic-embed-text | ✅ mistral-nemo | ~5-15s* |
kv_no_gleaning_mistral__symposium.json5 | LLM single-pass | nomic-embed-text | ✅ mistral-nemo | ~30-60s |
* build ~5s, synthesis ~5-10s per question (with KV cache after the first)
algo_nlp_mistral__symposium.json5 is the recommended config for anyone who wants:
- a graph built quickly with classic NLP methods (no LLM at build time)
- real semantic search with
nomic-embed-text - answers synthesized by Mistral at query time with KV cache enabled
Quick Start — With Ollama (full semantic pipeline)
Requires Ollama running with nomic-embed-text and an LLM (e.g. mistral-nemo:latest).
cargo run -p graphrag-cli -- tui \
--config tests/e2e/configs/kv_no_gleaning_mistral__symposium.json5
Inside TUI:
/load docs-example/Symposium.txt
/mode explain
How does Diotima describe the ascent to absolute beauty?
The EXPLAIN mode shows confidence score and source references in the Sources tab (Ctrl+4 → Ctrl+N).
CLI Commands
graphrag-cli [OPTIONS] [COMMAND]
Options:
-c, --config <FILE> Configuration file to pre-load
-w, --workspace <NAME> Workspace name
-d, --debug Enable debug logging
--format <text|json> Output format (default: text)
Commands:
tui Start interactive TUI (default)
setup Interactive wizard to create a config file
validate Validate a configuration file
bench Run full E2E benchmark (Init → Load → Query)
workspace Manage workspaces (list, create, info, delete)
bench example
cargo run -p graphrag-cli -- bench \
-c my_config.json5 \
-b my_document.txt \
-q "Question 1?|Question 2?|Question 3?"
Output JSON includes: init_ms, build_ms, total_query_ms, entities, relationships, chunks, per-query answer, confidence, sources.
TUI Layout
┌─────────────────────────────────────────────────────────────┐
│ Query Input (Ctrl+1) (type queries or /commands here) │
├────────────────────────────────────┬────────────────────────┤
│ Results Viewer (Ctrl+2) │ Info Panel (Ctrl+4) │
│ Markdown-rendered LLM answer │ ┌─Stats─┬─Sources─┬ │
│ with confidence header in EXPLAIN │ │ │History │ │
│ mode: [EXPLAIN | 85% ████████░░] │ └───────┴─────────┘ │
├────────────────────────────────────┤ Ctrl+N cycles tabs │
│ Raw Results (Ctrl+3) │ (when Info focused) │
│ Sources list / search results │ │
│ before LLM processing │ │
└────────────────────────────────────┴────────────────────────┘
│ Status Bar [mode badge] ℹ status message │
└─────────────────────────────────────────────────────────────┘
Keyboard Shortcuts
Global (IDE-Safe)
| Key | Action |
|---|---|
? / Ctrl+H | Toggle help overlay |
Ctrl+C | Quit |
Ctrl+N | Cycle focus forward (Input → Results → Raw → Info) |
Ctrl+P | Cycle focus backward |
Ctrl+1 | Focus Query Input |
Ctrl+2 | Focus Results Viewer |
Ctrl+3 | Focus Raw Results |
Ctrl+4 | Focus Info Panel |
Ctrl+N (Info Panel focused) | Cycle tabs: Stats → Sources → History |
Esc | Return focus to input |
Input Box
| Key | Action |
|---|---|
Enter | Submit query or /command |
Ctrl+D | Clear input |
Scrolling (when viewer focused)
| Key | Action |
|---|---|
j / ↓ | Scroll down one line |
k / ↑ | Scroll up one line |
Alt+↓ / Alt+↑ | Scroll down/up (works even from input) |
PageDown / Ctrl+D | Scroll down one page |
PageUp / Ctrl+U | Scroll up one page |
Home / End | Jump to top / bottom |
Slash Commands
| Command | Description |
|---|---|
/config <file> | Load a config file (JSON5, JSON, TOML) |
/config show | Display the currently loaded config |
/load <file> | Load and process a document |
/load <file> --rebuild | Force full rebuild before loading |
/clear | Clear graph (keep documents) |
/rebuild | Re-extract from loaded documents |
/stats | Show entity/relationship/chunk counts |
/entities [filter] | List entities, optionally filtered |
/mode ask|explain|reason | Switch query mode (sticky) |
/reason <query> | One-shot reasoning query (decomposition) |
/export <file.md> | Export query history to Markdown |
/workspace list | List saved workspaces |
/workspace save <name> | Save current graph to disk |
/workspace <name> | Load a saved workspace |
/workspace delete <name> | Delete a workspace |
/help | Show full command help |
Query Modes
Switch with /mode <mode> or the badge in the status bar shows the active mode.
| Mode | Command | What it does |
|---|---|---|
ASK (default) | /mode ask | Plain answer, fastest |
EXPLAIN | /mode explain | Answer + confidence score + source references; Sources tab auto-opens |
REASON | /mode reason | Query decomposition — splits complex questions into sub-queries |
One-shot override (doesn’t change sticky mode):
/reason Compare the main arguments of each speaker about love
Architecture
graphrag-cli/src/
├── main.rs # CLI entry point (clap)
├── app.rs # Main event loop, action routing
├── action.rs # Action enum, QueryMode, QueryExplainedPayload
├── commands/mod.rs # Slash command parser
├── config.rs # Config file loading (JSON5/JSON/TOML)
├── theme.rs # Dark/light color themes
├── tui.rs # Terminal setup/teardown
├── query_history.rs # Per-session query history
├── workspace.rs # Workspace metadata management
├── mode.rs # Input mode detection
├── handlers/
│ ├── graphrag.rs # Thread-safe GraphRAG wrapper (Arc<Mutex<>>)
│ ├── bench.rs # Benchmark runner (JSON output)
│ └── file_ops.rs # File utilities
└── ui/
├── markdown.rs # Markdown → ratatui Line<'static> parser
├── spinner.rs # Braille spinner animation
└── components/
├── query_input.rs # Text input widget
├── results_viewer.rs # Markdown-rendered answer + scrollbar
├── raw_results_viewer.rs # Raw search results
├── info_panel.rs # 3-tab panel (Stats/Sources/History)
├── status_bar.rs # Status + query mode badge
└── help_overlay.rs # Modal help popup
Technology Stack
- Ratatui 0.29 — TUI framework (immediate mode rendering)
- Crossterm 0.28 — Cross-platform terminal events
- tui-textarea 0.7 — Multi-line input widget
- Tokio 1.32 — Async runtime
- Clap 4.5 — CLI argument parsing
- Dialoguer 0.11 — Interactive setup wizard
- color-eyre 0.6 — Error reporting
- graphrag-core — Knowledge graph engine (direct library call)
License
Same license as the parent graphrag-rs project.
GraphRAG Server
Production-ready REST API server for GraphRAG with multiple backend options.
Migration Notice: The server has been migrated from Axum to Actix-web 4.9 with Apistos for automatic OpenAPI 3.0.3 documentation generation. All endpoints remain the same, but the server now includes automatic API documentation at
/openapi.json.
Features
Storage Backends
- ✅ Qdrant Integration - Production vector database with 100M+ vectors support (client-server)
- ✅ LanceDB Integration - Serverless embedded database for native/desktop apps
- ✅ Graceful Fallback - Works without external database (in-memory mode)
Embeddings
- ✅ Ollama Integration - Local embeddings via Ollama (nomic-embed-text, etc.)
- ✅ Hash-based Fallback - Deterministic embeddings without external dependencies
- ✅ Auto-detection - Automatically uses Ollama if available, falls back otherwise
API Features
- ✅ REST API - Clean HTTP endpoints for all operations powered by Actix-web 4.9
- ✅ OpenAPI 3.0.3 - Automatic API documentation via Apistos
- ✅ Swagger UI - Interactive API explorer at
/swagger - ✅ Vector Search - Semantic search with cosine similarity
- ✅ Real Embeddings - Generate actual embeddings for queries and documents
- ✅ CORS Support - Ready for browser clients
- ✅ Health Checks - Monitor server and database status
- ✅ Metrics - Query counts, embedding statistics, and performance tracking
- ✅ Entity/Relationship Storage - Store graph metadata in vector database payloads
Quick Start
1. Start Qdrant (Docker)
cd graphrag-server
docker-compose up -d
# Or manually:
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
2. Start GraphRAG Server
# With Qdrant (recommended)
cargo run --bin graphrag-server --features qdrant
# Without Qdrant (in-memory mode)
cargo run --bin graphrag-server --no-default-features
Server starts on http://0.0.0.0:8080
API Documentation:
- OpenAPI Spec:
http://localhost:8080/openapi.json - Swagger UI:
http://localhost:8080/swagger
3. Test API
# Health check
curl http://localhost:8080/health
# Add a document
curl -X POST http://localhost:8080/api/documents \
-H "Content-Type: application/json" \
-d '{
"title": "GraphRAG Introduction",
"content": "GraphRAG combines knowledge graphs with retrieval-augmented generation for enhanced AI systems."
}'
# Query
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{
"query": "What is GraphRAG?",
"top_k": 5
}'
Configuration
Set via environment variables:
# Embeddings (choose backend)
export EMBEDDING_BACKEND="ollama" # or "hash" for fallback
export EMBEDDING_DIM="384" # 384 for MiniLM, 768 for BERT
export OLLAMA_URL="http://localhost"
export OLLAMA_EMBEDDING_MODEL="nomic-embed-text" # or "mxbai-embed-large"
# Qdrant connection (optional)
export QDRANT_URL="http://localhost:6334"
export COLLECTION_NAME="graphrag"
# Run server
cargo run --bin graphrag-server --features ollama
Feature Flags
# With Qdrant + Ollama embeddings (recommended for production)
cargo run --bin graphrag-server --features "qdrant,ollama"
# With LanceDB (serverless, embedded)
cargo run --bin graphrag-server --features "lancedb,ollama"
# Minimal (hash-based embeddings, in-memory storage)
cargo run --bin graphrag-server --no-default-features
# With authentication
cargo run --bin graphrag-server --features "qdrant,ollama,auth"
API Endpoints
Health & Info
GET /
API information and available endpoints.
curl http://localhost:8080/
GET /health
Health check with statistics.
curl http://localhost:8080/health
Response:
{
"status": "healthy",
"timestamp": "2025-10-01T12:00:00Z",
"document_count": 42,
"graph_built": true,
"total_queries": 1337,
"backend": "qdrant",
"embeddings": {
"backend": "ollama",
"available": true,
"stats": {
"total_requests": 100,
"ollama_success": 95,
"ollama_failures": 5,
"fallback_used": 5
}
}
}
Configuration
The server now supports dynamic configuration via JSON REST API, allowing you to initialize the full GraphRAG pipeline without TOML files.
GET /api/config
Get the current configuration.
curl http://localhost:8080/api/config
Response:
{
"success": true,
"config": {
"output_dir": "./output",
"chunk_size": 1000,
"chunk_overlap": 200,
"embeddings": { ... },
"graph": { ... },
...
},
"graphrag_initialized": true
}
POST /api/config
Set configuration and initialize the full GraphRAG pipeline.
curl -X POST http://localhost:8080/api/config \
-H "Content-Type: application/json" \
-d '{
"output_dir": "./output",
"chunk_size": 1000,
"chunk_overlap": 200,
"embeddings": {
"backend": "ollama",
"dimension": 768,
"model": "nomic-embed-text",
"fallback_to_hash": true,
"batch_size": 32
},
"graph": {
"max_connections": 25,
"similarity_threshold": 0.75
},
"text": {
"chunk_size": 1000,
"chunk_overlap": 200,
"languages": ["en"]
},
"entities": {
"min_confidence": 0.65,
"entity_types": ["PERSON", "CONCEPT", "LOCATION", "EVENT", "ORGANIZATION"]
},
"retrieval": {
"top_k": 15,
"search_algorithm": "cosine"
},
"parallel": {
"num_threads": 8,
"enabled": true,
"min_batch_size": 10,
"chunk_batch_size": 100,
"parallel_embeddings": true,
"parallel_graph_ops": true,
"parallel_vector_ops": true
},
"ollama": {
"enabled": true,
"host": "http://localhost",
"port": 11434,
"embedding_model": "nomic-embed-text",
"chat_model": "llama3.1:8b",
"timeout_seconds": 300,
"max_retries": 3,
"fallback_to_hash": true
},
"enhancements": {
"enabled": true
}
}'
GET /api/config/template
Get configuration templates with examples (minimal, ollama_production, high_performance).
curl http://localhost:8080/api/config/template
Response:
{
"template": { ... },
"description": "Full GraphRAG configuration template with all options",
"examples": [
{
"name": "minimal",
"description": "Minimal configuration with hash-based embeddings",
"config": { ... }
},
{
"name": "ollama_production",
"description": "Production setup with Ollama LLM and real embeddings",
"config": { ... }
},
{
"name": "high_performance",
"description": "Optimized for speed with parallel processing",
"config": { ... }
}
]
}
GET /api/config/default
Get the default configuration.
curl http://localhost:8080/api/config/default
POST /api/config/validate
Validate configuration without applying it.
curl -X POST http://localhost:8080/api/config/validate \
-H "Content-Type: application/json" \
-d '{ ... config object ... }'
Response:
{
"valid": true,
"message": "Configuration is valid"
}
Documents
POST /api/documents
Add a document to the knowledge graph.
curl -X POST http://localhost:8080/api/documents \
-H "Content-Type: application/json" \
-d '{
"title": "My Document",
"content": "Document content here..."
}'
Response:
{
"success": true,
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"message": "Document added to Qdrant successfully",
"backend": "qdrant"
}
GET /api/documents
List all documents.
curl http://localhost:8080/api/documents
DELETE /api/documents/:id
Delete a document by ID.
curl -X DELETE http://localhost:8080/api/documents/550e8400-e29b-41d4-a716-446655440000
Query
POST /api/query
Query the knowledge graph with semantic search.
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{
"query": "How does GraphRAG work?",
"top_k": 5
}'
Response:
{
"query": "How does GraphRAG work?",
"results": [
{
"document_id": "doc-1",
"title": "GraphRAG Overview",
"similarity": 0.92,
"excerpt": "GraphRAG combines knowledge graphs with retrieval..."
}
],
"processing_time_ms": 15,
"backend": "qdrant"
}
Graph Operations
POST /api/graph/build
Build/rebuild the knowledge graph.
curl -X POST http://localhost:8080/api/graph/build
GET /api/graph/stats
Get graph statistics.
curl http://localhost:8080/api/graph/stats
Response:
{
"document_count": 42,
"entity_count": 420,
"relationship_count": 630,
"vector_count": 840,
"graph_built": true,
"backend": "qdrant"
}
Architecture
With Qdrant (Production)
┌─────────────────┐
│ REST Client │ (Browser, CLI, etc.)
└────────┬────────┘
│ HTTP
┌────────▼─────────────────────┐
│ GraphRAG Server │
│ ┌──────────────────────┐ │
│ │ Actix-web REST API │ │
│ │ + Apistos OpenAPI │ │
│ │ + CORS │ │
│ │ + Tracing │ │
│ └──────────┬───────────┘ │
│ │ │
│ ┌──────────▼───────────┐ │
│ │ Qdrant Client │ │
│ │ + Vector Search │ │
│ │ + Metadata Storage │ │
│ └──────────┬───────────┘ │
└──────────────┼────────────────┘
│ gRPC (port 6334)
┌──────────────▼────────────────┐
│ Qdrant Vector Database │
│ + 100M+ vector capacity │
│ + JSON payload storage │
│ + Filtering & search │
└───────────────────────────────┘
Without Qdrant (Development/Testing)
┌─────────────────┐
│ REST Client │
└────────┬────────┘
│ HTTP
┌────────▼─────────────────────┐
│ GraphRAG Server │
│ ┌──────────────────────┐ │
│ │ Actix-web REST API │ │
│ │ + Apistos OpenAPI │ │
│ └──────────┬───────────┘ │
│ │ │
│ ┌──────────▼───────────┐ │
│ │ In-Memory Storage │ │
│ │ + Vec<Document> │ │
│ │ + Keyword matching │ │
│ └──────────────────────┘ │
└───────────────────────────────┘
Qdrant Storage Schema
Collection Configuration
- Name:
graphrag(configurable) - Dimension: 384 (MiniLM) or 768 (BERT)
- Distance: Cosine similarity
- Indexing: HNSW (Hierarchical Navigable Small World)
Document Payload Structure
Each document in Qdrant stores:
{
"id": "doc-uuid",
"title": "Document Title",
"text": "Full document text",
"chunk_index": 0,
"entities": [
{
"id": "entity-uuid",
"name": "Entity Name",
"entity_type": "Person|Organization|Location",
"properties": {}
}
],
"relationships": [
{
"source": "entity-1",
"relation": "WORKS_FOR",
"target": "entity-2",
"properties": {}
}
],
"timestamp": "2025-10-01T12:00:00Z",
"custom": {}
}
Development
Build
# Development build
cargo build --bin graphrag-server
# Production build with optimizations
cargo build --release --bin graphrag-server
Test
# Unit tests
cargo test --bin graphrag-server
# Integration tests (requires Qdrant running)
docker-compose up -d
cargo test --bin graphrag-server --features qdrant -- --test-threads=1
Run
# Development mode with auto-reload
cargo watch -x 'run --bin graphrag-server'
# Production mode
cargo run --release --bin graphrag-server
TODO
Short Term
- Real embedding generation (Ollama integrated)
- OpenAPI 3.0.3 documentation (via Apistos)
- Swagger UI integration (apistos
swagger-ui, served at/swagger) - Entity extraction from documents
- Relationship extraction
- Batch document upload
- Pagination for document listing
Medium Term
- Authentication & authorization (feature temporarily disabled)
- Rate limiting
- OpenTelemetry metrics
- Prometheus endpoint
- API versioning
Long Term
- GraphQL API
- WebSocket support for streaming
- Multi-tenant support
- Advanced graph algorithms (PageRank, community detection)
- LanceDB integration (alternative to Qdrant)
Deployment
Docker
# Coming soon
FROM rust:1.75 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release --bin graphrag-server
FROM debian:bookworm-slim
COPY --from=builder /app/target/release/graphrag-server /usr/local/bin/
EXPOSE 8080
CMD ["graphrag-server"]
Docker Compose (Full Stack)
version: '3.8'
services:
qdrant:
image: qdrant/qdrant:latest
ports:
- "6333:6333"
- "6334:6334"
volumes:
- qdrant_data:/qdrant/storage
graphrag-server:
build: .
ports:
- "8080:8080"
environment:
- QDRANT_URL=http://qdrant:6334
- COLLECTION_NAME=graphrag
- EMBEDDING_DIM=384
depends_on:
- qdrant
Performance
Benchmarks (Preliminary)
Hardware: M1 MacBook Pro, 16GB RAM
| Operation | Qdrant Backend | In-Memory |
|---|---|---|
| Add document | 5-10ms | <1ms |
| Query (top 10) | 10-20ms | 5-10ms |
| Build graph (1k docs) | ~2s | ~1s |
| Build graph (10k docs) | ~15s | ~8s |
Note: Qdrant scales much better for large datasets (100k+ documents).
Troubleshooting
“Could not connect to Qdrant”
Cause: Qdrant not running or wrong URL.
Solution:
# Check Qdrant is running
docker ps | grep qdrant
# Start if not running
docker-compose up -d
# Verify connection
curl http://localhost:6333/healthz
“Collection not found”
Cause: Collection not created.
Solution: Server auto-creates collection on first run. Check logs:
cargo run --bin graphrag-server 2>&1 | grep collection
Slow query performance
Cause: Large dataset without proper indexing.
Solutions:
- Ensure HNSW indexing is enabled in Qdrant
- Adjust
top_kparameter (lower = faster) - Use filters to narrow search space
License
MIT
Credits
- Qdrant - https://qdrant.tech/
- Actix-web - https://actix.rs/
- Apistos - https://github.com/netwo-io/apistos (OpenAPI 3.0.3 documentation)
- GraphRAG - https://github.com/automataIA/graphrag-rs
Backend Comparison
Qdrant
Best for: Production deployments, cloud environments, microservices
- ✅ Scales to 100M+ vectors
- ✅ Distributed deployment support
- ✅ Advanced filtering and search
- ✅ Persistent storage with automatic backups
- Requires separate server (Docker/cloud)
LanceDB
Best for: Desktop apps, native applications, embedded use cases
- ✅ No server required (embedded)
- ✅ Zero-copy data access
- ✅ Automatic versioning
- ✅ Works offline
- Single-process access
- Placeholder implementation (see lancedb_store.rs for integration guide)
In-Memory
Best for: Development, testing, demos
- ✅ No dependencies
- ✅ Fast for small datasets
- Data lost on restart
- Limited scalability
Embeddings Backends
Ollama (Recommended)
Best for: Local development, privacy-focused deployments
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull embedding model
ollama pull nomic-embed-text # 384 dimensions, 274MB
# or
ollama pull mxbai-embed-large # 1024 dimensions, 670MB
# Start server with Ollama
EMBEDDING_BACKEND=ollama cargo run --bin graphrag-server --features "qdrant,ollama"
Pros:
- ✅ Real semantic embeddings
- ✅ Local/private (no API calls)
- ✅ Multiple model options
- ✅ Automatic fallback if unavailable
Cons:
- Requires Ollama service running
- Slower than hash-based (100-200ms per embedding)
Hash-based Fallback
Best for: Testing, offline environments, minimal dependencies
# Start server with hash embeddings (no Ollama needed)
EMBEDDING_BACKEND=hash cargo run --bin graphrag-server
Pros:
- ✅ No external dependencies
- ✅ Fast (<1ms per embedding)
- ✅ Deterministic
- ✅ Works offline
Cons:
- Not semantic (hash-based, not neural)
- Lower search quality
- Fixed dimension (384)
Example Workflows
Production Setup (Qdrant + Ollama)
# 1. Start Qdrant
docker-compose up -d
# 2. Start Ollama
ollama serve &
ollama pull nomic-embed-text
# 3. Start GraphRAG server
export EMBEDDING_BACKEND=ollama
export QDRANT_URL=http://localhost:6334
cargo run --release --bin graphrag-server --features "qdrant,ollama"
# 4. Add documents with real embeddings
curl -X POST http://localhost:8080/api/documents \
-H "Content-Type: application/json" \
-d '{"title":"AI Safety","content":"AI safety research focuses on..."}'
# 5. Query with semantic search
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{"query":"Tell me about AI safety","top_k":5}'
Desktop App (LanceDB + Ollama)
# 1. Start Ollama
ollama serve &
ollama pull nomic-embed-text
# 2. Start GraphRAG with LanceDB (embedded)
export EMBEDDING_BACKEND=ollama
export LANCEDB_PATH=./data/graphrag.lance
cargo run --release --bin graphrag-server --features "lancedb,ollama"
# No external database needed! Data stored in ./data/
Minimal Setup (Hash embeddings)
# Just run the server - no dependencies!
EMBEDDING_BACKEND=hash cargo run --bin graphrag-server --no-default-features
# Works immediately with hash-based embeddings
Architecture
┌─────────────────────────────────────────────────────────────┐
│ GraphRAG Server │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Embedding │ │ Storage │ │
│ │ Service │ │ Backend │ │
│ │ │ │ │ │
│ │ - Ollama │ │ - Qdrant │ │
│ │ - Hash │ │ - LanceDB │ │
│ │ Fallback │ │ - Memory │ │
│ └──────────────┘ └──────────────┘ │
│ │ │ │
│ └──────────┬───────────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │ REST API │ │
│ └───────────┘ │
└─────────────────────────────────────────────────────────────┘
Performance
Embeddings
- Ollama (nomic-embed-text): ~100-200ms per document
- Hash-based: <1ms per document
- Caching: Automatic with LRU cache
Vector Search
- Qdrant: <50ms for 1M vectors with HNSW index
- LanceDB: <100ms for 100K vectors
- In-memory: <10ms for 10K vectors
Troubleshooting
Ollama not connecting
# Check Ollama is running
curl http://localhost:11434/api/tags
# Check model is available
ollama list | grep nomic-embed-text
# Pull model if missing
ollama pull nomic-embed-text
Qdrant connection failed
# Check Qdrant is running
curl http://localhost:6333/
# Check Docker container
docker ps | grep qdrant
# Restart Qdrant
docker-compose restart
Slow embedding generation
# Use smaller model
ollama pull nomic-embed-text # 384 dim, faster
# Or use hash fallback for testing
export EMBEDDING_BACKEND=hash
Migration to Actix-web + Apistos
What Changed?
Previous Stack:
- Web Framework: Axum 0.8
- Documentation: Manual/external tools
Current Stack:
- Web Framework: Actix-web 4.9 (high-performance, production-ready)
- Documentation: Apistos 0.6 (automatic OpenAPI 3.0.3 generation)
- API Schema: Automatically generated from Rust types
Benefits
- Automatic API Documentation: OpenAPI 3.0.3 spec generated directly from code
- Type-Safe Schemas: Request/response models automatically documented via
#[derive(JsonSchema, ApiComponent)] - Production-Ready: Actix-web is battle-tested in high-traffic production environments
- Better Error Handling: Structured error responses with OpenAPI documentation
Breaking Changes
None! All API endpoints remain identical. Clients don’t need any changes.
Temporary Limitations
- Authentication feature disabled: The
authfeature requires middleware migration and is temporarily unavailable. Will be re-enabled in a future update. - Swagger UI setup incomplete: Basic OpenAPI spec is generated, but interactive Swagger UI is not yet fully configured (coming soon).
Developer Notes
When adding new endpoints:
#![allow(unused)]
fn main() {
use apistos::api_operation;
use apistos_gen::ApiErrorComponent;
use schemars::JsonSchema;
// Annotate request/response models
#[derive(Serialize, Deserialize, JsonSchema, ApiComponent)]
pub struct MyRequest {
#[schemars(example = "example_value")]
pub field: String,
}
// Annotate handlers
#[api_operation(
tag = "my_tag",
summary = "Short description",
description = "Detailed description",
error_code = 400,
error_code = 500
)]
async fn my_handler(
state: Data<AppState>,
body: Json<MyRequest>,
) -> Result<Json<MyResponse>, ApiError> {
// Handler logic
}
// Register with Apistos routing
.service(
scope("/api/my-endpoint")
.service(resource("").route(post().to(my_handler)))
)
}
License
See LICENSE in the root directory.
GraphRAG WASM — Browser-Native Knowledge Graph RAG
A complete GraphRAG pipeline — document ingestion, knowledge-graph build, retrieval, and LLM synthesis — running entirely in the browser via WebAssembly. No server required (an optional local Ollama backend is supported).
Quick Start
rustup target add wasm32-unknown-unknown
cargo install trunk
cd graphrag-wasm
trunk serve # dev server on http://localhost:8080
trunk build --release # production bundle in dist/
The UI: a 3-column chat shell
The interface is a single Nordic-Minimal chat shell (no tabs, no DaisyUI — a flat
hand-written stylesheet). See Chat discussion.html for the
reference mockup the layout mirrors verbatim.
| Column | Contents |
|---|---|
| LeftRail | Brand, source documents, Flat/Hierarchy toggle, Build button |
| Stage | Active source header, the thread of question/answer turns, the composer input |
| RightRail | Per-query subgraph SVG, pipeline progress rows, mini-stats, reference cards |
Answers are streamed token-by-token; inline citations ([1], [2]…) link to
reference cards in the RightRail. The per-query subgraph unions the entities from the
top-K retrieved chunks and lays them out with a built-in force-directed layout.
How it works (end-to-end, in the browser)
- Document processing — chunking with configurable size/overlap.
- Entity extraction — rule-based / WebLLM-assisted extraction.
- Embeddings — ONNX Runtime Web (MiniLM-L6), run off the main thread
(
ort.env.wasm.proxy = true) so the UI never blocks during inference. - Knowledge graph — in-memory entities, chunks, and relationships.
- Retrieval — pure-Rust cosine similarity, top-K via
VectorIndex::search. - Synthesis — WebLLM (in-browser) or Ollama (local server); citations are post-processed and wired to reference cards.
Documents persist across reloads in IndexedDB (see src/persist.rs).
What comes from graphrag-core vs. reimplemented here
This crate is not a mock — it links graphrag-core (path dependency, wasm-safe
feature subset) and drives a real graphrag_core::GraphRAG instance: document ingestion
(add_document_from_text), the knowledge-graph types (Entity, Relationship), Leiden
community detection, and adaptive query routing all come straight from core.
The ML hot-path stages are reimplemented browser-side, because core’s native backends (Ollama HTTP, candle, the LLM extractors) do not run inside a browser:
| Stage | Source |
|---|---|
| Document ingestion, graph types, Leiden, adaptive routing | graphrag-core |
| Embeddings | wasm-side onnx_embedder.rs (ONNX Runtime Web / WebGPU, hash fallback) |
| Entity extraction | wasm-side entity_extractor.rs (WebLLM-assisted or rule-based) |
| Vector search | wasm-side vector_search.rs (pure-Rust cosine) |
Note:
src/lib.rsalso exposes a separatewasm_bindgenGraphRAGwrapper for direct JS use (new GraphRAG(384)+ pure vector search) — distinct fromgraphrag_core::GraphRAGdespite the shared name.
LLM backends: WebLLM vs Ollama
WebLLM (default) — 100% in-browser via WebGPU
import { UnifiedLlmClient } from './graphrag_wasm.js';
const llm = UnifiedLlmClient.withWebLLM("Phi-3-mini-4k-instruct-q4f16_1-MLC");
llm.setTemperature(0.7);
const answer = await llm.generate("What is GraphRAG?");
- ✅ Full privacy (no data leaves the browser), works offline after model download.
- First load downloads the model (~1–2 GB); needs a WebGPU-capable browser; small models only (1–3B).
WebLLM and ONNX inference both run in dedicated web workers
(webllm-worker.js + ORT’s proxy worker), keeping main-thread blocking under ~50 ms.
Ollama HTTP — local server, larger models
const llm = UnifiedLlmClient.withOllama("http://localhost:11434", "llama3.1:8b");
const answer = await llm.generate("What is GraphRAG?");
- ✅ 7B–70B+ models, better quality, full GPU (CUDA/Metal).
- Requires a running Ollama server + CORS:
ollama pull llama3.1:8b
OLLAMA_ORIGINS="http://localhost:8080" ollama serve
UnifiedLlmClient exposes the same generate / chat / checkAvailability API for both
backends, so switching is a one-line change.
Tech stack
| Component | Technology |
|---|---|
| UI | Leptos (reactive Rust) |
| Build | Trunk |
| Styling | flat Nordic-Minimal CSS (tailwind.css, no @tailwind directives) |
| Tokenizer | HuggingFace tokenizers (unstable_wasm) |
| Embeddings | ONNX Runtime Web (off-main-thread, optional WebGPU) |
| LLM | WebLLM (in-browser) or Ollama HTTP |
| Vector search | pure Rust (cosine similarity) |
| Storage | IndexedDB |
Project layout
graphrag-wasm/
├── src/
│ ├── main.rs # chat-shell UI (LeftRail / Stage / RightRail)
│ ├── components/
│ │ ├── chat_shell.rs # data types, citation parser, subgraph builder
│ │ └── force_layout.rs # force-directed subgraph layout
│ ├── webllm.rs # WebLLM client (+ web-worker engine)
│ ├── ollama_http.rs # Ollama HTTP client
│ ├── llm_provider.rs # UnifiedLlmClient abstraction
│ ├── onnx_embedder.rs # ONNX Runtime Web embeddings
│ ├── vector_search.rs # cosine similarity
│ └── persist.rs # IndexedDB persistence
├── webllm-worker.js # WebWorker MLC engine handler
├── index.html # entry point + ORT/WebLLM worker wiring
├── tailwind.css # flat stylesheet
└── Trunk.toml # build config
Browser support
Chrome/Edge 87+, Firefox 89+, Safari 15.2+ (incl. mobile). Requires WebAssembly + ES2020 modules; WebGPU is optional (accelerates embeddings/WebLLM when present).
Tests
A Playwright parity test (tests/playwright/chat_layout.sh) asserts the WASM SPA matches
the mockup on 19 shared selectors. Unit tests:
cargo test --target wasm32-unknown-unknown
License
See the main repository LICENSE.
API Reference
The full Rust API reference is generated by rustdoc and hosted on docs.rs:
→ docs.rs/graphrag-core
This covers the public surface of the core library — GraphRAG, Config, the extractor traits,
and every module. It is rebuilt automatically for each published release.
For the other crates:
graphrag— wrapper / hello-world meta-crategraphrag-core— core library
To browse the API for an unpublished local checkout, run:
cargo doc --workspace --no-deps --open
Troubleshooting
{{#include ../../../docs/TROUBLESHOOTING_OBJC_EXCEPTION.md}}
Changelog
All notable changes to GraphRAG-RS will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Security
CI green: cargo-deny advisories/licenses + rustfmt (2026-05-31)
- Vulnerabilities patched via lockfile bumps:
rand0.8.5→0.8.6 and 0.9.2→0.9.4 (RUSTSEC-2026-0097 unsoundness),bytes1.10.1→1.11.1 (RUSTSEC-2026-0007 integer overflow),rustls-webpki0.103.7→0.103.13 (RUSTSEC-2026-0049/0098/0099/0104 — CRL + name-constraint vulns). All patch-level, non-breaking. deny.tomllicenses: addedBSL-1.0(Boost) andCDLA-Permissive-2.0(Mozilla CA bundle viawebpki-roots) to the allow-list — both permissive, were failing the licenses job.deny.tomladvisory ignores (unfixable here, documented inline): unmaintained transitive cratesproc-macro-error,bincode,json,number_prefix,paste,rustls-pemfile;lru0.12 unsoundness (RUSTSEC-2026-0002, pinned by ratatui 0.29, unreachable in our usage); andtimeDoS (RUSTSEC-2026-0009) — its fix (≥0.3.47) requires rustc 1.88, above our MSRV 1.85, sotimeis held at 0.3.44 and the advisory accepted (reachable only via untrusted RFC-2822 parsing in the server, not core/cli). Revisit when MSRV moves to ≥1.88.- Formatting: ran
cargo fmt --allover the workspace (71 files) to clear the long-standingrustfmtCI job. Mechanical, no behavior change. --all-featuresadvisory/license coverage: thecargo-deny-actiondefaults to--all-features, so CI also scans the optionallancedbtree (lance/datafusion/arrow). Patchedlz4_flex0.11.5→0.11.6 / 0.12.0→0.12.2 (RUSTSEC-2026-0041) andtar0.4.44→0.4.46 (RUSTSEC-2026-0067/0068); allowed0BSD(mock_instant). Added[graph] all-features = truetodeny.tomlso localcargo deny checksees the same graph as CI (prevents local≠CI drift).- CI SIGILL fix: set
RUSTFLAGS = "-C target-cpu=x86-64-v2"inci.ymlto override the repo’s.cargo/config.toml-C target-cpu=native. On GitHub’s heterogeneous runnersnativecan emit instructions the silicon traps (SIGILL crashing rustc/proc-macros, seen buildingollama-rs). Verified the rustc invocation: an emptyCARGO_BUILD_RUSTFLAGSis ignored and doesn’t override the config flag — only a non-emptyRUSTFLAGS(highest precedence) fully replaces it. Local dev keepstarget-cpu=native; CI uses the portablex86-64-v2baseline.
Added
Documentation site (2026-05-31)
- mdBook documentation site under
book/, deployed to GitHub Pages athttps://automataia.github.io/graphrag-rs/. Curated, English-only, user-facing TOC (book/src/SUMMARY.md) covering getting-started, concepts, configuration, features, and per-crate guides. Internal dev reports and Italian guides are intentionally excluded. - Chapters are thin
{{#include}}wrappers over the canonical sources (HOW_IT_WORKS.md, crate READMEs, curateddocs/*.md) so there is a single source of truth and no content drift. Front-door pages (introduction.md,getting-started/overview.md,quickstart.md) are authored. - Mermaid diagrams render via the
mdbook-mermaidpreprocessor; built-in client-side search enabled. - API reference links out to docs.rs/graphrag-core rather than
self-hosting
cargo doc. - New CI workflow
.github/workflows/docs.ymlbuilds the book (pinnedmdbook0.5.3 +mdbook-mermaid0.17.0 prebuilt binaries) and deploys viaactions/deploy-pages. The generatedbook/book/output is git-ignored. Manual one-time step: set repo Settings → Pages → Source = “GitHub Actions”. - README: added a docs-site badge.
- Translated to English the doc sources the site includes that still contained Italian:
docs/INCREMENTAL_UPDATES.md,docs/TUI_USAGE_GUIDE.md,docs/ENRICHMENT_USAGE_GUIDE.md,docs/SUMMARIZATION_CONFIG.md, thegraphrag-cli/README.mdconfig table notes, and the Italian entries in this CHANGELOG. Fixed stale repo URLs (anthropics/*→automataIA/graphrag-rs) in the translated guides. The public site is now English-only end to end. - Stripped decorative/pictographic emoji (the 📚🚀📖 family) from the doc sources the site
includes, fixing “tofu” boxes that appeared wherever the viewer’s font lacked an emoji glyph
(mdBook’s default theme has no emoji-font fallback — a generic missing-glyph issue, not a bug).
Preserved arrows (→), box-drawing/ASCII diagrams (━│▼█), and data symbols (✅❌★☆); converted
rating ⭐→★ to keep ratings rendering. Keycap-numbered headings (
1./2.) replaced the1️⃣style.
[0.2.0] - 2026-05-31
Fixed
arrowworkspace dep: addeddefault-features = falsetoarrow = "57"in the workspaceCargo.toml. Previously, thedefault-features = falsedirective ingraphrag-core/Cargo.tomlwas silently ignored by Cargo (build-time warning).documentationmetadata for thegraphragcrate: addeddocumentation = "https://docs.rs/graphrag"ingraphrag/Cargo.toml, aligning the wrapper crate withgraphrag-coreandgraphrag-cli.
Code/architecture/product quality audit (2026-05-30)
Added
- CI/CD: new workflow
.github/workflows/ci.yml. The repo previously had no CI automation. Blocking jobs:clippy --workspace --lib -D warnings(now green, see below),test -p graphrag-core --lib,cargo-deny. Thefmtjob is informational and non-blocking (continue-on-error) until the repo is madecargo fmt --allclean (pre-existing repo-wide formatting debt). - Security tooling:
deny.toml(advisories + permissive licenses + duplicate ban) andSECURITY.md(private disclosure policy via GitHub Security Advisories). - Drift-guard tests (
config/setconfig.rs):gliner_setconfig_default_matches_runtimeandautosave_setconfig_default_matches_runtimefail at build time if the serde leaf-struct defaults diverge from the canonical runtime ones, preventing “5-point-sync” drift.OllamaConfigis excluded on purpose (by-design divergence: offline-first runtime vs user-facing schema). - Crate metadata:
documentation(docs.rs) andreadmefields added tographrag-coreandgraphrag-clifor publishing on crates.io.
Documentation polish (2026-05-30)
graphrag/README.md: the wrapper meta-crate had no README (onlyCargo.tomlsrc). Added: explains that it re-exportsgraphrag-coreand provides thegraphragbinary, with a binary quick-start + library usage and links to the core/root README.
- Module
//!headers added to the 10graphrag-coremodules that lacked them (previously starting withuse/pub mod/#[cfg]or a///on the first submodule):config,graph,generation,critic,retrieval,summarization,vector,entity,text,query. Every module’s rustdoc page now shows a description. Doc-comments only, no behavior change;clippy -p graphrag-core -D warningsstays green andcargo docintroduces no new warnings.
PageRank: score normalization (dangling nodes) (2026-05-30)
- Bug fix:
scores_to_entity_mapin graph/pagerank.rs now L1-normalizes the scores (sum = 1.0). Dangling nodes (no outgoing edges) lost rank mass on every iteration, leaving the sum < 1.0. Single fix point → covers all paths (dense/parallel/sparse). Unblocks 3 previously-failing tests:test_pagerank_convergence,test_personalized_pagerank,test_precompute_global_pagerank(visible only under thepagerankfeature, activated by--workspacefeature-unification).
Swagger UI served at /swagger (2026-05-30)
- graphrag-server: the Swagger UI was announced but not served (“coming soon”).
Now exposed at
/swaggervia apistos’s native support (features = ["swagger-ui"], already enabled) —apistos-swagger-uibundles the official Swagger UI assets, so no new dependency. Changed.build("/openapi.json")→.build_with(..., BuildConfig::default().with(SwaggerUIConfig::new(&"/swagger")))in main.rs. README updated (removed “coming soon”).
Clean clippy on examples/tests + green doctests (2026-05-30)
- Clippy examples/tests:
cargo clippy --examples --tests -p graphrag-core -- -D warningsis now green. Bulk viacargo clippy --fix; manual tail:///!→//!(embeddings demo),.filter().next_back()→.rfind(),.clone()on a double ref →.iter().copied(), ignoredlet _ =onResult,std::slice::from_ref, removal of unused vars. - Doctest:
cargo test --doc -p graphrag-core→ 47 pass / 0 fail / 17 ignored. 7 illustrative, non-self-contained examples (require a live Ollama, an async runtime, or undefined setup variables —core::ChunkingStrategy,build_relationship_hierarchy, KV-cache Ollama,pipeline_executor, etc.) marked```ignore. The hero example still runs and is green. clippy --fixregression corrected:config/enhancements.rs:770—--fixhad removedmutfromlet count, seeing it as inactive under default features; restoredlet mut countwith#[allow(unused_mut)](thecount += 1s are behind#[cfg(feature = ...)]).
Stale examples/tests recompile (2026-05-30)
- Stale struct initializers: added the missing temporal/causal fields (all
None) to theEntityliterals (first_mentioned,last_mentioned,temporal_validity) andRelationshipliterals (embedding,temporal_type,temporal_range,causal_strength) in thellm_evaluation_demo,advanced_nlp_demo,hierarchical_graphrag_demo,workspace_demo,tom_sawyer_workspaceexamples. They had fallen behind the evolution ofEntity/Relationshipincore/mod.rs(Phase 1.2) and brokecargo build --examples. complete_zero_cost_graphrag_demo:Configliteral closed with..Default::default()(it was missingadvanced_features,gliner,suppress_progress_bars) and theEntityConfigliteral completed withuse_atomic_facts: false+max_fact_tokens: 400.- Per-feature gating (
graphrag-coreCargo.toml):hierarchical_graphrag_demonowrequired-features = ["leiden"](usesLeidenConfig/detect_hierarchical_communities,#[cfg(feature = "leiden")]) and theincremental_integrationtestrequired-features = ["incremental"](it importedgraphrag_core::incremental). So a defaultcargo build/test --workspacestays green without pulling in the optional features. Chat discussion.html: added the standardline-clamp:3property alongside-webkit-line-clamp(CSSvendorPrefixlinter).- Verification:
cargo build --examples --tests --workspace→ clean Finished;cargo test -p graphrag-core --lib→ 365 pass / 0 fail. The 3pageranktests that fail under--workspacefeature-unification are pre-existing (confirmed on a clean tree).
Changed
- Dependency dedup (anti-bloat): aligned two direct workspace dependencies
to versions already present transitively, eliminating duplicate versions
in
graphrag-cli’s-e normaltree:strum0.25 → 0.26 (matchesratatui 0.29) — removes duplicatestrum+strum_macros.itertools0.12 → 0.13 (matchesratatui/unicode-truncate).- Real duplicates in
graphrag-cli’s normal tree dropped from 34 to 26. Verified thatgraphrag-core(the published crate) has only 4 unavoidable transitive duplicates (getrandom0.2/0.3,webpki-roots0.26/1.0, TLS stack).rand0.8→0.9 NOT done (API-breaking, only deduplicated the unpublished server binary).
Fixed
- CLI crash at startup on all non-TUI subcommands (
index,ask,bench,setup,validate, …):color_eyre::install()was called twice — ingraphrag-cli/src/main.rs:10and again insiderun()atlib.rs:197— and the second install aborted with “could not set the providedThemeglobally as another was already set”. Removed the duplicateinstall()frommain.rs; now both binaries (graphrag-cliand thegraphragmeta-crate, which doesn’t install on its own) install exactly once viarun(). Caught by running the e2e benchmarks (bench). - MSRV corrected and verified:
rust-versionchanged from1.75(false, never tested) to1.85. The real floor is imposed by the direct dependencyjsonfixer, which usesedition = "2024"(requires rustc ≥ 1.85). Build-verified on the 1.85 toolchain forgraphrag-coreandgraphrag-cli. NewmsrvCI job that builds on 1.85. Analysis method: floor fromcargo metadata(maxrust_versiondeclared among the normal deps) + build verification on a single toolchain (no costly bisect). - Lint debt zeroed (green workspace clippy): resolved 38 pre-existing clippy
errors that surfaced under
cargo clippy --workspace --lib -- -D warnings(Rust 1.95). Diagnosis:graphrag-corein isolation (default features) was already clean; the errors were in core’s optional modules (incremental,rograg,lightrag,embeddings/ollama) activated by the cli/server features + 3 errors ofgraphrag-cli’s own. Idiomatic fixes (to_vec(),iter_mut().enumerate(),if let Some,sort_by_key(Reverse(..)), type aliasesNodeDeltaResult/EdgeDeltaResult) and targeted, commented#[allow]s where a rename would break the serde API (PendingUpdateType) or for a private 10-argument helper. Not an interface break: the crates compile and link correctly. - GLiNER default drift:
default_gliner_entity_labels/default_gliner_relation_labelsinconfig/setconfig.rswere misaligned with the runtimeGlinerConfig::default()(missing"concept"and"causes"). Now aligned with the canonical default (4 entity + 3 relation labels). Not observable in the existing e2e configs (they set the labels explicitly); relevant only when GLiNER is enabled via TOML while omitting the labels.
Documentation
- Markdown doc consolidation (few but useful): reduced the ~55 tracked
.mdfiles to a keystone set. Deleted 39 files among process artifacts (report.md,TODO.md,*_COMPLETE.md,*_SUMMARY.md,*_STATUS.md,MERGE_COMPLETE.md,IMPLEMENTATION_SUMMARY.md) and satellite integration guides now covered by the keystones (graphrag-core/{ADVANCED_FEATURES,OLLAMA_INTEGRATION,LEIDEN_INTEGRATION,LIGHTRAG_INTEGRATION, HIPPORAG_INTEGRATION,CROSS_ENCODER_INTEGRATION,ENTITY_EXTRACTION,EMBEDDINGS_CONFIG, PIPELINE_ARCHITECTURE,QUICKSTART,ENRICHMENT_IMPLEMENTATION,WORKSPACE_PERSISTENCE_SUMMARY}.md, thesrc/{embeddings/README,graph/TRAVERSAL_GUIDE}.md, the entire series of non-READMEgraphrag-wasm/*.mdguides,examples/MULTI_DOCUMENT_PIPELINE.md). The surviving keystones:README.md,HOW_IT_WORKS.md,CHANGELOG.md, the 4 crate READMEs,config/JSON5_CONFIG_GUIDE.md. Thedocs/folder is git-ignored (local notes) and is not touched. - Keystone staleness fixes: MSRV badge/prerequisites
1.70→1.85in the root README; removed references to the deletedgraphrag-leptoscrate (workspace layout now 5-crate- the
graphragmeta-crate, dependency graph updated); “Web UI” section rewritten around the chat-shell.HOW_IT_WORKS.md: the WASM section now points tographrag-wasm(no longer to the deletedgraphrag-leptos).
- the
- graphrag-wasm README rewritten: the old 5-tab DaisyUI UI is replaced by the documentation of the 3-column Nordic-Minimal chat-shell (LeftRail/Stage/RightRail), off-main-thread inference, citations, IndexedDB persistence; removed the dead links to the deleted satellite guides.
- Internal links repointed: all links to the deleted docs (in
README.md,HOW_IT_WORKS.md,graphrag-core/README.md) now point toHOW_IT_WORKS.md,config/JSON5_CONFIG_GUIDE.md,CHANGELOG.md, ordocs.rs/graphrag-core.
Removed
- Dead code: removed
graphrag-server/src/main_axum_old.rs(~31KB, orphan file with no references, neither a bin-target nor a module). - Unused dependency: removed
text_analysis = "0.3"fromgraphrag-coreand from[workspace.dependencies](detected withcargo machete, verified: no use in the code — the only match was the string"context_analysis"). The othercargo machetereports (getrandom,gline-rs,js-sys,web-sys,tower,text-splitter) are verified false positives (wasm/api feature-enablers or crates whose lib name differs from the package name, likegline-rs→gliner) and kept.
Changed
graphrag-wasm chat-shell rewrite (Nordic-Minimal) (2026-05-17)
- BREAKING: the 5-tab daisyUI UI (
Build / Explore / Query / Hierarchy / Settings) is replaced by a single 3-column chat shell that mirrors theChat discussion.htmlNordic-Minimal mockup verbatim (palette, font stackNewsreader / Geist / Geist Mono, class names, citation/hover wiring).- New layout in graphrag-wasm/src/main.rs:
LeftRail(brand + sources + Flat/Hierarchy toggle + Build button),Stage(head with active source, thread ofTurns, composer),RightRail(subgraph SVG + pipeline rows + ministats + references). All real data:documentscome from the existing IndexedDB signal, pipeline progress is driven by the existingBuildStatus/BuildStage, embeddings come from ONNX Runtime Web +tokenizer.json, retrieval fromVectorIndex::search, answers from WebLLM (Phi-3-minifor synthesis, Qwen for extraction), citations are post-processed viaparse_answer_with_citesand link to<button class="cite">↔<div class="ref-card">through the reactiveactive_ref: Option<u32>signal — no inline JS. - New module graphrag-wasm/src/components/chat_shell.rs
holds the data types (
ChatTurn,RefCard,AnswerSegment,SubgraphData), the citation parser and the per-querybuild_subgraphbuilder that unions entities from the top-K retrieved chunks and feeds them throughcomponents::force_layout::ForceLayout(320×240 viewBox, 16-node / 21-edge cap matching the mockup density label). - Styling: graphrag-wasm/tailwind.css is now a
flat Nordic-Minimal stylesheet (no
@tailwinddirectives, no daisyUI); graphrag-wasm/index.html drops lucide CDN + MutationObserver and adds the Google-fonts preconnect block. leptos-lucide-rsdependency removed from graphrag-wasm/Cargo.toml.- Legacy daisyUI components (
components/{settings,hierarchy,ui_components,chat_component}.rs) remain on disk for reference but are no longer compiled —components/mod.rsonly exportschat_shell+force_layout. - Parity test: graphrag-wasm/tests/playwright/chat_layout.sh
drives
playwright-cli: opens the mockup overpython3 -m http.serverand the WASM SPA ontrunk serve, captures 1440×900 screenshots (tests/playwright/artifacts/{mockup,wasm}.png) and asserts 19 shared selectors (.app,.rail-left .doc-item,.stage-title,.bubble-q,.cite,.stages .pls,.graph-frame svg,.ref-card,.composer input, …). Current status: 19/19 pass.
- New layout in graphrag-wasm/src/main.rs:
Added
2026 best-practices pass (graphrag-core ↔ graphrag-wasm) (2026-05-16)
-
Off-main-thread inference (Stage 3b) for graphrag-wasm.
- WebLLM:
WebLLM::newandWebLLM::new_with_progressin graphrag-wasm/src/webllm.rs now auto-detect a pre-spawnedwindow.webllmWorkerand switch toCreateWebWorkerMLCEngine, keeping the samechat.completions.createsurface (andchat_stream’s async-iterator) intact. Falls back to the main-thread engine if worker spawn fails. New sidecar graphrag-wasm/webllm-worker.js hostsWebWorkerMLCEngineHandler(15 LOC). - ONNX Runtime Web:
ort.env.wasm.proxy = true+numThreads = 1set immediately afterort.min.jsloads in graphrag-wasm/index.html, so allInferenceSession.runcalls execute in ORT’s dedicated worker. - Trade-off vs the plan’s gloo-worker route: no second wasm bundle, no Rust worker scaffolding, ~30 LOC swap. Verification (“main-thread blocked < 50 ms during inference”) met via the runtimes’ built-in workers.
- WebLLM:
-
Token-streaming UX in graphrag-wasm QueryTab. Replaced the blocking
WebLLM::chat(...)call at graphrag-wasm/src/main.rs:1604 withchat_stream(...): tokens are now appended to the results signal incrementally as they arrive from the model, matching 2026 in-browser-LLM UX guidance. The pre-existing streaming API in graphrag-wasm/src/webllm.rs:334 was previously unused. -
IndexedDB persistence for the document set. New graphrag-wasm/src/persist.rs wraps
IndexedDBStorewithopen_store,save_document,delete_document,load_all_documents. TheAppcomponent restores documents on first load; manual input, file upload, Symposium-demo load, and document-remove handlers all persist their mutations. Reloading the page now preserves the document set instead of resetting to empty. -
WAI-ARIA tabs pattern in graphrag-wasm. All 5 tab panels are now mounted permanently inside a
<main id="main-content">landmark withhidden=move || active_tab.get() != Tab::X. Each tab button gained anid(tab-build,tab-explore, etc.) matching the panel’saria-labelledby. This fixes Lighthousearia-valid-attr-valueandlandmark-one-mainaudits, and preserves component state across tab switches. -
SEO: added
<meta name="description">and<link rel="canonical">plus<meta name="color-scheme" content="dark light">to graphrag-wasm/index.html. External links in the footer gainedrel="noopener noreferrer". -
Downloaded MiniLM-L6-v2 ONNX model (87MB) to
graphrag-wasm/models/minilm-l6.onnxfor semantic query embeddings. Previously the directory was empty, causing fallback to hash-based embeddings which produced no meaningful search results.
Removed
Broken orphan example crates deleted (2026-05-16)
examples/web-app/andexamples/graphrag-leptos-demo/both depended on the deletedgraphrag-leptoscrate (merged intographrag-wasmin March 2025). They were excluded from the workspace so they did not block builds, but were misleading for newcomers. Functionality is fully covered bygraphrag-wasmitself.- Dropped
exclude = ["examples/web-app"]from rootCargo.toml.
graphrag_py Python bindings crate deleted (2026-05-16)
- Removed
graphrag_py/directory and workspace member entry in rootCargo.toml. - Reason: legacy crate, pyo3 0.21 (out-of-date), last touched 4 commits ago before the
KV-cache / GLiNER / contextual-enricher / persistence wave. API frozen pre-feb-2026,
never published (
publish = false),Development Status :: 4 - Beta. - BREAKING: Python bindings no longer build from this repo. Future Python support should live in a separate repo with current pyo3.
Changed
Clippy gate restored on wasm32-unknown-unknown target (2026-05-16)
cargo clippy --lib -p graphrag-core --no-default-features --features "wasm-bundle" --target wasm32-unknown-unknown -- -D warnings went from 54 errors → 0. Native
default-features pass also restored to 0 errors. Both targets and the 363 native
lib tests now pass cleanly under the PostToolUse clippy hook.
- Mechanical lints auto-applied:
sort_by_key(5×),clamp(5×),unwrap_or_default,is_some_and,manual_abs_diff,manual_pattern_char_comparison,collapsible_match,let_and_return,derivable_impls,field_reassign_with_default,needless_return. - Type aliases for boxed
Fnbenchmark callbacks in graphrag-core/src/monitoring/benchmark.rs:208-214:RetrievalFn,RerankerFn,LlmFn. Eliminates 3×type_complexitywarnings. HierarchicalLeidenResulttype alias in graphrag-core/src/graph/leiden.rs:17 factored out theResult<(HashMap<.., HashMap<..>>, HashMap<..>)>return type ofhierarchical_leiden.- Feature-gated dead-code under wasm: helper methods in
gleaning_extractor.rs,llm_extractor.rs,chunking_strategies.rs,contextual_enricher.rs,late_chunking.rsare now#[cfg(feature = "async")]. Fieldsollama_client(atomic_fact_extractor, llm_extractor),prompt_builder(llm_extractor),client(contextual_enricher),llm_extractor(gleaning_extractor),critic(graphrag/mod),api_key(late_chunking), andboundary_detector/coherence_scorer/min_chunk_chars(chunking_strategies) carry#[cfg_attr(not(feature = "async"), allow(dead_code))]. Five modules carry#![cfg_attr(not(feature = "async"), allow(unused_imports))]to silence imports that become dead when the async build_graph path is gone. - Restored imports lost during refactor:
TextChunk,GraphRAGError,Document,HashMap,HashSet,Result,OllamaGenerationParamsre-added to atomic_fact_extractor.rs, gleaning_extractor.rs, llm_extractor.rs, contextual_enricher.rs, late_chunking.rs. Underscored-but-still-used variables (_e→ log-formatter args,_original_score,_total_chunks) rewritten to be self-consistent.
Fixed
WASM compilation broken after graphrag-core refactor (2026-05-16)
graphrag-core failed to compile for wasm32-unknown-unknown (65 errors → 0). The WASM
build uses default-features = false (excludes async, tracing, tokio, parallel-processing),
but many code paths used tracing:: calls and tokio without feature gates.
- Added
#[cfg(feature = "tracing")]gates to ~80tracing::calls across 15 files. - Gated
tokio::runtime::RuntimeinBoundaryAwareChunkingStrategy::chunk()behind#[cfg(feature = "async")]with sync fallback. - Split
RetrievalSystem::batch_query()into#[cfg(feature = "parallel-processing")]and#[cfg(not(feature = "parallel-processing"))variants. - Fixed sync
ask()(#[cfg(not(feature = "async"))) to callretrieval.query()instead of asyncquery_internal(). - Added
#![recursion_limit = "512"]tographrag-wasmmain.rs for Leptos type depth. - Created missing
graphrag-wasm/models/directory required by Trunk.
Missing Relationship fields in sync build_graph() (2026-05-16)
graphrag-core/src/graphrag/build.rs:690:
Relationship struct literal was missing embedding, temporal_type, temporal_range,
and causal_strength fields added in Phase 1.2 (Advanced GraphRAG). Added all four
with None defaults so the sync build path compiles without partial-init errors.
rograg::validator dropped quality metrics (2026-05-16)
graphrag-core/src/rograg/validator.rs:376:
validate_response was computing coherence_score, relevance_score,
factual_consistency_score, completeness_score, readability_score, and
source_credibility_score then throwing them away (7 unused_variable /
unused_assignments warnings). Now they:
- Fold into
validated_response.confidencevia a newoverall_quality()helper (mean of the metrics that were actually run — coherence / relevance / factual consistency are gated on their respective config flags; completeness / readability / source credibility always count). - Trigger a
MediumIssueType::Qualityvalidation issue when overall quality falls under 0.5. - Are emitted as a structured
tracing::debug!event so the metrics are observable in logs without a public API change.
Changed
Server crate: color-eyre pretty errors at startup (2026-05-16)
graphrag-server/src/main.rs:main()return typestd::io::Result<()>→color_eyre::Result<()>, withcolor_eyre::install()at top.- Adds
color-eyre = "0.6"tographrag-server/Cargo.toml. - mimalloc allocator was already wired (no change).
- Production unwraps in server crate audited: all 16 remaining unwraps are inside
#[cfg(test)]blocks (qdrant_store, auth, embeddings, config_handler, etc.). Production paths use.map_err(...)?/.ok_or_else(...)?— already clean. Part of refactor-2026-05 server slice.
Documentation
Stale memory + CLAUDE.md notes refreshed (2026-05-16)
- CLAUDE.md workspace layout: 6-crate → 5-crate (graphrag_py removed).
- CLAUDE.md “Known gotchas”: replaced obsolete “12 failing unit tests” claim with
verified status:
cargo test -p graphrag-core --lib→ 363 pass / 0 fail. The remainingcargo test --workspacefailures come from stale examples (not tests) undergraphrag-core/examples/with missingEntity/Relationshipfields; left untouched per project policy. - MEMORY.md (auto-memory) synced to the same wording.
Removed
Test suite aggressive pruning (2026-05-16)
User-requested clean-up: keep only indispensable, up-to-date tests; delete broken pre-existing failures, hanging tests, stale pre-refactor integration tests, and trivial construction-only sanity tests.
- 23 broken / hanging / failing unit tests deleted:
async_graphrag::tests::*(6 tests on dead module)entity::*::test_normalize_name(2 stale assertions)entity::llm_relationship_extractor::test_fallback_extractionreranking::cross_encoder::test_rerank_basic+test_confidence_filtering(need ONNX)retrieval::symbolic_anchoring::test_extract_anchors(stale)text::boundary_detection::test_sentence_detection+test_combined_detectiongraph::incremental::tests::test_basic_entity_upsert+ 6 ProductionGraphStore tests (deadlock in async lock contention — hung indefinitely)rograg::logic_form::tests::test_pattern_parser+test_logic_form_retrievalrograg::intent_classifier::tests::test_{factual,relational,temporal,causal,comparative,summary,definitional}_intent(7 stale assertions on intent classification)rograg::quality_metrics::test_performance_stats_updaterograg::streaming::test_template_selectionincremental::lazy_propagation::test_lazy_propagation_basicincremental::delta_computation::test_parallel_computation
- 10 stale workspace-level integration test files deleted (
./tests/*.rs, all pre-2026, predate the KV cache / GLiNER / persistence / file-split refactors):caching_integration.rs,config_integration_test.rs,http_endpoint_tests.rs,hybrid_retrieval_tests.rs,integration_tests.rs,modular_integration_tests.rs,property_tests.rs+.proptest-regressions,server_integration_tests.rs,zero_cost_approaches_integration_tests.rs,tests/parallel/. Plusgraphrag-core/tests/ollama_enhancements.rs(didn’t compile — missingcontextfield onOllamaGenerationParams). - 15 trivial
test_*_creationpatterns deleted (single-line constructions verifying onlyX::new().is_ok()):test_tree_creation,test_async_mock_llm_creation,test_incremental_pagerank_creation,test_processor_creation,test_agent_creation,test_function_caller_creation,test_cache_warmer_creation,test_retrieval_system_creation,test_enhanced_registry_creation,test_mock_llm_creation,test_answer_generator_creation,test_graphrag_creation,test_graph_indexer_creation,test_lancedb_creation,test_cached_client_creation. Plus 2 trivial Ollama adapter creation tests (entire test module incore/ollama_adapters.rsremoved). - Tests retained: 7 integration test files in
graphrag-core/tests/(the 2026-02 refactor-era tests exercising KV cache, contextual enricher, GLiNER features, triple validation, dynamic weighting, BAR-RAG, text pipeline fixtures, incremental graph updates)../tests/e2e/benchmark scripts kept. - Verification matrix — all 100% green:
cargo test -p graphrag-core --lib→ 363 passed, 0 failed (was 371/12 fail)cargo test -p graphrag-core --lib --features rograg→ 402 passed, 0 failedcargo test -p graphrag-core --lib --features incremental→ 390 passed, 0 failed
Fixed
Workspace-wide production unwrap() sweep (2026-05-16) — Part of refactor-2026-05 Phase 3 (extended)
- Going beyond the original Phase 3 scope (
voy_store,rograg/streaming,rograg/processor,cli/config,qdrant_store— all already verified test-only or previously cleaned), every remaining production.unwrap()in the workspace has been replaced with the appropriate safe alternative. - Mechanical sweeps by category:
- 36
partial_cmp(...).unwrap()(f32 sort comparators, NaN-panic-prone) across ~23 files (async_graphrag,inference,retrieval/*,graph/*,summarization,vector,monitoring,nlp,generation, server handlers, etc.) →.unwrap_or(std::cmp::Ordering::Equal). - 22
lock()/read()/write().unwrap()(Mutex/RwLock acquisitions, poisoned-lock-panic-prone) →.expect("lock poisoned")/.expect("rwlock poisoned"). - 12
Regex::new(...).unwrap()(static regex literals) →.expect("static regex literal"). duration_since(UNIX_EPOCH).unwrap()(system clock) →.expect("system clock before UNIX epoch").- Iterator and Option terminators (
.first(),.last(),.next(),.min(),.max(),.pop(),.as_ref(),.as_mut(),.chars().next()) after checked-precondition usages →.expect(<reason>). - Targeted contextual fixes for
result_map.remove,get_mutaftercontains_key,Self::new()inDefault::default,NonZeroUsize::newon literal,caps.get(N),strip_prefix(...)afterstarts_with, etc.
- 36
- Test-only infrastructure files (
core/test_traits.rs,core/test_utils.rs) intentionally left untouched — their.unwrap()calls represent test-helper panic semantics by design (suite is called from test functions only). - Net result: workspace audit reports 0 production
.unwrap()calls outside test infrastructure (down from ~178 pre-existing). All builds green:graphrag-coredefault +--features rograg+--features incremental, plusgraphrag-cli,graphrag-server,graphragwrapper.
Changed
Module split: retrieval/types.rs extracted (2026-05-16) — Part of refactor-2026-05 Phase 4 (final)
- Extracted
RetrievalConfig,SearchResult,ResultType,QueryAnalysis,QueryType,QueryIntent,QueryAnalysisResult,QueryResult,RetrievalStatistics(+ itsprintimpl) fromgraphrag-core/src/retrieval/mod.rsinto the new private modulegraphrag-core/src/retrieval/types.rs(199 LOC). retrieval/mod.rsshrinks 1851 → 1666 LOC; the public API is preserved viapub use types::*;socrate::retrieval::SearchResultetc. resolve unchanged.- Restored one stripped doc comment (
/// Statistics about the retrieval system) onRetrievalStatisticsto satisfy#![warn(missing_docs)]— the sed extraction had eaten the line during slicing. - This was the last remaining Phase 4 item from the plan. Build + clippy clean
(per the
feedback-verify-with-build-clippypolicy).
Sub-split: graphrag/ directory module (2026-05-16) — Part of refactor-2026-05 Phase 4
- Follow-up to the earlier
graphrag.rssingle-file move. The 1753-LOCgraphrag-core/src/graphrag.rsis now a directory modulegraphrag-core/src/graphrag/with per-concern sub-files:mod.rs(~105 LOC): structGraphRAG, sub-module declarations, privateensure_initializedhelper (bumpedfn→pub(super) fnso the siblingimplblocks can call it),#[cfg(test)] mod testsblock with the two pre-existing tests.lifecycle.rs(~189 LOC):new,default_local,builder,initialize,try_load_from_workspace,save_to_workspace,clear_graph.documents.rs(~53 LOC):add_document_from_text,add_document.build.rs(~715 LOC): async + syncbuild_graphpaired methods.ask.rs(~519 LOC, renamed fromquery.rsto avoid clash withuse crate::queryfor the planner module):ask,ask_with_reasoning,ask_explained,query_internal,query_internal_with_results,generate_semantic_answer_from_results,remove_thinking_tags,ask_with_pagerankpair.stats.rs(~85 LOC):config,is_initialized,has_documents,has_graph,knowledge_graph,knowledge_graph_mut,get_entity,get_entity_relationships,get_chunk.factory.rs(~202 LOC):from_json5_file,from_config_file,from_config_and_document,quick_start,quick_start_with_config.
- Each sub-file has its own
impl GraphRAG { ... }block; Rust allows multiple impl blocks across files. All sub-files share an identical kitchen-sink import header (Config, core types,critic,ollama,persistence,query,retrieval, feature-gatedparallel, plususe super::GraphRAG). - Public API preserved:
graphrag_core::GraphRAGresolves vialib.rs’spub use graphrag::GraphRAG;(unchanged from the single-file pass). - Verified per the new policy:
cargo build -p graphrag-core+ downstream crates green;cargo clippy -p graphrag-core -- -D warningsshows exactly one error in the new files (graphrag/ask.rs:408clamp pattern) which is a verbatim carry-over from the previousgraphrag.rs:1358(originallylib.rs:1594) — net new errors: zero. Tests not re-run (pure file move; seefeedback-verify-with-build-clippymemory entry).
God-file split: graph/incremental/ directory module (2026-05-16) — Part of refactor-2026-05 Phase 4
- Converted
graphrag-core/src/graph/incremental.rs(2905 LOC — the biggest god-file in the crate) into a directory modulegraphrag-core/src/graph/incremental/with focused sub-files:mod.rs(~395 LOC): doc + sub-module declarations +pub usere-exports + verbatim#[cfg(test)] mod testsblock + the kitchen-sinkuseimport block the tests rely on viasuper::*.types.rs(~465 LOC):UpdateId,TransactionId,ChangeRecord,ChangeType,Operation,ChangeData,Document,GraphDelta,DeltaStatus,RollbackData,ConflictStrategy,Conflict,ConflictType,ConflictResolution, theIncrementalGraphStoretrait,GraphStatistics,ConsistencyReport,InvalidationStrategy,CacheRegion.helpers.rs(~496 LOC):SelectiveInvalidation,ConflictResolver,UpdateMonitor+ impls + their satellite types (InvalidationStats,UpdateMetric,OperationLog,PerformanceStats).manager.rs(~898 LOC):IncrementalGraphManager(both feature-gated and non-gated paired definitions kept adjacent),IncrementalConfig,IncrementalStatistics,IncrementalPageRank,BatchProcessor,PendingBatch,BatchMetrics, plus theimpl GraphRAGErrorconvenience constructors that conceptually belong here.store.rs(~743 LOC):ProductionGraphStore+Transaction+TransactionStatusIsolationLevel+ChangeEvent+ChangeEventType+impl IncrementalGraphStore for ProductionGraphStore+ChangeDataExttrait & impl.
- Public API preserved via
pub usecascade inmod.rs(crate::graph::incremental::*resolves unchanged). - Visibility-only bumps to keep the shared test module compiling across the new
sub-module boundary:
IncrementalPageRank.scores:field→pub(super) fieldConflictResolver.strategy:field→pub(super) fieldConflictResolver::merge_entities:fn→pub(super) fn
- Verification strategy update (per user request): switched from
cargo test --features incremental(which surfaces many pre-existing unrelated failures and obscures the signal we care about) tocargo build --features incremental+cargo clippy --features incremental -- -D warnings. The clippy run reports 34 errors, all in pre-existing files outside the split (graphrag.rs,retrieval/,text/,monitoring/, etc.); zero new errors ingraph/incremental/. Downstream crates (graphrag-cli,graphrag-server,graphrag) build clean.
Module split: config/json_parser.rs extracted (2026-05-16) — Part of refactor-2026-05 Phase 4
- Extracted
Config::from_file(~553 LOC hand-rolled JSON reader using thejsoncrate) andConfig::to_file(~200 LOC writer) fromgraphrag-core/src/config/mod.rsinto the new private modulegraphrag-core/src/config/json_parser.rs(769 LOC, with imports +impl Config { ... }wrapper). config/mod.rsshrinks 2491 → 1737 LOC. Public API unchanged: both methods are still reachable asConfig::from_file/Config::to_filevia the newimpl Configblock (multiple impl blocks across files compile fine).- Distinct from
config::json5_loader(serde-based typed JSON5 loader) andconfig::loader(multi-format dispatcher) — this is the bespokejsoncrate path. - 371 unit tests pass; 12 pre-existing failures unchanged.
God-file split: rograg/logic_form/ directory module (2026-05-16) — Part of refactor-2026-05 Phase 4
- Converted
graphrag-core/src/rograg/logic_form.rs(1517 LOC) into a directory modulegraphrag-core/src/rograg/logic_form/with focused sub-files:mod.rs(141 LOC): doc + sub-module declarations +pub usere-exports + verbatim#[cfg(test)] mod testsblock.types.rs(333 LOC):LogicFormError,LogicFormQuery,Predicate,Argument,ArgumentType,Constraint,ConstraintType,LogicQueryType,LogicFormResult,VariableBinding,LogicExecutionStats.parser.rs(240 LOC):LogicFormParsertrait +PatternBasedParser+LogicPattern+ArgumentExtractor+ impls.executor.rs(673 LOC):LogicFormExecutor+ impls.retriever.rs(217 LOC):LogicFormRetrieverstruct +Default+ impl.
- Public API preserved via
pub usecascade through bothlogic_form/mod.rsandrograg/mod.rs(crate::rograg::LogicFormResult,crate::rograg::LogicFormRetriever, etc. still resolve unchanged). - Single non-mechanical change: bumped
LogicFormExecutor::calculate_name_similarityfrom privatefntopub(super) fn— the existingtest_name_similaritytest in the sharedtestsmodule needs cross-submodule access. Visibility-only adjustment; no behavior or signature change. - Pre-existing test failures (
test_logic_form_retrieval,test_pattern_parser) remain unchanged (verified by re-running them onmainbefore the split).
God-file split: graphrag-core/src/graphrag.rs (2026-05-16) — Part of refactor-2026-05 Phase 4
- Extracted the
pub struct GraphRAGand its singleimpl GraphRAG { ... }block (constructors, lifecycle, build_graph, ask*, query_internal*, generate_semantic_answer_from_results, remove_thinking_tags, getters, factory methods, ensure_initialized, tests) fromgraphrag-core/src/lib.rsinto the new private module filegraphrag-core/src/graphrag.rs. lib.rsis now a 263-LOC re-export shell (mod graphrag; pub use graphrag::GraphRAG;).graphrag.rsis 1753 LOC (header + verbatim impl + moved#[cfg(test)] mod tests).- Public API is preserved:
graphrag_core::GraphRAGandgraphrag_core::prelude::GraphRAGresolve through the new re-export with identical paths. - Added module-scoped imports at the top of
graphrag.rs(Config, core types,critic,ollama,persistence,query,retrieval, feature-gatedparallel) so the impl body compiles verbatim without inline path changes. - Both moved tests (
test_graphrag_creation,test_builder_pattern) still pass. All other pre-existing test/doc failures remain unchanged (12 unit tests, 7 doctests). - Sub-splitting the impl across
graphrag/{lifecycle,documents,build,query,stats}.rsremains deferred to a follow-up — single-file move first per plan.
Module split: retrieval/explained.rs (2026-05-16) — Part of refactor-2026-05 Phase 4
- Extracted
ExplainedAnswer,SourceReference,SourceType,ReasoningStep(and the ~160 LOCimpl ExplainedAnswerblock withfrom_results+format_display) fromgraphrag-core/src/retrieval/mod.rsinto newgraphrag-core/src/retrieval/explained.rs. - Public API preserved via
pub use explained::*inretrieval/mod.rs— downstream callers see no change. - Net effect:
retrieval/mod.rsshrinks from 2094 LOC → 1851 LOC; newexplained.rsis 250 LOC. - Replaced legacy
.min(1.0).max(0.0)with idiomatic.clamp(0.0, 1.0)in the movedfrom_resultsfn (clippymanual_clamp). - Larger god-file splits (lib.rs 1968 LOC, logic_form.rs 1517, incremental.rs 2905, config/mod.rs JSON loader) remain deferred — see plan file.
Fixed
Production unwrap removal (2026-05-16) — Part of refactor-2026-05 Phase 3
rograg/streaming.rs: regexunwrap()→expect("static regex literal"); threepartial_cmp(...).unwrap()calls on f32 confidence scores now useunwrap_or(Ordering::Equal)to avoid panics on NaN.rograg/processor.rs::RogragProcessorBuilder::build: replaced inner.unwrap()onHybridQueryDecomposer::new()andIntentClassifier::new()with?propagation;SystemTime::duration_since(UNIX_EPOCH).unwrap()→.expect("system clock before UNIX epoch")(genuine programmer-bug case).graphrag-server/src/qdrant_store.rs: removed 6 production.unwrap()calls inadd_document,add_documents_batch, andsearch— payload.as_object(),serde_json::to_value,serde_json::from_value, andpoint.idnow propagateQdrantErrorvia?andResult::collect.- Tests-only
unwrap()invector/voy_store.rsandgraphrag-cli/src/config.rsleft intact (per Phase 3 scope: production paths only).
Added - GLiNER-Relex Extraction via gline-rs (2026-02-23)
GLiNER-Relex Entity + Relation Extractor (entity/gliner_extractor.rs, config/mod.rs, config/setconfig.rs, lib.rs)
- New
GLiNERExtractor: joint entity + relation extraction in a single forward pass viagline-rsv1.0.1 + ONNX Runtime. ~1.5 GB VRAM vs 8+ GB for generative LLMs; zero structural hallucinations. - Two-stage pipeline: NER (SpanPipeline or TokenPipeline) → RE (RelationPipeline), both
composed on the same
orp::model::Modelwith lazy loading viaArc<RwLock<Option<Model>>>. - Confidence scores propagated natively into
Entity.confidenceandRelationship.confidence. - Optional feature flag
gliner: crate compiles and works normally without it. tokio::task::spawn_blockingwrapper inlib.rskeeps the async runtime unblocked.- Config example (JSON5):
gliner: { enabled: true, model_path: "./models/gliner-relex-large-v0.5.onnx", entity_labels: ["person", "organization", "location"], relation_labels: ["controls", "located in", "causes"], entity_threshold: 0.40, relation_threshold: 0.50, mode: "span", // or "token" for gliner-multitask use_gpu: false, }
Added - Graph Persistence / Storage Choice (2026-02-23)
Storage Backend — In-Memory vs Disk (config/mod.rs, config/setconfig.rs, lib.rs)
AutoSaveConfig(andAutoSaveSetConfigin SetConfig) now expose:base_dir: Option<String>— directory where workspace folders are stored (e.g."./output")workspace_name: Option<String>— sub-folder insidebase_dir(default:"default")enabled: bool—false(default) = in-memory only;true= persist to disk
GraphRAG::initialize()now callstry_load_from_workspace(): ifauto_save.enabled = trueand the workspace already exists on disk, the graph is loaded from disk instead of starting empty. The second run reuses the previously built graph automatically.GraphRAG::save_to_workspace()— new public method; also called automatically at the end ofbuild_graph()when persistence is enabled.- No-op when
enabled = false; zero performance cost for in-memory-only deployments. - Format hierarchy on disk: Parquet (if
persistent-storagefeature) → JSON fallback (always). - JSON5 config usage:
auto_save: { enabled: true, base_dir: "./output", workspace_name: "my_project", }
Fixed - Extraction Temperature (2026-02-23)
Zero-Temperature Entity Extraction (entity/gleaning_extractor.rs, entity/llm_extractor.rs, config/setconfig.rs)
GleaningConfig::default()andLLMEntityExtractor::new()now usetemperature: 0.0(was0.1)- Fully deterministic JSON output — eliminates spurious token variation that causes parse failures
- Consistent with recommendations for structured extraction models (NuExtract, Triplex, etc.)
EntityExtractionConfig.temperaturein SetConfig now defaults viadefault_extraction_temperature() = 0.0- Separate from
default_temperature() = 0.1used for general LLM parameters - Users can override in JSON5:
entity_extraction.temperature = 0.0
- Separate from
ContextualEnricherretains0.1(generates natural language descriptions, not strict JSON)
Fixed & Improved - Entity Extraction, Query Quality & Sources (2026-02-23)
SetConfig use_gleaning Bug Fix (config/setconfig.rs)
- Bug: when
mode.approach = "semantic"with nosemantic:sub-section, theelseblock hardcodedconfig.entities.use_gleaning = trueregardless of the top-levelentity_extraction.use_gleaningfield - Fix: the
elseblock now reads fromself.entity_extraction.use_gleaningandmax_gleaning_roundsdirectly - This affected ALL JSON5 configs using
mode.approach = "semantic"without an explicitsemantic:block
LLM Single-Pass Entity Extraction (lib.rs, entity/llm_extractor.rs, ollama/mod.rs)
- New LLM single-pass path in
lib.rs:ollama.enabled && !use_gleaningnow usesLLMEntityExtractorinstead of falling through to pattern-based regex extraction - Dynamic
num_ctxper chunk:(prompt_tokens + max_output_tokens) × 1.20, rounded to 1024, clamped[4096, 131072]— mirrors theContextualEnricherformula LLMEntityExtractornow carrieskeep_alive: Option<String>andwith_keep_alive()buildercall_llm_with_retryandcall_llm_completion_checkusegenerate_with_paramsinstead ofgenerate()to passnum_ctxandkeep_alive— activates Ollama KV cache during entity extractionGleaningEntityExtractor::newextractskeep_alivebefore consuming the client and threads it throughOllamaClient::config()getter added for field access without moving- Result on Symposium (274 chunks, mistral-nemo, no gleaning): 1,139 entities, 670 relationships (vs 0 relationships previously due to pattern-based fallback)
JSON Parse Resilience — Missing description Field (entity/prompts.rs)
EntityData.descriptionis now annotated#[serde(default)]- When the LLM returns JSON with a missing
descriptionfield (e.g. for Project Gutenberg license chunks), parsing succeeds with an empty string instead of falling through to the error path and losing all entities from that chunk - Fixes the
"JSON repair failed: missing field 'description'"errors seen in the last ~10 chunks of Project Gutenberg books
Multi-Chunk Semantic Answer Generation (lib.rs, handlers/bench.rs)
generate_semantic_answer_from_results: reworked context assembly- Removed 400-char truncation: full chunk content is now passed to the LLM for each result
- Deduplication: tracks seen chunk IDs to avoid repeating the same chunk from multiple entity hits
- Relevance sorting: context sections sorted by score descending before joining
- Synthesis prompt: updated instructions to ask the LLM to synthesize across ALL context sections
- Dynamic
num_ctx: prompt size calculated at runtime with 20% margin — activates KV cache for answering generate_with_paramsused instead ofgenerate()— passesnum_ctx,keep_alive,temperature
bench.rs: switched fromgraphrag.ask()tographrag.ask_explained()sourcesin the JSON output now populated with actual chunk IDs and excerpts (was always[])
E2E Config — No-Gleaning Mistral Pipeline
- New config
tests/e2e/configs/kv_no_gleaning_mistral__symposium.json5use_gleaning: false,keep_alive: "1h",chunk_size: 1000,chunk_overlap: 200- Uses mistral-nemo:latest for entity extraction and nomic-embed-text for embeddings
Added - Ollama KV Cache & Contextual Retrieval (2026-02-22)
Ollama KV Cache Parameters (ollama/mod.rs, config/mod.rs, config/setconfig.rs)
keep_alivefield added toOllamaConfigandOllamaGenerationParams- Keeps the Ollama model loaded in VRAM between requests (prevents KV cache eviction)
- Critical for multi-chunk document processing: without it, the model unloads between each chunk
- Default:
None(uses Ollama’s built-in 5-minute default) - Example:
"1h"for book-length document processing sessions
num_ctxfield added toOllamaConfigandOllamaGenerationParams- Explicitly sets the context window size (Ollama silently truncates to 2k-8k without this)
- Goes into the
optionsobject in Ollama API requests;keep_aliveis a top-level field - Default:
None(uses Ollama’s default, usually 2048-8192 tokens) - Example:
32768for documents up to ~130k characters
- Both fields wired through the full config stack: JSON5 parser,
OllamaSetConfig, request body
Contextual Chunk Enricher (text/contextual_enricher.rs)
- New module implementing Anthropic’s Contextual Retrieval pattern
ContextualEnricher: augments each chunk with 2-3 sentences of document-level context before embedding- KV Cache optimization: static prefix (full document) is cached by Ollama; only the chunk suffix is re-evaluated per request
- First chunk: ~2 min (loads document into KV cache on RTX 4070 with Mistral-NeMo 12B)
- Subsequent chunks: ~3-5 sec each (only chunk tokens evaluated)
- ~100 chunks from a 45k-token book: 5-10 minutes total vs hours without KV cache
calculate_num_ctx(): dynamic context window calculation per document- Formula:
tokens(instructions) + tokens(document) + tokens(largest_chunk) + output_budget + 5% margin - Rounded to nearest 1024, clamped to
[4096, 131072]
- Formula:
enrich_document_chunks()andenrich_chunks(): async, groups chunks by source document- Output format:
[LLM context]\n\n[original chunk text]— preserves original text verbatim
Late Chunking Strategy (text/late_chunking.rs)
- New
LateChunkingStrategyimplementingChunkingStrategytrait (Jina AI technique) - Produces chunks annotated with
position_in_documentmetadata (byte spans) for post-hoc pooling JinaLateChunkingClient: calls Jina Embeddings API v2 withlate_chunking: truesplit_into_sections(): handles documents exceeding model context window (8192 tokens for Jina v3)LateChunkingConfig: configurable chunk size, overlap, max document tokens, position annotation
E2E Benchmark KV Cache Support (tests/e2e/run_benchmarks.sh)
- Three new pipeline dimensions:
keep_alive,num_ctx,ollama_timeout - All existing pipelines updated with explicit defaults (
keep_alive=none,num_ctx=0) - Semantic/hybrid pipelines with Ollama now default to
keep_alive=30m(model stays loaded during build phase) - Three new KV cache pipelines targeting long document processing:
kv_semantic_mistral: semantic approach, Mistral-NeMo,keep_alive=1h,num_ctx=32768, timeout=300skv_hybrid_mistral: hybrid approach, Mistral-NeMo,keep_alive=1h,num_ctx=32768, timeout=300skv_semantic_qwen3: semantic approach, Qwen3 8B Q4,keep_alive=1h,num_ctx=16384, timeout=300s
- KV Cache settings shown in run header when active
- Generated JSON5 configs include
keep_aliveandnum_ctxin theollamasection
Tests
tests/contextual_enricher_e2e.rs: 4 tests forContextualEnrichertest_enriched_chunk_contains_original_and_context(#[ignore], requiresENABLE_OLLAMA_TESTS=1)test_kv_cache_speedup(#[ignore]) — measures per-chunk timing and speedup ratiotest_num_ctx_calculation_sanity— always-run, validates num_ctx formula boundstest_disabled_enricher_returns_chunks_unchanged— always-run no-op safety check
Added - Service Registry Completion (2025-02-11)
Core Infrastructure
- Complete test utilities module (
core/test_utils.rs):MockEmbedder: Deterministic hash-based embedding generation with dimension supportMockLanguageModel: Configurable response mapping for testingMockVectorStore: In-memory vector store with cosine similarity searchMockRetriever: Simple retriever for testing search pipelines- All mocks fully implement core
Async*traits - 100% test coverage with 5 passing test cases
Adapter Implementations
-
Entity extraction adapter (
core/entity_adapters.rs):GraphIndexerAdapterbridges LightRAG’s GraphIndexer toAsyncEntityExtractortrait- Configurable confidence threshold filtering
- Entity type conversion from domain-specific to core types
- Batch extraction support
- Feature-gated with
lightragfeature
-
Retrieval system adapter (
core/retrieval_adapters.rs):RetrievalSystemAdapterimplementsAsyncRetrievertrait- Integration with KnowledgeGraph-based retrieval
- Batch search support
- Comprehensive documentation on graph requirements
- Feature-gated with
basic-retrievalfeature
-
Metrics collector implementation (
monitoring/metrics_collector.rs):- Thread-safe metrics with DashMap for counters, gauges, and histograms
- Atomic operations for zero-lock contention
- Histogram statistics: count, sum, mean, min, max, p50, p95, p99
- Timer support with start/finish API
- Metric tagging with key-value pairs
- 7/7 passing tests for all metric types
- Feature-gated with
dashmapandmonitoringfeatures
Registry Integration
- Service registration in
ServiceConfig::build_registry():- Entity extractor registration (with
lightragfeature) - Retriever registration (with
basic-retrievalfeature) - Metrics collector registration (with
dashmap+monitoringfeatures) - Mock services for testing via
with_test_defaults() - Proper feature-gating for modular compilation
- Entity extractor registration (with
Documentation
-
Architectural documentation:
- Documented trait hierarchy for vector stores (domain-specific vs generic)
- Explained when to use adapters vs direct implementations
- Clarified graph integration requirements for retrieval
- Added TODO markers for future unification work
- Inline examples in all adapter modules
-
Code quality improvements:
- Removed unused imports across multiple modules
- Fixed parameter name warnings in data import
- Commented out incomplete vector-memory feature gate
- Clean compilation with
async,ollama,dashmap,monitoring,basic-retrieval,lightragfeatures
Testing
- 310 tests passing in graphrag-core library
- All new service implementations verified:
test_mock_embedder: Hash-based deterministic embeddingstest_mock_language_model: Response mappingtest_mock_vector_store: Cosine similarity searchtest_mock_retriever: Basic search operations- Metrics collector tests: counters, gauges, histograms, timers
- Integration tests for service registration and retrieval
Added - Ollama Advanced Integration (2025-02-11)
Streaming Support
- Real-time token generation with tokio channel-based streaming
generate_streaming()method returnstokio::sync::mpsc::Receiver<String>- Server-Sent Events (SSE) parsing for Ollama streaming API
- Background task spawning for non-blocking stream reads
- Automatic statistics recording for streamed responses
- Example usage in test suite (
tests/ollama_enhancements.rs)
Custom Generation Parameters
OllamaGenerationParamsstruct for fine-grained control:num_predict: Maximum tokens to generatetemperature: Sampling temperature (0.0 - 1.0)top_p: Nucleus sampling thresholdtop_k: Top-k samplingstop: Stop sequences (array of strings)repeat_penalty: Repetition control
generate_with_params()method for custom parameter usage- Integration with
AsyncLanguageModeltrait’scomplete_with_params() - Automatic conversion between core and Ollama parameter formats
Model Response Caching
- DashMap-based caching for thread-safe concurrent access
- Automatic cache population on API responses
- Cache hit detection before making API calls
- Performance: <1ms for cache hits vs 100-1000ms for API calls
- Cache management API:
clear_cache(): Clear all cached responsescache_size(): Get number of cached items
- Configurable via
OllamaConfig.enable_caching(default:true) - 80%+ hit rate on repeated queries
- 6x cost reduction potential
Metrics & Usage Tracking
OllamaUsageStatsstruct with atomic counters:total_requests: Total number of API callssuccessful_requests: Successful completionsfailed_requests: Failed attemptstotal_tokens: Cumulative token count (estimated)
- Thread-safe atomic operations (
Arc<AtomicU64>) - Zero lock contention for metrics updates
- API methods:
record_success(tokens): Record successful requestrecord_failure(): Record failed requestget_success_rate(): Calculate success percentage (0.0 - 1.0)
- Integration with
AsyncLanguageModel::get_usage_stats() - Automatic token estimation (~4 characters per token)
Service Registry Integration
- Type-safe service injection for Ollama services
OllamaEmbedderAdapterimplementsAsyncEmbeddertraitOllamaLanguageModelAdapterimplementsAsyncLanguageModeltrait- Automatic registration in
ServiceConfig::build_registry() - Support for both embeddings and language model services
MemoryVectorStoreregistration for in-memory operations
Documentation
- Complete OLLAMA_INTEGRATION.md guide with:
- Setup and prerequisites
- Basic and advanced usage examples
- Supported models (embeddings and LLM)
- Configuration options reference
- Batch processing examples
- Custom parameter examples
- Performance tips and troubleshooting
- Updated
graphrag-core/README.mdwith new features - Updated main
README.mdwith Ollama integration section - API reference with code examples
- Sources and external documentation links
Testing
- 8 new test cases in
tests/ollama_enhancements.rs:- Config with caching test
- Custom generation parameters test
- Client statistics API test
- Stats recording test
- Cache management test
- Default parameters test
- Adapter integration tests
- All tests passing (13/13 total including registry tests)
- Compilation verified with all feature combinations
Configuration Updates
- Added
enable_caching: booltoOllamaConfig - Updated all
OllamaConfiginitializers across codebase:config/mod.rs: TOML parsingconfig/setconfig.rs: Config mappingentity/llm_relationship_extractor.rs: LLM extraction
- Default caching: enabled (
true)
Changed
- Model info updated:
supports_streamingnow returnstrue - AsyncLanguageModel implementation: Now uses
generate_with_params()internally - OllamaClient structure: Added
statsandcachefields - Error handling: Improved with metrics recording on failures
- Test count: Increased from 214+ to 220+ test cases
Fixed
- Missing
enable_cachingfield inOllamaConfiginitializers - Incorrect
ModelUsageStatsfield mapping in adapter - Iterator reference error in execute_caused_query
- Compilation warnings for unused imports
[0.1.1] - Previous Release
Added - Core GraphRAG Implementation
- Temporal and causal reasoning for RoGRAG
- Graph indexer with 23 relationship patterns
- Service registry pattern for dependency injection
- GraphRAGBuilder with fluent API
- Parquet persistence for entities, relationships, documents
- Memory vector store implementation
- Complete trait-based architecture
Added - Research Features
- LightRAG dual-level retrieval (6000x token reduction)
- Leiden community detection (+15% modularity)
- Cross-encoder reranking (+20% accuracy)
- HippoRAG personalized PageRank (10-30x cost reduction)
- Semantic chunking with better boundaries
Added - Infrastructure
- Comprehensive test suite (214+ tests)
- Production-grade logging with tracing
- Feature flags for modular compilation
- WASM support with WebGPU acceleration
- Docker Compose deployment
[0.1.0] - Initial Release
Added
- Basic GraphRAG pipeline
- Entity and relationship extraction
- Vector embeddings support
- Graph construction and querying
- REST API server
- CLI tools
Migration Guides
Upgrading to Ollama Advanced Features
If you’re using basic Ollama integration, upgrading to the new features is seamless:
Before (still works):
#![allow(unused)]
fn main() {
let client = OllamaClient::new(OllamaConfig::default());
let response = client.generate("Hello").await?;
}
After (with new features):
#![allow(unused)]
fn main() {
let config = OllamaConfig {
enable_caching: true, // NEW: Enable caching
..Default::default()
};
let client = OllamaClient::new(config);
// Streaming
let mut rx = client.generate_streaming("Hello").await?;
while let Some(token) = rx.recv().await {
print!("{}", token);
}
// Custom parameters
let params = OllamaGenerationParams {
temperature: Some(0.8),
top_p: Some(0.95),
..Default::default()
};
let response = client.generate_with_params("Hello", params).await?;
// Metrics
let stats = client.get_stats();
println!("Success rate: {:.2}%", stats.get_success_rate() * 100.0);
}
Breaking Changes
None! All new features are opt-in and backward compatible.
Development
Building from Source
git clone https://github.com/your-username/graphrag-rs.git
cd graphrag-rs
cargo build --release --features async,ollama,dashmap
Running Tests
cargo test --all-features
cargo test -p graphrag-core --test ollama_enhancements
Contributing
See CONTRIBUTING.md for guidelines.
For complete documentation, see:
- README.md - Main project documentation
- graphrag-core/OLLAMA_INTEGRATION.md - Ollama guide
- graphrag-core/README.md - Core library docs
- ARCHITECTURE.md - System architecture