Changelog

All notable changes to GraphRAG-RS will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Security

CI green: cargo-deny advisories/licenses + rustfmt (2026-05-31)

Vulnerabilities patched via lockfile bumps: rand 0.8.5→0.8.6 and 0.9.2→0.9.4 (RUSTSEC-2026-0097 unsoundness), bytes 1.10.1→1.11.1 (RUSTSEC-2026-0007 integer overflow), rustls-webpki 0.103.7→0.103.13 (RUSTSEC-2026-0049/0098/0099/0104 — CRL + name-constraint vulns). All patch-level, non-breaking.
deny.toml licenses: added BSL-1.0 (Boost) and CDLA-Permissive-2.0 (Mozilla CA bundle via webpki-roots) to the allow-list — both permissive, were failing the licenses job.
deny.toml advisory ignores (unfixable here, documented inline): unmaintained transitive crates proc-macro-error, bincode, json, number_prefix, paste, rustls-pemfile; lru 0.12 unsoundness (RUSTSEC-2026-0002, pinned by ratatui 0.29, unreachable in our usage); and time DoS (RUSTSEC-2026-0009) — its fix (≥0.3.47) requires rustc 1.88, above our MSRV 1.85, so time is held at 0.3.44 and the advisory accepted (reachable only via untrusted RFC-2822 parsing in the server, not core/cli). Revisit when MSRV moves to ≥1.88.
Formatting: ran cargo fmt --all over the workspace (71 files) to clear the long-standing rustfmt CI job. Mechanical, no behavior change.
--all-features advisory/license coverage: the cargo-deny-action defaults to --all-features, so CI also scans the optional lancedb tree (lance/datafusion/arrow). Patched lz4_flex 0.11.5→0.11.6 / 0.12.0→0.12.2 (RUSTSEC-2026-0041) and tar 0.4.44→0.4.46 (RUSTSEC-2026-0067/0068); allowed 0BSD (mock_instant). Added [graph] all-features = true to deny.toml so local cargo deny check sees the same graph as CI (prevents local≠CI drift).
CI SIGILL fix: set RUSTFLAGS = "-C target-cpu=x86-64-v2" in ci.yml to override the repo’s .cargo/config.toml -C target-cpu=native. On GitHub’s heterogeneous runners native can emit instructions the silicon traps (SIGILL crashing rustc/proc-macros, seen building ollama-rs). Verified the rustc invocation: an empty CARGO_BUILD_RUSTFLAGS is ignored and doesn’t override the config flag — only a non-empty RUSTFLAGS (highest precedence) fully replaces it. Local dev keeps target-cpu=native; CI uses the portable x86-64-v2 baseline.

Added

Documentation site (2026-05-31)

mdBook documentation site under book/, deployed to GitHub Pages at https://automataia.github.io/graphrag-rs/. Curated, English-only, user-facing TOC (book/src/SUMMARY.md) covering getting-started, concepts, configuration, features, and per-crate guides. Internal dev reports and Italian guides are intentionally excluded.
Chapters are thin {{#include}} wrappers over the canonical sources (HOW_IT_WORKS.md, crate READMEs, curated docs/*.md) so there is a single source of truth and no content drift. Front-door pages (introduction.md, getting-started/overview.md, quickstart.md) are authored.
Mermaid diagrams render via the mdbook-mermaid preprocessor; built-in client-side search enabled.
API reference links out to docs.rs/graphrag-core rather than self-hosting cargo doc.
New CI workflow .github/workflows/docs.yml builds the book (pinned mdbook 0.5.3 + mdbook-mermaid 0.17.0 prebuilt binaries) and deploys via actions/deploy-pages. The generated book/book/ output is git-ignored. Manual one-time step: set repo Settings → Pages → Source = “GitHub Actions”.
README: added a docs-site badge.
Translated to English the doc sources the site includes that still contained Italian: docs/INCREMENTAL_UPDATES.md, docs/TUI_USAGE_GUIDE.md, docs/ENRICHMENT_USAGE_GUIDE.md, docs/SUMMARIZATION_CONFIG.md, the graphrag-cli/README.md config table notes, and the Italian entries in this CHANGELOG. Fixed stale repo URLs (anthropics/* → automataIA/graphrag-rs) in the translated guides. The public site is now English-only end to end.
Stripped decorative/pictographic emoji (the 📚🚀📖 family) from the doc sources the site includes, fixing “tofu” boxes that appeared wherever the viewer’s font lacked an emoji glyph (mdBook’s default theme has no emoji-font fallback; this is a generic missing-glyph issue, not a bug). Preserved arrows (→), box-drawing/ASCII diagrams (━│▼█), and data symbols (✅❌★☆); converted rating ⭐→★ to keep ratings rendering. Plain numbered headings (1./2.) replaced the keycap-emoji 1️⃣ style.

[0.2.0] - 2026-05-31

Fixed

arrow workspace dep: added default-features = false to arrow = "57" in the workspace Cargo.toml. Previously, the default-features = false directive in graphrag-core/Cargo.toml was silently ignored by Cargo (build-time warning).
documentation metadata for the graphrag crate: added documentation = "https://docs.rs/graphrag" in graphrag/Cargo.toml, aligning the wrapper crate with graphrag-core and graphrag-cli.

Code/architecture/product quality audit (2026-05-30)

Added

CI/CD: new workflow .github/workflows/ci.yml. The repo previously had no CI automation. Blocking jobs: clippy --workspace --lib -D warnings (now green, see below), test -p graphrag-core --lib, cargo-deny. The fmt job is informational and non-blocking (continue-on-error) until the repo is made cargo fmt --all clean (pre-existing repo-wide formatting debt).
Security tooling: deny.toml (advisories + permissive licenses + duplicate ban) and SECURITY.md (private disclosure policy via GitHub Security Advisories).
Drift-guard tests (config/setconfig.rs): gliner_setconfig_default_matches_runtime and autosave_setconfig_default_matches_runtime fail at build time if the serde leaf-struct defaults diverge from the canonical runtime ones, preventing “5-point-sync” drift. OllamaConfig is excluded on purpose (by-design divergence: offline-first runtime vs user-facing schema).
Crate metadata: documentation (docs.rs) and readme fields added to graphrag-core and graphrag-cli for publishing on crates.io.
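The drift-guard idea from the entry above can be sketched as follows. This is a minimal illustration, not the code in config/setconfig.rs: RuntimeGlinerConfig, schema_default_entity_labels, defaults_in_sync and the label values are hypothetical stand-ins for the real GlinerConfig and its serde default functions.

```rust
/// Hypothetical runtime config, standing in for the real `GlinerConfig`.
#[derive(Debug, PartialEq)]
pub struct RuntimeGlinerConfig {
    pub entity_labels: Vec<String>,
}

impl Default for RuntimeGlinerConfig {
    fn default() -> Self {
        // Canonical runtime default (label values are illustrative only).
        Self {
            entity_labels: vec!["person".to_string(), "concept".to_string()],
        }
    }
}

/// Hypothetical schema-side default, defined independently of the runtime
/// one (for example as a serde `default = "..."` function). Because it is a
/// second copy of the same list, it can silently drift.
pub fn schema_default_entity_labels() -> Vec<String> {
    vec!["person".to_string(), "concept".to_string()]
}

/// Returns true when both defaults agree. A drift-guard test would simply
/// assert this, so any divergence fails the test run.
pub fn defaults_in_sync() -> bool {
    schema_default_entity_labels() == RuntimeGlinerConfig::default().entity_labels
}
```

A test such as `assert!(defaults_in_sync())` then turns any future edit to only one of the two lists into a red CI run.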

Documentation polish (2026-05-30)

graphrag/README.md: the wrapper meta-crate had no README (only Cargo.toml + src). Added: explains that it re-exports graphrag-core and provides the graphrag binary, with a binary quick-start + library usage and links to the core/root README.
Module //! headers added to the 10 graphrag-core modules that lacked them (previously starting with use/pub mod/#[cfg] or a /// on the first submodule): config, graph, generation, critic, retrieval, summarization, vector, entity, text, query. Every module’s rustdoc page now shows a description. Doc-comments only, no behavior change; clippy -p graphrag-core -D warnings stays green and cargo doc introduces no new warnings.

PageRank: score normalization (dangling nodes) (2026-05-30)

Bug fix: scores_to_entity_map in graph/pagerank.rs now L1-normalizes the scores (sum = 1.0). Dangling nodes (no outgoing edges) lost rank mass on every iteration, leaving the sum < 1.0. Single fix point → covers all paths (dense/parallel/sparse). Unblocks 3 previously-failing tests: test_pagerank_convergence, test_personalized_pagerank, test_precompute_global_pagerank (visible only under the pagerank feature, activated by --workspace feature-unification).
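As a rough sketch of the L1 normalization described above (assuming scores keyed by an entity-id string; the real scores_to_entity_map signature and key type may differ, and l1_normalize is a hypothetical name):

```rust
use std::collections::HashMap;

/// L1-normalizes the scores in place so that they sum to 1.0.
/// The map is left untouched when the total is zero or not finite,
/// which avoids dividing by zero on an empty or degenerate result.
pub fn l1_normalize(scores: &mut HashMap<String, f64>) {
    let total: f64 = scores.values().sum();
    if total > 0.0 && total.is_finite() {
        for score in scores.values_mut() {
            *score /= total;
        }
    }
}
```

With scores 0.3 and 0.5 (sum 0.8, as happens when dangling nodes leak rank mass), the normalized values become 0.375 and 0.625.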

Swagger UI served at `/swagger` (2026-05-30)

graphrag-server: the Swagger UI was announced but not served (“coming soon”). Now exposed at /swagger via apistos’s native support (features = ["swagger-ui"], already enabled) — apistos-swagger-ui bundles the official Swagger UI assets, so no new dependency. Changed .build("/openapi.json") → .build_with(..., BuildConfig::default().with(SwaggerUIConfig::new(&"/swagger"))) in main.rs. README updated (removed “coming soon”).

Clean clippy on examples/tests + green doctests (2026-05-30)

Clippy examples/tests: cargo clippy --examples --tests -p graphrag-core -- -D warnings is now green. Bulk via cargo clippy --fix; manual tail: ///!→//! (embeddings demo), .filter().next_back()→.rfind(), .clone() on a double ref → .iter().copied(), ignored let _ = on Result, std::slice::from_ref, removal of unused vars.
Doctest: cargo test --doc -p graphrag-core → 47 pass / 0 fail / 17 ignored. 7 illustrative, non-self-contained examples (they require a live Ollama, an async runtime, or undefined setup variables: core::ChunkingStrategy, build_relationship_hierarchy, KV-cache Ollama, pipeline_executor, etc.) are now marked with the ignore code-fence attribute. The hero example still runs and is green.
clippy --fix regression corrected: config/enhancements.rs:770 — --fix had removed mut from let count, seeing it as inactive under default features; restored let mut count with #[allow(unused_mut)] (the count += 1s are behind #[cfg(feature = ...)]).
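The regression pattern can be reproduced in miniature. The sketch below is an assumption about the shape of the code, not the contents of config/enhancements.rs; count_enabled_enhancements is a hypothetical name and the two feature names are used only for illustration.

```rust
/// Counts optional subsystems that were compiled in.
pub fn count_enabled_enhancements() -> usize {
    // With none of the features below enabled, `count` is never mutated, so
    // the compiler (and `clippy --fix`) suggests dropping `mut`. Following
    // that suggestion breaks the build as soon as a feature is switched on,
    // hence the explicit allow.
    #[allow(unused_mut)]
    let mut count = 0;

    #[cfg(feature = "leiden")]
    {
        count += 1;
    }

    #[cfg(feature = "incremental")]
    {
        count += 1;
    }

    count
}
```

Built without either feature, the function returns 0 and both increment blocks are compiled out.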

Stale examples/tests recompile (2026-05-30)

Stale struct initializers: added the missing temporal/causal fields (all None) to the Entity literals (first_mentioned, last_mentioned, temporal_validity) and Relationship literals (embedding, temporal_type, temporal_range, causal_strength) in the llm_evaluation_demo, advanced_nlp_demo, hierarchical_graphrag_demo, workspace_demo, tom_sawyer_workspace examples. They had fallen behind the evolution of Entity/Relationship in core/mod.rs (Phase 1.2) and broke cargo build --examples.
complete_zero_cost_graphrag_demo: Config literal closed with ..Default::default() (it was missing advanced_features, gliner, suppress_progress_bars) and the EntityConfig literal completed with use_atomic_facts: false + max_fact_tokens: 400.
Per-feature gating (graphrag-core Cargo.toml): hierarchical_graphrag_demo now required-features = ["leiden"] (uses LeidenConfig / detect_hierarchical_communities, #[cfg(feature = "leiden")]) and the incremental_integration test required-features = ["incremental"] (it imported graphrag_core::incremental). So a default cargo build/test --workspace stays green without pulling in the optional features.
Chat discussion.html: added the standard line-clamp:3 property alongside -webkit-line-clamp (CSS vendorPrefix linter).
Verification: cargo build --examples --tests --workspace → clean Finished; cargo test -p graphrag-core --lib → 365 pass / 0 fail. The 3 pagerank tests that fail under --workspace feature-unification are pre-existing (confirmed on a clean tree).

Changed

Dependency dedup (anti-bloat): aligned two direct workspace dependencies to versions already present transitively, eliminating duplicate versions in graphrag-cli’s -e normal tree:
- strum 0.25 → 0.26 (matches ratatui 0.29) — removes duplicate strum + strum_macros.
- itertools 0.12 → 0.13 (matches ratatui/unicode-truncate).
- Real duplicates in graphrag-cli’s normal tree dropped from 34 to 26. Verified that graphrag-core (the published crate) has only 4 unavoidable transitive duplicates (getrandom 0.2/0.3, webpki-roots 0.26/1.0, TLS stack). rand 0.8→0.9 NOT done (API-breaking, and it would only deduplicate the unpublished server binary).

Fixed

CLI crash at startup on all non-TUI subcommands (index, ask, bench, setup, validate, …): color_eyre::install() was called twice — in graphrag-cli/src/main.rs:10 and again inside run() at lib.rs:197 — and the second install aborted with “could not set the provided Theme globally as another was already set”. Removed the duplicate install() from main.rs; now both binaries (graphrag-cli and the graphrag meta-crate, which doesn’t install on its own) install exactly once via run(). Caught by running the e2e benchmarks (bench).
MSRV corrected and verified: rust-version changed from 1.75 (false, never tested) to 1.85. The real floor is imposed by the direct dependency jsonfixer, which uses edition = "2024" (requires rustc ≥ 1.85). Build-verified on the 1.85 toolchain for graphrag-core and graphrag-cli. New msrv CI job that builds on 1.85. Analysis method: floor from cargo metadata (max rust_version declared among the normal deps) + build verification on a single toolchain (no costly bisect).
Lint debt zeroed (green workspace clippy): resolved 38 pre-existing clippy errors that surfaced under cargo clippy --workspace --lib -- -D warnings (Rust 1.95). Diagnosis: graphrag-core in isolation (default features) was already clean; the errors were in core’s optional modules (incremental, rograg, lightrag, embeddings/ollama) activated by the cli/server features, plus 3 errors in graphrag-cli itself. Idiomatic fixes (to_vec(), iter_mut().enumerate(), if let Some, sort_by_key(Reverse(..)), type aliases NodeDeltaResult/EdgeDeltaResult) and targeted, commented #[allow]s where a rename would break the serde API (PendingUpdateType) or for a private 10-argument helper. Not an interface break: the crates compile and link correctly.
GLiNER default drift: default_gliner_entity_labels/default_gliner_relation_labels in config/setconfig.rs were misaligned with the runtime GlinerConfig::default() (missing "concept" and "causes"). Now aligned with the canonical default (4 entity + 3 relation labels). Not observable in the existing e2e configs (they set the labels explicitly); relevant only when GLiNER is enabled via TOML while omitting the labels.

Documentation

Markdown doc consolidation (few but useful): reduced the ~55 tracked .md files to a keystone set. Deleted 39 files among process artifacts (report.md, TODO.md, *_COMPLETE.md, *_SUMMARY.md, *_STATUS.md, MERGE_COMPLETE.md, IMPLEMENTATION_SUMMARY.md) and satellite integration guides now covered by the keystones (graphrag-core/{ADVANCED_FEATURES,OLLAMA_INTEGRATION,LEIDEN_INTEGRATION,LIGHTRAG_INTEGRATION,HIPPORAG_INTEGRATION,CROSS_ENCODER_INTEGRATION,ENTITY_EXTRACTION,EMBEDDINGS_CONFIG,PIPELINE_ARCHITECTURE,QUICKSTART,ENRICHMENT_IMPLEMENTATION,WORKSPACE_PERSISTENCE_SUMMARY}.md, the src/{embeddings/README,graph/TRAVERSAL_GUIDE}.md, the entire series of non-README graphrag-wasm/*.md guides, examples/MULTI_DOCUMENT_PIPELINE.md). The surviving keystones: README.md, HOW_IT_WORKS.md, CHANGELOG.md, the 4 crate READMEs, config/JSON5_CONFIG_GUIDE.md. The docs/ folder is git-ignored (local notes) and is not touched.
Keystone staleness fixes: MSRV badge/prerequisites 1.70 → 1.85 in the root README; removed references to the deleted graphrag-leptos crate (workspace layout now 5-crate + the graphrag meta-crate, dependency graph updated); “Web UI” section rewritten around the chat-shell. HOW_IT_WORKS.md: the WASM section now points to graphrag-wasm (no longer to the deleted graphrag-leptos).
graphrag-wasm README rewritten: the old 5-tab DaisyUI UI is replaced by the documentation of the 3-column Nordic-Minimal chat-shell (LeftRail/Stage/RightRail), off-main-thread inference, citations, IndexedDB persistence; removed the dead links to the deleted satellite guides.
Internal links repointed: all links to the deleted docs (in README.md, HOW_IT_WORKS.md, graphrag-core/README.md) now point to HOW_IT_WORKS.md, config/JSON5_CONFIG_GUIDE.md, CHANGELOG.md, or docs.rs/graphrag-core.

Removed

Dead code: removed graphrag-server/src/main_axum_old.rs (~31KB, orphan file with no references, neither a bin-target nor a module).
Unused dependency: removed text_analysis = "0.3" from graphrag-core and from [workspace.dependencies] (detected with cargo machete, verified: no use in the code — the only match was the string "context_analysis"). The other cargo machete reports (getrandom, gline-rs, js-sys, web-sys, tower, text-splitter) are verified false positives (wasm/api feature-enablers or crates whose lib name differs from the package name, like gline-rs→gliner) and kept.

Changed

graphrag-wasm chat-shell rewrite (Nordic-Minimal) (2026-05-17)

BREAKING: the 5-tab daisyUI UI (Build / Explore / Query / Hierarchy / Settings) is replaced by a single 3-column chat shell that mirrors the Chat discussion.html Nordic-Minimal mockup verbatim (palette, font stack Newsreader / Geist / Geist Mono, class names, citation/hover wiring).
- New layout in graphrag-wasm/src/main.rs: LeftRail (brand + sources + Flat/Hierarchy toggle + Build button), Stage (head with active source, thread of Turns, composer), RightRail (subgraph SVG + pipeline rows + ministats + references). All real data: documents come from the existing IndexedDB signal, pipeline progress is driven by the existing BuildStatus/BuildStage, embeddings come from ONNX Runtime Web + tokenizer.json, retrieval from VectorIndex::search, answers from WebLLM (Phi-3-mini for synthesis, Qwen for extraction), citations are post-processed via parse_answer_with_cites and link to <button class="cite"> ↔ <div class="ref-card"> through the reactive active_ref: Option<u32> signal — no inline JS.
- New module graphrag-wasm/src/components/chat_shell.rs holds the data types (ChatTurn, RefCard, AnswerSegment, SubgraphData), the citation parser and the per-query build_subgraph builder that unions entities from the top-K retrieved chunks and feeds them through components::force_layout::ForceLayout (320×240 viewBox, 16-node / 21-edge cap matching the mockup density label).
- Styling: graphrag-wasm/tailwind.css is now a flat Nordic-Minimal stylesheet (no @tailwind directives, no daisyUI); graphrag-wasm/index.html drops lucide CDN + MutationObserver and adds the Google-fonts preconnect block.
- leptos-lucide-rs dependency removed from graphrag-wasm/Cargo.toml.
- Legacy daisyUI components (components/{settings,hierarchy,ui_components,chat_component}.rs) remain on disk for reference but are no longer compiled — components/mod.rs only exports chat_shell + force_layout.
- Parity test: graphrag-wasm/tests/playwright/chat_layout.sh drives playwright-cli: opens the mockup over python3 -m http.server and the WASM SPA on trunk serve, captures 1440×900 screenshots (tests/playwright/artifacts/{mockup,wasm}.png) and asserts 19 shared selectors (.app, .rail-left .doc-item, .stage-title, .bubble-q, .cite, .stages .pls, .graph-frame svg, .ref-card, .composer input, …). Current status: 19/19 pass.

Added

2026 best-practices pass (graphrag-core ↔ graphrag-wasm) (2026-05-16)

Off-main-thread inference (Stage 3b) for graphrag-wasm.
- WebLLM: WebLLM::new and WebLLM::new_with_progress in graphrag-wasm/src/webllm.rs now auto-detect a pre-spawned window.webllmWorker and switch to CreateWebWorkerMLCEngine, keeping the same chat.completions.create surface (and chat_stream’s async-iterator) intact. Falls back to the main-thread engine if worker spawn fails. New sidecar graphrag-wasm/webllm-worker.js hosts WebWorkerMLCEngineHandler (15 LOC).
- ONNX Runtime Web: ort.env.wasm.proxy = true + numThreads = 1 set immediately after ort.min.js loads in graphrag-wasm/index.html, so all InferenceSession.run calls execute in ORT’s dedicated worker.
- Trade-off vs the plan’s gloo-worker route: no second wasm bundle, no Rust worker scaffolding, ~30 LOC swap. Verification (“main-thread blocked < 50 ms during inference”) met via the runtimes’ built-in workers.
Token-streaming UX in graphrag-wasm QueryTab. Replaced the blocking WebLLM::chat(...) call at graphrag-wasm/src/main.rs:1604 with chat_stream(...): tokens are now appended to the results signal incrementally as they arrive from the model, matching 2026 in-browser-LLM UX guidance. The pre-existing streaming API in graphrag-wasm/src/webllm.rs:334 was previously unused.
IndexedDB persistence for the document set. New graphrag-wasm/src/persist.rs wraps IndexedDBStore with open_store, save_document, delete_document, load_all_documents. The App component restores documents on first load; manual input, file upload, Symposium-demo load, and document-remove handlers all persist their mutations. Reloading the page now preserves the document set instead of resetting to empty.
WAI-ARIA tabs pattern in graphrag-wasm. All 5 tab panels are now mounted permanently inside a <main id="main-content"> landmark with hidden=move || active_tab.get() != Tab::X. Each tab button gained an id (tab-build, tab-explore, etc.) matching the panel’s aria-labelledby. This fixes Lighthouse aria-valid-attr-value and landmark-one-main audits, and preserves component state across tab switches.
SEO: added <meta name="description"> and <link rel="canonical"> plus <meta name="color-scheme" content="dark light"> to graphrag-wasm/index.html. External links in the footer gained rel="noopener noreferrer".
Downloaded MiniLM-L6-v2 ONNX model (87MB) to graphrag-wasm/models/minilm-l6.onnx for semantic query embeddings. Previously the directory was empty, causing fallback to hash-based embeddings which produced no meaningful search results.

Removed

Broken orphan example crates deleted (2026-05-16)

examples/web-app/ and examples/graphrag-leptos-demo/ both depended on the deleted graphrag-leptos crate (merged into graphrag-wasm in March 2025). They were excluded from the workspace so they did not block builds, but were misleading for newcomers. Functionality is fully covered by graphrag-wasm itself.
Dropped exclude = ["examples/web-app"] from root Cargo.toml.

`graphrag_py` Python bindings crate deleted (2026-05-16)

Removed graphrag_py/ directory and workspace member entry in root Cargo.toml.
Reason: legacy crate on an out-of-date pyo3 0.21, last touched 4 commits ago, before the KV-cache / GLiNER / contextual-enricher / persistence wave. API frozen since before February 2026, never published (publish = false), Development Status :: 4 - Beta.
BREAKING: Python bindings no longer build from this repo. Future Python support should live in a separate repo with current pyo3.

cargo clippy --lib -p graphrag-core --no-default-features --features "wasm-bundle" --target wasm32-unknown-unknown -- -D warnings went from 54 errors → 0. Native default-features pass also restored to 0 errors. Both targets and the 363 native lib tests now pass cleanly under the PostToolUse clippy hook.

Mechanical lints auto-applied: sort_by_key (5×), clamp (5×), unwrap_or_default, is_some_and, manual_abs_diff, manual_pattern_char_comparison, collapsible_match, let_and_return, derivable_impls, field_reassign_with_default, needless_return.
Type aliases for boxed Fn benchmark callbacks in graphrag-core/src/monitoring/benchmark.rs:208-214: RetrievalFn, RerankerFn, LlmFn. Eliminates 3× type_complexity warnings.
HierarchicalLeidenResult type alias in graphrag-core/src/graph/leiden.rs:17 factored out the Result<(HashMap<.., HashMap<..>>, HashMap<..>)> return type of hierarchical_leiden.
Feature-gated dead-code under wasm: helper methods in gleaning_extractor.rs, llm_extractor.rs, chunking_strategies.rs, contextual_enricher.rs, late_chunking.rs are now #[cfg(feature = "async")]. Fields ollama_client (atomic_fact_extractor, llm_extractor), prompt_builder (llm_extractor), client (contextual_enricher), llm_extractor (gleaning_extractor), critic (graphrag/mod), api_key (late_chunking), and boundary_detector / coherence_scorer / min_chunk_chars (chunking_strategies) carry #[cfg_attr(not(feature = "async"), allow(dead_code))]. Five modules carry #![cfg_attr(not(feature = "async"), allow(unused_imports))] to silence imports that become dead when the async build_graph path is gone.
Restored imports lost during refactor: TextChunk, GraphRAGError, Document, HashMap, HashSet, Result, OllamaGenerationParams re-added to atomic_fact_extractor.rs, gleaning_extractor.rs, llm_extractor.rs, contextual_enricher.rs, late_chunking.rs. Underscored-but-still-used variables (_e → log-formatter args, _original_score, _total_chunks) rewritten to be self-consistent.
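The type-alias fix for type_complexity follows a common pattern, sketched here under assumptions: the RetrievalFn signature and the Benchmark struct below are hypothetical and are not copied from monitoring/benchmark.rs.

```rust
/// Hypothetical alias for a boxed retrieval callback. Naming the type once
/// keeps struct fields and function signatures short, which is what
/// silences clippy's `type_complexity` lint.
pub type RetrievalFn = Box<dyn Fn(&str) -> Vec<String> + Send + Sync>;

/// Hypothetical benchmark harness that stores the callback.
pub struct Benchmark {
    retrieve: RetrievalFn,
}

impl Benchmark {
    pub fn new(retrieve: RetrievalFn) -> Self {
        Self { retrieve }
    }

    /// Runs the callback and reports how many results came back.
    pub fn run(&self, query: &str) -> usize {
        (self.retrieve)(query).len()
    }
}
```

A caller passes any matching closure, for example `Benchmark::new(Box::new(|q: &str| vec![q.to_uppercase()]))`.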

Fixed

WASM compilation broken after graphrag-core refactor (2026-05-16)

graphrag-core failed to compile for wasm32-unknown-unknown (65 errors → 0). The WASM build uses default-features = false (excludes async, tracing, tokio, parallel-processing), but many code paths used tracing:: calls and tokio without feature gates.

Added #[cfg(feature = "tracing")] gates to ~80 tracing:: calls across 15 files.
Gated tokio::runtime::Runtime in BoundaryAwareChunkingStrategy::chunk() behind #[cfg(feature = "async")] with sync fallback.
Split RetrievalSystem::batch_query() into #[cfg(feature = "parallel-processing")] and #[cfg(not(feature = "parallel-processing"))] variants.
Fixed sync ask() (#[cfg(not(feature = "async"))]) to call retrieval.query() instead of async query_internal().
Added #![recursion_limit = "512"] to graphrag-wasm main.rs for Leptos type depth.
Created missing graphrag-wasm/models/ directory required by Trunk.
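The paired-variant gating used in the fixes above looks roughly like this. It is a sketch only: batch_lengths is a hypothetical stand-in for RetrievalSystem::batch_query(), and the threaded body is just one way a parallel variant might be written.

```rust
/// Parallel variant: compiled only when the `parallel-processing` feature
/// is enabled, so targets without thread support never see this code.
#[cfg(feature = "parallel-processing")]
pub fn batch_lengths(queries: &[&str]) -> Vec<usize> {
    std::thread::scope(|s| {
        // Spawn one scoped worker per query, then join them in order.
        let handles: Vec<_> = queries
            .iter()
            .map(|q| s.spawn(move || q.len()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

/// Sequential fallback: compiled when the feature is off (for example with
/// `default-features = false`), with the same name and signature so call
/// sites do not need their own cfg gates.
#[cfg(not(feature = "parallel-processing"))]
pub fn batch_lengths(queries: &[&str]) -> Vec<usize> {
    queries.iter().map(|q| q.len()).collect()
}
```

Exactly one of the two definitions exists in any given build, and both return results in input order.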

Missing `Relationship` fields in sync `build_graph()` (2026-05-16)

graphrag-core/src/graphrag/build.rs:690: Relationship struct literal was missing embedding, temporal_type, temporal_range, and causal_strength fields added in Phase 1.2 (Advanced GraphRAG). Added all four with None defaults so the sync build path compiles without partial-init errors.

`rograg::validator` dropped quality metrics (2026-05-16)

graphrag-core/src/rograg/validator.rs:376: validate_response was computing coherence_score, relevance_score, factual_consistency_score, completeness_score, readability_score, and source_credibility_score then throwing them away (7 unused_variable / unused_assignments warnings). Now they:

Fold into validated_response.confidence via a new overall_quality() helper (mean of the metrics that were actually run — coherence / relevance / factual consistency are gated on their respective config flags; completeness / readability / source credibility always count).
Trigger a Medium IssueType::Quality validation issue when overall quality falls under 0.5.
Are emitted as a structured tracing::debug! event so the metrics are observable in logs without a public API change.
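A minimal sketch of the "mean of the metrics that were actually run" rule, assuming the gated metrics are represented as Option values. The QualityMetrics struct is hypothetical; the real overall_quality() helper in rograg/validator.rs may be shaped differently.

```rust
/// Hypothetical container: gated metrics are `None` when their config flag
/// is off, while the other three are always computed.
pub struct QualityMetrics {
    pub coherence: Option<f32>,
    pub relevance: Option<f32>,
    pub factual_consistency: Option<f32>,
    pub completeness: f32,
    pub readability: f32,
    pub source_credibility: f32,
}

impl QualityMetrics {
    /// Mean over the metrics that were run. Skipped metrics are excluded
    /// from both the sum and the count, so they do not drag the mean to 0.
    pub fn overall_quality(&self) -> f32 {
        let gated = [self.coherence, self.relevance, self.factual_consistency];
        let always = [self.completeness, self.readability, self.source_credibility];
        let (sum, n) = gated
            .iter()
            .flatten()
            .chain(always.iter())
            .fold((0.0_f32, 0_u32), |(s, n), v| (s + v, n + 1));
        // `n` is at least 3 because the ungated metrics always count.
        sum / n as f32
    }
}
```

With coherence at 0.2, the other gated metrics disabled, and the three ungated metrics at 0.6, the mean is 2.0 / 4 = 0.5, which sits exactly on the threshold below which a quality issue is raised.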

Changed

Server crate: color-eyre pretty errors at startup (2026-05-16)

graphrag-server/src/main.rs: main() return type std::io::Result<()> → color_eyre::Result<()>, with color_eyre::install() at top.
Adds color-eyre = "0.6" to graphrag-server/Cargo.toml.
mimalloc allocator was already wired (no change).
Production unwraps in server crate audited: all 16 remaining unwraps are inside #[cfg(test)] blocks (qdrant_store, auth, embeddings, config_handler, etc.). Production paths use .map_err(...)? / .ok_or_else(...)? — already clean. Part of refactor-2026-05 server slice.

Documentation

Stale memory + CLAUDE.md notes refreshed (2026-05-16)

CLAUDE.md workspace layout: 6-crate → 5-crate (graphrag_py removed).
CLAUDE.md “Known gotchas”: replaced obsolete “12 failing unit tests” claim with verified status: cargo test -p graphrag-core --lib → 363 pass / 0 fail. The remaining cargo test --workspace failures come from stale examples (not tests) under graphrag-core/examples/ with missing Entity / Relationship fields; left untouched per project policy.
MEMORY.md (auto-memory) synced to the same wording.

Removed

Test suite aggressive pruning (2026-05-16)

User-requested clean-up: keep only indispensable, up-to-date tests; delete broken pre-existing failures, hanging tests, stale pre-refactor integration tests, and trivial construction-only sanity tests.

23 broken / hanging / failing unit tests deleted:
- async_graphrag::tests::* (6 tests on dead module)
- entity::*::test_normalize_name (2 stale assertions)
- entity::llm_relationship_extractor::test_fallback_extraction
- reranking::cross_encoder::test_rerank_basic + test_confidence_filtering (need ONNX)
- retrieval::symbolic_anchoring::test_extract_anchors (stale)
- text::boundary_detection::test_sentence_detection + test_combined_detection
- graph::incremental::tests::test_basic_entity_upsert + 6 ProductionGraphStore tests (deadlock in async lock contention — hung indefinitely)
- rograg::logic_form::tests::test_pattern_parser + test_logic_form_retrieval
- rograg::intent_classifier::tests::test_{factual,relational,temporal,causal,comparative,summary,definitional}_intent (7 stale assertions on intent classification)
- rograg::quality_metrics::test_performance_stats_update
- rograg::streaming::test_template_selection
- incremental::lazy_propagation::test_lazy_propagation_basic
- incremental::delta_computation::test_parallel_computation
10 stale workspace-level integration test files deleted (./tests/*.rs, all pre-2026, predate the KV cache / GLiNER / persistence / file-split refactors): caching_integration.rs, config_integration_test.rs, http_endpoint_tests.rs, hybrid_retrieval_tests.rs, integration_tests.rs, modular_integration_tests.rs, property_tests.rs + .proptest-regressions, server_integration_tests.rs, zero_cost_approaches_integration_tests.rs, tests/parallel/. Plus graphrag-core/tests/ollama_enhancements.rs (didn’t compile — missing context field on OllamaGenerationParams).
15 trivial test_*_creation patterns deleted (single-line constructions verifying only X::new().is_ok()): test_tree_creation, test_async_mock_llm_creation, test_incremental_pagerank_creation, test_processor_creation, test_agent_creation, test_function_caller_creation, test_cache_warmer_creation, test_retrieval_system_creation, test_enhanced_registry_creation, test_mock_llm_creation, test_answer_generator_creation, test_graphrag_creation, test_graph_indexer_creation, test_lancedb_creation, test_cached_client_creation. Plus 2 trivial Ollama adapter creation tests (entire test module in core/ollama_adapters.rs removed).
Tests retained: 7 integration test files in graphrag-core/tests/ (the 2026-02 refactor-era tests exercising KV cache, contextual enricher, GLiNER features, triple validation, dynamic weighting, BAR-RAG, text pipeline fixtures, incremental graph updates). ./tests/e2e/ benchmark scripts kept.
Verification matrix — all 100% green:
- cargo test -p graphrag-core --lib → 363 passed, 0 failed (was 371/12 fail)
- cargo test -p graphrag-core --lib --features rograg → 402 passed, 0 failed
- cargo test -p graphrag-core --lib --features incremental → 390 passed, 0 failed

Fixed

Workspace-wide production `unwrap()` sweep (2026-05-16) — Part of refactor-2026-05 Phase 3 (extended)

Going beyond the original Phase 3 scope (voy_store, rograg/streaming, rograg/processor, cli/config, qdrant_store — all already verified test-only or previously cleaned), every remaining production .unwrap() in the workspace has been replaced with the appropriate safe alternative.
Mechanical sweeps by category:
- 36 partial_cmp(...).unwrap() (f32 sort comparators, NaN-panic-prone) across ~23 files (async_graphrag, inference, retrieval/*, graph/*, summarization, vector, monitoring, nlp, generation, server handlers, etc.) → .unwrap_or(std::cmp::Ordering::Equal).
- 22 lock()/read()/write().unwrap() (Mutex/RwLock acquisitions, poisoned-lock-panic-prone) → .expect("lock poisoned") / .expect("rwlock poisoned").
- 12 Regex::new(...).unwrap() (static regex literals) → .expect("static regex literal").
- duration_since(UNIX_EPOCH).unwrap() (system clock) → .expect("system clock before UNIX epoch").
- Iterator and Option terminators (.first(), .last(), .next(), .min(), .max(), .pop(), .as_ref(), .as_mut(), .chars().next()) after checked-precondition usages → .expect(<reason>).
- Targeted contextual fixes for result_map.remove, get_mut after contains_key, Self::new() in Default::default, NonZeroUsize::new on literal, caps.get(N), strip_prefix(...) after starts_with, etc.
Test-only infrastructure files (core/test_traits.rs, core/test_utils.rs) intentionally left untouched — their .unwrap() calls represent test-helper panic semantics by design (suite is called from test functions only).
Net result: workspace audit reports 0 production .unwrap() calls outside test infrastructure (down from ~178 pre-existing). All builds green: graphrag-core default + --features rograg + --features incremental, plus graphrag-cli, graphrag-server, graphrag wrapper.
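Two of the sweep categories above, shown in miniature. The function names sort_scores_desc and bump are hypothetical; only the replacement idioms come from the entries in this section.

```rust
use std::cmp::Ordering;
use std::sync::Mutex;

/// Sorts scores in descending order. `partial_cmp` returns `None` when a
/// NaN is involved; falling back to `Ordering::Equal` keeps the comparator
/// itself from panicking the way `.unwrap()` would.
pub fn sort_scores_desc(scores: &mut [f32]) {
    scores.sort_by(|a, b| b.partial_cmp(a).unwrap_or(Ordering::Equal));
}

/// Increments a shared counter. `expect` still panics on a poisoned lock,
/// but with a message that names the cause instead of a bare unwrap.
pub fn bump(counter: &Mutex<u64>) -> u64 {
    let mut guard = counter.lock().expect("lock poisoned");
    *guard += 1;
    *guard
}
```

Where a strict total order over floats is needed, f32::total_cmp is an alternative to the Equal fallback.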

Changed

Module split: retrieval/types.rs extracted (2026-05-16) — Part of refactor-2026-05 Phase 4 (final)

Extracted RetrievalConfig, SearchResult, ResultType, QueryAnalysis, QueryType, QueryIntent, QueryAnalysisResult, QueryResult, RetrievalStatistics (+ its print impl) from graphrag-core/src/retrieval/mod.rs into the new private module graphrag-core/src/retrieval/types.rs (199 LOC).
retrieval/mod.rs shrinks 1851 → 1666 LOC; the public API is preserved via pub use types::*; so crate::retrieval::SearchResult etc. resolve unchanged.
Restored one stripped doc comment (/// Statistics about the retrieval system) on RetrievalStatistics to satisfy #![warn(missing_docs)] — the sed extraction had eaten the line during slicing.
This was the last remaining Phase 4 item from the plan. Build + clippy clean (per the feedback-verify-with-build-clippy policy).
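
The re-export mechanism that keeps paths stable can be sketched in a single file with inline modules. The struct body below is a placeholder (the `queries` field is hypothetical); only the module layout and the `pub use types::*;` line reflect what the entry describes.

```rust
mod retrieval {
    // Private sub-module: type definitions live here after the split.
    mod types {
        /// Statistics about the retrieval system
        #[derive(Debug, Default, PartialEq)]
        pub struct RetrievalStatistics {
            /// Hypothetical field, present only so the example has some state.
            pub queries: u64,
        }
    }

    // Glob re-export: `retrieval::RetrievalStatistics` resolves exactly as it
    // did when the struct was defined directly in this module.
    pub use types::*;
}

/// Callers keep using the old path and never name the `types` module.
fn stats() -> retrieval::RetrievalStatistics {
    retrieval::RetrievalStatistics::default()
}
```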

Sub-split: graphrag/ directory module (2026-05-16) — Part of refactor-2026-05 Phase 4

Follow-up to the earlier graphrag.rs single-file move. The 1753-LOC graphrag-core/src/graphrag.rs is now a directory module graphrag-core/src/graphrag/ with per-concern sub-files:
- mod.rs (~105 LOC): struct GraphRAG, sub-module declarations, private ensure_initialized helper (bumped fn → pub(super) fn so the sibling impl blocks can call it), #[cfg(test)] mod tests block with the two pre-existing tests.
- lifecycle.rs (~189 LOC): new, default_local, builder, initialize, try_load_from_workspace, save_to_workspace, clear_graph.
- documents.rs (~53 LOC): add_document_from_text, add_document.
- build.rs (~715 LOC): async + sync build_graph paired methods.
- ask.rs (~519 LOC, renamed from query.rs to avoid clash with use crate::query for the planner module): ask, ask_with_reasoning, ask_explained, query_internal, query_internal_with_results, generate_semantic_answer_from_results, remove_thinking_tags, ask_with_pagerank pair.
- stats.rs (~85 LOC): config, is_initialized, has_documents, has_graph, knowledge_graph, knowledge_graph_mut, get_entity, get_entity_relationships, get_chunk.
- factory.rs (~202 LOC): from_json5_file, from_config_file, from_config_and_document, quick_start, quick_start_with_config.
Each sub-file has its own impl GraphRAG { ... } block; Rust allows multiple impl blocks across files. All sub-files share an identical kitchen-sink import header (Config, core types, critic, ollama, persistence, query, retrieval, feature-gated parallel, plus use super::GraphRAG).
Public API preserved: graphrag_core::GraphRAG resolves via lib.rs’s pub use graphrag::GraphRAG; (unchanged from the single-file pass).
Verified per the new policy: cargo build -p graphrag-core + downstream crates green; cargo clippy -p graphrag-core -- -D warnings shows exactly one error in the new files (graphrag/ask.rs:408 clamp pattern) which is a verbatim carry-over from the previous graphrag.rs:1358 (originally lib.rs:1594) — net new errors: zero. Tests not re-run (pure file move; see feedback-verify-with-build-clippy memory entry).
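
A reduced sketch of the layout, using inline modules in place of separate files. The struct fields and method bodies are invented for illustration, and the helper is placed in a sibling module here (unlike the real `mod.rs` placement) purely to show where `pub(super)` visibility matters.

```rust
mod graphrag {
    mod lifecycle {
        use super::GraphRAG;

        impl GraphRAG {
            pub fn new() -> Self {
                GraphRAG { initialized: false, documents: Vec::new() }
            }

            pub fn initialize(&mut self) {
                self.initialized = true;
            }

            // `pub(super)` makes the helper visible to the parent module and
            // therefore to the sibling sub-modules as well.
            pub(super) fn ensure_initialized(&self) -> Result<(), String> {
                if self.initialized {
                    Ok(())
                } else {
                    Err("not initialized".to_string())
                }
            }
        }
    }

    mod documents {
        use super::GraphRAG;

        // A second `impl` block for the same type, in a different module.
        impl GraphRAG {
            pub fn add_document_from_text(&mut self, text: &str) -> Result<usize, String> {
                self.ensure_initialized()?;
                self.documents.push(text.to_string());
                Ok(self.documents.len())
            }
        }
    }

    /// Simplified stand-in for the real struct; private fields stay reachable
    /// from the child modules above.
    pub struct GraphRAG {
        initialized: bool,
        documents: Vec<String>,
    }
}

// Crate-root re-export, so callers see a single flat path.
pub use graphrag::GraphRAG;
```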

God-file split: graph/incremental/ directory module (2026-05-16) — Part of refactor-2026-05 Phase 4

Converted graphrag-core/src/graph/incremental.rs (2905 LOC — the biggest god-file in the crate) into a directory module graphrag-core/src/graph/incremental/ with focused sub-files:
- mod.rs (~395 LOC): doc + sub-module declarations + pub use re-exports + verbatim #[cfg(test)] mod tests block + the kitchen-sink use import block the tests rely on via super::*.
- types.rs (~465 LOC): UpdateId, TransactionId, ChangeRecord, ChangeType, Operation, ChangeData, Document, GraphDelta, DeltaStatus, RollbackData, ConflictStrategy, Conflict, ConflictType, ConflictResolution, the IncrementalGraphStore trait, GraphStatistics, ConsistencyReport, InvalidationStrategy, CacheRegion.
- helpers.rs (~496 LOC): SelectiveInvalidation, ConflictResolver, UpdateMonitor + impls + their satellite types (InvalidationStats, UpdateMetric, OperationLog, PerformanceStats).
- manager.rs (~898 LOC): IncrementalGraphManager (both feature-gated and non-gated paired definitions kept adjacent), IncrementalConfig, IncrementalStatistics, IncrementalPageRank, BatchProcessor, PendingBatch, BatchMetrics, plus the impl GraphRAGError convenience constructors that conceptually belong here.
- store.rs (~743 LOC): ProductionGraphStore + Transaction + TransactionStatus + IsolationLevel + ChangeEvent + ChangeEventType + impl IncrementalGraphStore for ProductionGraphStore + ChangeDataExt trait & impl.
Public API preserved via pub use cascade in mod.rs (crate::graph::incremental::* resolves unchanged).
Visibility-only bumps to keep the shared test module compiling across the new sub-module boundary:
- IncrementalPageRank.scores: field → pub(super) field
- ConflictResolver.strategy: field → pub(super) field
- ConflictResolver::merge_entities: fn → pub(super) fn
Verification strategy update (per user request): switched from cargo test --features incremental (which surfaces many pre-existing unrelated failures and obscures the signal we care about) to cargo build --features incremental + cargo clippy --features incremental -- -D warnings. The clippy run reports 34 errors, all in pre-existing files outside the split (graphrag.rs, retrieval/, text/, monitoring/, etc.); zero new errors in graph/incremental/. Downstream crates (graphrag-cli, graphrag-server, graphrag) build clean.

Module split: config/json_parser.rs extracted (2026-05-16) — Part of refactor-2026-05 Phase 4

Extracted Config::from_file (~553 LOC hand-rolled JSON reader using the json crate) and Config::to_file (~200 LOC writer) from graphrag-core/src/config/mod.rs into the new private module graphrag-core/src/config/json_parser.rs (769 LOC, with imports + impl Config { ... } wrapper).
config/mod.rs shrinks 2491 → 1737 LOC. Public API unchanged: both methods are still reachable as Config::from_file / Config::to_file via the new impl Config block (multiple impl blocks across files compile fine).
Distinct from config::json5_loader (serde-based typed JSON5 loader) and config::loader (multi-format dispatcher) — this is the bespoke json crate path.
371 unit tests pass; 12 pre-existing failures unchanged.

God-file split: rograg/logic_form/ directory module (2026-05-16) — Part of refactor-2026-05 Phase 4

Converted graphrag-core/src/rograg/logic_form.rs (1517 LOC) into a directory module graphrag-core/src/rograg/logic_form/ with focused sub-files:
- mod.rs (141 LOC): doc + sub-module declarations + pub use re-exports + verbatim #[cfg(test)] mod tests block.
- types.rs (333 LOC): LogicFormError, LogicFormQuery, Predicate, Argument, ArgumentType, Constraint, ConstraintType, LogicQueryType, LogicFormResult, VariableBinding, LogicExecutionStats.
- parser.rs (240 LOC): LogicFormParser trait + PatternBasedParser + LogicPattern + ArgumentExtractor + impls.
- executor.rs (673 LOC): LogicFormExecutor + impls.
- retriever.rs (217 LOC): LogicFormRetriever struct + Default + impl.
Public API preserved via pub use cascade through both logic_form/mod.rs and rograg/mod.rs (crate::rograg::LogicFormResult, crate::rograg::LogicFormRetriever, etc. still resolve unchanged).
Single non-mechanical change: bumped LogicFormExecutor::calculate_name_similarity from private fn to pub(super) fn — the existing test_name_similarity test in the shared tests module needs cross-submodule access. Visibility-only adjustment; no behavior or signature change.
Pre-existing test failures (test_logic_form_retrieval, test_pattern_parser) remain unchanged (verified by re-running them on main before the split).

God-file split: graphrag-core/src/graphrag.rs (2026-05-16) — Part of refactor-2026-05 Phase 4

Extracted the pub struct GraphRAG and its single impl GraphRAG { ... } block (constructors, lifecycle, build_graph, ask*, query_internal*, generate_semantic_answer_from_results, remove_thinking_tags, getters, factory methods, ensure_initialized, tests) from graphrag-core/src/lib.rs into the new private module file graphrag-core/src/graphrag.rs.
lib.rs is now a 263-LOC re-export shell (mod graphrag; pub use graphrag::GraphRAG;). graphrag.rs is 1753 LOC (header + verbatim impl + moved #[cfg(test)] mod tests).
Public API is preserved: graphrag_core::GraphRAG and graphrag_core::prelude::GraphRAG resolve through the new re-export with identical paths.
Added module-scoped imports at the top of graphrag.rs (Config, core types, critic, ollama, persistence, query, retrieval, feature-gated parallel) so the impl body compiles verbatim without inline path changes.
Both moved tests (test_graphrag_creation, test_builder_pattern) still pass. All other pre-existing test/doc failures remain unchanged (12 unit tests, 7 doctests).
Sub-splitting the impl across graphrag/{lifecycle,documents,build,query,stats}.rs remains deferred to a follow-up — single-file move first per plan.

Module split: retrieval/explained.rs (2026-05-16) — Part of refactor-2026-05 Phase 4

Extracted ExplainedAnswer, SourceReference, SourceType, ReasoningStep (and the ~160 LOC impl ExplainedAnswer block with from_results + format_display) from graphrag-core/src/retrieval/mod.rs into new graphrag-core/src/retrieval/explained.rs.
Public API preserved via pub use explained::* in retrieval/mod.rs — downstream callers see no change.
Net effect: retrieval/mod.rs shrinks from 2094 LOC → 1851 LOC; new explained.rs is 250 LOC.
Replaced legacy .min(1.0).max(0.0) with idiomatic .clamp(0.0, 1.0) in the moved from_results fn (clippy manual_clamp).
Larger god-file splits (lib.rs 1968 LOC, logic_form.rs 1517, incremental.rs 2905, config/mod.rs JSON loader) remain deferred — see plan file.
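
For reference, the two forms of the clamp can be compared side by side. The function names are hypothetical. For ordinary finite inputs the results are identical; the forms differ only for a NaN input, where `f32::min` returns its non-NaN operand while `f32::clamp` returns NaN.

```rust
/// Legacy form, flagged by clippy's `manual_clamp` lint.
fn legacy_unit_interval(x: f32) -> f32 {
    x.min(1.0).max(0.0)
}

/// Idiomatic form using `f32::clamp`.
fn clamped_unit_interval(x: f32) -> f32 {
    x.clamp(0.0, 1.0)
}
```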

Fixed

Production unwrap removal (2026-05-16) — Part of refactor-2026-05 Phase 3

rograg/streaming.rs: regex unwrap() → expect("static regex literal"); three partial_cmp(...).unwrap() calls on f32 confidence scores now use unwrap_or(Ordering::Equal) to avoid panics on NaN.
rograg/processor.rs::RogragProcessorBuilder::build: replaced inner .unwrap() on HybridQueryDecomposer::new() and IntentClassifier::new() with ? propagation; SystemTime::duration_since(UNIX_EPOCH).unwrap() → .expect("system clock before UNIX epoch") (genuine programmer-bug case).
graphrag-server/src/qdrant_store.rs: removed 6 production .unwrap() calls in add_document, add_documents_batch, and search — payload .as_object(), serde_json::to_value, serde_json::from_value, and point.id now propagate QdrantError via ? and Result::collect.
Tests-only unwrap() in vector/voy_store.rs and graphrag-cli/src/config.rs left intact (per Phase 3 scope: production paths only).
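
The `?` and `Result::collect` propagation style mentioned above can be sketched with standard-library types only. `parse_ids` and `first_id` are hypothetical helpers, and `ParseIntError` stands in for the project's own error type (such as `QdrantError`).

```rust
use std::num::ParseIntError;

/// Collecting an iterator of `Result`s into `Result<Vec<_>, _>` stops at the
/// first error instead of panicking on it.
fn parse_ids(raw: &[&str]) -> Result<Vec<u64>, ParseIntError> {
    raw.iter().map(|s| s.trim().parse::<u64>()).collect()
}

/// `?` forwards the error to the caller; no `.unwrap()` on the fallible step.
fn first_id(raw: &[&str]) -> Result<Option<u64>, ParseIntError> {
    let ids = parse_ids(raw)?;
    Ok(ids.first().copied())
}
```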

Added - GLiNER-Relex Extraction via gline-rs (2026-02-23)

GLiNER-Relex Entity + Relation Extractor (`entity/gliner_extractor.rs`, `config/mod.rs`, `config/setconfig.rs`, `lib.rs`)

New GLiNERExtractor: joint entity + relation extraction in a single forward pass via gline-rs v1.0.1 + ONNX Runtime. ~1.5 GB VRAM vs 8+ GB for generative LLMs; zero structural hallucinations.
Two-stage pipeline: NER (SpanPipeline or TokenPipeline) → RE (RelationPipeline), both composed on the same orp::model::Model with lazy loading via Arc<RwLock<Option<Model>>>.
Confidence scores propagated natively into Entity.confidence and Relationship.confidence.
Optional feature flag gliner: crate compiles and works normally without it.
tokio::task::spawn_blocking wrapper in lib.rs keeps the async runtime unblocked.
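
The lazy-loading shape `Arc<RwLock<Option<Model>>>` can be sketched with the standard library alone. `Model`, `Model::load`, `LazyModel`, and `with_model` are placeholders invented for this example; the real extractor holds an `orp::model::Model` and loads ONNX weights.

```rust
use std::sync::{Arc, RwLock};

/// Placeholder for an expensive-to-load model.
struct Model {
    path: String,
}

impl Model {
    fn load(path: &str) -> Model {
        Model { path: path.to_string() }
    }
}

/// Cheap to clone: all clones share the same lazily filled slot.
#[derive(Clone)]
struct LazyModel {
    path: String,
    slot: Arc<RwLock<Option<Model>>>,
}

impl LazyModel {
    fn new(path: &str) -> Self {
        LazyModel { path: path.to_string(), slot: Arc::new(RwLock::new(None)) }
    }

    fn is_loaded(&self) -> bool {
        self.slot.read().expect("rwlock poisoned").is_some()
    }

    /// Runs `f` against the model, loading it on first use.
    fn with_model<R>(&self, f: impl FnOnce(&Model) -> R) -> R {
        {
            // Fast path: a shared read lock suffices once the model exists.
            let guard = self.slot.read().expect("rwlock poisoned");
            if let Some(model) = guard.as_ref() {
                return f(model);
            }
        } // read guard dropped here, before the write lock is requested

        // Slow path: take the write lock; `get_or_insert_with` re-checks the
        // slot, so a racing thread cannot trigger a second load.
        let mut guard = self.slot.write().expect("rwlock poisoned");
        let model = guard.get_or_insert_with(|| Model::load(&self.path));
        f(&*model)
    }
}
```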

Config example (JSON5):

gliner: {
  enabled: true,
  model_path: "./models/gliner-relex-large-v0.5.onnx",
  entity_labels: ["person", "organization", "location"],
  relation_labels: ["controls", "located in", "causes"],
  entity_threshold: 0.40,
  relation_threshold: 0.50,
  mode: "span",   // or "token" for gliner-multitask
  use_gpu: false,
}

Added - Graph Persistence / Storage Choice (2026-02-23)

Storage Backend — In-Memory vs Disk (`config/mod.rs`, `config/setconfig.rs`, `lib.rs`)

AutoSaveConfig (and AutoSaveSetConfig in SetConfig) now expose:
- base_dir: Option<String> — directory where workspace folders are stored (e.g. "./output")
- workspace_name: Option<String> — sub-folder inside base_dir (default: "default")
- enabled: bool — false (default) = in-memory only; true = persist to disk
GraphRAG::initialize() now calls try_load_from_workspace(): if auto_save.enabled = true and the workspace already exists on disk, the graph is loaded from disk instead of starting empty. The second run reuses the previously built graph automatically.
GraphRAG::save_to_workspace() — new public method; also called automatically at the end of build_graph() when persistence is enabled.
No-op when enabled = false; zero performance cost for in-memory-only deployments.
Format hierarchy on disk: Parquet (if persistent-storage feature) → JSON fallback (always).

JSON5 config usage:

auto_save: {
  enabled: true,
  base_dir: "./output",
  workspace_name: "my_project",
}

Fixed - Extraction Temperature (2026-02-23)

Zero-Temperature Entity Extraction (`entity/gleaning_extractor.rs`, `entity/llm_extractor.rs`, `config/setconfig.rs`)

GleaningConfig::default() and LLMEntityExtractor::new() now use temperature: 0.0 (was 0.1)
- Fully deterministic JSON output — eliminates spurious token variation that causes parse failures
- Consistent with recommendations for structured extraction models (NuExtract, Triplex, etc.)
EntityExtractionConfig.temperature in SetConfig now defaults via default_extraction_temperature() = 0.0
- Separate from default_temperature() = 0.1 used for general LLM parameters
- Users can override in JSON5: entity_extraction.temperature = 0.0
ContextualEnricher retains 0.1 (generates natural language descriptions, not strict JSON)

Fixed & Improved - Entity Extraction, Query Quality & Sources (2026-02-23)

SetConfig `use_gleaning` Bug Fix (`config/setconfig.rs`)

Bug: when mode.approach = "semantic" with no semantic: sub-section, the else block hardcoded config.entities.use_gleaning = true regardless of the top-level entity_extraction.use_gleaning field
Fix: the else block now reads from self.entity_extraction.use_gleaning and max_gleaning_rounds directly
This affected ALL JSON5 configs using mode.approach = "semantic" without an explicit semantic: block

LLM Single-Pass Entity Extraction (`lib.rs`, `entity/llm_extractor.rs`, `ollama/mod.rs`)

New LLM single-pass path in lib.rs: ollama.enabled && !use_gleaning now uses LLMEntityExtractor instead of falling through to pattern-based regex extraction
Dynamic num_ctx per chunk: (prompt_tokens + max_output_tokens) × 1.20, rounded to a multiple of 1024 and clamped to [4096, 131072] (mirrors the ContextualEnricher formula)
LLMEntityExtractor now carries keep_alive: Option<String> and with_keep_alive() builder
call_llm_with_retry and call_llm_completion_check use generate_with_params instead of generate() to pass num_ctx and keep_alive — activates Ollama KV cache during entity extraction
GleaningEntityExtractor::new extracts keep_alive before consuming the client and threads it through
OllamaClient::config() getter added for field access without moving
Result on Symposium (274 chunks, mistral-nemo, no gleaning): 1,139 entities, 670 relationships (vs 0 relationships previously due to pattern-based fallback)
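
The per-chunk `num_ctx` formula above can be written out as a small function. This is a sketch under assumptions: the name `dynamic_num_ctx` is hypothetical, integer arithmetic replaces the floating-point multiply, and rounding is assumed to go up to the next multiple of 1024 (the entry does not say whether it rounds up or to nearest).

```rust
/// (prompt + output) × 1.20, rounded up to a multiple of 1024,
/// then clamped to [4096, 131072].
fn dynamic_num_ctx(prompt_tokens: u64, max_output_tokens: u64) -> u64 {
    // × 1.20 expressed as × 6 / 5, rounding up.
    let needed = ((prompt_tokens + max_output_tokens) * 6).div_ceil(5);
    // Assumption: round up so the window is never smaller than `needed`.
    let rounded = needed.div_ceil(1024) * 1024;
    rounded.clamp(4096, 131_072)
}
```

For example, 10,000 prompt tokens plus 2,000 output tokens need 14,400 tokens with margin, which rounds up to 15,360; small chunks fall back to the 4,096 floor.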

JSON Parse Resilience — Missing `description` Field (`entity/prompts.rs`)

EntityData.description is now annotated #[serde(default)]
When the LLM returns JSON with a missing description field (e.g. for Project Gutenberg license chunks), parsing succeeds with an empty string instead of falling through to the error path and losing all entities from that chunk
Fixes the "JSON repair failed: missing field 'description'" errors seen in the last ~10 chunks of Project Gutenberg books

Multi-Chunk Semantic Answer Generation (`lib.rs`, `handlers/bench.rs`)

generate_semantic_answer_from_results: reworked context assembly
- Removed 400-char truncation: full chunk content is now passed to the LLM for each result
- Deduplication: tracks seen chunk IDs to avoid repeating the same chunk from multiple entity hits
- Relevance sorting: context sections sorted by score descending before joining
- Synthesis prompt: updated instructions to ask the LLM to synthesize across ALL context sections
- Dynamic num_ctx: prompt size calculated at runtime with 20% margin — activates KV cache for answering
- generate_with_params used instead of generate() — passes num_ctx, keep_alive, temperature
bench.rs: switched from graphrag.ask() to graphrag.ask_explained()
- sources in the JSON output now populated with actual chunk IDs and excerpts (was always [])
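
The reworked context assembly (deduplicate by chunk ID, sort by score, join full chunk text) can be sketched as below. `Hit`, `assemble_context`, and the section separator are assumptions made for the example, as is the choice to keep the highest-scoring occurrence of a duplicated chunk.

```rust
use std::collections::HashSet;

/// Hypothetical search hit: one chunk reached through some entity match.
struct Hit {
    chunk_id: String,
    score: f32,
    content: String,
}

/// Builds the LLM context: highest score first, each chunk at most once,
/// full chunk text with no truncation.
fn assemble_context(hits: &[Hit]) -> String {
    // Sort references by score descending; `total_cmp` never panics on NaN.
    let mut ordered: Vec<&Hit> = hits.iter().collect();
    ordered.sort_by(|a, b| b.score.total_cmp(&a.score));

    // Keep only the first (best-scoring) occurrence of each chunk ID.
    let mut seen: HashSet<&str> = HashSet::new();
    let mut sections: Vec<&str> = Vec::new();
    for hit in ordered {
        if seen.insert(hit.chunk_id.as_str()) {
            sections.push(hit.content.as_str());
        }
    }

    // Assumed separator between context sections.
    sections.join("\n\n---\n\n")
}
```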

E2E Config — No-Gleaning Mistral Pipeline

New config tests/e2e/configs/kv_no_gleaning_mistral__symposium.json5
- use_gleaning: false, keep_alive: "1h", chunk_size: 1000, chunk_overlap: 200
- Uses mistral-nemo:latest for entity extraction and nomic-embed-text for embeddings

Added - Ollama KV Cache & Contextual Retrieval (2026-02-22)

Ollama KV Cache Parameters (`ollama/mod.rs`, `config/mod.rs`, `config/setconfig.rs`)

keep_alive field added to OllamaConfig and OllamaGenerationParams
- Keeps the Ollama model loaded in VRAM between requests (prevents KV cache eviction)
- Critical for multi-chunk document processing: without it, the model unloads between each chunk
- Default: None (uses Ollama’s built-in 5-minute default)
- Example: "1h" for book-length document processing sessions
num_ctx field added to OllamaConfig and OllamaGenerationParams
- Explicitly sets the context window size (Ollama silently truncates to 2k-8k without this)
- Goes into the options object in Ollama API requests; keep_alive is a top-level field
- Default: None (uses Ollama’s default, usually 2048-8192 tokens)
- Example: 32768 for documents up to ~130k characters
Both fields wired through the full config stack: JSON5 parser, OllamaSetConfig, request body
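
The placement of the two fields in the request body (`keep_alive` at the top level, `num_ctx` inside `options`) can be illustrated with a hand-built JSON string. This is only a sketch: `generate_request_body` and `json_string` are hypothetical helpers, and the actual client presumably serializes a typed struct instead.

```rust
/// Escapes a string as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Builds a generate-request body; `None` fields are omitted so that
/// Ollama's own defaults apply.
fn generate_request_body(
    model: &str,
    prompt: &str,
    keep_alive: Option<&str>,
    num_ctx: Option<u32>,
) -> String {
    let mut body = format!(
        "{{\"model\":{},\"prompt\":{}",
        json_string(model),
        json_string(prompt)
    );
    if let Some(k) = keep_alive {
        // Top-level field.
        body.push_str(&format!(",\"keep_alive\":{}", json_string(k)));
    }
    if let Some(n) = num_ctx {
        // Nested under `options`.
        body.push_str(&format!(",\"options\":{{\"num_ctx\":{}}}", n));
    }
    body.push('}');
    body
}
```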

Contextual Chunk Enricher (`text/contextual_enricher.rs`)

New module implementing Anthropic’s Contextual Retrieval pattern
ContextualEnricher: augments each chunk with 2-3 sentences of document-level context before embedding
KV Cache optimization: static prefix (full document) is cached by Ollama; only the chunk suffix is re-evaluated per request
- First chunk: ~2 min (loads document into KV cache on RTX 4070 with Mistral-NeMo 12B)
- Subsequent chunks: ~3-5 sec each (only chunk tokens evaluated)
- ~100 chunks from a 45k-token book: 5-10 minutes total vs hours without KV cache
calculate_num_ctx(): dynamic context window calculation per document
- Formula: tokens(instructions) + tokens(document) + tokens(largest_chunk) + output_budget + 5% margin
- Rounded to nearest 1024, clamped to [4096, 131072]
enrich_document_chunks() and enrich_chunks(): async, groups chunks by source document
Output format: [LLM context]\n\n[original chunk text] — preserves original text verbatim

Late Chunking Strategy (`text/late_chunking.rs`)

New LateChunkingStrategy implementing ChunkingStrategy trait (Jina AI technique)
Produces chunks annotated with position_in_document metadata (byte spans) for post-hoc pooling
JinaLateChunkingClient: calls Jina Embeddings API v2 with late_chunking: true
split_into_sections(): handles documents exceeding model context window (8192 tokens for Jina v3)
LateChunkingConfig: configurable chunk size, overlap, max document tokens, position annotation

E2E Benchmark KV Cache Support (`tests/e2e/run_benchmarks.sh`)

Three new pipeline dimensions: keep_alive, num_ctx, ollama_timeout
All existing pipelines updated with explicit defaults (keep_alive=none, num_ctx=0)
Semantic/hybrid pipelines with Ollama now default to keep_alive=30m (model stays loaded during build phase)
Three new KV cache pipelines targeting long document processing:
- kv_semantic_mistral: semantic approach, Mistral-NeMo, keep_alive=1h, num_ctx=32768, timeout=300s
- kv_hybrid_mistral: hybrid approach, Mistral-NeMo, keep_alive=1h, num_ctx=32768, timeout=300s
- kv_semantic_qwen3: semantic approach, Qwen3 8B Q4, keep_alive=1h, num_ctx=16384, timeout=300s
KV Cache settings shown in run header when active
Generated JSON5 configs include keep_alive and num_ctx in the ollama section

Tests

tests/contextual_enricher_e2e.rs: 4 tests for ContextualEnricher
- test_enriched_chunk_contains_original_and_context (#[ignore], requires ENABLE_OLLAMA_TESTS=1)
- test_kv_cache_speedup (#[ignore]) — measures per-chunk timing and speedup ratio
- test_num_ctx_calculation_sanity — always-run, validates num_ctx formula bounds
- test_disabled_enricher_returns_chunks_unchanged — always-run no-op safety check

Added - Service Registry Completion (2025-02-11)

Core Infrastructure

Complete test utilities module (core/test_utils.rs):
- MockEmbedder: Deterministic hash-based embedding generation with dimension support
- MockLanguageModel: Configurable response mapping for testing
- MockVectorStore: In-memory vector store with cosine similarity search
- MockRetriever: Simple retriever for testing search pipelines
- All mocks fully implement core Async* traits
- 100% test coverage with 5 passing test cases
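
A deterministic hash-based embedder and a cosine similarity function, in the spirit of MockEmbedder and MockVectorStore, might look like the sketch below. The function names and the hashing scheme are assumptions for illustration, not the contents of core/test_utils.rs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Same text and dimension always give the same vector within one build,
/// because `DefaultHasher::new()` starts from fixed keys.
fn mock_embed(text: &str, dimension: usize) -> Vec<f32> {
    (0..dimension)
        .map(|i| {
            let mut hasher = DefaultHasher::new();
            text.hash(&mut hasher);
            i.hash(&mut hasher);
            // Map the 64-bit hash onto [-1.0, 1.0].
            (hasher.finish() as f64 / u64::MAX as f64 * 2.0 - 1.0) as f32
        })
        .collect()
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```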

Adapter Implementations

Entity extraction adapter (core/entity_adapters.rs):
- GraphIndexerAdapter bridges LightRAG’s GraphIndexer to AsyncEntityExtractor trait
- Configurable confidence threshold filtering
- Entity type conversion from domain-specific to core types
- Batch extraction support
- Feature-gated with lightrag feature
Retrieval system adapter (core/retrieval_adapters.rs):
- RetrievalSystemAdapter implements AsyncRetriever trait
- Integration with KnowledgeGraph-based retrieval
- Batch search support
- Comprehensive documentation on graph requirements
- Feature-gated with basic-retrieval feature
Metrics collector implementation (monitoring/metrics_collector.rs):
- Thread-safe metrics with DashMap for counters, gauges, and histograms
- Atomic operations for zero lock contention
- Histogram statistics: count, sum, mean, min, max, p50, p95, p99
- Timer support with start/finish API
- Metric tagging with key-value pairs
- 7/7 passing tests for all metric types
- Feature-gated with dashmap and monitoring features
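
The histogram summary fields listed above (count, sum, mean, min, max, p50, p95, p99) can be computed as in the following sketch. The names are hypothetical, and the nearest-rank percentile definition is an assumption; the collector may use a different interpolation.

```rust
#[derive(Debug, PartialEq)]
struct HistogramStats {
    count: usize,
    sum: f64,
    mean: f64,
    min: f64,
    max: f64,
    p50: f64,
    p95: f64,
    p99: f64,
}

/// Nearest-rank percentile over a non-empty, ascending slice; `p` in 0..=100.
fn percentile(sorted: &[f64], p: usize) -> f64 {
    // rank = ceil(p / 100 * n), with a floor of 1 so that p = 0 is valid.
    let rank = (p * sorted.len()).div_ceil(100).max(1);
    sorted[rank - 1]
}

/// Summary statistics, or `None` when no samples were recorded.
fn histogram_stats(samples: &[f64]) -> Option<HistogramStats> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let count = sorted.len();
    let sum: f64 = sorted.iter().sum();
    Some(HistogramStats {
        count,
        sum,
        mean: sum / count as f64,
        min: sorted[0],
        max: sorted[count - 1],
        p50: percentile(&sorted, 50),
        p95: percentile(&sorted, 95),
        p99: percentile(&sorted, 99),
    })
}
```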

Registry Integration

Service registration in ServiceConfig::build_registry():
- Entity extractor registration (with lightrag feature)
- Retriever registration (with basic-retrieval feature)
- Metrics collector registration (with dashmap + monitoring features)
- Mock services for testing via with_test_defaults()
- Proper feature-gating for modular compilation

Documentation

Architectural documentation:
- Documented trait hierarchy for vector stores (domain-specific vs generic)
- Explained when to use adapters vs direct implementations
- Clarified graph integration requirements for retrieval
- Added TODO markers for future unification work
- Inline examples in all adapter modules
Code quality improvements:
- Removed unused imports across multiple modules
- Fixed parameter name warnings in data import
- Commented out incomplete vector-memory feature gate
- Clean compilation with async,ollama,dashmap,monitoring,basic-retrieval,lightrag features

Testing

310 tests passing in graphrag-core library
All new service implementations verified:
- test_mock_embedder: Hash-based deterministic embeddings
- test_mock_language_model: Response mapping
- test_mock_vector_store: Cosine similarity search
- test_mock_retriever: Basic search operations
- Metrics collector tests: counters, gauges, histograms, timers
Integration tests for service registration and retrieval

Added - Ollama Advanced Integration (2025-02-11)

Streaming Support

Real-time token generation with tokio channel-based streaming
generate_streaming() method returns tokio::sync::mpsc::Receiver<String>
Server-Sent Events (SSE) parsing for Ollama streaming API
Background task spawning for non-blocking stream reads
Automatic statistics recording for streamed responses
Example usage in test suite (tests/ollama_enhancements.rs)

Custom Generation Parameters

OllamaGenerationParams struct for fine-grained control:
- num_predict: Maximum tokens to generate
- temperature: Sampling temperature (0.0 - 1.0)
- top_p: Nucleus sampling threshold
- top_k: Top-k sampling
- stop: Stop sequences (array of strings)
- repeat_penalty: Repetition control
generate_with_params() method for custom parameter usage
Integration with AsyncLanguageModel trait’s complete_with_params()
Automatic conversion between core and Ollama parameter formats

Model Response Caching

DashMap-based caching for thread-safe concurrent access
Automatic cache population on API responses
Cache hit detection before making API calls
Performance: <1ms for cache hits vs 100-1000ms for API calls
Cache management API:
- clear_cache(): Clear all cached responses
- cache_size(): Get number of cached items
Configurable via OllamaConfig.enable_caching (default: true)
80%+ hit rate on repeated queries
6x cost reduction potential
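
The check-cache-then-call flow can be sketched with the standard library. Several things here are assumptions: `CachingClient` is a hypothetical type, a `Mutex<HashMap>` stands in for DashMap, the prompt string is used as the cache key, and the API call is passed in as a closure so the example stays self-contained.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

struct CachingClient {
    enable_caching: bool,
    cache: Mutex<HashMap<String, String>>,
}

impl CachingClient {
    fn new(enable_caching: bool) -> Self {
        CachingClient { enable_caching, cache: Mutex::new(HashMap::new()) }
    }

    /// Returns the cached response if present; otherwise calls the backend
    /// and stores the result for later requests.
    fn generate<F: FnOnce(&str) -> String>(&self, prompt: &str, call_api: F) -> String {
        if self.enable_caching {
            // The lock is released at the end of this statement.
            let cached = self.cache.lock().expect("lock poisoned").get(prompt).cloned();
            if let Some(hit) = cached {
                return hit;
            }
        }
        let response = call_api(prompt);
        if self.enable_caching {
            self.cache
                .lock()
                .expect("lock poisoned")
                .insert(prompt.to_string(), response.clone());
        }
        response
    }

    fn cache_size(&self) -> usize {
        self.cache.lock().expect("lock poisoned").len()
    }

    fn clear_cache(&self) {
        self.cache.lock().expect("lock poisoned").clear();
    }
}
```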

Metrics & Usage Tracking

OllamaUsageStats struct with atomic counters:
- total_requests: Total number of API calls
- successful_requests: Successful completions
- failed_requests: Failed attempts
- total_tokens: Cumulative token count (estimated)
Thread-safe atomic operations (Arc<AtomicU64>)
Zero lock contention for metrics updates
API methods:
- record_success(tokens): Record successful request
- record_failure(): Record failed request
- get_success_rate(): Calculate the success rate as a fraction (0.0 - 1.0)
Integration with AsyncLanguageModel::get_usage_stats()
Automatic token estimation (~4 characters per token)
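
A minimal sketch of lock-free usage counters in this style follows. `UsageStats` and `estimate_tokens` are illustrative names, not the crate's definitions; returning 0.0 when no requests were recorded and rounding the token estimate up are both assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Clones share the same counters through `Arc`.
#[derive(Clone, Default)]
struct UsageStats {
    total_requests: Arc<AtomicU64>,
    successful_requests: Arc<AtomicU64>,
    failed_requests: Arc<AtomicU64>,
    total_tokens: Arc<AtomicU64>,
}

impl UsageStats {
    fn record_success(&self, tokens: u64) {
        self.total_requests.fetch_add(1, Ordering::Relaxed);
        self.successful_requests.fetch_add(1, Ordering::Relaxed);
        self.total_tokens.fetch_add(tokens, Ordering::Relaxed);
    }

    fn record_failure(&self) {
        self.total_requests.fetch_add(1, Ordering::Relaxed);
        self.failed_requests.fetch_add(1, Ordering::Relaxed);
    }

    /// Fraction of successful requests in 0.0..=1.0 (assumed 0.0 when empty).
    fn get_success_rate(&self) -> f64 {
        let total = self.total_requests.load(Ordering::Relaxed);
        if total == 0 {
            return 0.0;
        }
        self.successful_requests.load(Ordering::Relaxed) as f64 / total as f64
    }
}

/// Rough estimate: about 4 characters per token, rounded up.
fn estimate_tokens(text: &str) -> u64 {
    (text.chars().count() as u64).div_ceil(4)
}
```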

Service Registry Integration

Type-safe service injection for Ollama services
OllamaEmbedderAdapter implements AsyncEmbedder trait
OllamaLanguageModelAdapter implements AsyncLanguageModel trait
Automatic registration in ServiceConfig::build_registry()
Support for both embeddings and language model services
MemoryVectorStore registration for in-memory operations

Documentation

Complete OLLAMA_INTEGRATION.md guide with:
- Setup and prerequisites
- Basic and advanced usage examples
- Supported models (embeddings and LLM)
- Configuration options reference
- Batch processing examples
- Custom parameter examples
- Performance tips and troubleshooting
Updated graphrag-core/README.md with new features
Updated main README.md with Ollama integration section
API reference with code examples
Sources and external documentation links

Testing

8 new test cases in tests/ollama_enhancements.rs:
- Config with caching test
- Custom generation parameters test
- Client statistics API test
- Stats recording test
- Cache management test
- Default parameters test
- Adapter integration tests
All tests passing (13/13 total including registry tests)
Compilation verified with all feature combinations

Configuration Updates

Added enable_caching: bool to OllamaConfig
Updated all OllamaConfig initializers across codebase:
- config/mod.rs: TOML parsing
- config/setconfig.rs: Config mapping
- entity/llm_relationship_extractor.rs: LLM extraction
Default caching: enabled (true)

Changed

Model info updated: supports_streaming now returns true
AsyncLanguageModel implementation: Now uses generate_with_params() internally
OllamaClient structure: Added stats and cache fields
Error handling: Improved with metrics recording on failures
Test count: Increased from 214+ to 220+ test cases

Fixed

Missing enable_caching field in OllamaConfig initializers
Incorrect ModelUsageStats field mapping in adapter
Iterator reference error in execute_caused_query
Compilation warnings for unused imports

[0.1.1] - Previous Release

Added - Core GraphRAG Implementation

Temporal and causal reasoning for RoGRAG
Graph indexer with 23 relationship patterns
Service registry pattern for dependency injection
GraphRAGBuilder with fluent API
Parquet persistence for entities, relationships, documents
Memory vector store implementation
Complete trait-based architecture

Added - Research Features

LightRAG dual-level retrieval (6000x token reduction)
Leiden community detection (+15% modularity)
Cross-encoder reranking (+20% accuracy)
HippoRAG personalized PageRank (10-30x cost reduction)
Semantic chunking with better boundaries

Added - Infrastructure

Comprehensive test suite (214+ tests)
Production-grade logging with tracing
Feature flags for modular compilation
WASM support with WebGPU acceleration
Docker Compose deployment

[0.1.0] - Initial Release

Added

Basic GraphRAG pipeline
Entity and relationship extraction
Vector embeddings support
Graph construction and querying
REST API server
CLI tools

Migration Guides

Upgrading to Ollama Advanced Features

If you’re using basic Ollama integration, upgrading to the new features is seamless:

Before (still works):

// Inside an async function that returns a Result.
let client = OllamaClient::new(OllamaConfig::default());
let response = client.generate("Hello").await?;

After (with new features):

// Inside an async function that returns a Result.
let config = OllamaConfig {
    enable_caching: true,  // NEW: response caching (on by default)
    ..Default::default()
};
let client = OllamaClient::new(config);

// Streaming
let mut rx = client.generate_streaming("Hello").await?;
while let Some(token) = rx.recv().await {
    print!("{}", token);
}

// Custom parameters
let params = OllamaGenerationParams {
    temperature: Some(0.8),
    top_p: Some(0.95),
    ..Default::default()
};
let response = client.generate_with_params("Hello", params).await?;

// Metrics
let stats = client.get_stats();
println!("Success rate: {:.2}%", stats.get_success_rate() * 100.0);

git clone https://github.com/your-username/graphrag-rs.git
cd graphrag-rs
cargo build --release --features async,ollama,dashmap

Running Tests

cargo test --all-features
cargo test -p graphrag-core --test ollama_enhancements

Contributing

See CONTRIBUTING.md for guidelines.

For complete documentation, see:

README.md - Main project documentation
graphrag-core/OLLAMA_INTEGRATION.md - Ollama guide
graphrag-core/README.md - Core library docs
ARCHITECTURE.md - System architecture
