Link generation
brain link proposes links automatically from two signals:
- Tag overlap — documents sharing 2+ tags are candidate links.
- Semantic similarity — documents with cosine similarity above threshold are candidate links.
Links are stored in the brain_links table with a link type and a confidence score, and can also be materialized back into the markdown files as wiki-style [[document]] references (opt-in via --apply). Obsidian users get free bidirectional navigation; git users get a diff of which links were added.
Link generation is the single biggest driver of retrieval quality over time. A 500-doc vault with dense linking beats a 5,000-doc vault with no graph structure.
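The two signals above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the function names, the "tag_overlap" and "semantic" type labels, the Jaccard-based confidence, and the 0.8 similarity threshold are all hypothetical, since this page does not state them.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def propose_links(docs, min_shared_tags=2, sim_threshold=0.8):
    """Return candidate links as (doc_a, doc_b, link_type, confidence) tuples.

    docs maps a document name to {"tags": set, "embedding": list of floats}.
    sim_threshold=0.8 is a placeholder; the real default is not documented here.
    """
    candidates = []
    for a, b in combinations(sorted(docs), 2):
        shared = docs[a]["tags"] & docs[b]["tags"]
        if len(shared) >= min_shared_tags:
            # Assumed confidence: Jaccard overlap of the two tag sets.
            union = docs[a]["tags"] | docs[b]["tags"]
            candidates.append((a, b, "tag_overlap", len(shared) / len(union)))
        sim = cosine(docs[a]["embedding"], docs[b]["embedding"])
        if sim > sim_threshold:
            candidates.append((a, b, "semantic", sim))
    return candidates

docs = {
    "caching": {"tags": {"perf", "redis", "backend"}, "embedding": [1.0, 0.0]},
    "rate-limits": {"tags": {"redis", "backend"}, "embedding": [0.9, 0.1]},
    "onboarding": {"tags": {"hr"}, "embedding": [0.0, 1.0]},
}
links = propose_links(docs)
for link in links:
    print(link)
```

In this toy vault only the caching and rate-limits pair qualifies, and it qualifies on both signals, so it appears twice with different link types.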
Link boost in retrieval
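The link-boost pass covered in this section comes down to simple arithmetic: +0.15 per linking hit, capped at +0.45 per document. The following is a minimal Python sketch of that arithmetic, not the tool's actual code. The function name, the data shapes, and the choice to treat links as directional (from hit to linked document) are assumptions made for illustration.

```python
def link_boost(scores, links, top_n=10, boost=0.15, cap=0.45):
    """Re-rank candidates after boosting documents linked from the top-N hits.

    scores: {doc: base retrieval score} for every candidate document.
    links:  {doc: set of documents it links to} (assumed directional).
    """
    top_hits = sorted(scores, key=scores.get, reverse=True)[:top_n]
    linking_hits = {}
    for hit in top_hits:
        for linked in links.get(hit, ()):
            if linked in scores and linked != hit:
                linking_hits[linked] = linking_hits.get(linked, 0) + 1
    # Each linking hit adds +boost, but never more than cap in total.
    boosted = {
        doc: base + min(cap, boost * linking_hits.get(doc, 0))
        for doc, base in scores.items()
    }
    return sorted(boosted.items(), key=lambda kv: kv[1], reverse=True)

scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "other": 0.5, "weak": 0.3}
links = {"a": {"weak"}, "b": {"weak"}, "c": {"weak"}, "d": {"weak"}}
ranked = link_boost(scores, links, top_n=4)
print([doc for doc, _ in ranked])
```

Here "weak" is linked from four top hits, so its uplift would be 0.60 uncapped but is held at 0.45, which is still enough to lift it above "c" and "d".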
At query time, the retrieval pipeline runs a link-boost pass over the top-N results. For each hit, it uplifts any linked document’s score by +0.15, capped at +0.45 total per document. This means a weak match that’s linked to a strong match gets pulled into the result set, mimicking how a human researcher follows citations. See Retrieval for the scoring detail.
Graph analysis
brain graph reports the structural health of the graph:
- Orphans — documents with zero in-links. Candidates for deletion or re-linking.
- Connected components — isolated subgraphs. Often indicates unrelated topics should be split into separate brains.
- Link-degree distribution — hub-and-spoke vs. mesh structure.
- Authority scores — PageRank-style scoring over the link graph.
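For readers who want to see what these metrics mean concretely, here is a small standard-library Python sketch of three of them: orphans, connected components, and a textbook power-iteration PageRank. It is an illustration only. The function names, the edge-list representation, and the damping factor of 0.85 are assumptions, and the tool's "PageRank-style" scoring may differ.

```python
from collections import deque

def orphans(nodes, edges):
    """Documents with zero in-links. edges is a list of (source, target)."""
    targets = {dst for _, dst in edges}
    return sorted(n for n in nodes if n not in targets)

def connected_components(nodes, edges):
    """Isolated subgraphs, treating links as undirected."""
    neighbours = {n: set() for n in nodes}
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        queue, group = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.append(node)
            for nxt in sorted(neighbours[node] - seen):
                seen.add(nxt)
                queue.append(nxt)
        components.append(sorted(group))
    return components

def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling nodes spread their rank evenly."""
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    size = len(nodes)
    rank = {n: 1.0 / size for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not out[n])
        nxt = {n: (1 - damping) / size + damping * dangling / size for n in nodes}
        for src in nodes:
            for dst in out[src]:
                nxt[dst] += damping * rank[src] / len(out[src])
        rank = nxt
    return rank

nodes = ["intro", "setup", "api", "faq", "scratch"]
edges = [("intro", "setup"), ("setup", "api"), ("intro", "api"), ("faq", "api")]
print(orphans(nodes, edges))
print(connected_components(nodes, edges))
rank = pagerank(nodes, edges)
print(max(rank, key=rank.get))
```

In this example "scratch" is both an orphan and its own component, which is the kind of document the report would surface for deletion or re-linking, while "api" collects the most authority because three documents link to it.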
Deduplication
brain dedup finds near-duplicate documents — the copy-paste-from-Slack pattern that silently bloats vector stores:
- Groups of documents with ≥85% semantic similarity.
- For each group: canonical pick (most-linked, highest-authority), and the others flagged for merge or deletion.
- Audit trail written to _reflections/dedup-<date>.md so you can review before applying.
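The grouping and canonical-pick steps above can be sketched as follows. This is a hypothetical Python illustration, not how brain dedup is implemented: it assumes groups are formed transitively (union-find over pairs at or above the threshold) and picks the canonical document by in-link count alone, ignoring the authority score that the real tool also considers.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedup_groups(embeddings, in_links, threshold=0.85):
    """Group near-duplicates and pick one canonical document per group.

    embeddings: {doc: vector}; in_links: {doc: number of inbound links}.
    """
    parent = {doc: doc for doc in embeddings}

    def find(doc):
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]  # path halving
            doc = parent[doc]
        return doc

    names = sorted(embeddings)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for doc in names:
        groups.setdefault(find(doc), []).append(doc)

    report = []
    for members in groups.values():
        if len(members) < 2:
            continue  # singletons are not duplicates
        # Most in-links wins; ties fall back to alphabetical order.
        canonical = min(members, key=lambda d: (-in_links.get(d, 0), d))
        duplicates = [d for d in members if d != canonical]
        report.append({"canonical": canonical, "duplicates": duplicates})
    return report

embeddings = {
    "notes-v1": [1.0, 0.0, 0.0],
    "notes-v2": [0.98, 0.2, 0.0],
    "slack-paste": [0.97, 0.1, 0.2],
    "unrelated": [0.0, 0.0, 1.0],
}
in_links = {"notes-v1": 4, "notes-v2": 1, "slack-paste": 0, "unrelated": 2}
report = dedup_groups(embeddings, in_links)
print(report)
```

The sketch only produces a report and changes nothing, mirroring the review-before-apply flow: the flagged duplicates are suggestions until someone accepts the merge.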
Health checks
brain health lints the vault and reports structural issues:
- Missing embeddings (content changed but not re-embedded).
- Stale chunks (chunk config changed since ingest).
- Broken [[links]] (target doc removed or renamed).
- Frontmatter contract violations (e.g., document declares confidence: high but has no sources).
- Empty or under-populated domains.
Run brain health --fix to apply the auto-fixable subset (re-embed, re-chunk, repair obvious link typos).
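Two of the checks listed above are easy to picture in code. The sketch below is a hypothetical Python illustration: it assumes the vault is already loaded as a name-to-body mapping and that frontmatter has been parsed into dictionaries with confidence and sources keys. The function names and data shapes are not taken from the tool.

```python
import re

# Captures the target of a wiki link, stopping at an alias (|) or heading (#).
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")

def broken_links(vault):
    """vault maps a document name to its markdown body.

    Returns {doc: [link targets that do not exist in the vault]}.
    """
    known = set(vault)
    report = {}
    for name, body in vault.items():
        targets = {t.strip() for t in WIKI_LINK.findall(body)}
        missing = sorted(targets - known)
        if missing:
            report[name] = missing
    return report

def contract_violations(frontmatter):
    """Documents that declare confidence: high but list no sources."""
    return sorted(
        name
        for name, meta in frontmatter.items()
        if meta.get("confidence") == "high" and not meta.get("sources")
    )

vault = {
    "setup": "See [[api]] and [[instal]] for details.",
    "api": "Back to [[setup|the setup guide]].",
}
frontmatter = {
    "setup": {"confidence": "high", "sources": []},
    "api": {"confidence": "high", "sources": ["rfc-notes"]},
}
print(broken_links(vault))
print(contract_violations(frontmatter))
```

A target like "instal" is the sort of obvious link typo that an automatic fix could repair, whereas a missing source list needs a human decision.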
Quality scoring
brain score returns a 4-dimension quality grade (A–F):
| Dimension | What it measures |
|---|---|
| Content | Coverage (domain breadth), depth (avg. doc length, citations), freshness (staleness of top docs) |
| Structure | Link density, orphan percentage, domain balance |
| Retrieval | Sweep-grade against the seeded question set — does search actually find the right docs? |
| Hygiene | Dedup debt, health-check pass rate, frontmatter-contract compliance |
A B- on retrieval alongside an A on content is telling you: “your content is great, but your retrieval config doesn’t find it.” Run brain sweep next.
Decisions
brain decisions surfaces, edits, and applies strategy decisions recorded over time.
Decisions live as files in _decisions/, so they are auditable, diff-able, and git-committable. brain auto-kb writes to this directory automatically; you can also edit the files by hand and run brain decisions apply to re-materialize the config.
Forgetting
brain forget removes documents with a full audit trail:
- Removed from brain_documents, brain_chunks, and brain_links.
- Logged to _reflections/forget-<date>.md with the reason and the content hash (so you can re-ingest if you change your mind).
- Their outbound links re-scored against remaining docs.
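To show why logging a content hash makes a deletion reversible, here is a small Python sketch. Everything in it is an assumption made for illustration: the page does not say which hash algorithm is used (SHA-256 here) or what the log entry looks like, and both function names are hypothetical.

```python
import hashlib
from datetime import date

def forget_entry(doc_name, content, reason, today=None):
    """Build one audit-log line for a forgotten document (format is hypothetical)."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    stamp = (today or date.today()).isoformat()
    return f"- {stamp} {doc_name}: {reason} (sha256: {digest})"

def is_same_content(content, logged_digest):
    """Check whether a file on disk is the exact version that was forgotten."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest() == logged_digest

entry = forget_entry("old-notes", "hello", "superseded", date(2024, 1, 5))
print(entry)
```

With the digest on record, a file recovered from git history or a backup can be verified as the exact version that was removed before it is ingested again.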
A weekly graph-ops loop
For a brain that’s used actively but not via auto-kb:
Monday: brain health + fix auto-fixable
brain health --fix --brain my-brain
Keeps embeddings and chunks current.
Wednesday: link pass
brain link --brain my-brain --apply
New content accumulated since the last link run gets integrated.
Friday: dedup + forget review
brain dedup --brain my-brain --threshold 0.85
Read the audit, then apply the merges you agree with.
Or run brain auto-kb --rounds 1 once a week instead.
What’s next
Closed-loop intelligence
The one-command autonomous version of everything on this page.
Retrieval
How the graph operations above lift retrieval quality at query time.