Vault Search Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Build V1 of the local SQLite + FTS5 retrieval system for this Obsidian vault.

Architecture: Implement a small importable Python package under code_scripts/vault_search/, with wrapper scripts and tests under code-scripts/. Markdown files are parsed into document records, written to a disposable SQLite database under tmp/, then queried by vault-search and vault-health commands. Markdown remains the source of truth; the database is rebuildable derived state.

Tech Stack: Python 3 standard library, SQLite FTS5, unittest, argparse, Markdown/frontmatter parsing with focused local helpers.
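The whole pipeline rests on SQLite's built-in FTS5 module. As a minimal sketch of the round trip (table and column names here are illustrative, not the schema defined later in this plan):

```python
import sqlite3

# Throwaway in-memory database; the plan itself writes a rebuildable
# file under tmp/. Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("create virtual table notes using fts5(title, content)")
conn.executemany(
    "insert into notes(title, content) values (?, ?)",
    [
        ("SSL 证书", "SSL 证书用于验证服务器身份。"),
        ("Java 泛型", "正文包含类型擦除。"),
    ],
)

# bm25() returns lower values for better matches, so ascending
# order puts the strongest hit first.
rows = conn.execute(
    "select title from notes where notes match ? order by bm25(notes)",
    ("SSL",),
).fetchall()
```

The ascending-bm25 ordering is the same convention the search SQL in Task 3 relies on.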


File Structure

  • Create code_scripts/__init__.py: import bridge for local tools.
  • Create code_scripts/vault_search/__init__.py: package marker and version.
  • Create code_scripts/vault_search/models.py: dataclasses for documents, links, headings, index summaries, and search results.
  • Create code_scripts/vault_search/parser.py: Markdown/frontmatter parser, title extraction, heading extraction, tag/alias normalization, link extraction.
  • Create code_scripts/vault_search/discovery.py: vault file discovery and area detection.
  • Create code_scripts/vault_search/database.py: SQLite schema creation, full rebuild write path, FTS queries, health queries.
  • Create code_scripts/vault_search/indexer.py: orchestration from discovered files to parsed documents to database.
  • Create code_scripts/vault_search/cli.py: vault-index, vault-search, and vault-health command implementations.
  • Create code-scripts/vault-search.py: executable dispatcher script for convenient local use.
  • Create code-scripts/vault-index.py: wrapper for vault-index.
  • Create code-scripts/vault-health.py: wrapper for vault-health.
  • Create code-scripts/vault_search/tests/fixtures/sample_vault/: fixture vault used by tests.
  • Create code-scripts/vault_search/tests/test_parser.py: parser unit tests.
  • Create code-scripts/vault_search/tests/test_discovery.py: discovery unit tests.
  • Create code-scripts/vault_search/tests/test_database.py: database/search/health tests.
  • Create code-scripts/vault_search/tests/test_cli.py: CLI JSON and command behavior tests.
  • Create code-scripts/vault_search/README.md: short usage notes for the local tools.

V1 uses no third-party dependencies. The frontmatter parser supports the YAML subset already used in this vault: key: value scalars, inline arrays such as tags: [学习, java], and simple block lists.
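As a hedged sketch of that subset (the helper name is illustrative; the real parser lands in code_scripts/vault_search/parser.py in Task 1), each frontmatter line is one of three shapes: scalar, inline array, or indented list item:

```python
def parse_frontmatter_subset(lines: list[str]) -> dict:
    """Illustrative reduction of the YAML subset used in this vault."""
    data: dict = {}
    current_key = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # Indented list item under the most recent key.
            data.setdefault(current_key, []).append(stripped[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            current_key = key
            if value.startswith("[") and value.endswith("]"):
                # Inline array: tags: [学习, java]
                data[key] = [item.strip() for item in value[1:-1].split(",") if item.strip()]
            else:
                # Scalar, or an empty value that starts a block list.
                data[key] = value or []
    return data

result = parse_frontmatter_subset(
    ["tags: [学习, java]", "aliases:", "  - 网络基础", "title: 计算机网络"]
)
```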

Task 1: Create Models And Parser Tests

Files:

  • Create: code_scripts/__init__.py

  • Create: code_scripts/vault_search/__init__.py

  • Create: code_scripts/vault_search/models.py

  • Create: code_scripts/vault_search/parser.py

  • Create: code-scripts/vault_search/tests/test_parser.py

  • Step 1: Write parser tests

Create code-scripts/vault_search/tests/test_parser.py:

import unittest
from pathlib import Path
 
from code_scripts.vault_search.parser import parse_markdown
 
 
class ParserTests(unittest.TestCase):
    def test_parse_frontmatter_inline_tags_aliases_title_and_body(self):
        text = """---
tags: [学习, java, 状态/进行中]
aliases: [Java泛型学习, 泛型]
---
 
# Java 泛型
 
正文包含 List<T> 和类型擦除。
[语法](../java-basic/java-grammar.md)
"""
 
        doc = parse_markdown(Path("IT-learning/java-basic/generic.md"), text)
 
        self.assertEqual(doc.path, "IT-learning/java-basic/generic.md")
        self.assertEqual(doc.area, "IT-learning")
        self.assertEqual(doc.title, "Java 泛型")
        self.assertTrue(doc.has_frontmatter)
        self.assertTrue(doc.has_tags)
        self.assertEqual(doc.tags, ["学习", "java", "状态/进行中"])
        self.assertEqual(doc.aliases, ["Java泛型学习", "泛型"])
        self.assertIn("正文包含", doc.content)
        self.assertEqual(len(doc.links), 1)
        self.assertEqual(doc.links[0].link_type, "markdown")
        self.assertEqual(doc.links[0].target_raw, "../java-basic/java-grammar.md")
 
    def test_parse_yaml_list_tags_and_wikilinks(self):
        text = """---
tags:
  - 学习
  - network
aliases:
  - 网络基础
---
 
# 计算机网络
 
- [[TCP 三次握手]]
- [[TCP 滑动窗口|滑动窗口]]
"""
 
        doc = parse_markdown(Path("IT-learning/computer-level/network.md"), text)
 
        self.assertEqual(doc.tags, ["学习", "network"])
        self.assertEqual(doc.aliases, ["网络基础"])
        self.assertEqual(doc.title, "计算机网络")
        self.assertEqual([link.target_raw for link in doc.links], ["TCP 三次握手", "TCP 滑动窗口"])
        self.assertEqual([link.link_text for link in doc.links], ["TCP 三次握手", "滑动窗口"])
        self.assertTrue(all(link.link_type == "wikilink" for link in doc.links))
 
    def test_missing_frontmatter_uses_filename_title(self):
        doc = parse_markdown(Path("task.md"), "Plain note body")
 
        self.assertEqual(doc.title, "task")
        self.assertFalse(doc.has_frontmatter)
        self.assertFalse(doc.has_tags)
        self.assertEqual(doc.tags, [])
        self.assertEqual(doc.content, "Plain note body")
 
 
if __name__ == "__main__":
    unittest.main()
  • Step 2: Run parser tests and verify they fail

Run:

python3 -m unittest code-scripts/vault_search/tests/test_parser.py -v

Expected: FAIL or ERROR because code_scripts.vault_search.parser does not exist yet.

  • Step 3: Create package alias for importable code

Because the existing directory is named code-scripts with a hyphen, create a Python package path that tests can import. Create code_scripts/__init__.py:

"""Import bridge for local code scripts."""

Create code_scripts/vault_search/__init__.py:

"""Vault search package."""
 
__version__ = "0.1.0"

Use code_scripts/vault_search/ as the actual Python package path, while wrapper scripts can live in code-scripts/.
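One consequence of this split, sketched below under the layout assumed by this plan: when a wrapper is run directly as python3 code-scripts/vault-index.py, sys.path[0] is code-scripts/, not the repo root where code_scripts/ lives, so wrappers need a small bootstrap before importing the package:

```python
import sys
from pathlib import Path

# Directory layout assumed from this plan: this file sits one level
# below the repo root, and code_scripts/ sits at the repo root.
repo_root = Path(__file__).resolve().parent.parent
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))
```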

  • Step 4: Implement models

Create code_scripts/vault_search/models.py:

from dataclasses import dataclass, field
 
 
@dataclass(frozen=True)
class Link:
    link_text: str
    target_raw: str
    link_type: str
    target_path: str | None = None
    target_exists: bool = False
 
 
@dataclass(frozen=True)
class Heading:
    level: int
    text: str
    anchor: str
 
 
@dataclass
class Document:
    path: str
    area: str
    title: str
    content: str
    tags: list[str] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)
    frontmatter: dict[str, object] = field(default_factory=dict)
    headings: list[Heading] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)
    has_frontmatter: bool = False
    has_tags: bool = False
    mtime: str | None = None
    ctime: str | None = None
    content_hash: str = ""
  • Step 5: Implement parser

Create code_scripts/vault_search/parser.py:

from __future__ import annotations
 
import hashlib
import re
from pathlib import Path
 
from .models import Document, Heading, Link
 
FRONTMATTER_RE = re.compile(r"\A---\n(?P<body>.*?)\n---\n?", re.DOTALL)
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$", re.MULTILINE)
MARKDOWN_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+?\.md(?:#[^)]+)?)\)")
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")
 
 
def parse_markdown(path: Path, text: str) -> Document:
    rel_path = path.as_posix()
    area = rel_path.split("/", 1)[0] if "/" in rel_path else "root"
    frontmatter, body, has_frontmatter = _split_frontmatter(text)
    tags = _list_value(frontmatter.get("tags"))
    aliases = _list_value(frontmatter.get("aliases"))
    headings = _extract_headings(body)
    title = _select_title(frontmatter, headings, path)
    links = _extract_links(body)
    content = body.strip()
 
    return Document(
        path=rel_path,
        area=area,
        title=title,
        content=content,
        tags=tags,
        aliases=aliases,
        frontmatter=frontmatter,
        headings=headings,
        links=links,
        has_frontmatter=has_frontmatter,
        has_tags=bool(tags),
        content_hash=hashlib.sha256(text.encode("utf-8")).hexdigest(),
    )
 
 
def _split_frontmatter(text: str) -> tuple[dict[str, object], str, bool]:
    match = FRONTMATTER_RE.match(text)
    if not match:
        return {}, text, False
    raw = match.group("body")
    body = text[match.end():]
    return _parse_frontmatter(raw), body, True
 
 
def _parse_frontmatter(raw: str) -> dict[str, object]:
    data: dict[str, object] = {}
    current_key: str | None = None
    for line in raw.splitlines():
        if not line.strip():
            continue
        if line.startswith("  - ") and current_key:
            value = line[4:].strip()
            existing = data.setdefault(current_key, [])
            if isinstance(existing, list):
                existing.append(value)
            continue
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip()
        value = value.strip()
        current_key = key
        if value == "":
            data[key] = []
        elif value.startswith("[") and value.endswith("]"):
            data[key] = [item.strip().strip('"').strip("'") for item in value[1:-1].split(",") if item.strip()]
        else:
            data[key] = value.strip('"').strip("'")
    return data
 
 
def _list_value(value: object) -> list[str]:
    if isinstance(value, list):
        return [str(item).strip() for item in value if str(item).strip()]
    if isinstance(value, str) and value.strip():
        return [value.strip()]
    return []
 
 
def _extract_headings(body: str) -> list[Heading]:
    headings: list[Heading] = []
    for match in HEADING_RE.finditer(body):
        text = match.group(2).strip()
        headings.append(Heading(level=len(match.group(1)), text=text, anchor=_anchor(text)))
    return headings
 
 
def _select_title(frontmatter: dict[str, object], headings: list[Heading], path: Path) -> str:
    for heading in headings:
        if heading.level == 1:
            return heading.text
    title = frontmatter.get("title")
    if isinstance(title, str) and title.strip():
        return title.strip()
    return path.stem
 
 
def _extract_links(body: str) -> list[Link]:
    links: list[Link] = []
    for match in MARKDOWN_LINK_RE.finditer(body):
        links.append(Link(link_text=match.group(1), target_raw=match.group(2), link_type="markdown"))
    for match in WIKILINK_RE.finditer(body):
        target = match.group(1).strip()
        text = (match.group(2) or target).strip()
        links.append(Link(link_text=text, target_raw=target, link_type="wikilink"))
    return links
 
 
def _anchor(text: str) -> str:
    return re.sub(r"\s+", "-", text.strip().lower())
  • Step 6: Run parser tests and verify they pass

Run:

python3 -m unittest code-scripts/vault_search/tests/test_parser.py -v

Expected: PASS with 3 tests.

  • Step 7: Commit Task 1

Run:

git add code_scripts code-scripts/vault_search/tests/test_parser.py
git commit -m "feat: add vault markdown parser"

Task 2: Add Vault Discovery

Files:

  • Create: code_scripts/vault_search/discovery.py

  • Create: code-scripts/vault_search/tests/fixtures/sample_vault/

  • Create: code-scripts/vault_search/tests/test_discovery.py

  • Step 1: Create discovery tests and fixtures

Create fixture files:

code-scripts/vault_search/tests/fixtures/sample_vault/README.md
code-scripts/vault_search/tests/fixtures/sample_vault/IT-learning/java-basic/java.md
code-scripts/vault_search/tests/fixtures/sample_vault/wiki/INDEX.md
code-scripts/vault_search/tests/fixtures/sample_vault/tmp/ignored.md
code-scripts/vault_search/tests/fixtures/sample_vault/.obsidian/ignored.md

Create code-scripts/vault_search/tests/test_discovery.py:

import unittest
from pathlib import Path
 
from code_scripts.vault_search.discovery import discover_markdown_files, path_area
 
 
FIXTURE = Path("code-scripts/vault_search/tests/fixtures/sample_vault")
 
 
class DiscoveryTests(unittest.TestCase):
    def test_discover_markdown_files_respects_exclusions(self):
        files = [path.as_posix() for path in discover_markdown_files(FIXTURE)]
 
        self.assertIn("README.md", files)
        self.assertIn("IT-learning/java-basic/java.md", files)
        self.assertIn("wiki/INDEX.md", files)
        self.assertNotIn("tmp/ignored.md", files)
        self.assertNotIn(".obsidian/ignored.md", files)
 
    def test_path_area(self):
        self.assertEqual(path_area(Path("README.md")), "root")
        self.assertEqual(path_area(Path("IT-learning/java-basic/java.md")), "IT-learning")
 
 
if __name__ == "__main__":
    unittest.main()
  • Step 2: Run discovery tests and verify they fail

Run:

python3 -m unittest code-scripts/vault_search/tests/test_discovery.py -v

Expected: ERROR because discovery.py does not exist.

  • Step 3: Implement discovery

Create code_scripts/vault_search/discovery.py:

from __future__ import annotations
 
from pathlib import Path
 
DEFAULT_EXCLUDES = {".git", ".obsidian", "node_modules", "tmp", ".trash"}
 
 
def discover_markdown_files(root: Path, excludes: set[str] | None = None) -> list[Path]:
    excludes = excludes or DEFAULT_EXCLUDES
    root = root.resolve()
    results: list[Path] = []
    for path in root.rglob("*.md"):
        rel = path.relative_to(root)
        if any(part.startswith(".") or part in excludes for part in rel.parts):
            continue
        results.append(rel)
    return sorted(results, key=lambda item: item.as_posix())
 
 
def path_area(path: Path) -> str:
    return path.parts[0] if len(path.parts) > 1 else "root"
  • Step 4: Create fixture files

Use mkdir -p for fixture directories. Create the files with these contents:

code-scripts/vault_search/tests/fixtures/sample_vault/README.md:

# Fixture Vault

code-scripts/vault_search/tests/fixtures/sample_vault/IT-learning/java-basic/java.md:

---
tags: [学习, java]
---
 
# Java

code-scripts/vault_search/tests/fixtures/sample_vault/wiki/INDEX.md:

# Wiki Index

code-scripts/vault_search/tests/fixtures/sample_vault/tmp/ignored.md:

# Ignored

code-scripts/vault_search/tests/fixtures/sample_vault/.obsidian/ignored.md:

# Ignored
  • Step 5: Run discovery tests and verify they pass

Run:

python3 -m unittest code-scripts/vault_search/tests/test_discovery.py -v

Expected: PASS with 2 tests.

  • Step 6: Commit Task 2

Run:

git add code_scripts/vault_search/discovery.py code-scripts/vault_search/tests/test_discovery.py code-scripts/vault_search/tests/fixtures/sample_vault
git commit -m "feat: discover vault markdown files"

Task 3: Add SQLite Schema And Index Writer

Files:

  • Create: code_scripts/vault_search/database.py

  • Create: code-scripts/vault_search/tests/test_database.py

  • Step 1: Write database indexing tests

Create code-scripts/vault_search/tests/test_database.py:

import sqlite3
import tempfile
import unittest
from pathlib import Path
 
from code_scripts.vault_search.database import rebuild_database, search_documents
from code_scripts.vault_search.parser import parse_markdown
 
 
class DatabaseTests(unittest.TestCase):
    def test_rebuild_database_writes_documents_tags_links_and_fts(self):
        doc = parse_markdown(
            Path("wiki/ssl-certificate.md"),
            """---
tags: [知识总结, network, 状态/已完结]
aliases: [SSL]
---
 
# SSL 证书
 
SSL 证书用于验证服务器身份。
[相关](llm-wiki-compiler.md)
""",
        )
 
        with tempfile.TemporaryDirectory() as tmp:
            db_path = Path(tmp) / "vault-search.sqlite"
            rebuild_database(db_path, [doc])
 
            conn = sqlite3.connect(db_path)
            self.assertEqual(conn.execute("select count(*) from documents").fetchone()[0], 1)
            self.assertEqual(conn.execute("select count(*) from tags").fetchone()[0], 3)
            self.assertEqual(conn.execute("select count(*) from aliases").fetchone()[0], 1)
            self.assertEqual(conn.execute("select count(*) from links").fetchone()[0], 1)
 
            results = search_documents(db_path, query="SSL", limit=5)
            self.assertEqual(results[0]["path"], "wiki/ssl-certificate.md")
            self.assertEqual(results[0]["title"], "SSL 证书")
 
 
if __name__ == "__main__":
    unittest.main()
  • Step 2: Run database tests and verify they fail

Run:

python3 -m unittest code-scripts/vault_search/tests/test_database.py -v

Expected: ERROR because database.py does not exist.

  • Step 3: Implement database schema and search

Create code_scripts/vault_search/database.py:

from __future__ import annotations
 
import json
import sqlite3
from pathlib import Path
 
from .models import Document
 
 
SCHEMA = """
drop table if exists documents;
drop table if exists tags;
drop table if exists aliases;
drop table if exists frontmatter;
drop table if exists links;
drop table if exists headings;
drop table if exists document_fts;
 
create table documents (
    id integer primary key,
    path text unique not null,
    area text not null,
    title text not null,
    content text not null,
    mtime text,
    ctime text,
    has_frontmatter integer not null,
    has_tags integer not null,
    content_hash text not null
);
 
create virtual table document_fts using fts5(
    title,
    content,
    content='documents',
    content_rowid='id'
);
 
create table tags (
    document_id integer not null,
    tag text not null
);
 
create table aliases (
    document_id integer not null,
    alias text not null
);
 
create table frontmatter (
    document_id integer not null,
    key text not null,
    value_json text not null
);
 
create table links (
    id integer primary key,
    source_document_id integer not null,
    link_text text,
    target_raw text not null,
    target_path text,
    link_type text not null,
    target_exists integer not null
);
 
create table headings (
    document_id integer not null,
    level integer not null,
    text text not null,
    anchor text
);
"""
 
 
def rebuild_database(db_path: Path, documents: list[Document]) -> None:
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA)
        for doc in documents:
            cursor = conn.execute(
                """
                insert into documents
                (path, area, title, content, mtime, ctime, has_frontmatter, has_tags, content_hash)
                values (?, ?, ?, ?, ?, ?, ?, ?, ?)
                """,
                (
                    doc.path,
                    doc.area,
                    doc.title,
                    doc.content,
                    doc.mtime,
                    doc.ctime,
                    int(doc.has_frontmatter),
                    int(doc.has_tags),
                    doc.content_hash,
                ),
            )
            document_id = cursor.lastrowid
            conn.execute("insert into document_fts(rowid, title, content) values (?, ?, ?)", (document_id, doc.title, doc.content))
            conn.executemany("insert into tags(document_id, tag) values (?, ?)", [(document_id, tag) for tag in doc.tags])
            conn.executemany("insert into aliases(document_id, alias) values (?, ?)", [(document_id, alias) for alias in doc.aliases])
            conn.executemany(
                "insert into frontmatter(document_id, key, value_json) values (?, ?, ?)",
                [(document_id, key, json.dumps(value, ensure_ascii=False)) for key, value in doc.frontmatter.items()],
            )
            conn.executemany(
                """
                insert into links(source_document_id, link_text, target_raw, target_path, link_type, target_exists)
                values (?, ?, ?, ?, ?, ?)
                """,
                [(document_id, link.link_text, link.target_raw, link.target_path, link.link_type, int(link.target_exists)) for link in doc.links],
            )
            conn.executemany(
                "insert into headings(document_id, level, text, anchor) values (?, ?, ?, ?)",
                [(document_id, heading.level, heading.text, heading.anchor) for heading in doc.headings],
            )
        conn.commit()
    finally:
        conn.close()
 
 
def search_documents(db_path: Path, query: str, limit: int = 10, area: str | None = None, tags: list[str] | None = None) -> list[dict[str, object]]:
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        where = ["document_fts match ?"]
        params: list[object] = [query]
        if area:
            where.append("documents.area = ?")
            params.append(area)
        # Tag filters use subqueries so the positional placeholders stay in
        # the same order as the assembled WHERE clauses.
        for tag in tags or []:
            where.append("documents.id in (select document_id from tags where tag = ?)")
            params.append(tag)
        params.append(limit)
        sql = f"""
            select documents.path, documents.title, documents.area, bm25(document_fts) as score,
                   snippet(document_fts, 1, '[', ']', '...', 16) as snippet
            from documents
            join document_fts on document_fts.rowid = documents.id
            where {' and '.join(where)}
            order by score
            limit ?
        """
        rows = conn.execute(sql, params).fetchall()
        return [dict(row) | {"tags": _tags_for_document(conn, row["path"])} for row in rows]
    finally:
        conn.close()
 
 
def _tags_for_document(conn: sqlite3.Connection, path: str) -> list[str]:
    rows = conn.execute(
        """
        select tags.tag
        from tags
        join documents on documents.id = tags.document_id
        where documents.path = ?
        order by tags.tag
        """,
        (path,),
    ).fetchall()
    return [row[0] for row in rows]
  • Step 4: Run database tests and verify they pass

Run:

python3 -m unittest code-scripts/vault_search/tests/test_database.py -v

Expected: PASS with 1 test.
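The schema's document_fts table is declared with content='documents', and FTS5 does not populate external-content tables automatically; the write path has to mirror rows in by hand. A minimal sketch of that behavior (table names are illustrative, matching the schema above only in shape):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table documents (id integer primary key, title text, content text);
    create virtual table document_fts using fts5(
        title, content, content='documents', content_rowid='id'
    );
    """
)
cursor = conn.execute(
    "insert into documents(title, content) values (?, ?)",
    ("SSL 证书", "SSL 证书用于 HTTPS。"),
)
# External-content tables index nothing until rows are mirrored in manually.
conn.execute(
    "insert into document_fts(rowid, title, content) values (?, ?, ?)",
    (cursor.lastrowid, "SSL 证书", "SSL 证书用于 HTTPS。"),
)
hits = conn.execute(
    "select rowid from document_fts where document_fts match 'SSL'"
).fetchall()
```

Skipping the manual mirror insert would make every FTS query return no rows even though documents has data.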

  • Step 5: Commit Task 3

Run:

git add code_scripts/vault_search/database.py code-scripts/vault_search/tests/test_database.py
git commit -m "feat: write vault search sqlite index"

Task 4: Build Indexer With Link Resolution

Files:

  • Create: code_scripts/vault_search/indexer.py

  • Modify: code_scripts/vault_search/parser.py

  • Modify: code-scripts/vault_search/tests/test_database.py

  • Step 1: Add indexer test for link resolution and summary

Append to code-scripts/vault_search/tests/test_database.py:

from code_scripts.vault_search.indexer import build_index
 
 
class IndexerTests(unittest.TestCase):
    def test_build_index_resolves_markdown_links_and_reports_summary(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            (root / "wiki").mkdir()
            (root / "wiki" / "ssl.md").write_text("# SSL\n[Compiler](compiler.md)\n[[Missing Note]]", encoding="utf-8")
            (root / "wiki" / "compiler.md").write_text("# Compiler\n", encoding="utf-8")
            db_path = root / "tmp" / "vault-search.sqlite"
 
            summary = build_index(root, db_path)
 
            self.assertEqual(summary["documents"], 2)
            self.assertEqual(summary["links"], 2)
            self.assertEqual(summary["broken_links"], 1)
 
            conn = sqlite3.connect(db_path)
            rows = conn.execute("select target_raw, target_path, target_exists from links order by target_raw").fetchall()
            self.assertEqual(rows[0], ("Missing Note", None, 0))
            self.assertEqual(rows[1], ("compiler.md", "wiki/compiler.md", 1))
  • Step 2: Run indexer test and verify it fails

Run:

python3 -m unittest code-scripts/vault_search/tests/test_database.py -v

Expected: ERROR because indexer.py does not exist.

  • Step 3: Implement indexer and link resolution

Create code_scripts/vault_search/indexer.py:

from __future__ import annotations

import posixpath
from dataclasses import replace
from datetime import datetime, timezone
from pathlib import Path

from .database import rebuild_database
from .discovery import discover_markdown_files
from .models import Document, Link
from .parser import parse_markdown


def build_index(root: Path, db_path: Path) -> dict[str, int]:
    root = root.resolve()
    rel_paths = discover_markdown_files(root)
    documents: list[Document] = []
    all_paths = {path.as_posix() for path in rel_paths}
    title_index: dict[str, str] = {}

    for rel_path in rel_paths:
        full_path = root / rel_path
        text = full_path.read_text(encoding="utf-8")
        doc = parse_markdown(rel_path, text)
        stat = full_path.stat()
        doc.mtime = _iso(stat.st_mtime)
        doc.ctime = _iso(stat.st_ctime)
        documents.append(doc)
        title_index[doc.title] = doc.path
        title_index[Path(doc.path).stem] = doc.path

    resolved = [_resolve_document_links(doc, all_paths, title_index) for doc in documents]
    rebuild_database(db_path, resolved)

    links = sum(len(doc.links) for doc in resolved)
    broken_links = sum(1 for doc in resolved for link in doc.links if not link.target_exists)
    missing_tags = sum(1 for doc in resolved if not doc.has_tags)
    return {
        "documents": len(resolved),
        "links": links,
        "broken_links": broken_links,
        "missing_tags": missing_tags,
        "wikilinks": sum(1 for doc in resolved for link in doc.links if link.link_type == "wikilink"),
    }


def _resolve_document_links(doc: Document, all_paths: set[str], title_index: dict[str, str]) -> Document:
    resolved_links: list[Link] = []
    source_dir = Path(doc.path).parent
    for link in doc.links:
        if link.link_type == "markdown":
            raw_path = link.target_raw.split("#", 1)[0]
            # Collapse "./" and "../" segments so relative links compare
            # against the discovered vault-relative paths.
            normalized = posixpath.normpath((source_dir / raw_path).as_posix())
            exists = normalized in all_paths
            resolved_links.append(replace(link, target_path=normalized, target_exists=exists))
        else:
            target_path = title_index.get(link.target_raw)
            resolved_links.append(replace(link, target_path=target_path, target_exists=bool(target_path)))
    doc.links = resolved_links
    return doc


def _iso(timestamp: float) -> str:
    return datetime.fromtimestamp(timestamp, timezone.utc).isoformat()
  • Step 4: Run database/indexer tests and verify they pass

Run:

python3 -m unittest code-scripts/vault_search/tests/test_database.py -v

Expected: PASS with 2 tests.
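Link resolution above depends on collapsing ./ and ../ segments before comparing a target against the discovered path set; posixpath.normpath does this as a pure string operation, without touching the filesystem. A small sketch with an illustrative vault path:

```python
import posixpath

# Resolve a markdown link target relative to the source note's directory.
source_dir = "IT-learning/java-basic"
target_raw = "../java-basic/java-grammar.md#anchor"

raw_path = target_raw.split("#", 1)[0]  # drop the heading anchor
normalized = posixpath.normpath(f"{source_dir}/{raw_path}")
```

Without normalization, "IT-learning/java-basic/../java-basic/java-grammar.md" would never match an entry in the all_paths set, and valid relative links would be reported as broken.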

  • Step 5: Commit Task 4

Run:

git add code_scripts/vault_search/indexer.py code-scripts/vault_search/tests/test_database.py
git commit -m "feat: build vault search index"

Task 5: Add Search And Health CLI

Files:

  • Create: code_scripts/vault_search/cli.py

  • Create: code-scripts/vault-search.py

  • Create: code-scripts/vault-index.py

  • Create: code-scripts/vault-health.py

  • Create: code-scripts/vault_search/tests/test_cli.py

  • Modify: code_scripts/vault_search/database.py

  • Step 1: Write CLI tests

Create code-scripts/vault_search/tests/test_cli.py:

import json
import tempfile
import unittest
from pathlib import Path
 
from code_scripts.vault_search.cli import main
 
 
class CliTests(unittest.TestCase):
    def test_index_search_and_health_json(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            (root / "wiki").mkdir()
            (root / "wiki" / "ssl.md").write_text("---\ntags: [知识总结, network]\n---\n# SSL 证书\nSSL 证书用于 HTTPS。\n", encoding="utf-8")
            (root / "note.md").write_text("# Untagged\n[[Missing]]\n", encoding="utf-8")
            db_path = root / "tmp" / "vault-search.sqlite"
 
            index_code = main(["vault-index", "--root", str(root), "--db", str(db_path)])
            self.assertEqual(index_code, 0)
 
            search_output = main(["vault-search", "SSL", "--db", str(db_path), "--json"], capture=True)
            payload = json.loads(search_output)
            self.assertEqual(payload["results"][0]["path"], "wiki/ssl.md")
 
            health_output = main(["vault-health", "--db", str(db_path), "--json"], capture=True)
            health = json.loads(health_output)
            self.assertEqual(health["summary"]["missing_tags"], 1)
            self.assertEqual(health["summary"]["wikilinks"], 1)
 
 
if __name__ == "__main__":
    unittest.main()
  • Step 2: Run CLI tests and verify they fail

Run:

python3 -m unittest code-scripts/vault_search/tests/test_cli.py -v

Expected: ERROR because cli.py does not exist.

  • Step 3: Add health query helpers

Append to code_scripts/vault_search/database.py:

 
def health_summary(db_path: Path) -> dict[str, int]:
    conn = sqlite3.connect(db_path)
    try:
        return {
            "documents": conn.execute("select count(*) from documents").fetchone()[0],
            "missing_frontmatter": conn.execute("select count(*) from documents where has_frontmatter = 0").fetchone()[0],
            "missing_tags": conn.execute("select count(*) from documents where has_tags = 0").fetchone()[0],
            "broken_links": conn.execute("select count(*) from links where target_exists = 0 and link_type = 'markdown'").fetchone()[0],
            "wikilinks": conn.execute("select count(*) from links where link_type = 'wikilink'").fetchone()[0],
        }
    finally:
        conn.close()
  • Step 4: Implement CLI

Create code_scripts/vault_search/cli.py:

from __future__ import annotations
 
import argparse
import json
import sys
from pathlib import Path
 
from .database import health_summary, search_documents
from .indexer import build_index
 
DEFAULT_DB = Path("tmp/vault-search.sqlite")
 
 
def main(argv: list[str] | None = None, capture: bool = False) -> int | str:
    argv = list(sys.argv[1:] if argv is None else argv)
    parser = argparse.ArgumentParser(prog="vault-search-tools")
    subparsers = parser.add_subparsers(dest="command", required=True)
 
    index_parser = subparsers.add_parser("vault-index")
    index_parser.add_argument("--root", default=".")
    index_parser.add_argument("--db", default=str(DEFAULT_DB))
 
    search_parser = subparsers.add_parser("vault-search")
    search_parser.add_argument("query")
    search_parser.add_argument("--db", default=str(DEFAULT_DB))
    search_parser.add_argument("--area")
    search_parser.add_argument("--tag", action="append", default=[])
    search_parser.add_argument("--limit", type=int, default=10)
    search_parser.add_argument("--json", action="store_true")
 
    health_parser = subparsers.add_parser("vault-health")
    health_parser.add_argument("--db", default=str(DEFAULT_DB))
    health_parser.add_argument("--json", action="store_true")
 
    args = parser.parse_args(argv)
 
    if args.command == "vault-index":
        summary = build_index(Path(args.root), Path(args.db))
        text = json.dumps({"summary": summary}, ensure_ascii=False, indent=2)
        return _emit(text, capture, code=0)
 
    if args.command == "vault-search":
        results = search_documents(Path(args.db), query=args.query, limit=args.limit, area=args.area, tags=args.tag)
        payload = {"query": args.query, "results": results}
        if args.json:
            return _emit(json.dumps(payload, ensure_ascii=False, indent=2), capture)
        lines = [f"{item['path']} | {item['title']} | {', '.join(item['tags'])}" for item in results]
        return _emit("\n".join(lines), capture)
 
    if args.command == "vault-health":
        payload = {"summary": health_summary(Path(args.db))}
        if args.json:
            return _emit(json.dumps(payload, ensure_ascii=False, indent=2), capture)
        lines = [f"{key}: {value}" for key, value in payload["summary"].items()]
        return _emit("\n".join(lines), capture)
 
    return 2
 
 
def _emit(text: str, capture: bool, code: int = 0):
    if capture:
        return text
    print(text)
    return code
  • Step 5: Add wrapper scripts

Create code-scripts/vault-search.py:

```python
#!/usr/bin/env python3
import sys
from pathlib import Path

# code-scripts/ is not an importable package (the hyphen), so put the repo
# root, which contains code_scripts/, on sys.path before importing.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from code_scripts.vault_search.cli import main

raise SystemExit(main(["vault-search"] + sys.argv[1:]))
```

Create code-scripts/vault-index.py:

```python
#!/usr/bin/env python3
import sys
from pathlib import Path

# code-scripts/ is not an importable package (the hyphen), so put the repo
# root, which contains code_scripts/, on sys.path before importing.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from code_scripts.vault_search.cli import main

raise SystemExit(main(["vault-index"] + sys.argv[1:]))
```

Create code-scripts/vault-health.py:

```python
#!/usr/bin/env python3
import sys
from pathlib import Path

# code-scripts/ is not an importable package (the hyphen), so put the repo
# root, which contains code_scripts/, on sys.path before importing.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from code_scripts.vault_search.cli import main

raise SystemExit(main(["vault-health"] + sys.argv[1:]))
```
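All three wrappers use the same dispatch pattern: each prepends its fixed subcommand before the user's arguments, so a single argparse entry point serves every command. A toy stand-in for `cli.main` illustrating the pattern (the parser here is a deliberately minimal assumption, not the real CLI):

```python
import argparse

# Toy stand-in for cli.main: the wrapper prepends a fixed subcommand and
# forwards sys.argv[1:], so argparse dispatches to the right branch.
def main(argv):
    parser = argparse.ArgumentParser(prog="vault")
    subparsers = parser.add_subparsers(dest="command", required=True)
    search = subparsers.add_parser("vault-search")
    search.add_argument("query")
    args = parser.parse_args(argv)
    return 0 if args.command == "vault-search" else 2

# A wrapper invoked as `vault-search.py "SSL"` effectively calls:
assert main(["vault-search", "SSL"]) == 0
```

This keeps the wrappers trivial: all flag definitions and output logic live in one place, `cli.py`.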
- [ ] **Step 6: Run CLI tests and verify they pass**

Run:

```bash
python3 -m unittest discover -s code-scripts/vault_search/tests -p "test_cli.py" -v
```

Expected: PASS with 1 test.

- [ ] **Step 7: Commit Task 5**

Run:

```bash
git add code_scripts/vault_search code-scripts/vault-search.py code-scripts/vault-index.py code-scripts/vault-health.py code-scripts/vault_search/tests/test_cli.py
git commit -m "feat: add vault search cli"
```

## Task 6: Add Real Vault Smoke Tests And Documentation

Files:

- Create: code-scripts/vault_search/README.md
- Modify: code_scripts/vault_search/cli.py

- [ ] **Step 1: Run full unit test suite**

Run:

```bash
python3 -m unittest discover -s code-scripts/vault_search/tests -v
```

Expected: PASS for parser, discovery, database, indexer, and CLI tests.

- [ ] **Step 2: Build index for the real vault**

Run:

```bash
python3 code-scripts/vault-index.py --root . --db tmp/vault-search.sqlite
```

Expected: exit 0 and JSON summary similar to:

```json
{
  "summary": {
    "documents": 75,
    "links": 20,
    "broken_links": 1,
    "missing_tags": 16,
    "wikilinks": 12
  }
}
```

Counts may differ as the vault changes. The important condition is that the command exits 0 and writes tmp/vault-search.sqlite.
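Because the exact counts drift as notes are added, a shape check is more durable than comparing numbers. A sketch, assuming the summary keys shown in the example above:

```python
import json

# Shape check for the vault-index summary JSON; the key names follow the
# example output above and are assumed, not guaranteed, by the final schema.
raw = """{"summary": {"documents": 75, "links": 20, "broken_links": 1,
           "missing_tags": 16, "wikilinks": 12}}"""
summary = json.loads(raw)["summary"]
assert set(summary) == {"documents", "links", "broken_links", "missing_tags", "wikilinks"}
assert all(isinstance(v, int) and v >= 0 for v in summary.values())
print("summary shape ok")
```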

- [ ] **Step 3: Smoke test search**

Run:

```bash
python3 code-scripts/vault-search.py "SSL" --db tmp/vault-search.sqlite --json
```

Expected: exit 0 and JSON containing a result with path wiki/ssl-certificate.md or a clipping about SSL certificates.

- [ ] **Step 4: Smoke test health**

Run:

```bash
python3 code-scripts/vault-health.py --db tmp/vault-search.sqlite --json
```

Expected: exit 0 and JSON with summary.documents greater than 0.

- [ ] **Step 5: Add usage documentation**

Create code-scripts/vault_search/README.md:

# Vault Search

Local SQLite + FTS5 search tools for this Obsidian vault.

## Build The Index

```bash
python3 code-scripts/vault-index.py --root . --db tmp/vault-search.sqlite
```

The database is derived state and is safe to delete. It lives under tmp/, which is ignored by git.

## Search

```bash
python3 code-scripts/vault-search.py "java 泛型" --db tmp/vault-search.sqlite
python3 code-scripts/vault-search.py "SSL" --db tmp/vault-search.sqlite --json
python3 code-scripts/vault-search.py "网络" --area IT-learning --db tmp/vault-search.sqlite
```

## Health Checks

```bash
python3 code-scripts/vault-health.py --db tmp/vault-search.sqlite
python3 code-scripts/vault-health.py --db tmp/vault-search.sqlite --json
```

Health checks report missing frontmatter, missing tags, broken Markdown links, and wikilinks.

## Design

See docs/superpowers/specs/2026-05-10-vault-search-design.md.


- [ ] **Step 6: Commit Task 6**

Run:

```bash
git add code-scripts/vault_search/README.md
git commit -m "docs: document vault search tools"
```

## Final Verification

- [ ] **Step 1: Run all tests**

Run:

```bash
python3 -m unittest discover -s code-scripts/vault_search/tests -v
```

Expected: all tests pass.

- [ ] **Step 2: Rebuild real index**

Run:

```bash
python3 code-scripts/vault-index.py --root . --db tmp/vault-search.sqlite
```

Expected: exit 0 and summary JSON with documents greater than 0.

- [ ] **Step 3: Query JSON for AI contract**

Run:

```bash
python3 code-scripts/vault-search.py "SSL" --db tmp/vault-search.sqlite --json
```

Expected: valid JSON with top-level query and results.
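This contract can be pinned down with a small check. A sketch, assuming each result carries the fields the plain-text formatter prints (path, title, tags):

```python
import json

# Hypothetical contract check for vault-search --json output; the result
# fields mirror those the plain-text formatter uses and are assumptions.
raw = '{"query": "SSL", "results": [{"path": "wiki/ssl-certificate.md", "title": "SSL Certificates", "tags": ["it"]}]}'
payload = json.loads(raw)
assert set(payload) == {"query", "results"}
for item in payload["results"]:
    assert {"path", "title", "tags"} <= set(item)
    assert isinstance(item["tags"], list)
```

Keeping this shape stable is what makes the JSON mode safe for automation and AI retrieval to build on.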

- [ ] **Step 4: Query health JSON**

Run:

```bash
python3 code-scripts/vault-health.py --db tmp/vault-search.sqlite --json
```

Expected: valid JSON with top-level summary.

- [ ] **Step 5: Check git status**

Run:

```bash
git status --short
```

Expected: only intentional implementation files appear as modified or untracked. Pre-existing unrelated `.agents/` and `AGENTS.md` entries may still be untracked; do not stage them unless the user asks.

## Spec Coverage Self-Review

- Local SQLite + FTS5 retrieval foundation: covered by Tasks 3-6.
- Markdown remains source of truth and the index is derived state: covered by the database location and rebuild flow.
- File exclusions and areas: covered by Task 2.
- Frontmatter, tags, aliases, headings, links, body content: covered by Tasks 1, 3, and 4.
- CLI-first workflow: covered by Task 5.
- JSON output for automation and AI retrieval: covered by Task 5 and final verification.
- Governance checks: covered by Task 5.
- No network service, embedding, Obsidian plugin, or Markdown modification in V1: preserved throughout the plan.
- Future semantic search, graph, and Web UI: intentionally documented in the spec, not implemented in V1.