# Vault Search Implementation Plan
**For agentic workers:** REQUIRED SUB-SKILL: Use `superpowers:subagent-driven-development` (recommended) or `superpowers:executing-plans` to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Build V1 of the local SQLite + FTS5 retrieval system for this Obsidian vault.
**Architecture:** Implement a small importable Python package under `code_scripts/vault_search/`, with wrapper scripts and tests under `code-scripts/`. Markdown files are parsed into document records and written to a disposable SQLite database under `tmp/`, which the `vault-search` and `vault-health` commands then query. Markdown remains the source of truth; the database is rebuildable derived state.
**Tech Stack:** Python 3 standard library, SQLite FTS5, unittest, argparse, and small focused local helpers for Markdown/frontmatter parsing.
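Concretely, the end-to-end flow this plan builds toward looks like this (the exact commands are defined in Tasks 5 and 6):

```bash
python3 code-scripts/vault-index.py --root . --db tmp/vault-search.sqlite
python3 code-scripts/vault-search.py "SSL" --db tmp/vault-search.sqlite --json
python3 code-scripts/vault-health.py --db tmp/vault-search.sqlite --json
```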
## File Structure
- Create `code_scripts/__init__.py`: import bridge for local tools.
- Create `code_scripts/vault_search/__init__.py`: package marker and version.
- Create `code_scripts/vault_search/models.py`: dataclasses for documents, links, headings, index summaries, and search results.
- Create `code_scripts/vault_search/parser.py`: Markdown/frontmatter parser, title extraction, heading extraction, tag/alias normalization, link extraction.
- Create `code_scripts/vault_search/discovery.py`: vault file discovery and area detection.
- Create `code_scripts/vault_search/database.py`: SQLite schema creation, full rebuild write path, FTS queries, health queries.
- Create `code_scripts/vault_search/indexer.py`: orchestration from discovered files to parsed documents to database.
- Create `code_scripts/vault_search/cli.py`: `vault-index`, `vault-search`, and `vault-health` command implementations.
- Create `code-scripts/vault-search.py`: executable dispatcher script for convenient local use.
- Create `code-scripts/vault-index.py`: wrapper for `vault-index`.
- Create `code-scripts/vault-health.py`: wrapper for `vault-health`.
- Create `code-scripts/vault_search/tests/fixtures/sample_vault/`: fixture vault used by tests.
- Create `code-scripts/vault_search/tests/test_parser.py`: parser unit tests.
- Create `code-scripts/vault_search/tests/test_discovery.py`: discovery unit tests.
- Create `code-scripts/vault_search/tests/test_database.py`: database/search/health tests.
- Create `code-scripts/vault_search/tests/test_cli.py`: CLI JSON and command behavior tests.
- Create `code-scripts/vault_search/README.md`: short usage notes for the local tools.
Use no third-party dependencies in V1. The frontmatter parser supports only the YAML subset already used in this vault: plain `key: value` pairs, inline arrays such as `tags: [学习, java]`, and simple block list values.
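For reference, a note using all three supported forms might look like this (values are illustrative, taken from the fixtures used later in this plan):

```markdown
---
title: Java 泛型
tags: [学习, java]
aliases:
  - Java泛型学习
  - 泛型
---
# Java 泛型
```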
## Task 1: Create Models And Parser Tests
**Files:**
- Create: `code_scripts/__init__.py`
- Create: `code_scripts/vault_search/__init__.py`
- Create: `code_scripts/vault_search/models.py`
- Create: `code_scripts/vault_search/parser.py`
- Create: `code-scripts/vault_search/tests/test_parser.py`
- [ ] **Step 1: Write parser tests**

Create `code-scripts/vault_search/tests/test_parser.py`:
```python
import unittest
from pathlib import Path

from code_scripts.vault_search.parser import parse_markdown


class ParserTests(unittest.TestCase):
    def test_parse_frontmatter_inline_tags_aliases_title_and_body(self):
        text = """---
tags: [学习, java, 状态/进行中]
aliases: [Java泛型学习, 泛型]
---
# Java 泛型
正文包含 List<T> 和类型擦除。
[语法](../java-basic/java-grammar.md)
"""
        doc = parse_markdown(Path("IT-learning/java-basic/generic.md"), text)
        self.assertEqual(doc.path, "IT-learning/java-basic/generic.md")
        self.assertEqual(doc.area, "IT-learning")
        self.assertEqual(doc.title, "Java 泛型")
        self.assertTrue(doc.has_frontmatter)
        self.assertTrue(doc.has_tags)
        self.assertEqual(doc.tags, ["学习", "java", "状态/进行中"])
        self.assertEqual(doc.aliases, ["Java泛型学习", "泛型"])
        self.assertIn("正文包含", doc.content)
        self.assertEqual(len(doc.links), 1)
        self.assertEqual(doc.links[0].link_type, "markdown")
        self.assertEqual(doc.links[0].target_raw, "../java-basic/java-grammar.md")

    def test_parse_yaml_list_tags_and_wikilinks(self):
        text = """---
tags:
  - 学习
  - network
aliases:
  - 网络基础
---
# 计算机网络
- [[TCP 三次握手]]
- [[TCP 滑动窗口|滑动窗口]]
"""
        doc = parse_markdown(Path("IT-learning/computer-level/network.md"), text)
        self.assertEqual(doc.tags, ["学习", "network"])
        self.assertEqual(doc.aliases, ["网络基础"])
        self.assertEqual(doc.title, "计算机网络")
        self.assertEqual([link.target_raw for link in doc.links], ["TCP 三次握手", "TCP 滑动窗口"])
        self.assertEqual([link.link_text for link in doc.links], ["TCP 三次握手", "滑动窗口"])
        self.assertTrue(all(link.link_type == "wikilink" for link in doc.links))

    def test_missing_frontmatter_uses_filename_title(self):
        doc = parse_markdown(Path("task.md"), "Plain note body")
        self.assertEqual(doc.title, "task")
        self.assertFalse(doc.has_frontmatter)
        self.assertFalse(doc.has_tags)
        self.assertEqual(doc.tags, [])
        self.assertEqual(doc.content, "Plain note body")


if __name__ == "__main__":
    unittest.main()
```

- [ ] **Step 2: Run parser tests and verify they fail**

Run:
```bash
python3 -m unittest code-scripts/vault_search/tests/test_parser.py -v
```
Expected: FAIL or ERROR because `code_scripts.vault_search.parser` does not exist yet.
- [ ] **Step 3: Create package alias for importable code**

Because the existing directory is named `code-scripts` with a hyphen, it cannot be imported as a Python package, so create a parallel importable package path that tests can use. Create `code_scripts/__init__.py`:
"""Import bridge for local code scripts."""Create code_scripts/vault_search/__init__.py:
"""Vault search package."""
__version__ = "0.1.0"Use code_scripts/vault_search/ as the actual Python package path, while wrapper scripts can live in code-scripts/.
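If the wrapper scripts ever need to run from outside the vault root, a small bootstrap like the following could be added to them. This is an optional sketch, not something this plan's V1 requires:

```python
# Optional sketch: put the vault root (the parent of code-scripts/) on
# sys.path so "import code_scripts" works regardless of the caller's CWD.
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
```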
- [ ] **Step 4: Implement models**

Create `code_scripts/vault_search/models.py`:
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    link_text: str
    target_raw: str
    link_type: str
    target_path: str | None = None
    target_exists: bool = False


@dataclass(frozen=True)
class Heading:
    level: int
    text: str
    anchor: str


@dataclass
class Document:
    path: str
    area: str
    title: str
    content: str
    tags: list[str] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)
    frontmatter: dict[str, object] = field(default_factory=dict)
    headings: list[Heading] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)
    has_frontmatter: bool = False
    has_tags: bool = False
    mtime: str | None = None
    ctime: str | None = None
    content_hash: str = ""
```

- [ ] **Step 5: Implement parser**

Create `code_scripts/vault_search/parser.py`:
```python
from __future__ import annotations

import hashlib
import re
from pathlib import Path

from .models import Document, Heading, Link

FRONTMATTER_RE = re.compile(r"\A---\n(?P<body>.*?)\n---\n?", re.DOTALL)
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$", re.MULTILINE)
MARKDOWN_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+?\.md(?:#[^)]+)?)\)")
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def parse_markdown(path: Path, text: str) -> Document:
    rel_path = path.as_posix()
    area = rel_path.split("/", 1)[0] if "/" in rel_path else "root"
    frontmatter, body, has_frontmatter = _split_frontmatter(text)
    tags = _list_value(frontmatter.get("tags"))
    aliases = _list_value(frontmatter.get("aliases"))
    headings = _extract_headings(body)
    title = _select_title(frontmatter, headings, path)
    links = _extract_links(body)
    content = body.strip()
    return Document(
        path=rel_path,
        area=area,
        title=title,
        content=content,
        tags=tags,
        aliases=aliases,
        frontmatter=frontmatter,
        headings=headings,
        links=links,
        has_frontmatter=has_frontmatter,
        has_tags=bool(tags),
        content_hash=hashlib.sha256(text.encode("utf-8")).hexdigest(),
    )


def _split_frontmatter(text: str) -> tuple[dict[str, object], str, bool]:
    match = FRONTMATTER_RE.match(text)
    if not match:
        return {}, text, False
    raw = match.group("body")
    body = text[match.end():]
    return _parse_frontmatter(raw), body, True


def _parse_frontmatter(raw: str) -> dict[str, object]:
    data: dict[str, object] = {}
    current_key: str | None = None
    for line in raw.splitlines():
        if not line.strip():
            continue
        if line.startswith("  - ") and current_key:
            value = line[4:].strip()
            existing = data.setdefault(current_key, [])
            if isinstance(existing, list):
                existing.append(value)
            continue
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip()
        value = value.strip()
        current_key = key
        if value == "":
            data[key] = []
        elif value.startswith("[") and value.endswith("]"):
            data[key] = [item.strip().strip('"').strip("'") for item in value[1:-1].split(",") if item.strip()]
        else:
            data[key] = value.strip('"').strip("'")
    return data


def _list_value(value: object) -> list[str]:
    if isinstance(value, list):
        return [str(item).strip() for item in value if str(item).strip()]
    if isinstance(value, str) and value.strip():
        return [value.strip()]
    return []


def _extract_headings(body: str) -> list[Heading]:
    headings: list[Heading] = []
    for match in HEADING_RE.finditer(body):
        text = match.group(2).strip()
        headings.append(Heading(level=len(match.group(1)), text=text, anchor=_anchor(text)))
    return headings


def _select_title(frontmatter: dict[str, object], headings: list[Heading], path: Path) -> str:
    for heading in headings:
        if heading.level == 1:
            return heading.text
    title = frontmatter.get("title")
    if isinstance(title, str) and title.strip():
        return title.strip()
    return path.stem


def _extract_links(body: str) -> list[Link]:
    links: list[Link] = []
    for match in MARKDOWN_LINK_RE.finditer(body):
        links.append(Link(link_text=match.group(1), target_raw=match.group(2), link_type="markdown"))
    for match in WIKILINK_RE.finditer(body):
        target = match.group(1).strip()
        text = (match.group(2) or target).strip()
        links.append(Link(link_text=text, target_raw=target, link_type="wikilink"))
    return links


def _anchor(text: str) -> str:
    return re.sub(r"\s+", "-", text.strip().lower())
```

- [ ] **Step 6: Run parser tests and verify they pass**

Run:
```bash
python3 -m unittest code-scripts/vault_search/tests/test_parser.py -v
```
Expected: PASS with 3 tests.
- [ ] **Step 7: Commit Task 1**

Run:
```bash
git add code_scripts code-scripts/vault_search/tests/test_parser.py
git commit -m "feat: add vault markdown parser"
```

## Task 2: Add Vault Discovery
**Files:**
- Create: `code_scripts/vault_search/discovery.py`
- Create: `code-scripts/vault_search/tests/fixtures/sample_vault/`
- Create: `code-scripts/vault_search/tests/test_discovery.py`
- [ ] **Step 1: Create discovery tests and fixtures**

Create fixture files:
- `code-scripts/vault_search/tests/fixtures/sample_vault/README.md`
- `code-scripts/vault_search/tests/fixtures/sample_vault/IT-learning/java-basic/java.md`
- `code-scripts/vault_search/tests/fixtures/sample_vault/wiki/INDEX.md`
- `code-scripts/vault_search/tests/fixtures/sample_vault/tmp/ignored.md`
- `code-scripts/vault_search/tests/fixtures/sample_vault/.obsidian/ignored.md`

Create `code-scripts/vault_search/tests/test_discovery.py`:
```python
import unittest
from pathlib import Path

from code_scripts.vault_search.discovery import discover_markdown_files, path_area

FIXTURE = Path("code-scripts/vault_search/tests/fixtures/sample_vault")


class DiscoveryTests(unittest.TestCase):
    def test_discover_markdown_files_respects_exclusions(self):
        files = [path.as_posix() for path in discover_markdown_files(FIXTURE)]
        self.assertIn("README.md", files)
        self.assertIn("IT-learning/java-basic/java.md", files)
        self.assertIn("wiki/INDEX.md", files)
        self.assertNotIn("tmp/ignored.md", files)
        self.assertNotIn(".obsidian/ignored.md", files)

    def test_path_area(self):
        self.assertEqual(path_area(Path("README.md")), "root")
        self.assertEqual(path_area(Path("IT-learning/java-basic/java.md")), "IT-learning")


if __name__ == "__main__":
    unittest.main()
```

- [ ] **Step 2: Run discovery tests and verify they fail**

Run:
```bash
python3 -m unittest code-scripts/vault_search/tests/test_discovery.py -v
```
Expected: ERROR because `discovery.py` does not exist.
- [ ] **Step 3: Implement discovery**

Create `code_scripts/vault_search/discovery.py`:

```python
from __future__ import annotations

from pathlib import Path

DEFAULT_EXCLUDES = {".git", ".obsidian", "node_modules", "tmp", ".trash"}


def discover_markdown_files(root: Path, excludes: set[str] | None = None) -> list[Path]:
    excludes = excludes or DEFAULT_EXCLUDES
    root = root.resolve()
    results: list[Path] = []
    for path in root.rglob("*.md"):
        rel = path.relative_to(root)
        if any(part.startswith(".") or part in excludes for part in rel.parts):
            continue
        results.append(rel)
    return sorted(results, key=lambda item: item.as_posix())


def path_area(path: Path) -> str:
    return path.parts[0] if len(path.parts) > 1 else "root"
```

- [ ] **Step 4: Create fixture files**
Use `mkdir -p` for fixture directories. Create the files with these contents:

`code-scripts/vault_search/tests/fixtures/sample_vault/README.md`:
```markdown
# Fixture Vault
```

`code-scripts/vault_search/tests/fixtures/sample_vault/IT-learning/java-basic/java.md`:
```markdown
---
tags: [学习, java]
---
# Java
```

`code-scripts/vault_search/tests/fixtures/sample_vault/wiki/INDEX.md`:
```markdown
# Wiki Index
```

`code-scripts/vault_search/tests/fixtures/sample_vault/tmp/ignored.md`:
```markdown
# Ignored
```

`code-scripts/vault_search/tests/fixtures/sample_vault/.obsidian/ignored.md`:
```markdown
# Ignored
```

- [ ] **Step 5: Run discovery tests and verify they pass**

Run:
```bash
python3 -m unittest code-scripts/vault_search/tests/test_discovery.py -v
```
Expected: PASS with 2 tests.

- [ ] **Step 6: Commit Task 2**

Run:
```bash
git add code_scripts/vault_search/discovery.py code-scripts/vault_search/tests/test_discovery.py code-scripts/vault_search/tests/fixtures/sample_vault
git commit -m "feat: discover vault markdown files"
```

## Task 3: Add SQLite Schema And Index Writer
**Files:**
- Create: `code_scripts/vault_search/database.py`
- Create: `code-scripts/vault_search/tests/test_database.py`
- [ ] **Step 1: Write database indexing tests**

Create `code-scripts/vault_search/tests/test_database.py`:
```python
import sqlite3
import tempfile
import unittest
from pathlib import Path

from code_scripts.vault_search.database import rebuild_database, search_documents
from code_scripts.vault_search.parser import parse_markdown


class DatabaseTests(unittest.TestCase):
    def test_rebuild_database_writes_documents_tags_links_and_fts(self):
        doc = parse_markdown(
            Path("wiki/ssl-certificate.md"),
            """---
tags: [知识总结, network, 状态/已完结]
aliases: [SSL]
---
# SSL 证书
SSL 证书用于验证服务器身份。
[相关](llm-wiki-compiler.md)
""",
        )
        with tempfile.TemporaryDirectory() as tmp:
            db_path = Path(tmp) / "vault-search.sqlite"
            rebuild_database(db_path, [doc])
            conn = sqlite3.connect(db_path)
            self.assertEqual(conn.execute("select count(*) from documents").fetchone()[0], 1)
            self.assertEqual(conn.execute("select count(*) from tags").fetchone()[0], 3)
            self.assertEqual(conn.execute("select count(*) from aliases").fetchone()[0], 1)
            self.assertEqual(conn.execute("select count(*) from links").fetchone()[0], 1)
            results = search_documents(db_path, query="SSL", limit=5)
            self.assertEqual(results[0]["path"], "wiki/ssl-certificate.md")
            self.assertEqual(results[0]["title"], "SSL 证书")


if __name__ == "__main__":
    unittest.main()
```

- [ ] **Step 2: Run database tests and verify they fail**

Run:
```bash
python3 -m unittest code-scripts/vault_search/tests/test_database.py -v
```
Expected: ERROR because `database.py` does not exist.
- [ ] **Step 3: Implement database schema and search**

Create `code_scripts/vault_search/database.py`:
```python
from __future__ import annotations

import json
import sqlite3
from pathlib import Path

from .models import Document

SCHEMA = """
drop table if exists documents;
drop table if exists tags;
drop table if exists aliases;
drop table if exists frontmatter;
drop table if exists links;
drop table if exists headings;
drop table if exists document_fts;

create table documents (
    id integer primary key,
    path text unique not null,
    area text not null,
    title text not null,
    content text not null,
    mtime text,
    ctime text,
    has_frontmatter integer not null,
    has_tags integer not null,
    content_hash text not null
);

create virtual table document_fts using fts5(
    title,
    content,
    content='documents',
    content_rowid='id'
);

create table tags (
    document_id integer not null,
    tag text not null
);

create table aliases (
    document_id integer not null,
    alias text not null
);

create table frontmatter (
    document_id integer not null,
    key text not null,
    value_json text not null
);

create table links (
    id integer primary key,
    source_document_id integer not null,
    link_text text,
    target_raw text not null,
    target_path text,
    link_type text not null,
    target_exists integer not null
);

create table headings (
    document_id integer not null,
    level integer not null,
    text text not null,
    anchor text
);
"""


def rebuild_database(db_path: Path, documents: list[Document]) -> None:
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA)
        for doc in documents:
            cursor = conn.execute(
                """
                insert into documents
                (path, area, title, content, mtime, ctime, has_frontmatter, has_tags, content_hash)
                values (?, ?, ?, ?, ?, ?, ?, ?, ?)
                """,
                (
                    doc.path,
                    doc.area,
                    doc.title,
                    doc.content,
                    doc.mtime,
                    doc.ctime,
                    int(doc.has_frontmatter),
                    int(doc.has_tags),
                    doc.content_hash,
                ),
            )
            document_id = cursor.lastrowid
            conn.execute(
                "insert into document_fts(rowid, title, content) values (?, ?, ?)",
                (document_id, doc.title, doc.content),
            )
            conn.executemany(
                "insert into tags(document_id, tag) values (?, ?)",
                [(document_id, tag) for tag in doc.tags],
            )
            conn.executemany(
                "insert into aliases(document_id, alias) values (?, ?)",
                [(document_id, alias) for alias in doc.aliases],
            )
            conn.executemany(
                "insert into frontmatter(document_id, key, value_json) values (?, ?, ?)",
                [(document_id, key, json.dumps(value, ensure_ascii=False)) for key, value in doc.frontmatter.items()],
            )
            conn.executemany(
                """
                insert into links(source_document_id, link_text, target_raw, target_path, link_type, target_exists)
                values (?, ?, ?, ?, ?, ?)
                """,
                [
                    (document_id, link.link_text, link.target_raw, link.target_path, link.link_type, int(link.target_exists))
                    for link in doc.links
                ],
            )
            conn.executemany(
                "insert into headings(document_id, level, text, anchor) values (?, ?, ?, ?)",
                [(document_id, heading.level, heading.text, heading.anchor) for heading in doc.headings],
            )
        conn.commit()
    finally:
        conn.close()


def search_documents(
    db_path: Path,
    query: str,
    limit: int = 10,
    area: str | None = None,
    tags: list[str] | None = None,
) -> list[dict[str, object]]:
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        joins = ["join document_fts on document_fts.rowid = documents.id"]
        join_params: list[object] = []
        where = ["document_fts match ?"]
        where_params: list[object] = [query]
        if area:
            where.append("documents.area = ?")
            where_params.append(area)
        for index, tag in enumerate(tags or []):
            joins.append(f"join tags as t{index} on t{index}.document_id = documents.id and t{index}.tag = ?")
            join_params.append(tag)
        # Placeholders bind positionally, so join params must precede where params.
        params = join_params + where_params + [limit]
        sql = f"""
            select documents.path, documents.title, documents.area, bm25(document_fts) as score,
                   snippet(document_fts, 1, '[', ']', '...', 16) as snippet
            from documents
            {' '.join(joins)}
            where {' and '.join(where)}
            order by score
            limit ?
        """
        rows = conn.execute(sql, params).fetchall()
        return [dict(row) | {"tags": _tags_for_document(conn, row["path"])} for row in rows]
    finally:
        conn.close()


def _tags_for_document(conn: sqlite3.Connection, path: str) -> list[str]:
    rows = conn.execute(
        """
        select tags.tag
        from tags
        join documents on documents.id = tags.document_id
        where documents.path = ?
        order by tags.tag
        """,
        (path,),
    ).fetchall()
    return [row[0] for row in rows]
```

- [ ] **Step 4: Run database tests and verify they pass**

Run:
```bash
python3 -m unittest code-scripts/vault_search/tests/test_database.py -v
```
Expected: PASS with 1 test.
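Once this task and Task 1 are both in place, a quick interactive smoke check along these lines should round-trip a document (the note path and body here are illustrative):

```python
from pathlib import Path

from code_scripts.vault_search.database import rebuild_database, search_documents
from code_scripts.vault_search.parser import parse_markdown

# Parse one in-memory note, write a throwaway index, and query it back.
doc = parse_markdown(Path("wiki/demo.md"), "# Demo\nFTS5 should find this body text.\n")
rebuild_database(Path("tmp/demo.sqlite"), [doc])
print(search_documents(Path("tmp/demo.sqlite"), query="body"))  # -> [{'path': 'wiki/demo.md', ...}]
```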
- [ ] **Step 5: Commit Task 3**

Run:
```bash
git add code_scripts/vault_search/database.py code-scripts/vault_search/tests/test_database.py
git commit -m "feat: write vault search sqlite index"
```

## Task 4: Resolve Links And Build Indexer
**Files:**
- Create: `code_scripts/vault_search/indexer.py`
- Modify: `code_scripts/vault_search/parser.py`
- Modify: `code-scripts/vault_search/tests/test_database.py`
- [ ] **Step 1: Add indexer test for link resolution and summary**

Append to `code-scripts/vault_search/tests/test_database.py`:
```python
from code_scripts.vault_search.indexer import build_index


class IndexerTests(unittest.TestCase):
    def test_build_index_resolves_markdown_links_and_reports_summary(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            (root / "wiki").mkdir()
            (root / "wiki" / "ssl.md").write_text("# SSL\n[Compiler](compiler.md)\n[[Missing Note]]", encoding="utf-8")
            (root / "wiki" / "compiler.md").write_text("# Compiler\n", encoding="utf-8")
            db_path = root / "tmp" / "vault-search.sqlite"
            summary = build_index(root, db_path)
            self.assertEqual(summary["documents"], 2)
            self.assertEqual(summary["links"], 2)
            self.assertEqual(summary["broken_links"], 1)
            conn = sqlite3.connect(db_path)
            rows = conn.execute("select target_raw, target_path, target_exists from links order by target_raw").fetchall()
            self.assertEqual(rows[0], ("Missing Note", None, 0))
            self.assertEqual(rows[1], ("compiler.md", "wiki/compiler.md", 1))
```

- [ ] **Step 2: Run indexer test and verify it fails**

Run:
```bash
python3 -m unittest code-scripts/vault_search/tests/test_database.py -v
```
Expected: ERROR because `indexer.py` does not exist.
- [ ] **Step 3: Implement indexer and link resolution**

Create `code_scripts/vault_search/indexer.py`:
```python
from __future__ import annotations

import posixpath
from dataclasses import replace
from datetime import datetime, timezone
from pathlib import Path

from .database import rebuild_database
from .discovery import discover_markdown_files
from .models import Document, Link
from .parser import parse_markdown


def build_index(root: Path, db_path: Path) -> dict[str, int]:
    root = root.resolve()
    rel_paths = discover_markdown_files(root)
    documents: list[Document] = []
    all_paths = {path.as_posix() for path in rel_paths}
    title_index: dict[str, str] = {}
    for rel_path in rel_paths:
        full_path = root / rel_path
        text = full_path.read_text(encoding="utf-8")
        doc = parse_markdown(rel_path, text)
        stat = full_path.stat()
        doc.mtime = _iso(stat.st_mtime)
        doc.ctime = _iso(stat.st_ctime)
        documents.append(doc)
        title_index[doc.title] = doc.path
        title_index[Path(doc.path).stem] = doc.path
    resolved = [_resolve_document_links(doc, all_paths, title_index) for doc in documents]
    rebuild_database(db_path, resolved)
    links = sum(len(doc.links) for doc in resolved)
    broken_links = sum(1 for doc in resolved for link in doc.links if not link.target_exists)
    missing_tags = sum(1 for doc in resolved if not doc.has_tags)
    return {
        "documents": len(resolved),
        "links": links,
        "broken_links": broken_links,
        "missing_tags": missing_tags,
        "wikilinks": sum(1 for doc in resolved for link in doc.links if link.link_type == "wikilink"),
    }


def _resolve_document_links(doc: Document, all_paths: set[str], title_index: dict[str, str]) -> Document:
    resolved_links: list[Link] = []
    source_dir = Path(doc.path).parent
    for link in doc.links:
        if link.link_type == "markdown":
            raw_path = link.target_raw.split("#", 1)[0]
            # normpath collapses ".." segments so relative links resolve
            # against the vault root, e.g. "a/b/../c.md" -> "a/c.md".
            normalized = posixpath.normpath((source_dir / raw_path).as_posix())
            exists = normalized in all_paths
            resolved_links.append(replace(link, target_path=normalized, target_exists=exists))
        else:
            target_path = title_index.get(link.target_raw)
            resolved_links.append(replace(link, target_path=target_path, target_exists=bool(target_path)))
    doc.links = resolved_links
    return doc


def _iso(timestamp: float) -> str:
    return datetime.fromtimestamp(timestamp, timezone.utc).isoformat()
```

- [ ] **Step 4: Run database/indexer tests and verify they pass**

Run:
```bash
python3 -m unittest code-scripts/vault_search/tests/test_database.py -v
```
Expected: PASS with 2 tests.
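For a manual spot check of the summary shape, `build_index` can also be called directly; the CLI in Task 5 wraps exactly this call:

```python
from pathlib import Path

from code_scripts.vault_search.indexer import build_index

# Index the current vault into the disposable database under tmp/.
summary = build_index(Path("."), Path("tmp/vault-search.sqlite"))
print(summary)  # {'documents': ..., 'links': ..., 'broken_links': ..., 'missing_tags': ..., 'wikilinks': ...}
```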
- [ ] **Step 5: Commit Task 4**

Run:
```bash
git add code_scripts/vault_search/indexer.py code-scripts/vault_search/tests/test_database.py
git commit -m "feat: build vault search index"
```

## Task 5: Add Search And Health CLI
**Files:**
- Create: `code_scripts/vault_search/cli.py`
- Create: `code-scripts/vault-search.py`
- Create: `code-scripts/vault-index.py`
- Create: `code-scripts/vault-health.py`
- Create: `code-scripts/vault_search/tests/test_cli.py`
- Modify: `code_scripts/vault_search/database.py`
- [ ] **Step 1: Write CLI tests**

Create `code-scripts/vault_search/tests/test_cli.py`:
```python
import json
import tempfile
import unittest
from pathlib import Path

from code_scripts.vault_search.cli import main


class CliTests(unittest.TestCase):
    def test_index_search_and_health_json(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            (root / "wiki").mkdir()
            (root / "wiki" / "ssl.md").write_text("---\ntags: [知识总结, network]\n---\n# SSL 证书\nSSL 证书用于 HTTPS。\n", encoding="utf-8")
            (root / "note.md").write_text("# Untagged\n[[Missing]]\n", encoding="utf-8")
            db_path = root / "tmp" / "vault-search.sqlite"
            index_code = main(["vault-index", "--root", str(root), "--db", str(db_path)])
            self.assertEqual(index_code, 0)
            search_output = main(["vault-search", "SSL", "--db", str(db_path), "--json"], capture=True)
            payload = json.loads(search_output)
            self.assertEqual(payload["results"][0]["path"], "wiki/ssl.md")
            health_output = main(["vault-health", "--db", str(db_path), "--json"], capture=True)
            health = json.loads(health_output)
            self.assertEqual(health["summary"]["missing_tags"], 1)
            self.assertEqual(health["summary"]["wikilinks"], 1)


if __name__ == "__main__":
    unittest.main()
```

- [ ] **Step 2: Run CLI tests and verify they fail**

Run:
```bash
python3 -m unittest code-scripts/vault_search/tests/test_cli.py -v
```
Expected: ERROR because `cli.py` does not exist.
- [ ] **Step 3: Add health query helpers**

Append to `code_scripts/vault_search/database.py`:
```python
def health_summary(db_path: Path) -> dict[str, int]:
    conn = sqlite3.connect(db_path)
    try:
        return {
            "documents": conn.execute("select count(*) from documents").fetchone()[0],
            "missing_frontmatter": conn.execute("select count(*) from documents where has_frontmatter = 0").fetchone()[0],
            "missing_tags": conn.execute("select count(*) from documents where has_tags = 0").fetchone()[0],
            "broken_links": conn.execute("select count(*) from links where target_exists = 0 and link_type = 'markdown'").fetchone()[0],
            "wikilinks": conn.execute("select count(*) from links where link_type = 'wikilink'").fetchone()[0],
        }
    finally:
        conn.close()
```

- [ ] **Step 4: Implement CLI**

Create `code_scripts/vault_search/cli.py`:
```python
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path

from .database import health_summary, search_documents
from .indexer import build_index

DEFAULT_DB = Path("tmp/vault-search.sqlite")


def main(argv: list[str] | None = None, capture: bool = False):
    argv = list(sys.argv[1:] if argv is None else argv)
    parser = argparse.ArgumentParser(prog="vault-search-tools")
    subparsers = parser.add_subparsers(dest="command", required=True)

    index_parser = subparsers.add_parser("vault-index")
    index_parser.add_argument("--root", default=".")
    index_parser.add_argument("--db", default=str(DEFAULT_DB))

    search_parser = subparsers.add_parser("vault-search")
    search_parser.add_argument("query")
    search_parser.add_argument("--db", default=str(DEFAULT_DB))
    search_parser.add_argument("--area")
    search_parser.add_argument("--tag", action="append", default=[])
    search_parser.add_argument("--limit", type=int, default=10)
    search_parser.add_argument("--json", action="store_true")

    health_parser = subparsers.add_parser("vault-health")
    health_parser.add_argument("--db", default=str(DEFAULT_DB))
    health_parser.add_argument("--json", action="store_true")

    args = parser.parse_args(argv)
    if args.command == "vault-index":
        summary = build_index(Path(args.root), Path(args.db))
        text = json.dumps({"summary": summary}, ensure_ascii=False, indent=2)
        return _emit(text, capture, code=0)
    if args.command == "vault-search":
        results = search_documents(Path(args.db), query=args.query, limit=args.limit, area=args.area, tags=args.tag)
        payload = {"query": args.query, "results": results}
        if args.json:
            return _emit(json.dumps(payload, ensure_ascii=False, indent=2), capture)
        lines = [f"{item['path']} | {item['title']} | {', '.join(item['tags'])}" for item in results]
        return _emit("\n".join(lines), capture)
    if args.command == "vault-health":
        payload = {"summary": health_summary(Path(args.db))}
        if args.json:
            return _emit(json.dumps(payload, ensure_ascii=False, indent=2), capture)
        lines = [f"{key}: {value}" for key, value in payload["summary"].items()]
        return _emit("\n".join(lines), capture)
    return 2


def _emit(text: str, capture: bool, code: int = 0):
    if capture:
        return text
    print(text)
    return code
```

- [ ] **Step 5: Add wrapper scripts**
Create `code-scripts/vault-search.py`:

```python
#!/usr/bin/env python3
import sys

from code_scripts.vault_search.cli import main

raise SystemExit(main(["vault-search"] + sys.argv[1:]))
```

Create `code-scripts/vault-index.py`:

```python
#!/usr/bin/env python3
import sys

from code_scripts.vault_search.cli import main

raise SystemExit(main(["vault-index"] + sys.argv[1:]))
```

Create `code-scripts/vault-health.py`:

```python
#!/usr/bin/env python3
import sys

from code_scripts.vault_search.cli import main

raise SystemExit(main(["vault-health"] + sys.argv[1:]))
```

- [ ] **Step 6: Run CLI tests and verify they pass**

Run:
```bash
python3 -m unittest code-scripts/vault_search/tests/test_cli.py -v
```
Expected: PASS with 1 test.

- [ ] **Step 7: Commit Task 5**

Run:
```bash
git add code_scripts/vault_search code-scripts/vault-search.py code-scripts/vault-index.py code-scripts/vault-health.py code-scripts/vault_search/tests/test_cli.py
git commit -m "feat: add vault search cli"
```

## Task 6: Add Real Vault Smoke Tests And Documentation
**Files:**
- Create: `code-scripts/vault_search/README.md`
- Modify: `code_scripts/vault_search/cli.py`
- [ ] **Step 1: Run full unit test suite**

Run:
```bash
python3 -m unittest discover -s code-scripts/vault_search/tests -v
```
Expected: PASS for parser, discovery, database, indexer, and CLI tests.
- [ ] **Step 2: Build index for the real vault**

Run:
```bash
python3 code-scripts/vault-index.py --root . --db tmp/vault-search.sqlite
```
Expected: exit 0 and a JSON summary similar to:
```json
{
  "summary": {
    "documents": 75,
    "links": 20,
    "broken_links": 1,
    "missing_tags": 16,
    "wikilinks": 12
  }
}
```

Counts may differ as the vault changes. The important condition is that the command exits 0 and writes `tmp/vault-search.sqlite`.
- [ ] **Step 3: Smoke test search**

Run:
```bash
python3 code-scripts/vault-search.py "SSL" --db tmp/vault-search.sqlite --json
```
Expected: exit 0 and JSON containing a result with path `wiki/ssl-certificate.md` or a clipping about SSL certificates.

- [ ] **Step 4: Smoke test health**

Run:
```bash
python3 code-scripts/vault-health.py --db tmp/vault-search.sqlite --json
```
Expected: exit 0 and JSON with `summary.documents` greater than 0.
- [ ] **Step 5: Add usage documentation**

Create `code-scripts/vault_search/README.md`:

````markdown
# Vault Search

Local SQLite + FTS5 search tools for this Obsidian vault.

## Build The Index

```bash
python3 code-scripts/vault-index.py --root . --db tmp/vault-search.sqlite
```

The database is derived state and is safe to delete. It lives under tmp/, which is ignored by git.

## Search

```bash
python3 code-scripts/vault-search.py "java 泛型" --db tmp/vault-search.sqlite
python3 code-scripts/vault-search.py "SSL" --db tmp/vault-search.sqlite --json
python3 code-scripts/vault-search.py "网络" --area IT-learning --db tmp/vault-search.sqlite
```

## Health Checks

```bash
python3 code-scripts/vault-health.py --db tmp/vault-search.sqlite
python3 code-scripts/vault-health.py --db tmp/vault-search.sqlite --json
```

Health checks report missing frontmatter, missing tags, broken Markdown links, and wikilinks.

## Design

See docs/superpowers/specs/2026-05-10-vault-search-design.md.
````
- [ ] **Step 6: Commit Task 6**
Run:
```bash
git add code-scripts/vault_search/README.md
git commit -m "docs: document vault search tools"
## Final Verification
- [ ] **Step 1: Run all tests**

Run:
```bash
python3 -m unittest discover -s code-scripts/vault_search/tests -v
```
Expected: all tests pass.

- [ ] **Step 2: Rebuild real index**

Run:
```bash
python3 code-scripts/vault-index.py --root . --db tmp/vault-search.sqlite
```
Expected: exit 0 and summary JSON with `documents` greater than 0.

- [ ] **Step 3: Query JSON for AI contract**

Run:
```bash
python3 code-scripts/vault-search.py "SSL" --db tmp/vault-search.sqlite --json
```
Expected: valid JSON with top-level `query` and `results`.

- [ ] **Step 4: Query health JSON**

Run:
```bash
python3 code-scripts/vault-health.py --db tmp/vault-search.sqlite --json
```
Expected: valid JSON with top-level `summary`.

- [ ] **Step 5: Check git status**

Run:
```bash
git status --short
```
Expected: only intentional implementation files are modified or untracked. Existing unrelated `.agents/` and `AGENTS.md` may still be untracked and should not be staged unless the user asks.
## Spec Coverage Self-Review
- Local SQLite + FTS5 retrieval foundation: covered by Tasks 3-6.
- Markdown remains source of truth and index is derived state: covered by database location and rebuild flow.
- File exclusions and areas: covered by Task 2.
- Frontmatter, tags, aliases, headings, links, body content: covered by Tasks 1, 3, and 4.
- CLI-first workflow: covered by Task 5.
- JSON output for automation and AI retrieval: covered by Task 5 and final verification.
- Governance checks: covered by Task 5.
- No network service, embedding, Obsidian plugin, or Markdown modification in V1: preserved throughout the plan.
- Future semantic search, graph, and Web UI: intentionally documented in the spec, not implemented in V1.