db.rs owns the SQLite database that weaveback writes after every tangling
run. It is opened and written by `SafeFileWriter` (for
baselines) and by `Clip::write_files` (for noweb_map
entries). See weaveback_tangle.adoc for the
module map and architecture.adoc for the
full concurrency and apply-back context.
The database stores nine kinds of data:
- files — a path-interning table: every unique file path is stored once and given an integer ID. All other tables reference file paths through these IDs rather than repeating the full string on every row.
- gen_baselines — the last content weaveback wrote to each generated file, used to detect external edits between runs.
- noweb_map — a line-by-line source map from output lines back to their origin chunk and line in the literate source.
- macro_map — per-line tracing data from the macro expander.
- src_snapshots — byte-for-byte copies of the literate source files at the time of the last run; used by apply-back to reconstruct the original text.
- var_defs / macro_defs — byte-offset records for every %set / %def call, enabling fast "where was this defined?" lookups.
- chunk_defs — the line range of every chunk definition header and close marker in each literate source file; used by weaveback serve to open the right editor location for a chunk.
- source_blocks — logical blocks parsed from each literate source file (section headers, code/listing blocks, paragraphs), each with a BLAKE3 content hash. Used to drive sub-file-precision incremental building: only @file chunks whose source blocks changed need to be re-expanded and re-written.
Concurrency model
Weaveback builds an in-memory database during a run (open_temp), then
merge_into copies all tables into the target file database in a single
BEGIN IMMEDIATE write transaction. Using IMMEDIATE acquires the write lock
at BEGIN time rather than at the first write, so two concurrent processes
cannot interleave partial snapshots into the same target — one wins the lock and
the other waits up to the 200 ms busy-timeout. The target runs in WAL mode, so
read-only connections (MCP server, apply-back) never block merges and merges
never block readers.
File paths are interned independently in each in-memory database, so their
integer IDs may differ. merge_into resolves this by first copying all files
rows into the target and then remapping IDs via subquery lookups during each
table insert.
NowebMapEntry
Each row of noweb_map carries five fields:
- src_file — path of the literate source file containing the chunk definition.
- chunk_name — the name of the chunk that produced this output line.
- src_line — 0-indexed line number within the source file.
- indent — the indentation string prepended during expansion.
- confidence — how reliably the post-formatter line was traced back to this source line. Three values: exact (diff Equal match), hash_match (content hash match, survives reordering), inferred (nearest-neighbour fill). Old rows in existing databases default to exact via the column DEFAULT.
Schema
The schema is created on first open via apply_schema. All tables use
STRICT mode to catch type mismatches at the SQLite layer. File path columns
that were previously TEXT are now INTEGER REFERENCES files(id), eliminating
the redundant path storage on every row. Indexes on chunk_deps(to_chunk) and
noweb_map(src_file, src_line) keep reverse-dep and trace lookups O(log n).
// <[db-schema]>=
const CREATE_SCHEMA: &str = "
CREATE TABLE IF NOT EXISTS files (
id INTEGER PRIMARY KEY,
path TEXT NOT NULL UNIQUE
) STRICT;
CREATE TABLE IF NOT EXISTS gen_baselines (
path TEXT PRIMARY KEY NOT NULL,
content BLOB NOT NULL
) STRICT;
CREATE TABLE IF NOT EXISTS noweb_map (
out_file INTEGER NOT NULL REFERENCES files(id),
out_line INTEGER NOT NULL,
src_file INTEGER NOT NULL REFERENCES files(id),
chunk_name TEXT NOT NULL,
src_line INTEGER NOT NULL,
indent TEXT NOT NULL,
confidence TEXT NOT NULL DEFAULT 'exact',
PRIMARY KEY (out_file, out_line)
) STRICT, WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS macro_map (
driver_file INTEGER NOT NULL REFERENCES files(id),
expanded_line INTEGER NOT NULL,
data BLOB NOT NULL,
PRIMARY KEY (driver_file, expanded_line)
) STRICT;
CREATE TABLE IF NOT EXISTS src_snapshots (
path TEXT PRIMARY KEY NOT NULL,
content BLOB NOT NULL
) STRICT;
CREATE TABLE IF NOT EXISTS var_defs (
var_name TEXT NOT NULL,
src_file INTEGER NOT NULL REFERENCES files(id),
pos INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (var_name, src_file, pos)
) STRICT, WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS macro_defs (
macro_name TEXT NOT NULL,
src_file INTEGER NOT NULL REFERENCES files(id),
pos INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (macro_name, src_file, pos)
) STRICT, WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS chunk_deps (
from_chunk TEXT NOT NULL,
to_chunk TEXT NOT NULL,
src_file INTEGER NOT NULL REFERENCES files(id),
PRIMARY KEY (from_chunk, to_chunk, src_file)
) STRICT, WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS chunk_defs (
src_file INTEGER NOT NULL REFERENCES files(id),
chunk_name TEXT NOT NULL,
nth INTEGER NOT NULL DEFAULT 0,
def_start INTEGER NOT NULL,
def_end INTEGER NOT NULL,
PRIMARY KEY (src_file, chunk_name, nth)
) STRICT, WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS literate_source_config (
src_file INTEGER NOT NULL REFERENCES files(id),
special_char TEXT NOT NULL,
open_delim TEXT NOT NULL,
close_delim TEXT NOT NULL,
chunk_end TEXT NOT NULL,
comment_markers TEXT NOT NULL,
PRIMARY KEY (src_file)
) STRICT, WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS run_config (
key TEXT PRIMARY KEY NOT NULL,
value TEXT NOT NULL
) STRICT, WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS source_blocks (
src_file INTEGER NOT NULL REFERENCES files(id),
block_index INTEGER NOT NULL,
block_type TEXT NOT NULL,
line_start INTEGER NOT NULL,
line_end INTEGER NOT NULL,
content_hash BLOB NOT NULL,
PRIMARY KEY (src_file, block_index)
) STRICT, WITHOUT ROWID;
CREATE INDEX IF NOT EXISTS idx_chunk_deps_to ON chunk_deps(to_chunk);
CREATE INDEX IF NOT EXISTS idx_noweb_map_src ON noweb_map(src_file, src_line);
";
// @@
Error type and NowebMapEntry
// <[db-types]>=
use Error;
/// How reliably a post-formatter output line was traced back to its source.
/// One parsed logical block stored in `source_blocks`.
/// Location of a chunk definition within a literate source file.
/// `def_start` is the 1-indexed line of the open marker (`// <<name>>=`).
/// `def_end` is the 1-indexed line of the close marker (`// @@`).
// @@
WeavebackDb — open modes
WeavebackDb wraps a single rusqlite::Connection. Three constructors cover
the three use cases:
- open — read-write with WAL mode; used when writing the persistent weaveback.db directly (uncommon — most writes go through open_temp + merge_into).
- open_read_only — read-only, never blocks concurrent writers; used by the MCP server and for apply-back reads.
- open_temp — in-memory database that accumulates all writes during a single weaveback run; merge_into flushes it to the target file at the end.
intern_file inserts a path into files if it is not already there, then
returns its integer ID. All write methods call this before their transaction so
the IDs are available without opening a nested transaction.
needs_file_id_migration checks whether an on-disk database was created before
the file-ID schema was introduced by inspecting the column type of
noweb_map.out_file. apply_schema uses this to drop and recreate the
affected tables (while preserving gen_baselines and src_snapshots) before
running CREATE_SCHEMA.
// <[db-open]>=
/// Intern a file path: insert if not present, return the row id.
/// Detect whether the db uses the pre-file-ID schema (noweb_map.out_file is TEXT).
// @@
gen_baselines
set_baseline / get_baseline maintain the modification-detection baseline
for each generated file. list_baselines is used during merge and in tests.
// <[db-baselines]>=
// @@
noweb_map
set_noweb_entries writes all source-map rows for one output file in a single
transaction. All file paths are interned before the transaction opens so the
integer IDs are ready. get_noweb_entry is used by the weaveback where and
trace commands; it JOINs the files table to return path strings.
// <[db-noweb-map]>=
// @@
chunk_deps
set_chunk_deps replaces all dependency edges for a given source file in one
transaction. File paths are interned before the transaction; the unique-IDs
list drives the delete pass. query_chunk_deps returns everything a chunk
directly references (forward edges); query_reverse_deps returns everything
that directly references a chunk (backward edges — "what would break if I edit
this?"). query_all_chunk_deps returns every edge in the graph for DOT export.
query_chunk_output_files maps a chunk name to the gen/ files it contributes
lines to, enabling weaveback impact to report affected output files.
// <[db-chunk-deps]>=
// @@
chunk_defs
set_chunk_defs replaces all definition records for every source file in the
batch in a single transaction. File paths are interned before the transaction;
the unique-IDs list drives the delete pass. get_chunk_def retrieves a single
entry by (src_file, chunk_name, nth), used by weaveback serve to open the
correct editor location.
// <[db-chunk-defs-api]>=
// @@
macro_map
Pre-serialized entries (opaque BLOB per line) written by the macro expander
and read back during trace operations. The driver file path is interned before
the transaction; get_macro_map_bytes resolves the path via a JOIN.
// <[db-macro-map]>=
// @@
literate_source_config
set_source_config records the TangleConfig used for a given source file.
get_source_config retrieves it during trace or apply-back.
get_output_location and get_all_output_mappings are used by apply-back to
translate literate source positions into generated file positions.
set_run_config / get_run_config store free-form key→value pairs for the
current run.
// <[db-config]>=
// @@
source_blocks
set_source_blocks replaces all block rows for a given source file in one
transaction. get_source_block_hashes returns (block_index, content_hash)
pairs for a file — used by the incremental-build logic to detect which blocks
changed since the last run. query_blocks_overlapping_range returns all
blocks whose line range overlaps a given [line_start, line_end] interval,
enabling the caller to map a changed line range to a set of dirty blocks.
// <[db-source-blocks]>=
// @@
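The interval test behind query_blocks_overlapping_range is the standard closed-interval overlap condition, shown here in plain Rust for clarity (in SQL it would read `line_start <= ?range_end AND line_end >= ?range_start`):

```rust
/// Two closed line intervals overlap iff each starts no later than the
/// other ends. `block` is a row's (line_start, line_end); `changed` is
/// the dirty (range_start, range_end) the caller wants to map to blocks.
fn overlaps(block: (u32, u32), changed: (u32, u32)) -> bool {
    block.0 <= changed.1 && block.1 >= changed.0
}
```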
merge_into
merge_into copies all tables from the in-memory database into the persistent
file database in a single write transaction. The target is created and
WAL-initialized if it does not yet exist.
Because file paths are interned independently in each in-memory database their
integer IDs may differ. The merge resolves this by first copying all files
rows into the target (INSERT OR IGNORE), then remapping each data table’s ID
columns via subquery lookups: for every row, each file-ID column is translated
to the corresponding target ID by joining through the shared path string.
Tables without file-ID columns (gen_baselines, src_snapshots, run_config)
are copied with a simple SELECT *.
The BEGIN IMMEDIATE, inserts, and COMMIT are issued as separate
execute_batch calls so that an explicit ROLLBACK can be sent on any
failure. DETACH always runs, even on error, to avoid leaking the attachment.
SQLite’s ATTACH DATABASE does not support parameterized paths, so the path
is string-interpolated. The sqlite_string_literal helper encapsulates
single-quote escaping to prevent injection.
// <[db-merge]>=
/// Escape a string for use inside a SQLite single-quoted string literal.
// @@
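The escaping rule is small enough to show whole: SQL escapes a single quote inside a string literal by doubling it. A sketch of the helper named above:

```rust
/// Escape a string for interpolation into a SQL statement as a
/// single-quoted literal, doubling any embedded single quotes so an
/// attacker-controlled path cannot break out of the literal.
fn sqlite_string_literal(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}
```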
src_snapshots, var_defs, macro_defs
src_snapshots stores the raw bytes of each literate source file read during
a run; apply-back uses these to reconstruct the original text when patching.
var_defs and macro_defs record byte-offset spans for every %set and
%def call, enabling fast "where was this defined?" lookups without
re-running the macro expander. The source file path is interned before each
insert; queries JOIN through files to return path strings.
// <[db-rest]>=
// @@
Assembly
// <[@file weaveback-tangle/src/db.rs]>=
use ;
use Path;
// <[db-schema]>
// <[db-types]>
// <[db-open]>
// <[db-baselines]>
// <[db-noweb-map]>
// <[db-chunk-deps]>
// <[db-chunk-defs-api]>
// <[db-macro-map]>
// <[db-config]>
// <[db-source-blocks]>
// <[db-merge]>
// <[db-rest]>
// @@