noweb.rs is the heart of weaveback-tangle.
Three types work together: ChunkStore reads and expands chunk definitions;
ChunkWriter writes @file chunks to disk through
`SafeFileWriter`; Clip is the top-level façade that
glues them together and exposes the public API.
Source-map and baseline data produced here is persisted by
`WeavebackDb`. The CLI entry point that drives Clip is in
main.rs. See weaveback_tangle.adoc
for the module map and architecture.adoc
for the pipeline overview.
Chunk syntax
A chunk definition starts with a line matching the open pattern, followed by
body lines, and ends with a line matching the close pattern. The open
delimiter, close delimiter, chunk-end marker, and comment-marker prefixes are
all configurable; the defaults are <[, ]>, @, and #,//.
Using the classic << / >> notation for illustration:
// <<@file src/output.rs>>=    ← file chunk (declares an output file)
... body lines ...
// @                           ← chunk-end
// <<helper-chunk>>=           ← named chunk
... body lines ...
// @@
A chunk reference inside a body causes the referenced chunk to be expanded inline, with indentation accumulated:
// <<@file src/output.rs>>=
fn main() {
// <<body>>
}
// @@
Modifiers on definitions and references:
- @file: marks the chunk as an output file target.
- @replace: on a definition, discards all earlier definitions of this name.
- @reversed: on a reference, reverses the order in which accumulated definitions are emitted.
Data types
ChunkDef
A ChunkDef stores one contiguous block of text between a chunk-open line and
a chunk-close line. It records the definition’s base indentation — the
number of leading spaces on the open marker line. During expansion that many
spaces are stripped from every content line before the caller’s indentation
prefix is prepended, so indentation stays relative to where the chunk is
referenced, not where it was defined.
// <[noweb-chunk-def]>=
// @@
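The re-basing rule can be illustrated with a minimal sketch. The struct below is a hypothetical stand-in for the real ChunkDef (the field names and the blank-line handling are assumptions made for this example), and reindent is an illustrative helper, not a function from the crate:

```rust
/// Hypothetical stand-in for `ChunkDef`; field names are assumptions.
struct ChunkDef {
    /// Number of leading spaces on the chunk-open marker line.
    base_indent: usize,
    /// Body lines exactly as they appeared in the source.
    content: Vec<String>,
}

/// Strip up to `base_indent` leading spaces from each line, then prepend the
/// caller's indentation prefix. Blank lines are emitted empty (an assumption).
fn reindent(def: &ChunkDef, target_indent: &str) -> Vec<String> {
    def.content
        .iter()
        .map(|line| {
            if line.trim().is_empty() {
                return String::new();
            }
            // Count leading spaces, but never strip more than base_indent.
            let leading = line.chars().take_while(|c| *c == ' ').count();
            let strip = leading.min(def.base_indent);
            format!("{}{}", target_indent, &line[strip..])
        })
        .collect()
}
```

A definition indented by four spaces and referenced at two spaces therefore ends up two spaces deep, with nested lines keeping their relative offset.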
Errors
ChunkLocation is a (file_idx, line_no) pair used in error messages;
file_idx indexes into ChunkStore::file_names.
ChunkError covers every failure that can occur during parsing or expansion:
| Variant | When it is produced |
|---|---|
| | Expansion depth exceeds |
| | A chunk directly or indirectly references itself. |
| | A reference names a chunk that was never defined. Silently expands to nothing by default; fatal when |
| | An I/O failure in |
| | An |
// <[noweb-errors]>=
use Error;
// @@
NamedChunk
A single chunk name may accumulate multiple ChunkDef entries across one or
more source files. All definitions are emitted in sequence when the chunk is
expanded; the @reversed modifier on a reference reverses this order.
Reference tracking is handled externally by the caller (see write_files),
keeping expand_inner a pure function with no hidden mutable state.
// <[noweb-named-chunk]>=
// @@
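As a rough illustration of how definitions accumulate under one name, the sketch below uses a hypothetical Registry type that stores plain strings instead of ChunkDef values. The method names define and emit are invented for this example; the @replace behaviour follows the modifier description in the chunk-syntax section.

```rust
use std::collections::HashMap;

/// Hypothetical registry: chunk name -> bodies in definition order.
#[derive(Default)]
struct Registry {
    chunks: HashMap<String, Vec<String>>,
}

impl Registry {
    /// Append a definition. `replace` models the `@replace` modifier by
    /// discarding every earlier definition of the same name first.
    fn define(&mut self, name: &str, body: &str, replace: bool) {
        let defs = self.chunks.entry(name.to_string()).or_default();
        if replace {
            defs.clear();
        }
        defs.push(body.to_string());
    }

    /// Emit definitions in order, or back to front when `reversed` is set.
    /// An unknown name yields nothing.
    fn emit(&self, name: &str, reversed: bool) -> Vec<String> {
        let defs = match self.chunks.get(name) {
            Some(d) => d,
            None => return Vec::new(),
        };
        if reversed {
            defs.iter().rev().cloned().collect()
        } else {
            defs.clone()
        }
    }
}
```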
ChunkStore
ChunkStore holds all chunk definitions and drives expansion. It owns three
compiled regexes (open, slot, close) built once from the configurable delimiter
and comment-marker settings, plus the chunk registry and a list of source-file
names used for error messages.
Path safety
Two free functions gate output paths before any content is written.
path_is_safe rejects literal absolute paths, Windows-style drive paths, and
.. traversal components. It runs on every @file chunk name at parse time.
expand_tilde replaces a leading ~ with $HOME on Unix. A tilde-expanded
path resolves to an absolute path outside gen/ — it therefore bypasses
path_is_safe (which would reject it) and goes instead through the
allow_home gate in ChunkWriter::write_chunk.
// <[noweb-path-utils]>=
// @@
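The two gates might look roughly like the following sketch. Both functions are illustrative and deliberately carry different names from the real path_is_safe and expand_tilde: the checks are done textually so the result is the same on every platform, and the home directory is passed as a parameter instead of being read from $HOME.

```rust
/// Sketch of a path gate in the spirit of `path_is_safe` (not the real code).
fn path_looks_safe(name: &str) -> bool {
    let bytes = name.as_bytes();
    // Windows-style drive prefix such as `C:\` or `c:/`.
    let has_drive = bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':';
    // A leading separator marks an absolute path.
    let is_absolute = name.starts_with('/') || name.starts_with('\\');
    // Any `..` component, split on either separator style.
    let has_parent = name
        .split(|c: char| c == '/' || c == '\\')
        .any(|part| part == "..");
    !(has_drive || is_absolute || has_parent)
}

/// Sketch of tilde expansion. `home` is a parameter (an assumption made for
/// this example) so the function stays deterministic.
fn expand_tilde_with(name: &str, home: &str) -> String {
    match name.strip_prefix("~/") {
        Some(rest) => format!("{}/{}", home.trim_end_matches('/'), rest),
        None if name == "~" => home.to_string(),
        None => name.to_string(),
    }
}
```

Note that the expanded form of a ~/… name fails the first gate, which is why such paths need the separate allow_home check.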
Struct
// <[noweb-chunkstore-struct]>=
// @@
Constructor
ChunkStore::new compiles three regexes that encode the full delimiter
grammar. All three allow an optional comment prefix (alternated as #|// for
the default markers) and use the escaped delimiter strings:
- The open pattern matches chunk-definition headers: an optional comment prefix, the open delimiter, optional @replace and @file modifiers captured as named groups so they can be detected structurally rather than by scanning the whole line, the chunk name, and a = suffix.
- The slot pattern matches chunk references inside body lines: an optional comment prefix, the open delimiter, an optional @file or @reversed modifier (captured as its own group so @reversed can be detected structurally rather than by scanning the whole line), the chunk name, and the close delimiter.
- The close pattern matches chunk-end markers: an optional comment prefix followed by the chunk-end string (default @).
// <[noweb-chunkstore-new]>=
// @@
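To make the shape of a definition header concrete, here is a hand-written parser that recognises the same structure as the open pattern for the default delimiters. It is a std-only stand-in for the compiled regex, not the crate's code: OpenHeader and parse_open are hypothetical names, and collapsing whitespace inside the chunk name is an assumption of this sketch.

```rust
/// Parsed chunk-definition header (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct OpenHeader {
    name: String,
    is_file: bool,
    is_replace: bool,
}

/// Recognise `<comment>? <[ (@replace | @file)* name ]>=` with the default
/// delimiters. Returns `None` for any line that is not a definition header.
fn parse_open(line: &str) -> Option<OpenHeader> {
    let mut rest = line.trim();
    // Optional comment prefix: `//` or `#`.
    for marker in ["//", "#"] {
        if let Some(stripped) = rest.strip_prefix(marker) {
            rest = stripped.trim_start();
            break;
        }
    }
    // Open delimiter at the start, close delimiter plus `=` at the end.
    let inner = rest.strip_prefix("<[")?.strip_suffix("]>=")?;
    let mut is_file = false;
    let mut is_replace = false;
    let mut words = inner.split_whitespace().peekable();
    // Leading modifiers are consumed structurally, in any order.
    while let Some(&word) = words.peek() {
        match word {
            "@file" => is_file = true,
            "@replace" => is_replace = true,
            _ => break,
        }
        words.next();
    }
    let name = words.collect::<Vec<_>>().join(" ");
    if name.is_empty() {
        return None;
    }
    Some(OpenHeader { name, is_file, is_replace })
}
```

A reference line such as `// <[body]>` lacks the trailing = and is therefore not mistaken for a definition.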
Read loop
read scans a source text line by line, maintaining a simple three-state
machine.
@file chunks are registered in file_chunks on first appearance. Duplicate
@file definitions without @replace are pushed to parse_errors in strict
mode (fatal when write_files is called) or reported to stderr and skipped in
permissive mode, keeping the first definition rather than silently clobbering it.
// <[noweb-chunkstore-read]>=
// @@
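The duplicate-handling policy for @file chunks can be sketched as follows. FileChunks, Registered, and the message text are all hypothetical; only the strict versus permissive behaviour is taken from the description above.

```rust
use std::collections::HashSet;

/// Outcome of registering one `@file` definition (hypothetical type).
#[derive(Debug, PartialEq)]
enum Registered {
    First,
    Replaced,
    DuplicateSkipped,
}

/// Minimal stand-in for the part of the store that tracks `@file` chunks.
struct FileChunks {
    seen: HashSet<String>,
    strict: bool,
    parse_errors: Vec<String>,
}

impl FileChunks {
    fn register(&mut self, name: &str, replace: bool) -> Registered {
        // First appearance: simply record the name.
        if self.seen.insert(name.to_string()) {
            return Registered::First;
        }
        // A later definition carrying `@replace` is allowed to take over.
        if replace {
            return Registered::Replaced;
        }
        // Otherwise keep the first definition and report the duplicate.
        let msg = format!("duplicate @file chunk '{}' without @replace", name);
        if self.strict {
            self.parse_errors.push(msg); // surfaced later as a fatal error
        } else {
            eprintln!("warning: {}", msg);
        }
        Registered::DuplicateSkipped
    }
}
```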
Expander
ExpandState bundles the mutable state that threads through the recursive
expansion so that expand_inner does not exceed the argument-count limit.
ExpandResult is a type alias for the tuple returned by expand_with_map.
expand_inner is the core recursive function. It takes a chunk name, the
accumulated indentation prefix from all enclosing references, a mutable
ExpandState reference, a reference location for error messages, and whether
@reversed mode is active.
Each content line is either a slot (a chunk reference) or a plain line.
For plain lines the base indentation recorded at parse time is stripped and
target_indent is prepended; a NowebMapEntry is also produced so callers
can reconstruct the source map. For slot lines, expand_inner recurses,
extending the indent by the slot’s relative indentation within the definition.
The seen set provides O(1) cycle detection. A chunk name is inserted on
descent and removed on return, so sibling references to the same chunk do not
falsely trigger the cycle check. The parallel stack vector tracks the same
entries in order, enabling readable cycle traces like A → B → C → A when a
cycle is detected.
expand_inner is a pure function: it does not mutate ChunkStore state.
Reference tracking is performed externally by inserting chunk names into the
caller-provided referenced_chunks set, which check_unused_chunks then
consults.
// <[noweb-chunkstore-expand]>=
/// Mutable state threaded through the recursive chunk expansion.
/// Return type of `expand_with_map`: expanded lines, source-map entries,
/// referenced chunk names, and direct dependency edges.
type ExpandResult = ;
// @@
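The interplay of the seen set and the stack vector can be shown in isolation. In this sketch the chunk bodies are reduced to a hypothetical map from chunk name to the names it references, and walk is an illustrative function rather than expand_inner itself; it only demonstrates insert-on-descent, remove-on-return, and trace rendering.

```rust
use std::collections::{HashMap, HashSet};

/// Walk the reference graph depth-first. `refs` maps a chunk name to the
/// chunks it references (a simplification of real chunk bodies).
fn walk(
    name: &str,
    refs: &HashMap<&str, Vec<&str>>,
    seen: &mut HashSet<String>,
    stack: &mut Vec<String>,
) -> Result<(), String> {
    if seen.contains(name) {
        // Render the path that led here, closing the loop with `name`.
        let mut trace = stack.clone();
        trace.push(name.to_string());
        return Err(trace.join(" → "));
    }
    // Insert on descent.
    seen.insert(name.to_string());
    stack.push(name.to_string());
    let mut result = Ok(());
    if let Some(children) = refs.get(name) {
        for child in children {
            result = walk(child, refs, seen, stack);
            if result.is_err() {
                break;
            }
        }
    }
    // Remove on return, so sibling references to one chunk are not flagged.
    stack.pop();
    seen.remove(name);
    result
}
```

Keeping both structures costs a little memory but gives constant-time membership checks from the set and an ordered trace from the vector.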
Utilities
After all @file chunks are written, check_unused_chunks warns about named
chunks that were defined but never referenced — a common mistake when
refactoring literate sources.
// <[noweb-chunkstore-utils]>=
// @@
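Conceptually the check is a set difference between defined and referenced names. The helper below is a simplified illustration with a hypothetical signature; it takes only the non-@file chunk names and returns the unused ones instead of printing warnings.

```rust
use std::collections::HashSet;

/// Return the named chunks that never appear in `referenced`, sorted so the
/// resulting warnings come out in a stable order.
fn unused_chunks<'a>(named: &[&'a str], referenced: &HashSet<String>) -> Vec<&'a str> {
    let mut unused: Vec<&str> = named
        .iter()
        .copied()
        .filter(|name| !referenced.contains(*name))
        .collect();
    unused.sort_unstable();
    unused
}
```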
ChunkWriter
ChunkWriter borrows a SafeFileWriter and writes one @file chunk at a
time. It dispatches on whether the expanded path is absolute or relative:
- Relative paths go through SafeFileWriter::before_write / after_write, which stage to a NamedTempFile, run the optional formatter, check the modification baseline, and atomically copy the result to gen/.
- Absolute paths (only possible after tilde expansion of ~/… chunks) are written directly via fs::File::create. The allow_home flag in SafeWriterConfig gates this; without it the write is rejected as a SecurityViolation.
// <[noweb-chunkwriter]>=
// @@
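The dispatch decision itself is small enough to sketch on its own. Route and route_for are hypothetical names introduced for this example, the paths are assumed to be Unix-style, and no file is actually written.

```rust
use std::path::Path;

/// Which write path a chunk would take (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Route {
    /// Stage, format, baseline-check, then copy into `gen/`.
    Staged,
    /// Write straight to the absolute path.
    Direct,
    /// Absolute path without `allow_home`: refuse to write.
    Rejected,
}

/// Decide the route from the already tilde-expanded path and the flag.
fn route_for(expanded_path: &str, allow_home: bool) -> Route {
    if !Path::new(expanded_path).is_absolute() {
        Route::Staged
    } else if allow_home {
        Route::Direct
    } else {
        Route::Rejected
    }
}
```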
Clip
Clip composes ChunkStore and SafeFileWriter into the public API. Most
methods delegate directly to one of the two inner types.
Constructor and read / query methods
// <[noweb-clip-core]>=
// @@
In-memory tangle check
tangle_check is a free function (no SafeFileWriter involved) that reads a
set of source texts purely in memory and expands every @file chunk. It is
the oracle used by the /__apply HTTP endpoint to verify that an edited chunk
body still produces valid tangle output before the .adoc source file is
modified.
// <[noweb-tangle-check]>=
/// Verify that `texts` tangle without errors.
///
/// Each element of `texts` is a `(source_text, filename)` pair — the same
/// inputs you would pass to [`Clip::read`]. Every `@file` chunk is expanded
/// in memory; no filesystem I/O is performed.
///
/// Returns a map from output file path (relative to `gen/`) to its expanded
/// lines on success, or the first expansion error encountered.
// @@
Post-formatter remap
When a formatter (e.g. rustfmt) rewrites the output, the pre-formatter line
numbers stored in the source map no longer correspond to the lines the user
sees in their editor. remap_noweb_entries uses similar::TextDiff to build
a mapping from post-formatter line indices back to pre-formatter line indices,
then re-keys the NowebMapEntry vector accordingly.
The remap uses a three-tier strategy, applied in order, and assigns a
Confidence level to each output line based on which tier attributed it:
- Diff anchor (Confidence::Exact): similar::TextDiff maps Equal lines exactly from pre- to post-formatter positions.
- Contextual content hash (Confidence::HashMatch): for lines still unattributed, match by a normalised (prev, curr, next) triple, using three-line context rather than a single line. This eliminates false matches on trivial lines ({, }, etc.) and handles structure-preserving reorders such as import sorting. Ambiguous keys (same triple appearing in multiple pre-formatter lines from different chunks) are excluded from the hash map entirely: better no match than a wrong match.
- Bidirectional nearest-neighbour fill (Confidence::Inferred): remaining gaps are filled forward (prefer preceding source line) then backward (covers leading insertions prepended by the formatter).
Note: Attribution is approximate when a formatter makes large-scale semantic
changes (e.g. merges or splits blocks).
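The contextual-hash tier can be approximated with a small index builder. This is a simplified sketch, not the crate's implementation: context_index is a hypothetical name, normalisation only trims whitespace, and any repeated triple is treated as ambiguous regardless of which chunk it came from.

```rust
use std::collections::HashMap;

/// Build a lookup from a normalised (prev, curr, next) triple to the index of
/// the pre-formatter line it came from. Triples that occur more than once are
/// dropped so that an ambiguous key can never produce a wrong match.
fn context_index(lines: &[&str]) -> HashMap<(String, String, String), usize> {
    // Normalised text of line `i`, or an empty string past either end.
    let norm = |i: Option<usize>| -> String {
        i.and_then(|i| lines.get(i))
            .map(|l| l.trim().to_string())
            .unwrap_or_default()
    };
    let mut index: HashMap<(String, String, String), Option<usize>> = HashMap::new();
    for i in 0..lines.len() {
        let key = (norm(i.checked_sub(1)), norm(Some(i)), norm(Some(i + 1)));
        index
            .entry(key)
            .and_modify(|slot| *slot = None) // repeated sighting: mark ambiguous
            .or_insert(Some(i));
    }
    // Keep only the unambiguous keys.
    index
        .into_iter()
        .filter_map(|(key, slot)| slot.map(|i| (key, i)))
        .collect()
}
```

Post-formatter lines would then be looked up by building the same triple from their own neighbours.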
// <[noweb-remap]>=
/// Normalise a source line for content-hash matching:
/// strip leading/trailing whitespace and drop any trailing `//` comment.
// @@
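The bidirectional nearest-neighbour fill is the simplest of the three tiers to show in isolation. In this sketch each output line carries an optional pre-formatter line index; fill_gaps is a hypothetical helper, and the real code additionally records a Confidence value for each line.

```rust
/// Fill unattributed lines (`None`) from their nearest attributed neighbour.
/// Pass 1 copies forward from the preceding line; pass 2 copies backward to
/// cover a leading run that had no predecessor.
fn fill_gaps(attribution: &mut [Option<usize>]) {
    let mut last = None;
    for slot in attribution.iter_mut() {
        match *slot {
            Some(src) => last = Some(src),
            None => *slot = last,
        }
    }
    let mut next = None;
    for slot in attribution.iter_mut().rev() {
        match *slot {
            Some(src) => next = Some(src),
            None => *slot = next,
        }
    }
}
```

Running the forward pass first means an interior gap always inherits from the line above it, and only lines inserted at the very top fall back to the line below.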
Writing files and dry-run
write_files iterates over every @file chunk, calls expand_with_map to
get both the expanded lines and their NowebMapEntry source-map entries,
writes the content through a ChunkWriter, then stores the source-map entries
in the database’s noweb_map table.
After write_chunk runs (including any configured formatter), the formatted
output is read back from disk. If it differs from the pre-formatter content,
remap_noweb_entries re-keys the source-map entries using post-formatter
line numbers. This ensures that perform_trace always receives line numbers
consistent with the file the user sees in their editor.
After all files are written, unused-chunk warnings are emitted.
list_output_files resolves the same paths that write_files would write to,
without touching the filesystem. It is used by --dry-run.
// <[noweb-clip-write]>=
// @@
Assembly
The @file chunk assembles the module by expanding all sub-chunks in order.
Imports are inlined here; the sub-chunks contain the type definitions and
impl blocks.
// <[@file weaveback-tangle/src/noweb.rs]>=
use memchr;
use Regex;
use ;
use fs;
use ;
use ;
use crate;
use crateSafeWriterError;
use crateWeavebackError;
use crateSafeFileWriter;
use debug;
// <[noweb-chunk-def]>
// <[noweb-errors]>
// <[noweb-named-chunk]>
// <[noweb-path-utils]>
// <[noweb-chunkstore-struct]>
// <[noweb-chunkstore-new]>
// <[noweb-chunkstore-read]>
// <[noweb-chunkstore-expand]>
// <[noweb-chunkstore-utils]>
// <[noweb-chunkwriter]>
// <[noweb-remap]>
// <[noweb-clip-core]>
// <[noweb-tangle-check]>
// <[noweb-clip-write]>
// @@