Output sinks and source tracing

output.rs defines the EvalOutput trait and its three implementations, the span types used to attribute output text back to source tokens, and the MacroMapEntry record stored in the macro_map database table.

Design rationale

The `EvalOutput` trait: pluggable output sinks

The evaluator calls push_str(text, span) for every piece of tracked text and push_untracked(text) for computed/script results. Decoupling the sink from the evaluator allows three strategies with no runtime overhead in the common case:

PlainOutput: Ignores span arguments entirely. push_str is #[inline] and compiles to a single String::push_str. This is the default path and has zero overhead compared to the old String-based accumulator.
TracingOutput: Records one SourceSpan per output line (the first tracked span on each line wins). Used when building the macro_map database table. Allocations are proportional to line count rather than token count — much cheaper than per-push recording.
PreciseTracingOutput: Records one SpanRange per source-token transition. Provides exact byte attribution for every tracked byte of the output; untracked (computed) text shows up as gaps between ranges. Used by the MCP weaveback_apply_fix oracle to verify that a source edit produces the expected output.
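
The decoupling described above can be sketched with a pared-down trait. This is an illustration only: `Sink`, `Plain`, `Recording`, and `render` are made-up names, and a span is reduced to a bare `u32` id. The real trait and sinks are defined in the chunks further down.

```rust
/// Pared-down stand-in for `EvalOutput`; a span is reduced to a `u32` id.
trait Sink {
    fn push_str(&mut self, text: &str, span_id: u32);
    fn push_untracked(&mut self, text: &str);
    fn finish(self) -> String;
}

/// Plain strategy: the span id is dropped, only the text is kept.
#[derive(Default)]
struct Plain {
    buf: String,
}

impl Sink for Plain {
    #[inline]
    fn push_str(&mut self, text: &str, _span_id: u32) {
        self.buf.push_str(text);
    }
    #[inline]
    fn push_untracked(&mut self, text: &str) {
        self.buf.push_str(text);
    }
    fn finish(self) -> String {
        self.buf
    }
}

/// Recording strategy: also remembers every span id it was handed.
#[derive(Default)]
struct Recording {
    buf: String,
    seen: Vec<u32>,
}

impl Sink for Recording {
    fn push_str(&mut self, text: &str, span_id: u32) {
        self.seen.push(span_id);
        self.buf.push_str(text);
    }
    fn push_untracked(&mut self, text: &str) {
        self.buf.push_str(text);
    }
    fn finish(self) -> String {
        self.buf
    }
}

/// Hypothetical evaluator step, generic over the sink. Each sink type gets
/// its own monomorphized copy of this function (static dispatch).
fn render<O: Sink>(out: &mut O) {
    out.push_str("Hello, ", 0);
    out.push_untracked("world");
    out.push_str("!\n", 1);
}

fn main() {
    let mut plain = Plain::default();
    render(&mut plain);
    print!("{}", plain.finish());

    let mut rec = Recording::default();
    render(&mut rec);
    println!("{:?}", rec.seen);
}
```

Because `render` is generic rather than taking a `dyn Sink`, the compiler can inline the plain sink's methods into the plain copy, which is the property the fast path relies on.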

`SpanKind`: why was this text produced?

Knowing where a piece of output came from is useful; knowing how it was produced is essential for the apply-back tool. SpanKind distinguishes:

Literal — raw text from the document or a literal block.
MacroBody — text from expanding a macro body.
MacroArg — text substituted from a call-site argument.
VarBinding — text from a %set variable.
Computed — script/builtin result with no direct source token.
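
As a rough illustration of why the classification matters to an apply-back tool, the sketch below maps each kind to the place a fix would have to be made. `edit_target` is a hypothetical helper, not part of output.rs, and the enum is repeated here without its serde derives so the block stands alone.

```rust
/// Copy of the `SpanKind` shape, without the serde derives.
#[derive(Debug, Clone, PartialEq)]
enum SpanKind {
    Literal,
    MacroBody { macro_name: String },
    MacroArg { macro_name: String, param_name: String },
    VarBinding { var_name: String },
    Computed,
}

/// Hypothetical helper: describe where a fix to this text would have to go.
/// Returns `None` for computed text, which has no source token to edit.
fn edit_target(kind: &SpanKind) -> Option<String> {
    match kind {
        SpanKind::Literal => Some("the literal text itself".to_string()),
        SpanKind::MacroBody { macro_name } => {
            Some(format!("the body of macro `{}`", macro_name))
        }
        SpanKind::MacroArg { macro_name, param_name } => Some(format!(
            "argument `{}` at a call to `{}`",
            param_name, macro_name
        )),
        SpanKind::VarBinding { var_name } => {
            Some(format!("the binding of variable `{}`", var_name))
        }
        SpanKind::Computed => None,
    }
}

fn main() {
    let kind = SpanKind::MacroArg {
        macro_name: "greet".to_string(),
        param_name: "name".to_string(),
    };
    println!("{:?}", edit_target(&kind));
    println!("{:?}", edit_target(&SpanKind::Computed));
}
```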

`TracingOutput` first-wins-per-line semantics

For each output line, the first push_str call that carries a span sets the line’s span. Subsequent tracked pushes on the same line are ignored (span already set). Untracked pushes advance line counters without setting a span. This models the most common query: "what source line corresponds to output line N?" The first token on the line is the best answer.
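
The rule can be modelled in a few lines. This is a simplified sketch, not the real sink: spans are `u32` ids, `None` marks an untracked push, and the model assumes a `\n` only ever appears as the last byte of a push (multi-line pushes are handled by the real implementation below). The function name `line_spans` is made up for the example.

```rust
/// Simplified first-wins model. Each push is `(text, span)`, where `None`
/// marks an untracked push. Returns one entry per completed output line.
/// Assumption: a '\n' only occurs as the final byte of a push.
fn line_spans(pushes: &[(&str, Option<u32>)]) -> Vec<Option<u32>> {
    let mut completed = Vec::new();
    let mut current: Option<u32> = None;
    for &(text, span) in pushes {
        if text.is_empty() {
            continue;
        }
        // The first tracked push on a line claims it; later pushes do not
        // overwrite, and an untracked push (`None`) never claims anything.
        if current.is_none() {
            current = span;
        }
        // A trailing newline closes the line and resets the open span.
        if text.ends_with('\n') {
            completed.push(current.take());
        }
    }
    completed
}

fn main() {
    let spans = line_spans(&[
        ("let x = ", Some(10)),
        ("42", None),
        (";\n", Some(11)),
        ("computed\n", None),
        ("tail", Some(12)),
    ]);
    println!("{:?}", spans);
}
```

The first line is attributed to span 10 even though span 11 also contributed to it, the second line consists only of untracked text and gets `None`, and the unterminated `tail` is not yet a completed line.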

Output sink types

File structure

// <<@file weaveback-macro/src/evaluator/output.rs>>=
// <<output span kind>>
// <<output source span>>
// <<output eval output trait>>
// <<output plain output>>
// <<output tracing output>>
// <<output macro map entry>>
// <<output tracing into macro map>>
// <<output span range>>
// <<output precise tracing output>>
// @

`SpanKind` — classification of how output was produced

// <<output span kind>>=
// crates/weaveback-macro/src/evaluator/output.rs

/// Indicates how a piece of output relates to the original source.
#[derive(Debug, Clone, PartialEq, serde::Serialize, serde::Deserialize)]
pub enum SpanKind {
    /// Literal text from the source document or a textual block.
    Literal,
    /// Text substituted from expanding a macro body.
    MacroBody {
        macro_name: String,
    },
    /// Text substituted from an argument value at a macro call site.
    MacroArg {
        macro_name: String,
        param_name: String,
    },
    /// Text substituted from a variable binding (e.g. a `%set` variable),
    /// including global settings expanded outside any macro context.
    VarBinding {
        var_name: String,
    },
    /// Text generated programmatically (e.g. Rhai/Python script results, builtins)
    /// that has no direct corresponding source token for its content.
    Computed,
}
// @

`SourceSpan` — byte-offset reference into a source file

SourceSpan mirrors the fields of Token (src, pos, length) so that no conversion is needed when creating a span from an AST node’s token. Line and column numbers are derived on demand via LineIndex — they are not cached here to keep the struct small.

// <<output source span>>=
/// Byte-offset span referencing the source token that produced a piece of output.
///
/// Fields mirror `Token.src`, `Token.pos`, `Token.length` — no conversion needed.
/// Line/col can be derived on demand via `LineIndex`.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct SourceSpan {
    /// Source file index (same as `Token.src`).
    pub src: u32,
    /// Byte offset in the source string (same as `Token.pos`).
    pub pos: usize,
    /// Byte length of the span (same as `Token.length`).
    pub length: usize,
    /// The kind of expansion that produced this text.
    pub kind: SpanKind,
}
// @
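
Deriving line and column on demand from a byte offset can be sketched as follows. `MiniLineIndex` is a hypothetical stand-in written for this example; the real `LineIndex` is not shown in this file, and its constructor and return conventions may differ. The sketch assumes 1-based lines and 1-based byte columns.

```rust
/// Hypothetical stand-in for `LineIndex`: the byte offset at which each
/// line of the source starts.
struct MiniLineIndex {
    line_starts: Vec<usize>,
}

impl MiniLineIndex {
    /// One O(n) scan over the source records where every line begins.
    fn new(src: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in src.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        Self { line_starts }
    }

    /// 1-based (line, byte column) for a byte offset, via binary search.
    fn line_col(&self, pos: usize) -> (usize, usize) {
        // Count of line starts <= pos; at least 1 because line_starts[0] == 0.
        let line = self.line_starts.partition_point(|&start| start <= pos);
        (line, pos - self.line_starts[line - 1] + 1)
    }
}

fn main() {
    let idx = MiniLineIndex::new("ab\ncde\n");
    // Byte 4 is the 'd' on the second line.
    println!("{:?}", idx.line_col(4));
}
```

Keeping only `src`, `pos`, and `length` in the span and paying for the lookup when a line number is actually needed is what keeps the struct small.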

`EvalOutput` trait

push_str is the hot path — called for every literal text token and every macro argument that expands to non-empty text. push_untracked is called for built-in results and script outputs. finish consumes the accumulator and returns the assembled string. is_tracing signals whether the caller should invest the extra effort of per-argument span threading.

// <<output eval output trait>>=
/// Generic output sink for the evaluator.
///
/// The evaluator calls `push_str` for every piece of text it produces,
/// providing the `SourceSpan` of the token that generated it.
/// `push_untracked` is used for text whose origin cannot be attributed to
/// a single source span (e.g. Rhai/Python script results).
pub trait EvalOutput {
    /// Append `text` that originated at `span` in the source.
    fn push_str(&mut self, text: &str, span: SourceSpan);

    /// Append text with no span information (computed/script results).
    fn push_untracked(&mut self, text: &str);

    /// Consume the accumulator and return the rendered string.
    fn finish(self) -> String;

    /// Returns `true` when the sink records precise spans; only
    /// `PreciseTracingOutput` overrides the default `false`.
    /// Used to opt into per-argument span threading in `evaluate_macro_call_to`.
    fn is_tracing(&self) -> bool {
        false
    }
}
// @
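
To show what implementing the trait involves, here is a sketch of a fourth, purely hypothetical sink that tallies how many output bytes carried a span. `CoverageOutput` does not exist in the codebase, and `SourceSpan` is trimmed to three fields (no `kind`) so the block stands alone.

```rust
/// Trimmed copy of `SourceSpan` (the `kind` field is omitted here).
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct SourceSpan {
    src: u32,
    pos: usize,
    length: usize,
}

/// Same method set as the trait above.
trait EvalOutput {
    fn push_str(&mut self, text: &str, span: SourceSpan);
    fn push_untracked(&mut self, text: &str);
    fn finish(self) -> String;
    fn is_tracing(&self) -> bool {
        false
    }
}

/// Hypothetical sink: counts the bytes that arrived with a span attached.
#[derive(Default)]
struct CoverageOutput {
    buf: String,
    tracked_bytes: usize,
}

impl EvalOutput for CoverageOutput {
    fn push_str(&mut self, text: &str, _span: SourceSpan) {
        self.tracked_bytes += text.len();
        self.buf.push_str(text);
    }

    fn push_untracked(&mut self, text: &str) {
        self.buf.push_str(text);
    }

    fn finish(self) -> String {
        self.buf
    }
    // `is_tracing` is not overridden, so the default `false` applies and a
    // caller would skip per-argument span threading for this sink.
}

fn main() {
    let mut out = CoverageOutput::default();
    out.push_str("abc", SourceSpan { src: 0, pos: 0, length: 3 });
    out.push_untracked("d");
    println!("tracked {} of {} bytes", out.tracked_bytes, out.buf.len());
    println!("{}", out.finish());
}
```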

`PlainOutput` — zero-overhead fast path

// <<output plain output>>=
/// Fast-path output accumulator — ignores span info, just collects text.
///
/// This is functionally identical to the existing `String`-based output in
/// `Evaluator::evaluate()`.  Zero overhead: span arguments are discarded.
#[derive(Debug)]
pub struct PlainOutput {
    buf: String,
}

impl PlainOutput {
    pub fn new() -> Self {
        Self {
            buf: String::new(),
        }
    }
}

impl Default for PlainOutput {
    fn default() -> Self {
        Self::new()
    }
}

impl EvalOutput for PlainOutput {
    #[inline]
    fn push_str(&mut self, text: &str, _span: SourceSpan) {
        self.buf.push_str(text);
    }

    #[inline]
    fn push_untracked(&mut self, text: &str) {
        self.buf.push_str(text);
    }

    fn finish(self) -> String {
        self.buf
    }
}
// @

`TracingOutput` — per-line source attribution

TracingOutput records one optional SourceSpan per completed output line. When a \n byte is encountered in push_str or push_untracked, advance_line moves the current span into line_spans. If tracked text continues after the \n within the same push_str call (a multi-line literal), the span is propagated to the next line so intermediate lines are not left unattributed.

// <<output tracing output>>=
/// Output accumulator that records one source span per output line.
///
/// For each completed output line the first tracked `push_str` span on that
/// line is stored.  Untracked pushes (Rhai results, builtins) advance the line
/// counter but do not contribute a span.
///
/// This is much cheaper than recording per-push-call byte offsets: allocations
/// are proportional to line count rather than token count.
#[derive(Debug)]
pub struct TracingOutput {
    buf: String,
    /// One entry per completed output line (terminated by `\n`).
    /// `None` means that line had no tracked source span.
    line_spans: Vec<Option<SourceSpan>>,
    /// Span for the current, still-open output line.
    current_line_span: Option<SourceSpan>,
}

impl TracingOutput {
    pub fn new() -> Self {
        Self {
            buf: String::new(),
            line_spans: Vec::new(),
            current_line_span: None,
        }
    }

    fn advance_line(&mut self) {
        self.line_spans.push(self.current_line_span.take());
    }
}

impl Default for TracingOutput {
    fn default() -> Self {
        Self::new()
    }
}

impl EvalOutput for TracingOutput {
    fn push_str(&mut self, text: &str, span: SourceSpan) {
        if text.is_empty() {
            return;
        }
        // First tracked span on each line wins.
        if self.current_line_span.is_none() {
            self.current_line_span = Some(span.clone());
        }
        let bytes = text.as_bytes();
        for (i, &b) in bytes.iter().enumerate() {
            if b == b'\n' {
                self.advance_line();
                // Propagate span to the next line only when more content follows
                // this '\n' within the same push_str (intermediate line of a
                // multi-line literal).  If '\n' is the last byte, leave
                // current_line_span as None so the next push_str starts fresh.
                if i + 1 < bytes.len() && self.current_line_span.is_none() {
                    self.current_line_span = Some(span.clone());
                }
            }
        }
        self.buf.push_str(text);
    }

    fn push_untracked(&mut self, text: &str) {
        if text.is_empty() {
            return;
        }
        // Untracked text advances line boundaries but does not set a span.
        for b in text.bytes() {
            if b == b'\n' {
                self.advance_line();
            }
        }
        self.buf.push_str(text);
    }

    fn finish(self) -> String {
        self.buf
    }
}
// @
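
The newline handling of a single tracked push can be traced in isolation. The function below is a model written for this explanation, not code from output.rs: spans are `u32` ids, and `trace_push` is a made-up name. It returns the spans recorded for the lines the push completes, plus the span left on the still-open line.

```rust
/// Model of one tracked `push_str` call.
/// `open` is the span already claimed for the open line (if any);
/// `span` is the span passed with this push.
fn trace_push(text: &str, span: u32, open: Option<u32>) -> (Vec<Option<u32>>, Option<u32>) {
    let mut completed = Vec::new();
    if text.is_empty() {
        return (completed, open);
    }
    // First-wins: an already claimed line keeps its span.
    let mut current = open.or(Some(span));
    for (i, b) in text.bytes().enumerate() {
        if b == b'\n' {
            completed.push(current.take());
            // More text follows this newline in the same push, so the next
            // line belongs to the same span. A trailing newline leaves the
            // open line unclaimed.
            if i + 1 < text.len() {
                current = Some(span);
            }
        }
    }
    (completed, current)
}

fn main() {
    // A three-line literal pushed in one call.
    println!("{:?}", trace_push("a\nb\nc", 7, None));
    // A push ending in a newline leaves the next line unclaimed.
    println!("{:?}", trace_push("a\n", 7, None));
    // The open line was already claimed by span 3; span 7 gets the next one.
    println!("{:?}", trace_push("x\ny", 7, Some(3)));
}
```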

`MacroMapEntry` — database record

// <<output macro map entry>>=
/// A serialized entry stored in the `macro_map` database table.
/// It maps an output line (indirectly via the table key) to the original
/// `.md` source file that generated it.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct MacroMapEntry {
    /// Path of the source (literate) file containing the original text.
    pub src_file: String,
    /// 0-indexed line number within the source file.
    pub src_line: u32,
    /// 0-indexed column (byte offset) within the source line.
    pub src_col: u32,
    /// The kind of macro expansion that produced this text.
    pub kind: SpanKind,
}
// @

`TracingOutput::into_macro_map_entries`

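Converts the per-line span records into MacroMapEntry values, each paired with its 0-based output line index. The final open line (if the output does not end with \n) is included via a chained iterator. Lines with no tracked span are silently skipped, as are lines whose span refers to a source file or source text that SourceManager cannot resolve. The line and column returned by LineIndex::line_col are treated as 1-based and converted to the 0-based src_line and src_col stored in the entry.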

// <<output tracing into macro map>>=
use crate::evaluator::state::SourceManager;
use crate::line_index::LineIndex;

impl TracingOutput {
    /// Convert the per-line span records into `MacroMapEntry`s suitable for
    /// storage in the `macro_map` database table.
    ///
    /// Returns a list of `(expanded_line_index, MacroMapEntry)`.
    pub fn into_macro_map_entries(
        &self,
        sources: &SourceManager,
    ) -> Vec<(u32, MacroMapEntry)> {
        // Collect completed lines, plus the final open line if the output does
        // not end with a newline.
        let final_span = if !self.buf.is_empty() && !self.buf.ends_with('\n') {
            Some(self.current_line_span.as_ref())
        } else {
            None
        };

        let base = self.line_spans.iter().map(|s| s.as_ref());
        let all: Box<dyn Iterator<Item = Option<&SourceSpan>>> = match final_span {
            Some(s) => Box::new(base.chain(std::iter::once(s))),
            None => Box::new(base),
        };

        // Cache one LineIndex per source file so repeated lookups into the same
        // file are O(log n) rather than O(n).
        let mut line_index_cache: std::collections::HashMap<u32, LineIndex> =
            std::collections::HashMap::new();
        let mut results = Vec::new();
        for (line_idx, maybe_span) in all.enumerate() {
            let Some(span) = maybe_span else { continue };
            let Some(src_path) = sources.source_files().get(span.src as usize) else {
                continue;
            };
            let Some(src_bytes) = sources.get_source(span.src) else {
                continue;
            };
            let line_index = line_index_cache
                .entry(span.src)
                .or_insert_with(|| LineIndex::from_bytes(src_bytes));
            let (line_1, col_1) = line_index.line_col(span.pos);
            results.push((
                line_idx as u32,
                MacroMapEntry {
                    src_file: src_path.to_string_lossy().into_owned(),
                    src_line: (line_1 - 1) as u32,
                    src_col: (col_1 - 1) as u32,
                    kind: span.kind.clone(),
                },
            ));
        }
        results
    }
}
// @
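
The "completed lines plus an optional final line" step can also be written without boxing, because `Option` implements `IntoIterator` and so chains directly onto another iterator. The sketch below shows that alternative on a simplified model (spans are `u32` ids; `all_line_spans` and `attributed` are names invented here). It is not how the chunk above is written, only a possible variation.

```rust
/// Completed line spans, followed by the open line's span when the buffer
/// does not end with a newline.
fn all_line_spans(buf: &str, line_spans: &[Option<u32>], open: Option<u32>) -> Vec<Option<u32>> {
    // `Some(open)` when there is an unterminated last line, else `None`.
    let final_span: Option<Option<u32>> =
        (!buf.is_empty() && !buf.ends_with('\n')).then_some(open);
    // Chaining an `Option` appends zero or one extra item.
    line_spans.iter().copied().chain(final_span).collect()
}

/// Pair each attributed line with its 0-based output line index. Lines
/// without a span are skipped, but they still consume an index.
fn attributed(all: &[Option<u32>]) -> Vec<(u32, u32)> {
    all.iter()
        .enumerate()
        .filter_map(|(i, span)| span.map(|id| (i as u32, id)))
        .collect()
}

fn main() {
    // "a\n" is complete (span 1), "\n" is an untracked line, "c" is open (span 3).
    let all = all_line_spans("a\n\nc", &[Some(1), None], Some(3));
    println!("{:?}", all);
    println!("{:?}", attributed(&all));
}
```

Skipped lines keep their place in the numbering, which is why the enumeration happens before the filtering, as in the loop above.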

`SpanRange` — a contiguous attributed byte range

SpanRange is used by PreciseTracingOutput and by TrackedValue.spans. The start/end fields index into the output buffer (or variable value); span gives the source token that produced those bytes.

// <<output span range>>=
/// A contiguous byte range in the output attributed to one source token.
/// Gaps (script/builtin results) are absent from the list.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct SpanRange {
    pub start: usize,
    pub end: usize,
    pub span: SourceSpan,
}
// @
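
Since `start` and `end` are byte offsets into the output, the text attributed to a token is recovered by slicing. A small sketch, with `SpanRange` cut down to a token id in place of the full `SourceSpan` and `attributed_text` as a hypothetical helper:

```rust
/// Simplified `SpanRange`: the `span` field is reduced to a token id.
struct SpanRange {
    start: usize,
    end: usize,
    token: u32,
}

/// Return the output text attributed to each range, in order.
/// `str::get` is used so an out-of-bounds or mid-character range is
/// dropped instead of panicking.
fn attributed_text<'a>(output: &'a str, ranges: &[SpanRange]) -> Vec<(u32, &'a str)> {
    ranges
        .iter()
        .filter_map(|r| output.get(r.start..r.end).map(|text| (r.token, text)))
        .collect()
}

fn main() {
    // Bytes 8..10 ("42") are a gap: they came from an untracked push.
    let output = "let x = 42;\n";
    let ranges = [
        SpanRange { start: 0, end: 8, token: 1 },
        SpanRange { start: 10, end: 12, token: 2 },
    ];
    println!("{:?}", attributed_text(output, &ranges));
}
```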

`PreciseTracingOutput` — exact per-byte attribution

PreciseTracingOutput coalesces consecutive pushes from the same token (same src, pos, and length) into a single SpanRange. flush_current is called when the token changes, on push_untracked, and once more from into_parts. Gaps (untracked segments) are simply absent from the ranges list; span_at_byte binary-searches the ranges and returns None for an offset that falls in a gap.

// <<output precise tracing output>>=
/// Output accumulator with exact per-byte source attribution.
///
/// Records one `SpanRange` entry per source-token transition — far fewer
/// entries than bytes, and no granularity tradeoff.
///
/// Use `into_parts()` to obtain `(output_string, Vec<SpanRange>)`.
/// Use `span_at_byte` to query which span covers a given byte offset.
#[derive(Debug, Default)]
pub struct PreciseTracingOutput {
    buf: String,
    ranges: Vec<SpanRange>,
    current_span: Option<SourceSpan>,
    current_start: usize,
}

impl PreciseTracingOutput {
    pub fn new() -> Self {
        Self::default()
    }

    fn flush_current(&mut self) {
        if let Some(span) = self.current_span.take() {
            self.ranges.push(SpanRange {
                start: self.current_start,
                end: self.buf.len(),
                span,
            });
        }
    }

    /// Consume and return `(output_string, span_ranges)`.
    /// The ranges are sorted by `start` and cover only tracked regions.
    pub fn into_parts(mut self) -> (String, Vec<SpanRange>) {
        self.flush_current();
        (self.buf, self.ranges)
    }

    /// Return the `SourceSpan` covering `byte_offset`, or `None` for untracked gaps.
    pub fn span_at_byte(ranges: &[SpanRange], byte_offset: usize) -> Option<&SourceSpan> {
        let idx = ranges.partition_point(|sr| sr.start <= byte_offset);
        if idx == 0 {
            return None;
        }
        let sr = &ranges[idx - 1];
        if byte_offset < sr.end { Some(&sr.span) } else { None }
    }
}

impl EvalOutput for PreciseTracingOutput {
    fn is_tracing(&self) -> bool {
        true
    }

    fn push_str(&mut self, text: &str, span: SourceSpan) {
        if text.is_empty() {
            return;
        }
        let same = self.current_span.as_ref().is_some_and(|s| {
            s.src == span.src && s.pos == span.pos && s.length == span.length
        });
        if !same {
            self.flush_current();
            self.current_start = self.buf.len();
            self.current_span = Some(span);
        }
        self.buf.push_str(text);
    }

    fn push_untracked(&mut self, text: &str) {
        if text.is_empty() {
            return;
        }
        self.flush_current(); // end current span; gap follows
        self.buf.push_str(text);
    }

    fn finish(self) -> String {
        self.into_parts().0
    }
}
// @
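
A worked trace may help here. The model below reduces token identity to a single `u32` id (standing in for the `src`/`pos`/`length` triple) and uses invented names (`coalesce`, `owner_at`); it follows the same coalescing and lookup logic as the chunk above but is a sketch, not the real type.

```rust
/// (start, end, token id): a half-open byte range in the output.
type Range = (usize, usize, u32);

/// Build the output and its ranges from a list of pushes.
/// `None` as the token marks an untracked push.
fn coalesce(pushes: &[(&str, Option<u32>)]) -> (String, Vec<Range>) {
    let mut buf = String::new();
    let mut ranges = Vec::new();
    let mut current: Option<(usize, u32)> = None; // (start offset, token id)
    for &(text, token) in pushes {
        if text.is_empty() {
            continue;
        }
        let same = matches!((current, token), (Some((_, open)), Some(t)) if open == t);
        if !same {
            // Token changed (or the push is untracked): close the open range.
            if let Some((start, open)) = current.take() {
                ranges.push((start, buf.len(), open));
            }
            current = token.map(|t| (buf.len(), t));
        }
        buf.push_str(text);
    }
    // Final flush, as `into_parts` does.
    if let Some((start, open)) = current {
        ranges.push((start, buf.len(), open));
    }
    (buf, ranges)
}

/// Token covering `offset`, or `None` when the offset falls in a gap.
fn owner_at(ranges: &[Range], offset: usize) -> Option<u32> {
    // Index of the first range starting after `offset`; the candidate is
    // the range just before it.
    let idx = ranges.partition_point(|r| r.0 <= offset);
    let (_, end, token) = *ranges.get(idx.checked_sub(1)?)?;
    (offset < end).then_some(token)
}

fn main() {
    let (out, ranges) = coalesce(&[
        ("foo", Some(1)),
        ("bar", Some(1)), // same token: extends the open range
        ("42", None),     // untracked: leaves a gap
        ("baz", Some(2)),
    ]);
    println!("{out} {ranges:?}");
    println!("{:?} {:?}", owner_at(&ranges, 4), owner_at(&ranges, 6));
}
```

The two pushes from token 1 collapse into one range covering bytes 0..6, the untracked `42` occupies bytes 6..8 with no range, and token 2 covers 8..11. The lookup relies on the ranges being sorted by start offset, which holds because they are appended in output order.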
