The weaveback-macro parser transforms a flat token stream produced by the lexer
into a tree of ParseNode values. It is a hand-written deterministic pushdown
automaton (DPDA): each state owns its set of termination tokens and delegates
everything else to a shared opener/leaf fallthrough handler.
State Termination Table
Each row names a parser state and defines what tokens it acts on directly.
Every other token falls through to the opener/leaf handler in parse().
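| State | Terminates on | Accepted directly | Fallthrough |
|---|---|---|---|
| Block | BlockClose | none (only terminates) | all others via opener/leaf |
| Param | close-paren | comma; the three token kinds that are direct children of a param node | nested blocks, macros, and variables |
| Macro | never (invariant state) | none | error on any token |
| Comment | CommentClose | CommentOpen | all others ignored |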
Macro is never the top of the stack when a token arrives: it sits below
Param, which is always pushed immediately after Macro. Receiving a token
with Macro on top is an internal invariant violation.
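The table can be read as a pure classification function from (state, token) to an action. The sketch below is only an illustration of that reading, not the parser's actual code: the `Tok`, `State`, and `Action` enums and the `classify` function are hypothetical names, and `ParamLeaf` stands in for the three token kinds that are direct children of a param node.

```rust
// Hypothetical token kinds. Only BlockClose, CommentOpen and CommentClose are
// named in the text; ParamLeaf is a placeholder for the three param-child kinds.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok { BlockClose, CloseParen, Comma, ParamLeaf, CommentOpen, CommentClose, Other }

#[derive(Clone, Copy, Debug, PartialEq)]
enum State { Block, Param, Macro, Comment }

// What a state does with a token, mirroring the table columns.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Action { Terminate, Accept, Fallthrough, Ignore, InvariantViolation }

fn classify(state: State, tok: Tok) -> Action {
    match (state, tok) {
        // Block only terminates; everything else goes to the opener/leaf arm.
        (State::Block, Tok::BlockClose) => Action::Terminate,
        (State::Block, _) => Action::Fallthrough,
        // Param terminates on the closing paren and accepts a few kinds directly.
        (State::Param, Tok::CloseParen) => Action::Terminate,
        (State::Param, Tok::Comma) => Action::Accept,
        (State::Param, Tok::ParamLeaf) => Action::Accept,
        (State::Param, _) => Action::Fallthrough,
        // Macro must never be on top when a token arrives.
        (State::Macro, _) => Action::InvariantViolation,
        // Comment is total: it nests, closes, or swallows.
        (State::Comment, Tok::CommentClose) => Action::Terminate,
        (State::Comment, Tok::CommentOpen) => Action::Accept,
        (State::Comment, _) => Action::Ignore,
    }
}
```

For example, `classify(State::Macro, Tok::Other)` yields `Action::InvariantViolation`, which corresponds to the internal error described above.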
File Structure
The single output file is assembled from all the chunks defined below.
impl Parser Structure
The impl block assembles every method sub-chunk in declaration order. The sections below define each sub-chunk in the same sequence.
// <<parser impl>>=
// @
Module Preamble
// <<parser preamble>>=
use crateLineIndex;
use crate;
use File;
use ;
use Error;
/// The parser-specific error type.
// @
Serialization Helpers
JSON serialization of tokens and parse nodes, used only in tests to dump
the parse tree for inspection. Both impl blocks are gated #[cfg(test)]
so they compile away entirely in production builds.
// <<parser serialization>>=
// @
The Parser State Machine
Stack Frame Variants
Block carries the tag extent so BlockClose can validate the matching tag.
tag_len == 0 means an anonymous block (%{/%}).
Macro and Param are always paired: Macro sits below Param on the
stack. Param receives individual tokens; Macro is only ever visible
after Param is popped (at the closing parenthesis) so that handle_param can verify the
expected stack shape.
// <<parser state>>=
/// Stack frame state. Each variant owns its termination tokens;
/// everything else falls through to the shared opener/leaf handler.
///
/// `tag_len == 0` means an anonymous block (`%{`/`%}`).
///
/// `Macro` and `Param` are always paired: `Macro` sits below `Param` on the
/// stack. `Param` receives individual tokens; `Macro` is only ever visible
/// after `Param` is popped (at `)`) so that `handle_param` can verify the
/// expected stack shape.
// <<parser context>>
// <<parser block_tag_label>>
// @
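The Macro/Param pairing described in this section can be sketched as follows. This is a simplified illustration under stated assumptions: the real stack holds (ParserState, arena_index) pairs, while the sketch keeps only the state, and `open_macro` and `close_macro` are hypothetical helper names.

```rust
#[derive(Debug, PartialEq)]
enum ParserState {
    /// `tag_len == 0` marks an anonymous block.
    Block { tag_pos: usize, tag_len: usize },
    Macro,
    Param,
    Comment,
}

/// Opening a macro call pushes `Macro` and then immediately `Param`,
/// so `Macro` is never on top when the next token arrives.
fn open_macro(stack: &mut Vec<ParserState>) {
    stack.push(ParserState::Macro);
    stack.push(ParserState::Param);
}

/// On the closing paren: pop `Param`, then require `Macro` directly beneath.
/// Any other shape is reported as an error instead of panicking.
fn close_macro(stack: &mut Vec<ParserState>) -> Result<(), String> {
    match stack.pop() {
        Some(ParserState::Param) => {}
        other => return Err(format!("expected Param on top, found {:?}", other)),
    }
    match stack.pop() {
        Some(ParserState::Macro) => Ok(()),
        other => Err(format!("expected Macro below Param, found {:?}", other)),
    }
}
```

Calling `open_macro` followed by `close_macro` leaves the stack as it was; calling `close_macro` with a `Block` on top returns an error.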
Parse Context
ParseContext bundles the raw source bytes with a borrowed LineIndex.
The context is built once per parse() call and the index is built once by the
caller, so the O(n) newline scan happens at most once regardless of how many
errors are emitted.
// <<parser context>>=
/// Bundles the source bytes with a borrowed `LineIndex`.
/// The index is built once by the caller (who may already need it for lexer
/// error formatting) and passed in, so the O(n) newline scan never happens
/// more than once per source string.
// @
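To make the "scan once, look up many times" idea concrete, here is a minimal sketch. The `LineIndex` below is a hypothetical stand-in for the crate's type (its real fields and methods are not shown in this document), and `line_col` and `describe` are illustrative names.

```rust
/// Hypothetical stand-in for the crate's `LineIndex`:
/// byte offsets at which each line starts.
struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    /// The single O(n) newline scan.
    fn new(content: &[u8]) -> Self {
        let mut line_starts = vec![0];
        for (i, &b) in content.iter().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// 1-based (line, column) for a byte offset, found by binary search.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        let line = self.line_starts.partition_point(|&s| s <= offset) - 1;
        (line + 1, offset - self.line_starts[line] + 1)
    }
}

/// Source bytes plus a borrowed index; cheap to build per parse call.
struct ParseContext<'a> {
    content: &'a [u8],
    line_index: &'a LineIndex,
}

impl<'a> ParseContext<'a> {
    /// Format a position for a diagnostic without rescanning the source.
    fn describe(&self, offset: usize) -> String {
        let offset = offset.min(self.content.len());
        let (line, col) = self.line_index.line_col(offset);
        format!("{}:{}", line, col)
    }
}
```

Each call to `describe` costs a binary search over the line starts, so emitting many errors does not repeat the newline scan.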
Block Tag Label
A small free function centralises the formatting of block-tag error labels so
all error messages use the same %name{ / (anonymous) style.
// <<parser block_tag_label>>=
/// Format a block tag for error messages.
/// Anonymous blocks (`tag == ""`) render as `(anonymous)`;
/// named blocks render as `%name{` (open) or `%name}` (close).
// @
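A formatter with the behaviour described in the doc comment could look like the sketch below. The signature, in particular the `open` flag, is an assumption; only the rendered forms come from the text.

```rust
/// Format a block tag for error messages (hypothetical signature).
/// An empty tag renders as `(anonymous)`; a named tag renders as
/// `%name{` when `open` is true and `%name}` otherwise.
fn block_tag_label(tag: &str, open: bool) -> String {
    if tag.is_empty() {
        "(anonymous)".to_string()
    } else if open {
        // `{{` is an escaped literal `{` in a format string.
        format!("%{}{{", tag)
    } else {
        // `}}` is an escaped literal `}`.
        format!("%{}}}", tag)
    }
}
```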
Parser Struct and Arena
The Parser owns a flat arena of ParseNode values (nodes) and a stack of
(ParserState, arena_index) pairs. All tree structure is encoded as index
lists inside ParseNode::parts; no heap pointers are stored.
// <<parser struct>>=
// @
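The flat-arena layout can be illustrated with a small sketch. It is simplified and partly hypothetical: the `kind` field, `create_node`, `push_node`, and `close_top` are illustrative names, and the stack here stores bare arena indices rather than (ParserState, arena_index) pairs. `create_add_node` is named in the text, but its signature here is assumed.

```rust
#[derive(Debug)]
struct ParseNode {
    kind: &'static str, // hypothetical; the real node kinds are not shown here
    end_pos: usize,
    parts: Vec<usize>,  // arena indices of child nodes, no pointers
}

#[derive(Default)]
struct Parser {
    nodes: Vec<ParseNode>, // the arena
    stack: Vec<usize>,     // simplified: indices of the currently open nodes
}

impl Parser {
    /// Append a node to the arena and return its index.
    fn create_node(&mut self, kind: &'static str) -> usize {
        self.nodes.push(ParseNode { kind, end_pos: 0, parts: Vec::new() });
        self.nodes.len() - 1
    }

    /// Create a node and link it under the node on top of the stack, if any.
    fn create_add_node(&mut self, kind: &'static str) -> usize {
        let idx = self.create_node(kind);
        if let Some(&parent) = self.stack.last() {
            self.nodes[parent].parts.push(idx);
        }
        idx
    }

    /// `create_add_node` plus pushing a new stack frame in one call.
    fn push_node(&mut self, kind: &'static str) -> usize {
        let idx = self.create_add_node(kind);
        self.stack.push(idx);
        idx
    }

    /// Set `end_pos` on the top node; call before the matching `stack.pop()`.
    /// Returns `Err` on an empty stack instead of panicking.
    fn close_top(&mut self, end_pos: usize) -> Result<(), String> {
        let top = *self.stack.last().ok_or("close_top called on an empty stack")?;
        self.nodes[top].end_pos = end_pos;
        Ok(())
    }
}
```

Because children are plain indices, the whole tree lives in one `Vec` and can be grown while older nodes are still referenced by index.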
Arena Primitives
// <<parser arena>>=
/// Convenience: `create_add_node` + push a new stack frame in one call.
/// Set `end_pos` on the node currently at the top of the stack.
/// Must be called *before* the corresponding `stack.pop()`.
/// Returns `Err` rather than panicking so callers can propagate cleanly.
/// Close all open nodes and clear the stack. Called on both error and
/// normal termination paths to keep the tree structurally consistent.
// @
Block Tag Extraction
// <<parser block_tag>>=
/// Extract the tag sub-span from a `BlockOpen` or `BlockClose` token.
/// For `%{` / `%}` (length 2) the tag is empty (tag_len == 0).
/// For `%foo{` / `%foo}` (length > 2) the tag is bytes [pos+1 .. pos+length-1].
// @
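The span arithmetic in the doc comment above can be checked with a short sketch. The function names and signatures are assumptions; the rule itself (length 2 means an empty tag, otherwise the tag is bytes [pos+1 .. pos+length-1]) comes from the text.

```rust
/// Return `(tag_pos, tag_len)` for a BlockOpen/BlockClose token that spans
/// `pos .. pos + length` in the source (hypothetical signature).
fn block_tag(pos: usize, length: usize) -> (usize, usize) {
    if length <= 2 {
        // `%{` or `%}`: anonymous block, empty tag.
        (pos, 0)
    } else {
        // Skip the leading `%` and drop the trailing brace.
        (pos + 1, length - 2)
    }
}

/// Slice the tag bytes out of the source for comparison or diagnostics.
fn tag_bytes(content: &[u8], pos: usize, length: usize) -> &[u8] {
    let (tag_pos, tag_len) = block_tag(pos, length);
    &content[tag_pos..tag_pos + tag_len]
}
```

With the source `%foo{x%foo}%{%}`, the tokens at offsets 0 and 6 (length 5) both yield the tag `foo`, and the token at offset 11 (length 2) yields an empty tag.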
State Handlers
handle_block
Called when the top of the stack is a Block state. The only token a Block
responds to directly is BlockClose; everything else falls through.
tag_pos/tag_len come from the caller’s pattern match to avoid a redundant
second stack lookup.
// <<parser handle_block>>=
/// Handle a token when the top of the stack is a `Block`.
/// `tag_pos`/`tag_len` come directly from the caller's pattern match —
/// no second stack lookup needed.
/// Returns `Ok(true)` if the token was consumed (caller should `continue`).
// @
handle_param
Called when the top of the stack is a Param state. Handles comma (open next
param), close-paren (close param and its enclosing macro), and the three token
kinds that are direct children of a param node. Everything else falls through
to allow nested blocks, macros, and variables inside parameter values.
// <<parser handle_param>>=
/// Handle a token when the top of the stack is a `Param`.
/// Returns `Ok(true)` if the token was consumed (caller should `continue`).
// @
handle_comment
Comment state is total: it consumes every token it sees. The only interesting
tokens are CommentOpen (push a nested comment frame) and CommentClose (pop
the frame). Everything else is silently swallowed. Because this handler
always consumes, it returns Result<(), ParserError> rather than Result<bool,
ParserError>.
// <<parser handle_comment>>=
/// Handle a token when the top of the stack is a `Comment`.
/// Comment state always consumes every token, so no bool return is needed.
// @
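A total comment handler along these lines might look like the sketch below. The `Tok` and `State` enums are hypothetical and reduced to what the example needs; the real handler also maintains arena nodes, which is omitted here.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok { CommentOpen, CommentClose, Other }

#[derive(Debug, PartialEq)]
enum State { Block, Comment }

/// Called only when `Comment` is on top of the stack.
/// Every token is consumed, so the success type is `()` rather than `bool`.
fn handle_comment(stack: &mut Vec<State>, tok: Tok) -> Result<(), String> {
    match tok {
        // A nested comment opens a new frame.
        Tok::CommentOpen => stack.push(State::Comment),
        // Closing pops the frame; anything but Comment on top is a bug.
        Tok::CommentClose => match stack.pop() {
            Some(State::Comment) => {}
            other => return Err(format!("expected Comment on top, found {:?}", other)),
        },
        // Everything else inside a comment is silently swallowed.
        Tok::Other => {}
    }
    Ok(())
}
```

Nesting falls out of the stack discipline: an inner CommentClose pops only the inner frame, and the outer comment keeps swallowing tokens until its own close arrives.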
Main Dispatch Loop
parse() is the public entry point. It builds a ParseContext once, seeds
the stack with a synthetic root Block, then dispatches each token to the
appropriate state handler. Tokens not consumed by the state handler fall
through to the opener/leaf arm.
// <<parser parse>>=
/// Main parse function.
/// `content` is the raw source bytes — used for block-tag comparison and diagnostics.
/// `line_index` is borrowed from the caller, who may have built it already for
/// lexer-error formatting, so the O(n) newline scan happens at most once per source.
// @
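The overall shape of the loop (seed a root Block, dispatch on the top of the stack, fall through to the opener/leaf arm) can be sketched in miniature. This is a toy model under stated assumptions, not the real `parse()`: it has only Block and Comment states, ignores tags, positions, and the arena, and returns a count of leaf tokens instead of a tree. All names are illustrative.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok { BlockOpen, BlockClose, CommentOpen, CommentClose, Text }

#[derive(Clone, Copy, Debug, PartialEq)]
enum State { Block, Comment }

/// `Ok(true)`: token consumed; `Ok(false)`: fall through to opener/leaf.
fn handle_block(stack: &mut Vec<State>, tok: Tok) -> Result<bool, String> {
    if tok != Tok::BlockClose {
        return Ok(false);
    }
    if stack.len() == 1 {
        // Only the synthetic root is open, so there is nothing to close.
        return Err("BlockClose without a matching BlockOpen".to_string());
    }
    stack.pop();
    Ok(true)
}

/// Comment is total: it nests, closes, or swallows the token.
fn handle_comment(stack: &mut Vec<State>, tok: Tok) {
    match tok {
        Tok::CommentOpen => stack.push(State::Comment),
        Tok::CommentClose => { stack.pop(); }
        _ => {}
    }
}

/// Returns the number of leaf (Text) tokens outside comments.
fn parse(tokens: &[Tok]) -> Result<usize, String> {
    let mut stack = vec![State::Block]; // synthetic root Block
    let mut leaves = 0;
    for &tok in tokens {
        // Dispatch on the state at the top of the stack.
        match stack.last().copied() {
            Some(State::Block) => {
                if handle_block(&mut stack, tok)? {
                    continue;
                }
            }
            Some(State::Comment) => {
                handle_comment(&mut stack, tok);
                continue;
            }
            None => return Err("internal error: empty stack".to_string()),
        }
        // Shared opener/leaf arm for tokens the state did not consume.
        match tok {
            Tok::BlockOpen => stack.push(State::Block),
            Tok::CommentOpen => stack.push(State::Comment),
            Tok::Text => leaves += 1,
            Tok::BlockClose | Tok::CommentClose => {
                return Err(format!("unexpected {:?}", tok));
            }
        }
    }
    if stack.len() > 1 {
        return Err(format!("{} unclosed construct(s) at end of input", stack.len() - 1));
    }
    Ok(leaves)
}
```

The `bool` returned by `handle_block` is what lets one state handler decline a token and hand it to the shared arm, while the comment handler never declines.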
Token I/O
These methods read token streams from files or stdin (used by the
weaveback-macro binary when it receives tokens from a preceding pipeline
stage).
// <<parser token_io>>=
// @
Public API
Node accessors, JSON serialization, AST construction, and space-stripping are
grouped here. These are called by the evaluator and by ast/mod.rs.
// <<parser api>>=
/// Get a reference to a node by index
/// Get a mutable reference to a node by index
/// Get the root node's index (usually 0 if parse succeeded)
/// Process AST including space stripping
/// Direct build without space stripping
/// Strip ending spaces from a node's token
// @
Tests
The test module exercises the parser through the full lex→parse pipeline.
lex_parse is a helper that runs both stages and returns Ok(()) or an error
string; lex_parse_err additionally accepts lex-stage errors so that tests
for unclosed constructs can check both paths.
Tests are grouped into:
- Tagged block helpers: basic open/close matching and mismatches
- Tagged block, valid structures: nesting, mixed named/anonymous blocks, Unicode content, comments inside blocks
- Tagged block, mismatch errors: wrong tag name, crossed nesting
- Unclosed block errors: various ways to leave blocks open
- Unclosed macro / lex-level errors: unclosed %foo( and %/* … */
- EOF token must not appear in AST: guards against a historical bug where the zero-length EOF token leaked as a Text node
- Root block end_pos: the root node’s end_pos must equal the input length
// <<@file weaveback-macro/src/parser/tests.rs>>=
// src/parser/tests.rs
use crateLexer;
use crateLineIndex;
use crateParser;
use crate;
// -----------------------------------------------------------------------
// Tagged block helpers
// -----------------------------------------------------------------------
// -----------------------------------------------------------------------
// Helper: lex+parse expecting an error from either stage
// -----------------------------------------------------------------------
// -----------------------------------------------------------------------
// Tagged block — valid structures
// -----------------------------------------------------------------------
// -----------------------------------------------------------------------
// Tagged block — mismatch errors
// -----------------------------------------------------------------------
// -----------------------------------------------------------------------
// Unclosed block errors
// -----------------------------------------------------------------------
// -----------------------------------------------------------------------
// Unclosed macro / lex-level errors
// -----------------------------------------------------------------------
// -----------------------------------------------------------------------
// EOF token must not appear as a Text node in the AST
// -----------------------------------------------------------------------
// -----------------------------------------------------------------------
// Root block end_pos is set correctly
// -----------------------------------------------------------------------
// @