Internals
Explore Wenmode’s node model, parser flow, rule dispatch, root transforms, state, and renderer internals.
Wenmode is organized around a small set of data objects and dispatch points: AST nodes, parser rules, root transforms, parser state, and renderers.
The AST is mdast-compatible. Core Markdown nodes use mdast-style names and
fields: root.children, paragraph.children, heading.depth, link.url,
link.title, image.url, image.alt, code.lang, and literal value
fields. Extensions use the same data-object style with explicit node types such
as table, footnoteReference, math, ruby, and the directive node family.
AST nodes
Node classes live in wenmode.nodes. They are dataclasses that describe parsed
content. Rendering behavior is not stored on the nodes; renderers decide how to
turn nodes into output.
from wenmode import Wenmode
text = '# Hello'
root = Wenmode().parse(text)
print(root.to_ast())
Node.to_ast() returns a plain dictionary representation, recursively
converting child nodes.
{
    'type': 'root',
    'children': [
        {
            'type': 'heading',
            'children': [{'type': 'text', 'value': 'Hello'}],
            'depth': 1,
        }
    ],
}
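Because to_ast() returns plain dictionaries, the result can be walked with ordinary Python and no Wenmode imports. The sketch below hard-codes a dictionary shaped like the output shown above; the helper names iter_nodes and collect_text are hypothetical and are not part of Wenmode.

```python
from typing import Any, Dict, Iterator

# A plain-dict tree shaped like the to_ast() output shown above.
tree: Dict[str, Any] = {
    'type': 'root',
    'children': [
        {
            'type': 'heading',
            'children': [{'type': 'text', 'value': 'Hello'}],
            'depth': 1,
        }
    ],
}


def iter_nodes(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield node and all of its descendants, depth-first (hypothetical helper)."""
    yield node
    for child in node.get('children', []):
        yield from iter_nodes(child)


def collect_text(node: Dict[str, Any]) -> str:
    """Concatenate every literal 'value' found under node (hypothetical helper)."""
    if 'value' in node:
        return node['value']
    return ''.join(collect_text(child) for child in node.get('children', []))


print([n['type'] for n in iter_nodes(tree)])
print(collect_text(tree))
```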
Nodes follow mdast-style type names where possible. Common node groups are:
Parent nodes, such as root, paragraph, heading, blockquote, list, listItem, table nodes, directive nodes, and formatting nodes.
Literal nodes, such as text, inlineCode, code, html, math, and inlineMath.
Leaf nodes, such as thematicBreak, break, image, and footnoteReference.
Nodes are pure data objects. They do not carry HTML tag names, HTML attributes,
or other renderer hints. HTMLRenderer, MarkdownRenderer, and custom
renderers own output behavior.
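To make the "pure data" idea concrete, here is a minimal sketch of dataclass nodes with a recursive to_ast(). It is an illustration only: the classes SketchNode, SketchText, and SketchHeading are hypothetical stand-ins, and the real classes in wenmode.nodes may be defined differently.

```python
from dataclasses import dataclass, field, fields
from typing import Any, ClassVar, Dict, List


def _convert(value: Any) -> Any:
    """Recursively turn nodes (and lists of nodes) into plain data."""
    if isinstance(value, SketchNode):
        return value.to_ast()
    if isinstance(value, list):
        return [_convert(item) for item in value]
    return value


@dataclass
class SketchNode:
    # The type name is class-level data, so it is not a dataclass field.
    type: ClassVar[str] = 'node'

    def to_ast(self) -> Dict[str, Any]:
        ast: Dict[str, Any] = {'type': self.type}
        for f in fields(self):
            ast[f.name] = _convert(getattr(self, f.name))
        return ast


@dataclass
class SketchText(SketchNode):
    type: ClassVar[str] = 'text'
    value: str = ''


@dataclass
class SketchHeading(SketchNode):
    type: ClassVar[str] = 'heading'
    children: List[SketchNode] = field(default_factory=list)
    depth: int = 1


heading = SketchHeading(children=[SketchText('Hello')], depth=1)
print(heading.to_ast())
```

Nothing in these classes mentions tags or attributes, so any renderer can consume the same tree.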
Parser flow
Parser.parse() creates a fresh BlockState, parses block nodes into a
Root, runs root transform preparation, resolves deferred inline parsing, runs
root transforms, and returns the root node.
At a high level:
Blank lines are skipped.
Block openers are matched against enabled BlockRule patterns.
If no block rule handles the line, the parser reads a paragraph.
Paragraph text is parsed with enabled inline rules.
Root transforms finalize document-wide features.
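The first four steps can be sketched as a small loop over lines. This is a toy model under stated assumptions, not Wenmode's parser: the rule table, the handler signature, and the dict-based nodes are all hypothetical, only two block rules exist, inline parsing is reduced to a single text node, and the root transform step is left out.

```python
import re
from typing import Any, Dict, List

Node = Dict[str, Any]


def parse_inline(text: str) -> List[Node]:
    """Stand-in for inline parsing: everything becomes one text node."""
    return [{'type': 'text', 'value': text}]


def parse_heading(lines: List[str], index: int, match: 're.Match') -> tuple:
    node = {'type': 'heading', 'depth': len(match.group(1)),
            'children': parse_inline(match.group(2))}
    return node, index + 1


def parse_break(lines: List[str], index: int, match: 're.Match') -> tuple:
    return {'type': 'thematicBreak'}, index + 1


# Hypothetical rule table: (opener pattern, handler) pairs, tried in order.
BLOCK_RULES = [
    (re.compile(r'(#{1,6})\s+(.*)$'), parse_heading),
    (re.compile(r'\*{3,}\s*$'), parse_break),
]


def match_block(line: str):
    for pattern, handler in BLOCK_RULES:
        match = pattern.match(line)
        if match:
            return match, handler
    return None


def parse_blocks(text: str) -> Node:
    lines = text.splitlines()
    children: List[Node] = []
    index = 0
    while index < len(lines):
        if not lines[index].strip():        # 1. blank lines are skipped
            index += 1
            continue
        hit = match_block(lines[index])     # 2. try the block openers
        if hit:
            match, handler = hit
            node, index = handler(lines, index, match)
            children.append(node)
            continue
        start = index                       # 3. otherwise read a paragraph
        index += 1
        while (index < len(lines) and lines[index].strip()
               and match_block(lines[index]) is None):
            index += 1
        paragraph = '\n'.join(lines[start:index])
        children.append({'type': 'paragraph',
                         'children': parse_inline(paragraph)})  # 4. inline pass
    return {'type': 'root', 'children': children}


root = parse_blocks('# Hello\n\nfirst line\nsecond line\n***\n')
print([child['type'] for child in root['children']])
```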
Parser.parse_iter() follows the block parser incrementally and yields nodes as
they are parsed. It rejects rule sets that require deferred inline transforms.
Rules
All rules inherit from Rule and have a stable name. Enabled rules are
available as parser.rules, a dictionary keyed by rule name.
BlockRule instances provide a block opener pattern and a parse() method.
They receive the parser, current block state, and the matched opener.
ContinueRule instances can inspect paragraph continuation lines. This is used
for syntax where a paragraph can become another block, such as setext headings.
InlineRule instances provide a regex pattern and parse() method. They return
(node, end_index). If the rule does not accept a match, it returns
(None, start_index) so the parser can treat the marker as text.
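The inline return contract can be illustrated with a toy emphasis rule. This is a sketch only: the functions below are hypothetical and use plain dicts rather than Wenmode's InlineRule class, but they follow the same convention of returning (node, end_index) on success and (None, start_index) when the match is rejected.

```python
from typing import Any, Dict, List, Optional, Tuple

Node = Dict[str, Any]


def parse_emphasis(text: str, start: int) -> Tuple[Optional[Node], int]:
    """Try to parse *emphasis* beginning at text[start] (hypothetical rule)."""
    close = text.find('*', start + 1)
    if close == -1 or close == start + 1:
        # No closing marker, or nothing between the markers: reject the match.
        return None, start
    node = {'type': 'emphasis',
            'children': [{'type': 'text', 'value': text[start + 1:close]}]}
    return node, close + 1


def parse_inline(text: str) -> List[Node]:
    nodes: List[Node] = []
    pending: List[str] = []   # literal characters waiting to become a text node

    def flush() -> None:
        if pending:
            nodes.append({'type': 'text', 'value': ''.join(pending)})
            pending.clear()

    index = 0
    while index < len(text):
        if text[index] == '*':
            node, end = parse_emphasis(text, index)
            if node is not None:
                flush()
                nodes.append(node)
                index = end
                continue
        # Rejected or ordinary character: treat it as text.
        pending.append(text[index])
        index += 1
    flush()
    return nodes


print(parse_inline('a *b* c*'))
```

The trailing `*` has no closing partner, so the rule returns (None, start_index) and the marker ends up inside the final text node.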
Root transforms
Rules can attach root transforms through their root_transforms attribute.
Transforms can:
add required helper rules,
collect document-wide definitions,
defer inline parsing until definitions are known,
update nodes after the whole tree is parsed.
Reference links, footnotes, abbreviations, and heading ID generation use this mechanism.
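A rough sketch of the collect-then-update pattern for reference links follows. It assumes mdast-style definition and linkReference nodes represented as plain dicts; the function names are hypothetical, and Wenmode's own transforms may be structured quite differently.

```python
from typing import Any, Dict, Iterator

Node = Dict[str, Any]


def walk(node: Node) -> Iterator[Node]:
    yield node
    for child in node.get('children', []):
        yield from walk(child)


def collect_definitions(root: Node) -> Dict[str, str]:
    """Pass 1: gather document-wide definitions, wherever they are nested."""
    definitions: Dict[str, str] = {}
    for node in walk(root):
        if node['type'] == 'definition':
            # The first definition of an identifier wins.
            definitions.setdefault(node['identifier'], node['url'])
    return definitions


def resolve_references(root: Node, definitions: Dict[str, str]) -> None:
    """Pass 2: update reference nodes once the whole tree is known."""
    for node in walk(root):
        if node['type'] == 'linkReference' and node['identifier'] in definitions:
            node['url'] = definitions[node.pop('identifier')]
            node['type'] = 'link'


root = {
    'type': 'root',
    'children': [
        {'type': 'paragraph', 'children': [
            {'type': 'linkReference', 'identifier': 'docs',
             'children': [{'type': 'text', 'value': 'the docs'}]},
            {'type': 'linkReference', 'identifier': 'missing',
             'children': [{'type': 'text', 'value': 'nowhere'}]},
        ]},
        # The definition appears after its use and inside a block quote.
        {'type': 'blockquote', 'children': [
            {'type': 'definition', 'identifier': 'docs',
             'url': 'https://example.com/docs'},
        ]},
    ],
}

resolve_references(root, collect_definitions(root))
print(root['children'][0]['children'][0])
```

The reference is used before its definition appears, which is why resolution has to wait until the whole tree has been parsed.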
Parser state
BlockState stores the current line index, nesting depth, deferred inline
queues, and a per-parse StateStore. Built-in reference, footnote, and
abbreviation rules use that store through StateKey objects instead of fixed
fields on BlockState.
Because a new state and store are created for every top-level parse, definitions do not leak between parser calls. Nested block parsing shares the same store, so definitions found inside block quotes, lists, directives, or footnotes remain visible to document-level transforms.
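The keyed-store idea can be sketched as follows. The classes SketchKey and SketchStore are hypothetical illustrations; the actual StateKey and StateStore APIs are not shown in this document and may differ.

```python
from typing import Any, Callable, Dict, List


class SketchKey:
    """Identifies one slot in the store and knows how to create its default."""

    def __init__(self, name: str, factory: Callable[[], Any]) -> None:
        self.name = name
        self.factory = factory


class SketchStore:
    """Per-parse storage; values are created lazily on first access."""

    def __init__(self) -> None:
        self._values: Dict[SketchKey, Any] = {}

    def get(self, key: SketchKey) -> Any:
        if key not in self._values:
            self._values[key] = key.factory()
        return self._values[key]


# A rule declares its own key instead of adding a field to the state class.
FOOTNOTES = SketchKey('footnotes', dict)


def parse_nested(store: SketchStore, label: str) -> None:
    # Nested parsing receives the same store, so its definitions are shared.
    store.get(FOOTNOTES)[label] = 'text for ' + label


def parse_document(labels: List[str]) -> List[str]:
    store = SketchStore()   # a fresh store for every top-level parse
    for label in labels:
        parse_nested(store, label)
    return sorted(store.get(FOOTNOTES))


print(parse_document(['a', 'b']))
print(parse_document(['c']))
```

The second call does not see the labels from the first, because each top-level parse builds its own store.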
StreamBlockState wraps a line buffer for iterable sources. It supports
lookahead without forcing the entire input to be read immediately.
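Lookahead over a lazy source can be sketched with a small buffer that pulls lines only when they are needed. SketchLineBuffer is a hypothetical class written for illustration; it is not the StreamBlockState implementation.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Optional


class SketchLineBuffer:
    def __init__(self, source: Iterable[str]) -> None:
        self._source: Iterator[str] = iter(source)
        self._pending: Deque[str] = deque()   # lines read but not yet consumed

    def peek(self, offset: int = 0) -> Optional[str]:
        """Return the line `offset` positions ahead without consuming it."""
        while len(self._pending) <= offset:
            try:
                self._pending.append(next(self._source))
            except StopIteration:
                return None
        return self._pending[offset]

    def advance(self) -> Optional[str]:
        """Consume and return the next line, or None at end of input."""
        line = self.peek()
        if line is not None:
            self._pending.popleft()
        return line


pulled: List[str] = []   # records what has actually been read from the source


def lines() -> Iterator[str]:
    for line in ['# Title', 'body', 'more']:
        pulled.append(line)
        yield line


buffer = SketchLineBuffer(lines())
first = buffer.peek()
pulled_after_peek = list(pulled)   # only one line has been read so far
second = buffer.peek(1)
consumed = [buffer.advance(), buffer.advance(), buffer.advance(), buffer.advance()]
```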
Renderers
Renderers inherit from BaseRenderer, which dispatches by node.type.
from wenmode.renderers import BaseRenderer
class PlainTextRenderer(BaseRenderer):
    pass
Register handlers with BaseRenderer.register() in renderer subclasses.
from wenmode.nodes import Text
from wenmode.renderers import BaseRenderer, RenderContext
class UpperRenderer(BaseRenderer):
    pass


@UpperRenderer.register('text')
def render_text(renderer: UpperRenderer, node: Text, context: RenderContext) -> str:
    return node.value.upper()
If no handler is registered, BaseRenderer renders child nodes or a literal
value field. HTMLRenderer registers explicit handlers for Wenmode’s node
types and falls back to the same child/value behavior for unknown nodes.
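The dispatch and fallback behavior described above can be modeled in a few lines. This is a simplified sketch, not BaseRenderer itself: SketchRenderer is hypothetical, works on plain dicts, and omits the RenderContext argument that real handlers receive.

```python
from typing import Any, Callable, Dict

Node = Dict[str, Any]


class SketchRenderer:
    handlers: Dict[str, Callable[..., str]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Give each subclass its own copy so registrations do not leak upward.
        cls.handlers = dict(cls.handlers)

    @classmethod
    def register(cls, node_type: str) -> Callable:
        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            cls.handlers[node_type] = func
            return func
        return decorator

    def render(self, node: Node) -> str:
        handler = self.handlers.get(node['type'])   # dispatch by node type
        if handler is not None:
            return handler(self, node)
        if 'children' in node:                      # fallback 1: render children
            return self.render_children(node)
        return str(node.get('value', ''))           # fallback 2: literal value

    def render_children(self, node: Node) -> str:
        return ''.join(self.render(child) for child in node.get('children', []))


class ShoutRenderer(SketchRenderer):
    pass


@ShoutRenderer.register('text')
def shout_text(renderer: ShoutRenderer, node: Node) -> str:
    return node['value'].upper()


tree = {'type': 'root', 'children': [
    {'type': 'heading', 'depth': 1,
     'children': [{'type': 'text', 'value': 'Hello'}]},
]}

print(ShoutRenderer().render(tree))
print(SketchRenderer().render(tree))
```

Only the text handler is registered; root and heading fall through to the child-rendering fallback, and the base class is unaffected by the subclass registration.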