Custom rules

Create custom inline, block, continuation, and transform rules for new Markdown syntax.

Create a custom rule when you want Wenmode to recognize syntax that is not part of an existing preset. Custom rules use the same base classes as built-in rules: InlineRule, BlockRule, ContinueRule, and plain Rule for extensions that only provide root transforms.

Rule API

Every rule has a stable name. Parser rule names are used as dictionary keys in parser.rules, and block rule names are also used as regex group names when the parser compiles block openers, so use snake_case identifier-style names.
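The identifier constraint follows from how Python's re module treats named groups. The sketch below uses only the standard library; the openers table and the way the patterns are joined are assumptions for illustration, not Wenmode's actual compile step.

```python
import re

# Hypothetical opener table keyed by rule name (not Wenmode's internal structure).
openers = {
    'bang_paragraph': r'[ \t]{0,3}!',
    'term_definition': r'[ \t]{0,3}@term\[',
}

# Wrap every opener in a named group and join them into one alternation.
combined = re.compile(
    '|'.join(f'(?P<{name}>{pattern})' for name, pattern in openers.items())
)

match = combined.match('! Pay attention')
matched_rule = match.lastgroup if match else None  # name of the group that matched

# A hyphenated name is not a valid group name, so compilation fails.
try:
    re.compile(r'(?P<bang-paragraph>!)')
except re.error:
    hyphen_ok = False
else:
    hyphen_ok = True
```

Because match.lastgroup returns the group name, a name that is also a valid dictionary key lets one lookup serve both purposes.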

Rules also have an order class attribute. Block and inline rules default to order = 100; lower values run earlier when syntax overlaps.

class MyRule(InlineRule):
    order = 90

Parser and Wenmode accept either rule classes or configured rule instances. Classes are instantiated automatically. Instances are useful for rules with options.
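A minimal sketch of how a mixed list of classes and instances could be normalized and ordered. The Rule, Early, and Late classes and the normalize() helper are hypothetical stand-ins, and the tie-breaking behavior for equal order values is an assumption rather than documented Wenmode behavior.

```python
class Rule:
    order = 100  # default priority, mirroring the documented default

    def __init__(self, name: str) -> None:
        self.name = name


class Early(Rule):
    order = 90  # lower value, so it is tried first

    def __init__(self) -> None:
        super().__init__('early')


class Late(Rule):
    def __init__(self, name: str = 'late') -> None:
        super().__init__(name)


def normalize(rules: list) -> list:
    # Instantiate bare classes; keep configured instances as they are.
    instances = [rule() if isinstance(rule, type) else rule for rule in rules]
    # sorted() is stable, so rules with equal order keep their listed position.
    return sorted(instances, key=lambda rule: rule.order)


ordered_names = [rule.name for rule in normalize([Late, Late('configured'), Early])]
```

Here ordered_names is ['early', 'late', 'configured']: the class with order = 90 moves ahead of both default-order entries.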

Custom inline rule

An inline rule inherits from InlineRule, defines a regex pattern, and returns a node plus the index where parsing should resume.

This example parses ++marked++ as a Mark node.

The target Markdown syntax is:

This is ++marked++ text.

from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Mark as MarkNode
from wenmode.nodes import Node
from wenmode.rules import InlineRule
from wenmode.state import BlockState

if TYPE_CHECKING:
    from wenmode.parser import Parser


class PlusMark(InlineRule):
    def __init__(self) -> None:
        super().__init__('plus_mark', r'\+\+', '+')

    def parse(
        self,
        parser: Parser,
        text: str,
        match: re.Match[str],
        state: BlockState | None = None,
    ) -> tuple[Node | None, int]:
        end = text.find('++', match.end())
        if end == -1:
            return None, match.start()

        children = parser.parse_inlines(text[match.end() : end], state)
        return MarkNode(children=children), end + 2

from wenmode import Wenmode
from wenmode.rules import Emphasis

wenmode = Wenmode([PlusMark, Emphasis])
text = '++very *important*++'
expected = '''
<p><mark>very <em>important</em></mark></p>
'''

assert wenmode.render(text) == expected.lstrip()

The third InlineRule constructor argument is trigger_chars. Use it when a rule starts with known literal characters. The parser then jumps directly to those characters and calls rule.compiled.match() there. If trigger_chars is empty, the parser calls rule.search(text, pos); override search() for rules that need custom scanning behavior.

When parse() decides not to handle a match, it should return (None, match.start()). The parser then emits the marker as text and continues scanning.
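The trigger-character jump and the "decline" path can be pictured with a toy scanner. This is a standalone sketch built on str.find and a compiled regex; the scan() function and its token tuples are hypothetical and do not reproduce Wenmode's parser loop.

```python
import re

PLUS = re.compile(r'\+\+')
TRIGGER = '+'


def scan(text: str) -> list:
    """Toy inline scanner returning ('text', ...) and ('mark', ...) tokens."""
    tokens = []
    pos = 0    # where the next search starts
    start = 0  # start of the plain text not yet emitted
    while True:
        # Jump straight to the next trigger character instead of testing every index.
        hit = text.find(TRIGGER, pos)
        if hit == -1:
            break
        match = PLUS.match(text, hit)
        if match is None:
            pos = hit + 1  # a lone '+', keep scanning
            continue
        end = text.find('++', match.end())
        if end == -1:
            # The rule declines: the marker stays in the pending text.
            pos = match.end()
            continue
        if start < match.start():
            tokens.append(('text', text[start:match.start()]))
        tokens.append(('mark', text[match.end():end]))
        pos = start = end + 2
    if start < len(text):
        tokens.append(('text', text[start:]))
    return tokens


closed = scan('a ++b++ c')
unclosed = scan('++open')
```

With a closing marker the scanner yields a mark token between two text tokens; without one, the whole input comes back as a single text token, which corresponds to the (None, match.start()) case.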

Custom node and renderers

Use a custom node when your syntax does not map to one of Wenmode’s built-in node types. The parser only creates the node; each renderer still needs to know how to serialize it.

This example parses reStructuredText keyboard roles such as :kbd:`Ctrl+C`. Markdown has no native keyboard-input node, so the example creates a KeyboardInput node and registers renderers for HTML, Markdown, and reStructuredText output.

import re

from wenmode import HTMLRenderer, MarkdownRenderer, Parser, RSTRenderer
from wenmode.nodes import Literal, Node
from wenmode.renderers import RenderContext
from wenmode.rules import InlineRule
from wenmode.state import BlockState


class KeyboardInput(Literal):
    def __init__(self, value: str) -> None:
        super().__init__(type='keyboardInput', value=value)


class KeyboardInputRole(InlineRule):
    def __init__(self) -> None:
        super().__init__('keyboard_input_role', r':kbd:`', ':')

    def parse(
        self,
        parser: Parser,
        text: str,
        match: re.Match[str],
        state: BlockState | None = None,
    ) -> tuple[Node | None, int]:
        end = text.find('`', match.end())
        if end == -1:
            return None, match.start()

        return KeyboardInput(value=text[match.end() : end]), end + 1


@HTMLRenderer.register('keyboardInput')
def render_keyboard_input_html(
    renderer: HTMLRenderer,
    node: KeyboardInput,
    context: RenderContext,
) -> str:
    return f'<kbd>{renderer.escape_html(node.value)}</kbd>'


@MarkdownRenderer.register('keyboardInput')
def render_keyboard_input_markdown(
    renderer: MarkdownRenderer,
    node: KeyboardInput,
    context: RenderContext,
) -> str:
    return f'<kbd>{renderer.escape_text(node.value)}</kbd>'


@RSTRenderer.register('keyboardInput')
def render_keyboard_input_rst(
    renderer: RSTRenderer,
    node: KeyboardInput,
    context: RenderContext,
) -> str:
    return f':kbd:`{renderer.escape_inline_literal(node.value)}`'


parser = Parser([KeyboardInputRole])
root = parser.parse('Press :kbd:`Ctrl+C`.')
expected_html = '''
<p>Press <kbd>Ctrl+C</kbd>.</p>
'''
expected_markdown = r'''
Press <kbd>Ctrl\+C</kbd>\.
'''
expected_rst = '''
Press :kbd:`Ctrl+C`.
'''

assert root.to_ast() == {
    'type': 'root',
    'children': [
        {
            'type': 'paragraph',
            'children': [
                {'type': 'text', 'value': 'Press '},
                {'type': 'keyboardInput', 'value': 'Ctrl+C'},
                {'type': 'text', 'value': '.'},
            ],
        }
    ],
}
assert HTMLRenderer().render(root) == expected_html.lstrip()
assert MarkdownRenderer().render(root) == expected_markdown.lstrip()
assert RSTRenderer().render(root) == expected_rst.lstrip()

Registering a handler mutates the renderer class, so do it during application startup or in the module that defines the extension. Without a registered handler, BaseRenderer falls back to rendering the node's children or value. That fallback can serve as a plain-text default, but it drops the format-specific markup your node is meant to produce.
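To make the class-level registration and the fallback concrete, here is a small standalone sketch. ToyNode and ToyRenderer are hypothetical; they only imitate the register-by-node-type pattern and are not BaseRenderer's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    type: str
    value: str = ''
    children: list = field(default_factory=list)


class ToyRenderer:
    # Class-level table: a registered handler is visible to every instance.
    handlers: dict = {}

    @classmethod
    def register(cls, node_type: str):
        def decorator(func):
            cls.handlers[node_type] = func
            return func
        return decorator

    def render(self, node: ToyNode) -> str:
        handler = self.handlers.get(node.type)
        if handler is not None:
            return handler(self, node)
        # Fallback: concatenate rendered children, or emit the raw value.
        if node.children:
            return ''.join(self.render(child) for child in node.children)
        return node.value


doc = ToyNode('paragraph', children=[
    ToyNode('text', 'Press '),
    ToyNode('keyboardInput', 'Ctrl+C'),
])
before = ToyRenderer().render(doc)  # no handler yet: plain value only


@ToyRenderer.register('keyboardInput')
def render_kbd(renderer: ToyRenderer, node: ToyNode) -> str:
    return f'<kbd>{node.value}</kbd>'


after = ToyRenderer().render(doc)  # a fresh instance already sees the handler
```

The second render call uses a new instance yet picks up the handler, which is why registration belongs in startup or module-level code rather than inside request handling.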

Custom block rule

A block rule inherits from BlockRule. The constructor passes a rule name and a block opener pattern. The parser wraps each block opener as a named regex group, matches the current line, and calls the matched rule’s parse() method.

The parse() method receives the parser, the current BlockState, and the matched opener. It must advance state when it consumes input.

This example treats a line that starts with ! as a paragraph after removing the marker:

! Pay attention to *this*.

from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Node, Paragraph
from wenmode.rules import BlockRule
from wenmode.state import BlockState

if TYPE_CHECKING:
    from wenmode.parser import Parser


class BangParagraph(BlockRule):
    def __init__(self) -> None:
        super().__init__('bang_paragraph', r'[ \t]{0,3}!')

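    def parse(self, parser: Parser, state: BlockState, match: re.Match[str]) -> Node | None:
        # Drop the indentation and exactly one '!' marker, then the spaces after it.
        # lstrip(' \t!') would also swallow extra '!' characters that belong to the content.
        text = state.line.lstrip(' \t').removeprefix('!').lstrip(' \t').rstrip('\r\n')
        # Consume the line before returning the node.
        state.advance()
        return Paragraph(children=parser.parse_inlines(text, state))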

When parse() decides not to handle a matched opener, it should return None without advancing. The parser then falls back to paragraph parsing for that line.
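The contract between advancing and returning None can be sketched with a toy block loop. ToyState, parse_bang(), and parse_blocks() are hypothetical names, and checking whether the line index moved is one plausible way to tell "declined" apart from "consumed without a node"; Wenmode's parser may do this differently.

```python
import re


class ToyState:
    def __init__(self, text: str) -> None:
        self.lines = text.splitlines(keepends=True)
        self.index = 0

    @property
    def line(self) -> str:
        return self.lines[self.index]

    def advance(self) -> None:
        self.index += 1


def parse_bang(state: ToyState, match: re.Match):
    body = state.line[match.end():].strip()
    if not body:
        return None  # decline: nothing follows the marker, do not advance
    state.advance()
    return ('bang', body)


RULES = {'bang': (r'[ \t]{0,3}!', parse_bang)}
OPENERS = re.compile(
    '|'.join(f'(?P<{name}>{pattern})' for name, (pattern, _) in RULES.items())
)


def parse_blocks(text: str) -> list:
    state = ToyState(text)
    nodes = []
    while state.index < len(state.lines):
        before = state.index
        match = OPENERS.match(state.line)
        node = RULES[match.lastgroup][1](state, match) if match else None
        if node is None and state.index == before:
            # No rule consumed the line: fall back to a paragraph.
            node = ('paragraph', state.line.strip())
            state.advance()
        if node is not None:
            nodes.append(node)
    return nodes


blocks = parse_blocks('! Watch out\n!\nplain\n')
```

The bare "!" line matches the opener, but the rule declines it, so it ends up as an ordinary paragraph alongside the unmatched "plain" line.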

Use parser.parse_blocks(text, parent_state=state) when your block rule contains nested Markdown content. Nested parsing shares the same state store and increments state.depth.

Custom continuation rule

A continuation rule inherits from ContinueRule and implements parse_paragraph_continuation(). It receives the paragraph lines collected so far and may return a replacement node. Setext headings and definition lists are implemented this way.

This example turns a paragraph followed by a line of ! markers into a level 6 heading:

Important title
!!!

from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Heading, Node
from wenmode.rules import ContinueRule
from wenmode.state import BlockState

if TYPE_CHECKING:
    from wenmode.parser import Parser


BANG_HEADING_RE = re.compile(r'[ \t]{0,3}!+[ \t]*$')


class BangHeading(ContinueRule):
    def __init__(self) -> None:
        super().__init__('bang_heading')

    def matches(self, line: str) -> bool:
        return line.lstrip(' \t').startswith('!')

    def parse_paragraph_continuation(
        self,
        parser: Parser,
        state: BlockState,
        lines: list[str],
    ) -> Node | None:
        # Strip the line ending first so '$' also matches on CRLF input.
        if BANG_HEADING_RE.match(state.line.rstrip('\r\n')) is None:
            return None

        state.advance()
        text = ''.join(lines).strip()
        return Heading(depth=6, children=parser.parse_inlines(text, state))

matches() is optional, but it is useful as a cheap pre-check before doing more expensive parsing.
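The relationship between the cheap pre-check and the full pattern can be checked in isolation. This sketch reuses the regex from the example above with plain functions; cheap_check() and full_check() are hypothetical helpers, not part of ContinueRule.

```python
import re

BANG_HEADING_RE = re.compile(r'[ \t]{0,3}!+[ \t]*$')


def cheap_check(line: str) -> bool:
    # One string operation: rules out most lines without touching the regex.
    return line.lstrip(' \t').startswith('!')


def full_check(line: str) -> bool:
    # The authoritative test: only '!' markers and optional whitespace.
    return BANG_HEADING_RE.match(line.rstrip('\r\n')) is not None


samples = ['!!!\n', '  !!  \n', '! not a heading\n', 'plain text\n']
results = [(cheap_check(s), cheap_check(s) and full_check(s)) for s in samples]
```

The third sample passes the pre-check but fails the full pattern, so the pre-check is a filter only; the regex inside parse_paragraph_continuation() still decides.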

Rule options

Use configured rule instances when your rule has options.

from wenmode import Parser
from wenmode.rules import Image, Link

parser = Parser([
    Link(references=False),
    Image(references=False),
])

Wenmode stores enabled rules by name, so registering another instance with the same name replaces the previous configuration.

parser.register_rule(Link(references=False))

Root transforms can declare required_rules; the parser automatically registers missing required rules when it rebuilds the rule set.
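Both behaviors, replacement by name and filling in missing required rules, follow naturally from a name-keyed dictionary. The sketch below is a hypothetical ToyRegistry, not the Parser class; in particular, leaving an already configured rule untouched when it is also required is an assumption drawn from the word "missing" above.

```python
class ToyRule:
    def __init__(self, name: str, **options) -> None:
        self.name = name
        self.options = options


class ToyRegistry:
    def __init__(self) -> None:
        self.rules = {}  # rule name -> rule instance

    def register_rule(self, rule: ToyRule) -> None:
        # Same name, same slot: the newer configuration replaces the old one.
        self.rules[rule.name] = rule

    def ensure(self, required: list) -> None:
        # Add only rules that are not enabled yet; keep existing configuration.
        for rule in required:
            if rule.name not in self.rules:
                self.register_rule(rule)


registry = ToyRegistry()
registry.register_rule(ToyRule('link', references=True))
registry.register_rule(ToyRule('link', references=False))  # replaces the first
registry.ensure([ToyRule('link'), ToyRule('term_definition')])
```

After these calls the registry holds two rules, and 'link' still carries references=False.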

Extension state and transforms

Parser, rule, and transform instances should not store per-parse mutable state. Use BlockState.store with a StateKey when a rule or transform needs shared state for one parse.

This example collects glossary term definitions from block syntax and stores the result on the root node:

@term[HTML]: HyperText Markup Language
@term[AST]: Abstract Syntax Tree

from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Node, Root
from wenmode.rules import BlockRule, Rule
from wenmode.state import BlockState, StateKey

if TYPE_CHECKING:
    from wenmode.parser import Parser
    from wenmode.rules import RootTransform


TERMS = StateKey('my_package.terms', lambda: {})
TERM_RE = re.compile(r'^[ \t]{0,3}@term\[(?P<label>[^\]]+)]:[ \t]*(?P<title>.*)$')


class Glossary(Rule):
    def __init__(self) -> None:
        super().__init__('glossary')
        self.root_transforms: list[RootTransform] = [GlossaryTransform()]


class TermDefinition(BlockRule):
    def __init__(self) -> None:
        super().__init__('term_definition', r'[ \t]{0,3}@term\[')

    def parse(self, parser: Parser, state: BlockState, match: re.Match[str]) -> Node | None:
        term = TERM_RE.match(state.line.rstrip('\r\n'))
        if term is None:
            return None

        # Record the definition in the per-parse store, consume the line,
        # and return no node for it.
        state.store.get(TERMS)[term.group('label')] = term.group('title')
        state.advance()
        return None


class GlossaryTransform:
    name = 'glossary'
    defer_inlines = False
    required_rules = [TermDefinition]

    def prepare(self, parser: Parser, root: Root, state: BlockState) -> None:
        pass

    def transform(self, parser: Parser, root: Root, state: BlockState) -> None:
        root.data = {'terms': dict(state.store.get(TERMS))}

StateKey.factory creates the value the first time it is requested for a parse. Each top-level parse gets a fresh store. Nested block parsing shares the same store, so definitions inside block quotes, lists, directives, or footnotes are visible to document-level transforms.
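The lifecycle described here (lazy creation, one store per top-level parse, sharing across nested parses) can be modeled in a few lines. ToyKey, ToyStore, and ToyState are hypothetical classes written for this sketch; they imitate the described behavior and are not the wenmode.state implementation.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar('T')


class ToyKey(Generic[T]):
    def __init__(self, name: str, factory: Callable[[], T]) -> None:
        self.name = name
        self.factory = factory


class ToyStore:
    def __init__(self) -> None:
        self._values: dict = {}

    def get(self, key: ToyKey) -> object:
        # Create the value lazily, the first time it is requested in this parse.
        if key.name not in self._values:
            self._values[key.name] = key.factory()
        return self._values[key.name]


class ToyState:
    def __init__(self, store: ToyStore, depth: int = 0) -> None:
        self.store = store
        self.depth = depth

    def child(self) -> 'ToyState':
        # Nested parsing: same store object, one level deeper.
        return ToyState(self.store, self.depth + 1)


TERMS = ToyKey('my_package.terms', dict)

first = ToyState(ToyStore())            # first top-level parse
nested = first.child()                  # e.g. content inside a block quote
nested.store.get(TERMS)['HTML'] = 'HyperText Markup Language'

second = ToyState(ToyStore())           # second top-level parse, fresh store
```

The definition written through the nested state is visible from first, while second starts with an empty dictionary. That separation is what keeps per-parse data out of rule instances.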

Root transforms follow the RootTransform protocol. Custom transforms do not need to inherit from it; use the protocol for type annotations when you want type checkers to validate the transform shape. Each transform must provide:

name, used for deduplication when multiple rules attach the same transform.
required_rules, a sequence of rule classes or configured rule instances to auto-register.
defer_inlines, which tells the parser whether inline parsing must wait until after prepare().
prepare(parser, root, state), run after block parsing.
transform(parser, root, state), run after deferred inline parsing is resolved.

Set defer_inlines = True only when inline rules need document-wide state collected by prepare(), such as reference-style links. Rule sets with deferred inlines cannot be used with streaming output.
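The ordering of the phases can be traced with a toy pipeline. TransformLike is a simplified, hypothetical protocol (its methods take a log list instead of parser, root, and state), and keeping the first transform seen for a given name is an assumption; the sketch only illustrates the sequence described above.

```python
from typing import Protocol, Sequence


class TransformLike(Protocol):
    # Simplified stand-in for the transform shape described above.
    name: str
    defer_inlines: bool
    required_rules: Sequence[object]

    def prepare(self, log: list) -> None: ...
    def transform(self, log: list) -> None: ...


class RefLinks:
    # No inheritance needed: matching the protocol's shape is enough.
    name = 'ref_links'
    defer_inlines = True
    required_rules = ()

    def prepare(self, log: list) -> None:
        log.append('prepare:ref_links')

    def transform(self, log: list) -> None:
        log.append('transform:ref_links')


def run(transforms: Sequence[TransformLike]) -> list:
    log = ['blocks']
    unique = {}
    for item in transforms:
        unique.setdefault(item.name, item)  # deduplicate by name
    deferred = any(item.defer_inlines for item in unique.values())
    if not deferred:
        log.append('inlines')  # nothing to wait for
    for item in unique.values():
        item.prepare(log)
    if deferred:
        log.append('inlines')  # resolved only after prepare() has run
    for item in unique.values():
        item.transform(log)
    return log


phases = run([RefLinks(), RefLinks()])
```

The two RefLinks instances collapse into one, and the log shows inline parsing sitting between prepare and transform. Output cannot be finalized until the whole document has been seen, which is consistent with the streaming restriction.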

Testing a rule

Test custom rules at the parser and renderer boundary. A small rule usually needs these cases:

recognized syntax renders as expected,
unmatched or incomplete syntax stays as text,
nested inline parsing works if the rule calls parser.parse_inlines(),
custom node renderers are registered for each output format you support,
each rule option changes behavior as intended,
per-parse state does not leak between parser calls.

from wenmode import HTMLRenderer, Parser
from wenmode.rules import Emphasis


def render(markdown: str) -> str:
    parser = Parser([PlusMark, Emphasis])
    return HTMLRenderer().render(parser.parse(markdown))


EXPECTED_MARKED = '''
<p><mark>a <em>b</em></mark></p>
'''
EXPECTED_OPEN = '''
<p>++open</p>
'''


def test_plus_mark() -> None:
    marked = '++a *b*++'
    open_marker = '++open'

    assert render(marked) == EXPECTED_MARKED.lstrip()
    assert render(open_marker) == EXPECTED_OPEN.lstrip()