Custom rules¶
Create custom inline, block, continuation, and transform rules for new Markdown syntax.
Create a custom rule when you want Wenmode to recognize syntax that is not part
of an existing preset. Custom rules use the same base classes as built-in rules:
InlineRule, BlockRule, ContinueRule, and plain Rule for extensions that
only provide root transforms.
Rule API¶
Every rule has a stable name. Parser rule names are used as dictionary keys
in parser.rules, and block rule names are also used as regex group names when
the parser compiles block openers, so use snake_case identifier-style names.
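To see why identifier-style names matter, the sketch below joins block openers into one alternation of named groups using only Python's re module. It illustrates the idea and is not Wenmode's actual compilation code; the compile_openers helper and the OPENERS table are hypothetical.

```python
import re

# Hypothetical opener table: rule name -> opener pattern.
OPENERS = {
    'bang_paragraph': r'[ \t]{0,3}!',
    'term_definition': r'[ \t]{0,3}@term\[',
}


def compile_openers(openers: dict[str, str]) -> re.Pattern[str]:
    """Join openers into one pattern, one named group per rule."""
    parts = [f'(?P<{name}>{pattern})' for name, pattern in openers.items()]
    return re.compile('|'.join(parts))


combined = compile_openers(OPENERS)
match = combined.match('! Pay attention')
# match.lastgroup is the name of the group that matched, which identifies
# the rule responsible for this line.
matched_rule = match.lastgroup if match else None

# A hyphenated name is not a valid regex group name, so compilation fails.
try:
    compile_openers({'bang-paragraph': r'!'})
    hyphen_ok = True
except re.error:
    hyphen_ok = False
```

Here matched_rule is 'bang_paragraph', while the hyphenated name is rejected by the regex compiler before any parsing happens.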
Rules also have an order class attribute. Block and inline rules default to
order = 100; lower values run earlier when syntax overlaps.
class MyRule(InlineRule):
    order = 90
Parser and Wenmode accept either rule classes or configured rule instances.
Classes are instantiated automatically. Instances are useful for rules with
options.
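As a rough illustration of accepting both classes and instances, the following standalone sketch instantiates any class it is given, keeps instances as they are, and sorts by order. The ToyRule classes and the normalize helper are hypothetical stand-ins, and keeping input order for equal order values is an assumption of this sketch, not documented Wenmode behavior.

```python
class ToyRule:
    order = 100  # default value, matching the description above

    def __init__(self, name: str) -> None:
        self.name = name


class Early(ToyRule):
    order = 90

    def __init__(self, strict: bool = False) -> None:
        super().__init__('early')
        self.strict = strict


class Late(ToyRule):
    def __init__(self) -> None:
        super().__init__('late')


def normalize(rules):
    """Instantiate classes, keep instances, then sort by ascending order."""
    instances = [rule() if isinstance(rule, type) else rule for rule in rules]
    # sorted() is stable, so rules with equal order keep their input order.
    return sorted(instances, key=lambda rule: rule.order)


ordered = normalize([Late, Early(strict=True)])
names = [rule.name for rule in ordered]
```

The configured Early(strict=True) instance keeps its option, and it sorts ahead of Late because 90 is lower than the default 100.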
Custom inline rule¶
An inline rule inherits from InlineRule, defines a regex pattern, and returns
a node plus the index where parsing should resume.
This example parses ++marked++ as a Mark node.
The target Markdown syntax is:
This is ++marked++ text.
from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Mark as MarkNode
from wenmode.nodes import Node
from wenmode.rules import InlineRule
from wenmode.state import BlockState

if TYPE_CHECKING:
    from wenmode.parser import Parser


class PlusMark(InlineRule):
    def __init__(self) -> None:
        super().__init__('plus_mark', r'\+\+', '+')

    def parse(
        self,
        parser: Parser,
        text: str,
        match: re.Match[str],
        state: BlockState | None = None,
    ) -> tuple[Node | None, int]:
        end = text.find('++', match.end())
        if end == -1:
            return None, match.start()
        children = parser.parse_inlines(text[match.end() : end], state)
        return MarkNode(children=children), end + 2
Register it like any other rule.
from wenmode import Wenmode
from wenmode.rules import Emphasis
wenmode = Wenmode([PlusMark, Emphasis])
text = '++very *important*++'
expected = '''
<p><mark>very <em>important</em></mark></p>
'''
assert wenmode.render(text) == expected.lstrip()
The third InlineRule constructor argument is trigger_chars. Use it when a
rule starts with known literal characters. The parser then jumps directly to
those characters and calls rule.compiled.match() there. If trigger_chars is
empty, the parser calls rule.search(text, pos); override search() for rules
that need custom scanning behavior.
If an inline rule decides not to handle a match, return
(None, match.start()). The parser will emit the marker as text and continue.
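The scanning behavior described above can be approximated with a small standalone loop. This is a simplified sketch, not Wenmode's inline parser: the scan function, its tuple nodes, and the way a declined marker is skipped are all assumptions made for illustration.

```python
import re


def scan(text: str, pattern: re.Pattern[str], trigger_chars: str, parse) -> list:
    """Split text into plain strings and parsed nodes."""
    out: list = []
    pos = 0
    plain_start = 0
    while pos < len(text):
        # Jump to the next trigger character instead of testing every index.
        hits = [text.find(ch, pos) for ch in trigger_chars]
        hits = [i for i in hits if i != -1]
        if not hits:
            break
        pos = min(hits)
        match = pattern.match(text, pos)
        if match is None:
            pos += 1
            continue
        node, resume = parse(text, match)
        if node is None:
            # Rule declined: the marker stays in the plain-text run.
            pos = match.end()
            continue
        if plain_start < match.start():
            out.append(text[plain_start:match.start()])
        out.append(node)
        pos = plain_start = resume
    if plain_start < len(text):
        out.append(text[plain_start:])
    return out


def parse_plus_mark(text: str, match: re.Match[str]):
    """Same decline convention as PlusMark.parse, with a tuple as the node."""
    end = text.find('++', match.end())
    if end == -1:
        return None, match.start()
    return ('mark', text[match.end():end]), end + 2


PLUS = re.compile(r'\+\+')
closed = scan('a ++b++ c', PLUS, '+', parse_plus_mark)
unclosed = scan('a ++open', PLUS, '+', parse_plus_mark)
```

With a closing marker the text splits into 'a ', the mark node, and ' c'. Without one, the whole input comes back as a single plain string.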
Custom node and renderers¶
Use a custom node when your syntax does not map to one of Wenmode’s built-in node types. The parser only creates the node; each renderer still needs to know how to serialize it.
This example parses reStructuredText keyboard roles such as
:kbd:`Ctrl+C`. Markdown has no native keyboard-input node, so the example
creates a KeyboardInput node and registers renderers for HTML, Markdown, and
reStructuredText output.
import re

from wenmode import HTMLRenderer, MarkdownRenderer, Parser, RSTRenderer
from wenmode.nodes import Literal, Node
from wenmode.renderers import RenderContext
from wenmode.rules import InlineRule
from wenmode.state import BlockState


class KeyboardInput(Literal):
    def __init__(self, value: str) -> None:
        super().__init__(type='keyboardInput', value=value)


class KeyboardInputRole(InlineRule):
    def __init__(self) -> None:
        super().__init__('keyboard_input_role', r':kbd:`', ':')

    def parse(
        self,
        parser: Parser,
        text: str,
        match: re.Match[str],
        state: BlockState | None = None,
    ) -> tuple[Node | None, int]:
        end = text.find('`', match.end())
        if end == -1:
            return None, match.start()
        return KeyboardInput(value=text[match.end() : end]), end + 1


@HTMLRenderer.register('keyboardInput')
def render_keyboard_input_html(
    renderer: HTMLRenderer,
    node: KeyboardInput,
    context: RenderContext,
) -> str:
    return f'<kbd>{renderer.escape_html(node.value)}</kbd>'


@MarkdownRenderer.register('keyboardInput')
def render_keyboard_input_markdown(
    renderer: MarkdownRenderer,
    node: KeyboardInput,
    context: RenderContext,
) -> str:
    return f'<kbd>{renderer.escape_text(node.value)}</kbd>'


@RSTRenderer.register('keyboardInput')
def render_keyboard_input_rst(
    renderer: RSTRenderer,
    node: KeyboardInput,
    context: RenderContext,
) -> str:
    return f':kbd:`{renderer.escape_inline_literal(node.value)}`'
parser = Parser([KeyboardInputRole])
root = parser.parse('Press :kbd:`Ctrl+C`.')
expected_html = '''
<p>Press <kbd>Ctrl+C</kbd>.</p>
'''
expected_markdown = r'''
Press <kbd>Ctrl\+C</kbd>\.
'''
expected_rst = '''
Press :kbd:`Ctrl+C`.
'''
assert root.to_ast() == {
    'type': 'root',
    'children': [
        {
            'type': 'paragraph',
            'children': [
                {'type': 'text', 'value': 'Press '},
                {'type': 'keyboardInput', 'value': 'Ctrl+C'},
                {'type': 'text', 'value': '.'},
            ],
        }
    ],
}
assert HTMLRenderer().render(root) == expected_html.lstrip()
assert MarkdownRenderer().render(root) == expected_markdown.lstrip()
assert RSTRenderer().render(root) == expected_rst.lstrip()
Registering a handler mutates the renderer class, so do it during application
startup or in the module that defines the extension. Without a registered
handler, BaseRenderer falls back to rendering children or value. That can
serve as a plain-text fallback, but it drops the format-specific markup your
node is meant to produce.
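The class-level registration and the fallback can be pictured with a toy registry. The SketchRenderer classes below are hypothetical stand-ins that only mimic the shape of the behavior described here; they take dict nodes and no render context, unlike the real renderers.

```python
import html
from typing import Callable


class SketchRenderer:
    handlers: dict[str, Callable] = {}

    def __init_subclass__(cls) -> None:
        # Give each renderer subclass its own handler table.
        cls.handlers = {}

    @classmethod
    def register(cls, node_type: str):
        def decorator(func):
            cls.handlers[node_type] = func  # mutates the class, not an instance
            return func
        return decorator

    def render(self, node: dict) -> str:
        handler = self.handlers.get(node['type'])
        if handler is not None:
            return handler(self, node)
        # Fallback: children if present, otherwise the raw value.
        if 'children' in node:
            return ''.join(self.render(child) for child in node['children'])
        return node.get('value', '')


class SketchHTML(SketchRenderer):
    pass


class SketchText(SketchRenderer):
    pass


@SketchHTML.register('keyboardInput')
def render_kbd(renderer: SketchHTML, node: dict) -> str:
    return f"<kbd>{html.escape(node['value'])}</kbd>"


node = {'type': 'keyboardInput', 'value': 'Ctrl+C'}
with_handler = SketchHTML().render(node)
without_handler = SketchText().render(node)
```

Every SketchHTML instance created after registration sees the handler, which is why registration belongs in startup or module-level code. SketchText has no handler and falls back to the bare value.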
Custom block rule¶
A block rule inherits from BlockRule. The constructor passes a rule name and a
block opener pattern. The parser wraps each block opener as a named regex group,
matches the current line, and calls the matched rule’s parse() method.
The parse() method receives the parser, the current BlockState, and the
matched opener. It must advance state when it consumes input.
This example treats a line that starts with ! as a paragraph after removing
the marker:
! Pay attention to *this*.
from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Node, Paragraph
from wenmode.rules import BlockRule
from wenmode.state import BlockState

if TYPE_CHECKING:
    from wenmode.parser import Parser


class BangParagraph(BlockRule):
    def __init__(self) -> None:
        super().__init__('bang_paragraph', r'[ \t]{0,3}!')

    def parse(self, parser: Parser, state: BlockState, match: re.Match[str]) -> Node | None:
        text = state.line.lstrip(' \t!').rstrip('\r\n')
        state.advance()
        return Paragraph(children=parser.parse_inlines(text, state))
If the rule decides not to handle a matched opener, return None without
advancing. The parser will fall back to paragraph parsing for that line.
Use parser.parse_blocks(text, parent_state=state) when your block rule
contains nested Markdown content. Nested parsing shares the same state store and
increments state.depth.
Custom continuation rule¶
A continuation rule inherits from ContinueRule and implements
parse_paragraph_continuation(). It receives the paragraph lines collected so
far and may return a replacement node. Setext headings and definition lists are
implemented this way.
This example turns a paragraph followed by a line of ! markers into a level 6
heading:
Important title
!!!
from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Heading, Node
from wenmode.rules import ContinueRule
from wenmode.state import BlockState

if TYPE_CHECKING:
    from wenmode.parser import Parser

BANG_HEADING_RE = re.compile(r'[ \t]{0,3}!+[ \t]*$')


class BangHeading(ContinueRule):
    def __init__(self) -> None:
        super().__init__('bang_heading')

    def matches(self, line: str) -> bool:
        return line.lstrip(' \t').startswith('!')

    def parse_paragraph_continuation(
        self,
        parser: Parser,
        state: BlockState,
        lines: list[str],
    ) -> Node | None:
        if BANG_HEADING_RE.match(state.line) is None:
            return None
        state.advance()
        text = ''.join(lines).strip()
        return Heading(depth=6, children=parser.parse_inlines(text, state))
matches() is optional, but it is useful as a cheap pre-check before doing more
expensive parsing.
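The division of labor between the cheap pre-check and the full match can be sketched without Wenmode. The collect_paragraph function below is a hypothetical simplification of paragraph collection that returns plain tuples instead of nodes; it is meant only to show where matches() saves work.

```python
import re

BANG_RE = re.compile(r'[ \t]{0,3}!+[ \t]*$')


def matches(line: str) -> bool:
    # Cheap pre-check: a prefix test, no regex involved.
    return line.lstrip(' \t').startswith('!')


def collect_paragraph(lines: list[str]) -> tuple[str, str]:
    """Return ('heading', text) or ('paragraph', text) for one block."""
    collected: list[str] = []
    for line in lines:
        if not line.strip():
            break  # a blank line ends the paragraph
        # Run the full regex only when the pre-check passes and there is
        # already paragraph text that could be promoted to a heading.
        if collected and matches(line) and BANG_RE.match(line):
            return ('heading', ' '.join(collected))
        collected.append(line.strip())
    return ('paragraph', ' '.join(collected))


heading = collect_paragraph(['Important title', '!!!'])
plain = collect_paragraph(['Important title', '! not a marker line'])
```

The second input passes the pre-check but fails the full pattern, so the line is kept as ordinary paragraph text.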
Rule options¶
Use configured rule instances when your rule has options.
from wenmode import Parser
from wenmode.rules import Image, Link

parser = Parser([
    Link(references=False),
    Image(references=False),
])
Wenmode stores enabled rules by name, so registering another instance with the
same name replaces the previous configuration.
parser.register_rule(Link(references=False))
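The replace-by-name behavior follows naturally from keeping rules in a dictionary. The sketch below uses hypothetical ToyLink and RuleTable classes rather than the real Parser, and assumes nothing beyond a name-keyed dict.

```python
class ToyLink:
    name = 'link'

    def __init__(self, references: bool = True) -> None:
        self.references = references


class RuleTable:
    def __init__(self) -> None:
        self.rules: dict[str, object] = {}

    def register_rule(self, rule) -> None:
        # Same name means same key, so the newer instance wins.
        self.rules[rule.name] = rule


table = RuleTable()
table.register_rule(ToyLink())
table.register_rule(ToyLink(references=False))
```

After both calls the table still holds a single 'link' entry, and it is the instance configured with references=False.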
Root transforms can declare required_rules; the parser automatically
registers missing required rules when it rebuilds the rule set.
Extension state and transforms¶
Parser, rule, and transform instances should not store per-parse mutable state.
Use BlockState.store with a StateKey when a rule or transform needs shared
state for one parse.
This example collects glossary term definitions from block syntax and stores the result on the root node:
@term[HTML]: HyperText Markup Language
@term[AST]: Abstract Syntax Tree
from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Node, Root
from wenmode.rules import BlockRule, Rule
from wenmode.state import BlockState, StateKey

if TYPE_CHECKING:
    from wenmode.parser import Parser
    from wenmode.rules import RootTransform

TERMS = StateKey('my_package.terms', lambda: {})
TERM_RE = re.compile(r'^[ \t]{0,3}@term\[(?P<label>[^\]]+)]:[ \t]*(?P<title>.*)$')


class Glossary(Rule):
    def __init__(self) -> None:
        super().__init__('glossary')
        self.root_transforms: list[RootTransform] = [GlossaryTransform()]


class TermDefinition(BlockRule):
    def __init__(self) -> None:
        super().__init__('term_definition', r'[ \t]{0,3}@term\[')

    def parse(self, parser: Parser, state: BlockState, match: re.Match[str]) -> Node | None:
        term = TERM_RE.match(state.line.rstrip('\r\n'))
        if term is None:
            return None
        state.store.get(TERMS)[term.group('label')] = term.group('title')
        state.advance()
        return None


class GlossaryTransform:
    name = 'glossary'
    defer_inlines = False
    required_rules = [TermDefinition]

    def prepare(self, parser: Parser, root: Root, state: BlockState) -> None:
        pass

    def transform(self, parser: Parser, root: Root, state: BlockState) -> None:
        root.data = {'terms': dict(state.store.get(TERMS))}
StateKey.factory creates the value the first time it is requested for a parse.
Each top-level parse gets a fresh store. Nested block parsing shares the same
store, so definitions inside block quotes, lists, directives, or footnotes are
visible to document-level transforms.
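The lazy factory and the shared store can be modeled in a few lines. The ToyStateKey, ToyStore, and ToyState classes below are hypothetical re-creations for illustration, and the nested() helper is an invented name; none of this is Wenmode's own implementation.

```python
from typing import Any, Callable, Optional


class ToyStateKey:
    def __init__(self, name: str, factory: Callable[[], Any]) -> None:
        self.name = name
        self.factory = factory


class ToyStore:
    def __init__(self) -> None:
        self._values: dict[str, Any] = {}

    def get(self, key: ToyStateKey) -> Any:
        # Create the value lazily, on first request within this parse.
        if key.name not in self._values:
            self._values[key.name] = key.factory()
        return self._values[key.name]


class ToyState:
    def __init__(self, store: Optional[ToyStore] = None, depth: int = 0) -> None:
        self.store = store if store is not None else ToyStore()
        self.depth = depth

    def nested(self) -> 'ToyState':
        # Nested parsing shares the store and goes one level deeper.
        return ToyState(store=self.store, depth=self.depth + 1)


TERMS = ToyStateKey('my_package.terms', lambda: {})

first = ToyState()  # one top-level parse
child = first.nested()
child.store.get(TERMS)['HTML'] = 'HyperText Markup Language'
seen_at_top = dict(first.store.get(TERMS))

second = ToyState()  # a new top-level parse starts with an empty store
fresh = dict(second.store.get(TERMS))
```

The definition written through the nested state is visible from the top-level state, while the second parse sees an empty dictionary because its store has never created the value.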
Root transforms follow the RootTransform protocol. Custom transforms do not
need to inherit from it; use the protocol for type annotations when you want
type checkers to validate the transform shape. Each transform must provide:
name, used for deduplication when multiple rules attach the same transform.
required_rules, a sequence of rule classes or configured rule instances to auto-register.
defer_inlines, which tells the parser whether inline parsing must wait until after prepare().
prepare(parser, root, state), run after block parsing.
transform(parser, root, state), run after deferred inline parsing is resolved.
Set defer_inlines = True only when inline rules need document-wide state
collected by prepare(), such as reference-style links. Rule sets with deferred
inlines cannot be used with streaming output.
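Structural typing is what lets a transform satisfy the protocol without inheriting from it. The sketch below defines a look-alike protocol named RootTransformShape and a toy driver; both are assumptions for illustration, and running every prepare() before every transform() is a guess about ordering rather than documented behavior.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class RootTransformShape(Protocol):
    name: str
    required_rules: Sequence[object]
    defer_inlines: bool

    def prepare(self, parser, root, state) -> None: ...
    def transform(self, parser, root, state) -> None: ...


class CountBlocks:
    """Matches the protocol by shape alone; no inheritance involved."""

    name = 'count_blocks'
    required_rules: Sequence[object] = ()
    defer_inlines = False

    def prepare(self, parser, root, state) -> None:
        state['calls'].append('prepare')

    def transform(self, parser, root, state) -> None:
        state['calls'].append('transform')
        root['data'] = {'blocks': len(root['children'])}


def run_transforms(transforms, parser, root, state) -> None:
    # prepare() runs after block parsing, deferred inline parsing (if any)
    # comes next, and transform() runs last.
    for item in transforms:
        item.prepare(parser, root, state)
    if any(item.defer_inlines for item in transforms):
        state['calls'].append('deferred_inlines')
    for item in transforms:
        item.transform(parser, root, state)


root = {'type': 'root', 'children': [{'type': 'paragraph'}]}
state = {'calls': []}
run_transforms([CountBlocks()], None, root, state)
conforms = isinstance(CountBlocks(), RootTransformShape)
```

Because CountBlocks sets defer_inlines to False, the deferred step is skipped and the recorded call order is prepare followed by transform.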
Testing a rule¶
Test custom rules at the parser and renderer boundary. A small rule usually needs these cases:
recognized syntax renders as expected,
unmatched or incomplete syntax stays as text,
nested inline parsing works if the rule calls parser.parse_inlines(),
custom node renderers are registered for each output format you support,
any rule option changes the enabled behavior,
per-parse state does not leak between parser calls.
from wenmode import HTMLRenderer, Parser
from wenmode.rules import Emphasis

# PlusMark is the inline rule defined in the "Custom inline rule" section.


def render(markdown: str) -> str:
    parser = Parser([PlusMark, Emphasis])
    return HTMLRenderer().render(parser.parse(markdown))


EXPECTED_MARKED = '''
<p><mark>a <em>b</em></mark></p>
'''
EXPECTED_OPEN = '''
<p>++open</p>
'''


def test_plus_mark() -> None:
    marked = '++a *b*++'
    open_marker = '++open'
    assert render(marked) == EXPECTED_MARKED.lstrip()
    assert render(open_marker) == EXPECTED_OPEN.lstrip()