(custom-rules)=

# Custom rules

```{rst-class} lead
Create custom inline, block, continuation, and transform rules for new Markdown syntax.
```

---

Create a custom rule when you want Wenmode to recognize syntax that is not part of an existing preset. Custom rules use the same base classes as built-in rules: `InlineRule`, `BlockRule`, `ContinueRule`, and plain `Rule` for extensions that only provide root transforms.

## Rule API

Every rule has a stable `name`. Parser rule names are used as dictionary keys in `parser.rules`, and block rule names are also used as regex group names when the parser compiles block openers, so use snake_case identifier-style names.

Rules also have an `order` class attribute. Block and inline rules default to `order = 100`; lower values run earlier when syntax overlaps.

```python
class MyRule(InlineRule):
    order = 90
```

`Parser` and `Wenmode` accept either rule classes or configured rule instances. Classes are instantiated automatically. Instances are useful for rules with options.

## Custom inline rule

An inline rule inherits from `InlineRule`, defines a regex pattern, and returns a node plus the index where parsing should resume.

This example parses `++marked++` as a `Mark` node. The target Markdown syntax is:

```markdown
This is ++marked++ text.
```

```python
from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Mark as MarkNode
from wenmode.nodes import Node
from wenmode.rules import InlineRule
from wenmode.state import BlockState

if TYPE_CHECKING:
    from wenmode.parser import Parser


class PlusMark(InlineRule):
    def __init__(self) -> None:
        super().__init__('plus_mark', r'\+\+', '+')

    def parse(
        self,
        parser: Parser,
        text: str,
        match: re.Match[str],
        state: BlockState | None = None,
    ) -> tuple[Node | None, int]:
        end = text.find('++', match.end())
        if end == -1:
            return None, match.start()
        children = parser.parse_inlines(text[match.end() : end], state)
        return MarkNode(children=children), end + 2
```

Register it like any other rule.

```python
from wenmode import Wenmode
from wenmode.rules import Emphasis

wenmode = Wenmode([PlusMark, Emphasis])
text = '++very *important*++'
expected = '''
<p><mark>very <em>important</em></mark></p>
'''
assert wenmode.render(text) == expected.lstrip()
```

The third `InlineRule` constructor argument is `trigger_chars`. Use it when a rule starts with known literal characters. The parser then jumps directly to those characters and calls `rule.compiled.match()` there. If `trigger_chars` is empty, the parser calls `rule.search(text, pos)`; override `search()` for rules that need custom scanning behavior.

If an inline rule decides not to handle a match, return `(None, match.start())`. The parser will emit the marker as text and continue.

## Custom node and renderers

Use a custom node when your syntax does not map to one of Wenmode's built-in node types. The parser only creates the node; each renderer still needs to know how to serialize it.

This example parses reStructuredText keyboard roles such as `` :kbd:`Ctrl+C` ``. Markdown has no native keyboard-input node, so the example creates a `KeyboardInput` node and registers renderers for HTML, Markdown, and reStructuredText output.

```python
import re

from wenmode import HTMLRenderer, MarkdownRenderer, Parser, RSTRenderer
from wenmode.nodes import Literal, Node
from wenmode.renderers import RenderContext
from wenmode.rules import InlineRule
from wenmode.state import BlockState


class KeyboardInput(Literal):
    def __init__(self, value: str) -> None:
        super().__init__(type='keyboardInput', value=value)


class KeyboardInputRole(InlineRule):
    def __init__(self) -> None:
        super().__init__('keyboard_input_role', r':kbd:`', ':')

    def parse(
        self,
        parser: Parser,
        text: str,
        match: re.Match[str],
        state: BlockState | None = None,
    ) -> tuple[Node | None, int]:
        end = text.find('`', match.end())
        if end == -1:
            return None, match.start()
        return KeyboardInput(value=text[match.end() : end]), end + 1


@HTMLRenderer.register('keyboardInput')
def render_keyboard_input_html(
    renderer: HTMLRenderer,
    node: KeyboardInput,
    context: RenderContext,
) -> str:
    return f'<kbd>{renderer.escape_html(node.value)}</kbd>'


@MarkdownRenderer.register('keyboardInput')
def render_keyboard_input_markdown(
    renderer: MarkdownRenderer,
    node: KeyboardInput,
    context: RenderContext,
) -> str:
    return f'<kbd>{renderer.escape_text(node.value)}</kbd>'


@RSTRenderer.register('keyboardInput')
def render_keyboard_input_rst(
    renderer: RSTRenderer,
    node: KeyboardInput,
    context: RenderContext,
) -> str:
    return f':kbd:`{renderer.escape_inline_literal(node.value)}`'


parser = Parser([KeyboardInputRole])
root = parser.parse('Press :kbd:`Ctrl+C`.')

expected_html = '''
<p>Press <kbd>Ctrl+C</kbd>.</p>
'''
expected_markdown = r'''
Press <kbd>Ctrl\+C</kbd>\.
'''
expected_rst = '''
Press :kbd:`Ctrl+C`.
'''

assert root.to_ast() == {
    'type': 'root',
    'children': [
        {
            'type': 'paragraph',
            'children': [
                {'type': 'text', 'value': 'Press '},
                {'type': 'keyboardInput', 'value': 'Ctrl+C'},
                {'type': 'text', 'value': '.'},
            ],
        }
    ],
}
assert HTMLRenderer().render(root) == expected_html.lstrip()
assert MarkdownRenderer().render(root) == expected_markdown.lstrip()
assert RSTRenderer().render(root) == expected_rst.lstrip()
```

Registering a handler mutates the renderer class, so do it during application startup or in the module that defines the extension. Without a registered handler, `BaseRenderer` falls back to rendering `children` or `value`; that may be useful for plain-text fallbacks, but it will not preserve the semantics of your custom output format.

## Custom block rule

A block rule inherits from `BlockRule`. The constructor passes a rule name and a block opener pattern. The parser wraps each block opener as a named regex group, matches the current line, and calls the matched rule's `parse()` method.

The `parse()` method receives the parser, the current `BlockState`, and the matched opener. It must advance `state` when it consumes input.

This example treats a line that starts with `!` as a paragraph after removing the marker:

```markdown
! Pay attention to *this*.
```

```python
from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Node, Paragraph
from wenmode.rules import BlockRule
from wenmode.state import BlockState

if TYPE_CHECKING:
    from wenmode.parser import Parser


class BangParagraph(BlockRule):
    def __init__(self) -> None:
        super().__init__('bang_paragraph', r'[ \t]{0,3}!')

    def parse(self, parser: Parser, state: BlockState, match: re.Match[str]) -> Node | None:
        text = state.line.lstrip(' \t!').rstrip('\r\n')
        state.advance()
        return Paragraph(children=parser.parse_inlines(text, state))
```

If the rule decides not to handle a matched opener, return `None` without advancing. The parser will fall back to paragraph parsing for that line.

Use `parser.parse_blocks(text, parent_state=state)` when your block rule contains nested Markdown content. Nested parsing shares the same state store and increments `state.depth`.

## Custom continuation rule

A continuation rule inherits from `ContinueRule` and implements `parse_paragraph_continuation()`. It receives the paragraph lines collected so far and may return a replacement node. Setext headings and definition lists are implemented this way.

This example turns a paragraph followed by a line of `!` markers into a level 6 heading:

```markdown
Important title
!!!
```

```python
from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Heading, Node
from wenmode.rules import ContinueRule
from wenmode.state import BlockState

if TYPE_CHECKING:
    from wenmode.parser import Parser

BANG_HEADING_RE = re.compile(r'[ \t]{0,3}!+[ \t]*$')


class BangHeading(ContinueRule):
    def __init__(self) -> None:
        super().__init__('bang_heading')

    def matches(self, line: str) -> bool:
        return line.lstrip(' \t').startswith('!')

    def parse_paragraph_continuation(
        self,
        parser: Parser,
        state: BlockState,
        lines: list[str],
    ) -> Node | None:
        if BANG_HEADING_RE.match(state.line) is None:
            return None
        state.advance()
        text = ''.join(lines).strip()
        return Heading(depth=6, children=parser.parse_inlines(text, state))
```

`matches()` is optional, but it is useful as a cheap pre-check before doing more expensive parsing.

## Rule options

Use configured rule instances when your rule has options.

```python
from wenmode import Parser
from wenmode.rules import Image, Link

parser = Parser([
    Link(references=False),
    Image(references=False),
])
```

Wenmode stores enabled rules by `name`, so registering another instance with the same name replaces the previous configuration.

```python
parser.register_rule(Link(references=False))
```

Root transforms can declare `required_rules`; the parser automatically registers missing required rules when it rebuilds the rule set.

## Extension state and transforms

Parser, rule, and transform instances should not store per-parse mutable state. Use `BlockState.store` with a `StateKey` when a rule or transform needs shared state for one parse.

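To illustrate why per-parse state belongs in a store rather than on the rule instance, here is a minimal, library-independent sketch. `SketchKey`, `SketchStore`, `LeakyRule`, `StoreRule`, and `run_parse` are hypothetical names invented for this illustration. They are not Wenmode's `StateKey` or `BlockState.store`, and they make no claim about how those are implemented; the only detail borrowed from Wenmode is the idea of a key with a name and a default factory.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class SketchKey:
    """Hypothetical stand-in for a state key: a unique name plus a default factory."""

    name: str
    factory: Callable[[], Any]


@dataclass
class SketchStore:
    """Hypothetical stand-in for a per-parse store; a new one is made for each parse."""

    values: dict[str, Any] = field(default_factory=dict)

    def get(self, key: SketchKey) -> Any:
        # Build the default lazily so every parse starts from a fresh value.
        if key.name not in self.values:
            self.values[key.name] = key.factory()
        return self.values[key.name]


TERMS = SketchKey('my_package.terms', dict)


class LeakyRule:
    """Anti-pattern: the rule instance accumulates state across parses."""

    def __init__(self) -> None:
        self.terms: dict[str, str] = {}

    def record(self, store: SketchStore, term: str, text: str) -> None:
        self.terms[term] = text  # ignores the store and mutates the instance


class StoreRule:
    """Preferred: the rule writes into the store owned by the current parse."""

    def record(self, store: SketchStore, term: str, text: str) -> None:
        store.get(TERMS)[term] = text


def run_parse(rule: LeakyRule | StoreRule, definitions: dict[str, str]) -> SketchStore:
    # Simulate one parse: create a fresh store, then let the rule record each term.
    store = SketchStore()
    for term, text in definitions.items():
        rule.record(store, term, text)
    return store


leaky, clean = LeakyRule(), StoreRule()
run_parse(leaky, {'HTML': 'HyperText Markup Language'})
run_parse(leaky, {'AST': 'Abstract Syntax Tree'})
first = run_parse(clean, {'HTML': 'HyperText Markup Language'})
second = run_parse(clean, {'AST': 'Abstract Syntax Tree'})

print(sorted(leaky.terms))        # ['AST', 'HTML']
print(sorted(second.get(TERMS)))  # ['AST']
```

The reused `LeakyRule` instance still remembers the term from the first parse, while the store-backed rule sees only the terms from the current parse. That isolation is the reason to keep mutable data in a per-parse store.
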
This example collects glossary term definitions from block syntax and stores the result on the root node:

```markdown
@term[HTML]: HyperText Markup Language
@term[AST]: Abstract Syntax Tree
```

```python
from __future__ import annotations

import re
from typing import TYPE_CHECKING

from wenmode.nodes import Node, Root
from wenmode.rules import BlockRule, Rule
from wenmode.state import BlockState, StateKey

if TYPE_CHECKING:
    from wenmode.parser import Parser
    from wenmode.rules import RootTransform

TERMS = StateKey('my_package.terms', lambda: {})
TERM_RE = re.compile(r'^[ \t]{0,3}@term\[(?P