[[PageOutline(2-3)]]

= Wiki Engine Refactoring: Split Parser and Formatter Stages =

The current Trac Wiki Engine, while working quite well, suffers from a few fundamental drawbacks:
 - parsing and formatting are done in a single step; this makes it hard to examine the inner structure of a wiki text, as a new parser/formatter has to be written by subclassing an existing one
 - the code is quite complex, contains numerous (only?) special cases and, as a result, is not easy to maintain

Because of that, a parser/formatter split was in order. This has to be done carefully, however, as the Wiki engine is one of the vital pieces of Trac.

''We first outline the major requirements, then briefly explain the envisioned implementation of the new engine. Finally, we present and discuss the major open issues for the Wiki system, which provide concrete use cases that must be addressed by the new design.''

== Requirements ==

 1. '''Compatibility''' [[br]]
    The existing Trac wiki syntax should continue to work the way it always has (well, modulo the bug fixes). Pages that used to "look good" with the former engine should look the same.
 2. '''Flexibility''' [[br]]
    The Trac wiki engine is a modular one, with the possibility for plugins to inject their own Wiki syntax (`IWikiSyntaxProvider`). This flexibility should remain and even be augmented:
    1. Possibility to extend the parser, by adding new "tokens"
    2. Possibility to extend the existing formatters, in order to have the new tokens transformed by existing formatters
    3. Possibility to easily create new formatters, to address new needs
 3. '''Speed''' [[br]]
    The use of wiki text is ubiquitous in Trac, so it should be very fast to parse a wiki text and apply one or several formatters to it.
 4. '''Maintainability''' [[br]]
    The wiki engine must be easy to understand. OK, the regexps may still be scary, in order to satisfy requirement 3.

== Implementation ==

I've chosen a tree-based approach, in which the wiki text is first converted to a Wiki DOM tree, consisting of instances of subclasses of !WikiDomNode. This tree can then in turn be transformed into some other output format, like Genshi Element nodes or events. This is the approach outlined in #4431.

More specifically, the !WikiDomNode class hierarchy is used to model the inheritance relations between the various types of nodes. This makes it possible to set up ''embedding rules'', specifying which nodes can be placed inside which others. Therefore the structure of the document and its consistency can be ensured.

The inheritance relationships are also quite useful at the formatter level, as they make it easy to specify some specific handling for a whole group of node types. More importantly, if a new type of node is added (say by an `IWikiSyntaxProvider`), the existing formatters can also be augmented by custom rendering callbacks for those new nodes. If some formatters aren't updated (say because they are special formatters contributed by some unrelated plugins), they'll still be able to handle those new nodes based on their parent type.

As a snapshot of the work in progress, here are the new interfaces:
 - `IWikiSyntaxProvider`, for providing new syntax. [[br]]
   The default Trac syntax is for the most part defined by syntax provider components. The interface remains compatible with Trac [milestone:0.10] (lesson learned ;-) )
 - `IWikiFormatterContributor`, for providing new or extending existing formatting flavors. [[br]]
   The default Trac formatters are entirely provided by formatter contributors. [[br]]
   (''new interface'')

{{{
#!python
from trac.core import Interface


class IWikiSyntaxProvider(Interface):

    def get_wiki_syntax():
        """Return an iterable that provides additional wiki syntax.

        A new syntax rule is a `(priority, regexp, callback)` triple.

        The `priority` is used to globally order the rules.
        The `regexp` must be of the form `"(?P<...>...)"`, where `<...>`
        contains a globally unique name that will be used to identify
        the rule.
        The `callback` expects a `parser` argument of type `WikiParser`,
        and will use it to participate in the construction of the
        Wiki DOM tree.

        Before 0.11, the additional wiki syntax corresponded simply to a
        `(regexp, callback)` pair. The priority of such rules is set to
        `50`, which introduces them after the simple text style rules and
        before the other ones, as they used to be.
        However, the old `callback` function was of the form
        `handler(formatter, ns, match, fullmatch)`, i.e. it was actually
        expected to take care of the ''formatting'' as well.
        """

    def get_link_resolvers():
        """Generate new handlers for new TracLinks prefixes.

        A handler is a `(namespace, callback)` pair, where the `callback`
        is a function expecting a `parser` argument (the `WikiParser`
        instance currently driving the parsing).

        Before 0.11, the generated values were `(namespace, formatter)`
        pairs, where the `formatter` was a function of the form
        `fmt(formatter, ns, target, label)`, and would return some HTML
        fragment. The `label` is already HTML escaped, whereas the
        `target` is not.
        """

    def get_valid_parents():
        """Generate rules for element embedding.

        Each rule is of the form `(nodetype, valid_parent_nodetypes)`,
        which means that an instance node of the given `nodetype` type
        can be parented in instance nodes of the
        `valid_parent_nodetypes` types.

        Example: `(Inline, Anchor)` means that any `Inline` node can be
        set to be a child of an `Anchor` node.

        (`valid_parent_nodetypes` can be a single class or a tuple of
        classes, all subclasses of `WikiDomNode`)
        """


class IWikiFormatterContributor(Interface):

    def get_wiki_formatters():
        """Generate `(flavor, nodetype, formatter_callback)` triples.

        This enables the wiki system to register the `formatter_callback`
        for handling `nodetype` nodes when rendering a parse tree to the
        specified `flavor` kind of output.

        If, for a given node, there's no callback registered directly for
        its class, a callback registered for one of its ancestor classes
        will be used, following the method resolution order.

        The `formatter_callback` itself is a function accepting a
        `Formatter` argument and the `WikiDomNode` instance currently
        being rendered, and must return a result which is appropriate
        for the kind of formatting being done.
        """
}}}

The main utility functions for parsing and formatting wiki text will be:

{{{
#!python
from trac.wiki.api import WikiSystem

# -- Utility functions

def parse_wiki(ctx, wikitext):
    """Parse `wikitext` and produce a Wiki DOM tree.

    Return the `WikiDocument` root node of that tree.
    """
    return WikiSystem(ctx.env).parse(wikitext)

def format_to(flavor, ctx, wikidom, **kwargs):
    """Format to `flavor` the given `wikidom` parsed wiki text,
    in the given `ctx`.

    `wikidom` can be simply text instead of a Wiki DOM tree,
    and it will be parsed on the fly.
    """
    return WikiSystem(ctx.env).format(ctx, wikidom, flavor, **kwargs)

def format_to_html(ctx, wikidom, escape_newlines=False):
    """Format `wikidom` using the 'default' flavor."""
    return WikiSystem(ctx.env).format(ctx, wikidom, 'default',
                                      escape_newlines=escape_newlines)

# etc.
}}}

''implementation details to be updated as the code progresses''

=== The !WikiSystem ===

It contains the extension point for the `IWikiSyntaxProvider`s. It also serves as a registry for the various formatters, according to their ''flavor''.

=== The Parser ===

 !WikiParser Component::
   The component is used to incorporate syntax additions and rules provided by the `IWikiSyntaxProvider`s and to maintain the environment-global state of the parser.
 !WikiEngine::
   An engine is a transient object whose purpose is to maintain some state during the parsing of a given wiki text, and to produce a Wiki DOM tree.
 !WikiDomNode::
   A node in a Wiki DOM tree. Each node can have child nodes, and other properties depending on its type.
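
As a rough illustration of the ideas in this section, here is a minimal, self-contained sketch (plain Python 3, independent of Trac) of how embedding rules and the ancestor-based callback lookup could fit together. Apart from `WikiDomNode`, `WikiDocument`, `Inline` and `Anchor`, which are named above, everything in it is hypothetical and made up for the example: the other node classes, the `VALID_PARENTS` table, `can_embed`, `append_child`, the toy `Formatter` and the `TEXT_CALLBACKS` registry. It is not the actual implementation.

```python
class WikiDomNode:
    """Base class of all nodes; each node keeps a list of children."""
    def __init__(self):
        self.children = []

class WikiDocument(WikiDomNode): pass
class Block(WikiDomNode): pass
class Paragraph(Block): pass
class Inline(WikiDomNode): pass
class Bold(Inline): pass
class Anchor(Inline): pass

class Text(Inline):
    def __init__(self, text):
        super().__init__()
        self.text = text

# Rules in the `(nodetype, valid_parent_nodetypes)` form described for
# `get_valid_parents()`; this particular table is an assumption.
VALID_PARENTS = [
    (Block, WikiDocument),
    (Inline, (Paragraph, Bold, Anchor)),
]

def can_embed(child_type, parent_type):
    """True if some rule allows `child_type` nodes inside `parent_type`."""
    return any(issubclass(child_type, nodetype) and
               issubclass(parent_type, parents)
               for nodetype, parents in VALID_PARENTS)

def append_child(parent, child):
    """Attach `child` to `parent`, enforcing the embedding rules."""
    if not can_embed(type(child), type(parent)):
        raise ValueError("%s cannot be placed in %s" %
                         (type(child).__name__, type(parent).__name__))
    parent.children.append(child)
    return child

class Formatter:
    """Toy formatter: picks the most specific callback for a node."""
    def __init__(self, callbacks):
        self.callbacks = callbacks  # {nodetype: callback(formatter, node)}

    def render(self, node):
        # Walk the method resolution order, most specific class first.
        for cls in type(node).__mro__:
            callback = self.callbacks.get(cls)
            if callback is not None:
                return callback(self, node)
        raise LookupError("no callback for %s" % type(node).__name__)

    def render_children(self, node):
        return "".join(self.render(child) for child in node.children)

# Callbacks for a made-up plain text flavor. `Anchor` has no entry, so
# it falls back to the generic `WikiDomNode` callback.
TEXT_CALLBACKS = {
    WikiDomNode: lambda fmt, node: fmt.render_children(node),
    Text: lambda fmt, node: node.text,
    Bold: lambda fmt, node: "*" + fmt.render_children(node) + "*",
}

doc = WikiDocument()
para = append_child(doc, Paragraph())
append_child(para, Text("Hello "))
bold = append_child(para, Bold())
append_child(bold, Text("world"))
link = append_child(para, Anchor())
append_child(link, Text("!"))

print(Formatter(TEXT_CALLBACKS).render(doc))  # prints: Hello *world*!
```

In this sketch, trying `append_child(bold, Paragraph())` raises a `ValueError`, since no rule allows a `Block` inside an `Inline` node. Because each callback decides whether to call `render_children`, a formatter can also skip whole subtrees.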

=== The Formatters ===

 !WikiFormatter Component::
   The component is used to register the specific formatting rules that the `IWikiSyntaxProvider`s may offer.
 (format)Formatter::
   Individual formatters are transient objects used for traversing a Wiki DOM tree and generating some kind of output.

== Addressing Current Issues ==

Here we list some of the most outstanding issues in the Trac ticket database concerning the Wiki component, and we discuss for each of them how they would be handled in the proposed solution.

=== Parser Issues ===

==== #2048: Wiki macros for generating `
elements are collapsed, because the corresponding output is wrapped in a Markup instance. This can be solved by having the !HtmlFormatter output Genshi events or Elements.

==== #3232: Wiki syntax should be enforced (for ticket comments) ====

In various situations, the formatted HTML is not ''valid'', as it contains opening tags with no corresponding closing tags. If the formatter produces structured content in a regular way (e.g. by a recursive descent traversal of the parse tree), this problem can be avoided completely.

==== #3794: Invalid table(with indentation) layout in wiki. ====

An example of the usual kind of "buglets" that affect the Wiki engine: the formatting that is produced for a given wiki text doesn't look "right". Fixing such issues is currently not very convenient, because everything is done at once. Being able to focus on the logic of either the parser or the formatter would help tremendously with this kind of issue.

Other similar tickets: #1936, #3335, #4790 ...

== Comments ==

Not sure what your specific plans are, but one of the docutils developers at !PyCon illustrated fairly convincingly why calling ''enter''/''leave'' methods for each node in the tree, when "parsing", is bad, and why a visitor pattern is good. Something to consider (AlecThomas)

For the formatting phase, I use something that could be called a "typed" hierarchical visitor pattern. You start with the root of the parse tree, and the formatter simply `render`s the root node, by calling the most specific callback registered for the type of that node. That callback can in turn call the `render` method on child nodes, and the process repeats. So in effect you find there the ''hierarchical navigation'' notion (as the visitor has full access to the tree) and the ''conditional navigation'' notion (as the visitor decides whether it's worth recursing or not) of the c2Wiki:HierarchicalVisitorPattern.
But this will become clearer with some code :-) [[br]]
-- cboos

----
See also: [query:?component=wiki&status=!closed Wiki related tickets], [../VerticalHorizontalParsing]