[[PageOutline(2-3)]]

= Wiki Engine Refactoring: Split Parser and Formatter Stages =

The current Trac Wiki Engine, while working quite well, suffers from a few fundamental drawbacks:
 - parsing and formatting are done in a single step; this makes it hard to examine the inner structure of a Wiki text, as a new parser/formatter has to be written by subclassing an existing one
 - the code is quite complex, contains numerous (only?) special cases, and as a result is not easy to maintain

Because of that, a parser/formatter split is in order. This has to be done carefully, however, as the Wiki engine is one of the "vital pieces" of Trac.

''We first outline the major requirements, then briefly explain the envisioned implementation of the new engine. Finally we present and discuss the major open issues for the Wiki system, which provide concrete use cases that must be addressed by the new design.''

== Requirements ==

 1. '''Compatibility''' [[br]]
    The existing Trac wiki syntax should continue to work the way it always has (well, modulo the bug fixes). Pages that used to "look good" with the former engine should look the same.
 2. '''Flexibility''' [[br]]
    The Trac wiki engine is a modular one, with the possibility for plugins to inject their own Wiki syntax (IWikiSyntaxProvider). This flexibility should remain and even be augmented:
    1. Possibility to extend the parser, by adding new "tokens"
    2. Possibility to extend the existing formatters, in order to have the new tokens transformed by existing formatters
    3. Possibility to easily create new formatters, to address new needs
 3. '''Speed''' [[br]]
    The use of wiki text is ubiquitous in Trac, so it should be very fast to parse a wiki text and apply one or more formatters to it.
 4. '''Maintainability''' [[br]]
    The wiki engine must be easy to understand. OK, the regexps can still be scary, in order to satisfy requirement 3.
== Implementation Notes ==

I've chosen a tree-based approach, in which the wiki text is first converted to a Wiki DOM tree, consisting of instances of subclasses of !WikiDomNode. This tree can then in turn be transformed into some other output format, like Genshi Element nodes or events. This is the approach outlined in #4431.

//More specifically, the !WikiDomNode class hierarchy is used to model the inheritance relations between the various types of nodes. This makes it possible to set up ''embedding rules'', specifying which nodes can be placed into which others. Therefore the structure of the document and its consistency can be ensured.// not sure about this anymore (cboos)

The inheritance relationships are also quite useful at the formatter level, as they make it easy to specify some specific handling for a group of node types. More importantly, if a new type of node is added (say by an `IWikiSyntaxProvider`), the existing formatters can also be augmented with custom rendering callbacks for those new nodes. If some formatters aren't updated (say because there are special formatters contributed by some unrelated plugins), they'll still be able to handle those new nodes based on their parent type.

Some early ideas are visible in [[.@6#Implementation|version 6]] of this document.

The parsing logic is described in VerticalHorizontalParsing.

=== The !WikiSystem ===

It contains the extension point for the `IWikiSyntaxProvider`s. It also serves as a registry for the various formatters, according to their ''flavor''.

=== The Parser ===

 !WikiParser Component::
   The component is used to incorporate syntax additions and rules provided by the `IWikiSyntaxProvider`s and to maintain the environment-global state of the parser.
 !WikiDocument::
   A document corresponds to the parsing of a given wiki text. It contains a Wiki DOM tree.
 !WikiDomNode::
   A node in a Wiki DOM tree. Each node can have child nodes, and other properties depending on its type.
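
The fallback to the parent type described in the implementation notes above can be illustrated with a minimal sketch. Only `WikiDomNode` is a name taken from this page; the node subclasses (`Inline`, `Text`, `Bold`, `Block`, `Paragraph`, `Smiley`), the `Formatter` class and its `register`/`render`/`render_children` methods are hypothetical names chosen for illustration, and the output is a plain string rather than Genshi events. It is not the actual implementation, just one way the lookup of the most specific callback could work:

```python
from html import escape


class WikiDomNode:
    """Base class for nodes of a (hypothetical) Wiki DOM tree."""

    def __init__(self, *children, text=""):
        self.children = list(children)
        self.text = text


# Hypothetical node hierarchy; the real set of node types is not defined here.
class Inline(WikiDomNode): pass
class Text(Inline): pass
class Bold(Inline): pass
class Block(WikiDomNode): pass
class Paragraph(Block): pass
class Smiley(Inline): pass   # imagine this one being contributed by a plugin


class Formatter:
    """Renders a tree by calling the most specific callback for each node type."""

    def __init__(self):
        self.callbacks = {}

    def register(self, node_type, callback):
        self.callbacks[node_type] = callback

    def render(self, node):
        # Walk the class hierarchy from the node's own type up to the base
        # class, so an unknown subclass falls back to its parent's callback.
        for cls in type(node).__mro__:
            callback = self.callbacks.get(cls)
            if callback is not None:
                return callback(self, node)
        raise LookupError("no callback for %s" % type(node).__name__)

    def render_children(self, node):
        return "".join(self.render(child) for child in node.children)


html = Formatter()
html.register(WikiDomNode, lambda f, n: f.render_children(n))
html.register(Inline, lambda f, n: escape(n.text) + f.render_children(n))
html.register(Text, lambda f, n: escape(n.text))
html.register(Bold, lambda f, n: "<strong>%s</strong>" % f.render_children(n))
html.register(Paragraph, lambda f, n: "<p>%s</p>" % f.render_children(n))

tree = Paragraph(Text(text="Hello "), Bold(Text(text="wiki")), Smiley(text=" :-)"))

# The formatter knows nothing about Smiley, so the Inline callback is used.
before = html.render(tree)

# Later, the plugin augments the existing formatter with a specific callback.
html.register(Smiley, lambda f, n: '<img alt="%s"/>' % escape(n.text.strip()))
after = html.render(tree)

print(before)  # <p>Hello <strong>wiki</strong> :-)</p>
print(after)   # <p>Hello <strong>wiki</strong><img alt=":-)"/></p>
```

In this sketch a formatter that was never updated still produces sensible output for the new node type, while an updated one gets the specialized rendering, which is the behavior the notes above ask for.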
=== The Formatters ===

 !WikiFormatter Component::
   The component is used to register the specific formatting rules that the `IWikiSyntaxProvider`s may offer.
 (format)Formatter::
   Individual formatters are transient objects used for traversing a Wiki DOM tree and generating some kind of output.

== Addressing Current Issues ==

Here we list some of the most notable open issues in the Trac ticket database concerning the Wiki component, and discuss for each of them how it would be handled in the proposed solution.

=== Parser Issues ===

==== #2048: Wiki macros for generating `
elements are collapsed, because the corresponding output is wrapped in a Markup instance. This can be solved by having the !HtmlFormatter output Genshi events or Elements.

==== #3232: Wiki syntax should be enforced (for ticket comments) ====

In various situations, the formatted HTML is not ''valid'', as it contains opening tags with no corresponding closing tags. If the formatter produces structured content in a regular way (e.g. by a recursive descent traversal of the parse tree), this problem can be avoided completely.

==== #3794: Invalid table(with indentation) layout in wiki. ====

An example of the usual kind of "buglets" that affect the Wiki engine: the formatting that is produced for a given wiki text doesn't look "right". Fixing such issues is currently not very convenient, because everything is done at once. Being able to focus on the logic of either the parser or the formatter would help tremendously with this kind of issue.

Other similar tickets: #1936, #3335, #4790 ...

== Comments ==

Not sure what your specific plans are, but one of the docutils developers at !PyCon illustrated fairly convincingly why calling ''enter''/''leave'' methods for each node in the tree, when "parsing", is bad, and why a visitor pattern is good. Something to consider. (AlecThomas)

For the formatting phase, I use something that could be called a "typed" hierarchical visitor pattern. You start with the root of the parse tree, and the formatter simply `render`s the root node, by calling the most specific callback registered for the type of that node. That callback can in turn call the `render` method on child nodes, and the process repeats. So in effect you find there both the ''hierarchical navigation'' notion (as the visitor has full access to the tree) and the ''conditional navigation'' notion (as the visitor decides whether it's worth recursing or not) of the c2Wiki:HierarchicalVisitorPattern.
But this will become clearer with some code :-) [[br]]
-- cboos

----
See also: [query:?component=wiki&status=!closed Wiki related tickets], [../VerticalHorizontalParsing]