| 21 | |
| 22 | == Parsing Overview |
| 23 | |
| 24 | Here's a very rough outline: |
| 25 | * `parse_vertical` |
| 26 | - prepares !WikiDocument (W) |
| 27 | - preprocess (r9868), split text into lines |
| 28 | - `parse_blocks` - get a tree of the `{{{` / `}}}` delimited blocks (B) and the spans between them (Raw); at this stage, the root document (W) and each (B) contains a list of (B|Raw) nodes |
| 29 | - for each wiki block (i.e. (W) and each (B) containing wiki text) |
| 30 | - `parse_raw_text` - each top-level (Raw) node will be scanned for structural ("vertical") markup; for each line: |
| 31 | * detect verbatim text ({{{`}}}...{{{`}}} and `{{{`...`}}}` sequences); remember the verbatim spans for that line and escape them (replaced by 'X') |
| 32 | * match vertical patterns which can result in: |
| 33 | - (I)tem node (`- * 1.` etc.) |
| 34 | - (D)efinition list node (`... :: ...`) |
| 35 | - (Q)uote node (leading space) |
| 36 | - (C)itation node (`>+`) |
| 37 | - (Row) node (`|| ... ||`) |
| 38 | - ... |
| 39 | - if nothing matches, this is a plain (T)ext node |
| 40 | * at the end, this collection of (S)tructural nodes replaces the (Raw) node |
| 41 | - `assemble_nodes` - each node has an indentation level, given by the position of the first non-space character in its starting line; this information enables us to re-arrange the list of (B) and (S) nodes according to a logical nesting determined by the indentation |
| 42 | * `parse_horizontal` - each node in the previous tree will be split further, according to inline ("horizontal") markup |
| 43 | - some nodes won't have any text content to process |
| 44 | - some will have two text parts (D) or more (Row) |
| 45 | - it may well be that some markup will need to be processed recursively (e.g. `[=#anchor ''this was already explained above'']`) |
| 46 | - macros could at this stage expand the tree as it's being built (e.g. via a new `IWikiMacroProvider.parse_macro` method) |
| 47 | |
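The two tree-building steps of the vertical pass can be sketched as follows. This is only a rough illustration under simplifying assumptions, not the actual implementation: the `Block` and `Node` classes and the `indent_of` helper are hypothetical names, a `{{{` or `}}}` delimiter is only recognized when it stands alone on its line, and the vertical pattern matching itself is skipped (every line is treated as a plain node).

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Block:
    """A (B) node: holds nested Block nodes and (Raw) spans (lists of lines)."""
    children: List[Union["Block", List[str]]] = field(default_factory=list)


def parse_blocks(lines):
    """Build the tree of `{{{` / `}}}` blocks and the (Raw) spans between them.

    Assumption: a delimiter only counts when it is alone on its line.
    """
    root = Block()          # stands in for the root document (W)
    stack = [root]          # currently open blocks, innermost last
    for line in lines:
        stripped = line.strip()
        if stripped == "{{{":
            block = Block()
            stack[-1].children.append(block)
            stack.append(block)
        elif stripped == "}}}" and len(stack) > 1:
            stack.pop()
        else:
            siblings = stack[-1].children
            if not siblings or not isinstance(siblings[-1], list):
                siblings.append([])      # start a new (Raw) span
            siblings[-1].append(line)
    return root


@dataclass
class Node:
    """A structural (S) node together with its indentation level."""
    indent: int
    text: str
    children: List["Node"] = field(default_factory=list)


def indent_of(line):
    """Position of the first non-space character in the line."""
    return len(line) - len(line.lstrip(" "))


def assemble_nodes(lines):
    """Nest each node under the closest preceding node with a smaller indent."""
    root = Node(indent=-1, text="")      # sentinel, never popped
    stack = [root]
    for line in lines:
        if not line.strip():
            continue                     # blank lines carry no indentation
        node = Node(indent_of(line), line.strip())
        while stack[-1].indent >= node.indent:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root
```

Both functions use an explicit stack of open containers, so a single pass over the lines is enough. In this sketch an unclosed block is implicitly closed at the end of the input, and a stray `}}}` at the top level is kept as raw text; the real parser may well handle those cases differently.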