| 1 | = Journaling Proposal = |
| 2 | |
| 3 | == The Problem == |
| 4 | Trac maintains coherency upon data changes by using various `I...Listener` |
| 5 | extension points. |
| 6 | |
| 7 | While this works in many cases, the approach is insufficient when there are |
| 8 | multiple server processes, because a listener only runs in the process that made the change. This is quite a common |
| 9 | scenario, given the widespread use of the Apache/prefork web front-end. |
| 10 | |
| 11 | === Some examples === |
| 12 | |
| 13 | ==== Reacting to Wiki content change ==== |
| 14 | |
| 15 | Several Wiki pages are used to facilitate interactive configuration by the users. |
| 16 | This is the case for InterMapTxt, which maintains the list of InterWiki prefixes, |
| 17 | and for the BadContent page, which maintains a list of regexps used to filter out spam, |
| 18 | and there will probably be more such pages in the future. |
| 19 | See my original explanation about what's going on with |
| 20 | [http://trac-hacks.org/ticket/456#comment:3 updating InterMapTxt]. |
| 21 | |
| 22 | ==== Reacting to Wiki page creation/deletion ==== |
| 23 | |
| 24 | To avoid checking the DB for the existence of a Wiki page |
| 25 | every time a WikiPageNames link is seen in wiki text, we maintain a cache of |
| 26 | the existing wiki pages. |
| 27 | This list could easily be maintained using the change listener, but |
| 28 | this would ''not'' work if a creation or deletion were done |
| 29 | by another process. A workaround for this is currently implemented: |
| 30 | every once in a while, the cache is cleared and updated |
| 31 | (see source:trunk/trac/wiki/api.py@3362#L114). |
| 32 | This is a very ad-hoc solution. It should be possible to do this |
| 33 | better and in a more generic way. |
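The periodic-refresh workaround described above can be pictured roughly as follows. This is a simplified sketch, not the actual code in `trac/wiki/api.py`: the `PageNameCache` class, its `fetch_names` callback and the `ttl` value are all assumptions made for illustration.

```python
import time

class PageNameCache:
    """Hypothetical cache of page names, reloaded once its contents are too old."""

    def __init__(self, fetch_names, ttl=5.0, clock=time.time):
        self._fetch_names = fetch_names  # callable returning all page names (e.g. a DB query)
        self._ttl = ttl                  # maximum age of the cached list, in seconds
        self._clock = clock              # injectable clock, so the sketch can be exercised
        self._names = None
        self._loaded_at = 0.0

    def has_page(self, name):
        now = self._clock()
        if self._names is None or now - self._loaded_at > self._ttl:
            # Reload the whole list; this is the only way changes made
            # by other processes ever become visible here.
            self._names = set(self._fetch_names())
            self._loaded_at = now
        return name in self._names

# Simulate a page created by another process while the cache is still warm.
pages = ['WikiStart']
now = [0.0]
cache = PageNameCache(lambda: pages, ttl=5.0, clock=lambda: now[0])

first = cache.has_page('NewPage')   # loads the list; the page does not exist yet
pages.append('NewPage')             # "another process" creates the page
stale = cache.has_page('NewPage')   # cache is still considered valid, so the page is missed
now[0] = 6.0
fresh = cache.has_page('NewPage')   # cache expired and was reloaded
```

The window between `stale` and `fresh` is what makes this approach ad hoc: until the cache expires, links to the new page are rendered as missing.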
| 34 | |
| 35 | |
| 36 | == A solution == |
| 37 | |
| 38 | Every ''change'' event could be journaled. |
| 39 | A `journal` table could record all transactions and serve as |
| 40 | a basis for redispatching changes happening in one process |
| 41 | to other processes, in a generic way. |
| 42 | |
| 43 | After all, this journaling is already done in some cases. |
| 44 | For example, all the ticket changes are journaled, in the `ticket_change` table: |
| 45 | {{{ |
| 46 | #!sql |
| 47 | CREATE TABLE ticket_change ( |
| 48 | ticket integer, |
| 49 | time integer, |
| 50 | author text, |
| 51 | field text, |
| 52 | oldvalue text, |
| 53 | newvalue text, |
| 54 | UNIQUE (ticket,time,field) |
| 55 | ); |
| 56 | }}} |
| 57 | |
| 58 | There's currently some discussion about adding to the above |
| 59 | the `ipnr` and `authenticated` columns, to better track |
| 60 | who did what (see #1890 for details). |
| 61 | |
| 62 | This would lead to even more duplication of data than we have now. |
| 63 | Granted, this duplication (of the ticket/time/author values) |
| 64 | is currently used to group related changes. |
| 65 | |
| 66 | A cleaner approach, for #1890, would be: |
| 67 | {{{ |
| 68 | #!sql |
| 69 | CREATE TABLE ticket_change ( |
| 70 | tid integer, |
| 71 | field text, |
| 72 | oldvalue text, |
| 73 | newvalue text |
| 74 | ); |
| 75 | |
| 76 | CREATE TABLE ticket_transaction ( |
| 77 | tid integer PRIMARY KEY, |
| 78 | ticket integer, |
| 79 | time integer, |
| 80 | author text, |
| 81 | ipnr text, |
| 82 | authenticated boolean |
| 83 | ); |
| 84 | }}} |
| 85 | |
| 86 | Now, with this proposal, this could be extended to: |
| 87 | {{{ |
| 88 | #!sql |
| 89 | CREATE TABLE ticket_change ( |
| 90 | tid integer, |
| 91 | field text, |
| 92 | oldvalue text, |
| 93 | newvalue text |
| 94 | ); |
| 95 | |
| 96 | CREATE TABLE journal ( |
| 97 | tid integer PRIMARY KEY, |
| 98 | type text, |
| 99 | id text, |
| 100 | change text, |
| 101 | time integer, |
| 102 | author text, |
| 103 | ipnr text, |
| 104 | authenticated boolean |
| 105 | ); |
| 106 | }}} |
| 107 | |
| 108 | And `ticket_change` could even be generalized to `property_change` |
| 109 | and go in the direction of a generalization of properties to |
| 110 | all Trac objects (remember the TracObjectModelProposal?) |
| 111 | |
| 112 | The `change` column in `journal` could contain a keyword describing |
| 113 | the nature of the change: `CREATE`, `DELETE`, `MODIFY`, etc. |
| 114 | |
| 115 | Now, how to use this information? |
| 116 | |
| 117 | Each process would write into the `journal` table during the same |
| 118 | transaction that modifies the object model tables themselves. |
| 119 | This will mostly be something along the lines of: |
| 120 | {{{ |
| 121 | #!python |
| 122 | tid = record_in_journal(req, db, 'wiki', page, 'CREATE') |
| 123 | }}} |
| 124 | and: |
| 125 | {{{ |
| 126 | #!python |
| 127 | tid = record_in_journal(req, db, 'ticket', id, 'MODIFY') |
| 128 | }}} |
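To make the idea concrete, here is a minimal sketch of what `record_in_journal` might look like, using an in-memory SQLite database. The `Request` tuple and its `authname`, `remote_addr` and `authenticated` fields are stand-ins for whatever the real request object provides, and the `wiki` table is reduced to two columns; none of this is existing Trac code.

```python
import sqlite3
import time
from collections import namedtuple

# Hypothetical stand-in for the parts of a request that the journal needs.
Request = namedtuple('Request', 'authname remote_addr authenticated')

JOURNAL_SCHEMA = """
CREATE TABLE journal (
    tid           integer PRIMARY KEY,
    type          text,
    id            text,
    change        text,
    time          integer,
    author        text,
    ipnr          text,
    authenticated boolean
);
"""

def record_in_journal(req, db, type_, id_, change):
    """Insert one journal row and return its tid; the caller owns the transaction."""
    cursor = db.cursor()
    cursor.execute(
        "INSERT INTO journal (type, id, change, time, author, ipnr, authenticated) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (type_, str(id_), change, int(time.time()),
         req.authname, req.remote_addr, int(req.authenticated)))
    return cursor.lastrowid

db = sqlite3.connect(':memory:')
db.executescript(JOURNAL_SCHEMA)
db.execute("CREATE TABLE wiki (name text, text text)")  # simplified model table

req = Request('joe', '127.0.0.1', True)
with db:  # both inserts are committed together, or rolled back together
    db.execute("INSERT INTO wiki VALUES (?, ?)", ('NewPage', 'Some content'))
    tid = record_in_journal(req, db, 'wiki', 'NewPage', 'CREATE')
```

Because the function does not commit by itself, the journal entry can never exist without the change it describes, and vice versa.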
| 129 | |
| 130 | Each process will also have to keep track of the last `tid` known. |
| 131 | |
| 132 | If this happens to have changed (details to be finalized: |
| 133 | the detection could be done either during `record_in_journal` itself, |
| 134 | or before request dispatching, or ...), there could be a ''replay'' |
| 135 | of those events, triggering the appropriate change listeners. |
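One possible shape for the ''replay'' step is sketched below, again with SQLite and a reduced `journal` table. The `JournalReplayer` class and the listener signature `(type, id, change)` are assumptions for illustration only; when and how often `replay()` gets called is exactly the detail left open above.

```python
import sqlite3

class JournalReplayer:
    """Hypothetical per-process helper that remembers the last tid it has seen."""

    def __init__(self, db, listeners):
        self.db = db
        self.listeners = listeners
        # Start from the current end of the journal; older entries are
        # assumed to be reflected already in freshly loaded caches.
        row = db.execute("SELECT max(tid) FROM journal").fetchone()
        self.last_tid = row[0] or 0

    def replay(self):
        """Dispatch every journal entry newer than last_tid; return how many were seen."""
        rows = self.db.execute(
            "SELECT tid, type, id, change FROM journal "
            "WHERE tid > ? ORDER BY tid", (self.last_tid,)).fetchall()
        for tid, type_, id_, change in rows:
            for listener in self.listeners:
                listener(type_, id_, change)
            self.last_tid = tid
        return len(rows)

db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE journal (tid integer PRIMARY KEY, "
           "type text, id text, change text)")

seen = []
replayer = JournalReplayer(db, [lambda t, i, c: seen.append((t, i, c))])

# Entries written by "other processes" after this one started.
db.execute("INSERT INTO journal (type, id, change) VALUES ('wiki', 'NewPage', 'CREATE')")
db.execute("INSERT INTO journal (type, id, change) VALUES ('ticket', '42', 'MODIFY')")

replayed_first = replayer.replay()
replayed_again = replayer.replay()
```

A second call finds nothing new, so calling `replay()` before every request would cost a single cheap query in the common case.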
| 136 | |
| 137 | The change listeners would benefit from being refactored in a more |
| 138 | generic way anyway: merging IWikiChangeListener and ITicketChangeListener into a single interface |
| 139 | would give IMilestoneChangeListener, IChangesetChangeListener etc. for free |
| 140 | (the usual TracObjectModelProposal blurb ;) ). |
| 141 | |
| 142 | Last but not least, there would be a need to differentiate between |
| 143 | '''primary''' change and '''secondary''' change. |
| 144 | primary change:: the change originated from the same process; |
| 145 | exactly one process ever sees a given change as a primary change |
| 146 | secondary change:: the change originated from another process. |
| 147 | |
| 148 | This distinction is quite important w.r.t. side-effects. |
| 149 | |
| 150 | Only ''primary'' changes should ever generate side-effects, such as e-mail |
| 151 | notifications (a related topic: e-mail notifications should also be based |
| 152 | on change listeners, see #1660). That way, one can be sure that the side-effects |
| 153 | will be triggered only once, independently of the number of server processes. |
| 154 | |
| 155 | ''Secondary'' changes, in turn, could be used for purely informational purposes, |
| 156 | such as refreshing all sorts of internal caches (the use cases listed |
| 157 | [TracDev/Proposals/Journaling#Someexamples above]). |
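The primary/secondary distinction could be carried to the listeners as a simple flag. The sketch below simulates two server processes in plain Python; `ChangeDispatcher`, its method names and the four-argument listener signature are hypothetical, and the e-mail notification is replaced by appending to a list.

```python
class ChangeDispatcher:
    """Hypothetical per-process dispatcher tagging changes as primary or secondary."""

    def __init__(self, listeners):
        self.listeners = listeners
        self.own_tids = set()  # tids of changes this process recorded itself

    def primary(self, tid, type_, id_, change):
        # Called right after this process has written the journal entry.
        self.own_tids.add(tid)
        for listener in self.listeners:
            listener(type_, id_, change, True)

    def replayed(self, tid, type_, id_, change):
        # Called for every entry found while replaying the journal.
        if tid in self.own_tids:
            self.own_tids.discard(tid)  # already handled here as a primary change
            return
        for listener in self.listeners:
            listener(type_, id_, change, False)

sent_mails = []  # stands in for the one-time side-effect

def make_listeners(page_cache):
    def refresh_cache(type_, id_, change, primary):
        # Cache maintenance happens for primary and secondary changes alike.
        if type_ == 'wiki' and change == 'CREATE':
            page_cache.add(id_)
        elif type_ == 'wiki' and change == 'DELETE':
            page_cache.discard(id_)

    def notify(type_, id_, change, primary):
        # Side-effects are reserved for primary changes.
        if primary:
            sent_mails.append((type_, id_, change))

    return [refresh_cache, notify]

cache_a, cache_b = set(), set()
proc_a = ChangeDispatcher(make_listeners(cache_a))
proc_b = ChangeDispatcher(make_listeners(cache_b))

# Process A creates a page, then both processes replay the journal.
proc_a.primary(1, 'wiki', 'NewPage', 'CREATE')
for proc in (proc_a, proc_b):
    proc.replayed(1, 'wiki', 'NewPage', 'CREATE')
```

Both caches end up knowing about the new page, while the notification is produced once, by the process that originated the change.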
| 158 | |