Edgewall Software

Changes between Initial Version and Version 1 of TracDev/Proposals/Journaling


Timestamp:
Jun 26, 2006, 8:08:34 PM (16 years ago)
Author:
Christian Boos
Comment:

Copied from TracDev/JournalingProposal. I should have used that page name in the first place.

  • TracDev/Proposals/Journaling

    v1 v1  
     1= Journaling Proposal =
     2
     3== The Problem ==
     4Trac maintains coherency upon data changes by using various `I...Listener`
     5extension points.
     6
     7While this works in many cases, the approach is insufficient when there are
     8multiple server processes: a listener only fires in the process that made the
     9change. This is a common scenario, given the widespread use of the Apache/prefork web front-end.
     10
     11=== Some examples ===
     12
     13==== Reacting to Wiki content changes ====
     14
     15Several Wiki pages are used to facilitate interactive configuration by the users.
     16This is the case for InterMapTxt, which maintains the list of InterWiki prefixes,
     17and for the BadContent page, which maintains a list of regexps used to filter out spam;
     18there will probably be more in the future.
     19See my original explanation about what's going on with
     20[http://trac-hacks.org/ticket/456#comment:3 updating InterMapTxt].
     21
     22==== Reacting to Wiki page creation/deletion ====
     23
     24To avoid checking the DB for the existence of a Wiki page
     25every time a WikiPageNames link is seen in wiki text, we maintain a cache of
     26the existing wiki pages.
     27This list could easily be maintained using the change listener, but
     28this would ''not'' work if a creation or deletion were done
     29by another process. A workaround for this is currently implemented:
     30every once in a while, the cache is cleared and updated
     31(see source:trunk/trac/wiki/api.py@3362#L114).
     32This is a very ad-hoc solution. It should be possible to do this
     33better and in a more generic way.
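
The workaround described above can be sketched as a cache that reloads itself once it is older than a given age. This is only an illustration, not the code in `trac/wiki/api.py`: the `PageNameCache` class, the `load_names` callable, the `ttl` value and the injectable `clock` are all hypothetical names chosen for the example.

```python
import time


class PageNameCache:
    """Cache of existing wiki page names, rebuilt once it is older than `ttl`.

    Hypothetical sketch: `load_names` stands in for the DB query that lists pages.
    """

    def __init__(self, load_names, ttl=5.0, clock=time.monotonic):
        self._load_names = load_names
        self._ttl = ttl
        self._clock = clock
        self._names = None
        self._loaded_at = 0.0

    def has_page(self, name):
        now = self._clock()
        # Reload when the cache is empty or has outlived its time-to-live.
        if self._names is None or now - self._loaded_at > self._ttl:
            self._names = set(self._load_names())
            self._loaded_at = now
        return name in self._names


# Usage with a fake clock: a page created by "another process" stays
# invisible until the cache expires.
pages = {'WikiStart'}
now = [0.0]
cache = PageNameCache(lambda: list(pages), ttl=5.0, clock=lambda: now[0])
seen_first = cache.has_page('WikiStart')
pages.add('NewPage')                       # change made elsewhere
seen_stale = cache.has_page('NewPage')     # cache not yet refreshed
now[0] = 6.0
seen_fresh = cache.has_page('NewPage')     # refreshed after the TTL
```

The window during which `NewPage` is not seen is exactly the weakness of the ad-hoc approach: the cache is either stale for a while or reloaded more often than necessary.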
     34
     35
     36== A solution ==
     37
     38Every ''change'' event could be journaled.
     39A `journal` table could record all transactions and serve as
     40a basis for redispatching changes happening in one process
     41to other processes, in a generic way.
     42
     43After all, this journaling is already done in some cases.
     44For example, all ticket changes are journaled in the `ticket_change` table:
     45{{{
     46#!sql
     47CREATE TABLE ticket_change (
     48    ticket integer,
     49    time integer,
     50    author text,
     51    field text,
     52    oldvalue text,
     53    newvalue text,
     54    UNIQUE (ticket,time,field)
     55);
     56}}}
     57
     58There's currently some discussion about adding to the above
     59the `ipnr` and `authenticated` columns, to better track
     60who did what (see #1890 for details).
     61
     62This would lead to even more duplication of data than we have now.
     63Granted, this duplication (of the ticket/time/author values)
     64is currently used to group related changes.
     65
     66A cleaner approach, for #1890, would be:
     67{{{
     68#!sql
     69CREATE TABLE ticket_change (
     70    tid integer,
     71    field text,
     72    oldvalue text,
     73    newvalue text
     74);
     75
     76CREATE TABLE ticket_transaction (
     77    tid integer PRIMARY KEY,
     78    ticket integer,
     79    time integer,
     80    author text,
     81    ipnr text,
     82    authenticated boolean
     83);
     84}}}
     85
     86Now, with this proposal, this could be extended to:
     87{{{
     88#!sql
     89CREATE TABLE ticket_change (
     90    tid integer,
     91    field text,
     92    oldvalue text,
     93    newvalue text
     94);
     95
     96CREATE TABLE journal (
     97    tid integer PRIMARY KEY,
     98    type text,
     99    id text,
     100    change text,
     101    time integer,
     102    author text,
     103    ipnr text,
     104    authenticated boolean
     105);
     106}}}
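
To see how the two tables would work together, here is a minimal sketch using an in-memory SQLite database. It assumes the schema above (without a trailing comma after the last column); the sample ticket number, author and field values are made up for the example. One `journal` row carries the who/when data once, and the `tid` groups the related field changes.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
CREATE TABLE ticket_change (
    tid integer,
    field text,
    oldvalue text,
    newvalue text
);
CREATE TABLE journal (
    tid integer PRIMARY KEY,
    type text,
    id text,
    change text,
    time integer,
    author text,
    ipnr text,
    authenticated boolean
);
""")

# One journal entry for the whole modification (sample values are hypothetical).
cur = db.execute(
    "INSERT INTO journal (type, id, change, time, author, ipnr, authenticated) "
    "VALUES ('ticket', '42', 'MODIFY', 1151345314, 'alice', '127.0.0.1', 1)")
tid = cur.lastrowid

# Several field changes share that tid instead of repeating ticket/time/author.
db.executemany(
    "INSERT INTO ticket_change (tid, field, oldvalue, newvalue) VALUES (?, ?, ?, ?)",
    [(tid, 'status', 'new', 'assigned'), (tid, 'owner', '', 'alice')])
db.commit()

# Reading the history of a ticket back is a join on tid.
rows = db.execute(
    "SELECT j.author, j.time, c.field, c.oldvalue, c.newvalue "
    "FROM journal j JOIN ticket_change c ON c.tid = j.tid "
    "WHERE j.type = 'ticket' AND j.id = '42' "
    "ORDER BY c.field").fetchall()
```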
     107
     108And `ticket_change` could even be generalized to `property_change`
     109and go in the direction of a generalization of properties to
     110all Trac objects (remember the TracObjectModelProposal?).
     111
     112The `change` column in `journal` could contain a keyword describing
     113the nature of the change: `CREATE`, `DELETE`, `MODIFY`, etc.
     114
     115Now, how to use this information?
     116
     117Each process would write into the `journal` table during the same
     118transaction that modifies the object model tables themselves.
     119This will mostly be something along the lines of:
     120{{{
     121#!python
     122    tid = record_in_journal(req, db, 'wiki', page, 'CREATE')
     123}}}
     124and:
     125{{{
     126#!python
     127    tid = record_in_journal(req, db, 'ticket', id, 'MODIFY')
     128}}}
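
What `record_in_journal` might look like is sketched below. Only the call signature comes from the examples above; the body is an assumption. In particular, reading `authname` and `remote_addr` from the request, treating any name other than `anonymous` as authenticated, and the stand-in `wiki` table are choices made for the illustration. The function does not commit, so the journal row lives or dies with the caller's transaction.

```python
import sqlite3
import time
from types import SimpleNamespace


def record_in_journal(req, db, type_, id_, change):
    """Append one journal row and return its tid (hypothetical sketch).

    No commit here: the row belongs to the caller's transaction.
    """
    cur = db.execute(
        "INSERT INTO journal (type, id, change, time, author, ipnr, authenticated) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (type_, str(id_), change, int(time.time()),
         req.authname, req.remote_addr, req.authname != 'anonymous'))
    return cur.lastrowid


db = sqlite3.connect(':memory:')
db.execute(
    "CREATE TABLE journal (tid integer PRIMARY KEY, type text, id text, "
    "change text, time integer, author text, ipnr text, authenticated boolean)")
db.execute("CREATE TABLE wiki (name text, text text)")  # stand-in model table

# Minimal stand-in for a request object.
req = SimpleNamespace(authname='alice', remote_addr='127.0.0.1')

# The model change and the journal entry are committed together.
with db:
    db.execute("INSERT INTO wiki VALUES (?, ?)", ('NewPage', 'content'))
    tid = record_in_journal(req, db, 'wiki', 'NewPage', 'CREATE')

# If the model change fails, the journal entry is rolled back with it.
try:
    with db:
        db.execute("INSERT INTO wiki VALUES (?, ?)", ('Other', 'x'))
        record_in_journal(req, db, 'wiki', 'Other', 'CREATE')
        raise RuntimeError('simulated failure')
except RuntimeError:
    pass

journal_count = db.execute("SELECT count(*) FROM journal").fetchone()[0]
wiki_count = db.execute("SELECT count(*) FROM wiki").fetchone()[0]
```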
     129
     130Each process will also have to keep track of the last `tid` it knows about.
     131
     132If the journal turns out to contain newer entries (details to be finalized:
     133the detection could be done either during `record_in_journal` itself,
     134or before request dispatching, or ...), there could be a ''replay''
     135of those entries, triggering the appropriate change listeners.
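
The replay idea could be sketched as follows. `JournalFollower` and its `replay` method are hypothetical names, the `journal` table is reduced to the columns needed here, and the sketch assumes that entries become visible in increasing `tid` order. When the check runs (inside `record_in_journal`, before request dispatching, or elsewhere) is left open, as in the text.

```python
import sqlite3


class JournalFollower:
    """Remembers the last journal tid seen by this process (hypothetical helper)."""

    def __init__(self, db):
        self.db = db
        # Start from the current end of the journal: older entries are
        # assumed to be reflected in freshly loaded state already.
        self.last_tid = db.execute(
            "SELECT COALESCE(MAX(tid), 0) FROM journal").fetchone()[0]

    def replay(self, listener):
        """Call listener(type, id, change) for each unseen entry, oldest first."""
        rows = self.db.execute(
            "SELECT tid, type, id, change FROM journal "
            "WHERE tid > ? ORDER BY tid", (self.last_tid,)).fetchall()
        for tid, type_, id_, change in rows:
            listener(type_, id_, change)
            self.last_tid = tid
        return len(rows)


db = sqlite3.connect(':memory:')
db.execute(
    "CREATE TABLE journal (tid integer PRIMARY KEY, type text, id text, change text)")
follower = JournalFollower(db)

# Entries written by "another process" after the follower started.
db.execute("INSERT INTO journal (type, id, change) VALUES ('wiki', 'NewPage', 'CREATE')")
db.execute("INSERT INTO journal (type, id, change) VALUES ('ticket', '42', 'MODIFY')")
db.commit()

seen = []
replayed = follower.replay(lambda type_, id_, change: seen.append((type_, id_, change)))
replayed_again = follower.replay(lambda *event: seen.append(event))  # nothing new
```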
     136
     137The change listeners would benefit from being refactored in a more
     138generic way anyway (merging IWikiChangeListener and ITicketChangeListener,
     139which would give IMilestoneChangeListener, IChangesetChangeListener etc. for free;
     140the usual TracObjectModelProposal blurb ;) ).
     141
     142Last but not least, there would be a need to differentiate between
     143'''primary''' and '''secondary''' changes.
     144 primary change:: the change originated from the same process;
     145  there's only one process which ever sees a change as being a primary change
     146 secondary change:: the change originated from another process.
     147
     148This distinction is quite important w.r.t. side-effects.
     149
     150Only ''primary'' changes should ever generate side-effects, such as e-mail
     151notifications (a related topic: e-mail notifications should also be based
     152on change listeners, see #1660). That way, one can be sure that the side-effects
     153will be triggered only once, independently of the number of server processes.
     154
     155''Secondary'' changes, in turn, could be used for purely informational purposes,
     156such as refreshing all sorts of internal caches (the use cases listed
     157[TracDev/Proposals/Journaling#Someexamples above]).
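
The primary/secondary distinction could look like this in a generic listener. Everything here is hypothetical: the `Process` class, the `on_change` signature with its `primary` flag, and the `outbox` list standing in for e-mail notification. The sketch also assumes that the originating process refreshes its own cache when it handles the primary change.

```python
class Process:
    """One server process with its own page-name cache (hypothetical)."""

    def __init__(self, name, outbox):
        self.name = name
        self.outbox = outbox        # stands in for the e-mail system
        self.page_cache = set()

    def on_change(self, type_, id_, change, primary):
        # Informational part: keep the local cache coherent, whatever the origin.
        if type_ == 'wiki' and change == 'CREATE':
            self.page_cache.add(id_)
        elif type_ == 'wiki' and change == 'DELETE':
            self.page_cache.discard(id_)
        # Side effects: only the originating process may trigger them.
        if primary:
            self.outbox.append('%s %s %s' % (type_, id_, change))


outbox = []
a, b = Process('A', outbox), Process('B', outbox)

# Process A performs the change itself; process B learns of it through replay.
a.on_change('wiki', 'NewPage', 'CREATE', primary=True)
b.on_change('wiki', 'NewPage', 'CREATE', primary=False)
```

Both caches end up knowing about `NewPage`, while the notification is produced exactly once, however many processes replay the change.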
     158