Edgewall Software

Version 1 (modified by Christian Boos, 18 years ago)

Copied from TracDev/JournalingProposal. I should have used that page name in the first place.

Journaling Proposal

The Problem

Trac maintains coherency upon data changes by using various I...Listener extension points.

While this works in many cases, the approach is insufficient when there are multiple server processes. That is quite a common scenario, given the widespread use of the Apache/prefork web front-end.

Some examples

Reacting on Wiki content change

Several Wiki pages are used to let users edit configuration interactively. This is the case for InterMapTxt, which maintains the list of InterWiki prefixes, and for the BadContent page, which maintains a list of regexps used to filter out spam; there will probably be more in the future. See my original explanation about what's going on with updating InterMapTxt.

Reacting on Wiki page creation/deletion

In order not to have to check the DB for the existence of a Wiki page every time a WikiPageNames link is seen in wiki text, we maintain a cache of the existing wiki pages. This list could easily be maintained using the change listener, but that would not work if a creation or deletion were done by another process. A workaround for this is currently implemented: every once in a while, the cache is cleared and updated (see source:trunk/trac/wiki/api.py@3362#L114). This is a very ad-hoc solution. It should be possible to do this better and in a more generic way.

A solution

Every change event could be journaled. A journal table could record all transactions and serve as a basis for redispatching changes happening in one process to the other processes, in a generic way.

After all, this journaling is already done in some cases. For example, all the ticket changes are journaled, in the ticket_change table:

CREATE TABLE ticket_change (
    ticket integer,
    time integer,
    author text,
    field text,
    oldvalue text,
    newvalue text,
    UNIQUE (ticket,time,field)
);

There's currently some discussion about adding to the above the ipnr and authenticated columns, to better track who did what (see #1890 for details).

This would lead to even more duplication of data than what we have now. Granted, this duplication (of the ticket/time/author values) is currently used to group related changes.

A cleaner approach, for #1890, would be:

CREATE TABLE ticket_change (
    tid integer,
    field text,
    oldvalue text,
    newvalue text
);

CREATE TABLE ticket_transaction (
    tid integer PRIMARY KEY,
    ticket integer,
    time integer,
    author text,
    ipnr text,
    authenticated boolean
);
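To illustrate that splitting the tables still lets related changes be grouped, here is a minimal sketch using Python's sqlite3 module. The two tables are the ones proposed above; the inserted rows (ticket number, author, address, timestamp) are made-up sample values, and the query is only one possible way to rebuild the grouped view.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE ticket_change (
    tid integer,
    field text,
    oldvalue text,
    newvalue text
);
CREATE TABLE ticket_transaction (
    tid integer PRIMARY KEY,
    ticket integer,
    time integer,
    author text,
    ipnr text,
    authenticated boolean
);
""")

# Sample data (made up): one transaction, tid 1, changing two fields of ticket 42.
db.execute(
    "INSERT INTO ticket_transaction VALUES (?, ?, ?, ?, ?, ?)",
    (1, 42, 1150000000, "someuser", "127.0.0.1", 1))
db.executemany(
    "INSERT INTO ticket_change VALUES (?, ?, ?, ?)",
    [(1, "status", "new", "assigned"),
     (1, "owner", "", "someuser")])

# The who/when data is stored once per transaction; the join brings it back
# next to every field change that belongs to that transaction.
rows = db.execute("""
    SELECT t.time, t.author, c.field, c.oldvalue, c.newvalue
    FROM ticket_transaction t
    JOIN ticket_change c ON c.tid = t.tid
    WHERE t.ticket = ?
    ORDER BY t.time, c.field
""", (42,)).fetchall()
```

Both rows come back with the same time and author, which is the grouping that the duplicated ticket/time/author columns provide today.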

Now, with this proposal, this could be extended to:

CREATE TABLE ticket_change (
    tid integer,
    field text,
    oldvalue text,
    newvalue text
);

CREATE TABLE journal (
    tid integer PRIMARY KEY,
    type text,
    id text,
    change text,
    time integer,
    author text,
    ipnr text,
    authenticated boolean
);

And ticket_change could even be generalized to property_change and go in the direction of a generalization of properties to all Trac objects (remember the TracObjectModelProposal?)

The change column in journal could contain a keyword describing the nature of the change: CREATE, DELETE, MODIFY, etc.

Now, how to use this information?

Each process would write into the journal table during the same transaction that modifies the object model tables themselves. This will mostly be something along the lines of:

    tid = record_in_journal(req, db, 'wiki', page, 'CREATE')

and:

    tid = record_in_journal(req, db, 'ticket', id, 'MODIFY')
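What record_in_journal could look like is sketched below with sqlite3. This is only an assumption about its shape, not existing Trac code: the Request tuple and its attribute names (authname, remote_addr, authenticated) are hypothetical stand-ins for the real request object, and the tid is simply the row id assigned by SQLite.

```python
import sqlite3
import time
from collections import namedtuple

# Hypothetical stand-in for the request object; only these attributes are used.
Request = namedtuple("Request", "authname remote_addr authenticated")

JOURNAL_SCHEMA = """
CREATE TABLE journal (
    tid integer PRIMARY KEY,
    type text,
    id text,
    change text,
    time integer,
    author text,
    ipnr text,
    authenticated boolean
);
"""

def record_in_journal(req, db, type_, id_, change):
    """Append one journal row and return its tid.

    Nothing is committed here: the caller commits once, so the journal row
    and the object model change end up in the same transaction.
    """
    cursor = db.cursor()
    cursor.execute(
        "INSERT INTO journal (type, id, change, time, author, ipnr, authenticated) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (type_, str(id_), change, int(time.time()),
         req.authname, req.remote_addr, int(req.authenticated)))
    return cursor.lastrowid

db = sqlite3.connect(":memory:")
db.executescript(JOURNAL_SCHEMA)
req = Request("someuser", "127.0.0.1", True)

tid1 = record_in_journal(req, db, "wiki", "InterMapTxt", "CREATE")
tid2 = record_in_journal(req, db, "ticket", 1890, "MODIFY")
db.commit()
```

Leaving the commit to the caller is the point of the design: if the change to the object model tables fails and is rolled back, the journal entry disappears with it.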

Each process will also have to keep track of the last tid known.

If this happens to have changed (details to be finalized: the detection could be done either during record_in_journal itself, or before request dispatching, or …), there could be a replay of those events, triggering the appropriate change listeners.
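A minimal sketch of the "last tid known" bookkeeping and the replay step follows. The JournalReplayer class, its listeners list and the reduced journal table are hypothetical names chosen for illustration; when replay() gets called is deliberately left open, as in the text above.

```python
import sqlite3

class JournalReplayer:
    """Remembers the last tid seen by this process and replays newer rows."""

    def __init__(self, db):
        self.db = db
        self.listeners = []  # callables taking (type, id, change)
        # Start from the current end of the journal: older rows are history.
        row = db.execute("SELECT COALESCE(MAX(tid), 0) FROM journal").fetchone()
        self.last_tid = row[0]

    def replay(self):
        """Dispatch every journal row newer than last_tid; return the count."""
        rows = self.db.execute(
            "SELECT tid, type, id, change FROM journal "
            "WHERE tid > ? ORDER BY tid", (self.last_tid,)).fetchall()
        for tid, type_, id_, change in rows:
            for listener in self.listeners:
                listener(type_, id_, change)
            self.last_tid = tid
        return len(rows)

# Reduced journal table, enough for the demonstration.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE journal (tid integer PRIMARY KEY, type text, id text, change text)")

replayer = JournalReplayer(db)
seen = []
replayer.listeners.append(
    lambda type_, id_, change: seen.append((type_, id_, change)))

# Stands in for a row written by another server process.
db.execute(
    "INSERT INTO journal (type, id, change) VALUES ('wiki', 'InterMapTxt', 'MODIFY')")

first = replayer.replay()   # dispatches the new row
second = replayer.replay()  # nothing new since the last call
```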

The change listeners would in any case benefit from being refactored in a more generic way (merging IWikiChangeListener and ITicketChangeListener, which would give IMilestoneChangeListener, IChangesetChangeListener etc. for free; the usual TracObjectModelProposal blurb ;) ).

Last but not least, there would be a need to differentiate between primary change and secondary change.

primary change
the change originated from the same process; there's only one process which ever sees a change as being a primary change
secondary change
the change originated from another process.

This distinction is quite important w.r.t. side-effects.

Only primary changes should ever generate side-effects, such as e-mail notifications (a related topic: e-mail notifications should also be based on change listeners, see #1660). That way, one can be sure that the side-effects will be triggered only once, independently of the number of server processes.

Then, secondary changes could be used for all the informational stuff, for refreshing all sorts of internal caches (the use cases listed above).
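The primary/secondary distinction can be sketched as follows. The Process class and its attributes are hypothetical, and two assumptions go beyond the text: a process recognises a primary change by remembering the tids it wrote itself, and it refreshes its own cache on primary changes as well as on secondary ones.

```python
class Process:
    """One server process with its own page cache and notification log."""

    def __init__(self, name, initial_pages):
        self.name = name
        self.own_tids = set()      # tids this process wrote to the journal
        self.notifications = []    # stands in for e-mails actually sent
        self.known_pages = set(initial_pages)

    def on_change(self, tid, type_, id_, change):
        primary = tid in self.own_tids
        if primary:
            # Side-effect: only the originating process ever does this.
            self.notifications.append("%s %s %s" % (type_, id_, change))
        # Informational part: keep the page cache in sync in every process.
        if type_ == "wiki":
            if change == "CREATE":
                self.known_pages.add(id_)
            elif change == "DELETE":
                self.known_pages.discard(id_)

# Two journal rows, each written by a different process.
journal = [(1, "wiki", "NewPage", "CREATE"),
           (2, "wiki", "OldPage", "DELETE")]

a = Process("a", ["OldPage"])
b = Process("b", ["OldPage"])
a.own_tids.add(1)  # process a created NewPage
b.own_tids.add(2)  # process b deleted OldPage

# Every process eventually sees every journal row.
for proc in (a, b):
    for row in journal:
        proc.on_change(*row)
```

Each change produces exactly one notification overall, while both caches end up with the same content.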
