Journaling Proposal
The Problem
Trac maintains coherency upon data changes by using various I...Listener extension points.
While this works in many cases, the approach is flawed, or at least insufficient, when there are multiple server processes. That is quite a common scenario, given the widespread use of the Apache/prefork web front-end.
Some examples
Reacting on Wiki content change
Several Wiki pages are used to facilitate interactive configuration by the users. This is the case for InterMapTxt, which maintains the list of InterWiki prefixes, and for the BadContent page, which maintains a list of regexps used to filter out SPAM; there will probably be more in the future. See my original explanation about what's going on with updating InterMapTxt.
Reacting on Wiki page creation/deletion
In order to avoid checking the database for the existence of a Wiki page every time a WikiPageName is seen in wiki text, we maintain a cache of the existing wiki pages. This list could easily be maintained using the change listener, but that would not work if a creation or deletion is done by another process. A workaround is currently implemented: every once in a while, the cache is cleared and rebuilt (see source:trunk/trac/wiki/api.py@3362#L114). This is a very ad hoc solution; it should be possible to do this better, in a more generic way.
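The time-based workaround described above can be sketched as follows. This is a simplified, hypothetical illustration (the class and parameter names, and the TTL value, are assumptions, not Trac's actual code): the page-name cache is simply rebuilt once its age exceeds a fixed interval, regardless of which process made the change.

```python
import time

CACHE_TTL = 60  # seconds between forced refreshes (illustrative value)

class WikiSystem:
    """Hypothetical sketch of a time-expiring page-name cache."""

    def __init__(self, fetch_page_names):
        self._fetch = fetch_page_names   # callable that queries the DB
        self._pages = None
        self._last_refresh = 0.0

    def get_pages(self):
        now = time.time()
        if self._pages is None or now - self._last_refresh > CACHE_TTL:
            # Clear and rebuild the cache from the database
            self._pages = set(self._fetch())
            self._last_refresh = now
        return self._pages

# Demo with a stubbed-out database query
pages = WikiSystem(lambda: ['WikiStart', 'InterMapTxt']).get_pages()
```

The weakness is visible in the sketch: a page created by another process stays invisible for up to `CACHE_TTL` seconds, and most refreshes happen when nothing changed at all.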
A solution
Every change event could be journaled. Several journal tables (one per resource) could record all the transactions and serve as a basis for dispatching changes happening in one process to the other processes, in a generic way. After all, this kind of journaling is already done for tickets.
Current Situation
For example, all the ticket changes are journaled, in the ticket_change table:

CREATE TABLE ticket_change (
    ticket integer,
    time integer,
    author text,
    field text,
    oldvalue text,
    newvalue text,
    UNIQUE (ticket,time,field)
);
There's currently some discussion about adding the ipnr and authenticated columns to the above table, to better track who did what (see #1890 for details).
But adding those to the above table would lead to even more duplication of data than what we currently have. Currently this duplication (of the ticket/time/author values) is even used to group together related changes!
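As an illustrative sketch (using sqlite3 and made-up values), the duplicated (ticket, time, author) triple is exactly what lets related field changes be grouped back into one logical edit:

```python
import sqlite3

# Minimal demonstration of how the duplicated (ticket, time, author)
# values group related field changes into one "edit" (assumed data).
cnx = sqlite3.connect(':memory:')
cnx.execute("""CREATE TABLE ticket_change (
    ticket integer, time integer, author text,
    field text, oldvalue text, newvalue text,
    UNIQUE (ticket, time, field))""")

# One edit by 'joe' touching two fields at the same timestamp
cnx.executemany(
    "INSERT INTO ticket_change VALUES (?, ?, ?, ?, ?, ?)",
    [(42, 1000, 'joe', 'status',  'new', 'closed'),
     (42, 1000, 'joe', 'comment', '',    'Fixed.')])

# Grouping on the duplicated columns recovers the logical transaction
rows = cnx.execute("""SELECT time, author, COUNT(*)
                      FROM ticket_change
                      WHERE ticket = 42
                      GROUP BY ticket, time, author""").fetchall()
```

Every extra column added to ticket_change (ipnr, authenticated, ...) would be repeated once per changed field, which is the duplication the next section tries to remove.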
Step in the Right Direction
A cleaner approach, for #1890, would be:

CREATE TABLE ticket_change (
    tid integer,
    field text,
    oldvalue text,
    newvalue text
);

CREATE TABLE ticket_transaction (
    tid integer PRIMARY KEY,
    ticket integer,
    time integer,
    author text,
    ipnr text,
    authenticated boolean
);
The _journal and _history tables
Now, with this proposal, this could be extended to:
CREATE TABLE ticket_history (
    tid integer,
    id int,
    field text,
    value text
);

CREATE TABLE ticket_journal (
    tid integer PRIMARY KEY,
    id int,
    change text,
    time integer,
    author text,
    ipnr text,
    authenticated boolean
);
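To make the split concrete, here is a small sqlite3 sketch (illustrative values only, with the schema trimmed to the relevant columns) showing that one MODIFICATION transaction yields a single ticket_journal row plus one ticket_history row per changed field:

```python
import sqlite3

cnx = sqlite3.connect(':memory:')
cnx.execute("""CREATE TABLE ticket_history (
    tid integer, id int, field text, value text)""")
cnx.execute("""CREATE TABLE ticket_journal (
    tid integer PRIMARY KEY, id int, change text,
    time integer, author text, ipnr text, authenticated boolean)""")

with cnx:  # both tables are written in the same database transaction
    cnx.execute("INSERT INTO ticket_journal VALUES "
                "(1, 42, 'MODIFICATION', 1000, 'joe', '127.0.0.1', 1)")
    cnx.executemany("INSERT INTO ticket_history VALUES (?, ?, ?, ?)",
                    [(1, 42, 'status', 'closed'),
                     (1, 42, 'comment', 'Fixed.')])

# Joining on tid reconstructs the full change; the who/when/what-kind
# information is stored exactly once, in the journal row.
rows = cnx.execute("""SELECT j.change, h.field, h.value
                      FROM ticket_journal j JOIN ticket_history h
                        ON j.tid = h.tid
                      ORDER BY h.field""").fetchall()
```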
ticket_history could eventually be split into multiple, type-specialized tables (ticket_int_property, …). See also TracDev/Proposals/DataModel#ResourceChangeHistory.
The change column in <resource>_journal could contain a keyword describing the nature of the change: CREATE, DELETE, MODIFICATION, etc.
Each process would write into the <resource>_journal table during the same transaction that modifies the object model tables themselves.
This could be something along the lines of the following API:
class WikiModule():
    def _do_create(self, pagename):
        ...
        # Getting a new transaction for creating a Wiki page
        tnx = Transaction(self.env.get_db_cnx())
        tnx.prepare(req, 'CREATE')
        tnx.save('wiki', id=pagename, readonly=readonly, content=content)
        tnx.commit()      # flush all changes to disk
        self.notify(tnx)  # dispatch change information to listeners

class TicketModule():
    def _do_save(self, ticket):
        tnx = Transaction(self.env.get_db_cnx())
        tnx.prepare(req, 'MODIFY')
        tnx.save('ticket', ticket)
        tnx.commit()      # flush all changes to disk
        self.notify(tnx)  # dispatch change information to listeners
The actual Transaction object would know how to modify the underlying (generic) data model.
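A minimal, hypothetical sketch of such a Transaction object is given below. The method names (prepare/save/commit) follow the proposed API above; everything else is a simplifying assumption for illustration: an author string is passed instead of the request object, the ipnr/authenticated columns are omitted, and tid allocation simply reuses the primary key of the journal row.

```python
import sqlite3
import time

class Transaction(object):
    """Hypothetical sketch: journals a change and its field values."""

    def __init__(self, db):
        self.db = db
        self.tid = None          # assigned when the journal row is written
        self._change = None
        self._author = None

    def prepare(self, author, change):
        # Record who makes the change and its nature (CREATE, MODIFY, ...)
        self._author = author
        self._change = change

    def save(self, realm, id, **fields):
        # Write the journal entry and the per-field history rows; they
        # go to disk in the same database transaction as the caller's
        # own object-model updates.
        cur = self.db.execute(
            "INSERT INTO %s_journal (id, change, time, author) "
            "VALUES (?, ?, ?, ?)" % realm,
            (id, self._change, int(time.time()), self._author))
        self.tid = cur.lastrowid
        for field, value in fields.items():
            self.db.execute(
                "INSERT INTO %s_history (tid, id, field, value) "
                "VALUES (?, ?, ?, ?)" % realm,
                (self.tid, id, field, str(value)))

    def commit(self):
        self.db.commit()

# Demo setup: the schema mirrors the proposal, trimmed for brevity
db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE wiki_journal (tid integer PRIMARY KEY, "
           "id text, change text, time integer, author text)")
db.execute("CREATE TABLE wiki_history (tid integer, id text, "
           "field text, value text)")

tnx = Transaction(db)
tnx.prepare('joe', 'CREATE')
tnx.save('wiki', id='SandBox', content='Hello', readonly=0)
tnx.commit()
history_rows = db.execute("SELECT COUNT(*) FROM wiki_history").fetchone()[0]
```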
Notifying changes
Now, how to use this information?
Each module's notify(tnx) method would have to propagate the appropriate change information to the registered listeners (the IWikiChangeListener, ITicketChangeListener, etc. interfaces).
Each module would also have to keep track of the last tid it has dispatched. In notify(tnx), we would check for all the tids inserted since the last dispatched tid (or the one we had at system startup). If there is more than one such tid, the extra ones come from changes created by other processes.
This way, we could easily differentiate between primary changes and secondary changes.
- primary change: tid == tnx.tid; the change originated from the same process. Only one process ever sees a change as a primary change: the one that just created it.
- secondary change: tid != tnx.tid; the change originated from another process.
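The dispatch loop described above can be sketched as follows. This is a hypothetical illustration, not proposed code: the class name, callback arguments, and in-memory last_tid tracking are all assumptions. It scans the journal for transactions newer than the last tid this process has dispatched and classifies each one as primary or secondary.

```python
import sqlite3

class JournalDispatcher(object):
    """Hypothetical sketch of the tid-based classification in notify()."""

    def __init__(self, db):
        self.db = db
        self.last_tid = 0    # last tid dispatched by this process

    def notify(self, tnx, on_primary, on_secondary):
        rows = self.db.execute(
            "SELECT tid, change, author FROM ticket_journal "
            "WHERE tid > ? ORDER BY tid", (self.last_tid,)).fetchall()
        for tid, change, author in rows:
            if tid == tnx.tid:
                on_primary(tid, change, author)    # side-effects allowed here
            else:
                on_secondary(tid, change, author)  # e.g. refresh caches only
            self.last_tid = max(self.last_tid, tid)

# Demo: tid 1 was written by another process, tid 2 by this one
# (journal schema trimmed to the relevant columns)
db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE ticket_journal (tid integer PRIMARY KEY, "
           "change text, author text)")
db.executemany("INSERT INTO ticket_journal VALUES (?, ?, ?)",
               [(1, 'MODIFICATION', 'alice'), (2, 'CREATE', 'bob')])

class FakeTnx(object):   # stands in for the Transaction just committed
    tid = 2

primary, secondary = [], []
dispatcher = JournalDispatcher(db)
dispatcher.notify(FakeTnx(),
                  lambda *row: primary.append(row[0]),
                  lambda *row: secondary.append(row[0]))
```

Note that both kinds of change reach the listeners in one pass; the tid comparison alone decides which side-effects are permitted.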
This distinction is quite important with respect to side-effects.
Only primary changes should ever generate side-effects, such as e-mail notifications (a related topic: e-mail notifications should also be based on change listeners, see #1660). That way, one can be sure that the side-effects will be triggered only once, independently of the number of server processes.
Secondary changes, in turn, could be used for all the informational tasks, such as refreshing the various internal caches (the use cases listed above).
Finally, note that this Transaction class could make good use of SQLAlchemy's Unit of Work concept, should we use that library in a future version.