Edgewall Software

Version 16 (modified by Christian Boos, 14 years ago) ( diff )

introduce the idea of sequences

GenericTrac Data Model

This page defines a new data model for Trac that should be suitable for storing most of the data from resources, along with their change history.

The main benefits expected from the new model are:

  • simplification of the internals of Trac, especially for the ticket model, in which the storage of changes is quite cumbersome (see #454, #6466)
  • solve a few design problems with the current data model (like #1890, #4582)
  • allow better code reuse and share of the base features among different kinds of resources (like #695, #1113, etc. see #RelatedTickets for more)

This stems from the following former proposals:

See also this mail.

As this will be a major redesign of the data model, it could also be a good opportunity to take the multiple project considerations into account (#130).

Somehow related to the generic data model, but not strictly depending on it, Trac should also make it possible to implement new kinds of plugins that would perform generic operations on Trac resources. This could allow the (re-)implementation of the TracCrossReferences idea as a plugin, for example. See #6543 and TracDev/Proposals/TracRelations.

Design Discussion

Requirements for the new model:

  1. it has to be simple;
  2. it must be flexible, in order to accommodate different kinds of resources and allow for dynamic evolution;
  3. it should remain fast, if not faster than what we currently have;
  4. it should lead to a more compact representation of data

Resource Content

The ticket model is by far richer data model we would have to support, so we could take this as a basis to lay out the foundations of the new model. For ticket, we currently have a fixed set of properties (as columns in the ticket table) and a flexible set of properties (as name/value columns in a ticket_custom table).

Both styles have advantages and disadvantages:

  1. properties as columns:
    • (-) only flexibility is to not use some fields (e.g. severity)
    • (-) no multiple values per field possible
    • (+) faster (?)
    • (+) straightforward code (for field1,field2, ... in cursor: ...) (?)
  2. properties in name/value columns
    • (+) highest flexibility, add or remove fields at will
    • (+) allow for multiple values per name, provided we don't use a primary key as we currently do for the ticket_custom table (#918)
      • (+) a slight extension would allow ordered multiple values (sequences), otherwise we have no control over the order in which those multiple values are retrieved, which might introduce confusion as this order will vary when new values are added; while primarily useful for int fields, this could also be useful to implement text fields for Oracle (working around the limits on varchars and not having to use CLOBs)
    • (-) slower, less memory efficient (?)
    • (-) more complex code (?)

(?) means yet to be verified

In order to reduce the overall complexity, the idea would be to pick only one approach, instead of having to support both. By using the second style, we could also have our "fixed" set of properties, while obviously the first style can't support the second.

It remains to be seen whether the second approach is really less efficient than the first, but this doesn't really matter as we anyway have already to pay the price for that flexibility. Note also that by an appropriate use of indexes, we might eventually get better performance compared to what we have today.

So the new model could be simply:

ticket

id name value seq

seq is the sequence number in case of multiple entries with the same name.

or even:

resource_prop

realm id name value seq

(if we use one mega table for all resources)

Note by the way that newer "resources", like repository in MultiRepos already have this (id,name/value) form.

We could also keep the metadata associated to the properties in the database, instead of being hard-coded and present in the TracIni file, permitting an unification of the representation for fixed fields and custom fields.

resource_schema

realm prop name value

Here, possible values for name could be 'label', 'default', 'order', 'type', etc.

Example.

ticket description type wiki
ticket priority type enum
ticket priority enum priority
ticket priority default normal
ticket need_review type checkbox
ticket need_review default 0

Note: the existence of a schema describing the fields doesn't mean that modules can't interpret fields as being special. Quite the opposite, as modules are what provides the real "behavior" of resources. Furthermore, properties not defined in the schema could simply be ignored, so this would allow a great deal of flexibility for plugins when they need to store "special" properties or revision properties.

As a possible refining, it could be possible to have specialized tables, one for each different value column type we want to support:

  • resource_prop for text values
  • resource_prop_int for integer values
  • (resource_prop_float for float values, if really needed)

And we could even differentiate between short and long text values (requirement 4):

  • resource_prop for short text values
  • resource_prop_text for long text values

(see #6986).

Along the same lines there's also the question of what should be the id: a natural or a surrogate key?

  • natural keys: (id would be 123 for ticket #123, id would be 'milestone1' for milestone1, etc.)
    • we have to support different type of keys (text for milestone, int for ticket).
      • not a problem for separate tables
      • would require resource_int_prop style for resources having an int id … cumbersome
    • less compact but easier to "understand"
    • renaming is more difficult
  • surrogate keys: (id would be a number in a sequence, never shown as such in the interface)
    • only one type of keys (int) - faster, simpler, the unique resource_prop table approach is possible
    • more compact, not that difficult to read either (there would always be a name=id, value=the natural key entry
    • renaming is easy (relations preserved)

This suggests that using surrogate keys would be preferable.

Now if this is the case, the resource_prop table could as well become:

id name value seq

and the realm information could simply be stored as another name/value entry.

Resource History

We need to differentiate between the changes to the data, and the metadata about the change. The metadata is about who did the change, when, why the change was made, etc. We can adopt the same flexible strategy as the one for resource properties and store arbitrary name/value pairs of "revision properties".

resource_revprop

changeid name value

Typical example:

101001 author cboos
101001 auth 1
101001 date 1231232114.12
101001 comment random change

A given changeid is usually related to a specific change in one resource, but there could be other situations:

  • one change affecting lots of resources (typically #4582 and #5658)
  • changes affecting changes (typically #454)

The property changes themselves are stored in other tables. We only need the changed properties here, no need to store the old/new values for each change, as this can be deduced from the past changes. Deletions of fields should be represented by setting a field to the NULL value.

Several possibilities here:

ticket_change

id changeid name value seq

milestone_change

id changeid name value seq

or:

resource_change

id changeid name value seq

(surrogate key approach)

The latter has the advantage that it would make easy to relate a given changeid to the resource(s) that were affected by the change, without having to go through each resource table.

We could also keep all property changes as text values or have extra ..._int (..._float) tables for more compact representation.

See also ticket:6466#comment:10 and follow-ups for a discussion about how ticket changes and in particular ticket change edits, could be handled using this approach.

The Model

To summarize the above discussion, here's what could be the new model.

In the following:

  • int stands for the integer column type
  • string stands for a short text column type (e.g. varchar(65))
  • text stands for a long text column type, of unbounded size

Minimal Model

  • surrogate keys for all resources
  • only text fields

Schema

resource_schema

realm prop name value
string string string text

resource_prop

id name value seq
int string text int

resource_revprop

changeid name value seq
int string text int

resource_change

id changeid name value seq
int int string text int

Example Dataset

resource_schema

realm prop name value
ticket summary type text
ticket description type wiki
ticket reporter type text

resource_prop

id name value seq
0 id 130 0
0 summary Multiple Project Support 0
0descriptionOne day… 0
0 reporter joe 0

resource_revprop

changeid name value seq
1 author joe 0
1 date 5 years ago 0
2 author joe 0
2 date 2 years ago 0
2 comment come on… 0
3 author cboos 0
3 date 1 year ago 0
3 comment sure… 0

resource_change

id changeid name value seq
0 1 id 130 0
0 1 summary Multiple Project Support 0
0 1descriptionShould be easy… 0
0 1 reporter joe 0
0 2descriptionShould be easy… Redmine has it! 0
0 3descriptionOne day… 0


Intermediate Model

  • surrogate keys for all resources
  • text and int fields (same as Complete Model, without the *_string tables)

The minimal model above is handy for showing the essence of the new model, but it's too simple in practice.

I think we need at the very least to support 'integer' type columns, useful for storing dates efficiently, boolean values, and relations to other resources (as the surrogate ids will be integers).

Complete Model

  • surrogate keys for all resources
  • int, bigint, short and long text fields

Not absolutely necessary to go that far, this could nevertheless help a lot for the MySQL backend (#6986), possibly also for a future Oracle backend. Don't know about PostgreSQL, but for SQLite this should be indifferent.

bigint is needed since we use this to store microsecond precision timestamps (#6466).

Schema

FIXME need to add the seq columns here as well

resource_schema

realm prop name value
string string string text

resource_prop

id name value
int string text

resource_revprop

changeid name value
int string text

resource_change

id changeid name value
int int string text

resource_prop_string

id name value
int string string

resource_revprop_string

changeid name value
int string string

resource_change_string

id changeid name value
int int string string

resource_prop_int

id name value
int string int

resource_revprop_int

changeid name value
int string int

resource_change_int

id changeid name value
int int string int

Example Dataset

FIXME add example of multivalued property

resource_schema

realm prop name value
ticket summary type text
ticket description type wiki
ticket reporter type string

resource_prop

id name value
0 summary Multiple Project Support
0 descriptionOne day…

resource_revprop

changeid name value
2 comment come on…
3 comment sure…

resource_change

id changeid name value
0 1 summary Multiple Project Support
0 1 descriptionShould be easy…
0 2 descriptionShould be easy… Redmine has it||
0 3 descriptionOne day…

resource_prop_string

id name value
0 reporter joe

resource_revprop_string

changeid name value
1 author joe
2 author joe
3 author cboos

resource_change_string

id changeid name value
0 1 reporter joe

resource_prop_int

id name value
0 id 130

resource_revprop_int

changeid name value
1 date 5 years ago
2 date 2 years ago
3 date 1 year ago

resource_change_int

id changeid name value
0 1 id 130


  • Comment Why not implement all of the different resource_prop* tables into a single table, where each tupel has multiple attributes, see for example the JBPM datamodel for a working and presumably also fast approach. Here, there exists a process_variable or some similar table that stores all the different value types in single table. And please rename the name field to prop so that it matches the one in the resource_schema table.The schema would be like so:
    table resource_prop
    {
    id
    prop
    type
    int_val
    string_val
    datetime_val
    ...
    }
    
    table resource_revprop
    {
    change_id
    prop
    type
    int_val
    string_val
    ...
    }
    
    table resource_change
    {
    id
    change_id
    prop
    type
    int_val
    string_val
    ...
    }
    
  • Comment Also I would like to have the resource_schema table extended so that it will support different schemas for, say, different ticket types. That way, users can define their personal ticket type schemas. Of course, derivation would also be nice, but that could be implemented at a later point in time, requiring yet another table. That way we could have both inheritance at the schema level and also multiple different models per realm ;)
table resource_schema
{
  realm
  urn    -- the urn of the ticket schema or wiki page schema to which this belongs to, defaults are for example trac::ticket or trac::milestone or trac::wiki-page and so on
  prop
  name
  type
}

and the inheritance table for the schemas would then be

table resource_schema_inheritance
{
  realm
  urn
  base -- the urn of the base schema
}
  • inheritance would then provide for also multiple inheritance

Possible Implementation Plan

Milestone First

  • modify the Milestone module so that it uses the new proposed datamodel.
  • experiment new tabbed view for the milestone (View, Discussion, History). See TracProject/UiGuidelines.
  • milestone should be able to have attachments, too (#3068)
  • adapt the Roadmap module to the new model
  • adapt the Milestone admin component to the new model

Once this is complete, validate the genericity by promoting the components to be first class resources as well (#1233).

Ticket First

As the ticket module is by far the most complex, it might be worth to try out the new model there first:

  • we could verify that we meet the expectations in terms of code simplification, solving open issues, etc.
  • we could detect early if there are no regressions or risk of losing current features
  • by redeploying the ticket infrastructure to the other components, we could spread the most benefits of tickets (comments, custom fields, queries, etc.) to other resources (milestone, wiki, component, …)

Related Tickets

  • Data model issues:
    #4130
    Ticket Component Interface Fails to Properly Check for Existing Component
    #11301
    intermittent failure with notification test
  • Resource related:
    #150
    User-centric storage. Last-viewed information and 'intelligent' followup
    #221
    Creating TR for multiple components
    #695
    Keywords for wiki pages
    #787
    Change attachment description
    #918
    [patch] Custom Ticket Fields should support the multiple selection type
    #1113
    Show milestone changes in timeline
    #1386
    Adding properties to Wiki Pages
    #1395
    Text box for duplicate when a bug is a duplicate
    #1678
    Show component view in timeline view for checkins
    #1835
    Add a discussion tab for Trac resources
    #2035
    Changeset commenting
    #2344
    sub milestone
    #2464
    Conditional fields in tickets
    #2465
    Add "custom query" functionality for changesets
    #2467
    Link user name in reports to custom query showing that user's open tickets
    #2662
    assign tickets to multiple users
    #2961
    custom comment fields
    #3003
    milestone could be a ticket
    #3080
    Custom field sorts only as text
    #3718
    Trac should use HTTP 301 Moved when milestones get renamed
    #3911
    implement an object system (xwiki, roundup)
    #4588
    User Page: trac links to these pages
    #5211
    Ticket - Wiki Integration enhancement/suggestion
    #7871
    Add start date to a milestone
    #8335
    Create People page with user profiles and status of who's doing what.
    #9263
    if SVN is used, milestone should manage/allowtoassign links for branch and tag
Note: See TracWiki for help on using the wiki.