Version 29 (modified by 14 years ago) ( diff ) | ,
---|
GenericTrac Data Model
This page defines a new data model for Trac that should be suitable for storing most of the data from resources, along with their change history.
In the longer term, it would be more appropriate to keep the change history in a dedicated system (a version control backend, even a distributed one). In that picture, we could even provide a simple "dbvcs" backend, which would use the database for storing the change history much like we do now.
The main benefits expected from the new model are:
- simplification of the internals of Trac, especially for the ticket model, in which the storage of changes is quite cumbersome (see #454, #6466)
- solve a few design problems with the current data model (like #1890, #4582)
- allow better code reuse and share of the base features among different kinds of resources (like #695, #1113, etc. see #RelatedTickets for more)
Related Proposals
This stems from the following former proposals:
See also this mail.
As this will be a major redesign of the data model, it could also be a good opportunity to take the multiple project considerations into account (#130).
Somehow related to the generic data model, but not strictly depending on it, Trac should also make it possible to implement new kinds of plugins that would perform generic operations on Trac resources. This could allow the (re-)implementation of the TracCrossReferences idea as a plugin, for example. See #31, #886, #6543 and TracDev/Proposals/TracRelations.
Design Discussion
Requirements for the new model:
- it has to be simple;
- it must be flexible, in order to accommodate different kinds of resources and allow for dynamic evolution, like addition or removal of fields by plugin or via the web admin;
- it should remain fast, if not become faster than what we currently have;
- it should lead to a more compact representation of data
- all the existing constraints about maintaining resource history and associated metadata should be taken into account
Note that the persistence constraints imposed by the Trac data model are not necessarily only (or even best) approached using the RelationalModel, one could imagine that a future version could use more document-oriented persistence layers (e.g. MongoDB), or object-oriented databases (e.g. ZODB). Also, as said above, the versioning of resources should be delegated to a version control backend, with a default, simple, in-database VCS backend.
Resource Content
The ticket model is by far richer data model we would have to support, so we could use it as a basis to lay out the foundations of the new model.
For ticket, we currently have a fixed set of properties
(as columns in the ticket
table)
and a flexible set of properties
(as prop/value columns in a ticket_custom
table).
Note that we might soon get relations (or links) as well, see #31.
Let's examine the advantages and disadvantages of each different kind of storage, for the properties:
- properties stored in columns of a main table:
- (-) only flexibility is to not use some fields (e.g. 'severity', 'url')
- (-) no multiple values per field possible
- (+) faster especially for multicriteria queries (?)
- (+) straightforward code (
for field1,field2, ... in cursor: ...
) (?)
- properties stored in prop/value columns of a generic table
- (+) highest flexibility, add or remove fields at will
- (+) allow for multiple values per property, provided we don't use a primary key
as we currently do for the
ticket_custom
table (#918)- (+) a slight extension would allow ordered multiple values (sequences), otherwise we have no control over the order in which those multiple values are retrieved, which might introduce confusion as this order will vary when new values are added;
while primarily useful for
int
fields, this could also be useful to implementtext
fields for Oracle (working around the limits on varchars and not having to use CLOBs)
- (+) a slight extension would allow ordered multiple values (sequences), otherwise we have no control over the order in which those multiple values are retrieved, which might introduce confusion as this order will vary when new values are added;
while primarily useful for
- (-) slower, less memory efficient, especially for multicriteria queries(?)
- (-) more complex code (?)
(?) means yet to be verified
In order to reduce the overall complexity, the idea is to pick only one approach, instead of having to support both. By using the second style, we could also have our "fixed" set of properties, while obviously the first style can't support the second.
It remains to be seen whether the second approach is really less efficient than the first, but this doesn't really matter as we anyway have already to pay the price for that flexibility. Note also that by an appropriate use of indexes, we might eventually get better performance compared to what we have today.
The second style (prop/value) can be implemented in several ways:
- a single
properties
table withprop
andvalue
string columns (this is what our currentticket_custom
,system
andrepository
tables do) - a single
properties
table withprop
and several typed columns (int_value
,string_value
,date_value
, etc.), only the appropriate column being used - see discussion (JBPM's approach) - several
properties
tables with eachprop
andvalue
columns, each table dedicated to a given type (int_properties
,string_properties
, etc.) - … more …
So for example the new model could be simply:
ticket
id | prop | value | seq |
---|
seq is the sequence number in case of multiple entries with the same property name.
or even:
resource_prop
realm | id | prop | value | seq |
---|
(if we use one mega table for all resources)
Note by the way that newer "resources", like repository
in MultiRepos already have this (id,prop/value)
form.
We could also keep the metadata associated to the properties in the database, instead of being hard-coded and present in the TracIni file, permitting an unification of the representation for fixed fields and custom fields.
resource_schema
realm | prop | metaprop | value |
---|
Here, possible content for prop could be 'label', 'default', 'order', 'type', etc.
Example:
ticket | description | type | wiki |
ticket | priority | type | enum |
ticket | priority | enum | priority |
ticket | priority | default | normal |
ticket | need_review | type | checkbox |
ticket | need_review | default | 0 |
Note: the existence of a schema describing the fields doesn't mean that modules can't interpret fields as being special. Quite the opposite, as modules are what provides the real "behavior" of resources. Furthermore, properties not defined in the schema could simply be ignored, so this would allow a great deal of flexibility for plugins when they need to store "special" properties or revision properties.
As a possible refining, it could be possible to have specialized tables, one for each different value column type we want to support:
- resource_prop for text values
- resource_prop_int for integer values
- (resource_prop_float for float values, if really needed)
And we could even differentiate between short and long text values (requirement 4):
- resource_prop for short text values
- resource_prop_text for long text values
(see #6986).
Along the same lines there's also the question of what should be the id: a natural or a surrogate key?
- natural keys
- (id would be 123 for ticket #123, id would be 'milestone1' for milestone1, etc.)
- we have to support different type of keys (text for milestone, int for ticket).
- not a problem for separate tables
- would require resource_int_prop style for resources having an int id … cumbersome
- less compact but easier to "understand"
- renaming is more difficult
- we have to support different type of keys (text for milestone, int for ticket).
- surrogate keys
- (id would be a number in a sequence, never shown as such in the interface)
- only one type of keys (int) - faster, simpler,
- the unique resource_prop table approach is possible
- more compact, not that difficult to read either as there would always be a prop/value pair (
'id'
,<the natural key>
) - renaming is easy (relations are preserved)
This all suggests that using surrogate keys would be preferable, with a single resource_prop table.
If this is the case, the resource_prop table could as well become simply:
id | prop | value | seq |
---|
- the realm information could simply be stored as another prop/value entry (
'realm'
,<the resource realm>
) - this could be extended naturally to the support of multiple projects, with another (
'project'
,<projectid>
) prop/value pair; this is better as adding an additionalproject
column to all entries
Note that having a separate table for storing the relations only wouldn't necessary be better, as we would likely have to store something like:
role | source | target | seq |
---|
which is just a reformulation of:
prop | id | value | seq |
---|
i.e. the proposed resource_prop.
Resource History
We need to differentiate between the changes to the data, and the metadata about the change. The metadata is about who did the change, when, why the change was made, etc. We can adopt the same flexible strategy as the one for resource properties and store arbitrary key/value pairs of "revision properties".
resource_revprop
changeid | revprop | value |
Typical example:
101001 | author | cboos |
101001 | auth | 1 |
101001 | date | 1231232114.12 |
101001 | comment | random change |
A given changeid is usually related to a specific change in one resource, but there could be other situations:
- one change affecting lots of resources (typically #4582 and #5658, ticket batch changes #525)
- changes affecting changes (typically #454); now that we have completed #454, it would be interesting to see what constraints this impose on the new model
The property changes themselves are stored in other tables. We only need the changed properties here, no need to store the old/new values for each change, as this can be deduced from the past changes. Deletions of fields should be represented by setting a field to the NULL value.
Several possibilities here:
ticket_change
id | changeid | prop | value | seq |
---|
milestone_change
id | changeid | prop | value | seq |
---|
or:
resource_change
id | changeid | prop | value | seq |
---|
(surrogate key approach)
The latter has the advantage that it would make easy to relate a given changeid
to the resource(s) that were affected by the change, without having to go through
each resource table.
We could also keep all property changes as text values
or have extra ..._int
(..._float
) tables for more compact
representation.
See also ticket:6466#comment:10 and follow-ups for a discussion about how ticket changes and in particular ticket change edits, could be handled using this approach.
Possible Implementation Plan
Milestone First
- modify the Milestone module so that it uses the new proposed datamodel.
- experiment new tabbed view for the milestone (View, Discussion, History). See TracProject/UiGuidelines.
- milestone should be able to have attachments, too (#3068)
- adapt the Roadmap module to the new model
- adapt the Milestone admin component to the new model
Once this is complete, validate the genericity by promoting the components to be first class resources as well (#1233).
Ticket First
As the ticket module is by far the most complex, it might be worth to try out the new model there first:
- we could verify that we meet the expectations in terms of code simplification, solving open issues, etc.
- we could detect early if there are no regressions or risk of losing current features
- by redeploying the ticket infrastructure to the other components, we could spread the most benefits of tickets (comments, custom fields, queries, etc.) to other resources (milestone, wiki, component, …)
Wiki First
Milestone and components are closer in spirit to a wiki page than to a ticket (we have the long standing #3776 ticket).
We could as well start simply on the wiki model and wiki module, to rework the basics there, and once this is done port the changes to the milestone module, then work on the components.
If we do this, we can also at the same time think about the storage model, in particular how past versions could be stored in a VCS (#1132). We could then think about a serialization model that would play nicely with the genericity of the Trac resources on one hand side, and with the external editors on the other hand.
Component First
An other possibility would be to start on fresh ground, implementing not yet existing resource typse in a generic way. That would be the less disrupting approach, as everything else should just continue to work unchanged (or mostly unchanged, except for things like #5211 and places where we deal with the old component
table).
Related Tickets
- Data model issues:
- Resource related:
- #150
- User-centric storage. Last-viewed information and 'intelligent' followup
- #221
- Creating TR for multiple components
- #695
- Keywords for wiki pages
- #787
- Change attachment description
- #918
- [patch] Custom Ticket Fields should support the multiple selection type
- #1113
- Show milestone changes in timeline
- #1386
- Adding properties to Wiki Pages
- #1395
- Text box for duplicate when a bug is a duplicate
- #1678
- Show component view in timeline view for checkins
- #1835
- Add a discussion tab for Trac resources
- #2035
- Changeset commenting
- #2344
- sub milestone
- #2464
- Conditional fields in tickets
- #2465
- Add "custom query" functionality for changesets
- #2467
- Link user name in reports to custom query showing that user's open tickets
- #2662
- assign tickets to multiple users
- #2961
- custom comment fields
- #3003
- milestone could be a ticket
- #3080
- Custom field sorts only as text
- #3718
- Trac should use HTTP 301 Moved when milestones get renamed
- #3911
- implement an object system (xwiki, roundup)
- #4588
- User Page: trac links to these pages
- #5211
- Ticket - Wiki Integration enhancement/suggestion
- #7871
- Add start date to a milestone
- #8335
- Create People page with user profiles and status of who's doing what.
- #9263
- if SVN is used, milestone should manage/allowtoassign links for branch and tag