= GenericTrac Data Model = This page attempts to define a new data model for Trac that could be suitable for most of its resources. The main benefits expected from the new model are: - simplification of the internals of Trac, especially for the ticket model, in which the storage of changes is quite cumbersome (see #454, #6466) - solve a few design problems with the current data model (like #1890, #4582) - allow better code reuse and share of the base features among different kinds of resources (numerous examples for that, see [#RelatedTickets] below) This stems from the following former proposals: - TracObjectModelProposal - TracDev/Proposals/DataModel - TracDev/Proposals/Journaling See also [googlegroups:trac-dev:8cf3f5fe0e476ce5 this mail]. As this will be a major redesign of the data model, it could also be a good opportunity to take the ''[TracMultipleProjects multiple project]'' considerations into account (#130). Each resource related table could get a `project` identifier field. Working on the generic aspect of Trac should also make it possible to implement various ''generic'' operations on Trac resources as plugins, mainly being able to (re-)implement TracCrossReferences as a plugin (see also #6543). == Design Discussion == Requirements for the new model: 1. it has to be ''simple''; 2. it must be ''flexible'', in order to accommodate different kinds of resources and allow for dynamic evolution; 3. it should remain ''fast'', if not faster than what we currently have; 4. it should lead to a more ''compact'' representation of data === Resource Content === The ticket model is by far richer data model we would have to support, so we could take this as a basis to lay out the foundations of the new model. For ticket, we currently have a fixed set of properties (as columns in the `ticket` table) and a flexible set of properties (as name/value columns in a `ticket_custom` table). Both styles have advantages and disadvantages: 1. properties as columns: - (-) only flexibility is to not use some fields (e.g. severity) - (-) no multiple values per field possible - (+) faster - (+) straightforward code (`for field1,field2, ... in cursor: ...`) 2. properties in name/value columns - (+) highest flexibility, add or remove fields at will - (+) allow for multiple values per name, provided we don't use a primary key as we currently do for the `ticket_custom` table (#918) - (-) slower, less memory efficient (?) - (-) more complex code (?) In order to reduce the overall complexity, the idea would be to pick only one approach, instead of having to support both. By using the second style, we could also have our "fixed" set of properties, while obviously the first style can't support the second. It remains to be seen whether the second approach is really less efficient than the first, but this doesn't really matter as we anyway have already to pay the price for that flexibility. So the new model could be simply: '''ticket''' || ''id'' || ''name'' || ''value'' || or even: '''resource_prop''' || ''realm'' || ''id'' || ''name'' || ''value'' || (if we use one mega table for all resources) We could also keep the metadata associated to the properties in the database, instead of being hard-coded and present in the TracIni file. '''resource_schema''' || ''realm'' || ''prop'' || ''name'' || ''value'' || Here, possible values for ''name'' could be 'label', 'default', 'order', 'type', etc. Example. || ticket || description || type || wiki || || ticket || priority || type || enum || || ticket || priority || enum || priority || || ticket || priority || default || normal || || ticket || need_review || type || checkbox || || ticket || need_review || default || 0 || As a possible refining, it could be possible to have specialized tables, one for each different value column type we want to support: - '''resource_prop''' for text values - '''resource_prop_int''' for integer values - ('''resource_prop_float''' for float values, if really needed) And we could even differentiate between short and long text values (requirement 4): - '''resource_prop''' for short text values - '''resource_prop_text''' for long text values (see #6986). Along the same lines there's also the question of what should be the ''id'': a natural or a surrogate key? - natural keys: (''id'' would be 123 for ticket !#123, id would be 'milestone1' for milestone1, etc.) - we have to support different type of keys (text for milestone, int for ticket). - not a problem for separate tables - would require ''resource_int_prop'' style for resources having an ''int'' id ... cumbersome - less compact but easier to "understand" - renaming is more difficult - surrogate keys: (''id'' would be a number in a sequence, never shown as such in the interface) - only one type of keys (int) - faster, simpler, the unique ''resource_prop'' table approach is possible - more compact, not that difficult to read either (there would always be a ''name=id'', ''value=the natural key'' entry - renaming is easy (relations preserved) This suggests that using surrogate keys would be preferable. Now if this is the case, the '''resource_prop''' table could as well become: || ''id'' || ''name'' || ''value'' || and the ''realm'' information could simply be store as another name/value entry. === Resource History === We need to differentiate between the changes to the data, and the metadata about the change. The metadata is about who did the change, when, why the change was made, etc. We can adopt the same flexible strategy as the one for resource properties and store arbitrary name/value pairs of "revision properties". '''resource_revprop''' || ''changeid'' || ''name'' || ''value'' || Typical example: || 101001 || author || cboos || || 101001 || auth || 1 || || 101001 || date || 1231232114.12 || || 101001 || comment || random change || A given ''changeid'' is usually related to a specific change in one resource, but there could be other situations: - one change affecting lots of resources (typically #4582 and #5658) - changes affecting changes (typically #454) The property changes themselves are stored in other tables. Several possibilities here: '''ticket_change'' || ''id'' || ''changeid'' || ''name'' || ''value'' || '''milestone_change'' || ''id'' || ''changeid'' || ''name'' || ''value'' || or: '''resource_change'' || ''id'' || ''changeid'' || ''name'' || ''value'' || (surrogate key approach) The latter has the advantage that it would make easy to relate a given ``changeid`` to the resource(s) that were affected by the change, without having to go through each resource table. We could also keep all property changes as text values or have extra `..._int` (`..._float`) tables for more compact representation. See also ticket:6466#comment:10 and follow-ups for a discussion about how ticket changes and in particular ticket change edits, could be handled using this approach. == Possible Implementation Plan == === Milestone First === - modify the Milestone module so that it uses the new proposed datamodel. See [#TheMilestoneExample]. - experiment new tabbed view for the milestone (''View'', ''Discussion'', ''History''). See TracProject/UiGuidelines. - milestone should be able to have attachments, too (#3068) - adapt the Roadmap module to the new model - adapt the Milestone admin component to the new model Once this is complete, validate the genericity by promoting the components to be first class resources as well (#1233). === Ticket First === As the ticket module is by far the most complex, it might be worth to try out the new model there first: - we could verify that we meet the expectations in terms of code simplification, solving open issues, etc. - we could detect early if there are no regressions or risk of losing current features - by redeploying the ticket infrastructure to the other components, we could spread the most benefits of tickets (comments, custom fields, queries, etc.) to other resources (milestone, wiki, component, ...) == Related Tickets == - Data model issues: [[TicketQuery(status=!closed&keywords=~model)]] - Resource related: [[TicketQuery(status=!closed&keywords=~tracobject)]]