Ideas for the GenericTrac data model
We list a few ideas about what could be the new model.
In the following:
- int stands for the integer column type
- string stands for a short text column type (e.g. varchar(65))
- text stands for a long text column type, of unbounded size
Example for the Complete Model
- surrogate keys for all resources
- int, bigint, short and long text fields
Not absolutely necessary to go that far, this could nevertheless help a lot for the MySQL backend (#6986), possibly also for a future Oracle backend. Don't know about PostgreSQL, but for SQLite this should be indifferent.
bigint is needed since we use this to store microsecond precision timestamps (#6466).
Schema
The generic scheme for a given resource:
| {resource}_prop | |||
|---|---|---|---|
| id | prop | value | seq |
| int | string | biging | bigint |
| {resource}_prop_string | |||
| id | prop | value | seq |
| int | string | string | bigint |
| {resource}_prop_text | |||
| id | prop | value | seq |
| int | string | text | bigint |
The generic scheme for the resource change metadata:
| {resource}_rev_prop | |||
|---|---|---|---|
| id | prop | value | seq |
| int | string | bigint | bigint |
| {resource}_rev_prop_string | |||
| id | prop | value | seq |
| int | string | string | bigint |
| {resource}_rev_prop_text | |||
| id | prop | value | seq |
| int | string | text | bigint |
The generic scheme for the resource changes themselves:
| {resource}_version_prop | |||
|---|---|---|---|
| id | prop | value | seq |
| int | string | bigint | bigint |
| {resource}_version_prop_string | |||
| id | prop | value | seq |
| int | string | string | bigint |
| {resource}_version_prop_text | |||
| id | prop | value | seq |
| int | string | text | bigint |
Example Dataset
FIXME add example of multivalued property
| {resource}_prop | |||
|---|---|---|---|
| id | prop | value | seq |
| 0 | id | 130 | 100001 |
| {resource}_prop_string | |||
| id | prop | value | seq |
| 0 | reporter | joe | 100002 |
| {resource}_prop_text | |||
| id | prop | value | seq |
| 0 | summary | Multiple Project Support | 100003 |
| 0 | description | One day… | 100004 |
| {resource}_rev_prop | |||
|---|---|---|---|
| id | prop | value | seq |
| 11 | resource | 0 | 10000010 |
| 11 | date | (5 years ago) | 10000011 |
| 12 | resource | 0 | 10000012 |
| 12 | date | (2 years ago) | 10000013 |
| 13 | resource | 0 | 10000014 |
| 13 | date | (1 year ago) | 10000015 |
| {resource}_revprop_string | |||
| id | prop | value | seq |
| 11 | author | joe | 10000016 |
| 12 | author | joe | 10000017 |
| 13 | author | cboos | 10000018 |
| {resource}_revprop_text | |||
| id | prop | value | seq |
| 12 | comment | come on… | 10000019 |
| 13 | comment | sure… | 10000020 |
| {resource}_version | |||
|---|---|---|---|
| id | prop | value | seq |
| 21 | resource | 0 | 10000080 |
| 21 | revision | 11 | 10000081 |
| 22 | resource | 0 | 10000082 |
| 22 | revision | 12 | 10000083 |
| 23 | resource | 0 | 10000084 |
| 23 | revision | 13 | 10000085 |
| {resource}_version_string | |||
| id | prop | value | seq |
| 21 | reporter | joe | 10000086 |
| {resource}_version_text | |||
| id | prop | value | seq |
| 21 | summary | Multiple Project Support | 10000087 |
| 21 | description | Should be easy… | 10000088 |
| 22 | description | Should be easy… Redmine has it! | 10000089 |
| 23 | description | One day… | 10000090 |
About the "Schema"
{resource}_schema realm prop metaprop value
Here, possible content for prop could be 'label', 'default', 'order', 'type', etc.
Example:
ticket description type wiki ticket priority type enum ticket priority enum priority ticket priority default normal ticket need_review type checkbox ticket need_review default 0
But we could simply reuse the "generic scheme", instantiated for each resource using "{resource}_schema" resource type. Each instance would correspond to a field specification.
Example:
ticket_schema_prop 10305001 name description 50030303 10305001 type wiki 50030304 10305002 name priority 50030305 10305002 type enum 50030306 10305002 enum priority 50030307 10305002 default normal 50030308 10305003 name need_review 50030309 10305003 type checkbox 50030310 10305003 default 0 50030311
Discussion
- (cklein) Why not implement all of the different resource_prop* tables into a single table, where each tuple has multiple attributes, see for example the JBPM datamodel for a working and presumably also fast approach. Here, there exists a process_variable or some similar table that stores all the different value types in single table.
The schema would be like so:table resource_prop { id prop type int_val string_val datetime_val ... } table resource_revprop { change_id prop type int_val string_val ... } table resource_change { id change_id prop type int_val string_val ... }- (cboos) not sure how you see that as an advantage; each row will waste all the fields but one; there need to be one index for each type, each index having to deal with lots of NULL values, each update will have to rebuild all indexes, etc.). But it could be worth benchmarking anyway…
- (cboos) note that when we discussed a bit GenericTrac with Remy, his natural choice was also towards this approach, so it's certainly worth considering :-)
- (cboos) not sure how you see that as an advantage; each row will waste all the fields but one; there need to be one index for each type, each index having to deal with lots of NULL values, each update will have to rebuild all indexes, etc.). But it could be worth benchmarking anyway…
- And please rename the name field to prop so that it matches the one in the resource_schema table.
- (cboos) done - now I use prop consistently to talk about resource property names
- Also I would like to have the resource_schema table extended so that it will support different schemas for, say, different ticket types. That way, users can define their personal ticket type schemas. Of course, derivation would also be nice, but that could be implemented at a later point in time, requiring yet another table. That way we could have both inheritance at the schema level and also multiple different models per realm ;)
table resource_schema { realm urn -- the urn of the ticket schema or wiki page schema to which this belongs to, defaults are for example trac::ticket or trac::milestone or trac::wiki-page and so on prop name type }The inheritance table for the schemas would then betable resource_schema_inheritance { realm urn base -- the urn of the base schema }- (cboos) Interesting idea, though for simplicity my initial idea was that one could simply define new realms (bug: vs. enhancement:, both nevertheless managed by the TicketModule)
- inheritance would then provide for also multiple inheritance
- (cboos) much harder ;-)
Experiments
The usual issue with EAV models is the relative complexity of making queries. If we'd like to support something as flexible as our custom query module (and even more, as we currently don't support multi-valued fields, needed in particular for TracDev/Proposals/TicketLinks), we need to find an efficient way, which doesn't require too many joins.
One idea is to combine two queries, the first to get the ids of the matching objects, the second to retrieve the desired columns.
The two queries could be combined in a single SQL statement, or the set of ids could be first used to see if there is already some corresponding data present in memory that could be reused. Only the remaining ids would then be used for looking up the remaining values (as "full" objects).
Example: looking for the ticket number, summary, priority and blocked-by values for all tickets having a priority lower than 3 and "test" in their summary or as a keyword.
With the single table approach:
CREATE TABLE ticket_prop ( id bigint , prop varchar(16) , ival bigint , tval text , seq bigint ); SELECT p.id, p.prop, p.ival, p.tval, p.seq FROM (SELECT id FROM ticket_prop WHERE prop = 'priority' AND ival < 3 INTERSECT SELECT id FROM (SELECT id FROM ticket_prop WHERE prop = 'summary' AND tval LIKE '%test%' UNION SELECT id FROM ticket_prop WHERE prop = 'keyword' AND tval = 'test' ) ) AS ids LEFT JOIN ticket_prop p ON ids.id = p.id WHERE p.prop in ('id', 'summary', 'priority', 'blocked-by') ORDER BY p.id, p.prop, p.seq;
With the multiple table approach:
CREATE TABLE ticket_prop_int ( id bigint , prop varchar(16) , val bigint , seq bigint ); CREATE TABLE ticket_prop_text ( id bigint , prop varchar(16) , val text , seq bigint ); SELECT p.id as id, p.prop as prop, NULL as ival, p.val as tval, p.seq as seq FROM (SELECT id FROM ticket_prop_int WHERE prop = 'priority' AND val < 3 INTERSECT SELECT id FROM (SELECT id FROM ticket_prop_text WHERE prop = 'summary' AND val LIKE '%test%' UNION SELECT id FROM ticket_prop_text WHERE prop = 'keyword' AND val = 'test' ) ) AS ids LEFT JOIN ticket_prop_text p ON ids.id = p.id WHERE p.prop = 'summary' UNION SELECT p.id as id, p.prop as prop, p.val as ival, NULL as tval, p.seq as seq FROM (SELECT id FROM ticket_prop_int WHERE prop = 'priority' AND val < 3 INTERSECT SELECT id FROM (SELECT id FROM ticket_prop_text WHERE prop = 'summary' AND val LIKE '%test%' UNION SELECT id FROM ticket_prop_text WHERE prop = 'keyword' AND val = 'test' ) ) AS ids LEFT JOIN ticket_prop_int p ON ids.id = p.id WHERE p.prop in ('id', 'priority', 'blocked-by') ORDER BY id, prop, seq;
Ok, maybe there's some clever way to reuse the ids, but I couldn't find one. Hope the SQL backend is smart enough to detect it's the same subquery ;-)


