Edgewall Software
Last modified 17 months ago Last modified on Jan 6, 2011 11:54:00 PM

Ideas for the GenericTrac data model

We list a few ideas about what could be the new model.

In the following:

  • int stands for the integer column type
  • string stands for a short text column type (e.g. varchar(65))
  • text stands for a long text column type, of unbounded size

Example for the Complete Model

  • surrogate keys for all resources
  • int, bigint, short and long text fields

Not absolutely necessary to go that far, this could nevertheless help a lot for the MySQL backend (#6986), possibly also for a future Oracle backend. Don't know about PostgreSQL, but for SQLite this should be indifferent.

bigint is needed since we use this to store microsecond precision timestamps (#6466).

Schema

The generic scheme for a given resource:

{resource}_prop
id prop value seq
int string biging bigint
{resource}_prop_string
id prop value seq
int string string bigint
{resource}_prop_text
id prop value seq
int string text bigint

The generic scheme for the resource change metadata:

{resource}_rev_prop
id prop value seq
int string bigint bigint
{resource}_rev_prop_string
id prop value seq
int string string bigint
{resource}_rev_prop_text
id prop value seq
int string text bigint

The generic scheme for the resource changes themselves:

{resource}_version_prop
id prop value seq
int string bigint bigint
{resource}_version_prop_string
id prop value seq
int string string bigint
{resource}_version_prop_text
id prop value seq
int string text bigint

Example Dataset

FIXME add example of multivalued property

{resource}_prop
id prop value seq
0 id 130 100001
{resource}_prop_string
id prop value seq
0 reporter joe 100002
{resource}_prop_text
id prop value seq
0 summary Multiple Project Support 100003
0descriptionOne day… 100004


{resource}_rev_prop
id prop value seq
11 resource 0 10000010
11 date (5 years ago) 10000011
12 resource 0 10000012
12 date (2 years ago) 10000013
13 resource 0 10000014
13 date (1 year ago) 10000015
{resource}_revprop_string
id prop value seq
11 author joe 10000016
12 author joe 10000017
13 author cboos 10000018
{resource}_revprop_text
id prop value seq
12 comment come on… 10000019
13 comment sure… 10000020


{resource}_version
id prop value seq
21 resource 0 10000080
21 revision 11 10000081
22 resource 0 10000082
22 revision 12 10000083
23 resource 0 10000084
23 revision 13 10000085
{resource}_version_string
id prop value seq
21 reporter joe 10000086
{resource}_version_text
id prop value seq
21 summary Multiple Project Support 10000087
21 descriptionShould be easy… 10000088
22 descriptionShould be easy… Redmine has it! 10000089
23 descriptionOne day… 10000090


About the "Schema"

{resource}_schema
realm prop metaprop value

Here, possible content for prop could be 'label', 'default', 'order', 'type', etc.

Example:

ticket description type wiki
ticket priority type enum
ticket priority enum priority
ticket priority default normal
ticket need_review type checkbox
ticket need_review default 0

But we could simply reuse the "generic scheme", instantiated for each resource using "{resource}_schema" resource type. Each instance would correspond to a field specification.

Example:

ticket_schema_prop
10305001 name description 50030303
10305001 type wiki 50030304
10305002 name priority 50030305
10305002 type enum 50030306
10305002 enum priority 50030307
10305002 default normal 50030308
10305003 name need_review 50030309
10305003 type checkbox 50030310
10305003 default 0 50030311

Discussion

  • (cklein) Why not implement all of the different resource_prop* tables into a single table, where each tuple has multiple attributes, see for example the JBPM datamodel for a working and presumably also fast approach. Here, there exists a process_variable or some similar table that stores all the different value types in single table.
    The schema would be like so:
    table resource_prop
    {
    id
    prop
    type
    int_val
    string_val
    datetime_val
    ...
    }
    
    table resource_revprop
    {
    change_id
    prop
    type
    int_val
    string_val
    ...
    }
    
    table resource_change
    {
    id
    change_id
    prop
    type
    int_val
    string_val
    ...
    }
    
    • (cboos) not sure how you see that as an advantage; each row will waste all the fields but one; there need to be one index for each type, each index having to deal with lots of NULL values, each update will have to rebuild all indexes, etc.). But it could be worth benchmarking anyway…
      • (cboos) note that when we discussed a bit GenericTrac with Remy, his natural choice was also towards this approach, so it's certainly worth considering :-)
  • And please rename the name field to prop so that it matches the one in the resource_schema table.
    • (cboos) done - now I use prop consistently to talk about resource property names
  • Also I would like to have the resource_schema table extended so that it will support different schemas for, say, different ticket types. That way, users can define their personal ticket type schemas. Of course, derivation would also be nice, but that could be implemented at a later point in time, requiring yet another table. That way we could have both inheritance at the schema level and also multiple different models per realm ;)
    table resource_schema
    {
      realm
      urn    -- the urn of the ticket schema or wiki page schema to which this belongs to, defaults are for example trac::ticket or trac::milestone or trac::wiki-page and so on
      prop
      name
      type
    }
    
    The inheritance table for the schemas would then be
    table resource_schema_inheritance
    {
      realm
      urn
      base -- the urn of the base schema
    }
    
    • (cboos) Interesting idea, though for simplicity my initial idea was that one could simply define new realms (bug: vs. enhancement:, both nevertheless managed by the TicketModule)
  • inheritance would then provide for also multiple inheritance
    • (cboos) much harder ;-)

Experiments

The usual issue with EAV models is the relative complexity of making queries. If we'd like to support something as flexible as our custom query module (and even more, as we currently don't support multi-valued fields, needed in particular for TracDev/Proposals/TicketLinks), we need to find an efficient way, which doesn't require too many joins.

One idea is to combine two queries, the first to get the ids of the matching objects, the second to retrieve the desired columns.

The two queries could be combined in a single SQL statement, or the set of ids could be first used to see if there is already some corresponding data present in memory that could be reused. Only the remaining ids would then be used for looking up the remaining values (as "full" objects).

Example: looking for the ticket number, summary, priority and blocked-by values for all tickets having a priority lower than 3 and "test" in their summary or as a keyword.

With the single table approach:

CREATE TABLE ticket_prop  ( id bigint
             , prop varchar(16)
             , ival bigint
             , tval text
             , seq bigint 
             );

SELECT p.id, p.prop, p.ival, p.tval, p.seq
FROM (SELECT id FROM ticket_prop
      WHERE prop = 'priority' AND ival < 3
      INTERSECT
      SELECT id FROM
      (SELECT id FROM ticket_prop
       WHERE prop = 'summary' AND tval LIKE '%test%' 
       UNION
       SELECT id FROM ticket_prop
       WHERE prop = 'keyword' AND tval = 'test' )
     ) AS ids
 LEFT JOIN ticket_prop p ON ids.id = p.id
WHERE p.prop in ('id', 'summary', 'priority', 'blocked-by')
ORDER BY p.id, p.prop, p.seq;

With the multiple table approach:

CREATE TABLE ticket_prop_int  ( id bigint
             , prop varchar(16)
             , val bigint
             , seq bigint 
             );

CREATE TABLE ticket_prop_text  ( id bigint
             , prop varchar(16)
             , val text
             , seq bigint 
             );

SELECT p.id as id, p.prop as prop, NULL as ival, p.val as tval, p.seq as seq
FROM (SELECT id FROM ticket_prop_int
      WHERE prop = 'priority' AND val < 3
      INTERSECT
      SELECT id FROM
      (SELECT id FROM ticket_prop_text
       WHERE prop = 'summary' AND val LIKE '%test%' 
       UNION
       SELECT id FROM ticket_prop_text
       WHERE prop = 'keyword' AND val = 'test' )
     ) AS ids
 LEFT JOIN ticket_prop_text p ON ids.id = p.id
WHERE p.prop = 'summary'
UNION
SELECT p.id as id, p.prop as prop, p.val as ival, NULL as tval, p.seq as seq
FROM (SELECT id FROM ticket_prop_int
      WHERE prop = 'priority' AND val < 3
      INTERSECT
      SELECT id FROM
      (SELECT id FROM ticket_prop_text
       WHERE prop = 'summary' AND val LIKE '%test%' 
       UNION
       SELECT id FROM ticket_prop_text
       WHERE prop = 'keyword' AND val = 'test' )
     ) AS ids
 LEFT JOIN ticket_prop_int p ON ids.id = p.id
WHERE p.prop in ('id', 'priority', 'blocked-by')
ORDER BY id, prop, seq;

Ok, maybe there's some clever way to reuse the ids, but I couldn't find one. Hope the SQL backend is smart enough to detect it's the same subquery ;-)