Edgewall Software

Changes between Version 4 and Version 5 of GenericTrac/Brainstorm


Ignore:
Timestamp:
Jan 6, 2011, 11:54:00 PM (13 years ago)
Author:
Christian Boos
Comment:

saving some notes about one vs. multiple tables for attributes

Legend:

Unmodified
Added
Removed
Modified
  • GenericTrac/Brainstorm

    v4 v5  
    1 
    21= Ideas for the GenericTrac data model =
    32
     
    176175== Discussion
    177176
    178  - (cklein) [=#JBPM-approach] Why not implement all of the different `resource_prop*` tables into a single table, where each tuple has multiple attributes, see for example the JBPM datamodel for a working and presumably also fast approach. Here, there exists a process_variable or some similar table that stores all the different value types in single table.
    179    - (cboos) not sure how you see that as an advantage; each row will waste all the fields but one; there need to be one index for each type, each index having to deal with lots of NULL values, each update will have to rebuild all indexes, etc.). But it could be worth benchmarking anyway...
    180   The schema would be like so:
     177 - (cklein) [=#JBPM-approach] Why not implement all of the different `resource_prop*` tables into a single table, where each tuple has multiple attributes, see for example the JBPM datamodel for a working and presumably also fast approach. Here, there exists a process_variable or some similar table that stores all the different value types in single table. \\
     178 The schema would be like so:
    181179{{{
    182180table resource_prop
     
    212210}
    213211}}}
     212   - (cboos) not sure how you see that as an advantage; each row will waste all the fields but one; there need to be one index for each type, each index having to deal with lots of NULL values, each update will have to rebuild all indexes, etc.). But it could be worth benchmarking anyway...
     213     - (cboos) note that [[TracDev/FrenchDevCon2010|when we discussed]] a bit GenericTrac with Remy, his natural choice was also towards this approach, so it's certainly worth considering :-)
    214214 - And please rename the ''name'' field to ''prop'' so that it matches the one in the resource_schema table.
    215215   - (cboos) done - now I use `prop` consistently to talk about resource property names
     
    237237 - inheritance would then provide for also multiple inheritance
    238238   - (cboos) much harder ;-)
     239
     240
     241=== Experiments
     242
     243The usual issue with EAV models is the relative complexity of making //queries//.
     244If we'd like to support something as flexible as our [/query custom query] module (and even more, as we currently don't support multi-valued fields, needed in particular for TracDev/Proposals/TicketLinks), we need to find an efficient way, which doesn't require too many joins.
     245
     246One idea is to combine two queries, the first to get the ids of the matching objects, the second to retrieve the desired columns.
     247
     248The two queries could be combined in a single SQL statement, or the set of ids could be first used to see if there is already some corresponding data present in memory that could be reused. Only the remaining ids would then be used for looking up the remaining values (as "full" objects).
     249
     250Example: looking for the ticket number, summary, priority and blocked-by values for all tickets having a priority lower than 3 and "test" in their summary or as a keyword.
     251
     252With the single table approach:
     253{{{#!sql
     254CREATE TABLE ticket_prop  ( id bigint
     255             , prop varchar(16)
     256             , ival bigint
     257             , tval text
     258             , seq bigint
     259             );
     260
     261SELECT p.id, p.prop, p.ival, p.tval, p.seq
     262FROM (SELECT id FROM ticket_prop
     263      WHERE prop = 'priority' AND ival < 3
     264      INTERSECT
     265      SELECT id FROM
     266      (SELECT id FROM ticket_prop
     267       WHERE prop = 'summary' AND tval LIKE '%test%'
     268       UNION
     269       SELECT id FROM ticket_prop
     270       WHERE prop = 'keyword' AND tval = 'test' )
     271     ) AS ids
     272 LEFT JOIN ticket_prop p ON ids.id = p.id
     273WHERE p.prop in ('id', 'summary', 'priority', 'blocked-by')
     274ORDER BY p.id, p.prop, p.seq;
     275}}}
     276
     277With the multiple table approach:
     278{{{#!sql
     279CREATE TABLE ticket_prop_int  ( id bigint
     280             , prop varchar(16)
     281             , val bigint
     282             , seq bigint
     283             );
     284
     285CREATE TABLE ticket_prop_text  ( id bigint
     286             , prop varchar(16)
     287             , val text
     288             , seq bigint
     289             );
     290
     291SELECT p.id as id, p.prop as prop, NULL as ival, p.val as tval, p.seq as seq
     292FROM (SELECT id FROM ticket_prop_int
     293      WHERE prop = 'priority' AND val < 3
     294      INTERSECT
     295      SELECT id FROM
     296      (SELECT id FROM ticket_prop_text
     297       WHERE prop = 'summary' AND val LIKE '%test%'
     298       UNION
     299       SELECT id FROM ticket_prop_text
     300       WHERE prop = 'keyword' AND val = 'test' )
     301     ) AS ids
     302 LEFT JOIN ticket_prop_text p ON ids.id = p.id
     303WHERE p.prop = 'summary'
     304UNION
     305SELECT p.id as id, p.prop as prop, p.val as ival, NULL as tval, p.seq as seq
     306FROM (SELECT id FROM ticket_prop_int
     307      WHERE prop = 'priority' AND val < 3
     308      INTERSECT
     309      SELECT id FROM
     310      (SELECT id FROM ticket_prop_text
     311       WHERE prop = 'summary' AND val LIKE '%test%'
     312       UNION
     313       SELECT id FROM ticket_prop_text
     314       WHERE prop = 'keyword' AND val = 'test' )
     315     ) AS ids
     316 LEFT JOIN ticket_prop_int p ON ids.id = p.id
     317WHERE p.prop in ('id', 'priority', 'blocked-by')
     318ORDER BY id, prop, seq;
     319}}}
     320Ok, maybe there's some clever way to reuse the `ids`, but I couldn't find one. Hope the SQL backend is smart enough to detect it's the same subquery ;-)