
Changes between Version 4 and Version 5 of GenericTrac/Brainstorm

Timestamp:: Jan 6, 2011, 11:54:00 PM
Author:: Christian Boos
Comment:: saving some notes about one vs. multiple tables for attributes


GenericTrac/Brainstorm

-              v4
+              v5
 = Ideas for the GenericTrac data model =
 …
 == Discussion
+ - (cklein) [=#JBPM-approach] Why not merge all of the different `resource_prop*` tables into a single table, where each tuple has multiple attributes? See for example the JBPM data model for a working and presumably also fast approach: there, a process_variable (or similar) table stores all the different value types in a single table. \\
+ The schema would look like this:
 {{{
 table resource_prop
 …
+}
 }}}
+   - (cboos) not sure how you see that as an advantage: each row will waste all the fields but one, there needs to be one index per type, each index has to deal with lots of NULL values, each update has to update all the indexes, etc. But it could be worth benchmarking anyway...
+     - (cboos) note that [[TracDev/FrenchDevCon2010|when we briefly discussed]] GenericTrac with Remy, he also naturally leaned towards this approach, so it's certainly worth considering :-)
  - And please rename the ''name'' field to ''prop'' so that it matches the one in the resource_schema table.
    - (cboos) done - now I use `prop` consistently to talk about resource property names
 …
  - inheritance would then also allow for multiple inheritance
    - (cboos) much harder ;-)
+=== Experiments
+The usual issue with EAV (entity-attribute-value) models is the relative complexity of writing //queries//.
+If we'd like to support something as flexible as our [/query custom query] module (or even more flexible, as we currently don't support multi-valued fields, which are needed in particular for TracDev/Proposals/TicketLinks), we need an efficient way to do so that doesn't require too many joins.
+One idea is to split the work into two queries: the first gets the ids of the matching objects, the second retrieves the desired columns.
+The two queries could be combined in a single SQL statement, or the set of ids could first be used to check whether some of the corresponding data is already present in memory and can be reused. Only the remaining ids would then be used to look up the missing values (as "full" objects).
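
A minimal sketch of the second variant, where ids already loaded are served from memory, assuming the single-table `ticket_prop` layout of the example that follows. It uses Python's standard `sqlite3` module; the helper name `fetch_matching`, the dict used as a cache and the sample rows are hypothetical, and the cache is assumed to be kept current elsewhere (the sketch does no invalidation).

```python
import sqlite3


def fetch_matching(conn, ids_sql, cache):
    """Two-step lookup (hypothetical helper, not an existing Trac API).

    Step 1 runs ids_sql to get the ids of the matching tickets.
    Step 2 loads all props, as "full" objects, only for the ids not yet in
    `cache` (a dict: id -> {prop: [values in seq order]}).
    Returns the objects in the order given by ids_sql, plus the ids loaded.
    """
    ids = [row[0] for row in conn.execute(ids_sql)]
    missing = [i for i in ids if i not in cache]
    if missing:
        for i in missing:
            cache[i] = {}
        marks = ', '.join(['?'] * len(missing))
        rows = conn.execute(
            'SELECT id, prop, ival, tval FROM ticket_prop '
            f'WHERE id IN ({marks}) ORDER BY id, prop, seq', missing)
        for tid, prop, ival, tval in rows:
            # Assumption: exactly one of ival/tval is set in each row.
            value = ival if ival is not None else tval
            cache[tid].setdefault(prop, []).append(value)
    return [cache[i] for i in ids], missing


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE ticket_prop (id bigint, prop varchar(16), '
             'ival bigint, tval text, seq bigint)')
conn.executemany('INSERT INTO ticket_prop VALUES (?, ?, ?, ?, ?)', [
    (1, 'summary', None, 'test the parser', 0),
    (1, 'priority', 2, None, 0),
    (2, 'summary', None, 'fix layout', 0),
    (2, 'priority', 1, None, 0),
    (3, 'priority', 4, None, 0),
])
# Ticket 1 is assumed to be in memory already (e.g. from an earlier request).
cache = {1: {'priority': [2], 'summary': ['test the parser']}}
tickets, loaded = fetch_matching(
    conn,
    "SELECT id FROM ticket_prop WHERE prop = 'priority' AND ival < 3 ORDER BY id",
    cache)
print(loaded)  # -> [2]
print(tickets)
```

Keeping the id query separate is what makes the cache check possible; folding both steps into a single statement saves a round trip but always re-reads every matching ticket.
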
+Example: looking for the ticket number, summary, priority and blocked-by values for all tickets having a priority lower than 3 and "test" in their summary or as a keyword.
+With the single table approach:
+{{{#!sql
+CREATE TABLE ticket_prop  ( id bigint
+             , prop varchar(16)
+             , ival bigint
+             , tval text
+             , seq bigint
+             );
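+-- 1st query (the "ids" subquery): ids of the matching tickets
+-- 2nd query (outer SELECT): the requested props of those tickets
+SELECT p.id, p.prop, p.ival, p.tval, p.seq
+FROM (SELECT id FROM ticket_prop
+      WHERE prop = 'priority' AND ival < 3
+      INTERSECT
+      SELECT id FROM
+      (SELECT id FROM ticket_prop
+       WHERE prop = 'summary' AND tval LIKE '%test%'
+       UNION
+       SELECT id FROM ticket_prop
+       WHERE prop = 'keyword' AND tval = 'test' ) AS matching -- derived tables need an alias in PostgreSQL < 16 and MySQL
+     ) AS ids
+ -- plain JOIN: the WHERE clause on p.prop drops unmatched rows anyway
+ JOIN ticket_prop p ON ids.id = p.id
+WHERE p.prop in ('id', 'summary', 'priority', 'blocked-by')
+ORDER BY p.id, p.prop, p.seq;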
+}}}
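
To check the single-table query on a concrete case, here is a small harness using Python's standard `sqlite3` module; the sample tickets are made up for illustration. The query is the one above with two changes that don't alter its result: the inner derived table gets an alias (`matching`), which PostgreSQL before version 16 and MySQL require, and the `LEFT JOIN` becomes a plain `JOIN`, since the `WHERE p.prop IN (...)` filter already discards any row without a match.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ticket_prop (id bigint, prop varchar(16),
                          ival bigint, tval text, seq bigint);
"""

QUERY = """
SELECT p.id, p.prop, p.ival, p.tval, p.seq
FROM (SELECT id FROM ticket_prop
      WHERE prop = 'priority' AND ival < 3
      INTERSECT
      SELECT id FROM
      (SELECT id FROM ticket_prop
       WHERE prop = 'summary' AND tval LIKE '%test%'
       UNION
       SELECT id FROM ticket_prop
       WHERE prop = 'keyword' AND tval = 'test') AS matching
     ) AS ids
JOIN ticket_prop p ON ids.id = p.id
WHERE p.prop IN ('id', 'summary', 'priority', 'blocked-by')
ORDER BY p.id, p.prop, p.seq
"""

# Hypothetical tickets: (id, prop, ival, tval, seq).
SAMPLE = [
    (1, 'summary', None, 'test the parser', 0),
    (1, 'priority', 2, None, 0),
    (1, 'blocked-by', 3, None, 0),
    (1, 'blocked-by', 4, None, 1),
    (2, 'summary', None, 'fix layout', 0),
    (2, 'priority', 1, None, 0),
    (2, 'keyword', None, 'test', 0),    # matches through its keyword
    (2, 'keyword', None, 'ui', 1),
    (3, 'summary', None, 'test coverage', 0),
    (3, 'priority', 4, None, 0),        # priority too high
    (4, 'summary', None, 'docs', 0),    # no "test" anywhere
    (4, 'priority', 1, None, 0),
]

conn = sqlite3.connect(':memory:')
conn.executescript(SCHEMA)
conn.executemany('INSERT INTO ticket_prop VALUES (?, ?, ?, ?, ?)', SAMPLE)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Only tickets 1 and 2 are returned: ticket 3 fails the priority test and ticket 4 has no "test" in its summary or keywords. The multi-valued `blocked-by` prop comes back as one row per value, ordered by `seq`.
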
+With the multiple table approach:
+{{{#!sql
+CREATE TABLE ticket_prop_int  ( id bigint
+             , prop varchar(16)
+             , val bigint
+             , seq bigint
+             );
+CREATE TABLE ticket_prop_text  ( id bigint
+             , prop varchar(16)
+             , val text
+             , seq bigint
+             );
+-- text-valued props of the matching tickets
+SELECT p.id as id, p.prop as prop, NULL as ival, p.val as tval, p.seq as seq
+FROM (SELECT id FROM ticket_prop_int
+      WHERE prop = 'priority' AND val < 3
+      INTERSECT
+      SELECT id FROM
+      (SELECT id FROM ticket_prop_text
+       WHERE prop = 'summary' AND val LIKE '%test%'
+       UNION
+       SELECT id FROM ticket_prop_text
+       WHERE prop = 'keyword' AND val = 'test' ) AS matching -- derived tables need an alias in PostgreSQL < 16 and MySQL
+     ) AS ids
+ JOIN ticket_prop_text p ON ids.id = p.id
+WHERE p.prop = 'summary'
+UNION
+-- integer-valued props of the same tickets (the ids subquery is repeated)
+SELECT p.id as id, p.prop as prop, p.val as ival, NULL as tval, p.seq as seq
+FROM (SELECT id FROM ticket_prop_int
+      WHERE prop = 'priority' AND val < 3
+      INTERSECT
+      SELECT id FROM
+      (SELECT id FROM ticket_prop_text
+       WHERE prop = 'summary' AND val LIKE '%test%'
+       UNION
+       SELECT id FROM ticket_prop_text
+       WHERE prop = 'keyword' AND val = 'test' ) AS matching
+     ) AS ids
+ JOIN ticket_prop_int p ON ids.id = p.id
+WHERE p.prop in ('id', 'priority', 'blocked-by')
+ORDER BY id, prop, seq;
+}}}
+Ok, maybe there's some clever way to reuse the `ids` subquery, but I couldn't find one. Let's hope the SQL backend is smart enough to detect that it's the same subquery ;-)
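
One way to write the `ids` subquery only once is a common table expression (`WITH` clause), assuming the backends to be supported have it (PostgreSQL since 8.4, SQLite since 3.8.3, MySQL since 8.0). This doesn't by itself guarantee a single evaluation; PostgreSQL 12 and later, for instance, materialize a side-effect-free CTE by default only when it is referenced more than once, as here. The sketch below runs it with Python's standard `sqlite3` module on made-up data; apart from the `WITH`, the `AS matching` alias and the plain `JOIN`, it is the multiple-table query above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ticket_prop_int  (id bigint, prop varchar(16), val bigint, seq bigint);
CREATE TABLE ticket_prop_text (id bigint, prop varchar(16), val text, seq bigint);
"""

# The ids subquery is written once, as a CTE, and referenced by both branches.
QUERY = """
WITH ids AS (
    SELECT id FROM ticket_prop_int
    WHERE prop = 'priority' AND val < 3
    INTERSECT
    SELECT id FROM
    (SELECT id FROM ticket_prop_text
     WHERE prop = 'summary' AND val LIKE '%test%'
     UNION
     SELECT id FROM ticket_prop_text
     WHERE prop = 'keyword' AND val = 'test') AS matching
)
SELECT p.id AS id, p.prop AS prop, NULL AS ival, p.val AS tval, p.seq AS seq
FROM ids JOIN ticket_prop_text p ON ids.id = p.id
WHERE p.prop = 'summary'
UNION
SELECT p.id, p.prop, p.val, NULL, p.seq
FROM ids JOIN ticket_prop_int p ON ids.id = p.id
WHERE p.prop IN ('id', 'priority', 'blocked-by')
ORDER BY id, prop, seq
"""

# Hypothetical tickets, split by value type: (id, prop, val, seq).
INT_ROWS = [
    (1, 'priority', 2, 0), (1, 'blocked-by', 3, 0), (1, 'blocked-by', 4, 1),
    (2, 'priority', 1, 0), (3, 'priority', 4, 0), (4, 'priority', 1, 0),
]
TEXT_ROWS = [
    (1, 'summary', 'test the parser', 0),
    (2, 'summary', 'fix layout', 0), (2, 'keyword', 'test', 0),
    (3, 'summary', 'test coverage', 0),
    (4, 'summary', 'docs', 0),
]

conn = sqlite3.connect(':memory:')  # WITH needs SQLite >= 3.8.3
conn.executescript(SCHEMA)
conn.executemany('INSERT INTO ticket_prop_int VALUES (?, ?, ?, ?)', INT_ROWS)
conn.executemany('INSERT INTO ticket_prop_text VALUES (?, ?, ?, ?)', TEXT_ROWS)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On this data only tickets 1 and 2 match, with `summary` coming from `ticket_prop_text` and `priority`/`blocked-by` from `ticket_prop_int`.
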