Changes between Version 2 and Version 3 of TracDarcs/Cache


Timestamp:
Apr 21, 2009, 11:33:51 PM (15 years ago)
Author:
Lele Gaifax <lele@…>
Comment:

Complete overview

  • TracDarcs/Cache

    v2 v3  
     1I'm writing these details down here because other VCs might benefit from the same considerations, as might the VcRefactoring.
     2
    13= Darcs specific cache =
    24
     
    1113Generally speaking, in darcs the ''order'' of patches is meaningless: from an abstract point of view, a repository may be seen as a ''bag'' of changesets that, under the rules specified by the [http://en.wikibooks.org/wiki/Understanding_darcs/Patch_theory patch theory], may be composed (''commuted'') together to give the result one expects.
    1214
    13 In the Trac context however it's very handy to be able to point to a particular change by its ''application ordering index'' instead of the full hash. To accomplish that, TracDarcs chose to use an artificial monotonically incrementing index as a shortcut, effectively mimicking the Subversion view of the history.
     15In the Trac context however it's very handy to be able to point to a particular change by its ''application ordering index'' instead of the full hash. To accomplish that, TracDarcs chose to use an [http://progetti.arstecnica.it/trac%2Bdarcs/wiki/DarcsBackend#Revisionnumber artificial monotonically incrementing index] as a shortcut, effectively mimicking the Subversion view of the history.
    1416
     17This index is the `rev` field in the standard schema tables; TracDarcs adds a table `darcs_changesets` to augment the description of a single changeset (that is, to extend the `revision` table), and to have a way to map the `rev` field to the globally unique identifier, the `hash`, and the other way around:
    1518
     19{{{
     20    rev_table = Table('darcs_changesets', key='rev')[
     21        Column('rev',type='int'),
     22        Column('hash'),
     23        Column('name'),
     24        Index(['hash'])]
     25}}}
    1626
     27Another way to identify a single changeset (or rather a set of changesets) in darcs is by `name` (darcs makes explicit the Subversion suggestion of using the first line of the changelog as a summary and the rest as a longer description).
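
The three lookups described above (by `rev`, by `hash`, by `name`) can be sketched with plain SQL. This is only an illustration, not TracDarcs code: it uses the standard `sqlite3` module instead of Trac's schema layer, the helper names are hypothetical, and the hashes are made-up placeholders. Note that a `name` may match more than one changeset.

```python
import sqlite3

# Plain-SQL stand-in for the darcs_changesets table; the hashes are made up.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE darcs_changesets (rev INTEGER PRIMARY KEY, hash TEXT, name TEXT)"
)
db.execute("CREATE INDEX darcs_changesets_hash_idx ON darcs_changesets (hash)")
db.executemany(
    "INSERT INTO darcs_changesets (rev, hash, name) VALUES (?, ?, ?)",
    [
        (1, "hash-aaa", "Initial import"),
        (2, "hash-bbb", "Fix typo"),
        (3, "hash-ccc", "Fix typo"),
    ],
)


def rev_to_hash(rev):
    """Map the artificial index to the globally unique hash (hypothetical helper)."""
    row = db.execute(
        "SELECT hash FROM darcs_changesets WHERE rev = ?", (rev,)
    ).fetchone()
    return row[0] if row else None


def hash_to_rev(hash_):
    """Map a hash back to the local index (hypothetical helper)."""
    row = db.execute(
        "SELECT rev FROM darcs_changesets WHERE hash = ?", (hash_,)
    ).fetchone()
    return row[0] if row else None


def revs_by_name(name):
    """A name is not unique, so return every matching rev in order."""
    rows = db.execute(
        "SELECT rev FROM darcs_changesets WHERE name = ? ORDER BY rev", (name,)
    )
    return [rev for (rev,) in rows]
```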
    1728
     29=== M/R considerations ===
     30
     31Current MultipleRepositorySupport/Cache introduces the notion of a `repos`, and an equivalent field shall be added to this table, changing the primary key to `repos,rev`...
     32
     33I hope to convince people that it would be nicer to use a ''surrogate primary key'' for `repository`, to allow the administration layer to rename repositories without having to update handfuls of tables.
     34
     35IMHO, even the standard schema's `revision` and `node_change` should
     36
     37 a. use that surrogate key to point to the repository
     38 a. introduce their own surrogate index: this would make it much easier for TracDarcs (and I'd imagine for any other VC that needs to extend the schema in some way) to correlate the tables (in other words, if done in this way from the beginning, TracDarcs wouldn't even need to know that Trac 0.12 will come with MultipleRepositorySupport :-)
     39
     40OTOH, if the consensus is to go with the IRepositoryChangeListener way proposed by cboos, I'll probably add the mentioned surrogate key to `darcs_changesets` anyway, to simplify TracDarcs's queries.
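
To make the surrogate key argument concrete, here is a minimal sketch using the standard `sqlite3` module. Everything in it is an assumption made for illustration: the `repository` table layout, the `repos_id` column and the helper names are hypothetical, not the schema of Trac or TracDarcs. The point is only that dependent tables reference a numeric id, so a rename touches a single row.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Hypothetical layout: the name lives in exactly one place.
    CREATE TABLE repository (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    -- Dependent tables point at the surrogate id, never at the name.
    CREATE TABLE darcs_changesets (
        repos_id INTEGER NOT NULL REFERENCES repository (id),
        rev      INTEGER NOT NULL,
        hash     TEXT,
        name     TEXT,
        PRIMARY KEY (repos_id, rev)
    );
    """
)
db.execute("INSERT INTO repository (id, name) VALUES (1, 'main')")
db.executemany(
    "INSERT INTO darcs_changesets (repos_id, rev, hash, name) VALUES (1, ?, ?, ?)",
    [(1, "hash-aaa", "Initial import"), (2, "hash-bbb", "Fix typo")],
)


def rename_repository(old, new):
    """Rename a repository; returns the number of rows that had to change."""
    cur = db.execute("UPDATE repository SET name = ? WHERE name = ?", (new, old))
    return cur.rowcount


def changesets_of(repos_name):
    """List (rev, hash) pairs for a repository looked up by its current name."""
    return db.execute(
        "SELECT c.rev, c.hash FROM darcs_changesets c "
        "JOIN repository r ON r.id = c.repos_id "
        "WHERE r.name = ? ORDER BY c.rev",
        (repos_name,),
    ).fetchall()
```

With the name used as part of the key instead, the same rename would have to rewrite every row of every table that mentions the repository.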
     41
     42== Nodes, a.k.a. the versioned tree ==
     43
     44As said, TracDarcs also wants to cache the actual ''tree'' of the repository at any given changeset, because doing that with darcs itself is (still) very expensive, if not impossible.
     45
     46K.S.Sreeram (kudos!) replaced my original broken solution with one that actually works great (no new bug has been reported on the algorithm lately :)
     47
     48The pitfall was handling ''complex'' changesets (I mean, something SVN does not handle/allow) as well as ''corner cases'' (some would say ''b0rken'') changesets that darcs builds (maybe recorded with a buggy 0.9.x version of darcs many years ago, fixed and handled in succeeding versions). If there is a lesson I learned writing [http://progetti.arstecnica.it/tailor Tailor], it's surely that each VC has its own strange interpretation of what a changeset means :-)
     49
     50Sreeram's solution uses two tables (actually three, one for caching the content of files, to make periodic purge easier), `darcs_nodes` and `darcs_node_changes`. The first introduces an artificial entity that represents a particular ''path'', so that the machinery can operate unambiguously when the exact same path actually means different items (imagine something silly like ``ADD this/path; MOVE this/path other/path; MKDIR this/path; ADD this/path/file; MOVE other/path this/path/newpath``: in this example, each "this/path" would be a different ''node''); each ''node'' is of a particular type (that is, it's either a file '''or''' a directory), and "knows" which revision added it as well as the one that removed it, if any.
     51
     52{{{
     53    node_table = Table('darcs_nodes', key='node_id')[
     54        Column('node_id',type='int'),
     55        Column('node_type',size=1),
     56        Column('add_rev',type='int'),
     57        Column('remove_rev',type='int')]
     58}}}
     59
     60=== Nodes changes ===
     61
     62Surprisingly, a generic ''node'' does not correspond to a particular "path": that association arises from coupling nodes to revisions, that is, quoting Sreeram, ''"a node doesn't have a particular name or content but, for a given revision, its name and content will be well defined"''.
     63
     64This is done in `darcs_node_changes`, which mimics the standard `node_change` and also keeps the hierarchy of the entries (that is, at any particular revision, a particular node has a pathname, the parent's node id, and the change it is subject to).
     65
     66{{{
     67    change_table = Table('darcs_node_changes', key=('node_id','rev'))[
     68        Column('node_id',type='int'),
     69        Column('rev',type='int'),
     70        Column('path'),
     71        Column('parent_id',type='int'),
     72        Column('the_change')]
     73}}}
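
The silly sequence from the example above can be replayed against in-memory stand-ins for the two tables. This is a sketch under stated assumptions, not Sreeram's algorithm: it assumes one operation per revision, that a row is written to `darcs_node_changes` only when a revision touches the node, and that the most recent row at or before a revision gives the node's path. The type codes `"F"`/`"D"`, the change codes `"A"`/`"M"` and all function names are hypothetical.

```python
# Illustrative in-memory stand-ins for darcs_nodes and darcs_node_changes.
nodes = {}    # node_id -> {"node_type", "add_rev", "remove_rev"}
changes = []  # rows of (node_id, rev, path, parent_id, the_change)


def add_node(node_id, node_type, rev, path, parent_id=None):
    """Record a new node and the change row that gives it its first path."""
    nodes[node_id] = {"node_type": node_type, "add_rev": rev, "remove_rev": None}
    changes.append((node_id, rev, path, parent_id, "A"))


def move_node(node_id, rev, new_path, parent_id=None):
    """Same node, new name: only a new change row is written."""
    changes.append((node_id, rev, new_path, parent_id, "M"))


def path_of(node_id, rev):
    """Path of a node at a revision: its latest change row at or before rev."""
    rows = [c for c in changes if c[0] == node_id and c[1] <= rev]
    return max(rows, key=lambda c: c[1])[2] if rows else None


def resolve(path, rev):
    """Find which node, if any, is known as `path` at revision `rev`."""
    for node_id, node in nodes.items():
        alive = node["add_rev"] <= rev and (
            node["remove_rev"] is None or rev < node["remove_rev"]
        )
        if alive and path_of(node_id, rev) == path:
            return node_id
    return None


# ADD this/path; MOVE this/path other/path; MKDIR this/path;
# ADD this/path/file; MOVE other/path this/path/newpath
add_node(1, "F", 1, "this/path")
move_node(1, 2, "other/path")
add_node(2, "D", 3, "this/path")
add_node(3, "F", 4, "this/path/file", parent_id=2)
move_node(1, 5, "this/path/newpath", parent_id=2)
```

In this replay "this/path" is node 1 (a file) at revision 1 and node 2 (a directory) from revision 3 on, while node 1 keeps its identity through both moves.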
     74
     75=== Nodes content ===
     76
     77Even the content of any requested path at a particular revision is cached by TracDarcs (again, for speed). Earlier versions used to keep this cache on the filesystem, while it's now kept within the database itself, in the `darcs_cache` table:
     78
     79{{{
     80    cache_table = Table('darcs_cache', key=('node_id','rev'))[
     81        Column('node_id',type='int'),
     82        Column('rev',type='int'),
     83        Column('content',type=blobtype)]
     84}}}
     85
     86This could go within the `darcs_node_changes` table (see, same primary key), but for practical reasons it's been kept on its own.
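
A read-through cache keyed on `(node_id, rev)` might look like the following. This is a hedged sketch with the standard `sqlite3` module, not the TracDarcs implementation: `get_content`, `purge_cache` and `fake_fetch` are hypothetical names, and `fake_fetch` merely stands in for the expensive call to darcs. Keeping the blobs in their own table means a purge is a single `DELETE` that leaves the node history untouched.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE darcs_cache ("
    " node_id INTEGER, rev INTEGER, content BLOB,"
    " PRIMARY KEY (node_id, rev))"
)


def get_content(node_id, rev, fetch):
    """Return cached content, calling `fetch` and storing the result on a miss."""
    row = db.execute(
        "SELECT content FROM darcs_cache WHERE node_id = ? AND rev = ?",
        (node_id, rev),
    ).fetchone()
    if row is not None:
        return bytes(row[0])
    content = fetch(node_id, rev)
    db.execute(
        "INSERT INTO darcs_cache (node_id, rev, content) VALUES (?, ?, ?)",
        (node_id, rev, content),
    )
    return content


def purge_cache():
    """Drop every cached blob; nodes and node changes are not affected."""
    db.execute("DELETE FROM darcs_cache")


calls = []  # records every cache miss, to show when the slow path runs


def fake_fetch(node_id, rev):
    """Stand-in for asking darcs for the file content (assumption)."""
    calls.append((node_id, rev))
    return b"content of node %d at rev %d" % (node_id, rev)
```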