
Version 10 (modified by Christian Boos, 17 years ago)

Update the targeted milestone to 0.12 and add some multi-project support considerations

Version Control Refactoring

This branch is a sandbox for introducing new features that are potentially disruptive.

The forthcoming changes aim to better support some advanced version control system backends, like Mercurial. To that effect, the changes added to the core will be exercised by jointly developing the TracMercurial plugin.


See also the related Trac-Dev:161 thread.

Support for Multiple scopes within a Repository

This is mostly done in trunk, now.

Support for Multiple Repositories

This will most likely have to wait for 0.12.

The cache needs to be modified/extended as well, in order to accommodate multiple repositories.

There are several options:

  1. use the cache as it is, merging all the repositories in a kind of virtual repository; the first component of the path would be the name of the repository.
  2. use a separate pair of tables for each repository
  3. use a dedicated db to cache each repository

Option 1 seems the best way to go. Its efficiency depends mainly on how the new cache will be implemented. If we go with path ids, then using one table would be practical, I think.

See also: #2086, trac-dev:340 and, more recently, this mail where I explain how TracLinks will support multiple repositories.

Another important interdependency which comes to mind is the support for multiple projects in a single environment (see this proposal).

In this scenario, each project would have one or more repositories. Those repositories could possibly be shared between projects.

Take the following example:

  • Project A
    • repository /srv/svn/repo1 (trunk, branches, etc.)
  • Project B
    • repository /srv/svn/repo1 (trunk, branches, etc.)
    • repository /srv/svn/repo2 (trunk, branches, etc.)

Within a wiki page of project A, [123] or source:trunk/ would have the usual 1-to-1 meaning. The same resources, referenced from within a page belonging to project B could be accessed using InterTrac links: [A123] or A:source:trunk/.

Now within project B, referring to [123] or source:trunk/ would be ambiguous, unless a default repository is specified (say /srv/svn/repo2). But in general, a path restriction should be used to properly identify the resource: [123/repo1], source:repo1/trunk/ and [123/repo2], source:repo2/trunk/.

The only problem with this approach is the risk of confusion if a repository name is also used as a top-level folder name in some other repository of the same project. I don't think it's a showstopper though, as:

  • this shouldn't happen often in the first place
  • if it happens nevertheless, a simple disambiguation rule could be adopted: when multiple repositories are present and the first element of the path restriction matches a repository name, it is always treated as a repository selector.
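The disambiguation rule above could be sketched roughly as follows. This is only an illustration: the function name, its parameters and the error handling are hypothetical and not part of any existing Trac API.

```python
def split_repository_path(path, repositories, default=None):
    """Split a path restriction into (repository name, path inside it).

    `repositories` holds the names of the repositories attached to the
    current project; `default` is the optional default repository.
    All of these names are hypothetical.
    """
    path = path.strip('/')
    first, _, rest = path.partition('/')
    # Disambiguation rule: with several repositories, a leading component
    # that matches a repository name always selects that repository.
    if len(repositories) > 1 and first in repositories:
        return first, rest
    # A single repository keeps the usual 1-to-1 meaning.
    if len(repositories) == 1:
        return next(iter(repositories)), path
    # Otherwise fall back to the default repository, if there is one.
    if default is None:
        raise ValueError('ambiguous path %r: no default repository' % path)
    return default, path
```

With the repositories of project B, source:repo1/trunk/ would resolve to ('repo1', 'trunk'), while source:trunk/ would only resolve if a default repository such as repo2 had been configured.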

On the data model level, the cache for /srv/svn/repo1 will be shared for projects A and B. We simply need an additional relation table, pairing projects with repositories.
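As a rough sketch of that relation table, assuming hypothetical table and column names (project, repository, project_repository) and an in-memory SQLite database just for the sake of the example:

```python
import sqlite3

# Hypothetical schema: none of these tables exist in Trac as written here.
SCHEMA = """
CREATE TABLE project (name TEXT PRIMARY KEY);
CREATE TABLE repository (name TEXT PRIMARY KEY, dir TEXT NOT NULL);
CREATE TABLE project_repository (
    project TEXT NOT NULL REFERENCES project(name),
    repository TEXT NOT NULL REFERENCES repository(name),
    PRIMARY KEY (project, repository)
);
"""


def build_example_db():
    """Create the example setup: project A uses repo1, B uses repo1 and repo2."""
    db = sqlite3.connect(':memory:')
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO project VALUES (?)", [('A',), ('B',)])
    db.executemany("INSERT INTO repository VALUES (?, ?)",
                   [('repo1', '/srv/svn/repo1'), ('repo2', '/srv/svn/repo2')])
    db.executemany("INSERT INTO project_repository VALUES (?, ?)",
                   [('A', 'repo1'), ('B', 'repo1'), ('B', 'repo2')])
    return db


def repositories_for(db, project):
    """Return the names of the repositories paired with `project`."""
    rows = db.execute(
        "SELECT repository FROM project_repository "
        "WHERE project = ? ORDER BY repository", (project,))
    return [name for (name,) in rows]
```

Since repo1 appears only once in the repository table, its cache would be stored once and simply be reachable from both projects.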

Support for Mercurial-like Version Control System

Basic Level

  • DONE
    • support for non-numerical changesets (start with hexadecimal digit support)
    • support for extra changeset properties
      • basic infrastructure in trunk
      • support for SVN: see #2545

Those are the minimal changes needed so that the TracMercurial plugin can work at all.

Advanced Level

  • TracRevisionLog should show the branches (a la hgk). See also #1492.
  • DONE
    • Support for arbitrary changeset names (e.g. [tip] or [head])
    • Support for direct jump to a tag or a branch. Done on the branch (r3017); re-done for 0.11 (now in trunk)

Support for Big Repositories

This means extending cache support. Support for multiple repositories would also require some changes to the caching anyway.

This is material for Trac 0.12

New Repository Cache

I think I've come up with a new caching scheme that would be able to handle this. The idea is to replicate the tree changes information that svn stores. This should also work for Mercurial or other backends.

The node_changes table could even be kept as it is, I think. The main difference would be that we should also add the paths for files and folders that were not modified themselves, but happen to be in the same folder as one of the files or folders that have been modified.

That way, we could implement:

  • Repository.get_node(path, rev) using the cached information only, which would be a dramatic improvement for Mercurial, which has no information about the folders themselves.
  • Likewise, Repository.get_path_history could also be implemented in a generic and efficient way using that caching scheme.
  • The next/prev history navigation between revisions (or rather, their extended children/parents versions) could also be implemented on top of the cache.
  • Probably Node.get_history as well, not to mention the possibility to find out the copy_to information (#1445).
  • Repository.get_changes(from,to) should also be implemented using the cached information in the revision table (that would solve the #2353 issue).

Example:

(1) trunk/      (2) trunk/      (3) trunk/     (4) trunk/   (5) trunk/
      dir1/           dir1/           dir1/         ...          ...
      dir2/           dir2/           dir2/        tags/        tags/
      README          README*         README                      v1/ (copied from trunk)
                      dir3/           dir3/
                       A               A*
                       B               B

Would result in:

rev  path          node_type  change_type  base_path  base_rev
1    trunk         D          A                       -1
1    trunk/dir1    D          A                       -1
1    trunk/dir2    D          A                       -1
1    trunk/README  F          A                       -1
2    trunk         D          (i)          (ii)       2 (iii)
2    trunk/dir1    D                                  1
2    trunk/dir2    D                                  1
2    trunk/README  F          E                       1
2    trunk/dir3    D          A                       -1
2    trunk/dir3/A  F          A                       -1
2    trunk/dir3/B  F          A                       -1
3    trunk         D                                  3
3    trunk/dir1    D                                  1
3    trunk/dir2    D                                  1
3    trunk/README  F                                  2
3    trunk/dir3    D                                  3
3    trunk/dir3/A  F          E                       2
3    trunk/dir3/B  F                                  2
4    trunk         D                                  3
4    tags          D          A                       -1
5    trunk         D                                  3
5    tags          D                                  5
5    tags/v1       D          C            trunk      3

Notes:

  1. change_type will be empty when the path didn't actually change in that revision, but is simply included in the cache for the sake of get_nodes
  2. base_path should be left empty when it doesn't change, even for regular edits. This will save some space.
  3. base_rev will point to the last changed rev for that path, i.e. the latest revision in which its change_type was not null.

The paths could maybe be represented by hashes of their dirname, and only the filename part would be in clear text (#3676?).
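A minimal sketch of what such a path representation might look like, assuming SHA-1 as the hash (the choice of hash and the function name are assumptions, nothing has been decided):

```python
import hashlib
import posixpath


def encode_path(path):
    """Return (hash of the dirname, filename in clear text) for `path`."""
    dirname, basename = posixpath.split(path.strip('/'))
    digest = hashlib.sha1(dirname.encode('utf-8')).hexdigest()
    return digest, basename
```

All the entries of a given folder would then share the same fixed-size key, so listing a folder becomes a simple equality match on the hash.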

Mercurial would need to store additional information per node, in particular the file size. Using Mercurial:RevlogNG, that information would be cheap to get, but revlogng is not yet widely used. — update: source:sandbox/mercurial-plugin-0.11 uses the new API

The most flexible approach for storing extra node fields is certainly to let each backend create and maintain an additional table, e.g. node_changes_hg (see also #2733).


See also: improving bazaar support, discussed on Trac-Dev
