= Version Control Refactoring =

This recurring branch is a sandbox for introducing new features that are potentially disruptive.

Note that there's currently no source:sandbox/vc-refactoring branch, but there's a source:sandbox/multirepos one for the [#SupportforMultipleRepositories].

[[PageOutline]]

The major goals for [milestone:0.12] in the versioncontrol area will be:
 1. support for multiple repositories
 2. support for an SCM-neutral cache
 3. ideally, if the GenericTrac approach is finalized, arbitrary properties, comments and attachments for changeset and path Trac resources (useful for code reviews)

The forthcoming changes aim to better support some advanced version control system backends, like [http://www.selenic.com/mercurial Mercurial]. To that effect, the changes added to the core will be exercised by jointly developing the TracMercurial plugin.

Sub topics: [[TitleIndex(VcRefactoring/)]]

The current controller changes are simple refactorings which do not change the versioncontrol API, but clean up the internals of the versioncontrol-related web UI. Those changes could possibly go into 0.11.

== Support for Multiple scopes within a Repository ==

This is mostly done in trunk now.

 * This basically means fixing #1830. Initial work on this was done in r2992. This has been reworked in more depth in [3174/trunk/trac/versioncontrol/svn_fs.py].
 * There's also the idea to "scope" a repository using multiple paths. This will probably be done for milestone:0.11. Actually, it should be possible to achieve the above using FineGrainedPermissions, thanks to r3174, but I haven't verified this so far.

== Support for Multiple Repositories ==

A first implementation of this feature is now available in the MultipleRepositorySupport branch (for the next release, i.e. Trac [milestone:0.12]).

The cache problem is avoided for now: this multiple repository support only covers the non-cached repository types, i.e. `hg` (Mercurial) and `direct-svnfs` (Subversion).
You can happily mix ''both'' types of repositories, if needed.

The cache needs to be modified/extended as well, in order to accommodate multiple repositories. There are several options:
 1. use the cache as it is, merging all the repositories into a kind of virtual repository; the first component of the path would be the name of the repository
 2. use a separate pair of tables for each repository
 3. use a dedicated db to cache each repository

Option 1 seems the best way to go. Its efficiency depends mainly on how the new cache will be implemented. If we go with path ids, then using one table would be practical, I think.

See also: #2086, trac-dev:340 and, more recently, [googlegroups:trac-users:14ca95377e4a53b5 this mail] where I explain how TracLinks will support multiple repositories.

Another important interdependency which comes to mind is the support for multiple projects in a single environment (see this [TracMultipleProjects/SingleEnvironment#ProposedImplementation proposal]). In this scenario, each project would have one or more repositories. Those repositories could possibly be ''shared'' between projects.

Take the following example:
 - Project A
   - repository /srv/svn/repo1 (trunk, branches, etc.)
 - Project B
   - repository /srv/svn/repo1 (trunk, branches, etc.)
   - repository /srv/svn/repo2 (trunk, branches, etc.)

Within a wiki page of project A, `[123]` or `source:trunk/` would have the usual 1-to-1 meaning. The same resources, referenced from within a page belonging to project B, could be accessed using InterTrac links: `[A123]` or `A:source:trunk/`.

Now within project B, referring to `[123]` or `source:trunk/` would be ambiguous, unless a ''default'' repository is specified (say /srv/svn/repo2). But in general, ''path restriction'' should be used to properly identify the resource: `[123/repo1]`, `source:repo1/trunk/` and `[123/repo2]`, `source:repo2/trunk/`.
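
The lookup described above could be sketched roughly as follows. This is only an illustration, not Trac code: the helper name `resolve_source_path` and the plain set of repository names are assumptions made for the example. If the first path component names a known repository, it is taken as the selector; otherwise the project's default repository (if any) is used.

```python
def resolve_source_path(path, repositories, default=None):
    """Split a `source:` path into (repository name, path inside it).

    Hypothetical helper: `repositories` is any container of repository
    names known to the project, `default` the optional default one.
    """
    stripped = path.strip('/')
    head, _, rest = stripped.partition('/')
    if head in repositories:
        # The first component acts as the repository selector.
        return head, '/' + rest
    if default is None:
        # No selector and no default: the reference is ambiguous.
        raise ValueError("ambiguous path %r: no default repository" % path)
    # No selector: the whole path belongs to the default repository.
    return default, '/' + stripped
```

With project B from the example, `source:repo1/trunk/` would select `repo1`, while a bare `source:trunk/` would fall back to the default repository, say `repo2`.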
''How about `[123@repo2]`, `source:@repo2/trunk` and let `[source:trunk]` go to the default repository of the project so that when a new repository is added to the project, all existing links won't break?''[[br]]''-- Kenneth Xu''

The ''only'' problem with this approach would be the risk of some confusion if a repository name is also used as a toplevel folder name of some other repository in the same project. I don't think it's a showstopper though, as:
 - this shouldn't happen often in the first place
 - if it happens nevertheless, a simple disambiguation rule could be adopted: when multiple repositories are present and the first element in the path restriction corresponds to a repository name, always use it as a repository selector.

On the data model level, the cache for /srv/svn/repo1 will be shared by projects A and B. We simply need an additional relation table, pairing projects with repositories.

== Support for Mercurial-like Version Control System ==

=== Basic Level ===

 * DONE:
   * support for non-numerical changesets (start with hexadecimal digit support)
   * support for extra changeset properties
   * basic infrastructure in trunk
   * support for SVN: see #2545

Those are the minimal changes needed so that the TracMercurial plugin can work at all.

=== Advanced Level ===

 * TracRevisionLog should show the branches (a la [http://www.flickr.com/photos/search/tags:mercurial%2Chgk/tagmode:all/ hgk]). See also #1492.
 * DONE:
   * Support for arbitrary changeset names (e.g. `[tip]` or `[head]`)
   * Support for direct jump to a tag or a branch. Done on the branch (r3017); re-done for 0.11 (now in trunk)

=== Support for Big Repositories ===

This means extending cache support. Support for multiple repositories would also require some changes to the caching anyway.

This is material for Trac [milestone:0.12]...

== New Repository Cache ==

I think I've come up with a new caching scheme that would be able to handle this.
The idea is to replicate the tree changes information that svn stores. This should also work for Mercurial or other backends. The `node_changes` table could even be kept as it is, I think. The main difference would be that we should also add the paths for files and folders that were not modified themselves, but happen to be in the same folder as a file or folder that has been modified.

That way, we could implement:
 - `Repository.get_node(path, rev)` using the cached information only, which would be a dramatic improvement for Mercurial, which has no information about the folders themselves.
 - Likewise, `Repository.get_path_history` could also be implemented in a generic and efficient way using that caching scheme.
 - The next/prev history navigation between revisions (or rather, their extended children/parents versions) could also be implemented on top of the cache.
 - Probably `Node.get_history` as well, not to mention the possibility to find out the ''copy_to'' information (#1445).
 - `Repository.get_changes(from,to)` should also be implemented using the cached information in the `revision` table (that would solve the #2353 issue).

Example:
{{{
(1) trunk/       (2) trunk/       (3) trunk/       (4) trunk/    (5) trunk/
      dir1/            dir1/            dir1/            ...           ...
      dir2/            dir2/            dir2/          tags/         tags/
      README           README*          README                         v1/ (copied from trunk)
                       dir3/            dir3/
                         A                A*
                         B                B
}}}

Would result in:

|| '''rev''' || '''path''' || '''node_type''' || '''change_type''' || '''base_path''' || '''base_rev''' ||
|| 1 || trunk || D || A || || -1 ||
|| 1 || trunk/dir1 || D || A || || -1 ||
|| 1 || trunk/dir2 || D || A || || -1 ||
|| 1 || trunk/README || F || A || || -1 ||
|| || || || || || ||
|| 2 || trunk || D || ''(i)'' || ''(ii)'' || 2 ''(iii)'' ||
|| 2 || trunk/dir1 || D || || || 1 ||
|| 2 || trunk/dir2 || D || || || 1 ||
|| 2 || trunk/README || F || E || || 1 ||
|| 2 || trunk/dir3 || D || A || || -1 ||
|| 2 || trunk/dir3/A || F || A || || -1 ||
|| 2 || trunk/dir3/B || F || A || || -1 ||
|| || || || || || ||
|| 3 || trunk || D || || || 3 ||
|| 3 || trunk/dir1 || D || || || 1 ||
|| 3 || trunk/dir2 || D || || || 1 ||
|| 3 || trunk/README || F || || || 2 ||
|| 3 || trunk/dir3 || D || || || 3 ||
|| 3 || trunk/dir3/A || F || E || || 2 ||
|| 3 || trunk/dir3/B || F || || || 2 ||
|| || || || || || ||
|| 4 || trunk || D || || || 3 ||
|| 4 || tags || D || A || || -1 ||
|| || || || || || ||
|| 5 || trunk || D || || || 3 ||
|| 5 || tags || D || || || 5 ||
|| 5 || tags/v1 || D || C || trunk || 3 ||

Notes:
 i. ''change_type'' will be empty when the path didn't actually change in that revision, but is simply included in the cache for the sake of `get_node`
 ii. ''base_path'' should be left empty when it doesn't change, even for regular edits. This will save some space.
 iii. ''base_rev'' will point to the last changed rev for that path, i.e. the latest revision in which its ''change_type'' was not null.

The paths could maybe be represented by hashes of their dirname, and only the filename part would be in clear text (#3676?).

Mercurial would need to store additional information per node, in particular the file size. Using Mercurial:RevlogNG, that information would be cheap to get, but revlogng is not yet widely used.
-- ''update: source:plugins/0.11/mercurial-plugin uses the new API''

The most flexible approach for storing extra node fields is certainly to let each backend create and maintain an additional table, e.g. `node_changes_hg` (see also #2733).

In addition, there will most certainly be the need for a kind of `revision_link` table in the general case, listing the prev/next relations between revisions. Of course, a backend which only needs a sequential ordering for its revisions should be able to bypass that table.

----
See also: the original Trac-Dev:161 thread, [googlegroups:trac-dev:f32afb39eb3bfd87 improving bazaar support] on Trac-Dev
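
As a footnote to the `revision_link` idea from the New Repository Cache section, here is a rough sketch of what such a table could look like. The column names (`rev`, `parent`) and the helper functions are assumptions made for this example only; nothing here is an existing Trac schema or API. It uses an in-memory SQLite database, and shows how a backend with plain sequential revisions could bypass the table entirely.

```python
import sqlite3

def make_link_table(links):
    """Create an in-memory `revision_link` table from (rev, parent) pairs.

    The two-column layout is hypothetical, chosen for illustration.
    """
    db = sqlite3.connect(':memory:')
    db.execute("CREATE TABLE revision_link (rev TEXT, parent TEXT)")
    db.executemany("INSERT INTO revision_link VALUES (?, ?)", links)
    return db

def parents(db, rev):
    """Return the 'prev' revisions of `rev`, sorted by name."""
    rows = db.execute(
        "SELECT parent FROM revision_link WHERE rev = ? ORDER BY parent",
        (rev,))
    return [row[0] for row in rows]

def children(db, rev):
    """Return the 'next' revisions of `rev`, sorted by name."""
    rows = db.execute(
        "SELECT rev FROM revision_link WHERE parent = ? ORDER BY rev",
        (rev,))
    return [row[0] for row in rows]

def sequential_parents(rev, first_rev=1):
    """Bypass for backends with a simple numeric ordering: no table lookup."""
    return [rev - 1] if rev > first_rev else []

# A small branching history: 'c' merges 'a' and 'b', both children of 'root'.
db = make_link_table([('a', 'root'), ('b', 'root'), ('c', 'a'), ('c', 'b')])
```

A Mercurial-like backend would go through `parents()` and `children()`, whereas a Subversion-like backend could answer the same question with `sequential_parents()` alone.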