#7723 closed enhancement (fixed)
Make repository cache work on multirepos branch
Reported by: | Remy Blank | Owned by: | Remy Blank |
---|---|---|---|
Priority: | normal | Milestone: | 0.12-multirepos |
Component: | version control | Version: | |
Severity: | major | Keywords: | multirepos |
Cc: | Christian Boos, gwk.rko@…, mike@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
This ticket tracks the progress in making the repository cache work in the multirepos branch (#2086), as discussed in MultipleRepositorySupport/Cache.
Attachments (20)
Change History (77)
by , 16 years ago
Attachment: | 7723-multirepos-cache-r7590.patch added |
---|
comment:1 by , 16 years ago
The patch above implements the simplest version of the multi-repository cache:
- Added a
repository
table with a generic structure similar tosystem
for holdingrepository_dir
andyoungest_rev
for every repository. - Added an
id
column torevision
andnode_change
to contain the repository key. - Updated
CachedRepository
andtrac-admin resync
accordingly. - Added a DB upgrade script.
This is probably the simplest solution to get the cache working again, with the minor drawback that data duplication is possible with scoped repositories.
The cache is currently still synced on every request. But I'd like to get the cache structure right before I tackle the next problem.
follow-up: 3 comment:2 by , 16 years ago
I just read briefly the patch, it looks good, I'll test it at some later point.
There's a possible problem with your last, unrelated, change in that patch:
- ngettext('Changeset', 'Changesets', single and 1 or 2) + ngettext('Changeset ', 'Changesets ', single and 1 or 2)
I'm not entirely sure, but I think that at some point during the extraction the trailing spaces are stripped, that's why I did it the other way round. Would be worth checking.
follow-up: 4 comment:3 by , 16 years ago
Status: | new → assigned |
---|
Replying to cboos:
I'm not entirely sure, but I think that at some point during the extraction the trailing spaces are stripped, that's why I did it the other way round. Would be worth checking.
At least 0.11-stable and trunk have the space in the 'Changeset '
string. If you put the space into the <em>
, the dotted underline below "pseudo-links" in the timeline continues below the space, which looks strange. That's why I made the change. Looking at the generated HTML, the space still seems to be there.
comment:4 by , 16 years ago
Replying to rblank:
If you put the space into the
<em>
, the dotted underline below "pseudo-links" in the timeline continues below the space, which looks strange.
Ok
… Looking at the generated HTML, the space still seems to be there.
Well, I was referring to their extraction using Babel (python setup.py extract_messages
), but now I verified and at least with 0.9.4, the leading and trailing spaces are correctly preserved (I was probably wrong about that stripping). So I'm fine with this change as well.
comment:5 by , 16 years ago
Cc: | added |
---|
comment:7 by , 16 years ago
Here's an updated patch with the following changes w.r.t. the previous one:
- Added autocompletion for
trac-admin resync
. - Use
(default)
as an ID for the default repository. - Added
IRepositoryChangeListener
and the corresponding change notification chain. Notification of new changesets are triggered bytrac-admin repository changeset
. - Fixed failing test cases. This involved changing the initial creation of the
youngest_rev
key in therepository
table.
I am not sure about a few points:
- Previously, the
youngest_rev
value in thesystem
table was created during DB creation. This cannot be done anymore for therepository
table, so I moved it intoCachedRepository.sync()
, in the same transaction whererepository_dir
is created. But I am not sure if this could introduce race conditions.
- Changeset notification tries to be smart about scoped repositories, so that only a single call to
trac-admin repository changeset
is needed. In particular, when notifying new changesets for a scoped repository, it will also notify all repositories having the same "base" repository. For this, I had to introduceRepository.get_base()
. I am not sure that this is the right way to go about this.
- It is also possible to specify the base repository name (e.g.
svn:{uuid}:/path/to/repos
) as the repository intrac-admin repository changeset
, but this exposes the (internal) repository name.
- The call to
RepositoryManager.notify_changesets_added()
fromconsole.py
goes through the environment, in the same way asget_repository()
. Maybe I should useRepositoryManager(env)
directly.
comment:8 by , 16 years ago
Milestone: | → 0.12 |
---|
follow-up: 10 comment:9 by , 16 years ago
As a reminder: we should do nothing when calling resync and there's no repository available (#7524).
On a related note, what do you think of having:
trac-admin <env> repository resync <repos> [rev]
Resync the specified repository.
If <repos> is not specified, do nothing.
If <repos> is'*'
, resync all repositories.
I think that as resyncing all repositories could be potentially very expensive, so better do that when specifically told to do so.
Also, moving the resync
as a sub-command of repository
seems nicer.
comment:10 by , 16 years ago
Replying to cboos:
As a reminder: we should do nothing when calling resync and there's no repository available (#7524).
I'll look into that.
I think that as resyncing all repositories could be potentially very expensive, so better do that when specifically told to do so. Also, moving the
resync
as a sub-command ofrepository
seems nicer.
Yes, that's a good idea.
On a related note as well, I was thinking of starting to work on your suggestion on trac-dev for the rest of the repository
command set:
repository add <type> <path>
: add a repositoryrepository delete <repos>
: remove a repositoryrepository rename <old> <new>
: rename a repositoryrepository alias <repos> <alias>
: create an alias for a repository
and managing repositories in the repository
table. I would also like to create an admin module allowing to perform the same operations. Should I merge this into the patch tracked here, or do you prefer a separate patch?
follow-up: 12 comment:11 by , 16 years ago
I think it will be better addressed separately.
(Ideally we'd need to have an IAdminConsoleCommandProvider extension, so that we could move all this logic into a trac/versioncontrol/admin.py module, where the command line and the web admin modules could reuse the same functionality).
A few notes on the changeset notification in the patch (in trac/versioncontrol/api.py):
289 def notify_repository(repos): 290 repos.sync() 291 for rev in revs:
Here we should probably take care of not notifying twice about a given change. Even if this should normally not happen if trac-admin repository changeset
is called from a hook, I think it would be more robust to have repos.sync()
return the list of changesets that were effectively synced and only notify for those changesets, as that method takes great care of race conditions (even if I'm not 100% confident that there's no race condition left…).
follow-up: 15 comment:12 by , 16 years ago
Replying to cboos:
I think it will be better addressed separately.
Ok, I'll open a new ticket about that. I'll keep a new patch there that should be applied on top of this one. Thank God for Mercurial and its easy branching / merging :-)
(Ideally we'd need to have an IAdminConsoleCommandProvider extension, so that we could move all this logic into a trac/versioncontrol/admin.py module, where the command line and the web admin modules could reuse the same functionality).
Hey, that's an excellent idea! I'll also open a new ticket to keep track of that.
Here we should probably take care of not notifying twice about a given change. Even if this should normally not happen if
trac-admin repository changeset
is called from a hook (…)
I'm not sure about that. New changeset notification and caching are actually completely orthogonal functionalities, and you certainly also want notification for non-cached repositories. So you probably want to keep the "last notified" information separate from the cache, probably in the repository
table as well.
But I'm not sure it's worth the trouble avoiding duplicate notification. As you say, calling the notification from the post-commit hook should already ensure no duplication occurs.
by , 16 years ago
Attachment: | 7723-multirepos-cache-3-r7684.patch added |
---|
Updated patch with better resync command
comment:13 by , 16 years ago
The patch above is updated to the current state of the multirepos branch.
- The
resync
command has been changed to work as suggested in comment:9.
- About #7524, the recent refactoring of
trac-admin
changed the way commands are provided. In this case, disabling the default repository prevents theVersionControlAdmin
component from loading, and therefore therepository
commands are not provided.
This completes the basic version described in MultipleRepositorySupport/Cache. I'm waiting for feedback and instructions about the next steps on this issue. In the meantime, I'll start on the repository management functionality in #7822.
by , 16 years ago
Attachment: | 7723-multirepos-cache-4-r7684.patch added |
---|
Update improving handling of repository aliases
comment:14 by , 16 years ago
The patch above improves handling of repository aliases, which ended up being cached separately before. It slightly modifies RepositoryManager.get_repository()
so that the same object is returned for a real repository and an alias. Therefore, repos.reponame
is always the name of the real repository.
Christian, do you think you could find some time to review this patch and give me some feedback about what must still be changed (or apply it if it is ok)?
comment:15 by , 16 years ago
Replying to rblank:
Replying to cboos:
Here we should probably take care of not notifying twice about a given change. Even if this should normally not happen if
trac-admin repository changeset
is called from a hook (…)I'm not sure about that. New changeset notification and caching are actually completely orthogonal functionalities, and you certainly also want notification for non-cached repositories. So you probably want to keep the "last notified" information separate from the cache, probably in the
repository
table as well.
Another reason why this still bothers me a bit is that I'd like this mechanism to be generic enough so that it can be extended later to other cached vcs. Take the case of Mercurial, the hook can inform you of which changesets have been added, which would be something non completely trivial to recompute on the Trac end. This is why I have the feeling that we should carry the changeset(s) information from the notify to the resync. I haven't started coding anything yet, so it's not anything more than a feeling, I might well be wrong ;-)
by , 16 years ago
Attachment: | 7723-multirepos-cache-4-r7815.patch added |
---|
Updated patch to current head of multirepos branch.
by , 16 years ago
Attachment: | 7723-multirepos-cache-4-r7829.patch added |
---|
Updated patch to current head of multirepos branch.
by , 16 years ago
Attachment: | 7723-multirepos-cache-4-r7888.patch added |
---|
Updated patch to current head of multirepos branch.
comment:16 by , 16 years ago
I've unfortunately run short of time … so here's my partial update to keep you busy for this WE ;-)
- attachment:7723-1-multirepos-cache-r7928.patch
your patch refreshed to r7928, with a commit message - attachment:7723-2-unicode-fix-r7928.patch
a little unicode related cleanup in svn_fs.py, not important - attachment:7723-3-get_youngest_rev-r7928.patch
get started on the youngest_rev problem - here we hit the classical issue we have we any cached data in Trac. Should we do config.touch as well after the resync innotify_changesets_added
?
by , 16 years ago
Attachment: | 7723-1-multirepos-cache-r7928.patch added |
---|
by , 16 years ago
Attachment: | 7723-3-get_youngest_rev-r7928.patch added |
---|
by , 16 years ago
Attachment: | 7723-2-unicode-fix-r7928.patch added |
---|
by , 16 years ago
Attachment: | 7723-multirepos-cache-5-r7928.patch added |
---|
Changed trac-admin
changeset notification command into repository notify
.
comment:17 by , 16 years ago
The updated patch above implements the following changes:
- Integrated the unicode fix in attachment:7723-2-unicode-fix-r7928.patch
- Changed the changeset notification
trac-admin
command fromrepository changeset
intorepository notify
, and added a parameter for the event type. This allowed making the notification code more modular, and to easily add the the eventchangeset_modified
(in addition to the already existingchangeset_added
), to be used e.g. in a SVNpost-revprop-change
hook.
The changeset_modified
event required changing Repository.sync_changeset()
to be a no-operation instead of raising a NotImplementedError
. This is reasonable IMO, as Repository.sync()
already is a no-operation.
The last remaining issue is the youngest_rev
problem. I have been staring at the code for an hour now, and I still don't get the sync()
logic and the associated caching issues. There are just too many interconnected issues. So I'm afraid I will need help from you on this, Christian. I would hate breaking the seemingly fragile synchronization procedure.
comment:18 by , 16 years ago
For RepositoryCache.youngest_rev
, we cache the value if it's not already there, in a way similar to attachment:7723-3-get_youngest_rev-r7928.patch. Then, the problem is how do we invalidate this when another process do the sync? The easiest way, which we should adopt as a first step, is to simply trigger a global invalidation of the affected environments, by calling config.touch()
at the end of a sync()
. It's admittedly heavy-weight, but:
- it will work, as we use it in plenty other situations (the
RepositoryCache
is not aComponent
itself, but it is accessed through theRepositoryManager
, so as the latter gets cleared, a newRepositoryCache
will be created when needed) - we sooner or later need a better solution for this cache invalidation problem anyway, and then it'll still be time to "fix" it for the
youngest_rev
case as well.
Are you OK with this approach? We can discuss the general issue in TracDev/Proposals/CacheInvalidation.
by , 16 years ago
Attachment: | 7723-multirepos-cache-6-r7928.patch added |
---|
Invalidate youngest_rev
cache when updating revision cache.
comment:19 by , 16 years ago
The patch above implements the suggestion in comment:18 and triggers an environment reload every time Repository.sync()
updates the cache, so that youngest_rev
is re-loaded from the database.
I think we agree that this solution is kinda ugly, so I'm all the more motivated to find a good solution to the cache invalidation issue.
Except if I missed something, this patch should therefore be final and completely restore the SVN repository cache functionality in the multirepos branch.
comment:20 by , 16 years ago
Is there anything else you need from me about this issue? Or would you rather wait for CacheInvalidation to be resolved before applying the patch?
comment:21 by , 16 years ago
Sorry - as usual, simply lack of time :-(
Besides, I think 7723-3-get_youngest_rev-r7928.patch is not optimal, we should rather take the youngest_rev info from the meta-data in the repository table.
But no need to wait for a better cache invalidation scheme, we'll just fix it a the same time as the other config.touch()
when time is ripe.
I'll try to finalize and commit the patch today.
by , 16 years ago
Attachment: | Cache-metadata-1.patch added |
---|
retrieve metadata in a helper function and use it also in get_youngest_rev (applies on top of attachment:7723-multirepos-cache-6-r7928.patch)
follow-ups: 24 33 comment:23 by , 16 years ago
Main patch (attachment:7723-multirepos-cache-6-r7928.patch) committed as [7961].
A series of follow-up patches were committed as well:
- attachment:Cache-metadata-1.patch in [7962]
- [7963] warn when
repository notify reponame
can't be resolved to a real repository - [7964] initial support for using
repository notify
in the Subversion post-commit hook
Things left to do before we can close this ticket:
- migrate the trac-post-commit-hook logic into a sample-plugin (e.g. update-ticket-from-changeset.py)
- the trac-post-commit-hook can now be a shell-script calling
repository notify
(i.e. similar to trac-post-commit-hook.cmd)
comment:24 by , 16 years ago
Things left to do before we can close this ticket:
… and rewrite the cache invalidation currently based on config.touch()
to be based on the TracDev/Proposals/CacheInvalidation, of course!
comment:25 by , 16 years ago
Also, I realized that [7964] was somehow not taking the aim of get_base
into account. By giving one appropriate reponame to trac-admin, notify would have triggered the appropriate sync()
in all repositories sharing the same base.
Still, I think it's simpler to give to trac-admin
directly the REPOS information obtained within the hook, thereby requiring no knowledge about how that repository is mapped to one or many repositories objects within the environment.
So I reworked this in [7973], removing the changes to get_repository
added in [7964] and added a get_repositories_by_dir
method.
Now notify
should do the right thing (i.e. sync and notify all the repositories that are involved), whether provided with a reponame or directly with repository directory.
by , 16 years ago
Attachment: | 7723-admin-cleanup-r7985.patch added |
---|
Small cleanup and removed unwanted argument to repository resync
.
comment:26 by , 16 years ago
The patch above is a small cleanup and it removes the unwanted argument clean
from the repository resync
command (the arguments of the _do_*
methods map directly to command arguments, so clean
should not be in _do_resync()
's signature).
follow-ups: 28 29 comment:27 by , 16 years ago
Also, the test suite currently fails on the multirepos
branch, I haven't yet found out why. Current trunk is ok, though.
Do you want me to set up a build slave for the multirepos branch?
follow-up: 30 comment:28 by , 16 years ago
Replying to rblank:
Also, the test suite currently fails on the
multirepos
branch, I haven't yet found out why.
Those are the functional tests failing during repository creation, I haven't checked yet.
Do you want me to set up a build slave for the multirepos branch?
Yes, that could be useful (once the above problem is fixed).
comment:29 by , 16 years ago
comment:30 by , 16 years ago
Replying to cboos:
Replying to rblank:
Also, the test suite currently fails on the
multirepos
branch, I haven't yet found out why.Those are the functional tests failing during repository creation, I haven't checked yet.
The above problem actually happens when there is another version of Trac installed - when the functional tests code starts the python process, that one doesn't pick the code from the checkout but from the installed location - it's a bit weird, PYTHONPATH doesn't seem to be propagated.
follow-up: 32 comment:31 by , 16 years ago
Related to #8067/#7490, I think we need to add a setting to disable the synchronization check for the default repository in the RepositoryManager.preprocess_request
, so that it could be done with a post-commit hook even in this case (e.g. [trac] repository_per_request_sync = false
, defaults to true
). Anyway, the post-commit hook will be needed for getting the changeset_added notification.
By the way, another thing that occurred to me recently is the possible confusion between sync
and notify changeset_added
, in hooks. I'm worried that people could now simply call repository sync
from their hook and be under the impression that everything just works - but they'll miss the notifications.
So here's an alternative idea to the notify
commands:
repository post-commit <repo_dir_or_name> <rev>
repository post-revprop <repo_dir_or_name> <rev>
Then it becomes more obvious what to use from within a hook, rather than facing the choice between sync
and notify
… (and no, people don't read the docs :-) ).
comment:32 by , 16 years ago
Replying to cboos:
Anyway, the post-commit hook will be needed for getting the changeset_added notification.
Then why not drop the per-request synchronization completely?
Another idea could be a [trac] repository_sync_per_request
setting that contains a list of repository names that should be synced on every request. This would allow those who cannot install post-commit hooks to use multiple repositories as well. Obviously, the impact has to be documented (no changeset notifications, bad performance). The default would be (default)
.
By the way, another thing that occurred to me recently is the possible confusion between
sync
andnotify changeset_added
, in hooks. I'm worried that people could now simply callrepository sync
from their hook and be under the impression that everything just works - but they'll miss the notifications.
I'm not sure. Another option could be to drop sync
and use an optional argument to resync
for resuming synchronization.
So here's an alternative idea to the
notify
commands:
I don't like them too much. I liked the fact that all notifications were under a single command better, and if sync
is removed, there shouldn't be any confusion. Maybe notify
is not the right term, and can be confused with ticket notifications? How about using the term hook
?
repository hook changeset_added <repos> <rev> [rev ...]
repository hook changeset_modified <repos> <rev> [rev ...]
Or even:
repository hook post-commit <repos> <rev> [rev ...]
repository hook post-revprop <repos> <rev> [rev ...]
I prefer the first version, as I feel the post-*
terms sound too SVN-specific.
by , 16 years ago
Attachment: | commit_ticket_update.py added |
---|
The functionality of trac-post-commit-hook
as a plugin using IRepositoryChangeListener
.
comment:33 by , 16 years ago
Replying to cboos:
Things left to do before we can close this ticket:
- migrate the trac-post-commit-hook logic into a sample-plugin (e.g. update-ticket-from-changeset.py)
I have attached a new plugin that implements the same functionality as the trac-post-commit-hook
script, to be placed into sample-plugins
.
It Works For Me (tm) ;-)
Oh, wait, I have forgotten to include the copyright notice of the original script. Coming…
by , 16 years ago
Attachment: | commit_ticket_update.2.py added |
---|
Same script as above, but with original copyright notice.
follow-up: 35 comment:34 by , 16 years ago
Some comments:
- about the trac-admin commands, would it be possible to use another prefix than
repository
? If yes, we could have:trac-admin <env> changeset added <repos> <rev>
trac-admin <env> changeset modified <repos> <rev>
- the commit_ticket_update plugin looks fine, but I wonder if it shouldn't also act on a
changeset_modified
callback. Rationale: you enter a ticket reference to #7732 but you actually meant #7723. Typically you'd edit the log message to fix it. It would make sense to also update the newly referenced ticket in this case. Even better, it would make sense to amend the old ticket as well, but for that we should pass the old changeset as well as the new:def changeset_modified(self, repos, old_changeset, new_changeset):
follow-up: 36 comment:35 by , 16 years ago
Replying to cboos:
- about the trac-admin commands, would it be possible to use another prefix than
repository
?
Sure, everything is possible ;-) It's a bit a return to the initial idea, which was repository changeset
.
I guess a separate category changeset
makes it harder to miss when actually reading the documentation, so I'd be ok with that. Let me know your final decision, and I'll implement it.
- the commit_ticket_update plugin looks fine, but I wonder if it shouldn't also act on a
changeset_modified
callback.
That does indeed make sense, thanks for the idea. In changeset_added
, I use the time of the changeset as the ticket change time. I will have to use the current time for changeset_modified
. I guess this makes sense anyway.
This also means I can compute the difference between the ticket references in the old and new changeset data, and only update those tickets that differ.
but for that we should pass the old changeset as well as the new:
Yes, I had forgotten about that. I'll implement the augmented changeset_modified
as well.
comment:36 by , 16 years ago
Replying to rblank:
Replying to cboos: Sure, everything is possible ;-) It's a bit a return to the initial idea, which was
repository changeset
.I guess a separate category
changeset
makes it harder to miss when actually reading the documentation, so I'd be ok with that. Let me know your final decision, and I'll implement it.
OK, so let's settle on trac-admin <env> changeset added|modified <repos> <rev>
.
by , 16 years ago
Attachment: | 7723-changeset-admin-r8028.patch added |
---|
Changed trac-admin repository notify <event>
into trac-admin changeset (added|modified)
.
comment:37 by , 16 years ago
The patch above renames the repository notification commands of trac-admin
as suggested in comment:36.
BTW, do you want me to continue posting patches on this ticket for small items like this one, or should I commit directly to the branch? I'm fine with both variants.
About adding the old changeset metadata to the changeset_modified()
hook, I'm afraid this is not such a good idea after all. Indeed, as the command is called after the metadata has been updated in the repository (typically in the post-revprop hook), the only way to get the previous values is from the cache. So we have a few options:
- Add the
old_changeset
argument tochangeset_modified()
anyway, and set it toNone
if the previous data cannot be retrieved. There's currently no interface for repositories to tell if they are cached, so this would have to be detected by comparing the old and new metadata, and assuming that the old values cannot be retrieved if they are equal.
- Add a
changes
argument containing a dictionary of changed changeset attributes. It will obviously be empty if the previous values cannot be retrieved.
- Do not provide the old values.
I tend to prefer the last option, as the others will need a big fat warning that the data may or may not be present depending on the VC backend.
It still makes sense to process the changeset_modified()
event in the commit_ticket_update
plugin.
comment:38 by , 16 years ago
Severity: | normal → major |
---|
Applied attachment:7723-changeset-admin-r8028.2.patch plus adaptations for the post-commit hooks in r8070.
About changeset_modified()
, yes it's true that we can't get the old changeset for anything else than the DB cached repositories… OTOH, the Git plugin has support for caching changeset metdata and the Mercurial plugin should have it as well sometime soon.
So I think the first option among the possibilities you listed is also a viable one and we could indeed add a property telling if the changesets are cached or not (and the big fat warning that the old_changeset
might be None
).
Btw, I guess the ticket has now reached a point where we can say it's was "major" task ;-) For the work process itself - feel free to commit directly on the branch.
Summarizing what's left to be done:
- merge TracDev/Proposals/CacheInvalidation on trunk (Remy) and from there on multirepos, adapt our use of
config.touch
(Remy/Christian) - finalize the commit_ticket_update.py plugin (Remy)
- adapt the Mercurial hook (e.g. http://bitbucket.org/madssj/mercurial-trac-hook/src/tip/trachook.py) to see if it works fine with the new way of things (Christian)
- decide what we do with
[trac] repository_sync_per_request
(but that can eventually get postponed and we can create another ticket just for that - I guess that as being the main backward compatible topic, it'll gather more feedback once MultiRepos hits trunk)
comment:39 by , 16 years ago
The cache invalidation patch has been committed in [8071]. I was going to merge it to multirepos, but I'm not sure about the database version conflict. Should I combine both upgrades into a single db22.py
file, or should the VC cache upgrade be moved to a 22 → 23 upgrade?
comment:40 by , 16 years ago
Yes, it's a bit tricky, but I think the rule here should be to minimize the trouble for people using trunk. So I'd say we do 22→23 on the branch, plus code in db23.py that copes with upgrading previous MultiRepos db22.py users. When merging in trunk, we strip off that extra code.
comment:41 by , 16 years ago
I believe r8073 handles this correctly:
- environments at version 21 get upgraded to 23 the normal way
- environments previously at version 22 get upgraded to 23, in which we detect that the
repository
table already exist, in which case we call explicitly thedb22.do_upgrade
. That code can be removed before merging into trunk, as it's unlikely that at that time there will be people left with a MultiRepos specific database at version 22 (and if so, it's easy enough to fix by hand).
comment:42 by , 16 years ago
I have added the old_changeset
argument to IRepositoryChangeListener.changeset_modified()
in [8094]. The old changeset is returned by Repository.sync_changeset()
, by querying the database prior to making the change. This is backward-compatible, as the function previously returned None
, which means "data not available".
I have also added the new commit_ticket_update.py
sample plugin, and removed the old trac-post-commit-hook
. It now triggers a ticket comment when a changeset is modified, for those tickets that were not already referenced by the old changeset text. Obviously, if the old changeset data cannot be retrieved, it will add a comment to all referenced tickets.
There are a few things that I'm not entirely satisfied with the plugin:
- The time used for the ticket comment is the current time, instead of the time of the changeset, as requested in comment:3:ticket:7251. I would have preferred the changeset time, but it would have messed up threading of the comments.
- If saving of the ticket comment fails (due to concurrent editing), it just logs the error. We could retry saving, but I don't know if it's worth the effort.
Other than that, I would say that this is a good sample plugin, so I'll move on to using the new cache infrastructure for caching repository metadata.
by , 16 years ago
Attachment: | 7723-metadata-cache-r8097.patch added |
---|
Updated caching of repository metadata to new caching infrastructure.
follow-up: 44 comment:43 by , 16 years ago
The patch above is a first stab at migrating caching of the repository metadata to the new caching infrastructure. While testing, the following popped up:
- Setting
reponame
on the repository after retrieving it inRepositoryManager.get_repository()
is impractical. For example, I have to change the cache id when settingreponame
. Wouldn't it be simpler to passreponame
toIRepositoryConnector.get_repository()
, so that it can be set in theRepository
constructor?
- More importantly (not related to metadata caching),
CachedRepository
delegates some work toSubversionRepository
without checking if the revision exists in the cache. For example, if a new changeset is added to a repository, but thechangeset added
hook has not yet been called, then the revision log for the repository generates an internal error. This is due to usingSubversionNode.get_history()
to get the history of a node. The history will contain the new changeset, but it will not be possible to retrieve it from the repository.
The second issue is already present on the multirepos branch. The window of opportunity for this to happen is typically small, as notification should be triggered from the post-commit hook, but still, it's not nice to get an internal error.
I see two potential solutions:
- "Overload" more of the behavior of
SubversionRepository
inCachedRepository
, and ensure that no data for not-yet-cached revisions leaks. Seems clumsy.
- In
CachedRepository
, fall back to the underlying repository if some piece of data cannot be retrieved from the cache. Seems more robust, but will introduce some inconsistencies between different views of the data.
The second idea could even be extended further: what if we did "opportunistic" caching? That is, start with an empty cache (and no explicit "resync"), and on data retrieval, if the data is not available from the cache, get it from the repository and store it in the cache.
comment:44 by , 16 years ago
Replying to rblank:
The patch above is a first stab at migrating caching of the repository metadata to the new caching infrastructure.
I've tested it, and it seems to work fine.
- Setting
reponame
on the repository after retrieving it inRepositoryManager.get_repository()
is impractical.
The primary concern here was API compatibility. There was no compelling reason to change things, as setting .reponame afterwards was enough. Now it seems we have a real need for a change. If possible, that change should be handled with backward compatibility in mind (using arity
on the connector's get_repository method), otherwise I think it's better to postpone the incompatible change once we're on trunk.
- More importantly (not related to metadata caching),
CachedRepository
delegates some work toSubversionRepository
without checking if the revision exists in the cache. For example, if a new changeset is added to a repository, but thechangeset added
hook has not yet been called, then the revision log for the repository generates an internal error. This is due to usingSubversionNode.get_history()
to get the history of a node. The history will contain the new changeset, but it will not be possible to retrieve it from the repository.
Yes, this is problematic.
I see two potential solutions:
- "Overload" more of the behavior of
SubversionRepository
inCachedRepository
, and ensure that no data for not-yet-cached revisions leaks. Seems clumsy.
All we need to do is to ensure that we start with a proper youngest, i.e. the one in the cache. Relatively easy for Subversion, maybe be more involved for other backends (well, we have still no other cached backends, but theoretically…)
- In
CachedRepository
, fall back to the underlying repository if some piece of data cannot be retrieved from the cache. Seems more robust, but will introduce some inconsistencies between different views of the data.
Note that this might cover real issues, like the hook not working anymore for some reason.
But I think the 2 solutions can be done in complement one to each other: take care of giving a consistent view on the repo starting from the cache's youngest and be prepared to handle what falls between the cracks by delegating to the underlying repos when the cache doesn't have the information.
The second idea could even be extended further: what if we did "opportunistic" caching? That is, start with an empty cache (and no explicit "resync"), and on data retrieval, if the data is not available from the cache, get it from the repository and store it in the cache.
As we can retrieve information about a repository at anytime (i.e. we might be going through the results of a SELECT), we can't save the metadata right away. We would have to postpone that to after request processing, like we do for saving the session data modifications. I think it's an interesting option and might be doable, but certainly not trivial and without surprise. For example, it will break the get_(next|prev)_rev optimizations.
comment:45 by , 16 years ago
I have committed the patch in [8200]. I left the reponame
setting after creation as-is, as we can change that at any time later, and it's not really a big issue (yet), only a minor inconvenience.
I'll try to address the youngest
issue next.
comment:46 by , 16 years ago
Cc: | added |
---|
by , 15 years ago
Attachment: | 7723-youngest-rev-r8299.patch added |
---|
Make consistent use of youngest_rev
in CachedRepository
.
follow-up: 48 comment:47 by , 15 years ago
About youngest_rev
consistency, the patch above seems to fix the issue for me, and all tests pass. Comments? Good to go?
comment:48 by , 15 years ago
Replying to rblank:
About
youngest_rev
consistency, the patch above seems to fix the issue for me, and all tests pass. Comments? Good to go?
I've tested attachment:7723-youngest-rev-r8299.patch, which works fine for me. Ok to apply.
So what's left? From comment:38, only the last two points are still opened, the one about Mercurial shouldn't block this ticket and the [trac] repository_sync_per_request
feature.
comment:49 by , 15 years ago
I'll tackle [trac] repository_sync_per_request
next. The initial idea was that it be a list of repository names, and that only those repositories would be synced at every request, with a default of (default)
to be backward compatible. Ok with that?
comment:50 by , 15 years ago
Yes. The empty list meaning nothing gets synced at every request, which could become the default one day.
comment:52 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Added the option [trac] repository_sync_per_request
in [8397].
As far as I can tell, this concludes work on this ticket. Phew, 10 months and 52 comments… not bad!
follow-up: 54 comment:53 by , 15 years ago
I just wondered about the interaction between the CachedRepository metadata
CacheProxy and the admin operations on the repository table: on rename and remove, don't we need to invalidate them?
follow-up: 56 comment:54 by , 15 years ago
Replying to cboos:
I just wondered about the interaction between the CachedRepository
metadata
CacheProxy and the admin operations on the repository table: on rename and remove, don't we need to invalidate them?
Currently it wouldn't make a difference, as all repository operations call RepositoryManager.reload_repositories()
, which in turn calls config.touch()
, so the whole environment is reloaded anyway.
Now if reload_repositories()
was migrated to the fine-grained cache mechanism, the repository objects would be re-created, and hence the CacheProxy
object as well, but the cache itself would still contain stale data from the previous object. This might have perverse side-effects.
I'll have to think about it some more. Maybe CacheProxy
should allow clearing the associated (thread and process) cache on creation, for the use case where it's stored as an object attribute.
comment:55 by , 15 years ago
Milestone: | 0.12 → 0.12-multirepos |
---|
comment:56 by , 15 years ago
Replying to rblank:
Now if
reload_repositories()
was migrated to the fine-grained cache mechanism, the repository objects would be re-created, and hence theCacheProxy
object as well, but the cache itself would still contain stale data from the previous object. This might have perverse side-effects.
There's no easy solution to that. Clearing the cache when creating the CacheProxy
in CachedRepository.__init__()
is not an option, as repository objects are created per thread, and hence new threads would clear the cache of older threads.
The current assumption is that all caches are independent. This would not be the case anymore if e.g. reload_repositories()
switches to a fine-grained cache, and there would need to be some kind of cascading cache invalidation. Currently, config.touch()
is used for that (a kind of global cache invalidation), but we could also imagine that a cache can depend on another, and is invalidated at the same time as the latter.
Anyway, the current implementation with config.touch()
works, so I'm leaving this for now.
comment:57 by , 15 years ago
Type: | task → enhancement |
---|
First try at a simple version of the cache