Edgewall Software
Modify

Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#11164 closed defect (fixed)

Git persistent cache not refreshed on changeset added notification from another process

Reported by: James Teh <jamie@…> Owned by: Jun Omae
Priority: normal Milestone: 1.0.2
Component: plugin/git Version: 1.0.1
Severity: normal Keywords:
Cc: Peter Suter, Jun Omae Branch:
Release Notes:

Fix be not refreshing Git repository with persistent cache if it has been changed.

API Changes:
Internal Changes:

Description

With git persistent_cache enabled and repository sync_per_request empty, the persistent cache isn't refreshed after a changeset added notification is dispatched from another process. This is problematic because changeset added notifications are normally dispatched from a Git post-receive hook, which is obviously in a different process to the process(es) running Trac.

Str:

  1. Configure a Git repository, set repository sync_per_request to empty, enable git persistent_cache and disable git cached_repository.
  2. Start an instance of tracd.
  3. Open the Trac revision log.
    • Result (correct): The latest revision is displayed at the top.
  4. Commit something to the Git repository.
  5. Open the Trac revision log.
    • Expected: The latest revision should be displayed at the top.
    • Actual: The revision displayed at the top is the same as in step 3.

Unless I'm missing something, this makes (unfortunate) sense looking at the code. Calling sync() on the repository (and thus the underlying PyGIT.Storage) would refresh the cache, but it has no way of knowing that sync() was called from another process in response to a changeset added notification.

This gets even more interesting if you enable git cached_repository. In that case, GitCachedRepository knows about the new revision - even invalidation of the cached metadata is handled correctly cross-process via the cache db table - but it runs into trouble when it gets to the underlying PyGIT.Storage instance.

I guess one way to solve this is to make GitRepository use a cached property somewhere so it knows when to resync the underlying PyGIT.Storage instance. I'm not quite sure on specifics, though. :)

Attachments (2)

11164.patch (710 bytes ) - added by James Teh <jamie@…> 11 years ago.
Patch which refreshes the persistent cache if necessary.
11164.2.patch (2.6 KB ) - added by James Teh <jamie@…> 10 years ago.
Patch which uses a cached property to ensure the persistent cache is only updated when Trac is notified of a repository change.

Download all attachments as: .zip

Change History (20)

comment:1 by Christian Boos, 11 years ago

Milestone: next-stable-1.0.x

Sorry, not much experience here on cached git repositories. PatchWelcome.

by James Teh <jamie@…>, 11 years ago

Attachment: 11164.patch added

Patch which refreshes the persistent cache if necessary.

comment:2 by James Teh <jamie@…>, 11 years ago

This patch calls sync() on the PyGIT.Storage instance when retrieving it if it has been cached persistently. This will cause one call to git to get the youngest revision for each request, but it's still a lot faster than rebuilding the entire cache if the youngest revision hasn't changed.

It might still be nicer to use a cached property, perhaps for the youngest revision, to avoid even this one short call to git on every request. I'm not quite sure how to manage that yet, though.

comment:3 by Christian Boos, 11 years ago

Milestone: next-stable-1.0.x1.0.2
Owner: set to Christian Boos
Status: newassigned

Thanks for the patch! Looks fine, AFAICT. Jun, any opinion on this?

comment:4 by Christian Boos, 11 years ago

Cc: Peter Suter added

(or Peter)

comment:5 by Jun Omae, 11 years ago

Hmm, after the patch, Trac synchronizes all git repositories on every request. That is the same to specify the repositories in [trac] repository_sync_per_request (workaround) and would make no sense to enable [git] persistent_cache when especially it has many git repositories….

I think that we should use @cached for the persistent_cache feature like CachedRepository.metadata and invalidate the caches.

Last edited 11 years ago by Jun Omae (previous) (diff)

in reply to:  5 comment:6 by James Teh <jamie@…>, 11 years ago

Replying to jomae:

Hmm, after the patch, Trac synchronizes all git repositories on every request. That is the same to specify the repositories in [trac] repository_sync_per_request

Oops. I guess it is at that. I figured it was still slightly better if cached_repository is enabled, as that way, sync() isn't being needlessly called on GitCachedRepository. However, I guess that is practically a no-op anyway if it's already been synced.

I think that we should use @cached for the persistent_cache feature like CachedRepository.metadata and invalidate the caches.

The tricky part is what property to cache, what id to give it and how to trigger a resync when that property changes. Ideally, we just want to know when sync() has been called elsewhere, but the only way I can think of to do that with a property is pretty hacky (an arbitrary counter or timestamp which just keeps increasing). Another possibility is to cache the youngest rev and call sync() when fetching it, though I'm not sure if this might cause problems.

comment:7 by Christian Boos, 10 years ago

Milestone: 1.0.2next-stable-1.0.x
Owner: Christian Boos removed
Status: assignednew

Postponing, as no patch is ready.

by James Teh <jamie@…>, 10 years ago

Attachment: 11164.2.patch added

Patch which uses a cached property to ensure the persistent cache is only updated when Trac is notified of a repository change.

comment:8 by James Teh <jamie@…>, 10 years ago

This version of the patch uses @cached to ensure that the repo is only synced if Trac was notified of a repo change; i.e. sync() was called explicitly elsewhere. I'm caching the youngest revision and calling sync() if the cache is invalidated. The cached property is only used if persistent_cache is enabled.

In order for arbitrary changesets to work where the youngest rev isn't fetched, I have to call the cached property in the constructor so the repo will be synced if the cache was invalidated. This is a bit ugly, but I can't come up with anything more elegant.

comment:9 by Jun Omae, 10 years ago

Milestone: next-stable-1.0.x1.0.2
Owner: set to Jun Omae
Status: newassigned

Thanks for the patch! It seems good. I'll try it.

comment:10 by Jun Omae, 10 years ago

I've revised the patch and fixed another issue in log:jomae.git:ticket11164.

When a branch is removed or a non-latest commit is pushed, GitRepository.sync does nothing cause it uses result of git rev-list --max-count=1 --topo-order --all.

comment:11 by James Teh <jamie@…>, 10 years ago

Thanks. Using the rev cache as the cached property is certainly a lot cleaner!

comment:12 by Jun Omae, 10 years ago

Release Notes: modified (diff)
Resolution: fixed
Status: assignedclosed

Committed in [12357-12358] into 1.0-stable and merged in [12359]. If you encounter it again, please feel free to reopen this issue. Thanks.

comment:13 by Jun Omae, 10 years ago

Cc: Jun Omae added
Owner: changed from Jun Omae to James Teh <jamie@…>

comment:14 by Jun Omae, 10 years ago

Resolution: fixed
Status: closedreopened

Oops. My changes are broken. If persistent_cache = disable and visiting browser page after something is committed, the page shows the revision before the commits. I'll rework or revert.

comment:15 by Jun Omae, 10 years ago

Owner: changed from James Teh <jamie@…> to Jun Omae
Status: reopenedassigned

comment:16 by Jun Omae, 10 years ago

I just reworked. Proposed fix with unit tests is in log:jomae.git:ticket11164.1. Also, if persistent_cache is turned off, GitRepository would ignore new revisions. The branch includes fix for it.

I'll push later.

in reply to:  16 comment:17 by Jun Omae, 10 years ago

Resolution: fixed
Status: assignedclosed

Committed in [12479:12480] and merged to trunk in [12481].

comment:18 by Jun Omae, 10 years ago

Small refactor and simplification in [13113] and merged to trunk in [13114].

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Jun Omae.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Jun Omae to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.