Edgewall Software
Modify

Opened 11 years ago

Closed 11 years ago

Last modified 10 years ago

#4586 closed defect (fixed)

Too many concurrent resync attempts

Reported by: anonymous Owned by: Christian Boos
Priority: highest Milestone: 0.10.4
Component: version control Version: 0.10.3
Severity: major Keywords: cache resync
Cc:
Release Notes:
API Changes:

Description (last modified by Christian Boos)

Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 387, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 191, in dispatch
    chosen_handler = self._pre_process_request(req, chosen_handler)
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 263, in _pre_process_request
    chosen_handler = f.pre_process_request(req, chosen_handler)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 73, in pre_process_request
    self.get_repository(req.authname) # triggers a sync if applicable
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 101, in get_repository
    repos = self._connector.get_repository(rtype, rdir, authname)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/svn_fs.py", line 260, in get_repository
    crepos = CachedRepository(self.env.get_db_cnx(), repos, None, self.log)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 34, in __init__
    self.sync()
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 100, in sync
    (str(current_rev), path, kind, action,
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 50, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/trac/db/sqlite_backend.py", line 56, in execute
    args or [])
  File "/usr/lib/python2.3/site-packages/trac/db/sqlite_backend.py", line 48, in _rollback_on_error
    return function(self, *args, **kwargs)
IntegrityError: columns rev, path, change_type are not unique

Attachments (3)

sync_using_youngest_rev_metadata-r4871.diff (22.2 KB ) - added by Christian Boos 11 years ago.
Reworked CachedRepository.sync in order to make it faster and more robust in case of concurrent resync attempts.
sync_using_youngest_rev_metadata-r4871.2.diff (13.8 KB ) - added by Christian Boos 11 years ago.
Take 2': don't require an environment upgrade (do it transparently) and don't substitute the db argument with an env argument in the CachedRepository? constructor (that will be a separate patch). The trac.web changes are also stripped out. (repost)
sync_using_youngest_rev_metadata-r4871.3.diff (14.0 KB ) - added by Christian Boos 11 years ago.
Take 3: fix an error with the log.warning call (spotted by Tim) and make sure that self.youngest is a normalized rev (see #4830)

Download all attachments as: .zip

Change History (19)

comment:1 Changed 11 years ago by Christian Boos

Component: generalversion control
Description: modified (diff)
Keywords: cache added
Milestone: 0.10.4
Owner: changed from Jonas Borgström to Christian Boos
Priority: normalhigh

This kind of error should be expected, however we should simply catch it and proceed normally, as this simply means another process or thread has synced the cache in the meantime.

A better solution would be to do the sync in a kind of "atomic" way, though that's not that easy.

See related #4043.

comment:2 Changed 11 years ago by Christian Boos

#4592 was marked as duplicate.

comment:3 Changed 11 years ago by Christian Boos

Status: newassigned

4600 was marked as duplicate (the error message is slightly different - a classical "database is locked" one, but I think it's the same issue).

I think it's time to do something about it…

First, a "quick fix", simply ignoring the exception. Then, as time permits, a more involved fix which would try to do the sync only if no other process or thread has already started the sync.

comment:4 Changed 11 years ago by Christian Boos

Quick fix added in r4662 (trunk) and r4664 (0.10-stable). This should already improve the situation for those having the problem. Please test.

I'll try to come up with a patch to improve the situation further, by avoiding the sync if one can detect that it's already in progress.

comment:5 Changed 11 years ago by Christian Boos

Keywords: resync added

#3831 was already filed for this issue, but I forgot about it ;) I marked it as duplicate of this one because the "solution" is explained here with a few more details.

comment:6 Changed 11 years ago by Christian Boos

Priority: highhighest
Severity: normalmajor

#3369 and other reports about slowness at startup might be related to the sync() attempts.

comment:7 Changed 11 years ago by Christian Boos

Summary: Python TracebackToo many concurrent resync attempts

Made the topic of this ticket clearer.

Changed 11 years ago by Christian Boos

Reworked CachedRepository.sync in order to make it faster and more robust in case of concurrent resync attempts.

comment:8 Changed 11 years ago by Christian Boos

I've written attachment:sync_using_youngest_rev_metadata-r4871.diff which should hopefully be enough to solve this issue (and #3837 + #4043 as well).

  • the system table is used to store the youngest_rev cache metadata. This value corresponds to the last changeset which is completely usable. Any revision or node_change information in the database with a rev higher than that is potentially belonging to a sync in progress.
  • when the metadata youngest_rev doesn't match the youngest rev given by the Subversion API, we attempt a sync, as before
  • but now we try hard to have at most one thread or process attempting this resync, by using a two phase check:
    1. first the obvious check: if the rev about to be synced is already in the revision table, then someone else was faster than us at doing the sync, let it finish and proceed with the rest of the request with the last "stable" youngest rev (the one from the metadata)
    2. if no one has yet written in the revision table, we try to do so, with a insert/commit sequence. Due to the primary key on that table, this will hopefully succeed for at most 1 thread or process (I say hopefully here because of the weirdness noted in #4043…); those who failed at step 2. proceed the same way as those having failed at step 1.
    3. at that point, we're reasonably sure that only one thread or process is doing the sync, so we proceed with the node_change table and continue with all the changesets between the cached youngest_rev and the actual youngest rev;
    4. finally, we update the youngest_rev metadata

So it looks pretty robust this way, but in practice I've rather seen with pysqlite threads waiting on the one which is doing the resync, so that in the end all requests managed to get the latest view… which is also OK.

Testing and feedback appreciated!

Final notes about a few other changes needed:

  • the trac.web changes were needed so that an early error gets propagated to the user (like the TracError raised in sync() called by the pre_request_filter(), warning about the need for an trac-admin upgrade)
  • yes, btw, currently a trac-admin upgrade is needed, but it is a "soft" upgrade: the schema is not modified, only the youngest_rev metadata key gets written using the now legacy Repository.youngest_rev_in_cache method. This will eventually be done automatically in a later version of the patch.

Changed 11 years ago by Christian Boos

Take 2': don't require an environment upgrade (do it transparently) and don't substitute the db argument with an env argument in the CachedRepository? constructor (that will be a separate patch). The trac.web changes are also stripped out. (repost)

Changed 11 years ago by Christian Boos

Take 3: fix an error with the log.warning call (spotted by Tim) and make sure that self.youngest is a normalized rev (see #4830)

comment:9 Changed 11 years ago by ThurnerRupert

you know how far the following is usable?

comment:10 in reply to:  9 Changed 11 years ago by Matthew Good

Replying to ThurnerRupert:

you know how far the following is usable?

Well, we already use locking in places around Trac. The "with" statement is some nice syntactic sugar added in Python 2.5, but you can do the exact same thing with a try/finally block. However, I believe this problem requires syncronizing across processes, but Lock objects only help with synchronization across threads in a single process.

comment:11 Changed 11 years ago by Christian Boos

Milestone: 0.10.50.10.4
Resolution: fixed
Status: assignedclosed

Even more robust version of the patch checked in as r4965 (trunk) and r4966 (0.10-stable).

In contrast with the above patches, the final version doesn't require a trac-admin upgrade. Also, the commit is now done after each revision has been synced "fully", which is slightly less efficient, but more robust.

comment:12 Changed 11 years ago by anonymous

I just got this error, upgraded to 0.10.3.1, restarted my web server, and I still get the error. Do I have to do something else to make it go away?

comment:13 in reply to:  12 Changed 11 years ago by Matthew Good

Replying to anonymous:

I just got this error, upgraded to 0.10.3.1, restarted my web server, and I still get the error. Do I have to do something else to make it go away?

The changes for this will be in 0.10.4, they are not included in 0.10.3.1. You can use the code from the 0.10-stable branch in the meantime if you want.

comment:14 Changed 11 years ago by anonymous

How to solve the error? My Trac was running fine before and it should be possible to reactivate it. How can I clear the cache? I think the error should be away then.

comment:15 in reply to:  14 Changed 10 years ago by j.mccune@…

Replying to anonymous:

How to solve the error? My Trac was running fine before and it should be possible to reactivate it. How can I clear the cache? I think the error should be away then.

To correct the issue:

trac-admin /path/to/trac/project/directory resync

comment:16 Changed 10 years ago by kasper.souren@…

We had to resync twice to bring back our Trac. The first time it complained about a database lock-up, but the second time it went fine.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted.
to The owner will be changed from Christian Boos to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .

 
Note: See TracTickets for help on using tickets.