#4586 closed defect (fixed)
Too many concurrent resync attempts
Reported by: | anonymous | Owned by: | Christian Boos |
---|---|---|---|
Priority: | highest | Milestone: | 0.10.4 |
Component: | version control | Version: | 0.10.3 |
Severity: | major | Keywords: | cache resync |
Cc: | Branch: | ||
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description (last modified by )
Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 387, in dispatch_request dispatcher.dispatch(req) File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 191, in dispatch chosen_handler = self._pre_process_request(req, chosen_handler) File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 263, in _pre_process_request chosen_handler = f.pre_process_request(req, chosen_handler) File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 73, in pre_process_request self.get_repository(req.authname) # triggers a sync if applicable File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 101, in get_repository repos = self._connector.get_repository(rtype, rdir, authname) File "/usr/lib/python2.3/site-packages/trac/versioncontrol/svn_fs.py", line 260, in get_repository crepos = CachedRepository(self.env.get_db_cnx(), repos, None, self.log) File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 34, in __init__ self.sync() File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 100, in sync (str(current_rev), path, kind, action, File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 50, in execute return self.cursor.execute(sql_escape_percent(sql), args) File "/usr/lib/python2.3/site-packages/trac/db/sqlite_backend.py", line 56, in execute args or []) File "/usr/lib/python2.3/site-packages/trac/db/sqlite_backend.py", line 48, in _rollback_on_error return function(self, *args, **kwargs) IntegrityError: columns rev, path, change_type are not unique
Attachments (3)
Change History (19)
comment:1 by , 18 years ago
Component: | general → version control |
---|---|
Description: | modified (diff) |
Keywords: | cache added |
Milestone: | → 0.10.4 |
Owner: | changed from | to
Priority: | normal → high |
comment:3 by , 18 years ago
Status: | new → assigned |
---|
4600 was marked as duplicate (the error message is slightly different - a classical "database is locked" one, but I think it's the same issue).
I think it's time to do something about it…
First, a "quick fix", simply ignoring the exception. Then, as time permits, a more involved fix which would try to do the sync only if no other process or thread has already started the sync.
comment:4 by , 18 years ago
comment:5 by , 18 years ago
Keywords: | resync added |
---|
#3831 was already filed for this issue, but I forgot about it ;) I marked it as duplicate of this one because the "solution" is explained here with a few more details.
comment:6 by , 18 years ago
Priority: | high → highest |
---|---|
Severity: | normal → major |
#3369 and other reports about slowness at startup might be related to the sync() attempts.
comment:7 by , 18 years ago
Summary: | Python Traceback → Too many concurrent resync attempts |
---|
Made the topic of this ticket clearer.
by , 18 years ago
Attachment: | sync_using_youngest_rev_metadata-r4871.diff added |
---|
Reworked CachedRepository.sync
in order to make it faster and more robust in case of concurrent resync attempts.
comment:8 by , 18 years ago
I've written attachment:sync_using_youngest_rev_metadata-r4871.diff which should hopefully be enough to solve this issue (and #3837 + #4043 as well).
- the
system
table is used to store theyoungest_rev
cache metadata. This value corresponds to the last changeset which is completely usable. Any revision or node_change information in the database with arev
higher than that is potentially belonging to a sync in progress. - when the metadata
youngest_rev
doesn't match the youngest rev given by the Subversion API, we attempt a sync, as before - but now we try hard to have at most one thread or process attempting this resync, by using a two phase check:
- first the obvious check: if the
rev
about to be synced is already in therevision
table, then someone else was faster than us at doing the sync, let it finish and proceed with the rest of the request with the last "stable" youngest rev (the one from the metadata) - if no one has yet written in the
revision
table, we try to do so, with a insert/commit sequence. Due to the primary key on that table, this will hopefully succeed for at most 1 thread or process (I say hopefully here because of the weirdness noted in #4043…); those who failed at step 2. proceed the same way as those having failed at step 1. - at that point, we're reasonably sure that only one thread or process is doing the sync, so we proceed with the node_change table and continue with all the changesets between the cached
youngest_rev
and the actual youngest rev; - finally, we update the
youngest_rev
metadata
- first the obvious check: if the
So it looks pretty robust this way, but in practice I've rather seen with pysqlite threads waiting on the one which is doing the resync, so that in the end all requests managed to get the latest view… which is also OK.
Testing and feedback appreciated!
Final notes about a few other changes needed:
- the
trac.web
changes were needed so that an early error gets propagated to the user (like the TracError raised in sync() called by the pre_request_filter(), warning about the need for an trac-admin upgrade) - yes, btw, currently a trac-admin upgrade is needed, but it is a "soft" upgrade: the schema is not modified, only the
youngest_rev
metadata key gets written using the now legacyRepository.youngest_rev_in_cache
method. This will eventually be done automatically in a later version of the patch.
by , 18 years ago
Attachment: | sync_using_youngest_rev_metadata-r4871.2.diff added |
---|
Take 2': don't require an environment upgrade (do it transparently) and don't substitute the db argument with an env argument in the CachedRepository? constructor (that will be a separate patch). The trac.web changes are also stripped out. (repost)
by , 18 years ago
Attachment: | sync_using_youngest_rev_metadata-r4871.3.diff added |
---|
Take 3: fix an error with the log.warning call (spotted by Tim) and make sure that self.youngest is a normalized rev (see #4830)
follow-up: 10 comment:9 by , 18 years ago
you know how far the following is usable?
- http://www.python.org/workshops/1997-10/proceedings/atom/atom.html
- http://www.python.org/dev/peps/pep-0343/, especially examle 1, locking
comment:10 by , 18 years ago
Replying to ThurnerRupert:
you know how far the following is usable?
- http://www.python.org/workshops/1997-10/proceedings/atom/atom.html
- http://www.python.org/dev/peps/pep-0343/, especially examle 1, locking
Well, we already use locking in places around Trac. The "with" statement is some nice syntactic sugar added in Python 2.5, but you can do the exact same thing with a try/finally block. However, I believe this problem requires syncronizing across processes, but Lock objects only help with synchronization across threads in a single process.
comment:11 by , 18 years ago
Milestone: | 0.10.5 → 0.10.4 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
Even more robust version of the patch checked in as r4965 (trunk) and r4966 (0.10-stable).
In contrast with the above patches, the final version doesn't require a trac-admin upgrade
. Also, the commit is now done after each revision has been synced "fully", which is slightly less efficient, but more robust.
follow-up: 13 comment:12 by , 18 years ago
I just got this error, upgraded to 0.10.3.1, restarted my web server, and I still get the error. Do I have to do something else to make it go away?
comment:13 by , 18 years ago
Replying to anonymous:
I just got this error, upgraded to 0.10.3.1, restarted my web server, and I still get the error. Do I have to do something else to make it go away?
The changes for this will be in 0.10.4, they are not included in 0.10.3.1. You can use the code from the 0.10-stable branch in the meantime if you want.
follow-up: 15 comment:14 by , 18 years ago
How to solve the error? My Trac was running fine before and it should be possible to reactivate it. How can I clear the cache? I think the error should be away then.
comment:15 by , 17 years ago
Replying to anonymous:
How to solve the error? My Trac was running fine before and it should be possible to reactivate it. How can I clear the cache? I think the error should be away then.
To correct the issue:
trac-admin /path/to/trac/project/directory resync
comment:16 by , 17 years ago
We had to resync twice to bring back our Trac. The first time it complained about a database lock-up, but the second time it went fine.
This kind of error should be expected, however we should simply catch it and proceed normally, as this simply means another process or thread has synced the cache in the meantime.
A better solution would be to do the sync in a kind of "atomic" way, though that's not that easy.
See related #4043.