Ticket #4586 (closed defect: fixed)
Too many concurrent resync attempts
| Reported by: | anonymous | Owned by: | cboos |
|---|---|---|---|
| Priority: | highest | Milestone: | 0.10.4 |
| Component: | version control | Version: | 0.10.3 |
| Severity: | major | Keywords: | cache resync |
| Cc: |
Description (last modified by cboos) (diff)
Traceback (most recent call last):
File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 387, in dispatch_request
dispatcher.dispatch(req)
File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 191, in dispatch
chosen_handler = self._pre_process_request(req, chosen_handler)
File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 263, in _pre_process_request
chosen_handler = f.pre_process_request(req, chosen_handler)
File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 73, in pre_process_request
self.get_repository(req.authname) # triggers a sync if applicable
File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 101, in get_repository
repos = self._connector.get_repository(rtype, rdir, authname)
File "/usr/lib/python2.3/site-packages/trac/versioncontrol/svn_fs.py", line 260, in get_repository
crepos = CachedRepository(self.env.get_db_cnx(), repos, None, self.log)
File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 34, in __init__
self.sync()
File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 100, in sync
(str(current_rev), path, kind, action,
File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 50, in execute
return self.cursor.execute(sql_escape_percent(sql), args)
File "/usr/lib/python2.3/site-packages/trac/db/sqlite_backend.py", line 56, in execute
args or [])
File "/usr/lib/python2.3/site-packages/trac/db/sqlite_backend.py", line 48, in _rollback_on_error
return function(self, *args, **kwargs)
IntegrityError: columns rev, path, change_type are not unique
Attachments
Change History
comment:1 Changed 4 years ago by cboos
- Description modified (diff)
- Component changed from general to version control
- Priority changed from normal to high
- Owner changed from jonas to cboos
- Milestone set to 0.10.4
- Keywords cache added
comment:3 Changed 4 years ago by cboos
- Status changed from new to assigned
4600 was marked as duplicate (the error message is slightly different - a classical "database is locked" one, but I think it's the same issue).
I think it's time to do something about it...
First, a "quick fix", simply ignoring the exception. Then, as time permits, a more involved fix which would try to do the sync only if no other process or thread has already started the sync.
comment:5 Changed 4 years ago by cboos
- Keywords resync added
#3831 was already filed for this issue, but I forgot about it ;) I marked it as duplicate of this one because the "solution" is explained here with a few more details.
comment:6 Changed 4 years ago by cboos
- Priority changed from high to highest
- Severity changed from normal to major
#3369 and other reports about slowness at startup might be related to the sync() attempts.
comment:7 Changed 4 years ago by cboos
- Summary changed from Python Traceback to Too many concurrent resync attempts
Made the topic of this ticket clearer.
Changed 4 years ago by cboos
- attachment sync_using_youngest_rev_metadata-r4871.diff added
Reworked CachedRepository.sync in order to make it faster and more robust in case of concurrent resync attempts.
comment:8 Changed 4 years ago by cboos
I've written attachment:sync_using_youngest_rev_metadata-r4871.diff which should hopefully be enough to solve this issue (and #3837 + #4043 as well).
- the system table is used to store the youngest_rev cache metadata. This value corresponds to the last changeset which is completely usable. Any revision or node_change information in the database with a rev higher than that is potentially belonging to a sync in progress.
- when the metadata youngest_rev doesn't match the youngest rev given by the Subversion API, we attempt a sync, as before
- but now we try hard to have at most one thread or process attempting this resync, by using a two phase check:
- first the obvious check: if the rev about to be synced is already in the revision table, then someone else was faster than us at doing the sync, let it finish and proceed with the rest of the request with the last "stable" youngest rev (the one from the metadata)
- if no one has yet written in the revision table, we try to do so, with a insert/commit sequence. Due to the primary key on that table, this will hopefully succeed for at most 1 thread or process (I say hopefully here because of the weirdness noted in #4043...); those who failed at step 2. proceed the same way as those having failed at step 1.
- at that point, we're reasonably sure that only one thread or process is doing the sync, so we proceed with the node_change table and continue with all the changesets between the cached youngest_rev and the actual youngest rev;
- finally, we update the youngest_rev metadata
So it looks pretty robust this way, but in practice I've rather seen with pysqlite threads waiting on the one which is doing the resync, so that in the end all requests managed to get the latest view... which is also OK.
Testing and feedback appreciated!
Final notes about a few other changes needed:
- the trac.web changes were needed so that an early error gets propagated to the user (like the TracError raised in sync() called by the pre_request_filter(), warning about the need for an trac-admin upgrade)
- yes, btw, currently a trac-admin upgrade is needed, but it is a "soft" upgrade: the schema is not modified, only the youngest_rev metadata key gets written using the now legacy Repository.youngest_rev_in_cache method. This will eventually be done automatically in a later version of the patch.
Changed 4 years ago by cboos
- attachment sync_using_youngest_rev_metadata-r4871.2.diff added
Take 2': don't require an environment upgrade (do it transparently) and don't substitute the db argument with an env argument in the CachedRepository? constructor (that will be a separate patch). The trac.web changes are also stripped out. (repost)
Changed 4 years ago by cboos
- attachment sync_using_youngest_rev_metadata-r4871.3.diff added
Take 3: fix an error with the log.warning call (spotted by Tim) and make sure that self.youngest is a normalized rev (see #4830)
comment:9 follow-up: ↓ 10 Changed 3 years ago by ThurnerRupert
you know how far the following is usable?
- http://www.python.org/workshops/1997-10/proceedings/atom/atom.html
- http://www.python.org/dev/peps/pep-0343/, especially examle 1, locking
comment:10 in reply to: ↑ 9 Changed 3 years ago by mgood
Replying to ThurnerRupert:
you know how far the following is usable?
- http://www.python.org/workshops/1997-10/proceedings/atom/atom.html
- http://www.python.org/dev/peps/pep-0343/, especially examle 1, locking
Well, we already use locking in places around Trac. The "with" statement is some nice syntactic sugar added in Python 2.5, but you can do the exact same thing with a try/finally block. However, I believe this problem requires syncronizing across processes, but Lock objects only help with synchronization across threads in a single process.
comment:11 Changed 3 years ago by cboos
- Status changed from assigned to closed
- Resolution set to fixed
- Milestone changed from 0.10.5 to 0.10.4
Even more robust version of the patch checked in as r4965 (trunk) and r4966 (0.10-stable).
In contrast with the above patches, the final version doesn't require a trac-admin upgrade. Also, the commit is now done after each revision has been synced "fully", which is slightly less efficient, but more robust.
comment:12 follow-up: ↓ 13 Changed 3 years ago by anonymous
I just got this error, upgraded to 0.10.3.1, restarted my web server, and I still get the error. Do I have to do something else to make it go away?
comment:13 in reply to: ↑ 12 Changed 3 years ago by mgood
Replying to anonymous:
I just got this error, upgraded to 0.10.3.1, restarted my web server, and I still get the error. Do I have to do something else to make it go away?
The changes for this will be in 0.10.4, they are not included in 0.10.3.1. You can use the code from the 0.10-stable branch in the meantime if you want.
comment:14 follow-up: ↓ 15 Changed 3 years ago by anonymous
How to solve the error? My Trac was running fine before and it should be possible to reactivate it. How can I clear the cache? I think the error should be away then.
comment:15 in reply to: ↑ 14 Changed 3 years ago by j.mccune@…
Replying to anonymous:
How to solve the error? My Trac was running fine before and it should be possible to reactivate it. How can I clear the cache? I think the error should be away then.
To correct the issue:
trac-admin /path/to/trac/project/directory resync
comment:16 Changed 2 years ago by kasper.souren@…
We had to resync twice to bring back our Trac. The first time it complained about a database lock-up, but the second time it went fine.



This kind of error should be expected, however we should simply catch it and proceed normally, as this simply means another process or thread has synced the cache in the meantime.
A better solution would be to do the sync in a kind of "atomic" way, though that's not that easy.
See related #4043.