Edgewall Software

Ticket #4586 (closed defect: fixed)

Opened 19 months ago

Last modified 3 months ago

Too many concurrent resync attempts

Reported by: anonymous Owned by: cboos
Priority: highest Milestone: 0.10.4
Component: version control Version: 0.10.3
Severity: major Keywords: cache resync
Cc:

Description (last modified by cboos) (diff)

Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 387, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 191, in dispatch
    chosen_handler = self._pre_process_request(req, chosen_handler)
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 263, in _pre_process_request
    chosen_handler = f.pre_process_request(req, chosen_handler)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 73, in pre_process_request
    self.get_repository(req.authname) # triggers a sync if applicable
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 101, in get_repository
    repos = self._connector.get_repository(rtype, rdir, authname)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/svn_fs.py", line 260, in get_repository
    crepos = CachedRepository(self.env.get_db_cnx(), repos, None, self.log)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 34, in __init__
    self.sync()
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 100, in sync
    (str(current_rev), path, kind, action,
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 50, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/trac/db/sqlite_backend.py", line 56, in execute
    args or [])
  File "/usr/lib/python2.3/site-packages/trac/db/sqlite_backend.py", line 48, in _rollback_on_error
    return function(self, *args, **kwargs)
IntegrityError: columns rev, path, change_type are not unique

Attachments

sync_using_youngest_rev_metadata-r4871.diff (22.2 kB) - added by cboos 18 months ago.
Reworked CachedRepository.sync in order to make it faster and more robust in case of concurrent resync attempts.
sync_using_youngest_rev_metadata-r4871.2.diff (13.8 kB) - added by cboos 18 months ago.
Take 2': don't require an environment upgrade (do it transparently) and don't substitute the db argument with an env argument in the CachedRepository? constructor (that will be a separate patch). The trac.web changes are also stripped out. (repost)
sync_using_youngest_rev_metadata-r4871.3.diff (14.0 kB) - added by cboos 18 months ago.
Take 3: fix an error with the log.warning call (spotted by Tim) and make sure that self.youngest is a normalized rev (see #4830)

Change History

  Changed 19 months ago by cboos

  • description modified (diff)
  • component changed from general to version control
  • priority changed from normal to high
  • owner changed from jonas to cboos
  • milestone set to 0.10.4
  • keywords cache added

This kind of error should be expected, however we should simply catch it and proceed normally, as this simply means another process or thread has synced the cache in the meantime.

A better solution would be to do the sync in a kind of "atomic" way, though that's not that easy.

See related #4043.

  Changed 19 months ago by cboos

#4592 was marked as duplicate.

  Changed 19 months ago by cboos

  • status changed from new to assigned

4600 was marked as duplicate (the error message is slightly different - a classical "database is locked" one, but I think it's the same issue).

I think it's time to do something about it...

First, a "quick fix", simply ignoring the exception. Then, as time permits, a more involved fix which would try to do the sync only if no other process or thread has already started the sync.

  Changed 19 months ago by cboos

Quick fix added in r4662 (trunk) and r4664 (0.10-stable). This should already improve the situation for those having the problem. Please test.

I'll try to come up with a patch to improve the situation further, by avoiding the sync if one can detect that it's already in progress.

  Changed 19 months ago by cboos

  • keywords resync added

#3831 was already filed for this issue, but I forgot about it ;) I marked it as duplicate of this one because the "solution" is explained here with a few more details.

  Changed 19 months ago by cboos

  • priority changed from high to highest
  • severity changed from normal to major

#3369 and other reports about slowness at startup might be related to the sync() attempts.

  Changed 19 months ago by cboos

  • summary changed from Python Traceback to Too many concurrent resync attempts

Made the topic of this ticket clearer.

Changed 18 months ago by cboos

Reworked CachedRepository.sync in order to make it faster and more robust in case of concurrent resync attempts.

  Changed 18 months ago by cboos

I've written attachment:sync_using_youngest_rev_metadata-r4871.diff which should hopefully be enough to solve this issue (and #3837 + #4043 as well).

  • the system table is used to store the youngest_rev cache metadata. This value corresponds to the last changeset which is completely usable. Any revision or node_change information in the database with a rev higher than that is potentially belonging to a sync in progress.
  • when the metadata youngest_rev doesn't match the youngest rev given by the Subversion API, we attempt a sync, as before
  • but now we try hard to have at most one thread or process attempting this resync, by using a two phase check:
    1. first the obvious check: if the rev about to be synced is already in the revision table, then someone else was faster than us at doing the sync, let it finish and proceed with the rest of the request with the last "stable" youngest rev (the one from the metadata)
    2. if no one has yet written in the revision table, we try to do so, with a insert/commit sequence. Due to the primary key on that table, this will hopefully succeed for at most 1 thread or process (I say hopefully here because of the weirdness noted in #4043...); those who failed at step 2. proceed the same way as those having failed at step 1.
    3. at that point, we're reasonably sure that only one thread or process is doing the sync, so we proceed with the node_change table and continue with all the changesets between the cached youngest_rev and the actual youngest rev;
    4. finally, we update the youngest_rev metadata

So it looks pretty robust this way, but in practice I've rather seen with pysqlite threads waiting on the one which is doing the resync, so that in the end all requests managed to get the latest view... which is also OK.

Testing and feedback appreciated!

Final notes about a few other changes needed:

  • the trac.web changes were needed so that an early error gets propagated to the user (like the TracError raised in sync() called by the pre_request_filter(), warning about the need for an trac-admin upgrade)
  • yes, btw, currently a trac-admin upgrade is needed, but it is a "soft" upgrade: the schema is not modified, only the youngest_rev metadata key gets written using the now legacy Repository.youngest_rev_in_cache method. This will eventually be done automatically in a later version of the patch.

Changed 18 months ago by cboos

Take 2': don't require an environment upgrade (do it transparently) and don't substitute the db argument with an env argument in the CachedRepository? constructor (that will be a separate patch). The trac.web changes are also stripped out. (repost)

Changed 18 months ago by cboos

Take 3: fix an error with the log.warning call (spotted by Tim) and make sure that self.youngest is a normalized rev (see #4830)

follow-up: ↓ 10   Changed 18 months ago by ThurnerRupert

you know how far the following is usable?

in reply to: ↑ 9   Changed 18 months ago by mgood

Replying to ThurnerRupert:

you know how far the following is usable? * http://www.python.org/workshops/1997-10/proceedings/atom/atom.html * http://www.python.org/dev/peps/pep-0343/, especially examle 1, locking

Well, we already use locking in places around Trac. The "with" statement is some nice syntactic sugar added in Python 2.5, but you can do the exact same thing with a try/finally block. However, I believe this problem requires syncronizing across processes, but Lock objects only help with synchronization across threads in a single process.

  Changed 18 months ago by cboos

  • status changed from assigned to closed
  • resolution set to fixed
  • milestone changed from 0.10.5 to 0.10.4

Even more robust version of the patch checked in as r4965 (trunk) and r4966 (0.10-stable).

In contrast with the above patches, the final version doesn't require a trac-admin upgrade. Also, the commit is now done after each revision has been synced "fully", which is slightly less efficient, but more robust.

follow-up: ↓ 13   Changed 18 months ago by anonymous

I just got this error, upgraded to 0.10.3.1, restarted my web server, and I still get the error. Do I have to do something else to make it go away?

in reply to: ↑ 12   Changed 18 months ago by mgood

Replying to anonymous:

I just got this error, upgraded to 0.10.3.1, restarted my web server, and I still get the error. Do I have to do something else to make it go away?

The changes for this will be in 0.10.4, they are not included in 0.10.3.1. You can use the code from the 0.10-stable branch in the meantime if you want.

follow-up: ↓ 15   Changed 15 months ago by anonymous

How to solve the error? My Trac was running fine before and it should be possible to reactivate it. How can I clear the cache? I think the error should be away then.

in reply to: ↑ 14   Changed 12 months ago by j.mccune@…

Replying to anonymous:

How to solve the error? My Trac was running fine before and it should be possible to reactivate it. How can I clear the cache? I think the error should be away then.

To correct the issue:

trac-admin /path/to/trac/project/directory resync

  Changed 3 months ago by kasper.souren@…

We had to resync twice to bring back our Trac. The first time it complained about a database lock-up, but the second time it went fine.

Add/Change #4586 (Too many concurrent resync attempts)

Author



Change Properties
<Author field>
Action
as closed
Next status will be 'reopened'
 
Note: See TracTickets for help on using tickets.