Edgewall Software
Modify

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#12895 closed enhancement (fixed)

Improve performance of sync command with many revisions

Reported by: Ryan J Ollos Owned by: Jun Omae
Priority: normal Milestone: 1.0.17
Component: plugin/git Version:
Severity: normal Keywords: performance
Cc: Branch:
Release Notes:

Improve sync command when git repository has no changes.

API Changes:
Internal Changes:

Description

The patch was proposed in gmessage:trac-dev:2ityvgVZxsQ/KEA6IwY2CAAJ.

  • tracopt/versioncontrol/git/PyGIT.py

    diff --git a/tracopt/versioncontrol/git/PyGIT.py
    b/tracopt/versioncontrol/git/PyGIT.py
    index 47f397389..32aac9184 100644
    a b class Storage(object):  
    596596                          key=lambda (name, rev, head): (not head, name))
    597597        return [(name, rev) for name, rev, head in branches]
    598598
     599    def get_refs(self):
     600        for refname, rev in self.rev_cache.refs_dict.iteritems():
     601            if refname != 'HEAD':
     602                yield refname, rev
     603
    599604    def get_commits(self):
    600605        return self.rev_cache.rev_dict
    601606
  • tracopt/versioncontrol/git/git_fs.py

    diff --git a/tracopt/versioncontrol/git/git_fs.py
    b/tracopt/versioncontrol/git/git_fs.py
    index 031f68c2b..c33eb7a5d 100644
    a b class GitCachedRepository(CachedRepository):  
    100100                return count > 0
    101101            return False
    102102
     103        def needs_sync():
     104            max_holders = 999
     105            revs = sorted(set(rev for refname, rev in repos.git.get_refs()))
     106            for idx in xrange(0, len(revs), max_holders):
     107                revs_ = revs[idx:idx + max_holders]
     108                holders = ','.join(('%s',) * len(revs_))
     109                args = [self.id]
     110                args.extend(revs_)
     111                query = 'SELECT COUNT(*) FROM revision ' \
     112                        'WHERE repos=%s AND rev IN (' + holders + ')'
     113                for count, in self.env.db_query(query, args):
     114                    if count < len(revs_):
     115                        return True
     116            return False
     117
    103118        def traverse(rev, seen):
    104119            revs = []
    105120            merge_revs = []
    class GitCachedRepository(CachedRepository):  
    121136                    revs[idx:idx] = traverse(rev, seen)
    122137            return revs
    123138
    124         while True:
    125             repos.sync()
    126             repos_youngest = repos.youngest_rev or ''
     139        def sync_revs():
    127140            updated = False
    128141            seen = set()
    129142
    class GitCachedRepository(CachedRepository):  
    148161                    if feedback:
    149162                        feedback(rev)
    150163
    151             if updated:
    152                 continue  # sync again
     164            return updated
    153165
     166        while True:
     167            repos.sync()
     168            repos_youngest = repos.youngest_rev or ''
     169            if needs_sync() and sync_revs():
     170                continue  # sync again
    154171            if meta_youngest != repos_youngest:
    155172                with self.env.db_transaction as db:
    156173                    db("""

Attachments (0)

Change History (8)

comment:1 by figaro, 3 years ago

Keywords: performance added

comment:2 by Ryan J Ollos, 3 years ago

On 1.0-stable, timing execution of sync:

  • trac/versioncontrol/api.py

    diff --git a/trac/versioncontrol/api.py b/trac/versioncontrol/api.py
    index 94d5a4b7b..5807fc715 100644
    a b class RepositoryManager(Component):  
    376376                try:
    377377                    repo = self.get_repository(reponame)
    378378                    if repo:
     379                        from timeit import default_timer as timer
     380                        start = timer()
    379381                        repo.sync()
     382                        end = timer()
     383                        print(end - start)
    380384                    else:
    381385                        self.log.warning("Unable to find repository '%s' for "
    382386                                         "synchronization",

I did some rough profiling with teo repository configured for sync per request. This is just refreshing the page and no new revisions in repository.

rjollos.git:t12895_repos_sync_performance:

cached_repository = False cached_repository = True
persistent_cache = False 0.5s 0.5s
persistent_cache = True 0.04s 0.05s

1.0-stable:

cached_repository = False cached_repository = True
persistent_cache = False 0.5s 0.8s
persistent_cache = True 0.04s 0.4s

comment:3 by Ryan J Ollos, 3 years ago

Owner: set to Ryan J Ollos
Status: newassigned

comment:4 by Jun Omae, 3 years ago

My proposed changes has off-by-one error. See [a9fd49b97/jomae.git] (jomae.git@t12895).

comment:5 by Ryan J Ollos, 3 years ago

Owner: changed from Ryan J Ollos to Jun Omae

comment:6 by Jun Omae, 3 years ago

Max place-holders in sqlite3 is 999 on Windows, however it is 250000 on Ubuntu and Debian.

Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from trac.test import EnvironmentStub
>>> env = EnvironmentStub()
>>> def q(n):
...     query = 'SELECT * FROM system WHERE 0 IN ({0})'.format(','.join(('%s',) * n))
...     args = list(xrange(n))
...     return env.db_query(query, args)
...
>>> q(999)
[(u'database_version', u'29')]
>>> q(1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in q
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\api.py", line 124, in execute
    return db.execute(query, params)
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\util.py", line 128, in execute
    cursor.execute(query, params if params is not None else [])
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\util.py", line 61, in execute
    r = self.cursor.execute(sql_escape_percent(sql), args)
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\sqlite_backend.py", line 82, in execute
    result = PyFormatCursor.execute(self, *args)
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\sqlite_backend.py", line 60, in execute
    args or [])
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\sqlite_backend.py", line 52, in _rollback_on_error
    return function(self, *args, **kwargs)
sqlite3.OperationalError: too many SQL variables
>>>
Python 2.5.6 (r256:88840, Oct 21 2014, 22:49:55)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from trac.test import EnvironmentStub
>>> env = EnvironmentStub()
>>> def q(n):
...     query = 'SELECT * FROM system WHERE 0 IN (%s)' % ','.join(('%s',) * n)
...     args = list(xrange(n))
...     return env.db_query(query, args)
...
>>> q(999)
[(u'database_version', u'29')]
>>> q(1000)
[(u'database_version', u'29')]
>>> q(250000)
[(u'database_version', u'29')]
>>> q(250001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in q
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/api.py", line 124, in execute
    return db.execute(query, params)
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/util.py", line 128, in execute
    cursor.execute(query, params if params is not None else [])
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/util.py", line 61, in execute
    r = self.cursor.execute(sql_escape_percent(sql), args)
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/sqlite_backend.py", line 82, in execute
    result = PyFormatCursor.execute(self, *args)
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/sqlite_backend.py", line 60, in execute
    args or [])
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/sqlite_backend.py", line 52, in _rollback_on_error
    return function(self, *args, **kwargs)
sqlite3.OperationalError: too many SQL variables

comment:7 by Jun Omae, 3 years ago

Release Notes: modified (diff)
Resolution: fixed
Status: assignedclosed

Committed in [16289] and merged in [16290-16291].

comment:8 by Ryan J Ollos, 3 years ago

Milestone: 1.0.161.0.17

Milestone renamed

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Jun Omae.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Jun Omae to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.