Edgewall Software

Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#12895 closed enhancement (fixed)

Improve performance of sync command with many revisions — at Version 7

Reported by: Ryan J Ollos Owned by: Jun Omae
Priority: normal Milestone: 1.0.17
Component: plugin/git Version:
Severity: normal Keywords: performance
Cc: Branch:
Release Notes:

Improve sync command when git repository has no changes.

API Changes:
Internal Changes:

Description

The patch was proposed in gmessage:trac-dev:2ityvgVZxsQ/KEA6IwY2CAAJ.

  • tracopt/versioncontrol/git/PyGIT.py

    diff --git a/tracopt/versioncontrol/git/PyGIT.py
    b/tracopt/versioncontrol/git/PyGIT.py
    index 47f397389..32aac9184 100644
    a b class Storage(object):  
    596596                          key=lambda (name, rev, head): (not head, name))
    597597        return [(name, rev) for name, rev, head in branches]
    598598
     599    def get_refs(self):
     600        for refname, rev in self.rev_cache.refs_dict.iteritems():
     601            if refname != 'HEAD':
     602                yield refname, rev
     603
    599604    def get_commits(self):
    600605        return self.rev_cache.rev_dict
    601606
  • tracopt/versioncontrol/git/git_fs.py

    diff --git a/tracopt/versioncontrol/git/git_fs.py
    b/tracopt/versioncontrol/git/git_fs.py
    index 031f68c2b..c33eb7a5d 100644
    a b class GitCachedRepository(CachedRepository):  
    100100                return count > 0
    101101            return False
    102102
     103        def needs_sync():
     104            max_holders = 999
     105            revs = sorted(set(rev for refname, rev in repos.git.get_refs()))
     106            for idx in xrange(0, len(revs), max_holders):
     107                revs_ = revs[idx:idx + max_holders]
     108                holders = ','.join(('%s',) * len(revs_))
     109                args = [self.id]
     110                args.extend(revs_)
     111                query = 'SELECT COUNT(*) FROM revision ' \
     112                        'WHERE repos=%s AND rev IN (' + holders + ')'
     113                for count, in self.env.db_query(query, args):
     114                    if count < len(revs_):
     115                        return True
     116            return False
     117
    103118        def traverse(rev, seen):
    104119            revs = []
    105120            merge_revs = []
    class GitCachedRepository(CachedRepository):  
    121136                    revs[idx:idx] = traverse(rev, seen)
    122137            return revs
    123138
    124         while True:
    125             repos.sync()
    126             repos_youngest = repos.youngest_rev or ''
     139        def sync_revs():
    127140            updated = False
    128141            seen = set()
    129142
    class GitCachedRepository(CachedRepository):  
    148161                    if feedback:
    149162                        feedback(rev)
    150163
    151             if updated:
    152                 continue  # sync again
     164            return updated
    153165
     166        while True:
     167            repos.sync()
     168            repos_youngest = repos.youngest_rev or ''
     169            if needs_sync() and sync_revs():
     170                continue  # sync again
    154171            if meta_youngest != repos_youngest:
    155172                with self.env.db_transaction as db:
    156173                    db("""

Change History (7)

comment:1 by figaro, 7 years ago

Keywords: performance added

comment:2 by Ryan J Ollos, 7 years ago

On 1.0-stable, timing execution of sync:

  • trac/versioncontrol/api.py

    diff --git a/trac/versioncontrol/api.py b/trac/versioncontrol/api.py
    index 94d5a4b7b..5807fc715 100644
    a b class RepositoryManager(Component):  
    376376                try:
    377377                    repo = self.get_repository(reponame)
    378378                    if repo:
     379                        from timeit import default_timer as timer
     380                        start = timer()
    379381                        repo.sync()
     382                        end = timer()
     383                        print(end - start)
    380384                    else:
    381385                        self.log.warning("Unable to find repository '%s' for "
    382386                                         "synchronization",

I did some rough profiling with teo repository configured for sync per request. This is just refreshing the page and no new revisions in repository.

rjollos.git:t12895_repos_sync_performance:

cached_repository = False cached_repository = True
persistent_cache = False 0.5s 0.5s
persistent_cache = True 0.04s 0.05s

1.0-stable:

cached_repository = False cached_repository = True
persistent_cache = False 0.5s 0.8s
persistent_cache = True 0.04s 0.4s

comment:3 by Ryan J Ollos, 7 years ago

Owner: set to Ryan J Ollos
Status: newassigned

comment:4 by Jun Omae, 7 years ago

My proposed changes has off-by-one error. See [a9fd49b97/jomae.git] (jomae.git@t12895).

comment:5 by Ryan J Ollos, 7 years ago

Owner: changed from Ryan J Ollos to Jun Omae

comment:6 by Jun Omae, 7 years ago

Max place-holders in sqlite3 is 999 on Windows, however it is 250000 on Ubuntu and Debian.

Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from trac.test import EnvironmentStub
>>> env = EnvironmentStub()
>>> def q(n):
...     query = 'SELECT * FROM system WHERE 0 IN ({0})'.format(','.join(('%s',) * n))
...     args = list(xrange(n))
...     return env.db_query(query, args)
...
>>> q(999)
[(u'database_version', u'29')]
>>> q(1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in q
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\api.py", line 124, in execute
    return db.execute(query, params)
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\util.py", line 128, in execute
    cursor.execute(query, params if params is not None else [])
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\util.py", line 61, in execute
    r = self.cursor.execute(sql_escape_percent(sql), args)
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\sqlite_backend.py", line 82, in execute
    result = PyFormatCursor.execute(self, *args)
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\sqlite_backend.py", line 60, in execute
    args or [])
  File "C:\venv\trac-1.0.15\lib\site-packages\trac\db\sqlite_backend.py", line 52, in _rollback_on_error
    return function(self, *args, **kwargs)
sqlite3.OperationalError: too many SQL variables
>>>
Python 2.5.6 (r256:88840, Oct 21 2014, 22:49:55)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from trac.test import EnvironmentStub
>>> env = EnvironmentStub()
>>> def q(n):
...     query = 'SELECT * FROM system WHERE 0 IN (%s)' % ','.join(('%s',) * n)
...     args = list(xrange(n))
...     return env.db_query(query, args)
...
>>> q(999)
[(u'database_version', u'29')]
>>> q(1000)
[(u'database_version', u'29')]
>>> q(250000)
[(u'database_version', u'29')]
>>> q(250001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in q
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/api.py", line 124, in execute
    return db.execute(query, params)
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/util.py", line 128, in execute
    cursor.execute(query, params if params is not None else [])
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/util.py", line 61, in execute
    r = self.cursor.execute(sql_escape_percent(sql), args)
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/sqlite_backend.py", line 82, in execute
    result = PyFormatCursor.execute(self, *args)
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/sqlite_backend.py", line 60, in execute
    args or [])
  File "/venv/trac/1.0.15/lib/python2.5/site-packages/trac/db/sqlite_backend.py", line 52, in _rollback_on_error
    return function(self, *args, **kwargs)
sqlite3.OperationalError: too many SQL variables

comment:7 by Jun Omae, 7 years ago

Release Notes: modified (diff)
Resolution: fixed
Status: assignedclosed

Committed in [16289] and merged in [16290-16291].

Note: See TracTickets for help on using tickets.