Opened 17 years ago
Closed 15 years ago
#6654 closed defect (fixed)
Changeset view is slow when using sub repository path
Reported by: | Owned by: | Remy Blank | |
---|---|---|---|
Priority: | high | Milestone: | 0.12 |
Component: | version control | Version: | |
Severity: | major | Keywords: | multirepos performance needmajor |
Cc: | Branch: | ||
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
When viewing a changeset restricted to a subpath, when that path hasn't changed for many revisions, the changeset viewer gets extremely slow.
This is using svn 1.3.2 with fsfs repository containing about 160krevs, Trac 0.11dev-r6396.
The reason is changeset.py:424: next_rev = repos.next_rev(chgset.rev, path)
Then looking into svn_fs.py's implementation of next_rev, you can see that it will iterate through every single revision in the repository after the starting revision, one by one, looking for the next revision number which contains a change for this path.
So, for a subpath which hasn't been changed for 50k revisions, this takes just about forever. And all that, just to show a little link in the corner named "next change".
I'm not sure if next_rev can be made faster somehow, as that would obviously be the best solution. But if not, perhaps the actual revision behind the "next change" link should only be computed when you actually click on it.
Attachments (2)
Change History (26)
comment:1 by , 17 years ago
Component: | general → version control |
---|---|
Milestone: | → 0.12 |
Owner: | changed from | to
comment:2 by , 17 years ago
Also, a quick hack would be in order for 0.11 for not even computing that value when the changeset view is requested from the annotate view (as we don't display the links in this case).
comment:3 by , 17 years ago
Priority: | normal → high |
---|
comment:5 by , 16 years ago
Milestone: | 0.13 → 0.11.1 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Yes, and it's follow-up r7306. Thanks for the notice.
comment:6 by , 16 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
It's still *much* slower than it should be.
I added some logging code to the sqlite db backend (wouldn't it be nice if this was available as a log option?) to see why things were going so slowly, and found that the queries used are extremely suboptimal.
What trac currently uses:
time sqlite3 /ita/trac-env/db/trac.db 'SELECT rev FROM node_change WHERE 1*rev > 1000 ORDER BY 1*rev LIMIT 1' 1001 sqlite3 /ita/trac-env/db/trac.db 6.59s user 1.10s system 99% cpu 7.700 total
What it should be doing:
time sqlite3 /ita/trac-env/db/trac.db 'SELECT rev FROM node_change WHERE rev > 1000 ORDER BY rev LIMIT 1' 1001 sqlite3 /ita/trac-env/db/trac.db 0.00s user 0.00s system 0% cpu 0.002 total
(I have 2457822 entries in the node_changes table, if you'd like to reproduce the issue with a similarly sized database).
Here's a patch. It changes the type of the rev column to int, so that the "order by" clause works properly on it without the cast operation, and removes the casts.
Index: db_default.py =================================================================== --- db_default.py (revision 7728) +++ db_default.py (working copy) @@ -83,13 +83,13 @@ # Version control cache Table('revision', key='rev')[ - Column('rev'), + Column('rev', type='int'), Column('time', type='int'), Column('author'), Column('message'), Index(['time'])], Table('node_change', key=('rev', 'path', 'change_type'))[ - Column('rev'), + Column('rev', type='int'), Column('path'), Column('node_type', size=1), Column('change_type', size=1), Index: versioncontrol/cache.py =================================================================== --- versioncontrol/cache.py (revision 7728) +++ versioncontrol/cache.py (working copy) @@ -256,8 +256,8 @@ def _next_prev_rev(self, direction, rev, path=''): db = self.getdb() # the changeset revs are sequence of ints: - sql = "SELECT rev FROM node_change WHERE " + \ - db.cast('rev', 'int') + " " + direction + " %s" + sql = "SELECT rev FROM node_change WHERE rev " + \ + direction + " %s" args = [rev] if path: @@ -279,7 +279,7 @@ sql += " (path in (" + parent_insert + ") and change_type='D')" sql += ")" - sql += " ORDER BY " + db.cast('rev', 'int') + \ + sql += " ORDER BY rev" + \ (direction == '<' and " DESC" or "") + " LIMIT 1" cursor = db.cursor()
follow-ups: 9 12 16 comment:7 by , 16 years ago
The reason why the rev
columns are not of type int
is that other revision control systems don't necessarily have integer revisions (e.g. git, mercurial). Currently, only the Subversion backend actually uses these tables for caching AFAIK, but this is expected to change as part of multi-repository effort.
I wonder if it would be faster to pad integer revision numbers with leading zeros, and not to cast them to integer but to compare them as strings.
comment:8 by , 16 years ago
Keywords: | needinfo added |
---|
Also, as the 1*rev
part struck me as odd, I looked a bit in the code and found about r3699. So it seems that you're using a quite old version of SQLite (pre-3.2.3, so more than 3 years old).
What if you upgrade to a more recent one and let Trac use the CAST()
operator? How does the performance compare in this case with and without the cast?
follow-ups: 10 11 comment:9 by , 16 years ago
Keywords: | needinfo removed |
---|
The reason why the rev columns are not of type int is that other revision control systems don't necessarily have integer revisions (e.g. git, mercurial). Currently, only the Subversion backend actually uses these tables for caching AFAIK, but this is expected to change as part of multi-repository effort.
Well, for the sqlite backend this doesn't matter: the column type is just a hint, so you can still store arbitrary other data in the column. And with a real database, you should be able to create a new index based on the cast expression rather than the raw column. I guess that leaves the in-between databases in the lurch (ahem MySQL) but who'd ever want to use that anyhow, right? :)
I wonder if it would be faster to pad integer revision numbers with leading zeros, and not to cast them to integer but to compare them as strings.
I would expect so. The slowdown is because it can't use the index. But perhaps it makes more sense to just use different tables for different classes of VCS systems…
What if you upgrade to a more recent one and let Trac use the CAST() operator? How does the performance compare in this case with and without the cast?
I ran SELECT rev FROM node_change WHERE cast(rev as int) > 1000 ORDER BY cast(rev as int) LIMIT 1
with sqlite 3.6.7, and that has just as poor performance.
comment:10 by , 16 years ago
Replying to foom@…:
I would expect so. The slowdown is because it can't use the index.
I'll do a quick test to evaluate the performance gain. Sadly, I don't have 2457822 entries in my table for a real-life test.
comment:11 by , 16 years ago
Keywords: | multirepos added |
---|---|
Milestone: | 0.11.1 → 0.12 |
Replying to foom@…:
… But perhaps it makes more sense to just use different tables for different classes of VCS systems…
I think that would be the right thing to do (moving to 0.12 for now).
comment:12 by , 15 years ago
Keywords: | performance added |
---|---|
Severity: | normal → major |
comment:13 by , 15 years ago
(to do after #2086 - MultiRepos integration, in order to not mess up trac/db/upgrades)
comment:15 by , 15 years ago
Keywords: | needmajor added |
---|
comment:16 by , 15 years ago
Replying to rblank:
I wonder if it would be faster to pad integer revision numbers with leading zeros, and not to cast them to integer but to compare them as strings.
And how many zeros to pad? And how many space this will waste?
comment:17 by , 15 years ago
Right! And how many more trees will be killed when printing database dumps?
comment:18 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | reopened → new |
I'll do that right after the multirepos merge.
comment:19 by , 15 years ago
Summary: | Changset view is slow when using sub repository path → Changeset view is slow when using sub repository path |
---|
by , 15 years ago
Attachment: | 6654-padded-revs-r9198+t6466.patch added |
---|
Pad revision numbers to 10 digits.
comment:20 by , 15 years ago
The patch above pads the revision numbers to 10 digits in the revision
and node_change
tables (applies on top of 6466-microsecond-times-9198.patch:ticket:6466). All tests pass, but I haven't measured the performance gain yet.
comment:21 by , 15 years ago
Patch looks great, I've tested the upgrade here on t.e.o (demo-0.12), everything seems to work fine (yeah, I've already said that before ;-) ).
I need to test it on a bigger Subversion repository to see if there's a noticeable speed improvement, but I'm confident there should be one.
If the original reporter could set up a test instance of Trac 0.12, that'd be great (quite easy, just hotcopy your current 0.11 environment, install Trac 0.12 in a virtualenv and use that to upgrade the copy, and run the tests).
comment:22 by , 15 years ago
Back to the original issue and
changeset.py:424: next_rev = repos.next_rev(chgset.rev, path)
Yes, this can effectively take for ever in the direct-svnfs case (see ticket:8813#comment:1).
I'll also try to come up with a fix for the direct-svnfs case.
by , 15 years ago
Attachment: | 6654-padded-revs-r9212.patch added |
---|
Updated patch, added a missing conversion to int
.
comment:23 by , 15 years ago
The patch above contains an a conversion to int
that was missing in the previous patches.
Very true.
A speedup could be obtained by using the cache.
An alternative would be to defer the computation of the next change, as you suggested (see also #2053).