Edgewall Software
Modify

Opened 10 years ago

Closed 8 years ago

#6654 closed defect (fixed)

Changeset view is slow when using sub repository path

Reported by: foom@… Owned by: Remy Blank
Priority: high Milestone: 0.12
Component: version control Version:
Severity: major Keywords: multirepos performance needmajor
Cc:
Release Notes:
API Changes:

Description

When viewing a changeset restricted to a subpath, when that path hasn't changed for many revisions, the changeset viewer gets extremely slow.

This is using svn 1.3.2 with fsfs repository containing about 160krevs, Trac 0.11dev-r6396.

The reason is changeset.py:424: next_rev = repos.next_rev(chgset.rev, path)

Then looking into svn_fs.py's implementation of next_rev, you can see that it will iterate through every single revision in the repository after the starting revision, one by one, looking for the next revision number which contains a change for this path.

So, for a subpath which hasn't been changed for 50k revisions, this takes just about forever. And all that, just to show a little link in the corner named "next change".

I'm not sure if next_rev can be made faster somehow, as that would obviously be the best solution. But if not, perhaps the actual revision behind the "next change" link should only be computed when you actually click on it.

Attachments (2)

6654-padded-revs-r9198+t6466.patch (13.0 KB ) - added by Remy Blank 8 years ago.
Pad revision numbers to 10 digits.
6654-padded-revs-r9212.patch (13.2 KB ) - added by Remy Blank 8 years ago.
Updated patch, added a missing conversion to int.

Download all attachments as: .zip

Change History (26)

comment:1 Changed 10 years ago by Christian Boos

Component: generalversion control
Milestone: 0.12
Owner: changed from Jonas Borgström to Christian Boos

Very true.

A speedup could be obtained by using the cache.

An alternative would be to defer the computation of the next change, as you suggested (see also #2053).

comment:2 Changed 10 years ago by Christian Boos

Also, a quick hack would be in order for 0.11 for not even computing that value when the changeset view is requested from the annotate view (as we don't display the links in this case).

comment:3 Changed 10 years ago by Christian Boos

Priority: normalhigh

comment:4 Changed 9 years ago by anonymous

This was fixed by r6898, wasn't it?

comment:5 Changed 9 years ago by Christian Boos

Milestone: 0.130.11.1
Resolution: fixed
Status: newclosed

Yes, and it's follow-up r7306. Thanks for the notice.

comment:6 Changed 9 years ago by foom@…

Resolution: fixed
Status: closedreopened

It's still *much* slower than it should be.

I added some logging code to the sqlite db backend (wouldn't it be nice if this was available as a log option?) to see why things were going so slowly, and found that the queries used are extremely suboptimal.

What trac currently uses:

time sqlite3 /ita/trac-env/db/trac.db 'SELECT rev FROM node_change WHERE 1*rev > 1000 ORDER BY 1*rev LIMIT 1'
1001
sqlite3 /ita/trac-env/db/trac.db   6.59s user 1.10s system 99% cpu 7.700 total

What it should be doing:

time sqlite3 /ita/trac-env/db/trac.db 'SELECT rev FROM node_change WHERE rev > 1000 ORDER BY rev LIMIT 1' 
1001
sqlite3 /ita/trac-env/db/trac.db   0.00s user 0.00s system 0% cpu 0.002 total

(I have 2457822 entries in the node_changes table, if you'd like to reproduce the issue with a similarly sized database).

Here's a patch. It changes the type of the rev column to int, so that the "order by" clause works properly on it without the cast operation, and removes the casts.

Index: db_default.py
===================================================================
--- db_default.py	(revision 7728)
+++ db_default.py	(working copy)
@@ -83,13 +83,13 @@
 
     # Version control cache
     Table('revision', key='rev')[
-        Column('rev'),
+        Column('rev', type='int'),
         Column('time', type='int'),
         Column('author'),
         Column('message'),
         Index(['time'])],
     Table('node_change', key=('rev', 'path', 'change_type'))[
-        Column('rev'),
+        Column('rev', type='int'),
         Column('path'),
         Column('node_type', size=1),
         Column('change_type', size=1),
Index: versioncontrol/cache.py
===================================================================
--- versioncontrol/cache.py	(revision 7728)
+++ versioncontrol/cache.py	(working copy)
@@ -256,8 +256,8 @@
     def _next_prev_rev(self, direction, rev, path=''):
         db = self.getdb()
         # the changeset revs are sequence of ints:
-        sql = "SELECT rev FROM node_change WHERE " + \
-              db.cast('rev', 'int') + " " + direction + " %s"
+        sql = "SELECT rev FROM node_change WHERE rev " + \
+              direction + " %s"
         args = [rev]
 
         if path:
@@ -279,7 +279,7 @@
             sql += " (path in (" + parent_insert + ") and change_type='D')"
             sql += ")"
 
-        sql += " ORDER BY " + db.cast('rev', 'int') + \
+        sql += " ORDER BY rev" + \
                 (direction == '<' and " DESC" or "") + " LIMIT 1"
         
         cursor = db.cursor()

comment:7 Changed 9 years ago by Remy Blank

The reason why the rev columns are not of type int is that other revision control systems don't necessarily have integer revisions (e.g. git, mercurial). Currently, only the Subversion backend actually uses these tables for caching AFAIK, but this is expected to change as part of multi-repository effort.

I wonder if it would be faster to pad integer revision numbers with leading zeros, and not to cast them to integer but to compare them as strings.

comment:8 Changed 9 years ago by Christian Boos

Keywords: needinfo added

Also, as the 1*rev part struck me as odd, I looked a bit in the code and found about r3699. So it seems that you're using a quite old version of SQLite (pre-3.2.3, so more than 3 years old).

What if you upgrade to a more recent one and let Trac use the CAST() operator? How does the performance compare in this case with and without the cast?

comment:9 in reply to:  7 ; Changed 9 years ago by foom@…

Keywords: needinfo removed

The reason why the rev columns are not of type int is that other revision control systems don't necessarily have integer revisions (e.g. git, mercurial). Currently, only the Subversion backend actually uses these tables for caching AFAIK, but this is expected to change as part of multi-repository effort.

Well, for the sqlite backend this doesn't matter: the column type is just a hint, so you can still store arbitrary other data in the column. And with a real database, you should be able to create a new index based on the cast expression rather than the raw column. I guess that leaves the in-between databases in the lurch (ahem MySQL) but who'd ever want to use that anyhow, right? :)

I wonder if it would be faster to pad integer revision numbers with leading zeros, and not to cast them to integer but to compare them as strings.

I would expect so. The slowdown is because it can't use the index. But perhaps it makes more sense to just use different tables for different classes of VCS systems…

What if you upgrade to a more recent one and let Trac use the CAST() operator? How does the performance compare in this case with and without the cast?

I ran SELECT rev FROM node_change WHERE cast(rev as int) > 1000 ORDER BY cast(rev as int) LIMIT 1 with sqlite 3.6.7, and that has just as poor performance.

comment:10 in reply to:  9 Changed 9 years ago by Remy Blank

Replying to foom@…:

I would expect so. The slowdown is because it can't use the index.

I'll do a quick test to evaluate the performance gain. Sadly, I don't have 2457822 entries in my table for a real-life test.

comment:11 in reply to:  9 Changed 9 years ago by Christian Boos

Keywords: multirepos added
Milestone: 0.11.10.12

Replying to foom@…:

… But perhaps it makes more sense to just use different tables for different classes of VCS systems…

I think that would be the right thing to do (moving to 0.12 for now).

comment:12 in reply to:  7 Changed 8 years ago by Christian Boos

Keywords: performance added
Severity: normalmajor

Replying to rblank:

I wonder if it would be faster to pad integer revision numbers with leading zeros, and not to cast them to integer but to compare them as strings.

Let's do that for 0.12.

Besides _next_prev_rev, this would also allow us to get rid of the cast in _get_node_revs from r8310.

comment:13 Changed 8 years ago by Christian Boos

(to do after #2086 - MultiRepos integration, in order to not mess up trac/db/upgrades)

comment:14 Changed 8 years ago by Christian Boos

(leaving this on 0.12 proper intentionally)

comment:15 Changed 8 years ago by Christian Boos

Keywords: needmajor added

comment:16 in reply to:  7 Changed 8 years ago by anonymous

Replying to rblank:

I wonder if it would be faster to pad integer revision numbers with leading zeros, and not to cast them to integer but to compare them as strings.

And how many zeros to pad? And how many space this will waste?

comment:17 Changed 8 years ago by Christian Boos

Right! And how many more trees will be killed when printing database dumps?

comment:18 Changed 8 years ago by Remy Blank

Owner: changed from Christian Boos to Remy Blank
Status: reopenednew

I'll do that right after the multirepos merge.

comment:19 Changed 8 years ago by Christian Boos

Summary: Changset view is slow when using sub repository pathChangeset view is slow when using sub repository path

Changed 8 years ago by Remy Blank

Pad revision numbers to 10 digits.

comment:20 Changed 8 years ago by Remy Blank

The patch above pads the revision numbers to 10 digits in the revision and node_change tables (applies on top of 6466-microsecond-times-9198.patch:ticket:6466). All tests pass, but I haven't measured the performance gain yet.

comment:21 Changed 8 years ago by Christian Boos

Patch looks great, I've tested the upgrade here on t.e.o (demo-0.12), everything seems to work fine (yeah, I've already said that before ;-) ).

I need to test it on a bigger Subversion repository to see if there's a noticeable speed improvement, but I'm confident there should be one.

If the original reporter could set up a test instance of Trac 0.12, that'd be great (quite easy, just hotcopy your current 0.11 environment, install Trac 0.12 in a virtualenv and use that to upgrade the copy, and run the tests).

comment:22 Changed 8 years ago by Christian Boos

Back to the original issue and

changeset.py:424: next_rev = repos.next_rev(chgset.rev, path)

Yes, this can effectively take for ever in the direct-svnfs case (see ticket:8813#comment:1).

I'll also try to come up with a fix for the direct-svnfs case.

Changed 8 years ago by Remy Blank

Updated patch, added a missing conversion to int.

comment:23 Changed 8 years ago by Remy Blank

The patch above contains an a conversion to int that was missing in the previous patches.

comment:24 Changed 8 years ago by Remy Blank

Resolution: fixed
Status: newclosed

Patch applied in [9224].

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Remy Blank.
The resolution will be deleted.
to The owner will be changed from Remy Blank to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .

 
Note: See TracTickets for help on using tickets.