Opened 15 years ago
Last modified 6 months ago
#8813 new defect
next_rev is slow, particularly in the direct-svnfs case
Reported by: | Christian Boos | Owned by: | anonymous |
---|---|---|---|
Priority: | highest | Milestone: | next-major-releases |
Component: | version control | Version: | 0.11 |
Severity: | major | Keywords: | slow performance newcache svn |
Cc: | Jun Omae, cronos586@… | Branch: | |
Release Notes: | |
API Changes: | |
Internal Changes: | |
Description
This is a continuation of #7490, which became too "big" to be usable. Please keep that other ticket closed.
A lot of possible reasons for bad performance have been identified, some of them fixed, most being due to installation and configuration issues. Please refer to the TracPerformance checklist, and help maintain that wiki page with your findings.
This ticket should be used to help identify the remaining performance bottlenecks, apart from those which are already known (see TracDev/Performance).
Attachments (0)
Change History (34)
comment:1 by , 15 years ago
comment:2 by , 15 years ago
Milestone: | 0.11.7 → next-minor-0.12.x |
---|
comment:3 by , 14 years ago
Cc: | added |
---|
comment:4 by , 14 years ago
Component: | general → version control |
---|---|
Summary: | Trac 0.11 really slow, high cpu usage (continued) → next_rev is slow, particularly in the direct-svnfs case |
The general topic "Trac 0.11 really slow, high cpu usage (continued)", and in particular most of the remaining unsolved cases from #7490, have now been addressed as part of #9111 by r10002. `Condition.wait(timeout)` is a performance killer in Python, as it does a kind of "active" wait. This seems to have been fixed in Python 3.2, which uses the appropriate library call (see PythonBug:3155). Never assume the underlying system does the "right thing", but look at the code to be sure…
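As an illustration of the work-around (a minimal sketch, not Trac code): prefer an untimed `wait()` paired with an explicit `notify()` over `wait(timeout)`, which Python versions before 3.2 emulate with a sleep/poll loop.

```python
import threading

cond = threading.Condition()
results = []

def producer():
    with cond:
        results.append('done')
        cond.notify()       # wakes the waiter immediately, no polling involved

def consumer():
    with cond:
        while not results:
            cond.wait()     # untimed wait: blocks on the lock instead of sleep-polling
        return results.pop()

threading.Thread(target=producer).start()
print(consumer())
```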
Retargeting this ticket to fix comment:1.
follow-up: 6 comment:5 by , 13 years ago
Hi,
I installed a Trac server about a year ago (0.11). The system that's running our Trac instances is a pretty fast machine with 16 GB of memory, so that shouldn't be the problem.
I've been noticing that Trac would take between 3 and 5 seconds to fully load a ticket. I cleaned up one plugin that was causing problems, but I'm still left with a system that is uber-slow.
I've been fishing through the Trac logs and I've found something that raises an eyebrow.
```
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/Trac-0.12.2-py2.6.egg/trac/web/main.py", line 512, in _dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.6/site-packages/Trac-0.12.2-py2.6.egg/trac/web/main.py", line 238, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.6/site-packages/TicketExtPlugin-0.3.2_r7776-py2.6.egg/ticketext/template.py", line 93, in process_request
    template_api.process_tickettemplate(self.env, req, 'type')
  File "/usr/lib/python2.6/site-packages/TicketExtPlugin-0.3.2_r7776-py2.6.egg/ticketext/api.py", line 62, in process_tickettemplate
    req.write(response)
  File "/usr/lib/python2.6/site-packages/Trac-0.12.2-py2.6.egg/trac/web/api.py", line 534, in write
    raise ValueError("Can't send unicode content")
ValueError: Can't send unicode content
```
Could this be causing something? I noticed that it says Trac 0.12, which I didn't think I was running on this system. As said, I thought my version was 0.11.
Any help would be great,
Thanks!
comment:6 by , 13 years ago
Replying to jvilla@…:
Hi,
I installed a Trac server about a year ago (0.11). The system that's running our Trac instances is a pretty fast machine with 16 GB of memory, so that shouldn't be the problem.
I've been noticing that Trac would take between 3 and 5 seconds to fully load a ticket. I cleaned up one plugin that was causing problems, but I'm still left with a system that is uber-slow.
I've been fishing through the Trac logs and I've found something that raises an eyebrow.
```
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/Trac-0.12.2-py2.6.egg/trac/web/main.py", line 512, in _dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.6/site-packages/Trac-0.12.2-py2.6.egg/trac/web/main.py", line 238, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.6/site-packages/TicketExtPlugin-0.3.2_r7776-py2.6.egg/ticketext/template.py", line 93, in process_request
    template_api.process_tickettemplate(self.env, req, 'type')
  File "/usr/lib/python2.6/site-packages/TicketExtPlugin-0.3.2_r7776-py2.6.egg/ticketext/api.py", line 62, in process_tickettemplate
    req.write(response)
  File "/usr/lib/python2.6/site-packages/Trac-0.12.2-py2.6.egg/trac/web/api.py", line 534, in write
    raise ValueError("Can't send unicode content")
ValueError: Can't send unicode content
```

Could this be causing something? I noticed that it says Trac 0.12, which I didn't think I was running on this system. As said, I thought my version was 0.11.
This is obviously an InstallationIssue… nothing wrong with Trac itself and nothing related to this ticket…
It seems that this TicketExtPlugin will only work with Trac < 0.11.6 (see r8609), and obviously you have 0.12.2 installed (also verify in the /about page as TRAC_ADMIN or in Admin / Plugins). So you can either try to fix that plugin (recommended) or downgrade (uninstall all Trac versions in site-packages, then reinstall 0.11.5 - not recommended).
If you can get anything more specific about the slow down (does it also happen without that plugin?), then please create a new ticket.
comment:7 by , 13 years ago
Thank you and my apologies. I got a little confused.
I will create a new ticket after this because even while disabling that plugin, Trac is still taking 4.83 seconds to load data from tickets.
comment:8 by , 13 years ago
Priority: | high → highest |
---|
Ok, and on a side note let me bump the priority of this ticket, as I think the next_rev thing is causing the slowdown I see in my own production Trac… I didn't have time yet to do an in-depth debugging session, but there seems to be a problem with the source:tags/trac-0.12.2/trac/versioncontrol/cache.py@:365-366#L364 part, which doesn't seem to use the indexes (well, indeed there are no indexes on the paths…).
comment:9 by , 13 years ago
Milestone: | next-minor-0.12.x → 0.12.4 |
---|
Next on my list of things we should really fix…
comment:10 by , 13 years ago
There's also a glitch with prev_rev when the branch containing a file has been deleted, e.g. source:/branches/0.11-stable/contrib/bugzilla2trac.py@8233, where Previous Revision points to r6936. This corresponds to the deletion of a "wrong" version of branches/0.11-stable, which was later recreated in r6940 - see log.
comment:11 by , 13 years ago
Keywords: | newcache added |
---|---|
Milestone: | 0.12.4 → next-major-0.1X |
The short term "fix" for the slowness will be addressed by #8639 (use relative links and defer computation of the actual next or prev revision).
The longer term fix is to rework the cache so that access to prev/next revision for any path could be obtained directly.
The short-term workaround is likely to stay for the direct-svnfs case, though.
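To make the "obtained directly" idea concrete, such a lookup against the cache tables would essentially be one indexed query per direction, something like this sketch (the `db` handle, the zero-padded rev strings and the parameter style are assumptions here, not the final design):

```python
def cached_next_rev(db, repos_id, path, rev):
    # "Next revision that touched `path`": a single indexed MIN() lookup,
    # assuming an index on node_change (repos, path) and zero-padded rev strings.
    cursor = db.cursor()
    cursor.execute("""
        SELECT MIN(rev) FROM node_change
        WHERE repos=%s AND path=%s AND rev>%s
        """, (repos_id, path, rev))
    row = cursor.fetchone()
    return row[0] if row else None
```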
follow-up: 13 comment:12 by , 11 years ago
I tried to improve the performance of prev/next rev with an index on `node_change (repos,path)`, repos:jomae.git:ticket8813/1.0-stable. As a result, it is 10-15 times faster.
- Run upgrade, create index node_change (repos,path)
- If you are using SQLite, run "ANALYZE"
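Roughly, those two steps amount to the following (a sketch for SQLite only; the actual index name and the upgrade machinery in the branch may differ):

```python
import sqlite3

conn = sqlite3.connect('db/trac.db')
# The new index that lets the prev/next rev queries seek by (repos, path):
conn.execute("CREATE INDEX IF NOT EXISTS node_change_repos_path_idx "
             "ON node_change (repos, path)")
conn.execute("ANALYZE")   # refresh the planner statistics so the new index is chosen
conn.commit()
conn.close()
```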
Timing of `_next_prev_rev` in source:trunk/tracopt/mimeview/tests/php.py@8802 (the `<` / `>` columns are the direction parameter):
Database | before (r11950), `<` | before (r11950), `>` | after, `<` | after, `>` | Notes |
---|---|---|---|---|---|
SQLite 3.3.6 | 0.209179s | 0.045196s | 0.011844s | 0.011588s | |
MySQL 5.0.95 | 0.230615s | 0.236727s | 0.015379s | 0.015139s | innodb_buffer_pool_size = 32M |
PostgreSQL 8.1.23 | 0.085839s | 0.023007s | 0.005458s | 0.005478s | |
- [b264536c4/jomae.git] and [31db53a0/jomae.git]: in SQLite, use `GLOB` for node_change.path instead of `LIKE`. The `GLOB` operator is case-sensitive and uses an index.
- [31db53a0/jomae.git]: in SQLite, `WHERE ... AND (path=... OR ... OR ...)` uses the index for the primary key. It is replaced with `UNION ALL` to use the repos/path index.
- [31db53a0/jomae.git]: in MySQL, `... ORDER rev DESC LIMIT 1` leads to "using filesort" and is slow. Use `MIN(rev)`/`MAX(rev)` to avoid it.
- [39827b0e/jomae.git]: in PostgreSQL, using the index for the primary key is slow. Prevent using that index with `db.concat()` for `change_type`.
comment:13 by , 11 years ago
Replying to jomae:
I tried to improve performance of prev/next rev […] As a result, 10-15 times faster.
Or more ;-) Testing with SQLite and a node_change table with ~ 700000 rows for 52000 revs, the system is much more responsive.
Now it looks like I still had a `(repos,path,rev)` index remaining from previous experiments, and this one gets used:

```
explain query plan
SELECT MIN(rev) FROM node_change
 WHERE repos=1 AND rev>'0000046667'
   AND path='branches/maintenance/.../common/physinfo/PhysInfo.java'
UNION ALL
SELECT MIN(rev) FROM node_change
 WHERE repos=1 AND rev>'0000046667' AND change_type='D'
   AND path IN ('branches', 'branches/maintenance', ..., '.../common/physinfo');

1|0|0|SEARCH TABLE node_change USING COVERING INDEX node_change_repo_path_rev_idx (repos=? AND path=? AND rev>?) (~1 rows)
2|0|0|SEARCH TABLE node_change USING INDEX node_change_repo_path_rev_idx (repos=? AND path=? AND rev>?) (~1 rows)
2|0|0|EXECUTE LIST SUBQUERY 3
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

.indices node_change
node_change_path_idx
node_change_path_only_idx
node_change_path_rev_idx
node_change_repo_path_rev_idx
node_change_repos_path_idx
node_change_repos_rev_idx
sqlite_autoindex_node_change_1
```
I should probably retry without this index, to see if the speed improvement remains, but I wanted to give some early feedback to tell you how pleased I am with that improvement ;-)
comment:14 by , 11 years ago
Cc: | removed |
---|
comment:15 by , 11 years ago
Cc: | added |
---|
comment:16 by , 11 years ago
I sped up next_rev by reimplementing it as a binary search instead of a linear search. On a repo with 40000 revs this makes a huge difference:
```diff
Index: tracopt/versioncontrol/svn/svn_fs.py
===================================================================
--- tracopt/versioncontrol/svn/svn_fs.py	(revision 12592)
+++ tracopt/versioncontrol/svn/svn_fs.py	(working copy)
@@ -595,19 +595,34 @@
         given `path` or globally.
         """
         rev = self.normalize_rev(rev)
-        next = rev + 1
-        youngest = self.youngest_rev
+        oldest = rev
+        youngest = self.youngest_rev + 1
         subpool = Pool(self.pool)
-        while next <= youngest:
-            subpool.clear()
-            for _, next in self._history(path, rev+1, next, subpool):
-                return next
+        path_utf8 = _to_svn(subpool(), self.scope, '/' + path)
+        while oldest < youngest:
+            subpool2 = Pool(subpool)
+            next = (oldest + youngest) // 2
+            next_root = fs.revision_root(self.fs_ptr, next, subpool2())
+            if fs.check_path(next_root, path_utf8, subpool2()) == core.svn_node_none:
+                hrev = 0
             else:
-                if not find_initial_rev and \
-                        not self.has_node(path, next, subpool):
-                    return next # a 'delete' event is also interesting...
-            next += 1
-        return None
+                h = fs.node_history(next_root, path_utf8, subpool2())
+                hpath, hrev = fs.history_location(h, subpool2())
+                assert hpath == '/' + path, "hpath: %s path: %s" % (hpath, path)
+                assert hrev == next
+                h = fs.history_prev(h, 1, subpool2())
+                hpath, hrev = fs.history_location(h, subpool2())
+                assert hpath == '/' + path
+                assert hrev >= rev, "hrev: %d rev: %d" % (hrev, rev)
+            if rev < hrev:
+                youngest = next
+            else:
+                oldest = next + 1
+            subpool2.clear()
+        if youngest == self.youngest_rev + 1:
+            return None
+        else:
+            return youngest
 
     def rev_older_than(self, rev1, rev2):
         """Check relative order between two revision specifications."""
```
comment:17 by , 11 years ago
Cc: | added |
---|
comment:18 by , 11 years ago
Thanks for the patch. However, the patch would break scoped subversion repositories and has unicode issues.
```
tracopt/versioncontrol/svn/svn_fs.py:611: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  assert hpath == '/' + path, "hpath: %s path: %s" % (hpath, path)
...
======================================================================
ERROR: test_rev_path_navigation (tracopt.versioncontrol.svn.tests.svn_fs.SubversionRepositoryNormalTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jun66j5/src/trac/edgewall/git/tracopt/versioncontrol/svn/tests/svn_fs.py", line 136, in test_rev_path_navigation
    self.assertEqual(6, self.repos.next_rev(5, u'tête'))
  File "/home/jun66j5/src/trac/edgewall/git/tracopt/versioncontrol/svn/svn_fs.py", line 611, in next_rev
    assert hpath == '/' + path, "hpath: %s path: %s" % (hpath, path)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)

======================================================================
ERROR: test_changeset_added_dirs (tracopt.versioncontrol.svn.tests.svn_fs.SvnCachedRepositoryScopedTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jun66j5/src/trac/edgewall/git/tracopt/versioncontrol/svn/tests/svn_fs.py", line 1043, in setUp
    self.repos.sync()
  File "/home/jun66j5/src/trac/edgewall/git/trac/versioncontrol/cache.py", line 260, in sync
    next_youngest = self.repos.next_rev(next_youngest)
  File "/home/jun66j5/src/trac/edgewall/git/tracopt/versioncontrol/svn/svn_fs.py", line 611, in next_rev
    assert hpath == '/' + path, "hpath: %s path: %s" % (hpath, path)
AssertionError: hpath: /tête path:
....

Ran 1526 tests in 126.021s

FAILED (failures=1, errors=32)
make: *** [unit-test] Error 1
```
comment:19 by , 11 years ago
Ah. I will look at those issues if I get some time. But I think they should be pretty easy to fix? (no design issue with the patch?)
Anyway, I found that fs.check_path now becomes a bottleneck (much slower than revision_root, even):
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      470    0.238    0.001    0.247    0.001 {_fs.svn_fs_check_path}
      200    0.059    0.000    0.073    0.000 {_fs.svn_fs_revision_root}
      160    0.012    0.000    0.022    0.000 {_fs.svn_fs_history_prev}
      160    0.010    0.000    0.021    0.000 {_fs.svn_fs_node_history}
       20    0.010    0.000    0.011    0.001 {_fs.svn_fs_file_contents}
```
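For reference, a profile like the one above can be gathered roughly as follows (a sketch; the ticket doesn't show the actual harness, and `repos` is assumed to be an already opened SubversionRepository):

```python
import cProfile
import pstats

def profile_next_rev(repos, rev, path, runs=10):
    # Profile repeated next_rev() calls and print the 5 most expensive callees.
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(runs):
        repos.next_rev(rev, path)
    profiler.disable()
    pstats.Stats(profiler).sort_stats('tottime').print_stats(5)
```

For example, `profile_next_rev(repos, 364, 'trunk/setup_wininst.bmp')` (hypothetical path/rev, for illustration only).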
I ran a single call of fs.check_path under strace and it looks like libsvn_fs_fs reads the revision root, then follows to the leaf node, *and also* looks up the previous rev of the node. We don't need anything from the previous rev, the node type is already in the most recent rev.
Any idea how to use the SVN FS api to avoid this prev-rev lookup?
comment:20 by , 11 years ago
Ah! The only reason revision_root *seems* faster is that check_path fills the dag cache, and node_history etc. get lots of dag cache hits. Actually I believe the *first* call to check_path is slow, all other calls to check_path and other APIs are probably fast, because they work with a warm cache.
I took check_path out of next_rev, this is what the profile looked like:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      320    0.213    0.001    0.220    0.001 {_fs.svn_fs_check_path}
      160    0.068    0.000    0.080    0.001 {_fs.svn_fs_node_history}
      200    0.066    0.000    0.085    0.000 {_fs.svn_fs_revision_root}
```
30% fewer calls to check_path, but only 10% less time. So the check_path calls I removed must have been the "fast" ones. The slow one(s) that fill the cache must still be running, probably the one in `SubversionNode.__init__`.
This shows how misleading these profiles can be. I guess it's because the C call structure inside libsvn_fs_fs is opaque to the Python profiler - otherwise I would see a lot of tottime spent in some leaf svn function like open_path and the rest would be light.
I don't think there's much optimization left here. In the context of Trac specifically, we eventually need to page in the dag nodes, for rendering the "previous revision" links, "last change" header, etc. It doesn't matter when it happens. Though the binary search will certainly continue to help, because it just avoids paging in a lot of nodes.
So far I've been benchmarking by rendering the same page 10 times. On the second and later renders, this will get very good hit rate in the dag cache. This is unlikely to happen under real traffic, so this is probably an unfair benchmark. At the other extreme, the most pessimistic benchmark would probably be to render random paths. That way the cache will be warm within the same request, but cold across requests. With small repos it probably doesn't matter, because you'll keep a lot in cache at various levels anyway, even across requests. I'm working with a 40000 revision repo.
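A pessimistic variant of that benchmark could look roughly like this (a sketch; `repos` and the list of candidate paths are assumptions, not something taken from this ticket):

```python
import random
import time

def bench_random_paths(repos, paths, rev, runs=10):
    # Time next_rev() on a different, randomly chosen path each run, so the
    # dag cache is warm within a run but effectively cold across runs.
    timings = []
    for _ in range(runs):
        path = random.choice(paths)
        start = time.time()
        repos.next_rev(rev, path)
        timings.append(time.time() - start)
    return sum(timings) / len(timings)
```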
comment:21 by , 11 years ago
Ok, give this patch a spin:
```diff
Index: tracopt/versioncontrol/svn/svn_fs.py
===================================================================
--- tracopt/versioncontrol/svn/svn_fs.py	(revision 12592)
+++ tracopt/versioncontrol/svn/svn_fs.py	(working copy)
@@ -595,19 +595,36 @@
         given `path` or globally.
         """
         rev = self.normalize_rev(rev)
-        next = rev + 1
-        youngest = self.youngest_rev
+        oldest = rev
+        youngest = self.youngest_rev + 1
         subpool = Pool(self.pool)
-        while next <= youngest:
-            subpool.clear()
-            for _, next in self._history(path, rev+1, next, subpool):
-                return next
+        scope_path = posixpath.normpath(self.scope + '/' + path).lstrip('/')
+        path_utf8 = _to_svn(subpool(), scope_path)
+        while oldest < youngest:
+            subpool2 = Pool(subpool)
+            next = (oldest + youngest) // 2
+            next_root = fs.revision_root(self.fs_ptr, next, subpool2())
+            if fs.check_path(next_root, path_utf8, subpool2()) == core.svn_node_none:
+                hrev = 0
             else:
-                if not find_initial_rev and \
-                        not self.has_node(path, next, subpool):
-                    return next # a 'delete' event is also interesting...
-            next += 1
-        return None
+                h = fs.node_history(next_root, path_utf8, subpool2())
+                hpath, hrev = fs.history_location(h, subpool2())
+                hpath = to_unicode(hpath)
+                assert hpath == '/' + scope_path, 'hpath: %s scope_path: /%s' % (hpath, scope_path)
+                assert hrev == next
+                h = fs.history_prev(h, 1, subpool2())
+                hpath, hrev = fs.history_location(h, subpool2())
+                hpath = to_unicode(hpath)
+                assert hpath == '/' + scope_path, 'hpath: %s scope_path: /%s' % (hpath, scope_path)
+            if rev < hrev:
+                youngest = next
+            else:
+                oldest = next + 1
+            subpool2.clear()
+        if youngest == self.youngest_rev + 1:
+            return None
+        else:
+            return youngest
 
     def rev_older_than(self, rev1, rev2):
         """Check relative order between two revision specifications."""
```
comment:22 by , 11 years ago
Could you please provide benchmark results? I don't see significant differences between the patch and Trac 1.0.1.
Timing of `SubversionRepository.next_rev` with trunk/setup_wininst.bmp@364:
log:jomae.git@ticket8813_another_bsearch | 0.227722s | |
log:jomae.git@ticket8813_cronos586 | 0.225537s | |
Trac 1.0.1 | 0.228442s | |
log:jomae.git@ticket8813_add_index | 0.012705s | see comment:12 |
comment:23 by , 11 years ago
I reworked the jomae.git@ticket8813_add_index branch, which adds the `node_change (repos,path)` index. See comment:12 for the details.
follow-up: 25 comment:24 by , 11 years ago
It makes a huge difference for me. This is with trunk/setup_wininst.bmp@364, from a repo svnsync'ed from http://svn.edgewall.org/repos/trac with 12623 revs, packed (svnadmin pack), and with `repository_type = direct-svnfs`. Wall time of `dispatch_request('/browser/trunk/setup_wininst.bmp?rev=364')`, averaged over 10 runs:
Trac log:trunk@12623 | 7.531s |
Patch from comment:21 | 0.068s |
I think we're not measuring the same things/same way.
follow-up: 26 comment:25 by , 11 years ago
I think we're not measuring the same things/same way.
Ah, it was a misconfiguration on my side. Sorry for the noise. The patch leads to fast responses on direct-svnfs, great!
Another thing. source:trunk/templates/menu.template@5 shows r6 as Next Revision in navigation because the file has been deleted in r6. However, after the patch, Next Revision would be disabled.
comment:26 by , 11 years ago
Replying to jomae:
Another thing. source:trunk/templates/menu.template@5 shows r6 as Next Revision in navigation because the file has been deleted in r6. However, after the patch, Next Revision would be disabled.
If we had some information about the fact that there's a deletion, that would be better than just having a disabled Next Revision link, as the latter could also mean we're at the latest revision. Of course, getting an error when following such a link is not nice; rendering the changeset box with the file content replaced by "(the file was deleted in that revision)" would be better!
comment:27 by , 11 years ago
I have been trying to fix the deleted revision bug, but it's difficult because of many edge cases. Fix one case, break another. There is indeed also a problem with scoped repos. This is shown in the svn_tests test case when generating the SVN repository cache - some changesets "disappear" and SvnCachedRepo.get_changeset fails. This is going to take a while longer…
comment:28 by , 10 years ago
Keywords: | svn added |
---|---|
Milestone: | next-major-releases → 1.1.3 |
Owner: | set to |
Status: | new → assigned |
I'll apply jomae.git@ticket8813_add_index to reduce the time of `next_rev`/`prev_rev` on cached subversion repositories. Changing the milestone to 1.1.3, since this upgrades the db schema by adding a new index.
comment:29 by , 10 years ago
Milestone: | 1.1.3 → 1.1.4 |
---|
comment:30 by , 10 years ago
Milestone: | 1.1.4 → next-dev-1.1.x |
---|
Timing of _next_prev_rev for trunk/tracopt/mimeview/tests/php.py@8802:
Database | direction arg | Trac 1.0 | 1.1.4dev r13829 |
---|---|---|---|
PostgreSQL 9.1.15 | < | 0.072520s | 0.024633s |
PostgreSQL 9.1.15 | > | 0.014539s | 0.005405s |
MySQL 5.5.41 | < | 0.045533s | 0.001696s |
MySQL 5.5.41 | > | 0.016426s | 0.001558s |
SQLite 3.7.9 | < | 0.027923s | 0.026966s |
SQLite 3.7.9 | > | 0.006086s | 0.005372s |
We've improved in PostgreSQL and MySQL; however, the performance is unchanged in SQLite. We should tune the query in the method.
comment:31 by , 10 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:32 by , 9 years ago
Milestone: | next-dev-1.1.x → next-dev-1.3.x |
---|
Narrowing focus for milestone:1.2. Please move ticket to milestone:1.2 if you intend to fix it.
comment:34 by , 5 years ago
Milestone: | next-dev-1.5.x → next-major-releases |
---|
Got a surprisingly slow request today in our in-house Trac, so I investigated a bit.
This was a /browser request, accessing a small file from an old tag at r2259 in our main repository (> 33k revisions).
This takes between 1 and 3 minutes, whereas the corresponding `svn cat` is instantaneous and another Trac instance takes 1.8 seconds (both Trac instances running at r9150). The main differentiator was the use of direct-svnfs vs. svn, as I checked that the other Trac instance would show the same slowness when switching the repository type.
With gdb, I see that most of the time is spent in Subversion libraries, apparently going through a lot of revisions … pystack shows:
(and if someone has a version of pystack that terminates or can be interrupted, I'll take it!)
So `next_rev` is slow (we knew it), but it can be extremely slow in the direct-svnfs case. Maybe we should consider bailing out and returning `None` after a while there, or even have a smarter implementation which uses a dichotomic search, for example.
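For illustration, the "bail out after a while" idea could look roughly like this naive sketch built on the generic Repository API (the budget value, the helper name and the per-changeset scan are all assumptions here, not a proposed patch):

```python
import time

def next_rev_with_budget(repos, rev, path, budget=2.0):
    # Walk forward one changeset at a time, roughly what the worst case costs
    # today, but give up and return None once the wall-clock budget is spent.
    deadline = time.time() + budget
    candidate = rev + 1
    while candidate <= repos.youngest_rev:
        if time.time() > deadline:
            return None          # degrade gracefully instead of stalling the request
        changes = repos.get_changeset(candidate).get_changes()
        for cpath, kind, change, base_path, base_rev in changes:
            if cpath == path or cpath.startswith(path + '/'):
                return candidate
        candidate += 1
    return None
```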