#7746 closed defect (fixed)
Browse Source is too slow on a big repo
Reported by: | Owned by: | Christian Boos | |
---|---|---|---|
Priority: | highest | Milestone: | not applicable |
Component: | plugin/mercurial | Version: | 0.11.1 |
Severity: | critical | Keywords: | |
Cc: | maxim.vuets@…, bkocherov@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
I have a big repo — ~11.5K of files and ~200 MB .hg dir. And Browse Source takes 140 seconds to show a root of repo. It's ugly! Looks like design defect.
I've found the bottleneck—it is MercurialNode._init_path()
. There is a comment:
As directories are not first-class citizens in Mercurial, retrieving revision information for directory is much slower than for files.
As I understand Christian follows a repo tree in depth to obtain all files' revisions and such way gets dir's revision. (I may wrong.) And I cannot understand why such strange and slow method is used. Mercurial can give dir's revision itself. It is much quickly and simply, look:
~$ hg init hg-test ~$ cd hg-test/ hg-test$ mkdir somedir hg-test$ touch file somedir/some somedir/thing hg-test$ hg ci -Am 'init' adding file adding somedir/some adding somedir/thing hg-test$ echo abc >>file hg-test$ echo def >>somedir/some hg-test$ hg ci -m '2nd' hg-test$ echo ghi >>file hg-test$ hg ci -m '3rd' hg-test$ hg id -n 2 hg-test$ hg log -l1 --style=compact somedir/some 1 e56a4a4c991e 2008-10-22 17:47 +0300 maxim 2nd hg-test$ hg log -l1 --style=compact somedir/thing 0 d10179c1216f 2008-10-22 17:46 +0300 maxim init hg-test$ hg log -l1 --style=compact somedir 1 e56a4a4c991e 2008-10-22 17:47 +0300 maxim 2nd
Attachments (7)
Change History (40)
comment:1 by , 16 years ago
Milestone: | → not applicable |
---|---|
Priority: | normal → highest |
Severity: | normal → critical |
follow-up: 3 comment:2 by , 16 years ago
Cc: | added |
---|
Ou, so quick reply. Thanks Christian! Hope you'll fix it ASAP. Thanks in advance.
Just a proof to show how bad the things are (: I have 5 dirs and two files in the root. Measurement was done also under Windows:
>perl -e "for (<*>) { `hg --time log -l1` if -d }" Time: real 0.564 secs (user 0.016+0.000 sys 0.000+0.000) Time: real 0.052 secs (user 0.000+0.000 sys 0.047+0.000) Time: real 0.052 secs (user 0.016+0.000 sys 0.031+0.000) Time: real 0.052 secs (user 0.031+0.000 sys 0.031+0.000) Time: real 0.052 secs (user 0.000+0.000 sys 0.047+0.000) >sh -c "time perl lwp-download http://localhost/trac/browser >/dev/null" real 2m23.984s user 0m0.000s sys 0m0.015s
comment:3 by , 16 years ago
Replying to maxim.vuets@…:
perl -e "for (<*>) { `hg --time log -l1` if -d }"
Oops, I've cheated accidentally ): Missed $_
. Here is the true result:
>perl -e "for (<*>) { `hg --time log -l1 $_` if -d }" Time: real 0.663 secs (user 0.063+0.000 sys 0.141+0.000) Time: real 0.111 secs (user 0.047+0.000 sys 0.063+0.000) Time: real 0.144 secs (user 0.047+0.000 sys 0.063+0.000) Time: real 0.110 secs (user 0.016+0.000 sys 0.094+0.000) Time: real 0.122 secs (user 0.047+0.000 sys 0.078+0.000)
Doesn't make a big difference, but I wanna be honest (:
follow-ups: 6 7 comment:4 by , 16 years ago
Cc: | added |
---|
MercurialNode._init_path() is incorrect
If latest changes in dir "remove" some content _init_path() compute not correct node.
by , 16 years ago
Attachment: | correct_init_path.patch added |
---|
I try resolve this problem please review
comment:5 by , 16 years ago
My repo http://raskon.org/browse — 35.5K files, and ~300Mb .hg dir
With applied correct_init_path.patch.
>time wget http://raskon.org/browser -O /dev/null -q real 0m3.340s user 0m0.004s sys 0m0.008s
;)
comment:6 by , 16 years ago
Replying to Boris Kocherov <bkocherov@…>:
I try resolve this problem please review
It does not work for me ):
As I understand you made that patch for sandbox/mercurial-plugin-0.12, but I'm using stable branch of Trac thus I'm using sandbox/mercurial-plugin-0.11. I think so, because the patch was applied on TracMercurial 0.11 with a creak:
Hunk #1 succeeded at 560 (offset -27 lines). Hunk #3 succeeded at 585 (offset -27 lines). Hunk #5 succeeded at 629 (offset -27 lines).
But w/o any warnings on TracMercurial 0.12.
I have tried vanilla TracMercurial 0.12 with Trac 0.11 —- seems it work. Patched TracMercurial neither 0.11, nor 0.11 does not work. Here is the traceback:
Traceback (most recent call last): File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\web\main.py", line 423, in _dispatch_request dispatcher.dispatch(req) File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\web\main.py", line 197, in dispatch resp = chosen_handler.process_request(req) File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\versioncontrol\web_ui\browser.py", line 361, in process_request 'dir': node.isdir and self._render_dir(req, repos, node, rev), File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\versioncontrol\web_ui\browser.py", line 407, in _render_dir entries = [entry(n) for n in node.get_entries()] File "c:\usr\python\lib\site-packages\tracmercurial-0.11.0.5-py2.5.egg\tracext\hg\backend.py", line 679, in get_entries yield self.subnode(entry, precomputed_node = entries.get(entry, None)) File "c:\usr\python\lib\site-packages\tracmercurial-0.11.0.5-py2.5.egg\tracext\hg\backend.py", line 634, in subnode return MercurialNode(self.repos, p, self.n, self.manifest, self.mflags, **kwargs) File "c:\usr\python\lib\site-packages\tracmercurial-0.11.0.5-py2.5.egg\tracext\hg\backend.py", line 586, in __init__ self._init_path(log, path) File "c:\usr\python\lib\site-packages\tracmercurial-0.11.0.5-py2.5.egg\tracext\hg\backend.py", line 625, in _init_path self.time = self.repos.hg_time(log.read(node)[2]) File "c:\usr\python\lib\site-packages\mercurial\changelog.py", line 151, in read text = self.revision(node) File "c:\usr\python\lib\site-packages\mercurial\revlog.py", line 928, in revision rev = self.rev(node) File "c:\usr\python\lib\site-packages\mercurial\revlog.py", line 522, in rev raise LookupError(node, self.indexfile, _('no node')) LookupError: 00changelog.i@None: no node
And my configuration if any:
Trac | 0.11.1 |
Python | 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] |
Mercurial | 1.0.2 |
comment:7 by , 16 years ago
Replying to Boris Kocherov <bkocherov@…>:
MercurialNode._init_path() is incorrect
If latest changes in dir "remove" some content _init_path() compute not correct node.
Ok, so it looks like you've identified another issue than the general slowness problem. Looking at your timing in comment:5, the patch doesn't seem to address speed but rather correctness after some removal happened.
It seems better that you create a new ticket, describing the problem you're trying to address, what the current version of the plugin (mention if you're using -0.11 or -0.12) is showing and what you would expect. Even better, attach a sample .bundle of a repository showing that issue. Thanks!
follow-ups: 9 12 13 comment:8 by , 16 years ago
I public new version patch.
Without patch:
bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q real 2m15.650s user 0m0.012s sys 0m0.000s
With patch:
bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q real 0m2.844s user 0m0.004s sys 0m0.000s
comment:9 by , 16 years ago
Replying to Boris Kocherov <bkocherov@…>:
I public new version patch.
Still does not work. But another traceback:
Traceback (most recent call last): File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\web\main.py", line 423, in _dispatch_request dispatcher.dispatch(req) File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\web\main.py", line 197, in dispatch resp = chosen_handler.process_request(req) File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\versioncontrol\web_ui\browser.py", line 361, in process_request 'dir': node.isdir and self._render_dir(req, repos, node, rev), File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\versioncontrol\web_ui\browser.py", line 407, in _render_dir entries = [entry(n) for n in node.get_entries()] File "c:\usr\python\lib\site-packages\tracmercurial-0.11.0.5-py2.5.egg\tracext\hg\backend.py", line 684, in get_entries yield self.subnode(entry, precomputed_node = entries.get(entry+"/", None)) AttributeError: 'NoneType' object has no attribute 'get'
follow-up: 14 comment:10 by , 16 years ago
Send me ( Boris Kocherov <bkocherov@…> ) your's backend.py
comment:12 by , 16 years ago
Replying to Boris Kocherov <bkocherov@…>:
I public new version patch.
Without patch:real 2m15.650s
With patch:real 0m2.844s
Hm, thanks for getting me straight on this, I parsed "3 minutes" in comment:5, probably because I couldn't imagine it might be possible to talk about 3 seconds for a 35.5k files repository ;-)
This is good news indeed, I'll look at the patch more closely as soon as I can.
comment:13 by , 16 years ago
Previous comment by anonymous was mine.
Replying to Boris Kocherov <bkocherov@…>:
Without patch:
…
With patch:
…
I just want to ensure that your second try with patched version was performed on "empty disk cache". I mean that subsequent instant query will be done much quickly just because OS fetches info not from disk, but from memory (cache). Look:
>sh -c "time perl lwp-download http://localhost/trac/browser >/dev/null" real 2m5.313s user 0m0.000s sys 0m0.015s >sh -c "time perl lwp-download http://localhost/trac/browser >/dev/null" real 0m4.781s user 0m0.000s sys 0m0.015s
To clear that cache I do hg status
on another repos, launch any heavy apps etc.
comment:14 by , 16 years ago
Replying to Boris Kocherov <bkocherov@…>:
Send me ( Boris Kocherov <bkocherov@…> ) your's backend.py
I have exactly the same version as here: sandbox/mercurial-plugin-0.11/tracext/hg/backend.py@7592.
follow-up: 16 comment:15 by , 16 years ago
After full reboot server:
first time (patched version):
bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q real 0m14.680s user 0m0.000s sys 0m0.004s
later (patched version) from memory cache:
bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q real 0m2.931s user 0m0.000s sys 0m0.008s bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q real 0m2.956s user 0m0.012s sys 0m0.000s
2Maxim
About you trace:
I expect what you send me your's patched version file backend.py.
Are you agree what restart or reinstall patched version backend.py?
Which patch you used? please use correct_init_path-0.11.patch.
comment:16 by , 16 years ago
Replying to Boris Kocherov <bkocherov@…>:
After full reboot server:
first time (patched version):
bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q real 0m14.680s user 0m0.000s sys 0m0.004s
Cool!
2Maxim
About you trace:
I expect what you send me your's patched version file backend.py.
Are you agree what restart or reinstall patched version backend.py?
Which patch you used? please use correct_init_path-0.11.patch.
I found out that TracMercurial 0.11 + correct_init_path-0.11.patch does not work with my big repo, but it works with new created empty repo.
We (me and Boris) have been contacted with each other via Jabber to investigate the issue. Just to avoid flame here (: Further results will be posted here by one of us later.
by , 16 years ago
Attachment: | correct_init_path.0.11-2.patch added |
---|
Boris Kocherov's patch for TracMercurial 0.11 (and 0.12)
comment:17 by , 16 years ago
Good news! We have found a reason why Boris' patch worked well on his repo, but did not work at all on mine.
Here are my new timings:
>sh -c "time perl lwp-download http://localhost/trac/browser /dev/null >/dev/null" real 0m3.625s user 0m0.000s sys 0m0.015s >sh -c "time perl lwp-download http://localhost/trac/browser /dev/null >/dev/null" real 0m0.797s user 0m0.000s sys 0m0.015s
So, Boris Kocherov have fixed the issue. Please see attached patch. The patch is also applied on TracMercurial 0.12 (but just with line number shifts which is not a problem). Previous patches may not work! Ignore them.
Boris also said that he know how to increase performance a little more. He is working on it now.
Thanks a lot Boris!
comment:18 by , 16 years ago
A quick comment: in findnode
, you're deleting the current element dirname
from the list you're iterating on (dirnames
), which is wrong (see the Warning in http://www.python.org/doc/2.5.2/ref/for.html).
by , 16 years ago
Attachment: | optimize_init_path-cboos.patch added |
---|
My cleaned-up version of Boris' patch
comment:19 by , 16 years ago
I've started to work on this. First, a modified version of attachment:correct_init_path.0.11-2.patch, which fixes the problem mentioned in comment:18 and also removes the MercurialNode.mflags
which are not needed anymore. See attachment:optimize_init_path-cboos.patch.
Also, I've started an alternative approach, which so far is slower because it doesn't use the changectx(rev).files()
scanning strategy from findnode()
. The problem with that findnode
approach is that it's only a heuristic, and when there is a very old subfolder, this will force findnode
to go through nearly all the changesets… This could eventually be more expensive than going through all the nodes for retrieving the log information. In practice this heuristic works quite well though.
To be continued :-)
Boris, don't hesitate to post your other improvements.
comment:20 by , 16 years ago
I think you miss findnode() in 706 line attachment:optimize_init_path-cboos.patch . It completely disable "big optimization". I may wrong.
I think for perventing copy/paste needs merge findnode() with _init_node4dir().
by , 16 years ago
Attachment: | optimize_init_path-boris.patch added |
---|
My changed-up version of cboos' patch
comment:21 by , 16 years ago
small changes attachment:optimize_init_path-cboos.patch in attachment:optimize_init_path-boris.patch
- added MercurialNode.findnode() - preventing copy/paste
- added MercurialNode.node - another small optimization
- added raise TracError("Your's mercurial repository contain error.\nManifest contain file which not found in changelog.")
Thanks a lot cboos and Maxim!
I agree. To be continued :-)
comment:22 by , 16 years ago
Component: | plugin/mercurial → web frontend/mod_python |
---|---|
Owner: | changed from | to
Priority: | highest → normal |
Severity: | critical → major |
comment:23 by , 16 years ago
Component: | web frontend/mod_python → plugin/mercurial |
---|---|
Owner: | changed from | to
Priority: | normal → highest |
Severity: | major → critical |
comment:24 by , 16 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:26 by , 16 years ago
Christian, these patches work excelent for me. Do you plan to close the ticket and apply the patches to the trunk?
comment:27 by , 16 years ago
Yes, as so far I failed to improve upon those patches ;-) I'll refresh the latest patch from Boris once I've fixed the 1.1 compatibility issues (#7856).
follow-up: 29 comment:28 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Finally committed a change quite close to attachment:optimize_init_path-boris.patch in r7744 (for 0.11) and r7745 (for trunk and MultipleRepositorySupport).
I had a few additional ideas, but so far they don't beat the above, so better share this version which is already quite good: Linux-sized repositories become now browsable.
A big thank you goes to Boris Kocherov!
comment:29 by , 16 years ago
Replying to cboos:
Finally committed a change quite close to attachment:optimize_init_path-boris.patch in r7744 (for 0.11) and r7745 (for trunk and MultipleRepositorySupport).
Works perfectly for 0.11.
A big thank you goes to Boris Kocherov!
For sure. Also thanks a lot to you Christian for so quick reaction and fix!
comment:30 by , 16 years ago
There's another slight improvement I've been playing with: only scan the changelog with findnode
once for a given folder, i.e. retrieve the node information for the current dir and its subdirectories at once. Only do a findnode
in get_entries
if not already done by the previous step. Somehow this only manages to gain another 5% speed increase or so, but maybe I did something wrong. Boris, if you're interested you could try to take this further?
by , 16 years ago
Attachment: | call-findnode-once-r7745.diff added |
---|
Try to call findnode
only once for one directory listing
comment:31 by , 15 years ago
Please review my another very small patch. It for MultipleRepositorySupport branch instead TracMercurial plugin. I do not measure speed increase.
by , 15 years ago
Attachment: | speedup_expand_subdirectory_inplace.patch added |
---|
Speedup "Expand sub-directory inplace" by searching in ⇐ changesets reveison number
comment:32 by , 15 years ago
Ah yes, good idea. But we could directly set rev
, no?
What is given in data:
is probably just appended to the what's specified by the URL, but even if not, we could just modify url: a.attr("href")
by adding or replacing rev
as appropriate. That way, no modification to the python module would be needed.
No you're right.
Simple: that code pretty much dates of day one of writing that plugin and hasn't been improved since. There might well be a faster way now (also I developed this on Windows where they were issues with looking at a log for a directory; seems to be fixed by now).
I'll try to give a second look at this, thanks for the heads-up.