Edgewall Software
Modify

Opened 16 years ago

Closed 16 years ago

Last modified 15 years ago

#7746 closed defect (fixed)

Browse Source is too slow on a big repo

Reported by: maxim.vuets@… Owned by: Christian Boos
Priority: highest Milestone: not applicable
Component: plugin/mercurial Version: 0.11.1
Severity: critical Keywords:
Cc: maxim.vuets@…, bkocherov@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description

I have a big repo — ~11.5K of files and ~200 MB .hg dir. And Browse Source takes 140 seconds to show a root of repo. It's ugly! Looks like design defect.

I've found the bottleneck—it is MercurialNode._init_path(). There is a comment:

As directories are not first-class citizens in Mercurial, retrieving revision information for directory is much slower than for files.

As I understand Christian follows a repo tree in depth to obtain all files' revisions and such way gets dir's revision. (I may wrong.) And I cannot understand why such strange and slow method is used. Mercurial can give dir's revision itself. It is much quickly and simply, look:

~$ hg init hg-test
~$ cd hg-test/
hg-test$ mkdir somedir
hg-test$ touch file somedir/some somedir/thing
hg-test$ hg ci -Am 'init'
adding file
adding somedir/some
adding somedir/thing
hg-test$ echo abc >>file
hg-test$ echo def >>somedir/some
hg-test$ hg ci -m '2nd'
hg-test$ echo ghi >>file
hg-test$ hg ci -m '3rd'
hg-test$ hg id -n
2
hg-test$ hg log -l1 --style=compact somedir/some
1   e56a4a4c991e   2008-10-22 17:47 +0300   maxim
  2nd

hg-test$ hg log -l1 --style=compact somedir/thing
0   d10179c1216f   2008-10-22 17:46 +0300   maxim
  init

hg-test$ hg log -l1 --style=compact somedir
1   e56a4a4c991e   2008-10-22 17:47 +0300   maxim
  2nd

Attachments (7)

correct_init_path.patch (4.2 KB ) - added by Boris Kocherov <bkocherov@…> 16 years ago.
I try resolve this problem please review
correct_init_path-0.11.patch (4.4 KB ) - added by Boris Kocherov <bkocherov@…> 16 years ago.
for mercurial-plugin-0.11
correct_init_path.0.11-2.patch (4.4 KB ) - added by Maxim Vuets <maxim.vuets@…> 16 years ago.
Boris Kocherov's patch for TracMercurial 0.11 (and 0.12)
optimize_init_path-cboos.patch (4.6 KB ) - added by Christian Boos 16 years ago.
My cleaned-up version of Boris' patch
optimize_init_path-boris.patch (4.8 KB ) - added by Boris Kocherov <bkocherov@…> 16 years ago.
My changed-up version of cboos' patch
call-findnode-once-r7745.diff (3.3 KB ) - added by Christian Boos 16 years ago.
Try to call findnode only once for one directory listing
speedup_expand_subdirectory_inplace.patch (1.4 KB ) - added by Boris Kocherov <bkocherov@…> 15 years ago.
Speedup "Expand sub-directory inplace" by searching in ⇐ changesets reveison number

Download all attachments as: .zip

Change History (40)

in reply to:  description comment:1 by Christian Boos, 16 years ago

Milestone: not applicable
Priority: normalhighest
Severity: normalcritical

As I understand Christian follows a repo tree in depth to obtain all files' revisions and such way gets dir's revision. (I may wrong.)

No you're right.

And I cannot understand why such strange and slow method is used.

Simple: that code pretty much dates of day one of writing that plugin and hasn't been improved since. There might well be a faster way now (also I developed this on Windows where they were issues with looking at a log for a directory; seems to be fixed by now).

I'll try to give a second look at this, thanks for the heads-up.

comment:2 by maxim.vuets@…, 16 years ago

Cc: maxim.vuets@… added

Ou, so quick reply. Thanks Christian! Hope you'll fix it ASAP. Thanks in advance.

Just a proof to show how bad the things are (: I have 5 dirs and two files in the root. Measurement was done also under Windows:

>perl -e "for (<*>) { `hg --time log -l1` if -d }"
Time: real 0.564 secs (user 0.016+0.000 sys 0.000+0.000)
Time: real 0.052 secs (user 0.000+0.000 sys 0.047+0.000)
Time: real 0.052 secs (user 0.016+0.000 sys 0.031+0.000)
Time: real 0.052 secs (user 0.031+0.000 sys 0.031+0.000)
Time: real 0.052 secs (user 0.000+0.000 sys 0.047+0.000)

>sh -c "time perl lwp-download http://localhost/trac/browser >/dev/null"

real    2m23.984s
user    0m0.000s
sys     0m0.015s

in reply to:  2 comment:3 by maxim.vuets@…, 16 years ago

Replying to maxim.vuets@…:

perl -e "for (<*>) { `hg --time log -l1` if -d }"

Oops, I've cheated accidentally ): Missed $_. Here is the true result:

>perl -e "for (<*>) { `hg --time log -l1 $_` if -d }"
Time: real 0.663 secs (user 0.063+0.000 sys 0.141+0.000)
Time: real 0.111 secs (user 0.047+0.000 sys 0.063+0.000)
Time: real 0.144 secs (user 0.047+0.000 sys 0.063+0.000)
Time: real 0.110 secs (user 0.016+0.000 sys 0.094+0.000)
Time: real 0.122 secs (user 0.047+0.000 sys 0.078+0.000)

Doesn't make a big difference, but I wanna be honest (:

comment:4 by Boris Kocherov <bkocherov@…>, 16 years ago

Cc: bkocherov@… added

MercurialNode._init_path() is incorrect
If latest changes in dir "remove" some content _init_path() compute not correct node.

by Boris Kocherov <bkocherov@…>, 16 years ago

Attachment: correct_init_path.patch added

I try resolve this problem please review

comment:5 by Boris Kocherov <bkocherov@…>, 16 years ago

My repo http://raskon.org/browse — 35.5K files, and ~300Mb .hg dir
With applied correct_init_path.patch.

>time wget http://raskon.org/browser -O /dev/null -q

real	0m3.340s
user	0m0.004s
sys	0m0.008s

;)

in reply to:  4 comment:6 by maxim.vuets@…, 16 years ago

Replying to Boris Kocherov <bkocherov@…>:

I try resolve this problem please review

It does not work for me ):

As I understand you made that patch for sandbox/mercurial-plugin-0.12, but I'm using stable branch of Trac thus I'm using sandbox/mercurial-plugin-0.11. I think so, because the patch was applied on TracMercurial 0.11 with a creak:

Hunk #1 succeeded at 560 (offset -27 lines).
Hunk #3 succeeded at 585 (offset -27 lines).
Hunk #5 succeeded at 629 (offset -27 lines).

But w/o any warnings on TracMercurial 0.12.

I have tried vanilla TracMercurial 0.12 with Trac 0.11 —- seems it work. Patched TracMercurial neither 0.11, nor 0.11 does not work. Here is the traceback:

Traceback (most recent call last):
  File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\web\main.py", line 423, in _dispatch_request
    dispatcher.dispatch(req)
  File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\web\main.py", line 197, in dispatch
    resp = chosen_handler.process_request(req)
  File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\versioncontrol\web_ui\browser.py", line 361, in process_request
    'dir': node.isdir and self._render_dir(req, repos, node, rev),
  File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\versioncontrol\web_ui\browser.py", line 407, in _render_dir
    entries = [entry(n) for n in node.get_entries()]
  File "c:\usr\python\lib\site-packages\tracmercurial-0.11.0.5-py2.5.egg\tracext\hg\backend.py", line 679, in get_entries
    yield self.subnode(entry, precomputed_node = entries.get(entry, None))
  File "c:\usr\python\lib\site-packages\tracmercurial-0.11.0.5-py2.5.egg\tracext\hg\backend.py", line 634, in subnode
    return MercurialNode(self.repos, p, self.n, self.manifest, self.mflags, **kwargs)
  File "c:\usr\python\lib\site-packages\tracmercurial-0.11.0.5-py2.5.egg\tracext\hg\backend.py", line 586, in __init__
    self._init_path(log, path)
  File "c:\usr\python\lib\site-packages\tracmercurial-0.11.0.5-py2.5.egg\tracext\hg\backend.py", line 625, in _init_path
    self.time = self.repos.hg_time(log.read(node)[2])
  File "c:\usr\python\lib\site-packages\mercurial\changelog.py", line 151, in read
    text = self.revision(node)
  File "c:\usr\python\lib\site-packages\mercurial\revlog.py", line 928, in revision
    rev = self.rev(node)
  File "c:\usr\python\lib\site-packages\mercurial\revlog.py", line 522, in rev
    raise LookupError(node, self.indexfile, _('no node'))
LookupError: 00changelog.i@None: no node

And my configuration if any:

Trac 0.11.1
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)]
Mercurial 1.0.2

in reply to:  4 comment:7 by Christian Boos, 16 years ago

Replying to Boris Kocherov <bkocherov@…>:

MercurialNode._init_path() is incorrect
If latest changes in dir "remove" some content _init_path() compute not correct node.

Ok, so it looks like you've identified another issue than the general slowness problem. Looking at your timing in comment:5, the patch doesn't seem to address speed but rather correctness after some removal happened.

It seems better that you create a new ticket, describing the problem you're trying to address, what the current version of the plugin (mention if you're using -0.11 or -0.12) is showing and what you would expect. Even better, attach a sample .bundle of a repository showing that issue. Thanks!

by Boris Kocherov <bkocherov@…>, 16 years ago

for mercurial-plugin-0.11

comment:8 by Boris Kocherov <bkocherov@…>, 16 years ago

I public new version patch.
Without patch:

bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q

real	2m15.650s
user	0m0.012s
sys	0m0.000s

With patch:

bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q

real	0m2.844s
user	0m0.004s
sys	0m0.000s

in reply to:  8 comment:9 by anonymous, 16 years ago

Replying to Boris Kocherov <bkocherov@…>:

I public new version patch.

Still does not work. But another traceback:

Traceback (most recent call last):
  File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\web\main.py", line 423, in _dispatch_request
    dispatcher.dispatch(req)
  File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\web\main.py", line 197, in dispatch
    resp = chosen_handler.process_request(req)
  File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\versioncontrol\web_ui\browser.py", line 361, in process_request
    'dir': node.isdir and self._render_dir(req, repos, node, rev),
  File "C:/usr/python/lib/site-packages/trac-0.11.1-py2.5.egg\trac\versioncontrol\web_ui\browser.py", line 407, in _render_dir
    entries = [entry(n) for n in node.get_entries()]
  File "c:\usr\python\lib\site-packages\tracmercurial-0.11.0.5-py2.5.egg\tracext\hg\backend.py", line 684, in get_entries
    yield self.subnode(entry, precomputed_node = entries.get(entry+"/", None))
AttributeError: 'NoneType' object has no attribute 'get'

comment:10 by Boris Kocherov <bkocherov@…>, 16 years ago

Send me ( Boris Kocherov <bkocherov@…> ) your's backend.py

comment:11 by Boris Kocherov <bkocherov@…>, 16 years ago

Send me ( Boris Kocherov <bkocherov AT gmail.com> ) your's backend.py

in reply to:  8 comment:12 by Christian Boos, 16 years ago

Replying to Boris Kocherov <bkocherov@…>:

I public new version patch.
Without patch: real 2m15.650s With patch: real 0m2.844s

Hm, thanks for getting me straight on this, I parsed "3 minutes" in comment:5, probably because I couldn't imagine it might be possible to talk about 3 seconds for a 35.5k files repository ;-)

This is good news indeed, I'll look at the patch more closely as soon as I can.

in reply to:  8 comment:13 by maxim.vuets@…, 16 years ago

Previous comment by anonymous was mine.

Replying to Boris Kocherov <bkocherov@…>:

Without patch:

With patch:

I just want to ensure that your second try with patched version was performed on "empty disk cache". I mean that subsequent instant query will be done much quickly just because OS fetches info not from disk, but from memory (cache). Look:

>sh -c "time perl lwp-download http://localhost/trac/browser >/dev/null"

real    2m5.313s
user    0m0.000s
sys     0m0.015s

>sh -c "time perl lwp-download http://localhost/trac/browser >/dev/null"

real    0m4.781s
user    0m0.000s
sys     0m0.015s

To clear that cache I do hg status on another repos, launch any heavy apps etc.

in reply to:  10 comment:14 by maxim.vuets@…, 16 years ago

Replying to Boris Kocherov <bkocherov@…>:

Send me ( Boris Kocherov <bkocherov@…> ) your's backend.py

I have exactly the same version as here: sandbox/mercurial-plugin-0.11/tracext/hg/backend.py@7592.

comment:15 by Boris Kocherov <bkocherov@…>, 16 years ago

After full reboot server:

first time (patched version):

bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q

real	0m14.680s
user	0m0.000s
sys	0m0.004s

later (patched version) from memory cache:

bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q

real	0m2.931s
user	0m0.000s
sys	0m0.008s
bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q

real	0m2.956s
user	0m0.012s
sys	0m0.000s

2Maxim
About you trace:
I expect what you send me your's patched version file backend.py.
Are you agree what restart or reinstall patched version backend.py?
Which patch you used? please use correct_init_path-0.11.patch.

in reply to:  15 comment:16 by maxim.vuets@…, 16 years ago

Replying to Boris Kocherov <bkocherov@…>:

After full reboot server:

first time (patched version):

bk@bk-laptop:~$ time wget http://raskon.org/browser -O /dev/null -q

real	0m14.680s
user	0m0.000s
sys	0m0.004s

Cool!

2Maxim
About you trace:
I expect what you send me your's patched version file backend.py.
Are you agree what restart or reinstall patched version backend.py?
Which patch you used? please use correct_init_path-0.11.patch.

I found out that TracMercurial 0.11 + correct_init_path-0.11.patch does not work with my big repo, but it works with new created empty repo.

We (me and Boris) have been contacted with each other via Jabber to investigate the issue. Just to avoid flame here (: Further results will be posted here by one of us later.

by Maxim Vuets <maxim.vuets@…>, 16 years ago

Boris Kocherov's patch for TracMercurial 0.11 (and 0.12)

comment:17 by Maxim Vuets <maxim.vuets@…>, 16 years ago

Good news! We have found a reason why Boris' patch worked well on his repo, but did not work at all on mine.

Here are my new timings:

>sh -c "time perl lwp-download http://localhost/trac/browser /dev/null >/dev/null"

real    0m3.625s
user    0m0.000s
sys     0m0.015s

>sh -c "time perl lwp-download http://localhost/trac/browser /dev/null >/dev/null"

real    0m0.797s
user    0m0.000s
sys     0m0.015s

So, Boris Kocherov have fixed the issue. Please see attached patch. The patch is also applied on TracMercurial 0.12 (but just with line number shifts which is not a problem). Previous patches may not work! Ignore them.

Boris also said that he know how to increase performance a little more. He is working on it now.

Thanks a lot Boris!

comment:18 by Christian Boos, 16 years ago

A quick comment: in findnode, you're deleting the current element dirname from the list you're iterating on (dirnames), which is wrong (see the Warning in http://www.python.org/doc/2.5.2/ref/for.html).

by Christian Boos, 16 years ago

My cleaned-up version of Boris' patch

comment:19 by Christian Boos, 16 years ago

I've started to work on this. First, a modified version of attachment:correct_init_path.0.11-2.patch, which fixes the problem mentioned in comment:18 and also removes the MercurialNode.mflags which are not needed anymore. See attachment:optimize_init_path-cboos.patch.

Also, I've started an alternative approach, which so far is slower because it doesn't use the changectx(rev).files() scanning strategy from findnode(). The problem with that findnode approach is that it's only a heuristic, and when there is a very old subfolder, this will force findnode to go through nearly all the changesets… This could eventually be more expensive than going through all the nodes for retrieving the log information. In practice this heuristic works quite well though.

To be continued :-)

Boris, don't hesitate to post your other improvements.

comment:20 by Boris Kocherov <bkocherov@…>, 16 years ago

I think you miss findnode() in 706 line attachment:optimize_init_path-cboos.patch . It completely disable "big optimization". I may wrong.

I think for perventing copy/paste needs merge findnode() with _init_node4dir().

by Boris Kocherov <bkocherov@…>, 16 years ago

My changed-up version of cboos' patch

comment:21 by Boris Kocherov <bkocherov@…>, 16 years ago

small changes attachment:optimize_init_path-cboos.patch in attachment:optimize_init_path-boris.patch

  • added MercurialNode.findnode() - preventing copy/paste
  • added MercurialNode.node - another small optimization
  • added raise TracError("Your's mercurial repository contain error.\nManifest contain file which not found in changelog.")

Thanks a lot cboos and Maxim!

I agree. To be continued :-)

comment:22 by anonymous, 16 years ago

Component: plugin/mercurialweb frontend/mod_python
Owner: changed from Christian Boos to anonymous
Priority: highestnormal
Severity: criticalmajor

comment:23 by Remy Blank, 16 years ago

Component: web frontend/mod_pythonplugin/mercurial
Owner: changed from anonymous to Christian Boos
Priority: normalhighest
Severity: majorcritical

comment:24 by anonymous, 16 years ago

Resolution: wontfix
Status: newclosed

comment:25 by Remy Blank, 16 years ago

Resolution: wontfix
Status: closedreopened

Stop that, please.

comment:26 by Maxim Vuets <maxim.vuets@…>, 16 years ago

Christian, these patches work excelent for me. Do you plan to close the ticket and apply the patches to the trunk?

comment:27 by Christian Boos, 16 years ago

Yes, as so far I failed to improve upon those patches ;-) I'll refresh the latest patch from Boris once I've fixed the 1.1 compatibility issues (#7856).

comment:28 by Christian Boos, 16 years ago

Resolution: fixed
Status: reopenedclosed

Finally committed a change quite close to attachment:optimize_init_path-boris.patch in r7744 (for 0.11) and r7745 (for trunk and MultipleRepositorySupport).

I had a few additional ideas, but so far they don't beat the above, so better share this version which is already quite good: Linux-sized repositories become now browsable.

A big thank you goes to Boris Kocherov!

in reply to:  28 comment:29 by Maxim Vuets <maxim.vuets@…>, 16 years ago

Replying to cboos:

Finally committed a change quite close to attachment:optimize_init_path-boris.patch in r7744 (for 0.11) and r7745 (for trunk and MultipleRepositorySupport).

Works perfectly for 0.11.

A big thank you goes to Boris Kocherov!

For sure. Also thanks a lot to you Christian for so quick reaction and fix!

comment:30 by Christian Boos, 16 years ago

There's another slight improvement I've been playing with: only scan the changelog with findnode once for a given folder, i.e. retrieve the node information for the current dir and its subdirectories at once. Only do a findnode in get_entries if not already done by the previous step. Somehow this only manages to gain another 5% speed increase or so, but maybe I did something wrong. Boris, if you're interested you could try to take this further?

by Christian Boos, 16 years ago

Try to call findnode only once for one directory listing

comment:31 by Boris Kocherov <bkocherov@…>, 15 years ago

Please review my another very small patch. It for MultipleRepositorySupport branch instead TracMercurial plugin. I do not measure speed increase.

by Boris Kocherov <bkocherov@…>, 15 years ago

Speedup "Expand sub-directory inplace" by searching in ⇐ changesets reveison number

comment:32 by Christian Boos, 15 years ago

Ah yes, good idea. But we could directly set rev, no?

What is given in data: is probably just appended to the what's specified by the URL, but even if not, we could just modify url: a.attr("href") by adding or replacing rev as appropriate. That way, no modification to the python module would be needed.

comment:33 by Boris Kocherov <bkocherov@…>, 15 years ago

ohh, yes. you right. i try it.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Christian Boos to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.