Edgewall Software

Ticket #6614 (closed defect: fixed)

Opened 9 months ago

Last modified 4 months ago

Memory leak using apache

Reported by: anonymous Owned by: cboos
Priority: highest Milestone: 0.11
Component: version control Version: 0.11b1
Severity: blocker Keywords: memory
Cc: ilias@…, pierreroth64@…, kirean@…, kamil@…, mbertheau@…, benjixx@…, manuzhai@…, bdelaage@…, jdthood@…, mailinglist@…, alicebot@…, tereutes@…, daniel@…, gt4329b@…, ammon@…, chris@…, kts_66@…, bubbamagic87@…, gustav.svensson@…, trac@…

Description

When I browse my source, my apache process increases its memory usage dangerously (more than 1 GB !), never freeing it. It is particularly visible when trying to display a directory content with several entries in it.

I'm currently using :

  • Apache 2.0.61
  • Python 2.4
  • Trac 0.11b1

Attachments

timeline_rendering_backtrace.log (5.3 kB) - added by cboos 9 months ago.
The backtrace for the exception discussed in comment:23
no-permission-store-cache.diff (0.7 kB) - added by cboos 8 months ago.
gives a cleaner picture of real memory usage
single-threaded-tracd.diff (501 bytes) - added by cboos 8 months ago.
disables multi-threading for tracd
fix-_history.diff (1.4 kB) - added by cboos 8 months ago.
better Pool usage in SubversionRepository._history()
explicit_gc.diff (0.8 kB) - added by cboos 8 months ago.
call gc.collect() after every request
debug-mem-growth.diff (2.7 kB) - added by cboos 8 months ago.
same as explicit_gc.diff + some stats about accumulated objects
timeline-expression-bug.log (33.7 kB) - added by cboos 8 months ago.
log a session with timeline-expression-bug.diff active
timeline-expression-bug.diff (0.6 kB) - added by cboos 8 months ago.
raise a ValueError? from an Expression evaluation immediately during timeline rendering
timeline-suite-bug.diff (1.5 kB) - added by cboos 8 months ago.
raise a ValueError? from a Suite execution immediately during timeline rendering
timeline-expression-bug-genshi-0.4.4.log (12.1 kB) - added by cboos 8 months ago.
log a session with timeline-expression-bug.diff active - this time using Genshi 0.4.4 instead of [G789]
render-using-StringIO-rr6677.patch (1.1 kB) - added by cboos 7 months ago.
This patch helps to reduce the maximal memory usage.

Change History

follow-up: ↓ 2   Changed 9 months ago by hyuga <hyugaricdeau@…>

Hard to say without more detail, but could this be a duplicate of #6581? Try the patch suggested there and see if that helps.

in reply to: ↑ 1   Changed 9 months ago by hyuga <hyugaricdeau@…>

Replying to hyuga <hyugaricdeau@gmail.com>:

Hard to say without more detail, but could this be a duplicate of #6581? Try the patch suggested there and see if that helps.

Nevermind, apparently the patch won't help with the high memory usage issue.

in reply to: ↑ description   Changed 9 months ago by cboos

  • keywords memory added
  • owner changed from jonas to cboos
  • component changed from general to version control
  • priority changed from normal to highest

Replying to anonymous:

It is particularly visible when trying to display a directory content with several entries in it.

Thanks for the hint. For some reason, I was focusing on the rendering of file in the TracBrowser, but that seems to work fine whereas the rendering of a folder indeed appears to leak.

follow-up: ↓ 5   Changed 9 months ago by cboos

Looks like this is also present when using TracMercurial and none of the fancy new stuff from 0.11 (no quickjump, no color scale). Maybe a <py:include> issue?

in reply to: ↑ 4   Changed 9 months ago by hyuga <hyugaricdeau@…>

Replying to cboos:

Looks like this is also present when using TracMercurial and none of the fancy new stuff from 0.11 (no quickjump, no color scale). Maybe a <py:include> issue?

Just out of curiosity, why do you think it's an issue with includes, or even anything having to do with Genshi?

  Changed 9 months ago by anonymous

  • cc xarquonster@… added

follow-up: ↓ 12   Changed 9 months ago by cboos

From my various tests, it seems that Trac memory is growing as long as new data is processed.

For example, when requesting 6000 different changesets on a t.e.o clone, the memory slowly grows from 100 to 200 Mbytes. But then, when requesting the same range of changesets again, there's no extra growth, i.e. memory usage is still 200 Mbytes. This is inconsistent with a "classical" memory leak.

So my current take on the issue is that perhaps a lot of strings get interned. From a superficial inspection, I couldn't find the origin of this, and I won't have time in the next days to go more in depth, so maybe someone else can explore that hint.

follow-up: ↓ 9   Changed 9 months ago by ThurnerRupert

how trac and apache are connected? mod_python?

in reply to: ↑ 8   Changed 9 months ago by anonymous

Replying to ThurnerRupert:

how trac and apache are connected? mod_python?

Yes, the problem occurs with mod_python.

  Changed 9 months ago by esaj

I'd be interested to know if this happens in mod_wsgi too.

  Changed 9 months ago by cboos

Happens with tracd as well, so I think this has nothing to do with apache and mod_* ...

in reply to: ↑ 7 ; follow-up: ↓ 13   Changed 9 months ago by osimons

Replying to cboos:

From my various tests, it seems that Trac memory is growing as long as new data is processed.

That is what I see too. As i browse new changesets it just keeps growing, but revisiting them makes little difference to memory - tried new user, and scoping the changeset display as well just to test if variations had any impact.

However, when browsing changesets, I see that a changeset with many diff's - like typically for a branch merge - can increase the memory with 25MB or more in one go. It does not get released again. Viewing a similar size changeset with only adds (initial import of a new project) leaves no noticeable impact on memory at all.

May the diff rendering code be a starting point for further research?

in reply to: ↑ 12 ; follow-up: ↓ 14   Changed 9 months ago by cboos

Sorry for the late reply - I thought I would be able to locate the precise problem, but as I'm not yet there, I'll post some intermediate findings.

Replying to osimons:

May the diff rendering code be a starting point for further research?

I don't think so: I forgot to mention that for the test mentioned in comment:7, I disabled the diff generation (same goes for the results below).

Here are some more results from my recent testings, using tracd on Linux x86_64, psycopg 2.0.6, latest Trac and Genshi.

It seems that there are actually two issues:

  1. memory always increases and is never returned to the system: Python 2.5 performs better here, as explained in http://evanjones.ca/memoryallocator
  2. Subversion bindings, or the way we're using them, cause a serious leak

In more details:

When doing a wget -r -np -l2 / on a t.e.o clone (no Subversion bindings):

  • Python 2.4.4 stabilizes at 303 Mbytes
  • Python 2.5.1 stabilizes at 138 Mbytes, with a peak at about 310 Mbytes

stabilizes means here that we no longer see an increase in the memory usage when running the same wget command again. Here we see that 2.5.1 does a better job at memory management, and that the overall memory usage is relatively reasonable. Still a bit high, but I think we can't expect to have less than that.

Now I did the same wget test, this time with the 1.4.3 bindings enabled, using repository_type = direct-svnfs to make things more apparent:

  • Python 2.4.4 stabilizes at 3338 Mbytes (yes, 3.3 Gbytes!)
  • Python 2.5.1 stabilizes at 2590 Mbytes

So while 2.5 does better, it's still pretty bad. The timeline queries seem to be the most expensive ones, sometimes adding 100 Mbytes each, when the changesets are accessed for the first time. When using repository_type = svn those timeline queries don't involve the svn bindings and are fast and lean.

in reply to: ↑ 13 ; follow-up: ↓ 16   Changed 9 months ago by anonymous

When using repository_type = svn those timeline queries don't involve the svn bindings and are fast and lean.

So a workaround is to use repository_type = svn for now, for those of us using 0.11b1?

  Changed 9 months ago by anonymous

  • cc pierreroth64@… added

in reply to: ↑ 14   Changed 9 months ago by cboos

Replying to anonymous:

When using repository_type = svn those timeline queries don't involve the svn bindings and are fast and lean.

So a workaround is to use repository_type = svn for now, for those of us using 0.11b1?

Well, not really, svn is anyway the default, and I doubt the people having the issue reported here had used direct-svnfs. The latter only makes the problem more apparent, but the bindings are nevertheless still used for the former, e.g. for the browser queries.

We currently can't use the cache for everything, but there are still opportunities to use it more (like in #6654). Future work (0.12, 0.13?) should aim at a complete by-pass of the svn bindings, or alternatively, should use the newer ctypes bindings.

follow-up: ↓ 18   Changed 9 months ago by cboos

Back to the point: I think I've isolated one of the issues this week-end, related to Repository._history() and the weird behavior of fs.node_history(path), fs.history_prev and fs.history_location, most notably when path is not the root. I wrote an alternative implementation of _history(), using temporary memory pools. At first, it seemed to help a lot, but actually it was the explicit call to gc.collect() that I added at the end of request processing, in order to get more reproducible results, which was the main reason for the memory savings!

In more details:

With the following change only:

  • trac/web/main.py

    diff --git a/trac/web/main.py b/trac/web/main.py
    a b  
    389389        return _dispatch_request(req, env, env_error) 
    390390    finally: 
    391391        if env and not run_once: 
     392            import gc 
     393            gc.collect() 
     394            print "Garbage collected..." 
    392395            env.shutdown(threading._get_ident()) 
    393396 
    394397def _dispatch_request(req, env, env_error): 

on the test case from comment:13 above, the memory usage drops to 504M (!). More specifically:

wget -r ... tracd 0.11 tracd 0.11, with explicit gc.collect()
1st run 2040M 476M
2nd run 3121M 504M
3rd run 3123M 504M

Also, in all my tests, the ThreadingMixIn was removed from the TracHTTPServer, in order to eliminate possible multi-threading issues.

Now, with the alternative implementation for _history(), using temporary pools:

    def _history(self, svn_path, start, end, pool):
        """`svn_path` must be a full scope path, UTF-8 encoded string.

        Generator yielding `(path, rev)` pairs, where `path` is an `unicode`
        object.
        Must start with `(path, created rev)`.
        """
        if start < end:
            start, end = end, start
        root = fs.revision_root(self.fs_ptr, start, pool())
        tmp1 = Pool(pool)
        tmp2 = Pool(pool)
        history_ptr = fs.node_history(root, svn_path, tmp1())
        cross_copies = 1
        while history_ptr:
            history_ptr = fs.history_prev(history_ptr, cross_copies, tmp2())
            tmp1.clear()
            tmp1, tmp2 = tmp2, tmp1
            if history_ptr:
                path, rev = fs.history_location(history_ptr, tmp2())
                tmp2.clear()
                if rev < end:
                    break
                path = _from_svn(path)
                if not self.authz.has_permission(path):
                    break
                yield path, rev
        del tmp1
        del tmp2

we can get even further savings, i.e.

wget -r ... tracd 0.11 tracd 0.11, with explicit gc.collect() and temp pools for _history()
1st run 2040M 425M
2nd run 3121M 453M
3rd run 3123M 453M

The additional savings (roughly 50M) are a hint that we can probably get even more savings by using temporary pools in other places. Or even that we should rewrite the whole Pool stuff, see #1401...

Of course, those findings are quite counter-intuitive. If everything would work "as expected", the above changes in _history() would actually make no difference on the long run, and the same could be said for the gc.collect(). I think that this shows that there's definitely something wrong going on with the management of the pools, inside the bindings.

This still doesn't explain why it seems this is a 0.11 issue, as the svn_fs code hasn't changed that much since 0.10.5dev. I should probably make the same test on a 0.10.5dev install as well.

to be continued...

in reply to: ↑ 17   Changed 9 months ago by cboos

To complement comment:17 with testing results using python 2.5:

wget -r ... tracd 0.11 tracd' 0.11 - 2.4 tracd' 0.11 - 2.5
1st run 2040M 425M 277M
2nd run 3121M 453M 294M
3rd run 3123M 453M 288M

where tracd' means "with explicit gc.collect() and temp pools for _history()".

Also, in case that wasn't clear, the wget -r -np -l2 / command is run 3 times in succession, without restarting tracd.

  Changed 9 months ago by cboos

  • severity changed from normal to blocker

Needless to say, this is a blocker issue.

Note that #5213 was probably related to the same cause (r6328 only tried to "hide" the issue). #6588 as well, and that one was reported against 0.10.4.

  Changed 9 months ago by cboos

I repeated the same test using several variants of 0.10.5dev tracd, using Python 2.4:

wget -r ... tracd
(1)
tracd +gc
(2)
tracd +0.11
(3)
tracd +0.11 +gc
(4)
tracd +0.11 +tmp
(5)
tracd +0.11 +tmp +gc
(6)
1st run 124M 120M 117.8M 113M 118M 115M
2nd run 125.4M 122.4M 119M 117.4M 119M 118.6M
3rd run 125.4M 122.4M 120.8M 117.4M 119M 118.6M

Notes:

  1. straight from source:/branches/0.10-stable@6401
  2. same as 1. + the gc.collect() after every request
  3. same as 1. + svn_fs.py was patched with the 0.11 changes.
    I verified that (3) doesn't leak, as it stabilizes at 120.8M for a 4th run.
  4. same as 3. + the gc.collect() after every request
  5. same as 3. + the temp pools for _history().
  6. same as 5. + the gc.collect() after every request

From those numbers, it looks like the 0.11 changes are not harmful, they even seem to reduce the memory usage a bit. It's interesting to see that the temp pools optimizations are not worth doing (3 better than 5), unless the explicit call to gc.collect() is also done (6 better than 3). However, the explicit gc.collect() call alone is better than combined with the tmp pool optimization! (4 better than 6). Doing only the gc.collect() call on stock 0.10.5dev doesn't buy much (2 vs. 1).

Also, besides the memory issue, 0.10.5dev is a lot faster than 0.11: one wget run takes about 4 minutes for 0.10.5dev (even 3m 20s when not doing the systematic gc.collect() call), versus 33 minutes for 0.11. But that's old news.

Summary of the findings so far:

  • From the present results, it seems that the svn changes alone have no real impact on the memory.
  • From comment:13, we can see the big difference it makes for 0.11 when running with or without the bindings
  • From comment:18, we can see that thanks to some specific optimizations, the huge memory requirements added by the bindings can be lowered a lot.

Surprisingly enough, the main optimization is simply to call gc.collect() at regular intervals. Also interesting is the fact that the temp pool optimization (the changes to SvnRepository._history() described in comment:17) seem to give an additional 50M savings in 0.11, whereas we've seen here that those savings should really be marginals, if any (4 vs 6).

follow-up: ↓ 26   Changed 9 months ago by hyuga <hyugaricdeau@…>

But what about the TracMercurial plugin? In comment:3 you said that this problem seems to happen with it too, in which case it would have nothing to do with Subversion. I haven't tried it myself, so I don't know. Do you still think the issue arises with TracMercurial?

  Changed 9 months ago by anonymous

  • cc kirean@… added

  Changed 9 months ago by anonymous

  • cc kamil@… added

  Changed 9 months ago by anonymous

  • cc gustavo@… added

  Changed 9 months ago by osimons

#5782 may be closely related to this issue.

in reply to: ↑ 21   Changed 9 months ago by cboos

Replying to hyuga <hyugaricdeau@gmail.com>:

But what about the TracMercurial plugin? In comment:3 you said that this problem seems to happen with it too, in which case it would have nothing to do with Subversion.

It happens as well with Mercurial, but to a lesser extent than with Subversion. Subversion makes the effects of the underlying problem very apparent, because each request will consume a lot of memory.

I still don't have a solution, but I made some progress: when there's an exception, it looks like the frame information is being referenced from somewhere. It's not due to cycles or uncollectable objects (gc.garbage remains empty, also with gc.DEBUG_UNCOLLECTABLE set). I'm not even sure that's the only reason, but that's definitely one reason, as I've got a constant +10M increase when repeating a given failing request, and by looking at the objects that are accumulated, they really correspond to what's reachable from the stack (and this is using Mercurial as a backend, even).

/me still deeply puzzled and looking...

  Changed 9 months ago by cboos

It's not the logger: modifed trac/log.py with

class FakeLogger(object):
    def info(self, *args, **kwargs):
        pass
    def debug(self, *args, **kwargs):
        pass
    def warn(self, *args, **kwargs):
        pass
    def warning(self, *args, **kwargs):
        pass
    def error(self, *args, **kwargs):
        pass
    def exception(self, *args, **kwargs):
        pass

def logger_factory(logtype='syslog', logfile=None, level='WARNING',
                   logid='Trac', format=None):
    return FakeLogger()

Same results.

  Changed 9 months ago by cboos

It's not the somewhat complex error handling in trac/web either. If I silence the error that I get during template generation like that:

  • trac/web/main.py

    diff -r 419b87217da6 trac/web/main.py
    a b  
    206206                        template, data, content_type = \ 
    207207                                  self._post_process_request(req, *resp) 
    208208                        if 'hdfdump' in req.args: 
    209                             req.perm.require('TRAC_ADMIN') 
     209                            #req.perm.require('TRAC_ADMIN') 
    210210                            # debugging helper - no need to render first 
    211211                            from pprint import pprint 
    212212                            out = StringIO() 
    213213                            pprint(data, out) 
    214214                            req.send(out.getvalue(), 'text/plain') 
    215215                        else: 
    216                             output = chrome.render_template(req, template, 
     216                            try: 
     217                                output = chrome.render_template(req, template, 
    217218                                                            data, content_type) 
     219                            except: 
     220                                pass 
     221                            output = '****' 
    218222                            # Give the session a chance to persist changes 
    219223                            if req.session: 
    220224                                req.session.save() 

Same results.

So probably back to genshi?

  Changed 9 months ago by ilias@…

  • cc ilias@… added

  Changed 9 months ago by cboos

The error I'm referring to in comment:26 is an exception raised in a WikiSyntaxProvider, inside a wiki formatting done while rendering the timeline.

The corresponding backtrace for that error can be seen in attachment:timeline_rendering_backtrace.log

Now, if I catch the exception before it goes through the template rendering, the memory usage is OK again. The live object count, as measured by doing a len(gc.get_objects()) after a gc.collect(), remains constant at 62642, after successive requests.

If I raise an exception on purpose after the template rendering, there's also no increase in memory usage.

But if the original exception is not caught, the live object count increase by ~50000 after each call.

I'll try to locate the exact point in the backtrace which cause the memory retention.

Changed 9 months ago by cboos

The backtrace for the exception discussed in comment:23

  Changed 9 months ago by anonymous

  • cc mbertheau@… added

  Changed 9 months ago by anonymous

  • cc benjixx@… added

  Changed 8 months ago by anonymous

  • cc manuzhai@… added

  Changed 8 months ago by anonymous

  • cc etix@… added

  Changed 8 months ago by anonymous

  • cc bdelaage@… added

  Changed 8 months ago by ilias@…

May I suggest to the team the following:

  • provide a wiki page which summarizes the findings
  • provide some debug-code which prints mem-usage data into the log file, thus test give specific results.
  • users can publish the test results (from different systems) on the wiki page

I can run the tests on 2 servers (one has a mem problem, the other one seems(!) not to have it). Possibly the configuration of the servers is the answer.

follow-up: ↓ 41   Changed 8 months ago by anonymous

FC5, r6432, tracd, svn, running for 3 days now: VSZ 123M, RSS 85M.

  Changed 8 months ago by cboos

To summarize the issues at this point:

  1. prefer Python 2.5 over Python 2.4, the former has much improved over the latter at giving back memory to the system
  2. the alternative _history implementation helps a bit, when Subversion is used (fix-_history.diff), but not that much
  3. doing frequent full garbage collections helps a lot (explicit_gc.diff)
  4. there's a major issue going on as soon as an exception is raised when evaluating a Genshi Expression or Suite.

For testing 4. one can use the following set of patches (against latest trunk - r6493):

Also attached is a log file showing the accumulation of objects when testing the timeline expression bug. The VM memory usage grows approximately by 125Mb for each request (/timeline, daysback=10, date=2005-02-08, repository_dir=trac repository, repository_type=direct-svnfs).

That huge amount of memory can be attributed to an excessive reuse of the main Pool by Repository.get_changesets() when direct-svnfs is used. The real problem however is that this pool is not collected because it's kept alive. It's kept alive because there are 9 sub-pools kept alive, each corresponding to the 9 changesets stored in the 'data' object... which is itself kept alive for some reason.

I still haven't figured out why this happens. One possible approach to fix the problem would be to catch exceptions at the Base._eval level, and propagating that up as an EXCEPTION event. At an upper level, probably just before serialization, we can try to raise the exception again without running into the memory issue (it seems that exceptions raised at upper levels don't trigger the issue). More work is needed here.

Changed 8 months ago by cboos

gives a cleaner picture of real memory usage

Changed 8 months ago by cboos

disables multi-threading for tracd

Changed 8 months ago by cboos

better Pool usage in SubversionRepository._history()

Changed 8 months ago by cboos

call gc.collect() after every request

Changed 8 months ago by cboos

same as explicit_gc.diff + some stats about accumulated objects

Changed 8 months ago by cboos

log a session with timeline-expression-bug.diff active

Changed 8 months ago by cboos

raise a ValueError? from an Expression evaluation immediately during timeline rendering

Changed 8 months ago by cboos

raise a ValueError? from a Suite execution immediately during timeline rendering

follow-up: ↓ 40   Changed 8 months ago by cboos

More findings:

  • among the frames in the alive objects, it seems that none refers to the data dictionary
  • when using Genshi:source:tags/0.4.4, the bug is still present but behaves differently: memory goes from 150Mb to 275Mb exactly like before, then stays at 275Mb for subsequent repeated requests (even slightly different, like when changing the data), i.e. it stops growing after the second failure.

See timeline-expression-bug-with-genshi-0.4.4.log.

Was anonymous also using Genshi 0.4.4?

Changed 8 months ago by cboos

log a session with timeline-expression-bug.diff active - this time using Genshi 0.4.4 instead of [G789]

in reply to: ↑ 39   Changed 8 months ago by cboos

Correcting the above, I meant:

even slightly different, like when changing the date

I also checked that [G799] makes no difference with [G789].

The difference of behavior between 0.4 and 0.5 can actually be explained by [G770]... Reverting that changeset on trunk makes it behave like 0.4. Of course, we known that we can't just revert that change (as this would re-introduce nasty multi-threading issues - #G158) and anyway this wouldn't remove the issue completely as the memory increment at the second error is still there, but I think this narrows the issue even more.

in reply to: ↑ 37 ; follow-up: ↓ 42   Changed 8 months ago by mbertheau@…

I'm anonymous. We're using Genshi 0.5dev [G789] and python 2.4.2.

in reply to: ↑ 41   Changed 8 months ago by cboos

Replying to mbertheau@gmail.com:

Great, so you just probably never had any exception raised within an Expression in a template :-)

  Changed 8 months ago by anonymous

  • cc jdthood@… added

(I try again to add my e-mail address to the cc field. If I just add the address then the submission gets rejected as spam. If you are reading this then it means that it helped to add a comment.)

  Changed 8 months ago by anonymous

  • cc mailinglist@… added

adding my email so I can be notified when this is fixed.

  Changed 8 months ago by alicebot@…

  • cc alicebot@… added

  Changed 8 months ago by anonymous

  • cc tereutes@… added

follow-up: ↓ 48   Changed 8 months ago by cboos

  • status changed from new to assigned

In r6593, I added explicit garbage collection after each request. This should already help a lot (see comment:17).

In r6594, I added the SubversionRepository._history optimization, which should also help a bit (comment:17 again).

Finally, using Python 2.5 in preference to 2.4 will also help greatly (see comment:18).

Some feedback would be useful, in order to see how well this performs in practice.

What's left to be done:

  • workaround the real memory leak that happens when an exception is raised from inlined expressions in Genshi templates (comment:38, comment:39 and comment:40).
  • further improve the memory usage in the Subversion backend (#5782)

in reply to: ↑ 47 ; follow-up: ↓ 49   Changed 8 months ago by ilias@…

Replying to cboos:

In r6593, I added explicit garbage collection after each request. This should already help a lot (see comment:17).

What effects has this on performance?

(I like to repeat my plea in comment:36, as the information gets lost within the ticket comments)

in reply to: ↑ 48   Changed 8 months ago by cboos

Replying to ilias@lazaridis.com:

Replying to cboos:

In r6593, I added explicit garbage collection after each request. This should already help a lot (see comment:17).

What effects has this on performance?

Approximately a 10% slowdown, IIRC.

  Changed 8 months ago by daniel@…

  • cc daniel@… added

follow-up: ↓ 53   Changed 8 months ago by cboos

Concerning the current trunk - [6603], I'm still interested in getting feedback in term of memory usage.

Some news concerning the Genshi issue: I've not yet been able to trigger the problem purely at the Genshi level, so I'll go from the Trac side, stripping things away.

So far, I've found that the problem is not exactly tied to an exception being raised from an expression.

Having even an UndefinedError triggered from inside the py:for block in the timeline template triggers the same problem, with the same memory usage as in attachment:timeline-expression-bug.log (first request goes up to 146Mb, then 271Mb, then 396Mb, etc.)

  • trac/timeline/templates/timeline.html

     
    3232      </form> 
    3333 
    3434      <py:for each="day, events in groupby(events, key=lambda e: format_date(e.date))"> 
     35      ${say.hello} 
    3536        <h2>${day}: ${day == today and 'Today' or day == yesterday and 'Yesterday' or None}</h2> 
    3637        <dl> 
    3738          <py:for each="event in events" 

Now, changing that groupby to a simple day in event loop, things are even more interesting:

  • trac/timeline/templates/timeline.html

     
    3131       </div> 
    3232      </form> 
    3333 
    34       <py:for each="day, events in groupby(events, key=lambda e: format_date(e.date))"> 
     34      <py:for each="day in events"> 
     35      ${say.hello} 
    3536        <h2>${day}: ${day == today and 'Today' or day == yesterday and 'Yesterday' or None}</h2> 
    3637        <dl> 
    3738          <py:for each="event in events" 

Here, the memory usage starts with 146Mb, then 271Mb, then stays at 271Mb for every other repeated requests.

Seeing that, I've modified timeline.html even more drastically:

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:py="http://genshi.edgewall.org/"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  ${say.hello}
</html>

The memory usage starts with 145Mb, then 270Mb, then stays at 270Mb.

Now, thinking again even the following results look suspicious:

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:py="http://genshi.edgewall.org/"
      xmlns:xi="http://www.w3.org/2001/XInclude">
</html>

i.e. dummy template, no error generated: memory starts and stays at 145Mb. Why? This is with Python 2.5, the memory allocated during this request should be returned to the system.

Note that with an empty template, I get a TemplateSyntaxError: no element found error and the memory usage is the same as above, starts and stays at 145Mb.

... to be continued