Edgewall Software

Opened 14 years ago

Closed 14 years ago

#2913 closed defect (fixed)

Memory Leak on SVN Repo Access using tracd or WSGI

Reported by: edsuom Owned by: Jonas Borgström
Priority: high Milestone: 0.10
Component: version control/browser Version: devel
Severity: major Keywords: wsgi, tracd, memory, leak
Cc: ed@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description

A memory leak is apparent in the Trac SVN browser when it is accessed via tracd or WSGI, both of which leave the Python interpreter running between web requests. Every time a source file page from SVN is loaded via your web browser, memory usage goes up by over 1 MB. I did some snooping on the Python classes with the most refcounts and noted the following new, non-garbage-collected class instances added with every reload of a source file from SVN with the Trac browser:

  • Four of the following:
libsvn.core.apr_pool_tPtr
  • Two of each of the following:
libsvn.core.apr_pool_t
libsvn.repos.svn_repos_tPtr
libsvn.fs.svn_fs_root_tPtr
libsvn.fs.svn_fs_tPtr
  • One of each of the following:
libsvn.repos.svn_repos_t
libsvn.fs.svn_fs_root_t
libsvn.fs.svn_fs_t

Then I did some further snooping with logging statements inserted into source:root/trunk/trac/versioncontrol/svn_fs.py. That revealed that every source file reload from the SVN browser instantiates trac.versioncontrol.svn_fs.Pool twice. One of the two instances is the self._parent_pool for the other.
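For reference, this kind of snooping can be done with the gc module. The sketch below is a minimal, modern-Python illustration (not my exact logging code); the Leaky class is a stand-in for the libsvn types listed above:

```python
import gc
from collections import Counter

def live_instance_counts():
    """Tally live, gc-tracked objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())

class Leaky(object):  # stand-in for e.g. libsvn.core.apr_pool_tPtr
    pass

before = live_instance_counts()["Leaky"]
hoard = [Leaky() for _ in range(4)]  # simulate one page reload leaking 4 objects
after = live_instance_counts()["Leaky"]
print(after - before)  # 4
```

Comparing two such tallies taken before and after a page reload shows exactly which classes accumulate.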

This is a pretty serious problem for anyone wanting to run a robust trac site via WSGI or by reverse proxying to tracd. Indeed, I heard the comment on IRC that tracd is a memory hog, and I imagine this issue is exactly the reason for that perception.

Attachments (3)

diff (2.9 KB ) - added by edsuom 14 years ago.
Non-working attempt at caching instances of trac.versioncontrol.svn_fs.Pool
cached-pool.diff (9.0 KB ) - added by edsuom 14 years ago.
Fixes leak by caching versioncontrol.svn_fs.Pool instances
svn_pool_cache.diff (4.4 KB ) - added by Matthew Good 14 years ago.
updated patch to follow TracDev/CodingStyle (still slightly broken)


Change History (13)

comment:1 by edsuom, 14 years ago

Here are the results of some more snooping with logging statements inserted into source:trunk/trac/versioncontrol/svn_fs.py. (The link in the original ticket doesn't work, but this one does.)

  • Of the two instances of trac.versioncontrol.svn_fs.Pool created with every browser reload, the one that is self._parent_pool for the other has application_pool as its own self._parent_pool.
  • Neither of the two instances has its destroy or __del__ methods called.
  • There does not seem to be any memory leak associated with browser accesses to directories of an SVN repo, as opposed to source files. Interestingly, trac.versioncontrol.svn_fs.Pool objects instantiated from directory browser access do have their __del__ methods called, and the result of self.valid() comes back as True, so they are destroyed. Hence no memory leak there.
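One plausible way for destroy/__del__ to silently never run, sketched here with a stand-in Pool class (hypothetical illustration, not the actual Trac code): a reference that outlives the request, such as a saved exception traceback, pins the frame that holds the pool, so finalization is deferred until that reference is dropped.

```python
import gc
import sys

class Pool(object):  # stand-in for trac.versioncontrol.svn_fs.Pool
    destroyed = 0
    def __del__(self):
        Pool.destroyed += 1

def handle_request():
    pool = Pool()
    raise RuntimeError("highlighter blew up")  # e.g. a failed renderer

saved = None
try:
    handle_request()
except RuntimeError:
    saved = sys.exc_info()[2]  # holding the traceback pins the frame's locals

gc.collect()
print(Pool.destroyed)  # 0 -- the Pool is still reachable via the traceback

saved = None  # drop the traceback...
gc.collect()
print(Pool.destroyed)  # 1 -- ...and the Pool is finally finalized
```

If something between requests retains tracebacks (or any object graph containing the pools), __del__ never fires and the pools accumulate.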

by edsuom, 14 years ago

Attachment: diff added

Non-working attempt at caching instances of trac.versioncontrol.svn_fs.Pool

comment:2 by edsuom, 14 years ago

Keywords: wsgi tracd memory leak added

I spent a lot of time today trying to resolve this problem by caching instances of trac.versioncontrol.svn_fs.Pool but am abandoning the effort for two reasons:

  • It only helped keep new instances of libsvn.core.apr_pool_tPtr from being constructed on page reloads. The other offending libsvn objects multiplied like rabbits just as before.
  • The caching is broken; I couldn't figure out how to fully account for Pool instances that were no longer valid.

I've attached the patch for possibly historical interest, but don't recommend trying to use it.

I've done everything I'm capable of on this ticket and request that someone please take it on and fix it so I can continue with my WSGI trac deployment. Thanks!

comment:3 by Christian Boos, 14 years ago

Well, I was not able to reproduce this, neither on Windows nor on Linux, using Subversion 1.3.0.

A few more details about your setup would be useful.

If you're using Subversion 1.3.0, be sure to check the advice given in TracSubversion#TracandSubversion1.3.0; in particular, be sure to use a compatible SWIG if you apply the patch for issue #2611.

by edsuom, 14 years ago

Attachment: cached-pool.diff added

Fixes leak by caching versioncontrol.svn_fs.Pool instances

comment:4 by edsuom, 14 years ago

Cc: ed@… added
Milestone: 0.10

I am pleased to announce that the patch cached-pool.diff fixes the problem of this ticket by modifying source:trunk/trac/versioncontrol/svn_fs.py to cache instances of versioncontrol.svn_fs.Pool. With the patch, RAM usage still increases with repeated reloads of a given page via the SVN browser, but the RAM is released when the browser goes somewhere else, instead of being allocated forever as a memory leak. Instead of having RAM usage climb quickly past 100 MB from modest SVN repo access, it stays at around 30-45 MB regardless of whether the HTTP server is run via WSGI or the standalone tracd.

When an instance of versioncontrol.svn_fs.Pool is constructed for the first time with a given path and parent pool (supplied as keywords), it creates a new SVN memory pool and caches it in a class-wide dictionary keyed by up to two keywords: a path and the SVN pool object of the pool's own parent. On subsequent instantiations with the same combination of keywords, the Pool constructor retrieves the SVN memory pool from the cache and uses it if it's still valid, instead of constructing a new pool afresh.

Validating cache entries is a bit tricky. If a cached pool is itself valid, the cacheAccess method explores the pool's ancestry all the way up to the top-level None parent pool object. The exploration loop breaks if any invalid pool is found along the way, so if the exploration completes, all ancestors of the cached pool must be valid. In that case, the Pool instance's _pool attribute is assigned the cached pool and the exploration terminates successfully.
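In rough outline, the caching scheme works like this. The sketch below is a simplified stand-alone illustration, not the actual patch: SvnPool and its valid() method stand in for the SWIG-level pool objects, and ancestry validation is omitted.

```python
class SvnPool(object):
    """Stand-in for a low-level SVN/APR memory pool."""
    def __init__(self):
        self._valid = True
    def valid(self):
        return self._valid
    def destroy(self):
        self._valid = False

class Pool(object):
    """Pythonic pool wrapper that reuses cached low-level pools."""
    _cache = {}  # (path, id(parent_pool)) -> SvnPool

    def __init__(self, path=None, parent_pool=None):
        key = (path, id(parent_pool) if parent_pool is not None else None)
        cached = Pool._cache.get(key)
        if cached is not None and cached.valid():
            self._pool = cached          # reuse instead of allocating afresh
        else:
            self._pool = SvnPool()       # allocate and remember for next time
            Pool._cache[key] = self._pool

# Two instantiations with the same key share one low-level pool:
a = Pool(path="/repo")
b = Pool(path="/repo")
print(a._pool is b._pool)  # True

# Once the cached pool is invalidated, a fresh one is created:
a._pool.destroy()
c = Pool(path="/repo")
print(c._pool is a._pool)  # False
```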

I look forward to seeing this committed to trunk soon! I hereby license it for such inclusion under the same terms as Trac itself.

Edwin A. Suominen

comment:5 by Matthew Good, 14 years ago

Well, I'm glad the caching has made Trac usable for you, but it seems like it's really only slowing the leak, not a "fix". Maybe it's unavoidable, but I think it could use some more investigation.

Also, when submitting patches please try to follow TracDev/CodingStyle. In particular, Trac uses lowercase_with_underscores for functions, variables and class members. I made some changes to the patch to follow the conventions and to keep the Pool.__init__ arguments ordered so that it's backwards-compatible. It's not quite working, but I figured I'd hold off on fixing it since I'd like to get some other opinions on whether caching is the best solution.

by Matthew Good, 14 years ago

Attachment: svn_pool_cache.diff added

updated patch to follow TracDev/CodingStyle (still slightly broken)

comment:6 by Christian Boos, 14 years ago

I have some big reservations about that patch. As I said before, the reality of the leak itself is still questionable to me: I simply don't have it. If you follow the instructions in TracSubversion, in particular if you pay attention to the compatible SWIG versions, I don't think a leak of pool objects will happen. Does someone else also have the leak? mgood?

What happens when you insert the following debug statements?

  • trac/versioncontrol/svn_fs.py

     
             # Destroy pool
             self.apr_pool_destroy(self._pool)
    
    +        import gc
    +        gc.set_debug(gc.DEBUG_UNCOLLECTABLE)
    +        print "GarbageCollect: ", gc.collect()
    +        print "Garbage: ", len(gc.garbage)
    +
             # Clear application pool and terminate APR if necessary
             if not self._parent_pool:
                 application_pool = None

You should see this kind of output in your log:

...
GarbageCollect:  5
Garbage:  0
Trac[svn_fs] DEBUG: Closing subversion file-system at C:/Workspace/local/svn/trac-svnrepos
GarbageCollect:  0
Garbage:  0

comment:7 by Matthew Good, 14 years ago

No, I can't actually reproduce this either on Ubuntu Dapper w/ SVN 1.3.0 and SWIG 1.3.27. I also added a debugging test to track the active Pool objects, and it always drops to 1 after each request (1 instance is always kept as the application_pool):

  • trac/versioncontrol/svn_fs.py

     
     class Pool(object):
         """A Pythonic memory pool object"""
    
    +    _pools = set()
    +
         # Protect svn.core methods from GC
         apr_pool_destroy = staticmethod(core.apr_pool_destroy)
         apr_terminate = staticmethod(core.apr_terminate)
    ...
             # Destroy pool
             self.apr_pool_destroy(self._pool)
    
    +        import gc
    +        from sys import stderr
    +        gc.set_debug(gc.DEBUG_UNCOLLECTABLE)
    +        print >>stderr, 'GarbageCollect:', gc.collect()
    +        print >>stderr, 'Garbage:', len(gc.garbage)
    +
             # Clear application pool and terminate APR if necessary
             if not self._parent_pool:
                 application_pool = None
                 self.apr_terminate()
    
             self._mark_invalid()
    +        print >>stderr, 'Pools:', len(Pool._pools)
    
         def __del__(self):
             """Automatically destroy memory pools, if necessary"""
    ...
             # mark pool as valid
             self._is_valid = lambda: 1
    +        Pool._pools.add(id(self))
    
         def _mark_invalid(self):
             """Mark pool as invalid"""
    ...
                 del self._parent_pool
                 if hasattr(self, "_weakref"):
                     del self._weakref
    +            Pool._pools.remove(id(self))
    
    
     # Initialize application-level pool

comment:8 by anonymous, 14 years ago

Found the source of the leak, finally. It appears in my installation when Trac is forced to fall back to enscript for source highlighting as a result of exception handling. Now that I've got Trac working with SilverCity v0.9.5 (annoyingly, v0.9.6 doesn't work!), there's no leak. Here's some pertinent IRC discussion on the topic. My take on this is that the first exception caused by SilverCity either being missing or choking on the v0.9.6 incompatibility (which ought to have its own ticket if it doesn't already!) should set a flag somewhere, perhaps a class attribute of the class that tried to run SilverCity, so Trac avoids trying it again. If the sysadmin fixes the environment so SilverCity will work, he will naturally expect to have to restart the server in order to see whether the change worked.

[11:22] <edsuom> cmlenz: You know the memory leak of mine?
[11:22] <cmlenz> yeah
[11:22] <edsuom> Try forcing your trac to use enscript instead of silvercity and see what happens :-(
[11:24] <cmlenz> you're saying it has sth to do with the syntax coloring mechanism?
[11:25] <edsuom> cmlenz: The "fix" for the leak for me was to get silvercity working.
[11:25] <cmlenz> if that's the case, I'd suspect NaivePopen
[11:26] <-- [a]tomi has left this server ("optimizing...").
[11:26] <edsuom> cmlenz: I did some pretty elaborate gc tracking and saw the svn pool stuff instances growing along with the exceptions caused by silvercity.
[11:26] <edsuom> First it was the import exception, from it not being there. Then it was the 2-tuple error, from 0.9.6. In both cases, it leaked.
[11:27] <cmlenz> so you're seeing the leaks with a broken silvercity, not with enscript?
[11:28] <cmlenz> weird that that would leak
[11:28] <edsuom> cmlenz: I was seeing the leak when trac was forced to use enscript. Now that I've got it working with silvercity, there's no leak.
[11:28] <cmlenz> maybe something with references to pools in the traceback
[11:29] <cmlenz> ... creating a cycle
[11:29] <edsuom> cmlenz: That sounds like a good guess. I think the first silvercity exception should make trac quit trying silvercity for subsequent requests and just go to enscript.
[11:30] <edsuom> I don't think it's ever wise to rely on exception handling as a routine part of your code's operation.
[11:30] <cmlenz> remembering something like that between requests adds significant overhead, though
[11:31] <cmlenz> in python you often rely on importerrors etc to fallback for optional dependenices
[11:31] <edsuom> cmlenz: I don't see why. Just use a class attribute wherever the decision is being made to try silvercity.
[11:32] <cmlenz> so to use silvercity, the admin needs to restart the server... I can see the tickets already :-P
[11:32] <edsuom> To *add* silvercity, you do. It's pretty widely expected that you restart a server when you make changes to the guts of it.
[11:32] <pacopablo> eh, you have to restart the server for pretty much any config cahnge anyway
[11:33] * inviso_ nods
[11:33] <cmlenz> pacopablo: that's not the intention, actually
[11:33] <cmlenz> changes to the server config, yes, but not changes to the trac config
[11:33] <pacopablo> well no, but if you use just about anything other than CGi that's the reality
[11:33] <inviso_> It's honestly not an unexpected thing though. Nothing wrong with a server restart to bring in new modules. I'd do it anyway to reload newer versions
[11:34] <edsuom> cmlenz: I'm not talking about config changes. I'm saying that if the *environment* causes an exception when you try to do something, it's good practice to set a flag to avoid trying to do that something again. If you fix the environment, you restart the server.
[11:34] <cmlenz> probably the simplest solution would be to not even register the SilverCity component if the import fails
[11:35] <edsuom> cmlenz: Sure, although the same leak problem happened when it raised the exception about 2-tuple from silvercity 0.9.6.
[11:35] <cmlenz> gotta go... bbiab
[11:35] <-- cmlenz has left this server.
[11:37] <edsuom> I'm gonna copy this discussion to ticket #2913 and hope that one of you guys picks it up...I've spent a *huge* amount of time investigating it and someone can do the fixing.
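The fallback-flag idea from the discussion above could look roughly like this. This is a hypothetical sketch: the class and method names are invented, and Trac's real highlighting/component machinery differs.

```python
class SourceHighlighter(object):
    _silvercity_broken = False  # set after the first failure, per process

    def render(self, content):
        if not SourceHighlighter._silvercity_broken:
            try:
                return self._render_silvercity(content)
            except Exception:
                # Remember the failure so later requests skip straight to
                # the fallback instead of raising (and leaking) every time.
                SourceHighlighter._silvercity_broken = True
        return self._render_enscript(content)

    def _render_silvercity(self, content):
        # Simulate a broken install (missing module, 0.9.6 incompatibility, ...)
        raise ImportError("no module named SilverCity")

    def _render_enscript(self, content):
        return "<pre>%s</pre>" % content

h = SourceHighlighter()
print(h.render("x = 1"))                      # <pre>x = 1</pre>
print(SourceHighlighter._silvercity_broken)   # True -- later requests won't retry
```

As cmlenz suggested, an even simpler variant would be to not register the SilverCity component at all when its import fails at startup.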

comment:9 by Christian Boos, 14 years ago

So… that was fixed by r3241, right?

comment:10 by Christian Boos, 14 years ago

Resolution: fixed
Status: new → closed

Fixed by r3241 and r3341.
