Opened 19 years ago
Closed 19 years ago
#2913 closed defect (fixed)
Memory Leak on SVN Repo Access using tracd or WSGI
Reported by: | edsuom | Owned by: | Jonas Borgström |
---|---|---|---|
Priority: | high | Milestone: | 0.10 |
Component: | version control/browser | Version: | devel |
Severity: | major | Keywords: | wsgi, tracd, memory, leak |
Cc: | ed@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
A memory leak is apparent in the trac SVN browser when it is accessed via tracd or WSGI, both of which leave the Python interpreter running between web requests. Every time a source file page from SVN is loaded via your web browser, memory usage goes up by over 1 MB. I did some snooping on the Python classes with the most refcounts (a sketch of this kind of counting follows the list below) and noted the following new, non-garbage-collected class instances added with every reload of a source file from SVN with the trac browser:
- Four of the following: `libsvn.core.apr_pool_tPtr`
- Two of each of the following: `libsvn.core.apr_pool_t`, `libsvn.repos.svn_repos_tPtr`, `libsvn.fs.svn_fs_root_tPtr`, `libsvn.fs.svn_fs_tPtr`
- One of each of the following: `libsvn.repos.svn_repos_t`, `libsvn.fs.svn_fs_root_t`, `libsvn.fs.svn_fs_t`
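For reference, this kind of per-class instance counting can be done with the standard `gc` module. The following is only a minimal sketch of the approach, not the exact snooping code used, written in the Python 2 style of this era:

```python
import gc

def count_instances():
    # Tally every object the garbage collector knows about, by class name.
    counts = {}
    for obj in gc.get_objects():
        name = obj.__class__.__name__
        counts[name] = counts.get(name, 0) + 1
    return counts

before = count_instances()
# ... reload a source file page via the SVN browser here ...
after = count_instances()
for name in sorted(after):
    delta = after.get(name, 0) - before.get(name, 0)
    if delta > 0:
        print '%s: %+d' % (name, delta)
```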
Then I did some further snooping with logging statements inserted into source:root/trunk/trac/versioncontrol/svn_fs.py. That revealed that every source file reload from the SVN browser instantiates `trac.versioncontrol.svn_fs.Pool` twice. One of the two instances is the `self._parent_pool` for the other.
This is a pretty serious problem for anyone wanting to run a robust trac site via WSGI or by reverse proxying to tracd. Indeed, I heard the comment on IRC that tracd is a memory hog, and I imagine this issue is exactly the reason for that perception.
Attachments (3)
Change History (13)
comment:1 by , 19 years ago
by , 19 years ago
Non-working attempt at caching instances of `trac.versioncontrol.svn_fs.Pool`
comment:2 by , 19 years ago
Keywords: | wsgi tracd memory leak added |
---|
I spent a lot of time today trying to resolve this problem by caching instances of `trac.versioncontrol.svn_fs.Pool`, but am abandoning the effort for two reasons:
- It only kept new instances of `libsvn.core.apr_pool_tPtr` from being constructed on page reloads. The other offending `libsvn` objects multiplied like rabbits just as before.
- The caching is broken; I couldn't figure out how to fully account for `Pool` instances that were no longer valid.
I've attached the patch for possible historical interest, but don't recommend trying to use it.
I've done everything I'm capable of on this ticket and request that someone please take it on and fix it so I can continue with my WSGI trac deployment. Thanks!
comment:3 by , 19 years ago
Well, I was not able to reproduce this on either Windows or Linux, using Subversion 1.3.0.
A few more details about your setup would be useful.
If you're using Subversion 1.3.0, be sure to check the advice given in TracSubversion#TracandSubversion1.3.0; in particular, be sure to use a compatible SWIG if you apply the patch for issue #2611.
by , 19 years ago
Attachment: | cached-pool.diff added |
---|
Fixes leak by caching `versioncontrol.svn_fs.Pool` instances
comment:4 by , 19 years ago
Cc: | added |
---|---|
Milestone: | → 0.10 |
I am pleased to announce that the patch cached-pool.diff fixes the problem of this ticket by modifying source:trunk/trac/versioncontrol/svn_fs.py to cache instances of `versioncontrol.svn_fs.Pool`. With the patch, RAM usage still increases with subsequent reloads of a given page via the SVN browser, but the RAM is released when the browser goes somewhere else, instead of being allocated forever and forming a memory leak. Instead of having RAM usage climb quickly past 100 MB from modest SVN repo access, it seems to stay at around 30-45 MB regardless of whether the HTTP server is run via WSGI or the standalone tracd.
When an instance of `versioncontrol.svn_fs.Pool` is constructed for the first time with a given path and parent pool (supplied as keywords), it creates a new SVN memory pool and caches it in a class-wide dictionary keyed by up to two values: the path and the SVN pool object of the pool's own parent. On subsequent instantiations with the same combination of keywords, the `Pool` constructor retrieves the SVN memory pool from the cache and uses it if it's still valid, instead of constructing a new pool afresh.
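In outline, the constructor logic looks something like the following. This is only a minimal sketch with illustrative names; `_CACHE`, `_still_valid`, and the `object()` stand-in for the Subversion pool-creation call are mine, not identifiers from the patch:

```python
class Pool(object):
    """Sketch of the pool-caching idea described above."""

    _CACHE = {}  # class-wide: (path, parent pool id) -> SVN memory pool

    def __init__(self, path=None, parent_pool=None):
        self._parent_pool = parent_pool
        key = (path, parent_pool and id(parent_pool._pool))
        pool = Pool._CACHE.get(key)
        if pool is None or not self._still_valid(pool):
            # Cache miss or stale entry: create a fresh SVN pool.
            # The real code calls into the Subversion bindings here.
            pool = object()
            Pool._CACHE[key] = pool
        self._pool = pool

    def _still_valid(self, pool):
        # Stand-in for the ancestry check described below.
        return True
```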
Validating cache entries is a bit tricky. If a cached pool is itself valid, the `cacheAccess` method explores the pool's ancestry all the way up to the top-level `None` parent pool object. The exploration loop breaks if any invalid pool is found along the way, so if the exploration completes, all ancestors of the cached pool must be valid. In that case, the instance's `_pool` attribute is assigned the cached pool and the exploration terminates successfully.
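The exploration amounts to a loop like the following; this is a paraphrase of the patch's `cacheAccess` method, with `valid()` and `parent` as illustrative attribute names:

```python
def cache_access(self, cached_pool):
    # Climb from the cached pool through its ancestry; the
    # top-level pool has a parent of None.
    pool = cached_pool
    while pool is not None:
        if not pool.valid():
            return False        # an invalid ancestor poisons the entry
        pool = pool.parent
    # Every ancestor checked out, so the cached pool is safe to reuse.
    self._pool = cached_pool
    return True
```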
I look forward to seeing this committed to trunk soon! I hereby license it for such inclusion under the same terms as Trac itself.
Edwin A. Suominen
comment:5 by , 19 years ago
Well, I'm glad the caching has made Trac usable for you, but it seems like it's really only slowing the leak, not a "fix". Maybe it's unavoidable, but I think it could use some more investigation.
Also, when submitting patches please try to follow TracDev/CodingStyle. In particular, Trac uses `lowercase_with_underscores` for functions, variables and class members. I made some changes to the patch to follow the conventions, and kept the `Pool.__init__` arguments ordered so that it's backwards-compatible. It's not quite working, but I figured I'd hold off on fixing it since I'd like to get some other opinions on whether caching is the best solution.
by , 19 years ago
Attachment: | svn_pool_cache.diff added |
---|
updated patch to follow TracDev/CodingStyle (still slightly broken)
comment:6 by , 19 years ago
I have some big reservations about that patch. As I said before, the reality of the leak itself is still questionable to me: I simply don't have it. If you follow the instructions in TracSubversion, in particular if you pay attention to the compatible SWIG versions, I don't think a leak of pool objects will happen. Does someone else also have the leak? mgood?
What happens when you insert the following debug statements?
In `trac/versioncontrol/svn_fs.py`:

```diff
         # Destroy pool
         self.apr_pool_destroy(self._pool)
 
+        import gc
+        gc.set_debug(gc.DEBUG_UNCOLLECTABLE)
+        print "GarbageCollect: ", gc.collect()
+        print "Garbage: ", len(gc.garbage)
+
         # Clear application pool and terminate APR if necessary
         if not self._parent_pool:
             application_pool = None
```
You should see this kind of output in your log:
```
...
GarbageCollect: 5
Garbage: 0
Trac[svn_fs] DEBUG: Closing subversion file-system at C:/Workspace/local/svn/trac-svnrepos
GarbageCollect: 0
Garbage: 0
```
comment:7 by , 19 years ago
No, I can't actually reproduce this either, on Ubuntu Dapper with SVN 1.3.0 and SWIG 1.3.27. I also added a debugging test to track the active `Pool` objects, and the count always drops to 1 after each request (1 instance is always kept as the `application_pool`):
In `trac/versioncontrol/svn_fs.py`:

```diff
 class Pool(object):
     """A Pythonic memory pool object"""
 
+    _pools = set()
+
     # Protect svn.core methods from GC
     apr_pool_destroy = staticmethod(core.apr_pool_destroy)
     apr_terminate = staticmethod(core.apr_terminate)
…
         # Destroy pool
         self.apr_pool_destroy(self._pool)
 
+        import gc
+        from sys import stderr
+        gc.set_debug(gc.DEBUG_UNCOLLECTABLE)
+        print >>stderr, 'GarbageCollect:', gc.collect()
+        print >>stderr, 'Garbage:', len(gc.garbage)
+
         # Clear application pool and terminate APR if necessary
         if not self._parent_pool:
             application_pool = None
             self.apr_terminate()
 
         self._mark_invalid()
+        print >>stderr, 'Pools:', len(Pool._pools)
 
     def __del__(self):
         """Automatically destroy memory pools, if necessary"""
…
 
         # mark pool as valid
         self._is_valid = lambda: 1
+        Pool._pools.add(id(self))
 
     def _mark_invalid(self):
         """Mark pool as invalid"""
…
             del self._parent_pool
         if hasattr(self, "_weakref"):
             del self._weakref
+        Pool._pools.remove(id(self))
 
 
 # Initialize application-level pool
```
comment:8 by , 19 years ago
Found the source of the leak, finally. It appears in my installation when Trac is forced to use enscript for source highlighting as a result of exception handling. Now that I've got Trac working with SilverCity v0.9.5 (annoyingly, v0.9.6 doesn't work!), there's no leak. Here's some pertinent IRC discussion on the topic. My take on this is that the first exception caused by SilverCity either being missing or choking due to the v0.9.6 incompatibility (which ought to have its own ticket if it doesn't already!) should set a flag somewhere, perhaps a class attribute of the class that tried to run SilverCity, to avoid repeating the attempt on subsequent requests (a sketch of the idea follows the IRC log below). If the sysadmin fixes the environment so SilverCity will work, he will naturally expect to have to restart the server in order to see if the change worked.
```
[11:22] <edsuom> cmlenz: You know the memory leak of mine?
[11:22] <cmlenz> yeah
[11:22] <edsuom> Try forcing your trac to use enscript instead of silvercity and see what happens :-(
[11:24] <cmlenz> you're saying it has sth to do with the syntax coloring mechanism?
[11:25] <edsuom> cmlenz: The "fix" for the leak for me was to get silvercity working.
[11:25] <cmlenz> if that's the case, I'd suspect NaivePopen
[11:26] <-- [a]tomi has left this server ("optimizing...").
[11:26] <edsuom> cmlenz: I did some pretty elaborate gc tracking and saw the svn pool stuff instances growing along with the exceptions caused by silvercity.
[11:26] <edsuom> First it was the import exception, from it not being there. Then it was the 2-tuple error, from 0.9.6. In both cases, it leaked.
[11:27] <cmlenz> so you're seeing the leaks with a broken silvercity, not with enscript?
[11:28] <cmlenz> weird that that would leak
[11:28] <edsuom> cmlenz: I was seeing the leak when trac was forced to use enscript. Now that I've got it working with silvercity, there's no leak.
[11:28] <cmlenz> maybe something with references to pools in the traceback
[11:29] <cmlenz> ... creating a cycle
[11:29] <edsuom> cmlenz: That sounds like a good guess. I think the first silvercity exception should make trac quit trying silvercity for subsequent requests and just go to enscript.
[11:30] <edsuom> I don't think it's ever wise to rely on exception handling as a routine part of your code's operation.
[11:30] <cmlenz> remembering something like that between requests adds significant overhead, though
[11:31] <cmlenz> in python you often rely on importerrors etc to fallback for optional dependenices
[11:31] <edsuom> cmlenz: I don't see why. Just use a class attribute wherever the decision is being made to try silvercity.
[11:32] <cmlenz> so to use silvercity, the admin needs to restart the server... I can see the tickets already :-P
[11:32] <edsuom> To *add* silvercity, you do. It's pretty widely expected that you restart a server when you make changes to the guts of it.
[11:32] <pacopablo> eh, you have to restart the server for pretty much any config cahnge anyway
[11:33] * inviso_ nods
[11:33] <cmlenz> pacopablo: that's not the intention, actually
[11:33] <cmlenz> changes to the server config, yes, but not changes to the trac config
[11:33] <pacopablo> well no, but if you use just about anything other than CGi that's the reality
[11:33] <inviso_> It's honestly not an unexpected thing though. Nothing wrong with a server restart to bring in new modules. I'd do it anyway to reload newer versions
[11:34] <edsuom> cmlenz: I'm not talking about config changes. I'm saying that if the *environment* causes an exception when you try to do something, it's good practice to set a flag to avoid trying to do that something again. If you fix the environment, you restart the server.
[11:34] <cmlenz> probably the simplest solution would be to not even register the SilverCity component if the import fails
[11:35] <edsuom> cmlenz: Sure, although the same leak problem happened when it raised the exception about 2-tuple from silvercity 0.9.6.
[11:35] <cmlenz> gotta go... bbiab
[11:35] <-- cmlenz has left this server.
[11:37] <edsuom> I'm gonna copy this discussion to ticket #2913 and hope that one of you guys picks it up...I've spent a *huge* amount of time investigating it and someone can do the fixing.
```
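The fail-once flag suggested above might look roughly like this; the class and attribute names are hypothetical, not Trac's actual rendering API:

```python
class SilverCityRenderer(object):
    _disabled = False  # class attribute: set after the first failure

    def highlight(self, content):
        if SilverCityRenderer._disabled:
            raise NotImplementedError('SilverCity unavailable; fall back')
        try:
            import SilverCity
        except ImportError:
            # Remember the failure so later requests skip straight to
            # the enscript fallback; after fixing the environment, the
            # admin restarts the server, which clears the flag.
            SilverCityRenderer._disabled = True
            raise
        # ... invoke SilverCity on `content` here ...
        return content
```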
Here are the results of some more snooping with logging statements inserted into source:trunk/trac/versioncontrol/svn_fs.py. (The link in the original ticket doesn't work, but this one does.)
- Of the two instances of `trac.versioncontrol.svn_fs.Pool` created with every browser reload, the one that is `self._parent_pool` for the other has `application_pool` as its own `self._parent_pool`.
- Neither of the two instances ever has its `destroy` or `__del__` methods called.
- The `trac.versioncontrol.svn_fs.Pool` objects instantiated from directory browser access do have their `__del__` methods called, and the result of `self.valid()` comes back as `True`, so they are destroyed. Hence no memory leak there.
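One way to make observations like these is to hold weak references to the pools and see which referents survive a request; a minimal, hypothetical sketch of such instrumentation (not code from svn_fs.py):

```python
import weakref

_tracked = []  # debugging scratch: a weak reference per Pool created

def track(pool):
    # A weak reference doesn't keep the pool alive, so a destroyed
    # pool shows up as a reference whose referent is None.
    _tracked.append(weakref.ref(pool))

def report():
    alive = [r for r in _tracked if r() is not None]
    print 'Pools still alive: %d of %d' % (len(alive), len(_tracked))
```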