#3504 closed defect (fixed)
AssertionError in trac.db.pool.PooledConnection.__del__
Reported by: | | Owned by: | Christian Boos |
---|---|---|---|
Priority: | high | Milestone: | 0.10 |
Component: | general | Version: | devel |
Severity: | critical | Keywords: | database connection pool |
Cc: | exarkun@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
On nearly every request, my server logs this error:
Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0x89faeacc> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0xb47cc88c>> ignored
Attachments (6)
Change History (50)
comment:1 by , 18 years ago
Cc: | added |
---|
follow-up: 3 comment:2 by , 18 years ago
I sometimes get this error message too on FreeBSD 6.1 AMD64, with Apache 2.2.3 and FastCGI 2.4.2, using Trac 0.10-dev (r3623).
When this error occurs, the message is repeatedly written to the apache error log until that partition gets filled. Restarting the webserver gets things going again until the same problem occurs a few days later.
comment:3 by , 18 years ago
Milestone: | → 0.10.1 |
---|---|
Severity: | normal → critical |
Replying to anonymous:
… When this error occurs, the message is repeatedly written to the apache error log until that partition gets filled.
Ouch! That sounds serious. Putting a tentative milestone on this.
comment:4 by , 18 years ago
First, what's the database backend you're using, and what are the versions used (for the bindings and for the db itself)?
comment:5 by , 18 years ago
I apologize - inspecting the logs after this happened again shows that a different error from the one originally reported in this ticket is being repeatedly added to the error log file. I'll open a new ticket to describe those errors. I apologize again for the inconvenience caused.
comment:6 by , 18 years ago
Keywords: | database connection pool added |
---|---|
Milestone: | 0.10.1 → 0.10 |
Owner: | changed from | to
Severity: | critical → blocker |
Version: | 0.9.6 → devel |
The `AssertionError` reported here is indicative of a very serious issue (see #3742, which I just closed as duplicate of this one).
Now the good news is that I think I understood the problem… The pool.py code assumes that garbage collection of a PooledConnection will happen in the same thread as the one in which the instance was created. I believe this is a wrong assumption.
In turn, this messes up the logic of the pool, and after a while, when all the slots for new connections have been exhausted, every new thread will become stuck waiting on the `_available` condition variable, giving the impression that the application freezes.
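To make that failure mode concrete, here is a minimal sketch (illustrative only, not Trac's actual pool.py; names are simplified) of a bounded pool guarded by a condition variable. If a slot is leaked because its cleanup was skipped, every later acquire() ends up waiting forever:

```python
import threading

class TinyPool(object):
    """Illustrative bounded pool -- not the real trac.db.pool code."""
    def __init__(self, maxsize, connect):
        self._available = threading.Condition()
        self._maxsize = maxsize   # max number of simultaneous connections
        self._cursize = 0         # slots currently handed out
        self._connect = connect   # factory for new backend connections

    def acquire(self):
        self._available.acquire()
        try:
            while self._cursize >= self._maxsize:
                # If release() is never called for a leaked slot (e.g. the
                # cleanup ran in the wrong thread and was skipped), every
                # caller blocks here forever -- the "freeze" described above.
                self._available.wait()
            self._cursize += 1
        finally:
            self._available.release()
        return self._connect()

    def release(self):
        self._available.acquire()
        try:
            self._cursize -= 1
            self._available.notify()
        finally:
            self._available.release()
```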
If that explains my freezes on #3332, I would be very happy!
comment:7 by , 18 years ago
Status: | new → assigned |
---|
More detailed explanation:
- with tracd, each request is processed by a dedicated thread. It happens (1) that some PooledConnections created by this thread don't get garbage collected before the thread exits.
- those PooledConnections were therefore garbage collected in another thread, and this triggered the assertion failure discussed above.
- however, even if the assertion is avoided by carrying the right thread identifier around so that the proper connection can be freed from the other thread, a pysqlite ProgrammingError would then be raised, because pysqlite doesn't allow a connection object to be manipulated from any thread other than the one in which it was created (see the sketch below).
- therefore, we must ensure that before the thread exits, the pysqlite Connection object created by it is properly closed.
The attachment:force_close_at_thread_exit.patch implements that.
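To illustrate the pysqlite constraint from point 3 above, a small standalone sketch (assuming the pysqlite2 bindings of that era; the sqlite3 module in later Pythons behaves the same, and the file path is made up):

```python
import threading
try:
    from pysqlite2 import dbapi2 as sqlite   # pysqlite 2.x, as used by Trac 0.10
except ImportError:
    import sqlite3 as sqlite                 # same behaviour in Python >= 2.5

cnx = sqlite.connect('/tmp/example.db')      # created in the main thread

def use_from_other_thread():
    try:
        cnx.execute('SELECT 1')
    except sqlite.ProgrammingError, e:
        # pysqlite enforces thread affinity: a Connection may only be
        # used from the thread that created it
        print 'cross-thread use rejected:', e

t = threading.Thread(target=use_from_other_thread)
t.start()
t.join()
```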
I'm not entirely satisfied with the patch, though, as we still force the creation/disposal of a pysqlite connection, even if no db connection is needed for processing the request (think Chrome).
Perhaps doing an `env.signal_thread_exits(threading._get_ident())` that gets propagated by the environment to the db pool would be a way…
(1) what I still don't understand is:
- why do we see this bug only now? It's been there for ages.
- why does it seem to be triggered so quickly by the report module (copy a report, modify the SQL, delete it, repeat that a few times and bang, it hangs)?
by , 18 years ago
Attachment: | force_close_at_thread_exit.patch added |
---|
by , 18 years ago
Attachment: | selective_shutdown_at_thread_exit.patch added |
---|
Another possible fix for this ticket, still on top of r3747. This solution is a bit more satisfying, as it doesn't require creating a Connection when there's no need to, so an additional benefit of this patch could be a small performance boost.
comment:8 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Fixed in r3756, by applying attachment:selective_shutdown_at_thread_exit.patch (which also contained r3755).
comment:9 by , 18 years ago
by , 18 years ago
Attachment: | even_more_threadsafe_pool-r3756.diff added |
---|
Now the PooledConnection remembers which thread it was created in. If GC of the instance happens in another thread, we avoid doing the cleanup from there, and instead defer the cleanup of the Connection until the thread that created it terminates.
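The idea can be sketched roughly like this (a hypothetical simplification of the attached diff; `_return_cnx` and `_defer_cleanup` are made-up names standing in for the pool's internals):

```python
import threading

class PooledConnection(object):
    """Sketch of the deferred-cleanup idea -- not the actual patch."""
    def __init__(self, pool, cnx):
        self._pool = pool
        self._cnx = cnx
        self._tid = threading._get_ident()   # thread that created us

    def __del__(self):
        if threading._get_ident() == self._tid:
            # same thread: safe to give the connection back directly
            self._pool._return_cnx(self._tid, self._cnx)
        else:
            # foreign thread: don't touch cnx here; just mark it so the
            # creating thread closes it when it terminates
            self._pool._defer_cleanup(self._tid)
```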
comment:10 by , 18 years ago
Well, attachment:even_more_threadsafe_pool-r3756.diff implements an even more robust cleanup, which should really avoid the risk of an AssertionError and of closing a Connection from a different thread.
But this starts to become heavy, especially for backends which don't support pooling in the first place ;)
comment:11 by , 18 years ago
FWIW, I updated to r3756 and have seen the ignored AssertionError once since then.
comment:12 by , 18 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Oh, then I think it's worth applying the latest patch.
by , 18 years ago
Attachment: | even_more_threadsafe_pool-r3762.diff added |
---|
Ok, this last patch seems to be more correct. Please review. (take 2)
comment:14 by , 18 years ago
I'll update and try it out soon. In the meantime, I woke up this morning to hundreds or thousands of occurrences of tracebacks like this in my logfile:
2006/09/22 07:06 CDT [-] Exception rendering:
2006/09/22 07:06 CDT [-] Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/threadpool.py", line 148, in _worker
    context.call(ctx, function, *args, **kwargs)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 59, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 37, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/trac/Projects/Twisted/trunk/twisted/web2/wsgi.py", line 138, in run
    result = self.application(self.environment, self.startWSGIResponse)
  File "/home/trac/Run/trac/external.py", line 74, in __call__
    return self.application(environ, start_response)
  File "/home/trac/Run/trac/external.py", line 115, in tracApplication
    return trac.web.main.dispatch_request(environ, start_response)
  File "/home/trac/Projects/trac/trunk/trac/web/main.py", line 389, in dispatch_request
    req.send_error(sys.exc_info(), status=500)
  File "/home/trac/Projects/trac/trunk/trac/web/api.py", line 375, in send_error
    data = self.hdf.render(template)
  File "/home/trac/Projects/trac/trunk/trac/web/clearsilver.py", line 288, in render
neo_util.Error: Traceback (innermost last):
  File "csparse.c", line 298, in cs_parse_file()
  File "neo_files.c", line 86, in ne_load_file_len()
SystemError: Unable to open file /home/trac/Projects/trac/trunk/templates/error.cs: [24] Too many open files
Before restarting I checked /proc to find out what files were open. About a thousand of the open file descriptors referred to trac.db.
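For reference, the same check can be scripted from Python rather than reading /proc by hand (a Linux-specific sketch; the /proc layout is assumed):

```python
import os

# count and list this process's open file descriptors (Linux /proc based)
pid = os.getpid()
fds = os.listdir('/proc/%d/fd' % pid)
print '%d open file descriptors' % len(fds)
for fd in fds:
    try:
        # each entry is a symlink to the underlying file, e.g. .../trac.db
        print fd, '->', os.readlink('/proc/%d/fd/%s' % (pid, fd))
    except OSError:
        pass   # fd may have been closed between listdir() and readlink()
```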
comment:15 by , 18 years ago
I see you're running Trac with Twisted… could it be that the 'wsgi.multithread' WSGI environment variable is not correctly set? Because that would explain the error. Check by adding a "print" in source:trunk/trac/web/main.py@3762#L361.
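Something along these lines, as a hypothetical illustration of the suggested print (the surrounding dispatch_request body is elided):

```python
# hypothetical instrumentation near the top of trac.web.main.dispatch_request
def dispatch_request(environ, start_response):
    print 'wsgi.multithread =', environ.get('wsgi.multithread'), \
          '/ wsgi.multiprocess =', environ.get('wsgi.multiprocess')
    # ... the rest of the real dispatching logic follows unchanged ...
```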
PS: on a somewhat related note, can you point me to some instructions about how to use Trac within Twisted? Thanks!
follow-up: 17 comment:16 by , 18 years ago
The variable is true. Is that right? :) I have no idea. I do know the server doesn't fork.
The site is running based on the stuff here:
http://cvs.twistedmatrix.com/cvs/sandbox/exarkun/t.web2-wsgi-trac/
It's somewhat specialized for our particular configuration, but it might still be useful as a starting point if you want to run trac in Twisted. (I'm not sure this is a great thing to want just yet, Twisted's wsgi support is very new and not yet entirely stable.)
comment:17 by , 18 years ago
comment:18 by , 18 years ago
I've also stress-tested r3763 with a brand new PostgreSQL 8.1.4 (pyPgSQL-2.5.1.win32-py2.4), and at first it seemed pretty fast and robust (… and no `database is locked` anymore, joy!).
But upon closer inspection, I discovered that the pool was created over and over again. Hence the following fix:
Index: trac/db/api.py
===================================================================
--- trac/db/api.py      (revision 3763)
+++ trac/db/api.py      (working copy)
@@ -73,7 +73,8 @@
     def shutdown(self, tid=None):
         if self._cnx_pool:
             self._cnx_pool.shutdown(tid)
-            self._cnx_pool = None
+            if not tid:
+                self._cnx_pool = None

     def _get_connector(self): ### FIXME: Make it public?
         scheme, args = _parse_db_str(self.connection_uri)
After that, things seemed even faster, if possible ;)
Now I'll test back with pysqlite, plus test both backends in single thread mode.
comment:19 by , 18 years ago
Ok, with r3764 it seems really solid with sqlite/postgres, with tracd multi/single-threaded on Windows. Now I'll test on Linux…
comment:20 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Tested on Linux with apache/prefork (not that relevant, I know) and tracd, both with pysqlite. I verified that tracd doesn't leak file handles.
I'm pretty confident we can close this now.
So, if the leak of open files still occurs with Twisted even with r3764, please open a new ticket, targeted to 0.10.1.
comment:21 by , 18 years ago
The file leak does appear to be fixed. However, the AssertionError still occurs, although apparently less frequently. Since upgrading on Friday, it has only happened twice:
2006/09/22 21:10 CDT [HTTPChannel,327,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0xb7794dac> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0xb4b75dec>> ignored
2006/09/25 05:12 CDT [HTTPChannel,25208,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0xb7770ecc> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0x87c248c>> ignored
follow-ups: 23 24 comment:22 by , 18 years ago
Uh oh, right, there was a bug:
Index: pool.py
===================================================================
--- pool.py     (revision 3788)
+++ pool.py     (working copy)
@@ -122,8 +122,8 @@
                     cnx.rollback() # non-poolable but same thread: close
                     cnx.close()
                 self._cursize -= 1
-            else: # non-poolable, different thread: do nothing
-                self._active[tid][0] = 0
+            else: # non-poolable, different thread: push it back
+                self._active[tid] = (0, cnx)
             self._available.notify()

     def shutdown(self, tid=None):
In some rather convoluted circumstances, this could lead to the AssertionError…
In case anyone's interested in the details: two PooledConnections (a and b) in the same thread (say A) are ready to be GCed; if a happens to be GCed in another thread, it will take the buggy `self._active[tid][0] = 0` code path and the `self._active[A]` slot is now empty; after that, if a new PooledConnection is created in A, a new cnx will be created and put into `self._active[A]`; then, when b gets GCed, the AssertionError will be raised…
comment:23 by , 18 years ago
Follow-up to comment:22:
In case anyone's interested in the details: …
Oh well, the above scenario turns out not to be possible, as only one PooledConnection can trigger the _cleanup. It must be something else, similarly hairy…
comment:24 by , 18 years ago
Another follow-up to comment:22:
Uh ho, right, there was a bug:
...
The correct and tested fix is r3791.
follow-up: 27 comment:26 by , 18 years ago
From my server log running r3791:
2006/09/26 14:57 CDT [HTTPChannel,5859,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0xb76c418c> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0xb583268c>> ignored
comment:27 by , 18 years ago
Milestone: | 0.10 → 0.10.1 |
---|---|
Resolution: | fixed |
Severity: | blocker → major |
Status: | closed → reopened |
Replying to anonymous:
From my server log running r3791: …
Please add some debugging info:
Index: pool.py
===================================================================
--- pool.py     (revision 3791)
+++ pool.py     (working copy)
@@ -102,6 +102,12 @@
         try:
             if tid in self._active:
                 num, cnx_ = self._active.get(tid)
+                if cnx is not cnx_:
+                    print 'In [%d], _active -> %s\n  _dormant -> %s' % \
+                          (threading._get_ident(),
+                           repr(self._active), repr(self._dormant))
+                    print 'PooledConnection created from [%d] %s' % \
+                          (tid, repr(cnx))
                 assert cnx is cnx_
                 if num > 1:
                     self._active[tid][0] = num - 1
follow-up: 29 comment:28 by , 18 years ago
Some output from the log file with the above instrumentation added:
2006/09/26 21:16 CDT [HTTPChannel,2702,198.49.126.190] In [-1240794192], _active -> {-1276134480: [1, <trac.db.sqlite_backend.SQLiteConnection object at 0x8e4b70c>]}
2006/09/26 21:16 CDT [HTTPChannel,2702,198.49.126.190]   _dormant -> []
2006/09/26 21:16 CDT [HTTPChannel,2702,198.49.126.190] PooledConnection created from [-1276134480] <trac.db.sqlite_backend.SQLiteConnection object at 0x8e3a5cc>
2006/09/26 21:16 CDT [HTTPChannel,2702,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0x8e4b8ac> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0x8e4144c>> ignored
2006/09/26 21:16 CDT [HTTPChannel,2884,198.49.126.190] In [-1267741776], _active -> {-1218258000: [1, <trac.db.sqlite_backend.SQLiteConnection object at 0x8730d0c>]}
2006/09/26 21:16 CDT [HTTPChannel,2884,198.49.126.190]   _dormant -> []
2006/09/26 21:16 CDT [HTTPChannel,2884,198.49.126.190] PooledConnection created from [-1218258000] <trac.db.sqlite_backend.SQLiteConnection object at 0x8ab222c>
2006/09/26 21:16 CDT [HTTPChannel,2884,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0x8ab2ecc> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0x8ab25ec>> ignored
2006/09/26 21:16 CDT [HTTPChannel,3021,198.49.126.190] In [-1364345936], _active -> {-1267741776: [2, <trac.db.sqlite_backend.SQLiteConnection object at 0x85e3aac>]}
2006/09/26 21:16 CDT [HTTPChannel,3021,198.49.126.190]   _dormant -> []
2006/09/26 21:16 CDT [HTTPChannel,3021,198.49.126.190] PooledConnection created from [-1267741776] <trac.db.sqlite_backend.SQLiteConnection object at 0x85d380c>
2006/09/26 21:16 CDT [HTTPChannel,3021,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0x8604f8c> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0xb584b5ac>> ignored
I don't know if this is related, but very shortly afterwards:
2006/09/26 21:17 CDT [-] Exception rendering:
2006/09/26 21:17 CDT [-] Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/threadpool.py", line 148, in _worker
    context.call(ctx, function, *args, **kwargs)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 59, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 37, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/trac/Projects/Twisted/trunk/twisted/web2/wsgi.py", line 138, in run
    result = self.application(self.environment, self.startWSGIResponse)
  File "/home/trac/Run/trac/external.py", line 74, in __call__
    return self.application(environ, start_response)
  File "/home/trac/Run/trac/external.py", line 115, in tracApplication
    return trac.web.main.dispatch_request(environ, start_response)
  File "/home/trac/Projects/trac/trunk/trac/web/main.py", line 389, in dispatch_request
    req.send_error(sys.exc_info(), status=500)
  File "/home/trac/Projects/trac/trunk/trac/web/api.py", line 375, in send_error
    data = self.hdf.render(template)
  File "/home/trac/Projects/trac/trunk/trac/web/clearsilver.py", line 289, in render
    return template.render()
neo_util.Error: Traceback (innermost last):
  File "csparse.c", line 3170, in cs_render()
  File "csparse.c", line 3156, in render_node()
  File "csparse.c", line 2410, in each_eval()
  File "csparse.c", line 3156, in render_node()
  File "csparse.c", line 2410, in each_eval()
  File "csparse.c", line 3156, in render_node()
  File "csparse.c", line 2073, in var_eval()
  File "neo_cs.c", line 142, in render_cb()
  File "neo_str.c", line 124, in string_append()
  File "neo_str.c", line 105, in string_check_length()
MemoryError: Unable to allocate STRING buf of size 1024
and then
2006/09/26 21:17 CDT [-] Exception rendering:
2006/09/26 21:17 CDT [-] Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/threadpool.py", line 148, in _worker
    context.call(ctx, function, *args, **kwargs)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 59, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 37, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/trac/Projects/Twisted/trunk/twisted/web2/wsgi.py", line 138, in run
    result = self.application(self.environment, self.startWSGIResponse)
  File "/home/trac/Run/trac/external.py", line 64, in __call__
    path_parts = filter(None, path_info.split('/'))
exceptions.MemoryError:
and finally
2006/09/26 21:17 CDT [HTTPChannel,3293,198.49.126.190] Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
  File "/home/trac/Projects/Twisted/trunk/twisted/python/threadpool.py", line 148, in _worker
    context.call(ctx, function, *args, **kwargs)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 59, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 37, in callWithContext
    return func(*args,**kw)
  File "/home/trac/Projects/Twisted/trunk/twisted/web2/wsgi.py", line 142, in run
    reactor.callFromThread(self.__error, failure.Failure())
exceptions.MemoryError:
comment:29 by , 18 years ago
Replying to exarkun@divmod.com:
The various `MemoryError`s mean that your server is leaking memory, for some reason.
I hope it's not due to our little pool issue ;)
I think I finally understood what's going on with the AssertionError. It can be explained if thread ids are reused quickly, before all the PooledConnections created in the previous "same" thread are GCed. Can you try out attachment:debug_thread_reuse.diff in order to verify this hypothesis?
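A tiny standalone sketch of the hypothesis (illustrative only; whether the id actually repeats is platform-dependent, but with short-lived threads it usually does):

```python
import threading

def report(label):
    print '%s ran as thread id %d' % (label, threading._get_ident())

for label in ('first', 'second'):
    t = threading.Thread(target=report, args=(label,))
    t.start()
    t.join()   # once 'first' is gone, its id may be handed to 'second'
```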
by , 18 years ago
Attachment: | debug_thread_reuse.diff added |
---|
Some more debugging code to check the "rapid thread id reuse" hypothesis
comment:30 by , 18 years ago
… and if this is indeed the problem, the fix would be to simply acknowledge that this can happen:
pool.py:

         try:
             if tid in self._active:
                 num, cnx_ = self._active.get(tid)
-                assert cnx is cnx_
-                if num > 1:
-                    self._active[tid][0] = num - 1
-                else:
-                    self._cleanup(tid)
+                if cnx is cnx_:
+                    if num > 1:
+                        self._active[tid][0] = num - 1
+                    else:
+                        self._cleanup(tid)
+                # otherwise, cnx was already cleaned up during a
+                # shutdown(tid), and in the meantime, `tid` has been
+                # reused (#3504)
         finally:
             self._available.release()
comment:31 by , 18 years ago
It just occurred to me that you're probably using a thread pool in Twisted. Is that the case? That would explain why you're seeing that error relatively often. With tracd on Windows, I do observe thread id reuse, but in this case the reuse interval is probably large enough to give all the PooledConnections time to be GCed before the reuse happens. With a thread pool, the reuse interval is certainly much shorter.
But no matter whether the thread id gets reused because of a thread pool or because the OS creates a new thread with a "recycled" id, the analysis and "fix" remain correct, I think.
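With Twisted's thread pool the effect is easy to see: the same worker thread (hence the same id) services one request after another. A minimal sketch using twisted.python.threadpool (the pool sizes are chosen purely for illustration):

```python
import thread
from twisted.python.threadpool import ThreadPool

def show_id(n):
    print 'request %d handled by thread id %d' % (n, thread.get_ident())

tp = ThreadPool(minthreads=1, maxthreads=1)   # one worker, maximal reuse
tp.start()
for n in range(3):
    tp.callInThread(show_id, n)   # all three runs report the same id
tp.stop()
```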
comment:32 by , 18 years ago
Milestone: | 0.10.1 → 0.10 |
---|
It's probably still time to put that patch in 0.10.
comment:34 by , 18 years ago
Resolution: | → fixed |
---|---|
Severity: | major → critical |
Status: | reopened → closed |
So I've committed r3798.
exarkun: I'd like to see what attachment:debug_thread_reuse.diff (on top of r3797) produces, if you find time to do this test.
follow-up: 36 comment:35 by , 18 years ago
I updated to r3798 before I noticed that patch, so I modified it slightly in order to get it all to apply. The difference is that the second-to-last hunk in pool.py now looks like:
@@ -102,6 +123,15 @@
         try:
             if tid in self._active:
                 num, cnx_ = self._active.get(tid)
+                if cnx is not cnx_:
+                    print 'ConnectionPool is ', repr(self)
+                    print 'In [%d], _active -> %s\n  _dormant -> %s' % \
+                          (threading._get_ident(),
+                           repr(self._active), repr(self._dormant))
+                    print 'Available connections: %d/%d' % \
+                          (self._cursize, self._maxsize)
+                    print 'PooledConnection created from [%d] %s' % \
+                          (tid, repr(cnx))
                 if cnx is cnx_:
                     if num > 1:
                         self._active[tid][0] = num - 1
Here's some preliminary output.
2006/09/27 09:51 CDT [HTTPChannel,10,198.49.126.190] [-1217746000] THREAD REUSED (10)
2006/09/27 09:51 CDT [HTTPChannel,10,198.49.126.190] [-1217746000] THREAD REUSED (10)
2006/09/27 09:51 CDT [HTTPChannel,10,198.49.126.190] [-1217746000] THREAD REUSED (10)
2006/09/27 09:51 CDT [HTTPChannel,14,198.49.126.190] [-1217746000] THREAD REUSED (14)
2006/09/27 09:51 CDT [HTTPChannel,14,198.49.126.190] [-1217746000] THREAD REUSED (14)
2006/09/27 09:51 CDT [HTTPChannel,14,198.49.126.190] [-1217746000] THREAD REUSED (14)
2006/09/27 09:51 CDT [HTTPChannel,17,198.49.126.190] [-1217746000] THREAD REUSED (17)
2006/09/27 09:51 CDT [HTTPChannel,17,198.49.126.190] [-1217746000] THREAD REUSED (17)
2006/09/27 09:51 CDT [HTTPChannel,17,198.49.126.190] [-1217746000] THREAD REUSED (17)
2006/09/27 09:51 CDT [HTTPChannel,18,198.49.126.190] [-1217746000] THREAD REUSED (18)
2006/09/27 09:51 CDT [HTTPChannel,18,198.49.126.190] [-1217746000] THREAD REUSED (18)
2006/09/27 09:51 CDT [HTTPChannel,18,198.49.126.190] [-1217746000] THREAD REUSED (18)
2006/09/27 09:51 CDT [HTTPChannel,19,198.49.126.190] [-1217746000] THREAD REUSED (19)
2006/09/27 09:51 CDT [HTTPChannel,19,198.49.126.190] [-1217746000] THREAD REUSED (19)
2006/09/27 09:51 CDT [HTTPChannel,19,198.49.126.190] [-1217746000] THREAD REUSED (19)
2006/09/27 09:51 CDT [HTTPChannel,20,198.49.126.190] [-1217746000] THREAD REUSED (20)
2006/09/27 09:51 CDT [HTTPChannel,20,198.49.126.190] [-1217746000] THREAD REUSED (20)
2006/09/27 09:52 CDT [HTTPChannel,30,198.49.126.190] [-1217746000] THREAD REUSED (30)
2006/09/27 09:52 CDT [HTTPChannel,30,198.49.126.190] [-1217746000] THREAD REUSED (30)
2006/09/27 09:52 CDT [HTTPChannel,30,198.49.126.190] [-1217746000] THREAD REUSED (30)
2006/09/27 09:52 CDT [HTTPChannel,34,198.49.126.190] [-1217746000] THREAD REUSED (34)
2006/09/27 09:52 CDT [HTTPChannel,34,198.49.126.190] [-1217746000] THREAD REUSED (34)
2006/09/27 09:52 CDT [HTTPChannel,35,198.49.126.190] [-1217746000] THREAD REUSED (35)
2006/09/27 09:52 CDT [HTTPChannel,35,198.49.126.190] [-1217746000] THREAD REUSED (35)
2006/09/27 09:52 CDT [HTTPChannel,35,198.49.126.190] [-1217746000] THREAD REUSED (35)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,41,198.49.126.190] [-1240282192] THREAD REUSED (2)
2006/09/27 09:52 CDT [HTTPChannel,41,198.49.126.190] [-1240282192] THREAD REUSED (2)
2006/09/27 09:52 CDT [HTTPChannel,41,198.49.126.190] [-1240282192] THREAD REUSED (2)
2006/09/27 09:52 CDT [HTTPChannel,42,198.49.126.190] [-1217746000] THREAD REUSED (38)
2006/09/27 09:52 CDT [HTTPChannel,42,198.49.126.190] [-1217746000] THREAD REUSED (38)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:53 CDT [HTTPChannel,53,198.49.126.190] [-1240282192] THREAD REUSED (8)
2006/09/27 09:53 CDT [HTTPChannel,53,198.49.126.190] [-1240282192] THREAD REUSED (8)
2006/09/27 09:53 CDT [HTTPChannel,53,198.49.126.190] [-1240282192] THREAD REUSED (8)
2006/09/27 09:53 CDT [HTTPChannel,54,198.49.126.190] [-1217746000] THREAD REUSED (44)
2006/09/27 09:53 CDT [HTTPChannel,54,198.49.126.190] [-1217746000] THREAD REUSED (44)
2006/09/27 09:53 CDT [HTTPChannel,54,198.49.126.190] [-1217746000] THREAD REUSED (44)
2006/09/27 09:53 CDT [HTTPChannel,56,198.49.126.190] [-1240282192] THREAD REUSED (9)
2006/09/27 09:53 CDT [HTTPChannel,56,198.49.126.190] [-1240282192] THREAD REUSED (9)
2006/09/27 09:54 CDT [HTTPChannel,57,198.49.126.190] [-1217746000] THREAD REUSED (46)
2006/09/27 09:54 CDT [HTTPChannel,57,198.49.126.190] [-1217746000] THREAD REUSED (46)
2006/09/27 09:54 CDT [HTTPChannel,58,198.49.126.190] [-1240282192] THREAD REUSED (10)
2006/09/27 09:54 CDT [HTTPChannel,58,198.49.126.190] [-1240282192] THREAD REUSED (10)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,63,198.49.126.190] [-1217746000] THREAD REUSED (50)
2006/09/27 09:54 CDT [HTTPChannel,63,198.49.126.190] [-1217746000] THREAD REUSED (50)
2006/09/27 09:54 CDT [HTTPChannel,63,198.49.126.190] [-1217746000] THREAD REUSED (50)
2006/09/27 09:54 CDT [HTTPChannel,63,198.49.126.190] [-1217746000] THREAD REUSED (50)
2006/09/27 09:54 CDT [HTTPChannel,63,198.49.126.190] [-1217746000] THREAD REUSED (50)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:55 CDT [HTTPChannel,65,198.49.126.190] [-1217746000] THREAD REUSED (51)
2006/09/27 09:55 CDT [HTTPChannel,65,198.49.126.190] [-1217746000] THREAD REUSED (51)
2006/09/27 09:55 CDT [HTTPChannel,66,198.49.126.190] [-1240282192] THREAD REUSED (13)
2006/09/27 09:55 CDT [HTTPChannel,66,198.49.126.190] [-1240282192] THREAD REUSED (13)
2006/09/27 09:56 CDT [HTTPChannel,76,198.49.126.190] [-1240282192] THREAD REUSED (18)
2006/09/27 09:56 CDT [HTTPChannel,76,198.49.126.190] [-1240282192] THREAD REUSED (18)
2006/09/27 09:56 CDT [HTTPChannel,77,198.49.126.190] [-1217746000] THREAD REUSED (57)
2006/09/27 09:56 CDT [HTTPChannel,77,198.49.126.190] [-1217746000] THREAD REUSED (57)
2006/09/27 09:56 CDT [HTTPChannel,83,198.49.126.190] [-1217746000] THREAD REUSED (60)
2006/09/27 09:56 CDT [HTTPChannel,83,198.49.126.190] [-1217746000] THREAD REUSED (60)
2006/09/27 09:57 CDT [HTTPChannel,84,198.49.126.190] [-1240282192] THREAD REUSED (22)
2006/09/27 09:57 CDT [HTTPChannel,84,198.49.126.190] [-1240282192] THREAD REUSED (22)
2006/09/27 09:57 CDT [HTTPChannel,84,198.49.126.190] [-1240282192] THREAD REUSED (22)
2006/09/27 09:57 CDT [HTTPChannel,85,198.49.126.190] [-1217746000] THREAD REUSED (61)
2006/09/27 09:57 CDT [HTTPChannel,85,198.49.126.190] [-1217746000] THREAD REUSED (61)
2006/09/27 09:57 CDT [HTTPChannel,84,198.49.126.190] [-1240282192] THREAD REUSED (22)
2006/09/27 09:57 CDT [HTTPChannel,84,198.49.126.190] [-1240282192] THREAD REUSED (22)
2006/09/27 09:57 CDT [HTTPChannel,95,198.49.126.190] [-1217746000] THREAD REUSED (71)
2006/09/27 09:57 CDT [HTTPChannel,95,198.49.126.190] [-1217746000] THREAD REUSED (71)
2006/09/27 09:57 CDT [HTTPChannel,95,198.49.126.190] [-1217746000] THREAD REUSED (71)
2006/09/27 09:57 CDT [HTTPChannel,96,198.49.126.190] [-1240282192] THREAD REUSED (23)
2006/09/27 09:57 CDT [HTTPChannel,96,198.49.126.190] [-1240282192] THREAD REUSED (23)
2006/09/27 09:57 CDT [HTTPChannel,97,198.49.126.190] [-1217746000] THREAD REUSED (72)
2006/09/27 09:57 CDT [HTTPChannel,97,198.49.126.190] [-1217746000] THREAD REUSED (72)
2006/09/27 09:57 CDT [HTTPChannel,97,198.49.126.190] [-1217746000] THREAD REUSED (72)
2006/09/27 09:57 CDT [HTTPChannel,107,198.49.126.190] [-1240282192] THREAD REUSED (28)
2006/09/27 09:57 CDT [HTTPChannel,107,198.49.126.190] [-1240282192] THREAD REUSED (28)
I can provide more later if you want.
comment:36 by , 18 years ago
Replying to exarkun@divmod.com:
…
I can provide more later if you want.
It would be interesting to see if you get an AssertionError for `assert pooled_connections_per_thread == 0`. If this is the case, then this is the situation which would have triggered an AssertionError for `assert cnx is cnx_` in the "old" code.
Other than that, I think we now have a satisfactory implementation of the connection pool, until we make Python 2.4 mandatory and can use thread-local storage ;)
Ah, it would also be interesting to see whether you still have your out-of-memory issues, but I think they are not related…
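For context, the thread-local alternative alluded to could look roughly like this (a sketch only; threading.local exists from Python 2.4 on, and the class and parameter names are made up):

```python
import threading

class ThreadLocalPool(object):
    """Illustrative only: each thread transparently gets its own connection,
    so no tid bookkeeping (and no cross-thread cleanup) is needed."""
    def __init__(self, connect):
        self._connect = connect          # factory, e.g. lambda: sqlite.connect(path)
        self._local = threading.local()  # per-thread storage

    def get_cnx(self):
        if getattr(self._local, 'cnx', None) is None:
            self._local.cnx = self._connect()   # first use in this thread
        return self._local.cnx
```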
follow-up: 38 comment:37 by , 18 years ago
I haven't seen any log message other than the THREAD REUSED message so far. With about the same frequency as the previous AssertionError, though, the server now stops responding to all requests and needs to be SIGKILL'd and restarted.
comment:38 by , 18 years ago
Replying to anonymous:
I haven't seen any log message other than the THREAD REUSED message so far. With about the same frequency as the previous AssertionError, though, the server now stops responding to all requests and needs to be SIGKILL'd and restarted.
Hm, that's probably due to the `assert pooled_connections_per_thread == 0`. Simply revert all the debugging changes and see if this freeze still happens with stock 0.10.
Two related questions:
- what was that frequency, approximately?
- what about the out-of-memory errors, do they still happen?
comment:39 by , 18 years ago
Okay, running with the same revision minus the debug changes. I'll comment again in a few days with a summary of the behavior.
The frequency of hangs was somewhere around twice per day. I haven't seen any out of memory errors since the ones I included in that comment.
follow-up: 41 comment:40 by , 18 years ago
The frequency of hangs has fallen quite a bit. I've only seen one in the week or so since updating last.
comment:41 by , 18 years ago
Replying to anonymous:
… I've only seen one in the week or so since updating last.
Good… and bad, this means the possibility of hangs is still there. After looking at the code a bit more, I think there was one more bug…
If you don't mind testing a bit more, I've prepared a new patch (made on trunk, but it should probably apply to 0.10 as well). Also, if there's still a hang due to Trac, it will be replaced by a user-visible "TimeoutError" exception.
(Side note: while stress-testing the code, I still couldn't trigger any Database is locked exception with SQLite, so it looks like r3830 was really a good idea ;) )
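The timeout idea can be sketched as follows (illustrative only, not the attached patch; the 20-second default and the TimeoutError class are assumptions):

```python
import time
import threading

class TimeoutError(Exception):
    """Stands in for the user-visible error mentioned above."""

def wait_for_slot(available, has_free_slot, timeout=20):
    # `available` is a threading.Condition guarding the pool state and
    # `has_free_slot` a callable checking it; instead of wait()ing
    # indefinitely, give up after `timeout` seconds so a leaked slot
    # surfaces as an error rather than a silent hang.
    deadline = time.time() + timeout
    available.acquire()
    try:
        while not has_free_slot():
            remaining = deadline - time.time()
            if remaining <= 0:
                raise TimeoutError('Unable to get a database connection '
                                   'within %d seconds' % timeout)
            available.wait(remaining)
    finally:
        available.release()
```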
by , 18 years ago
Attachment: | db_pool_timeout-r3868.patch added |
---|
There was apparently a missing decrement of the `_cursize`, potentially the cause of a hang. We'll see…
comment:42 by , 18 years ago
Trying the patch out now. It looks like the first week of good behavior was a slight fluke; the server's back to hanging about once a day now.
comment:43 by , 18 years ago
At least some of the hangs seem to be due to a bug in Twisted's WSGI container. Once this is resolved, I'll open a new ticket if I still have problems, since it seems like the issue originally associated with this ticket is indeed resolved.
comment:44 by , 18 years ago
No sooner do I speak than it hangs due to a different bug… Filed #3923