
Opened 18 years ago

Closed 18 years ago

Last modified 6 years ago

#3504 closed defect (fixed)

AssertionError in trac.db.pool.PooledConnection.__del__

Reported by: exarkun@…
Owned by: Christian Boos
Priority: high
Milestone: 0.10
Component: general
Version: devel
Severity: critical
Keywords: database connection pool
Cc: exarkun@…

Description

On nearly every request, my server logs this error:

Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0x89faeacc> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0xb47cc88c>> ignored
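For context: this "… ignored" wording is CPython's standard handling of an exception raised inside a __del__ method; the interpreter reports it on stderr rather than propagating it. A minimal sketch reproducing the behavior, nothing Trac-specific (Python 2, the ticket's era, printed exactly the format above; Python 3 prints "Exception ignored in: …" plus a traceback):

# CPython reports exceptions raised inside __del__ instead of raising them.
class Broken(object):
    def __del__(self):
        assert False, "simulated pool bookkeeping failure"

Broken()  # stderr: an AssertionError from Broken.__del__, reported as "ignored"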

Attachments (6)

force_close_at_thread_exit.patch (2.6 KB ) - added by Christian Boos 18 years ago.
Fix for #3504, on top of r3747. The final call to cnx.close() in the request dispatcher forces the disposal of the pysqlite Connection object, even if there are still other PooledConnections waiting to be garbage collected.
selective_shutdown_at_thread_exit.patch (4.5 KB ) - added by Christian Boos 18 years ago.
Another possible fix for this ticket, still on top of r3747. This solution is a bit more satisfying, as it doesn't require creating a Connection when there's no need to, so an additional benefit of this patch could be a small performance boost.
even_more_threadsafe_pool-r3756.diff (3.4 KB ) - added by Christian Boos 18 years ago.
Now the PooledConnection remembers which thread it was created in. If GC of the instance happens in another thread, avoid doing the cleanup from there, and instead defer the cleanup of the Connection until the thread that created it terminates.
even_more_threadsafe_pool-r3762.diff (2.7 KB ) - added by Christian Boos 18 years ago.
Ok, this last patch seems to be more correct. Please review. (take 2)
debug_thread_reuse.diff (3.1 KB ) - added by Christian Boos 18 years ago.
Some more debugging code to check the "rapid thread id reuse" hypothesis
db_pool_timeout-r3868.patch (6.2 KB ) - added by Christian Boos 18 years ago.
There was apparently a missing decrement of the cursize, potentially the cause for a hang. We'll see…


Change History (50)

comment:1 by anonymous, 18 years ago

Cc: exarkun@… added

comment:2 by anonymous, 18 years ago

I sometimes get this error message too on FreeBSD 6.1 AMD64, with Apache 2.2.3 and FastCGI 2.4.2, using Trac 0.10-dev (r3623).

When this error occurs, the message is repeatedly written to the apache error log until that partition gets filled. Restarting the webserver gets things going again until the same problem occurs a few days later.

in reply to:  2 comment:3 by Christian Boos, 18 years ago

Milestone: 0.10.1
Severity: normal → critical

Replying to anonymous:

… When this error occurs, the message is repeatedly written to the apache error log until that partition gets filled.

Ouch! that sounds serious. Putting a tentative milestone on this.

comment:4 by Christian Boos, 18 years ago

First, what's the database backend you're using, and what are the versions used (for the bindings and for the db itself)?

comment:5 by hpdl@…, 18 years ago

I apologize - inspecting the logs after this happened again shows another error than originally reported in this ticket is being repeatedly added to the error log file. I'll open a new ticket to describe the errors there. I apologize again for the inconvenience caused.

comment:6 by Christian Boos, 18 years ago

Keywords: database connection pool added
Milestone: 0.10.1 → 0.10
Owner: changed from Jonas Borgström to Christian Boos
Severity: critical → blocker
Version: 0.9.6 → devel

The AssertionError error reported here is indicative of a very serious issue (see #3742, which I just closed as duplicate of this one).

Now the good news is that I think I understood the problem… The pool.py code assumes that garbage collection of a PooledConnection will happen in the same thread as the one in which the instance was created. I believe this is a wrong assumption. In turn, it messes up the logic of the pool, and after a while, when all the slots for new connections have been exhausted, every new thread becomes stuck waiting on the _available condition variable, giving the impression that the application freezes.
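That assumption is indeed easy to falsify: CPython's refcounting runs __del__ in whichever thread drops the last reference to the object. A minimal sketch (modern threading.get_ident() spelling; the ticket-era code used threading._get_ident()):

import threading

class Probe(object):
    def __init__(self):
        self.created_in = threading.get_ident()

    def __del__(self):
        print('created in thread %d, finalized in thread %d'
              % (self.created_in, threading.get_ident()))

holder = []
t = threading.Thread(target=lambda: holder.append(Probe()))
t.start()
t.join()
del holder[0]  # the last reference dies here, so __del__ runs in the main thread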

If that turns out to explain my freezes on #3332, I'll be very happy!

comment:7 by Christian Boos, 18 years ago

Status: new → assigned

More detailed explanation:

  • with tracd, each request is processed by a dedicated thread. It happens (1) that some PooledConnections created by this thread don't get garbage collected before the thread exits.
  • those PooledConnections are therefore garbage collected in another thread, and this triggers the assertion failure discussed above.
  • however, even if the assertion is avoided by carrying the right thread identifier around so that the proper connection could be freed from the other thread, a pysqlite ProgrammingError would then be raised, because pysqlite doesn't allow a connection object to be manipulated from a thread other than the one in which it was created (see the sketch below).
  • therefore, we must ensure that before the thread exits, the pysqlite Connection object it created is properly closed.

The attachment:force_close_at_thread_exit.patch implements that.
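That pysqlite restriction is easy to reproduce with the standard library's sqlite3 module (pysqlite's descendant), which performs the same check by default:

import sqlite3
import threading

cnx = sqlite3.connect(':memory:')  # created in the main thread

def use_from_another_thread():
    try:
        cnx.execute('SELECT 1')
    except sqlite3.ProgrammingError as e:
        # "SQLite objects created in a thread can only be used in that
        # same thread" (unless check_same_thread=False is requested)
        print('rejected:', e)

t = threading.Thread(target=use_from_another_thread)
t.start()
t.join()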

I'm not entirely satisfied with the patch, though, as we still force the creation/disposal of a pysqlite connection even when no db connection is needed for processing the request (think Chrome requests for static content).

Perhaps doing an env.signal_thread_exits(threading._get_ident()) call that gets propagated by the environment to the db pool would be a way…
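A rough sketch of that idea (signal_thread_exits is just the name proposed above, not an existing Trac API; the shutdown(tid) signature matches the api.py diff in comment:18 below):

import threading

class Environment(object):
    # Sketch only: signal_thread_exits is a proposed name, not Trac API.

    def __init__(self, cnx_pool):
        self._cnx_pool = cnx_pool

    def signal_thread_exits(self):
        # Dispose of any connection owned by the dying thread, without
        # forcing a Connection to be created for requests that never
        # touched the database.
        self._cnx_pool.shutdown(threading.get_ident())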


(1) what I still don't understand is:

  • why do we see this bug only now? It's been there for ages.
  • why does it seem to be triggered so quickly by the report module? (copy a report, modify the SQL, delete it, repeat that a few times and bang, it hangs)

by Christian Boos, 18 years ago

Fix for #3504, on top of r3747. The final call to cnx.close() in the request dispatcher forces the disposal of the pysqlite Connection object, even if there are still other PooledConnections waiting to be garbage collected.

by Christian Boos, 18 years ago

Another possible fix for this ticket, still on top of r3747. This solution is a bit more satisfying, as it doesn't require creating a Connection when there's no need to, so an additional benefit of this patch could be a small performance boost.

comment:8 by Christian Boos, 18 years ago

Resolution: fixed
Status: assigned → closed

Fixed in r3756, by applying attachment:selective_shutdown_at_thread_exit.patch (which also contained r3755).

comment:9 by Christian Boos, 18 years ago

Ok, after a good night of sleep I just realized that though the fix seems to work (I could reproduce neither the AssertionError nor the hangs with r3756), it's still theoretically possible that a PooledConnection created in one thread gets GCed in another one…

I'll follow-up to r3756 shortly.

by Christian Boos, 18 years ago

Now the PooledConnection remembers which thread it was created in. If GC of the instance happens in another thread, avoid doing the cleanup from there, and instead defer the cleanup of the Connection until the thread that created it terminates.

comment:10 by Christian Boos, 18 years ago

Well, attachment:even_more_threadsafe_pool-r3756.diff implements an even more robust cleanup, which should really avoid the risk of an AssertionError and of closing a Connection from a different thread.

But this starts to become heavy, especially for backends which don't support pooling in the first place ;)

comment:11 by anonymous, 18 years ago

FWIW, I updated to r3756 and have seen the ignored AssertionError once since then.

comment:12 by Christian Boos, 18 years ago

Resolution: fixed
Status: closed → reopened

Oh, then I think it's worth applying the latest patch.

by Christian Boos, 18 years ago

Ok, this last patch seems to be more correct. Please review. (take 2)

comment:13 by Christian Boos, 18 years ago

Above patch applied in r3763, minus the print this time ;)

comment:14 by anonymous, 18 years ago

I'll update and try it out soon. In the meantime, I woke up this morning to hundreds or thousands of occurrences of tracebacks like this in my logfile:

2006/09/22 07:06 CDT [-] Exception rendering:
2006/09/22 07:06 CDT [-] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.4/threading.py", line 422, in run
            self.__target(*self.__args, **self.__kwargs)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/threadpool.py", line 148, in _worker
            context.call(ctx, function, *args, **kwargs)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 59, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 37, in callWithContext
            return func(*args,**kw)
        --- <exception caught here> ---
          File "/home/trac/Projects/Twisted/trunk/twisted/web2/wsgi.py", line 138, in run
            result = self.application(self.environment, self.startWSGIResponse)
          File "/home/trac/Run/trac/external.py", line 74, in __call__
            return self.application(environ, start_response)
          File "/home/trac/Run/trac/external.py", line 115, in tracApplication
            return trac.web.main.dispatch_request(environ, start_response)
          File "/home/trac/Projects/trac/trunk/trac/web/main.py", line 389, in dispatch_request
            req.send_error(sys.exc_info(), status=500)
          File "/home/trac/Projects/trac/trunk/trac/web/api.py", line 375, in send_error
            data = self.hdf.render(template)
          File "/home/trac/Projects/trac/trunk/trac/web/clearsilver.py", line 288, in render
            
        neo_util.Error: Traceback (innermost last):
          File "csparse.c", line 298, in cs_parse_file()
          File "neo_files.c", line 86, in ne_load_file_len()
        SystemError: Unable to open file /home/trac/Projects/trac/trunk/templates/error.cs: [24] Too many open files

Before restarting I checked /proc to find out what files were open. About a thousand of the open file descriptors referred to trac.db.

comment:15 by Christian Boos, 18 years ago

I see you're running Trac with Twisted… could it be that the 'wsgi.multithread' WSGI environment variable is not correctly set? That would explain the error. You can check by adding a "print" in source:trunk/trac/web/main.py@3762#L361.
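A non-invasive alternative is a tiny middleware wrapped around whatever application object is handed to the server (a sketch, assuming a plain WSGI callable):

def log_wsgi_flags(app):
    # WSGI middleware sketch: report the threading flags that the
    # connection pool logic depends on.
    def wrapper(environ, start_response):
        print('wsgi.multithread=%r wsgi.multiprocess=%r'
              % (environ.get('wsgi.multithread'),
                 environ.get('wsgi.multiprocess')))
        return app(environ, start_response)
    return wrapper

# e.g. application = log_wsgi_flags(trac.web.main.dispatch_request)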

PS: on a somewhat related note, can you point me to some instructions about how to use Trac within Twisted? Thanks!

comment:16 by anonymous, 18 years ago

The variable is true. Is that right? :) I have no idea. I do know the server doesn't fork.

The site is running based on the stuff here:

http://cvs.twistedmatrix.com/cvs/sandbox/exarkun/t.web2-wsgi-trac/

It's somewhat specialized for our particular configuration, but it might still be useful as a starting point if you want to run trac in Twisted. (I'm not sure this is a great thing to want just yet; Twisted's WSGI support is very new and not yet entirely stable.)

in reply to:  16 comment:17 by Christian Boos, 18 years ago

Replying to anonymous:

The variable is true. Is that right? :) I have no idea. I do know the server doesn't fork.

Yes, it should be true, so that's fine. Well, now please test on r3756 (and also with the upcoming patch).

The site is running based on the stuff here: …

Thanks!

comment:18 by Christian Boos, 18 years ago

I've also stress-tested r3763 with a brand new PostgreSQL 8.1.4 (pyPgSQL-2.5.1.win32-py2.4), and at first it seemed pretty fast and robust (… and no database is locked anymore, joy!).

But upon closer inspection, I discovered that the pool was created over and over again. Hence the following fix:

Index: trac/db/api.py
===================================================================
--- trac/db/api.py	(revision 3763)
+++ trac/db/api.py	(working copy)
@@ -73,7 +73,8 @@
     def shutdown(self, tid=None):
         if self._cnx_pool:
             self._cnx_pool.shutdown(tid)
-            self._cnx_pool = None
+            if not tid:
+                self._cnx_pool = None
 
     def _get_connector(self): ### FIXME: Make it public?
         scheme, args = _parse_db_str(self.connection_uri)

After that, things seemed even faster, if possible ;)

Now I'll test back with pysqlite, plus test both backends in single thread mode.

comment:19 by Christian Boos, 18 years ago

Ok, with r3764 it seems really solid with sqlite/postgres, with tracd multi/singlethread on Windows. Now I'll test on Linux…

comment:20 by Christian Boos, 18 years ago

Resolution: fixed
Status: reopened → closed

Tested on Linux with apache/prefork (not that relevant, I know) and tracd, both with pysqlite. I verified that tracd doesn't leak file handles.

I'm pretty confident we can close this now.

So, if the leak of open files still occurs with Twisted even with r3764, please open a new ticket, targeted to 0.10.1.

comment:21 by anonymous, 18 years ago

The file leak does appear to be fixed. However, the AssertionError still occurs, although apparently less frequently. Since upgrading on Friday, it has only happened twice:

2006/09/22 21:10 CDT [HTTPChannel,327,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0xb7794dac> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0xb4b75dec>> ignored
2006/09/25 05:12 CDT [HTTPChannel,25208,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0xb7770ecc> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0x87c248c>> ignored

comment:22 by Christian Boos, 18 years ago

Uh oh, right, there was a bug:

Index: pool.py
===================================================================
--- pool.py	(revision 3788)
+++ pool.py	(working copy)
@@ -122,8 +122,8 @@
                     cnx.rollback() # non-poolable but same thread: close
                     cnx.close()
                     self._cursize -= 1
-                else: # non-poolable, different thread: do nothing
-                    self._active[tid][0] = 0
+                else: # non-poolable, different thread: push it back
+                    self._active[tid] = (0, cnx)
                 self._available.notify()
 
     def shutdown(self, tid=None):

In some rather convoluted circumstances, this could lead to the AssertionError…

In case anyone's interested in the details: two PooledConnections (a and b) in the same thread (say A) are ready to be GCed. If a happens to be GCed in another thread, it takes the buggy self._active[tid][0] = 0 code path and the self._active[A] slot is now empty; after that, if a new PooledConnection is created in A, a new cnx is created and put into self._active[A]; then, when b gets GCed, the AssertionError is raised…

in reply to:  22 comment:23 by Christian Boos, 18 years ago

Follow-up to comment:22:

In case anyone's interested in the details: …

Oh well, the above scenario doesn't seem possible after all, as only one PooledConnection can trigger the _cleanup. It must be something else, similarly hairy…

in reply to:  22 comment:24 by Christian Boos, 18 years ago

Another follow-up to comment:22:

Uh ho, right, there was a bug:

...

The correct and tested fix is r3791.

comment:25 by Matthew Good, 18 years ago

#3788 has been marked as a duplicate.

comment:26 by anonymous, 18 years ago

From my server log running r3791:

2006/09/26 14:57 CDT [HTTPChannel,5859,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0xb76c418c> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0xb583268c>> ignored

in reply to:  26 comment:27 by Christian Boos, 18 years ago

Milestone: 0.10 → 0.10.1
Resolution: fixed
Severity: blocker → major
Status: closed → reopened

Replying to anonymous:

From my server log running r3791: …

Please add some debugging info:

Index: pool.py
===================================================================
--- pool.py	(revision 3791)
+++ pool.py	(working copy)
@@ -102,6 +102,12 @@
         try:
             if tid in self._active:
                 num, cnx_ = self._active.get(tid)
+                if cnx is not cnx_:
+                    print 'In [%d], _active -> %s\n _dormant -> %s' % \
+                          (threading._get_ident(),
+                           repr(self._active), repr(self._dormant))
+                    print 'PooledConnection created from [%d] %s' % \
+                          (tid, repr(cnx))
                 assert cnx is cnx_
                 if num > 1:
                     self._active[tid][0] = num - 1

comment:28 by exarkun@…, 18 years ago

Some output from the log file with the above instrumentation added:

2006/09/26 21:16 CDT [HTTPChannel,2702,198.49.126.190] In [-1240794192], _active -> {-1276134480: [1, <trac.db.sqlite_backend.SQLiteConnection object at 0x8e4b70c>]}
2006/09/26 21:16 CDT [HTTPChannel,2702,198.49.126.190]  _dormant -> []
2006/09/26 21:16 CDT [HTTPChannel,2702,198.49.126.190] PooledConnection created from [-1276134480] <trac.db.sqlite_backend.SQLiteConnection object at 0x8e3a5cc>
2006/09/26 21:16 CDT [HTTPChannel,2702,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0x8e4b8ac> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0x8e4144c>> ignored
2006/09/26 21:16 CDT [HTTPChannel,2884,198.49.126.190] In [-1267741776], _active -> {-1218258000: [1, <trac.db.sqlite_backend.SQLiteConnection object at 0x8730d0c>]}
2006/09/26 21:16 CDT [HTTPChannel,2884,198.49.126.190]  _dormant -> []
2006/09/26 21:16 CDT [HTTPChannel,2884,198.49.126.190] PooledConnection created from [-1218258000] <trac.db.sqlite_backend.SQLiteConnection object at 0x8ab222c>
2006/09/26 21:16 CDT [HTTPChannel,2884,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0x8ab2ecc> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0x8ab25ec>> ignored
2006/09/26 21:16 CDT [HTTPChannel,3021,198.49.126.190] In [-1364345936], _active -> {-1267741776: [2, <trac.db.sqlite_backend.SQLiteConnection object at 0x85e3aac>]}
2006/09/26 21:16 CDT [HTTPChannel,3021,198.49.126.190]  _dormant -> []
2006/09/26 21:16 CDT [HTTPChannel,3021,198.49.126.190] PooledConnection created from [-1267741776] <trac.db.sqlite_backend.SQLiteConnection object at 0x85d380c>
2006/09/26 21:16 CDT [HTTPChannel,3021,198.49.126.190] Exception exceptions.AssertionError: <exceptions.AssertionError instance at 0x8604f8c> in <bound method PooledConnection.__del__ of <trac.db.pool.PooledConnection object at 0xb584b5ac>> ignored

I don't know if this is related, but very shortly afterwards:

2006/09/26 21:17 CDT [-] Exception rendering:
2006/09/26 21:17 CDT [-] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.4/threading.py", line 422, in run
            self.__target(*self.__args, **self.__kwargs)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/threadpool.py", line 148, in _worker
            context.call(ctx, function, *args, **kwargs)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 59, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 37, in callWithContext
            return func(*args,**kw)
        --- <exception caught here> ---
          File "/home/trac/Projects/Twisted/trunk/twisted/web2/wsgi.py", line 138, in run
            result = self.application(self.environment, self.startWSGIResponse)
          File "/home/trac/Run/trac/external.py", line 74, in __call__
            return self.application(environ, start_response)
          File "/home/trac/Run/trac/external.py", line 115, in tracApplication
            return trac.web.main.dispatch_request(environ, start_response)
          File "/home/trac/Projects/trac/trunk/trac/web/main.py", line 389, in dispatch_request
            req.send_error(sys.exc_info(), status=500)
          File "/home/trac/Projects/trac/trunk/trac/web/api.py", line 375, in send_error
            data = self.hdf.render(template)
          File "/home/trac/Projects/trac/trunk/trac/web/clearsilver.py", line 289, in render
            return template.render()
        neo_util.Error: Traceback (innermost last):
          File "csparse.c", line 3170, in cs_render()
          File "csparse.c", line 3156, in render_node()
          File "csparse.c", line 2410, in each_eval()
          File "csparse.c", line 3156, in render_node()
          File "csparse.c", line 2410, in each_eval()
          File "csparse.c", line 3156, in render_node()
          File "csparse.c", line 2073, in var_eval()
          File "neo_cs.c", line 142, in render_cb()
          File "neo_str.c", line 124, in string_append()
          File "neo_str.c", line 105, in string_check_length()
        MemoryError: Unable to allocate STRING buf of size 1024

and then

2006/09/26 21:17 CDT [-] Exception rendering:
2006/09/26 21:17 CDT [-] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.4/threading.py", line 422, in run
            self.__target(*self.__args, **self.__kwargs)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/threadpool.py", line 148, in _worker
            context.call(ctx, function, *args, **kwargs)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 59, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 37, in callWithContext
            return func(*args,**kw)
        --- <exception caught here> ---
          File "/home/trac/Projects/Twisted/trunk/twisted/web2/wsgi.py", line 138, in run
            result = self.application(self.environment, self.startWSGIResponse)
          File "/home/trac/Run/trac/external.py", line 64, in __call__
            path_parts = filter(None, path_info.split('/'))
        exceptions.MemoryError:

and finally

2006/09/26 21:17 CDT [HTTPChannel,3293,198.49.126.190] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
            self.run()
          File "/usr/lib/python2.4/threading.py", line 422, in run
            self.__target(*self.__args, **self.__kwargs)
        --- <exception caught here> ---
          File "/home/trac/Projects/Twisted/trunk/twisted/python/threadpool.py", line 148, in _worker
            context.call(ctx, function, *args, **kwargs)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 59, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/home/trac/Projects/Twisted/trunk/twisted/python/context.py", line 37, in callWithContext
            return func(*args,**kw)
          File "/home/trac/Projects/Twisted/trunk/twisted/web2/wsgi.py", line 142, in run
            reactor.callFromThread(self.__error, failure.Failure())
        exceptions.MemoryError:

in reply to:  28 comment:29 by Christian Boos, 18 years ago

Replying to exarkun@divmod.com:

The various MemoryErrors mean that your server is leaking memory, for some reason. I hope it's not due to our little pool issue ;)

I think I finally understood what's going on with the AssertionError. It can be explained if thread ids are reused quickly, before all the PooledConnections created in the previous "same" thread are GCed. Can you try out attachment:debug_thread_reuse.diff in order to verify this hypothesis?

by Christian Boos, 18 years ago

Attachment: debug_thread_reuse.diff added

Some more debugging code to check the "rapid thread id reuse" hypothesis

comment:30 by Christian Boos, 18 years ago

… and if this is indeed the problem, the fix would be to simply acknowledge that this can happen:

  • pool.py

@@ -102,10 +102,12 @@
         try:
             if tid in self._active:
                 num, cnx_ = self._active.get(tid)
-                assert cnx is cnx_
-                if num > 1:
-                    self._active[tid][0] = num - 1
-                else:
-                    self._cleanup(tid)
+                if cnx is cnx_:
+                    if num > 1:
+                        self._active[tid][0] = num - 1
+                    else:
+                        self._cleanup(tid)
+                # otherwise, cnx was already cleaned up during a shutdown(tid),
+                # and in the meantime, `tid` has been reused (#3504)
         finally:
             self._available.release()

comment:31 by Christian Boos, 18 years ago

It just occurred to me that you're probably using a thread pool in Twisted. Is that the case? That would explain why you're seeing that error relatively often. With tracd on Windows, I do observe thread id reuse, but there the reuse interval is probably large enough to give all the PooledConnections time to be GCed before the reuse happens. With a thread pool, the reuse interval is certainly much shorter.

But no matter if the thread id gets reused because of a thread pool or because the OS creates a new thread with a "recycled" id, the analysis and "fix" remains correct, I think.
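The reuse itself is simple to observe; whether ids actually repeat depends on the platform's thread allocator, so treat this as a sketch:

import threading

seen = []
for _ in range(5):
    t = threading.Thread(target=lambda: seen.append(threading.get_ident()))
    t.start()
    t.join()

# On many platforms all five short-lived threads report the same ident,
# so a pool keyed only on the thread id can mistake a new thread for a
# previous occupant of the same slot.
print(seen)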

comment:32 by Christian Boos, 18 years ago

Milestone: 0.10.1 → 0.10

It's probably still time to put that patch in 0.10.

comment:33 by exarkun@…, 18 years ago

Yes, there's a thread pool.

comment:34 by Christian Boos, 18 years ago

Resolution: fixed
Severity: major → critical
Status: reopened → closed

So I've committed r3798.

exarkun: I'd like to see what attachment:debug_thread_reuse.diff (on top of r3797) produces, if you find time to do this test.

comment:35 by exarkun@…, 18 years ago

I updated to r3798 before I noticed that patch, so I modified it slightly in order to get it all to apply. The difference is that the 2nd to last hunk in pool.py now looks like:

@@ -102,6 +123,15 @@
         try:
             if tid in self._active:
                 num, cnx_ = self._active.get(tid)
+                if cnx is not cnx_:
+                    print 'ConnectionPool is ', repr(self)
+                    print 'In [%d], _active -> %s\n _dormant -> %s' % \
+                          (threading._get_ident(),
+                           repr(self._active), repr(self._dormant))
+                    print 'Available connections: %d/%d' % \
+                          (self._cursize, self._maxsize)
+                    print 'PooledConnection created from [%d] %s' % \
+                          (tid, repr(cnx))
                 if cnx is cnx_:
                     if num > 1:
                         self._active[tid][0] = num - 1

Here's some preliminary output.

2006/09/27 09:51 CDT [HTTPChannel,10,198.49.126.190] [-1217746000] THREAD REUSED (10)
2006/09/27 09:51 CDT [HTTPChannel,10,198.49.126.190] [-1217746000] THREAD REUSED (10)
2006/09/27 09:51 CDT [HTTPChannel,10,198.49.126.190] [-1217746000] THREAD REUSED (10)
2006/09/27 09:51 CDT [HTTPChannel,14,198.49.126.190] [-1217746000] THREAD REUSED (14)
2006/09/27 09:51 CDT [HTTPChannel,14,198.49.126.190] [-1217746000] THREAD REUSED (14)
2006/09/27 09:51 CDT [HTTPChannel,14,198.49.126.190] [-1217746000] THREAD REUSED (14)
2006/09/27 09:51 CDT [HTTPChannel,17,198.49.126.190] [-1217746000] THREAD REUSED (17)
2006/09/27 09:51 CDT [HTTPChannel,17,198.49.126.190] [-1217746000] THREAD REUSED (17)
2006/09/27 09:51 CDT [HTTPChannel,17,198.49.126.190] [-1217746000] THREAD REUSED (17)
2006/09/27 09:51 CDT [HTTPChannel,18,198.49.126.190] [-1217746000] THREAD REUSED (18)
2006/09/27 09:51 CDT [HTTPChannel,18,198.49.126.190] [-1217746000] THREAD REUSED (18)
2006/09/27 09:51 CDT [HTTPChannel,18,198.49.126.190] [-1217746000] THREAD REUSED (18)
2006/09/27 09:51 CDT [HTTPChannel,19,198.49.126.190] [-1217746000] THREAD REUSED (19)
2006/09/27 09:51 CDT [HTTPChannel,19,198.49.126.190] [-1217746000] THREAD REUSED (19)
2006/09/27 09:51 CDT [HTTPChannel,19,198.49.126.190] [-1217746000] THREAD REUSED (19)
2006/09/27 09:51 CDT [HTTPChannel,20,198.49.126.190] [-1217746000] THREAD REUSED (20)
2006/09/27 09:51 CDT [HTTPChannel,20,198.49.126.190] [-1217746000] THREAD REUSED (20)
2006/09/27 09:52 CDT [HTTPChannel,30,198.49.126.190] [-1217746000] THREAD REUSED (30)
2006/09/27 09:52 CDT [HTTPChannel,30,198.49.126.190] [-1217746000] THREAD REUSED (30)
2006/09/27 09:52 CDT [HTTPChannel,30,198.49.126.190] [-1217746000] THREAD REUSED (30)
2006/09/27 09:52 CDT [HTTPChannel,34,198.49.126.190] [-1217746000] THREAD REUSED (34)
2006/09/27 09:52 CDT [HTTPChannel,34,198.49.126.190] [-1217746000] THREAD REUSED (34)
2006/09/27 09:52 CDT [HTTPChannel,35,198.49.126.190] [-1217746000] THREAD REUSED (35)
2006/09/27 09:52 CDT [HTTPChannel,35,198.49.126.190] [-1217746000] THREAD REUSED (35)
2006/09/27 09:52 CDT [HTTPChannel,35,198.49.126.190] [-1217746000] THREAD REUSED (35)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,39,198.49.126.190] [-1240282192] THREAD REUSED (1)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,40,198.49.126.190] [-1217746000] THREAD REUSED (37)
2006/09/27 09:52 CDT [HTTPChannel,41,198.49.126.190] [-1240282192] THREAD REUSED (2)
2006/09/27 09:52 CDT [HTTPChannel,41,198.49.126.190] [-1240282192] THREAD REUSED (2)
2006/09/27 09:52 CDT [HTTPChannel,41,198.49.126.190] [-1240282192] THREAD REUSED (2)
2006/09/27 09:52 CDT [HTTPChannel,42,198.49.126.190] [-1217746000] THREAD REUSED (38)
2006/09/27 09:52 CDT [HTTPChannel,42,198.49.126.190] [-1217746000] THREAD REUSED (38)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,51,198.49.126.190] [-1240282192] THREAD REUSED (7)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:52 CDT [HTTPChannel,52,198.49.126.190] [-1217746000] THREAD REUSED (43)
2006/09/27 09:53 CDT [HTTPChannel,53,198.49.126.190] [-1240282192] THREAD REUSED (8)
2006/09/27 09:53 CDT [HTTPChannel,53,198.49.126.190] [-1240282192] THREAD REUSED (8)
2006/09/27 09:53 CDT [HTTPChannel,53,198.49.126.190] [-1240282192] THREAD REUSED (8)
2006/09/27 09:53 CDT [HTTPChannel,54,198.49.126.190] [-1217746000] THREAD REUSED (44)
2006/09/27 09:53 CDT [HTTPChannel,54,198.49.126.190] [-1217746000] THREAD REUSED (44)
2006/09/27 09:53 CDT [HTTPChannel,54,198.49.126.190] [-1217746000] THREAD REUSED (44)
2006/09/27 09:53 CDT [HTTPChannel,56,198.49.126.190] [-1240282192] THREAD REUSED (9)
2006/09/27 09:53 CDT [HTTPChannel,56,198.49.126.190] [-1240282192] THREAD REUSED (9)
2006/09/27 09:54 CDT [HTTPChannel,57,198.49.126.190] [-1217746000] THREAD REUSED (46)
2006/09/27 09:54 CDT [HTTPChannel,57,198.49.126.190] [-1217746000] THREAD REUSED (46)
2006/09/27 09:54 CDT [HTTPChannel,58,198.49.126.190] [-1240282192] THREAD REUSED (10)
2006/09/27 09:54 CDT [HTTPChannel,58,198.49.126.190] [-1240282192] THREAD REUSED (10)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,59,198.49.126.190] [-1217746000] THREAD REUSED (47)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,60,198.49.126.190] [-1240282192] THREAD REUSED (11)
2006/09/27 09:54 CDT [HTTPChannel,63,198.49.126.190] [-1217746000] THREAD REUSED (50)
2006/09/27 09:54 CDT [HTTPChannel,63,198.49.126.190] [-1217746000] THREAD REUSED (50)
2006/09/27 09:54 CDT [HTTPChannel,63,198.49.126.190] [-1217746000] THREAD REUSED (50)
2006/09/27 09:54 CDT [HTTPChannel,63,198.49.126.190] [-1217746000] THREAD REUSED (50)
2006/09/27 09:54 CDT [HTTPChannel,63,198.49.126.190] [-1217746000] THREAD REUSED (50)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:54 CDT [HTTPChannel,64,198.49.126.190] [-1240282192] THREAD REUSED (12)
2006/09/27 09:55 CDT [HTTPChannel,65,198.49.126.190] [-1217746000] THREAD REUSED (51)
2006/09/27 09:55 CDT [HTTPChannel,65,198.49.126.190] [-1217746000] THREAD REUSED (51)
2006/09/27 09:55 CDT [HTTPChannel,66,198.49.126.190] [-1240282192] THREAD REUSED (13)
2006/09/27 09:55 CDT [HTTPChannel,66,198.49.126.190] [-1240282192] THREAD REUSED (13)
2006/09/27 09:56 CDT [HTTPChannel,76,198.49.126.190] [-1240282192] THREAD REUSED (18)
2006/09/27 09:56 CDT [HTTPChannel,76,198.49.126.190] [-1240282192] THREAD REUSED (18)
2006/09/27 09:56 CDT [HTTPChannel,77,198.49.126.190] [-1217746000] THREAD REUSED (57)
2006/09/27 09:56 CDT [HTTPChannel,77,198.49.126.190] [-1217746000] THREAD REUSED (57)
2006/09/27 09:56 CDT [HTTPChannel,83,198.49.126.190] [-1217746000] THREAD REUSED (60)
2006/09/27 09:56 CDT [HTTPChannel,83,198.49.126.190] [-1217746000] THREAD REUSED (60)
2006/09/27 09:57 CDT [HTTPChannel,84,198.49.126.190] [-1240282192] THREAD REUSED (22)
2006/09/27 09:57 CDT [HTTPChannel,84,198.49.126.190] [-1240282192] THREAD REUSED (22)
2006/09/27 09:57 CDT [HTTPChannel,84,198.49.126.190] [-1240282192] THREAD REUSED (22)
2006/09/27 09:57 CDT [HTTPChannel,85,198.49.126.190] [-1217746000] THREAD REUSED (61)
2006/09/27 09:57 CDT [HTTPChannel,85,198.49.126.190] [-1217746000] THREAD REUSED (61)
2006/09/27 09:57 CDT [HTTPChannel,84,198.49.126.190] [-1240282192] THREAD REUSED (22)
2006/09/27 09:57 CDT [HTTPChannel,84,198.49.126.190] [-1240282192] THREAD REUSED (22)
2006/09/27 09:57 CDT [HTTPChannel,95,198.49.126.190] [-1217746000] THREAD REUSED (71)
2006/09/27 09:57 CDT [HTTPChannel,95,198.49.126.190] [-1217746000] THREAD REUSED (71)
2006/09/27 09:57 CDT [HTTPChannel,95,198.49.126.190] [-1217746000] THREAD REUSED (71)
2006/09/27 09:57 CDT [HTTPChannel,96,198.49.126.190] [-1240282192] THREAD REUSED (23)
2006/09/27 09:57 CDT [HTTPChannel,96,198.49.126.190] [-1240282192] THREAD REUSED (23)
2006/09/27 09:57 CDT [HTTPChannel,97,198.49.126.190] [-1217746000] THREAD REUSED (72)
2006/09/27 09:57 CDT [HTTPChannel,97,198.49.126.190] [-1217746000] THREAD REUSED (72)
2006/09/27 09:57 CDT [HTTPChannel,97,198.49.126.190] [-1217746000] THREAD REUSED (72)
2006/09/27 09:57 CDT [HTTPChannel,107,198.49.126.190] [-1240282192] THREAD REUSED (28)
2006/09/27 09:57 CDT [HTTPChannel,107,198.49.126.190] [-1240282192] THREAD REUSED (28)

I can provide more later if you want.

in reply to:  35 comment:36 by Christian Boos, 18 years ago

Replying to exarkun@divmod.com:

I can provide more later if you want.

It would be interesting to see if you get an AssertionError for assert pooled_connections_per_thread == 0. If you do, then that's the situation which would have triggered an AssertionError for assert cnx is cnx_ in the "old" code.

Other than that, I think we now have a satisfactory implementation of the connection pool, until we make Python 2.4 mandatory and can use thread-local storage ;)
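For the record, a sketch of what the thread-local variant could look like (a hypothetical class, not Trac's API): threading.local() storage dies with its thread, so a recycled thread id can no longer alias a stale entry.

import threading

class ThreadBoundConnections(object):
    # Hypothetical sketch: one lazily-created connection per thread,
    # keyed by threading.local() instead of by raw thread id.

    def __init__(self, connector):
        self._connector = connector      # callable returning a new Connection
        self._local = threading.local()  # per-thread attribute namespace

    def get_cnx(self):
        cnx = getattr(self._local, 'cnx', None)
        if cnx is None:
            cnx = self._local.cnx = self._connector()
        return cnx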

Ah, it would be also interesting to see if you still have your out-of-memory issues, but I think they are not related…

comment:37 by anonymous, 18 years ago

I haven't seen any log messages other than the THREAD REUSED messages so far. At about the same frequency as the previous AssertionError, though, the server now stops responding to all requests and needs to be SIGKILLed and restarted.

in reply to:  37 comment:38 by Christian Boos, 18 years ago

Replying to anonymous:

I haven't seen any log messages other than the THREAD REUSED messages so far. At about the same frequency as the previous AssertionError, though, the server now stops responding to all requests and needs to be SIGKILLed and restarted.

Hm, that's probably due to the assert pooled_connections_per_thread == 0. Simply revert all the debugging changes and see if this freeze still happens with stock 0.10.

Two related questions:

  • what was that frequency, approximately?
  • what about the out-of-memory errors, do they still happen?

comment:39 by exarkun@…, 18 years ago

Okay, running with the same revision minus the debug changes. I'll comment again in a few days with a summary of the behavior.

The frequency of hangs was somewhere around twice per day. I haven't seen any out of memory errors since the ones I included in that comment.

comment:40 by anonymous, 18 years ago

The frequency of hangs has fallen quite a bit. I've only seen one in the week or so since updating last.

in reply to:  40 comment:41 by Christian Boos, 18 years ago

Replying to anonymous:

… I've only seen one in the week or so since updating last.

Good… and bad: it means the possibility of hangs is still there. After looking at the code a bit more, I think there was one more bug…

If you don't mind testing a bit more, I've prepared a new patch (against trunk, but it should apply to 0.10 as well). Also, if there's still a hang due to Trac, it will now be replaced by a user-visible "TimeoutError" exception.

(Side note: while stress-testing the code, I still couldn't trigger any Database is locked exception with SQLite, so it looks like r3830 was really a good idea ;) )
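The timeout amounts to bounding the wait on the pool's _available condition variable (a sketch of the idea only, not the patch itself; PoolTimeoutError and the 20-second default are made up here):

import threading
import time

class PoolTimeoutError(Exception):
    pass

def wait_for_slot(available, has_free_slot, timeout=20):
    # Wait on the pool's condition variable (`available` is a
    # threading.Condition), but give up after `timeout` seconds
    # instead of hanging forever.
    deadline = time.time() + timeout
    with available:
        while not has_free_slot():
            remaining = deadline - time.time()
            if remaining <= 0:
                raise PoolTimeoutError(
                    'no database connection available after %d seconds'
                    % timeout)
            available.wait(remaining)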

by Christian Boos, 18 years ago

Attachment: db_pool_timeout-r3868.patch added

There was apparently a missing decrement of the cursize, potentially the cause for a hang. We'll see…

comment:42 by exarkun@…, 18 years ago

Trying the patch out now. Looks like the first week of good behavior was a slight fluke; the server's back to hanging about once a day now.

comment:43 by anonymous, 18 years ago

At least some of the hangs seem to be due to a bug in Twisted's WSGI container. Once this is resolved, I'll open a new ticket if I still have problems, since it seems like the issue originally associated with this ticket is indeed resolved.

comment:44 by exarkun@…, 18 years ago

No sooner do I speak than it hangs due to a different bug… Filed #3923.
