#8602 closed defect (duplicate)
TimeoutError: Unable to get database connection within 20 seconds
Reported by: | anonymous | Owned by: | |
---|---|---|---|
Priority: | highest | Milestone: | |
Component: | project | Version: | none |
Severity: | normal | Keywords: | |
Cc: | | Branch: | |
Release Notes: | | | |
API Changes: | | | |
Internal Changes: | | | |
Description
How to Reproduce
While doing a GET operation on /wiki/TracGuide, Trac issued an internal error.
User Agent was: Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2
System Information
Trac | 0.11.5stable-r8354
---|---
Python | 2.5.2 (r252:60911, Oct 5 2008, 19:29:17) [GCC 4.3.2]
setuptools | 0.6c9
psycopg2 | 2.0.8
Genshi | 0.6dev-r999
Pygments | 1.0
Subversion | 1.5.1 (r32289)
jQuery | 1.2.6
Python Traceback
Traceback (most recent call last):
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/web/main.py", line 444, in _dispatch_request
    dispatcher.dispatch(req)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/web/main.py", line 166, in dispatch
    if handler.match_request(req):
  File "build/bdist.linux-x86_64/egg/tracvote/__init__.py", line 96, in match_request
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 524, in has_permission
    return self._has_permission(action, resource)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 538, in _has_permission
    check_permission(action, perm.username, resource, perm)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 425, in check_permission
    perm)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 282, in check_permission
    get_user_permissions(username)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 357, in get_user_permissions
    for perm in self.store.get_user_permissions(username):
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 173, in get_user_permissions
    db = self.env.get_db_cnx()
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/env.py", line 273, in get_db_cnx
    return DatabaseManager(self).get_connection()
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/db/api.py", line 87, in get_connection
    return self._cnx_pool.get_cnx(self.timeout or None)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/db/pool.py", line 176, in get_cnx
    return _backend.get_cnx(self._connector, self._kwargs, timeout)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/db/pool.py", line 124, in get_cnx
    'seconds', time=timeout))
TimeoutError: Unable to get database connection within 20 seconds
Attachments (0)
Change History (4)
comment:1 by , 15 years ago
Resolution: | → duplicate
---|---
Status: | new → closed
comment:2 by , 15 years ago
Component: | general → project
---|---
Milestone: | → not applicable
Priority: | normal → highest
Resolution: | duplicate
Status: | closed → reopened
The problem was rather due to a combination of the following factors:
- long running SQL queries due to Bitten charts
- max_connections in postgresql.conf was lower than the number of trac.fcgi processes multiplied by the potential number of connections in each
- the #7785 issue alive and kicking…
- more?
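As a back-of-the-envelope illustration of factor 2, the connection budget can be checked with simple arithmetic. All numbers below are made up for the sketch, not taken from this server:

```python
# Hypothetical connection-budget check: PostgreSQL's max_connections must
# cover every trac.fcgi process times that process's pool size.
fcgi_processes = 10          # trac.fcgi workers spawned by lighttpd (assumed)
pool_size_per_process = 10   # connections each Trac pool may open (assumed)
max_connections = 64         # postgresql.conf setting (assumed)

needed = fcgi_processes * pool_size_per_process
if needed > max_connections:
    print('max_connections too low: need up to %d, have %d'
          % (needed, max_connections))
```

With these assumed numbers the worst case (100 connections) exceeds the configured limit, which is exactly the shape of the mismatch described above.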
Point 1. is now fixed (charts disabled), but that was not enough to make the timeout issue go away.
Point 2. shouldn't be that problematic: even if we're bordering on maximum connection usage, how could every new connection systematically hit the 20-second timeout? Could it be some kind of request-dispatching issue at the lighttpd level, which would stop allocating requests to older processes without killing them, thereby keeping the connections in their pools established?
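The failure mode in point 2 can be sketched with a toy bounded pool (an illustrative sketch, not Trac's actual trac/db/pool.py): once the pool is exhausted and no connection is ever returned, every subsequent waiter times out, which would explain the systematic 20-second failures.

```python
# Toy bounded connection pool with a timed wait. If all connections are
# checked out and never released, every get_cnx() call times out.
import threading
import time

class PoolTimeoutError(Exception):
    pass

class ToyConnectionPool(object):
    def __init__(self, maxsize):
        self._cond = threading.Condition()
        self._available = list(range(maxsize))  # dummy "connections"

    def get_cnx(self, timeout):
        deadline = time.time() + timeout
        with self._cond:
            while not self._available:
                remaining = deadline - time.time()
                if remaining <= 0:
                    raise PoolTimeoutError(
                        'Unable to get database connection within '
                        '%s seconds' % timeout)
                self._cond.wait(remaining)
            return self._available.pop()

    def release(self, cnx):
        with self._cond:
            self._available.append(cnx)
            self._cond.notify()
```

If a pool of size 1 hands out its only connection and nobody releases it, a second get_cnx() blocks for the full timeout and then raises, just like the traceback above.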
Point 3., while very annoying, couldn't be the problem, as the server was only a bit slower but nevertheless usable while that problem was occurring.
Reopening until we figure out all the aspects of the problem.
comment:3 by , 15 years ago
Milestone: | not applicable
---|---
Resolution: | → duplicate
Status: | reopened → closed
This happened again at around 09:51 AM: the system load was at 23, and ps -ef
showed a lot of "postgres: www-data trac [local] idle in transaction" lines.
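To put a number on that symptom, the stuck backends can be counted straight from the ps output; a small hypothetical helper:

```python
# Hypothetical triage helper: count postgres backends that ps -ef reports
# as stuck "idle in transaction".
def idle_in_transaction_count(ps_output):
    return sum(1 for line in ps_output.splitlines()
               if 'idle in transaction' in line)

# Feed it the output of `ps -ef`, e.g.:
sample = (
    'postgres: www-data trac [local] idle in transaction\n'
    'postgres: www-data trac [local] SELECT\n'
    'postgres: www-data trac [local] idle in transaction\n'
)
print(idle_in_transaction_count(sample))  # -> 2
```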
The trac.log doesn't show anything obvious at first sight.
However, the lighttpd log shows the following:
...
2010-02-02 09:43:40: (server.c.1258) NOTE: a request for /demo-0.12/query? [...]
2010-02-02 09:52:22: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:52:35: (server.c.1317) [note] sockets enabled again
2010-02-02 09:52:35: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:52:52: (server.c.1317) [note] sockets enabled again
2010-02-02 09:52:53: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:53:08: (server.c.1317) [note] sockets enabled again
2010-02-02 09:53:08: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:53:25: (server.c.1317) [note] sockets enabled again
2010-02-02 09:53:25: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:53:44: (server.c.1317) [note] sockets enabled again
2010-02-02 09:53:44: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:54:02: (server.c.1317) [note] sockets enabled again
2010-02-02 09:54:02: (server.c.1365) [note] sockets disabled, out-of-fds
...
2010-02-02 09:58:05: (server.c.1317) [note] sockets enabled again
2010-02-02 09:58:05: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:58:14: (server.c.1258) NOTE: a request for /ticket/301 timed out after writing 13140 bytes. [...]
2010-02-02 09:58:24: (server.c.1317) [note] sockets enabled again
2010-02-02 09:58:24: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:58:46: (server.c.1317) [note] sockets enabled again
2010-02-02 09:58:46: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:58:59: (server.c.1317) [note] sockets enabled again
...
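To quantify how long lighttpd actually stopped accepting connections, the disabled/enabled pairs in that log can be totalled up; a hypothetical sketch:

```python
# Hypothetical helper: total the seconds lighttpd spent with sockets
# disabled, by pairing "sockets disabled, out-of-fds" entries with the
# following "sockets enabled again" entries in the error log.
from datetime import datetime

def disabled_seconds(log_text):
    disabled_at = None
    total = 0.0
    for line in log_text.splitlines():
        try:
            stamp = datetime.strptime(line[:19], '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue  # line without a leading timestamp
        if 'sockets disabled, out-of-fds' in line:
            disabled_at = stamp
        elif 'sockets enabled again' in line and disabled_at is not None:
            total += (stamp - disabled_at).total_seconds()
            disabled_at = None
    return total

sample = (
    '2010-02-02 09:52:22: (server.c.1365) [note] sockets disabled, out-of-fds\n'
    '2010-02-02 09:52:35: (server.c.1317) [note] sockets enabled again\n'
)
print(disabled_seconds(sample))  # -> 13.0
```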
It looks like the system ran out of file descriptors for a while. The PostgreSQL log doesn't show anything special, except for a few statements that seem to have proceeded during that period.
The "incident" went away by itself, without the need for a server restart on my part.
Looking at the fds used by trac.fcgi, there doesn't seem to be anything abnormal (only socket descriptors plus trac.log and lighttpd's error.log), but I see that the postgres processes each have a lot of open file descriptors (60-80), and there are about 60 such processes. OTOH, Munin reports an open-file usage of 7k out of a maximum of 305k, so the lighttpd out-of-fds condition seems to have been lighttpd's own. I'll nevertheless increase server.max-fds
and see if that helps.
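For reference, this kind of per-process fd check can be scripted on Linux (an illustrative sketch using /proc; note that lighttpd's server.max-fds is its own internal limit, separate from the kernel rlimit shown here):

```python
# Spot-check a process's open file descriptors against its rlimit
# (Linux only; reads /proc/<pid>/fd).
import os
import resource

def open_fd_count(pid):
    return len(os.listdir('/proc/%d/fd' % pid))

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
own = open_fd_count(os.getpid())
print('%d open fds, soft limit %d' % (own, soft))
```

Running the same check against each trac.fcgi and postgres pid would have confirmed which side was actually near its limit.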
Bottom line: it seems PostgreSQL's "idle in transaction" issue was a cause rather than a consequence of the out-of-fds error on the lighttpd side, since the first timeout error happened at 09:51:17 and the out-of-fd condition was most likely lighttpd's own. The primary cause is still somewhat unclear; let's hope it's only the lack of rollbacks.
Closing in favor of #8443.
cmlenz had a look at the server, and it seems today's lag is due to the "IDLE in transaction" issue, #8443.