Edgewall Software
Modify

Opened 15 years ago

Closed 15 years ago

Last modified 14 years ago

#8602 closed defect (duplicate)

TimeoutError: Unable to get database connection within 20 seconds

Reported by: anonymous Owned by:
Priority: highest Milestone:
Component: project Version: none
Severity: normal Keywords:
Cc: Branch:
Release Notes:
API Changes:
Internal Changes:

Description

How to Reproduce

While doing a GET operation on /wiki/TracGuide, Trac issued an internal error.

(please provide additional details here)

User Agent was: Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2

System Information

Trac 0.11.5stable-r8354
Python 2.5.2 (r252:60911, Oct 5 2008, 19:29:17)
[GCC 4.3.2]
setuptools 0.6c9
psycopg2 2.0.8
Genshi 0.6dev-r999
Pygments 1.0
Subversion 1.5.1 (r32289)
jQuery: 1.2.6

Python Traceback

Traceback (most recent call last):
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/web/main.py", line 444, in _dispatch_request
    dispatcher.dispatch(req)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/web/main.py", line 166, in dispatch
    if handler.match_request(req):
  File "build/bdist.linux-x86_64/egg/tracvote/__init__.py", line 96, in match_request
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 524, in has_permission
    return self._has_permission(action, resource)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 538, in _has_permission
    check_permission(action, perm.username, resource, perm)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 425, in check_permission
    perm)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 282, in check_permission
    get_user_permissions(username)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 357, in get_user_permissions
    for perm in self.store.get_user_permissions(username):
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 173, in get_user_permissions
    db = self.env.get_db_cnx()
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/env.py", line 273, in get_db_cnx
    return DatabaseManager(self).get_connection()
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/db/api.py", line 87, in get_connection
    return self._cnx_pool.get_cnx(self.timeout or None)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/db/pool.py", line 176, in get_cnx
    return _backend.get_cnx(self._connector, self._kwargs, timeout)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/db/pool.py", line 124, in get_cnx
    'seconds', time=timeout))
TimeoutError: Unable to get database connection within 20 seconds

Attachments (0)

Change History (4)

comment:1 by Christian Boos, 15 years ago

Resolution: duplicate
Status: newclosed

cmlenz had a look on the server and it seems the lag of today is due to the "IDLE in transaction" issue, #8443.

comment:2 by Christian Boos, 15 years ago

Component: generalproject
Milestone: not applicable
Priority: normalhighest
Resolution: duplicate
Status: closedreopened

The problem was rather due to a combination of the following factors:

  1. long running SQL queries due to Bitten charts
  2. max_connections in postgres.conf was lower than the number of trac.fcgi processes multiplied by the potential number connections in each
  3. the #7785 issue alive and kicking…
  4. more?

Point 1. is now fixed (charts disabled), but that was not enough to make the timeout issue go away.

Point 2. shouldn't be that problematic, as even if we're bordering on max connection usage, how can it happen that all new connections would systematically hit the 20 seconds timeout? Could it be some kind of request dispatching issue at the lighty level, which would stop allocating requests to older processes without killing them, hence keeping the connections in their pool established?

Point 3., while very annoying, couldn't be the problem, as the server was only a bit slower but nevertheless usable while that problem was occurring.

Reopening until we figure out all the aspects of the problem.

comment:3 by Christian Boos, 15 years ago

Milestone: not applicable
Resolution: duplicate
Status: reopenedclosed

This happened again at around 09:51 AM, the system load was at 23, and ps -ef show a lot of postgres: www-data trac [local] idle in transaction lines.

The trac.log doesn't show anything obvious at first sight.

However, the lighttpd log shows the following:

...
2010-02-02 09:43:40: (server.c.1258) NOTE: a request for /demo-0.12/query? [...]
2010-02-02 09:52:22: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:52:35: (server.c.1317) [note] sockets enabled again
2010-02-02 09:52:35: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:52:52: (server.c.1317) [note] sockets enabled again
2010-02-02 09:52:53: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:53:08: (server.c.1317) [note] sockets enabled again
2010-02-02 09:53:08: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:53:25: (server.c.1317) [note] sockets enabled again
2010-02-02 09:53:25: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:53:44: (server.c.1317) [note] sockets enabled again
2010-02-02 09:53:44: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:54:02: (server.c.1317) [note] sockets enabled again
2010-02-02 09:54:02: (server.c.1365) [note] sockets disabled, out-of-fds

...
2010-02-02 09:58:05: (server.c.1317) [note] sockets enabled again
2010-02-02 09:58:05: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:58:14: (server.c.1258) NOTE: a request for /ticket/301 timed out after writing 13140 bytes. [...]
2010-02-02 09:58:24: (server.c.1317) [note] sockets enabled again
2010-02-02 09:58:24: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:58:46: (server.c.1317) [note] sockets enabled again
2010-02-02 09:58:46: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:58:59: (server.c.1317) [note] sockets enabled again
...

It looks like the system run out of file descriptors for a while. The postgresql log doesn't show anything special, except for a few statements that seem to proceed during that period.

The "incident" went away by itself, without the need for a server restart on my part.

Looking at the fds used by trac.fcgi, there doesn't seem to be anything abnormal (only socket descriptors + the trac.log and lighttpd error.log), but I see that postgres processes do have a lot of open file descriptors each (60-80), and there are about 60 such processes. OTOH, Munin reports a opened file usage of 7k files out of a maximum of 305k, so the lighttpd out-of-fds seems to have been lighttpd's own. I'll increase server.max-fds and see if that helps, nevertheless.

Bottom-line: it seems the postgres' idle in transaction issue was a cause rather than a consequence of this out-of-fds error on lighttpd side, since the first timeout error happened at 09:51:17 and the out of fd condition was most likely only for lighttpd. The primary cause is still a bit unclear, let's hope it's only the lack of rollbacks.

Closing in favor of #8443.

comment:4 by Christian Boos, 14 years ago

See #9111 now.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The ticket will remain with no owner.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from (none) to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.