Edgewall Software
Modify

Ticket #8602 (closed defect: duplicate)

Opened 3 years ago

Last modified 2 years ago

TimeoutError: Unable to get database connection within 20 seconds

Reported by: anonymous Owned by:
Priority: highest Milestone:
Component: project Version: none
Severity: normal Keywords:
Cc:
Release Notes:
API Changes:

Description

How to Reproduce

While doing a GET operation on /wiki/TracGuide, Trac issued an internal error.

(please provide additional details here)

User Agent was: Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2

System Information

Trac 0.11.5stable-r8354
Python 2.5.2 (r252:60911, Oct 5 2008, 19:29:17)
[GCC 4.3.2]
setuptools 0.6c9
psycopg2 2.0.8
Genshi 0.6dev-r999
Pygments 1.0
Subversion 1.5.1 (r32289)
jQuery: 1.2.6

Python Traceback

Traceback (most recent call last):
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/web/main.py", line 444, in _dispatch_request
    dispatcher.dispatch(req)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/web/main.py", line 166, in dispatch
    if handler.match_request(req):
  File "build/bdist.linux-x86_64/egg/tracvote/__init__.py", line 96, in match_request
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 524, in has_permission
    return self._has_permission(action, resource)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 538, in _has_permission
    check_permission(action, perm.username, resource, perm)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 425, in check_permission
    perm)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 282, in check_permission
    get_user_permissions(username)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 357, in get_user_permissions
    for perm in self.store.get_user_permissions(username):
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/perm.py", line 173, in get_user_permissions
    db = self.env.get_db_cnx()
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/env.py", line 273, in get_db_cnx
    return DatabaseManager(self).get_connection()
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/db/api.py", line 87, in get_connection
    return self._cnx_pool.get_cnx(self.timeout or None)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/db/pool.py", line 176, in get_cnx
    return _backend.get_cnx(self._connector, self._kwargs, timeout)
  File "/usr/local/virtualenv/lib/python2.5/site-packages/Trac-0.11.5stable_r8354-py2.5.egg/trac/db/pool.py", line 124, in get_cnx
    'seconds', time=timeout))
TimeoutError: Unable to get database connection within 20 seconds


Attachments

Change History

comment:1 Changed 3 years ago by cboos

  • Resolution set to duplicate
  • Status changed from new to closed

cmlenz had a look on the server and it seems the lag of today is due to the "IDLE in transaction" issue, #8443.

comment:2 Changed 3 years ago by cboos

  • Component changed from general to project
  • Milestone set to not applicable
  • Priority changed from normal to highest
  • Resolution duplicate deleted
  • Status changed from closed to reopened

The problem was rather due to a combination of the following factors:

  1. long running SQL queries due to Bitten charts
  2. max_connections in postgres.conf was lower than the number of trac.fcgi processes multiplied by the potential number connections in each
  3. the #7785 issue alive and kicking…
  4. more?

Point 1. is now fixed (charts disabled), but that was not enough to make the timeout issue go away.

Point 2. shouldn't be that problematic, as even if we're bordering on max connection usage, how can it happen that all new connections would systematically hit the 20 seconds timeout? Could it be some kind of request dispatching issue at the lighty level, which would stop allocating requests to older processes without killing them, hence keeping the connections in their pool established?

Point 3., while very annoying, couldn't be the problem, as the server was only a bit slower but nevertheless usable while that problem was occurring.

Reopening until we figure out all the aspects of the problem.

comment:3 Changed 2 years ago by cboos

  • Milestone not applicable deleted
  • Resolution set to duplicate
  • Status changed from reopened to closed

This happened again at around 09:51 AM, the system load was at 23, and ps -ef show a lot of postgres: www-data trac [local] idle in transaction lines.

The trac.log doesn't show anything obvious at first sight.

However, the lighttpd log shows the following:

...
2010-02-02 09:43:40: (server.c.1258) NOTE: a request for /demo-0.12/query? [...]
2010-02-02 09:52:22: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:52:35: (server.c.1317) [note] sockets enabled again
2010-02-02 09:52:35: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:52:52: (server.c.1317) [note] sockets enabled again
2010-02-02 09:52:53: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:53:08: (server.c.1317) [note] sockets enabled again
2010-02-02 09:53:08: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:53:25: (server.c.1317) [note] sockets enabled again
2010-02-02 09:53:25: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:53:44: (server.c.1317) [note] sockets enabled again
2010-02-02 09:53:44: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:54:02: (server.c.1317) [note] sockets enabled again
2010-02-02 09:54:02: (server.c.1365) [note] sockets disabled, out-of-fds

...
2010-02-02 09:58:05: (server.c.1317) [note] sockets enabled again
2010-02-02 09:58:05: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:58:14: (server.c.1258) NOTE: a request for /ticket/301 timed out after writing 13140 bytes. [...]
2010-02-02 09:58:24: (server.c.1317) [note] sockets enabled again
2010-02-02 09:58:24: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:58:46: (server.c.1317) [note] sockets enabled again
2010-02-02 09:58:46: (server.c.1365) [note] sockets disabled, out-of-fds
2010-02-02 09:58:59: (server.c.1317) [note] sockets enabled again
...

It looks like the system run out of file descriptors for a while. The postgresql log doesn't show anything special, except for a few statements that seem to proceed during that period.

The "incident" went away by itself, without the need for a server restart on my part.

Looking at the fds used by trac.fcgi, there doesn't seem to be anything abnormal (only socket descriptors + the trac.log and lighttpd error.log), but I see that postgres processes do have a lot of open file descriptors each (60-80), and there are about 60 such processes. OTOH, Munin reports a opened file usage of 7k files out of a maximum of 305k, so the lighttpd out-of-fds seems to have been lighttpd's own. I'll increase server.max-fds and see if that helps, nevertheless.

Bottom-line: it seems the postgres' idle in transaction issue was a cause rather than a consequence of this out-of-fds error on lighttpd side, since the first timeout error happened at 09:51:17 and the out of fd condition was most likely only for lighttpd. The primary cause is still a bit unclear, let's hope it's only the lack of rollbacks.

Closing in favor of #8443.

comment:4 Changed 2 years ago by cboos

See #9111 now.

View

Add a comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
to The owner will be changed from (none). Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.