Edgewall Software
Modify

Opened 11 years ago

Last modified 3 years ago

#7785 new defect

Trac slow / unreachable objects / database locked

Reported by: cmarschalek Owned by:
Priority: normal Milestone: next-stable-1.2.x
Component: general Version: 0.11.1
Severity: normal Keywords: performance database locked svn apr_terminate
Cc: Emmanuel Blot, felix.schwarz@… Branch:
Release Notes:
API Changes:

Description (last modified by Christian Boos)

Since I've upgraded trac to 0.11 (0.11.1 didn't fix this) it became very slow.

If you open a few changesets in tabs, they sometimes take very long to get ready, but when they do, they all load at the same time. While they're loading the apache process takes up pretty much the whole cpu. The speed somewhat improves but trac is a lot slower as a whole since the upgrade. Nothing else has changed on the server. Looking at my trac.log file I see lots of unreachable objects messages like:

2008-11-05 10:37:26,680 Trac[main] DEBUG: 880 unreachable objects found.
2008-11-05 10:37:29,217 Trac[main] DEBUG: Dispatching <Request "GET u'/changeset/12445'">
2008-11-05 10:37:29,233 Trac[session] DEBUG: Retrieving session for ID u'marschalek'
2008-11-05 10:37:29,250 Trac[chrome] DEBUG: Prepare chrome data for request
2008-11-05 10:37:36,715 Trac[main] DEBUG: 1533 unreachable objects found.
2008-11-05 10:37:37,171 Trac[main] DEBUG: Dispatching <Request "GET u'/changeset/12446'">
2008-11-05 10:37:37,195 Trac[session] DEBUG: Retrieving session for ID u'marschalek'
2008-11-05 10:37:37,211 Trac[chrome] DEBUG: Prepare chrome data for request
2008-11-05 10:37:39,828 Trac[main] DEBUG: 694 unreachable objects found.

Could this be the problem? At least the timestamps show a significant increase in time.

Rarely I also get database locked errors, which go away after a short while. Here's the last one I got:

How to Reproduce

While doing a GET operation on /timeline, Trac issued an internal error.

(please provide additional details here)

System Information

Trac 0.11.1
Python 2.3.3 (#1, Nov 2 2005, 16:31:23)
[GCC 3.3.3 (SuSE Linux)]
setuptools 0.6c8
SQLite 2.8.12
pysqlite 1.0.1
Genshi 0.5
mod_python < 3.2
Subversion 1.3.0 (r17949)

Python Traceback

Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/Trac-0.11.1-py2.3.egg/trac/web/main.py", line 423, in _dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.3/site-packages/Trac-0.11.1-py2.3.egg/trac/web/main.py", line 222, in dispatch
    req.session.save()
  File "/usr/lib/python2.3/site-packages/Trac-0.11.1-py2.3.egg/trac/web/session.py", line 97, in save
    (self.sid,))
  File "/usr/lib/python2.3/site-packages/Trac-0.11.1-py2.3.egg/trac/db/util.py", line 50, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/sqlite/main.py", line 237, in execute
    self.con._begin()
  File "/usr/lib/python2.3/site-packages/sqlite/main.py", line 503, in _begin
    self.db.execute("BEGIN")
OperationalError: database is locked

I don't know if these problems are related though.

Issue Summary

The first report (above) was about a perceived slowness when rendering big changesets, after an upgrade from 0.10 to 0.11. Well, that's Genshi mainly, see #7975 for the ticket addressing specifically this point.

Then, in comment:12, some specific bug involving the svn bindings and apr_terminate, which was later fixed in comment:16.

comment:18 reported another, similar error, involving apr_pool_destroy this time. The latter is quite obscure and hard to reproduce, the present ticket remains open for this problem.

Attachments (0)

Change History (42)

comment:1 by cmarschalek, 11 years ago

Version: none0.11.1

forgot to set the version

comment:2 by Christian Boos, 11 years ago

Resolution: worksforme
Status: newclosed

Except for Trac, you seem to have very outdated versions of everything :-)

Please upgrade to SQLite 3 and Pysqlite 2 (see the PySqlite page), for a start. Getting mod_python 3.3.1 or one of the recommended versions of mod_wsgi might help as well.

Python 2.5 or 2.6 would also presumably give you a boost (and have a reasonably recent version of SQLite/Pysqlite2 bundled).

Don't hesitate to reopen if you still have performance issues after those updates.

in reply to:  description comment:3 by Christian Boos, 11 years ago

Keywords: database added; databse unreachable objects removed

Some follow-ups, explaining why it's now slower than it used to be.

Replying to cmarschalek:

Since I've upgraded trac to 0.11 (0.11.1 didn't fix this) it became very slow.

If you open a few changesets in tabs, they sometimes take very long to get ready, but when they do, they all load at the same time.

The 0.10 → 0.11 upgrade also introduced the switch from ClearSilver to Genshi. While more flexible and feature rich, this template engine is also much slower and it especially shows for big pages likes changeset pages can be (probably up to 10 times slower in this case).

Looking at my trac.log file I see lots of unreachable objects messages like:

Not indicative of a problem, this is just feedback from the garbage collector (those unreachable objects are collected).

comment:4 by anonymous, 11 years ago

Resolution: worksforme
Status: closedreopened

I have the exact same problem here when upgrading from Debian/Etch (trac 0.10.3) to Debian/Lenny (trac 0.11.1) with the default sqlite3 backend. Is there anything I can do? I've read that switching to MySQL would probably fix this database locked issue but I've read as well that the MySQL backend is actually slower than the sqlite3 backend …

in reply to:  4 comment:5 by Christian Boos, 11 years ago

Keywords: needinfo added

What are the versions of the various libraries used? If it's really the "exact same problem" as originally reported here, then it's not sqlite3 that you're using but sqlite2 … (see description).

Besides, what kind of web front-end are you using? mod_python? mod_wsgi? Which versions? In any case, please report how tracd performs in comparison.

This is likely a duplicate of #7490, but please first answer on this ticket.

comment:6 by Christian Boos, 11 years ago

Resolution: worksforme
Status: reopenedclosed

Well, if someone feels like reopening that ticket again, then please:

  • subscribe to the ticket CC:
  • be prepared to answer to the questions asked (unless you provide by yourself all the relevant information, of course, but that never happen the first time ;-) )

comment:7 by Emmanuel Blot, 11 years ago

Resolution: worksforme
Status: closedreopened

comment:8 by Emmanuel Blot, 11 years ago

So let give a try:

I went for a day off on friday, our Trac installation just stop working with the infamous "Database is locked" error message - again ;-)

Which kind of information would you need?

  • Trac environment:
    Trac:	0.12dev-r7811
    Python:	2.5.2 (r252:60911, Jan 4 2009, 17:40:26) [GCC 4.3.2]
    setuptools:	0.6c8
    SQLite:	3.5.9
    pysqlite:	2.4.1
    Genshi:	0.6dev-r1052
    Pygments:	0.10
    Subversion:	1.5.6 (r36142)
    jQuery:	1.2.6
    
  • OS
    Debian Lenny, with some "testing" or "unstable" packages, such as SVN 1.5.6
    
  • Apache
    Server Settings
    
    Server Version: Apache/2.2.9 (Debian) DAV/2 SVN/1.5.6 mod_fastcgi/2.4.6
    Server Built: Jan 20 2009 17:25:53
    Module Magic Number: 20051115:15
    Timeouts: connection: 300    keep-alive: 300
    MPM Name: Worker
    MPM Information: Max Daemons: 3 Threaded: yes Forked: yes
    Server Architecture: 32-bit
    Server Root: /etc/apache2
    Config File: /etc/apache2/apache2.conf
    Server Built With: 
     -D APACHE_MPM_DIR="server/mpm/worker"
     -D APR_HAS_SENDFILE
     -D APR_HAS_MMAP
     -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
     -D APR_USE_SYSVSEM_SERIALIZE
     -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
     -D APR_HAS_OTHER_CHILD
     -D AP_HAVE_RELIABLE_PIPED_LOGS
     -D HTTPD_ROOT=""
     -D SUEXEC_BIN="/usr/lib/apache2/suexec"
     -D DEFAULT_PIDLOG="/var/run/apache2.pid"
     -D DEFAULT_ERRORLOG="logs/error_log"
     -D AP_TYPES_CONFIG_FILE="/etc/apache2/mime.types"
     -D SERVER_CONFIG_FILE="/etc/apache2/apache2.conf"
    

I guess one of the main cause of the "database is locked" issue is the Bitten plugin.

However, when I came back this morning, one of trac.fcgi process was eating 99.9% cpu time, and /etc/init.d/apache stop has not been able to shut it down: I had to use kill -9 to knock it down.

in reply to:  8 ; comment:9 by Christian Boos, 11 years ago

Replying to eblot:

I guess one of the main cause of the "database is locked" issue is the Bitten plugin.

Does it keep connections opened for a long time? Consider using fetchall() in places where it does a lot of processing while iterating over cursor results.

However, when I came back this morning, one of trac.fcgi process was eating 99.9% cpu time, and /etc/init.d/apache stop has not been able to shut it down: I had to use kill -9 to knock it down.

Too bad… gdb python + attaching that process could have shed some light on this…

If you're already using trunk, then one option would be to switch to MultiRepos and use the new hooks (fresh from today ;-) ) to perform the sync() offline instead of for each request. That would presumably put much less stress on the SQLite database. I don't know how Bitten operates, but maybe if it could be driven by the hooks (i.e. implement IChangesetListener), that would also help.

This is not to say that having such fcgi process going crazy is not a bug … but it's hard to tell what really happens without a backtrace. Most likely a timeout, then some busy loop trying to recover the situation?

in reply to:  9 ; comment:10 by Emmanuel Blot, 11 years ago

Replying to cboos:

Replying to eblot:

I guess one of the main cause of the "database is locked" issue is the Bitten plugin.

Does it keep connections opened for a long time? Consider using fetchall() in places where it does a lot of processing while iterating over cursor results.

I have not been through this part of the code, but I guess so. I had to ditch mod_python in favor of mod_fastcgi, as w/ mod_python, "database is locked" errors used to occur several time an hour. Since I moved to mod_fastcgi and I've filter out the SVN checkout message, this kind of error occurs only a very few times a month.

Too bad… gdb python + attaching that process could have shed some light on this…

Yes, but people were waiting to commit their work. Next time, I'll try to run gdb.

If you're already using trunk, then one option would be to switch to MultiRepos and use the new hooks (fresh from today ;-) ) to perform the sync() offline instead of for each request.

I have not had a look at the hooks yet, however if these are commit hooks, I have some work to do first: we use our own hooks, and they cannot be removed. I'll have a look and keep you posted.

I'm not sure to understand how the MultiRepos can help here (?).

Most likely a timeout, then some busy loop trying to recover the situation?

I can't tell, but next time I'll try to get the call stack.

in reply to:  10 comment:11 by anonymous, 11 years ago

Replying to eblot:

I have not had a look at the hooks yet, however if these are commit hooks, I have some work to do first: we use our own hooks, and they cannot be removed. I'll have a look and keep you posted.

Yes, this is actually a script to be called from the svn post-commit hook (source:sandbox/multirepos/contrib/trac-svn-post-commit-hook). Integrating it in an existing commit hook should always be possible (if not, I'm interested to know why).

I'm not sure to understand how the MultiRepos can help here (?).

Because there, we're going to get rid of the call to repos.sync() at the beginning of each request. Well, it's not finalized yet (see discussion at the bottom of #7723) but for now simply remove or comment out the preprocess request call in RepositoryManager.

in reply to:  10 comment:12 by Christian Boos, 10 years ago

Keywords: svn apr_terminate added; needinfo removed
Milestone: 0.11.5
Priority: normalhigh

Replying to eblot:

(…) next time I'll try to get the call stack.

Here we go:

(gdb) bt
#0  0xb7f4c424 in __kernel_vsyscall ()
#1  0xb7f35bc9 in __lll_lock_wait () from /lib/i686/cmov/libpthread.so.0
#2  0xb7f30ff9 in _L_lock_89 () from /lib/i686/cmov/libpthread.so.0
#3  0xb7f30902 in pthread_mutex_lock () from /lib/i686/cmov/libpthread.so.0
#4  0xb53139e1 in apr_thread_mutex_lock () from /usr/lib/libapr-1.so.0
#5  0xb5314988 in apr_pool_destroy () from /usr/lib/libapr-1.so.0
#6  0xb5314935 in apr_pool_destroy () from /usr/lib/libapr-1.so.0
#7  0xb53145a7 in apr_pool_terminate () from /usr/lib/libapr-1.so.0
#8  0xb53174ec in apr_terminate () from /usr/lib/libapr-1.so.0
#9  0xb7dc8b59 in exit () from /lib/i686/cmov/libc.so.6
#10 0xb7db077d in __libc_start_main () from /lib/i686/cmov/libc.so.6
#11 0x08058661 in _start ()

apr_terminate called by an atexit handler, once the fcgi application has been told to exit (#9), hangs.

comment:13 by Christian Boos, 10 years ago

Cc: Emmanuel Blot added

It looks like the swig bindings for svn do an atexit(apr_terminate) since 1.3.0 or so (in swigutil_py.c).

Maybe we shouldn't call it ourselves explicitly, then. Please apply the following change on your server and see if the problem happens again (you could send a few SIGHUPs to the python process after having browsed the repository, to exercise the code).

  • trac/versioncontrol/svn_fs.py

     
    7676                core.svn_node_file: Node.FILE}
    7777    # Protect svn.core methods from GC
    7878    Pool.apr_pool_clear = staticmethod(core.apr_pool_clear)
    79     Pool.apr_terminate = staticmethod(core.apr_terminate)
    8079    Pool.apr_pool_destroy = staticmethod(core.apr_pool_destroy)
    8180
    8281def _to_svn(*args):
     
    199198        # Clear application pool and terminate APR if necessary
    200199        if not self._parent_pool:
    201200            application_pool = None
    202             self.apr_terminate()
    203201
    204202        self._mark_invalid()

in reply to:  10 ; comment:14 by anonymous, 10 years ago

Replying to eblot:

Replying to cboos:

Replying to eblot:

I guess one of the main cause of the "database is locked" issue is the Bitten plugin.

Does it keep connections opened for a long time? Consider using fetchall() in places where it does a lot of processing while iterating over cursor results.

I have not been through this part of the code, but I guess so. I had to ditch mod_python in favor of mod_fastcgi, as w/ mod_python, "database is locked" errors used to occur several time an hour. Since I moved to mod_fastcgi and I've filter out the SVN checkout message, this kind of error occurs only a very few times a month.

did you ever try mod_wsgi as well? since we switched to it we do not have hanging apache threads any more eating up the cpu, which makes it the same speed as mod_python, just more stable.

in reply to:  14 comment:15 by Emmanuel Blot, 10 years ago

Replying to anonymous:

did you ever try mod_wsgi as well? since we switched to it we do not have hanging apache threads any more eating up the cpu, which makes it the same speed as mod_python, just more stable.

I moved away mod_python, as it made Trac pretty unusable when Bitten plugin was enabled ("Database is locked"). I never gave a try to mod_wsgi, but does it offer better performances than mod_fastcgi?

comment:16 by Christian Boos, 10 years ago

Resolution: fixed
Status: reopenedclosed

From eblot feedback, it seems that the problem was fixed by the patch comment:13.

Patch committed in [8281].

Of course, please reopen if the problem reappears (and if so, with a gdb backtrace please, like in comment:12! This is the only way we could eventually fix something)

comment:17 by Remy Blank, 10 years ago

Owner: set to Christian Boos

comment:18 by Christian Boos, 10 years ago

Milestone: 0.11.50.11.6
Resolution: fixed
Status: closedreopened

The recent troubles on t.e.o (slowness and all) might be related to this problem, which has not been completely fixed by r8281 (t.e.o is currently running r8354).

 PID USER     PR NI VIRT RES  SHR  S %CPU %MEM TIME+     COMMAND
9375 www-data 20 0 2051m 1.1g 4964 S  100 37.8 142:54.16 trac.fcgi

So 100% CPU, for more than two hours… That couldn't possibly be Genshi, right?

No, it's our good friend apr_pool_destroy:

(gdb) bt
2 	#0 0x00007fcf03641b04 in __lll_lock_wait () from /lib/libpthread.so.0
3 	#1 0x00007fcf0363d1a0 in _L_lock_102 () from /lib/libpthread.so.0
4 	#2 0x00007fcf0363caae in pthread_mutex_lock () from /lib/libpthread.so.0
5 	#3 0x00007fcef3c15b22 in apr_thread_mutex_lock () from /usr/lib/libapr-1.so.0
6 	#4 0x00007fcef3c175bf in ?? () from /usr/lib/libapr-1.so.0
7 	#5 0x00007fcef3c17d36 in apr_pool_destroy () from /usr/lib/libapr-1.so.0
8 	#6 0x00007fcef3c17c4f in apr_pool_destroy () from /usr/lib/libapr-1.so.0
9 	#7 0x00007fcef3c17c4f in apr_pool_destroy () from /usr/lib/libapr-1.so.0
10 	#8 0x00007fcef3c17c4f in apr_pool_destroy () from /usr/lib/libapr-1.so.0
11 	#9 0x00007fcef3c17c4f in apr_pool_destroy () from /usr/lib/libapr-1.so.0
12 	#10 0x00007fcef325f3c4 in ?? () from /var/lib/python-support/python2.5/libsvn/_core.so
13 	#11 0x0000000000491052 in PyEval_EvalFrameEx ()
14 	#12 0x00000000004920e2 in PyEval_EvalFrameEx ()
15 	#13 0x00000000004927cd in PyEval_EvalCodeEx ()
16 	#14 0x00000000004dd5b9 in ?? ()
17 	#15 0x0000000000418c33 in PyObject_Call ()
18 	#16 0x000000000048f927 in PyEval_EvalFrameEx ()
19 	#17 0x00000000004927cd in PyEval_EvalCodeEx ()
20 	#18 0x00000000004dd4c2 in ?? ()
21 	#19 0x0000000000418c33 in PyObject_Call ()
22 	#20 0x000000000048a852 in PyEval_CallObjectWithKeywords ()
23 	#21 0x00000000004b1e8e in Py_Finalize ()
24 	#22 0x000000000041431b in Py_Main ()
25 	#23 0x00007fcf02c54466 in __libc_start_main () from /lib/libc.so.6
26 	#24 0x0000000000413ca9 in _start () 

But this backtrace is significantly different from the one in comment:12 (no apr_terminate).

Note that we're about to exit the server, in Py_Finalize. This must be some clean-up code within the SWIG bindings calling this, probably on the binding's global pool.

comment:19 by Christian Boos, 10 years ago

See also the list of libraries.

I noticed there: -

Loaded symbols for /usr/lib/libldap_r-2.4.so.2
...
Loaded symbols for /usr/lib/libsasl2.so.2

Hm, why ldap here? Probably because of libpq (PostgreSQL client library can be configured --with-ldap support), but libmysqlclient_r was also loaded.

There was a report of a similar hang on the Subversion-devel list, svnserve: lock up when using SASL/LDAP. Unfortunately no follow-up. It might be related, don't know yet, still looking for clues.

comment:20 by Christian Boos, 10 years ago

Besides, from subversion/bindings/swig/python/libsvn/core.py:

class apr_pool_t:
    """Proxy of C apr_pool_t struct"""

   #...

   def destroy(self):
      """Destroy embedded memory pool. If you do not destroy
      the memory pool manually, Python will destroy it
      automatically."""
      global application_pool

      self.assert_valid()

      is_application_pool = not self._parent_pool

      # Destroy pool
      self._apr_pool_destroy(self)
      # Clear application pool if necessary
      if is_application_pool:
        application_pool = None
        self._svn_swig_py_clear_application_pool()

      # Mark self as invalid
      if hasattr(self, "_parent_pool"):
        del self._parent_pool
      if hasattr(self, "_is_valid"):
        del self._is_valid

    def __del__(self):
      """Automatically destroy memory pools, if necessary"""
      if self.valid():
        self.destroy()

So we might want to invalidate that global pool before it's getting destroyed automatically, apparently too late.

comment:21 by Felix Schwarz <felix.schwarz@…>, 10 years ago

Cc: felix.schwarz@… added

comment:22 by Christian Boos, 10 years ago

jonas: about r8281, munin's IRQ stats show a similar peak back in March, so before that changeset. Though it's not possible to known if this was the same issue we're seeing now, my feeling is that it is.

comment:23 by anonymous, 10 years ago

Don't know if it is any help, but since I reported this in the first place, and we didn't yet upgrade the server (not that easy…):

My issues seem to be related to opening rather large changesets, or when I open a few in seperate tabs. But it's really hard to pinpoint… Here's what top showed while the tabs where loading:

top - 18:04:22 up 388 days, 12:58,  1 user,  load average: 4.86, 1.71, 0.61
Tasks:  68 total,   1 running,  67 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.0% us,  1.0% sy,  0.0% ni,  0.0% id, 96.7% wa,  0.3% hi,  0.0% si
Mem:    507792k total,   502728k used,     5064k free,     7456k buffers
Swap:  1020088k total,   103036k used,   917052k free,    73688k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6557 wwwrun    15   0  106m  66m 8880 D  0.7 13.5   2:31.24 httpd2-prefork
14225 wwwrun    15   0 89224  57m 8672 D  0.7 11.5   1:00.90 httpd2-prefork
14443 wwwrun    15   0  101m  66m 8672 D  0.3 13.3   1:16.92 httpd2-prefork
20251 wwwrun    15   0 87636  56m 8932 D  0.3 11.4   1:05.53 httpd2-prefork
27381 wwwrun    18   0 71996  43m 8648 D  0.3  8.7   0:08.97 httpd2-prefork
27382 wwwrun    18   0 70744  41m 8628 D  0.3  8.4   0:04.22 httpd2-prefork

Interesting is the percentage the cpu is waiting…

in reply to:  23 ; comment:24 by Christian Boos, 10 years ago

Milestone: 0.11.60.11.7

Replying to anonymous:

Don't know if it is any help, but since I reported this in the first place, and we didn't yet upgrade the server (not that easy…):

My issues seem to be related to opening rather large changesets,

This is a known problem with Genshi, see explanations in ticket:7029#comment:1.

You may want to update to Genshi's trunk, which should be a bit faster (and that's an "easy" update). Still, upgrading from whatever Trac 0.11.x you have to 0.11.6dev (or wait a bit for the 0.11.6 release proper) shouldn't be a problem, no? Or if it is a problem, then it's our problem and we're interested to know what we did that prevents you to upgrade from one minor version to the next ;-)

or when I open a few in separate tabs. But it's really hard to pinpoint… Here's what top showed while the tabs where loading: … Interesting is the percentage the cpu is waiting…

As usual, it would make sense to hook into those processes with gdb and see what's going on, otherwise it's a bit too hard to guess…


For the issue discussed here in comment:18, we've not see that happen again on t.e.o, so no real way to troubleshoot that either.

On hold.

in reply to:  24 ; comment:25 by cmarschalek, 10 years ago

Replying to cboos:

You may want to update to Genshi's trunk, which should be a bit faster (and that's an "easy" update). Still, upgrading from whatever Trac 0.11.x you have to 0.11.6dev (or wait a bit for the 0.11.6 release proper) shouldn't be a problem, no? Or if it is a problem, then it's our problem and we're interested to know what we did that prevents you to upgrade from one minor version to the next ;-)

I ment the upgrade you suggested here: #comment:2
Our trac is always up to date, and upgrading works just fine ;)

As usual, it would make sense to hook into those processes with gdb and see what's going on, otherwise it's a bit too hard to guess…

I'll try doing that!

in reply to:  25 comment:26 by Christian Boos, 10 years ago

Replying to cmarschalek:

Replying to cboos: I ment the upgrade you suggested here: #comment:2
Our trac is always up to date, and upgrading works just fine ;)

Ah great, thanks for the clarification!

But still, don't expect the best performances if you stick to using sqlite 2… Note that we removed support for it in 0.12 and 0.11.6 will give you a deprecation warning.

The upgrade is really easy (PySqlite#UpgradingSQLitefrom2.xto3.x) and installing the new pysqlite bindings should be easy as well.

mod_python < 3.2 is possibly what gives you the main performance trouble, after all.

Installing mod_wsgi is also really straightforward (./configure --with-apxs=/yourapacheinstall/bin/apxs; make install), and can coexist with your legacy mod_python apps if any.

As usual, it would make sense to hook into those processes with gdb and see what's going on, otherwise it's a bit too hard to guess…

I'll try doing that!

comment:27 by cmarschalek, 10 years ago

I always planned on upgrading the whole server to a newer os-version, but I'll try upgrading the components, as you suggested. In the meantime, here's a bt from on of the http2-prefork processes. Due to lack of debugging information, the bt is a little scarce, but sqlite seems to be a problem (we have >17000 changesets btw).

(gdb) bt
#0  0xffffe410 in ?? ()
#1  0xbfffcbb8 in ?? ()
#2  0x00000400 in ?? ()
#3  0x095e8400 in ?? ()
#4  0x40349e93 in __read_nocancel () from /lib/tls/libpthread.so.0
#5  0x4249aa63 in sqliteOsRead () from /usr/lib/libsqlite.so.0
#6  0x4249d1ac in sqlitepager_get () from /usr/lib/libsqlite.so.0
#7  0x4248666e in moveToChild () from /usr/lib/libsqlite.so.0
#8  0x42486799 in moveToLeftmost () from /usr/lib/libsqlite.so.0
#9  0x424b2898 in sqliteVdbeExec () from /usr/lib/libsqlite.so.0
#10 0x4248ef90 in sqliteExec () from /usr/lib/libsqlite.so.0
#11 0x424a0806 in sqliteParser () from /usr/lib/libsqlite.so.0
#12 0x424a9461 in sqliteRunParser () from /usr/lib/libsqlite.so.0
#13 0x42499d51 in sqliteMain () from /usr/lib/libsqlite.so.0
#14 0x424623c9 in _con_execute () from /usr/lib/python2.3/site-packages/_sqlite.so
#15 0x4061d293 in PyCFunction_Call () from /usr/lib/apache2/mod_python.so
#16 0x4064be34 in eval_frame () from /usr/lib/apache2/mod_python.so
#17 0x4064d6ae in PyEval_EvalCodeEx () from /usr/lib/apache2/mod_python.so
#18 0x4064bdce in eval_frame () from /usr/lib/apache2/mod_python.so
#19 0x4064d6ae in PyEval_EvalCodeEx () from /usr/lib/apache2/mod_python.so
#20 0x4064bdce in eval_frame () from /usr/lib/apache2/mod_python.so
#21 0x4064d6ae in PyEval_EvalCodeEx () from /usr/lib/apache2/mod_python.so
#22 0x4064bdce in eval_frame () from /usr/lib/apache2/mod_python.so
#23 0x4064d6ae in PyEval_EvalCodeEx () from /usr/lib/apache2/mod_python.so
#24 0x4064bdce in eval_frame () from /usr/lib/apache2/mod_python.so
#25 0x4064d6ae in PyEval_EvalCodeEx () from /usr/lib/apache2/mod_python.so
#26 0x4064bdce in eval_frame () from /usr/lib/apache2/mod_python.so
#27 0x4064d050 in eval_frame () from /usr/lib/apache2/mod_python.so
#28 0x4064d050 in eval_frame () from /usr/lib/apache2/mod_python.so
#29 0x4064d050 in eval_frame () from /usr/lib/apache2/mod_python.so
#30 0x4064d050 in eval_frame () from /usr/lib/apache2/mod_python.so
#31 0x4064d050 in eval_frame () from /usr/lib/apache2/mod_python.so
#32 0x4064d050 in eval_frame () from /usr/lib/apache2/mod_python.so
#33 0x4064d6ae in PyEval_EvalCodeEx () from /usr/lib/apache2/mod_python.so
#34 0x4068ca5a in function_call () from /usr/lib/apache2/mod_python.so
#35 0x40603077 in PyObject_Call () from /usr/lib/apache2/mod_python.so
#36 0x4060acab in instancemethod_call () from /usr/lib/apache2/mod_python.so
#37 0x40603077 in PyObject_Call () from /usr/lib/apache2/mod_python.so
#38 0x4060394f in PyObject_CallMethod () from /usr/lib/apache2/mod_python.so
#39 0x405ffaf4 in python_handler () from /usr/lib/apache2/mod_python.so
#40 0x08076ce8 in ap_run_handler ()
#41 0x080a11a0 in num_request_notes ()
#42 0x080bc690 in ?? ()
#43 0x080a24a8 in __JCR_LIST__ ()
#44 0x08c326c0 in ?? ()
#45 0x08c326c0 in ?? ()
#46 0xbfffeeb8 in ?? ()
#47 0x0807a537 in ap_invoke_handler ()
Previous frame inner to this frame (corrupt stack?)
(gdb)

comment:28 by Christian Boos, 10 years ago

I'm not sure the above backtrace is significative of that problem, but you may simply experience the poor handling of concurrent queries that SQLite / PySqlite used to have.

Upgrading to a recent SQLite 3, Pysqlite 2.5.x and the "eager cursor" behavior of Trac 0.11.6dev (#3446) would certainly be a win for you.

comment:29 by cmarschalek, 10 years ago

Seems like upgrading sqlite and pysqlite has boosted performance a lot (and was quite easy). Now I wonder how much the "eager cursor" in 0.11.6 will help.

Thanks for now!

comment:30 by cmarschalek, 10 years ago

What's also interesting is, that subsequent clicks on the same changset load A LOT faster than the initial click!

comment:31 by Christian Boos, 10 years ago

Description: modified (diff)
Milestone: 0.11.7not applicable

Ok, it didn't happen since a while and as it's not something we can reproduce at will, I'll move that ticket out of sight…

in reply to:  16 comment:32 by Christian Boos, 10 years ago

Milestone: not applicablenext-minor-0.12.x

Replying to cboos:

From eblot feedback, it seems that the problem was fixed by the patch comment:13.

… reopen if the problem reappears

Avoiding to call apr_terminate was not enough. What actually happens is that there's a deadlock between two apr_pool_destroy call.

(gdb) bt
#0  0x00007f6b21f54b04 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f6b21f501a0 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f6b21f4faae in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x00007f6b168c9b22 in apr_thread_mutex_lock () from /usr/lib/libapr-1.so.0
#4  0x00007f6b168cbca6 in apr_pool_destroy () from /usr/lib/libapr-1.so.0
#5  0x00007f6b168cbc4f in apr_pool_destroy () from /usr/lib/libapr-1.so.0
#6  0x00007f6b168cb874 in apr_pool_terminate () from /usr/lib/libapr-1.so.0
#7  0x00007f6b168cd13c in apr_terminate () from /usr/lib/libapr-1.so.0
#8  0x00007f6b2157f6ad in exit () from /lib/libc.so.6
#9  0x00007f6b2156746d in __libc_start_main () from /lib/libc.so.6
#10 0x0000000000413ca9 in _start ()
(gdb) info threads
  5 Thread 0x43cc6950 (LWP 20364)  0x00007f6b21f542f1 in sem_wait () from /lib/libpthread.so.0
  4 Thread 0x434c5950 (LWP 20365)  0x00007f6b21f542f1 in sem_wait () from /lib/libpthread.so.0
  3 Thread 0x44cc8950 (LWP 20366)  0x00007f6b168cb635 in ?? () from /usr/lib/libapr-1.so.0
  2 Thread 0x454c9950 (LWP 24553)  0x00007f6b21f542f1 in sem_wait () from /lib/libpthread.so.0
  1 Thread 0x7f6b223776e0 (LWP 13309)  0x00007f6b21f54b04 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) thread apply 3 bt

Thread 3 (Thread 0x44cc8950 (LWP 20366)):
#0  0x00007f6b168cb635 in ?? () from /usr/lib/libapr-1.so.0
#1  0x00007f6b168cbd36 in apr_pool_destroy () from /usr/lib/libapr-1.so.0
#2  0x00007f6b15f133c4 in ?? () from /var/lib/python-support/python2.5/libsvn/_core.so
#3  0x0000000000491052 in PyEval_EvalFrameEx ()
#4  0x00000000004920e2 in PyEval_EvalFrameEx ()
#5  0x00000000004927cd in PyEval_EvalCodeEx ()
#6  0x00000000004dd4c2 in ?? ()
#7  0x0000000000418c33 in PyObject_Call ()
#8  0x000000000041fb08 in ?? ()
#9  0x0000000000418c33 in PyObject_Call ()
#10 0x000000000048a852 in PyEval_CallObjectWithKeywords ()
#11 0x0000000000422c3b in ?? ()
#12 0x0000000000446f1f in PyDict_DelItem ()
#13 0x0000000000420854 in ?? ()
#14 0x000000000044a719 in PyObject_SetAttr ()
#15 0x000000000048df2a in PyEval_EvalFrameEx ()
#16 0x00000000004920e2 in PyEval_EvalFrameEx ()
#17 0x00000000004927cd in PyEval_EvalCodeEx ()
#18 0x00000000004dd4c2 in ?? ()
#19 0x0000000000418c33 in PyObject_Call ()
#20 0x000000000041fb08 in ?? ()
#21 0x0000000000418c33 in PyObject_Call ()
#22 0x000000000048a852 in PyEval_CallObjectWithKeywords ()
#23 0x0000000000422c3b in ?? ()
#24 0x0000000000446d23 in ?? ()
#25 0x000000000045f47b in ?? ()
#26 0x0000000000446d23 in ?? ()
#27 0x000000000045f47b in ?? ()
#28 0x00000000004dbcd3 in ?? ()
#29 0x00000000004db02a in ?? ()
#30 0x000000000043b84e in ?? ()
#31 0x000000000043bd90 in ?? ()
#32 0x000000000046df6b in ?? ()
#33 0x0000000000418c33 in PyObject_Call ()
#34 0x000000000048f582 in PyEval_EvalFrameEx ()
#35 0x00000000004920e2 in PyEval_EvalFrameEx ()
#36 0x00000000004920e2 in PyEval_EvalFrameEx ()
#37 0x00000000004920e2 in PyEval_EvalFrameEx ()
#38 0x00000000004927cd in PyEval_EvalCodeEx ()
#39 0x0000000000490cf3 in PyEval_EvalFrameEx ()
#40 0x00000000004927cd in PyEval_EvalCodeEx ()
#41 0x0000000000490cf3 in PyEval_EvalFrameEx ()
#42 0x00000000004920e2 in PyEval_EvalFrameEx ()
#43 0x00000000004920e2 in PyEval_EvalFrameEx ()
#44 0x00000000004920e2 in PyEval_EvalFrameEx ()
#45 0x00000000004920e2 in PyEval_EvalFrameEx ()
#46 0x00000000004927cd in PyEval_EvalCodeEx ()
#47 0x00000000004dd4c2 in ?? ()
#48 0x0000000000418c33 in PyObject_Call ()
#49 0x000000000041fb08 in ?? ()
#50 0x0000000000418c33 in PyObject_Call ()
#51 0x000000000048a852 in PyEval_CallObjectWithKeywords ()
#52 0x00000000004be71d in ?? ()
#53 0x00007f6b21f4e3ea in start_thread () from /lib/libpthread.so.0
#54 0x00007f6b2162fcbd in clone () from /lib/libc.so.6
#55 0x0000000000000000 in ?? ()
Version 0, edited 10 years ago by Christian Boos (next)

comment:33 by Christian Boos, 10 years ago

Interesting, I just got the following backtrace + crash on Windows, when stopping tracd with CTRL-C:

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\Dev\Python261\lib\atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "C:\Dev\Python261\lib\site-packages\svn\core.py", line 75, in _cleanup_application_pool
    application_pool.destroy()
  File "C:\Dev\Python261\lib\site-packages\libsvn\core.py", line 6119, in destroy
    self._svn_swig_py_clear_application_pool()
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
  File "C:\Dev\Python261\lib\atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "C:\Dev\Python261\lib\site-packages\svn\core.py", line 75, in _cleanup_application_pool
    application_pool.destroy()
  File "C:\Dev\Python261\lib\site-packages\libsvn\core.py", line 6119, in destroy
    self._svn_swig_py_clear_application_pool()
KeyboardInterrupt

That's the first time, so this is a rare event given the number of times I did this in the past couple years, but maybe there's a connection with the first backtrace seen in comment:32 above.

comment:34 by anyeos@…, 10 years ago

Nobody just payed attention in the client side? Just try to load the page with "nc" (netcat).

In my problem (I guess it is not the same as it bug but may be related in the client side) I noticed a very very slow page load in all pages of the trac 0.12b1 with all the rest (Python, SQLite, etc) updated. A very very slow page loading in all sections (no matters where). I just guessed it is a slowdown in trac.

I'm using trac.fcg in an apache web server with mod-fastcgi. It was working perfect in my old server (software, the hardware is new 4GB RAM dual channel and so). I just do an software upgrade (Installed Ubuntu 9.10 server) and reconfigured all the system to the new distro (I was using LFS an oldest one 5 years old).

I noticed that loading trac.fcg pages was very slow from my own computer!

I tried with top, gdb, and a lot of other workarounds to take account in what happening there. top just showed 0% cpu load most of the time. The global system load was near 0. So?? Where is the problem? The system have full power to server the request and? Maybe a network problem… I tested the communication with fastcgi spawned processes and all is ok/faster enought! Too fast for my like!

So I decided to load that page from netcat:

nc servidor 80
GET /page/trac.fcg/proyect

What I get is the page what I was espectind but very fast! So the problem is my client navigator: Firefox. I tested with Epiphany and yes, Epiphany loaded it very faster. Continiung the tests I noticed a firefox halt in:

<script type="text/javascript">/*<![CDATA[*/
      jQuery(document).ready(function($) {
        var desc = $("form.newticket textarea");
        desc.text(desc.eq(0).text()
                  .replace(/==== System Information ====\s+/m,
                           "User Agent was: `" + navigator.userAgent + "`\n\n$&")
                  .replace(/\|\|\s+==== Enabled Plugins ====/m,
                           "||\n|| '''`jQuery`''' || `" + $().jquery + "` $&"));
      });
    /*]]>*/</script>

mmm, that maybe the problem.

I will try updating jquery so including an updated version in my own Trac system. Why I'm writting this here? Because you need to pay attention too to the client side. Some slowdowns may be clientside scripts, so you must too correct that scripts for that. And why not rewrite some portions to be more webbrowser friendly.

Last edited 4 years ago by Jun Omae (previous) (diff)

comment:35 by anyeos@…, 10 years ago

I investigated my case in the clientside and was a problem specific with firefox (tested with google chrome, epiphany, and netcat in command line working well). Not tested with any more browsers but I discovered the problem. In fact, it is clientside but too server related. A combination of the deflate module in apache and Trac with fastcgi does not work properly in firefox. I guess it is because firefox is the only webbrowser what I have supporting for gzip encoded data. I used wireshark to discover that. And just discovered that firefox accepted gzip encoding receiving some files like CSS from the server encoded in gzip instead in plaintext form. Giving it a size near 4156 bytes… vs 17K that is actually the file is a great choice but:

I recommend to test the combination of mod_deflate with mod_fastcgi and the fcgi_frontend from Trac. Because that was my case of a slow Trac webpage loading in firefox. As a workaround avoid using that combination.

comment:36 by anonymous, 9 years ago

can someone help me …i've problem in my trac…trac application is not able to accesss so i need to restart the trac application… can you please help me how i can troubleshoot it. when i had check in syslog it shows 0 unreachable objects found.

in reply to:  36 comment:37 by Christian Boos, 9 years ago

Replying to anonymous:

can someone help me …

Hijacking a random ticket is definitely not the proper way to get some help. Please read the InstallationIssue page.

comment:38 by Ryan J Ollos, 5 years ago

Milestone: next-minor-0.12.xnext-stable-1.0.x

comment:39 by Ryan J Ollos, 4 years ago

Owner: Christian Boos removed
Status: reopenednew

comment:40 by figaro, 4 years ago

Keywords: performance added; slow removed

comment:41 by figaro, 4 years ago

Priority: highnormal

Issue not reported in over 5 years, workaround available: changed priority from high to normal.

comment:42 by Ryan J Ollos, 3 years ago

Milestone: next-stable-1.0.xnext-stable-1.2.x

Moved ticket assigned to next-stable-1.0.x since maintenance of 1.0.x is coming to a close. Please move the ticket back if it's critical to fix on 1.0.x.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The ticket will remain with no owner.
The ticket will be disowned. Next status will be 'new'.
as The resolution will be set. Next status will be 'closed'.
The owner will be changed from (none) to anonymous. Next status will be 'assigned'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.