Edgewall Software
Modify

Opened 13 years ago

Closed 13 years ago

Last modified 5 years ago

#3996 closed defect (fixed)

Browse Source takes forever, no debug output

Reported by: Markus Tacker <m@…> Owned by: Christian Boos
Priority: highest Milestone: 0.10.2
Component: version control/browser Version: 0.10
Severity: major Keywords: svn
Cc: tjb@…, trudolphi@…, rhind@… Branch:
Release Notes:
API Changes:

Description

So, this is kinda weird.

This problem exists on 0.10-stable and trunk with tracd and apache.

I have two repositories (one very small, one medium sized) where I get no response when I try to access the Browser Source page. The request just hangs there (I can see it in apaches server-status).

Nothing is in debug log except the info that the egg files are loaded. The repo itself is ok and has no error.

Attachments (2)

#3996-tacker.txt (17.9 KB ) - added by Markus Tacker <m@…> 13 years ago.
Full gdb session of hanging apache process
#3996-tacker-trac-gdb.txt (3.7 KB ) - added by Markus Tacker <m@…> 13 years ago.
GDB session of tracd

Download all attachments as: .zip

Change History (36)

comment:1 by Christian Boos, 13 years ago

Keywords: svn added

This problem exists on 0.10-stable and trunk with tracd and apache.

So this must be an issue with the bindings.

Can you access that repos by other means?

(+ is it fsfs or bdb, local disk or network?)

comment:2 by Markus Tacker <m@…>, 13 years ago

It's fsfs, I can access (co, ls) it via file:// and via Apache.

comment:3 by tjb@…, 13 years ago

Cc: tjb@… added

We have the same problem, and when Browse Source hangs, no other access to Trac is possible!

We are running on SLES9, postgres backend, swig 1.3.27, apache 2.0.49, pyPgSql 2.4, Subversion 1.3.2

comment:4 by tjb@…, 13 years ago

Severity: normalmajor

Further to the above, we also run ViewVC, which also uses the python-subversion bindings, and works fine. Because of the potential to lock up our (internal) web server, we see this as kind of urgent…

comment:5 by Christian Boos, 13 years ago

Milestone: 0.10.1

… we see this as kind of urgent…

I understand, however you must also understand that we can't really make progress on this until either:

  1. we're able to reproduce that ourselves. I can't, apparently neither can the other developers
  2. we have some understanding of what might be going on. For that, we need more information, like where it hangs. You should try your best to get a stacktrace of the server process which is hanging, either by issuing an interrupt to tracd and then get the Python backtrace, or by attaching to the Apache process using GDB (if you happen to use Linux) then get the C backtrace.

Waiting for more input on your part, here are a few wild guesses:

in reply to:  5 comment:6 by Markus Tacker <m@…>, 13 years ago

Replying to cboos:

You should try your best to get a stacktrace of the server process which is hanging, either by issuing an interrupt to tracd and then get the Python backtrace, or by attaching to the Apache process using GDB (if you happen to use Linux) then get the C backtrace.

How do I do this?

comment:7 by Christian Boos, 13 years ago

I guess it's time to follow your advice about creating a general TracTroubleshooting page ;)

in reply to:  7 comment:8 by markus, 13 years ago

Replying to cboos:

I guess it's time to follow your advice about creating a general TracTroubleshooting page ;)

That advice was made by another Markus. ;-)

comment:9 by Christian Boos, 13 years ago

Oh! Sorry ;)

I apparently share Trac's case insensitive search mode ;)

comment:10 by Christian Boos, 13 years ago

Keywords: needinfo added

Does this still happen?

Also, have you tried the latest of source:branches/0.10-stable, which contains two bugfixes for patterns in the Wiki formatter which could have taken an exponentially long time to complete?

comment:11 by tjb@…, 13 years ago

Sorry for the silence, I have been absent on a course for a few days. As at r4083, we still cannot Browse Source. The web server no longer seems to hang however.

If I get some time soon, I will try to produce some sort of dump for you too look at. if I can reproduce this with a tracd, I'll do that.

comment:12 by Markus Tacker <m@…>, 13 years ago

This is the backtrace of the httpd process while it hangs in browser:

#0  0x0044d402 in ?? ()
#1  0x00bf42fe in sem_wait@GLIBC_2.0 () from /lib/libpthread.so.0
#2  0x0106c9cb in PyThread_acquire_lock () from /usr/lib/libpython2.4.so.1.0
#3  0x0107092e in initthread () from /usr/lib/libpython2.4.so.1.0
#4  0x0100dd2d in PyCFunction_Call () from /usr/lib/libpython2.4.so.1.0
#5  0x01048e2b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#6  0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#7  0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0
#8  0x01047b0b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#9  0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#10 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#11 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#12 0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0
#13 0x01047b0b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#14 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#15 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#16 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#17 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#18 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#19 0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0
#20 0x00ffb4ea in PyClassMethod_New () from /usr/lib/libpython2.4.so.1.0
#21 0x00fe3587 in PyObject_Call () from /usr/lib/libpython2.4.so.1.0
#22 0x00fe9b78 in PyClass_IsSubclass () from /usr/lib/libpython2.4.so.1.0
#23 0x00fe3587 in PyObject_Call () from /usr/lib/libpython2.4.so.1.0
#24 0x00fe3f38 in PyObject_CallMethod () from /usr/lib/libpython2.4.so.1.0
#25 0x00bd3307 in init_apache () from /etc/httpd/modules/mod_python.so
#26 0x00a371fd in ap_run_handler () from /usr/sbin/httpd
#27 0x00a3ab98 in ap_invoke_handler () from /usr/sbin/httpd
#28 0x00a46a9e in ap_process_request () from /usr/sbin/httpd
#29 0x00a4389f in ap_register_input_filter () from /usr/sbin/httpd
#30 0x00a3f07d in ap_run_process_connection () from /usr/sbin/httpd
#31 0x00a3f17c in ap_process_connection () from /usr/sbin/httpd
#32 0x00a4b052 in ap_graceful_stop_signalled () from /usr/sbin/httpd
#33 0x00a4b2c4 in ap_graceful_stop_signalled () from /usr/sbin/httpd
#34 0x00a4c1d9 in ap_mpm_run () from /usr/sbin/httpd
#35 0x00a22a9a in main () from /usr/sbin/httpd

Full gdb session will be attached.

by Markus Tacker <m@…>, 13 years ago

Attachment: #3996-tacker.txt added

Full gdb session of hanging apache process

comment:13 by Markus Tacker <m@…>, 13 years ago

Cc: trudolphi@… added

comment:14 by Christian Boos, 13 years ago

Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?

Is this with trunk or 0.10-stable?

Now both have the database timeout parameter (r3919 and r4167, timeout value defaults to 20 seconds). Do you now hit that timeout or is it something else?

Finally, does this happen systematically when trying to go to the TracBrowser, or does that happen only sometimes, perhaps just after a commit?

by Markus Tacker <m@…>, 13 years ago

Attachment: #3996-tacker-trac-gdb.txt added

GDB session of tracd

in reply to:  14 ; comment:15 by Markus Tacker <m@…>, 13 years ago

Replying to cboos:

Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?

I am getting no output. See end of #3996-tacker-trac-gdb.txt

Is this with trunk or 0.10-stable?

trunk

Now both have the database timeout parameter (r3919 and r4167, timeout value defaults to 20 seconds). Do you now hit that timeout or is it something else?

No, I do not hit a timeout.

Finally, does this happen systematically when trying to go to the TracBrowser, or does that happen only sometimes, perhaps just after a commit?

This happens every time I try to access the TracBrowser or the Timeline.

in reply to:  15 ; comment:16 by Christian Boos, 13 years ago

Replying to Markus Tacker <m@tacker.org>:

Replying to cboos:

Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?

I am getting no output. See end of #3996-tacker-trac-gdb.txt

This is because you first have to move to a Python frame, e.g. do:

gdb> f 3
gdb> pystack

This happens every time

But only for some repositories, according to what you said in the comment:description, right? There's really nothing that distinguishes those repositories from those which are working? (present on a mounted drive or not, different permissions, different filesystem, etc.)

comment:17 by Christian Boos, 13 years ago

Ah, also I didn't pay enough attention, in the tracd case you're not watching the "hanged" thread, rather you're on the one which waits on accept (and this is normal).

You can switch to a different by issuing an info threads command, followed by a thread n where n is the identifier of the thread. Identify the appropriate thread by performing a bt on each.

comment:18 by Christian Boos, 13 years ago

example:

(gdb) info threads
  2 Thread 1075841376 (LWP 26201)  0x0000002a958d25af in __accept_nocancel ()
   from /lib64/tls/libpthread.so.0
* 1 Thread 182907738880 (LWP 26200)  0x0000002a960f5fe6 in __select_nocancel ()
   from /lib64/tls/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 1075841376 (LWP 26201))]#0  0x0000002a958d25af in __accept_nocancel () from /lib64/tls/libpthread.so.0
(gdb) bt
#0  0x0000002a958d25af in __accept_nocancel () from /lib64/tls/libpthread.so.0
#1  0x0000002a964a5fd7 in sock_accept (s=0x2a97e01190)
    at /usr/local/src/Python-2.4.2/Modules/socketmodule.c:1368
#2  0x0000002a9570a183 in PyEval_EvalFrame (f=0x5ce650) at ceval.c:3542
...
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) f 2
#2  0x0000002a9570a183 in PyEval_EvalFrame (f=0x5ce650) at ceval.c:3542
3542                                    C_TRACE(x, (*meth)(self,NULL));
(gdb) pystack
/opt/python-2.4.2/lib/python2.4/socket.py (170): accept
/opt/python-2.4.2/lib/python2.4/SocketServer.py (170): get_request
...
(gdb)

Here I switched on the thread doing the accept, because I don't have a hanging thread ;)

in reply to:  16 comment:19 by Markus Tacker <m@…>, 13 years ago

ATM I am recompiling python as I get

(gdb) pystack No symbol table is loaded. Use the "file" command.

The working and non-working repos are on the same fs, with the same permission and so. Plan SVN access to them is working.

comment:20 by Markus Tacker <m@…>, 13 years ago

I just noticed another thing.

The problem does not depend on the repository but on the trac environment.

I created a new environent for the same repository and there browser and timeline work.

comment:21 by the.ajarn@…, 13 years ago

Summary: Browse Source takes forever, no debug output, only some reposBrowse Source takes forever, no debug outpu

I have the same problem, using trunk.

The problem only happens after assigning FineGrainedPermissions. The last thing in the trac-log is '2006-11-08 19:05:13,434 Trac[main] DEBUG: Dispatching <Request "GET /browser">' and it doesn't send the GET to the webserver (watched httpd-access.log).

It doesn't matter what is in the authz_file. If I set authz_file≤blank> then I can see the timeline again. If authz_file≤invalid_path> it does the same thing, and just hangs. Maybe it's a permission issue?

comment:22 by the.ajarn@…, 13 years ago

Summary: Browse Source takes forever, no debug outpuBrowse Source takes forever, no debug output

Sorry, not a permission issue. But everything else I said hopefully will help.

comment:23 by supermac@…, 13 years ago

I am confronted with the same problem, "Browse Sources", "Timeline" and "Admin" take forever. I can access my repository in a normal way outside from pyhon…

comment:24 by Russell Hind <rhind@…>, 13 years ago

Cc: rhind@… added

comment:25 by sam@…, 13 years ago

If you have a :

authz_file = /something

line in your trac.ini, you might want to empty it. It seems that there's a problem when reading this file, or when checking permissions, or … well, I didn't really investigate, but removing it makes the timeline and browse source links works again.

comment:26 by simon-code@…, 13 years ago

I just upgraded from 0.10 to 0.10.1, and experience the same error. Repos is fully functional with fine-grained permissions when browsing directly (Apache http svn browser), but the Trac browser and timeline (with changeset enabled) just hangs when accessed.

I have reviewed all changes to version control source, and have identified these changes as the ones that make the modules hang:

  • trac/versioncontrol/svn_authz.py

    old new  
    3939
    4040    module_name = env.config.get('trac', 'authz_module_name')
    41     db = env.get_db_cnx()
    42     return RealSubversionAuthorizer(db, authname, module_name, authz_file)
     41    repos = env.get_repository()
     42    return RealSubversionAuthorizer(repos, authname, module_name, authz_file)
    4343
    4444def parent_iter(path):
     
    6565    conf_authz = None
    6666
    67     def __init__(self, db, auth_name, module_name, cfg_file, cfg_fp=None):
    68         self.db = db
     67    def __init__(self, repos, auth_name, module_name, cfg_file, cfg_fp=None):
     68        self.repos = repos
    6969        self.auth_name = auth_name
    7070        self.module_name = module_name

When I revert just these 2 changes to 0.10 behavior, I can access the browser with support for fine-grained permissions. However, when I access the changesets, I get an error as likely new code expect this attribute (AttributeError: 'RealSubversionAuthorizer' object has no attribute 'repos').

Now, someone with more intimate knowledge will likely be able to find what and why this goes wrong, but with the change from env.get_db_cnx() to env.get_repository() there might be some code left that expects the old db behaviour?

in reply to:  25 comment:27 by Markus Tacker <m@…>, 13 years ago

Replying to sam@bulix.org:

If you have a :

authz_file = /something

line in your trac.ini, you might want to empty it. It seems that there's a problem when reading this file, or when checking permissions, or … well, I didn't really investigate, but removing it makes the timeline and browse source links works again.

Confirmed.

in reply to:  26 comment:28 by simon-code@…, 13 years ago

Replying to simon-code@bvnetwork.no:

I have reviewed all changes to version control source, and have identified these changes as the ones that make the modules hang:

Actually, svn_authz.py contains 3 changes when going from 0.10 to 0.10.1. When I revert the file to 0.10 (and keeping all other files 0.10.1), everything works fine. I can browse, view revision logs and detailed diffs. No errors at all.

comment:29 by Christian Boos, 13 years ago

Keywords: needinfo removed
Priority: normalhighest

Ok, thanks for the detailed investigations.

In r3971, I didn't look closely at the svn_authz.py changes. There are clearly wrong, as get_repository() itself tries to create a SubversionAuthorizer, so we're in a loop. The loop doesn't manifest itself because we're caught on trying to acquire a non-reentrant lock before that ;)

comment:30 by Christian Boos, 13 years ago

Resolution: fixed
Status: newclosed

I think this should now be fixed by [4200] (trunk) and [4201] (0.10-stable).

Sorry for the inconvenience, we should have noticed this before (ideally before the 0.10.1 release), but I think none of the developers was using authz on a regular basis … this is now also "fixed", as my authz.conf file is going to stay ;)

comment:31 by dkg-debian.org@…, 13 years ago

I got bit by this same issue this morning on one of my 0.10.1 trac installs that uses authz and has an SVN repository attached. [4201] seems to resolve the problem for me.

this bug actually caused my apache processes to hang indefinitely, which eventually chewed up all available apache instances, which DoS'ed the rest of the stuff that web server was serving. Thanks for having a fix worked out and posted by the time i went searching for it!

comment:32 by Markus Tacker <m@…>, 13 years ago

Fix confirmed in r4210.

comment:33 by dkg-debian.org@…, 13 years ago

i needed to apply [4207] as well as [4201] to the above-mentioned 0.10.1 install to complete the fix. Turns out [4201] breaks some pageviews, including search output.

comment:34 by the.ajarn@…, 13 years ago

I knew I was onto something. Good work guys.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.