#3996 closed defect (fixed)
Browse Source takes forever, no debug output
Reported by: | Owned by: | Christian Boos | |
---|---|---|---|
Priority: | highest | Milestone: | 0.10.2 |
Component: | version control/browser | Version: | 0.10 |
Severity: | major | Keywords: | svn |
Cc: | tjb@…, trudolphi@…, rhind@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
So, this is kinda weird.
This problem exists on 0.10-stable and trunk with tracd and apache.
I have two repositories (one very small, one medium sized) where I get no response when I try to access the Browser Source page. The request just hangs there (I can see it in apaches server-status).
Nothing is in debug log except the info that the egg files are loaded. The repo itself is ok and has no error.
Attachments (2)
Change History (36)
comment:1 by , 18 years ago
Keywords: | svn added |
---|
comment:3 by , 18 years ago
Cc: | added |
---|
We have the same problem, and when Browse Source hangs, no other access to Trac is possible!
We are running on SLES9, postgres backend, swig 1.3.27, apache 2.0.49, pyPgSql 2.4, Subversion 1.3.2
comment:4 by , 18 years ago
Severity: | normal → major |
---|
Further to the above, we also run ViewVC, which also uses the python-subversion bindings, and works fine. Because of the potential to lock up our (internal) web server, we see this as kind of urgent…
follow-up: 6 comment:5 by , 18 years ago
Milestone: | → 0.10.1 |
---|
… we see this as kind of urgent…
I understand, however you must also understand that we can't really make progress on this until either:
- we're able to reproduce that ourselves. I can't, apparently neither can the other developers
- we have some understanding of what might be going on. For that, we need more information, like where it hangs. You should try your best to get a stacktrace of the server process which is hanging, either by issuing an interrupt to
tracd
and then get the Python backtrace, or by attaching to the Apache process using GDB (if you happen to use Linux) then get the C backtrace.
Waiting for more input on your part, here are a few wild guesses:
- it might be related to the #2611 issue (see TracSubversion#TracandSubversion1.3.1); so far this problem didn't manifest itself with "hangs", but who knows…
- it might be the Apache process who "spikes" at 100% cpu usage, as reported and discussed in this thread
comment:6 by , 18 years ago
Replying to cboos:
You should try your best to get a stacktrace of the server process which is hanging, either by issuing an interrupt to
tracd
and then get the Python backtrace, or by attaching to the Apache process using GDB (if you happen to use Linux) then get the C backtrace.
How do I do this?
follow-up: 8 comment:7 by , 18 years ago
I guess it's time to follow your advice about creating a general TracTroubleshooting page ;)
comment:8 by , 18 years ago
Replying to cboos:
I guess it's time to follow your advice about creating a general TracTroubleshooting page ;)
That advice was made by another Markus. ;-)
comment:10 by , 18 years ago
Keywords: | needinfo added |
---|
Does this still happen?
Also, have you tried the latest of source:branches/0.10-stable, which contains two bugfixes for patterns in the Wiki formatter which could have taken an exponentially long time to complete?
comment:11 by , 18 years ago
Sorry for the silence, I have been absent on a course for a few days. As at r4083, we still cannot Browse Source. The web server no longer seems to hang however.
If I get some time soon, I will try to produce some sort of dump for you too look at. if I can reproduce this with a tracd, I'll do that.
comment:12 by , 18 years ago
This is the backtrace of the httpd process while it hangs in browser:
#0 0x0044d402 in ?? () #1 0x00bf42fe in sem_wait@GLIBC_2.0 () from /lib/libpthread.so.0 #2 0x0106c9cb in PyThread_acquire_lock () from /usr/lib/libpython2.4.so.1.0 #3 0x0107092e in initthread () from /usr/lib/libpython2.4.so.1.0 #4 0x0100dd2d in PyCFunction_Call () from /usr/lib/libpython2.4.so.1.0 #5 0x01048e2b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #6 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #7 0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #8 0x01047b0b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #9 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #10 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #11 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #12 0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #13 0x01047b0b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #14 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #15 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #16 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #17 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #18 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #19 0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #20 0x00ffb4ea in PyClassMethod_New () from /usr/lib/libpython2.4.so.1.0 #21 0x00fe3587 in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #22 0x00fe9b78 in PyClass_IsSubclass () from /usr/lib/libpython2.4.so.1.0 #23 0x00fe3587 in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #24 0x00fe3f38 in PyObject_CallMethod () from /usr/lib/libpython2.4.so.1.0 #25 0x00bd3307 in init_apache () from /etc/httpd/modules/mod_python.so #26 0x00a371fd in ap_run_handler () from /usr/sbin/httpd #27 0x00a3ab98 in ap_invoke_handler () from /usr/sbin/httpd #28 0x00a46a9e in ap_process_request () from /usr/sbin/httpd #29 0x00a4389f in ap_register_input_filter () from /usr/sbin/httpd #30 0x00a3f07d in ap_run_process_connection () from /usr/sbin/httpd #31 0x00a3f17c in ap_process_connection () from /usr/sbin/httpd #32 0x00a4b052 in ap_graceful_stop_signalled () from /usr/sbin/httpd #33 0x00a4b2c4 in ap_graceful_stop_signalled () from /usr/sbin/httpd #34 0x00a4c1d9 in ap_mpm_run () from /usr/sbin/httpd #35 0x00a22a9a in main () from /usr/sbin/httpd
Full gdb session will be attached.
comment:13 by , 18 years ago
Cc: | added |
---|
follow-up: 15 comment:14 by , 18 years ago
Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?
Is this with trunk or 0.10-stable?
Now both have the database timeout
parameter (r3919 and r4167, timeout value defaults to 20 seconds). Do you now hit that timeout or is it something else?
Finally, does this happen systematically when trying to go to the TracBrowser, or does that happen only sometimes, perhaps just after a commit?
follow-up: 16 comment:15 by , 18 years ago
Replying to cboos:
Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?
I am getting no output. See end of #3996-tacker-trac-gdb.txt
Is this with trunk or 0.10-stable?
trunk
Now both have the database
timeout
parameter (r3919 and r4167, timeout value defaults to 20 seconds). Do you now hit that timeout or is it something else?
No, I do not hit a timeout.
Finally, does this happen systematically when trying to go to the TracBrowser, or does that happen only sometimes, perhaps just after a commit?
This happens every time I try to access the TracBrowser or the Timeline.
follow-up: 19 comment:16 by , 18 years ago
Replying to Markus Tacker <m@tacker.org>:
Replying to cboos:
Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?
I am getting no output. See end of #3996-tacker-trac-gdb.txt
This is because you first have to move to a Python frame, e.g. do:
gdb> f 3 gdb> pystack
This happens every time
But only for some repositories, according to what you said in the comment:description, right? There's really nothing that distinguishes those repositories from those which are working? (present on a mounted drive or not, different permissions, different filesystem, etc.)
comment:17 by , 18 years ago
Ah, also I didn't pay enough attention, in the tracd case you're not watching the "hanged" thread, rather you're on the one which waits on accept (and this is normal).
You can switch to a different by issuing an info threads
command, followed by a thread n
where n is the identifier of the thread. Identify the appropriate thread by performing a bt
on each.
comment:18 by , 18 years ago
example:
(gdb) info threads 2 Thread 1075841376 (LWP 26201) 0x0000002a958d25af in __accept_nocancel () from /lib64/tls/libpthread.so.0 * 1 Thread 182907738880 (LWP 26200) 0x0000002a960f5fe6 in __select_nocancel () from /lib64/tls/libc.so.6 (gdb) thread 2 [Switching to thread 2 (Thread 1075841376 (LWP 26201))]#0 0x0000002a958d25af in __accept_nocancel () from /lib64/tls/libpthread.so.0 (gdb) bt #0 0x0000002a958d25af in __accept_nocancel () from /lib64/tls/libpthread.so.0 #1 0x0000002a964a5fd7 in sock_accept (s=0x2a97e01190) at /usr/local/src/Python-2.4.2/Modules/socketmodule.c:1368 #2 0x0000002a9570a183 in PyEval_EvalFrame (f=0x5ce650) at ceval.c:3542 ... ---Type <return> to continue, or q <return> to quit---q Quit (gdb) f 2 #2 0x0000002a9570a183 in PyEval_EvalFrame (f=0x5ce650) at ceval.c:3542 3542 C_TRACE(x, (*meth)(self,NULL)); (gdb) pystack /opt/python-2.4.2/lib/python2.4/socket.py (170): accept /opt/python-2.4.2/lib/python2.4/SocketServer.py (170): get_request ... (gdb)
Here I switched on the thread doing the accept
, because I don't have a hanging thread ;)
comment:19 by , 18 years ago
ATM I am recompiling python as I get
(gdb) pystack No symbol table is loaded. Use the "file" command.
The working and non-working repos are on the same fs, with the same permission and so. Plan SVN access to them is working.
comment:20 by , 18 years ago
I just noticed another thing.
The problem does not depend on the repository but on the trac environment.
I created a new environent for the same repository and there browser and timeline work.
comment:21 by , 18 years ago
Summary: | Browse Source takes forever, no debug output, only some repos → Browse Source takes forever, no debug outpu |
---|
I have the same problem, using trunk.
The problem only happens after assigning FineGrainedPermissions. The last thing in the trac-log is '2006-11-08 19:05:13,434 Trac[main] DEBUG: Dispatching <Request "GET /browser">' and it doesn't send the GET to the webserver (watched httpd-access.log).
It doesn't matter what is in the authz_file. If I set authz_file≤blank> then I can see the timeline again. If authz_file≤invalid_path> it does the same thing, and just hangs. Maybe it's a permission issue?
comment:22 by , 18 years ago
Summary: | Browse Source takes forever, no debug outpu → Browse Source takes forever, no debug output |
---|
Sorry, not a permission issue. But everything else I said hopefully will help.
comment:23 by , 18 years ago
I am confronted with the same problem, "Browse Sources", "Timeline" and "Admin" take forever. I can access my repository in a normal way outside from pyhon…
comment:24 by , 18 years ago
Cc: | added |
---|
follow-up: 27 comment:25 by , 18 years ago
If you have a :
authz_file = /something
line in your trac.ini
, you might want to empty it. It seems that there's a problem when reading this file, or when checking permissions, or … well, I didn't really investigate, but removing it makes the timeline and browse source links works again.
follow-up: 28 comment:26 by , 18 years ago
I just upgraded from 0.10 to 0.10.1, and experience the same error. Repos is fully functional with fine-grained permissions when browsing directly (Apache http svn browser), but the Trac browser and timeline (with changeset enabled) just hangs when accessed.
I have reviewed all changes to version control source, and have identified these changes as the ones that make the modules hang:
-
trac/versioncontrol/svn_authz.py
old new 39 39 40 40 module_name = env.config.get('trac', 'authz_module_name') 41 db = env.get_db_cnx()42 return RealSubversionAuthorizer( db, authname, module_name, authz_file)41 repos = env.get_repository() 42 return RealSubversionAuthorizer(repos, authname, module_name, authz_file) 43 43 44 44 def parent_iter(path): … … 65 65 conf_authz = None 66 66 67 def __init__(self, db, auth_name, module_name, cfg_file, cfg_fp=None):68 self. db = db67 def __init__(self, repos, auth_name, module_name, cfg_file, cfg_fp=None): 68 self.repos = repos 69 69 self.auth_name = auth_name 70 70 self.module_name = module_name
When I revert just these 2 changes to 0.10 behavior, I can access the browser with support for fine-grained permissions. However, when I access the changesets, I get an error as likely new code expect this attribute (AttributeError: 'RealSubversionAuthorizer' object has no attribute 'repos'
).
Now, someone with more intimate knowledge will likely be able to find what and why this goes wrong, but with the change from env.get_db_cnx()
to env.get_repository()
there might be some code left that expects the old db behaviour?
comment:27 by , 18 years ago
Replying to sam@bulix.org:
If you have a :
authz_file = /somethingline in your
trac.ini
, you might want to empty it. It seems that there's a problem when reading this file, or when checking permissions, or … well, I didn't really investigate, but removing it makes the timeline and browse source links works again.
Confirmed.
comment:28 by , 18 years ago
Replying to simon-code@bvnetwork.no:
I have reviewed all changes to version control source, and have identified these changes as the ones that make the modules hang:
Actually, svn_authz.py contains 3 changes when going from 0.10 to 0.10.1. When I revert the file to 0.10 (and keeping all other files 0.10.1), everything works fine. I can browse, view revision logs and detailed diffs. No errors at all.
comment:29 by , 18 years ago
Keywords: | needinfo removed |
---|---|
Priority: | normal → highest |
Ok, thanks for the detailed investigations.
In r3971, I didn't look closely at the svn_authz.py changes. There are clearly wrong, as get_repository()
itself tries to create a SubversionAuthorizer
, so we're in a loop. The loop doesn't manifest itself because we're caught on trying to acquire a non-reentrant lock before that ;)
comment:30 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I think this should now be fixed by [4200] (trunk) and [4201] (0.10-stable).
Sorry for the inconvenience, we should have noticed this before (ideally before the 0.10.1 release), but I think none of the developers was using authz on a regular basis … this is now also "fixed", as my authz.conf file is going to stay ;)
comment:31 by , 18 years ago
I got bit by this same issue this morning on one of my 0.10.1 trac installs that uses authz and has an SVN repository attached. [4201] seems to resolve the problem for me.
this bug actually caused my apache processes to hang indefinitely, which eventually chewed up all available apache instances, which DoS'ed the rest of the stuff that web server was serving. Thanks for having a fix worked out and posted by the time i went searching for it!
So this must be an issue with the bindings.
Can you access that repos by other means?
(+ is it fsfs or bdb, local disk or network?)