Edgewall Software
Modify

Opened 11 years ago

Closed 11 years ago

Last modified 3 years ago

#3996 closed defect (fixed)

Browse Source takes forever, no debug output

Reported by: Markus Tacker <m@…> Owned by: Christian Boos
Priority: highest Milestone: 0.10.2
Component: version control/browser Version: 0.10
Severity: major Keywords: svn
Cc: tjb@…, trudolphi@…, rhind@…
Release Notes:
API Changes:

Description

So, this is kinda weird.

This problem exists on 0.10-stable and trunk with tracd and apache.

I have two repositories (one very small, one medium sized) where I get no response when I try to access the Browser Source page. The request just hangs there (I can see it in apaches server-status).

Nothing is in debug log except the info that the egg files are loaded. The repo itself is ok and has no error.

Attachments (2)

#3996-tacker.txt (17.9 KB ) - added by Markus Tacker <m@…> 11 years ago.
Full gdb session of hanging apache process
#3996-tacker-trac-gdb.txt (3.7 KB ) - added by Markus Tacker <m@…> 11 years ago.
GDB session of tracd

Download all attachments as: .zip

Change History (36)

comment:1 Changed 11 years ago by Christian Boos

Keywords: svn added

This problem exists on 0.10-stable and trunk with tracd and apache.

So this must be an issue with the bindings.

Can you access that repos by other means?

(+ is it fsfs or bdb, local disk or network?)

comment:2 Changed 11 years ago by Markus Tacker <m@…>

It's fsfs, I can access (co, ls) it via file:// and via Apache.

comment:3 Changed 11 years ago by tjb@…

Cc: tjb@… added

We have the same problem, and when Browse Source hangs, no other access to Trac is possible!

We are running on SLES9, postgres backend, swig 1.3.27, apache 2.0.49, pyPgSql 2.4, Subversion 1.3.2

comment:4 Changed 11 years ago by tjb@…

Severity: normalmajor

Further to the above, we also run ViewVC, which also uses the python-subversion bindings, and works fine. Because of the potential to lock up our (internal) web server, we see this as kind of urgent…

comment:5 Changed 11 years ago by Christian Boos

Milestone: 0.10.1

… we see this as kind of urgent…

I understand, however you must also understand that we can't really make progress on this until either:

  1. we're able to reproduce that ourselves. I can't, apparently neither can the other developers
  2. we have some understanding of what might be going on. For that, we need more information, like where it hangs. You should try your best to get a stacktrace of the server process which is hanging, either by issuing an interrupt to tracd and then get the Python backtrace, or by attaching to the Apache process using GDB (if you happen to use Linux) then get the C backtrace.

Waiting for more input on your part, here are a few wild guesses:

comment:6 in reply to:  5 Changed 11 years ago by Markus Tacker <m@…>

Replying to cboos:

You should try your best to get a stacktrace of the server process which is hanging, either by issuing an interrupt to tracd and then get the Python backtrace, or by attaching to the Apache process using GDB (if you happen to use Linux) then get the C backtrace.

How do I do this?

comment:7 Changed 11 years ago by Christian Boos

I guess it's time to follow your advice about creating a general TracTroubleshooting page ;)

comment:8 in reply to:  7 Changed 11 years ago by markus

Replying to cboos:

I guess it's time to follow your advice about creating a general TracTroubleshooting page ;)

That advice was made by another Markus. ;-)

comment:9 Changed 11 years ago by Christian Boos

Oh! Sorry ;)

I apparently share Trac's case insensitive search mode ;)

comment:10 Changed 11 years ago by Christian Boos

Keywords: needinfo added

Does this still happen?

Also, have you tried the latest of source:branches/0.10-stable, which contains two bugfixes for patterns in the Wiki formatter which could have taken an exponentially long time to complete?

comment:11 Changed 11 years ago by tjb@…

Sorry for the silence, I have been absent on a course for a few days. As at r4083, we still cannot Browse Source. The web server no longer seems to hang however.

If I get some time soon, I will try to produce some sort of dump for you too look at. if I can reproduce this with a tracd, I'll do that.

comment:12 Changed 11 years ago by Markus Tacker <m@…>

This is the backtrace of the httpd process while it hangs in browser:

#0  0x0044d402 in ?? ()
#1  0x00bf42fe in sem_wait@GLIBC_2.0 () from /lib/libpthread.so.0
#2  0x0106c9cb in PyThread_acquire_lock () from /usr/lib/libpython2.4.so.1.0
#3  0x0107092e in initthread () from /usr/lib/libpython2.4.so.1.0
#4  0x0100dd2d in PyCFunction_Call () from /usr/lib/libpython2.4.so.1.0
#5  0x01048e2b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#6  0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#7  0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0
#8  0x01047b0b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#9  0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#10 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#11 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#12 0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0
#13 0x01047b0b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#14 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#15 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#16 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#17 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#18 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#19 0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0
#20 0x00ffb4ea in PyClassMethod_New () from /usr/lib/libpython2.4.so.1.0
#21 0x00fe3587 in PyObject_Call () from /usr/lib/libpython2.4.so.1.0
#22 0x00fe9b78 in PyClass_IsSubclass () from /usr/lib/libpython2.4.so.1.0
#23 0x00fe3587 in PyObject_Call () from /usr/lib/libpython2.4.so.1.0
#24 0x00fe3f38 in PyObject_CallMethod () from /usr/lib/libpython2.4.so.1.0
#25 0x00bd3307 in init_apache () from /etc/httpd/modules/mod_python.so
#26 0x00a371fd in ap_run_handler () from /usr/sbin/httpd
#27 0x00a3ab98 in ap_invoke_handler () from /usr/sbin/httpd
#28 0x00a46a9e in ap_process_request () from /usr/sbin/httpd
#29 0x00a4389f in ap_register_input_filter () from /usr/sbin/httpd
#30 0x00a3f07d in ap_run_process_connection () from /usr/sbin/httpd
#31 0x00a3f17c in ap_process_connection () from /usr/sbin/httpd
#32 0x00a4b052 in ap_graceful_stop_signalled () from /usr/sbin/httpd
#33 0x00a4b2c4 in ap_graceful_stop_signalled () from /usr/sbin/httpd
#34 0x00a4c1d9 in ap_mpm_run () from /usr/sbin/httpd
#35 0x00a22a9a in main () from /usr/sbin/httpd

Full gdb session will be attached.

Changed 11 years ago by Markus Tacker <m@…>

Attachment: #3996-tacker.txt added

Full gdb session of hanging apache process

comment:13 Changed 11 years ago by Markus Tacker <m@…>

Cc: trudolphi@… added

comment:14 Changed 11 years ago by Christian Boos

Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?

Is this with trunk or 0.10-stable?

Now both have the database timeout parameter (r3919 and r4167, timeout value defaults to 20 seconds). Do you now hit that timeout or is it something else?

Finally, does this happen systematically when trying to go to the TracBrowser, or does that happen only sometimes, perhaps just after a commit?

Changed 11 years ago by Markus Tacker <m@…>

Attachment: #3996-tacker-trac-gdb.txt added

GDB session of tracd

comment:15 in reply to:  14 ; Changed 11 years ago by Markus Tacker <m@…>

Replying to cboos:

Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?

I am getting no output. See end of #3996-tacker-trac-gdb.txt

Is this with trunk or 0.10-stable?

trunk

Now both have the database timeout parameter (r3919 and r4167, timeout value defaults to 20 seconds). Do you now hit that timeout or is it something else?

No, I do not hit a timeout.

Finally, does this happen systematically when trying to go to the TracBrowser, or does that happen only sometimes, perhaps just after a commit?

This happens every time I try to access the TracBrowser or the Timeline.

comment:16 in reply to:  15 ; Changed 11 years ago by Christian Boos

Replying to Markus Tacker <m@tacker.org>:

Replying to cboos:

Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?

I am getting no output. See end of #3996-tacker-trac-gdb.txt

This is because you first have to move to a Python frame, e.g. do:

gdb> f 3
gdb> pystack

This happens every time

But only for some repositories, according to what you said in the comment:description, right? There's really nothing that distinguishes those repositories from those which are working? (present on a mounted drive or not, different permissions, different filesystem, etc.)

comment:17 Changed 11 years ago by Christian Boos

Ah, also I didn't pay enough attention, in the tracd case you're not watching the "hanged" thread, rather you're on the one which waits on accept (and this is normal).

You can switch to a different by issuing an info threads command, followed by a thread n where n is the identifier of the thread. Identify the appropriate thread by performing a bt on each.

comment:18 Changed 11 years ago by Christian Boos

example:

(gdb) info threads
  2 Thread 1075841376 (LWP 26201)  0x0000002a958d25af in __accept_nocancel ()
   from /lib64/tls/libpthread.so.0
* 1 Thread 182907738880 (LWP 26200)  0x0000002a960f5fe6 in __select_nocancel ()
   from /lib64/tls/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 1075841376 (LWP 26201))]#0  0x0000002a958d25af in __accept_nocancel () from /lib64/tls/libpthread.so.0
(gdb) bt
#0  0x0000002a958d25af in __accept_nocancel () from /lib64/tls/libpthread.so.0
#1  0x0000002a964a5fd7 in sock_accept (s=0x2a97e01190)
    at /usr/local/src/Python-2.4.2/Modules/socketmodule.c:1368
#2  0x0000002a9570a183 in PyEval_EvalFrame (f=0x5ce650) at ceval.c:3542
...
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) f 2
#2  0x0000002a9570a183 in PyEval_EvalFrame (f=0x5ce650) at ceval.c:3542
3542                                    C_TRACE(x, (*meth)(self,NULL));
(gdb) pystack
/opt/python-2.4.2/lib/python2.4/socket.py (170): accept
/opt/python-2.4.2/lib/python2.4/SocketServer.py (170): get_request
...
(gdb)

Here I switched on the thread doing the accept, because I don't have a hanging thread ;)

comment:19 in reply to:  16 Changed 11 years ago by Markus Tacker <m@…>

ATM I am recompiling python as I get

(gdb) pystack No symbol table is loaded. Use the "file" command.

The working and non-working repos are on the same fs, with the same permission and so. Plan SVN access to them is working.

comment:20 Changed 11 years ago by Markus Tacker <m@…>

I just noticed another thing.

The problem does not depend on the repository but on the trac environment.

I created a new environent for the same repository and there browser and timeline work.

comment:21 Changed 11 years ago by the.ajarn@…

Summary: Browse Source takes forever, no debug output, only some reposBrowse Source takes forever, no debug outpu

I have the same problem, using trunk.

The problem only happens after assigning FineGrainedPermissions. The last thing in the trac-log is '2006-11-08 19:05:13,434 Trac[main] DEBUG: Dispatching <Request "GET /browser">' and it doesn't send the GET to the webserver (watched httpd-access.log).

It doesn't matter what is in the authz_file. If I set authz_file≤blank> then I can see the timeline again. If authz_file≤invalid_path> it does the same thing, and just hangs. Maybe it's a permission issue?

comment:22 Changed 11 years ago by the.ajarn@…

Summary: Browse Source takes forever, no debug outpuBrowse Source takes forever, no debug output

Sorry, not a permission issue. But everything else I said hopefully will help.

comment:23 Changed 11 years ago by supermac@…

I am confronted with the same problem, "Browse Sources", "Timeline" and "Admin" take forever. I can access my repository in a normal way outside from pyhon…

comment:24 Changed 11 years ago by Russell Hind <rhind@…>

Cc: rhind@… added

comment:25 Changed 11 years ago by sam@…

If you have a :

authz_file = /something

line in your trac.ini, you might want to empty it. It seems that there's a problem when reading this file, or when checking permissions, or … well, I didn't really investigate, but removing it makes the timeline and browse source links works again.

comment:26 Changed 11 years ago by simon-code@…

I just upgraded from 0.10 to 0.10.1, and experience the same error. Repos is fully functional with fine-grained permissions when browsing directly (Apache http svn browser), but the Trac browser and timeline (with changeset enabled) just hangs when accessed.

I have reviewed all changes to version control source, and have identified these changes as the ones that make the modules hang:

  • trac/versioncontrol/svn_authz.py

    old new  
    3939
    4040    module_name = env.config.get('trac', 'authz_module_name')
    41     db = env.get_db_cnx()
    42     return RealSubversionAuthorizer(db, authname, module_name, authz_file)
     41    repos = env.get_repository()
     42    return RealSubversionAuthorizer(repos, authname, module_name, authz_file)
    4343
    4444def parent_iter(path):
     
    6565    conf_authz = None
    6666
    67     def __init__(self, db, auth_name, module_name, cfg_file, cfg_fp=None):
    68         self.db = db
     67    def __init__(self, repos, auth_name, module_name, cfg_file, cfg_fp=None):
     68        self.repos = repos
    6969        self.auth_name = auth_name
    7070        self.module_name = module_name

When I revert just these 2 changes to 0.10 behavior, I can access the browser with support for fine-grained permissions. However, when I access the changesets, I get an error as likely new code expect this attribute (AttributeError: 'RealSubversionAuthorizer' object has no attribute 'repos').

Now, someone with more intimate knowledge will likely be able to find what and why this goes wrong, but with the change from env.get_db_cnx() to env.get_repository() there might be some code left that expects the old db behaviour?

comment:27 in reply to:  25 Changed 11 years ago by Markus Tacker <m@…>

Replying to sam@bulix.org:

If you have a :

authz_file = /something

line in your trac.ini, you might want to empty it. It seems that there's a problem when reading this file, or when checking permissions, or … well, I didn't really investigate, but removing it makes the timeline and browse source links works again.

Confirmed.

comment:28 in reply to:  26 Changed 11 years ago by simon-code@…

Replying to simon-code@bvnetwork.no:

I have reviewed all changes to version control source, and have identified these changes as the ones that make the modules hang:

Actually, svn_authz.py contains 3 changes when going from 0.10 to 0.10.1. When I revert the file to 0.10 (and keeping all other files 0.10.1), everything works fine. I can browse, view revision logs and detailed diffs. No errors at all.

comment:29 Changed 11 years ago by Christian Boos

Keywords: needinfo removed
Priority: normalhighest

Ok, thanks for the detailed investigations.

In r3971, I didn't look closely at the svn_authz.py changes. There are clearly wrong, as get_repository() itself tries to create a SubversionAuthorizer, so we're in a loop. The loop doesn't manifest itself because we're caught on trying to acquire a non-reentrant lock before that ;)

comment:30 Changed 11 years ago by Christian Boos

Resolution: fixed
Status: newclosed

I think this should now be fixed by [4200] (trunk) and [4201] (0.10-stable).

Sorry for the inconvenience, we should have noticed this before (ideally before the 0.10.1 release), but I think none of the developers was using authz on a regular basis … this is now also "fixed", as my authz.conf file is going to stay ;)

comment:31 Changed 11 years ago by dkg-debian.org@…

I got bit by this same issue this morning on one of my 0.10.1 trac installs that uses authz and has an SVN repository attached. [4201] seems to resolve the problem for me.

this bug actually caused my apache processes to hang indefinitely, which eventually chewed up all available apache instances, which DoS'ed the rest of the stuff that web server was serving. Thanks for having a fix worked out and posted by the time i went searching for it!

comment:32 Changed 11 years ago by Markus Tacker <m@…>

Fix confirmed in r4210.

comment:33 Changed 11 years ago by dkg-debian.org@…

i needed to apply [4207] as well as [4201] to the above-mentioned 0.10.1 install to complete the fix. Turns out [4201] breaks some pageviews, including search output.

comment:34 Changed 11 years ago by the.ajarn@…

I knew I was onto something. Good work guys.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted.
to The owner will be changed from Christian Boos to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .

 
Note: See TracTickets for help on using tickets.