Edgewall Software
Modify

Ticket #3996 (closed defect: fixed)

Opened 5 years ago

Last modified 2 years ago

Browse Source takes forever, no debug output

Reported by: Markus Tacker <m@…> Owned by: cboos
Priority: highest Milestone: 0.10.2
Component: version control/browser Version: 0.10
Severity: major Keywords: svn
Cc: tjb@…, trudolphi@…, rhind@…
Release Notes:
API Changes:

Description

So, this is kinda weird.

This problem exists on 0.10-stable and trunk with tracd and apache.

I have two repositories (one very small, one medium sized) where I get no response when I try to access the Browser Source page. The request just hangs there (I can see it in apaches server-status).

Nothing is in debug log except the info that the egg files are loaded. The repo itself is ok and has no error.

Attachments

#3996-tacker.txt (17.9 KB) - added by Markus Tacker <m@…> 5 years ago.
Full gdb session of hanging apache process
#3996-tacker-trac-gdb.txt (3.7 KB) - added by Markus Tacker <m@…> 5 years ago.
GDB session of tracd

Download all attachments as: .zip

Change History

comment:1 Changed 5 years ago by cboos

  • Keywords svn added

This problem exists on 0.10-stable and trunk with tracd and apache.

So this must be an issue with the bindings.

Can you access that repos by other means?

(+ is it fsfs or bdb, local disk or network?)

comment:2 Changed 5 years ago by Markus Tacker <m@…>

It's fsfs, I can access (co, ls) it via file:// and via Apache.

comment:3 Changed 5 years ago by tjb@…

  • Cc tjb@… added

We have the same problem, and when Browse Source hangs, no other access to Trac is possible!

We are running on SLES9, postgres backend, swig 1.3.27, apache 2.0.49, pyPgSql 2.4, Subversion 1.3.2

comment:4 Changed 5 years ago by tjb@…

  • Severity changed from normal to major

Further to the above, we also run ViewVC, which also uses the python-subversion bindings, and works fine. Because of the potential to lock up our (internal) web server, we see this as kind of urgent...

comment:5 follow-up: Changed 5 years ago by cboos

  • Milestone set to 0.10.1

... we see this as kind of urgent...

I understand, however you must also understand that we can't really make progress on this until either:

  1. we're able to reproduce that ourselves. I can't, apparently neither can the other developers
  2. we have some understanding of what might be going on. For that, we need more information, like where it hangs. You should try your best to get a stacktrace of the server process which is hanging, either by issuing an interrupt to tracd and then get the Python backtrace, or by attaching to the Apache process using GDB (if you happen to use Linux) then get the C backtrace.

Waiting for more input on your part, here are a few wild guesses:

comment:6 in reply to: ↑ 5 Changed 5 years ago by Markus Tacker <m@…>

Replying to cboos:

You should try your best to get a stacktrace of the server process which is hanging, either by issuing an interrupt to tracd and then get the Python backtrace, or by attaching to the Apache process using GDB (if you happen to use Linux) then get the C backtrace.

How do I do this?

comment:7 follow-up: Changed 5 years ago by cboos

I guess it's time to follow your advice about creating a general TracTroubleshooting page ;)

comment:8 in reply to: ↑ 7 Changed 5 years ago by markus

Replying to cboos:

I guess it's time to follow your advice about creating a general TracTroubleshooting page ;)

That advice was made by another Markus. ;-)

comment:9 Changed 5 years ago by cboos

Oh! Sorry ;)

I apparently share Trac's case insensitive search mode ;)

comment:10 Changed 5 years ago by cboos

  • Keywords needinfo added

Does this still happen?

Also, have you tried the latest of source:branches/0.10-stable, which contains two bugfixes for patterns in the Wiki formatter which could have taken an exponentially long time to complete?

comment:11 Changed 5 years ago by tjb@…

Sorry for the silence, I have been absent on a course for a few days.
As at r4083, we still cannot Browse Source. The web server no longer seems to hang however.

If I get some time soon, I will try to produce some sort of dump for you too look at. if I can reproduce this with a tracd, I'll do that.

comment:12 Changed 5 years ago by Markus Tacker <m@…>

This is the backtrace of the httpd process while it hangs in browser:

#0  0x0044d402 in ?? ()
#1  0x00bf42fe in sem_wait@GLIBC_2.0 () from /lib/libpthread.so.0
#2  0x0106c9cb in PyThread_acquire_lock () from /usr/lib/libpython2.4.so.1.0
#3  0x0107092e in initthread () from /usr/lib/libpython2.4.so.1.0
#4  0x0100dd2d in PyCFunction_Call () from /usr/lib/libpython2.4.so.1.0
#5  0x01048e2b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#6  0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#7  0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0
#8  0x01047b0b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#9  0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#10 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#11 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#12 0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0
#13 0x01047b0b in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#14 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#15 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#16 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#17 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#18 0x010488cc in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0
#19 0x01049758 in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0
#20 0x00ffb4ea in PyClassMethod_New () from /usr/lib/libpython2.4.so.1.0
#21 0x00fe3587 in PyObject_Call () from /usr/lib/libpython2.4.so.1.0
#22 0x00fe9b78 in PyClass_IsSubclass () from /usr/lib/libpython2.4.so.1.0
#23 0x00fe3587 in PyObject_Call () from /usr/lib/libpython2.4.so.1.0
#24 0x00fe3f38 in PyObject_CallMethod () from /usr/lib/libpython2.4.so.1.0
#25 0x00bd3307 in init_apache () from /etc/httpd/modules/mod_python.so
#26 0x00a371fd in ap_run_handler () from /usr/sbin/httpd
#27 0x00a3ab98 in ap_invoke_handler () from /usr/sbin/httpd
#28 0x00a46a9e in ap_process_request () from /usr/sbin/httpd
#29 0x00a4389f in ap_register_input_filter () from /usr/sbin/httpd
#30 0x00a3f07d in ap_run_process_connection () from /usr/sbin/httpd
#31 0x00a3f17c in ap_process_connection () from /usr/sbin/httpd
#32 0x00a4b052 in ap_graceful_stop_signalled () from /usr/sbin/httpd
#33 0x00a4b2c4 in ap_graceful_stop_signalled () from /usr/sbin/httpd
#34 0x00a4c1d9 in ap_mpm_run () from /usr/sbin/httpd
#35 0x00a22a9a in main () from /usr/sbin/httpd

Full gdb session will be attached.

Changed 5 years ago by Markus Tacker <m@…>

Full gdb session of hanging apache process

comment:13 Changed 5 years ago by Markus Tacker <m@…>

  • Cc trudolphi@… added

comment:14 follow-up: Changed 5 years ago by cboos

Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?

Is this with trunk or 0.10-stable?

Now both have the database timeout parameter (r3919 and r4167, timeout value defaults to 20 seconds). Do you now hit that timeout or is it something else?

Finally, does this happen systematically when trying to go to the TracBrowser, or does that happen only sometimes, perhaps just after a commit?

Changed 5 years ago by Markus Tacker <m@…>

GDB session of tracd

comment:15 in reply to: ↑ 14 ; follow-up: Changed 5 years ago by Markus Tacker <m@…>

Replying to cboos:

Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?

I am getting no output. See end of #3996-tacker-trac-gdb.txt

Is this with trunk or 0.10-stable?

trunk

Now both have the database timeout parameter (r3919 and r4167, timeout value defaults to 20 seconds). Do you now hit that timeout or is it something else?

No, I do not hit a timeout.

Finally, does this happen systematically when trying to go to the TracBrowser, or does that happen only sometimes, perhaps just after a commit?

This happens every time I try to access the TracBrowser or the Timeline.

comment:16 in reply to: ↑ 15 ; follow-up: Changed 5 years ago by cboos

Replying to Markus Tacker <m@tacker.org>:

Replying to cboos:

Could you also try to get the Python back trace, using the gdbinit script (as explained in TracTroubleshooting#SystemErrors)?

I am getting no output. See end of #3996-tacker-trac-gdb.txt

This is because you first have to move to a Python frame, e.g. do:

gdb> f 3
gdb> pystack

This happens every time

But only for some repositories, according to what you said in the comment:description, right? There's really nothing that distinguishes those repositories from those which are working? (present on a mounted drive or not, different permissions, different filesystem, etc.)

comment:17 Changed 5 years ago by cboos

Ah, also I didn't pay enough attention, in the tracd case you're not watching the "hanged" thread, rather you're on the one which waits on accept (and this is normal).

You can switch to a different by issuing an info threads command, followed by a thread n where n is the identifier of the thread. Identify the appropriate thread by performing a bt on each.

comment:18 Changed 5 years ago by cboos

example:

(gdb) info threads
  2 Thread 1075841376 (LWP 26201)  0x0000002a958d25af in __accept_nocancel ()
   from /lib64/tls/libpthread.so.0
* 1 Thread 182907738880 (LWP 26200)  0x0000002a960f5fe6 in __select_nocancel ()
   from /lib64/tls/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 1075841376 (LWP 26201))]#0  0x0000002a958d25af in __accept_nocancel () from /lib64/tls/libpthread.so.0
(gdb) bt
#0  0x0000002a958d25af in __accept_nocancel () from /lib64/tls/libpthread.so.0
#1  0x0000002a964a5fd7 in sock_accept (s=0x2a97e01190)
    at /usr/local/src/Python-2.4.2/Modules/socketmodule.c:1368
#2  0x0000002a9570a183 in PyEval_EvalFrame (f=0x5ce650) at ceval.c:3542
...
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) f 2
#2  0x0000002a9570a183 in PyEval_EvalFrame (f=0x5ce650) at ceval.c:3542
3542                                    C_TRACE(x, (*meth)(self,NULL));
(gdb) pystack
/opt/python-2.4.2/lib/python2.4/socket.py (170): accept
/opt/python-2.4.2/lib/python2.4/SocketServer.py (170): get_request
...
(gdb)

Here I switched on the thread doing the accept, because I don't have a hanging thread ;)

comment:19 in reply to: ↑ 16 Changed 5 years ago by Markus Tacker <m@…>

ATM I am recompiling python as I get

(gdb) pystack
No symbol table is loaded. Use the "file" command.

The working and non-working repos are on the same fs, with the same permission and so. Plan SVN access to them is working.

comment:20 Changed 5 years ago by Markus Tacker <m@…>

I just noticed another thing.

The problem does not depend on the repository but on the trac environment.

I created a new environent for the same repository and there browser and timeline work.

comment:21 Changed 5 years ago by the.ajarn@…

  • Summary changed from Browse Source takes forever, no debug output, only some repos to Browse Source takes forever, no debug outpu

I have the same problem, using trunk.

The problem only happens after assigning FineGrainedPermissions. The last thing in the trac-log is '2006-11-08 19:05:13,434 Trac[main] DEBUG: Dispatching <Request "GET /browser">' and it doesn't send the GET to the webserver (watched httpd-access.log).

It doesn't matter what is in the authz_file. If I set authz_file=<blank> then I can see the timeline again. If authz_file=<invalid_path> it does the same thing, and just hangs. Maybe it's a permission issue?

comment:22 Changed 5 years ago by the.ajarn@…

  • Summary changed from Browse Source takes forever, no debug outpu to Browse Source takes forever, no debug output

Sorry, not a permission issue. But everything else I said hopefully will help.

comment:23 Changed 5 years ago by supermac@…

I am confronted with the same problem, "Browse Sources", "Timeline" and "Admin" take forever. I can access my repository in a normal way outside from pyhon…

comment:24 Changed 5 years ago by Russell Hind <rhind@…>

  • Cc rhind@… added

comment:25 follow-up: Changed 5 years ago by sam@…

If you have a :

authz_file = /something

line in your trac.ini, you might want to empty it. It seems that there's a problem when reading this file, or when checking permissions, or ... well, I didn't really investigate, but removing it makes the timeline and browse source links works again.

comment:26 follow-up: Changed 5 years ago by simon-code@…

I just upgraded from 0.10 to 0.10.1, and experience the same error. Repos is fully functional with fine-grained permissions when browsing directly (Apache http svn browser), but the Trac browser and timeline (with changeset enabled) just hangs when accessed.

I have reviewed all changes to version control source, and have identified these changes as the ones that make the modules hang:

  • trac/versioncontrol/svn_authz.py

    old new  
    3939 
    4040    module_name = env.config.get('trac', 'authz_module_name') 
    41     db = env.get_db_cnx() 
    42     return RealSubversionAuthorizer(db, authname, module_name, authz_file) 
     41    repos = env.get_repository() 
     42    return RealSubversionAuthorizer(repos, authname, module_name, authz_file) 
    4343 
    4444def parent_iter(path): 
     
    6565    conf_authz = None 
    6666 
    67     def __init__(self, db, auth_name, module_name, cfg_file, cfg_fp=None): 
    68         self.db = db 
     67    def __init__(self, repos, auth_name, module_name, cfg_file, cfg_fp=None): 
     68        self.repos = repos 
    6969        self.auth_name = auth_name 
    7070        self.module_name = module_name 

When I revert just these 2 changes to 0.10 behavior, I can access the browser with support for fine-grained permissions. However, when I access the changesets, I get an error as likely new code expect this attribute (AttributeError: 'RealSubversionAuthorizer' object has no attribute 'repos').

Now, someone with more intimate knowledge will likely be able to find what and why this goes wrong, but with the change from env.get_db_cnx() to env.get_repository() there might be some code left that expects the old db behaviour?

comment:27 in reply to: ↑ 25 Changed 5 years ago by Markus Tacker <m@…>

Replying to sam@bulix.org:

If you have a :

authz_file = /something

line in your trac.ini, you might want to empty it. It seems that there's a problem when reading this file, or when checking permissions, or ... well, I didn't really investigate, but removing it makes the timeline and browse source links works again.

Confirmed.

comment:28 in reply to: ↑ 26 Changed 5 years ago by simon-code@…

Replying to simon-code@bvnetwork.no:

I have reviewed all changes to version control source, and have identified these changes as the ones that make the modules hang:

Actually, svn_authz.py contains 3 changes when going from 0.10 to 0.10.1. When I revert the file to 0.10 (and keeping all other files 0.10.1), everything works fine. I can browse, view revision logs and detailed diffs. No errors at all.

comment:29 Changed 5 years ago by cboos

  • Keywords needinfo removed
  • Priority changed from normal to highest

Ok, thanks for the detailed investigations.

In r3971, I didn't look closely at the svn_authz.py changes. There are clearly wrong, as get_repository() itself tries to create a SubversionAuthorizer, so we're in a loop. The loop doesn't manifest itself because we're caught on trying to acquire a non-reentrant lock before that ;)

comment:30 Changed 5 years ago by cboos

  • Resolution set to fixed
  • Status changed from new to closed

I think this should now be fixed by [4200] (trunk) and [4201] (0.10-stable).

Sorry for the inconvenience, we should have noticed this before (ideally before the 0.10.1 release), but I think none of the developers was using authz on a regular basis ... this is now also "fixed", as my authz.conf file is going to stay ;)

comment:31 Changed 5 years ago by dkg-debian.org@…

I got bit by this same issue this morning on one of my 0.10.1 trac installs that uses authz and has an SVN repository attached. [4201] seems to resolve the problem for me.

this bug actually caused my apache processes to hang indefinitely, which eventually chewed up all available apache instances, which DoS'ed the rest of the stuff that web server was serving. Thanks for having a fix worked out and posted by the time i went searching for it!

comment:32 Changed 5 years ago by Markus Tacker <m@…>

Fix confirmed in r4210.

comment:33 Changed 5 years ago by dkg-debian.org@…

i needed to apply [4207] as well as [4201] to the above-mentioned 0.10.1 install to complete the fix. Turns out [4201] breaks some pageviews, including search output.

comment:34 Changed 5 years ago by the.ajarn@…

I knew I was onto something. Good work guys.

comment:35 Changed 4 years ago by anonymous

Get super discount via purchasing sell wow gold in Igegolds right now.
We offer high to 5% bonus for customers who order more than 500 gold and less than 3000 gold.
Those who purchase more than 4000 gold can enjoy the bonus of 8% ectra gold for free!
Hope you have a happy Christmas! More surprises in coming soon.

Speedy Delivery of World of Warcraft

Now we are glad to announce that after a period of adjustment, we try to provide the fast delivery of 30 minutes in more than 90% servers.Our gold is safe, you needn’t worry about the security of your account.

If we are short of stock for the moment, there will be a slight delay and you should receive your shipment in approximately 6-12 hours. You can check the status in our website,

If we happen to be out of gold at the moment you make your purchase. We promise to send your gold to you in 1 to 3 days. Have fun!

Call Verification will be needed for over $40 Purchases. After payment is verified, orders will be delivered within 45 minutes of purchase. Please feel free to contact our customer service via our live chat system. Also, we will contact you by phone ,email and live chat to arrange the delivery.

View

Add a comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
to The owner will be changed from cboos. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.