Edgewall Software

Ticket #2252 (closed defect: worksforme)

Opened 3 years ago

Last modified 2 years ago

FastCGI gets a timeout but renders correct pages afterwards

Reported by: thomas.jachmann@… Owned by: mgood
Priority: normal Milestone:
Component: general Version: 0.9.4
Severity: normal Keywords:
Cc: thomas.jachmann@…

Description

From time to time (roughly every 5 to 10 requests), Trac hangs. This is the output from Apache's error log:

[Thu Oct 20 21:33:01 2005] [error] [client ...] FastCGI: comm with server "/usr/share/trac/cgi-bin/trac.fcgi" aborted: idle timeout (30 sec), referer: [...]
[Thu Oct 20 21:33:01 2005] [error] [client ...] FastCGI: server "/usr/share/trac/cgi-bin/trac.fcgi" stderr: Traceback (most recent call last):, referer: [...]
[Thu Oct 20 21:33:01 2005] [error] [client ...] FastCGI: server "/usr/share/trac/cgi-bin/trac.fcgi" stderr:   File "/usr/lib/python2.3/site-packages/trac/web/_fcgi.py", line 567, in run, referer: [...]
[Thu Oct 20 21:33:01 2005] [error] [client ...] FastCGI: server "/usr/share/trac/cgi-bin/trac.fcgi" stderr:     protocolStatus, appStatus = self.server.handler(self), referer: [...]
[Thu Oct 20 21:33:01 2005] [error] [client ...] FastCGI: server "/usr/share/trac/cgi-bin/trac.fcgi" stderr: TypeError: unpack non-sequence, referer: [...]

The first line appears only when the request hangs; the remaining four lines are written to the error log for every request, even those that do not hang.
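The traceback ends in a tuple unpacking of the handler's return value. As a minimal sketch (this is not the code from _fcgi.py or Trac; `run` and both handlers are hypothetical names), the following shows how such an assignment raises a TypeError when the callee returns something that is not a two-item sequence. Python 2.3 words the error as "unpack non-sequence"; newer Python versions use a different message. What the real handler returned is not known from the log.

```python
def handler_returning_pair(request):
    # Hypothetical well-behaved handler: returns (protocol_status, app_status).
    return (0, 0)


def handler_returning_nothing(request):
    # Hypothetical faulty handler: falls through and returns None.
    return None


def run(handler, request=None):
    # Mirrors the shape of the statement in the traceback:
    #     protocolStatus, appStatus = self.server.handler(self)
    try:
        protocol_status, app_status = handler(request)
    except TypeError as exc:
        return "TypeError: %s" % exc
    return (protocol_status, app_status)


if __name__ == "__main__":
    print(run(handler_returning_pair))
    print(run(handler_returning_nothing))
```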

This seems to be an error in trac.fcgi. Parts of the page are rendered, then it hangs for 30 seconds (the FastCGI timeout). Then the rest of the page is passed to the browser and rendered. Nothing is broken. From the browser's point of view, it just looks like the server takes a break in the middle of the page. What I don't understand is the timeout: it looks as if the execution of trac.fcgi is cancelled by FastCGI, yet all content still reaches the browser in the end.

The following is my current apache configuration for the virtual host running trac. I run several projects off the root of the virtual host, avoiding the trac.fcgi script in the URL by using the ScriptAliasMatch directive. This is all taken from trac's documentation.

FastCgiConfig -initial-env TRAC_ENV_PARENT_DIR=/var/trac/ -idle-timeout 1
<VirtualHost *:80>
        ServerName [...]
        DocumentRoot /usr/share/trac/htdocs

        <Directory "/usr/share/trac/htdocs">
                Options Indexes MultiViews
                AllowOverride None
                Order allow,deny
                Allow from all
        </Directory>

        AliasMatch ^/[^/]+/chrome/common(.*) /usr/share/trac/htdocs$1
        ScriptAliasMatch ^(.*) /usr/share/trac/cgi-bin/trac.fcgi$1
</VirtualHost>

As you can see, I avoided the lag by just reducing the timeout of FastCGI to one second. This way, users don't notice that trac.fcgi gets a timeout. But with increasing load on the server, one second might not be sufficient.

I use:

  • Fedora Core 3
  • Apache 2.0.53
  • Python 2.3.4
  • Trac 0.9b2

Change History

Changed 3 years ago by thomas.jachmann@…

  • version changed from 0.8.4 to 0.9b2

Changed 3 years ago by jonas

  • status changed from new to closed
  • resolution set to duplicate

Duplicate of #2106.

Changed 3 years ago by mgood

  • status changed from closed to reopened
  • resolution duplicate deleted

No, this is a distinct issue from #2106, since [2425] should have already fixed that one.

I have this issue on my production server, but I've been unable to reproduce it elsewhere to do much testing. However, it seems like it must be an error within the fcgi module, not the Trac code that calls it, since I've verified that the Trac handler returns normally. I believe that somehow the buffers are not being flushed properly.

Changed 3 years ago by anonymous

I think mgood's right, since the machine isn't experiencing any load during the lag. It just sits there waiting until the timeout occurs. This also matches the fact that the content is completely rendered after the timeout - a flush might be issued on timeout.
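The buffering hypothesis can be illustrated in isolation. The sketch below is not the fcgi module's code; `Sink` and `demo` are hypothetical names. It only shows the general behaviour of a buffered writer: bytes written to it do not reach the underlying sink until something calls flush, which would match a page that stalls and then completes once a flush finally happens.

```python
import io


class Sink(io.RawIOBase):
    """Hypothetical stand-in for the socket to the web server."""

    def __init__(self):
        super().__init__()
        self.received = b""

    def writable(self):
        return True

    def write(self, data):
        # Record everything that actually reaches the "socket".
        self.received += bytes(data)
        return len(data)


def demo():
    sink = Sink()
    out = io.BufferedWriter(sink, buffer_size=4096)
    out.write(b"<html>first half")
    before = sink.received   # nothing has been passed on yet
    out.flush()
    after = sink.received    # the buffered bytes arrive only now
    return before, after


if __name__ == "__main__":
    print(demo())
```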

Changed 3 years ago by mgood

#2139 has been marked as a duplicate of this ticket.

Changed 3 years ago by fago

I think I am affected by the same issue.

I'm using Debian Sarge with its Apache 2 (2.0.54) and Python 2.3.5.

Basically everything works fine, but roughly every 5th to 10th page load shows the described behaviour: no output until the FastCGI process times out. It seems to happen mostly with the ticket overview/view pages.

The server is up to 99% idle during the timeout. Error log:

[Tue Dec 13 12:06:16 2005] [error] [client 193.170.48.58] FastCGI: comm with server "/var/www/tracwrapper.fcgi" aborted: idle timeout (5 sec)

(tracwrapper.fcgi only sets the environment for Trac.) However, I don't get the Python traceback.

Changed 3 years ago by fago

Sorry, I forgot to mention my Trac version: I am using the latest stable release, 0.9.2.

Changed 3 years ago by anonymous

  • version changed from 0.9b2 to 0.9.2

After upgrading to 0.9.2, I still have the same problem, although I get fewer error messages in Apache's error log than with 0.9b2:

[Thu Dec 22 12:29:12 2005] [error] [client ...] FastCGI: comm with server "/usr/share/trac/cgi-bin/trac.fcgi" aborted: idle timeout (2 sec)

I still can't figure out what's going wrong. I also tried the -flush parameter to FastCGI and configuring the script as a static FastCgiServer instead of a dynamic one, but neither had any effect.

This keeps me from setting up a public Trac instance for one of our open source projects, since we've got quite some traffic on the project's current site. Unfortunately, I'm not too familiar with Python and had no luck looking through Trac's code for the cause.

Changed 3 years ago by anonymous

  • cc thomas.jachmann@… added

Changed 3 years ago by mgood

  • owner changed from jonas to mgood
  • status changed from reopened to new

Well, I think that the problem lies outside of the Trac code. I encountered the problem when we switched FastCGI modules, since the old one was not compatible with the change to a BSD license. I've tried some debugging and verified that it stalls after Trac's FastCGI handler has exited. Unfortunately I can't reproduce the problem on my own system and haven't wanted to interrupt my production server to debug it. However, I think that the site should be unused this weekend and I may be able to take it down so that I can look into it.

Changed 3 years ago by thomas.jachmann <thomas.jachmann@…>

I don't know if this is related, but sometimes I also get errors where the content of apache's "Internal Server Error" page is displayed somewhere within the page trac is about to generate, usually at the same spot. In the error log, I have the following:

[Tue Jan 03 17:30:38 2006] [error] [client ...] (104)Connection reset by peer: FastCGI: comm with server "/usr/share/trac/cgi-bin/trac.fcgi" aborted: read failed

AFAIK, this is usually written to the log when the server was unable to send data back to the browser because the connection was cut, e.g. the browser was closed before the page had been fully delivered. But that was not the case here. Maybe this helps in finding the problem?

Changed 3 years ago by james@…

I get a similar thing:

[Tue Mar 21 14:23:50 2006] [error] [client xxx] FastCGI: comm with server "/usr/share/trac/cgi-bin/trac.fcgi" aborted: idle timeout (30 sec), referer: https://blah/foo/wiki/QuickTortoiseUsage

Occasionally I also get 'read failed', and now and then the rendered page includes Apache's error page, or is simply truncated. It's a lottery, really.
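To see how often each failure mode occurs, the two abort messages quoted in this ticket can be tallied from an error log. This is only a rough sketch: the pattern is derived from the log lines shown here, `count_aborts` is a hypothetical helper name, and other mod_fastcgi versions may word their messages differently.

```python
import re
from collections import Counter

# Matches the two abort reasons quoted in this ticket. \s+ is used between
# the parts so that log entries wrapped over several lines still match.
ABORT_RE = re.compile(
    r'FastCGI: comm with server\s+"[^"]+"\s+aborted:\s+'
    r'(idle timeout|read failed)'
)


def count_aborts(log_text):
    """Return a Counter mapping each abort reason to its number of occurrences."""
    return Counter(ABORT_RE.findall(log_text))


if __name__ == "__main__":
    import sys
    # Usage: python count_aborts.py < error_log
    for reason, count in count_aborts(sys.stdin.read()).items():
        print(reason, count)
```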

I have no idea how to fix this - should I use a different connection method altogether? I'm using Trac 0.9.4 and FastCGI 2.4.2 on Debian Sarge with Apache 2.

Changed 3 years ago by Thomas Jachmann <thomas.jachmann@…>

  • version changed from 0.9.2 to 0.9.4

Changed 3 years ago by otto.hilska@…

I had the same problem with Apache 2, mod_fastcgi and Trac 0.9.4. However, switching mod_fastcgi to mod_fcgid helped, so I guess this really isn't a Trac problem.

Changed 3 years ago by Thomas Jachmann <thomas.jachmann@…>

  • status changed from new to closed
  • resolution set to worksforme

OK, I also got this working, so I'd better close the ticket. Thanks Otto for the hint!

If anyone else is interested:

  1. download mod_fcgid from http://fastcgi.coremail.cn/download.htm and run make and make install
  2. on Fedora, put this into /etc/httpd/conf.d/fcgid.conf:
    LoadModule fcgid_module modules/mod_fcgid.so
    <IfModule mod_fcgid.c>
        AddHandler fcgid-script .fcgid
        SocketPath /tmp/fcgid/sock
        IPCCommTimeout 60
    </IfModule>
    
  3. start apache
  4. chmod -R 777 /tmp/fcgid/
  5. restart apache

The IPCCommTimeout is necessary since some reports and changeset views can run quite long. See the following URLs for any configuration hints:
