Edgewall Software
Modify

Opened 19 years ago

Closed 18 years ago

Last modified 18 years ago

#1764 closed defect (fixed)

FastCGI

Reported by: ph@… Owned by: Jonas Borgström
Priority: normal Milestone: 0.9.1
Component: general Version: 0.9
Severity: critical Keywords: fastcgi, apache
Cc: ph@…, jouvin@…, gabriel.blum@…, mrm@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description

The FastCGI frontend (implemented in [1399]) doesn't work with Apache. error.log contains:

[Thu Jul  7 09:20:27 2005] [error] [client 69.208.101.159] FastCGI: comm with (dynamic) server "/home/postmod/postmod.net/trac.fcgi" aborted: (first read) idle timeout (30 sec)
[Thu Jul  7 09:20:27 2005] [error] [client 69.208.101.159] FastCGI: incomplete headers (0 bytes) received from server "/home/postmod/postmod.net/trac.fcgi"

Apache 1.3.31 FastCGI 2.4.2 Python 2.4.1 Trac [1919]

Attachments (0)

Change History (17)

comment:1 by garym@…, 19 years ago

I am seeing the same issue using the very latest of everything, I have tried using

mod_fastcgi-2.4.2-1.i386.rpm mod_fastcgi-2.4.2.tar.gz ruby-fcgi-0.8.6.tar.gz or the 0.8.6.1 gem file

and I still get

[Wed Sep 07 21:02:18 2005] [error] [client 72.3.240.148] FastCGI: comm with (dynamic) server "/var/www/feed/bcast/index.fcgi" aborted: (first read) idle timeout (30 sec) [Wed Sep 07 21:02:18 2005] [error] [client 72.3.240.148] FastCGI: incomplete headers (0 bytes) received from server "/var/www/feed/bcast/index.fcgi"

but only when the load is moderately high, several requests per second, and under lower load the CGI works perfectly and resumes operation when the load subsides.

comment:2 by jouvin@…, 19 years ago

I am also running into the same problem trying to use trac.fcgi. I installed Trac revision 2245 (we already have 0.8.4 running in prod) and I started with a new project. It looks like if trac.fcgi is doing a segfault, according to the main Apache error file :

[Fri Sep 16 23:48:13 2005] [warn] FastCGI: server "/htdocs/tractest/bin/trac.fcgi" (pid 1437658) terminated due to uncaught signal '11' (Segmentation fault)
[Fri Sep 16 23:48:13 2005] [warn] FastCGI: server "/htdocs/tractest/bin/trac.fcgi" has remained running for more than 30 seconds, its restart interval has been restored to 5 seconds
[Fri Sep 16 23:48:13 2005] [warn] FastCGI: server "/htdocs/tractest/bin/trac.fcgi" restarted (pid 1440907)

In the error log file for the VH used by trac, I find :

[Fri Sep 16 23:48:13 2005] [error] [client 134.158.88.81] FastCGI: incomplete headers (0 bytes) received from server "/htdocs/tractest/bin/trac.fcgi", referer: https://tractest.lal.in2p3.fr/trac/

My configuration is :

Apache 2.0.54
FastCGI 2.4.2
Python 2.4.1

I enabled debugging info in trac.ini for the project. But there is nothing in trac.log, except :

23:48:13 Trac[api] DEBUG: Updating wiki page index

I cannot figure out what is wrong.

Michel

comment:3 by anonymous, 19 years ago

Version: devel0.9b2

With 0.9b2 or last trunk (source:/trunk#2353), the problem is still there and I have not yet succeeded to run Trac using FastCGI (or mod_python, the pb is probably the same, I get a segfault but in this case Apache is unusable and I have to restart it).

Any hint on how troubleshooting this problem…

Michel

comment:4 by anonymous, 19 years ago

Milestone: 0.9

comment:5 by jouvin@…, 19 years ago

Cc: jouvin@… added

comment:6 by anonymous, 19 years ago

Cc: gabriel.blum@… added

comment:7 by Christian Boos, 19 years ago

Milestone: 0.90.9.1

The TracFastCgi is deemed to be experimental for milestone:0.9, so I don't think that ticket should delay the 0.9 release.

Also, by looking at the comments above, it looks like the problems reported where caused by segfaults under heavy load, which is possibly due to something unrelated to the web front-end but rather due to database lock problems (see #2170), which are now fixed.

comment:8 by anonymous, 18 years ago

Cc: mrm@… added

comment:9 by jouvin@…, 18 years ago

Version: 0.9b20.9

Just to give you an update : the problem remains the same with 0.9 release. I have not tried to install the writer_contention patch documented in #2170. Everything is possible, but I wouldn't say our Trac server is under heavy load. In most of the tests, I was alone.

Again, this segfault problem affects both FastCGI and mod_python. It can be a different problem… but there is also a chance that this is the same ! On the other hand, Trac is running very well (but very slowly..) as a CGI. Probably, fixing FastCGI should be easier. I can try to have a look if somebody tell me where to look first for difference between FastCGI and CGI frontends.

Michel

comment:10 by Christian Boos, 18 years ago

Michel, you shouldn't do anything with #2170 patches, as 0.9 already contains those.

What SQLite and PySqlite versions are you using?

In order to be able to understand better the problem, a stack trace would help a lot…

Also, you should try with TracStandalone, to check if you also have problems with that.

comment:11 by anonymous, 18 years ago

It was also my impression that #2170 patches were integrated to 0.9. Just I had no time to check it.

We are using SQLite 3.0.8 (built without option —enable-threadsafe ) and PySqlite 1.1.6. What I don't understand, if the problem is with underlying components, is why this doesn't affect CGI mode.

I understand that a stack trace would help a lot. But where I should expect to find one (something like a coredump of the process launched by FastCGI) ?

Michel

comment:12 by Christian Boos, 18 years ago

From r2355:

Note that without a threadsafe SQLite, Trac will continue to work, simply it will still be vulnerable to the abovementioned problem.

Maybe I should document better the need for a thread-safe SQLite when using a multithreaded server. But that would explain the failures when you're using mod_python, not when using FastCGI (…except that I don't know really how the FastCGI mode works, maybe it's also multithreaded).

The CGI mode is definitely single threaded.

That's also why I suggested running tracd: it's multithreaded, but it's easy to run it under gdb and to get a stack trace.

You should also try with a newer SQLite (3.2.7, built in threadsafe mode) and a newer PySqlite (≥ 2.0.5).

comment:13 by anonymous, 18 years ago

I upgraded to SQLite 3.2.7 and PySqlite 2.0.5. It didn't improve the situation with FastCGI and mod_python.

As suggested, I tried to run tracd and, good news, it reproduces the problem. Below is the traceback. I have the core file but I'm afraid you cannot use it directly as we run trac on Tru64.

(dbx) t

0 nxm_thread_kill(0xb, 0x20000f17440, 0x3ff805ba108, 0x20000f17440, 0x20000f17440) [0x3ff805c1ac8]

1 pthread_kill(0x0, 0x20000f11cb8, 0x0, 0x20000f16000, 0x20000000001) [0x3ff805ba120] 2 (unknown)() [0x3ff805c62a8] 3 (unknown)() [0x3ff80633994] 4 exc_raise_signal_exception(0xb0ffe0003, 0x86, 0x0, 0x3ff800d67d0, 0x1) [0x3ff80633d80] 5 strlen(0x1, 0x1, 0x0, 0x20000f17440, 0x3ff00000000) [0x3ff800d67d0] 6 _doprnt_dis(0x140605990, 0x20000f13519, 0x140351d14, 0x20000f13040, 0x3ff00000018) [0x3ff8021bd98] 7 vsnprintf(0x4200000000, 0x0, 0x0, 0x0, 0x20000f13040) [0x3ff801967c0] 8 snprintf(0x20000f134b0, 0x400, 0x3004c835ea0, 0x3004c834bb0, 0x125) [0x3ff8019baf4] 9 nerr_error_traceback() ["neo_err.c":1, 0x3000c822820]

10 p_neo_error(0x140781f50, 0x3004c830ba8, 0x3000c80adcc, 0x140794098, 0x1407f9540) [0x3000c80b1b0] 11 (unknown)() [0x3000c80adc8] 12 PyCFunction_Call(0x3ffbf95d364, 0x20000f13a68, 0x0, 0x1, 0x140552fa0) [0x3ffbf904ca0] 13 (unknown)() [0x3ffbf95d360] 14 PyEval_EvalFrame(0x2, 0x1405c8560, 0x1403878c0, 0x0, 0x140552fa0) [0x3ffbf95977c] 15 (unknown)() [0x3ffbf95d5e4]

More (n if no)?

16 PyEval_EvalFrame(0x18, 0x140552fa0, 0x0, 0x0, 0x140552fa0) [0x3ffbf95977c] 17 PyEval_EvalCodeEx(0x300000003, 0x140160e10, 0x140552fa0, 0x0, 0x3ffbf95d404) [0x3ffbf95c0a8] 18 (unknown)() [0x3ffbf95d690] 19 PyEval_EvalFrame(0x2, 0x140602310, 0x140387ed8, 0x0, 0x140552fa0) [0x3ffbf95977c] 20 (unknown)() [0x3ffbf95d5e4] 21 PyEval_EvalFrame(0x3, 0x1401e5210, 0x140387f50, 0x0, 0x140552fa0) [0x3ffbf95977c] 22 (unknown)() [0x3ffbf95d5e4] 23 PyEval_EvalFrame(0x1, 0x1404c8fc8, 0x14052c848, 0x0, 0x140552fa0) [0x3ffbf95977c] 24 (unknown)() [0x3ffbf95d5e4] 25 PyEval_EvalFrame(0x1, 0x1404c8fc8, 0x14052c7d0, 0x0, 0x140552fa0) [0x3ffbf95977c] 26 (unknown)() [0x3ffbf95d5e4] 27 PyEval_EvalFrame(0x1, 0x1404c8fc8, 0x140528938, 0x0, 0x140552fa0) [0x3ffbf95977c] 28 (unknown)() [0x3ffbf95d5e4] 29 PyEval_EvalFrame(0x1, 0x1404c8fc8, 0x1405289b0, 0x0, 0x140552fa0) [0x3ffbf95977c] 30 (unknown)() [0x3ffbf95d5e4] 31 PyEval_EvalFrame(0x0, 0x140552fa0, 0x0, 0x0, 0x140552fa0) [0x3ffbf95977c] 32 PyEval_EvalCodeEx(0x400000004, 0x0, 0x140552fa0, 0x0, 0x0) [0x3ffbf95c0a8]

More (n if no)?

33 (unknown)() [0x3ffbf8eaa30] 34 PyObject_Call(0x140070050, 0x0, 0x140070068, 0x140528398, 0x3ffbf8d81f4) [0x3ffbf8cce44] 35 (unknown)() [0x3ffbf8d81f0] 36 PyObject_Call(0x140528398, 0x1404c8fc8, 0x0, 0x14051cd20, 0x3ffbf95cce4) [0x3ffbf8cce44] 37 PyEval_CallObjectWithKeywords(0x3ffbf95cce4, 0x14051cd20, 0x3ffbf8d1f24, 0x14051cd20, 0x0) [0x3ffbf95cce0] 38 PyInstance_New(0x3ffbf8cce48, 0x20000f150d8, 0x140552fa0, 0x3, 0x0) [0x3ffbf8d1f20] 39 PyObject_Call(0x140552fa0, 0x3, 0x0, 0x20000f150d8, 0x3ffbf95d714) [0x3ffbf8cce44] 40 (unknown)() [0x3ffbf95d710] 41 PyEval_EvalFrame(0x3, 0x1404c84d0, 0x140524b18, 0x0, 0x140552fa0) [0x3ffbf95977c] 42 (unknown)() [0x3ffbf95d5e4] 43 PyEval_EvalFrame(0x140507a80, 0x140552fa0, 0x0, 0x2, 0x140552fa0) [0x3ffbf95977c] 44 PyEval_EvalCodeEx(0x300000003, 0x140507a80, 0x140552fa0, 0x0, 0x140507a88) [0x3ffbf95c0a8] 45 (unknown)() [0x3ffbf8eaa30] 46 PyObject_Call(0x14051cc30, 0x0, 0x14051cc48, 0x14029b618, 0x3ffbf959d04) [0x3ffbf8cce44] 47 PyEval_EvalFrame(0x1405282a8, 0x14051cc30, 0x1404d0518, 0x140286520, 0x0) [0x3ffbf959d00]

More (n if no)?

48 (unknown)() [0x3ffbf95d5e4] 49 PyEval_EvalFrame(0x0, 0x140552fa0, 0x0, 0x3ffc01b8140, 0x140552fa0) [0x3ffbf95977c] 50 PyEval_EvalCodeEx(0x100000001, 0x0, 0x140552fa0, 0x0, 0x0) [0x3ffbf95c0a8] 51 (unknown)() [0x3ffbf8eaa30] 52 PyObject_Call(0x14022ebd0, 0x0, 0x14022ebe8, 0x1402ad0c8, 0x3ffbf8d81f4) [0x3ffbf8cce44] 53 (unknown)() [0x3ffbf8d81f0] 54 PyObject_Call(0x140159470, 0x0, 0x3ffbf984164, 0x140552fa0, 0x3ffbf95cce4) [0x3ffbf8cce44] 55 PyEval_CallObjectWithKeywords(0x3ffbf95cce4, 0x20000f17440, 0x3ffbf9930d8, 0x140159470, 0x140159470) [0x3ffbf95cce0] 56 (unknown)() [0x3ffbf9930d4] 57 thdBase(0x1, 0x1, 0x0, 0x0, 0x0) [0x3ff805cf538]

comment:14 by Christian Boos, 18 years ago

Ok, the relevant part seems to be:

    9 nerr_error_traceback() ["neo_err.c":1, 0x3000c822820]
   10 p_neo_error(0x140781f50, 0x3004c830ba8, 0x3000c80adcc, 0x140794098,
 0x1407f9540) [0x3000c80b1b0]
   11 (unknown)() [0x3000c80adc8]
   12 PyCFunction_Call(0x3ffbf95d364, 0x20000f13a68, 0x0, 0x1, 0x140552fa0)
 [0x3ffbf904ca0]

which indicates an error in the neo library (i.e. ClearSilver).

At best, it's simply a bad handling of template errors, which can manifest themselves if for some reason an old template is picked up. I've noticed a few crashes in such situations, even on Windows. This can be solved by removing the old templates.

At worst, it's ClearSilver not being threadsafe on your platform. Try to see if the error still happens when tracd uses the single threaded mode:

  • trac/web/standalone.py

     
    189189        return auth['username']
    190190
    191191
    192 class TracHTTPServer(ThreadingMixIn, HTTPServer):
     192class TracHTTPServer(HTTPServer):
    193193
    194194    projects = None
    195195

Maybe there's a threadsafe configuration switch for ClearSilver that you forgot to turn on?

comment:15 by anonymous, 18 years ago

It works… at least with tracd (I didn't find the time yet to test with FastCGI and mod_python, I'll do it tomorrow).

You were right : the problem was that ClearSilver was not built threadsafe. There is no option but ClearSilver configure is very Linux minded (and may be Windows) and had many problem in our Tru64 environment and with the way we installed software (generally not in standard location). I had to fight to build successfully CS 0.9.14 but now it works.

Thank for your help. I'll add an update tomorrow to let you know the status with mod_python and FastCGI.

comment:16 by anonymous, 18 years ago

I reconfigured my Trac installation with both FastCGI and mod_python and it works very well… Thanks again.

I suggest a better documentation of the requirement for ClearSilver to be thread safe. The problem is that this is very easy to miss the fact that ClearSilver has not been able to configure support for threads. There is no warning or error returned by configure. Just 'Checking pthread.h… no'. Even worst, Makefile for CS are far from perfect and can continue to build CS even in case of error during compilation. We should really insist on the need to examine carefully the logs produced by configure and make. Building ClearSilver on a non Linux machine or with a compiler other than gcc is very painful. I'll submit a few mods to building scripts to ClearSilver developper to help in these situations.

I still have a difficulty with FastCGI if I want no to have any a directory listing as your project home page. I'll try to troubleshoot it and open a new ticket if needed. For me, this ticket can be closed. I don't know for others who expressed interest in this issue.

comment:17 by Christian Boos, 18 years ago

Resolution: fixed
Status: newclosed

I believe that the incomplete headers (0 bytes) received from server errors are due to segfaults, themselves due to some libraries not being built thread-safe (be it SQLite or neo).

If the problem persist for someone, despite having everything built in thread-safe mode, please reopen the ticket and provide detailed information.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Jonas Borgström.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Jonas Borgström to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.