Edgewall Software

Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#7611 closed defect (invalid)

Trac under mod_wsgi seems to leak file handles

Reported by: niels.reedijk@… Owned by: Christian Boos
Priority: normal Milestone:
Component: web frontend Version: 0.11.1
Severity: normal Keywords: mod_wsgi needinfo
Cc: Branch:
Release Notes:
API Changes:

Description

After running a Trac 0.11.1 installation (dev.haiku-os.org) with mod_wsgi under Apache 2 for several days, errors start to pop up about too many open files. As a result, some templates can't be opened and no new files can be attached. Apparently file handles are being leaked.

The OS is Solaris 9. Apache is 2.2.6. Mod_wsgi is 2.1.

Attachments (2)

pfiles_trac_log (42.6 KB) - added by Niels <niels.reedijk@…> 11 years ago.
pfiles_trac_2_log (28.5 KB) - added by Niels <niels.reedijk@…> 11 years ago.


Change History (16)

comment:1 by niels.reedijk@…, 11 years ago

Summary: mod_wsgi → mod_wsgi seems to leak file handles

in reply to:  2 comment:3 by Niels <niels.reedijk@…>, 11 years ago

Replying to anonymous:

Upgrade to mod_wsgi 2.3. See:

http://code.google.com/p/modwsgi/issues/detail?id=95
http://code.google.com/p/modwsgi/wiki/ChangesInVersion0203
http://code.google.com/p/modwsgi/wiki/ChangesInVersion0202

That issue has nothing to do with this problem. The issue above clearly states that file handles are leaked when doing a graceful restart or shutdown.

This is not the case.

comment:4 by Graham.Dumpleton@…, 11 years ago

The title of the referenced issue is "Daemon process listener sockets leaked in parent process on 'graceful' restart." and specifically mentions 'graceful restart'. Leaking descriptors on shutdown doesn't make any sense since the processes are all killed off and thus any retained descriptors would be forcibly closed off at that point.

In UNIX, sockets and files are both manipulated via a file descriptor and so in many respects appear the same. Also, the particular listener socket which was leaking was a UNIX socket, not an INET socket. As such, the socket has an actual file present in the file system as well. In other words, on UNIX socket handles are file handles.
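The point about UNIX sockets being file handles can be illustrated with a minimal sketch. It is not mod_wsgi code; the function name `describe_unix_socket` and the socket file name `demo.sock` are made up for the illustration. It binds a UNIX listener socket and checks that both the descriptor and the path in the file system are reported as sockets:

```python
import os
import socket
import stat
import tempfile


def describe_unix_socket():
    """Bind a UNIX-domain listener socket and report how the OS sees it."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")  # hypothetical socket path
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)   # creates a socket file at `path`
        sock.listen(1)
        fd = sock.fileno()  # the socket is an ordinary file descriptor
        return {
            "fd": fd,
            "fd_is_socket": stat.S_ISSOCK(os.fstat(fd).st_mode),
            "path_is_socket": stat.S_ISSOCK(os.stat(path).st_mode),
        }
    finally:
        sock.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)
```

Each such socket counts against the same per-process descriptor limit as regular files, which is why a leaked listener socket can eventually surface as a "too many open files" error.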

If you are adamant that it is not the same issue and still want to point the finger at mod_wsgi as the culprit, then maybe you should go over to:

http://groups.google.com/group/modwsgi

and post about it there or log a ticket on the mod_wsgi issue tracker at:

http://code.google.com/p/modwsgi/issues/list

rather than here on the Trac site, especially since you seem to suggest it isn't even a Trac problem.

In logging an issue on mod_wsgi issue tracker however, you are going to have to provide some decent information about what is being leaked, in the same way the bug report you were pointed at does, otherwise can only again be assumed it is the same problem.

So, you need to show why it isn't the same problem.

in reply to:  4 comment:5 by Niels <niels.reedijk@…>, 11 years ago

Summary: mod_wsgi seems to leak file handles → Trac under mod_wsgi seems to leak file handles

Replying to Graham.Dumpleton@…:

The title of the referenced issue is "Daemon process listener sockets leaked in parent process on 'graceful' restart." and specifically mentions 'graceful restart'. Leaking descriptors on shutdown doesn't make any sense since the processes are all killed off and thus any retained descriptors would be forcibly closed off at that point.

I was about to admit that I misread the mentioned bug report - something I normally refuse to do when someone responds so belittlingly - but fortunately I don't have to. I did not perform a graceful restart. The problem occurs after a period of runtime. In other words, the two issues are logically unrelated - read more below.

If you are adamant that it is not the same issue and still want to point the finger at mod_wsgi as the culprit, then maybe you should go over to:

http://groups.google.com/group/modwsgi

and post about it there or log a ticket on the mod_wsgi issue tracker at:

http://code.google.com/p/modwsgi/issues/list

rather than here on the Trac site, especially since you seem to suggest it isn't even a Trac problem.

I never suggested it isn't a Trac problem. If I were under the impression it was a mod_wsgi issue, I would have reported it on the mod_wsgi issue tracker. The summary may have been misleading - I changed that - but there is more to a bug report than its summary.

In logging an issue on mod_wsgi issue tracker however, you are going to have to provide some decent information about what is being leaked, in the same way the bug report you were pointed at does, otherwise can only again be assumed it is the same problem.

So, you need to show why it isn't the same problem.

Well, the simplest reason, as I mentioned in my initial reply, is that I did not do a graceful restart.

The second reason is that, thanks to your pointer to the bug report, I have now found out how to track open files. Even after 161 hours of runtime, the main Apache process (and its children) have at most 2 socket files open. Also interesting for this report is that only 62 file handles were open. I would have expected more to be open, especially since the issue recurs every two weeks.

There are two possible explanations: the problem might be sudden rather than incremental, or traffic is low.

In order to make this bug report more informative, I have done two things. First of all, I have installed mod_wsgi 2.3 to rule out the graceful restart issue. I will close this report if it does not happen again in the next three weeks. Secondly, I will monitor the server to see whether it is an incremental file leakage, or whether it happens suddenly, and I will attach a list of open files when it happens again.

Nonetheless, I do not consider this issue to be 'fixed' right now.

comment:6 by Graham.Dumpleton@…, 11 years ago

Sorry, I also did not read what you said properly; I thought you were referring to your own issue when talking about restarts.

Lack of sleep from a baby that doesn't want to sleep properly and frustration with dealing with a few too many people this week who don't want to read the documentation must be getting the better of me and causing me to be a bit short. Thanks for bringing me back to reality. :-(

What I would suggest is that once you feel you have a good handle on what might be happening, but are still not sure of the cause, you post a description to the Trac user group on Google. I am sure you will find people there interested in helping debug it. Things I would look out for are bursts of traffic from spam bots in the Apache access logs around the time any problems manifest. In the past I have seen this cause unexpected behaviour in various applications, especially when the spam bots attempt to post data to arbitrary URLs.

by Niels <niels.reedijk@…>, 11 years ago

Attachment: pfiles_trac_log added

by Niels <niels.reedijk@…>, 11 years ago

Attachment: pfiles_trac_2_log added

comment:7 by Niels <niels.reedijk@…>, 11 years ago

I added logs of two different freeze moments. I will start debugging this thing at a later stage.

comment:8 by Remy Blank, 11 years ago

Keywords: needinfo added

The file attachment:pfiles_trac_log shows two interesting things:

  • A pile of S_IFIFO descriptors (named pipes). I have no idea why there should be so many.
  • The remains of Trac trying to create an attachment named if_re_port.diff on ticket 2767, and failing miserably, possibly because the user under which the web server runs isn't allowed to write to:
    /var/trac/dev.haiku-os.org/attachments/ticket/2767/
    
    This is related to #3722. You may want to check your filesystem permissions below the attachments folder of your environment.

The attachment:pfiles_trac_2_log only shows the S_IFIFO descriptors, so this is the most likely cause. Your next step should probably be to find out which processes are at both ends of these FIFOs. Let us know what you find out.
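To watch for a build-up of FIFO descriptors between freezes, a process can also inspect its own descriptor table. The following is only a sketch: the function name `classify_open_fds` and the `max_fd` bound of 1024 are assumptions, and it would have to run inside the affected process (for example from a temporary debugging hook), unlike pfiles, which inspects another process from the outside.

```python
import os
import stat
from collections import Counter


def classify_open_fds(max_fd=1024):
    """Count this process's open descriptors by file type.

    max_fd is an assumed upper bound on descriptor numbers to probe.
    """
    counts = Counter()
    for fd in range(max_fd):
        try:
            mode = os.fstat(fd).st_mode
        except OSError:
            continue  # this descriptor number is not open
        if stat.S_ISFIFO(mode):
            counts["fifo"] += 1      # pipes and FIFOs (S_IFIFO)
        elif stat.S_ISSOCK(mode):
            counts["socket"] += 1
        elif stat.S_ISREG(mode):
            counts["regular"] += 1
        else:
            counts["other"] += 1     # directories, devices, ...
    return counts
```

Logging the result at regular intervals would show whether the "fifo" count grows steadily or jumps suddenly.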

in reply to:  8 ; comment:9 by Niels <niels.reedijk@…>, 11 years ago

Replying to rblank:

The file attachment:pfiles_trac_log shows two interesting things:

  • A pile of S_IFIFO descriptors (named pipes). I have no idea why there should be so many.
  • The remains of Trac trying to create an attachment named if_re_port.diff on ticket 2767, and failing miserably, possibly because the user under which the web server runs isn't allowed to write to:
    /var/trac/dev.haiku-os.org/attachments/ticket/2767/
    
    This is related to #3722. You may want to check your filesystem permissions below the attachments folder of your environment.

The strange thing is that, in the process of saving the file, Trac actually creates 101 0 KB files. The problem seems to happen while attaching a file: for some reason it fails, and then in a second pass it tries again with a new filename.

After Apache was restarted, the person who wanted to attach the file was able to attach that specific file, so it's not the file itself that is the problem.

The attachment:pfiles_trac_2_log only shows the S_IFIFO descriptors, so this is the most likely cause. Your next step should probably be to find out which processes are at both ends of these FIFOs. Let us know what you find out.

Any hints on how to do that?

Basically, now it's just waiting for the next time Apache goes wild to trace the problem.

in reply to:  9 ; comment:10 by Remy Blank, 11 years ago

Replying to Niels <niels.reedijk@…>:

The strange thing is that, in the process of saving the file, Trac actually creates 101 0 KB files. The problem seems to happen while attaching a file: for some reason it fails, and then in a second pass it tries again with a new filename.

This is due to the algorithm for creating unique file names for attachments with the same name. The attachments module tries to create the file, and if an exception is thrown on creation, it tries all numbered variants up to 100. The problem is, it does that on any exception, not only on "file exists". This should really be fixed.
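A possible shape for the fix, as a sketch only and not the actual Trac code: the function name `create_unique_file`, the `name.N.ext` numbering scheme and the `max_tries` parameter are assumptions made for illustration. The idea is to retry with a numbered variant only when the error is "file exists" and to re-raise everything else immediately:

```python
import errno
import os


def create_unique_file(path, max_tries=100):
    """Create `path` exclusively; on a name clash try numbered variants.

    Only EEXIST triggers a retry. Any other error (permissions, too many
    open files, missing directory) is re-raised at once.
    """
    base, ext = os.path.splitext(path)
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    for idx in range(max_tries + 1):
        candidate = path if idx == 0 else "%s.%d%s" % (base, idx, ext)
        try:
            fd = os.open(candidate, flags, 0o666)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise  # retrying under another name cannot help here
            continue
        return candidate, os.fdopen(fd, "wb")
    raise RuntimeError("Could not create a unique file name for %r" % path)
```

With this behaviour, a permission problem or a "too many open files" condition would fail on the first attempt instead of cycling through every numbered variant.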

After Apache was restarted, the person who wanted to attach the file was able to attach that specific file, so it's not the file itself that is the problem.

That is strange indeed. Maybe this is only a side effect of the "too many open files" error. I'm surprised it creates the files in these conditions, but it's possible.

Any hints on how to do that?

You could try lsof and see if it gives you more information. Also check whether the parent Apache process has an unusually large number of child processes.

comment:11 by anonymous, 11 years ago

Resolution: invalid
Status: new → closed

I'm going to close this ticket (hopefully for good). The server recently had a system software update (the latest Sun recommended patches for Solaris 10). Mind you, it has only been running for two days, but so far no file handles are being leaked, so it no longer shows the previous behaviour.

Thanks everyone for providing tips to identify the source of the problem.

in reply to:  10 comment:12 by Christian Boos, 11 years ago

Milestone: 0.11.3
Owner: set to Christian Boos

Re-opening, as a reminder for fixing the following glitch:

Replying to rblank:

This is due to the algorithm for creating unique file names for attachments with the same name. The attachments module tries to create the file, and if an exception is thrown on creation, it tries all numbered variants up to 100. The problem is, it does that on any exception, not only on "file exists". This should really be fixed.

comment:13 by Remy Blank, 11 years ago

I was going to try and find a fix as part of #3722, but you're welcome to beat me to it :-)

in reply to:  13 comment:14 by Christian Boos, 11 years ago

Milestone: 0.11.3

(was r7810)
