#7611 closed defect (invalid)
Trac under mod_wsgi seems to leak file handles
| Reported by: | | Owned by: | Christian Boos |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | web frontend | Version: | 0.11.1 |
| Severity: | normal | Keywords: | mod_wsgi needinfo |
| Cc: | | Branch: | |
| Release Notes: | | | |
| API Changes: | | | |
| Internal Changes: | | | |
Description
After running a Trac 0.11.1 installation (dev.haiku-os.org) with mod_wsgi under Apache 2 for several days, errors about too many open files start to appear. As a result, some templates cannot be opened and no new files can be attached. Apparently file handles are being leaked.
The OS is Solaris 9, Apache is 2.2.6 and mod_wsgi is 2.1.
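For readers less familiar with this error: "too many open files" is the EMFILE error a process receives once it holds as many descriptors as its RLIMIT_NOFILE soft limit allows. The following Unix-only Python sketch (not taken from Trac or mod_wsgi) deliberately leaks descriptors under a temporarily lowered limit to show how the failure surfaces; the limit of 64 and the function name are arbitrary choices for the demonstration.

```python
import errno
import os
import resource


def leak_until_emfile(limit=64):
    """Open os.devnull repeatedly without closing it until the process runs
    out of descriptors; return (number opened, errno of the failing open)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Lower only the soft limit; it must never exceed the hard limit.
    new_soft = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    leaked = []
    try:
        while True:
            try:
                leaked.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                return len(leaked), exc.errno
    finally:
        # Close the deliberately leaked descriptors, then restore the limit.
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


count, err = leak_until_emfile()
print(count, errno.errorcode[err])
```

The number of successful opens depends on how many descriptors the process already had; a long-running server that never closes some handle reaches the same error, only more slowly.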
Attachments (2)
Change History (16)
comment:1 by , 16 years ago
Summary: | mod_wsgi → mod_wsgi seems to leak file handles |
---|
follow-up: 3 comment:2 by , 16 years ago
comment:3 by , 16 years ago
Replying to anonymous:
Upgrade to mod_wsgi 2.3. See:
http://code.google.com/p/modwsgi/issues/detail?id=95
http://code.google.com/p/modwsgi/wiki/ChangesInVersion0203
http://code.google.com/p/modwsgi/wiki/ChangesInVersion0202
That issue has nothing to do with this problem. The issue above clearly states that file handles are leaked when doing a graceful restart or shutdown.
This is not the case.
follow-up: 5 comment:4 by , 16 years ago
The title of the referenced issue is "Daemon process listener sockets leaked in parent process on 'graceful' restart." and specifically mentions 'graceful restart'. Leaking descriptors on shutdown doesn't make any sense since the processes are all killed off and thus any retained descriptors would be forcibly closed off at that point.
In UNIX, sockets and files are both manipulated via a file descriptor and so in many respects appear the same. Also, the particular listener socket which was leaking was a UNIX socket, not an INET socket. As such, the socket has an actual file present in the file system as well. In other words, on UNIX socket handles are file handles.
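To illustrate the point that on UNIX a socket is just another descriptor, and that a UNIX-domain socket also leaves an entry in the file system, here is a small Python sketch. The socket path inside a temporary directory is a hypothetical choice for the example, not the path mod_wsgi actually uses.

```python
import os
import socket
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "listener.sock")  # hypothetical socket path
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)

    fd = sock.fileno()  # an ordinary integer descriptor, just like an open file
    fd_is_socket = stat.S_ISSOCK(os.fstat(fd).st_mode)
    path_is_socket = stat.S_ISSOCK(os.stat(path).st_mode)  # a real file-system entry
    print(fd, fd_is_socket, path_is_socket)

    sock.close()  # closing the descriptor does not unlink the socket file
    still_exists = os.path.exists(path)
```

So a leaked listener socket shows up both as an extra descriptor in the process and as a file on disk.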
If you are adamant that it is not the same issue and still want to point the finger at mod_wsgi as the culprit, then maybe you should go over to:
and post about it there or log a ticket on the mod_wsgi issue tracker at:
rather than here on the Trac site, especially since you seem to suggest it isn't even a Trac problem.
In logging an issue on the mod_wsgi issue tracker, however, you are going to have to provide some decent information about what is being leaked, in the same way the bug report you were pointed at does; otherwise it can only be assumed, again, that it is the same problem.
So, you need to show why it isn't the same problem.
comment:5 by , 16 years ago
Summary: | mod_wsgi seems to leak file handles → Trac under mod_wsgi seems to leak file handles |
---|
Replying to Graham.Dumpleton@…:
The title of the referenced issue is "Daemon process listener sockets leaked in parent process on 'graceful' restart." and specifically mentions 'graceful restart'. Leaking descriptors on shutdown doesn't make any sense since the processes are all killed off and thus any retained descriptors would be forcibly closed off at that point.
I was about to admit that I misread the mentioned bug report (something I normally refuse to do when someone responds so belittlingly), but fortunately I don't have to. I did not perform a graceful restart. The problem occurs after a period of runtime. In other words, the two issues are logically unrelated; read more below.
If you are adamant that it is not the same issue and still want to point the finger at mod_wsgi as the culprit, then maybe you should go over to:
and post about it there or log a ticket on the mod_wsgi issue tracker at:
rather than here on the Trac site, especially since you seem to suggest it isn't even a Trac problem.
I never suggested it isn't a Trac problem. If I were under the impression it was a mod_wsgi issue, I would have reported it on the mod_wsgi issue tracker. The summary could be misleading, so I changed it; nevertheless, there is more to a bug report than its summary.
In logging an issue on the mod_wsgi issue tracker, however, you are going to have to provide some decent information about what is being leaked, in the same way the bug report you were pointed at does; otherwise it can only be assumed, again, that it is the same problem.
So, you need to show why it isn't the same problem.
Well, the simplest reason, as I mentioned in my initial reply, is that I did not do a graceful restart.
The second reason is that, thanks to your pointer to the bug report, I found out how to track open files, and even after 161 hours of runtime the main Apache process (and its children) had at most 2 socket files open. What is also interesting for this report is that only 62 file handles were open. You would have expected more to be open (especially since the issue recurs every two weeks).
There are two possible explanations: either the problem is sudden rather than incremental, or the traffic is low.
In order to make this bug report more informative, I have done two things. First of all, I have installed mod_wsgi 2.3 to rule out the graceful restart issue. I will close this report if it does not happen again in the next three weeks. Secondly, I will monitor the server to see whether it is an incremental file leakage, or whether it happens suddenly, and I will attach a list of open files when it happens again.
Nonetheless, I do not consider this issue to be 'fixed' right now.
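A minimal sketch of the monitoring planned above, assuming the system exposes per-process descriptors under /proc/<pid>/fd (Linux and Solaris both do) and that the script may read it for the Apache processes (typically as root or the Apache user). It samples the descriptor count at intervals so that steady growth can be told apart from a sudden jump; the function names and the heuristic are hypothetical, and pfiles or lsof give far more detail per descriptor.

```python
import os
import time


def open_fd_count(pid):
    """Number of descriptors a process currently holds, read from /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_counts(pid, samples=5, interval=1.0):
    """Collect (timestamp, count) samples so growth over time can be inspected."""
    history = []
    for i in range(samples):
        history.append((time.time(), open_fd_count(pid)))
        if i < samples - 1:
            time.sleep(interval)
    return history


def looks_incremental(history):
    """Rough heuristic: counts never decrease and end higher than they started."""
    counts = [c for _, c in history]
    return all(b >= a for a, b in zip(counts, counts[1:])) and counts[-1] > counts[0]
```

For example, sampling the main Apache process once a minute for an hour would be sample_fd_counts(apache_pid, samples=60, interval=60), where apache_pid is taken from Apache's pid file.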
comment:6 by , 16 years ago
Sorry, I also did not read what you said properly; I thought you were referring to your own issue when talking about restarts.
Lack of sleep from a baby that doesn't want to sleep properly and frustration with dealing with a few too many people this week who don't want to read the documentation must be getting the better of me and causing me to be a bit short. Thanks for bringing me back to reality. :-(
What I might suggest is that, when you feel you have a good handle on what might be happening but are still not sure of the cause, you post a description to the Trac user group on Google; I am sure you will find people there interested in helping debug it. Things I would be looking out for are bursts of traffic from spam bots in the Apache access logs around the time any problems seem to manifest. In the past I have seen this sort of traffic, especially spam bots attempting to post data to arbitrary URLs, cause unexpected problems for various applications.
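One way to follow the suggestion about traffic bursts is to count POST requests per minute in the Apache access log and flag unusually busy minutes. This Python sketch assumes the log uses the Common or Combined Log Format; the threshold of 20 and the function name are arbitrary example choices.

```python
import re
from collections import Counter

# Captures the timestamp (to the minute) and the request method of a
# Common/Combined Log Format line such as:
# 1.2.3.4 - - [10/Oct/2008:13:55:36 +0200] "POST /wiki/Start HTTP/1.1" 200 2326
LOG_LINE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]*\] "(\w+) ')


def post_bursts(lines, threshold=20):
    """Return {minute: POST count} for minutes with at least `threshold` POSTs."""
    per_minute = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group(2) == "POST":
            per_minute[m.group(1)] += 1
    return {minute: n for minute, n in per_minute.items() if n >= threshold}
```

Usage would be along the lines of opening the access log and passing the file object to post_bursts, then comparing the reported minutes with the times the server started failing.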
by , 16 years ago
Attachment: | pfiles_trac_log added |
---|
by , 16 years ago
Attachment: | pfiles_trac_2_log added |
---|
comment:7 by , 16 years ago
I added logs of two different freeze moments. I will start debugging this thing at a later stage.
follow-up: 9 comment:8 by , 16 years ago
Keywords: | needinfo added |
---|
The file attachment:pfiles_trac_log shows two interesting things:
- A pile of S_IFIFO descriptors (named pipes). I have no idea why there should be so many.
- The remains of Trac trying to create an attachment named if_re_port.diff on ticket 2767, and failing miserably, possibly because the user under which the web server runs isn't allowed to write to /var/trac/dev.haiku-os.org/attachments/ticket/2767/.
This is related to #3722. You may want to check your filesystem permissions below the attachments folder of your environment.
The attachment:pfiles_trac_2_log only shows the S_IFIFO descriptors, so this is the most likely cause. Your next step should probably be to find out which processes are at both ends of these FIFOs. Let us know what you find out.
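A rough Python sketch of that next step, assuming /proc/<pid>/fd is readable for the processes of interest: stat every descriptor, keep the FIFOs, and group them by device and inode, so descriptors referring to the same pipe end up together with the PIDs holding them. It relies on os.stat following each /proc descriptor entry to the underlying pipe, which holds on Linux; whether Solaris behaves the same is an assumption, and pfiles or lsof remain the more reliable tools there. The function names are hypothetical.

```python
import os
import stat
from collections import defaultdict


def fifo_holders(pids):
    """Map (st_dev, st_ino) of each FIFO/pipe to the (pid, fd) pairs holding it."""
    holders = defaultdict(list)
    for pid in pids:
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or permission denied
            continue
        for name in fds:
            try:
                st = os.stat(os.path.join(fd_dir, name))  # follows to the pipe itself
            except OSError:  # descriptor was closed in the meantime
                continue
            if stat.S_ISFIFO(st.st_mode):
                holders[(st.st_dev, st.st_ino)].append((pid, int(name)))
    return dict(holders)


def all_pids():
    """Numeric entries under /proc are process IDs."""
    return [int(p) for p in os.listdir("/proc") if p.isdigit()]
```

Calling fifo_holders(all_pids()) and looking at the groups that involve the Apache parent or its children shows which process sits at the other end of each pipe, or that nobody does.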
follow-up: 10 comment:9 by , 16 years ago
Replying to rblank:
The file attachment:pfiles_trac_log shows two interesting things:
- A pile of S_IFIFO descriptors (named pipes). I have no idea why there should be so many.
- The remains of Trac trying to create an attachment named if_re_port.diff on ticket 2767, and failing miserably, possibly because the user under which the web server runs isn't allowed to write to /var/trac/dev.haiku-os.org/attachments/ticket/2767/.
This is related to #3722. You may want to check your filesystem permissions below the attachments folder of your environment.
The strange thing is that, in the process of saving the file, Trac actually creates 101 empty (0 KB) files. The problem seems to happen while attaching a file: for some reason the first attempt fails, and Trac then tries again under a new file name.
After Apache was restarted, the person who wanted to attach the file was able to attach that specific file, so the file itself is not the problem.
The attachment:pfiles_trac_2_log only shows the S_IFIFO descriptors, so this is the most likely cause. Your next step should probably be to find out which processes are at both ends of these FIFOs. Let us know what you find out.
Any hints on how to do that?
Basically, it's now a matter of waiting for the next time Apache goes wild in order to trace the problem.
follow-up: 12 comment:10 by , 16 years ago
Replying to Niels <niels.reedijk@…>:
The strange thing is that, in the process of saving the file, Trac actually creates 101 empty (0 KB) files. The problem seems to happen while attaching a file: for some reason the first attempt fails, and Trac then tries again under a new file name.
This is due to the algorithm for creating unique file names for attachments with the same name. The attachments module tries to create the file, and if an exception is thrown on creation, it tries all numbered variants up to 100. The problem is, it does that on any exception, not only on "file exists". This should really be fixed.
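The fix suggested here, falling back to a numbered name only when creation fails because the file already exists, could look like the following Python sketch. It is not Trac's actual attachment code: the function name, the name.N.ext numbering scheme and the limit of 100 variants are assumptions modelled on the description above. Opening with O_CREAT | O_EXCL makes the existence check and the creation a single atomic step, and any other error (EMFILE, EACCES, a missing directory) is raised immediately instead of moving on to the next numbered variant.

```python
import errno
import os


def create_unique_file(directory, filename, max_tries=100):
    """Create a new file named `filename` in `directory`, falling back to
    numbered variants (name.1.ext, name.2.ext, ...) only if the name is taken.

    Returns (path, binary file object). Errors other than EEXIST propagate.
    """
    base, ext = os.path.splitext(filename)
    for i in range(max_tries + 1):
        name = filename if i == 0 else f"{base}.{i}{ext}"
        path = os.path.join(directory, name)
        try:
            # O_EXCL: fail with EEXIST instead of reusing an existing file.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o666)
        except OSError as exc:
            if exc.errno == errno.EEXIST:
                continue  # name taken: try the next numbered variant
            raise  # e.g. EMFILE or EACCES: other names cannot help
        return path, os.fdopen(fd, "wb")
    raise OSError(errno.EEXIST, "no unique file name found", filename)
```

With this shape, a "too many open files" condition produces one clear error instead of a trail of numbered empty files.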
After Apache was restarted, the person who wanted to attach the file was able to attach that specific file, so the file itself is not the problem.
That is strange indeed. Maybe this is only a side effect of the "too many open files" error. I'm surprised it creates the files in these conditions, but it's possible.
Any hints on how to do that?
You could try to see whether lsof gives you more information. Also check whether there are lots of child processes of the parent Apache process.
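To check for lots of child processes, one option is to ask ps for every process's PID and parent PID and count those whose parent is the main Apache process. This Python sketch uses only the POSIX ps options -e and -o pid= -o ppid=; that Solaris accepts exactly this form is an assumption, and the function name is hypothetical. The Apache parent PID would come from its pid file.

```python
import subprocess


def child_pids(parent_pid):
    """PIDs of processes whose parent is `parent_pid`, parsed from POSIX ps output."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid="],
        capture_output=True, text=True, check=True,
    ).stdout
    children = []
    for line in out.splitlines():
        fields = line.split()
        if len(fields) == 2 and int(fields[1]) == parent_pid:
            children.append(int(fields[0]))
    return children
```

Running it periodically alongside a descriptor count would show whether the number of children grows together with the open FIFOs.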
comment:11 by , 16 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
I'm going to close this ticket (hopefully for ever). Recently the server had a system software update (the latest Sun recommended patches for Solaris 10). Mind you, it has only been running for two days now, but so far there has been no leaking of file handles, so it no longer shows the previous behaviour.
Thanks everyone for providing tips to identify the source of the problem.
comment:12 by , 16 years ago
Milestone: | → 0.11.3 |
---|---|
Owner: | set to |
Re-opening, as a reminder for fixing the following glitch:
Replying to rblank:
This is due to the algorithm for creating unique file names for attachments with the same name. The attachments module tries to create the file, and if an exception is thrown on creation, it tries all numbered variants up to 100. The problem is, it does that on any exception, not only on "file exists". This should really be fixed.
follow-up: 14 comment:13 by , 16 years ago
I was going to try and find a fix as part of #3722, but you're welcome to beat me to it :-)
Upgrade to mod_wsgi 2.3. See:
Issue should probably be closed.