#11805 closed defect (fixed)
RAW data download broken for larger files
Reported by: | Dirk Stöcker | Owned by: | Jun Omae |
---|---|---|---|
Priority: | high | Milestone: | 1.0.3 |
Component: | version control/browser | Version: | 1.0.2 |
Severity: | major | Keywords: | svn17 svn18 |
Cc: | felix.schwarz@… | Branch: | |
Release Notes: | Fix segmentation fault while downloading content of Subversion node. | | |
API Changes: | | | |
Internal Changes: | | | |
Description
The changes in r11818 seem to have broken the raw data download (maybe txt as well, I don't know). See also the original report at http://josm.openstreetmap.de/ticket/10692
For larger files the data transfer stops in the middle. This happens for remote as well as local requests (which are FAST!). It does not always stop at the same size (but depends on the actual file size). A TCP dump indicates that Apache probably simply does not get the data to deliver.
P.S. Why is there no Content-Length for raw data delivery? Content-Length is easy to find in these cases.
For an example and test cases, see the original bug report.
Attachments (0)
Change History (27)
comment:1 by , 10 years ago
Cc: | added |
---|
comment:2 by , 10 years ago
Component: | general → version control/browser |
---|---|
Milestone: | → 1.0.3 |
Owner: | set to |
Status: | new → assigned |
comment:3 by , 10 years ago
Recent Ubuntu 12.04 LTS on a Hetzner VQ19 VServer using Apache 2.4 with WSGI interface.
comment:4 by , 10 years ago
According to http://packages.ubuntu.com/trusty/apache2.2-bin, are you using Ubuntu LTS 14.04? Is the version of mod-wsgi 3.4?
$ wget -S -O /dev/null http://josm.openstreetmap.de/robots.txt 2>&1 | grep -i Server
Server: Apache/2.4.7 (Ubuntu)
Also, could you please provide the Apache configuration? I want to know the details of the configuration.
comment:5 by , 10 years ago
I can't access the server ATM, but we did an update to the recent LTS a month ago, so it is the most recent LTS which was available a month ago :-) Apache config: What do you want to know? I won't make it publicly available completely.
comment:6 by , 10 years ago
OK. I want to know the version of mod-wsgi and the SetOutputFilter settings in particular.
In #717, I've tested with the following:
- Apache 2.2.15 with mod_wsgi 3.2
- Apache 2.2.15 with mod_fcgid 2.3.7
- Lighttpd 1.4.31 with mod_fastcgi
- Nginx 1.0.15 with fastcgi
comment:7 by , 10 years ago
mod-wsgi: 3.4-4ubuntu2.1.14.04.1
Filter settings:
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/x-mapcss
The issue also happens for our ".lang" files, which are plain binary, so I doubt the filter is relevant.
comment:8 by , 10 years ago
Hmmm, it cannot be reproduced with Ubuntu 14.04. It works well in my environment.
$ apt-show-versions apache2-mpm-worker libapache2-mod-wsgi
apache2-mpm-worker:amd64/trusty-security 2.4.7-1ubuntu4.1 uptodate
libapache2-mod-wsgi:amd64/trusty-security 3.4-4ubuntu2.1.14.04.1 uptodate
/etc/apache2/mods-enabled/deflate.conf
<IfModule mod_deflate.c>
    <IfModule mod_filter.c>
        # these are known to be safe with MSIE 6
        AddOutputFilterByType DEFLATE text/html text/plain text/xml
        # everything else may cause problems with MSIE 6
        AddOutputFilterByType DEFLATE text/css
        AddOutputFilterByType DEFLATE application/x-javascript application/javascript application/ecmascript
        AddOutputFilterByType DEFLATE application/rss+xml
        AddOutputFilterByType DEFLATE application/xml
    </IfModule>
</IfModule>
/etc/apache2/sites-available/trac-1.0.conf (daemon mode)
<IfModule mod_wsgi.c>
    WSGIDaemonProcess trac-1.0 \
        python-path=/var/local/venv/trac-1.0/lib/python2.7/site-packages \
        processes=2 threads=4 maximum-requests=128 inactivity-timeout=600 \
        display-name=%{GROUP}
    WSGIScriptAlias /trac /var/local/venv/trac-1.0/wsgi/trac.wsgi \
        process-group=trac-1.0 application-group=%{GLOBAL}
    <Directory /var/local/venv/trac-1.0/wsgi/>
        SetEnv trac.env_parent_dir /var/local/trac
        Require all granted
    </Directory>
</IfModule>
comment:9 by , 10 years ago
Well, the JOSM page has much more load. Something like 100.000-200.000 accesses a day. I wouldn't assume that a test system shows the same trouble for a problem which seems to be file and timing dependent. :-)
Maybe it breaks when another process on the same core does something or …
In general it seems that 1.0.2 is not as stable as 1.0.1 and sometimes connections break. But I can't put my finger on these issues yet.
I can't give you access to the live server, but I could add debugging output if that helps.
comment:10 by , 10 years ago
Thanks for your help. After increasing threads of WSGIDaemonProcess to 25, I tested with ab -c 25 -n 250 'http://localhost/trac/t11805/browser/trunk/trac/locale/ja/LC_MESSAGES/messages.po?format=txt' against a clone of the Trac repository.
Concurrency Level: 25
Time taken for tests: 9.711 seconds
Complete requests: 250
Failed requests: 237
   (Connect: 0, Receive: 0, Length: 237, Exceptions: 0)
...
I get the following in /var/log/apache2/error.log.
[Wed Nov 05 13:16:54.703657 2014] [mpm_worker:notice] [pid 10611:tid 139721355519872] AH00292: Apache/2.4.7 (Ubuntu) SVN/1.8.8 mod_wsgi/3.4 Python/2.7.6 configured -- resuming normal operations
[Wed Nov 05 13:16:54.703772 2014] [core:notice] [pid 10611:tid 139721355519872] AH00094: Command line: '/usr/sbin/apache2'
...
[Wed Nov 05 13:24:15.537651 2014] [core:error] [pid 10616:tid 139721205790464] [client ::1:32819] End of script output before headers: trac.wsgi
[Wed Nov 05 13:24:15.537740 2014] [core:error] [pid 10617:tid 139721057466112] [client ::1:32820] End of script output before headers: trac.wsgi
[Wed Nov 05 13:24:15.537816 2014] [core:error] [pid 10616:tid 139721015502592] [client ::1:32821] End of script output before headers: trac.wsgi
[Wed Nov 05 13:24:15.537893 2014] [core:error] [pid 10617:tid 139721065858816] [client ::1:32822] End of script output before headers: trac.wsgi
[Wed Nov 05 13:24:16.225174 2014] [core:notice] [pid 10611:tid 139721355519872] AH00051: child pid 12308 exit signal Segmentation fault (11), possible coredump in /etc/apache2
Could you please check whether the same error is logged in your error.log?
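For anyone without ab at hand, the length check from this comment can be approximated with a short Python 3 script. This is only a sketch under assumptions: fetch_length and length_histogram are names made up for illustration, the defaults mirror the ab options used in this comment, and any download error is simply counted as size -1.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_length(url):
    """Download url and return the number of body bytes received."""
    try:
        with urlopen(url, timeout=60) as resp:
            return len(resp.read())
    except Exception:
        # Resets, timeouts and incomplete reads are all counted as -1.
        return -1


def length_histogram(url, requests=250, concurrency=25, fetch=fetch_length):
    """Issue `requests` downloads using `concurrency` worker threads.

    Returns a Counter mapping each observed body size to how often it
    occurred. A healthy server yields exactly one size.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        sizes = list(pool.map(fetch, [url] * requests))
    return Counter(sizes)
```

Calling length_histogram() with the ?format=txt URL and printing the result should show a single size if all downloads are complete; several different sizes, or -1 entries, would correspond to the "Length" failures that ab reports.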
comment:11 by , 10 years ago
Created patch in [27e138ce/jomae.git] (jomae.git@t11805) to avoid SEGVs.
comment:12 by , 10 years ago
I get:
[Wed Nov 05 09:44:06.611013 2014] [core:notice] [pid 12330:tid 140620797282176] AH00052: child pid 23032 exit signal Segmentation fault (11)
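Segfault notices like the ones quoted in this ticket can be picked out of a large error.log with a few lines of Python. The following is a minimal sketch, assuming the Apache 2.4 log line layout shown in the comments here; SEGFAULT_RE and find_segfaults are hypothetical names, not part of Trac or Apache.

```python
import re

# Matches parent-process notices such as:
# "[...] AH00052: child pid 23032 exit signal Segmentation fault (11)"
SEGFAULT_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\].*child pid (?P<pid>\d+) exit signal "
    r"Segmentation fault \(11\)"
)


def find_segfaults(lines):
    """Return (timestamp, child pid) pairs for every segfault notice."""
    hits = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            hits.append((match.group("time"), int(match.group("pid"))))
    return hits
```

Passing an open file object for /var/log/apache2/error.log as lines works, since the function only iterates over it line by line.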
comment:13 by , 10 years ago
I can confirm that your fix works. Thanks a lot - I know this was no easy bug but your fix was really fast!
This bug should be reported upstream (to mod_wsgi or apache, whoever is responsible for it).
comment:14 by , 10 years ago
BTW: This also explains my observation of a "general stability" issue, because the segfault shuts down the whole currently active instance and thus all active connections.
follow-up: 18 comment:15 by , 10 years ago
It seems as if this could be mod_wsgi bug #42 though that needs to be confirmed (didn't look too much into this issue).
follow-up: 17 comment:16 by , 10 years ago
I got the following backtrace. It seems to be a mod_wsgi issue.
(gdb) bt
#0 0x00007f915e996c6d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f915eeaec1e in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=0x7fffe3d04200) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2 apr_poll (aprset=0x7fffe3d042c0, num=1, nsds=0x7fffe3d042ac, timeout=<optimized out>) at /build/buildd/apr-1.5.0/poll/unix/poll.c:120
#3 0x00007f9159be8647 in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#4 0x00007f9159be9ba5 in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#5 0x00007f915f79e2a9 in ap_run_post_config (pconf=0x7f915f735028, plog=0x7f915f709028, ptemp=0x7f915f70b028, s=0x7f915f70dde0) at config.c:103
#6 0x00007f915f77ee07 in main (argc=3, argv=0x7fffe3d047e8) at main.c:765
comment:17 by , 10 years ago
I got the following backtrace. It seems to be a mod_wsgi issue.
I guess it is caused by wrong usage of an APR pool, judging by the backtrace of another thread….
Thread 28 (Thread 0x7f9155d04700 (LWP 13480)):
#0 0x00007f915e8dfea7 in kill () at ../sysdeps/unix/syscall-template.S:81
#1 <signal handler called>
#2 0x00007f915cb8fa18 in svn_cache__set (cache=0x7f914801d3c0, key=0x7f9141e67138, value=0x7f9141e0d028, scratch_pool=0x7f9141e5b028) at /build/buildd/subversion-1.8.8/subversion/libsvn_subr/cache.c:105
#3 0x00007f915c061185 in rep_read_contents (baton=0x7f9141e670d0, buf=0x7f913c1aa270 "/templates/wiki_view.html:73\n#, python-format\nmsgid \"Last modified on %(date)s\"\nmsgstr \"最終更新 %(date)s\"\n\n#: trac/wiki/templates/wiki_view.html:77\n#, python-format\nmsgid \"The page %(name)s does "..., len=0x7f9155d031f8) at /build/buildd/subversion-1.8.8/subversion/libsvn_fs_fs/fs_fs.c:5256
#4 0x00007f914331d4e1 in ?? () from /usr/lib/python2.7/dist-packages/libsvn/_core.so
#5 0x00007f91597c6af7 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#6 0x00007f91597c917d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#7 0x00007f91597c6dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#8 0x00007f91597c917d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#9 0x00007f91597c6dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#10 0x00007f91597c917d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#11 0x00007f91597c6dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#12 0x00007f91597c917d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#13 0x00007f91597c6dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#14 0x00007f91597c8883 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#15 0x00007f91596b5a2b in PyIter_Next () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#16 0x00007f9159be56b4 in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#17 0x00007f9159bebb08 in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#18 0x00007f915ec77182 in start_thread (arg=0x7f9155d04700) at pthread_create.c:312
#19 0x00007f915e9a3fbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
comment:18 by , 10 years ago
Replying to fschwarz:
It seems as if this could be mod_wsgi bug #42 though that needs to be confirmed (didn't look too much into this issue).
This looks more like a followup issue, but may be related. I think a new bug report is necessary. The debug information here should help them to find the issue.
follow-up: 22 comment:19 by , 10 years ago
Release Notes: | modified (diff) |
---|
Committed in [13243] and merged to trunk in [13244].
Also, it is rarely reproduced on tracd when downloading the content immediately after tracd is launched. After [13243], it works well on tracd.
#0 0x00007f5f392e117c in rep_read_contents (baton=0x7f5f380690d8, buf=0x7f5f3417b2e0 "ified] %(reldate)s\"\nmsgstr \"\"\n\n#: trac/wiki/templates/wiki_view.html:62\n#, python-format\nmsgid \"Last modified on %(date)s\"\nmsgstr \"\"\n\n#: trac/wiki/templates/wiki_view.html:66\n#, python-format\nmsgid \"T"..., len=0x7f5f3b8ee308) at /build/buildd/subversion-1.8.8/subversion/libsvn_fs_fs/fs_fs.c:5256
#1 0x00007f5f331654e1 in ?? () from /usr/lib/python2.7/dist-packages/libsvn/_core.so
#2 0x000000000052f936 in PyEval_EvalFrameEx ()
#3 0x000000000055c594 in PyEval_EvalCodeEx ()
#4 0x000000000052ca8d in PyEval_EvalFrameEx ()
#5 0x000000000055c594 in PyEval_EvalCodeEx ()
#6 0x000000000052ca8d in PyEval_EvalFrameEx ()
#7 0x000000000055c594 in PyEval_EvalCodeEx ()
#8 0x000000000052ca8d in PyEval_EvalFrameEx ()
#9 0x000000000055c594 in PyEval_EvalCodeEx ()
#10 0x000000000052ca8d in PyEval_EvalFrameEx ()
#11 0x000000000056cc54 in ?? ()
#12 0x000000000052c94e in PyEval_EvalFrameEx ()
#13 0x000000000052cf32 in PyEval_EvalFrameEx ()
#14 0x000000000052cf32 in PyEval_EvalFrameEx ()
#15 0x000000000052cf32 in PyEval_EvalFrameEx ()
#16 0x000000000056d0aa in ?? ()
#17 0x00000000004d9854 in ?? ()
#18 0x00000000004da20b in PyEval_CallObjectWithKeywords ()
#19 0x0000000000497c7d in PyInstance_New ()
#20 0x000000000052cc20 in PyEval_EvalFrameEx ()
#21 0x000000000052cf32 in PyEval_EvalFrameEx ()
#22 0x000000000056d0aa in ?? ()
#23 0x000000000052e1e6 in PyEval_EvalFrameEx ()
#24 0x000000000052cf32 in PyEval_EvalFrameEx ()
#25 0x000000000052cf32 in PyEval_EvalFrameEx ()
#26 0x000000000056d0aa in ?? ()
#27 0x00000000004d9854 in ?? ()
#28 0x00000000004da20b in PyEval_CallObjectWithKeywords ()
#29 0x00000000005872b2 in ?? ()
#30 0x00007f5f3e87f182 in start_thread (arg=0x7f5f3b8f0700) at pthread_create.c:312
#31 0x00007f5f3e5abfbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
comment:20 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:21 by , 10 years ago
Upstream report: https://github.com/GrahamDumpleton/mod_wsgi/issues/55
follow-up: 23 comment:22 by , 10 years ago
Replying to jomae:
Also, rarely reproduced on tracd with downloading the content immediately after tracd is launched. After [13243], works well on tracd.
If you can see the issue with tracd this could hint at an issue within (the Python bindings for) subversion not mod_wsgi.
dstoecker: JOSM uses subversion as well, right?
Did anyone experience the issue when using no source control or something else than subversion?
comment:23 by , 10 years ago
Replying to fschwarz:
If you can see the issue with tracd this could hint at an issue within (the Python bindings for) subversion not mod_wsgi.
After making the report I've also seen this. For complex software it's sometimes hard to find right upstream :-)
dstoecker: JOSM uses subversion as well, right?
Yes.
comment:24 by , 10 years ago
Can we use this ticket to gather more information about the crash so the problem can be submitted to the subversion developers? For example I think we should check if the problem happens on different versions of Python/subversion or if there are any specifics (e.g. regression introduced in svn …/already fixed in latest svn). I could help testing Fedora rawhide which is usually very, very close to the latest release versions. Also CentOS 5/6 might help to find out if this worked at some point.
(If you don't want to do that evaluation here please tell me so I can shut up - but I think it is important to do the checks before contacting the subversion guys.)
follow-up: 26 comment:25 by , 10 years ago
Next try: svn-issue:4526
Additional information is always welcome, but I think the SVN guys probably know better than we do what they need. Spending too much time searching for information in advance takes away our power for Trac bugs.
comment:27 by , 10 years ago
Keywords: | svn17 svn18 added |
---|
This issue can be reproduced with Subversion 1.7.x and 1.8.x but not with 1.6.x.
I'll investigate it. A similar issue occurred in [11797] through [11818] on 1.0.2dev, e.g. comment:46:ticket:717. I believed it was fixed in [11818]….
Could you please provide system information for that site and the Apache configuration?
Because Trac cannot determine the Content-Length before retrieving the entire substituted content, it sends the content with chunked encoding. Subversion keywords and eol-style substitution were introduced in #717.
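The reason for the missing Content-Length can be illustrated with a tiny WSGI application. This is a hedged sketch, not Trac's actual code: substitute() is a made-up stand-in for keyword and eol-style substitution, and raw_chunks is hypothetical repository content. The point is only that the output size differs from the stored size and is unknown until the whole stream has been processed.

```python
def substitute(chunks):
    """Stand-in for keyword/eol-style substitution.

    Converting LF to CRLF changes the byte count, so the final size
    cannot be known before every chunk has been processed.
    """
    for chunk in chunks:
        yield chunk.replace(b"\n", b"\r\n")


def app(environ, start_response):
    """Stream substituted content without announcing its length."""
    raw_chunks = [b"line 1\n", b"line 2\n"]  # hypothetical node content
    # Deliberately no Content-Length header: the size is not known yet.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return substitute(raw_chunks)
```

When a WSGI application returns an iterable without a Content-Length header, it is left to the server to frame the response; for HTTP/1.1 clients that typically means chunked transfer encoding, which matches the behaviour described in this comment.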