Opened 10 years ago

Closed 10 years ago

Last modified 3 years ago

#11805 closed defect (fixed)

RAW data download broken for larger files

Reported by: Dirk Stöcker
Owned by: Jun Omae
Priority: high
Milestone: 1.0.3
Component: version control/browser
Version: 1.0.2
Severity: major
Keywords: svn17 svn18
Cc: felix.schwarz@…
Branch:
Release Notes:

Fix segmentation fault while downloading content of Subversion node.

API Changes:
Internal Changes:

Description

The changes in r11818 seem to have broken the raw data download (possibly txt as well, I don't know). See also the original report in http://josm.openstreetmap.de/ticket/10692

For larger files the data transfer stops in the middle. This happens for remote as well as local requests (which are FAST!). It does not always stop at the same size (but depends on the actual file size). A TCP dump indicates that Apache probably simply does not receive the data it is supposed to deliver.

P.S. Why is there no Content-Length for raw data delivery? Content-Length is easy to find in these cases.

For an example and test cases, see the original bug report.

Attachments (0)

Change History (27)

comment:1 by Felix Schwarz, 10 years ago

Cc: felix.schwarz@… added

in reply to:  description comment:2 by Jun Omae, 10 years ago

Component: general → version control/browser
Milestone: → 1.0.3
Owner: set to Jun Omae
Status: new → assigned

I'll investigate. A similar issue occurred in [11797] through [11818] on 1.0.2dev, e.g. comment:46:ticket:717. I believed it was fixed in [11818]….

Could you please provide the system information for that site and its Apache configuration?

P.S. Why is there no Content-Length for raw data delivery? Content-Length is easy to find in these cases.

Because Trac cannot determine the Content-Length before retrieving the entire substituted content, it sends the content with chunked encoding. Subversion keyword and eol-style substitution was introduced in #717.
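
As a rough illustration of why the length is not known up front: keyword expansion changes the byte count, so the final size is only known once the whole stream has been translated. The sketch below is not Trac's code. `expand_keywords`, the `$Rev$`-only pattern and the tiny WSGI `app` are simplified, hypothetical stand-ins, and the translation works line by line so that a keyword is never split across chunks.

```python
import re

# Hypothetical, simplified stand-in for Subversion keyword expansion:
# only $Rev$ is handled here.
_KEYWORD_RE = re.compile(rb"\$Rev(?::[^$\n]*)?\$")


def expand_keywords(lines, rev):
    """Yield each line with $Rev$ expanded; the output size differs from the input."""
    replacement = b"$Rev: " + str(rev).encode("ascii") + b" $"
    for line in lines:
        yield _KEYWORD_RE.sub(replacement, line)


def app(environ, start_response):
    """Minimal WSGI app that streams translated content without Content-Length."""
    source = [b"# $Rev$\n", b"print('hello')\n"]
    # No Content-Length header is sent: the server has to delimit the body
    # itself, e.g. with chunked transfer encoding on HTTP/1.1.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return expand_keywords(source, 11818)
```

Buffering the whole translated body first would allow a Content-Length header, at the cost of holding large files in memory.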

Last edited 10 years ago by Jun Omae (previous) (diff)

comment:3 by Dirk Stöcker, 10 years ago

Recent Ubuntu 12.04 LTS on a Hetzner VQ19 VServer using Apache 2.4 with WSGI interface.

comment:4 by Jun Omae, 10 years ago

According to http://packages.ubuntu.com/trusty/apache2.2-bin, are you using Ubuntu 14.04 LTS? Is the mod-wsgi version 3.4?

$ wget -S -O /dev/null http://josm.openstreetmap.de/robots.txt 2>&1 | grep -i Server
  Server: Apache/2.4.7 (Ubuntu)

Also, could you please provide the Apache configuration? I would like to know the details of the configuration.

comment:5 by Dirk Stöcker, 10 years ago

I can't access the server at the moment, but we updated to the latest LTS a month ago, so it is the most recent LTS that was available then :-) Apache config: what do you want to know? I won't make it publicly available in full.

comment:6 by Jun Omae, 10 years ago

OK. In particular, I want to know the version of mod-wsgi and the SetOutputFilter settings.

In #717, I tested with the following:

  • Apache 2.2.15 with mod_wsgi 3.2
  • Apache 2.2.15 with mod_fcgid 2.3.7
  • Lighttpd 1.4.31 with mod_fastcgi
  • Nginx 1.0.15 with fastcgi

comment:7 by Dirk Stöcker, 10 years ago

mod-wsgi: 3.4-4ubuntu2.1.14.04.1

Filter settings:

AddOutputFilterByType DEFLATE text/html text/plain text/xml text/x-mapcss

The issue also happens for our ".lang" files, which are plain binary, so I doubt the filter is relevant.

comment:8 by Jun Omae, 10 years ago

Hmmm, I cannot reproduce it with Ubuntu 14.04. It works well in my environment.

$ apt-show-versions apache2-mpm-worker libapache2-mod-wsgi
apache2-mpm-worker:amd64/trusty-security 2.4.7-1ubuntu4.1 uptodate
libapache2-mod-wsgi:amd64/trusty-security 3.4-4ubuntu2.1.14.04.1 uptodate

/etc/apache2/mods-enabled/deflate.conf

<IfModule mod_deflate.c>
        <IfModule mod_filter.c>
                # these are known to be safe with MSIE 6
                AddOutputFilterByType DEFLATE text/html text/plain text/xml

                # everything else may cause problems with MSIE 6
                AddOutputFilterByType DEFLATE text/css
                AddOutputFilterByType DEFLATE application/x-javascript application/javascript application/ecmascript
                AddOutputFilterByType DEFLATE application/rss+xml
                AddOutputFilterByType DEFLATE application/xml
        </IfModule>
</IfModule>

/etc/apache2/sites-available/trac-1.0.conf (daemon mode)

<IfModule mod_wsgi.c>
    WSGIDaemonProcess trac-1.0 \
        python-path=/var/local/venv/trac-1.0/lib/python2.7/site-packages \
        processes=2 threads=4 maximum-requests=128 inactivity-timeout=600 \
        display-name=%{GROUP}
    WSGIScriptAlias /trac /var/local/venv/trac-1.0/wsgi/trac.wsgi \
        process-group=trac-1.0 application-group=%{GLOBAL}
    <Directory /var/local/venv/trac-1.0/wsgi/>
        SetEnv trac.env_parent_dir /var/local/trac
        Require all granted
    </Directory>
</IfModule>
Last edited 10 years ago by Jun Omae (previous) (diff)

comment:9 by Dirk Stöcker, 10 years ago

Well, the JOSM page has much more load, something like 100.000-200.000 accesses a day. I wouldn't assume that a test system shows the same trouble for a problem which seems to be file and timing dependent. :-)

Maybe it breaks when another process on the same core does something or …

In general it seems that 1.0.2 is not as stable as 1.0.1 and sometimes connections break. But I can't put my finger on these issues yet.

I can't give you access to the live server, but I could add debugging output if that helps.

comment:10 by Jun Omae, 10 years ago

Thanks for your help. After increasing the threads of WSGIDaemonProcess to 25, I tested with ab -c 25 -n 250 'http://localhost/trac/t11805/browser/trunk/trac/locale/ja/LC_MESSAGES/messages.po?format=txt' against a clone of the Trac repository.

Concurrency Level:      25
Time taken for tests:   9.711 seconds
Complete requests:      250
Failed requests:        237
   (Connect: 0, Receive: 0, Length: 237, Exceptions: 0)
...
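
For setups without ab, a similar check can be sketched in Python: fetch the same URL concurrently and count the responses that are shorter than the longest one or that failed outright. This is only an approximation of the ab run above. The helper names are made up for illustration, and using the longest response as the reference is an assumption made here (truncation can only shorten a body).

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_length(url, timeout=30):
    """Return the number of body bytes received, or -1 if the request failed."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return len(resp.read())
    except Exception:
        # A connection that breaks mid-transfer ends up here.
        return -1


def count_truncated(lengths):
    """Count responses whose length differs from the longest response."""
    if not lengths:
        return 0
    expected = max(lengths)
    return sum(1 for n in lengths if n != expected)


def run(url, concurrency=25, requests=250):
    """Fetch url `requests` times with `concurrency` workers; return (failed, total)."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        lengths = list(pool.map(fetch_length, [url] * requests))
    return count_truncated(lengths), len(lengths)
```

Calling `run(url)` with the URL from the ab command line returns the number of short or failed responses and the total number of requests.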

I get the following in /var/log/apache2/error.log.

[Wed Nov 05 13:16:54.703657 2014] [mpm_worker:notice] [pid 10611:tid 139721355519872] AH00292: Apache/2.4.7 (Ubuntu) SVN/1.8.8 mod_wsgi/3.4 Python/2.7.6 configured -- resuming normal operations
[Wed Nov 05 13:16:54.703772 2014] [core:notice] [pid 10611:tid 139721355519872] AH00094: Command line: '/usr/sbin/apache2'
...
[Wed Nov 05 13:24:15.537651 2014] [core:error] [pid 10616:tid 139721205790464] [client ::1:32819] End of script output before headers: trac.wsgi
[Wed Nov 05 13:24:15.537740 2014] [core:error] [pid 10617:tid 139721057466112] [client ::1:32820] End of script output before headers: trac.wsgi
[Wed Nov 05 13:24:15.537816 2014] [core:error] [pid 10616:tid 139721015502592] [client ::1:32821] End of script output before headers: trac.wsgi
[Wed Nov 05 13:24:15.537893 2014] [core:error] [pid 10617:tid 139721065858816] [client ::1:32822] End of script output before headers: trac.wsgi
[Wed Nov 05 13:24:16.225174 2014] [core:notice] [pid 10611:tid 139721355519872] AH00051: child pid 12308 exit signal Segmentation fault (11), possible coredump in /etc/apache2

Could you please check whether the same error is logged in your error.log?
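
A small script can do this check on a large log. The sketch below assumes the message format shown in the lines above ("child pid N exit signal Segmentation fault (11)"); the function name and the log path in the usage note are illustrative only.

```python
import re

# Matches the Apache notice logged when a child process dies with SIGSEGV.
SEGV_RE = re.compile(r"child pid (\d+) exit signal Segmentation fault \(11\)")


def find_segfault_pids(lines):
    """Return the pids of child processes reported as segfaulted."""
    pids = []
    for line in lines:
        match = SEGV_RE.search(line)
        if match:
            pids.append(int(match.group(1)))
    return pids
```

Usage could look like `find_segfault_pids(open("/var/log/apache2/error.log", errors="replace"))`, with the path adjusted to the local setup.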

comment:11 by Jun Omae, 10 years ago

Created a patch in [27e138ce/jomae.git] (jomae.git@t11805) to avoid the SEGVs.

comment:12 by Dirk Stöcker, 10 years ago

I get:

[Wed Nov 05 09:44:06.611013 2014] [core:notice] [pid 12330:tid 140620797282176] AH00052: child pid 23032 exit signal Segmentation fault (11)

comment:13 by Dirk Stöcker, 10 years ago

I can confirm that your fix works. Thanks a lot - I know this was no easy bug but your fix was really fast!

This bug should be reported upstream (to mod_wsgi or apache, whoever is responsible for it).

comment:14 by Dirk Stöcker, 10 years ago

BTW: This also explains my observation of the "general stability" issue, because the segfault shuts down the whole currently active instance and thus all active connections.

comment:15 by Felix Schwarz, 10 years ago

It seems as if this could be mod_wsgi bug #42 though that needs to be confirmed (didn't look too much into this issue).

comment:16 by Jun Omae, 10 years ago

I got the following backtrace. It seems to be a mod_wsgi issue.

(gdb) bt
#0  0x00007f915e996c6d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f915eeaec1e in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=0x7fffe3d04200)
    at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  apr_poll (aprset=0x7fffe3d042c0, num=1, nsds=0x7fffe3d042ac, timeout=<optimized out>)
    at /build/buildd/apr-1.5.0/poll/unix/poll.c:120
#3  0x00007f9159be8647 in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#4  0x00007f9159be9ba5 in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#5  0x00007f915f79e2a9 in ap_run_post_config (pconf=0x7f915f735028, plog=0x7f915f709028, ptemp=0x7f915f70b028, s=0x7f915f70dde0)
    at config.c:103
#6  0x00007f915f77ee07 in main (argc=3, argv=0x7fffe3d047e8) at main.c:765

in reply to:  16 comment:17 by Jun Omae, 10 years ago

I got the following backtrace. It seems to be a mod_wsgi issue.

Judging from the backtrace of another thread, I guess it is caused by wrong usage of an APR pool….

Thread 28 (Thread 0x7f9155d04700 (LWP 13480)):
#0  0x00007f915e8dfea7 in kill () at ../sysdeps/unix/syscall-template.S:81
#1  <signal handler called>
#2  0x00007f915cb8fa18 in svn_cache__set (cache=0x7f914801d3c0, key=0x7f9141e67138, value=0x7f9141e0d028,
    scratch_pool=0x7f9141e5b028) at /build/buildd/subversion-1.8.8/subversion/libsvn_subr/cache.c:105
#3  0x00007f915c061185 in rep_read_contents (baton=0x7f9141e670d0,
    buf=0x7f913c1aa270 "/templates/wiki_view.html:73\n#, python-format\nmsgid \"Last modified on %(date)s\"\nmsgstr \"最終更新 %(date)s\"\n\n#: trac/wiki/templates/wiki_view.html:77\n#, python-format\nmsgid \"The page %(name)s does "..., len=0x7f9155d031f8)
    at /build/buildd/subversion-1.8.8/subversion/libsvn_fs_fs/fs_fs.c:5256
#4  0x00007f914331d4e1 in ?? () from /usr/lib/python2.7/dist-packages/libsvn/_core.so
#5  0x00007f91597c6af7 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#6  0x00007f91597c917d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#7  0x00007f91597c6dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#8  0x00007f91597c917d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#9  0x00007f91597c6dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#10 0x00007f91597c917d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#11 0x00007f91597c6dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#12 0x00007f91597c917d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#13 0x00007f91597c6dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#14 0x00007f91597c8883 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#15 0x00007f91596b5a2b in PyIter_Next () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#16 0x00007f9159be56b4 in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#17 0x00007f9159bebb08 in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#18 0x00007f915ec77182 in start_thread (arg=0x7f9155d04700) at pthread_create.c:312
#19 0x00007f915e9a3fbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
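
The guess above is about pool lifetime: memory allocated from an APR pool is released when the pool is cleared or destroyed, so anything still reading from it afterwards touches freed memory. The toy model below only illustrates that hazard in pure Python. `Pool`, `Stream` and `iter_content` are hypothetical classes and are not the Subversion bindings, mod_wsgi, or the change in [27e138ce/jomae.git].

```python
class Pool:
    """Toy stand-in for an APR pool: allocations are valid until destroy()."""

    def __init__(self):
        self.alive = True

    def destroy(self):
        self.alive = False


class Stream:
    """Toy stream whose reads are only valid while its pool is alive."""

    def __init__(self, pool, data, chunk_size=4):
        self.pool = pool
        self.data = data
        self.chunk_size = chunk_size
        self.pos = 0

    def read(self):
        if not self.pool.alive:
            # Real C code would access freed memory here and may crash.
            raise RuntimeError("read after pool was destroyed")
        chunk = self.data[self.pos:self.pos + self.chunk_size]
        self.pos += len(chunk)
        return chunk


def iter_content(pool, stream):
    """Yield chunks lazily and destroy the pool only after the last read."""
    try:
        while True:
            chunk = stream.read()
            if not chunk:
                break
            yield chunk
    finally:
        pool.destroy()
```

With a lazily consumed response body, the reads happen long after the request handler has returned, which is why the pool has to outlive the iteration.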

in reply to:  15 comment:18 by Dirk Stöcker, 10 years ago

Replying to fschwarz:

It seems as if this could be mod_wsgi bug #42 though that needs to be confirmed (didn't look too much into this issue).

This looks more like a followup issue, but may be related. I think a new bug report is necessary. The debug information here should help them to find the issue.

comment:19 by Jun Omae, 10 years ago

Release Notes: modified (diff)

Committed in [13243] and merged to trunk in [13244].

Also, it is rarely reproduced on tracd when downloading the content immediately after tracd is launched. After [13243], it works well on tracd.

#0  0x00007f5f392e117c in rep_read_contents (baton=0x7f5f380690d8,
    buf=0x7f5f3417b2e0 "ified] %(reldate)s\"\nmsgstr \"\"\n\n#: trac/wiki/templates/wiki_view.html:62\n#, python-format\nmsgid \"Last modified on %(date)s\"\nmsgstr \"\"\n\n#: trac/wiki/templates/wiki_view.html:66\n#, python-format\nmsgid \"T"...,
    len=0x7f5f3b8ee308) at /build/buildd/subversion-1.8.8/subversion/libsvn_fs_fs/fs_fs.c:5256
#1  0x00007f5f331654e1 in ?? () from /usr/lib/python2.7/dist-packages/libsvn/_core.so
#2  0x000000000052f936 in PyEval_EvalFrameEx ()
#3  0x000000000055c594 in PyEval_EvalCodeEx ()
#4  0x000000000052ca8d in PyEval_EvalFrameEx ()
#5  0x000000000055c594 in PyEval_EvalCodeEx ()
#6  0x000000000052ca8d in PyEval_EvalFrameEx ()
#7  0x000000000055c594 in PyEval_EvalCodeEx ()
#8  0x000000000052ca8d in PyEval_EvalFrameEx ()
#9  0x000000000055c594 in PyEval_EvalCodeEx ()
#10 0x000000000052ca8d in PyEval_EvalFrameEx ()
#11 0x000000000056cc54 in ?? ()
#12 0x000000000052c94e in PyEval_EvalFrameEx ()
#13 0x000000000052cf32 in PyEval_EvalFrameEx ()
#14 0x000000000052cf32 in PyEval_EvalFrameEx ()
#15 0x000000000052cf32 in PyEval_EvalFrameEx ()
#16 0x000000000056d0aa in ?? ()
#17 0x00000000004d9854 in ?? ()
#18 0x00000000004da20b in PyEval_CallObjectWithKeywords ()
#19 0x0000000000497c7d in PyInstance_New ()
#20 0x000000000052cc20 in PyEval_EvalFrameEx ()
#21 0x000000000052cf32 in PyEval_EvalFrameEx ()
#22 0x000000000056d0aa in ?? ()
#23 0x000000000052e1e6 in PyEval_EvalFrameEx ()
#24 0x000000000052cf32 in PyEval_EvalFrameEx ()
#25 0x000000000052cf32 in PyEval_EvalFrameEx ()
#26 0x000000000056d0aa in ?? ()
#27 0x00000000004d9854 in ?? ()
#28 0x00000000004da20b in PyEval_CallObjectWithKeywords ()
#29 0x00000000005872b2 in ?? ()
#30 0x00007f5f3e87f182 in start_thread (arg=0x7f5f3b8f0700) at pthread_create.c:312
#31 0x00007f5f3e5abfbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Last edited 10 years ago by Jun Omae (previous) (diff)

comment:20 by Jun Omae, 10 years ago

Resolution: fixed
Status: assigned → closed

in reply to:  19 ; comment:22 by Felix Schwarz, 10 years ago

Replying to jomae:

Also, it is rarely reproduced on tracd when downloading the content immediately after tracd is launched. After [13243], it works well on tracd.

If you can see the issue with tracd, this could hint at an issue within (the Python bindings for) Subversion, not mod_wsgi.

dstoecker: JOSM uses subversion as well, right?

Did anyone experience the issue when using no source control or something other than Subversion?

in reply to:  22 comment:23 by Dirk Stöcker, 10 years ago

Replying to fschwarz:

If you can see the issue with tracd, this could hint at an issue within (the Python bindings for) Subversion, not mod_wsgi.

After making the report I've also seen this. For complex software it's sometimes hard to find the right upstream :-)

dstoecker: JOSM uses subversion as well, right?

Yes.

comment:24 by Felix Schwarz, 10 years ago

Can we use this ticket to gather more information about the crash so the problem can be submitted to the Subversion developers? For example, I think we should check whether the problem happens on different versions of Python/Subversion or whether there are any specifics (e.g. regression introduced in svn …/already fixed in latest svn). I could help by testing Fedora rawhide, which is usually very, very close to the latest release versions. CentOS 5/6 might also help to find out whether this worked at some point.

(If you don't want to do that evaluation here please tell me so I can shut up - but I think it is important to do the checks before contacting the subversion guys.)

comment:25 by Dirk Stöcker, 10 years ago

Next try: svn-issue:4526

Additional information is always welcome, but I think the SVN guys probably know better than we do what they need. Spending too much time gathering information in advance takes away our energy for Trac bugs.

Last edited 3 years ago by Jun Omae (previous) (diff)

in reply to:  25 comment:26 by Jun Omae, 10 years ago

Next try: svn-issue:4526

I've posted minimal steps to reproduce it.

Last edited 3 years ago by Jun Omae (previous) (diff)

comment:27 by Jun Omae, 10 years ago

Keywords: svn17 svn18 added

This issue can be reproduced with Subversion 1.7.x and 1.8.x but not with 1.6.x.
