#8675 closed defect (fixed)

Content-Length is incorrect when downloading a wiki page with non-ASCII characters as plain text

Reported by:	Jun Omae <jun66j5@…>	Owned by:	Christian Boos
Priority:	normal	Milestone:	0.11.6
Component:	rendering	Version:	0.12dev
Severity:	normal	Keywords:	unicode http11
Cc:		Branch:
Release Notes:
API Changes:
Internal Changes:

Description

I use Trac 0.11.5 and 0.12-r8583. When I download a wiki page as plain text, the resulting text file is broken: Mimeview.send_converted() sends the length of the unicode string, rather than the length of the encoded bytes, in the Content-Length header.

Mimeview.convert_content() should return a string object instead of a unicode object.

trunk/trac/mimeview/api.py

             output = converter.convert_content(req, mimetype, content, ck)
             if not output:
                 continue
-            return (output[0], output[1], ext)
+            content = output[0]
+            if isinstance(content, unicode):
+                content = content.encode('utf-8')
+            return (content, output[1], ext)
         raise TracError(_('No available MIME conversions from %(old)s to '
                           '%(new)s', old=mimetype, new=key))
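The mismatch described above can be reproduced with a minimal sketch. It is written for Python 3, where `str` plays the role of Python 2's `unicode` and `bytes` the role of Python 2's `str`; the sample text and variable names are illustrative assumptions, not taken from Trac.

```python
# Python 3 sketch: `str` here corresponds to Python 2's `unicode`,
# and `bytes` to Python 2's `str`. The sample text is an assumption.
text = "caf\u00e9 \u65e5\u672c\u8a9e"  # "café 日本語"

# Number of characters: what ends up in Content-Length if the
# length is taken before encoding.
char_count = len(text)

# Number of bytes actually sent once the text is encoded as UTF-8.
body = text.encode("utf-8")
byte_count = len(body)

print(char_count, byte_count)
```

Whenever the page contains non-ASCII characters, the byte count exceeds the character count, so a client that trusts the header stops reading before the end of the body.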

Attachments (0)

Change History (8)

comment:1 by Christian Boos, 16 years ago

Milestone:	→ 0.11.6
Owner:	set to Christian Boos

Oh, this is bad…

comment:2 by Christian Boos, 16 years ago

Can you please try that patch instead?

trac/mimeview/api.py

diff --git a/trac/mimeview/api.py b/trac/mimeview/api.py

         from trac.web import RequestDone
         content, output_type, ext = self.convert_content(req, in_type,
                                                          content, selector)
+        if isinstance(content, unicode):
+            content = content.encode('utf-8')
         req.send_response(200)
         req.send_header('Content-Type', output_type)
         req.send_header('Content-Length', len(content))

I think we need an API change in 0.12 so that Request.write refuses unicode input: when it receives unicode, the Content-Length is likely already wrong.

comment:3 by Jun Omae <jun66j5@…>, 16 years ago

OK, I tried your patch. It also fixes the problem.

comment:4 by Christian Boos, 16 years ago (follow-up: comment:5)

Fine, I think it's preferable to do the conversion at the latest point possible.

Patch committed in r8608.

So what about trunk and making Request.write stricter?

comment:5 by Remy Blank, 16 years ago (in reply to: comment:4; follow-up: comment:6)

Replying to cboos:

So what about trunk and making Request.write stricter?

Do you have an idea how many plugins use Request.write() and pass a unicode string? That's probably difficult to grep from the sources…

How about changing Request.write() as follows:

If data is a unicode string, convert it to UTF-8 at the beginning of the function.

Get the Content-Length from the headers.
- If it is not set, set it to the length of data, or do nothing if the headers have already been sent.
- If it is set, check its value. If it is incorrect, fix it, or raise an exception if the headers have already been sent.

That should still allow the current usage with unicode strings, while catching bad Content-Length calculations.
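The proposal above could be sketched roughly as follows. This is a Python 3 illustration in which `SketchResponse` and all of its attributes are hypothetical stand-ins rather than Trac's actual `Request` class; `str` stands for Python 2's `unicode`.

```python
class SketchResponse:
    """Hypothetical stand-in for a response object; not Trac's Request."""

    def __init__(self):
        self.headers = {}          # lower-cased header name -> value
        self.headers_sent = False
        self.body = b""

    def send_header(self, name, value):
        self.headers[name.lower()] = str(value)

    def end_headers(self):
        self.headers_sent = True

    def write(self, data):
        # Step 1: text is converted to UTF-8 at the start.
        if isinstance(data, str):
            data = data.encode("utf-8")
        # Step 2: set Content-Length if missing, otherwise verify it.
        declared = self.headers.get("content-length")
        if declared is None:
            if not self.headers_sent:
                self.send_header("Content-Length", len(data))
        elif int(declared) != len(data):
            if self.headers_sent:
                raise ValueError("Content-Length already sent with a wrong value")
            self.send_header("Content-Length", len(data))
        if not self.headers_sent:
            self.end_headers()
        self.body += data


resp = SketchResponse()
resp.send_header("Content-Length", len("café"))  # 4 characters, but 5 bytes
resp.write("café")
fixed_length = resp.headers["content-length"]
```

In this sketch the check compares the header against a single `data` argument, so a second call to `write()` on the same response fails the check.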

comment:6 by Christian Boos, 16 years ago (in reply to: comment:5; follow-up: comment:7)

Replying to rblank:

Replying to cboos:

So what about trunk and making Request.write stricter?

Do you have an idea how many plugins use Request.write() and pass a unicode string? That's probably difficult to grep from the sources…

How about changing Request.write() as follows:

I'd agree with your suggestion, but this would prevent write from being called multiple times.

I've currently done this:

trac/web/api.py

             ctpos = value.find('charset=')
             if ctpos >= 0:
                 self._outcharset = value[ctpos + 8:].strip()
+        elif name.lower() == 'content-length':
+            self._content_length = int(value)
         self._outheaders.append((name, unicode(value).encode('utf-8')))
     def end_headers(self):
 …
     def write(self, data):
         """Write the given data to the response body.
-        `data` can be either a `str` or an `unicode` string.
-        If it's the latter, the unicode string will be encoded
-        using the charset specified in the ''Content-Type'' header
+        `data` *must* be a `str` string, encoded with the charset
+        which has been specified in the ''Content-Type'' header
         or 'utf-8' otherwise.
+        Note that the ''Content-Length'' header must have been specified.
+        Its value either corresponds to the length of `data`, or, if there
+        are multiple calls to `write`, to the cumulated length of the `data`
+        arguments.
         """
         if not self._write:
             self.end_headers()
+        if not hasattr(self, '_content_length'):
+            raise RuntimeError("No Content-Length header set")
         if isinstance(data, unicode):
-            data = data.encode(self._outcharset or 'utf-8')
+            raise ValueError("Can't send unicode content")
         self._write(data)
     # Internal methods

As you can see, there was already a data.encode() when given unicode input.
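The stricter behaviour in the patch above can be illustrated with a small Python 3 sketch. `StrictResponse` is a hypothetical stand-in that mirrors only the two new checks; it is not Trac's `Request`, it uses `None` instead of `hasattr` to detect a missing header, and `str` again stands for Python 2's `unicode`.

```python
class StrictResponse:
    """Hypothetical stand-in mirroring the two checks; not Trac's code."""

    def __init__(self):
        self._content_length = None  # assumption: None means "not set"
        self.chunks = []

    def send_header(self, name, value):
        if name.lower() == "content-length":
            self._content_length = int(value)

    def write(self, data):
        # Refuse to write a body before Content-Length is known.
        if self._content_length is None:
            raise RuntimeError("No Content-Length header set")
        # Refuse text; the caller must encode it beforehand.
        if isinstance(data, str):
            raise ValueError("Can't send unicode content")
        self.chunks.append(data)


body = "café".encode("utf-8")
resp = StrictResponse()
resp.send_header("Content-Length", len(body))
# Multiple writes remain possible; the caller is responsible for
# making the declared length match the cumulated size.
resp.write(body[:2])
resp.write(body[2:])
total_written = sum(len(chunk) for chunk in resp.chunks)
```

Here the declared length is computed from the encoded bytes, and the two partial writes add up to it. Nothing in the sketch verifies that total, which matches the docstring's wording that the header must correspond to the cumulated length.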

While making the above changes, I also thought it would be better to ensure we send proper output, exactly the way you suggested. The only thing that stopped me was that I couldn't see how to do this while still allowing multiple writes.

What we could do is prevent multiple calls to write and add a new method, write_chunk, that allows partial writes, with clear documentation that it should be used with caution w.r.t. the content length.

comment:7 by Remy Blank, 16 years ago (in reply to: comment:6)

Replying to cboos:

I'd agree with your suggestion, but this would prevent write from being called multiple times.

Oh, right, I hadn't thought of that.

I've currently done this:

Looks good.

What we could do is prevent multiple calls to write and add a new method, write_chunk, that allows partial writes, with clear documentation that it should be used with caution w.r.t. the content length.

I wouldn't do that. I think it's sufficient to have clear documentation for write(), as you suggest, and a check that the content length has been set. Adding write_chunk() would only complicate the interface.

comment:8 by Christian Boos, 16 years ago

Component:	wiki system → rendering
Keywords:	unicode http11 added
Resolution:	→ fixed
Status:	new → closed

Ok, so patch from comment:6 committed in r8608 and API change documented.
