Edgewall Software
Modify

Opened 15 years ago

Closed 15 years ago

Last modified 11 years ago

#8675 closed defect (fixed)

Content-Length is incorrect when downloading in plain-text of wiki page with non-ascii characters

Reported by: Jun Omae <jun66j5@…> Owned by: Christian Boos
Priority: normal Milestone: 0.11.6
Component: rendering Version: 0.12dev
Severity: normal Keywords: unicode http11
Cc: Branch:
Release Notes:
API Changes:
Internal Changes:

Description

I use Trac 0.11.5 and 0.12-r8583. When I download in plain-text of wiki page, the text file is broken. Mimeview.send_converted() sends length of unicode string in Content-Length header.

Mimeview.convert_content() should return a string object instead of a unicode object.

  • trunk/trac/mimeview/api.py

     
    650650            output = converter.convert_content(req, mimetype, content, ck)
    651651            if not output:
    652652                continue
    653             return (output[0], output[1], ext)
     653            content = output[0]
     654            if isinstance(content, unicode):
     655                content = content.encode('utf-8')
     656            return (content, output[1], ext)
    654657        raise TracError(_('No available MIME conversions from %(old)s to '
    655658                          '%(new)s', old=mimetype, new=key))
    656659

Attachments (0)

Change History (8)

comment:1 by Christian Boos, 15 years ago

Milestone: 0.11.6
Owner: set to Christian Boos

Oh, this is bad…

comment:2 by Christian Boos, 15 years ago

Can you please try that patch instead?

  • trac/mimeview/api.py

    diff --git a/trac/mimeview/api.py b/trac/mimeview/api.py
    a b  
    934934        from trac.web import RequestDone
    935935        content, output_type, ext = self.convert_content(req, in_type,
    936936                                                         content, selector)
     937        if isinstance(content, unicode):
     938            content = content.encode('utf-8')
    937939        req.send_response(200)
    938940        req.send_header('Content-Type', output_type)
    939941        req.send_header('Content-Length', len(content))

I think we need an API change in 0.12 to refuse taking unicode in Request.write, as when this happens, the Content-Length is likely already wrong.

comment:3 by Jun Omae <jun66j5@…>, 15 years ago

OK, I tried your patch. This problem is fixed with your patch also.

comment:4 by Christian Boos, 15 years ago

Fine, I think it's preferable to do the conversion at the latest point possible.

Patch committed in r8608.

So what about trunk and making Request.write stricter?

in reply to:  4 ; comment:5 by Remy Blank, 15 years ago

Replying to cboos:

So what about trunk and making Request.write stricter?

Do you have an idea how many plugins use Request.write() and pass a unicode string? That's probably difficult to grep from the sources…

How about changing Request.write() as follows:

  • If data is a unicode string, convert it to UTF-8 at the beginning of the function.
  • Get the Content-Length from the headers.
    • If it is not set, set it to the length of data, or do nothing if the headers have already been sent.
    • If it is set, check its value. If it is incorrect, fix it, or raise an exception if the headers have already been sent.

That should still allow the current usage with unicode strings, while catching bad Content-Length calculations.

in reply to:  5 ; comment:6 by Christian Boos, 15 years ago

Replying to rblank:

Replying to cboos:

So what about trunk and making Request.write stricter?

Do you have an idea how many plugins use Request.write() and pass a unicode string? That's probably difficult to grep from the sources…

How about changing Request.write() as follows:

I'd agree with your suggestion, but this would prevent write to be called multiple times.

I've currently done this:

  • trac/web/api.py

     
    259259            ctpos = value.find('charset=')
    260260            if ctpos >= 0:
    261261                self._outcharset = value[ctpos + 8:].strip()
     262        elif name.lower() == 'content-length':
     263            self._content_length = int(value)
    262264        self._outheaders.append((name, unicode(value).encode('utf-8')))
    263265
    264266    def end_headers(self):
     
    456458    def write(self, data):
    457459        """Write the given data to the response body.
    458460
    459         `data` can be either a `str` or an `unicode` string.
    460         If it's the latter, the unicode string will be encoded
    461         using the charset specified in the ''Content-Type'' header
     461        `data` *must* be a `str` string, encoded with the charset
     462        which has been specified in the ''Content-Type'' header
    462463        or 'utf-8' otherwise.
     464
     465        Note that the ''Content-Length'' header must have been specified.
     466        Its value either corresponds to the length of `data`, or, if there
     467        are multiple calls to `write`, to the cumulated length of the `data`
     468        arguments.
    463469        """
    464470        if not self._write:
    465471            self.end_headers()
     472        if not hasattr(self, '_content_length'):
     473            raise RuntimeError("No Content-Length header set")
    466474        if isinstance(data, unicode):
    467             data = data.encode(self._outcharset or 'utf-8')
     475            raise ValueError("Can't send unicode content")
    468476        self._write(data)
    469477
    470478    # Internal methods

As you can see, there was already a data.encode() when given unicode input.

While doing the above changes, I also thought that it would be better to ensure we send proper output, exactly the way you suggested, the only thing that stopped me from doing so was that I couldn't see a way how to do this while still allowing multiple writes.

What we could do is to prevent multiple calls to write and add a new method write_chunk which could allow partial writes, having a clear documentation that it should be used with caution w.r.t. to the content length.

in reply to:  6 comment:7 by Remy Blank, 15 years ago

Replying to cboos:

I'd agree with your suggestion, but this would prevent write to be called multiple times.

Oh, right, I hadn't thought of that.

I've currently done this:

Looks good.

What we could do is to prevent multiple calls to write and add a new method write_chunk which could allow partial writes, having a clear documentation that it should be used with caution w.r.t. to the content length.

I wouldn't do that. I think it's sufficient to have a clear documentation of write(), as you suggest, and a check that the content length has been set. Adding write_chunk() would only complicate the interface.

comment:8 by Christian Boos, 15 years ago

Component: wiki systemrendering
Keywords: unicode http11 added
Resolution: fixed
Status: newclosed

Ok, so patch from comment:6 committed in r8608 and API change documented.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Christian Boos to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.