Edgewall Software
Modify

Opened 5 years ago

Closed 5 years ago

Last modified 10 months ago

#8675 closed defect (fixed)

Content-Length is incorrect when downloading in plain-text of wiki page with non-ascii characters

Reported by: Jun Omae <jun66j5@…> Owned by: cboos
Priority: normal Milestone: 0.11.6
Component: rendering Version: 0.12dev
Severity: normal Keywords: unicode http11
Cc:
Release Notes:
API Changes:

Description

I use Trac 0.11.5 and 0.12-r8583. When I download in plain-text of wiki page, the text file is broken. Mimeview.send_converted() sends length of unicode string in Content-Length header.

Mimeview.convert_content() should return a string object instead of a unicode object.

  • trunk/trac/mimeview/api.py

     
    650650            output = converter.convert_content(req, mimetype, content, ck) 
    651651            if not output: 
    652652                continue 
    653             return (output[0], output[1], ext) 
     653            content = output[0] 
     654            if isinstance(content, unicode): 
     655                content = content.encode('utf-8') 
     656            return (content, output[1], ext) 
    654657        raise TracError(_('No available MIME conversions from %(old)s to ' 
    655658                          '%(new)s', old=mimetype, new=key)) 
    656659 

Attachments (0)

Change History (8)

comment:1 Changed 5 years ago by cboos

  • Milestone set to 0.11.6
  • Owner set to cboos

Oh, this is bad…

comment:2 Changed 5 years ago by cboos

Can you please try that patch instead?

  • trac/mimeview/api.py

    diff --git a/trac/mimeview/api.py b/trac/mimeview/api.py
    a b  
    934934        from trac.web import RequestDone 
    935935        content, output_type, ext = self.convert_content(req, in_type, 
    936936                                                         content, selector) 
     937        if isinstance(content, unicode): 
     938            content = content.encode('utf-8') 
    937939        req.send_response(200) 
    938940        req.send_header('Content-Type', output_type) 
    939941        req.send_header('Content-Length', len(content)) 

I think we need an API change in 0.12 to refuse taking unicode in Request.write, as when this happens, the Content-Length is likely already wrong.

comment:3 Changed 5 years ago by Jun Omae <jun66j5@…>

OK, I tried your patch. This problem is fixed with your patch also.

comment:4 follow-up: Changed 5 years ago by cboos

Fine, I think it's preferable to do the conversion at the latest point possible.

Patch committed in r8608.

So what about trunk and making Request.write stricter?

comment:5 in reply to: ↑ 4 ; follow-up: Changed 5 years ago by rblank

Replying to cboos:

So what about trunk and making Request.write stricter?

Do you have an idea how many plugins use Request.write() and pass a unicode string? That's probably difficult to grep from the sources…

How about changing Request.write() as follows:

  • If data is a unicode string, convert it to UTF-8 at the beginning of the function.
  • Get the Content-Length from the headers.
    • If it is not set, set it to the length of data, or do nothing if the headers have already been sent.
    • If it is set, check its value. If it is incorrect, fix it, or raise an exception if the headers have already been sent.

That should still allow the current usage with unicode strings, while catching bad Content-Length calculations.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 5 years ago by cboos

Replying to rblank:

Replying to cboos:

So what about trunk and making Request.write stricter?

Do you have an idea how many plugins use Request.write() and pass a unicode string? That's probably difficult to grep from the sources…

How about changing Request.write() as follows:

I'd agree with your suggestion, but this would prevent write to be called multiple times.

I've currently done this:

  • trac/web/api.py

     
    259259            ctpos = value.find('charset=') 
    260260            if ctpos >= 0: 
    261261                self._outcharset = value[ctpos + 8:].strip() 
     262        elif name.lower() == 'content-length': 
     263            self._content_length = int(value) 
    262264        self._outheaders.append((name, unicode(value).encode('utf-8'))) 
    263265 
    264266    def end_headers(self): 
     
    456458    def write(self, data): 
    457459        """Write the given data to the response body. 
    458460 
    459         `data` can be either a `str` or an `unicode` string. 
    460         If it's the latter, the unicode string will be encoded 
    461         using the charset specified in the ''Content-Type'' header 
     461        `data` *must* be a `str` string, encoded with the charset 
     462        which has been specified in the ''Content-Type'' header 
    462463        or 'utf-8' otherwise. 
     464 
     465        Note that the ''Content-Length'' header must have been specified. 
     466        Its value either corresponds to the length of `data`, or, if there 
     467        are multiple calls to `write`, to the cumulated length of the `data` 
     468        arguments. 
    463469        """ 
    464470        if not self._write: 
    465471            self.end_headers() 
     472        if not hasattr(self, '_content_length'): 
     473            raise RuntimeError("No Content-Length header set") 
    466474        if isinstance(data, unicode): 
    467             data = data.encode(self._outcharset or 'utf-8') 
     475            raise ValueError("Can't send unicode content") 
    468476        self._write(data) 
    469477 
    470478    # Internal methods 

As you can see, there was already a data.encode() when given unicode input.

While doing the above changes, I also thought that it would be better to ensure we send proper output, exactly the way you suggested, the only thing that stopped me from doing so was that I couldn't see a way how to do this while still allowing multiple writes.

What we could do is to prevent multiple calls to write and add a new method write_chunk which could allow partial writes, having a clear documentation that it should be used with caution w.r.t. to the content length.

comment:7 in reply to: ↑ 6 Changed 5 years ago by rblank

Replying to cboos:

I'd agree with your suggestion, but this would prevent write to be called multiple times.

Oh, right, I hadn't thought of that.

I've currently done this:

Looks good.

What we could do is to prevent multiple calls to write and add a new method write_chunk which could allow partial writes, having a clear documentation that it should be used with caution w.r.t. to the content length.

I wouldn't do that. I think it's sufficient to have a clear documentation of write(), as you suggest, and a check that the content length has been set. Adding write_chunk() would only complicate the interface.

comment:8 Changed 5 years ago by cboos

  • Component changed from wiki system to rendering
  • Keywords unicode http11 added
  • Resolution set to fixed
  • Status changed from new to closed

Ok, so patch from comment:6 committed in r8608 and API change documented.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed The owner will remain cboos.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from cboos to the specified user.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.