Opened 4 years ago
Closed 4 years ago
#8675 closed defect (fixed)
Content-Length is incorrect when downloading in plain-text of wiki page with non-ascii characters
| Reported by: | Jun Omae <jun66j5@…> | Owned by: | cboos |
|---|---|---|---|
| Priority: | normal | Milestone: | 0.11.6 |
| Component: | rendering | Version: | 0.12dev |
| Severity: | normal | Keywords: | unicode http11 |
| Cc: | |||
| Release Notes: | |||
| API Changes: | |||
Description
I use Trac 0.11.5 and 0.12-r8583. When I download in plain-text of wiki page, the text file is broken. Mimeview.send_converted() sends length of unicode string in Content-Length header.
Mimeview.convert_content() should return a string object instead of a unicode object.
-
trunk/trac/mimeview/api.py
650 650 output = converter.convert_content(req, mimetype, content, ck) 651 651 if not output: 652 652 continue 653 return (output[0], output[1], ext) 653 content = output[0] 654 if isinstance(content, unicode): 655 content = content.encode('utf-8') 656 return (content, output[1], ext) 654 657 raise TracError(_('No available MIME conversions from %(old)s to ' 655 658 '%(new)s', old=mimetype, new=key)) 656 659
Attachments (0)
Change History (8)
comment:1 Changed 4 years ago by cboos
- Milestone set to 0.11.6
- Owner set to cboos
comment:2 Changed 4 years ago by cboos
Can you please try that patch instead?
-
trac/mimeview/api.py
diff --git a/trac/mimeview/api.py b/trac/mimeview/api.py
a b 934 934 from trac.web import RequestDone 935 935 content, output_type, ext = self.convert_content(req, in_type, 936 936 content, selector) 937 if isinstance(content, unicode): 938 content = content.encode('utf-8') 937 939 req.send_response(200) 938 940 req.send_header('Content-Type', output_type) 939 941 req.send_header('Content-Length', len(content))
I think we need an API change in 0.12 to refuse taking unicode in Request.write, as when this happens, the Content-Length is likely already wrong.
comment:3 Changed 4 years ago by Jun Omae <jun66j5@…>
OK, I tried your patch. This problem is fixed with your patch also.
comment:4 follow-up: ↓ 5 Changed 4 years ago by cboos
Fine, I think it's preferable to do the conversion at the latest point possible.
Patch committed in r8608.
So what about trunk and making Request.write stricter?
comment:5 in reply to: ↑ 4 ; follow-up: ↓ 6 Changed 4 years ago by rblank
Replying to cboos:
So what about trunk and making Request.write stricter?
Do you have an idea how many plugins use Request.write() and pass a unicode string? That's probably difficult to grep from the sources…
How about changing Request.write() as follows:
- If data is a unicode string, convert it to UTF-8 at the beginning of the function.
- Get the Content-Length from the headers.
- If it is not set, set it to the length of data, or do nothing if the headers have already been sent.
- If it is set, check its value. If it is incorrect, fix it, or raise an exception if the headers have already been sent.
That should still allow the current usage with unicode strings, while catching bad Content-Length calculations.
comment:6 in reply to: ↑ 5 ; follow-up: ↓ 7 Changed 4 years ago by cboos
Replying to rblank:
Replying to cboos:
So what about trunk and making Request.write stricter?
Do you have an idea how many plugins use Request.write() and pass a unicode string? That's probably difficult to grep from the sources…
How about changing Request.write() as follows:
I'd agree with your suggestion, but this would prevent write to be called multiple times.
I've currently done this:
-
trac/web/api.py
259 259 ctpos = value.find('charset=') 260 260 if ctpos >= 0: 261 261 self._outcharset = value[ctpos + 8:].strip() 262 elif name.lower() == 'content-length': 263 self._content_length = int(value) 262 264 self._outheaders.append((name, unicode(value).encode('utf-8'))) 263 265 264 266 def end_headers(self): … … 456 458 def write(self, data): 457 459 """Write the given data to the response body. 458 460 459 `data` can be either a `str` or an `unicode` string. 460 If it's the latter, the unicode string will be encoded 461 using the charset specified in the ''Content-Type'' header 461 `data` *must* be a `str` string, encoded with the charset 462 which has been specified in the ''Content-Type'' header 462 463 or 'utf-8' otherwise. 464 465 Note that the ''Content-Length'' header must have been specified. 466 Its value either corresponds to the length of `data`, or, if there 467 are multiple calls to `write`, to the cumulated length of the `data` 468 arguments. 463 469 """ 464 470 if not self._write: 465 471 self.end_headers() 472 if not hasattr(self, '_content_length'): 473 raise RuntimeError("No Content-Length header set") 466 474 if isinstance(data, unicode): 467 data = data.encode(self._outcharset or 'utf-8')475 raise ValueError("Can't send unicode content") 468 476 self._write(data) 469 477 470 478 # Internal methods
As you can see, there was already a data.encode() when given unicode input.
While doing the above changes, I also thought that it would be better to ensure we send proper output, exactly the way you suggested, the only thing that stopped me from doing so was that I couldn't see a way how to do this while still allowing multiple writes.
What we could do is to prevent multiple calls to write and add a new method write_chunk which could allow partial writes, having a clear documentation that it should be used with caution w.r.t. to the content length.
comment:7 in reply to: ↑ 6 Changed 4 years ago by rblank
Replying to cboos:
I'd agree with your suggestion, but this would prevent write to be called multiple times.
Oh, right, I hadn't thought of that.
I've currently done this:
Looks good.
What we could do is to prevent multiple calls to write and add a new method write_chunk which could allow partial writes, having a clear documentation that it should be used with caution w.r.t. to the content length.
I wouldn't do that. I think it's sufficient to have a clear documentation of write(), as you suggest, and a check that the content length has been set. Adding write_chunk() would only complicate the interface.
comment:8 Changed 4 years ago by cboos
- Component changed from wiki system to rendering
- Keywords unicode http11 added
- Resolution set to fixed
- Status changed from new to closed
Ok, so patch from comment:6 committed in r8608 and API change documented.



Oh, this is bad…