#8675 closed defect (fixed)
Content-Length is incorrect when downloading in plain-text of wiki page with non-ascii characters
Reported by: | | Owned by: | Christian Boos |
---|---|---|---|
Priority: | normal | Milestone: | 0.11.6 |
Component: | rendering | Version: | 0.12dev |
Severity: | normal | Keywords: | unicode http11 |
Cc: | | Branch: | |
Release Notes: | | | |
API Changes: | | | |
Internal Changes: | | | |
Description
I use Trac 0.11.5 and 0.12-r8583. When I download a wiki page containing non-ASCII characters as plain text, the resulting text file is broken: Mimeview.send_converted() sends the length of the unicode string in the Content-Length header instead of the length of the encoded byte string. Mimeview.convert_content() should return a string object instead of a unicode object.
- trunk/trac/mimeview/api.py

```diff
--- trunk/trac/mimeview/api.py
+++ trunk/trac/mimeview/api.py
@@ -650,6 +650,9 @@
             output = converter.convert_content(req, mimetype, content, ck)
             if not output:
                 continue
-            return (output[0], output[1], ext)
+            content = output[0]
+            if isinstance(content, unicode):
+                content = content.encode('utf-8')
+            return (content, output[1], ext)
         raise TracError(_('No available MIME conversions from %(old)s to '
                           '%(new)s', old=mimetype, new=key))
```
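To make the failure mode concrete, here is a minimal illustration in Python 2 (the version Trac 0.11/0.12 runs on) of why using the unicode length as Content-Length truncates the response; the sample string is just an example, not taken from the ticket:

```python
# -*- coding: utf-8 -*-
# For non-ASCII text, the number of unicode characters is smaller than the
# number of UTF-8 bytes actually sent, so a Content-Length computed from the
# unicode string makes the client stop reading before the end of the body.

text = u'héllo wörld'            # arbitrary sample with non-ASCII characters
body = text.encode('utf-8')      # what actually goes on the wire

print len(text)   # 11 characters -> what the buggy code put in the header
print len(body)   # 13 bytes      -> what the client actually needs to read
```

With the header announcing 11 bytes, a well-behaved HTTP client stops after 11 of the 13 bytes and the end of the page is cut off.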
Attachments (0)
Change History (8)
comment:1 by , 15 years ago
Milestone: | → 0.11.6 |
---|---|
Owner: | set to |
comment:2 by , 15 years ago
Can you please try that patch instead?
- trac/mimeview/api.py

```diff
diff --git a/trac/mimeview/api.py b/trac/mimeview/api.py
--- a/trac/mimeview/api.py
+++ b/trac/mimeview/api.py
@@ -934,6 +934,8 @@
         from trac.web import RequestDone
         content, output_type, ext = self.convert_content(req, in_type,
                                                          content, selector)
+        if isinstance(content, unicode):
+            content = content.encode('utf-8')
         req.send_response(200)
         req.send_header('Content-Type', output_type)
         req.send_header('Content-Length', len(content))
```
I think we need an API change in 0.12 to refuse taking unicode in Request.write, as when this happens, the Content-Length is likely already wrong.
follow-up: 5 comment:4 by , 15 years ago
Fine, I think it's preferable to do the conversion at the latest point possible.
Patch committed in r8608.
So what about trunk and making Request.write stricter?
follow-up: 6 comment:5 by , 15 years ago
Replying to cboos:

> So what about trunk and making Request.write stricter?
Do you have an idea how many plugins use Request.write() and pass a unicode string? That's probably difficult to grep from the sources…

How about changing Request.write() as follows:

- If data is a unicode string, convert it to UTF-8 at the beginning of the function.
- Get the Content-Length from the headers.
  - If it is not set, set it to the length of data, or do nothing if the headers have already been sent.
  - If it is set, check its value. If it is incorrect, fix it, or raise an exception if the headers have already been sent.

That should still allow the current usage with unicode strings, while catching bad Content-Length calculations.
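For illustration, here is a rough, self-contained sketch of the write() behavior proposed above, written in Python 2 (as used by Trac 0.11/0.12). The class and attribute names are invented for the example; this is not the real trac.web.api.Request.

```python
# Sketch of the proposed write() checks. Everything here (class name,
# attributes, the WSGI-style start_response) is illustrative only.

class SketchRequest(object):
    def __init__(self, start_response):
        self._start_response = start_response   # WSGI-style callable
        self._outheaders = []                   # list of (name, value) pairs
        self._write = None                      # set once headers are sent

    def send_header(self, name, value):
        self._outheaders.append((name, str(value)))

    def end_headers(self):
        self._write = self._start_response('200 OK', self._outheaders)

    def write(self, data):
        # 1. Encode unicode to UTF-8 up front, so the length checks below
        #    operate on the byte string that is actually sent.
        if isinstance(data, unicode):
            data = data.encode('utf-8')

        # 2. Look up a previously set Content-Length.
        length = None
        for name, value in self._outheaders:
            if name.lower() == 'content-length':
                length = int(value)

        headers_sent = self._write is not None
        if length is None:
            # Not set: derive it from the data, unless it is too late.
            if not headers_sent:
                self.send_header('Content-Length', len(data))
        elif length != len(data):
            if headers_sent:
                raise RuntimeError('Content-Length %d does not match body '
                                   'length %d' % (length, len(data)))
            # Headers not sent yet, so the bad value can still be fixed.
            self._outheaders = [(n, v) for n, v in self._outheaders
                                if n.lower() != 'content-length']
            self.send_header('Content-Length', len(data))

        if not headers_sent:
            self.end_headers()
        self._write(data)
```

Checking the length this way effectively limits write() to a single call per response, which is the concern raised in the follow-up below.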
follow-up: 7 comment:6 by , 15 years ago
Replying to rblank:

> Replying to cboos:
>
> > So what about trunk and making Request.write stricter?
>
> Do you have an idea how many plugins use Request.write() and pass a unicode string? That's probably difficult to grep from the sources…
>
> How about changing Request.write() as follows:
I'd agree with your suggestion, but this would prevent write from being called multiple times.
I've currently done this:
- trac/web/api.py

```diff
--- trac/web/api.py
+++ trac/web/api.py
@@ -259,6 +259,8 @@
             ctpos = value.find('charset=')
             if ctpos >= 0:
                 self._outcharset = value[ctpos + 8:].strip()
+        elif name.lower() == 'content-length':
+            self._content_length = int(value)
         self._outheaders.append((name, unicode(value).encode('utf-8')))
 
     def end_headers(self):
@@ -456,15 +458,21 @@
     def write(self, data):
         """Write the given data to the response body.
 
-        `data` can be either a `str` or an `unicode` string.
-        If it's the latter, the unicode string will be encoded
-        using the charset specified in the ''Content-Type'' header
+        `data` *must* be a `str` string, encoded with the charset
+        which has been specified in the ''Content-Type'' header
         or 'utf-8' otherwise.
+
+        Note that the ''Content-Length'' header must have been specified.
+        Its value either corresponds to the length of `data`, or, if there
+        are multiple calls to `write`, to the cumulated length of the `data`
+        arguments.
         """
         if not self._write:
             self.end_headers()
+        if not hasattr(self, '_content_length'):
+            raise RuntimeError("No Content-Length header set")
         if isinstance(data, unicode):
-            data = data.encode(self._outcharset or 'utf-8')
+            raise ValueError("Can't send unicode content")
         self._write(data)
 
     # Internal methods
```
As you can see, there was already a data.encode() when given unicode input.
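To make the new contract concrete, a caller would now be expected to do something along these lines (a hypothetical handler written for this ticket, not code from Trac itself):

```python
# Hypothetical handler showing the calling convention required by the
# stricter Request.write(): encode first, compute Content-Length from the
# byte string, then pass str data only.

def send_plain_text(req, text):
    body = text.encode('utf-8') if isinstance(text, unicode) else text
    req.send_response(200)
    req.send_header('Content-Type', 'text/plain;charset=utf-8')
    req.send_header('Content-Length', len(body))   # length of the bytes
    req.write(body)                                 # str only, never unicode
```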
While doing the above changes, I also thought that it would be better to ensure we send proper output, exactly the way you suggested; the only thing that stopped me from doing so was that I couldn't see how to do this while still allowing multiple writes.
What we could do is prevent multiple calls to write and add a new method write_chunk which would allow partial writes, with clear documentation that it should be used with caution w.r.t. the content length.
comment:7 by , 15 years ago
Replying to cboos:

> I'd agree with your suggestion, but this would prevent write from being called multiple times.
Oh, right, I hadn't thought of that.
> I've currently done this:
Looks good.
> What we could do is prevent multiple calls to write and add a new method write_chunk which would allow partial writes, with clear documentation that it should be used with caution w.r.t. the content length.
I wouldn't do that. I think it's sufficient to have clear documentation of write(), as you suggest, and a check that the content length has been set. Adding write_chunk() would only complicate the interface.
comment:8 by , 15 years ago
Component: | wiki system → rendering |
---|---|
Keywords: | unicode http11 added |
Resolution: | → fixed |
Status: | new → closed |
Ok, so patch from comment:6 committed in r8608 and API change documented.