#7959 closed defect (fixed)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 4: ordinal not in range(128)
Reported by: | rupert.thurner | Owned by: | Christian Boos |
---|---|---|---|
Priority: | high | Milestone: | 0.11.3 |
Component: | general | Version: | 0.11.2.1 |
Severity: | critical | Keywords: | genshi |
Cc: | trac-ja@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description (last modified by )
… as there are so many unicode errors, does it make sense to do a guaranteed encoding at some point, like described in http://code.activestate.com/recipes/466341/ ?
Example error: One of my test SVN repositories was deliberately created in a way that its name contains cyrillic characters, e.g. /home/svn/борис. Accessing this with Apache's mod_dav_svn works, but trac gives the following error:
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/trac/web/api.py", line 339, in send_error
    'text/html')
  File "/usr/lib/python2.5/site-packages/trac/web/chrome.py", line 715, in render_template
    return stream.render(method, doctype=doctype)
  File "/var/lib/python-support/python2.5/genshi/core.py", line 179, in render
    return encode(generator, method=method, encoding=encoding, out=out)
  File "/var/lib/python-support/python2.5/genshi/output.py", line 60, in encode
    return _encode(u''.join(list(iterator)))
  File "/var/lib/python-support/python2.5/genshi/output.py", line 311, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 753, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 592, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 707, in __call__
    text = mjoin(textbuf, escape_quotes=False)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 49: ordinal not in range(128)
(from comment:8)
Attachments (0)
Change History (18)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
I'm not sure what you mean by "guaranteed encoding", but Trac already has to_unicode(), which should never fail (but it can lose information, as it will use the 'replace' strategy if it encounters a decoding exception).
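The 'replace' strategy can be sketched roughly as follows. This is only an illustration, not Trac's actual to_unicode(): the helper name to_text and the UTF-8 default are assumptions, and the sketch is written for Python 3, where byte strings are the bytes type.

```python
def to_text(value, charset="utf-8"):
    """Return text for `value` without ever raising a decoding error.

    Hypothetical helper for illustration; not Trac's real to_unicode().
    """
    if isinstance(value, bytes):
        # Undecodable bytes become U+FFFD, so information can be lost.
        return value.decode(charset, "replace")
    return str(value)


valid = to_text(b"caf\xc3\xa9")  # well-formed UTF-8 decodes normally
lossy = to_text(b"caf\xe8")      # the stray byte 0xe8 is replaced
```

The second call never raises, but the original byte cannot be recovered from the result, which is the information loss mentioned above.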
There's one other source of decoding errors that I have seen: when logging exceptions to the log, with a construct like (e being an exception instance):
self.log.error('Processing error: %s' % e)
Those should obviously be fixed.
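One possible shape for such a fix, as a sketch only: convert the exception to text explicitly before it reaches the logger. The helper exception_to_text and the logger name are hypothetical, and the code targets Python 3, where the byte-string case has to be spelled out with bytes.

```python
import logging

log = logging.getLogger("example")


def exception_to_text(exc, charset="utf-8"):
    """Build a text message from an exception whose args may hold bytes.

    Hypothetical helper for illustration only.
    """
    parts = []
    for arg in exc.args:
        if isinstance(arg, bytes):
            # Decode leniently so that logging itself can never fail.
            parts.append(arg.decode(charset, "replace"))
        else:
            parts.append(str(arg))
    return " ".join(parts) or exc.__class__.__name__


try:
    raise ValueError(b"bad name: \xd0\xb1\xd0\xbe\xd1\x80")
except ValueError as e:
    # Pass the message as a logging argument instead of formatting with %.
    log.error("Processing error: %s", exception_to_text(e))
```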
Could you please explain more precisely what you mean, maybe with an example?
comment:3 by , 16 years ago
One example is th:TracHoursPlugin. Installing it and clicking on the "hours" link (how the plugin gets called) throws a unicode error, suggesting to report it to the Trac project.
Searching for information on it led to TracUnicode and TracDev/UnicodeGuidelines. I did not really understand how to find the root cause of the exception, or even better, how to fix it.
comment:4 by , 16 years ago
Ok, I see. Then I'm with eblot on this one: the plugin needs fixing, not Trac. -1 from me as well.
follow-up: 6 comment:5 by , 16 years ago
why the error message does not say its the plugin, and the stack trace does not point to the location? or what part of the documentation gives a hint on how to fix it? how would you fix it?
comment:6 by , 16 years ago
Replying to anonymous:
why the error message does not say its the plugin, and the stack trace does not point to the location? or what part of the documentation gives a hint on how to fix it? how would you fix it?
Please copy 'n paste the exact traceback.
comment:7 by , 16 years ago
File "/opt/csw/lib/python/site-packages/Trac-0.11.2.1-py2.5.egg/trac/web/main.py", line 432, in _dispatch_request
  dispatcher.dispatch(req)
File "/opt/csw/lib/python/site-packages/Trac-0.11.2.1-py2.5.egg/trac/web/main.py", line 226, in dispatch
  data, content_type)
File "/opt/csw/lib/python/site-packages/Trac-0.11.2.1-py2.5.egg/trac/web/chrome.py", line 719, in render_template
  return stream.render(method, doctype=doctype)
File "build/bdist.solaris-2.10-sun4v/egg/genshi/core.py", line 179, in render
File "build/bdist.solaris-2.10-sun4v/egg/genshi/output.py", line 60, in encode
File "build/bdist.solaris-2.10-sun4v/egg/genshi/output.py", line 311, in __call__
File "build/bdist.solaris-2.10-sun4v/egg/genshi/output.py", line 753, in __call__
File "build/bdist.solaris-2.10-sun4v/egg/genshi/output.py", line 592, in __call__
File "build/bdist.solaris-2.10-sun4v/egg/genshi/output.py", line 707, in __call__
System Information:
Trac: 0.11.2.1
Python: 2.5.1 (r251:54863, Nov 3 2007, 02:54:52) [C]
setuptools: 0.6c9
SQLite: 3.2.2
pysqlite: 2.3.5
Genshi: 0.5.1
Pygments: 1.0
Subversion: 1.4.5 (r25188)
jQuery: 1.2.6
follow-up: 9 comment:8 by , 16 years ago
I've got a very similar issue. One of my test SVN repositories was deliberately created in a way that its name contains cyrillic characters, e.g. /home/svn/борис. Accessing this with Apache's mod_dav_svn works, but trac gives the following error:
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/trac/web/api.py", line 339, in send_error
    'text/html')
  File "/usr/lib/python2.5/site-packages/trac/web/chrome.py", line 715, in render_template
    return stream.render(method, doctype=doctype)
  File "/var/lib/python-support/python2.5/genshi/core.py", line 179, in render
    return encode(generator, method=method, encoding=encoding, out=out)
  File "/var/lib/python-support/python2.5/genshi/output.py", line 60, in encode
    return _encode(u''.join(list(iterator)))
  File "/var/lib/python-support/python2.5/genshi/output.py", line 311, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 753, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 592, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 707, in __call__
    text = mjoin(textbuf, escape_quotes=False)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 49: ordinal not in range(128)
This directory is referenced in [trac]/repository_dir. It is encoded in UTF-8 which is also the $LANG setting of the environment trac runs in, so it should work.
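A plausible reading of this failure is that the UTF-8 bytes of the path get decoded with the ASCII codec somewhere along the way. The sketch below reproduces that class of error in Python 3 with explicit bytes; the variable names are illustrative, and the position reported here differs from the traceback because only the bare path is decoded.

```python
# UTF-8 bytes of the repository path from the report.
raw_path = "/home/svn/борис".encode("utf-8")

try:
    # Roughly what an implicit byte-string-to-text conversion does.
    raw_path.decode("ascii")
    failure = None
except UnicodeDecodeError as e:
    failure = e  # start is the offset of the first non-ASCII byte

# Decoding with the encoding the bytes were written in works fine.
decoded = raw_path.decode("utf-8")
```

The bytes themselves are valid; only the choice of codec at the decoding step is wrong.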
follow-up: 13 comment:9 by , 16 years ago
Milestone: | → 0.11.3 |
---|---|
Owner: | set to |
Priority: | normal → high |
Severity: | normal → critical |
Status: | new → assigned |
Replying to josef:
... _encode(u''.join(list(iterator))) ...
Wait wait …
Oh damn, my r7295 change was dead wrong. The else: part should have been kept instead. In addition to making Genshi rendering faster and more memory efficient, this fix should also avoid this kind of exception, as the encoding should be done in 'replace' mode.
Besides, using /home/svn/борис for the repository_dir should eventually be supported.
So we have 3 unicode related issues discussed in this ticket:
- comment:5's self.log.error('Processing error: %s' % e) style of errors - see #7935
- comment:8's request for supporting non-ascii repository_dirs - probably worth another ticket
- the whole ... genshi/output.py class of unicode errors (#3908, #6932, probably lots of others) which can be avoided by fixing r7295. Let's focus this ticket on this.
follow-up: 11 comment:10 by , 16 years ago
I'm also having problems with send_error as josef but with a plugin I'm writing. Is there any quick fix for this?
I reverted r7295 but it didn't help (why would it help? the cStringIO would not help the unicode errors).
comment:11 by , 16 years ago
Replying to anonymous:
I'm also having problems with send_error as josef but with a plugin I'm writing. Is there any quick fix for this?
If you're writing a plugin, you're responsible for exchanging only unicode string content with Trac, see TracDev/UnicodeGuidelines#TracboundariesforUnicodeData for more.
I reverted r7295 but it didn't help (why would it help? the cStringIO would not help the unicode errors).
When a file-like object is given as the out parameter, Genshi will convert unicode strings to utf-8 in 'replace' mode. Ah, yes, that won't help if the string is not unicode in the first place, as there could be a UnicodeDecodeError#encode first.
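The distinction can be illustrated with a small Python 3 sketch (names are illustrative, and this is not Genshi's code): encoding text in 'replace' mode cannot fail, but a byte string first has to be decoded, and that step is where the error is raised.

```python
text = "борис"

# Encoding text with 'replace' cannot fail; unencodable characters become '?'.
encoded = text.encode("ascii", "replace")

raw = text.encode("utf-8")  # a byte string, as a plugin might hand over
try:
    # The decode step runs before any lenient encode gets a chance.
    raw.decode("ascii").encode("utf-8", "replace")
    reached_encode = True
except UnicodeDecodeError:
    reached_encode = False
```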
comment:12 by , 16 years ago
Milestone: | 0.11.4 → 0.11.3 |
---|
comment:13 by , 16 years ago
Replying to cboos:
Replying to josef:
... _encode(u''.join(list(iterator))) ...
Wait wait …
Oh damn, my r7295 change was dead wrong. The else: part should have been kept instead.
This is now fixed in r7822.
In addition to making Genshi rendering faster and more memory efficient,
Well, the memory savings are roughly 15%, but rendering doesn't seem to be actually faster.
this fix should also avoid this kind of exception, as the encoding should be done in 'replace' mode.
As noted above in comment:11, that change unfortunately doesn't help in this area.
comment:14 by , 16 years ago
Description: | modified (diff) |
---|---|
Keywords: | genshi added |
I can reproduce the original error (well, the one in comment:8 actually) and now with r7822, this gives a slightly different backtrace:
Traceback (most recent call last):
  File "C:\Workspace\src\trac\repos\0.11-stable\trac\web\api.py", line 367, in send_error
    'text/html')
  File "C:\Workspace\src\trac\repos\0.11-stable\trac\web\chrome.py", line 742, in render_template
    stream.render(method, doctype=doctype, out=buffer)
  File "build\bdist.win32\egg\genshi\core.py", line 179, in render
    return encode(generator, method=method, encoding=encoding, out=out)
  File "build\bdist.win32\egg\genshi\output.py", line 57, in encode
    for chunk in iterator:
  File "build\bdist.win32\egg\genshi\output.py", line 307, in __call__
    for kind, data, pos in stream:
  File "build\bdist.win32\egg\genshi\output.py", line 749, in __call__
    for kind, data, pos in stream:
  File "build\bdist.win32\egg\genshi\output.py", line 588, in __call__
    for kind, data, pos in stream:
  File "build\bdist.win32\egg\genshi\output.py", line 703, in __call__
    text = mjoin(textbuf, escape_quotes=False)
  File "build\bdist.win32\egg\genshi\core.py", line 465, in join
    for item in seq]))
  File "build\bdist.win32\egg\genshi\core.py", line 494, in escape
    text = unicode(text).replace('&', '&amp;') \
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 63: ordinal not in range(128)
This is shown as "raw" content, as the error actually happens during send_error.
There's also no hint about where that error comes from (in my test, I just put a non-ascii character in the repository_dir).
comment:15 by , 16 years ago
Cc: | added |
---|
comment:16 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
With r7838, unicode errors raised by Genshi during rendering are converted to TracError giving a hint about the location of the error.
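The general idea of turning a unicode error into an error that carries a location hint can be sketched as below. This is not the code from r7838: RenderError is a hypothetical stand-in for TracError, render_chunks is an invented function, and the strict ASCII decode only mimics the implicit conversion; the sketch is written for Python 3.

```python
class RenderError(Exception):
    """Hypothetical stand-in for TracError in this sketch."""


def render_chunks(chunks):
    """Join chunks into text, reporting where a decoding failure occurred."""
    out = []
    for index, chunk in enumerate(chunks):
        try:
            if isinstance(chunk, bytes):
                # Strict ASCII decode, mimicking the implicit conversion.
                chunk = chunk.decode("ascii")
            out.append(chunk)
        except UnicodeError as e:
            # object, start and end are standard UnicodeDecodeError attributes.
            context = e.object[max(e.start - 20, 0):e.end + 20]
            raise RenderError(
                "rendering failed in chunk %d near %r: %s" % (index, context, e)
            ) from e
    return "".join(out)
```

With a hint like this, the message names the offending data instead of only showing frames inside the template engine.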
comment:17 by , 16 years ago
comment:18 by , 12 years ago
To work around this issue, you can use %r in Python.
For example: s = "her my variable making problem. \xa0"
xstr = "%r" % s
xstr then holds the escaped representation of the problem character, so the error no longer occurs.
Thanks, Tejas
Most of the unicode decoding/encoding errors come from two sources:
The remaining errors (real Trac errors) are quite rare nowadays.
There is another ticket somewhere in the Trac DB that defines the rules for badly written plugins: it is not Trac's role to deal with bad plugins and perform on-the-fly conversions for plugins that submit non-unicode data to Trac.
Following this defined rule, I'd vote for -1.