Edgewall Software

Opened 15 years ago

Last modified 12 years ago

#7959 closed defect

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 4: ordinal not in range(128) — at Version 14

Reported by: rupert.thurner Owned by: Christian Boos
Priority: high Milestone: 0.11.3
Component: general Version: 0.11.2.1
Severity: critical Keywords: genshi
Cc: trac-ja@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Christian Boos)

… as there are so many unicode errors, would it make sense to apply a guaranteed encoding at some point, as described in http://code.activestate.com/recipes/466341/ ?

Example error: One of my test SVN repositories was deliberately created in a way that its name contains Cyrillic characters, e.g. /home/svn/борис. Accessing this with Apache's mod_dav_svn works, but Trac gives the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/trac/web/api.py", line 339, in send_error
    'text/html')
  File "/usr/lib/python2.5/site-packages/trac/web/chrome.py", line 715, in render_template
    return stream.render(method, doctype=doctype)
  File "/var/lib/python-support/python2.5/genshi/core.py", line 179, in render
    return encode(generator, method=method, encoding=encoding, out=out)
  File "/var/lib/python-support/python2.5/genshi/output.py", line 60, in encode
    return _encode(u''.join(list(iterator)))
  File "/var/lib/python-support/python2.5/genshi/output.py", line 311, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 753, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 592, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 707, in __call__
    text = mjoin(textbuf, escape_quotes=False)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 49: ordinal not in range(128)

(from comment:8)

Change History (14)

comment:1 by Emmanuel Blot, 15 years ago

Most of the unicode decoding/encoding errors come from two sources:

  • Badly written plugins (or plugins written for 0.10 and used against the 0.11 series)
  • Direct access to the underlying DB without proper data encoding

The remaining errors (real Trac errors) are quite rare nowadays.

There is another ticket somewhere in the Trac DB that defines the rules for badly written plugins: it is not Trac's role to deal with bad plugins and perform on-the-fly conversions for plugins that submit non-unicode data to Trac.

Following this rule, I'd vote -1.

comment:2 by Remy Blank, 15 years ago

I'm not sure what you mean by "guaranteed encoding", but Trac already has to_unicode(), which should never fail (though it can lose information, as it uses the 'replace' strategy when it encounters a decoding exception).
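To illustrate the behaviour described above (strict decoding first, then a lossy 'replace' fallback), here is a minimal Python 3 sketch. The name to_unicode_sketch is hypothetical: this is not Trac's actual to_unicode() implementation, and using UTF-8 as the default charset is an assumption of the sketch.

```python
def to_unicode_sketch(value, charset=None):
    """Hypothetical never-failing conversion to text (not Trac's code).

    Text is returned unchanged; bytes are decoded strictly first, and only
    if that fails is the 'replace' error handler used, which turns the
    undecodable bytes into U+FFFD and therefore loses information.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        raw = bytes(value)
        encoding = charset or 'utf-8'  # assumed default charset
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            return raw.decode(encoding, 'replace')
    # Anything else (numbers, exceptions, ...) goes through str().
    return str(value)


path = to_unicode_sketch('/home/svn/борис'.encode('utf-8'))  # decodes cleanly
lossy = to_unicode_sketch(b'caf\xe8')  # trailing byte becomes U+FFFD
latin = to_unicode_sketch(b'caf\xe8', 'latin-1')  # correct charset: no loss
```

The second call shows the trade-off mentioned above: it never raises, but the original byte is gone unless the right charset is supplied.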

There's one other source of decoding errors that I have seen: logging exceptions with a construct like this one (e being an exception instance):

self.log.error('Processing error: %s' % e)

Those should obviously be fixed.
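A minimal Python 3 sketch of one way such calls could be fixed: convert the exception to text explicitly before it reaches the log message. The helper exception_text and the logger name are hypothetical, not part of Trac; the idea is that undecodable bytes in an exception degrade to U+FFFD instead of raising a second error while the first one is being logged.

```python
import io
import logging


def exception_text(exc):
    """Hypothetical helper: render an exception as text without raising.

    Byte-string arguments are decoded as UTF-8 with the 'replace' handler,
    so a badly encoded message degrades to U+FFFD instead of failing.
    """
    parts = []
    for arg in exc.args:
        if isinstance(arg, bytes):
            parts.append(arg.decode('utf-8', 'replace'))
        else:
            parts.append(str(arg))
    return ': '.join(parts) or type(exc).__name__


# Log into an in-memory buffer so the result can be inspected.
buffer = io.StringIO()
log = logging.getLogger('unicode_sketch')
log.addHandler(logging.StreamHandler(buffer))
log.propagate = False

try:
    raise ValueError('/home/svn/борис'.encode('utf-8'))
except ValueError as e:
    # The exception is turned into text first, and the message is passed
    # as a logging argument rather than pre-formatted with '%'.
    log.error('Processing error: %s', exception_text(e))
```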

Could you please explain more precisely what you mean, maybe with an example?

comment:3 by anonymous, 15 years ago

One example is th:TracHoursPlugin. After installing it, clicking the "hours" link (which is how the plugin gets called) throws a unicode error, with a message suggesting that it be reported to the Trac project.

Searching for information on it led me to TracUnicode and TracDev/UnicodeGuidelines. I did not really understand how to find the root cause of the exception, or, even better, how to fix it.

comment:4 by Remy Blank, 15 years ago

Ok, I see. Then I'm with eblot on this one: the plugin needs fixing, not Trac. -1 from me as well.

comment:5 by anonymous, 15 years ago

Why doesn't the error message say it's the plugin, and why doesn't the stack trace point to the location? Which part of the documentation gives a hint on how to fix it? How would you fix it?

in reply to:  5 comment:6 by Emmanuel Blot, 15 years ago

Replying to anonymous:

Why doesn't the error message say it's the plugin, and why doesn't the stack trace point to the location? Which part of the documentation gives a hint on how to fix it? How would you fix it?

Please copy 'n paste the exact traceback.

comment:7 by anonymous, 15 years ago

File "/opt/csw/lib/python/site-packages/Trac-0.11.2.1-py2.5.egg/trac/web/main.py", line 432, in _dispatch_request
  dispatcher.dispatch(req)
File "/opt/csw/lib/python/site-packages/Trac-0.11.2.1-py2.5.egg/trac/web/main.py", line 226, in dispatch
  data, content_type)
File "/opt/csw/lib/python/site-packages/Trac-0.11.2.1-py2.5.egg/trac/web/chrome.py", line 719, in render_template
  return stream.render(method, doctype=doctype)
File "build/bdist.solaris-2.10-sun4v/egg/genshi/core.py", line 179, in render
File "build/bdist.solaris-2.10-sun4v/egg/genshi/output.py", line 60, in encode
File "build/bdist.solaris-2.10-sun4v/egg/genshi/output.py", line 311, in __call__
File "build/bdist.solaris-2.10-sun4v/egg/genshi/output.py", line 753, in __call__
File "build/bdist.solaris-2.10-sun4v/egg/genshi/output.py", line 592, in __call__
File "build/bdist.solaris-2.10-sun4v/egg/genshi/output.py", line 707, in __call__

System Information:

Trac: 	0.11.2.1
Python: 	2.5.1 (r251:54863, Nov 3 2007, 02:54:52) [C]
setuptools: 	0.6c9
SQLite: 	3.2.2
pysqlite: 	2.3.5
Genshi: 	0.5.1
Pygments: 	1.0
Subversion: 	1.4.5 (r25188)
jQuery:	1.2.6

comment:8 by josef, 15 years ago

I've got a very similar issue. One of my test SVN repositories was deliberately created in a way that its name contains Cyrillic characters, e.g. /home/svn/борис. Accessing this with Apache's mod_dav_svn works, but Trac gives the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/trac/web/api.py", line 339, in send_error
    'text/html')
  File "/usr/lib/python2.5/site-packages/trac/web/chrome.py", line 715, in render_template
    return stream.render(method, doctype=doctype)
  File "/var/lib/python-support/python2.5/genshi/core.py", line 179, in render
    return encode(generator, method=method, encoding=encoding, out=out)
  File "/var/lib/python-support/python2.5/genshi/output.py", line 60, in encode
    return _encode(u''.join(list(iterator)))
  File "/var/lib/python-support/python2.5/genshi/output.py", line 311, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 753, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 592, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.5/genshi/output.py", line 707, in __call__
    text = mjoin(textbuf, escape_quotes=False)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 49: ordinal not in range(128)

This directory is referenced in [trac]/repository_dir. It is encoded in UTF-8, which is also the $LANG setting of the environment Trac runs in, so it should work.
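Python 2's implicit byte-string-to-unicode conversion (for example unicode(text) on a str) uses sys.getdefaultencoding(), which is 'ascii' by default and does not follow $LANG. The Python 3 sketch below makes that hidden decode step explicit on the UTF-8 bytes of the example path; implicit_ascii_decode is a hypothetical name, and the sketch reproduces the shape of the error message rather than the exact byte and position from the traceback above.

```python
def implicit_ascii_decode(data):
    """Mimic Python 2's implicit unicode(data) on a byte string."""
    return data.decode('ascii')


# UTF-8 bytes of the example repository path.
path_bytes = '/home/svn/борис'.encode('utf-8')

try:
    implicit_ascii_decode(path_bytes)
    message = None
except UnicodeDecodeError as err:
    # 'ascii' codec can't decode byte 0xd0 in position 10:
    # ordinal not in range(128)
    message = str(err)

# Decoding explicitly with the charset the bytes are really in works.
path_text = path_bytes.decode('utf-8')
```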

in reply to:  8 ; comment:9 by Christian Boos, 15 years ago

Milestone: 0.11.3
Owner: set to Christian Boos
Priority: changed from normal to high
Severity: changed from normal to critical
Status: changed from new to assigned

Replying to josef:

... _encode(u''.join(list(iterator))) ...

Wait wait …

Oh damn, my r7295 change was dead wrong. The else: part should have been kept instead. In addition to making Genshi rendering faster and more memory efficient, this fix should also avoid this kind of exception, as the encoding should be done in 'replace' mode.

Besides, using /home/svn/борис for the repository_dir should eventually be supported.

So we have 3 unicode related issues discussed in this ticket:

  • comment:5's self.log.error('Processing error: %s' % e) style of errors - see #7935
  • comment:8's request for supporting non-ASCII repository_dirs - probably worth another ticket
  • the whole ... genshi/output.py class of unicode errors (#3908, #6932, and probably lots of others), which can be avoided by fixing r7295. Let's focus this ticket on this.

comment:10 by anonymous, 15 years ago

I'm also having problems with send_error, like josef, but with a plugin I'm writing. Is there any quick fix for this?

I reverted r7295, but it didn't help (why would it? cStringIO would not help with the unicode errors).

in reply to:  10 comment:11 by Christian Boos, 15 years ago

Replying to anonymous:

I'm also having problems with send_error, like josef, but with a plugin I'm writing. Is there any quick fix for this?

If you're writing a plugin, you're responsible for exchanging only unicode string content with Trac; see TracDev/UnicodeGuidelines#TracboundariesforUnicodeData for more.
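One way to follow that rule in plugin code is to decode every byte string once, where it enters the plugin, before anything is handed to Trac or Genshi. The Python 3 sketch below is hypothetical (the function name, the row format and the UTF-8 default are assumptions, and it is not code from TracHoursPlugin). A side benefit of strict decoding at the boundary is that a wrongly encoded value fails inside the plugin, so the traceback points at the plugin rather than at genshi/output.py.

```python
def to_template_data(raw_row, charset='utf-8'):
    """Decode byte-string fields once, at the plugin/Trac boundary.

    Strict decoding is used on purpose: a wrongly encoded value raises
    UnicodeDecodeError here, in plugin code, instead of later inside the
    template engine where the traceback no longer shows its origin.
    """
    data = {}
    for key, value in raw_row.items():
        if isinstance(value, bytes):
            value = value.decode(charset)
        data[key] = value
    return data


row = {'ticket': 7959, 'path': '/home/svn/борис'.encode('utf-8')}
data = to_template_data(row)  # every string value is now text
```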

I reverted r7295, but it didn't help (why would it? cStringIO would not help with the unicode errors).

When a file-like object is given as the out parameter, Genshi will convert unicode strings to UTF-8 in 'replace' mode. Ah, yes, but that won't help if the string is not unicode in the first place, as a UnicodeDecodeError can then occur before the encoding step is even reached.
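The point about the out parameter can be illustrated with a small Python 3 sketch. write_chunks is a hypothetical stand-in, not Genshi's encode(): it streams chunks into a binary file-like object and encodes text with 'replace', which does not raise. A byte-string chunk, however, must first become text, and the sketch performs that step the way Python 2 did implicitly, with the ASCII codec, so the UnicodeDecodeError happens before the 'replace' encoding is ever applied to it.

```python
import io


def write_chunks(chunks, out, encoding='utf-8'):
    """Stream chunks to a binary file-like object (hypothetical helper)."""
    for chunk in chunks:
        if isinstance(chunk, bytes):
            # Mimics Python 2's implicit unicode(chunk): an ASCII decode
            # that raises UnicodeDecodeError on any byte >= 0x80.
            chunk = chunk.decode('ascii')
        # With 'replace', encoding text does not raise.
        out.write(chunk.encode(encoding, 'replace'))


ok = io.BytesIO()
write_chunks(['<p>', 'борис', '</p>'], ok)  # works: all chunks are text

bad = io.BytesIO()
try:
    write_chunks(['<p>', 'борис'.encode('utf-8'), '</p>'], bad)
except UnicodeDecodeError:
    pass  # the byte chunk fails before 'replace' encoding can help
```

Only the first chunk reaches bad before the failure, which mirrors why streaming the output in 'replace' mode does not cure this class of error.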

comment:12 by Christian Boos, 15 years ago

Milestone: changed from 0.11.4 to 0.11.3

in reply to:  9 comment:13 by Christian Boos, 15 years ago

Replying to cboos:

Replying to josef:

... _encode(u''.join(list(iterator))) ...

Wait wait …

Oh damn, my r7295 change was dead wrong. The else: part should have been kept instead.

This is now fixed in r7822.

In addition to making Genshi rendering faster and more memory efficient,

Well, the memory savings are roughly 15%, but rendering doesn't seem to be actually faster.

this fix should also avoid this kind of exception, as the encoding should be done in 'replace' mode.

As noted above in comment:11, that change unfortunately doesn't help in this area.

comment:14 by Christian Boos, 15 years ago

Description: modified (diff)
Keywords: genshi added

I can reproduce the original error (well, actually the one in comment:8), and now, with r7822, it gives a slightly different backtrace:

Traceback (most recent call last):
  File "C:\Workspace\src\trac\repos\0.11-stable\trac\web\api.py", line 367, in send_error
    'text/html')
  File "C:\Workspace\src\trac\repos\0.11-stable\trac\web\chrome.py", line 742, in render_template
    stream.render(method, doctype=doctype, out=buffer)
  File "build\bdist.win32\egg\genshi\core.py", line 179, in render
    return encode(generator, method=method, encoding=encoding, out=out)
  File "build\bdist.win32\egg\genshi\output.py", line 57, in encode
    for chunk in iterator:
  File "build\bdist.win32\egg\genshi\output.py", line 307, in __call__
    for kind, data, pos in stream:
  File "build\bdist.win32\egg\genshi\output.py", line 749, in __call__
    for kind, data, pos in stream:
  File "build\bdist.win32\egg\genshi\output.py", line 588, in __call__
    for kind, data, pos in stream:
  File "build\bdist.win32\egg\genshi\output.py", line 703, in __call__
    text = mjoin(textbuf, escape_quotes=False)
  File "build\bdist.win32\egg\genshi\core.py", line 465, in join
    for item in seq]))
  File "build\bdist.win32\egg\genshi\core.py", line 494, in escape
    text = unicode(text).replace('&', '&amp;') \
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 63: ordinal not in range(128)

This is shown as "raw" content, as the error actually happens during send_error. There's also no hint about where that error comes from (in my test, I just put a non-ascii character in the repository_dir).
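Since the traceback gives no hint about where the offending byte string comes from, one possible debugging aid is to scan the template data for byte strings before rendering. find_byte_strings below is a hypothetical Python 3 helper, not a Trac or Genshi feature; it only walks dicts, lists and tuples, which is an assumption about the shape of the data.

```python
def find_byte_strings(data, path='data'):
    """Yield (path, value) for every bytes object nested in data."""
    if isinstance(data, bytes):
        yield path, data
    elif isinstance(data, dict):
        for key, value in data.items():
            yield from find_byte_strings(value, '%s[%r]' % (path, key))
    elif isinstance(data, (list, tuple)):
        for index, value in enumerate(data):
            yield from find_byte_strings(value, '%s[%d]' % (path, index))


data = {'title': 'Browse', 'repos': {'dir': b'/home/svn/\xd0\xb1'}}
for where, value in find_byte_strings(data):
    print(where, value)
# prints: data['repos']['dir'] b'/home/svn/\xd0\xb1'
```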
