Editing UnicodeEncodeError
= What does this `UnicodeEncodeError` mean? =

It means that a given `unicode` object (i.e. the Python internal representation for a sequence of internationalized characters conforming to the Unicode standard) failed to be converted to a `str` object (i.e. a sequence of bytes). The failure means that there was a character which couldn't be represented by an appropriate sequence of bytes in the chosen output encoding.

In practice, the default conversion is a "strict" one using the default encoding, which is 'ascii' most of the time, so the error happens as soon as the `unicode` object contains characters outside of the ASCII range (code points 0 to 127).

This error was frequently seen during the transition to internal use of `unicode` that happened in Trac [milestone:0.10], and can still be seen now and then with Trac plugins that are not using the Trac API the way they should.

Examples:
{{{
>>> str(u'chaîne de caractères')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' in position 3: ordinal not in range(128)
>>>
}}}

The above is actually equivalent to the following, as `sys.getdefaultencoding()` is frequently 'ascii':
{{{
>>> u'chaîne de caractères'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' in position 3: ordinal not in range(128)
}}}

One way to force the conversion to happen is to use a non-strict conversion:
{{{
>>> u'chaîne de caractères'.encode('ascii', 'ignore')
'chane de caractres'
>>> u'chaîne de caractères'.encode('ascii', 'replace')
'cha?ne de caract?res'
}}}

== ... but I was decoding? == #decode

A more subtle and confusing way to trigger this error is when trying to ''decode'' a `unicode` string. Wait... decoding a sequence of unicode characters? Does that even make sense?
Well, normally not, but Python interprets that as a shortcut for decoding the `str` object obtained by encoding that unicode string with the default encoding. So we have the following equivalence:
{{{
u"string".decode(enc) == str(u"string").decode(enc)
}}}

That could be called a `u"cadeau empoisonné"` (a "poisoned gift") ;-)

Of course, if `u"string"` can't first be encoded the naive way in order to produce that temporary `str` object, it will trigger the same error we saw above:
{{{
>>> u'chaîne de caractères'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' in position 3: ordinal not in range(128)
}}}

In practice, this happens when an API designed to handle a `str` object suddenly receives a `unicode` object. It's "normal" to call `s.decode(...)` if `s` is a `str` object, but this will fail with the above confusing error if `s` is actually a `unicode` object containing characters not present in the ASCII character set.

----
See also: TracDev/UnicodeGuidelines, UnicodeDecodeError
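The decoding pitfall described on this page, where code written for `str` input receives a `unicode` object, can be avoided by decoding only when the value really is a byte string. The following is a minimal sketch rather than Trac's own API: `ensure_text` and `text_type` are hypothetical names introduced for illustration, and UTF-8 is an assumed default encoding. It is written so that it should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import sys

# The text type is `unicode` on Python 2 and `str` on Python 3.
# (`unicode` is only evaluated when running on Python 2.)
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def ensure_text(value, encoding='utf-8'):
    """Hypothetical helper: return `value` as text.

    Only byte strings are decoded. A value that is already text is
    returned unchanged, which avoids the implicit ASCII encoding step
    that Python 2 performs when .decode() is called on a unicode object.
    """
    if isinstance(value, text_type):
        return value
    # `bytes` is an alias of `str` on Python 2, so byte strings on
    # either version end up here. The encoding is an assumption: the
    # caller must know how the bytes were actually encoded.
    return value.decode(encoding)


# A UTF-8 byte string and the equivalent text give the same result.
from_bytes = ensure_text(b'cha\xc3\xaene')
from_text = ensure_text(u'cha\xeene')
```

The type check is the important design choice here: the function never calls `.decode()` on a value that is already text, so the default 'ascii' encoding is never involved.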