What does this UnicodeEncodeError mean?
It means that a given unicode object (i.e. the Python internal representation for a sequence of internationalized characters conforming to the Unicode standard) failed to be converted to a str object (i.e. a sequence of bytes). The failure means that there was a character which couldn't be represented by an appropriate sequence of bytes in the chosen output encoding.
In practice, because the default conversion is a "strict" one using the default encoding, which is 'ascii' most of the time, the error occurs as soon as the unicode object contains a character outside the ASCII range (code points 0 to 127).
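The exception object itself records which character failed and where. The minimal sketch below, written for Python 3, reads the standard start and object attributes of UnicodeEncodeError (the same attributes exist in Python 2). The helper name first_unencodable is hypothetical, used only for illustration:

```python
def first_unencodable(text, encoding="ascii"):
    """Hypothetical helper: return (position, character) of the first
    character of `text` that `encoding` cannot represent, or None if the
    strict encode succeeds."""
    try:
        text.encode(encoding)  # "strict" error handling is the default
    except UnicodeEncodeError as e:
        # e.object is the string being encoded; e.start is the index of the
        # first offending character (e.end, e.encoding and e.reason are
        # also available and make up the error message).
        return e.start, e.object[e.start]
    return None


print(first_unencodable("chaîne de caractères"))  # (3, 'î')
print(first_unencodable("plain ASCII"))           # None
```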
This error was frequently seen during the transition to the internal use of unicode objects in Trac 0.10, and it still shows up now and then with Trac plugins that don't use the Trac API the way they should.
Examples:
    >>> str(u'chaîne de caractères')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' in position 3: ordinal not in range(128)
The above is actually equivalent to the following, as sys.getdefaultencoding() is frequently 'ascii':
    >>> u'chaîne de caractères'.encode('ascii')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' in position 3: ordinal not in range(128)
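Since the failure comes from the target encoding rather than from the text, the same string encodes cleanly once the chosen encoding can represent all of its characters. A short Python 3 sketch (where encode returns bytes) contrasting ASCII with UTF-8; UTF-8 is just an example choice here, and the right encoding depends on where the bytes are going:

```python
text = "chaîne de caractères"

# ASCII only covers code points 0-127, so the strict encode fails on 'î'.
try:
    text.encode("ascii")
except UnicodeEncodeError as e:
    print(e)

# UTF-8 can represent every Unicode code point, so this succeeds;
# 'î' (U+00EE) becomes the two bytes 0xC3 0xAE.
data = text.encode("utf-8")
print(data[:7])  # b'cha\xc3\xaene'
```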
One way to force the conversion to succeed is to use a non-strict error handler, at the cost of losing information:
    >>> u'chaîne de caractères'.encode('ascii', 'ignore')
    'chane de caractres'
    >>> u'chaîne de caractères'.encode('ascii', 'replace')
    'cha?ne de caract?res'
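'ignore' silently drops characters and 'replace' turns them into '?', so the original text can't be recovered from the result. Where the output format allows it, the standard 'xmlcharrefreplace' (for XML/HTML output) and 'backslashreplace' handlers instead leave a trace of each character. A Python 3 sketch comparing the four:

```python
text = "chaîne de caractères"

# 'ignore' drops unencodable characters: b'chane de caractres'
print(text.encode("ascii", "ignore"))
# 'replace' substitutes '?': b'cha?ne de caract?res'
print(text.encode("ascii", "replace"))
# 'xmlcharrefreplace' emits numeric character references:
# b'cha&#238;ne de caract&#232;res'
print(text.encode("ascii", "xmlcharrefreplace"))
# 'backslashreplace' emits Python-style \xNN escapes instead.
print(text.encode("ascii", "backslashreplace"))
```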
… but I was decoding?
A more subtle and confusing way to trigger this error is to decode a unicode string. Wait… decoding a sequence of unicode characters? Does that even make sense? Well, normally not, but Python interprets it as a shortcut for decoding the str object obtained by encoding that unicode string with the default encoding. So we have the following equivalence:
    u"string".decode(enc) == str(u"string").decode(enc)
That could be called a u"cadeau empoisonné" (a poisoned gift) ;-)
Of course, if u"string" can't first be encoded the naive way to produce that temporary str object, it will trigger the same error we saw above:
    >>> u'chaîne de caractères'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Dev\Python254\lib\encodings\utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' in position 3: ordinal not in range(128)
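Python 3 removed decode() from text strings, so this trap can't be reproduced there directly, but the hidden step can be spelled out. The function below is a hypothetical emulation (py2_unicode_decode is not a real API) of what Python 2 does for u"string".decode(enc), assuming the default encoding is 'ascii':

```python
def py2_unicode_decode(text, enc, default_encoding="ascii"):
    """Hypothetical emulation of Python 2's unicode.decode(enc)."""
    # Hidden step: the implicit str(text), i.e. a strict encode with the
    # default encoding. This is where the UnicodeEncodeError comes from.
    raw = text.encode(default_encoding)
    # The decode the caller actually asked for.
    return raw.decode(enc)


print(py2_unicode_decode("plain ASCII", "utf-8"))  # plain ASCII
try:
    py2_unicode_decode("chaîne de caractères", "utf-8")
except UnicodeEncodeError as e:
    # An *encode* error, even though the caller asked to decode.
    print(type(e).__name__, e.start)  # UnicodeEncodeError 3
```

In real Python 2 the first step uses whatever sys.getdefaultencoding() returns, which is why the error message names the 'ascii' codec.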
In practice, this happens when an API designed to handle a str object suddenly receives a unicode object. It's "normal" to call s.decode(...) if s is a str object, but this will fail with the above confusing error if s is actually a unicode object containing characters not present in the ASCII character set.
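One way for such an API to defend itself is to decode only when it actually received bytes. A minimal Python 3 sketch; ensure_text is a hypothetical helper, and UTF-8 is an assumed input encoding that should be adapted to the API at hand:

```python
def ensure_text(s, encoding="utf-8"):
    """Hypothetical helper: return `s` as text, decoding only if it is bytes."""
    if isinstance(s, bytes):
        return s.decode(encoding)
    # Already text: calling .decode() on it is exactly the mistake
    # described above, so pass it through unchanged.
    return s


print(ensure_text(b"cha\xc3\xaene"))  # chaîne
print(ensure_text("chaîne"))          # chaîne
```

In Python 2.6 and later, bytes is an alias for str, so the same isinstance check distinguishes str from unicode there as well.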
See also: TracDev/UnicodeGuidelines, UnicodeDecodeError