Edgewall Software

Version 3 (modified by Christian Boos, 15 years ago)

copy/pasted/mirrored from UnicodeDecodeError#encode

What does this UnicodeEncodeError mean?

It means that a given unicode object (i.e. the Python internal representation for a sequence of internationalized characters conforming to the Unicode standard) failed to be converted to a str object (i.e. a sequence of bytes). The failure means that there was a character which couldn't be represented by an appropriate sequence of bytes in the chosen output encoding.

In practice, the default conversion is a "strict" one using the default encoding, which is 'ascii' most of the time, so the error happens as soon as the unicode object contains a character outside of the ASCII range (code points 0 to 127).

This error was frequently seen during the transition to internal use of unicode that happened in Trac 0.10, and can still be seen now and then with Trac plugins that are not using the Trac API the way they should.

Examples:

>>> str(u'chaîne de caractères')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xee'
                    in position 3: ordinal not in range(128)
>>> 

The above is actually equivalent to the following, as sys.getdefaultencoding() is frequently 'ascii':

>>> u'chaîne de caractères'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' 
                    in position 3: ordinal not in range(128)
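The exception object carries the same details as the message, which can help when tracking down the offending character in a longer string. The following is a minimal sketch, not part of the Trac API: `describe_encode_failure` is a hypothetical helper name, and the sample string is written with escapes so the snippet does not depend on the source file encoding.

```python
text = u'cha\xeene de caract\xe8res'  # u'chaîne de caractères'

def describe_encode_failure(value, encoding='ascii'):
    """Hypothetical helper: report why `value` can't be encoded, if at all."""
    try:
        value.encode(encoding)        # strict conversion, as in the examples
    except UnicodeEncodeError as e:
        # e.object is the original string; e.start/e.end delimit the bad span
        return (e.encoding, e.start, e.object[e.start:e.end])
    return None                       # the string encodes cleanly

failure = describe_encode_failure(text)
```

Here `failure` names the codec, the position and the first character that could not be encoded, matching what the traceback reports.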

One way to force the conversion is to use a non-strict error handler, at the cost of dropping or replacing the characters that can't be encoded:

>>> u'chaîne de caractères'.encode('ascii', 'ignore')
'chane de caractres'
>>> u'chaîne de caractères'.encode('ascii', 'replace')
'cha?ne de caract?res'
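When losing characters is not acceptable, the usual alternative is to pick an output encoding that can represent them. The sketch below assumes the consumer of the bytes accepts UTF-8, or, for the second variant, HTML/XML character references; which one fits depends on where the output goes. The variable names are illustrative only.

```python
text = u'cha\xeene de caract\xe8res'  # u'chaîne de caractères'

# Explicit UTF-8: both accented characters have a byte representation,
# so the strict conversion succeeds and can be reversed.
encoded = text.encode('utf-8')
round_trip = encoded.decode('utf-8')

# Standard 'xmlcharrefreplace' handler: stays within ASCII, but keeps the
# information as numeric character references instead of '?'.
as_refs = text.encode('ascii', 'xmlcharrefreplace')
```

Unlike 'ignore' and 'replace', both variants keep enough information to recover the original text.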

… but I was decoding?

A more subtle and confusing way to trigger this error is to try to decode a unicode string. Wait… decoding a sequence of unicode characters? Does that even make sense? Well, normally not, but Python interprets it as a shortcut for decoding the str object obtained by encoding that unicode string with the default encoding. So we have the following equivalence:

u"string".decode(enc) == str(u"string").decode(enc)

That could be called a u"cadeau empoisonné" (a poisoned gift) ;-)

Of course, if u"string" can't first be encoded the naive way in order to produce that temporary str object, it will trigger the same error we saw above:

>>> u'chaîne de caractères'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xee'
                    in position 3: ordinal not in range(128)

In practice, this happens when an API designed to handle a str object suddenly receives a unicode object. It's "normal" to call s.decode(...) if s is a str object, but this will fail with the above confusing error if s is actually a unicode object containing characters not present in the ASCII character set.
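One defensive pattern for such an API is to decode only when the argument really is a byte string. This is a minimal sketch under assumptions: `to_text` is a hypothetical helper rather than a Trac function, and UTF-8 is assumed as the input encoding. It relies on `bytes` being an alias for `str` on Python 2.6 and later.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return unicode text, decoding only byte strings."""
    if isinstance(value, bytes):      # a str object on Python 2
        return value.decode(encoding)
    return value                      # already unicode: no implicit encode

raw = b'cha\xc3\xaene'                # UTF-8 encoded byte string
already = u'cha\xeene'                # unicode object, u'chaîne'

from_bytes = to_text(raw)
from_unicode = to_text(already)
```

Both calls yield the same unicode object, and the second one never goes through the implicit ASCII encoding step that causes the error.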


See also: TracDev/UnicodeGuidelines, UnicodeDecodeError
