Edgewall Software

What does this UnicodeDecodeError mean?

This error indicates a failed attempt to create a unicode object (i.e. Python's internal representation of a sequence of internationalized characters as defined by the Unicode standard) from a str object, by "decoding" the bytes of the latter according to some character encoding.

In practice, this happens because an implicit conversion uses the default encoding, which is usually ASCII. ASCII assigns no meaning to byte values above 127, so any such byte makes the decoding fail.

If an encoding is explicitly specified (e.g. "UTF-8"), the same exception is raised when the sequence of bytes does not actually conform to that encoding (e.g. it was really "iso-8859-1", a.k.a. "latin1").
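A minimal sketch of that mismatch, written in Python 3 (where bytes and str are separate types) rather than the Python 2 used elsewhere on this page. The variable name is illustrative only:

```python
# "î" encoded in latin-1 is the single byte 0xee.
latin1_bytes = "chaîne".encode("latin-1")

try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as e:
    # 0xee starts a 3-byte UTF-8 sequence, but the next byte ("n", 0x6e)
    # is not a valid continuation byte, so decoding fails at index 3.
    print(e.encoding, e.start, hex(latin1_bytes[e.start]))

# Decoding with the encoding the bytes were actually produced in works.
print(latin1_bytes.decode("latin-1"))
```

The bytes themselves are fine; only the assumed encoding is wrong.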

This error was quite frequent during Trac 0.10, when Trac switched to using unicode internally, until we adopted a robust conversion helper (to_unicode, from the trac.util.text module).
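To illustrate what "robust" can mean here, the following is a hypothetical helper in the spirit of to_unicode. It is not Trac's actual implementation, and the name to_unicode_sketch and its UTF-8-then-latin-1 fallback strategy are assumptions made for this example (Python 3):

```python
def to_unicode_sketch(data, charset=None):
    """Hypothetical helper; NOT trac.util.text.to_unicode itself.

    Turns bytes into text without ever raising UnicodeDecodeError.
    """
    if isinstance(data, str):
        return data  # already text, nothing to decode
    if charset:
        # The caller knows the encoding: replace undecodable bytes with U+FFFD.
        return data.decode(charset, "replace")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte 0..255 to a code point, so this cannot fail.
        return data.decode("latin-1")

print(to_unicode_sketch("chaîne".encode("utf-8")))  # valid UTF-8
print(to_unicode_sketch(b"cha\xeene"))              # falls back to latin-1
print(repr(to_unicode_sketch(b"\xff", "ascii")))    # '\ufffd'
```

The latin-1 fallback guarantees a result but may produce mojibake if the bytes were in some other encoding; passing an explicit charset is always preferable when it is known.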

Plugins can still trigger the error when they try to produce unicode objects in a naive way:

>>> unicode('chaîne de caractères')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee 
                    in position 3: ordinal not in range(128)

Not specifying the encoding means using sys.getdefaultencoding(), which is usually 'ascii'. So, in effect, the above is equivalent to:

>>> unicode('chaîne de caractères', 'ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee 
                    in position 3: ordinal not in range(128)
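For comparison, a sketch of the same failure in Python 3, where the default encoding is UTF-8 and bytes must be decoded explicitly. Encoding the sample string as latin-1 is an assumption chosen to reproduce the 0xee byte of the session above:

```python
import sys

# Python 3 no longer uses ASCII as its default encoding.
print(sys.getdefaultencoding())  # utf-8

data = "chaîne de caractères".encode("latin-1")  # b'cha\xeene de caract\xe8res'

try:
    data.decode("ascii")  # the explicit equivalent of the Python 2 call above
except UnicodeDecodeError as e:
    # Same information as the Python 2 message: codec, byte, position, reason.
    print(e.encoding, hex(e.object[e.start]), e.start, e.reason)
```

The exception attributes (encoding, object, start, reason) are often more useful than the message when logging such failures.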

… but I was encoding?

A more subtle and confusing way to trigger this error is to encode a sequence of bytes to a given encoding. Wait… encoding a sequence of bytes? Does that even make sense? Normally not, but Python 2 treats it as a shortcut for encoding the unicode object obtained by first decoding the string with the default encoding. So we have the following equivalence:

"string".encode(enc) == unicode("string").encode(enc)

That could be called a u"cadeau empoisonné" (French for "poisoned gift") ;-)

Of course, if "string" can't first be decoded the naive way to produce that temporary unicode object, the same error we saw above is raised:

>>> 'chaîne de caractères'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee 
                    in position 3: ordinal not in range(128)
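The hidden decode step can be made explicit with a small emulation. py2_style_encode is a hypothetical name, and hard-coding "ascii" assumes the usual Python 2 default encoding; the code itself is Python 3:

```python
def py2_style_encode(data, enc):
    """Hypothetical emulation of Python 2's str.encode(enc).

    Python 2 first decoded the byte string with the default (ASCII) codec,
    then encoded the resulting unicode object: the decode is what fails.
    """
    return data.decode("ascii").encode(enc)

# Pure ASCII bytes survive the round trip unchanged.
print(py2_style_encode(b"string", "utf-8"))  # b'string'

try:
    py2_style_encode("chaîne de caractères".encode("latin-1"), "utf-8")
except UnicodeDecodeError as e:
    print("decode failed at position", e.start)  # even though we "encoded"

# Python 3 removed the shortcut: bytes has no encode() method at all.
print(hasattr(b"", "encode"))  # False
```

In Python 3 the same mistake surfaces as an AttributeError instead, which is far less confusing.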

In practice, this happens when an API designed to handle a unicode object suddenly receives a str object. It's "normal" to call s.encode(...) if s is a unicode object, but this will fail with the above confusing error if s is actually a str object containing bytes outside the 0..127 range (see #4875 for an example).
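One defensive pattern is to check the type at the API boundary and decode explicitly, instead of letting an implicit decode happen. This is a minimal Python 3 sketch; encode_for_output and its source_charset parameter are hypothetical, and the caller is assumed to know how incoming bytes were encoded:

```python
def encode_for_output(s, enc="utf-8", source_charset="utf-8"):
    """Hypothetical API that tolerates receiving bytes instead of text.

    source_charset is an assumption about how incoming bytes were encoded;
    Python cannot reliably guess it.
    """
    if isinstance(s, bytes):
        s = s.decode(source_charset)  # decode explicitly, never implicitly
    return s.encode(enc)

print(encode_for_output("chaîne"))                                # text input
print(encode_for_output(b"cha\xeene", source_charset="latin-1"))  # bytes input
```

Both calls print b'cha\xc3\xaene': whatever the input type, the output is UTF-8 bytes.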


See also: TracDev/UnicodeGuidelines, UnicodeEncodeError

Last modified on Jan 19, 2009, 3:03:20 PM