
Changes between Version 2 and Version 3 of UnicodeEncodeError


Timestamp:
Feb 18, 2009, 10:01:48 AM (15 years ago)
Author:
Christian Boos
Comment:

copy/pasted/mirrored from UnicodeDecodeError#encode

  • UnicodeEncodeError

    v2 v3  
    3535}}}
    3636
     37
     38== ... but I was decoding? == #decode
     39A more subtle and confusing way to trigger this error is to try to ''decode'' a `unicode` string. Wait... decoding a sequence of unicode characters? Does that even make sense? Well, normally not, but Python interprets that as a shortcut for decoding the `str` object obtained by encoding that unicode string with the default encoding. So we have the following equivalence:
     40{{{
     41u"string".decode(enc) == str(u"string").decode(enc)
     42}}}
     43That could be called a `u"cadeau empoisonné"` ;-)
     44
     45Of course, if `u"string"` can't first be encoded the naive way to produce that temporary `str` object, it will trigger the same error we saw above:
     46{{{
     47>>> u'chaîne de caractères'.decode('utf-8')
     48Traceback (most recent call last):
     49  File "<stdin>", line 1, in ?
     50UnicodeEncodeError: 'ascii' codec can't encode character u'\xee'
     51                    in position 3: ordinal not in range(128)
     52}}}
     53
     54In practice, this happens when an API designed to handle a `str` object suddenly receives a `unicode` object. Calling `s.decode(...)` is "normal" if `s` is a `str` object, but it fails with the confusing error above if `s` is actually a `unicode` object containing characters not present in the ASCII character set.
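
One possible way to guard such an API is to check the type before decoding. The sketch below is only an illustration: the helper name `ensure_unicode` and its `encoding` default are hypothetical and not taken from this page or from any particular library.

```python
# Hypothetical helper: decode only when the input is a byte string.
# type(u"") is `unicode` on Python 2 and `str` on Python 3,
# so the check works on both.
text_type = type(u"")


def ensure_unicode(s, encoding="utf-8"):
    """Return `s` as a unicode string, decoding it only if needed."""
    if isinstance(s, text_type):
        # Already unicode: calling .decode() here is what would trigger
        # the implicit ASCII encode described above (on Python 2).
        return s
    # A byte string: decoding is the "normal" case.
    return s.decode(encoding)
```

With a guard like this, the caller no longer needs to know whether it was handed a `str` or a `unicode` object; the `unicode` input is returned unchanged instead of going through the implicit encode step.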
     55
     56
    3757----
    3858See also: TracDev/UnicodeGuidelines, UnicodeDecodeError