Edgewall Software

Changes between Initial Version and Version 1 of UnicodeEncodeError


Timestamp:
Mar 2, 2007, 1:07:32 PM (17 years ago)
Author:
Christian Boos
Comment:

Explain the often misunderstood UnicodeEncodeError

  • UnicodeEncodeError

= What does this `UnicodeEncodeError` mean? =

It means that a `unicode` object (Python's internal representation of a sequence of characters conforming to the Unicode standard) could not be converted to a `str` object (a sequence of bytes): at least one character has no byte representation in the chosen output encoding.

In practice, the default conversion is a "strict" one that uses the default encoding, which is 'ascii' most of the time. The error therefore occurs as soon as the `unicode` object contains a character outside the ASCII range (code points 0 to 127).
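
As a rough illustration of the ASCII-range rule, the sketch below scans a string for the first code point above 127, which is the character a strict 'ascii' conversion would reject. The helper name `first_non_ascii` is hypothetical, and the snippet is written for Python 3, where the `unicode` type is simply called `str`.

```python
def first_non_ascii(text):
    """Return (position, code point) of the first character above 127, or None.

    Hypothetical helper, not part of Trac or the standard library.
    """
    for position, char in enumerate(text):
        if ord(char) > 127:
            return position, ord(char)
    return None


print(first_non_ascii(u'chaîne de caractères'))  # (3, 238), i.e. u'\xee'
print(first_non_ascii(u'plain ascii'))           # None
```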

This error was frequently seen during the transition to the internal use of `unicode` in Trac [milestone:0.10], and it still shows up now and then with Trac plugins that do not use the Trac API the way they should.

Examples:

{{{
>>> str(u'chaîne de caractères')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' in position 3: ordinal not in range(128)
>>>
}}}

The above is actually equivalent to the following, as `sys.getdefaultencoding()` is frequently 'ascii':
{{{
>>> u'chaîne de caractères'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' in position 3: ordinal not in range(128)
}}}
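
The details quoted in the traceback message (codec, position, reason) are also available as attributes of the exception object, which can help when tracking down the offending character. A minimal sketch, written for Python 3 (where `unicode` is named `str`); the helper `describe_encode_error` is a hypothetical name used only for illustration:

```python
def describe_encode_error(text, encoding='ascii'):
    """Try a strict encode; return details of the failure, or None on success.

    Hypothetical helper, not part of Trac or the standard library.
    """
    try:
        text.encode(encoding)  # 'strict' is the default error handler
    except UnicodeEncodeError as error:
        return {
            'encoding': error.encoding,   # codec that rejected the character
            'position': error.start,      # index of the first bad character
            'codepoint': hex(ord(error.object[error.start])),
            'reason': error.reason,
        }
    return None


info = describe_encode_error(u'chaîne de caractères')
print(info['encoding'], info['position'], info['codepoint'])  # ascii 3 0xee
print(info['reason'])  # ordinal not in range(128)
```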

One way to force the conversion is to use a non-strict error handler, at the cost of dropping or replacing the offending characters:
{{{
>>> u'chaîne de caractères'.encode('ascii', 'ignore')
'chane de caractres'
>>> u'chaîne de caractères'.encode('ascii', 'replace')
'cha?ne de caract?res'
}}}
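
Both `'ignore'` and `'replace'` lose information. For comparison, the sketch below runs the same string through the other standard error handlers and through an encoding that can represent it without loss. It is written for Python 3, so the results are `bytes` objects rather than Python 2 `str`. Choosing UTF-8 as the target encoding is an assumption made for the example, not something this page prescribes.

```python
text = u'chaîne de caractères'

# ASCII output with different error handlers: the first two are lossy,
# the last two keep the information in escaped form.
for handler in ('ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace'):
    print(handler, text.encode('ascii', handler))

# UTF-8 can encode this text directly, so decoding gives back the original.
encoded = text.encode('utf-8')
print(encoded.decode('utf-8') == text)
```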

----
See also: TracDev/UnicodeGuidelines, UnicodeDecodeError