Edgewall Software

Ticket #2971 (reopened defect)

Opened 3 years ago

Last modified 4 weeks ago

Unicode encoding error w/ Windows & defined locale

Reported by: eblot Owned by: cboos
Priority: normal Milestone: 0.11.3
Component: general Version: devel
Severity: major Keywords: unicode windows python23
Cc:

Description

The following piece of code:

encoding = locale.getlocale(locale.LC_TIME)[1] or \
           locale.getpreferredencoding()

in /trac/util/__init__.py breaks w/ Windows & ActiveState Python 2.3:

ActivePython 2.3.5 Build 236 (ActiveState Corp.) based on
Python 2.3.5 (#62, Feb  9 2005, 16:17:08) [MSC v.1200 32 bit (Intel)] on win32

with the following Python stack trace:

Traceback (most recent call last):
  File "trac\web\main.py", line 308, in dispatch_request
    dispatcher.dispatch(req)
  File "trac\web\main.py", line 153, in dispatch
    populate_hdf(req.hdf, self.env, req)
  File "trac\web\main.py", line 69, in populate_hdf
    hdf['trac'] = {
  File "trac\util\__init__.py", line 198, in format_datetime
    return unicode(text, encoding, 'replace')
LookupError: unknown encoding: 1252

when a locale is set.

The problem seems to come from the encoding name that Python reports:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'
>>> locale.setlocale(locale.LC_ALL, 'English_United-Kingdom')
'English_United Kingdom.1252'
>>> locale.getlocale(locale.LC_TIME)[1]
'1252'

The codec name Python expects is cp1252, not 1252:

>>> locale.getpreferredencoding()
'cp1252'
>>> unicode('test', 'cp1252', 'replace')
u'test'
>>> unicode('test', '1252', 'replace')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding: 1252
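One way to avoid this failure, sketched here in Python 3 rather than the Python 2.3 of the report, is to check the reported name with `codecs.lookup()` and, if Python does not recognise a bare Windows code page number such as `1252`, retry it with a `cp` prefix. The helper name `normalize_encoding` is hypothetical and not part of Trac.

```python
import codecs


def normalize_encoding(name):
    """Return the canonical codec name for `name`, or None if unusable.

    Hypothetical helper: a bare Windows code page number such as '1252'
    is retried as 'cp1252' when Python does not know it as an alias.
    """
    if not name:
        return None
    candidates = [name]
    if name.isdigit():
        # Windows reports code pages as bare numbers; Python names them cpNNN.
        candidates.append('cp' + name)
    for candidate in candidates:
        try:
            return codecs.lookup(candidate).name
        except LookupError:
            continue  # unknown name, try the next spelling
    return None


print(normalize_encoding('1252'))   # cp1252
print(normalize_encoding('874'))    # cp874
print(normalize_encoding('bogus'))  # None
```

Because the result is always a name that `codecs.lookup()` accepted, it can be passed straight to a decode call without raising `LookupError`.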

Attachments

Change History

Changed 3 years ago by cboos

  • owner changed from jonas to cboos
  • milestone set to 0.10

Using only locale.getpreferredencoding() would work on Windows, but not on Linux, where setting the locale doesn't seem to affect it. For example, after locale.setlocale(locale.LC_ALL, 'French'), my locale.getpreferredencoding() still returned 'UTF-8', whereas locale.getlocale(locale.LC_TIME)[1] returned 'ISO8859-1', which matched the encoding used by strftime.

So some platform-dependent code seems to be needed here.
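The platform-dependent choice described above can be sketched as a small pure function (Python 3). The name `pick_time_encoding` and its parameters are hypothetical; the platform and the two locale results are passed in as arguments, rather than read from `locale` inside the function, only so that the decision is easy to test.

```python
import locale
import sys


def pick_time_encoding(platform, lc_time_encoding, preferred_encoding):
    """Choose the encoding used to decode strftime() output.

    On Windows, use the preferred encoding (e.g. 'cp1252'), because the
    LC_TIME encoding may be a bare code page number such as '1252'.
    Elsewhere, prefer the LC_TIME encoding, which follows setlocale().
    """
    if platform != 'win32' and lc_time_encoding:
        return lc_time_encoding
    return preferred_encoding


# Usage with the values reported by the running interpreter.
encoding = pick_time_encoding(sys.platform,
                              locale.getlocale(locale.LC_TIME)[1],
                              locale.getpreferredencoding())
```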

Changed 3 years ago by cboos

Emmanuel, this fix worked for me; could you try it out?

Index: trac/util/__init__.py
===================================================================
--- trac/util/__init__.py       (revision 3118)
+++ trac/util/__init__.py       (working copy)
@@ -208,8 +208,8 @@
             t = time.localtime(int(t))

     text = time.strftime(format, t)
-    encoding = locale.getlocale(locale.LC_TIME)[1] or \
-               locale.getpreferredencoding()
+    lc_time_encoding = sys.platform != 'win32' and locale.getlocale(locale.LC_TIME)[1]
+    encoding = lc_time_encoding or locale.getpreferredencoding()
     return unicode(text, encoding, 'replace')

 def format_date(t=None, format='%x', gmt=False):

Changed 3 years ago by cboos

  • status changed from new to closed
  • resolution set to fixed

Issue fixed in r3141.

Changed 13 months ago by cboos

  • keywords python23 added

More precisely, this was a win32 issue specific to Python 2.3. With Python 2.4 or 2.5, the original code would have worked fine, because '1252' is a known encoding alias there. See follow-up change r6113.

Changed 4 months ago by sakesun

  • status changed from closed to reopened
  • resolution fixed deleted

This won't work for Thai on Python 2.5:

>>> 'abc'.encode('cp874')
'abc'
>>> 'abc'.encode('874')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: 874

Changed 4 months ago by sakesun

I couldn't find information confirming this behaviour for every Python version on every platform.

I use this fix for my own Trac:

def format_datetime(t=None, format='%x %X', tzinfo=None):
    """Format the `datetime` object `t` into an `unicode` string

    If `t` is None, the current time will be used.
    
    The formatting will be done using the given `format`, which consist
    of conventional `strftime` keys. In addition the format can be 'iso8601'
    to specify the international date format.

    `tzinfo` will default to the local timezone if left to `None`.
    """
    t = to_datetime(t, tzinfo).astimezone(tzinfo or localtz)
    if format.lower() == 'iso8601':
        format = '%Y-%m-%dT%H:%M:%SZ%z'
    text = t.strftime(format)
    encoding1 = locale.getpreferredencoding() or sys.getdefaultencoding()
    encoding2 = locale.getlocale(locale.LC_TIME)[1] or encoding1
    try:
        return unicode(text, encoding2, 'replace')
    except LookupError:
        return unicode(text, encoding1, 'replace')
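sakesun's workaround can be generalized to trying a list of candidate encodings in order and using the first one Python knows. This is a Python 3 sketch (`bytes.decode` in place of `unicode()`), and `decode_with_fallback` is a hypothetical helper, not a Trac API.

```python
import codecs


def decode_with_fallback(data, candidates, errors='replace'):
    """Decode `data` with the first candidate encoding that Python knows.

    None/empty entries and unknown names (e.g. '874' on Python 2.5) are
    skipped; if nothing matches, ASCII with `errors` is the last resort.
    """
    for name in candidates:
        if not name:
            continue
        try:
            codecs.lookup(name)  # raises LookupError for unknown names
        except LookupError:
            continue
        return data.decode(name, errors)
    return data.decode('ascii', errors)


# LC_TIME encoding first, then the preferred encoding, as in the fix above.
text = decode_with_fallback(b'caf\xe9', ['not-a-codec', None, 'cp1252'])
```

Checking the name with `codecs.lookup()` before decoding keeps the `LookupError` handling in one place, instead of wrapping each decode call in its own `try`.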

Changed 4 months ago by anonymous

  • milestone changed from 0.10 to 0.11.1

Changed 4 months ago by eblot

Locale name definition is platform-specific.

Changed 4 weeks ago by cboos

  • milestone changed from 0.11.2 to 0.11.3
