Edgewall Software

Opened 12 years ago

Closed 11 years ago

Last modified 11 years ago

#10920 closed defect (fixed)

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8c in position 14: ordinal not in range(128) — at Version 14

Reported by: lukasz.matecki@… Owned by: Christian Boos
Priority: high Milestone: 0.12.5
Component: roadmap Version: 1.0
Severity: normal Keywords: unicode localtz datefmt
Cc: Branch:
Release Notes:

Fixed generation of iCalendar file from the roadmap on Windows when the user's timezone is the system default.

API Changes:

str(trac.util.datefmt.localtz) now returns an "UTC[+-]HH:MM" string (r11422)

Internal Changes:

Description

I can see that this is related to the Polish locale, the unicode error occured at the local time designation "X-WR-TIMEZONE:\x8crodkowoeuropejski czas…" where \x8c stands for Polish "ś".

How to Reproduce

While doing a GET operation on /roadmap, Trac issued an internal error.

(please provide additional details here)

Request parameters:

{'format': u'ics', 'user': u'admin'}

User agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0

System Information

Trac 1.0
Babel 0.9.6
Genshi 0.6 (without speedups)
Pygments 1.5
pysqlite 2.6.0
Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]
pytz 2011c
RPC 1.1.2
setuptools 0.6c11
SQLite 3.6.21
Subversion 1.7.5 (r1336830)
jQuery 1.7.2

Enabled Plugins

tracxmlrpc 1.1.2

Python Traceback

Traceback (most recent call last):
  File "c:\users\lukasz\appdata\local\temp\easy_install-myvpng\Trac-1.0-py2.7-win32.egg.tmp\trac\web\main.py", line 497, in _dispatch_request
    dispatcher.dispatch(req)
  File "c:\users\lukasz\appdata\local\temp\easy_install-myvpng\Trac-1.0-py2.7-win32.egg.tmp\trac\web\main.py", line 214, in dispatch
    resp = chosen_handler.process_request(req)
  File "c:\users\lukasz\appdata\local\temp\easy_install-myvpng\Trac-1.0-py2.7-win32.egg.tmp\trac\ticket\roadmap.py", line 442, in process_request
    self._render_ics(req, milestones)
  File "c:\users\lukasz\appdata\local\temp\easy_install-myvpng\Trac-1.0-py2.7-win32.egg.tmp\trac\ticket\roadmap.py", line 574, in _render_ics
    ics_str = buf.getvalue().encode('utf-8')
  File "c:\Python27\lib\StringIO.py", line 270, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8c in position 14: ordinal not in range(128)

Change History (14)

comment:1 by Christian Boos, 12 years ago

Keywords: unicode added
Milestone: 1.0.1
Owner: set to Christian Boos
Priority: normalhigh
Status: newassigned

It's a bit difficult to reproduce (on Linux I always get plain ascii timezone names), but it's not hard to see what's going on…

Thanks for the report!

comment:2 by Christian Boos, 12 years ago

Could you please try the following patch?

  • trac/ticket/roadmap.py

     
    3131from trac.resource import *
    3232from trac.search import ISearchSource, search_to_sql, shorten_result
    3333from trac.util import as_bool
    34 from trac.util.datefmt import parse_date, utc, to_utimestamp, \
     34from trac.util.datefmt import parse_date, utc, to_datetime, to_utimestamp, \
    3535                              get_datetime_format_hint, format_date, \
    3636                              format_datetime, from_utimestamp, user_time
    3737from trac.util.text import CRLF
     
    515515
    516516        host = req.base_url[req.base_url.find('://') + 3:]
    517517        user = req.args.get('user', 'anonymous')
     518        now = to_datetime(None, req.tz)
    518519
    519520        write_prop('BEGIN', 'VCALENDAR')
    520521        write_prop('VERSION', '2.0')
     
    524525
    525526        write_prop('X-WR-CALNAME',
    526527                   self.env.project_name + ' - ' + _('Roadmap'))
    527528        write_prop('X-WR-CALDESC', self.env.project_description)
    528         write_prop('X-WR-TIMEZONE', str(req.tz))
     529        write_prop('X-WR-TIMEZONE', req.tz.tzname(now))
    529530
    530531        for milestone in milestones:

comment:3 by lukasz.matecki@…, 12 years ago

This doesn't work. I can see that req.tz.tzname(now) is still in my local encoding. When I do this dirty hack:

        import sys
        print sys.stdout.encoding
        write_prop('X-WR-TIMEZONE', req.tz.tzname(now).decode(sys.stdout.encoding))

the calendar gets generated. My codepage displayed from the above is cp852.

comment:4 by Christian Boos, 12 years ago

Milestone: 1.0.10.12.5

I wonder how the result of LocalTimezone.tzname() gets localized on Linux, if at all. From my limited testing, it seems I always get back the abbreviated timezone, so in plain ascii.

Back to your problem on Windows. Maybe reusing some code from datefmt could help:

  • trac/ticket/roadmap.py

     
    524524        write_prop('X-WR-CALNAME',
    525525                   self.env.project_name + ' - ' + _('Roadmap'))
    526526        write_prop('X-WR-CALDESC', self.env.project_description)
    527         write_prop('X-WR-TIMEZONE', str(req.tz))
     527        write_prop('X-WR-TIMEZONE', unicode(req.tz))
    528528
    529529        for milestone in milestones:
    530530            uid = '<%s/milestone/%s@%s>' % (req.base_path, milestone.name,
  • trac/util/datefmt.py

     
    172172        'iso8601': 'iso8601time', None: 'iso8601time'},
    173173}
    174174
     175def lc_time_encoding():
     176    """Return the most likely encoding used by strftime and other
     177    representations of system time.
     178    """
     179    return getlocale(LC_TIME)[1] or getpreferredencoding() \
     180        or sys.getdefaultencoding()
     181
    175182def _format_datetime_without_babel(t, format, tzinfo):
    176183    tz = tzinfo or localtz
    177184    t = to_datetime(t, tz)
     
    190197        text = text.replace('+0000', 'Z')
    191198        if not text.endswith('Z'):
    192199            text = text[:-2] + ":" + text[-2:]
    193     encoding = getlocale(LC_TIME)[1] or getpreferredencoding() \
    194                or sys.getdefaultencoding()
    195     return unicode(text, encoding, 'replace')
     200    return unicode(text, lc_time_encoding(), 'replace')
    196201
    197202def format_datetime(t=None, format='%x %X', tzinfo=None, locale=None):
    198203    """Format the `datetime` object `t` into an `unicode` string
     
    761766    def __str__(self):
    762767        return self.tzname(datetime.now())
    763768
     769    def __unicode__(self):
     770        return str(self).decode(lc_time_encoding(), 'replace')
     771
    764772    def __repr__(self):
    765773        return '<LocalTimezone "%s" %s "%s" %s>' % (
    766774            time.tzname[False], STDOFFSET,

Could you please try this one? Here, I'm hoping that getpreferredencoding() also gives 'cp852' on your system. But maybe not, it's often quite different from the encoding on sys.stdout. The problem is that we can't assume we have a meaningful sys.stdout, as in some contexts (e.g. mod_wsgi) it's redirected.

And btw. Jun, getpreferredencoding in the above reminds me that we still have #10768 on the todo list ;-)

comment:5 by lukasz.matecki@…, 12 years ago

I've tried all three methods in 'lc_time_encoding()' and I'm getting None, cp1250 and ascii on the command line (sic! I'm getting cp1250 while the code in roadmap.py prints cp852 for standard out).

Anyway, the problematic character '0x8c' is actually Polish "Ś" (which is the first non-ascii character in the locale time designator "Środkowo…") in cp1250 so the encoding in my environemnt seem to be cp1250 and your patch will probably work.

I'm having problems applying patches using Tortoise Merge (I'm getting Rejected patch hunks stuff). Can you provide entire file ? Or maybe the svn location ?

Thanks.

comment:6 by Christian Boos, 11 years ago

Keywords: localtz datefmt added

I've pushed two branches:


Now the big question is whether the above approach is a pertinent one or not… If there's any interoperability concern, using the localized name of the timezone is probably not a good idea, even if it has a correct unicode representation (see for example the discussion in pythonbug:883604). Which other software will understand your "Ścrodkowoeuropejski czas…"? Certainly not a single one on Linux…

A better approach would probably be to try to "normalize" the tzname to some known value. Not trivial either, see e.g. SO:7669938, which we need to adapt in order to ignore the possibly localized time.tzname. Going through all_timezones and pick the best match w.r.t. utcoffset(dt) for a few sample datetimes?

comment:7 by lukasz.matecki@…, 11 years ago

Ok, the patch works.

As for an elegant solution: I found this interesting article:

http://regebro.wordpress.com/2008/05/10/python-and-time-zones-part-2-the-beast-returns/

So your idea to use UTC offset as index and find a name is good but I think it would be good not to use the local timezone list but create single, "universal" (English names will suffice ?) list of timezones. Of course they will be more general than the ones got from machine locale (for instance "Eastern European" instead of "Eastern / Warsaw" or whatever these names actually are).

What do you think ?

in reply to:  7 ; comment:8 by Christian Boos, 11 years ago

Replying to lukasz.matecki@…:

Ok, the patch works.

As for an elegant solution: I found this interesting article:

http://regebro.wordpress.com/2008/05/10/python-and-time-zones-part-2-the-beast-returns/

Ha! Quite interesting (and funny!) read. Yes, it looks it's even more broken on Windows than I thought.

So your idea to use UTC offset as index and find a name is good but I think it would be good not to use the local timezone list but create single, "universal" (English names will suffice ?) list of timezones.

Of course they will be more general than the ones got from machine locale (for instance "Eastern European" instead of "Eastern / Warsaw" or whatever these names actually are).

Would such a made up list really be useful? I think it won't help for interoperability. We could perhaps use the closest FixedOffset timezone and use formal names (UTC+01:00, etc.).

But the more I think about it, the more I think we should just emphasize the importance of picking an explicit timezone in the user preferences (and ask your admin to install pytz!), rather than relying on the localtz.

And in that perspective, the patches proposed are maybe enough.

comment:9 by lukasz.matecki@…, 11 years ago

Fine for me.

in reply to:  8 comment:10 by Christian Boos, 11 years ago

Just to be complete:

Replying to cboos:

… We could perhaps use the closest FixedOffset timezone and use formal names (UTC+01:00, etc.).

i.e. format what localtz.utcoffset(now) gives us:

  • datefmt.py

     
    393395    """A 'local' time zone implementation"""
    394396   
    395397    def __str__(self):
    396         return self.tzname(datetime.now())
    397    
     398        secs = self.utcoffset(datetime.now())
     399        hours, rem = divmod(secs, 3600)
     400        return 'UTC%+03d:%02d' % (hours + 1 if secs < 0 else hours, rem / 60)
     401
    398402    def __repr__(self):
    399403        return '<LocalTimezone "%s" %s "%s" %s>' % (
    400404            time.tzname[False], STDOFFSET,

What do others think? Should unicode(localtz) return:

  1. whatever name given by the system
    • (+) "natural" result?
    • (-) which might be entirely useless, especially on Windows as seen above and from the URL given in comment:7)
  2. an UTC offset, as per the above patch
    • (+) simple and has the extra bonus property that str(localtz) will always work, i.e. no further fix in roadmap.py for the present issue
    • (-) besides the Wikipedia article linked above, there's no evidence that other programs will understand those UTC timezones…
  3. something else?

Now that I spent a ridiculous amount of time getting the above patch to work, my preference would be 2. ;-)

comment:11 by Remy Blank, 11 years ago

  1. is probably useless. My preference is 2.

(OT: What's the trick again to avoid starting a <ol> when the beginning of a line is "1."?)

in reply to:  11 comment:12 by Christian Boos, 11 years ago

Replying to rblank:

(OT: What's the trick again to avoid starting a <ol> when the beginning of a line is "1."?)

It used to be a leading `` … which is no longer invisible. Well, maybe that would count as a bug and we could just replace `` with nothing instead of producing an empty <tt> (which looks pretty useless anyway, but even if there would be an use, then we could leave {{{}}} for that → ).

1. this is 1. at the start of a normal paragraph.

in reply to:  11 comment:13 by Christian Boos, 11 years ago

API Changes: modified (diff)
Release Notes: modified (diff)

Replying to rblank:

  1. is probably useless. My preference is 2.

Good! I applied the patch (hopefully correct this time…) on 0.12-stable in r11414.

DONE I'll add some unit-tests for this, but after Jun commits his extensive changes in this area of the code. (r11422)

Last edited 11 years ago by Christian Boos (previous) (diff)

comment:14 by Christian Boos, 11 years ago

API Changes: modified (diff)
Resolution: fixed
Status: assignedclosed

Tests added in r11422 (and well, next time I'll start with the tests).

Note: See TracTickets for help on using tickets.