#10920 closed defect (fixed)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8c in position 14: ordinal not in range(128)
Reported by: | Owned by: | Christian Boos | |
---|---|---|---|
Priority: | high | Milestone: | 0.12.5 |
Component: | roadmap | Version: | 1.0 |
Severity: | normal | Keywords: | unicode localtz datefmt |
Cc: | Branch: | ||
Release Notes: |
Fixed generation of iCalendar file from the roadmap on Windows when the user's timezone is the system default. |
||
API Changes: |
|
||
Internal Changes: |
Description
I can see that this is related to the Polish locale, the unicode error occured at the local time designation "X-WR-TIMEZONE:\x8crodkowoeuropejski czas…" where \x8c stands for Polish "ś".
How to Reproduce
While doing a GET operation on /roadmap
, Trac issued an internal error.
(please provide additional details here)
Request parameters:
{'format': u'ics', 'user': u'admin'}
User agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0
System Information
Trac | 1.0
|
Babel | 0.9.6
|
Genshi | 0.6 (without speedups)
|
Pygments | 1.5
|
pysqlite | 2.6.0
|
Python | 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]
|
pytz | 2011c
|
RPC | 1.1.2
|
setuptools | 0.6c11
|
SQLite | 3.6.21
|
Subversion | 1.7.5 (r1336830)
|
jQuery | 1.7.2
|
Enabled Plugins
tracxmlrpc | 1.1.2
|
Python Traceback
Traceback (most recent call last): File "c:\users\lukasz\appdata\local\temp\easy_install-myvpng\Trac-1.0-py2.7-win32.egg.tmp\trac\web\main.py", line 497, in _dispatch_request dispatcher.dispatch(req) File "c:\users\lukasz\appdata\local\temp\easy_install-myvpng\Trac-1.0-py2.7-win32.egg.tmp\trac\web\main.py", line 214, in dispatch resp = chosen_handler.process_request(req) File "c:\users\lukasz\appdata\local\temp\easy_install-myvpng\Trac-1.0-py2.7-win32.egg.tmp\trac\ticket\roadmap.py", line 442, in process_request self._render_ics(req, milestones) File "c:\users\lukasz\appdata\local\temp\easy_install-myvpng\Trac-1.0-py2.7-win32.egg.tmp\trac\ticket\roadmap.py", line 574, in _render_ics ics_str = buf.getvalue().encode('utf-8') File "c:\Python27\lib\StringIO.py", line 270, in getvalue self.buf += ''.join(self.buflist) UnicodeDecodeError: 'ascii' codec can't decode byte 0x8c in position 14: ordinal not in range(128)
Attachments (0)
Change History (14)
comment:1 by , 12 years ago
Keywords: | unicode added |
---|---|
Milestone: | → 1.0.1 |
Owner: | set to |
Priority: | normal → high |
Status: | new → assigned |
comment:2 by , 12 years ago
Could you please try the following patch?
-
trac/ticket/roadmap.py
31 31 from trac.resource import * 32 32 from trac.search import ISearchSource, search_to_sql, shorten_result 33 33 from trac.util import as_bool 34 from trac.util.datefmt import parse_date, utc, to_ utimestamp, \34 from trac.util.datefmt import parse_date, utc, to_datetime, to_utimestamp, \ 35 35 get_datetime_format_hint, format_date, \ 36 36 format_datetime, from_utimestamp, user_time 37 37 from trac.util.text import CRLF … … 515 515 516 516 host = req.base_url[req.base_url.find('://') + 3:] 517 517 user = req.args.get('user', 'anonymous') 518 now = to_datetime(None, req.tz) 518 519 519 520 write_prop('BEGIN', 'VCALENDAR') 520 521 write_prop('VERSION', '2.0') … … 524 525 525 526 write_prop('X-WR-CALNAME', 526 527 self.env.project_name + ' - ' + _('Roadmap')) 527 528 write_prop('X-WR-CALDESC', self.env.project_description) 528 write_prop('X-WR-TIMEZONE', str(req.tz))529 write_prop('X-WR-TIMEZONE', req.tz.tzname(now)) 529 530 530 531 for milestone in milestones:
comment:3 by , 12 years ago
This doesn't work. I can see that req.tz.tzname(now) is still in my local encoding. When I do this dirty hack:
import sys print sys.stdout.encoding write_prop('X-WR-TIMEZONE', req.tz.tzname(now).decode(sys.stdout.encoding))
the calendar gets generated. My codepage displayed from the above is cp852.
comment:4 by , 12 years ago
Milestone: | 1.0.1 → 0.12.5 |
---|
I wonder how the result of LocalTimezone.tzname()
gets localized on Linux, if at all. From my limited testing, it seems I always get back the abbreviated timezone, so in plain ascii.
Back to your problem on Windows. Maybe reusing some code from datefmt could help:
-
trac/ticket/roadmap.py
524 524 write_prop('X-WR-CALNAME', 525 525 self.env.project_name + ' - ' + _('Roadmap')) 526 526 write_prop('X-WR-CALDESC', self.env.project_description) 527 write_prop('X-WR-TIMEZONE', str(req.tz))527 write_prop('X-WR-TIMEZONE', unicode(req.tz)) 528 528 529 529 for milestone in milestones: 530 530 uid = '<%s/milestone/%s@%s>' % (req.base_path, milestone.name, -
trac/util/datefmt.py
172 172 'iso8601': 'iso8601time', None: 'iso8601time'}, 173 173 } 174 174 175 def lc_time_encoding(): 176 """Return the most likely encoding used by strftime and other 177 representations of system time. 178 """ 179 return getlocale(LC_TIME)[1] or getpreferredencoding() \ 180 or sys.getdefaultencoding() 181 175 182 def _format_datetime_without_babel(t, format, tzinfo): 176 183 tz = tzinfo or localtz 177 184 t = to_datetime(t, tz) … … 190 197 text = text.replace('+0000', 'Z') 191 198 if not text.endswith('Z'): 192 199 text = text[:-2] + ":" + text[-2:] 193 encoding = getlocale(LC_TIME)[1] or getpreferredencoding() \ 194 or sys.getdefaultencoding() 195 return unicode(text, encoding, 'replace') 200 return unicode(text, lc_time_encoding(), 'replace') 196 201 197 202 def format_datetime(t=None, format='%x %X', tzinfo=None, locale=None): 198 203 """Format the `datetime` object `t` into an `unicode` string … … 761 766 def __str__(self): 762 767 return self.tzname(datetime.now()) 763 768 769 def __unicode__(self): 770 return str(self).decode(lc_time_encoding(), 'replace') 771 764 772 def __repr__(self): 765 773 return '<LocalTimezone "%s" %s "%s" %s>' % ( 766 774 time.tzname[False], STDOFFSET,
Could you please try this one? Here, I'm hoping that getpreferredencoding()
also gives 'cp852'
on your system. But maybe not, it's often quite different from the encoding on sys.stdout. The problem is that we can't assume we have a meaningful sys.stdout, as in some contexts (e.g. mod_wsgi) it's redirected.
And btw. Jun, getpreferredencoding
in the above reminds me that we still have #10768 on the todo list ;-)
comment:5 by , 12 years ago
I've tried all three methods in 'lc_time_encoding()' and I'm getting None, cp1250 and ascii on the command line (sic! I'm getting cp1250 while the code in roadmap.py prints cp852 for standard out).
Anyway, the problematic character '0x8c' is actually Polish "Ś" (which is the first non-ascii character in the locale time designator "Środkowo…") in cp1250 so the encoding in my environemnt seem to be cp1250 and your patch will probably work.
I'm having problems applying patches using Tortoise Merge (I'm getting Rejected patch hunks stuff). Can you provide entire file ? Or maybe the svn location ?
Thanks.
comment:6 by , 12 years ago
Keywords: | localtz datefmt added |
---|
I've pushed two branches:
- cboos.git:0.12/ticket10920/localtz-unicode
- cboos.git:1.0/ticket10920/localtz-unicode ⇐ this is the one you need; if you do a clone of that one, you'll get current 1.0-stable with the patch
Now the big question is whether the above approach is a pertinent one or not… If there's any interoperability concern, using the localized name of the timezone is probably not a good idea, even if it has a correct unicode representation (see for example the discussion in pythonbug:883604). Which other software will understand your "Ścrodkowoeuropejski czas…"? Certainly not a single one on Linux…
A better approach would probably be to try to "normalize" the tzname to some known value. Not trivial either, see e.g. SO:7669938, which we need to adapt in order to ignore the possibly localized time.tzname
. Going through all_timezones
and pick the best match w.r.t. utcoffset(dt)
for a few sample datetimes?
follow-up: 8 comment:7 by , 12 years ago
Ok, the patch works.
As for an elegant solution: I found this interesting article:
http://regebro.wordpress.com/2008/05/10/python-and-time-zones-part-2-the-beast-returns/
So your idea to use UTC offset as index and find a name is good but I think it would be good not to use the local timezone list but create single, "universal" (English names will suffice ?) list of timezones. Of course they will be more general than the ones got from machine locale (for instance "Eastern European" instead of "Eastern / Warsaw" or whatever these names actually are).
What do you think ?
follow-up: 10 comment:8 by , 12 years ago
Replying to lukasz.matecki@…:
Ok, the patch works.
As for an elegant solution: I found this interesting article:
http://regebro.wordpress.com/2008/05/10/python-and-time-zones-part-2-the-beast-returns/
Ha! Quite interesting (and funny!) read. Yes, it looks it's even more broken on Windows than I thought.
So your idea to use UTC offset as index and find a name is good but I think it would be good not to use the local timezone list but create single, "universal" (English names will suffice ?) list of timezones.
Of course they will be more general than the ones got from machine locale (for instance "Eastern European" instead of "Eastern / Warsaw" or whatever these names actually are).
Would such a made up list really be useful? I think it won't help for interoperability. We could perhaps use the closest FixedOffset
timezone and use formal names (UTC+01:00, etc.).
But the more I think about it, the more I think we should just emphasize the importance of picking an explicit timezone in the user preferences (and ask your admin to install pytz!), rather than relying on the localtz.
And in that perspective, the patches proposed are maybe enough.
comment:10 by , 12 years ago
Just to be complete:
Replying to cboos:
… We could perhaps use the closest
FixedOffset
timezone and use formal names (UTC+01:00, etc.).
i.e. format what localtz.utcoffset(now)
gives us:
-
datefmt.py
393 395 """A 'local' time zone implementation""" 394 396 395 397 def __str__(self): 396 return self.tzname(datetime.now()) 397 398 secs = self.utcoffset(datetime.now()) 399 hours, rem = divmod(secs, 3600) 400 return 'UTC%+03d:%02d' % (hours + 1 if secs < 0 else hours, rem / 60) 401 398 402 def __repr__(self): 399 403 return '<LocalTimezone "%s" %s "%s" %s>' % ( 400 404 time.tzname[False], STDOFFSET,
What do others think? Should unicode(localtz)
return:
- whatever name given by the system
- (+) "natural" result?
- (-) which might be entirely useless, especially on Windows as seen above and from the URL given in comment:7)
- an UTC offset, as per the above patch
- (+) simple and has the extra bonus property that
str(localtz)
will always work, i.e. no further fix in roadmap.py for the present issue - (-) besides the Wikipedia article linked above, there's no evidence that other programs will understand those UTC timezones…
- (+) simple and has the extra bonus property that
- something else?
Now that I spent a ridiculous amount of time getting the above patch to work, my preference would be 2. ;-)
follow-ups: 12 13 comment:11 by , 12 years ago
- is probably useless. My preference is 2.
(OT: What's the trick again to avoid starting a <ol>
when the beginning of a line is "1."?)
comment:12 by , 12 years ago
Replying to rblank:
(OT: What's the trick again to avoid starting a
<ol>
when the beginning of a line is "1."?)
It used to be a leading ``
… which is no longer invisible. Well, maybe that would count as a bug and we could just replace ``
with nothing instead of producing an empty <tt> (which looks pretty useless anyway, but even if there would be an use, then we could leave {{{}}}
for that → ).
1. this is 1. at the start of a normal paragraph.
comment:13 by , 12 years ago
comment:14 by , 12 years ago
API Changes: | modified (diff) |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
Tests added in r11422 (and well, next time I'll start with the tests).
It's a bit difficult to reproduce (on Linux I always get plain ascii timezone names), but it's not hard to see what's going on…
Thanks for the report!