Opened 16 years ago
Closed 15 years ago
#8276 closed defect (fixed)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 94-96: ordinal not in range(128)
Reported by: | Owned by: | Christian Boos | |
---|---|---|---|
Priority: | high | Milestone: | 0.11.6 |
Component: | ticket system | Version: | 0.11.5 |
Severity: | major | Keywords: | config unicode review |
Cc: | tumma72@…, osimons, felix.schwarz@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
How to Reproduce
While doing a POST operation on /newticket, Trac issued an internal error.
(please provide additional details here)
Request parameters:
{'__FORM_TOKEN': u'10f55ceac7d80a875397f225', 'field_description': u'\u6d4b\u8bd5\u4e00\u4e0b\u4e2d\u6587', 'field_drp_resources': u'', 'field_owner': u'liuyun', 'field_remaining_time': u'', 'field_reporter': u'admin', 'field_sprint': u'\u63d0\u4f9b\u53ef\u6d4b\u8bd5\u7684\u7248\u672c', 'field_status': u'new', 'field_summary': u'\u6d4b\u8bd5\u4e00\u4e0b\u4e2d\u6587', 'field_type': u'task', 'submit': u'Create ticket'}
User Agent was: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
System Information
Trac | 0.11.4 |
Python | 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] |
setuptools | 0.6c9 |
SQLite | 3.6.11 |
pysqlite | 2.5.5 |
Genshi | 0.5.1 |
mod_python | 3.3.1 |
Agilo | 0.7.3.3-r1417-20090313 |
jQuery | 1.2.6 |
Python Traceback
Attachments (3)
Change History (31)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
While doing a POST operation on /newticket, Trac issued an internal error.
(please provide additional details here)
comment:3 by , 16 years ago
It seems the traceback went missing in the description. Could you please find the traceback corresponding to this issue in your Trac log, and paste it here? Thanks.
comment:4 by , 15 years ago
Keywords: | needinfo added |
---|---|
Milestone: | 0.11.5 |
follow-up: 6 comment:5 by , 15 years ago
I just had the same issue. It happens simply when inserting non-ASCII characters in input, AFAIK. Here you go:
# File "/usr/lib/python2.5/site-packages/trac/web/main.py", line 423, in _dispatch_request Code fragment:
- try:
- if not env and env_error:
- raise HTTPInternalError(env_error)
- try:
- dispatcher = RequestDispatcher(env)
- dispatcher.dispatch(req)
- except RequestDone:
- pass
- resp = req._response or []
- except HTTPException, e:
Local variables:
Name | Value |
---|---|
after | [u' except RequestDone:', u' pass', u' resp = … |
before | [u' try:', u' if not env and env_error:', u' raise … |
dispatcher | <trac.web.main.RequestDispatcher object at 0xacb86ec> |
e | UnicodeEncodeError('ascii', u'But then again, that shouldn\'t be … |
env | <trac.env.Environment object at 0xa929e8c> |
env_error | None |
exc_info | (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u'But … |
filename | '/usr/lib/python2.5/site-packages/trac/web/main.py' |
frames | [{'function': '_dispatch_request', 'lines_before': [u' try:', u' … |
has_admin | True |
line | u' dispatcher.dispatch(req)' |
lineno | 422 |
message | u"UnicodeEncodeError: 'ascii' codec can't encode characters in position … |
req | <Request "POST u'/ticket/222'"> |
resp | [] |
tb | <traceback object at 0xb08d25c> |
tb_hide | None |
traceback | 'Traceback (most recent call last):\n File … |
# File "/usr/lib/python2.5/site-packages/trac/web/main.py", line 197, in dispatch Code fragment:
- req.args.get('__FORM_TOKEN') != req.form_token:
- raise HTTPBadRequest('Missing or invalid form token. '
- 'Do you have cookies enabled?')
- # Process the request and render the template
- resp = chosen_handler.process_request(req)
- if resp:
- if len(resp) == 2: # Clearsilver
- chrome.populate_hdf(req)
- template, content_type = \
- self._post_process_request(req, *resp)
Local variables:
Name | Value |
---|---|
chosen_handler | <trac.ticket.web_ui.TicketModule object at 0xacb874c> |
chrome | <trac.web.chrome.Chrome object at 0xacac72c> |
ctype | 'application/x-www-form-urlencoded' |
err | (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u'But … |
handler | <trac.ticket.web_ui.TicketModule object at 0xacb874c> |
options | {} |
req | <Request "POST u'/ticket/222'"> |
self | <trac.web.main.RequestDispatcher object at 0xacb86ec> |
# File "/usr/lib/python2.5/site-packages/trac/ticket/web_ui.py", line 170, in process_request Code fragment:
- def process_request(self, req):
- if 'id' in req.args:
- if req.path_info.startswith('/newticket'):
- raise TracError(_("id can't be set for a new ticket request."))
- return self._process_ticket_request(req)
- return self._process_newticket_request(req)
- # ITemplateProvider methods
- def get_htdocs_dirs(self):
Local variables:
Name | Value |
---|---|
req | <Request "POST u'/ticket/222'"> |
self | <trac.ticket.web_ui.TicketModule object at 0xacb874c> |
# File "/usr/lib/python2.5/site-packages/trac/ticket/web_ui.py", line 494, in _process_ticket_request Code fragment:
- # things.
- valid = self._validate_ticket(req, ticket) and not problems
- if 'preview' not in req.args:
- if valid:
- # redirected if successful
- self._do_save(req, ticket, action)
- # else fall through in a preview
- req.args['preview'] = True
- # Preview an existing ticket (after a Preview or a failed Save)
- data.update({
Local variables:
Name | Value |
---|---|
action | u'leave' |
actions | ['leave', 'resolve', 'reassign'] |
data | {'comment': None, 'preserve_newlines': False, 'ticket': … |
field_changes | {} |
id | 222 |
problems | [] |
req | <Request "POST u'/ticket/222'"> |
self | <trac.ticket.web_ui.TicketModule object at 0xacb874c> |
ticket | <trac.ticket.model.Ticket object at 0xaf7376c> |
valid | True |
version | None |
# File "/usr/lib/python2.5/site-packages/trac/ticket/web_ui.py", line 989, in _do_save Code fragment:
- # -- Save changes
- now = datetime.now(utc)
- if ticket.save_changes(get_reporter_id(req, 'author'),
- req.args.get('comment'), when=now,
- cnum=internal_cnum):
- try:
- tn = TicketNotifyEmail(self.env)
- tn.notify(ticket, newticket=False, modtime=now)
- except Exception, e:
- self.log.exception("Failure sending notification on change to "
Local variables:
Name | Value |
---|---|
action | u'leave' |
cnum | u'4' |
controllers | [<trac.ticket.default_workflow.ConfigurableTicketWorkflow object at … |
internal_cnum | u'4' |
now | datetime.datetime(2009, 6, 29, 20, 0, 59, 523283, tzinfo=<FixedOffset … |
replyto | u |
req | <Request "POST u'/ticket/222'"> |
self | <trac.ticket.web_ui.TicketModule object at 0xacb874c> |
ticket | <trac.ticket.model.Ticket object at 0xaf7376c> |
# File "/usr/lib/python2.5/site-packages/trac/ticket/model.py", line 282, in save_changes Code fragment:
- old_values = self._old
- self._old = {}
- self.time_changed = when
- for listener in TicketSystem(self.env).change_listeners:
- listener.ticket_changed(self, comment, author, old_values)
- return True
- def get_changelog(self, when=None, db=None):
- """Return the changelog as a list of tuples of the form
- (time, author, field, oldvalue, newvalue, permanent).
Local variables:
Name | Value |
---|---|
author | u'sjors' |
cc | |
cclist | [] |
cnum | u'4' |
comment | u'But then again, that shouldn\'t be necessary. Around the community, this … |
cursor | <trac.db.util.IterableCursor object at 0xafa526c> |
custom_fields | [] |
db | <trac.db.pool.PooledConnection object at 0xafa104c> |
f | {'type': 'text', 'name': 'cc', 'label': 'Cc'} |
handle_ta | True |
listener | <likebackplugin.likebackplugin.LikeBackPlugin object at 0xacb85ec> |
old_values | {} |
self | <trac.ticket.model.Ticket object at 0xaf7376c> |
when | datetime.datetime(2009, 6, 29, 20, 0, 59, 523283, tzinfo=<FixedOffset … |
when_ts | 1246305659 |
# File "build/bdist.linux-i686/egg/likebackplugin/likebackplugin.py", line 39, in ticket_changed
Local variables:
Name | Value |
---|---|
author | u'sjors' |
comment | u'But then again, that shouldn\'t be necessary. Around the community, this … |
old_values | {} |
self | <likebackplugin.likebackplugin.LikeBackPlugin object at 0xacb85ec> |
ticket | <trac.ticket.model.Ticket object at 0xaf7376c> |
values | {'comment': u'But then again, that shouldn\'t be necessary. Around the … |
# File "/usr/lib/python2.5/urllib.py", line 1250, in urlencode
comment:6 by , 15 years ago
Replying to dazjorz@…:
I just had the same issue.
No, you have an issue with the "likebackplugin" (no idea what this is), so please report that error to that plugin's maintainer.
comment:8 by , 15 years ago
Keywords: | needinfo removed |
---|---|
Resolution: | → wontfix |
Status: | new → closed |
And for the original issue (in description), this looks like an AgiloForScrum issue (Agilo 0.7.3.3-r1417-20090313).
Please report it to them.
comment:9 by , 15 years ago
Keywords: | config unicode added |
---|---|
Resolution: | wontfix |
Severity: | critical → major |
Status: | closed → reopened |
Version: | 0.11.4 → 0.11.5 |
Hi cboos, I had a look at this. The bug does show up with AgiloForScrum, but it is not an Agilo bug: if you use the config API to save a value that contains non-ASCII characters, it fails and also breaks the whole config and Trac environment. Other plugins also use the Configuration to save data into trac.ini, and the labels of custom fields, for example, are a natural place where people want to use localized strings.
From what I found out, config.py encodes to utf-8 in two places: once in Section.set() if the value is not None, and again in the save() of the Configuration. As long as there are no non-ASCII characters in the string, the double encode does not generate any error; as soon as there is a real non-ASCII sequence inside, it breaks. At the very least I would suggest putting a safeguard there, something like:
for key, val in options:
    if key in self[section].overridden:
        fileobj.write('# %s = <inherited>\n' % key)
    else:
        val = val.replace(CRLF, '\n').replace('\n', '\n ')
        try:
            val = val.encode('utf-8')
        except UnicodeDecodeError, e:
            continue
        fileobj.write('%s = %s\n' % (key, val))
So at least only a property will be skipped and not the whole config messed up, causing significant issues in having to reconfigure the whole project manually. If I come up with something more sensible than this, I'll let you know :-)
comment:10 by , 15 years ago
Here you go, I made it a patch. It is just a parachute for now, and you may come up with something more sensible, but at least it avoids destroying the whole config file, and it also catches errors in the coercion to 'ascii' while writing to the file.
Index: trac/config.py
===================================================================
--- trac/config.py (revision 8367)
+++ trac/config.py (working copy)
@@ -208,8 +208,15 @@
                                 fileobj.write('# %s = <inherited>\n' % key)
                             else:
                                 val = val.replace(CRLF, '\n').replace('\n', '\n ')
-                                fileobj.write('%s = %s\n' % (key,
-                                                             val.encode('utf-8')))
+                                try:
+                                    val = val.encode('utf-8')
+                                except UnicodeDecodeError, e:
+                                    pass # we ignore this
+                                try:
+                                    fileobj.write('%s = %s\n' % (key, val))
+                                except UnicodeDecodeError, e:
+                                    continue # we go forward writing the rest
                         fileobj.write('\n')
                 finally:
                     fileobj.close()
Also attached.
HTH Best ANdreaT
by , 15 years ago
Attachment: | config.patch added |
---|
config.patch to guard from unicode properties values written in the config file
comment:11 by , 15 years ago
Well, val must be a unicode object at this point. If not, then it's a bug and we should fix that. Do you have any reproduction recipe I could try out?
I do have several non-ascii stuff in my trac.ini, and it's working fine, e.g.
[ticket-custom]
changelog = textarea
changelog.cols = 50
changelog.label = Changelög
changelog.rows = 3
comment:12 by , 15 years ago
Cc: | added |
---|
OK, maybe you can help me, because right now I am getting puzzled myself :-) Let me try to explain: if I send the string from an input field of a web form, it arrives in req.args as unicode, apparently, and it is something like
u'Bl\xf8d'
which looks OK to me. Here is the console output of the server, with repr() and type():
Serving on 0.0.0.0:8001 view at http://127.0.0.1:8001/
Storing: blood.label u'Bl\xf8d' <type 'unicode'>
Storing: blood.order u'0' <type 'unicode'>
Storing: blood u'text' <type 'unicode'>
After this print I call self.env.config.set() passing the parameters, then self.env.config.save(), and the result is an exception in the following code fragment:
                if key in self[section].overridden:
                    fileobj.write('# %s = <inherited>\n' % key)
                else:
                    val = val.replace(CRLF, '\n').replace('\n', '\n ')
                    fileobj.write('%s = %s\n' % (key, val.encode('utf-8')))
            fileobj.write('\n')
    finally:
        fileobj.close()
    self._old_sections = deepcopy(self.parser._sections)
except Exception:
with Local variables:
Name | Value |
---|---|
current | False |
default | None |
fileobj | <closed file '/var/lib/trac/test_me_more/conf/trac.ini', mode 'w' at … |
key | u'blood.label' |
option | 'max_size' |
options | [('blood', u'text'), (u'blood.label', u'Bl\xf8d'), (u'blood.order', u'0')] |
section | 'ticket-custom' |
sections | [('account-manager', [('authentication_url', u), ('force_passwd_change', … |
self | <Configuration '/var/lib/trac/test_me_more/conf/trac.ini'> |
val | u'Bl\xf8d' |
Following the trace, the error appears to be in the coercion of the utf-8 encoded string into the string written to the file. It breaks there, and the whole config ends at the last valid line, messing up everything :-) Hence my suggestion to add at least a guard (see patch).
Now the puzzling part: if I write a simple Python script that does the same thing on the same environment, it works:
# -*- coding: utf-8 -*-
from trac.env import Environment
from trac.util.text import to_unicode

def main():
    """This should break the config with unicode characters"""
    env = Environment('/var/lib/trac/test_me_more')
    value = u'Bl\xf8d'
    env.config.set('break-me', 'test', value)
    env.config.save()

if __name__ == '__main__':
    main()
So any idea? I am really starting to think that the mixture of headache and Friday evening is playing a role on me ;-)
Thanks ANdreaT
comment:13 by , 15 years ago
I forgot the trivial part… the error is:
Trac detected an internal error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
That is the decoding to ASCII that apparently happens at line 212 of the fragment above, which is the call to val.encode('utf-8').
Sorry for omitting this in the previous post :-(
Ciao ANdreaT
comment:14 by , 15 years ago
Don't hate me, I am trying to figure this out… The same error happens if you take the unicode string and try to encode it twice, which according to my first analysis is what was happening:
>>> test = u'Bl\xf8d'
>>> test.encode('utf-8')
'Bl\xc3\xb8d'
>>> test.encode('utf-8').encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
>>>
It is the exact same error. I thought the reason was that the first encoding is done in config.py, line 410:
    if value is None:
        self.overridden[name] = True
        value = ''
    else:
==>     value = to_unicode(value).encode('utf-8')
    return self.config.parser.set(self.name, name, value)
And then again here, in save(), config.py line 210:
        if key in self[section].overridden:
            fileobj.write('# %s = <inherited>\n' % key)
        else:
            val = val.replace(CRLF, '\n').replace('\n', '\n ')
            fileobj.write('%s = %s\n' % (key,
==>                                      val.encode('utf-8')))
    fileobj.write('\n')
But as said if you try to write directly it seems to work :-(
Ideas?
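Editor's note for readers on Python 3, where `bytes` has no `encode()` method: the double-encode failure shown above can be sketched as follows. In Python 2, calling `encode()` on a byte string first decodes it implicitly with the ASCII codec. `py2_style_encode` is a hypothetical helper that imitates that step; it is not part of Trac.

```python
def py2_style_encode(value, encoding='utf-8'):
    """Hypothetical helper imitating Python 2's str.encode():
    a byte string is implicitly decoded as ASCII before being encoded."""
    if isinstance(value, bytes):
        value = value.decode('ascii')  # the implicit step that can fail
    return value.encode(encoding)


text = u'Bl\xf8d'
once = py2_style_encode(text)      # text -> UTF-8 bytes, works

try:
    py2_style_encode(once)         # bytes -> implicit ASCII decode
    error = None
except UnicodeDecodeError as exc:
    error = exc                    # fails on the first non-ASCII byte
```

Pure-ASCII byte strings survive the second call, which would explain why the problem only shows up once a non-ASCII character is involved.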
comment:15 by , 15 years ago
Strange indeed. I added this change:
trac/config.py
         """Write the configuration options to the primary file."""
         if not self.filename:
             return
+
+        self.set('break-me', 'test', u'Bl\xf8d')
 
         # Only save options that differ from the defaults
         sections = []
And triggered a save in the web admin, then looked at the .ini:
[break-me]
test = Blød
So if I understand you correctly, you would have got an error instead. Can you try? If it works for you then I didn't understand you correctly and you could give me some diff to reproduce the issue :-)
comment:16 by , 15 years ago
Yes, it breaks if I do it from an admin page module… As I said, with the script on the environment it works just fine. Something weird must be happening that I am not seeing here. I made another test, trying to write the config from an admin page, as in your example:
self.env.config.set('break-me', 'test', u'Bl\xf8d')
print "Saving the config..."
for opt, value in self.env.config.parser._sections['break-me'].items():
    print opt, repr(value)
self.env.config.save()
It is breaking and the output on the console is:
Saving the config...
__name__ 'break-me'
test 'Bl\xc3\xb8d'
As you can see, the value of the test key is already an encoded string. This happens because the set() method encodes it, so when save() is called it is encoded again, and then it explodes, with the same error as if you did:
>>> test = 'Bl\xc3\xb8d'
>>> test.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
So I am really puzzled :-(
follow-up: 18 comment:17 by , 15 years ago
I also can see:
__name__ 'break-me'
test 'Bl\xc3\xb8d'
This is expected. I think you missed the fact that the ConfigParser class is not unicode aware, so we have to store keys and values in the parser as str objects, utf-8 encoded. So that's what set() does.
I think we actually don't do it yet for keys or even section names, and that counts as a bug ;-)
Then, in save(), we retrieve the values from that ConfigParser and use to_unicode(), which produces a unicode value from the str (even if by some extraordinary chance you got a unicode value out of the parser, to_unicode(uni) yields uni unmodified).
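Editor's note: the round trip described here (encode on `set()`, decode on the way out) can be sketched in Python 3 terms as below. This is only an illustration under assumptions: `to_text`, `set_option`, `get_option` and the `parser_sections` dict are hypothetical stand-ins, not Trac's `to_unicode()` or the real ConfigParser storage.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical stand-in for the to_unicode() step:
    decode byte strings, pass text through unmodified."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


parser_sections = {}  # stand-in for the byte-oriented parser storage


def set_option(section, key, value):
    # Store the value as UTF-8 bytes, as set() is described to do.
    parser_sections.setdefault(section, {})[key] = to_text(value).encode('utf-8')


def get_option(section, key):
    # Convert back to text when reading, as save() is described to do.
    return to_text(parser_sections[section][key])


set_option('break-me', 'test', u'Bl\xf8d')
stored = parser_sections['break-me']['test']   # encoded form inside the parser
loaded = get_option('break-me', 'test')        # text again, safe to encode once
```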
Last time I looked (0.7.2 I think), you had some wrapping code around the config. Maybe now you're doing something more intrusive?
Again, the problem comes from the fact you're having a str object instead of a unicode one, and in the trac/config.py save() code I don't yet see how this can happen.
Maybe apply:
trac/config.py
         """Write the configuration options to the primary file."""
         if not self.filename:
             return
+
+        self.set('break-me', 'test', u'Bl\xf8d')
+        for opt, value in self.parser._sections['break-me'].items():
+            print opt, repr(value)
 
         # Only save options that differ from the defaults
         sections = []
…
                 default = self.parent.get(section, option)
                 current = self.parser.has_option(section, option) and \
                           to_unicode(self.parser.get(section, option))
+                print repr(option), repr(current)
                 if current is not False and current != default:
                     options.append((option, current))
             if options:
                 sections.append((section, sorted(options)))
+        print repr(sections)
 
         try:
             fileobj = open(self.filename, 'w')
and from there we can see what we have for differences?
I get:
test 'Bl\xc3\xb8d'
...
'test' u'Bl\xf8d'
...
[... ('break-me', [('test', u'Bl\xf8d')]),...]
comment:18 by , 15 years ago
Milestone: | → 0.11.6 |
---|---|
Owner: | set to |
Status: | reopened → new |
I think we actually don't do it yet for keys or even section names, and that counts as a bug ;-)
After chatting with Andrea, we found out (he found out actually ;-) ), that the above is really the cause of the backtrace: if a unicode input is given for the key in config.set(section, key, value), then it's stored as unicode in the ConfigParser and later the "%s = %s\n" % (key, val.encode('utf-8')) line will fail, as key being unicode here will force the conversion of the utf-8 str to unicode…
Working on a fix to accept unicode input on section and key arguments of the methods, as it should be.
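Editor's note: the root cause identified here can be sketched in Python 3 terms. In Python 2, `%` formatting that mixes a unicode argument with a byte-string argument promotes the result to unicode and decodes the byte string implicitly as ASCII. `py2_style_format` below is a hypothetical emulation of that rule, written only for illustration.

```python
def py2_style_format(template, args):
    """Hypothetical emulation of Python 2's '%' with mixed operands:
    if any argument is text, byte-string arguments are implicitly
    decoded as ASCII and the result is text; otherwise it stays bytes."""
    if any(isinstance(a, str) for a in args):
        args = tuple(a.decode('ascii') if isinstance(a, bytes) else a
                     for a in args)
        return template % args
    return template.encode('ascii') % args


val = u'Bl\xf8d'.encode('utf-8')

# Byte-string key: everything stays bytes and the line is written fine.
ok = py2_style_format('%s = %s\n', (b'blood.label', val))

# Text key: the UTF-8 value gets decoded as ASCII and fails.
try:
    py2_style_format('%s = %s\n', (u'blood.label', val))
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Encoding the key first avoids the implicit promotion.
fixed = py2_style_format('%s = %s\n', (u'blood.label'.encode('utf-8'), val))
```

This matches the observation earlier in the thread that a script passing a plain 'test' key worked while the web form, which delivers unicode keys, did not.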
comment:19 by , 15 years ago
Thanks Christian, in the meanwhile for the AgiloForScrum users, there will be a new release fixing this issue today… around 16:00 GMT+2 :-)
by , 15 years ago
Attachment: | t8276-unicode-for-sections-and-keys-r8424.diff added |
---|
Consistently handle conversion to and from utf-8 when storing resp. retrieving from the ConfigParser, not only for values but also for section and key names.
comment:20 by , 15 years ago
Cc: | added |
---|---|
Keywords: | review added |
Status: | new → assigned |
With attachment:t8276-unicode-for-sections-and-keys-r8424.diff, the section names and key names can be given to the Config and Section methods as unicode and they will be properly encoded to UTF-8 strings before being stored in the parser.
I've also renamed the name method argument to key, in order to better distinguish it from the section name (self.name in Section class).
Please review and test.
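Editor's note: the idea behind the attachment, normalizing section names, keys and values to UTF-8 at the storage boundary, might look roughly like the sketch below. It is not the actual patch: `to_utf8` and `Utf8Store` are hypothetical names, and the class is a toy stand-in for a byte-oriented parser rather than Trac's Configuration.

```python
def to_utf8(value):
    """Hypothetical helper: encode text to UTF-8 bytes, leave bytes alone."""
    return value.encode('utf-8') if isinstance(value, str) else value


class Utf8Store(object):
    """Toy byte-oriented store; every input is normalized on the way in."""

    def __init__(self):
        self._sections = {}

    def set(self, section, key, value):
        options = self._sections.setdefault(to_utf8(section), {})
        options[to_utf8(key)] = to_utf8(value)

    def dump(self):
        # Everything is bytes here, so no implicit conversion can occur.
        lines = []
        for section in sorted(self._sections):
            lines.append(b'[' + section + b']\n')
            for key, val in sorted(self._sections[section].items()):
                lines.append(key + b' = ' + val + b'\n')
        return b''.join(lines)


store = Utf8Store()
store.set(u'ticket-custom', u'blood.label', u'Bl\xf8d')
store.set(b'ticket-custom', b'blood', b'text')
output = store.dump()
```

Normalizing once at the boundary means callers can pass either type without the writer ever mixing the two.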
comment:21 by , 15 years ago
Works for me thanks :-) I had to rename the call in a couple of places… but works fine now ;-)
comment:22 by , 15 years ago
Patch makes sense, tests pass. I've installed it on my dev setup with various plugins+++, and all seems to work well.
Should be OK to commit as far as I can tell.
comment:23 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Great, thanks for the review both of you!
Patch applied in [8458].
comment:24 by , 15 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Just found one problem with the change.
Now the config keys are returned as unicode objects, which is good except in some places where unicode objects can't be used, e.g. in the following code:
Traceback (most recent call last):
  File "C:\Workspace\src\trac\repos\trunk\trac\web\main.py", line 467, in _dispatch_request
    dispatcher.dispatch(req)
  File "C:\Workspace\src\trac\repos\trunk\trac\web\main.py", line 212, in dispatch
    resp = chosen_handler.process_request(req)
  File "C:\Workspace\src\trac\repos\trunk\trac\ticket\web_ui.py", line 190, in process_request
    return self._process_ticket_request(req)
  File "C:\Workspace\src\trac\repos\trunk\trac\ticket\web_ui.py", line 540, in _process_ticket_request
    get_reporter_id(req, 'author'), field_changes)
  File "C:\Workspace\src\trac\repos\trunk\trac\ticket\web_ui.py", line 1231, in _insert_ticket_data
    fields = self._prepare_fields(req, ticket)
  File "C:\Workspace\src\trac\repos\trunk\trac\ticket\web_ui.py", line 1132, in _prepare_fields
    field['rendered'] = self._query_link(req, name, ticket[name])
  File "C:\Workspace\src\trac\repos\trunk\trac\ticket\web_ui.py", line 1103, in _query_link
    return tag.a(text or value, href=req.href.query(**args))
TypeError: <lambda>() keywords must be strings
args is the following:
{'status': u'!closed', u'customer': ...
Before, it used to be simply 'customer' (it's the name of ticket custom field).
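Editor's note: the constraint behind this traceback is that keyword names passed with `**` must be of the interpreter's native string type. In Python 3 the same TypeError can be reproduced with `bytes` keys, which is the mirror image of the unicode-key case reported here. The sketch below assumes that analogy; `query`, `native_keys` and the 'acme' value are hypothetical, and this is not necessarily how the committed fix works.

```python
def query(**params):
    """Hypothetical stand-in for a keyword-driven builder like req.href.query."""
    return sorted(params.items())


def native_keys(mapping):
    """Hypothetical helper: coerce every key to the native str type."""
    return dict((k.decode('utf-8') if isinstance(k, bytes) else str(k), v)
                for k, v in mapping.items())


# One key is not a native string ('acme' is an illustrative value only).
args = {'status': u'!closed', b'customer': u'acme'}

try:
    query(**args)
    error = None
except TypeError as exc:
    error = exc            # keywords must be strings

result = query(**native_keys(args))
```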
comment:25 by , 15 years ago
Cc: | added |
---|
comment:26 by , 15 years ago
So after the merge of r8458 as a part of r8469, I also committed the fix for the error shown in comment:24, as r8471.
This particular error was only concerning trunk, but I also spotted a few other places which could benefit from a similar fix. The question is, should r8471 (and the side fix for Href in r8470) be backported for 0.11.6dev?
comment:27 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |