Edgewall Software

Opened 14 years ago

Closed 14 years ago

#1821 closed defect (fixed)

Accentuated characters prevent Trac from rendering the ticket

Reported by: Emmanuel Blot
Owned by: Christian Boos
Priority: normal
Milestone: 0.9
Component: ticket system
Version: devel
Severity: major
Keywords:
Cc:
Branch:
Release Notes:
API Changes:


For example, if an accented 'a' character (I won't submit the actual character here ;-)) is entered in a comment, Trac accepts it and stores it in the SQL database, but then fails to render it:

Trac detected an internal error:

'ascii' codec can't encode character u'\xe2' in position 0: ordinal not in range(128)

Python traceback

Traceback (most recent call last):
  File "/local/engine/trac/trac/web/modpython_frontend.py", line 274, in handler
    dispatch_request(mpr.path_info, mpr, env)
  File "/local/engine/trac/trac/web/main.py", line 425, in dispatch_request
  File "/local/engine/trac/trac/web/main.py", line 285, in dispatch
    resp = chosen_handler.process_request(req)
  File "/local/engine/trac/trac/ticket/web_ui.py", line 194, in process_request
    self._insert_ticket_data(req, db, ticket, reporter_id)
  File "/local/engine/trac/trac/ticket/web_ui.py", line 379, in _insert_ticket_data
    changes[-1]['comment'] = wiki_to_html(new, self.env, req, db)
  File "/local/engine/trac/trac/wiki/formatter.py", line 668, in wiki_to_html
    Formatter(env, req, absurls, db).format(wikitext, out, escape_newlines)
  File "/local/engine/trac/trac/wiki/formatter.py", line 557, in format
    line = util.escape(line, False)
  File "/local/engine/trac/trac/util.py", line 55, in escape
    text = str(text).replace('&', '&amp;') \
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in position 0: ordinal not in range(128)

This means that once someone has submitted such a character, the ticket can no longer be rendered.

I do not know whether this issue also affects other components (it seems to occur whenever wiki text is rendered).

Attachments (0)

Change History (12)

comment:1 by Emmanuel Blot, 14 years ago

I forgot to specify the Trac version: pre-0.9, [2011]

comment:2 by Christian Boos, 14 years ago

Owner: changed from Jonas Borgström to Christian Boos
Status: changed from new to assigned

Bizarre, ça marche pour moâ ;-) [Strange, it works for me] (trac r2013)

Maybe a problem with your default encoding?

comment:3 by Christopher Lenz, 14 years ago

No, this is a problem with PySQLite 2.x returning unicode objects instead of UTF-8 strings. PySQLite 2 support is still experimental at this point, and the encoding issue is the reason. This problem can also be encountered in the search module.

comment:4 by Christian Boos, 14 years ago

It shouldn't be the default encoding (mine is ascii, so if that were the problem, I'd see it too).

Are you using pysqlite2?

comment:5 by Christopher Lenz, 14 years ago

To be more specific, this is a problem with PySQLite/SQLite losing column type information in SQL UNIONs. See PySQLite ticket #102. We register a converter for text columns that encodes unicode objects to utf-8 strings (technically, it just returns the original bytestring), but that converter doesn't get triggered in queries with unions, as used by the search module and, as in this case, for generating the ticket changelog.
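To illustrate the mechanism, here is a minimal sketch using the `sqlite3` module from today's Python standard library rather than the pysqlite2 build discussed here. The table name `comment_demo` is made up for the example, and an expression column stands in for the UNION case, on the assumption that both reach the driver without a declared type. Whether a particular SQLite release still drops the type in a UNION is not something this sketch verifies.

```python
import sqlite3

# Converter keyed on the declared column type "text": hand back the raw
# UTF-8 bytes instead of a decoded string.
sqlite3.register_converter("text", bytes)


def fetch_one(conn, sql):
    """Run a query and return the first column of the first row."""
    return conn.execute(sql).fetchone()[0]


# PARSE_DECLTYPES makes the driver pick converters from declared types.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE comment_demo (body text)")  # hypothetical table
conn.execute("INSERT INTO comment_demo VALUES (?)", ("\u00e2",))

# Plain column: the declared type is known, so the converter runs.
plain = fetch_one(conn, "SELECT body FROM comment_demo")

# Expression column: no declared type is reported, so the converter is
# skipped and the driver falls back to its default text handling.
untyped = fetch_one(conn, "SELECT body || '' FROM comment_demo")

print(type(plain).__name__, type(untyped).__name__)
```

The two queries read the same stored value but come back as different Python types, which is the kind of inconsistency described above.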

cboos, I think it's good style to only accept a ticket when you think you have understood the cause of the problem, and hopefully know a way to fix it.

comment:6 by Christian Boos, 14 years ago

I initially thought it was a simple matter of default encoding, and progressively realized it has to do with pysqlite2…

Nevertheless, this problem does have something to do with the default encoding, as the str(text) call raises the error only because the default encoding is ascii. If the default encoding were utf8, things would work as expected.
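A small sketch of that point, written for Python 3 where the encoding step has to be spelled out. In Python 2, `str(u)` on a unicode object implicitly encoded it with the default encoding; the helper `encode_like_str` below is a hypothetical stand-in for that implicit step, not a Trac function.

```python
def encode_like_str(text, default_encoding):
    """Mimic Python 2's str(unicode_obj): encode with the default codec."""
    return text.encode(default_encoding)


comment = "\u00e2"  # 'a' with circumflex, the character from the traceback

try:
    encode_like_str(comment, "ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    # Same error class and wording as in the traceback above.
    ascii_ok = False
    print(exc)

# With a UTF-8 default the same call succeeds.
utf8_bytes = encode_like_str(comment, "utf-8")
print(ascii_ok, utf8_bytes)
```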

comment:7 by Christian Boos, 14 years ago

installed pysqlite2, reproduced the problem, quick hack… yes, it works.

Index: trac/db.py
--- trac/db.py  (revision 2013)
+++ trac/db.py  (working copy)
@@ -214,7 +214,15 @@
             # we need two converters
             sqlite.register_converter('text', str)
             sqlite.register_converter('TEXT', str)
+            # ...but this is not enough, as type information is lost
+            # in queries using UNION.
+            # Changing the default encoding makes `str(x)` succeed
+            # when `x` is an unicode string.
+            import sys
+            reload(sys)
+            sys.setdefaultencoding('utf8')
+            del sys.setdefaultencoding
             cnx = sqlite.connect(path, detect_types=sqlite.PARSE_DECLTYPES,
                                  check_same_thread=False, timeout=timeout)

Disclaimer: I didn't invent the ugly reload/del thing :)

An alternative would be to use type annotations for the affected queries, but that will affect portability, I guess.
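For reference, a sketch of what such a type annotation could look like with the `sqlite3` module in the current Python standard library (not the pysqlite2 build used here). The tables `a` and `b` are invented for the example. The `"name [type]"` column alias is understood only by this driver when `PARSE_COLNAMES` is enabled, which is presumably where the portability cost comes from.

```python
import sqlite3

# Converter for the type name "text": return the raw UTF-8 bytes.
sqlite3.register_converter("text", bytes)

conn = sqlite3.connect(
    ":memory:",
    # PARSE_COLNAMES lets a column alias such as "body [text]" pick the
    # converter, independently of the declared column type.
    detect_types=sqlite3.PARSE_DECLTYPES | sqlite3.PARSE_COLNAMES,
)
conn.execute("CREATE TABLE a (body text)")  # hypothetical tables
conn.execute("CREATE TABLE b (body text)")
conn.execute("INSERT INTO a VALUES (?)", ("\u00e2",))
conn.execute("INSERT INTO b VALUES (?)", ("\u00e9",))

# The annotation goes on the leftmost SELECT, which names the result column.
annotated_sql = (
    'SELECT body AS "body [text]" FROM a '
    "UNION SELECT body FROM b"
)
annotated_rows = [row[0] for row in conn.execute(annotated_sql)]
print(annotated_rows)
```

Every affected query would need such an alias, so this scales poorly compared with a fix at the connection level.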

OTOH, there seem to be no adverse side effects with the above hack. What do you think?

comment:8 by Jonas Borgström, 14 years ago

pysqlite2 sometimes returns strings as unicode objects instead of the utf-8 encoded "plain" strings used everywhere else in Trac, which causes things to break. The sys.setdefaultencoding hack might work but is not the correct way to fix this. Please don't apply this patch.

comment:9 by Jonas Borgström, 14 years ago

I've added a workaround in [2023] that makes sure pysqlite2 never returns unicode strings. If this works and does not introduce any unacceptable performance penalty, we can probably close this ticket.
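The contents of [2023] are not reproduced in this ticket, so the following is only a sketch of the general idea (normalise text at the database boundary so callers always see UTF-8 byte strings), written against the `sqlite3` module in the current Python standard library. It relies on the connection's `text_factory` attribute, and the tables `a` and `b` are invented for the example. It should not be read as what the changeset actually does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Return every TEXT value as raw UTF-8 bytes, whatever the query shape,
# instead of relying on per-column type information.
conn.text_factory = bytes

conn.execute("CREATE TABLE a (body text)")  # hypothetical tables
conn.execute("CREATE TABLE b (body text)")
conn.execute("INSERT INTO a VALUES (?)", ("\u00e2",))
conn.execute("INSERT INTO b VALUES (?)", ("\u00e9",))

# A UNION query, the shape that caused trouble for the changelog and search.
union_rows = [
    row[0]
    for row in conn.execute("SELECT body FROM a UNION SELECT body FROM b")
]
print(union_rows)
```

Because the setting applies to the whole connection, no query needs to be annotated and the process-wide default encoding is left alone.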

comment:10 by Emmanuel Blot, 14 years ago

I'm not sure I understand the proper way to fix this issue: do I need to change the default encoding after all (and where is the default setting defined)?


comment:11 by Christian Boos, 14 years ago

Normally upgrading to the latest revision (r2027) should be enough. If it works afterwards, feel free to close this ticket.

comment:12 by Emmanuel Blot, 14 years ago

Resolution: fixed
Status: changed from assigned to closed

I'm closing this ticket, since the fix seems to work gracefully.
Test performed w/ [2033].
