Opened 18 years ago

Closed 18 years ago

Last modified 19 months ago

#3787 closed defect (worksforme)

Wiki: error with utf8 char

Reported by: anonymous Owned by: Jonas Borgström
Priority: normal Milestone:
Component: wiki system Version: 0.10
Severity: normal Keywords: utf8 codec
Cc: Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Matthew Good)

  • Problem:

When I create a level-1 heading containing text such as "Je suis là" or "Thanh Hà", an exception is raised.

  • Python traceback:
    Traceback (most recent call last):
      File "C:\WWWServer\python\Lib\site-packages\trac\web\modpython_frontend.py", line 206, in handler
        dispatch_request(mpr.path_info, mpr, env)
      File "C:\WWWServer\python\Lib\site-packages\trac\web\main.py", line 139, in dispatch_request
        dispatcher.dispatch(req)
      File "C:\WWWServer\python\Lib\site-packages\trac\web\main.py", line 107, in dispatch
        resp = chosen_handler.process_request(req)
      File "C:\WWWServer\python\Lib\site-packages\trac\wiki\web_ui.py", line 116, in process_request
        self._render_view(req, db, page)
      File "C:\WWWServer\python\Lib\site-packages\trac\wiki\web_ui.py", line 364, in _render_view
        req.hdf['wiki.page_html'] = wiki_to_html(page.text, self.env, req)
      File "C:\WWWServer\python\Lib\site-packages\trac\wiki\formatter.py", line 744, in wiki_to_html
        Formatter(env, req, absurls, db).format(wikitext, out, escape_newlines)
      File "C:\WWWServer\python\Lib\site-packages\trac\wiki\formatter.py", line 599, in format
        result = re.sub(self.rules, self.replace, line)
      File "c:\wwwserver\python\lib\sre.py", line 142, in sub
        return _compile(pattern, 0).sub(repl, string, count)
      File "C:\WWWServer\python\Lib\site-packages\trac\wiki\formatter.py", line 221, in replace
        return getattr(self, '_' + itype + '_formatter')(match, fullmatch)
      File "C:\WWWServer\python\Lib\site-packages\trac\wiki\formatter.py", line 389, in _heading_formatter
        anchor = self._anchor_re.sub('', sans_markup.decode('utf-8'))
      File "C:\WWWServer\python\lib\encodings\utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 5: unexpected end of data
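The "unexpected end of data" message means the byte string handed to the decoder ended right after the lead byte 0xc3 of a two-byte UTF-8 sequence. The following is a minimal sketch of that failure mode, written for Python 3 rather than the Python 2 shown in the traceback. It only illustrates the decoder's behaviour on a string cut in the middle of "à"; how Trac arrived at such a string is not shown here, and the variable names are illustrative.

```python
# Sketch: decoding UTF-8 bytes that were cut in the middle of a character.
text = "Je suis là"
encoded = text.encode("utf-8")      # "à" becomes the two bytes 0xc3 0xa0
truncated = encoded[:-1]            # drop 0xa0, leaving a lone 0xc3 at the end

error = None
try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc                     # keep a reference; 'exc' is cleared after the block
    print(exc)
```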
    

Attachments (0)

Change History (10)

comment:1 by Matthew Good, 18 years ago

Description: modified (diff)
Milestone: 0.9.7
Priority: highest → normal
Resolution: worksforme
Severity: blocker → normal
Status: new → closed

Trac 0.10 has much improved support for unicode data, so this is almost certainly fixed there. There isn't going to be a 0.9.7 release (except possibly for a security fix), so you can upgrade to the 0.10rc1 release available now, or wait until the final 0.10 release later this week.

comment:2 by Christian Boos, 18 years ago

Are you sure you're running 0.9.6? Looks like a duplicate of #3058 which I fixed for milestone:0.9.6.

comment:3 by rob@…, 18 years ago

Keywords: utf8 codec added
Resolution: worksforme
Status: closed → reopened
Summary: Wiki: error with utf8 char at the end of heading #1 → Wiki: error with utf8 char
Version: 0.9.6 → 0.10

Not certain what character is causing me grief but the symptom is identical: In reports I get:

Report execution failed: 'utf8' codec can't decode byte 0x85 in position 157: unexpected code byte

And viewing the ticket I get:

Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/trac/web/main.py", line 356, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.4/site-packages/trac/web/main.py", line 224, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.4/site-packages/trac/ticket/web_ui.py", line 302, in process_request
    get_reporter_id(req, 'author'))
  File "/usr/lib/python2.4/site-packages/trac/ticket/web_ui.py", line 615, in _insert_ticket_data
    for change in self.grouped_changelog_entries(ticket, db):
  File "/usr/lib/python2.4/site-packages/trac/ticket/web_ui.py", line 662, in grouped_changelog_entries
    changelog = ticket.get_changelog(when=when, db=db)
  File "/usr/lib/python2.4/site-packages/trac/ticket/model.py", line 299, in get_changelog
    for t, author, field, oldvalue, newvalue, permanent in cursor:
  File "/usr/lib/python2.4/site-packages/trac/db/util.py", line 40, in __iter__
    row = self.cursor.fetchone()
  File "/usr/lib/python2.4/site-packages/trac/db/sqlite_backend.py", line 73, in fetchone
    return row and self._convert_row(row) or None
  File "/usr/lib/python2.4/site-packages/trac/db/sqlite_backend.py", line 69, in _convert_row
    return tuple([(isinstance(v, str) and [v.decode('utf-8')] or [v])[0]
  File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 20-22: invalid data

We're running 0.10. Thanks.

in reply to:  3 comment:4 by Tim Hatch <trac@…>, 18 years ago

Replying to rob@widerweb.co.uk:

And viewing the ticket I get:

I think it might help if we know exactly what bytes are in that ticket. If the data is not sensitive, can you run

echo "select * from ticket where id=32" | sqlite3 /var/trac/path/to/trac.db | hexdump
echo "select * from ticket_change where ticket=32" | sqlite3 /var/trac/path/to/trac.db | hexdump

replacing the ticket number and db path with your own?
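For those without hexdump at hand, a similar check can be sketched in Python 3 with the standard sqlite3 module. This is an assumption-laden sketch, not part of Trac: the helper name find_invalid_utf8 is hypothetical, and the table name is interpolated into the SQL, so it must come from a trusted source (for example "ticket" or "ticket_change").

```python
import sqlite3

def find_invalid_utf8(conn, table):
    """Return (rowid, column, raw_bytes) for values that are not valid UTF-8.

    Hypothetical helper; 'table' must be a trusted table name.
    """
    conn.text_factory = bytes        # return TEXT values as raw bytes, undecoded
    cur = conn.execute('SELECT rowid, * FROM "%s"' % table)
    names = [d[0] for d in cur.description]
    bad = []
    for row in cur:
        # Skip the leading rowid; BLOB columns also arrive as bytes and are checked too.
        for name, value in zip(names[1:], row[1:]):
            if isinstance(value, bytes):
                try:
                    value.decode("utf-8")
                except UnicodeDecodeError:
                    bad.append((row[0], name, value))
    return bad

# Usage (path is a placeholder):
# conn = sqlite3.connect("/var/trac/path/to/trac.db")
# for rowid, column, raw in find_invalid_utf8(conn, "ticket_change"):
#     print(rowid, column, raw)
```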

comment:5 by anonymous, 18 years ago

I have run into a similar issue - the relevant bytes are:

[[[ 61 4C 69 74 75 74 65 64 3A 20 2B 20 38 33 33 2E 33 36 33 37 5B B0 62 5B 5D 72 0A 5D 6F 4C 67 6E 74 69 64 75 3A 65 2D 20 2E 30 34 34 36 34 31 35 5B 5B 72 62 5D 5D ]]]

And the "special" character was the degree (°) symbol.

Converting the above hex block to text shows that the text appears to be stored as swapped bytes ("aLituted" for "Latitude"), and the degree symbol is encoded as a Latin-1 character (0xB0) rather than UTF8 form (0xC2 0xB0). I'm not sure if either of those are to be expected - the server was running on a little-endian machine, with Trac 0.10.

The error I'm getting when trying to view this ticket is "codec can't decode byte 0xb0 in position 674", so presumably the data has been stored as non-UTF8 but is being interpreted as UTF8 on retrieval.
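The swapped pairs are consistent with hexdump's default output, which prints 16-bit words and therefore shows each byte pair reversed on a little-endian machine, so the stored text itself may well be in normal order. A small Python 3 sketch, assuming that explanation, swaps the pairs back and checks how the result decodes:

```python
# Sketch: undo the pairwise byte swap from the hex block above and test the encoding.
hex_dump = (
    "61 4C 69 74 75 74 65 64 3A 20 2B 20 38 33 33 2E 33 36 33 37 "
    "5B B0 62 5B 5D 72 0A 5D 6F 4C 67 6E 74 69 64 75 3A 65 2D 20 "
    "2E 30 34 34 36 34 31 35 5B 5B 72 62 5D 5D"
)
raw = bytes.fromhex(hex_dump)        # fromhex ignores the spaces

# Swap every pair of bytes (the block has an even number of bytes).
swapped = bytes(b for i in range(0, len(raw), 2) for b in (raw[i + 1], raw[i]))

try:
    swapped.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False                  # the lone 0xB0 is not valid UTF-8

text = swapped.decode("latin-1")     # Latin-1 maps 0xB0 to the degree sign
print(utf8_ok)
print(text)
```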

comment:6 by anonymous, 18 years ago

I have found this problem on my trac environment too:

...
  File "/usr/lib/python2.3/site-packages/trac/ticket/model.py", line 541, in select
    for name, owner, description in cursor:
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 40, in __iter__
    row = self.cursor.fetchone()
  File "/usr/lib/python2.3/site-packages/pyPgSQL/PgSQL.py", line 3139, in fetchone
    return self.__fetchOneRow()
  File "/usr/lib/python2.3/site-packages/pyPgSQL/PgSQL.py", line 2776, in __fetchOneRow
    _r.getvalue(self._idx_, _i)))
  File "/usr/lib/python2.3/site-packages/pyPgSQL/PgSQL.py", line 821, in typecast
    return unicode(value, *self.__conn.client_encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb1 in position 20: unexpected code byte

I cleared all national characters from the ticket and ticket_change tables, but the problem was also present in the component table. After clearing the component table as well, it works fine, but with no Polish accents :(

Thanks a lot for great software.

comment:7 by anonymous, 18 years ago

Ok, I will add my voice to this. We are seeing this error on wiki pages that we have created. Interestingly, it does not seem to be consistent: sometimes we get an error trying to access a certain page, and at other times the same page displays fine. We can't see any pattern, and we have tested this with multiple OS/browser combinations.

The one thing that seems to fail consistently is trying to access TracLinks#QuotingspaceinLinks (which is part of the TracGuide).

The error is:

Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 356, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 224, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.3/site-packages/trac/wiki/web_ui.py", line 97, in process_request
    page = WikiPage(self.env, pagename, version, db)
  File "/usr/lib/python2.3/site-packages/trac/wiki/model.py", line 32, in __init__
    self._fetch(name, version, db)
  File "/usr/lib/python2.3/site-packages/trac/wiki/model.py", line 53, in _fetch
    (name,))
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 47, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 47, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/MySQLdb/cursors.py", line 163, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
    raise errorclass, errorvalue
UnicodeDecodeError: 'utf8' codec can't decode byte 0x97 in position 140: unexpected code byte

in reply to:  7 comment:8 by anonymous, 18 years ago

I am having this same error.

MySQL 5, Python 2.4, Apache 2.0.59

comment:9 by Christian Boos, 18 years ago

Resolution: worksforme
Status: reopened → closed

So, to summarize:

  • the original issue reported here is really a duplicate of #3058
  • comment:3 is a duplicate of #4037 (and probably comment:5 as well)
  • comment:6 also looks like a duplicate of #4037, although the database being PostgreSQL, it might be something else
  • comment:7 and comment:8 are probably symptoms of the classical MySqlDb incompatibilities with non-utf8 charsets, see #3884.

in reply to:  9 comment:10 by rob@…, 18 years ago

cboos, comment:3 is indeed a duplicate of #4037, caused by importing defects from Excel (stop sniggering at the back there). And the patch suggested in comment:ticket:4037:1 fixes our problem perfectly. Since it is such an easy fix, I would support making it Trac's official line to tolerate imported data; it must be a very common mistake.

Thanks for the best change management system I've ever used.
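To illustrate what "tolerating imported data" could mean, here is a minimal Python 3 sketch of a tolerant decode. It is not the patch from #4037, whose content is not shown in this ticket. The function name tolerant_decode is hypothetical, and the Latin-1 fallback is an assumption; data exported from Excel on Windows may need a different fallback such as cp1252.

```python
def tolerant_decode(data, fallback="latin-1"):
    """Decode bytes as UTF-8, falling back to another charset on failure.

    Hypothetical helper; the fallback charset is a guess about the data's origin.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this default never raises.
        return data.decode(fallback)

print(tolerant_decode(b"38\xc2\xb0"))   # valid UTF-8 input
print(tolerant_decode(b"38\xb0"))       # Latin-1 input that is invalid as UTF-8
```

The trade-off is that wrongly encoded data is displayed rather than rejected, so a bad guess at the fallback charset shows up as wrong characters instead of an error.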
