#3787 closed defect (worksforme)
Wiki: error with utf8 char
Reported by: | anonymous | Owned by: | Jonas Borgström |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | wiki system | Version: | 0.10 |
Severity: | normal | Keywords: | utf8 codec |
Cc: | | Branch: | |
Release Notes: | | | |
API Changes: | | | |
Internal Changes: | | | |
Description (last modified by )
- Problem:
When I create a text such as "Je suis là" or "Thanh Hà" with heading 1 format, an exception is raised.
- Python traceback:
Traceback (most recent call last):
  File "C:\WWWServer\python\Lib\site-packages\trac\web\modpython_frontend.py", line 206, in handler
    dispatch_request(mpr.path_info, mpr, env)
  File "C:\WWWServer\python\Lib\site-packages\trac\web\main.py", line 139, in dispatch_request
    dispatcher.dispatch(req)
  File "C:\WWWServer\python\Lib\site-packages\trac\web\main.py", line 107, in dispatch
    resp = chosen_handler.process_request(req)
  File "C:\WWWServer\python\Lib\site-packages\trac\wiki\web_ui.py", line 116, in process_request
    self._render_view(req, db, page)
  File "C:\WWWServer\python\Lib\site-packages\trac\wiki\web_ui.py", line 364, in _render_view
    req.hdf['wiki.page_html'] = wiki_to_html(page.text, self.env, req)
  File "C:\WWWServer\python\Lib\site-packages\trac\wiki\formatter.py", line 744, in wiki_to_html
    Formatter(env, req, absurls, db).format(wikitext, out, escape_newlines)
  File "C:\WWWServer\python\Lib\site-packages\trac\wiki\formatter.py", line 599, in format
    result = re.sub(self.rules, self.replace, line)
  File "c:\wwwserver\python\lib\sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\WWWServer\python\Lib\site-packages\trac\wiki\formatter.py", line 221, in replace
    return getattr(self, '_' + itype + '_formatter')(match, fullmatch)
  File "C:\WWWServer\python\Lib\site-packages\trac\wiki\formatter.py", line 389, in _heading_formatter
    anchor = self._anchor_re.sub('', sans_markup.decode('utf-8'))
  File "C:\WWWServer\python\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 5: unexpected end of data
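The "unexpected end of data" message on byte 0xc3 is what the UTF-8 codec reports when a multi-byte character is cut off after its lead byte. The following is only a minimal sketch of that mechanism in modern Python 3, not a reproduction of Trac's formatter; the helper name `describe_decode_failure` and the truncation step are assumptions made for illustration.

```python
def describe_decode_failure(raw):
    """Return (offending_byte, position, reason) if raw is not valid UTF-8, else None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return raw[exc.start], exc.start, exc.reason
    return None


# "à" (U+00E0) is encoded in UTF-8 as the two bytes 0xC3 0xA0.
heading = "Thanh H\u00e0".encode("utf-8")

# Hypothetical truncation: drop the final continuation byte (0xA0),
# leaving a dangling 0xC3 lead byte at the end of the string.
truncated = heading[:-1]

print(describe_decode_failure(heading))    # intact bytes decode cleanly
print(describe_decode_failure(truncated))  # dangling lead byte is reported
```

Whether the heading text was really truncated in this way before `decode('utf-8')` was called cannot be confirmed from the traceback alone.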
Attachments (0)
Change History (10)
comment:1 by , 18 years ago
Description: | modified (diff) |
---|---|
Milestone: | 0.9.7 |
Priority: | highest → normal |
Resolution: | → worksforme |
Severity: | blocker → normal |
Status: | new → closed |
comment:2 by , 18 years ago
Are you sure you're running 0.9.6? Looks like a duplicate of #3058 which I fixed for milestone:0.9.6.
follow-up: 4 comment:3 by , 18 years ago
Keywords: | utf8 codec added |
---|---|
Resolution: | worksforme |
Status: | closed → reopened |
Summary: | Wiki: error with utf8 char at the end of heading #1 → Wiki: error with utf8 char |
Version: | 0.9.6 → 0.10 |
Not certain what character is causing me grief but the symptom is identical: In reports I get:
Report execution failed: 'utf8' codec can't decode byte 0x85 in position 157: unexpected code byte
And viewing the ticket I get:
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/trac/web/main.py", line 356, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.4/site-packages/trac/web/main.py", line 224, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.4/site-packages/trac/ticket/web_ui.py", line 302, in process_request
    get_reporter_id(req, 'author'))
  File "/usr/lib/python2.4/site-packages/trac/ticket/web_ui.py", line 615, in _insert_ticket_data
    for change in self.grouped_changelog_entries(ticket, db):
  File "/usr/lib/python2.4/site-packages/trac/ticket/web_ui.py", line 662, in grouped_changelog_entries
    changelog = ticket.get_changelog(when=when, db=db)
  File "/usr/lib/python2.4/site-packages/trac/ticket/model.py", line 299, in get_changelog
    for t, author, field, oldvalue, newvalue, permanent in cursor:
  File "/usr/lib/python2.4/site-packages/trac/db/util.py", line 40, in __iter__
    row = self.cursor.fetchone()
  File "/usr/lib/python2.4/site-packages/trac/db/sqlite_backend.py", line 73, in fetchone
    return row and self._convert_row(row) or None
  File "/usr/lib/python2.4/site-packages/trac/db/sqlite_backend.py", line 69, in _convert_row
    return tuple([(isinstance(v, str) and [v.decode('utf-8')] or [v])[0]
  File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 20-22: invalid data
We're running 0.10. Thanks.
comment:4 by , 18 years ago
Replying to rob@widerweb.co.uk:
And viewing the ticket I get:
I think it might help if we know exactly what bytes are in that ticket. If the data is not sensitive, can you run
echo "select * from ticket where id=32" | sqlite3 /var/trac/path/to/trac.db | hexdump
echo "select * from ticket_change where ticket=32" | sqlite3 /var/trac/path/to/trac.db | hexdump
replacing the ticket number and db path with your own?
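If `hexdump` is not at hand, roughly the same information can be pulled out with Python's standard `sqlite3` module. This is a sketch in modern Python 3 rather than the Python 2 of the tracebacks above; the helper name `dump_rows_as_hex` is made up for illustration, and the table and column layout is simply whatever the query returns. Setting `text_factory = bytes` makes SQLite hand back the stored bytes untouched instead of trying to decode them, so it never hits the decode error itself.

```python
import sqlite3


def dump_rows_as_hex(conn, sql, params=()):
    """Run a query and return each column value as a line of hex bytes.

    Text columns are fetched as raw bytes (no UTF-8 decoding), so rows
    holding invalid UTF-8 can still be inspected. Non-text values
    (integers, NULL, ...) are shown with repr().
    """
    conn.text_factory = bytes  # return TEXT columns as undecoded bytes
    lines = []
    for row in conn.execute(sql, params):
        for value in row:
            if isinstance(value, bytes):
                lines.append(" ".join(f"{b:02x}" for b in value))
            else:
                lines.append(repr(value))
    return lines


# Example usage (path and ticket number are placeholders, as in the
# shell commands above):
#
#   conn = sqlite3.connect("/var/trac/path/to/trac.db")
#   for line in dump_rows_as_hex(conn, "select * from ticket where id=?", (32,)):
#       print(line)
```

Unlike plain `hexdump`, this prints one byte at a time in storage order, so there is no question of byte order in the output.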
comment:5 by , 18 years ago
I have run into a similar issue - the relevant bytes are:
[[[ 61 4C 69 74 75 74 65 64 3A 20 2B 20 38 33 33 2E 33 36 33 37 5B B0 62 5B 5D 72 0A 5D 6F 4C 67 6E 74 69 64 75 3A 65 2D 20 2E 30 34 34 36 34 31 35 5B 5B 72 62 5D 5D ]]]
And the "special" character was the degree (°) symbol.
Converting the above hex block to text shows that the text appears to be stored with each pair of bytes swapped ("aLituted" for "Latitude"), and the degree symbol is encoded as a Latin-1 character (0xB0) rather than in its UTF-8 form (0xC2 0xB0). I'm not sure if either of those is to be expected - the server was running on a little-endian machine, with Trac 0.10.
The error I'm getting when trying to view this ticket is "codec can't decode byte 0xb0 in position 674", so presumably the data has been stored as non-UTF8 but is being interpreted as UTF8 on retrieval.
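The hex block above can be checked mechanically. The sketch below (modern Python 3; the helper names are made up for illustration) swaps each pair of bytes back and then tries both decodings. One assumption worth flagging: plain `hexdump` prints 16-bit words in host byte order by default, so on a little-endian machine the swapped look may come from the dump itself rather than from how the data is stored.

```python
# Bytes exactly as quoted in the comment above.
HEX_DUMP = (
    "61 4C 69 74 75 74 65 64 3A 20 2B 20 38 33 33 2E 33 36 33 37 "
    "5B B0 62 5B 5D 72 0A 5D 6F 4C 67 6E 74 69 64 75 3A 65 2D 20 "
    "2E 30 34 34 36 34 31 35 5B 5B 72 62 5D 5D"
)


def swap_byte_pairs(data):
    """Swap every pair of adjacent bytes; a trailing odd byte is kept as-is."""
    out = bytearray()
    for i in range(0, len(data) - 1, 2):
        out += bytes((data[i + 1], data[i]))
    if len(data) % 2:
        out.append(data[-1])
    return bytes(out)


def decodes_as_utf8(data):
    """Return True if data is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


raw = bytes.fromhex(HEX_DUMP)
swapped = swap_byte_pairs(raw)

# Latin-1 maps every byte to a character, so this decode cannot fail.
text = swapped.decode("latin-1")

print(decodes_as_utf8(swapped))  # is the un-swapped data valid UTF-8?
print(text)                      # the same bytes read as Latin-1
```

Read as Latin-1 the bytes give sensible text including the degree sign, while the UTF-8 check fails on the lone 0xB0, which is consistent with data stored as Latin-1 and decoded as UTF-8 on retrieval.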
comment:6 by , 18 years ago
I have found this problem on my trac environment too:
...
  File "/usr/lib/python2.3/site-packages/trac/ticket/model.py", line 541, in select
    for name, owner, description in cursor:
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 40, in __iter__
    row = self.cursor.fetchone()
  File "/usr/lib/python2.3/site-packages/pyPgSQL/PgSQL.py", line 3139, in fetchone
    return self.__fetchOneRow()
  File "/usr/lib/python2.3/site-packages/pyPgSQL/PgSQL.py", line 2776, in __fetchOneRow
    _r.getvalue(self._idx_, _i)))
  File "/usr/lib/python2.3/site-packages/pyPgSQL/PgSQL.py", line 821, in typecast
    return unicode(value, *self.__conn.client_encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb1 in position 20: unexpected code byte
I cleared all national characters from the ticket and ticket_change tables, and then found the same problem in the component table. After clearing the component table it works fine, but with no Polish accents :(
Thanks a lot for great software.
follow-up: 8 comment:7 by , 18 years ago
Ok, I will add my voice to this. We are seeing this error on wiki pages that we have created. Interestingly, it does not seem to be consistent: sometimes we get an error trying to access a certain page, and at other times we can display the same page fine. We can't see any pattern, and we have tested this with multiple OS/browser combinations.
The one thing that seems to fail consistently is accessing TracLinks#QuotingspaceinLinks (which is part of the TracGuide).
The error is:
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 356, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 224, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.3/site-packages/trac/wiki/web_ui.py", line 97, in process_request
    page = WikiPage(self.env, pagename, version, db)
  File "/usr/lib/python2.3/site-packages/trac/wiki/model.py", line 32, in __init__
    self._fetch(name, version, db)
  File "/usr/lib/python2.3/site-packages/trac/wiki/model.py", line 53, in _fetch
    (name,))
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 47, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 47, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/MySQLdb/cursors.py", line 163, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
    raise errorclass, errorvalue
UnicodeDecodeError: 'utf8' codec can't decode byte 0x97 in position 140: unexpected code byte
comment:8 by , 18 years ago
I am having this same error.
MySQL 5, Python 2.4, Apache 2.0.59
Replying to anonymous:
Ok, I will add my voice to this. We are seeing this error on wiki pages that we have created. Interestingly, it does not seem to be consistent: sometimes we get an error trying to access a certain page, and at other times we can display the same page fine. We can't see any pattern, and we have tested this with multiple OS/browser combinations.
The one thing that seems to fail consistently is accessing TracLinks#QuotingspaceinLinks (which is part of the TracGuide).
The error is:
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 356, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 224, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.3/site-packages/trac/wiki/web_ui.py", line 97, in process_request
    page = WikiPage(self.env, pagename, version, db)
  File "/usr/lib/python2.3/site-packages/trac/wiki/model.py", line 32, in __init__
    self._fetch(name, version, db)
  File "/usr/lib/python2.3/site-packages/trac/wiki/model.py", line 53, in _fetch
    (name,))
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 47, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 47, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/MySQLdb/cursors.py", line 163, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
    raise errorclass, errorvalue
UnicodeDecodeError: 'utf8' codec can't decode byte 0x97 in position 140: unexpected code byte
follow-up: 10 comment:9 by , 18 years ago
Resolution: | → worksforme |
---|---|
Status: | reopened → closed |
So, to summarize:
- the original issue reported here is really a duplicate of #3058
- comment:3 is a duplicate of #4037 (and probably comment:5 as well)
- comment:6 also looks like a duplicate of #4037, although the database being PostgreSQL, it might be something else
- comment:7 and comment:8 are probably symptoms of the classic MySQLdb incompatibilities with non-UTF-8 charsets, see #3884.
comment:10 by , 18 years ago
cboos, comment:3 is indeed a duplicate of #4037, caused by importing defects from Excel (stop sniggering at the back there). And the patch suggested in comment:ticket:4037:1 fixes our problem perfectly. For such an easy fix, I would support it being Trac's official line to tolerate imported data; it must be such a common mistake.
Thanks for the best change management system I've ever used.
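For readers wondering what "tolerating imported data" might look like in practice, here is a minimal sketch in modern Python 3. It is not the patch from #4037, whose contents are not shown in this ticket; the function name `to_unicode_tolerant` and the choice of Latin-1 as the fallback encoding are assumptions made for illustration only.

```python
def to_unicode_tolerant(raw, fallback="latin-1"):
    """Decode bytes as UTF-8, falling back to another charset on failure.

    Latin-1 is used as the (assumed) fallback because it maps every byte
    value to a character, so the second decode can never raise.
    """
    if isinstance(raw, str):
        return raw  # already text, nothing to do
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# A properly encoded degree sign (0xC2 0xB0) and a Latin-1 one (0xB0)
# both come out as the same text instead of raising.
print(to_unicode_tolerant(b"38\xc2\xb0"))
print(to_unicode_tolerant(b"38\xb0"))
```

The trade-off is that bytes in some other legacy charset will decode without error but may show the wrong characters, so a fallback like this hides the problem rather than fixing the stored data.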
Trac 0.10 has much improved support for unicode data, so this is almost certainly fixed there. There isn't going to be a 0.9.7 release (except possibly for a security fix), so you can upgrade to the 0.10rc1 release available now, or wait until the final 0.10 release later this week.