#3058 closed defect (fixed)
Bug of the wiki compiler
Reported by: | Owned by: | Christian Boos | |
---|---|---|---|
Priority: | highest | Milestone: | 0.9.6 |
Component: | general | Version: | 0.9.5 |
Severity: | normal | Keywords: | unicode |
Cc: | Branch: | ||
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description (last modified by )
When compiling a page of one of my projects, the compiler crashes for no apparent reason. After several tests, I isolated the problem: it happens when I add the following string to the page: " === Indice di Priorità ==="
The Python traceback follows:
Traceback (most recent call last):
  File "C:\Python24\lib\site-packages\trac\web\standalone.py", line 303, in _do_trac_req
    dispatch_request(path_info, req, env)
  File "C:\Python24\lib\site-packages\trac\web\main.py", line 139, in dispatch_request
    dispatcher.dispatch(req)
  File "C:\Python24\lib\site-packages\trac\web\main.py", line 107, in dispatch
    resp = chosen_handler.process_request(req)
  File "C:\Python24\lib\site-packages\trac\wiki\web_ui.py", line 92, in process_request
    self._render_editor(req, db, page, preview=True)
  File "C:\Python24\lib\site-packages\trac\wiki\web_ui.py", line 311, in _render_editor
    info['page_html'] = wiki_to_html(page.text, self.env, req, db)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 744, in wiki_to_html
    Formatter(env, req, absurls, db).format(wikitext, out, escape_newlines)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 599, in format
    result = re.sub(self.rules, self.replace, line)
  File "C:\Python24\lib\sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 221, in replace
    return getattr(self, '_' + itype + '_formatter')(match, fullmatch)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 389, in _heading_formatter
    anchor = self._anchor_re.sub('', sans_markup.decode('utf-8'))
  File "C:\Python24\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 17: unexpected end of data
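The final error line can be reproduced in isolation. The following is a minimal sketch in Python 3 (not the Python 2.4 used in the ticket); the helper name `try_decode` is hypothetical, and the byte string is the truncated heading text that appears in the comments of this ticket. It shows that the lead byte 0xC3 with no continuation byte is what makes the decoder fail.

```python
# Python 3 sketch; try_decode is a hypothetical helper, not part of Trac.
truncated = b"Indice di Priorit\xc3"  # 0xC3 opens a two-byte sequence, but nothing follows


def try_decode(data):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc


error = try_decode(truncated)
print(error.start, error.reason)

# With the continuation byte 0xA0 restored, the same bytes decode cleanly.
complete = try_decode(truncated + b"\xa0")
print(complete)
```

The failing offset is 17 because "Indice di Priorit" is 17 bytes long, which matches the position reported in the traceback.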
Attachments (0)
Change History (5)
comment:1 by , 19 years ago
Description: | modified (diff) |
---|---|
Keywords: | unicode added |
Milestone: | → 0.9.6 |
Owner: | changed from | to
Priority: | normal → highest |
comment:2 by , 19 years ago
Does anybody have an idea why, on the command line, I have this:
>>> s = 'Indice di Priorit\xc3\xa0'
>>> s.strip()
'Indice di Priorit\xc3\xa0'
while in Trac, the same strip() operation, on the same input, returns
'Indice di Priorit\xc3'
?
i.e.
formatter.py
         self.out = out
         self._open_tags = []
 
+        print 'oneliner', type(text), `text`
+        print 'oneliner.strip', `text.strip()`
+
         # Simplify code blocks
         in_code_block = 0
shows:
oneliner <type 'str'> 'Indice di Priorit\xc3\xa0'
oneliner.strip 'Indice di Priorit\xc3'
?
comment:3 by , 19 years ago
Answering my own question:
>>> s = 'Indice di Priorit\xc3\xa0'
>>> s.strip()
'Indice di Priorit\xc3\xa0'
>>> import locale
>>> locale.getlocale()
(None, None)
>>> locale.setlocale(locale.LC_ALL, 'en')
'English_United States.1252'
>>> s.strip()
'Indice di Priorit\xc3'
comment:4 by , 19 years ago
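… and in cp1252, we have: A0 = U+00A0 : NO-BREAK SPACE
so under that locale strip() treats the trailing 0xA0 byte of the UTF-8 encoded "à" (0xC3 0xA0) as whitespace and removes it.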
Yet another example of why using unicode
internally is so important (0.10).
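A minimal sketch of that byte-level effect, in Python 3. Python 3 byte strings are not locale-aware, so as a stated assumption the cp1252 locale is emulated here by decoding the UTF-8 bytes as cp1252; this is an illustration, not the mechanism Python 2.4 uses internally.

```python
# Python 3 sketch; decoding as cp1252 stands in for the locale-aware str.strip() of Python 2.
heading = "Indice di Priorit\u00e0"
utf8_bytes = heading.encode("utf-8")    # ends with b"\xc3\xa0"

misread = utf8_bytes.decode("cp1252")   # 0xC3 becomes U+00C3, 0xA0 becomes U+00A0
stripped = misread.strip()              # U+00A0 counts as whitespace and is removed
truncated = stripped.encode("cp1252")   # back to bytes: the 0xA0 byte is gone

print(repr(utf8_bytes))
print(repr(truncated))
```

The second value is no longer valid UTF-8, which is the input that later reaches decode('utf-8') in _heading_formatter.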
In the meantime, for this issue, a temporary conversion to unicode
could do the trick:
Index: trac/wiki/formatter.py
===================================================================
--- trac/wiki/formatter.py (revision 3213)
+++ trac/wiki/formatter.py (working copy)
@@ -21,6 +21,7 @@
 import re
 import os
 import urllib
+import StringIO as pyStringIO
 
 try:
     from cStringIO import StringIO
@@ -660,7 +661,9 @@
         # Simplify code blocks
         in_code_block = 0
         processor = None
-        buf = StringIO()
+        buf = pyStringIO.StringIO()
+        text = unicode(text, 'utf-8', 'replace')
+
         for line in text.strip().splitlines():
             if line.strip() == '{{{':
                 in_code_block += 1
@@ -678,6 +681,7 @@
             else:
                 print>>buf, line
         result = buf.getvalue()[:-1]
+        result = result.encode('utf-8')
         if shorten:
             result = util.shorten_line(result)
Opinions?
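The decode, strip, re-encode idea can be sketched on its own. This is a Python 3 illustration under stated assumptions, not the Trac code: the function name `strip_utf8` is hypothetical, and `bytes.decode` stands in for the `unicode(text, 'utf-8', 'replace')` call used in the Python 2 code.

```python
# Python 3 sketch; strip_utf8 is a hypothetical helper, not part of Trac.
def strip_utf8(raw):
    """Strip surrounding whitespace from UTF-8 bytes without splitting a character."""
    text = raw.decode("utf-8", "replace")  # work on characters, not on bytes
    return text.strip().encode("utf-8")    # re-encode only after stripping


raw = b"  Indice di Priorit\xc3\xa0 \n"
result = strip_utf8(raw)
print(repr(result))
print(result.decode("utf-8"))
```

Because stripping happens on decoded text, the two bytes of "à" are always kept or removed together, whatever the process locale is.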
Right, I can reproduce this.