#3058 closed defect (fixed)
Bug of the wiki compiler
| Reported by: | | Owned by: | Christian Boos |
|---|---|---|---|
| Priority: | highest | Milestone: | 0.9.6 |
| Component: | general | Version: | 0.9.5 |
| Severity: | normal | Keywords: | unicode |
| Cc: | | Branch: | |
| Release Notes: | |||
| API Changes: | |||
| Internal Changes: | |||
Description (last modified by )
When compiling a page in one of my projects, the wiki compiler crashes for no apparent reason. After several tests, I isolated the problem: the crash happens when I add the following string to the page: " === Indice di Priorità ==="
The Python traceback follows:
Traceback (most recent call last):
File "C:\Python24\lib\site-packages\trac\web\standalone.py", line 303, in _do_trac_req
dispatch_request(path_info, req, env)
File "C:\Python24\lib\site-packages\trac\web\main.py", line 139, in dispatch_request
dispatcher.dispatch(req)
File "C:\Python24\lib\site-packages\trac\web\main.py", line 107, in dispatch
resp = chosen_handler.process_request(req)
File "C:\Python24\lib\site-packages\trac\wiki\web_ui.py", line 92, in process_request
self._render_editor(req, db, page, preview=True)
File "C:\Python24\lib\site-packages\trac\wiki\web_ui.py", line 311, in _render_editor
info['page_html'] = wiki_to_html(page.text, self.env, req, db)
File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 744, in wiki_to_html
Formatter(env, req, absurls, db).format(wikitext, out, escape_newlines)
File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 599, in format
result = re.sub(self.rules, self.replace, line)
File "C:\Python24\lib\sre.py", line 142, in sub
return _compile(pattern, 0).sub(repl, string, count)
File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 221, in replace
return getattr(self, '_' + itype + '_formatter')(match, fullmatch)
File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 389, in _heading_formatter
anchor = self._anchor_re.sub('', sans_markup.decode('utf-8'))
File "C:\Python24\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 17: unexpected end of data
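The final frame can be reproduced in isolation. What follows is a minimal sketch in Python 3 (the ticket itself ran on Python 2.4), assuming the heading text reached `decode` with the last byte of the UTF-8 encoded "à" already missing. `decode_error` is a hypothetical helper for illustration, not part of Trac.

```python
# Python 3 sketch; names here are illustrative, not Trac API.
intact = b"Indice di Priorit\xc3\xa0"   # UTF-8 bytes of "Indice di Priorità"
truncated = intact[:-1]                 # assumption: the trailing 0xA0 byte was lost


def decode_error(data):
    """Return the UnicodeDecodeError raised when decoding data as UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


if __name__ == "__main__":
    print(decode_error(intact))      # decodes cleanly, so no error is returned
    print(decode_error(truncated))   # reports the dangling 0xc3 lead byte
```

The offset reported for the truncated input is 17, the length of "Indice di Priorit", which matches the position given in the traceback.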
Attachments (0)
Change History (5)
comment:1 by , 20 years ago
| Description: | modified (diff) |
|---|---|
| Keywords: | unicode added |
| Milestone: | → 0.9.6 |
| Owner: | changed from to |
| Priority: | normal → highest |
comment:2 by , 20 years ago
Does anybody have an idea why, on the command line, I have this:
>>> s = 'Indice di Priorit\xc3\xa0'
>>> s.strip()
'Indice di Priorit\xc3\xa0'
while in Trac, the same strip() operation, on the same input, returns
'Indice di Priorit\xc3' ?
i.e. with these debug prints added to formatter.py:
         self.out = out
         self._open_tags = []
 
+        print 'oneliner', type(text), `text`
+        print 'oneliner.strip', `text.strip()`
+
         # Simplify code blocks
         in_code_block = 0
shows:
oneliner <type 'str'> 'Indice di Priorit\xc3\xa0'
oneliner.strip 'Indice di Priorit\xc3'
?
comment:3 by , 20 years ago
Answering my own question:
>>> s = 'Indice di Priorit\xc3\xa0'
>>> s.strip()
'Indice di Priorit\xc3\xa0'
>>> import locale
>>> locale.getlocale()
(None, None)
>>> locale.setlocale(locale.LC_ALL, 'en')
'English_United States.1252'
>>> s.strip()
'Indice di Priorit\xc3'
comment:4 by , 20 years ago
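… and in cp1252, we have: A0 = U+00A0 : NO-BREAK SPACE, so the locale-aware strip() treats the trailing 0xA0 byte of the UTF-8 encoded "à" as whitespace and removes it.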
Yet another example of why using unicode internally is so important (0.10).
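The byte-level effect can be sketched in Python 3, where `bytes.strip()` ignores the locale. The helper `locale_like_strip` is a hypothetical stand-in that only imitates what Python 2's `str.strip()` did once the cp1252 locale was active; it is not a Trac or standard-library function.

```python
# Python 3 sketch; locale_like_strip is an illustrative stand-in, not a real API.
NBSP_BYTE = b"\xa0"
ASCII_SPACE = b" \t\n\r\x0b\x0c"

heading = "Indice di Priorità".encode("utf-8")   # ends with b"\xc3\xa0"
as_cp1252 = NBSP_BYTE.decode("cp1252")           # how cp1252 reads the last byte


def locale_like_strip(data, extra_space=NBSP_BYTE):
    """Strip ASCII whitespace plus the bytes a cp1252 locale also counts as space."""
    return data.strip(ASCII_SPACE + extra_space)


if __name__ == "__main__":
    print(repr(as_cp1252), as_cp1252.isspace())
    print(heading.strip())             # locale-independent: bytes are left intact
    print(locale_like_strip(heading))  # second byte of "à" is stripped
```

Working on byte strings means `strip()` cannot tell a real no-break space from the second half of a multi-byte character, which is the failure seen in this ticket.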
In the meantime, for this issue, a temporary conversion to unicode
could do the trick:
Index: trac/wiki/formatter.py
===================================================================
--- trac/wiki/formatter.py (revision 3213)
+++ trac/wiki/formatter.py (working copy)
@@ -21,6 +21,7 @@
 import re
 import os
 import urllib
+import StringIO as pyStringIO
 
 try:
     from cStringIO import StringIO
@@ -660,7 +661,9 @@
         # Simplify code blocks
         in_code_block = 0
         processor = None
-        buf = StringIO()
+        buf = pyStringIO.StringIO()
+        text = unicode(text, 'utf-8', 'replace')
+
         for line in text.strip().splitlines():
             if line.strip() == '{{{':
                 in_code_block += 1
@@ -678,6 +681,7 @@
             else:
                 print>>buf, line
         result = buf.getvalue()[:-1]
+        result = result.encode('utf-8')
         if shorten:
             result = util.shorten_line(result)
Opinions?
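The idea behind the patch (decode to unicode, strip, encode back) can be sketched outside Trac. This is a Python 3 illustration under the assumption that the input is UTF-8 encoded bytes; `strip_utf8` is a hypothetical name and not the function touched by the patch.

```python
# Python 3 sketch of the decode / strip / encode round trip; not Trac code.
def strip_utf8(raw):
    """Strip surrounding whitespace from UTF-8 bytes without splitting characters."""
    text = raw.decode("utf-8", "replace")   # same error policy as the patch
    return text.strip().encode("utf-8")


if __name__ == "__main__":
    raw = b"  Indice di Priorit\xc3\xa0 \n"
    print(strip_utf8(raw))   # the "à" bytes \xc3\xa0 stay together
```

Stripping after decoding operates on whole characters, so the 0xA0 byte inside "à" is never considered on its own. A genuine U+00A0 at either end of the text would still be removed, since it is whitespace in Unicode.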



Right, I can reproduce this.