Edgewall Software

Opened 18 years ago

Closed 18 years ago

Last modified 18 years ago

#3058 closed defect (fixed)

Bug of the wiki compiler

Reported by: m.petretta@…
Owned by: Christian Boos
Priority: highest
Milestone: 0.9.6
Component: general
Version: 0.9.5
Severity: normal
Keywords: unicode

Description (last modified by Christian Boos)

When rendering a page in one of my projects, the wiki compiler crashes for no apparent reason. After several tests, I isolated the problem: it happens when I add the following string to the page: " === Indice di Priorità ==="

The Python traceback follows:

Traceback (most recent call last):
  File "C:\Python24\lib\site-packages\trac\web\standalone.py", line 303, in _do_trac_req
    dispatch_request(path_info, req, env)
  File "C:\Python24\lib\site-packages\trac\web\main.py", line 139, in dispatch_request
    dispatcher.dispatch(req)
  File "C:\Python24\lib\site-packages\trac\web\main.py", line 107, in dispatch
    resp = chosen_handler.process_request(req)
  File "C:\Python24\lib\site-packages\trac\wiki\web_ui.py", line 92, in process_request
    self._render_editor(req, db, page, preview=True)
  File "C:\Python24\lib\site-packages\trac\wiki\web_ui.py", line 311, in _render_editor
    info['page_html'] = wiki_to_html(page.text, self.env, req, db)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 744, in wiki_to_html
    Formatter(env, req, absurls, db).format(wikitext, out, escape_newlines)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 599, in format
    result = re.sub(self.rules, self.replace, line)
  File "C:\Python24\lib\sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 221, in replace
    return getattr(self, '_' + itype + '_formatter')(match, fullmatch)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 389, in _heading_formatter
    anchor = self._anchor_re.sub('', sans_markup.decode('utf-8'))
  File "C:\Python24\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 17: unexpected end of data
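
The last line of the traceback can be reproduced in isolation. The following is a minimal sketch in modern Python 3 (not the Python 2.4 code path used by Trac 0.9.5), assuming the heading formatter received the heading text with its final byte already removed. The helper name `try_decode` is hypothetical and exists only for this illustration.

```python
# Assumption: these are the bytes that reached the heading formatter, i.e.
# the UTF-8 encoding of "Indice di Priorità" minus its last byte (0xA0).
truncated = b'Indice di Priorit\xc3'


def try_decode(data):
    """Hypothetical helper: return (text, None) on success, (None, error) on failure."""
    try:
        return data.decode('utf-8'), None
    except UnicodeDecodeError as exc:
        return None, exc


text, error = try_decode(truncated)
# error.start is the offset of the offending byte; error.reason is the message.
print(error.start, error.reason)

# With both bytes of the two-byte sequence present, decoding succeeds.
intact_text, intact_error = try_decode(b'Indice di Priorit\xc3\xa0')
print(intact_text)
```

The lone 0xC3 byte is the lead byte of a two-byte UTF-8 sequence, so the decoder reports that the data ends before the sequence is complete.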

Attachments (0)

Change History (5)

comment:1 by Christian Boos, 18 years ago

Description: modified (diff)
Keywords: unicode added
Milestone: 0.9.6
Owner: changed from Jonas Borgström to Christian Boos
Priority: changed from normal to highest

Right, I can reproduce this.

comment:2 by Christian Boos, 18 years ago

Does anybody have an idea why, on the command line, I have this:

>>> s = 'Indice di Priorit\xc3\xa0'
>>> s.strip()
'Indice di Priorit\xc3\xa0'

while in Trac, the same strip() operation, on the same input, returns 'Indice di Priorit\xc3' ?

i.e.

  • formatter.py

    @@ -657,5 +660,8 @@
             self.out = out
             self._open_tags = []
     
    +        print 'oneliner', type(text), `text`
    +        print 'oneliner.strip', `text.strip()`
    +
             # Simplify code blocks
             in_code_block = 0

shows:

oneliner <type 'str'> 'Indice di Priorit\xc3\xa0'
oneliner.strip 'Indice di Priorit\xc3'

?

comment:3 by Christian Boos, 18 years ago

Answering my own question:

>>> s = 'Indice di Priorit\xc3\xa0'
>>> s.strip()
'Indice di Priorit\xc3\xa0'
>>> import locale
>>> locale.getlocale()
(None, None)
>>> locale.setlocale(locale.LC_ALL, 'en')
'English_United States.1252'
>>> s.strip()
'Indice di Priorit\xc3'

comment:4 by Christian Boos, 18 years ago

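… and in cp1252, byte 0xA0 is U+00A0 NO-BREAK SPACE, so with that locale set, strip() treats the second byte of the UTF-8 encoded "à" as trailing whitespace.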
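
The mechanism can be imitated without touching the process locale. This is a rough sketch in Python 3, where `bytes.strip()` only ever removes ASCII whitespace, so the cp1252 behaviour is simulated by passing the byte set explicitly. The constant `CP1252_SPACE_BYTES` is an assumption made for the illustration, not something taken from Trac or from the C runtime.

```python
import unicodedata

raw = b'Indice di Priorit\xc3\xa0'  # UTF-8 bytes for "Indice di Priorità"

# In cp1252 the single byte 0xA0 decodes to U+00A0, which counts as whitespace.
nbsp = b'\xa0'.decode('cp1252')
print(unicodedata.name(nbsp), nbsp.isspace())

# Assumption: the locale-aware strip() behaved as if 0xA0 were in the set of
# whitespace bytes, alongside the usual ASCII whitespace characters.
CP1252_SPACE_BYTES = b' \t\n\r\x0b\x0c\xa0'
stripped = raw.strip(CP1252_SPACE_BYTES)
print(stripped)

# The default bytes.strip() in Python 3 leaves the sequence intact.
print(raw.strip() == raw)
```

Stripping byte by byte cuts the two-byte sequence 0xC3 0xA0 in half, which leaves the dangling 0xC3 seen in the traceback.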

Yet another example of why using unicode internally is so important (0.10).

In the meantime, for this issue, a temporary conversion to unicode could do the trick:

Index: trac/wiki/formatter.py
===================================================================
--- trac/wiki/formatter.py	(revision 3213)
+++ trac/wiki/formatter.py	(working copy)
@@ -21,6 +21,7 @@
 import re
 import os
 import urllib
+import StringIO as pyStringIO
 
 try:
     from cStringIO import StringIO
@@ -660,7 +661,9 @@
         # Simplify code blocks
         in_code_block = 0
         processor = None
-        buf = StringIO()
+        buf = pyStringIO.StringIO()
+        text = unicode(text, 'utf-8', 'replace')
+
         for line in text.strip().splitlines():
             if line.strip() == '{{{':
                 in_code_block += 1
@@ -678,6 +681,7 @@
             else:
                 print>>buf, line
         result = buf.getvalue()[:-1]
+        result = result.encode('utf-8')
 
         if shorten:
             result = util.shorten_line(result)
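
The idea behind the patch, reduced to its core, might look like the sketch below. It is written for Python 3 and is not the code that was committed; the function name `strip_utf8` is hypothetical. It assumes the input is UTF-8 encoded bytes and, like the patch, uses the 'replace' error handler so that malformed input does not raise.

```python
def strip_utf8(data):
    """Strip surrounding whitespace from UTF-8 bytes by working on decoded text.

    Hypothetical helper for illustration: decode first, strip on characters
    rather than bytes, then encode back to UTF-8.
    """
    text = data.decode('utf-8', 'replace')
    return text.strip().encode('utf-8')


# The trailing "à" (0xC3 0xA0) survives because it is one character, U+00E0.
print(strip_utf8(b'  Indice di Priorit\xc3\xa0 \n'))

# A genuine no-break space (U+00A0, encoded as 0xC2 0xA0) is still stripped.
print(strip_utf8(b'abc\xc2\xa0'))
```

Because stripping happens on characters, a multi-byte sequence can no longer be split, whatever the locale is.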

Opinions?

comment:5 by Christian Boos, 18 years ago

Resolution: fixed
Status: changed from new to closed

Fixed in r3236.
