#1224 closed defect (fixed)
Cannot use non ascii heading string
Reported by: | | Owned by: | Jonas Borgström |
---|---|---|---|
Priority: | normal | Milestone: | 0.9 |
Component: | wiki system | Version: | devel |
Severity: | normal | Keywords: | |
Cc: | | Branch: | |
Release Notes: | | | |
API Changes: | | | |
Internal Changes: | | | |
Description
Today I updated from r1233 to r1278, and now I get an exception when viewing existing pages.
Since changeset [1250], each heading gets an id="xxx" attribute generated from the heading string. However, the code assumes that the heading contains at least some ASCII letters or digits, and it removes all other characters with the regexp call 'anchor_re.sub()'.
When writing Japanese (or, more generally, non-English) wiki pages, we often create headings that consist only of Japanese characters. In this case 'anchor_re.sub()' leaves the generated 'anchor' string empty, which then causes an exception at line 385 of WikiFormatter.py.
This problem should be fixed before the next release so that wiki pages can be written in non-ASCII languages. Otherwise it will cause trouble for existing Japanese pages.
I don't know the right fix. As a workaround, I have modified WikiFormatter.py locally:
--- WikiFormatter.py (revision 1278)
+++ WikiFormatter.py (working copy)
@@ -382,6 +382,11 @@
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
         anchor = anchor_base = self._anchor_re.sub('', heading)
+        # gotoh: following 'if' block is workaround for kanji heading
+        # because original code assumes ascii words.
+        if len(anchor) == 0:
+            self.out.write('<h%d >%s</h%d>' % (depth, heading, depth))
+            return ''
         if anchor[0].isdigit():
             anchor = '_' + anchor
         i = 1
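The failure described above can be reproduced in isolation. The following is only a minimal sketch, not the actual Trac code: `ANCHOR_RE` is a hypothetical stand-in for `_anchor_re` (the real pattern is not shown in this ticket), and both helper functions are invented for illustration.

```python
import re

# Hypothetical stand-in for WikiFormatter._anchor_re: drops everything
# that is not an ASCII letter or digit. The real pattern may differ.
ANCHOR_RE = re.compile(r'[^A-Za-z0-9]+')


def anchor_unguarded(heading):
    """Mimics the reported behaviour: indexes the anchor without checking it."""
    anchor = ANCHOR_RE.sub('', heading)
    if anchor[0].isdigit():  # raises IndexError when anchor is empty
        anchor = '_' + anchor
    return anchor


def anchor_guarded(heading):
    """Mimics the workaround: returns None when no anchor can be built."""
    anchor = ANCHOR_RE.sub('', heading)
    if not anchor:
        return None
    if anchor[0].isdigit():
        anchor = '_' + anchor
    return anchor
```

With a heading made only of Japanese characters, the substitution removes every character, so the unguarded version fails on `anchor[0]` while the guarded one simply reports that no anchor is available.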
Attachments (0)
Change History (4)
comment:1 by , 20 years ago
comment:2 by , 20 years ago
Milestone: | → 0.9 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Fixed in [1280].
comment:3 by , 20 years ago
Resolution: | fixed |
---|---|
Severity: | critical → normal |
Status: | closed → reopened |
The fix in [1280] is meaningful but still does not work for Japanese. It assumes the target string is a unicode object, but the strings in _heading_formatter() are not unicode objects (they are encoded bytes). self._anchor_re.sub() is therefore applied to raw UTF-8 bytes, which produces broken UTF-8 byte sequences. I suggest changing it to work like this:
--- WikiFormatter.py (revision 1285)
+++ WikiFormatter.py (working copy)
@@ -381,7 +381,7 @@
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
-        anchor = anchor_base = self._anchor_re.sub('', heading)
+        anchor = anchor_base = self._anchor_re.sub('', heading.decode('utf-8')).encode('utf-8')
         if not anchor or not anchor[0].isalpha():
             # an ID must start with a letter in HTML
             anchor = 'a' + anchor
The string should really be decoded somewhere more appropriate, but I don't know where…
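The difference between filtering raw bytes and filtering decoded text can be sketched as follows. This is an illustration under assumptions, written for Python 3: `ANCHOR_RE` is a hypothetical pattern (the one used in [1280] is not shown here), and the latin-1 round trip only approximates what happens when a Unicode-aware regexp is applied to an undecoded byte string.

```python
import re

# Hypothetical pattern: keep Unicode word characters, drop everything else.
ANCHOR_RE = re.compile(r'[^\w]+', re.UNICODE)


def anchor_from_bytes_naive(raw):
    """Filter byte by byte, ignoring that the input is UTF-8.

    latin-1 maps each byte to one code point, so every byte of a
    multi-byte character is judged on its own and some are dropped.
    """
    return ANCHOR_RE.sub('', raw.decode('latin-1')).encode('latin-1')


def anchor_from_bytes_decoded(raw):
    """Decode first, filter whole characters, then encode again."""
    return ANCHOR_RE.sub('', raw.decode('utf-8')).encode('utf-8')


heading = '日本語 見出し'.encode('utf-8')

try:
    anchor_from_bytes_naive(heading).decode('utf-8')
    naive_is_valid_utf8 = True
except UnicodeDecodeError:
    naive_is_valid_utf8 = False  # multi-byte sequences were cut apart

decoded_anchor = anchor_from_bytes_decoded(heading).decode('utf-8')
```

The decoded variant removes only the space and keeps the Japanese characters intact, whereas the naive variant leaves bytes that no longer form valid UTF-8.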
comment:4 by , 20 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
I've applied a modified version of your patch in [1296]. Works well here. Thanks a lot!
I am now using this ad-hoc fix instead of the patch in the description above, so I can link to individual sections.