#1224 closed defect (fixed)
Cannot use non-ASCII heading string
| Reported by: | | Owned by: | Jonas Borgström |
|---|---|---|---|
| Priority: | normal | Milestone: | 0.9 |
| Component: | wiki system | Version: | devel |
| Severity: | normal | Keywords: | |
| Cc: | | Branch: | |
| Release Notes: | | | |
| API Changes: | | | |
| Internal Changes: | | | |
Description
Today I updated from r1233 to r1278, and now I get an exception when viewing existing pages.
In changeset [1250], headings gained an id="xxx" attribute generated from the heading text. However, the code assumes the heading contains (mostly) letters or digits, and it removes every other character with the regexp 'anchor_re.sub()'.
When writing Japanese (or, generally, non-English) wiki pages, we often write headings that contain only Japanese characters. In that case 'anchor_re.sub()' leaves the generated 'anchor' string empty, and the next access to anchor[0] raises an exception at line 385 in WikiFormatter.py.
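A minimal Python 3 sketch of the failure described above. `ASCII_ANCHOR_RE` is a hypothetical stand-in for `_anchor_re`, whose exact definition at r1278 is not shown in this ticket, and `make_anchor_r1278` only mirrors the two lines quoted in the patch context (the `sub()` call and the `anchor[0].isdigit()` check).

```python
import re

# Hypothetical stand-in for the formatter's _anchor_re: it keeps only ASCII
# letters, digits, '_', '.', ':' and '-', so it is NOT Unicode-aware.
ASCII_ANCHOR_RE = re.compile(r'[^A-Za-z0-9_.:-]+')


def make_anchor_r1278(heading):
    """Mirror the r1278 logic: strip characters, then inspect anchor[0]."""
    anchor = ASCII_ANCHOR_RE.sub('', heading)
    # With an all-Japanese heading, `anchor` is '' here, so anchor[0]
    # raises IndexError: the exception reported at line 385.
    if anchor[0].isdigit():
        anchor = '_' + anchor
    return anchor


print(make_anchor_r1278('Getting Started'))  # GettingStarted
try:
    make_anchor_r1278('見出し')
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: string index out of range
```

Any heading whose characters all fall outside the kept set hits the same path, so the problem is not specific to Japanese.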
This should be fixed before the next release so that wiki pages can be written in non-ASCII languages; otherwise it will cause trouble for existing Japanese pages.
I don't know the right fix, so I have modified WikiFormatter.py locally as a workaround:
```diff
--- WikiFormatter.py (revision 1278)
+++ WikiFormatter.py (working copy)
@@ -382,6 +382,11 @@
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
         anchor = anchor_base = self._anchor_re.sub('', heading)
+        # gotoh: following 'if' block is workaround for kanji heading
+        # because original code assumes ascii words.
+        if len(anchor) == 0:
+            self.out.write('<h%d >%s</h%d>' % (depth, heading, depth))
+            return ''
         if anchor[0].isdigit():
             anchor = '_' + anchor
         i = 1
```
Attachments (0)
Change History (4)
comment:1 by , 21 years ago
comment:2 by , 21 years ago
| Milestone: | → 0.9 |
|---|---|
| Resolution: | → fixed |
| Status: | new → closed |
Fixed in [1280].
comment:3 by , 21 years ago
| Resolution: | fixed |
|---|---|
| Severity: | critical → normal |
| Status: | closed → reopened |
The fix in [1280] is an improvement, but it still does not work for Japanese. It assumes the target string is a unicode object, but the strings in _heading_formatter() are not unicode objects; they are encoded bytes. self._anchor_re.sub() is therefore applied to raw UTF-8 bytes, and stripping individual bytes produces broken UTF-8. I suggest changing it like this:
```diff
--- WikiFormatter.py (revision 1285)
+++ WikiFormatter.py (working copy)
@@ -381,7 +381,7 @@
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
-        anchor = anchor_base = self._anchor_re.sub('', heading)
+        anchor = anchor_base = self._anchor_re.sub('', heading.decode('utf-8')).encode('utf-8')
         if not anchor or not anchor[0].isalpha():
             # an ID must start with a letter in HTML
             anchor = 'a' + anchor
```
The string should really be decoded in some more appropriate place, but I don't know where…
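The byte-versus-text problem can be reproduced in Python 3 under one loudly stated assumption: `strip_bytes_as_latin1` approximates a Unicode-aware pattern seeing raw UTF-8 bytes by treating each byte as one Latin-1 character. That is an emulation, not the exact Python 2 code path. `UNICODE_ANCHOR_RE` is hypothetical; `strip_decoded` follows the decode/sub/encode approach of the patch above.

```python
import re

# Hypothetical Unicode-aware anchor pattern: in Python 3, \w on str matches
# letters and digits from any script, including kanji and kana.
UNICODE_ANCHOR_RE = re.compile(r'[^\w.:-]+')


def strip_bytes_as_latin1(raw):
    """Approximate the bug: each UTF-8 byte is seen as a separate character."""
    one_char_per_byte = raw.decode('latin-1')
    stripped = UNICODE_ANCHOR_RE.sub('', one_char_per_byte)
    # Some bytes of each multi-byte sequence survive and others do not.
    return stripped.encode('latin-1')


def strip_decoded(raw):
    """The patch's approach: decode to text, strip, then re-encode."""
    return UNICODE_ANCHOR_RE.sub('', raw.decode('utf-8')).encode('utf-8')


heading = '見出し'.encode('utf-8')
print(strip_decoded(heading).decode('utf-8'))  # 見出し
try:
    strip_bytes_as_latin1(heading).decode('utf-8')
except UnicodeDecodeError:
    print('byte-level stripping left invalid UTF-8')
```

Decoding once at the input boundary, as the comment suggests, would let the formatter work on text throughout instead of round-tripping at this one call site.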
comment:4 by , 21 years ago
| Resolution: | → fixed |
|---|---|
| Status: | reopened → closed |
I've applied a modified version of your patch in [1296]. Works well here. Thanks a lot!



I'm now using the following ad-hoc fix instead of the patch in the description above, so I can link to each section:
```diff
--- WikiFormatter.py (revision 1278)
+++ WikiFormatter.py (working copy)
@@ -382,7 +382,10 @@
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
         anchor = anchor_base = self._anchor_re.sub('', heading)
-        if anchor[0].isdigit():
+        # gotoh: adhoc fix to allow non-ascii anchor name.
+        if len(anchor) == 0 and 0 < len(heading):
+            anchor = anchor_base = heading
+        if len(anchor) == 0 or anchor[0].isdigit():
             anchor = '_' + anchor
         i = 1
         while anchor in self.anchors:
```
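For comparison, a minimal Python 3 sketch of heading-to-id generation that strips characters from decoded text, prefixes a letter as the [1280] context lines do (rather than the '_' of the ad-hoc fix), and de-duplicates ids. `AnchorMaker`, `ANCHOR_RE`, and the numeric-suffix scheme are hypothetical: the ticket does not show how the `while anchor in self.anchors` loop builds a new id, nor what [1296] finally committed.

```python
import re

# Hypothetical Unicode-aware pattern; not the formatter's actual _anchor_re.
ANCHOR_RE = re.compile(r'[^\w.:-]+')


class AnchorMaker:
    """Illustrative heading-to-id generator; all names are made up."""

    def __init__(self):
        self.anchors = set()  # ids already used on this page

    def make(self, heading):
        anchor = anchor_base = ANCHOR_RE.sub('', heading)
        # Like the [1280] check, an id must start with a letter; this also
        # covers the empty-anchor case. Unlike the quoted lines, anchor_base
        # is updated too (an assumption) so suffixed ids keep the prefix.
        if not anchor or not anchor[0].isalpha():
            anchor = anchor_base = 'a' + anchor
        # Assumed de-duplication scheme: append 1, 2, ... until unique.
        i = 1
        while anchor in self.anchors:
            anchor = anchor_base + str(i)
            i += 1
        self.anchors.add(anchor)
        return anchor


maker = AnchorMaker()
print(maker.make('見出し'))   # 見出し
print(maker.make('見出し'))   # 見出し1
print(maker.make('2 Setup'))  # a2Setup
print(maker.make('!!!'))      # a
```

Keeping the kanji in the id, as the ad-hoc fix does, is what makes links to individual sections possible; the letter prefix only matters when the stripped text is empty or starts with a non-letter.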