Opened 20 years ago

Closed 20 years ago

Last modified 18 years ago

#1224 closed defect (fixed)

Cannot use non ascii heading string

Reported by: Shun-ichi Goto <gotoh@…>
Owned by: Jonas Borgström
Priority: normal
Milestone: 0.9
Component: wiki system
Version: devel
Severity: normal
Keywords:
Cc:
Branch:
Release Notes:
API Changes:
Internal Changes:

Description

Today I updated from r1233 to r1278. Since then I get an exception when viewing existing pages.

In changeset [1250], headings gained an id="xxx" attribute generated from the heading string. But the code assumes that the heading string contains at least some letters or digits, and it removes all other characters with the regexp substitution 'anchor_re.sub()'.

When writing a Japanese (or, more generally, non-English) wiki page, we often make headings that consist of Japanese characters only. In this case 'anchor_re.sub()' leaves the generated 'anchor' string empty, which then causes an exception at line 385 in WikiFormatter.py.
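The failure mode can be reproduced outside Trac with a minimal sketch. The pattern below is a hypothetical stand-in, since the real `_anchor_re` is not shown in this ticket; the sketch only assumes that it deletes everything except ASCII letters, digits and a few punctuation characters.

```python
import re

# Hypothetical stand-in for the formatter's _anchor_re (assumption: the real
# pattern is not quoted in this ticket). It deletes every character that is
# not an ASCII letter, digit, underscore, dot or hyphen.
ANCHOR_RE = re.compile(r'[^A-Za-z0-9_.-]+')


def make_anchor(heading):
    """Mimic the unpatched logic from the diff context above."""
    anchor = ANCHOR_RE.sub('', heading)
    # Indexing an empty string raises IndexError, which is the crash
    # reported here for headings without any ASCII letters or digits.
    if anchor[0].isdigit():
        anchor = '_' + anchor
    return anchor


def fails_on(heading):
    """Return True if make_anchor() raises IndexError for this heading."""
    try:
        make_anchor(heading)
    except IndexError:
        return True
    return False
```

With this stand-in, an ASCII heading yields a usable anchor, while `fails_on('日本語の見出し')` is true because nothing survives the substitution.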

This problem should be fixed before the next release so that wiki pages can be written in non-ASCII languages. Otherwise it will cause trouble for existing Japanese pages.

I don't know the right fix. I have modified WikiFormatter.py locally as a workaround.

--- WikiFormatter.py	(revision 1278)
+++ WikiFormatter.py	(working copy)
@@ -382,6 +382,11 @@
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
         anchor = anchor_base = self._anchor_re.sub('', heading)
+        # gotoh: following 'if' block is workaround for kanji heading
+        # because original code assumes ascii words.
+        if len(anchor) == 0:
+            self.out.write('<h%d >%s</h%d>' % (depth, heading, depth))
+            return ''
         if anchor[0].isdigit():
             anchor = '_' + anchor
         i = 1

Attachments (0)

Change History (4)

comment:1 by Shun-ichi Goto <gotoh@…>, 20 years ago

I'm now using this ad-hoc fix instead of the patch in the description above, so that I can still link to each section.

--- WikiFormatter.py	(revision 1278)
+++ WikiFormatter.py	(working copy)
@@ -382,7 +382,10 @@
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
         anchor = anchor_base = self._anchor_re.sub('', heading)
-        if anchor[0].isdigit():
+        # gotoh: adhoc fix to allow non-ascii anchor name.
+        if len(anchor) == 0 and 0 < len(heading):
+            anchor = anchor_base = heading
+        if len(anchor) == 0 or anchor[0].isdigit():
             anchor = '_' + anchor
         i = 1
         while anchor in self.anchors:

comment:2 by Christopher Lenz, 20 years ago

Milestone: 0.9
Resolution: fixed
Status: new → closed

Fixed in [1280].

comment:3 by Shun-ichi Goto <gotoh@…>, 20 years ago

Resolution: fixed
Severity: critical → normal
Status: closed → reopened

The fix in [1280] is meaningful but still does not work for Japanese. It assumes the target string is a unicode string object, but the strings in _heading_formatter() are not unicode objects (they are encoded bytes). self._anchor_re.sub() is applied to the raw UTF-8 bytes and therefore produces broken UTF-8 bytes. It should be changed to work like this:

--- WikiFormatter.py	(revision 1285)
+++ WikiFormatter.py	(working copy)
@@ -381,7 +381,7 @@
 
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
-        anchor = anchor_base = self._anchor_re.sub('', heading)
+        anchor = anchor_base = self._anchor_re.sub('', heading.decode('utf-8')).encode('utf-8')
         if not anchor or not anchor[0].isalpha():
             # an ID must start with a letter in HTML
             anchor = 'a' + anchor

We should really decode the string in some other, more appropriate place. But I don't know where…
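The difference between filtering characters and filtering bytes can be sketched in modern Python. This is an illustration under assumptions, not the code from [1280] or [1296]: the pattern is hypothetical, and the `latin-1` round trip only roughly simulates a Unicode-aware pattern being run over an encoded byte string, where each byte is classified as if it were a code point below 256.

```python
import re

# Hypothetical pattern (assumption): drop everything that is not a word
# character. For str patterns in Python 3, \w is Unicode-aware.
ANCHOR_RE = re.compile(r'[^\w]+')


def anchor_from_utf8(raw):
    """Decode, filter, re-encode: the same shape as the one-line patch above."""
    return ANCHOR_RE.sub('', raw.decode('utf-8')).encode('utf-8')


def anchor_bytewise(raw):
    """Filter byte by byte, treating each byte as its own character.

    Lead bytes such as 0xE6 look like letters and are kept, while many
    continuation bytes look like control characters or symbols and are
    dropped, so multi-byte sequences get torn apart.
    """
    return ANCHOR_RE.sub('', raw.decode('latin-1')).encode('latin-1')


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True
```

For a heading such as `'日本語'.encode('utf-8')`, `anchor_from_utf8` returns the heading unchanged, whereas the output of `anchor_bytewise` no longer passes `is_valid_utf8`.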

comment:4 by Christopher Lenz, 20 years ago

Resolution: fixed
Status: reopened → closed

I've applied a modified version of your patch in [1296]. Works well here. Thanks a lot!
