#1224 closed defect (fixed)

Cannot use non ascii heading string

Reported by:	Shun-ichi Goto <gotoh@…>	Owned by:	Jonas Borgström
Priority:	normal	Milestone:	0.9
Component:	wiki system	Version:	devel
Severity:	normal	Keywords:
Cc:		Branch:
Release Notes:
API Changes:
Internal Changes:

Description

Today I updated from r1233 to r1278, and I now get an exception when viewing existing pages.

Since changeset [1250], each heading gets an id="xxx" attribute generated from the heading string. However, the code assumes that the heading string contains ASCII letters or digits, and it removes all other characters with the regexp substitution 'anchor_re.sub()'.

When writing Japanese (or, more generally, non-English) wiki pages, we often write headings that consist of Japanese characters only. In this case 'anchor_re.sub()' leaves the generated 'anchor' string empty, which then causes an exception at line 385 in WikiFormatter.py.

This problem should be fixed before the next release so that wiki pages can be written in non-ASCII languages. Otherwise it will cause trouble for existing Japanese pages.

I don't know the right fix. As a workaround, I have modified WikiFormatter.py locally:

--- WikiFormatter.py	(revision 1278)
+++ WikiFormatter.py	(working copy)
@@ -382,6 +382,11 @@
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
         anchor = anchor_base = self._anchor_re.sub('', heading)
+        # gotoh: following 'if' block is workaround for kanji heading
+        # because original code assumes ascii words.
+        if len(anchor) == 0:
+            self.out.write('<h%d >%s</h%d>' % (depth, heading, depth))
+            return ''
         if anchor[0].isdigit():
             anchor = '_' + anchor
         i = 1
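
The failure described above can be reproduced in isolation. The following is a minimal sketch, not the actual Trac code: the ticket does not show the real `_anchor_re` pattern, so `ANCHOR_RE` below is an assumed ASCII-only stand-in, and `naive_anchor` and `guarded_anchor` are hypothetical helper names.

```python
import re

# Assumption: stand-in for the formatter's _anchor_re, which the ticket does
# not show. It strips everything except ASCII letters, digits and "_ . : -".
ANCHOR_RE = re.compile(r'[^A-Za-z0-9_.:-]+')


def naive_anchor(heading):
    """Mirror the unguarded logic: index the anchor without checking it."""
    anchor = ANCHOR_RE.sub('', heading)
    if anchor[0].isdigit():  # raises IndexError when the anchor is empty
        anchor = '_' + anchor
    return anchor


def guarded_anchor(heading):
    """Check for an empty anchor before looking at its first character."""
    anchor = ANCHOR_RE.sub('', heading)
    if not anchor or anchor[0].isdigit():
        anchor = '_' + anchor
    return anchor


if __name__ == '__main__':
    print(naive_anchor('Getting Started'))
    try:
        naive_anchor('日本語')
    except IndexError as exc:
        print('IndexError:', exc)
    print(guarded_anchor('日本語'))
```

A heading made only of Japanese characters is reduced to an empty string, so the unguarded `anchor[0]` lookup fails, while the guarded variant falls back to a placeholder anchor.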

Change History (4)

comment:1 by Shun-ichi Goto <gotoh@…>, 20 years ago

I'm now using this ad-hoc fix instead of the patch in the description above. With it, I can still link to each section.

--- WikiFormatter.py	(revision 1278)
+++ WikiFormatter.py	(working copy)
@@ -382,7 +382,10 @@
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
         anchor = anchor_base = self._anchor_re.sub('', heading)
-        if anchor[0].isdigit():
+        # gotoh: adhoc fix to allow non-ascii anchor name.
+        if len(anchor) == 0 and 0 < len(heading):
+            anchor = anchor_base = heading
+        if len(anchor) == 0 or anchor[0].isdigit():
             anchor = '_' + anchor
         i = 1
         while anchor in self.anchors:
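
The idea behind this ad-hoc fix (fall back to the raw heading when stripping leaves nothing, then make the anchor unique) can be sketched as a standalone class. This is only an illustration under assumptions: `ANCHOR_RE` is an assumed stand-in for the real pattern, `AnchorRegistry` is a hypothetical name, and the numeric suffix scheme is a guess because the diff is cut off at the `while` loop.

```python
import re

# Assumption: ASCII-only stand-in for the formatter's _anchor_re.
ANCHOR_RE = re.compile(r'[^A-Za-z0-9_.:-]+')


class AnchorRegistry:
    """Hypothetical helper that hands out unique anchors for one page."""

    def __init__(self):
        self.anchors = set()

    def make_anchor(self, heading):
        anchor = ANCHOR_RE.sub('', heading)
        # Fall back to the raw heading when stripping removed everything.
        if not anchor and heading:
            anchor = heading
        # Guard the empty case before inspecting the first character.
        if not anchor or anchor[0].isdigit():
            anchor = '_' + anchor
        base = anchor
        i = 1
        # Assumed suffix scheme: append a counter until the anchor is unused.
        while anchor in self.anchors:
            anchor = '%s%d' % (base, i)
            i += 1
        self.anchors.add(anchor)
        return anchor


if __name__ == '__main__':
    registry = AnchorRegistry()
    print(registry.make_anchor('日本語'))
    print(registry.make_anchor('日本語'))
    print(registry.make_anchor('1 Intro'))
```

Note that the fallback keeps the heading unfiltered, so spaces or markup characters in a non-ASCII heading would end up in the anchor. That is one reason to treat this as a workaround rather than a proper fix.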

comment:2 by Christopher Lenz, 20 years ago

Milestone:	→ 0.9
Resolution:	→ fixed
Status:	new → closed

Fixed in [1280].

comment:3 by Shun-ichi Goto <gotoh@…>, 20 years ago

Resolution:	fixed
Severity:	critical → normal
Status:	closed → reopened

The fix in [1280] is meaningful but still does not work for Japanese. It assumes that the target string is a unicode string object, but the strings in _heading_formatter() are not unicode objects (they are encoded bytes). self._anchor_re.sub() is applied to the raw UTF-8 bytes and therefore produces broken UTF-8 bytes. It should be changed to work like this:

--- WikiFormatter.py	(revision 1285)
+++ WikiFormatter.py	(working copy)
@@ -381,7 +381,7 @@
 
         depth = min(len(fullmatch.group('hdepth')), 5)
         heading = match[depth + 1:len(match) - depth - 1]
-        anchor = anchor_base = self._anchor_re.sub('', heading)
+        anchor = anchor_base = self._anchor_re.sub('', heading.decode('utf-8')).encode('utf-8')
         if not anchor or not anchor[0].isalpha():
             # an ID must start with a letter in HTML
             anchor = 'a' + anchor

The string should really be decoded somewhere more appropriate, but I don't know where…
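
The difference between substituting on raw bytes and on decoded text can be illustrated with a small Python 3 sketch. It is an approximation, not the Trac code: `ANCHOR_RE` is an assumed Unicode-aware stand-in for `_anchor_re`, both function names are hypothetical, and the per-byte variant emulates the old byte-string behaviour by mapping each byte to one code point via Latin-1.

```python
import re

# Assumption: Unicode-aware stand-in for the formatter's _anchor_re.
ANCHOR_RE = re.compile(r'[^\w:.-]+')


def anchor_per_byte(raw):
    """Treat every byte as its own character, as if the bytes were text.

    Decoding as Latin-1 maps each byte to exactly one code point, so the
    pattern judges the bytes of a multi-byte sequence one at a time.
    """
    as_chars = raw.decode('latin-1')
    return ANCHOR_RE.sub('', as_chars).encode('latin-1')


def anchor_decoded(raw):
    """Decode first, substitute on real characters, then re-encode."""
    text = raw.decode('utf-8')
    return ANCHOR_RE.sub('', text).encode('utf-8')


if __name__ == '__main__':
    heading = '日本語 の 見出し'.encode('utf-8')
    print(anchor_decoded(heading).decode('utf-8'))
    try:
        anchor_per_byte(heading).decode('utf-8')
    except UnicodeDecodeError as exc:
        print('broken UTF-8:', exc)
```

In the per-byte variant some bytes of each multi-byte sequence count as word characters and others do not, so the surviving bytes no longer form valid UTF-8. Decoding before the substitution keeps every character intact.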

comment:4 by Christopher Lenz, 20 years ago

Resolution:	→ fixed
Status:	reopened → closed

I've applied a modified version of your patch in [1296]. Works well here. Thanks a lot!
