Opened 19 years ago

Closed 19 years ago

#2868 closed defect (duplicate)

garbled unicode chars in inline diff view

Reported by: Andrew Stromnov
Owned by: Jonas Borgström
Priority: normal
Milestone:
Component: version control/changeset view
Version: devel
Severity: minor
Keywords: diff unicode
Cc:
Branch:
Release Notes:
API Changes:
Internal Changes:

Description

Unicode characters are garbled in the highlighted inline diff view.

Example: diff between str1="АШИПКА" and str2="ОШИБКА"

  1. str1 and str2 are passed to markup_intraline_changes as raw byte strings (not unicode): '\xd0\x90\xd0\xa8\xd0\x98\xd0\x9f\xd0\x9a\xd0\x90' and '\xd0\x9e\xd0\xa8\xd0\x98\xd0\x91\xd0\x9a\xd0\x90' respectively.
  2. str1 and str2 are then passed to _get_change_extent, which compares them octet by octet. Both strings begin with the same lead octet '\xd0', so the extent starts at the second octet of the first UTF-8 character ('\x90' in str1, '\x9e' in str2).
  3. Results after tag substitution: '\xd0<del>\x90\xd0\xa8\xd0\x98\xd0\x9f</del>\xd0\x9a\xd0\x90' and '\xd0<add>\x9e\xd0\xa8\xd0\x98\xd0\x91</add>\xd0\x9a\xd0\x90'. The first UTF-8 character of each string is split by the tag and therefore broken.
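
The three steps above can be reproduced with a small sketch. It is written for Python 3, where a raw string is `bytes`, and `get_change_extent` and `mark` are hypothetical stand-ins that only approximate what `_get_change_extent` and the tag substitution do; they are not the Trac code itself.

```python
def get_change_extent(a, b):
    """Return (start, end): length of the common prefix and the
    (zero or negative) offset where the common suffix begins."""
    start = 0
    limit = min(len(a), len(b))
    while start < limit and a[start] == b[start]:
        start += 1
    end = -1
    limit -= start
    while -end <= limit and a[end] == b[end]:
        end -= 1
    return start, end + 1


def mark(line, start, end, open_tag, close_tag):
    """Wrap the changed region line[start:len(line)+end] in tags."""
    stop = len(line) + end
    return line[:start] + open_tag + line[start:stop] + close_tag + line[stop:]


# Step 1: the lines arrive as UTF-8 encoded byte strings.
raw1 = "АШИПКА".encode("utf-8")
raw2 = "ОШИБКА".encode("utf-8")

# Step 2: the extent is computed per octet, so it starts after '\xd0'.
start, end = get_change_extent(raw1, raw2)

# Step 3: the tags land in the middle of the first character.
marked1 = mark(raw1, start, end, b"<del>", b"</del>")
marked2 = mark(raw2, start, end, b"<add>", b"</add>")

try:
    marked2.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
```

With these inputs the marked byte strings no longer decode as UTF-8, which is what shows up as garbled characters in the rendered diff.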

Possible fix: use unicode strings for extent calculation.
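
A minimal sketch of that idea, again in Python 3 and under the assumption that the lines are UTF-8 encoded. The names `get_change_extent` and `mark_intraline` are hypothetical and only illustrate the approach: decode first, compute the extent on characters, insert the tags, then encode again.

```python
def get_change_extent(a, b):
    """Return (start, end): length of the common prefix and the
    (zero or negative) offset where the common suffix begins."""
    start = 0
    limit = min(len(a), len(b))
    while start < limit and a[start] == b[start]:
        start += 1
    end = -1
    limit -= start
    while -end <= limit and a[end] == b[end]:
        end -= 1
    return start, end + 1


def mark_intraline(from_bytes, to_bytes, encoding="utf-8"):
    """Mark the changed region of two encoded lines on character boundaries."""
    # Decode so that indexing works on whole characters, not octets.
    old, new = from_bytes.decode(encoding), to_bytes.decode(encoding)
    start, end = get_change_extent(old, new)
    old_stop, new_stop = len(old) + end, len(new) + end
    marked_old = old[:start] + "<del>" + old[start:old_stop] + "</del>" + old[old_stop:]
    marked_new = new[:start] + "<add>" + new[start:new_stop] + "</add>" + new[new_stop:]
    # Encode back so callers still receive byte strings.
    return marked_old.encode(encoding), marked_new.encode(encoding)


marked_old, marked_new = mark_intraline(
    "АШИПКА".encode("utf-8"), "ОШИБКА".encode("utf-8")
)
```

Because the extent is now measured in characters, the tags can only fall between characters, and both results decode cleanly.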

Quick (and dirty) hack:

--- diff.py.orig	Mon Mar 13 13:43:21 2006
+++ diff.py	Mon Mar 13 15:26:11 2006
@@ -148,6 +147,11 @@
             if tag == 'replace' and i2 - i1 == j2 - j1:
                 for i in range(i2 - i1):
                     fromline, toline = fromlines[i1 + i], tolines[j1 + i]
+		    
+                    fromline, toline = fromline.decode('utf8'), toline.decode('utf8')
+		    
                     (start, end) = _get_change_extent(fromline, toline)
 
                     if start == 0 and end < 0:
@@ -170,6 +174,12 @@
                         tolines[j1 + i] = toline[:start] + '\0' + \
                                           toline[start:end] + '\1' + \
                                           toline[end:]
+                    
+                    fromlines[i1 + i] = fromlines[i1 + i].encode('utf8')
+                    tolines[j1 + i] = tolines[j1 + i].encode('utf8')
+		    
             yield tag, i1, i2, j1, j2
 
     changes = []

Attachments (0)

Change History (2)

comment:1 by Christian Boos, 19 years ago

Keywords: unicode added
Milestone: 0.11

comment:2 by Christian Boos, 19 years ago

Milestone: 0.11
Resolution: duplicate
Status: new → closed

Duplicate of #2363
