Edgewall Software

Opened 18 years ago

Closed 18 years ago

Last modified 2 years ago

#4321 closed defect (fixed)

Unicode initenv problem

Reported by: fred.stober@…
Owned by: Christian Boos
Priority: normal
Milestone: 0.10.4
Component: version control
Version: 0.10.2
Severity: normal
Keywords: svn unicode
Cc: Branch:
Release Notes:
API Changes:
Internal Changes:

Description

Hi!

I've had a big problem with umlauts in my svn repository. I've just converted my cvs repository to svn using cvs2svn and used trac-admin to import the repository using initenv. But something went wrong during the import. I always got this error when trying to access the timeline or when using the browser to view directories containing files that had an umlaut in their log message:

OperationalError: Could not decode to UTF-8 column 'message' with text …

I've had to apply this patch:

/trac/versioncontrol/svn_fs.py
643c635
<         message = self._get_prop(core.SVN_PROP_REVISION_LOG)
>         message = to_unicode(self._get_prop(core.SVN_PROP_REVISION_LOG))

(Using Gentoo ebuild www-apps/trac-0.10.1)

And now everything works fine!

Keep up the good work!

Attachments (0)

Change History (10)

comment:1 by Jonas Borgström, 18 years ago

Resolution: worksforme
Status: new → closed

Subversion always stores filenames and log messages as utf-8 internally. So the proposed patch should never be needed. This was probably caused by a missing or incorrect '--encoding' parameter when running the cvs2svn command. '--encoding=iso-8859-1' will probably do the trick.
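To illustrate the mismatch described here, a minimal Python 3 sketch follows. The sample log message is made up for illustration. It shows that a Latin-1 encoded umlaut is not valid UTF-8, and that decoding with the right source encoding recovers the text.

```python
# Hypothetical log message, for illustration only.
message = "Größe geändert"

latin1_bytes = message.encode("iso-8859-1")  # bytes as an old CVS log might hold them
utf8_bytes = message.encode("utf-8")         # bytes as Subversion is expected to store them


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(latin1_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True

# Decoding with the correct source encoding and re-encoding yields valid UTF-8.
recoded = latin1_bytes.decode("iso-8859-1").encode("utf-8")
print(recoded == utf8_bytes)        # True
```

If the converter is not told the source encoding, bytes like the first variant can end up in the repository unchanged, and a strict UTF-8 reader will reject them later.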

in reply to:  1 comment:2 by Christian Boos, 18 years ago

Keywords: unicode added
Milestone: 0.11
Resolution: worksforme (removed)
Status: closed → reopened

Replying to jonas:

Subversion always stores filenames and log messages as utf-8 internally. So the proposed patch should never be needed.

No, but a similar one which transforms the UTF-8 string to unicode should. Currently, this (last?) use of UTF-8 goes unnoticed thanks to the cache.

comment:3 by Christian Boos, 18 years ago

Resolution: fixed
Status: reopened → closed

Different patch applied in r4394. As jonas said, we should assume that Subversion gives us proper UTF-8 encoded strings, so we use here _from_svn() instead of the more error tolerant to_unicode().

comment:4 by eli.carter@…, 18 years ago

Resolution: fixed (removed)
Status: closed → reopened

This needs to be back-ported to the 0.10 line.

in reply to:  4 ; comment:5 by Christian Boos, 18 years ago

Keywords: svn added
Milestone: 0.11 → 0.10.4

Replying to eli.carter@commprove.com:
As discussed on #trac, I'm waiting for some feedback (the repr() of the message or author which is failing), but it may well be that Subversion does not always give back properly UTF-8 encoded strings (e.g. for repositories created by a CVS import, as in your case). We shouldn't abort in this situation but be more robust, using to_unicode(text, 'utf-8'), which will assume UTF-8 but use replacement characters if invalid UTF-8 data is found.

i.e. (patch on 0.10-stable)

  • svn_fs.py

    @@ -635,6 +635,8 @@
             self.pool = Pool(pool)
             message = self._get_prop(core.SVN_PROP_REVISION_LOG)
             author = self._get_prop(core.SVN_PROP_REVISION_AUTHOR)
    +        message = to_unicode(message, 'utf-8')
    +        author = to_unicode(author, 'utf-8')
             date = self._get_prop(core.SVN_PROP_REVISION_DATE)
             if date:
                 date = core.svn_time_from_cstring(date, self.pool()) / 1000000
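The difference between the strict and the tolerant approach can be sketched in Python 3 as below. The names `decode_strict` and `decode_tolerant` are hypothetical stand-ins and not Trac's actual `_from_svn()` or `to_unicode()` implementations. They only mirror the behaviour described above: assume UTF-8, and either fail or substitute replacement characters on invalid input.

```python
def decode_strict(raw: bytes) -> str:
    """Assume valid UTF-8; raise UnicodeDecodeError otherwise."""
    return raw.decode("utf-8")


def decode_tolerant(raw: bytes) -> str:
    """Assume UTF-8, but substitute U+FFFD for bytes that are not valid."""
    return raw.decode("utf-8", errors="replace")


good = "Umlaut: ü".encode("utf-8")
bad = b"Umlaut: \xfc"  # a Latin-1 byte, not valid UTF-8

# Both strategies agree on well-formed input.
print(decode_strict(good) == decode_tolerant(good))  # True

# On malformed input the tolerant variant keeps going ...
print(repr(decode_tolerant(bad)))  # 'Umlaut: \ufffd'

# ... while the strict variant aborts.
try:
    decode_strict(bad)
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)
```

The trade-off is that the tolerant variant loses the original character, but the page still renders instead of failing with an error.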

comment:6 by Christian Boos, 18 years ago

PS: Note that this will require a resync

in reply to:  5 comment:7 by eli.carter@…, 18 years ago

Replying to cboos:

Replying to eli.carter@commprove.com:
As discussed on #trac, I'm waiting for some feedback (the repr() of the message or author which is failing),

The relevant portion of one of the log messages is:

'Struttura modificata per uniformit\xfc\xb0\x80\x80\x81\xa0 ...'

And the resync gave this error:

Command failed: 'utf8' codec can't decode bytes in position 34-39: unsupported Unicode code range
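For reference, the failure can be reproduced outside Trac with the bytes quoted above. This is a Python 3 sketch, so the wording of the error differs from the Python 2 message in the report, but the first offending offset is the same (34). The rest of the log message is elided in the report and is left out here.

```python
# Bytes copied from the repr() above; the remainder of the message is not known.
raw = b"Struttura modificata per uniformit\xfc\xb0\x80\x80\x81\xa0"

offset = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    offset = exc.start
    print("first undecodable byte at offset", offset)  # 34

# Decoding with replacement characters succeeds and keeps the readable part.
text = raw.decode("utf-8", errors="replace")
print(text)
```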

comment:8 by eli.carter@…, 18 years ago

The patch shown above for 0.10-stable works for me. trunk will need to change _from_svn(...) to to_unicode(..., 'utf-8') as well.

comment:9 by Christian Boos, 18 years ago

Ok, will do. Thanks for the feedback!

comment:10 by Christian Boos, 18 years ago

Resolution: fixed
Status: reopened → closed

Applied in r4491 (trunk) and r4492 (0.10-stable).
