
#4321 closed defect (fixed)

Unicode initenv problem

Reported by:	fred.stober@…	Owned by:	Christian Boos
Priority:	normal	Milestone:	0.10.4
Component:	version control	Version:	0.10.2
Severity:	normal	Keywords:	svn unicode
Cc:		Branch:
Release Notes:
API Changes:
Internal Changes:

Description

Hi!

I've had a big problem with umlauts in my svn repository. I converted my CVS repository to svn using cvs2svn and then ran trac-admin initenv to import the repository, but something went wrong during the import. I always got this error when trying to access the timeline, or when using the browser to view directories containing files that had an umlaut in their log message:

OperationalError: Could not decode to UTF-8 column 'message' with text …

I had to apply this patch:

/trac/versioncontrol/svn_fs.py
643c635
<         message = self._get_prop(core.SVN_PROP_REVISION_LOG)
---
>         message = to_unicode(self._get_prop(core.SVN_PROP_REVISION_LOG))

(Using Gentoo ebuild www-apps/trac-0.10.1)

And now everything works fine!

Keep up the good work!

Attachments (0)

Change History (10)

follow-up: 2 comment:1 by Jonas Borgström, 19 years ago

Resolution:	→ worksforme
Status:	new → closed

Subversion always stores filenames and log messages as utf-8 internally. So the proposed patch should never be needed. This was probably caused by a missing or incorrect '--encoding' parameter when running the cvs2svn command. '--encoding=iso-8859-1' will probably do the trick.
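A minimal Python 3 sketch of the suspected failure, assuming the CVS log messages were written in ISO-8859-1 and copied into Subversion byte for byte (the sample message and both helper names are hypothetical). The raw Latin-1 bytes do not decode as UTF-8, while transcoding them from ISO-8859-1 first, which is what an explicit encoding option for cvs2svn is meant to ensure, yields valid UTF-8.

```python
# Hypothetical CVS log message, stored as ISO-8859-1 (Latin-1) bytes.
latin1_log = "Fehler in der Übersetzung behoben".encode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode `data` using its real encoding, then re-encode it as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


print(is_valid_utf8(latin1_log))  # False: the Latin-1 "Ü" byte is not valid UTF-8 here
utf8_log = transcode_to_utf8(latin1_log, "iso-8859-1")
print(is_valid_utf8(utf8_log))    # True
```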

in reply to: 1 comment:2 by Christian Boos, 19 years ago

Keywords:	unicode added
Milestone:	→ 0.11
Resolution:	worksforme
Status:	closed → reopened

Replying to jonas:

Subversion always stores filenames and log messages as utf-8 internally. So the proposed patch should never be needed.

No, but a similar one which transforms the UTF-8 string to unicode should. Currently, this (last?) use of UTF-8 goes unnoticed thanks to the cache.

comment:3 by Christian Boos, 19 years ago

Resolution:	→ fixed
Status:	reopened → closed

Different patch applied in r4394. As Jonas said, we should assume that Subversion gives us properly UTF-8-encoded strings, so here we use _from_svn() instead of the more error-tolerant to_unicode().
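The trade-off between the two choices can be illustrated with a minimal Python 3 sketch. The function names are hypothetical stand-ins, not Trac's actual _from_svn() or to_unicode(): the strict variant trusts that the input is valid UTF-8 and raises otherwise, while the tolerant variant substitutes the replacement character U+FFFD for undecodable bytes.

```python
def strict_decode(data: bytes) -> str:
    """Hypothetical strict decoder: assumes valid UTF-8, raises UnicodeDecodeError otherwise."""
    return data.decode("utf-8")


def tolerant_decode(data: bytes) -> str:
    """Hypothetical tolerant decoder: assumes UTF-8 but replaces invalid bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")


valid = "uniformità".encode("utf-8")
invalid = "uniformità".encode("iso-8859-1")  # b'uniformit\xe0', not valid UTF-8

# Both agree on well-formed input.
assert strict_decode(valid) == tolerant_decode(valid) == "uniformità"
# Only the tolerant variant survives malformed input.
assert tolerant_decode(invalid) == "uniformit\ufffd"

try:
    strict_decode(invalid)
except UnicodeDecodeError as exc:
    # The strict path aborts instead of returning text.
    print("strict decode failed at byte", exc.start)
```

The strict choice surfaces bad repository data immediately; the tolerant one keeps the timeline and browser usable at the cost of silently losing the undecodable characters.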

follow-up: 5 comment:4 by eli.carter@…, 19 years ago

Resolution:	fixed
Status:	closed → reopened

This needs to be back-ported to the 0.10 line.

in reply to: 4 ; follow-up: 7 comment:5 by Christian Boos, 19 years ago

Keywords:	svn added
Milestone:	0.11 → 0.10.4

Replying to eli.carter@commprove.com:
As discussed on #trac, I'm waiting for some feedback (the repr() of the message or author which is failing), but I'm quite sure that Subversion does not always give back properly UTF-8-encoded strings (e.g. for repositories created from a CVS import, as in your case). We shouldn't abort in this situation but be more robust, using to_unicode(text, 'utf-8'), which assumes UTF-8 but uses replacement characters if invalid UTF-8 data is found.

i.e. (patch on 0.10-stable)

svn_fs.py

         self.pool = Pool(pool)
         message = self._get_prop(core.SVN_PROP_REVISION_LOG)
         author = self._get_prop(core.SVN_PROP_REVISION_AUTHOR)
+        message = to_unicode(message, 'utf-8')
+        author = to_unicode(author, 'utf-8')
         date = self._get_prop(core.SVN_PROP_REVISION_DATE)
         if date:
             date = core.svn_time_from_cstring(date, self.pool()) / 1000000
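As a rough illustration of what the two added lines change, here is a simplified Python 3 model of the revision constructor. Everything in it is hypothetical: revprops stands in for the raw Subversion revision properties, keyed by their standard names (svn:log, svn:author, svn:date), and to_unicode_sketch only approximates the behavior described above for to_unicode(text, 'utf-8'), namely assuming UTF-8 and substituting replacement characters. Returning None for a missing property is a choice made for this sketch, not necessarily what Trac does.

```python
from typing import Dict, Optional


def to_unicode_sketch(text: Optional[bytes], charset: str = "utf-8") -> Optional[str]:
    """Hypothetical approximation of to_unicode(text, 'utf-8'):
    decode as `charset`, replacing undecodable bytes with U+FFFD."""
    if text is None:  # e.g. a revision without a log message
        return None
    return text.decode(charset, errors="replace")


class RevisionSketch:
    """Hypothetical, simplified model of the patched constructor."""

    def __init__(self, revprops: Dict[str, bytes]) -> None:
        # Both properties are decoded tolerantly, so one bad byte no longer aborts.
        self.message = to_unicode_sketch(revprops.get("svn:log"))
        self.author = to_unicode_sketch(revprops.get("svn:author"))
        # The date is left untouched here; the real code converts it with
        # core.svn_time_from_cstring().
        self.date = revprops.get("svn:date")


rev = RevisionSketch({"svn:log": b"uniformit\xe0", "svn:author": b"fred"})
print(ascii(rev.message))  # 'uniformit\ufffd'
```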

comment:6 by Christian Boos, 19 years ago

PS: Note that this will require a resync.

in reply to: 5 comment:7 by eli.carter@…, 19 years ago

Replying to cboos:

Replying to eli.carter@commprove.com:
As discussed on #trac, I'm waiting for some feedback (the repr() of the message or author which is failing),

The relevant portion of one of the log messages is:

'Struttura modificata per uniformit\xfc\xb0\x80\x80\x81\xa0 ...'

And the resync gave this error:

Command failed: 'utf8' codec can't decode bytes in position 34-39: unsupported Unicode code range
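The reported bytes can be checked with a short Python 3 snippet (not part of Trac; the rest of the message is elided in the report, so only the quoted prefix is used). The sequence starting with \xfc is not valid UTF-8, and strict decoding fails at offset 34, matching the start position in the resync error; Python 3's error text differs from the message above. Decoding with replacement, which is what to_unicode(..., 'utf-8') is described as doing, keeps the readable part and puts U+FFFD in place of the bad bytes.

```python
# Prefix of the log message reported above (the rest was elided in the report).
raw = b"Struttura modificata per uniformit\xfc\xb0\x80\x80\x81\xa0"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Python 3 rejects 0xFC as an invalid start byte.
    print("strict decode fails at offset", exc.start)

# Tolerant decoding keeps the readable part and marks each bad byte with U+FFFD.
text = raw.decode("utf-8", errors="replace")
print(ascii(text))
```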

comment:8 by eli.carter@…, 19 years ago

The patch shown above for 0.10-stable works for me. trunk will need to change _from_svn(...) to to_unicode(..., 'utf-8') as well.

comment:9 by Christian Boos, 19 years ago

Ok, will do. Thanks for the feedback!

comment:10 by Christian Boos, 19 years ago

Resolution:	→ fixed
Status:	reopened → closed

Applied in r4491 (trunk) and r4492 (0.10-stable).
