#4321 closed defect (fixed)
Unicode initenv problem
Reported by: | Owned by: | Christian Boos | |
---|---|---|---|
Priority: | normal | Milestone: | 0.10.4 |
Component: | version control | Version: | 0.10.2 |
Severity: | normal | Keywords: | svn unicode |
Cc: | Branch: | ||
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
Hi!
I've had a big problem with umlauts in my svn repository. I've just converted my cvs repository to svn using cvs2svn and used trac-admin to import the repository using initenv. But something went wrong during the import. I kept getting this error when trying to access the timeline or when using the browser to view directories containing files which had an umlaut in their log message:
OperationalError: Could not decode to UTF-8 column 'message' with text …
I've had to apply this patch:
/trac/versioncontrol/svn_fs.py

    643c635
    < message = self._get_prop(core.SVN_PROP_REVISION_LOG)
    ---
    > message = to_unicode(self._get_prop(core.SVN_PROP_REVISION_LOG))

(Using Gentoo ebuild www-apps/trac-0.10.1)
And now everything works fine!
Keep up the good work!
Attachments (0)
Change History (10)
follow-up: 2 comment:1 by , 18 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
comment:2 by , 18 years ago
Keywords: | unicode added |
---|---|
Milestone: | → 0.11 |
Resolution: | worksforme |
Status: | closed → reopened |
Replying to jonas:
Subversion always stores filenames and log messages as utf-8 internally. So the proposed patch should never be needed.
No, but a similar one which transforms the UTF-8 string to unicode should.
Currently, this (last?) use of UTF-8 goes unnoticed thanks to the cache.
comment:3 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Different patch applied in r4394. As jonas said, we should assume that Subversion gives us proper UTF-8 encoded strings, so we use _from_svn() here instead of the more error-tolerant to_unicode().
follow-up: 5 comment:4 by , 18 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
This needs to be back-ported to the 0.10 line.
follow-up: 7 comment:5 by , 18 years ago
Keywords: | svn added |
---|---|
Milestone: | 0.11 → 0.10.4 |
Replying to eli.carter@commprove.com:
As discussed on #trac, I'm waiting for some feedback (the repr() of the message or author which is failing), but I'm quite sure that Subversion may not always give back properly UTF-8 encoded strings (e.g. repositories created from cvs-import, like in your case). We shouldn't abort in this situation but be more robust (using to_unicode(text, 'utf-8'), which will assume UTF-8 but use replacement characters if invalid UTF-8 data is found).
i.e. (patch on 0.10-stable)
svn_fs.py

    @@ -635,6 +635,8 @@
     self.pool = Pool(pool)
     message = self._get_prop(core.SVN_PROP_REVISION_LOG)
     author = self._get_prop(core.SVN_PROP_REVISION_AUTHOR)
    +message = to_unicode(message, 'utf-8')
    +author = to_unicode(author, 'utf-8')
     date = self._get_prop(core.SVN_PROP_REVISION_DATE)
     if date:
         date = core.svn_time_from_cstring(date, self.pool()) / 1000000
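To make the "assume UTF-8 but use replacement characters" behaviour concrete, here is a minimal sketch in modern Python 3. The helper name decode_svn_prop is hypothetical and this is not Trac's actual to_unicode() implementation (which targeted Python 2); it only illustrates the decoding policy described above.

```python
def decode_svn_prop(raw, charset="utf-8"):
    """Hypothetical helper: decode a revision property without ever raising.

    Bytes that are not valid in `charset` become U+FFFD (the Unicode
    replacement character) instead of aborting the whole operation.
    """
    if raw is None:
        return None          # property not set
    if isinstance(raw, str):
        return raw           # already text, nothing to do
    return raw.decode(charset, errors="replace")


valid = "Umlaut: \u00fc".encode("utf-8")   # properly encoded UTF-8
broken = b"Umlaut: \xfc"                   # ISO-8859-1 byte, invalid as UTF-8

print(decode_svn_prop(valid))
print(decode_svn_prop(broken))
```

The trade-off is the one discussed in this ticket: a strict decode surfaces bad data immediately, while the lenient one keeps the timeline and browser usable at the cost of showing a placeholder character.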
comment:7 by , 18 years ago
Replying to cboos:
Replying to eli.carter@commprove.com:
As discussed on #trac, I'm waiting for some feedback (the repr() of the message or author which is failing),
The relevant portion of one of the log messages is:
'Struttura modificata per uniformit\xfc\xb0\x80\x80\x81\xa0 ...'
And the resync gave this error:
Command failed: 'utf8' codec can't decode bytes in position 34-39: unsupported Unicode code range
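The failure can be reproduced outside Trac with the bytes quoted above. The following is a small Python 3 sketch (the original report was on Python 2, so the exact error text differs); the byte string is copied from the comment, including its trailing ' ...' elision, and the variable names are only illustrative.

```python
# Log message fragment as reported in this comment (the ' ...' is the
# reporter's own elision, kept verbatim).
raw = b"Struttura modificata per uniformit\xfc\xb0\x80\x80\x81\xa0 ..."

# A strict decode fails at the first byte that is not valid UTF-8.
try:
    raw.decode("utf-8")
    failure_offset = None
except UnicodeDecodeError as exc:
    failure_offset = exc.start

# A lenient decode keeps the readable part and marks the bad bytes.
lenient = raw.decode("utf-8", errors="replace")

print(failure_offset)
print(lenient)
```

The offset reported by the strict decode corresponds to the start of the "position 34-39" range in the resync error, i.e. the bytes right after "uniformit".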
comment:8 by , 18 years ago
The patch shown above for 0.10-stable works for me. trunk will need to change _from_svn(...) to to_unicode(..., 'utf-8') as well.
comment:10 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Subversion always stores filenames and log messages as utf-8 internally. So the proposed patch should never be needed. This was probably caused by a missing or incorrect '--encoding' parameter when running the cvs2svn command. '--encoding=iso-8859-1' will probably do the trick.
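As a rough illustration of why the source encoding matters during conversion, the Python 3 sketch below assumes the CVS log messages were written in ISO-8859-1, as suggested above. It does not reproduce cvs2svn's internals; is_valid_utf8 is a hypothetical helper.

```python
def is_valid_utf8(data):
    """Hypothetical check: True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A log message as it might sit in an ISO-8859-1 CVS repository.
legacy = "f\u00fcr".encode("iso-8859-1")

# Copied into Subversion unchanged, the bytes are not valid UTF-8.
print(is_valid_utf8(legacy))

# Decoding with the real source encoding and re-encoding yields valid UTF-8.
recoded = legacy.decode("iso-8859-1").encode("utf-8")
print(is_valid_utf8(recoded))
```

Telling the converter the correct source encoding amounts to the second path: the bytes are interpreted as ISO-8859-1 first and only then stored as UTF-8.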