Opened 10 years ago
Closed 9 years ago
#12322 closed defect (fixed)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 8: invalid continuation byte
| Reported by: | Ryan J Ollos | Owned by: | Ryan J Ollos |
|---|---|---|---|
| Priority: | normal | Milestone: | 1.0.10 |
| Component: | plugin/git | Version: | |
| Severity: | normal | Keywords: | |
| Cc: | | Branch: | |
| Release Notes: | Invalid byte sequence in filepath is replaced when reading Git commits. | | |
| API Changes: | | | |
| Internal Changes: | | | |
Description
Encountered this error while running trac-admin $env repository resync "(default)":
2016-01-19 00:21:23,635 Trac[console] ERROR: Exception in trac-admin command:
Traceback (most recent call last):
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/console.py", line 109, in onecmd
rv = cmd.Cmd.onecmd(self, line) or 0
File "/usr/lib/python2.7/cmd.py", line 220, in onecmd
return self.default(line)
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/console.py", line 287, in default
return self.cmd_mgr.execute_command(*args)
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/api.py", line 127, in execute_command
return f(*fargs)
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/versioncontrol/admin.py", line 156, in _do_resync
self._sync(reponame, rev, clean=True)
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/versioncontrol/admin.py", line 143, in _sync
repos.sync(self._sync_feedback, clean=clean)
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/git_fs.py", line 141, in sync
self._insert_changeset(db, rev, cset)
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/versioncontrol/cache.py", line 285, in _insert_changeset
for path, kind, action, bpath, brev in cset.get_changes():
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/git_fs.py", line 851, in get_changes
self.repos.git.diff_tree(parent, self.rev, find_renames=True):
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/PyGIT.py", line 1044, in diff_tree
yield __chg_tuple()
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/PyGIT.py", line 1036, in __chg_tuple
chg[5] = self._fs_to_unicode(chg[5])
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/PyGIT.py", line 380, in <lambda>
self._fs_to_unicode = lambda s: s.decode(git_fs_encoding)
File "/var/www/bugs.jqueryui.com/private/pve/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 8: invalid continuation byte
I'll post more info if I can reproduce at a different debug level.
Attachments (0)
Change History (20)
comment:1 by , 10 years ago
| Milestone: | → 1.0.10 |
|---|
comment:2 by , 10 years ago
That commit contains a file whose name has an invalid byte sequence.
$ git show --name-status c1800c59953161d88432ea8a307b5cdf08c5ec41
...
M ya/demos/accordion/default.html
M ya/demos/dialog/default.html
A ya/external/PIE.htc
A ya/external/border-radius.htc
A ya/external/jquery.bgiframe-2.1.2.js
A ya/lib/sl.css
M ya/lib/sl.js
A ya/lib/uihelper.js
A "ya/test/\312\326\267\347\307\331.txt"
A ya/themes/default/images/ui-icon-arrows.png
A ya/themes/default/images/ui-icon-close.png
A ya/themes/default/images/ui-icon-triangle-1-e.png
A ya/themes/default/images/ui-icon-triangle-1-s.png
A ya/themes/default/images/ui-icons.png
A ya/themes/default/jquery.ui.accordion.css
A ya/themes/default/jquery.ui.dialog.css
A ya/themes/default/jquery.ui.override.css
M ya/ui/jquery.ya.accordion0.js
M ya/ui/jquery.ya.dialog0.js
$ python -c '"ya/test/\312\326\267\347\307\331.txt".decode("utf-8")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 8: invalid continuation byte
We could tolerate such invalid byte sequences in a Git repository by replacing them while decoding:
diff --git a/tracopt/versioncontrol/git/PyGIT.py b/tracopt/versioncontrol/git/PyGIT.py
index 966df98bc..fc61319ed 100644
--- a/tracopt/versioncontrol/git/PyGIT.py
+++ b/tracopt/versioncontrol/git/PyGIT.py
@@ -380,7 +380,8 @@ class Storage(object):
             codecs.lookup(git_fs_encoding)
 
             # setup conversion functions
-            self._fs_to_unicode = lambda s: s.decode(git_fs_encoding)
+            self._fs_to_unicode = lambda s: s.decode(git_fs_encoding,
+                                                     'replace')
             self._fs_from_unicode = lambda s: s.encode(git_fs_encoding)
         else:
             # pass bytestrings as-is w/o any conversion
After the patch:
Python 2.5.6 (r256:88840, Oct 21 2014, 22:26:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from trac.env import open_environment
>>> env = open_environment('/home/jun66j5/var/trac/1.0-sqlite')
>>> repos = env.get_repository('jquery-ui.git')
>>> cset = repos.get_changeset('c1800c59953161d88432ea8a307b5cdf08c5ec41')
>>> for _ in cset.get_changes(): print _[0]
...
ya/demos/accordion/default.html
ya/demos/dialog/default.html
ya/external/PIE.htc
ya/external/border-radius.htc
ya/external/jquery.bgiframe-2.1.2.js
ya/lib/sl.css
ya/lib/sl.js
ya/lib/uihelper.js
ya/test/�ַ���.txt
ya/themes/default/images/ui-icon-arrows.png
ya/themes/default/images/ui-icon-close.png
ya/themes/default/images/ui-icon-triangle-1-e.png
ya/themes/default/images/ui-icon-triangle-1-s.png
ya/themes/default/images/ui-icons.png
ya/themes/default/jquery.ui.accordion.css
ya/themes/default/jquery.ui.dialog.css
ya/themes/default/jquery.ui.override.css
ya/ui/jquery.ya.accordion0.js
ya/ui/jquery.ya.dialog0.js
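The difference between strict decoding and the 'replace' error handler used in the patch can be sketched outside Trac as follows. This is only an illustration and assumes Python 3 (the ticket itself uses Python 2); `fs_to_unicode` and `first_bad_offset` are hypothetical helper names, not part of PyGIT.py.

```python
# The file name from the commit as raw bytes (octal \312 is 0xca, and so on).
RAW_PATH = b"ya/test/\xca\xd6\xb7\xe7\xc7\xd9.txt"


def fs_to_unicode(raw, encoding="utf-8", errors="replace"):
    """Hypothetical stand-in for the patched conversion function.

    With errors="replace", undecodable bytes become U+FFFD instead of
    raising UnicodeDecodeError.
    """
    return raw.decode(encoding, errors)


def first_bad_offset(raw, encoding="utf-8"):
    """Return the offset of the first undecodable byte, or None if valid."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


if __name__ == "__main__":
    # Strict decoding fails at the byte right after "ya/test/".
    print(first_bad_offset(RAW_PATH))
    # Lenient decoding succeeds and marks the bad bytes with U+FFFD.
    print(ascii(fs_to_unicode(RAW_PATH)))
```

The offset reported by `first_bad_offset` corresponds to the "position 8" in the original error message, since `ya/test/` is eight bytes long.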
comment:3 by , 10 years ago
Replacing invalid characters seems like a good solution. Thanks for investigating.
comment:4 by , 10 years ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
follow-up: 9 comment:5 by , 10 years ago
| Release Notes: | modified (diff) |
|---|
Change from comment:2 committed to 1.0-stable in [14523], merged to trunk in [14524].
It would be good to have a test case, but I struggled with that. I was trying to use _git_fast_import and the format used in _generate_data_many_merges, but I'm unsure of the specification of that format, or how I can export a Git commit in the format.
comment:6 by , 10 years ago
There were some warnings when installing r14523 and syncing the repository:
$ pve/bin/trac-admin trac repository resync "(default)"
Resyncing repository history for (default)...
<path>/pve/local/lib/python2.7/site-packages/trac/db/util.py:72: Warning: Invalid utf8 character string: 'F09F98'
return self.cursor.execute(sql_escape_percent(sql), args)
<path>/pve/local/lib/python2.7/site-packages/trac/db/util.py:72: Warning: Incorrect string value: '\xF0\x9F\x98\xB3' for column 'message' at row 1
return self.cursor.execute(sql_escape_percent(sql), args)
15002 rev
I haven't looked at whether those should be expected for the content of the repository.
comment:7 by , 10 years ago
comment:8 by , 10 years ago
It looks like the database is MySQL with the utf8_bin collation. We could try converting from utf8 (utf8_bin) to utf8mb4 (utf8mb4_bin).
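To see why the warning appears, note that the bytes in the warning form a single 4-byte UTF-8 sequence, while MySQL's utf8 character set stores at most 3 bytes per character. A minimal Python 3 sketch for spotting such characters in a commit message; `chars_needing_utf8mb4` is a hypothetical helper name, not part of Trac:

```python
def chars_needing_utf8mb4(text):
    """Return the characters in text that need 4 bytes in UTF-8.

    These are the code points above U+FFFF, which MySQL's 3-byte utf8
    character set cannot store.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    # The byte sequence quoted in the MySQL warning.
    message = b"\xf0\x9f\x98\xb3".decode("utf-8")
    for ch in chars_needing_utf8mb4(message):
        print("U+%04X takes %d bytes" % (ord(ch), len(ch.encode("utf-8"))))
```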
comment:9 by , 10 years ago
Replying to Ryan J Ollos:
… how I can export a Git commit in the format.
We can export the test data with the following steps.
$ mkdir /tmp/t12322
$ cd /tmp/t12322
$ git init .
Initialized empty Git repository in /tmp/t12322/.git/
$ python -c 'with open("\312\326\267\347\307\331.txt", "w") as f: f.write("")'
$ LC_ALL=C ls -lb
total 0
-rw-r--r-- 1 jun66j5 jun66j5 0 Feb 16 15:49 \312\326\267\347\307\331.txt
$ git add *.txt
$ git commit -a -m '(#12322)'
[master (root-commit) 2b6c462] (#12322)
0 files changed
create mode 100644 "\312\326\267\347\307\331.txt"
$ git fast-export --all
blob
mark :1
data 0
reset refs/heads/master
commit refs/heads/master
mark :2
author Jun Omae <jun66j5@gmail.com> 1455605392 +0900
committer Jun Omae <jun66j5@gmail.com> 1455605392 +0900
data 9
(#12322)
M 100644 :1 "\312\326\267\347\307\331.txt"
reset refs/heads/master
from :2
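In the output above, git prints the file name in double quotes with octal escapes. The following Python 3 sketch turns such a quoted path back into raw bytes, which can be handy when preparing test data. It assumes git's C-style quoting with three-digit octal escapes and the usual single-character escapes; `unquote_git_path` is a hypothetical helper, not a Trac or git API.

```python
import re

# Single-character escapes used by C-style quoting.
_SIMPLE_ESCAPES = {"a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A, "v": 0x0B,
                   "f": 0x0C, "r": 0x0D, '"': 0x22, "\\": 0x5C}
_OCTAL = re.compile(r"[0-3][0-7]{2}")


def unquote_git_path(path):
    """Convert a path as printed by git into raw bytes.

    Unquoted paths are returned encoded as UTF-8; quoted paths have
    their backslash escapes resolved byte by byte.
    """
    if len(path) < 2 or not (path.startswith('"') and path.endswith('"')):
        return path.encode("utf-8")
    body = path[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out += body[i].encode("utf-8")
            i += 1
            continue
        match = _OCTAL.match(body, i + 1)
        if match:
            # Three octal digits encode exactly one byte.
            out.append(int(match.group(), 8))
            i += 4
        else:
            out.append(_SIMPLE_ESCAPES[body[i + 1]])
            i += 2
    return bytes(out)


if __name__ == "__main__":
    print(unquote_git_path(r'"\312\326\267\347\307\331.txt"'))
```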
comment:12 by , 9 years ago
I think there must be something different about my shell environment, because I get:
$ python -c 'with open("\312\326\267\347\307\331.txt", "w") as f: f.write("")'
$ LC_ALL=C ls -b
%CA\326\267%E7%C7%D9.txt
comment:13 by , 9 years ago
I guess that you're using Python 3. Please try using the b prefix, which works on both Python 3 and Python 2:
$ python3 -c 'with open(b"\312\326\267\347\307\331.txt", "w") as f: f.write("")'
comment:14 by , 9 years ago
I'm using Python 2.7.11 on OSX:
$ echo $LANG
en_US.UTF-8
$ python --version
Python 2.7.11
$ python -c 'with open(b"\312\326\267\347\307\331.txt", "w") as f: f.write("")'
$ LC_ALL=C ls -b
%CA\326\267%E7%C7%D9.txt
comment:15 by , 9 years ago
Ok. I get the same on my Mac. It seems that a file with that name cannot be created on HFS, since the invalid bytes end up percent-encoded. Instead, please try on a filesystem that accepts arbitrary byte sequences in file names (e.g. ext4, xfs, …).
comment:16 by , 9 years ago
Thanks for checking. Works on Windows 7:
>C:\Python27-x64\python.exe --version
Python 2.7.11
>C:\Python27-x64\python.exe -c "with open('\312\326\267\347\307\331.txt', 'w') as f: f.write('')"
C:\Users\Ryan Ollos\temp>ls -lb
total 0
-rw-r--r-- 1 Ryan Ollos Administrators 0 May 18 23:14 \312\326\267\347\307\331.txt
I created fast-export data on Debian. Tests pass on OSX with:
$ git --version
git version 2.8.2
Proposed changes in log:rjollos.git:t12322.
comment:17 by , 9 years ago
I get 1 failure on Python 2.5:
======================================================================
FAIL: test_sync_file_with_invalid_byte_sequence (tracopt.versioncontrol.git.tests.git_fs.GitCachedRepositoryTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/run/shm/209e76f970dd6806b5e83dd56a7cb71299754d14/py25-sqlite/tracopt/versioncontrol/git/tests/git_fs.py", line 571, in test_sync_file_with_invalid_byte_sequence
self.assertEqual(u'�ַ���.txt', changes[0][0])
AssertionError: u'\ufffd\u05b7\ufffd\ufffd\ufffd.txt' != u'\ufffd\ufffd\ufffd.txt'
Hmm, the results differ between Python 2.5 and 2.6.
$ python2.4 -c 'print(repr("\312\326\267\347\307\331.txt".decode("utf8", "replace")))'
u'\ufffd\ufffd\ufffd.txt'
$ python2.5 -c 'print(repr("\312\326\267\347\307\331.txt".decode("utf8", "replace")))'
u'\ufffd\ufffd\ufffd.txt'
$ python2.6 -c 'print(repr("\312\326\267\347\307\331.txt".decode("utf8", "replace")))'
u'\ufffd\u05b7\ufffd\ufffd\ufffd.txt'
$ python2.7 -c 'print(repr("\312\326\267\347\307\331.txt".decode("utf8", "replace")))'
u'\ufffd\u05b7\ufffd\ufffd\ufffd.txt'
comment:18 by , 9 years ago
That's a strange finding. I think it would be okay to just push the test case to trunk since development on 1.0-stable is winding down.
comment:19 by , 9 years ago
I agree. However, if we solve it on 1.0-stable, we could use assertIn(...):
diff --git a/tracopt/versioncontrol/git/tests/git_fs.py b/tracopt/versioncontrol/git/tests/git_fs.py
index b24a9fe86..e1bdaaf5c 100644
--- a/tracopt/versioncontrol/git/tests/git_fs.py
+++ b/tracopt/versioncontrol/git/tests/git_fs.py
@@ -567,7 +567,8 @@ from :2
 
         changes = list(repos.repos.get_changeset(revs[0]).get_changes())
         self.assertEqual(1, len(changes))
-        self.assertEqual(u'�ַ���.txt', changes[0][0])
+        self.assertIn(changes[0][0], (u'\ufffd\u05b7\ufffd\ufffd\ufffd.txt',
+                                      u'\ufffd\ufffd\ufffd.txt'))
 
     def test_sync_merge(self):
         self._git_init()
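The assertIn idea can also be tried in isolation, without a Trac environment or a Git repository. A minimal sketch, assuming Python 3 and decoding the raw file name directly instead of going through `get_changes()`; the class and constant names are hypothetical:

```python
import unittest

# Both outcomes reported in this ticket for decoding with 'replace':
# the first from Python 2.6 and 2.7, the second from Python 2.4 and 2.5.
ACCEPTED_NAMES = (u'\ufffd\u05b7\ufffd\ufffd\ufffd.txt',
                  u'\ufffd\ufffd\ufffd.txt')


class InvalidByteSequenceTest(unittest.TestCase):

    def test_replace_yields_an_accepted_name(self):
        raw = b"\xca\xd6\xb7\xe7\xc7\xd9.txt"
        decoded = raw.decode("utf-8", "replace")
        # Accept either result so the test does not depend on how a
        # particular interpreter groups the invalid bytes.
        self.assertIn(decoded, ACCEPTED_NAMES)


if __name__ == "__main__":
    unittest.main()
```

Listing the accepted values explicitly keeps the test strict about the surrounding characters while tolerating the interpreter-specific replacement behavior.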
comment:20 by , 9 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |



With log_level at INFO:
2016-01-19 00:43:47,572 Trac[git_fs] INFO: Trying to sync revision [c1800c59953161d88432ea8a307b5cdf08c5ec41]
2016-01-19 00:43:47,602 Trac[console] ERROR: Exception in trac-admin command:
Traceback (most recent call last):
File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/console.py", line 109, in onecmd
rv = cmd.Cmd.onecmd(self, line) or 0
File "/usr/lib/python2.7/cmd.py", line 220, in onecmd
return self.default(line)
The commit can be found here.