Opened 9 years ago
Closed 9 years ago
#12322 closed defect (fixed)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 8: invalid continuation byte
Reported by: | Ryan J Ollos | Owned by: | Ryan J Ollos |
---|---|---|---|
Priority: | normal | Milestone: | 1.0.10 |
Component: | plugin/git | Version: | |
Severity: | normal | Keywords: | |
Cc: | | Branch: | |
Release Notes: | Invalid byte sequence in filepath is replaced when reading Git commits. |
API Changes: | |
Internal Changes: | |
Description
Encountered this error while running trac-admin $env repository resync "(default)":
2016-01-19 00:21:23,635 Trac[console] ERROR: Exception in trac-admin command:
Traceback (most recent call last):
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/console.py", line 109, in onecmd
    rv = cmd.Cmd.onecmd(self, line) or 0
  File "/usr/lib/python2.7/cmd.py", line 220, in onecmd
    return self.default(line)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/console.py", line 287, in default
    return self.cmd_mgr.execute_command(*args)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/api.py", line 127, in execute_command
    return f(*fargs)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/versioncontrol/admin.py", line 156, in _do_resync
    self._sync(reponame, rev, clean=True)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/versioncontrol/admin.py", line 143, in _sync
    repos.sync(self._sync_feedback, clean=clean)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/git_fs.py", line 141, in sync
    self._insert_changeset(db, rev, cset)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/versioncontrol/cache.py", line 285, in _insert_changeset
    for path, kind, action, bpath, brev in cset.get_changes():
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/git_fs.py", line 851, in get_changes
    self.repos.git.diff_tree(parent, self.rev, find_renames=True):
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/PyGIT.py", line 1044, in diff_tree
    yield __chg_tuple()
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/PyGIT.py", line 1036, in __chg_tuple
    chg[5] = self._fs_to_unicode(chg[5])
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/PyGIT.py", line 380, in <lambda>
    self._fs_to_unicode = lambda s: s.decode(git_fs_encoding)
  File "/var/www/bugs.jqueryui.com/private/pve/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 8: invalid continuation byte
I'll post more info if I can reproduce at a different debug level.
Attachments (0)
Change History (20)
comment:1 by , 9 years ago
Milestone: | → 1.0.10 |
---|
comment:2 by , 9 years ago
That commit has an invalid byte sequence in the name of one of its files.
$ git show --name-status c1800c59953161d88432ea8a307b5cdf08c5ec41
...
M       ya/demos/accordion/default.html
M       ya/demos/dialog/default.html
A       ya/external/PIE.htc
A       ya/external/border-radius.htc
A       ya/external/jquery.bgiframe-2.1.2.js
A       ya/lib/sl.css
M       ya/lib/sl.js
A       ya/lib/uihelper.js
A       "ya/test/\312\326\267\347\307\331.txt"
A       ya/themes/default/images/ui-icon-arrows.png
A       ya/themes/default/images/ui-icon-close.png
A       ya/themes/default/images/ui-icon-triangle-1-e.png
A       ya/themes/default/images/ui-icon-triangle-1-s.png
A       ya/themes/default/images/ui-icons.png
A       ya/themes/default/jquery.ui.accordion.css
A       ya/themes/default/jquery.ui.dialog.css
A       ya/themes/default/jquery.ui.override.css
M       ya/ui/jquery.ya.accordion0.js
M       ya/ui/jquery.ya.dialog0.js

$ python -c '"ya/test/\312\326\267\347\307\331.txt".decode("utf-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 8: invalid continuation byte
We could ignore those invalid byte sequences in the Git repository.
diff --git a/tracopt/versioncontrol/git/PyGIT.py b/tracopt/versioncontrol/git/PyGIT.py
index 966df98bc..fc61319ed 100644
--- a/tracopt/versioncontrol/git/PyGIT.py
+++ b/tracopt/versioncontrol/git/PyGIT.py
@@ -380,7 +380,8 @@ class Storage(object):
             codecs.lookup(git_fs_encoding)
 
             # setup conversion functions
-            self._fs_to_unicode = lambda s: s.decode(git_fs_encoding)
+            self._fs_to_unicode = lambda s: s.decode(git_fs_encoding,
+                                                     'replace')
             self._fs_from_unicode = lambda s: s.encode(git_fs_encoding)
         else:
             # pass bytestrings as-is w/o any conversion
After the patch:
Python 2.5.6 (r256:88840, Oct 21 2014, 22:26:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from trac.env import open_environment
>>> env = open_environment('/home/jun66j5/var/trac/1.0-sqlite')
>>> repos = env.get_repository('jquery-ui.git')
>>> cset = repos.get_changeset('c1800c59953161d88432ea8a307b5cdf08c5ec41')
>>> for _ in cset.get_changes(): print _[0]
...
ya/demos/accordion/default.html
ya/demos/dialog/default.html
ya/external/PIE.htc
ya/external/border-radius.htc
ya/external/jquery.bgiframe-2.1.2.js
ya/lib/sl.css
ya/lib/sl.js
ya/lib/uihelper.js
ya/test/�ַ���.txt
ya/themes/default/images/ui-icon-arrows.png
ya/themes/default/images/ui-icon-close.png
ya/themes/default/images/ui-icon-triangle-1-e.png
ya/themes/default/images/ui-icon-triangle-1-s.png
ya/themes/default/images/ui-icons.png
ya/themes/default/jquery.ui.accordion.css
ya/themes/default/jquery.ui.dialog.css
ya/themes/default/jquery.ui.override.css
ya/ui/jquery.ya.accordion0.js
ya/ui/jquery.ya.dialog0.js
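For readers on a current interpreter, here is a minimal sketch of the same idea in Python 3 (the ticket itself targets Python 2). The helper name fs_to_unicode and the "utf-8" default are illustrative assumptions, not Trac's actual API; only bytes.decode and its "replace" error handler are standard.

```python
# Sketch only: Python 3 equivalent of decoding a raw Git path leniently.
# `fs_to_unicode` is a hypothetical helper, not the function in PyGIT.py.

RAW_PATH = b"ya/test/\312\326\267\347\307\331.txt"


def fs_to_unicode(raw_path, encoding="utf-8"):
    """Decode a filesystem path, substituting U+FFFD for undecodable bytes."""
    return raw_path.decode(encoding, "replace")


# Strict decoding fails at the first offending byte (0xca, offset 8).
try:
    RAW_PATH.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# Lenient decoding always succeeds, at the cost of losing the original bytes.
decoded = fs_to_unicode(RAW_PATH)
print(repr(decoded))
```

Note that the replacement is lossy: the decoded name can no longer be mapped back to the original bytes, so it is suitable for display and caching but not for reopening the file.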
comment:3 by , 9 years ago
Replacing invalid characters seems like a good solution. Thanks for investigating.
comment:4 by , 9 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
follow-up: 9 comment:5 by , 9 years ago
Release Notes: | modified (diff) |
---|
Change from comment:2 committed to 1.0-stable in [14523], merged to trunk in [14524].
It would be good to have a test case, but I struggled with that. I was trying to use _git_fast_import and the format used in _generate_data_many_merges, but I'm unsure of the specification of that format, or how I can export a Git commit in the format.
comment:6 by , 9 years ago
There were some warnings when installing r14523 and syncing the repository:
$ pve/bin/trac-admin trac repository resync "(default)"
Resyncing repository history for (default)...
<path>/pve/local/lib/python2.7/site-packages/trac/db/util.py:72: Warning: Invalid utf8 character string: 'F09F98'
  return self.cursor.execute(sql_escape_percent(sql), args)
<path>/pve/local/lib/python2.7/site-packages/trac/db/util.py:72: Warning: Incorrect string value: '\xF0\x9F\x98\xB3' for column 'message' at row 1
  return self.cursor.execute(sql_escape_percent(sql), args)
15002 rev
I haven't looked at whether those should be expected for the content of the repository.
comment:7 by , 9 years ago
comment:8 by , 9 years ago
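It looks like the database is MySQL with the utf8_bin collation. MySQL's utf8 character set stores at most three bytes per character, so the four-byte sequence in the warning cannot be saved. We could try converting from utf8_bin to utf8mb4_bin (the utf8mb4 character set).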
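As a rough illustration of why the warning appears, the bytes from the warning can be inspected in Python 3. This is only a sketch; the helper name needs_utf8mb4 is made up for this example and is not part of Trac or any MySQL driver.

```python
# Sketch only: check whether text contains characters outside the Basic
# Multilingual Plane, which need four bytes in UTF-8 and therefore do not
# fit in MySQL's three-byte `utf8` character set.

def needs_utf8mb4(text):
    """Return True if any character in `text` is above U+FFFF."""
    return any(ord(ch) > 0xFFFF for ch in text)


# The byte sequence reported in the MySQL warning.
message_char = b"\xF0\x9F\x98\xB3".decode("utf-8")

print(hex(ord(message_char)))             # code point of the character
print(len(message_char.encode("utf-8")))  # number of UTF-8 bytes
print(needs_utf8mb4(message_char))
```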
comment:9 by , 9 years ago
Replying to Ryan J Ollos:
… how I can export a Git commit in the format.
We can export the test data with the following steps.
$ mkdir /tmp/t12322
$ cd /tmp/t12322
$ git init .
Initialized empty Git repository in /tmp/t12322/.git/
$ python -c 'with open("\312\326\267\347\307\331.txt", "w") as f: f.write("")'
$ LC_ALL=C ls -lb
total 0
-rw-r--r-- 1 jun66j5 jun66j5 0 Feb 16 15:49 \312\326\267\347\307\331.txt
$ git add *.txt
$ git commit -a -m '(#12322)'
[master (root-commit) 2b6c462] (#12322)
 0 files changed
 create mode 100644 "\312\326\267\347\307\331.txt"
$ git fast-export --all
blob
mark :1
data 0
reset refs/heads/master
commit refs/heads/master
mark :2
author Jun Omae <jun66j5@gmail.com> 1455605392 +0900
committer Jun Omae <jun66j5@gmail.com> 1455605392 +0900
data 9
(#12322)
M 100644 :1 "\312\326\267\347\307\331.txt"
reset refs/heads/master
from :2
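In the output above, Git prints the file name in double quotes with octal escapes. A small Python 3 sketch for turning such a quoted name back into raw bytes might look like the following. The function unquote_git_path is hypothetical and deliberately incomplete: it handles only three-digit octal escapes and the surrounding quotes, not escapes such as \\, \" or \n.

```python
import re

# Sketch only: matches a backslash followed by exactly three octal digits.
_OCTAL_ESCAPE = re.compile(rb"\\([0-7]{3})")


def unquote_git_path(quoted):
    """Convert a Git-quoted path (bytes) with octal escapes to raw bytes.

    Unquoted paths are returned unchanged. Other C-style escapes are
    not handled in this simplified version.
    """
    if not (len(quoted) >= 2 and quoted.startswith(b'"') and quoted.endswith(b'"')):
        return quoted
    inner = quoted[1:-1]
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), inner)


raw_name = unquote_git_path(rb'"\312\326\267\347\307\331.txt"')
print(raw_name)
```

The resulting bytes are the ones that fail strict UTF-8 decoding in the description.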
comment:12 by , 9 years ago
I think there must be something different about my shell environment, because I get:
$ python -c 'with open("\312\326\267\347\307\331.txt", "w") as f: f.write("")'
$ LC_ALL=C ls -b
%CA\326\267%E7%C7%D9.txt
comment:13 by , 9 years ago
I guess that you're using Python 3. Please try using the b prefix, which works on both Python 3 and Python 2:
$ python3 -c 'with open(b"\312\326\267\347\307\331.txt", "w") as f: f.write("")'
comment:14 by , 9 years ago
I'm using Python 2.7.11 on OSX:
$ echo $LANG
en_US.UTF-8
$ python --version
Python 2.7.11
$ python -c 'with open(b"\312\326\267\347\307\331.txt", "w") as f: f.write("")'
$ LC_ALL=C ls -b
%CA\326\267%E7%C7%D9.txt
comment:15 by , 9 years ago
Ok. I get the same on my Mac. It seems that the filename cannot be created on HFS. Instead, please try on a case-sensitive filesystem (e.g. ext4, xfs, …).
comment:16 by , 9 years ago
Thanks for checking. Works on Windows 7:
>C:\Python27-x64\python.exe --version
Python 2.7.11
>C:\Python27-x64\python.exe -c "with open('\312\326\267\347\307\331.txt', 'w') as f: f.write('')"
C:\Users\Ryan Ollos\temp>ls -lb
total 0
-rw-r--r-- 1 Ryan Ollos Administrators 0 May 18 23:14 \312\326\267\347\307\331.txt
I created fast-export data on Debian. Tests pass on OSX with:
$ git --version
git version 2.8.2
Proposed changes in log:rjollos.git:t12322.
comment:17 by , 9 years ago
I get 1 failure on Python 2.5:
======================================================================
FAIL: test_sync_file_with_invalid_byte_sequence (tracopt.versioncontrol.git.tests.git_fs.GitCachedRepositoryTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/run/shm/209e76f970dd6806b5e83dd56a7cb71299754d14/py25-sqlite/tracopt/versioncontrol/git/tests/git_fs.py", line 571, in test_sync_file_with_invalid_byte_sequence
    self.assertEqual(u'�ַ���.txt', changes[0][0])
AssertionError: u'\ufffd\u05b7\ufffd\ufffd\ufffd.txt' != u'\ufffd\ufffd\ufffd.txt'
Hmm, the results differ between Python 2.5 and 2.6.
$ python2.4 -c 'print(repr("\312\326\267\347\307\331.txt".decode("utf8", "replace")))'
u'\ufffd\ufffd\ufffd.txt'
$ python2.5 -c 'print(repr("\312\326\267\347\307\331.txt".decode("utf8", "replace")))'
u'\ufffd\ufffd\ufffd.txt'
$ python2.6 -c 'print(repr("\312\326\267\347\307\331.txt".decode("utf8", "replace")))'
u'\ufffd\u05b7\ufffd\ufffd\ufffd.txt'
$ python2.7 -c 'print(repr("\312\326\267\347\307\331.txt".decode("utf8", "replace")))'
u'\ufffd\u05b7\ufffd\ufffd\ufffd.txt'
comment:18 by , 9 years ago
That's a strange finding. I think it would be okay to just push the test case to trunk since development on 1.0-stable is winding down.
comment:19 by , 9 years ago
I agree. However, if we solve it on 1.0-stable, we could use assertIn(...):
diff --git a/tracopt/versioncontrol/git/tests/git_fs.py b/tracopt/versioncontrol/git/tests/git_fs.py
index b24a9fe86..e1bdaaf5c 100644
--- a/tracopt/versioncontrol/git/tests/git_fs.py
+++ b/tracopt/versioncontrol/git/tests/git_fs.py
@@ -567,7 +567,8 @@ from :2
 
         changes = list(repos.repos.get_changeset(revs[0]).get_changes())
         self.assertEqual(1, len(changes))
-        self.assertEqual(u'�ַ���.txt', changes[0][0])
+        self.assertIn(changes[0][0], (u'\ufffd\u05b7\ufffd\ufffd\ufffd.txt',
+                                      u'\ufffd\ufffd\ufffd.txt'))
 
     def test_sync_merge(self):
         self._git_init()
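The approach in the diff above, accepting either decoder output, can be tried in isolation with a small standalone test. This is a sketch for Python 3, not the Trac test case: the class name ReplaceDecodeTest and the constant ACCEPTED_NAMES are invented here, and no repository is involved.

```python
import unittest

# Both results reported in this ticket for the same input bytes:
# the first from Python 2.6/2.7, the second from Python 2.4/2.5.
ACCEPTED_NAMES = (
    "\ufffd\u05b7\ufffd\ufffd\ufffd.txt",
    "\ufffd\ufffd\ufffd.txt",
)


class ReplaceDecodeTest(unittest.TestCase):
    """Hypothetical standalone check of the tolerant assertion."""

    def test_replaced_name_is_a_known_form(self):
        raw = b"\312\326\267\347\307\331.txt"
        name = raw.decode("utf-8", "replace")
        # Accept any known decoder output rather than one exact string.
        self.assertIn(name, ACCEPTED_NAMES)
```

Saved as a module, this can be run with python -m unittest. The trade-off is that the test no longer pins down a single result, which is acceptable here because the difference comes from the interpreter rather than from Trac.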
comment:20 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
With log_level at INFO:
The commit can be found here.