Opened 8 years ago

Last modified 8 years ago

#12322 closed defect

UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 8: invalid continuation byte

Reported by: Ryan J Ollos
Owned by: Ryan J Ollos
Priority: normal
Milestone: 1.0.10
Component: plugin/git
Version:
Severity: normal
Keywords:
Cc:
Branch:
Release Notes:

An invalid byte sequence in a file path is replaced when reading Git commits.

API Changes:
Internal Changes:

Description

Encountered this error while running trac-admin $env repository resync "(default)":

2016-01-19 00:21:23,635 Trac[console] ERROR: Exception in trac-admin command: 
Traceback (most recent call last):
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/console.py", line 109, in onecmd
    rv = cmd.Cmd.onecmd(self, line) or 0
  File "/usr/lib/python2.7/cmd.py", line 220, in onecmd
    return self.default(line)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/console.py", line 287, in default
    return self.cmd_mgr.execute_command(*args)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/api.py", line 127, in execute_command
    return f(*fargs)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/versioncontrol/admin.py", line 156, in _do_resync
    self._sync(reponame, rev, clean=True)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/versioncontrol/admin.py", line 143, in _sync
    repos.sync(self._sync_feedback, clean=clean)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/git_fs.py", line 141, in sync
    self._insert_changeset(db, rev, cset)
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/versioncontrol/cache.py", line 285, in _insert_changeset
    for path, kind, action, bpath, brev in cset.get_changes():
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/git_fs.py", line 851, in get_changes
    self.repos.git.diff_tree(parent, self.rev, find_renames=True):
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/PyGIT.py", line 1044, in diff_tree
    yield __chg_tuple()
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/PyGIT.py", line 1036, in __chg_tuple
    chg[5] = self._fs_to_unicode(chg[5])
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/tracopt/versioncontrol/git/PyGIT.py", line 380, in <lambda>
    self._fs_to_unicode = lambda s: s.decode(git_fs_encoding)
  File "/var/www/bugs.jqueryui.com/private/pve/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 8: invalid continuation byte

I'll post more info if I can reproduce at a different debug level.

Change History (5)

comment:1 by Ryan J Ollos, 8 years ago

Milestone: 1.0.10

With log_level at INFO:

2016-01-19 00:43:47,572 Trac[git_fs] INFO: Trying to sync revision [c1800c59953161d88432ea8a307b5cdf08c5ec41]
2016-01-19 00:43:47,602 Trac[console] ERROR: Exception in trac-admin command:
Traceback (most recent call last):
  File "/var/www/bugs.jqueryui.com/private/pve/local/lib/python2.7/site-packages/trac/admin/console.py", line 109, in onecmd
    rv = cmd.Cmd.onecmd(self, line) or 0
  File "/usr/lib/python2.7/cmd.py", line 220, in onecmd
    return self.default(line)

The commit can be found here.

comment:2 by Jun Omae, 8 years ago

That commit contains a file whose name has an invalid byte sequence (it is not valid UTF-8).

$ git show --name-status c1800c59953161d88432ea8a307b5cdf08c5ec41
...
M       ya/demos/accordion/default.html
M       ya/demos/dialog/default.html
A       ya/external/PIE.htc
A       ya/external/border-radius.htc
A       ya/external/jquery.bgiframe-2.1.2.js
A       ya/lib/sl.css
M       ya/lib/sl.js
A       ya/lib/uihelper.js
A       "ya/test/\312\326\267\347\307\331.txt"
A       ya/themes/default/images/ui-icon-arrows.png
A       ya/themes/default/images/ui-icon-close.png
A       ya/themes/default/images/ui-icon-triangle-1-e.png
A       ya/themes/default/images/ui-icon-triangle-1-s.png
A       ya/themes/default/images/ui-icons.png
A       ya/themes/default/jquery.ui.accordion.css
A       ya/themes/default/jquery.ui.dialog.css
A       ya/themes/default/jquery.ui.override.css
M       ya/ui/jquery.ya.accordion0.js
M       ya/ui/jquery.ya.dialog0.js
$ python -c '"ya/test/\312\326\267\347\307\331.txt".decode("utf-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 8: invalid continuation byte

We could replace those invalid byte sequences when decoding paths from the Git repository.

  • tracopt/versioncontrol/git/PyGIT.py

    diff --git a/tracopt/versioncontrol/git/PyGIT.py b/tracopt/versioncontrol/git/PyGIT.py
    index 966df98bc..fc61319ed 100644
    --- a/tracopt/versioncontrol/git/PyGIT.py
    +++ b/tracopt/versioncontrol/git/PyGIT.py
    @@ -380,7 +380,8 @@ class Storage(object):
                 codecs.lookup(git_fs_encoding)
     
                 # setup conversion functions
    -            self._fs_to_unicode = lambda s: s.decode(git_fs_encoding)
    +            self._fs_to_unicode = lambda s: s.decode(git_fs_encoding,
    +                                                     'replace')
                 self._fs_from_unicode = lambda s: s.encode(git_fs_encoding)
             else:
                 # pass bytestrings as-is w/o any conversion
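
The effect of the extra 'replace' argument can be seen outside Trac with a minimal sketch. The helper name fs_to_unicode below is a hypothetical stand-in for the lambda in the patch, not Trac's API, and the encoding is assumed to be UTF-8 as in the traceback.

```python
# Raw bytes of the path, as reported by `git show --name-status` above.
raw_path = b"ya/test/\312\326\267\347\307\331.txt"


def fs_to_unicode(raw, encoding="utf-8", errors="strict"):
    """Decode a raw path (hypothetical stand-in for Storage._fs_to_unicode)."""
    return raw.decode(encoding, errors)


# Strict decoding raises at the first byte that is not valid UTF-8.
try:
    fs_to_unicode(raw_path)
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# With 'replace', undecodable bytes become U+FFFD instead of raising.
replaced = fs_to_unicode(raw_path, errors="replace")
print(repr(replaced))
```

Note that the replacement is lossy: the original bytes cannot be recovered from the decoded name, so the name is suitable for display and caching but not for round-tripping back to the on-disk path.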

After the patch:

Python 2.5.6 (r256:88840, Oct 21 2014, 22:26:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from trac.env import open_environment
>>> env = open_environment('/home/jun66j5/var/trac/1.0-sqlite')
>>> repos = env.get_repository('jquery-ui.git')
>>> cset = repos.get_changeset('c1800c59953161d88432ea8a307b5cdf08c5ec41')
>>> for _ in cset.get_changes(): print _[0]
...
ya/demos/accordion/default.html
ya/demos/dialog/default.html
ya/external/PIE.htc
ya/external/border-radius.htc
ya/external/jquery.bgiframe-2.1.2.js
ya/lib/sl.css
ya/lib/sl.js
ya/lib/uihelper.js
ya/test/�ַ���.txt
ya/themes/default/images/ui-icon-arrows.png
ya/themes/default/images/ui-icon-close.png
ya/themes/default/images/ui-icon-triangle-1-e.png
ya/themes/default/images/ui-icon-triangle-1-s.png
ya/themes/default/images/ui-icons.png
ya/themes/default/jquery.ui.accordion.css
ya/themes/default/jquery.ui.dialog.css
ya/themes/default/jquery.ui.override.css
ya/ui/jquery.ya.accordion0.js
ya/ui/jquery.ya.dialog0.js
Last edited 8 years ago by Jun Omae

comment:3 by Ryan J Ollos, 8 years ago

Replacing invalid characters seems like a good solution. Thanks for investigating.

comment:4 by Ryan J Ollos, 8 years ago

Owner: set to Ryan J Ollos
Status: new → assigned

comment:5 by Ryan J Ollos, 8 years ago

Release Notes: modified (diff)

Change from comment:2 committed to 1.0-stable in [14523], merged to trunk in [14524].

It would be good to have a test case, but I struggled to write one. I tried using _git_fast_import with the format used in _generate_data_many_merges, but I'm unsure of the specification of that format and of how to export a Git commit in it.
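
For what it's worth, the format in question looks like the input stream of git fast-import, which is specified in the git-fast-import manual page; git fast-export prints existing commits in that same format. Below is a rough sketch of building such a stream for a single commit that adds a file with a non-UTF-8 name. The function name, committer identity, timestamp and commit message are all made up for illustration, and whether Trac's _git_fast_import helper accepts the result unchanged is an assumption that has not been checked.

```python
def _data(payload):
    """Format a fast-import `data` command for a byte string payload."""
    header = ("data %d\n" % len(payload)).encode("ascii")
    return header + payload + b"\n"


def build_fast_import_stream(path, content=b"test\n",
                             ref=b"refs/heads/master"):
    """Return a fast-import stream with one commit that adds `path`.

    `path` is a byte string written unquoted, which the fast-import
    format permits unless it contains LF or starts with a double quote.
    """
    if b"\n" in path or path.startswith(b'"'):
        raise ValueError("path would need C-style quoting")
    return b"".join([
        b"blob\n",
        b"mark :1\n",
        _data(content),
        b"commit " + ref + b"\n",
        # Hypothetical committer identity and timestamp.
        b"committer Tester <tester@example.org> 1453161600 +0000\n",
        _data(b"Add file with non-UTF-8 name\n"),
        b"M 100644 :1 " + path + b"\n",
        b"\n",
    ])


stream = build_fast_import_stream(b"ya/test/\312\326\267\347\307\331.txt")
```

The resulting bytes could then be piped to `git fast-import` on standard input inside a freshly initialized test repository, after which get_changes() on the new commit should exercise the decoding path from the traceback.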
