Opened 17 years ago

Closed 17 years ago

#6677 closed defect (fixed)

trac-admin copystatic raises unicode decode error

Reported by: ilias@…
Owned by: Christian Boos
Priority: normal
Milestone: 0.11.1
Component: admin/console
Version: 0.11rc1
Severity: normal
Keywords: unicode
Cc:
Branch:
Release Notes:
API Changes:
Internal Changes:

Description

P:\>trac-admin intra copystatic  tstatic
Copying resources from:
  trac.web.chrome.Chrome
    p:\own\dev\infra\trac\trac\htdocs
    P:\intra\htdocs
Traceback (most recent call last):
  File "C:\prg\py24\Scripts\trac-admin-script.py", line 7, in ?
    sys.exit(
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 1222, in run
    return admin.onecmd(command)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 102, in onecmd
    rv = cmd.Cmd.onecmd(self, line) or 0
  File "C:\prg\py24\lib\cmd.py", line 219, in onecmd
    return func(arg)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 1172, in do_copystatic
    copytree(source, dest)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 64, in copytree
    copytree(srcname, dstname, symlinks, skip)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 64, in copytree
    copytree(srcname, dstname, symlinks, skip)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 64, in copytree
    copytree(srcname, dstname, symlinks, skip)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 58, in copytree
    dstname = os.path.join(dst, name)
  File "C:\prg\py24\lib\ntpath.py", line 102, in join
    path += "\\" + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc1 in position 1: ordinal not in range(128)
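
The last frame fails because Python 2 implicitly decodes a byte string with the ASCII codec when it is concatenated with a unicode object. A minimal sketch of that decode step, written for Python 3 (where `bytes` and `str` play the roles of Python 2's `str` and `unicode`, and the decode has to be spelled out). The file name is a made-up example, not one taken from the reporter's tree:

```python
# Hypothetical file name with a non-ASCII byte at index 1.
name = b"x\xc1.png"


def implicit_decode(raw):
    """Mimic the ASCII decode Python 2 applies when mixing str and unicode."""
    return raw.decode("ascii")


try:
    implicit_decode(name)
    message = None
except UnicodeDecodeError as exc:
    # Same error class and wording as the last line of the traceback.
    message = str(exc)

print(message)
```

Any byte above 0x7f in a file name triggers this as soon as one of the two operands is a unicode object.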

Attachments (2)

6677-fix-argv-encoding.patch (989 bytes) - added by Christian Boos 17 years ago.
    fix command line conversion to unicode, in non-interactive mode (see comment:12)
6677-fix-copytree.patch (2.7 KB) - added by Christian Boos 17 years ago.
    fix copytree so that it can cope with any filename it finds on its way


Change History (29)

comment:1 by Christian Boos, 17 years ago

Keywords: unicode added
Milestone: 0.11.1
Severity: changed from major to normal

comment:2 by Ernesto <ernestohdez87@…>, 17 years ago

Version: 0.11rc1

hacker@ubuntu1:~/Desktop/todo/Trac-0.11rc1$ trac-admin help
Traceback (most recent call last):
  File "/usr/bin/trac-admin", line 8, in <module>
    load_entry_point('Trac==0.11rc1', 'console_scripts', 'trac-admin')()
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 277, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 2179, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 1912, in load
    entry = __import__(self.module_name, globals(),globals(), __name__)
  File "/usr/lib/python2.5/site-packages/Trac-0.11rc1-py2.5.egg/trac/admin/console.py", line 34, in <module>
    from trac.ticket.model import *
  File "/usr/lib/python2.5/site-packages/Trac-0.11rc1-py2.5.egg/trac/ticket/__init__.py", line 1, in <module>
    from trac.ticket.api import *
  File "/usr/lib/python2.5/site-packages/Trac-0.11rc1-py2.5.egg/trac/ticket/api.py", line 31, in <module>
    from trac.wiki import IWikiSyntaxProvider, WikiParser
  File "/usr/lib/python2.5/site-packages/Trac-0.11rc1-py2.5.egg/trac/wiki/__init__.py", line 1, in <module>
    from trac.wiki.api import *
  File "/usr/lib/python2.5/site-packages/Trac-0.11rc1-py2.5.egg/trac/wiki/api.py", line 36, in <module>
    from trac.wiki.parser import WikiParser
ImportError: No module named parser
hacker@ubuntu1:~/Desktop/todo/Trac-0.11rc1$

comment:3 by osimons, 17 years ago

Milestone: changed from 0.11.2 to 0.11.1
Owner: changed from Christopher Lenz to osimons

I got a very similar error doing trac-admin ... hotcopy which uses the same copytree() method.

Could you try the following patch and confirm that it solves it? Make sure to double-check the result for any non-ascii filenames and paths you have…

  • trac/admin/console.py

    @@ -54,10 +54,10 @@
         os.mkdir(dst)
         errors = []
         for name in names:
    -        srcname = os.path.join(src, name)
    +        srcname = to_unicode(os.path.join(src, name))
             if srcname in skip:
                 continue
    -        dstname = os.path.join(dst, name)
    +        dstname = to_unicode(os.path.join(dst, name))
             try:
                 if symlinks and os.path.islink(srcname):
                     linkto = os.readlink(srcname)

(As for Ernesto, that traceback of yours is completely unrelated to the issue of this ticket…)

in reply to:  3 comment:4 by anonymous, 17 years ago

Replying to osimons:

I got a very similar error doing trac-admin ... hotcopy which uses the same copytree() method.

Could you try the following patch and confirm that it solves it? Make sure to double-check the result for any non-ascii filenames and paths you have…

Sorry, I am in a non-coding phase currently, so i cannot make this test.

I forgot to mention the following: I've used Greek characters as the project title; that was the only utf-8 if I remember right.

You could use this as a project title:

Ελληνικά

and reproduce the error, apply the patch, test again as you like.

cu!

comment:5 by ilias@…, 17 years ago

anonymous was me

comment:6 by Christian Boos, 17 years ago

I think os.path normally knows how to handle unicode paths.

I couldn't reproduce the bug (neither for deploy nor for hotcopy) even when using a non-ascii target directory (C:/TEMP/testé).

Simon, could you give me either a precise reproduction recipe or more detailed debug information? (os.path should normally be able to cope with unicode paths).

in reply to:  6 ; comment:7 by ilias@…, 17 years ago

Replying to cboos:

Simon, could you give me either a precise reproduction recipe or more detailed debug information? (os.path should normally be able to cope with unicode paths).

see comment:4

the bug is raised simply by using a utf-8 project title:

You could use this as a project title:

Ελληνικά

in reply to:  7 comment:8 by Christian Boos, 17 years ago

Replying to ilias@lazaridis.com:

the bug is raised simply by using a utf-8 project title:

I doubt this, because since #6535 my test project name is Trac Dével (ü) Σχέδιο.

in reply to:  6 comment:9 by Christian Boos, 17 years ago

Replying to cboos:

I think os.path normally knows how to handle unicode paths.

I couldn't reproduce the bug (neither for deploy nor for hotcopy) even when using a non-ascii target directory (C:/TEMP/testé).

Actually, it only worked by luck, since the target directory after the copy appears to be named: "C:/TEMP/testΘ". So there's indeed an encoding issue with the target path.

comment:10 by Christian Boos, 17 years ago

Tricky … my cmd.exe shell has cp437:

> chcp
Active code page: 437

> python - C:\TEMP\testé
>>> import sys
>>> sys.stdin.encoding
'cp437'

so far so good, python recognized that, and handles strings entered from that shell the expected way:

>>> test = r'C:\TEMP\testé'
>>> test.decode('cp437')
u'C:\\TEMP\\test\xe9'

Now, the funny thing. Remember that python - C:\TEMP\testé command line?

>>> sys.argv[1]
'C:\\TEMP\\test\xe9'

Right, that's latin-1 apparently.

Therefore our code in admin/console.py (which does a line = to_unicode(line, sys.stdin.encoding)) is based on the wrong assumption that the strings in sys.argv are encoded using the stdin encoding. That assumption does not hold on Windows, at least when using cmd.exe.

But when parsing the command line arguments
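
The mismatch described in this comment can be replayed with explicit codecs. This is a small sketch in Python 3, assuming (as the comment guesses) that the argv bytes are latin-1. It also shows where the `testΘ` directory from comment:9 comes from:

```python
path = "C:\\TEMP\\test\u00e9"       # C:\TEMP\testé

as_typed = path.encode("cp437")     # bytes the cp437 console uses: é is 0x82
as_argv = path.encode("latin-1")    # bytes seen in sys.argv above: é is 0xe9

# Decoding the argv bytes with the console code page picks the wrong
# character, because 0xe9 is a Greek capital theta in cp437.
wrong = as_argv.decode("cp437")
right = as_argv.decode("latin-1")

print(repr(wrong))
print(repr(right))
```

Decoding with the codec that actually produced the bytes round-trips; decoding with `sys.stdin.encoding` silently yields a different, but valid, path.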

comment:11 by Christian Boos, 17 years ago

See related pythonbug:2128.

comment:12 by Christian Boos, 17 years ago

So it seems that sys.argv strings are actually encoded in the locale encoding (on Windows)

  • trac/admin/console.py

    @@ -99,7 +99,7 @@
             """`line` may be a `str` or an `unicode` object"""
             try:
                 if isinstance(line, str):
    -                line = to_unicode(line, sys.stdin.encoding)
    +                line = to_unicode(line, locale.getpreferredencoding())
                 line = line.replace('\\', '\\\\')
                 rv = cmd.Cmd.onecmd(self, line) or 0
             except SystemExit:

Would be worth testing on other platforms (Linux, MacOS, …)

comment:13 by osimons, 17 years ago

Did some more testing, and my traceback actually happens in copytree() when testing if srcname in skip for a file in the project htdocs directory. Here is a very simple illustration of what I think happens at my end:

>>> f = os.listdir('.')[-1] # only need one file for testing
>>> f
'\xc3\xa6\xc3\xb8a\xcc\x8a.txt'
>>> type(f)
<type 'str'>
>>> print f
æøå.txt
>>> l = ['hi.txt']
>>> f in l
False
>>> l = [u'hi.txt', 'hi.txt'] # one item is unicode
>>> f in l
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position...

The output from os.listdir() are plain strings. A slightly more focused patch is therefore:

  • trac/admin/console.py

    @@ -50,7 +50,7 @@
         Added a `skip` parameter consisting of absolute paths
         which we don't want to copy.
         """
    -    names = os.listdir(src)
    +    names = [to_unicode(f) for f in os.listdir(src)]
         os.mkdir(dst)
         errors = []
         for name in names:

My testing is on OSX, and the result of sys.stdin.encoding is 'UTF-8'. Looking at console encoding I have a feeling that it is a slightly different issue as no console input/output are directly involved - see also #7303. In the example above, the console outputs the filename just fine when printing.
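
A rough Python 3 sketch of the idea behind this patch: decode every directory entry up front so that the membership test against `skip` only ever compares text with text. `decode_name` is a hypothetical stand-in, not Trac's `to_unicode`, and the UTF-8 default is an assumption that matches the file name shown above:

```python
import unicodedata


def decode_name(raw, encoding="utf-8"):
    """Hypothetical stand-in for to_unicode(): bytes -> text, never raises."""
    if isinstance(raw, bytes):
        return raw.decode(encoding, "replace")
    return raw


raw = b"\xc3\xa6\xc3\xb8a\xcc\x8a.txt"   # the os.listdir() entry shown above
name = decode_name(raw)

skip = ["hi.txt"]
print(name in skip)                      # False, and no decode error

# The entry is stored decomposed ("a" followed by a combining ring),
# so it is one code point longer than the composed spelling.
composed = unicodedata.normalize("NFC", name)
print(len(name), len(composed))          # 8 7
```

The decomposed form means that even a correctly decoded name may not compare equal to the same name typed by a user, unless both sides are normalized.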

comment:14 by osimons, 17 years ago

Oh, should add this to the description of my test environment as well:

>>> locale.getpreferredencoding()
'mac-roman'

comment:15 by osimons, 17 years ago

Here is an old (still open) Python bug with an interesting discussion about unicode and filenames; 767645.

>>> os.path.supports_unicode_filenames
False

in reply to:  13 ; comment:16 by Christian Boos, 17 years ago

Replying to osimons:

Did some more testing, and my traceback actually happens in copytree() when testing if srcname in skip

IIUC, srcname is a str and skip contains a unicode object, right? What does print repr((srcname,skip)) show before the error triggers?

Actually, I think that for having a robust copytree, we shouldn't bother with unicode at all. We don't have control over the files present in the tree and in the general case, there could be file names consisting of byte sequences which don't correspond to the current encoding (be it sys.getfilesystemencoding(), sys.getdefaultencoding() or locale.getpreferredencoding()).
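
One way to read "not bothering with unicode" is to keep every path as a byte string for the whole walk. The following is a sketch of that approach in Python 3, not the attached patch. `copytree_bytes` is a hypothetical name, and symlink handling and error collection are left out for brevity:

```python
import os
import shutil
import tempfile


def copytree_bytes(src, dst, skip=()):
    """Copy a tree using byte-string paths only (hypothetical helper)."""
    src = os.fsencode(src)
    dst = os.fsencode(dst)
    skip = {os.fsencode(p) for p in skip}
    os.mkdir(dst)
    for name in os.listdir(src):          # bytes in, bytes out
        srcname = os.path.join(src, name)
        if srcname in skip:               # bytes compared with bytes
            continue
        dstname = os.path.join(dst, name)
        if os.path.isdir(srcname):
            copytree_bytes(srcname, dstname, skip)
        else:
            with open(srcname, "rb") as fin, open(dstname, "wb") as fout:
                shutil.copyfileobj(fin, fout)


# Usage: copy a small tree holding a non-ASCII file name, skipping one file.
root = tempfile.mkdtemp()
src = os.path.join(root, "src")
os.mkdir(src)
with open(os.path.join(os.fsencode(src), b"a\xcc\x8a.txt"), "wb") as fh:
    fh.write(b"data")
open(os.path.join(src, "trac.db-journal"), "wb").close()

copytree_bytes(src, os.path.join(root, "dst"),
               skip=[os.path.join(src, "trac.db-journal")])
copied = os.listdir(os.fsencode(os.path.join(root, "dst")))
```

Because nothing is ever decoded, a file name that is invalid in the current encoding is copied like any other; only `src`, `dst` and `skip` are converted, once, at the boundary.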

in reply to:  12 comment:17 by Christian Boos, 17 years ago

About that other issue:

Replying to cboos:

So it seems that sys.argv strings are actually encoded in the locale encoding (on Windows) … Would be worth testing on other platforms (Linux, MacOS, …)

Some indications that this should be safe on Linux: encoding of sys.argv ? on python-list.

What about MacOS/X?

in reply to:  16 comment:18 by osimons, 17 years ago

Replying to cboos:

IIUC, srcname is a str and skip contains a unicode object, right? What does print repr((srcname,skip)) show before the error triggers?

('/...../project1/htdocs/CR1blA\xcc\x8a\xe2\x80\x99_neg.png',
[u'/...../project1/db/trac.db-journal', u'/...../project1/db/trac.db-stmtjrnl'])

(Paths shortened by me.)

comment:19 by Christian Boos, 17 years ago

Okay: hotcopy does a copytree(self.__env.path, dest, ... skip), where self.__env.path is a str (sys.argv[0] didn't go through the line = line.to_unicode(...) conversion) and skip is computed using db_path, taken from the config file and hence is unicode.

We also need to fix dest, otherwise the os.path.join(dst, name) will fail when seeing '.../CR1blA\xcc\x8a\xe2\x80\x99_neg.png'.

In order to convert that dest to a str, we first need to decode it properly, and for that I just realized that the patch in comment:12 is only needed in batch mode and won't work in interactive mode, in which case using sys.stdin.encoding was the correct thing to do!
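
The distinction drawn here (locale encoding for arguments passed on the command line, stdin encoding for lines typed at the interactive prompt) could be sketched as below. Both helper names are hypothetical, and this Python 3 sketch is not the content of the attached patch:

```python
import locale
import sys


def command_line_encoding(interactive):
    """Pick the encoding used to decode a command line (hypothetical helper)."""
    if interactive:
        # Lines typed at the trac-admin prompt arrive through stdin.
        stdin_encoding = getattr(sys.stdin, "encoding", None)
        return stdin_encoding or locale.getpreferredencoding()
    # Batch mode: arguments come from sys.argv.
    return locale.getpreferredencoding()


def decode_command_line(raw, interactive, encoding=None):
    """Decode a raw command line; text input is returned unchanged."""
    if isinstance(raw, bytes):
        chosen = encoding or command_line_encoding(interactive)
        return raw.decode(chosen, "replace")
    return raw


# Usage, with the encoding forced so that the result is predictable.
line = decode_command_line(b"hotcopy C:\\TEMP\\test\xe9", False,
                           encoding="latin-1")
print(line)
```

The explicit `encoding` parameter exists only to make the sketch testable; a real caller would rely on the mode-dependent default.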

by Christian Boos, 17 years ago

Attachment: 6677-fix-argv-encoding.patch added

fix command line conversion to unicode, in non-interactive mode (see comment:12)

by Christian Boos, 17 years ago

Attachment: 6677-fix-copytree.patch added

fix copytree so that it can cope with any filename it finds on its way

comment:20 by Christian Boos, 17 years ago

Please try out the attached patches (they are Mercurial patches, so they need to be applied using -p1).

in reply to:  20 comment:21 by osimons, 17 years ago

Replying to cboos:

Please try out the attached patches (they are Mercurial patches, so they need to be applied using -p1).

Patches look good. At least my hotcopy now works without problems.

comment:22 by Christian Boos, 17 years ago

Owner: changed from osimons to Christian Boos
Status: changed from new to assigned

Ok, thanks. Could the reporter confirm the fix as well?

in reply to:  22 ; comment:23 by ilias@…, 17 years ago

Replying to cboos:

Ok, thanks. Could the reporter confirm the fix as well?

sorry, i cannot recreate the environment that caused the error at this moment.

will do the test when i update the installation to the latest version (in a few weeks)

in reply to:  23 comment:24 by Christian Boos, 17 years ago

Replying to ilias@lazaridis.com:

Replying to cboos:

Ok, thanks. Could the reporter confirm the fix as well?

will do the test when i update the installation to the latest version (in a few weeks)

No need Ilias, I think I have now figured out what you meant in comment:7: when the environment path contains non-ascii characters, it fails right away. Because you said "utf-8 project title", this made me think about #6535 and I got confused.

So actually a trac-admin C:\TEMP\testé1 hotcopy C:\TEMP\testé2 command also fails for me, even with the above patches. I'll investigate further.

in reply to:  3 comment:25 by ilias@…, 17 years ago

btw, I'm missing copystatic from trac-admin; it isn't listed in the help either. Version:

P:\local>trac-admin
trac-admin - The Trac Administration Console 0.11b1

comment:26 by Christian Boos, 17 years ago

That command was renamed a few times (staticcopy, copystatic and now deploy), see log:trunk/trac/admin/console.py@6894. Note that 0.11 has been out for a month now (0.11b1 is way obsolete; there have been b2, rc1 and rc2 since then).

comment:27 by Christian Boos, 17 years ago

Resolution: fixed
Status: changed from assigned to closed

Ok, so I've applied attachment:6677-fix-argv-encoding.patch (r7360) and attachment:6677-fix-copytree.patch (r7361).

The error you get for a command like trac-admin C:\TEMP\testé1 hotcopy C:\TEMP\testé2 can't be fixed easily, as fixing it would imply supporting non-ascii environment paths in Trac, which is not the case for now. Adding that support is beyond the scope of this ticket, so for now I simply added an explicit error message in this situation (r7363).
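
For illustration only, an up-front check of this kind might look like the sketch below (Python 3). The function name and the message wording are assumptions and do not reproduce r7363:

```python
def check_env_path(path):
    """Reject environment paths with non-ASCII characters (hypothetical)."""
    try:
        path.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError(
            "non-ascii environment path %r is not supported" % (path,))
    return path


# Usage: an ASCII path passes through unchanged.
print(check_env_path("C:\\TEMP\\test1"))
```

Failing early with a clear message replaces a traceback deep inside copytree() with something the user can act on.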
