Opened 17 years ago
Closed 17 years ago
#6677 closed defect (fixed)
trac-admin copystatic raises unicode decode error
Reported by: | Owned by: | Christian Boos | |
---|---|---|---|
Priority: | normal | Milestone: | 0.11.1 |
Component: | admin/console | Version: | 0.11rc1 |
Severity: | normal | Keywords: | unicode |
Cc: | Branch: | ||
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
P:\>trac-admin intra copystatic tstatic
Copying resources from:
  trac.web.chrome.Chrome
    p:\own\dev\infra\trac\trac\htdocs
    P:\intra\htdocs
Traceback (most recent call last):
  File "C:\prg\py24\Scripts\trac-admin-script.py", line 7, in ?
    sys.exit(
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 1222, in run
    return admin.onecmd(command)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 102, in onecmd
    rv = cmd.Cmd.onecmd(self, line) or 0
  File "C:\prg\py24\lib\cmd.py", line 219, in onecmd
    return func(arg)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 1172, in do_copystatic
    copytree(source, dest)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 64, in copytree
    copytree(srcname, dstname, symlinks, skip)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 64, in copytree
    copytree(srcname, dstname, symlinks, skip)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 64, in copytree
    copytree(srcname, dstname, symlinks, skip)
  File "P:\own\dev\infra\trac\trac\admin\console.py", line 58, in copytree
    dstname = os.path.join(dst, name)
  File "C:\prg\py24\lib\ntpath.py", line 102, in join
    path += "\\" + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc1 in position 1: ordinal not in range(128)
Attachments (2)
Change History (29)
comment:1 by , 17 years ago
Keywords: | unicode added |
---|---|
Milestone: | → 0.11.1 |
Severity: | major → normal |
comment:2 by , 17 years ago
Version: | → 0.11rc1 |
---|
follow-ups: 4 25 comment:3 by , 17 years ago
Milestone: | 0.11.2 → 0.11.1 |
---|---|
Owner: | changed from | to
I got a very similar error doing trac-admin ... hotcopy which uses the same copytree() method.
Could you try the following patch and confirm that it solves it? Make sure to double-check the result for any non-ascii filenames and paths you have…
trac/admin/console.py
     os.mkdir(dst)
     errors = []
     for name in names:
-        srcname = os.path.join(src, name)
+        srcname = to_unicode(os.path.join(src, name))
         if srcname in skip:
             continue
-        dstname = os.path.join(dst, name)
+        dstname = to_unicode(os.path.join(dst, name))
         try:
             if symlinks and os.path.islink(srcname):
                 linkto = os.readlink(srcname)
(As for Ernesto, that traceback of yours is completely unrelated to the issue of this ticket…)
comment:4 by , 17 years ago
Replying to osimons:
I got a very similar error doing trac-admin ... hotcopy which uses the same copytree() method.
Could you try the following patch and confirm that it solves it? Make sure to double-check the result for any non-ascii filenames and paths you have…
Sorry, I am in a non-coding phase currently, so I cannot run this test.
I forgot to mention the following: I used Greek characters as the project title; that was the only UTF-8 text, if I remember right.
You could use this as a project title:
Ελληνικά
and reproduce the error, apply the patch, test again as you like.
cu!
follow-ups: 7 9 comment:6 by , 17 years ago
I think os.path normally knows how to handle unicode paths.
I couldn't reproduce the bug (neither for deploy nor for hotcopy) even when using a non-ascii target directory (C:/TEMP/testé).
Simon, could you give me either a precise reproduction recipe or more detailed debug information? (os.path should normally be able to cope with unicode paths).
follow-up: 8 comment:7 by , 17 years ago
comment:8 by , 17 years ago
Replying to ilias@lazaridis.com:
the bug raised simply by using an utf-8 project title:
I doubt this, because since #6535 my test project name is Trac Dével (ü) Σχέδιο.
comment:9 by , 17 years ago
Replying to cboos:
I think os.path normally knows how to handle unicode paths.
I couldn't reproduce the bug (neither for deploy nor for hotcopy) even when using a non-ascii target directory (C:/TEMP/testé).
Actually, it only worked by luck, since the target directory after the copy appears to be named: "C:/TEMP/testΘ". So there's indeed an encoding issue with the target path.
comment:10 by , 17 years ago
Tricky … my cmd.exe shell has cp437:
> chcp
Active code page: 437
> python - C:\TEMP\testé
>>> import sys
>>> sys.stdin.encoding
'cp437'
so far so good, python recognized that, and handles strings entered from that shell the expected way:
>>> test = r'C:\TEMP\testé'
>>> test.decode('cp437')
u'C:\\TEMP\\test\xe9'
Now, the funny thing. Remember that python - C:\TEMP\testé command line?
>>> sys.argv[1]
'C:\\TEMP\\test\xe9'
Right, that's latin-1 apparently.
Therefore our code in admin/console.py (which does a line = to_unicode(line, sys.stdin.encoding)) is based on the wrong assumption that the strings in sys.argv are encoded using the stdin encoding. That assumption does not hold on Windows, at least when using cmd.exe.
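The mismatch can be reproduced in isolation. The following is a minimal Python 3 sketch (the session in this ticket is Python 2.4); the byte string is copied from the sys.argv[1] output, the variable names are made up for illustration, and none of it is Trac code.

```python
# Raw bytes as they appeared in sys.argv[1] in the session above.
raw = b'C:\\TEMP\\test\xe9'

# Decoding with the console code page (what sys.stdin.encoding reported).
as_cp437 = raw.decode('cp437')

# Decoding as Latin-1, which is what the bytes apparently were.
as_latin1 = raw.decode('latin-1')

# ascii() keeps the output printable on any console.
print(ascii(as_cp437))   # last character is U+0398, Greek capital theta
print(ascii(as_latin1))  # last character is U+00E9, e with acute accent
```

Byte 0xE9 is "Θ" in cp437 and "é" in Latin-1, which would explain a target directory that ends up named "testΘ".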
But when parsing the command line arguments
follow-up: 17 comment:12 by , 17 years ago
So it seems that sys.argv strings are actually encoded in the locale encoding (on Windows):
trac/admin/console.py
         """`line` may be a `str` or an `unicode` object"""
         try:
             if isinstance(line, str):
-                line = to_unicode(line, sys.stdin.encoding)
+                line = to_unicode(line, locale.getpreferredencoding())
             line = line.replace('\\', '\\\\')
             rv = cmd.Cmd.onecmd(self, line) or 0
         except SystemExit:
Would be worth testing on other platforms (Linux, MacOS, …)
follow-up: 16 comment:13 by , 17 years ago
Did some more testing, and my traceback actually happens in copytree() when testing if srcname in skip for a file in project htdocs directory. Here is a very simple illustration of what I think happens at my end:
>>> f = os.listdir('.')[-1]  # only need one file for testing
>>> f
'\xc3\xa6\xc3\xb8a\xcc\x8a.txt'
>>> type(f)
<type 'str'>
>>> print f
æøå.txt
>>> l = ['hi.txt']
>>> f in l
False
>>> l = [u'hi.txt', 'hi.txt']  # one item is unicode
>>> f in l
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position...
The names returned by os.listdir() are plain strings. A slightly more focused patch is therefore:
trac/admin/console.py
     Added a `skip` parameter consisting of absolute paths
     which we don't want to copy.
     """
-    names = os.listdir(src)
+    names = [to_unicode(f) for f in os.listdir(src)]
     os.mkdir(dst)
     errors = []
     for name in names:
My testing is on OSX, and the result of sys.stdin.encoding is 'UTF-8'. Looking at console encoding I have a feeling that it is a slightly different issue as no console input/output are directly involved - see also #7303. In the example above, the console outputs the filename just fine when printing.
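The failing membership test comes down to Python 2 silently decoding the byte string as ASCII before comparing it with a unicode object. Python 3 no longer does this, so the sketch below performs that step explicitly. The helper name coerce_like_python2 is hypothetical; the bytes are the file name from the session above.

```python
# File name bytes as returned by os.listdir() in the session above
# (UTF-8, with the "å" stored as "a" followed by a combining ring).
raw_name = b'\xc3\xa6\xc3\xb8a\xcc\x8a.txt'


def coerce_like_python2(data):
    """Mimic Python 2's implicit str -> unicode coercion (an ASCII decode)."""
    return data.decode('ascii')


try:
    coerce_like_python2(raw_name)
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    # The first byte the ASCII codec could not handle.
    first_bad_byte = exc.object[exc.start]

# Decoding explicitly with the encoding the bytes are really in succeeds.
decoded = raw_name.decode('utf-8')
```

The first offending byte is 0xc3, the same one named in the UnicodeDecodeError above.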
comment:14 by , 17 years ago
Oh, should add this to the description of my test environment as well:
>>> locale.getpreferredencoding() 'mac-roman'
comment:15 by , 17 years ago
Here is an old (still open) Python bug with an interesting discussion about unicode and filenames: 767645.
>>> os.path.supports_unicode_filenames False
follow-up: 18 comment:16 by , 17 years ago
Replying to osimons:
Did some more testing, and my traceback actually happens in copytree() when testing if srcname in skip
IIUC, srcname is a str and skip contains a unicode object, right? What does print repr((srcname,skip)) show before the error triggers?
Actually, I think that to have a robust copytree, we shouldn't bother with unicode at all. We don't have control over the files present in the tree, and in the general case there could be file names consisting of byte sequences which don't correspond to the current encoding (be it sys.getfilesystemencoding(), sys.getdefaultencoding() or locale.getpreferredencoding()).
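As a rough illustration of "not bothering with unicode", here is a Python 3 sketch that keeps every path as bytes from start to finish. It is not the attached patch: copytree_bytes is a hypothetical name, symlink handling and error collection are left out, and the demo tree is made up.

```python
import os
import shutil
import tempfile


def copytree_bytes(src, dst, skip=()):
    """Recursively copy src to dst, treating every path as raw bytes."""
    src = os.fsencode(src)
    dst = os.fsencode(dst)
    skip = {os.fsencode(p) for p in skip}
    os.mkdir(dst)
    for name in os.listdir(src):  # bytes in, bytes out: nothing is decoded
        srcname = os.path.join(src, name)
        if srcname in skip:
            continue
        dstname = os.path.join(dst, name)
        if os.path.isdir(srcname):
            copytree_bytes(srcname, dstname, skip)
        else:
            shutil.copy2(srcname, dstname)


# Demo: a small tree with one non-ASCII file name and one skipped file.
root = tempfile.mkdtemp()
src = os.path.join(root, 'src')
os.makedirs(os.path.join(src, 'sub'))
src_b = os.fsencode(src)
for rel in (b'keep.txt', b'skip.me', os.path.join(b'sub', b'caf\xc3\xa9.txt')):
    with open(os.path.join(src_b, rel), 'wb') as fh:
        fh.write(b'x')

dst = os.path.join(root, 'dst')
copytree_bytes(src, dst, skip=[os.path.join(src, 'skip.me')])
copied = sorted(os.listdir(dst))
```

Because names are never decoded, a file name that is invalid in the current encoding is copied like any other.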
comment:17 by , 17 years ago
About that other issue:
Replying to cboos:
So it seems that sys.argv strings are actually encoded in the locale encoding (on Windows) … Would be worth testing on other platforms (Linux, MacOS, …)
Some indications that this should be safe on Linux: encoding of sys.argv ? on python-list.
What about MacOS/X?
comment:18 by , 17 years ago
Replying to cboos:
IIUC, srcname is a str and skip contains a unicode object, right? What does print repr((srcname,skip)) show before the error triggers?
('/...../project1/htdocs/CR1blA\xcc\x8a\xe2\x80\x99_neg.png', [u'/...../project1/db/trac.db-journal', u'/...../project1/db/trac.db-stmtjrnl'])
(Paths shortened by me.)
comment:19 by , 17 years ago
Okay: hotcopy does a copytree(self.__env.path, dest, ... skip), where self.__env.path is a str (sys.argv[0] didn't go through the line = to_unicode(...) conversion) and skip is computed using db_path, taken from the config file and hence is unicode.
We also need to fix dest, otherwise the os.path.join(dst, name) will fail when seeing '.../CR1blA\xcc\x8a\xe2\x80\x99_neg.png'.
In order to convert that dest to a str, we first need to decode it properly, and for that I just realized that the patch in comment:12 is only needed in batch mode and won't work in interactive mode, in which case using sys.stdin.encoding was the correct thing to do!
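The batch versus interactive distinction could be sketched as follows. This is a Python 3 illustration only, not the content of the attached patch: input_encoding and to_text are hypothetical helpers, and the 'utf-8' fallback is an assumption.

```python
import locale
import sys


def input_encoding(interactive):
    """Pick the encoding used to decode a command line."""
    if interactive:
        # Lines typed at the prompt arrive through stdin.
        encoding = getattr(sys.stdin, 'encoding', None)
    else:
        # Arguments passed on the command line (batch mode).
        encoding = locale.getpreferredencoding()
    return encoding or 'utf-8'  # assumed fallback, not taken from the patch


def to_text(line, interactive):
    """Decode bytes to text; pass through values that are already text."""
    if isinstance(line, bytes):
        return line.decode(input_encoding(interactive), 'replace')
    return line
```

Keeping the choice in one helper means the caller only has to know which mode it is running in.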
by , 17 years ago
Attachment: | 6677-fix-argv-encoding.patch added |
---|
fix command line conversion to unicode, in non-interactive mode (see comment:12)
by , 17 years ago
Attachment: | 6677-fix-copytree.patch added |
---|
fix copytree so that it can cope with any filename it finds on its way
follow-up: 21 comment:20 by , 17 years ago
Please try out the attached patches (they are Mercurial patches, so they need to be applied using -p1).
comment:21 by , 17 years ago
Replying to cboos:
Please try out the attached patches (they are Mercurial patches, so they need to be applied using -p1).
Patches look good. At least my hotcopy now works without problems.
follow-up: 23 comment:22 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Ok, thanks. Could the reporter confirm the fix as well?
follow-up: 24 comment:23 by , 17 years ago
Replying to cboos:
Ok, thanks. Could the reporter confirm the fix as well?
Sorry, I cannot recreate the environment that caused the error at this moment.
I will do the test when I update the installation to the latest version (in a few weeks).
comment:24 by , 17 years ago
Replying to ilias@lazaridis.com:
Replying to cboos:
Ok, thanks. Could the reporter confirm the fix as well?
will do the test when i update the installation to the latest version (in a few weeks)
No need Ilias, I think I have now figured out what you meant in comment:7: it's when the environment path contains non-ascii characters, it fails right away. Because you said "utf-8 project title", this made me think about #6535 and I got confused.
So actually a trac-admin C:\TEMP\testé1 hotcopy C:\TEMP\testé2 command also fails for me, even with the above patches. I'll investigate further.
comment:25 by , 17 years ago
btw, I'm missing copystatic from trac-admin; it isn't listed in the help either. Version:
P:\local>trac-admin
trac-admin - The Trac Administration Console 0.11b1
comment:26 by , 17 years ago
That command was renamed a few times (staticcopy, copystatic and now deploy), see log:trunk/trac/admin/console.py@6894. Note that 0.11 has been out for a month now (0.11b1 is way obsolete - there has been b2, rc1 and rc2 since then).
comment:27 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Ok, so I've applied attachment:6677-fix-argv-encoding.patch (r7360) and attachment:6677-fix-copytree.patch (r7361).
The error you get for a command like trac-admin C:\TEMP\testé1 hotcopy C:\TEMP\testé2 can't be fixed easily, as the implication would be that a non-ascii environment path would be supported in Trac, which is not the case for now. Adding that support is beyond the scope of this ticket, so for now I simply added an explicit error message in this situation (r7363).
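A guard of that kind might look roughly like the Python 3 sketch below. The function name, the exception type and the message text are all assumptions made for illustration; the actual change in r7363 is not reproduced here.

```python
def check_env_path(path):
    """Reject environment paths that contain non-ASCII characters.

    Hypothetical helper: the name, the exception type and the message
    are illustrative only.
    """
    try:
        path.encode('ascii')
    except UnicodeEncodeError:
        raise ValueError(
            'non-ascii environment path %r is not supported' % path
        ) from None
    return path


# A plain ASCII path passes through unchanged.
ok_path = check_env_path('C:\\TEMP\\test1')

# A path with a non-ASCII character is refused up front.
try:
    check_env_path('C:\\TEMP\\test\u00e91')
    rejected = False
except ValueError:
    rejected = True
```

Failing early with a clear message replaces the UnicodeDecodeError that would otherwise surface deep inside the copy.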
hacker@ubuntu1:~/Desktop/todo/Trac-0.11rc1$ trac-admin help
Traceback (most recent call last):
ImportError: No module named parser
hacker@ubuntu1:~/Desktop/todo/Trac-0.11rc1$