Opened 17 years ago
Closed 17 years ago
#6180 closed defect (fixed)
Trac admin's wiki dump command has an encoding problem
| Reported by: | | Owned by: | Christian Boos |
|---|---|---|---|
| Priority: | normal | Milestone: | 0.10.5 |
| Component: | admin/console | Version: | 0.10.4 |
| Severity: | normal | Keywords: | unicode |
| Cc: | | Branch: | |
| Release Notes: | | API Changes: | |
| Internal Changes: | | | |
Description
When the Trac wiki contains pages written in Japanese, the wiki dump command (on the trac-admin console) fails.
I think the wiki dump command doesn't take enough care with encoding.
I fixed this problem and made a patch.
Please test my patch and, if it has no problems, apply it.
Attachments (5)
Change History (16)
by , 17 years ago
Attachment: wiki_dump_encode_fix.patch added

Patch file to fix this problem
comment:1 by , 17 years ago (follow-up: 2)
I think `sys.stdout.encoding` already caused trouble in another piece of code: sometimes it is `None`, which would make the code fail.
comment:2 by , 17 years ago (follow-up: 3)
by , 17 years ago
Attachment: wiki_dump_encode_fix_2.patch added
patch with a default encoding fallback added (same approach as #4742)
comment:3 by , 17 years ago
by , 17 years ago
Attachment: wiki_dump_encode_fix_3.patch added
added `None` as the default argument to `getattr`
comment:4 by , 17 years ago
Sorry, I forgot the third (default) argument of `getattr`.
I changed wiki_dump_encode_fix_2.patch at line 827 (new):

```
+ cons_charset = getattr(sys.stdout, 'encoding') or 'utf-8'
```

to

```
+ cons_charset = getattr(sys.stdout, 'encoding', None) or 'utf-8'
```

The new patch is wiki_dump_encode_fix_3.patch.
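To illustrate why the third argument matters, here is a minimal sketch (Python 2, as in the ticket; the StringIO stand-in is an assumption for illustration, not part of any patch): a file-like object can lack the `encoding` attribute entirely, and even a real console stream can report `None` when output is piped, so both the `getattr` default and the `or 'utf-8'` fallback are needed.

```python
# Minimal sketch, not from the patch: StringIO objects have no 'encoding'
# attribute, so the two-argument form of getattr() would raise
# AttributeError here; the three-argument form falls through to 'utf-8'.
import StringIO

fake_stdout = StringIO.StringIO()  # file-like, but no .encoding
cons_charset = getattr(fake_stdout, 'encoding', None) or 'utf-8'
assert cons_charset == 'utf-8'
```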
comment:5 by , 17 years ago (follow-up: 6)
Keywords: unicode added
Milestone: → 0.10.5
Content in Japanese, or Japanese page names?
comment:6 by , 17 years ago
Replying to cboos:

> Content in Japanese, or Japanese page names?

Japanese page names.
Japanese content with English page names is no problem.
comment:7 by , 17 years ago (follow-up: 8)
Ok, thanks for the clarification. However, I think that your patch assumes that the console charset is the same as the one used by the filesystem. At least on Windows, this is not true:
```
>>> sys.stdout.encoding
'cp437'
>>> filename = u'héhé'
>>> f = file(u'héhé'.encode('cp437'), 'w')
>>> f.write('check')
>>> f.close()
```
Explorer shows "h,h,".
Now:
```
>>> sys.getfilesystemencoding()
'mbcs'
>>> f = file(u'héhé'.encode('mbcs'), 'w')
>>> f.write('check again')
>>> f.close()
```
Explorer shows "héhé".
Also, it is fine to pass unicode strings to `os.path.join`.
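Following that remark, a minimal sketch (the target directory `u'.'` is an illustrative assumption, not from the ticket): keep the path as unicode end to end and let Python apply `sys.getfilesystemencoding()` when the file is created, instead of encoding by hand for the console charset.

```python
# -*- coding: utf-8 -*-
# Minimal sketch: with a unicode path, Python encodes the filename using the
# filesystem encoding (mbcs on Windows) when the file is actually created.
import os

path = os.path.join(u'.', u'héhé')  # '.' stands in for a real dump directory
f = open(path, 'w')
f.write('check once more')
f.close()
```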
comment:8 by , 17 years ago
Replying to cboos:

Thanks. It is exactly as you say.
The patch should take care of not only console printing but also file creation.
I fixed my patch again (wiki_dump_encode_fix_4.patch).
In addition, I changed when the encodings are looked up: the new patch gets them outside the loop, for efficiency.
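A hypothetical sketch of the shape just described (the variable names and page list are illustrative, not taken from wiki_dump_encode_fix_4.patch): both encodings are looked up once, before the loop, then reused for every page.

```python
# -*- coding: utf-8 -*-
# Hypothetical sketch, not the actual patch: resolve both charsets once,
# then reuse them for console output and file creation inside the loop.
import os
import sys

cons_charset = getattr(sys.stdout, 'encoding', None) or 'utf-8'
fs_charset = sys.getfilesystemencoding()

for pagename in [u'WikiStart', u'ページ名']:  # stand-in for the real page list
    print pagename.encode(cons_charset, 'replace')             # console output
    path = os.path.join('dump', pagename.encode(fs_charset))   # file creation
    # ... the page content would be written to 'path' here ...
```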
by , 17 years ago
Attachment: wiki_dump_encode_fix_4.patch added
fix both console output encoding and file system encoding
comment:9 by , 17 years ago (follow-up: 10)
Owner: changed from … to …
Status: new → assigned
Well, I think there's a mix of concerns here… perhaps I introduced the confusion with my comments, sorry for that.
If the idea is simply to get "flat" filenames that can be "wiki dumped" and "wiki loaded", then `urllib.quote` of any encoding will do. Using a fixed UTF-8 encoding would even be preferable, for portability reasons.
If, OTOH, what we would like to achieve is writing filenames that look on the file system like the original page names, then using `sys.getfilesystemencoding` is fine. But there are additional concerns:
- we would need to filter out any characters which are not allowed in filenames (like \ / : * ? < > | on Windows);
- we would have to deal with sub-pages, either by finding a way to "encode" the "/" separator in flat filenames or by creating subfolders. The latter would be interesting, but then there's the complication of having to deal with conflicts between filenames and folders (e.g. a page A conflicting with the folder A needed for writing A/B).
So all in all, I'll favor the first approach, which is much easier to deal with. Would something like attachment:utf8_wiki_dump_load.diff work for you?
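For concreteness, a minimal sketch of that first approach (an illustration under the ticket's assumptions, not the attached diff itself): quoting the UTF-8 bytes of the page name yields a flat, portable filename, and `safe=''` makes `quote` escape the "/" separator too, which sidesteps the sub-page concern above.

```python
# -*- coding: utf-8 -*-
# Minimal sketch, not the attached patch: round-trip a page name through a
# flat, filesystem-safe filename by quoting its UTF-8 bytes.
import urllib

def page_to_filename(pagename):
    # safe='' quotes '/' as %2F, so sub-page names stay flat
    return urllib.quote(pagename.encode('utf-8'), safe='')

def filename_to_page(filename):
    return urllib.unquote(filename).decode('utf-8')

assert filename_to_page(page_to_filename(u'Sub/ページ')) == u'Sub/ページ'
```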
by , 17 years ago
Attachment: utf8_wiki_dump_load.diff added
Correctly dump/load unicode WikiPageNames (patch on 0.10.5dev)
comment:10 by , 17 years ago
Thanks. I also think that the first approach is good.
Your patch (utf8_wiki_dump_load.diff) works correctly in my environment.
At first, I wanted to dump the pages under human-readable names that the OS could display.
But now I understand that the problems there are not easy ones.
P.S.
I will adopt the following solution in my project.
We keep a naming rule for wiki pages; under this rule, the characters that can be used in a name are limited. It's a local rule for my project.
I will apply your patch and write a filename conversion script (from quoted UTF-8 to the OS filesystem encoding), then apply that script to the dumped files.
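A hypothetical sketch of such a conversion script (the command-line argument and in-place rename are my assumptions, not from the ticket): it unquotes each dumped filename back to the unicode page name, then renames the file using the OS filesystem encoding.

```python
# Hypothetical conversion script: rename dumped files from their
# urllib-quoted UTF-8 names to names in the OS filesystem encoding.
import os
import sys
import urllib

dump_dir = sys.argv[1]               # directory produced by 'wiki dump'
fs_charset = sys.getfilesystemencoding()
for name in os.listdir(dump_dir):
    pagename = urllib.unquote(name).decode('utf-8')
    os.rename(os.path.join(dump_dir, name),
              os.path.join(dump_dir, pagename.encode(fs_charset)))
```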
comment:11 by , 17 years ago
Resolution: → fixed
Status: assigned → closed