
Opened 17 years ago

Closed 17 years ago

#6180 closed defect (fixed)

Trac admin's wiki dump command has encode problem

Reported by: kondo@… Owned by: Christian Boos
Priority: normal Milestone: 0.10.5
Component: admin/console Version: 0.10.4
Severity: normal Keywords: unicode
Cc: Branch:
Release Notes:
API Changes:
Internal Changes:

Description

When a Trac wiki contains pages written in Japanese, the wiki dump command (on the trac-admin console) fails.

I think the wiki dump command does not take enough care with encodings.

I fixed this problem and made a patch.

Please test my patch and, if it causes no problems, apply it.

Attachments (5)

wiki_dump_encode_fix.patch (859 bytes ) - added by kondo@… 17 years ago.
Patch file to fix this problem
wiki_dump_encode_fix_2.patch (882 bytes ) - added by kondo@… 17 years ago.
Patch with a default encoding added (in the same way as #4742)
wiki_dump_encode_fix_3.patch (888 bytes ) - added by kondo@… 17 years ago.
getattr default argument "None" added
wiki_dump_encode_fix_4.patch (1.3 KB ) - added by kondo@… 17 years ago.
fix both console output encoding and file system encoding
utf8_wiki_dump_load.diff (2.2 KB ) - added by Christian Boos 17 years ago.
Correctly dump/load unicode WikiPageNames (patch on 0.10.5dev)


Change History (16)

by kondo@…, 17 years ago

Attachment: wiki_dump_encode_fix.patch added

Patch file to fix this problem

comment:1 by Emmanuel Blot, 17 years ago

I think sys.stdout.encoding has already caused trouble in another piece of code: it is sometimes None, which makes the code fail.

in reply to:  1 ; comment:2 by Emmanuel Blot, 17 years ago

Replying to eblot:

I think sys.stdout.encoding has already caused trouble in another piece of code: it is sometimes None, which makes the code fail.

Found it: #4742.

by kondo@…, 17 years ago

Patch with a default encoding added (in the same way as #4742)

in reply to:  2 comment:3 by kondo@…, 17 years ago

Replying to eblot:

I think sys.stdout.encoding has already caused trouble in another piece of code: it is sometimes None, which makes the code fail.

Thanks. I didn't know about that problem. I updated my patch in the same way as #4742.

by kondo@…, 17 years ago

getattr default argument "None" added

comment:4 by anonymous, 17 years ago

Sorry, I forgot the third (default) argument of getattr.

I changed wiki_dump_encode_fix_2.patch at new line 827, from

+            cons_charset = getattr(sys.stdout, 'encoding') or 'utf-8'

To

+            cons_charset = getattr(sys.stdout, 'encoding', None) or 'utf-8'

The new patch is wiki_dump_encode_fix_3.patch.
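The difference between the two lines can be sketched in isolation. The helper below is hypothetical (the name `console_charset` and its parameters are not taken from the patch) and is written for Python 3, whereas the ticket targets Python 2. It only illustrates why the third argument matters: two-argument `getattr` raises `AttributeError` when the stream has no `encoding` attribute at all, while the `or` fallback covers the case where the attribute exists but is `None`.

```python
import sys


def console_charset(stream=None, default='utf-8'):
    """Return the stream's encoding, or `default` if it is missing or None.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if stream is None:
        stream = sys.stdout
    # The third getattr argument handles streams without an `encoding`
    # attribute; `or default` handles an attribute that is None or empty.
    return getattr(stream, 'encoding', None) or default


class NoEncodingStream(object):
    """Stand-in for a file-like object without an `encoding` attribute."""


class NoneEncodingStream(object):
    """Stand-in for a redirected stream whose encoding is None."""
    encoding = None


class Cp437Stream(object):
    """Stand-in for a console reporting a concrete charset."""
    encoding = 'cp437'
```

With these stand-ins, `console_charset(NoEncodingStream())` and `console_charset(NoneEncodingStream())` both fall back to the default, and `console_charset(Cp437Stream())` returns the charset the stream reports.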

comment:5 by Christian Boos, 17 years ago

Keywords: unicode added
Milestone: 0.10.5

Content in Japanese or Japanese page names?

in reply to:  5 comment:6 by kondo@…, 17 years ago

Replying to cboos:

Content in Japanese or Japanese page names?

Japanese page names.

Japanese content under English page names causes no problem.

comment:7 by Christian Boos, 17 years ago

Ok, thanks for the clarification. However, I think that your patch assumes that the console charset is the same as the one used by the filesystem. At least on Windows, this is not true:

>>> sys.stdout.encoding
'cp437'
>>> filename = u'héhé'
>>> f = file(u'héhé'.encode('cp437'), 'w')
>>> f.write('check')
>>> f.close()

Explorer shows "h,h,".

Now:

>>> sys.getfilesystemencoding()
'mbcs'
>>> f = file(u'héhé'.encode('mbcs'), 'w')
>>> f.write('check again')
>>> f.close()

Explorer shows "héhé".

Also, it is fine to pass unicode strings to os.path.join.
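The "h,h," result can be reproduced without touching the filesystem. The following Python 3 sketch assumes that the `mbcs` codec on that machine corresponds to the Windows-1252 ANSI code page; that is an assumption, since `mbcs` depends on the system locale and is only available on Windows. Under that assumption, the byte that cp437 uses for "é" is read back as a low quotation mark, which looks like a comma in Explorer.

```python
# "héhé", written with escapes so the source file encoding does not matter.
name = 'h\u00e9h\u00e9'

CONSOLE_CHARSET = 'cp437'       # value reported by sys.stdout.encoding above
FILESYSTEM_CHARSET = 'cp1252'   # assumption: the ANSI code page behind 'mbcs'

# The same name yields different bytes under the two charsets.
as_console_bytes = name.encode(CONSOLE_CHARSET)        # b'h\x82h\x82'
as_filesystem_bytes = name.encode(FILESYSTEM_CHARSET)  # b'h\xe9h\xe9'

# What the filesystem layer sees when handed console-encoded bytes:
# 0x82 is U+201A (single low-9 quotation mark) in cp1252.
misread = as_console_bytes.decode(FILESYSTEM_CHARSET)
```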

in reply to:  7 comment:8 by kondo@…, 17 years ago

Replying to cboos:

Thanks. It is exactly as you say.

The patch should handle not only console output but also file creation.

I fixed my patch again (wiki_dump_encode_fix_4.patch).

In addition, I changed when the encodings are looked up: the new patch gets them once, outside the loop, for efficiency.

by kondo@…, 17 years ago

fix both console output encoding and file system encoding

comment:9 by Christian Boos, 17 years ago

Owner: changed from Christopher Lenz to Christian Boos
Status: new → assigned

Well, I think there's a mix of concerns here… perhaps I introduced the confusion with my comments, sorry for that.

If the idea is simply to get "flat" filenames that can be "wiki dumped" and "wiki loaded", then urllib.quote of any encoding will do. Using a fixed UTF-8 encoding would even be preferable for portability reasons.

If, OTOH, what we would like to achieve is writing filenames that look on the file system like the original page names, then using sys.getfilesystemencoding is fine. But there are additional concerns:

  • we would need to filter any characters that are not allowed in filenames (like \ / : * ? < > | on Windows).
  • we would have to deal with sub-pages, either by finding a way to "encode" the "/" separator in flat filenames or by creating subfolders. The latter would be interesting, but then there's the complication of having to deal with conflicts between filenames and folders (e.g. page A and folder A needed for writing A/B).

So all in all, I'll favor the first approach, which is much easier to deal with. Would something like attachment:utf8_wiki_dump_load.diff work for you?
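The first approach can be sketched as a pair of round-trip helpers. This is not the content of utf8_wiki_dump_load.diff, which is not shown in this ticket; the function names are hypothetical, and the code uses Python 3's `urllib.parse` rather than the Python 2 `urllib.quote` mentioned above. Passing `safe=''` is a choice made here so that "/" is quoted too, which keeps sub-page names flat.

```python
from urllib.parse import quote, unquote


def page_to_filename(page_name):
    """Map a wiki page name to a flat, ASCII-only filename.

    Hypothetical helper: the name is encoded as UTF-8 and percent-quoted.
    safe='' also quotes '/', so sub-pages do not create directories.
    """
    return quote(page_name, safe='', encoding='utf-8')


def filename_to_page(filename):
    """Inverse of page_to_filename: recover the original page name."""
    return unquote(filename, encoding='utf-8')


# A page name with a sub-page separator and non-ASCII characters.
example = 'Wiki/h\u00e9h\u00e9'
flat = page_to_filename(example)
restored = filename_to_page(flat)
```

Because the quoted form contains only ASCII characters, it does not depend on the console charset or on the filesystem encoding of the machine doing the dump or the load.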

by Christian Boos, 17 years ago

Attachment: utf8_wiki_dump_load.diff added

Correctly dump/load unicode WikiPageNames (patch on 0.10.5dev)

in reply to:  9 comment:10 by kondo@…, 17 years ago

Thanks. I also think that the first approach is good.

Your patch (utf8_wiki_dump_load.diff) works correctly in my environment.

At first, I wanted to dump the pages under human-readable names wherever the OS could represent them.

But now I understand that this raises problems that are not easy to solve.

P.S.

I will adopt the following solution in my project.

We keep a naming rule for wiki pages. Under this rule, the characters that can be used in a page name are limited. It is a local rule for my project.

I will apply your patch and write a file name conversion script (from quoted UTF-8 to the OS filesystem encoding).

Then I will apply my script to the dumped files.
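A conversion script of that kind might look roughly like the sketch below. It is not kondo's script, which was not posted; the function name is hypothetical, and it is written for Python 3, where `os.rename` takes text names and applies the filesystem encoding itself. It assumes the dump directory is flat and, in line with the naming rule described above, skips any name that would contain a path separator once unquoted.

```python
import os
from urllib.parse import unquote


def unquote_dumped_names(directory):
    """Rename percent-quoted dump files to their readable names.

    Hypothetical helper. Returns a list of (old_name, new_name) pairs
    for the files that were actually renamed.
    """
    renamed = []
    for quoted in sorted(os.listdir(directory)):
        readable = unquote(quoted, encoding='utf-8')
        if readable == quoted:
            continue  # nothing was quoted in this name
        separators = {'/', os.sep, os.altsep or os.sep}
        if any(sep in readable for sep in separators):
            continue  # sub-page: the name cannot live in a flat directory
        target = os.path.join(directory, readable)
        if os.path.exists(target):
            continue  # never overwrite an existing file
        os.rename(os.path.join(directory, quoted), target)
        renamed.append((quoted, readable))
    return renamed
```

Characters that the target filesystem forbids (such as ":" or "?" on Windows) are not filtered here; the sketch relies on the project's naming rule to keep them out of page names.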

comment:11 by Christian Boos, 17 years ago

Resolution: fixed
Status: assigned → closed

Fixed in r6059 in 0.10-stable and r6060 in trunk.
