Edgewall Software

Ticket #5607 (closed defect: worksforme)

Opened 5 years ago

Last modified 22 months ago

RestructuredText preview doesn't handle utf-8

Reported by: Dave Abrahams <dave@…> Owned by: cboos
Priority: normal Milestone:
Component: wiki system Version: devel
Severity: normal Keywords:
Cc:
Release Notes:
API Changes:

Description

Check a utf-8 ReST document containing Unicode curly quotes into svn, view the document in the browser, and you see garbage characters. Isn't there some way to detect the encoding automatically? Emacs does it most of the time.

Attachments

utf8.rst (14 bytes) - added by Dave Abrahams <dave@…> 3 years ago.
ReST file with utf-8 curly quotes


Change History

comment:1 Changed 3 years ago by rblank

  • Keywords needinfo added

Is this still an issue with the current 0.11.1 version and Pygments? If yes, could you please attach a sample ReST file that shows the problem?

Changed 3 years ago by Dave Abrahams <dave@…>

ReST file with utf-8 curly quotes

comment:2 Changed 3 years ago by Dave Abrahams <dave@…>

  • Resolution set to fixed
  • Status changed from new to closed

Appears to be fixed as the attachment shows.

comment:3 Changed 3 years ago by rblank

  • Keywords needinfo removed

Actually, this is a configuration issue. When no charset information is available to display a text file, Trac uses the [trac] default_charset configuration option to convert the file to utf-8. This site is most probably configured with default_charset=utf-8, hence the attachment is displayed properly. Changing the setting to default_charset=iso-8859-15 (the default) will show the problem you describe.
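To make the effect concrete, here is a minimal sketch of what happens when utf-8 bytes are decoded with the wrong charset. It only illustrates the decoding step described above and is not Trac's actual code; the sample string is an assumption chosen for the demonstration.

```python
# Illustration only: not Trac's implementation, just the decoding effect.
text = "\u201cquoted\u201d"            # a word wrapped in curly double quotes
raw = text.encode("utf-8")             # the bytes as stored in a utf-8 file

# Decoding with the wrong charset never fails, it silently yields garbage:
as_latin9 = raw.decode("iso-8859-15")

# Decoding with the right charset restores the original text:
as_utf8 = raw.decode("utf-8")

print(repr(as_latin9))                 # each quote turns into three characters
print(repr(as_utf8))
```

Because iso-8859-15 assigns a character to every byte value, the wrong decoding raises no error, which is why the problem shows up as garbage characters rather than as a failure.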

The ticket description mentions files checked into SVN, though. If for some reason you can't set default_charset=utf-8 on your site, you can add an svn:mime-type property to your files and specify the charset. For example, a ReST file would have the following MIME type:

text/x-rst;charset=utf-8

This will override the default_charset setting.
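The precedence described here can be sketched as follows. This is a hypothetical helper written for illustration, not a Trac function: the name charset_for and its parameters are assumptions, and the fallback value simply mirrors the default mentioned above.

```python
def charset_for(mime_type, default_charset="iso-8859-15"):
    """Return the charset named in a MIME type string, else the default.

    Hypothetical helper for illustration; not part of Trac's API.
    """
    if mime_type:
        # Parameters follow the media type, separated by ';'.
        for param in mime_type.split(";")[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "charset" and value.strip():
                return value.strip().strip('"')
    # No charset information available: fall back to the configured default.
    return default_charset


print(charset_for("text/x-rst;charset=utf-8"))
print(charset_for("text/x-rst"))
```

An explicit charset parameter wins; a file without one is decoded with whatever default is configured.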

There is currently no way of doing the same for attachments, although it has been requested in #7724.

comment:4 follow-up: ↓ 5 Changed 3 years ago by Dave Abrahams <dave@…>

Awesome; that worked! Thanks for the explanation.

Two follow-up questions:

  1. Would utf-8 be a superior default?
  2. Is this information documented somewhere?

comment:5 in reply to: ↑ 4 Changed 3 years ago by rblank

Replying to Dave Abrahams <dave@…>:

  1. Would utf-8 be a superior default?

It depends on what encoding most of your files use. Matching the default to that encoding leaves you fewer files to "tag" with an svn:mime-type property.

Personally, I don't understand why everybody isn't using utf-8 already. I can't see a downside.

  2. Is this information documented somewhere?

default_charset is documented in TracIni. The svn:mime-type with charset was discussed on the SVN developer mailing list some time ago, but I couldn't find any mention of it in the documentation.

Be warned: you won't be able to set the charset in the [auto-props] section of your SVN configuration, as ';' is used to separate properties in that file (see this post). You'll have to set the property manually with svn pset. One more reason to set a sensible default_charset.
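A rough sketch of why the ';' gets in the way, and of the manual alternative. The split function is a naive stand-in for illustration, not SVN's actual parser, and both function names and the file name doc.rst are assumptions. The command is only built as an argument list, not executed.

```python
def split_auto_props(value):
    """Naively split an auto-props value on ';' (illustration only)."""
    return [part.strip() for part in value.split(";") if part.strip()]


def propset_command(path, mime_type="text/x-rst;charset=utf-8"):
    """Build the argument list for setting svn:mime-type by hand."""
    return ["svn", "propset", "svn:mime-type", mime_type, path]


# The charset parameter is cut off and read as a separate property:
print(split_auto_props("svn:mime-type=text/x-rst;charset=utf-8"))

# Setting the property manually keeps the value in one piece:
print(propset_command("doc.rst"))
```

Passing the argument list to something like subprocess.run avoids shell quoting issues with the ';' in the value.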

I'll add a section to TracBrowser about svn:mime-type.

comment:6 Changed 22 months ago by cboos

  • Resolution changed from fixed to worksforme