Trac cannot parse win1252 source files

The Trac parser is unable to read windows-1252 files in my Trac installation.

When viewing a source file in the changeset view or the browser, each character above 127 and the 2 characters following it are replaced by 3 garbage bytes, which are rendered in UTF-8 as the character "?".

You can see an example in this file: https://code.genezys.net/trac/ChuDrive/browser/SDK/ChUI/Scrollbar.cpp?rev=90#L48

This line should be:

// On n'aura plus besoin de ce mécanisme quand les enfants seront rendu en coordonnées relatives

but when Firefox uses UTF-8, as declared by the Trac website, it is displayed as:

// On n'aura plus besoin de ce m?nisme quand les enfants seront rendu en coordonn? relatives

or, when Firefox is manually set to use windows-1252 (which is actually wrong), as:

// On n'aura plus besoin de ce m�nisme quand les enfants seront rendu en coordonn� relatives

This file is known to be in windows-1252 encoding.
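
The symptom is consistent with windows-1252 bytes being decoded as UTF-8. The following is a minimal Python 3 sketch of that assumption, not Trac's code, and the variable names are only illustrative. In windows-1252, "é" is the single byte 0xE9, which has the bit pattern of a UTF-8 lead byte announcing a three-byte sequence, so a lenient decoder may swallow the two bytes that follow it.

```python
# "mécanisme", written with an escape so the example does not depend on
# the encoding of this snippet itself
line = "m\u00e9canisme"
raw = line.encode("cp1252")        # bytes as they would be stored in the file

# 0xE9 == 0b11101001 matches the UTF-8 lead-byte pattern 1110xxxx,
# which announces a 3-byte sequence
lead = raw[1]
is_three_byte_lead = (lead & 0xF0) == 0xE0

# Strict UTF-8 decoding rejects the data, because "c" and "a" are not
# valid continuation bytes
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the correct charset restores the original text
restored = raw.decode("cp1252")
```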

I have version 0.10rc1 manually installed on Debian sarge, using Python 2.3.

comment:1 by trac.edgewall@…, 16 years ago

Milestone: 0.10

I am not a spam… Should I really have to enter all this text in order to be considered a real human being ?

Anyway, setting the milestone to 0.10 as it should be the milestone adding perfect Unicode and encoding support ?

comment:2 by Christian Boos, 16 years ago

Keywords: mimeview charset added
Owner: changed from Jonas Borgström to Christian Boos

Well, it's a bit late for scheduling tickets for 0.10, as the release happened a few hours ago :)

Anyway, Trac needs a hint in this case. You currently have to set the svn:mime-type property for that, e.g.:

$ svn pset svn:mime-type "text/x-c++src; charset=cp1252" Scrollbar.cpp
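
For illustration, a property value like the one in this command can be split into a MIME type and a charset roughly as follows. This is a minimal sketch, not Trac's actual implementation; `parse_mime_type` is a hypothetical helper name.

```python
def parse_mime_type(value):
    """Split 'type/subtype; charset=name' into (mimetype, charset).

    Hypothetical helper: charset is None when no charset parameter is given.
    """
    parts = [p.strip() for p in value.split(";")]
    mimetype = parts[0].lower()
    charset = None
    for param in parts[1:]:
        name, sep, val = param.partition("=")
        if sep and name.strip().lower() == "charset":
            # tolerate optional double quotes around the charset name
            charset = val.strip().strip('"')
    return mimetype, charset


mimetype, charset = parse_mime_type("text/x-c++src; charset=cp1252")
```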

In future releases, Trac will also be able to detect an Emacs-style // -*- coding: cp1252 -*- hint in the first line (#3689).
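
As a rough idea of what such a detection could look like (a sketch only, not the code planned for #3689; `detect_coding_hint` and the regular expression are assumptions made for this example):

```python
import codecs
import re

# Matches an Emacs-style file variable such as "-*- coding: cp1252 -*-"
CODING_HINT = re.compile(r"-\*-.*?coding:\s*([-\w.]+).*?-\*-")


def detect_coding_hint(first_line):
    """Return the charset named in the first line, or None if absent/unknown."""
    match = CODING_HINT.search(first_line)
    if not match:
        return None
    name = match.group(1)
    try:
        codecs.lookup(name)      # only accept charsets Python can decode
    except LookupError:
        return None
    return name


hint = detect_coding_hint("// -*- coding: cp1252 -*-")
```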

Another possibility would be to specify a default charset to use for all text files in a repository. What do you think?

comment:3 by Vincent Robert <trac.edgewall@…>, 16 years ago

Cc: Vincent Robert <trac.edgewall@…> added

Thank you for your reply!

I tried the property on a win1252 file and Trac rendered it correctly. Thanks for the solution.

Unfortunately, it is not an easy solution to use because of the large number of files in the repository. I tried to set the property on a directory in order to apply it to the whole directory, but Subversion does not support the "svn:mime-type" property on directories.
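
One possible workaround for the number of files is to script the per-file command. The sketch below assumes a checked-out working copy, C++ sources ending in ".cpp", and the `svn` client on the PATH; the function names are hypothetical. It only prints the commands unless `dry_run` is set to False.

```python
import os
import subprocess


def build_propset_commands(root, extensions=(".cpp",),
                           mime_type="text/x-c++src; charset=cp1252"):
    """Build one 'svn propset' command per matching file under root."""
    commands = []
    for dirpath, dirnames, filenames in os.walk(root):
        # do not descend into Subversion administrative directories
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in sorted(filenames):
            if name.lower().endswith(tuple(extensions)):
                path = os.path.join(dirpath, name)
                commands.append(
                    ["svn", "propset", "svn:mime-type", mime_type, path])
    return commands


def apply_commands(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

The property changes would still have to be committed afterwards, as with any other modification of the working copy.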

You mention specifying a default charset for the repository. That would be a nice idea, as all the source files are in win1252. We could then use the "svn:mime-type" property for the specific files not using win1252 (none yet, if I remember correctly).
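
To make the idea concrete, the lookup order could be sketched like this. It is only an illustration of the proposal: `repository_default` is a hypothetical setting that does not exist in Trac, and the function names are made up for the example.

```python
def resolve_charset(file_charset=None, repository_default=None,
                    global_default="utf-8"):
    """Pick the most specific charset that is set.

    Order: per-file hint (e.g. from svn:mime-type), then the hypothetical
    repository-wide default, then the global fallback.
    """
    for candidate in (file_charset, repository_default, global_default):
        if candidate:
            return candidate
    return "utf-8"


def decode_content(raw, **charsets):
    """Decode raw bytes using the resolved charset, replacing bad bytes."""
    return raw.decode(resolve_charset(**charsets), errors="replace")


text = decode_content(b"coordonn\xe9es", repository_default="cp1252")
```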

How could this be done? Is it a property to add to the repository, or a setting that could be added to trac.ini?

I searched the TracIni page but could not find such an option. If it does not exist yet and you plan to add it, may I suggest naming it [mimeviewer].default_encoding

comment:4 by anonymous, 16 years ago

Cc: trac.edgewall@… added; Vincent Robert <trac.edgewall@…> removed

I am not a spam…

comment:5 by Christian Boos, 16 years ago

Milestone: 0.11

[mimeviewer].default_encoding is already used as a general fallback charset, which is not necessarily the same as the one used within a repository.

comment:6 by Christian Boos, 15 years ago


#5824 was closed as duplicate.

comment:7 by Christian Boos, 14 years ago

Milestone: 0.13
Resolution: worksforme
Status: new → closed


Correction to comment:5: the general fallback charset option is actually default_charset, in the [trac] section.

For files in the repository, we already have an API for retrieving the mime-type and the charset of a given file (Node.get_content_type). In the case of Subversion, this relies on the svn:mime-type property (comment:2). For Mercurial, this is yet to be done, see #7160.

For attachments, see #7724.

