Edgewall Software

Ticket #7160 (new defect)

Opened 2 years ago

Last modified 4 months ago

Problems with character encoding

Reported by: Martin <martin@…>
Owned by: cboos
Priority: high
Milestone: plugin - mercurial
Component: plugin/mercurial
Version: 0.11.5
Severity: major
Keywords: unicode utf-8 mercurial hg
Cc: alvaro.justen@…

Description

I have a mercurial repo and I'm using trac to manage the project.

The problem is that, as Spanish is my first language, we write a lot of accented vowels and other characters that are not in the ASCII charset. So we use UTF-8 (all the files are in UTF-8 format).

Trac works great, and so does the repository browser, except that it doesn't display the extended characters as it should. For example, my name, taken from the author information in hg, looks like: Mart?n Marqu?s.

The contents of the repo also have the same problem. For example:

Creacón -> Creación
catálogo -> catálogo

Looking at the hg repo, everything is OK, but trac doesn't render it correctly.

If you need further information, please ask.


Change History

comment:1 Changed 23 months ago by rblank

  • Milestone set to not applicable

Could you please test with 0.11.1 and the latest mercurial-plugin?

comment:2 Changed 23 months ago by cboos

  • Keywords unicode added

Also, how did you configure your [trac] default_charset TracIni entry?

In any case, as Remy said, you're strongly advised to upgrade to Trac 0.11 and the corresponding Mercurial plugin, because the 0.10.x version of the plugin (and Trac 0.10.x as well) is in "low maintenance mode".

comment:3 Changed 23 months ago by cboos

See also #3809.

comment:4 Changed 23 months ago by anonymous

  • Version changed from 0.10.4 to 0.11.1

Upgraded quite a few months ago (on Debian testing):

trac                                 0.11.1-2 
trac-mercurial                       0.11.0.5dev~svnr7354-2 

Same problem displaying the accented vowels.

I also changed default_charset to UTF-8 in trac.ini, with no luck (and restarted Apache just in case).

comment:5 Changed 19 months ago by cboos

  • Priority changed from normal to high
  • Severity changed from minor to major

We need to better handle arbitrary character sets in TracMercurial, at different levels:

The common point for all cases is that Mercurial itself doesn't care about encodings; by design, it simply stores the bytes as they come in. So the sensible thing to do here is:

  • make it possible to configure which encoding must be used in which situation (content, filenames, metadata), all falling back on default_charset. One concrete example would be a repository created on Windows, with filenames encoded in whatever the current codepage was and UTF-8 content.
  • use robust conversion, as nothing guarantees that the data in the Mercurial repository will always be consistent w.r.t. the chosen encoding.
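The two points above could look roughly like this in Python. This is a minimal sketch, not TracMercurial's code: the per-kind settings ("content", "path", "meta"), the helper names pick_charset and decode_hg_bytes, and the fallback order are all hypothetical. Each kind of data gets its own charset that falls back to default_charset, and decoding never raises: it tries the chosen charset, then UTF-8, and finally decodes as UTF-8 with replacement characters.

```python
from typing import Mapping, Optional

# Hypothetical kinds of repository data; each may have its own charset.
KINDS = ("content", "path", "meta")


def pick_charset(kind: str, settings: Mapping[str, Optional[str]],
                 default_charset: str) -> str:
    """Return the charset configured for `kind`, else default_charset."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return settings.get(kind) or default_charset


def decode_hg_bytes(data: bytes, kind: str,
                    settings: Mapping[str, Optional[str]],
                    default_charset: str = "utf-8") -> str:
    """Decode raw bytes from the repository without ever raising.

    Mercurial stores bytes as-is, so they may not match the chosen
    charset: try it strictly, then UTF-8, then replace bad bytes.
    """
    charset = pick_charset(kind, settings, default_charset)
    for candidate in (charset, "utf-8"):
        try:
            return data.decode(candidate)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next candidate.
            continue
    # Last resort: undecodable bytes become U+FFFD instead of an error.
    return data.decode("utf-8", errors="replace")
```

For the Windows example, settings = {"path": "cp1252"} would decode filenames with the codepage while content still falls back to UTF-8; bytes that fit neither show up as U+FFFD rather than breaking the page.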

comment:6 Changed 19 months ago by anonymous

You should set (or add) the

HGENCODING=utf-8

environment variable, then restart the Trac daemon or web server. Another way, if you are running Trac through a web server such as lighttpd, is to define the variable in the bin-environment setting. This example is from my lighttpd server configuration file:

....
 "bin-environment" =>
     ("TRAC_ENV_PARENT_DIR" => "/var/lib/trac/" ,
     "LC_TIME" => "bg_BG.UTF-8",
     "PYTHON_EGG_CACHE" => "/tmp/.python_eggs",
     "HGENCODING" => "utf-8")
....
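When the web server does not forward its environment into the Python process (comment:12 reports this for mod_python), the variable can instead be set in the Python entry script, before Trac loads the Mercurial plugin. A minimal sketch for a mod_wsgi-style trac.wsgi file, assuming Trac's trac.web.main.dispatch_request WSGI callable; the /var/lib/trac/ path is copied from the lighttpd example and must be adapted. The Trac import is deferred into the function so the variables are guaranteed to be set first.

```python
import os

# Set before Trac (and therefore Mercurial) is imported. Assumption:
# Mercurial reads HGENCODING only once, when it is first loaded.
# setdefault keeps any value the server already provided.
os.environ.setdefault("HGENCODING", "utf-8")
# Path taken from the lighttpd example; adjust to your installation.
os.environ.setdefault("TRAC_ENV_PARENT_DIR", "/var/lib/trac/")


def application(environ, start_response):
    """WSGI entry point that hands each request to Trac."""
    # Deferred import: os.environ is already set by the time this runs.
    from trac.web.main import dispatch_request
    return dispatch_request(environ, start_response)
```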

comment:7 Changed 19 months ago by martin@…

OK, but should I set HGENCODING in /etc/profile?

I can't test it at the moment. For some reason the Debian trac-mercurial plugin seems to have a bug in the repository browser. I think I'll have to create a new ticket. :-(

comment:8 Changed 19 months ago by anonymous

I can't speak for Debian, but on Gentoo, Mercurial's environment settings are in /etc/env.d/80mercurial. The file looks like this:

HG=/usr/bin/hg
HGENCODING=utf-8

comment:9 Changed 17 months ago by martin@…

Setting HGENCODING in Apache, /etc/profile, etc. doesn't help. I keep seeing my name (Martín Marqués) with a ? in place of each accented vowel.

comment:10 Changed 15 months ago by IanMLewis@…

Did you try changing the default_charset in the trac.ini to utf-8?

comment:11 Changed 15 months ago by anonymous

comment:5 had the answers. I had to hack the code, though.

comment:12 Changed 14 months ago by anonymous

I solved the problem by editing backend.py in trac-mercurial, adding os.environ["HGENCODING"] = "UTF-8" there. With mod_python under Apache, environment variables are not passed on to the Python process, so SetEnv HGENCODING UTF-8 doesn't work.

comment:13 Changed 14 months ago by anonymous

os.environ["HGENCODING"] = "UTF-8"

comment:14 Changed 13 months ago by Álvaro Justen <alvaro.justen@…>

  • Cc alvaro.justen@… added
  • Keywords utf-8 mercurial hg added
  • Version changed from 0.11.1 to 0.11.5

I just added:

os.environ["HGENCODING"] = "UTF-8"

to mercurial-plugin-0.11/tracext/hg/backend.py and it worked!

In my case everything is UTF-8, but this won't work for people who don't use UTF-8. Is there a way to get Mercurial's charset? (I don't know about using Mercurial from Python programs.)
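As a partial answer to the question above: Mercurial takes its charset from HGENCODING and otherwise from the locale, and releases that have a mercurial.encoding module expose the result as mercurial.encoding.encoding. The sketch below approximates that lookup without importing Mercurial, so a plugin could decode with the charset Mercurial would likely use. The function name guess_hg_encoding is hypothetical, the fallback order is an approximation rather than Mercurial's exact code, and returning ASCII for unknown charset names is a choice of this sketch.

```python
import codecs
import locale
import os
from typing import Mapping, Optional


def guess_hg_encoding(environ: Optional[Mapping[str, str]] = None) -> str:
    """Approximate the charset Mercurial would choose.

    Order (an approximation): HGENCODING, then the locale's preferred
    encoding, then ASCII.
    """
    env = os.environ if environ is None else environ
    candidate = (env.get("HGENCODING")
                 or locale.getpreferredencoding(False)
                 or "ascii")
    try:
        # Normalise the name, e.g. "UTF8" -> "utf-8".
        return codecs.lookup(candidate).name
    except LookupError:
        # Unknown charset name: fall back to ASCII (sketch's choice).
        return "ascii"
```

With HGENCODING=UTF8 this returns "utf-8"; without HGENCODING it follows the locale of the web server process, which is often not the locale of your login shell.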

comment:15 follow-up: ↓ 16 Changed 9 months ago by cboos

  • Milestone changed from not applicable to mercurial-plugin

comment:16 in reply to: ↑ 15 Changed 4 months ago by anonymous

Replying to cboos:

I also tried adding os.environ["HGENCODING"] = "utf-8" right after import os in backend.py, but with no success.

This is with Trac 0.11.7. I still see UTF-8 strings coming from Mercurial being interpreted as ISO-8859-1.
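One possible explanation, not confirmed in this ticket, is that Mercurial had already been imported (for example by another plugin) before backend.py ran, so changing os.environ came too late. The sketch below covers both cases. It assumes the module-level attribute mercurial.encoding.encoding holds the active charset, as in releases that have a mercurial.encoding module; newer Python 3 releases may store it as bytes, so adjust the type there. The function name force_hg_encoding is hypothetical.

```python
import os
import sys


def force_hg_encoding(charset: str = "UTF-8") -> bool:
    """Make Mercurial use `charset`, whether or not it is already loaded.

    Returns True if an already-imported mercurial.encoding was patched.
    """
    # Covers a Mercurial import that happens after this call.
    os.environ["HGENCODING"] = charset
    # Covers an earlier import: the environment was already read, so patch
    # the module attribute directly (assumed name; see lead-in).
    enc_mod = sys.modules.get("mercurial.encoding")
    if enc_mod is None:
        return False
    enc_mod.encoding = charset
    return True
```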
