Ticket #7160 (new defect)
Problems with character encoding
| Reported by: | Martin <martin@…> | Owned by: | cboos |
|---|---|---|---|
| Priority: | high | Milestone: | plugin - mercurial |
| Component: | plugin/mercurial | Version: | 0.11.5 |
| Severity: | major | Keywords: | unicode utf-8 mercurial hg |
| Cc: | alvaro.justen@… |
Description
I have a mercurial repo and I'm using trac to manage the project.
The problem is that, as Spanish is my first language, we write a lot of accented vowels and other characters that are not in the ASCII charset. So we use UTF-8 (all the files are in UTF-8 format).
The thing is that trac works great, but when browsing the repo, even though the browser itself works great, it doesn't print the extended characters as it should. For example, my name taken from the author information from hg would look like: Mart?n Marqu?s.
The contents of the repo also have the same problem. For example:
Creacón -> Creación
catálogo -> catálogo
Looking at the hg repo, everything is OK, but trac doesn't render it correctly.
If you need further information, please ask.
Attachments
Change History
comment:2 Changed 23 months ago by cboos
- Keywords unicode added
Also, how did you configure your [trac] default_charset TracIni entry?
In any case, as Remy said, you're strongly advised to upgrade to 0.11 and the corresponding Mercurial plugin, because the 0.10.x version (and Trac 0.10.x as well) is in "low maintenance mode".
comment:4 Changed 23 months ago by anonymous
- Version changed from 0.10.4 to 0.11.1
Upgraded quite a few months ago (on Debian testing):
trac 0.11.1-2
trac-mercurial 0.11.0.5dev~svnr7354-2
Same problem displaying the accented vowels.
Also changed default_charset to UTF-8 in trac.ini, with no luck (restarted apache just in case).
comment:5 Changed 19 months ago by cboos
- Priority changed from normal to high
- Severity changed from minor to major
We need to better handle arbitrary character sets in TracMercurial, at different levels:
The common point for all cases is that Mercurial by itself doesn't care about encodings, it simply stores the bytes as they come in (by design). So the sensible thing to do here is:
- make it possible to configure which encoding must be used in which situation (content, filename, meta-data), all falling back on default_charset. One concrete example would be a repository created on Windows, with the filenames encoded using whatever the current codepage is and the content in utf-8.
- we must use robust conversion, as nothing guarantees that the data in the Mercurial repository will be always consistent w.r.t the chosen encoding.
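The two points above could be sketched roughly as follows. This is only an illustration in Python 3, not TracMercurial code: the `ENCODINGS` mapping, the kind names and `decode_robust` are hypothetical, and `DEFAULT_CHARSET` merely stands in for the `[trac] default_charset` setting.

```python
# Hypothetical per-kind settings (not real TracMercurial options).
ENCODINGS = {"filename": "cp1252", "content": "utf-8"}
DEFAULT_CHARSET = "utf-8"  # stands in for [trac] default_charset


def decode_robust(raw, kind, encodings=ENCODINGS, default_charset=DEFAULT_CHARSET):
    """Decode bytes coming from the repository without ever raising.

    Tries the encoding configured for `kind` first, then the default
    charset, and finally replaces undecodable bytes with U+FFFD.
    """
    candidates = []
    if kind in encodings:
        candidates.append(encodings[kind])
    candidates.append(default_charset)
    for enc in candidates:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # Last resort: the data is inconsistent with every configured encoding.
    return raw.decode("utf-8", errors="replace")
```

With these settings, a Windows filename such as `b"cat\xe1logo.txt"` decodes through cp1252, while metadata that is not valid in any configured encoding still yields a string, with the bad bytes marked instead of an exception.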
comment:6 Changed 19 months ago by anonymous
You should set / add
HGENCODING=utf-8
environment variable, then restart the trac daemon or web server. Another possible way (if you are using a web server) is to define it through the bin-environment setting. This example is from my lighttpd server configuration file:
....
"bin-environment" =>
("TRAC_ENV_PARENT_DIR" => "/var/lib/trac/" ,
"LC_TIME" => "bg_BG.UTF-8",
"PYTHON_EGG_CACHE" => "/tmp/.python_eggs",
"HGENCODING" => "utf-8")
....
comment:7 Changed 19 months ago by martin@…
OK, but should I set HGENCODING in /etc/profile?
Can't test it at the moment. For some reason the Debian trac-mercurial plugin seems to have a bug in the browse section. Think I'll have to create a new ticket. :-(
comment:8 Changed 19 months ago by anonymous
I can't tell about Debian, but in Gentoo mercurial has its environment settings located in /etc/env.d/80mercurial. This file looks like this:
HG=/usr/bin/hg
HGENCODING=utf-8
comment:9 Changed 17 months ago by martin@…
Setting HGENCODING in apache, /etc/profile, etc. doesn't help. I keep seeing my name (Martín Marqués) with ? in place of the accented vowels.
comment:10 Changed 15 months ago by IanMLewis@…
Did you try changing the default_charset in the trac.ini to utf-8?
comment:11 Changed 15 months ago by anonymous
comment:5 had the answers. I had to hack the code though.
comment:12 Changed 14 months ago by anonymous
I solved the problem by editing backend.py in trac-mercurial. I've added os.environ["HGENCODING"] = "UTF-8" there. When using mod_python with apache, environment variables are not passed through, so SetEnv HGENCODING UTF-8 doesn't work.
comment:13 Changed 14 months ago by anonymous
os.environ["HGENCODING"] = "UTF-8"
comment:14 Changed 13 months ago by Álvaro Justen <alvaro.justen@…>
- Cc alvaro.justen@… added
- Keywords utf-8 mercurial hg added
- Version changed from 0.11.1 to 0.11.5
I just added:
os.environ["HGENCODING"] = "UTF-8"
to mercurial-plugin-0.11/tracext/hg/backend.py and it worked!
In my case everything I have is in UTF-8. But for people who don't use UTF-8 it won't work. Is there a way to get the Hg charset? (I don't know about using Hg in Python programs.)
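For setups that are not all UTF-8, one possible variation of this workaround is to set the variable only when nothing else has set it, so that a deployment can still choose another charset. A minimal sketch follows; the helper name `ensure_hgencoding` is hypothetical, and it assumes the call happens before the Mercurial modules are first imported, since Mercurial appears to read HGENCODING when its encoding module is loaded.

```python
import os


def ensure_hgencoding(fallback="UTF-8", environ=os.environ):
    """Set HGENCODING only if the environment has not already chosen one.

    Returns the value that is in effect afterwards.
    Assumption: this is called before any `mercurial` import.
    """
    return environ.setdefault("HGENCODING", fallback)


# Would go near the top of backend.py, before the mercurial imports.
ensure_hgencoding()
```

An existing HGENCODING value (for example one exported by the web server's environment) is left untouched; only an unset variable falls back to UTF-8.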
comment:15 follow-up: ↓ 16 Changed 9 months ago by cboos
- Milestone changed from not applicable to mercurial-plugin
comment:16 in reply to: ↑ 15 Changed 4 months ago by anonymous
Replying to cboos:
I also tried adding os.environ["HGENCODING"] = "utf-8" just next to import os in backend.py, but with no success.
This is with Trac 0.11.7. I still see UTF-8 strings coming from mercurial being interpreted as ISO_8859-1.
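To tell the two symptoms reported in this ticket apart, here is a small Python 3 sketch. It is independent of Trac and Mercurial, and the function name `symptoms` is just for illustration. It shows what a UTF-8 string looks like when its bytes are read as ISO-8859-1, and what it looks like when it is forced into ASCII.

```python
def symptoms(text):
    """Return how `text` looks under two common encoding mistakes."""
    raw = text.encode("utf-8")
    return {
        # UTF-8 bytes wrongly decoded as ISO-8859-1: each accented
        # character turns into two characters (mojibake).
        "utf8_read_as_latin1": raw.decode("iso-8859-1"),
        # Text forced into ASCII: each accented character becomes "?".
        "forced_to_ascii": text.encode("ascii", errors="replace").decode("ascii"),
    }


result = symptoms("Mart\u00edn Marqu\u00e9s")
print(repr(result["utf8_read_as_latin1"]))
print(repr(result["forced_to_ascii"]))
```

The "?" form matches the original description, while doubled characters such as "Ã©" point to UTF-8 bytes being decoded as ISO-8859-1, as described in this comment.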



Could you please test with 0.11.1 and the latest mercurial-plugin?