Opened 17 years ago
Closed 14 years ago
#7160 closed defect (fixed)
Problems with character encoding
Reported by: | Owned by: | Christian Boos | |
---|---|---|---|
Priority: | high | Milestone: | plugin - mercurial |
Component: | plugin/mercurial | Version: | 0.11.5 |
Severity: | major | Keywords: | unicode utf-8 mercurial hg |
Cc: | alvaro.justen@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
I have a mercurial repo and I'm using trac to manage the project.
The problem is that, as Spanish is my first language, we write a lot of accented vowels and other characters that are not in the ASCII charset. So we use UTF-8 (all the files are in UTF-8 format).
The thing is that trac works great, but when browsing the repo, even though the browser itself works, it doesn't print the extended characters as it should. For example, my name, taken from the author information in hg, looks like: Mart?n Marqu?s.
The contents of the repo also have the same problem. For example:
Creacón → Creación catálogo → catálogo
Looking at the hg repo, everything is OK, but trac doesn't render it correctly.
If you need further information, please ask.
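One way a "?" like the ones above can appear is when text is pushed through an ASCII conversion that replaces anything it cannot represent. This is only a sketch of that mechanism in Python 3 (an assumption; the plugin of that era ran on Python 2), not a diagnosis of what Trac actually did here:

```python
# Hypothetical illustration: a lossy conversion to ASCII.
name = "Mart\u00edn Marqu\u00e9s"  # "Martín Marqués"

# errors="replace" substitutes "?" for every character ASCII cannot encode.
as_ascii = name.encode("ascii", errors="replace").decode("ascii")

print(as_ascii)
```

Once the "?" has been substituted the original character is gone, so the fix has to happen before this conversion, not after it.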
Change History (20)
comment:1 by , 16 years ago
Milestone: | → not applicable |
---|
comment:2 by , 16 years ago
Keywords: | unicode added |
---|
Also, how did you configure your [trac] default_charset TracIni entry?
In any case, as Remy said, you're strongly advised to upgrade to 0.11 and the corresponding Mercurial plugin, because the 0.10.x version (and Trac 0.10.x as well) is in "low maintenance mode".
comment:4 by , 16 years ago
Version: | 0.10.4 → 0.11.1 |
---|
Upgraded quite a few months ago (on Debian testing):
trac 0.11.1-2
trac-mercurial 0.11.0.5dev~svnr7354-2
Same problem displaying the accented vowels.
Also changed default_charset to UTF-8 in trac.ini, with no luck (restarted apache just in case).
follow-up: 20 comment:5 by , 16 years ago
Priority: | normal → high |
---|---|
Severity: | minor → major |
We need to better handle arbitrary character sets in TracMercurial, at different levels:
- content encoding (this ticket)
- filename encoding (#7799, #8018, #8538)
- meta-data encoding (#7694, #7217)
The common point for all cases is that Mercurial by itself doesn't care about encodings; it simply stores the bytes as they come in (by design). So the sensible thing to do here is:
- make it possible to configure which encoding must be used in which situation (content, filename, meta-data), all falling back on default_charset. One concrete example would be a repository created on Windows, with the filenames encoded using whatever the current codepage is, and UTF-8 content.
- we must use robust conversion, as nothing guarantees that the data in the Mercurial repository will be always consistent w.r.t the chosen encoding.
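The two points above can be sketched roughly as follows. This is a minimal illustration in Python 3, not the plugin's code: the `settings` dictionary, the kind names and both helper functions are hypothetical.

```python
# Hypothetical helpers; the names below are assumptions, not plugin API.

def encoding_for(kind, settings, default_charset="utf-8"):
    """Return the encoding configured for `kind` ("content", "filename",
    "metadata"), falling back on default_charset when none is set."""
    return settings.get(kind) or default_charset


def robust_decode(data, encoding):
    """Decode bytes without ever raising on malformed input: bytes that do
    not fit the encoding become U+FFFD instead of causing an error."""
    return data.decode(encoding, errors="replace")


# Example: a repository created on Windows, cp1252 filenames, UTF-8 content.
settings = {"filename": "cp1252"}
filename = robust_decode(b"cat\xe1logo.txt", encoding_for("filename", settings))
content = robust_decode(b"Creaci\xc3\xb3n", encoding_for("content", settings))
broken = robust_decode(b"caf\xe9", encoding_for("content", settings))
```

With `errors="replace"` the inconsistent input in `broken` degrades to a replacement character rather than an internal server error.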
comment:6 by , 16 years ago
You should set / add
HGENCODING=utf-8
environment variable, then restart the trac daemon or web server. Another possible way, if you are using a web server, is to define it through the server's bin-environment setting. This example is from my lighttpd server configuration file:
....
"bin-environment" => (
    "TRAC_ENV_PARENT_DIR" => "/var/lib/trac/",
    "LC_TIME" => "bg_BG.UTF-8",
    "PYTHON_EGG_CACHE" => "/tmp/.python_eggs",
    "HGENCODING" => "utf-8"
)
....
comment:7 by , 16 years ago
OK, but should I set HGENCODING in /etc/profile?
Can't test it at the moment. For some reason the Debian trac-mercurial plugin seems to have a bug in the browse section. Think I'll have to create a new ticket. :-(
comment:8 by , 16 years ago
I can't tell about Debian, but in Gentoo, Mercurial has its environment settings in /etc/env.d/80mercurial. The file looks like this:
HG=/usr/bin/hg
HGENCODING=utf-8
comment:9 by , 16 years ago
Setting HGENCODING in apache, /etc/profile, etc. doesn't help. I keep seeing my name (Martín Marqués) with ? in place of the accented vowels.
comment:12 by , 16 years ago
I solved the problem by editing backend.py in trac-mercurial, adding os.environ["HGENCODING"] = "UTF-8" there. With mod_python, Apache doesn't pass environment variables on to the Python process, so SetEnv HGENCODING UTF-8 doesn't work.
comment:14 by , 15 years ago
Cc: | added |
---|---|
Keywords: | utf-8 mercurial hg added |
Version: | 0.11.1 → 0.11.5 |
I just added:
os.environ["HGENCODING"] = "UTF-8"
to mercurial-plugin-0.11/tracext/hg/backend.py
and it worked!
In my case everything is UTF-8, but this won't work for people who don't use UTF-8. Is there a way to get the Hg charset? (I don't know much about using Hg from Python programs.)
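A slightly gentler variant of this workaround would set the variable only when nothing else has set it, so an administrator's choice is not overridden. The sketch below uses only the standard library; `ensure_hgencoding` is a hypothetical helper name. As far as I know Mercurial picks up HGENCODING when its encoding module is first imported, so treat the ordering note in the comment as an assumption to verify against your Mercurial version.

```python
import os


def ensure_hgencoding(default="utf-8"):
    """Set HGENCODING only if the process environment does not already
    define it, and return the value that is in effect."""
    return os.environ.setdefault("HGENCODING", default)


# Assumption: this must run before any mercurial module is imported,
# because the value is presumably read once at import time.
effective = ensure_hgencoding()
```

This still hard-codes a default, so it does not answer the question of how to discover the repository's real charset.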
follow-up: 16 comment:15 by , 15 years ago
Milestone: | not applicable → mercurial-plugin |
---|
comment:16 by , 15 years ago
Replying to cboos:
I also tried adding os.environ["HGENCODING"] = "utf-8" just next to import os in backend.py, but with no success.
This is with Trac 0.11.7. I still see UTF-8 strings coming from mercurial being interpreted as ISO_8859-1.
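For reference, this is what UTF-8 bytes look like when they are decoded as ISO-8859-1, shown as a small Python 3 sketch (an assumption; the code discussed here was Python 2). It only illustrates the symptom and says nothing about where in Trac the wrong decoding happens.

```python
# UTF-8 bytes for "Martín": the í is stored as the two bytes C3 AD.
raw = "Mart\u00edn".encode("utf-8")

# Decoding those bytes as ISO-8859-1 never fails, because every byte maps
# to some character, but each UTF-8 byte becomes its own character.
misread = raw.decode("iso-8859-1")

# Decoding with the right charset gives the name back.
correct = raw.decode("utf-8")
```

Here `misread` is one character longer than `correct`: the í turns into Ã followed by an invisible soft hyphen (U+00AD).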
follow-up: 18 comment:17 by , 14 years ago
I'm getting this error when browsing the source, I don't know if this is related to this ticket:
12:10:59 PM Trac[main] ERROR: Internal Server Error:
Traceback (most recent call last):
  File "/home/ismael/trac-trunk/trac/web/main.py", line 513, in _dispatch_request
    dispatcher.dispatch(req)
  File "/home/ismael/trac-trunk/trac/web/main.py", line 235, in dispatch
    resp = chosen_handler.process_request(req)
  File "/home/ismael/trac-trunk/trac/versioncontrol/web_ui/browser.py", line 370, in process_request
    node = get_existing_node(req, repos, path, rev_or_latest)
  File "/home/ismael/trac-trunk/trac/versioncontrol/web_ui/util.py", line 61, in get_existing_node
    return repos.get_node(path, rev)
  File "/home/ismael/mercu/mercurial-plugin/tracext/hg/backend.py", line 557, in get_node
    self.hg_node(rev))
  File "/home/ismael/mercu/mercurial-plugin/tracext/hg/backend.py", line 682, in __init__
    self._init_path(log, path.encode('utf-8'))
  File "/home/ismael/mercu/mercurial-plugin/tracext/hg/backend.py", line 731, in _init_path
    dirnodes = self.findnode(log.rev(self.n), [dir,])
  File "/home/ismael/mercu/mercurial-plugin/tracext/hg/backend.py", line 697, in findnode
    if f.startswith(d):
  File "/home/ismael/mercu/mercurial-plugin/tracext/hg/backend.py", line 697, in findnode
    if f.startswith(d):
  File "/usr/lib/python2.5/bdb.py", line 48, in trace_dispatch
    return self.dispatch_line(frame)
When I debug, the problematic file is this one.
(Pdb) f
'main/core/domain/sheldon/mvno/authentication/EMPRESA\xe2\x80\x93CIFB84675529.cert'
It was the same file for different paths. I have fixed adding a
I'm using 12.0 and latest version of mercurial plugin.
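The three escaped bytes in the file name from the Pdb output can be checked with the standard library. This is a Python 3 sketch (an assumption; the traceback is from Python 2.5) that only identifies the character; it does not reproduce the error itself.

```python
import unicodedata

# The path exactly as printed by Pdb, written as a bytes literal.
raw = (b"main/core/domain/sheldon/mvno/authentication/"
       b"EMPRESA\xe2\x80\x93CIFB84675529.cert")

# The bytes are valid UTF-8, so decoding succeeds.
text = raw.decode("utf-8")

# List every non-ASCII character together with its Unicode name.
non_ascii = [(ch, unicodedata.name(ch)) for ch in text if ord(ch) > 127]
print(non_ascii)

# Prefix checks are safe as long as both sides are the same type.
same_type_bytes = raw.startswith(b"main/core")
same_type_text = text.startswith("main/core")
```

So the file name contains an en dash rather than a plain hyphen, which makes it a non-ASCII path and puts it in the filename-encoding category discussed earlier in this ticket.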
comment:18 by , 14 years ago
Replying to Ismael de Esteban <ismael@…>:
Sorry I saw this issue in #9631
comment:20 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Replying to cboos:
We need to better handle arbitrary character sets in TracMercurial, at different levels:
- content encoding (this ticket)
This seems to work now with r10490 / r10491. See also #9631 for the "multiple encoding" approach.
- filename encoding (#7799, #8018, #8538)
This one (#8538) I'll keep open until all glitches are fixed.
- meta-data encoding (#7694, #7217)
Also benefits from the multiple encoding approach of #9631, but not yet finished.
The common point for all cases is that Mercurial by itself doesn't care about encodings; it simply stores the bytes as they come in (by design). So the sensible thing to do here is:
- make it possible to configure which encoding must be used in which situation (content, filename, meta-data), all falling back on default_charset. One concrete example would be a repository created on Windows, with the filenames encoded using whatever the current codepage is, and UTF-8 content.
Well, actually this is handled in a generic way by the [hg] encoding setting, which can accept multiple encodings if needed (see #9631). By default it's "utf-8", and regardless of the value of the setting, a fallback of "latin1" will always be used if every other encoding has failed. That way, we are guaranteed to never trigger errors, at the cost of possibly having latin1-mangled characters if the correct encoding was not part of the list.
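The behaviour described above, trying each configured encoding in turn and ending with latin1, can be sketched like this. It is an illustration in Python 3 with a hypothetical function name, not the code from r10491.

```python
def decode_with_fallback(data, encodings=("utf-8",)):
    """Try each encoding in order and return the first successful decoding.
    latin1 is always tried last; it maps every byte to a character, so the
    function cannot fail on any byte string."""
    for name in encodings:
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1")


utf8_name = decode_with_fallback(b"Mart\xc3\xadn")       # valid UTF-8
latin1_name = decode_with_fallback(b"Mart\xedn")          # not UTF-8, falls back
euro = decode_with_fallback(b"\x80 100", ("utf-8", "cp1252"))
```

The order of the list matters: UTF-8 is strict enough to reject most text in other encodings, so it is a good first candidate, whereas a single-byte encoding placed first would accept nearly anything and hide the later entries.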
- we must use robust conversion, as nothing guarantees that the data in the Mercurial repository will be always consistent w.r.t the chosen encoding.
This should be achieved by r10491.
Could you please test with 0.11.1 and the latest mercurial-plugin?