Ticket #4717 (closed defect: fixed)
Opened 5 years ago
Last modified 13 months ago
wrong table cell width with multibyte in ticket notify
| Reported by: | xuefer@… | Owned by: | jomae |
|---|---|---|---|
| Priority: | high | Milestone: | 0.12.2 |
| Component: | general | Version: | 0.10.3 |
| Severity: | minor | Keywords: | notification unicode patch needfixup |
| Cc: | shunichi.goto@…, jun66j5@… | ||
| Release Notes: |
Improved e-mail notifications for CJK users and new [notification] ambiguous_char_width = single/double setting. |
||
| API Changes: | |||
Description
Expected:
Priority: major | Component: Others Keywords: tester xxxxxxxxxxxxx | Username: abc
Unxpected:
Priority: major | Component: Others Keywords: tester xxxxxxxxxxxxx | Username: abc
where xxxxxxxxxxxxx is 7 Chinese characters, in 14 'w' width
Attachments
Change History
comment:1 Changed 5 years ago by mgood
- Milestone 0.10.4 deleted
- Resolution set to wontfix
- Status changed from new to closed
comment:2 Changed 5 years ago by xuefer@…
- Resolution wontfix deleted
- Status changed from closed to reopened
i'm sure using fixed-width font, and each Chinese character is in the same width as 2 ascii, which IS expected, even u use fixed-width font in html. i agree that using html table can solve the problem, because browser calc the width correctly, but not trac. that's why i ask tach to fix itself, calc the text width, not character count (count of code point)
if u know php, there is mb_strwidth to do the job.
i'm not familar with python, but it seems there's:
east_asian_width( unichr) Returns the east asian width assigned to the Unicode character unichr as string. New in version 2.4.
One shall never suppose 1 Chinese char is same with as 1 ascii char, in fixed-width font.
comment:3 Changed 5 years ago by cboos
- Component changed from ticket system to general
- Keywords notification unicode added
- Milestone set to 0.12
- Severity changed from normal to minor
east_asian_width(unichr) - New in version 2.4
Maybe we can bump the Python requirement in the next version of Trac?
(btw, mgood, tee from itertools is also new in 2.4)
comment:4 follow-up: ↓ 5 Changed 3 years ago by cboos
- Owner changed from jonas to cboos
- Status changed from reopened to new
Note for self: reuse this code
from unicodedata import east_asian_width as eaw _colwidth = lambda s: sum([eaw(c) in 'WF' and 2 or 1 for c in s])
(from Shun-ichi Goto on Mercurial-devel, used with permission)
where xxxxxxxxxxxxx is 7 Chinese characters, in 14 'w' width
maybe a real example would help...
comment:5 in reply to: ↑ 4 Changed 3 years ago by Shun-ichi Goto <shunichi.goto@…>
- Cc shunichi.goto@… added
Replying to cboos:
where xxxxxxxxxxxxx is 7 Chinese characters, in 14 'w' width
maybe a real example would help...
For example in Japanese, the text "日本語" (means "Japanese") has 3 characters and
occupies 6 column width.
>>> len(u'日本語') 3 >>> _colwidth(u'日本語') 6
Expected (left:
Priority: major | Component: Others Keywords: tester 日本語 | Username: abc
Unwanted (extra 3 spaces):
Priority: major | Component: Others Keywords: tester 日本語 | Username: abc
comment:6 follow-up: ↓ 7 Changed 3 years ago by cboos
Note that it doesn't look like browsers are respecting this 3 asian characters -> 6 column width convention, at least when using the default fixed font, as can be seen in the above example. Same goes for Thunderbird when displaying the notification mail for comment:5.
comment:7 in reply to: ↑ 6 Changed 3 years ago by Shun-ichi Goto <shunichi.goto@…>
Replying to cboos:
Note that it doesn't look like browsers are respecting this 3 asian characters -> 6 column width convention, at least when using the default fixed font, as can be seen in the above example.
I guess it is issue of renderer of application on using multiple fonts (for ASCII and Japanese).
For example, if you use MSゴシック (msgothic) font on windows, you will see good
looking because it is a fixed-width font having both ASCII and Japanese glyphs.
Changed 21 months ago by jomae
Changed 21 months ago by jomae
- Attachment broken-table-cell-width.png added
Changed 21 months ago by jomae
- Attachment fixed-table-cell-width.png added
comment:8 Changed 21 months ago by jomae
- Cc jun66j5@… added
I created a patch 4717.diff for the issue using unicodedata.east_asian_width.
New [notification] ambiguous_char_width option is like ambiwidth option of Vim. CJK (Chinese, Japanese and Korean) users need this option and specify 'double'. Default of the option is 'single', the most users don't need to specify the option.
If you want to know about Ambiguous Characters, visit http://www.unicode.org/reports/tr11/.
Trac 0.12dev-r9679:

Trac 0.12dev-r9679 with the patch:

comment:9 Changed 21 months ago by cboos
- Milestone changed from next-major-0.1X to 0.12.1
Thanks for the patch, it looks good except for a few details:
- from unicodedata import east_asian_width should be among the first block of imports, as it's a module coming from stock Python
- I didn't count, but it looks like you have some lines > 79 (single width ;-) ) chars; please check our TracDev/CodingStyle
Finally, I think the setting makes sense in a locally deployed Trac instance, where most users are either CJK ones or not. But ultimately this should become a user setting, in the notification preferences (#4056).
comment:10 follow-up: ↓ 12 Changed 17 months ago by cboos
- Keywords patch added
- Milestone changed from 0.12.1 to next-minor-0.12.x
- Priority changed from normal to high
comment:11 Changed 16 months ago by cboos
- Keywords needfixup added
Changed 16 months ago by jomae
- Attachment t4717-using-east-asian-width-r10051.diff added
Refreshed t4717.diff to [10051/branches/0.12-stable]
comment:12 in reply to: ↑ 10 Changed 16 months ago by jomae
comment:13 Changed 16 months ago by Andrew C Martin <andrew.c.martin@…>
per cboos' request in ticket:2378#comment:15, I reviewed and tested attachment:t4717-using-east-asian-width-r10051.diff. The patch produced no regressions in my testing and the code changes looked fine. I didn't test with any locales other than "en_US.UTF-8".
Nice touch incorporating the obfuscated email length in the table width calculations!
comment:14 Changed 14 months ago by cboos
- Milestone changed from next-minor-0.12.x to 0.12.2
- Owner changed from cboos to jomae
- Release Notes modified (diff)
Thanks Jun and Andrew.
Patch committed in r10393.
comment:15 follow-up: ↓ 18 Changed 14 months ago by cboos
- Resolution set to fixed
- Status changed from new to closed
And minor tweaks done in r10394 (to_unicode(t) is a no-op if t is already unicode).
A unit-test would have been nice, feel free to contribute one later, Jun ;-)
comment:16 Changed 14 months ago by thijstriemstra
would be good to have a "Since 0.12.2" note in the doc(string).
comment:17 Changed 14 months ago by thijstriemstra
sorry, missed r10394, nevermind :)
comment:18 in reply to: ↑ 15 Changed 13 months ago by jomae
Replying to cboos:
A unit-test would have been nice, feel free to contribute one later, Jun ;-)
I create t4717-unittest.diff for #4717 and #9494. Please apply if you can :)
comment:19 Changed 13 months ago by rblank
(Perfect) patch applied in [10413]. Thanks!



So, you're saying that the characters in question are displayed in a font such that each character is wider than an ascii character. For plaintext mail formatting we expect the emails to be displayed in a fixed-width font so that all characters are the same width. If you can, configure your mail client to use a font where the characters will all be the same width. If you can't do that, I'm afraid there's no way to work around that for the plaintext mails. HTML email formatting has been requested, and would enable better formatting with variable-width fonts. See #2625.