Opened 18 years ago
Closed 18 years ago
#4752 closed defect (worksforme)
Trac UTF-8 autodetection doesn't work when no BOM present
| Reported by: | anonymous | Owned by: | Jonas Borgström |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | general | Version: | |
| Severity: | normal | Keywords: | |
| Cc: | nslater@… | Branch: | |
| Release Notes: | | | |
| API Changes: | | | |
| Internal Changes: | | | |
Description
A text file encoded with UTF-8 but no BOM is displayed as ISO-8859-15 in the Trac browser.
Steps to reproduce:
- Create a text file, saved as UTF-8 without a BOM, and insert the Unicode character 'RIGHT SINGLE QUOTATION MARK' (U+2019).
- Check into the repository.
- View the file in Trac.
- Observe mangled ISO-8859-15 output that might render like "â"
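The mangling in the last step can be reproduced outside Trac. This is a minimal sketch in Python 3 (the ticket predates it, so the syntax is an assumption and not Trac's own code); it only shows what happens when the UTF-8 bytes for U+2019 are read as ISO-8859-15:

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, encoded as UTF-8 without a BOM.
text = "\u2019"
raw = text.encode("utf-8")

# Misreading the same bytes as ISO-8859-15 yields three characters:
# 'â' followed by two invisible C1 control characters.
mangled = raw.decode("iso-8859-15")

print(raw)            # b'\xe2\x80\x99'
print(repr(mangled))  # 'â\x80\x99'
```

The two trailing control characters are usually not displayed, which is why the output tends to look like a lone "â".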
Tips for bug hunters:
- If the MIME-Type is not set by Trac, the Mimeview.get_charset function calls detect_unicode.
- detect_unicode checks for the following sequence of bytes (the UTF-8 BOM): '\xef\xbb\xbf'
- UTF-8 is not required to have a BOM [1], so this check fails for files without one.
Possible fix:
Decode the string as 'UTF-8' as a last attempt and check that this doesn't raise a decoding error (UnicodeDecodeError in Python).
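The proposed fix could look roughly like the sketch below. It is written in Python 3 and is not Trac's actual code: `detect_charset` and its `default` parameter are hypothetical names used only for illustration, and the BOM check stands in for what the ticket says detect_unicode does.

```python
import codecs


def detect_charset(data: bytes, default: str = "iso-8859-15") -> str:
    """Guess a charset for raw file content (hypothetical helper)."""
    # Behaviour described in the ticket: look for the UTF-8 BOM.
    if data.startswith(codecs.BOM_UTF8):  # b'\xef\xbb\xbf'
        return "utf-8"
    # Proposed last attempt: a strict UTF-8 decode. Invalid byte
    # sequences raise UnicodeDecodeError, so fall back to the default.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return default
    return "utf-8"
```

Pure ASCII content also passes the strict decode, which is harmless because ASCII is a subset of UTF-8. One caveat: if only a prefix of the file is examined, a multi-byte sequence cut off at the end of that prefix would be reported as a decoding error.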
Attachments (0)
Change History (2)
comment:1 by , 18 years ago
Cc: | added |
---|
comment:2 by , 18 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
The "default_charset" setting in trac.ini allows you to configure this, so you can change it to utf-8 instead if that's more suitable for your files.
For information on more sophisticated charset detection see #4080.
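For reference, the setting mentioned in this comment would be written along these lines in trac.ini. This is a sketch that assumes the option lives in the `[trac]` section; check it against the TracIni documentation for the installed version.

```ini
[trac]
; Charset assumed when nothing more specific can be determined
default_charset = utf-8
```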
Oops, I forgot to include my name/email.