
Opened 18 years ago

Closed 18 years ago

#4752 closed defect (worksforme)

Trac UTF-8 autodetection doesn't work when no BOM present

Reported by: anonymous Owned by: Jonas Borgström
Priority: normal Milestone:
Component: general Version:
Severity: normal Keywords:
Cc: nslater@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description

A text file encoded with UTF-8 but no BOM is displayed as ISO-8859-15 in the Trac browser.

Steps to reproduce:

  1. Create a text file and insert the Unicode Character 'RIGHT SINGLE QUOTATION MARK' (U+2019)
  2. Check into the repository.
  3. View the file in Trac.
  4. Observe mangled ISO-8859-15 output that might render like "â€™"
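
The mangling in step 4 can be reproduced outside Trac. This is a minimal sketch using only Python's standard codecs; the variable names are illustrative, and the exact glyphs a browser shows depend on how it treats the two bytes that ISO-8859-15 maps to control characters (many renderers fall back to Windows-1252 for them).

```python
# U+2019 RIGHT SINGLE QUOTATION MARK is three bytes in UTF-8.
raw = "\u2019".encode("utf-8")

# Decoding those bytes with the wrong charset yields one character per byte.
as_latin9 = raw.decode("iso-8859-15")  # 'â' followed by two C1 control characters
as_cp1252 = raw.decode("cp1252")       # 'â', '€', '™'

print(raw)
print(repr(as_latin9))
print(as_cp1252)
```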

Tips for bug hunters:

  1. If the MIME-Type is not set by Trac, the Mimeview.get_charset function calls detect_unicode.
  2. detect_unicode checks for the following sequence of bytes (the UTF-8 BOM): '\xef\xbb\xbf'
  3. UTF-8 is not required to have a BOM [1], so this check fails for files without one.

Possible fix:

As a last attempt, try decoding the content as UTF-8 and check that this does not raise a UnicodeDecodeError.
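
A minimal sketch of that idea, not Trac's actual code: `guess_charset` and its `default` parameter are hypothetical names introduced here for illustration. It keeps the BOM check described above and adds a strict UTF-8 decode as the fallback.

```python
import codecs


def guess_charset(content: bytes, default: str = "iso-8859-15") -> str:
    """Hypothetical helper: guess a charset for raw file content."""
    # Existing behaviour per the ticket: look for the UTF-8 BOM.
    if content.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Proposed last attempt: a strict decode. 8-bit text in a legacy
    # charset rarely forms valid UTF-8 multi-byte sequences, so a clean
    # decode is reasonable evidence that the content is UTF-8.
    try:
        content.decode("utf-8")
    except UnicodeDecodeError:
        return default
    return "utf-8"


print(guess_charset("it\u2019s".encode("utf-8")))  # BOM-less UTF-8
print(guess_charset(b"caf\xe9"))                    # Latin-style byte, invalid as UTF-8
```

Pure ASCII content also decodes cleanly and is reported as UTF-8, which is harmless because ASCII is a subset of both encodings. If only a prefix of the file is inspected, a multi-byte sequence cut off at the end of the chunk would make the strict decode fail, so a real implementation would need to account for that.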

[1] http://unicode.org/unicode/faq/utf_bom.html

Attachments (0)

Change History (2)

comment:1 by Noah Slater, 18 years ago

Cc: nslater@… added

Oops, I forgot to include my name/email.

comment:2 by Matthew Good, 18 years ago

Resolution: worksforme
Status: new → closed

The "default_charset" setting in trac.ini allows you to configure this, so you can change it to utf-8 instead if that's more suitable for your files.
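
As a sketch of that change, assuming the option sits in the [trac] section of trac.ini (check the TracIni page of your installation for the exact section and default):

```ini
[trac]
# Charset assumed for files whose encoding cannot be determined otherwise
default_charset = utf-8
```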

For information on more sophisticated charset detection see #4080.
