Edgewall Software

Opened 18 years ago

Closed 16 years ago

Last modified 16 years ago

#3908 closed defect (fixed)

Unicode errors

Reported by: anonymous Owned by: Jonas Borgström
Priority: high Milestone: 0.11.1
Component: general Version: 0.10
Severity: normal Keywords: unicode
Cc: darwinscusp@…, matt_tricks@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Christian Boos)

I just installed Trac 0.10 on Linux (built from source), running with mod_python and Python 2.4. The 'TracTickets' link throws the following traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python2.4/site-packages/trac/web/main.py", line 356, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/local/lib/python2.4/site-packages/trac/web/main.py", line 224, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/web_ui.py", line 134, in process_request
    self._render_view(req, db, page)
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/web_ui.py", line 446, in _render_view
    req.hdf['wiki'] = {
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/formatter.py", line 999, in wiki_to_html
    Formatter(env, req, absurls, db).format(wikitext, out, escape_newlines)
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/formatter.py", line 822, in format
    result = re.sub(self.wiki.rules, self.replace, line)
  File "/usr/local/lib/python2.4/sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 4: ordinal not in range(128)

TracChangeset throws a similar traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python2.4/site-packages/trac/web/main.py", line 356, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/local/lib/python2.4/site-packages/trac/web/main.py", line 224, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/web_ui.py", line 134, in process_request
    self._render_view(req, db, page)
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/web_ui.py", line 446, in _render_view
    req.hdf['wiki'] = {
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/formatter.py", line 1000, in wiki_to_html
    return Markup(out.getvalue())
  File "/usr/local/lib/python2.4/StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 50: ordinal not in range(128)
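Both tracebacks end the same way: somewhere a byte string containing 0x93 is combined with Unicode text, and Python 2 silently decodes the bytes with its default 'ascii' codec. Python 3 no longer performs that implicit conversion, so the minimal sketch below emulates it with a hypothetical helper, py2_style_join(), and made-up page content. It illustrates the failure mode only and is not Trac's actual code path.

```python
# Emulate Python 2's implicit str -> unicode coercion, which used the
# default 'ascii' codec whenever byte strings were mixed with unicode text.
def py2_style_join(parts):
    """Join text and byte fragments the way Python 2's ''.join() would
    once at least one fragment is unicode (hypothetical helper)."""
    result = []
    for part in parts:
        if isinstance(part, bytes):
            # Python 2 effectively did part.decode('ascii') here
            part = part.decode("ascii")
        result.append(part)
    return "".join(result)


# Made-up page fragment still holding raw Windows-1252 bytes:
# 0x93 and 0x94 are the curly double quotes in that encoding.
buflist = ["<p>", b"See \x93TracTickets\x94", "</p>"]

try:
    py2_style_join(buflist)
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0x93 in position 4: ordinal not in range(128)

# Decoding the bytes explicitly, with the encoding they were written in,
# before they meet Unicode text avoids the error.
fixed = [p.decode("cp1252") if isinstance(p, bytes) else p for p in buflist]
print(py2_style_join(fixed))  # <p>See “TracTickets”</p>
```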

Attachments (0)

Change History (23)

comment:1 by Christian Boos, 18 years ago

Component: changeset view → general
Description: modified (diff)
Keywords: unicode added
Owner: changed from Christian Boos to Jonas Borgström
Priority: normal → high

Do you mean, you can get into Trac (e.g. see the WikiStart page) and only when you click on such links (TracTickets or TracChangeset) you get these errors, or do you get them right away?

Also, what's the path to the TracEnvironment?

comment:2 by darwinscusp@…, 18 years ago

I can get into trac via WikiStart and probably 95% of the other pages just fine. However, when I click on the TracTickets and TracChangeset links the tracebacks listed above are thrown.

The path to the TracEnvironment is '/var/www/html/ooth.org/trac'.

comment:3 by Christian Boos, 18 years ago

OK… both pages contain "Unicode-only" characters (the “…”). That's the common point among the 5% of problematic pages…

But normally, they should be stored as UTF-8 strings in the DB and Trac should see them as unicode objects, so I don't yet understand what's going on for you.

What DB + Python driver for it are you using (and charset info, if applicable)?

in reply to:  3 ; comment:4 by darwinscusp@…, 18 years ago

OK, interesting. I'm using MySQL with the MySQLdb Python driver. I applied the patch from #3182 to correct some charset issues. The DB uses latin1. Sounds like this might be due to a difference in the handling of '…' between Unicode and latin1?

in reply to:  4 comment:5 by darwinscusp@…, 18 years ago

Just for giggles, I converted my database to use utf8 and changed the database_charset entry (from #3182) to utf8. I still get the same error. Dunno if that makes any difference, but thought I'd document it anyway.

in reply to:  4 ; comment:6 by Christian Boos, 18 years ago

Milestone: 0.10.1

Replying to darwinscusp@orderofthehidden.org:

> … in handling of '…' between Unicode and latin1?

Actually, the Unicode characters in question were “ and ” (sorry for the confusion).

In a newly created utf8 DB, I doubt you'll get the problem (that's what I've used, and it worked for me, unpatched). Nevertheless, we should also support latin1-based databases in some nicely degraded way (using replacement characters, for example).
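One possible reading of "nicely degraded" is sketched below, assuming that correct data is UTF-8 and anything else is legacy Windows-1252 text. The helper name decode_db_value() is hypothetical and not part of Trac.

```python
def decode_db_value(raw, fallback="cp1252"):
    """Turn a value read from the database into text (hypothetical helper).

    Assumes well-formed data is UTF-8; anything else is treated as legacy
    single-byte text, and bytes undefined even there become U+FFFD.
    """
    if isinstance(raw, str):
        return raw  # already text, nothing to do
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Degrade instead of crashing: errors="replace" substitutes U+FFFD
        return raw.decode(fallback, errors="replace")


print(decode_db_value("\u201cquoted\u201d".encode("utf-8")))  # “quoted”
print(decode_db_value(b"\x93quoted\x94"))                     # “quoted”
print(decode_db_value(b"bad \x81 byte") == "bad \ufffd byte")  # True
```

Falling back to Windows-1252 rather than decoding straight to replacement characters keeps curly quotes such as 0x93 readable; a stricter variant would simply use `raw.decode("utf-8", errors="replace")`.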

in reply to:  6 ; comment:7 by anonymous, 18 years ago

Replying to cboos:

> In a newly created utf8 DB, I doubt you'll get the problem (that's what I've used, and it worked for me, unpatched). Nevertheless, we should also support latin1-based databases in some nicely degraded way (using replacement characters, for example).

Really? I just created a new utf8 DB and am trying a fresh Trac installation. Everything works fine until after the prompt for templates… at that point the installation appears to begin, but I get the following error: "OperationalError: (1071, 'Specified key was too long; max key length is 1000 bytes')". This was the same error I received when I tried to convert my existing latin1 DB to utf8, which required that I shorten the length of the fields in some of the indexes. This appears to be a shortcoming of MySQL specifically, as shown here (mysql-bugs:4541).

Ugh.
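The 1071 error comes down to arithmetic: when sizing index keys, MySQL reserves the maximum number of bytes per character, which is 1 for latin1 but 3 for utf8, and the limit in this case is 1000 bytes. The sketch below assumes a hypothetical index over two VARCHAR(255) columns (not Trac's actual schema) and ignores any per-column overhead MySQL may add.

```python
# Bytes MySQL reserves per character when it sizes an index key
# (utf8 here is MySQL's 3-byte utf8 of that era).
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}
MAX_KEY_BYTES = 1000  # limit quoted in the OperationalError above


def key_bytes(column_lengths, charset):
    """Worst-case key size for an index over VARCHAR columns of these
    lengths, ignoring any per-column length bytes MySQL may add."""
    return sum(column_lengths) * BYTES_PER_CHAR[charset]


hypothetical_index = [255, 255]  # e.g. two VARCHAR(255) columns
for charset in ("latin1", "utf8"):
    size = key_bytes(hypothetical_index, charset)
    print(charset, size, "fits" if size <= MAX_KEY_BYTES else "too long")
# latin1 510 fits
# utf8 1530 too long

# Longest utf8 key, in characters, that still fits:
print(MAX_KEY_BYTES // BYTES_PER_CHAR["utf8"])  # 333
```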

in reply to:  7 ; comment:8 by Christian Boos, 18 years ago

Replying to anonymous:

> Replying to cboos:

>> In a newly created utf8 DB, I doubt you'll get the problem (that's what I've used, and it worked for me, unpatched). …

> Really? I just created a new utf8 DB and am trying a fresh Trac installation. Everything works fine until after the prompt for templates… at that point the installation appears to begin, but I get the following error: "OperationalError: (1071, 'Specified key was too long; max key length is 1000 bytes')".

That's something else, already reported (#3676), and there's already a fix available.

From the link to mysql-bugs:4541 above, I quote:

> … what you really ask for is to allow UNIQUE keys longer than 1000 bytes - and this is in our TODO.

And that was two years ago, so I guess we can't hope things will change on the MySQL level in a reasonably short time frame (i.e. before 0.10.1).

in reply to:  8 comment:9 by darwinscusp@…, 18 years ago

Ok, thanks. I've switched everything over to sqlite for the time being anyway. I'll try again when 0.10.1 is available. Thanks!

comment:10 by ronanod, 18 years ago

I have installed 0.10.1 and I appear to be getting the same problem.

comment:11 by jon@…, 18 years ago

The true problem may be MySQL's default database character set. If a user has set the default character set to utf8, it may work. However, MySQL's default database character set out of the box is latin1, and with that … it may fail.

0x93 maps to a control character in Unicode (and in ISO-8859-1), but in Windows-1252 it is the left curly double quote, and Windows-1252 is what MySQL actually calls latin1. When the data was originally imported, the curly quote may have been 'translated' to 0x93.
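How the single byte 0x93 comes out under each encoding can be checked directly with Python's codecs; this is a quick Python 3 sketch, not something taken from Trac.

```python
import unicodedata

raw = b"\x93"  # the byte from the tracebacks above

for codec in ("cp1252", "latin-1"):
    ch = raw.decode(codec)
    # unicodedata has no name for C1 control characters, hence the default
    print(codec, "U+%04X" % ord(ch), unicodedata.name(ch, "<control character>"))
# cp1252 U+201C LEFT DOUBLE QUOTATION MARK
# latin-1 U+0093 <control character>

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8", exc.reason)  # utf-8 invalid start byte

# The same curly quote stored as UTF-8 takes three bytes, starting with 0xe2
print("\u201c".encode("utf-8"))  # b'\xe2\x80\x9c'
```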

MySQL 4.1 and up support the utf8 character set. So, when initializing the environment, either create the database with utf8 as its character set or, once it's created and before any tables are made, alter the database to use utf8. All tables created after that will use utf8 by default for their columns, and importing the data should succeed.
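As a sketch of the two options, the hypothetical helper below only builds the MySQL statement. It assumes a plain database name such as trac and leaves executing the statement (for example through a DB-API cursor) to the reader.

```python
def utf8_database_sql(name, existing=False):
    """Return the MySQL statement that gives a database a utf8 default
    character set (hypothetical helper; it does not execute anything)."""
    # Accept only simple names so the identifier can be quoted safely
    if not name.replace("_", "").isalnum():
        raise ValueError("unexpected database name: %r" % name)
    verb = "ALTER" if existing else "CREATE"
    return "%s DATABASE `%s` DEFAULT CHARACTER SET utf8" % (verb, name)


print(utf8_database_sql("trac"))
# CREATE DATABASE `trac` DEFAULT CHARACTER SET utf8
print(utf8_database_sql("trac", existing=True))
# ALTER DATABASE `trac` DEFAULT CHARACTER SET utf8
```

The ALTER form only changes the default for tables created afterwards, which is why it has to run before Trac creates its tables.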

You may have to deal with the index issue above … but that's a different problem.

See the MySQL site: http://dev.mysql.com/doc/refman/4.1/en/charset-database.html

For reference: I am using 0.10.3rc1 and MySQL 5.1.12 on Windows, and I also got the same problem.

Jonathan A. Zylstra

in reply to:  11 comment:12 by Christian Boos, 18 years ago

Milestone: 0.10.4
Resolution: worksforme
Status: new → closed

Replying to jon@jzylstra.com:

> The true problem may be MySQL's default database character set. If a user has set the default character set to utf8, it may work. However, MySQL's default database character set out of the box is latin1, and with that … it may fail.

Exactly, that's why we now require utf8 charset to be set for the Trac MySQL database, see MySqlDb for more (and also #3884).

comment:13 by anonymous, 17 years ago

Milestone: 0.11
Resolution: worksforme
Status: closed → reopened

Hi guys,

I'm afraid I'm going to have to dig up the ghosts of the past as this issue is still very much alive for me. I've had a really solid look into this and I'm convinced there's still a problem for MySQL backend installs.

Basically I get much the same tracebacks as the original reporter. It's very easy to reproduce. One way is to run: trac-admin /project wiki upgrade

It also crashes out with the UnicodeDecodeError when trying to load a number of the standard TracGuide pages (eg. Help → Installation).

I'm running the very latest 0.11dev as of right now, with the latest Genshi, Python 2.4, mysql-python-1.2.2 and mod-python-3.3.1. As far as I know I've got a fully utf8 MySQL DB (every table is set to utf8 as the encoding).

I'll paste the error I get when trying to load the Installation wiki help page below. I don't really understand how this is a database issue at all, as it seems to me there are a number of places, especially in formatter.py, where Unicode text is just flat out not handled (a couple of them are flagged with a # FixMe). If the wikidom variable is non-ASCII, how can it ever work, given that the Python 2.4 StringIO documentation clearly states that mixing Unicode with 8-bit strings that are not 7-bit ASCII raises a UnicodeError when getvalue() is called? How does it ever work for anyone? I'm sure I'm missing something key here, but, well, it is late and I've never looked at Python code before.

Cheers, Matt

...
  File "C:\Python24\lib\site-packages\trac-0.11dev_r5643-py2.4.egg\trac\wiki\templates\wiki_view.html", line 55, in <Expression u'wiki_to_html(context, page.text)'>
    ${wiki_to_html(context, page.text)}
  File "c:\python24\lib\site-packages\Trac-0.11dev_r5643-py2.4.egg\trac\wiki\formatter.py", line 1012, in format_to_html
    return HtmlFormatter(ctx, wikidom).generate(escape_newlines)
  File "c:\python24\lib\site-packages\Trac-0.11dev_r5643-py2.4.egg\trac\wiki\formatter.py", line 973, in generate
    Formatter(self.context).format(self.wikidom, out, escape_newlines)
  File "c:\python24\lib\site-packages\Trac-0.11dev_r5643-py2.4.egg\trac\wiki\formatter.py", line 784, in format
    result = re.sub(self.wikiparser.rules, self.replace, line)
  File "C:\Python24\lib\sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)

comment:14 by anonymous, 17 years ago

Cc: matt_tricks@… added

comment:15 by Eleonore, 17 years ago

Hi,

I've got the same problem (the 'New Ticket' and 'Roadmap' pages, and some others, won't display, and an encoding error occurs). It happened after upgrading from Trac 0.9 to Trac 0.11b2 on Windows 2003, along with upgrading Python 2.3 to 2.5 and Apache 2.0 to 2.2. I also use Subversion 1.4.6 and SQLite as the database. I had no such issues with Trac 0.9… and moreover (but I don't know if it's linked) all the displayed pages are raw HTML without any layout.

Does anyone have any ideas about this?

Thanks,

Eleonore

comment:16 by Christian Boos, 17 years ago

Oh, it's SQLite - from your reference to this ticket I somewhat assumed you were using MySQL. Ok - which Genshi version are you using? If not already using the latest (0.4.4), can you upgrade?

comment:17 by Eleonore, 17 years ago

It's the latest version of Genshi (0.4.4). I've also installed the Apache binaries, svn-python 1.4.6, mod_python 3.3.1 and pysqlite 2.4.0… and I think that's all.

comment:18 by Eleonore, 17 years ago

More info on my problem: I've made a new install of Trac 0.11 on my computer (not on the server, as I had done previously). I installed Trac, Genshi and Python, but not Apache or Subversion. I created a new project and launched it with tracd: no problem. Then I put my old database in place of the new one, and again I had an encoding problem in the "New Ticket" and "Roadmap" pages. But no problem with the layout.

So I guess my encoding problem comes from my old database (created with Trac 0.9, I think), and the layout problem may come from Apache or Subversion.

Any idea?

Thanks,

Eleonore

comment:19 by Christian Boos, 17 years ago

With your nearly working 0.11 installation, where you only have the encoding issue, you should be able to see the "rich" error page (if you're running at least trunk r6828). There you can look at the value of the "text" local variable, and from that you should be able to find out where the problematic data comes from (a ticket, a wiki page, etc.).

There used to be encoding issues with the post-commit-hook, and if you used it in 0.9, it's quite likely that you ended up with non-UTF-8 data in your database. Another way to find out where the problematic data is would be to ".dump" the SQLite database and try to decode the dump as UTF-8 at the Python prompt:

$ sqlite3 trac.db .dump > trac.dump
$ python
>>> data = open("trac.dump", "rb").read()
>>> unicode(data, "utf-8")
UnicodeDecodeError: can't decode .... at character xyz

Then you have the offset in the file. Well, if you have more than a few places to fix manually, I agree that's going to be a bit tedious. I think it could make sense to write a little recovery tool.
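A minimal version of such a recovery aid, assuming the dump was produced as above, could list every offset that is not valid UTF-8 together with a little surrounding context. find_invalid_utf8() and report() are hypothetical names, and fixing the reported spots is still left to you.

```python
def find_invalid_utf8(data, context=20):
    """Yield (offset, snippet) for each spot in `data` (bytes) that is not
    valid UTF-8; `snippet` holds up to `context` bytes on either side."""
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            return  # everything from pos onwards decodes cleanly
        except UnicodeDecodeError as exc:
            bad = pos + exc.start
            yield bad, data[max(0, bad - context):bad + context]
            pos += exc.end  # resume right after the offending bytes


def report(path):
    """Print every invalid spot in the file at `path`, e.g. trac.dump."""
    with open(path, "rb") as dump:
        for offset, snippet in find_invalid_utf8(dump.read()):
            print(offset, snippet)
```

For example, `report("trac.dump")` after running the sqlite3 command above. Re-decoding the remainder after each error keeps the code short but is quadratic in the worst case, which should be acceptable for a one-off check of a dump with a handful of bad spots.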

comment:20 by Eleonore, 17 years ago

Well, after reinstalling everything (Trac, Python, Genshi) on our server and creating a new project without our old database, everything is OK (even the layout). I don't really need my old DB, so it's fine for me.

So thanks for your help! And congratulations on the installation of Trac 0.11 on Windows, which seems much simpler than in previous versions.

Eleonore

comment:21 by Jonas Borgström, 16 years ago

Milestone: 0.11.2 → 0.11.1
Resolution: fixed
Status: reopened → closed

I think r7286 fixed the last remaining mysql unicode issue.

comment:22 by diroussel+trac@…, 16 years ago

No, I think there are still issues.

I installed 0.11 stable, and had it running on MySQL with the database created as UTF-8. But when a user pasted some text from an email into a ticket, then the ticket system got stuck:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 201: ordinal not in range(128)

Python Traceback

Most recent call last:

  • File "C:\apps\Python25\Lib\site-packages\trac\web\main.py", line 423, in _dispatch_request

Code fragment:

       418. try:
       419.     if not env and env_error:
       420.         raise HTTPInternalError(env_error)
       421.     try:
       422.         dispatcher = RequestDispatcher(env)
       423.         dispatcher.dispatch(req)
       424.     except RequestDone:
       425.         pass
       426.     resp = req._response or []
       427.
       428. except HTTPException, e:

Local variables:

      Name	Value
      after 	[u' except RequestDone:', u' pass', u' resp = ...
      before 	[u' try:', u' if not env and env_error:', u' raise ...
      dispatcher 	<trac.web.main.RequestDispatcher object at 0x02C40F70>
      e 	UnicodeDecodeError('ascii', 'Request from Olga:\r\nCan you also check ...
      env 	<trac.env.Environment object at 0x0275F7F0>
      env_error 	None
      exc_info 	(<type 'exceptions.UnicodeDecodeError'>, UnicodeDecodeError('ascii', ...
      filename 	'C:\\apps\\Python25\\Lib\\site-packages\\trac\\web\\main.py'
      frames 	[{'function': '_dispatch_request', 'lines_before': [u' try:', u' ...
      has_admin 	True
      line 	u' dispatcher.dispatch(req)'
      lineno 	422
      message 	u"UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position ...
      req 	<Request "GET u'/report/1'">
      resp 	[]
      tb 	<traceback object at 0x04C57378>
      tb_hide 	None
      traceback 	'Traceback (most recent call last):\n File ...
  • File "C:\apps\Python25\Lib\site-packages\trac\web\main.py", line 197, in dispatch

Code fragment:

           192.         req.args.get('__FORM_TOKEN') != req.form_token:
           193.     raise HTTPBadRequest('Missing or invalid form token. '
           194.                          'Do you have cookies enabled?')
           195.
           196. # Process the request and render the template
           197. resp = chosen_handler.process_request(req)
           198. if resp:
           199.     if len(resp) == 2: # Clearsilver
           200.         chrome.populate_hdf(req)
           201.         template, content_type = \
           202.                 self._post_process_request(req, *resp)

Local variables:

      Name	Value
      chosen_handler 	<trac.ticket.report.ReportModule object at 0x02C40D90>
      chrome 	<trac.web.chrome.Chrome object at 0x02C40D10>
      err 	(<type 'exceptions.UnicodeDecodeError'>, UnicodeDecodeError('ascii', ...
      handler 	<trac.ticket.report.ReportModule object at 0x02C40D90>
      req 	<Request "GET u'/report/1'">
      self 	<trac.web.main.RequestDispatcher object at 0x02C40F70>
  • File "C:\apps\Python25\Lib\site-packages\trac\ticket\report.py", line 105, in process_request

Code fragment:

           100.     data = self._render_editor(req, db, id, action=='copy')
           101. elif action == 'delete':
           102.     template = 'report_delete.html'
           103.     data = self._render_confirm_delete(req, db, id)
           104. else:
           105.     template, data, content_type = self._render_view(req, db, id)
           106.     if content_type: # i.e. alternate format
           107.         return template, data, content_type
           108.
           109. if id != -1 or action == 'new':
           110.     add_ctxtnav(req, _('Available Reports'), href=req.href.report())

Local variables:

      Name	Value
      action 	'view'
      data 	{}
      db 	<trac.db.pool.PooledConnection object at 0x049DC378>
      id 	1
      req 	<Request "GET u'/report/1'">
      self 	<trac.ticket.report.ReportModule object at 0x02C40D90>
  • File "C:\apps\Python25\Lib\site-packages\trac\ticket\report.py", line 409, in _render_view

Code fragment:

       404. realm = 'ticket'
       405. email_cells = []
       406. for header_group in header_groups:
       407.     cell_group = []
       408.     for header in header_group:
       409.         value = unicode(result[col_idx])
       410.         cell = {'value': value, 'header': header, 'index': col_idx}
       411.         col = header['col']
       412.         col_idx += 1
       413.         # Detect and create new group
       414.         if col == '__group__' and value != prev_group_value:

Local variables:

      Name	Value
      args 	{'USER': 'ut159n'}
      asc 	True
      cell 	{'header': {'asc': False, 'hidden': True, 'col': u'_changetime', 'title': ...
      cell_group 	[{'header': {'asc': False, 'hidden': True, 'col': u'__color__', 'title': ...
      cell_groups 	[]
      col 	u'changetime'
      col_idx 	11
      cols 	[u'__color__', u'ticket', u'summary', u'component', u'version', ...
      context 	<Context <Resource u'report:1'>>
      cursor 	<trac.db.util.IterableCursor object at 0x04C0EF10>
      data 	{'paginator': <trac.util.presentation.Paginator object at 0x04C0E430>, ...
      db 	<trac.db.pool.PooledConnection object at 0x049DC378>
      description 	'\n * List all active tickets by priority.\n * Color each row based on ...
      email_cells 	[{'header': {'asc': False, 'hidden': False, 'col': u'owner', 'title': ...
      emails 	u'John Monnington'
      fields 	['href', 'class', 'string', 'title']
      format 	None
      header 	{'asc': False, 'hidden': True, 'col': u'_description', 'title': ...
      header_group 	[{'asc': False, 'hidden': True, 'col': u'__color__', 'title': u'Color'}, ...
      header_groups 	[[{'asc': False, 'hidden': True, 'col': u'__color__', 'title': u'Color'}, ...
      id 	1
      idx 	12
      limit 	100
      line 	' ORDER BY CAST(p.value AS signed), milestone, t.type, time'
      num_items 	5L
      numrows 	5L
      offset 	0
      p 	['/trac/GCM-MDS/report/1?page=1', None, '1', 'Page 1']
      page 	1
      pagedata 	[['/trac/GCM-MDS/report/1?page=1', None, '1', 'Page 1']]
      paginator 	<trac.util.presentation.Paginator object at 0x04C0E430>
      prev_group_value 	None
      query 	"SELECT p.value AS __color__,id AS ticket, summary, component, version, ...
      realm 	'ticket'
      report_resource 	<Resource u'report:1'>
      req 	<Request "GET u'/report/1'">
      resource 	<Resource u'ticket:3'>
      result 	['3', 5L, 'Investigate handling of LCDXs in Waterfall', 'Credit Pricer', ...
      results 	[['2', 2L, 'New CLICs SAGASS3,4 and 5 will not save in CP Lite', 'Credit ...
      row 	{'cell_groups': [], '__idx__': 3, u'__color__': u'3', 'id': u'5'}
      row_group 	[{'cell_groups': [[{'header': {'asc': False, 'hidden': True, 'col': ...
      row_groups 	[(None, [{'cell_groups': [[{'header': {'asc': False, 'hidden': True, ...
      row_idx 	3
      self 	<trac.ticket.report.ReportModule object at 0x02C40D90>
      shown_pages 	[1]
      sort_col 	''
      sql 	"\nSELECT p.value AS __color__,\n id AS ticket, summary, component, ...
      title 	'{1} Active Tickets'
      user 	None
      value 	u'1219217708'

File "C:\apps\Python25\Lib\site-packages\trac\web\main.py", line 423, in _dispatch_request
  dispatcher.dispatch(req)
File "C:\apps\Python25\Lib\site-packages\trac\web\main.py", line 197, in dispatch
  resp = chosen_handler.process_request(req)
File "C:\apps\Python25\Lib\site-packages\trac\ticket\report.py", line 105, in process_request
  template, data, content_type = self._render_view(req, db, id)
File "C:\apps\Python25\Lib\site-packages\trac\ticket\report.py", line 409, in _render_view
  value = unicode(result[col_idx])

System Information:

User Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Trac: 	0.11
Python: 	2.5.2 (r252:60911, Mar 27 2008, 17:57:18) [MSC v.1310 32 bit (Intel)]
setuptools: 	0.6c7
MySQL: 	server: "5.1.26-rc-community", client: "5.0.27", thread-safe: 1
MySQLdb: 	1.2.2
Genshi: 	0.5
Pygments: 	0.10
jQuery:	1.2.3
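The failing line is `value = unicode(result[col_idx])`, and in Python 2, calling unicode() on a byte string decodes it with the default 'ascii' codec. Byte 0xe2 is the first byte of the UTF-8 encoding of characters such as curly quotes (U+2018 to U+201D) and en and em dashes, which often come along with text pasted from e-mail. That suggests, though the log does not prove it, that the column value reached Trac as UTF-8 bytes rather than Unicode text. The sketch below, in Python 3 with a made-up sample and a hypothetical cell_text() helper, contrasts the implicit ASCII decode with an explicit UTF-8 one.

```python
def cell_text(value, encoding="utf-8"):
    """Hypothetical stand-in for unicode(result[col_idx]) that decodes
    byte strings explicitly instead of relying on the ASCII default."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)  # numbers and text go through unchanged


sample = "\u201cpasted from an e-mail\u201d".encode("utf-8")  # made-up text
print(hex(sample[0]))  # 0xe2

try:
    sample.decode("ascii")  # what Python 2's unicode(sample) effectively did
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

print(cell_text(sample))      # “pasted from an e-mail”
print(cell_text(1219217708))  # 1219217708
```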

comment:23 by Christian Boos, 16 years ago

See also #7959.
