#3908 closed defect (fixed)
Unicode errors
Reported by: | anonymous | Owned by: | Jonas Borgström |
---|---|---|---|
Priority: | high | Milestone: | 0.11.1 |
Component: | general | Version: | 0.10 |
Severity: | normal | Keywords: | unicode |
Cc: | darwinscusp@…, matt_tricks@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description (last modified by )
I just installed Trac 0.10 on Linux (built from source), running with mod_python and Python 2.4. The 'TracTickets' link throws the following traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.4/site-packages/trac/web/main.py", line 356, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/local/lib/python2.4/site-packages/trac/web/main.py", line 224, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/web_ui.py", line 134, in process_request
    self._render_view(req, db, page)
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/web_ui.py", line 446, in _render_view
    req.hdf['wiki'] = {
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/formatter.py", line 999, in wiki_to_html
    Formatter(env, req, absurls, db).format(wikitext, out, escape_newlines)
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/formatter.py", line 822, in format
    result = re.sub(self.wiki.rules, self.replace, line)
  File "/usr/local/lib/python2.4/sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 4: ordinal not in range(128)
TracChangeset throws a similar traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.4/site-packages/trac/web/main.py", line 356, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/local/lib/python2.4/site-packages/trac/web/main.py", line 224, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/web_ui.py", line 134, in process_request
    self._render_view(req, db, page)
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/web_ui.py", line 446, in _render_view
    req.hdf['wiki'] = {
  File "/usr/local/lib/python2.4/site-packages/trac/wiki/formatter.py", line 1000, in wiki_to_html
    return Markup(out.getvalue())
  File "/usr/local/lib/python2.4/StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 50: ordinal not in range(128)
Attachments (0)
Change History (23)
comment:1 by , 18 years ago
Component: | changeset view → general |
---|---|
Description: | modified (diff) |
Keywords: | unicode added |
Owner: | changed from | to
Priority: | normal → high |
comment:2 by , 18 years ago
I can get into trac via WikiStart and probably 95% of the other pages just fine. However, when I click on the TracTickets and TracChangeset links the tracebacks listed above are thrown.
The path to the TracEnvironment is '/var/www/html/ooth.org/trac'.
follow-up: 4 comment:3 by , 18 years ago
Ok… both pages contain "Unicode only" characters (the “…”). That's the common point between the 5% of problematic pages…
But normally, they should be stored as UTF-8 strings in the DB and Trac should see them as unicode
objects, so I don't yet understand what's going on for you.
What DB + Python driver for it are you using (and charset info, if applicable)?
follow-ups: 5 6 comment:4 by , 18 years ago
Ok, interesting. I'm using mysql with the MySQLDB python driver. I applied the patch from 3182 to correct some charset issues. DB uses latin1. Sounds like this might be due to difference in handling of '…' between Unicode and latin1?
comment:5 by , 18 years ago
Just for giggles, I converted my database to use utf8, and changed the database_charset entry (from 3182) to utf8. I still get the same error. Dunno if that makes any difference, but thought I'd document it anyway.
follow-up: 7 comment:6 by , 18 years ago
Milestone: | → 0.10.1 |
---|
Replying to darwinscusp@orderofthehidden.org:
… in handling of '…' between Unicode and latin1?
Actually, the Unicode characters in question were "“" and "”" (sorry for the confusion).
In a newly created utf8 db, I doubt you'll get the problem (that's what I've used, and it worked for me, unpatched). Nevertheless, we should also support latin1 based database in some nicely degraded way (using replacement characters, for example).
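The "nicely degraded" handling suggested above could look roughly like the following. This is only a minimal sketch, written for Python 3 (where the Python 2 `str` discussed in this ticket corresponds to `bytes`). `to_text` is a hypothetical helper name, not Trac's actual API, and treating UTF-8 as the expected encoding is an assumption.

```python
def to_text(value, encoding="utf-8"):
    """Return text for a database value, degrading instead of raising.

    Hypothetical helper: bytes that are not valid in `encoding` become
    U+FFFD replacement characters rather than a UnicodeDecodeError.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return str(value)


# 0x93 and 0x94 are cp1252 curly quotes; on their own they are invalid UTF-8.
print(to_text(b"\x93quoted\x94"))
```

The page would then render with replacement characters where the quotes used to be, instead of failing with a traceback.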
follow-up: 8 comment:7 by , 18 years ago
Replying to cboos:
In a newly created utf8 db, I doubt you'll get the problem (that's what I've used, and it worked for me, unpatched). Nevertheless, we should also support latin1 based database in some nicely degraded way (using replacement characters, for example).
Really? I just created a new utf8 DB, and am trying a fresh trac installation. Everything works fine until after the prompt for templates…it appears at this point the installation tries to begin, but I get the following error: "OperationalError: (1071, 'Specified key was too long; max key length is 1000 bytes')". This was the same error I received when I tried to convert my existing latin1 DB to utf8, which required that I shorten the length of the fields in some of the indexes. This appears to be a shortcoming of mysql specifically as shown here.
Ugh.
follow-up: 9 comment:8 by , 18 years ago
Replying to anonymous:
Replying to cboos:
In a newly created utf8 db, I doubt you'll get the problem (that's what I've used, and it worked for me, unpatched). …
Really? I just created a new utf8 DB, and am trying a fresh trac installation. Everything works fine until after the prompt for templates…it appears at this point the installation tries to begin, but I get the following error: "OperationalError: (1071, 'Specified key was too long; max key length is 1000 bytes')".
That's something else, already reported (#3676), and there's already a fix available.
From the link to mysql-bugs:4541 above, I quote:
… what you really ask for is to allow UNIQUE keys longer than 1000 bytes - and this is in our TODO.
And that was two years ago, so I guess we can't hope things will change on the MySQL level in a reasonably short time frame (i.e. before 0.10.1).
comment:9 by , 18 years ago
Ok, thanks. I've switched everything over to sqlite for the time being anyway. I'll try again when 0.10.1 is available. Thanks!
follow-up: 12 comment:11 by , 18 years ago
The true problem may be the default database character setting of MySQL. If one user has the default char. set to be utf8, then it may work. However, the default database char. set for MySQL out of the box is latin1. Using that … it may fail.
0x93 maps to a Unicode control character; however, it maps to the curly quotes on Windows (and maybe latin1?). On importing the data originally, it may 'translate' the curly quote to 0x93.
MySQL 4.1 and up support utf8 encodings. So, when initializing the environment, you should either create the database and set the character set to utf8, or once it's created, and before any tables are made, alter the database to use utf8. Then, all tables created after that should use utf8 by default for their columns, and importing the data would succeed.
You may have to deal with the index issue above … but that's a different problem.
See the MySQL site: http://dev.mysql.com/doc/refman/4.1/en/charset-database.html
To note: I am using 0.10.3rc1, MySQL 5.1.12 on windows. I also got the same problem.
Jonathan A. Zylstra
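To make the 0x93 point concrete, here is a small check of how that single byte is interpreted under the codecs mentioned in this ticket. It is a sketch in Python 3 (an assumption; the ticket itself concerns Python 2.4) and uses only standard codecs. Python's `latin-1` codec is strict ISO-8859-1, which is not necessarily what a given database calls "latin1".

```python
raw = b"\x93"

# Plain ASCII cannot represent any byte >= 0x80, hence the errors in the tracebacks.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

as_cp1252 = raw.decode("cp1252")     # Windows code page
as_latin1 = raw.decode("latin-1")    # strict ISO-8859-1
as_utf8 = as_cp1252.encode("utf-8")  # the same character stored as UTF-8

print(ascii_ok, hex(ord(as_cp1252)), hex(ord(as_latin1)), as_utf8)
```

Under cp1252 the byte is U+201C (the left curly quote), under ISO-8859-1 it is the control character U+0093, and the UTF-8 form of the quote is the three bytes e2 80 9c.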
comment:12 by , 18 years ago
Milestone: | 0.10.4 |
---|---|
Resolution: | → worksforme |
Status: | new → closed |
Replying to jon@jzylstra.com:
The true problem may be the default database character setting of MySQL. If one user has the default char. set to be utf8, then it may work. However, the default database char. set for MySQL out of the box is latin1. Using that … it may fail.
Exactly, that's why we now require utf8 charset to be set for the Trac MySQL database, see MySqlDb for more (and also #3884).
comment:13 by , 17 years ago
Milestone: | → 0.11 |
---|---|
Resolution: | worksforme |
Status: | closed → reopened |
Hi guys,
I'm afraid I'm going to have to dig up the ghosts of the past as this issue is still very much alive for me. I've had a really solid look into this and I'm convinced there's still a problem for MySQL backend installs.
Basically I get much the same tracebacks as the original reporter. It's very easy to reproduce. One way is to run: trac-admin /project wiki upgrade
It also crashes out with the UnicodeDecodeError when trying to load a number of the standard TracGuide pages (eg. Help → Installation).
I'm running 0.11dev the absolute latest as of right now with the latest genshi. Python 2.4, mysql-python-1.2.2 and mod-python-3.3.1. I've got a fully utf8 MySQL DB as far as I know (every table is set to utf8 as the encoding).
I'll paste the error I get when trying to load the Installation wiki help page below. I don't really understand how this is a database issue at all, as it seems to me there are a number of places, especially in formatter.py, where unicode text is just flat out not handled (a couple of them are flagged with a # FixMe). If the wikidom variable is non-ascii how can it ever work, given that the StringIO function in Python 2.4 clearly states that it will not work with non 7-bit characters? How does it ever work for anyone? I'm sure I'm missing something key here but well, it is late and I've never looked at Python code before.
Cheers, Matt
  File "C:\Python24\lib\site-packages\trac-0.11dev_r5643-py2.4.egg\trac\wiki\templates\wiki_view.html", line 55, in <Expression u'wiki_to_html(context, page.text)'>
    ${wiki_to_html(context, page.text)}
  File "c:\python24\lib\site-packages\Trac-0.11dev_r5643-py2.4.egg\trac\wiki\formatter.py", line 1012, in format_to_html
    return HtmlFormatter(ctx, wikidom).generate(escape_newlines)
  File "c:\python24\lib\site-packages\Trac-0.11dev_r5643-py2.4.egg\trac\wiki\formatter.py", line 973, in generate
    Formatter(self.context).format(self.wikidom, out, escape_newlines)
  File "c:\python24\lib\site-packages\Trac-0.11dev_r5643-py2.4.egg\trac\wiki\formatter.py", line 784, in format
    result = re.sub(self.wikiparser.rules, self.replace, line)
  File "C:\Python24\lib\sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
comment:14 by , 17 years ago
Cc: | added |
---|
comment:15 by , 17 years ago
Hi,
I've got the same problem (the pages 'new ticket', 'road map' and some others won't display and an error occurs about the encoding). It happened after upgrading from Trac 0.9 to Trac 0.11b2 on Windows 2003, and upgrading Python 2.3 to 2.5 and Apache 2.0 to 2.2. I also use Subversion 1.4.6 and SQLite as a database. I had no such issues with Trac 0.9… and moreover (but I don't know if it's linked) all the displayed pages are in raw HTML and don't have any layout.
Does anyone have any ideas about this?
Thanks,
Eleonore
comment:16 by , 17 years ago
Oh, it's SQLite - from your reference to this ticket I somewhat assumed you were using MySQL. Ok - which Genshi version are you using? If not already using the latest (0.4.4), can you upgrade?
comment:17 by , 17 years ago
It's the latest version of Genshi (0.4.4). And I've installed the apache binaries svn-python 1.4.6, mod_python 3.3.1, pysqlite 2.4.0,… and I think that's all.
comment:18 by , 17 years ago
More info on my problem: I've made a new install of Trac 0.11 on my computer (and not on the server, as I had done previously). I installed Trac, Genshi and Python, but neither Apache nor svn. I created a new project and launched it with tracd: no problem. Then I put my old database in place of the new one, and there again I had an encoding problem in the pages "new ticket" and "roadmap". But no problem with the layout.
So I guess my encoding problem comes from my old database (created with trac 0.9 I think) and the layout problem comes maybe from apache, or subversion.
Any idea?
Thanks,
Eleonore
comment:19 by , 17 years ago
With your nearly working 0.11 installation, where you have only the encoding issue, you should be able to see the "rich" error page (if you're at least running trunk at r6828). Then, you will be able to look at the value of the "text" local variable, and from there you should be able to find out where the problematic data comes from (a ticket, a wiki page, etc.)
There used to be encoding issues with the post-commit-hook, and if you used that in 0.9, it's quite likely that you could end up with non-UTF8 data in your database. Another way to find out where the problematic data is, would be to ".dump" the SQLite database and try to convert that using UTF8 in the Python command line:
$ sqlite3 trac.db .dump > trac.dump
$ python
>>> data = open("trac.dump", "rb").read()
>>> unicode(data, "utf-8")
UnicodeDecodeError: can't decode .... at character xyz
Then you have the offset in the file. Well, if you have more than a few places to fix manually, I agree that's going to be a bit tedious. I think it could make sense to write a little recovery tool.
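Since a single `unicode(data, "utf-8")` call stops at the first bad byte, a small script could list every offending offset in one pass. Below is a rough sketch for Python 3 (an assumption; the session above is Python 2). `find_invalid_utf8` is a made-up name, and the function only reports positions; it does not repair anything.

```python
def find_invalid_utf8(data):
    """Return a list of (offset, bad_bytes) for every invalid UTF-8 run in data."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice we decoded.
            start, end = pos + exc.start, pos + exc.end
            bad.append((start, data[start:end]))
            pos = end  # resume right after the invalid bytes
    return bad


# Two stray cp1252 quotes followed by correctly encoded UTF-8 quotes.
sample = b"ok \x93quoted\x94 and \xe2\x80\x9cfine\xe2\x80\x9d"
for offset, chunk in find_invalid_utf8(sample):
    print(offset, chunk)
```

For the dump above, one would pass `open("trac.dump", "rb").read()` instead of `sample`. Re-slicing the data after each error is wasteful on very large dumps, but it keeps the sketch short.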
comment:20 by , 17 years ago
Well, after a re-installation of everything (Trac, Python, Genshi) on our server and after creating a new project, without our old database, everything is ok (even the layout). I don't really need my old db so it's ok for me.
So thanks for your help! And congratulations for the installation of trac 0.11 on windows, which seems much simpler than the previous installations.
Eleonore
comment:21 by , 16 years ago
Milestone: | 0.11.2 → 0.11.1 |
---|---|
Resolution: | → fixed |
Status: | reopened → closed |
I think r7286 fixed the last remaining mysql unicode issue.
comment:22 by , 16 years ago
No, I think there are still issues.
I installed 0.11 stable, and had it running on MySQL with the database created as UTF-8. But when a user pasted some text from an email into a ticket, then the ticket system got stuck:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 201: ordinal not in range(128)
Python Traceback
Most recent call last:
- File "C:\apps\Python25\Lib\site-packages\trac\web\main.py", line 423, in _dispatch_request
Code fragment:
418. try:
419.     if not env and env_error:
420.         raise HTTPInternalError(env_error)
421.     try:
422.         dispatcher = RequestDispatcher(env)
423.         dispatcher.dispatch(req)
424.     except RequestDone:
425.         pass
426.     resp = req._response or []
427.
428. except HTTPException, e:
Local variables:
Name | Value |
---|---|
after | [u' except RequestDone:', u' pass', u' resp = ... |
before | [u' try:', u' if not env and env_error:', u' raise ... |
dispatcher | <trac.web.main.RequestDispatcher object at 0x02C40F70> |
e | UnicodeDecodeError('ascii', 'Request from Olga:\r\nCan you also check ... |
env | <trac.env.Environment object at 0x0275F7F0> |
env_error | None |
exc_info | (<type 'exceptions.UnicodeDecodeError'>, UnicodeDecodeError('ascii', ... |
filename | 'C:\\apps\\Python25\\Lib\\site-packages\\trac\\web\\main.py' |
frames | [{'function': '_dispatch_request', 'lines_before': [u' try:', u' ... |
has_admin | True |
line | u' dispatcher.dispatch(req)' |
lineno | 422 |
message | u"UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position ... |
req | <Request "GET u'/report/1'"> |
resp | [] |
tb | <traceback object at 0x04C57378> |
tb_hide | None |
traceback | 'Traceback (most recent call last):\n File ... |
- File "C:\apps\Python25\Lib\site-packages\trac\web\main.py", line 197, in dispatch
Code fragment:
192.        req.args.get('__FORM_TOKEN') != req.form_token:
193.     raise HTTPBadRequest('Missing or invalid form token. '
194.                          'Do you have cookies enabled?')
195.
196. # Process the request and render the template
197. resp = chosen_handler.process_request(req)
198. if resp:
199.     if len(resp) == 2: # Clearsilver
200.         chrome.populate_hdf(req)
201.         template, content_type = \
202.             self._post_process_request(req, *resp)
Local variables:
Name | Value |
---|---|
chosen_handler | <trac.ticket.report.ReportModule object at 0x02C40D90> |
chrome | <trac.web.chrome.Chrome object at 0x02C40D10> |
err | (<type 'exceptions.UnicodeDecodeError'>, UnicodeDecodeError('ascii', ... |
handler | <trac.ticket.report.ReportModule object at 0x02C40D90> |
req | <Request "GET u'/report/1'"> |
self | <trac.web.main.RequestDispatcher object at 0x02C40F70> |
- File "C:\apps\Python25\Lib\site-packages\trac\ticket\report.py", line 105, in process_request
Code fragment:
100.     data = self._render_editor(req, db, id, action=='copy')
101. elif action == 'delete':
102.     template = 'report_delete.html'
103.     data = self._render_confirm_delete(req, db, id)
104. else:
105.     template, data, content_type = self._render_view(req, db, id)
106.     if content_type: # i.e. alternate format
107.         return template, data, content_type
108.
109. if id != -1 or action == 'new':
110.     add_ctxtnav(req, _('Available Reports'), href=req.href.report())
Local variables:
Name | Value |
---|---|
action | 'view' |
data | {} |
db | <trac.db.pool.PooledConnection object at 0x049DC378> |
id | 1 |
req | <Request "GET u'/report/1'"> |
self | <trac.ticket.report.ReportModule object at 0x02C40D90> |
- File "C:\apps\Python25\Lib\site-packages\trac\ticket\report.py", line 409, in _render_view
Code fragment:
404. realm = 'ticket'
405. email_cells = []
406. for header_group in header_groups:
407.     cell_group = []
408.     for header in header_group:
409.         value = unicode(result[col_idx])
410.         cell = {'value': value, 'header': header, 'index': col_idx}
411.         col = header['col']
412.         col_idx += 1
413.         # Detect and create new group
414.         if col == '__group__' and value != prev_group_value:
Local variables:
Name | Value |
---|---|
args | {'USER': 'ut159n'} |
asc | True |
cell | {'header': {'asc': False, 'hidden': True, 'col': u'_changetime', 'title': ... |
cell_group | [{'header': {'asc': False, 'hidden': True, 'col': u'__color__', 'title': ... |
cell_groups | [] |
col | u'changetime' |
col_idx | 11 |
cols | [u'__color__', u'ticket', u'summary', u'component', u'version', ... |
context | <Context <Resource u'report:1'>> |
cursor | <trac.db.util.IterableCursor object at 0x04C0EF10> |
data | {'paginator': <trac.util.presentation.Paginator object at 0x04C0E430>, ... |
db | <trac.db.pool.PooledConnection object at 0x049DC378> |
description | '\n * List all active tickets by priority.\n * Color each row based on ... |
email_cells | [{'header': {'asc': False, 'hidden': False, 'col': u'owner', 'title': ... |
emails | u'John Monnington' |
fields | ['href', 'class', 'string', 'title'] |
format | None |
header | {'asc': False, 'hidden': True, 'col': u'_description', 'title': ... |
header_group | [{'asc': False, 'hidden': True, 'col': u'__color__', 'title': u'Color'}, ... |
header_groups | [[{'asc': False, 'hidden': True, 'col': u'__color__', 'title': u'Color'}, ... |
id | 1 |
idx | 12 |
limit | 100 |
line | ' ORDER BY CAST(p.value AS signed), milestone, t.type, time' |
num_items | 5L |
numrows | 5L |
offset | 0 |
p | ['/trac/GCM-MDS/report/1?page=1', None, '1', 'Page 1'] |
page | 1 |
pagedata | [['/trac/GCM-MDS/report/1?page=1', None, '1', 'Page 1']] |
paginator | <trac.util.presentation.Paginator object at 0x04C0E430> |
prev_group_value | None |
query | "SELECT p.value AS __color__,id AS ticket, summary, component, version, ... |
realm | 'ticket' |
report_resource | <Resource u'report:1'> |
req | <Request "GET u'/report/1'"> |
resource | <Resource u'ticket:3'> |
result | ['3', 5L, 'Investigate handling of LCDXs in Waterfall', 'Credit Pricer', ... |
results | [['2', 2L, 'New CLICs SAGASS3,4 and 5 will not save in CP Lite', 'Credit ... |
row | {'cell_groups': [], '__idx__': 3, u'__color__': u'3', 'id': u'5'} |
row_group | [{'cell_groups': [[{'header': {'asc': False, 'hidden': True, 'col': ... |
row_groups | [(None, [{'cell_groups': [[{'header': {'asc': False, 'hidden': True, ... |
row_idx | 3 |
self | <trac.ticket.report.ReportModule object at 0x02C40D90> |
shown_pages | [1] |
sort_col | '' |
sql | "\nSELECT p.value AS __color__,\n id AS ticket, summary, component, ... |
title | '{1} Active Tickets' |
user | None |
value | u'1219217708' |
File "C:\apps\Python25\Lib\site-packages\trac\web\main.py", line 423, in _dispatch_request
dispatcher.dispatch(req)
File "C:\apps\Python25\Lib\site-packages\trac\web\main.py", line 197, in dispatch
resp = chosen_handler.process_request(req)
File "C:\apps\Python25\Lib\site-packages\trac\ticket\report.py", line 105, in process_request
template, data, content_type = self._render_view(req, db, id)
File "C:\apps\Python25\Lib\site-packages\trac\ticket\report.py", line 409, in _render_view
value = unicode(result[col_idx])
System Information:
User Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Trac: 0.11
Python: 2.5.2 (r252:60911, Mar 27 2008, 17:57:18) [MSC v.1310 32 bit (Intel)]
setuptools: 0.6c7
MySQL: server: "5.1.26-rc-community", client: "5.0.27", thread-safe: 1
MySQLdb: 1.2.2
Genshi: 0.5
Pygments: 0.10
jQuery: 1.2.3
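The failing line in this report, `value = unicode(result[col_idx])`, converts a byte string without naming an encoding, so Python 2 falls back to ASCII and stops at the first byte of a UTF-8 multi-byte sequence (0xe2 is the lead byte of, for instance, the typographic quotes and dashes that often come along with pasted text). A minimal sketch of decoding explicitly instead, written for Python 3 where byte strings are `bytes`; `cell_to_text` and the sample row are hypothetical, and this is not necessarily the change that went into Trac.

```python
def cell_to_text(value, encoding="utf-8"):
    """Convert one report cell to text (hypothetical helper).

    Byte strings are decoded with an explicit encoding; other values
    (integers, None, ...) go through str().
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# A made-up row shaped like the `result` local: bytes where Python 2 had str.
row = [b"3", 5, "pasted \u201cquote\u201d".encode("utf-8")]
print([cell_to_text(cell) for cell in row])
```

Whether the driver should hand back unicode objects in the first place is a separate question; the sketch only shows that naming the encoding avoids the implicit ASCII step.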
Do you mean, you can get into Trac (e.g. see the WikiStart page) and only when you click on such links (TracTickets or TracChangeset) you get these errors, or do you get them right away?
Also, what's the path to the TracEnvironment?