
Opened 17 years ago

Closed 16 years ago

#5169 closed defect (wontfix)

UTF8 Conversion Error

Reported by: bodo.tasche@…
Owned by: Jonas Borgström
Priority: normal
Milestone:
Component: search system
Version: 0.10.3.1
Severity: normal
Keywords: needinfo
Cc:
Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Emmanuel Blot)

We updated our Trac to 0.10.3.1 and moved it to a new machine. Since then, we get the following error when we search for certain words such as "version":

Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/trac/web/main.py", line 387, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.4/site-packages/trac/web/main.py", line 237, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.4/site-packages/trac/Search.py", line 181, in process_request
    results += list(source.get_search_results(req, terms, filters))
  File "/usr/lib/python2.4/site-packages/trac/ticket/api.py", line 267, in get_search_results
    for summary, desc, author, keywords, tid, date, status in cursor:
  File "/usr/lib/python2.4/site-packages/trac/db/util.py", line 40, in __iter__
    row = self.cursor.fetchone()
  File "/usr/lib/python2.4/site-packages/trac/db/sqlite_backend.py", line 73, in fetchone
    return row and self._convert_row(row) or None
  File "/usr/lib/python2.4/site-packages/trac/db/sqlite_backend.py", line 69, in _convert_row
    return tuple([(isinstance(v, str) and [v.decode('utf-8')] or [v])[0]
  File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa7 in position 174: unexpected code byte

Attachments (0)

Change History (6)

comment:1 by Emmanuel Blot, 17 years ago

Description: modified (diff)

It looks like the original DB contains non-UTF8 characters; you'll probably have to sanitize the DB and re-import it into Trac.

Have you tried to 'resync' the repository cache?

Which version did you upgrade from?

comment:2 by bodo.tasche@…, 17 years ago

We updated from 0.8.2 (I think, not sure atm).

I resynced the Database. It was advised here: http://trac.edgewall.org/wiki/TracUpgrade

How do I sanitize the DB?

in reply to:  2 comment:3 by Emmanuel Blot, 17 years ago

Replying to bodo.tasche@lexisnexis.de:

> I resynced the Database. It was advised here: http://trac.edgewall.org/wiki/TracUpgrade

There are three different actions:

  1. mandatory: trac-admin upgrade to upgrade the database schema. A new version of Trac would not run without such an upgrade
  2. optional: trac-admin wiki upgrade to upgrade the default wiki pages. This is useful to get up-to-date documentation in your wiki pages
  3. optional: trac-admin resync to rebuild the content of the repository cache which is stored in the DB

I was referring to trac-admin resync: this rebuilds the cache, which may help fix the issue if the source of the non-UTF8 characters is one of the SVN log messages in your repository.

You need to find the source of the non-UTF8 characters: it could be in a log message, in a ticket, or in a wiki page. One way to track down the issue is to use or create a user that is given permissions for only a subset of the Trac features, and search for a term: Trac only searches where the user has access. For example, if you remove the CHANGESET_VIEW permission for a user, Trac won't search in the SVN log messages.

> How do I sanitize the DB?

I'm afraid you'll have to dump the SQLite DB to a file, search for non-UTF8 characters, replace them with their UTF-8 counterparts, and reload the SQLite DB.
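The dump, search, and replace steps could be sketched roughly as follows. This is only an illustration in modern Python 3, not a procedure taken from Trac itself. It assumes the database has already been dumped to text (for instance with the `sqlite3` command-line tool's `.dump` command) and that the stray bytes come from a single legacy encoding such as Windows-1252, which is a guess that should be checked against your own data. The function names are hypothetical.

```python
def find_invalid_utf8_lines(data):
    """Return (line_number, offset, offending_byte) for each line of a
    dump that is not valid UTF-8. Only the first bad byte per line is
    reported."""
    problems = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((number, exc.start, line[exc.start]))
    return problems


def repair_dump(data, fallback="cp1252"):
    """Re-encode a dump as UTF-8, line by line.

    Lines that are already valid UTF-8 are kept unchanged. Other lines
    are assumed (hypothetically) to be in the `fallback` encoding; bytes
    undefined in that encoding become U+FFFD rather than raising.
    """
    repaired = []
    for line in data.split(b"\n"):
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            text = line.decode(fallback, errors="replace")
        repaired.append(text.encode("utf-8"))
    return b"\n".join(repaired)


# Usage sketch with an in-memory sample instead of a real dump file.
sample = b"INSERT 1\nsee \xa7 4\nok \xc3\xa9\n"
problems = find_invalid_utf8_lines(sample)
fixed = repair_dump(sample)
```

Running the finder first shows which rows are affected before anything is rewritten, so the fallback encoding can be verified by eye. Work on a copy of the database, and reload the repaired dump into a fresh SQLite file rather than over the original.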

comment:4 by sid, 17 years ago

Keywords: needinfo added

Did eblot's suggestions help you resolve the problem?

comment:5 by Jeffrey Hulten <jeffh@…>, 17 years ago

One thing you can try is to use wget to crawl your site and then look for which file contains a traceback. I imported a bunch of information from CVSTrac when we migrated and found that pasted emails and Word document content were problematic because they were encoded with the standard Windows-1252 code page. 0xA7 is the section sign (like two interlinked S characters). You might see what you can find that way.
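As a quick check of the byte mentioned above, the following minimal Python 3 sketch decodes 0xA7 both ways. It uses only the standard codecs. Whether the data in this particular database really is Windows-1252 remains an assumption.

```python
raw = b"\xa7"

# Interpreted as Windows-1252, the byte is the section sign (U+00A7).
cp1252_text = raw.decode("cp1252")

# On its own, 0xA7 is a UTF-8 continuation byte, so strict decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The correct UTF-8 encoding of the section sign is two bytes.
utf8_bytes = cp1252_text.encode("utf-8")
```

Here `utf8_ok` ends up `False` and `utf8_bytes` is `b"\xc2\xa7"`, which is the form the character needs to have in the database for the decode step in the traceback to succeed.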

Did you import data into Trac?

comment:6 by Christian Boos, 16 years ago

Resolution: wontfix
Status: new → closed

Probably a problem with data imported into Trac. See above for the recovery procedure.
