Opened 19 years ago
Closed 18 years ago
#5169 closed defect (wontfix)
UTF8 Conversion Error
| Reported by: | Owned by: | Jonas Borgström | |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | search system | Version: | 0.10.3.1 | 
| Severity: | normal | Keywords: | needinfo | 
| Cc: | Branch: | ||
| Release Notes: | |||
| API Changes: | |||
| Internal Changes: | |||
Description
We updated our Trac to 0.10.3.1 and moved it to a new machine, but since then we have the following error if we search for some word like "version":
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/trac/web/main.py", line 387, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.4/site-packages/trac/web/main.py", line 237, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.4/site-packages/trac/Search.py", line 181, in process_request
    results += list(source.get_search_results(req, terms, filters))
  File "/usr/lib/python2.4/site-packages/trac/ticket/api.py", line 267, in get_search_results
    for summary, desc, author, keywords, tid, date, status in cursor:
  File "/usr/lib/python2.4/site-packages/trac/db/util.py", line 40, in __iter__
    row = self.cursor.fetchone()
  File "/usr/lib/python2.4/site-packages/trac/db/sqlite_backend.py", line 73, in fetchone
    return row and self._convert_row(row) or None
  File "/usr/lib/python2.4/site-packages/trac/db/sqlite_backend.py", line 69, in _convert_row
    return tuple([(isinstance(v, str) and [v.decode('utf-8')] or [v])[0]
  File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa7 in position 174: unexpected code byte
      Attachments (0)
Change History (6)
comment:1 by , 19 years ago
| Description: | modified (diff) | 
|---|
follow-up: 3 comment:2 by , 19 years ago
We updated from 0.8.2 (I think, not sure atm).
I resynced the Database. It was advised here: http://trac.edgewall.org/wiki/TracUpgrade
How do I sanitize the DB?
comment:3 by , 19 years ago
Replying to bodo.tasche@lexisnexis.de:
I resynced the Database. It was advised here: http://trac.edgewall.org/wiki/TracUpgrade
There are three different actions:
- mandatory: trac-admin upgrade, to upgrade the database schema. A new version of Trac would not run without such an upgrade.
- optional: trac-admin wiki upgrade, to upgrade the default wiki pages. This is useful to get up-to-date documentation in your wiki pages.
- optional: trac-admin resync, to rebuild the content of the repository cache, which is stored in the DB.
I was referring to trac-admin resync: this would rebuild the cache, which may help fix the issue if the non-UTF8 source is one of the SVN log messages of your repository.
You need to find the source of the non-UTF8 characters: it could be in a log message, in a ticket, or in a wiki page. One way to track down the issue is to use or create a user that is given permissions for only a subset of the Trac features, and search for a term: Trac only searches where a user has access. For example, if you remove the CHANGESET_VIEW permission for a user, Trac won't search in the SVN log messages.
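As an alternative way of locating the offending data, the database file itself can be scanned. The following is only a sketch, written for a current Python 3 and run outside of Trac against a copy of the database; the function name find_non_utf8_cells is made up for illustration, and it assumes ordinary rowid tables. Note that genuine BLOB columns are also returned as bytes and may be reported even though they are harmless.

```python
import sqlite3


def find_non_utf8_cells(conn):
    """Yield (table, column, rowid) for every cell whose bytes are not valid UTF-8.

    Hypothetical helper, not part of Trac. Assumes ordinary rowid tables.
    """
    # Return TEXT values as raw bytes so that sqlite3 does not try to decode them.
    conn.text_factory = bytes
    tables = [
        row[0].decode("utf-8")
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    ]
    for table in tables:
        cursor = conn.execute('SELECT rowid, * FROM "%s"' % table)
        columns = [desc[0] for desc in cursor.description]
        for row in cursor:
            # Skip the leading rowid column when inspecting values.
            for column, value in zip(columns[1:], row[1:]):
                if isinstance(value, bytes):
                    try:
                        value.decode("utf-8")
                    except UnicodeDecodeError:
                        yield (table, column, row[0])
```

Calling list(find_non_utf8_cells(sqlite3.connect("copy-of-trac.db"))) (the file name is a placeholder) would then list the tables and rows to look at, for example a ticket description or a cached changeset message.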
How do I sanitize the DB?
I'm afraid you'll have to dump the SQLite DB to a file, search for non-UTF8 characters, replace them with their UTF-8 counterparts and reload the SQLite DB.
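The search-and-replace step on the dump could be sketched roughly as follows, in current Python 3. This is an assumption-laden illustration, not a Trac tool: the function names are invented, and it guesses that every line which fails to decode as UTF-8 was originally written in Windows-1252 (cp1252). A line that mixes valid UTF-8 sequences with stray cp1252 bytes would be re-encoded as a whole and would need manual review.

```python
def find_invalid_utf8_lines(data):
    """Return (line_number, byte_offset_in_line) for each line that is not valid UTF-8."""
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            bad.append((lineno, exc.start))
    return bad


def sanitize_dump(data, fallback="cp1252"):
    """Re-encode every line that is not valid UTF-8, assuming it was written in `fallback`."""
    fixed = []
    for line in data.split(b"\n"):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            # Bytes undefined in the fallback code page become U+FFFD.
            line = line.decode(fallback, errors="replace").encode("utf-8")
        fixed.append(line)
    return b"\n".join(fixed)
```

The dump itself can be produced with the sqlite3 command-line shell (its .dump command), read in binary mode, passed through sanitize_dump, and loaded into a fresh database file. Keep the original database as a backup until the result has been checked.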
comment:4 by , 18 years ago
| Keywords: | needinfo added | 
|---|
Did eblot's suggestions help you resolve the problem?
comment:5 by , 18 years ago
One thing you can try is to use wget to crawl your site and then look for which file contains a traceback. I imported a bunch of information from CVSTrac when we migrated and found that pasted emails and Word document content were problematic because they were encoded with the standard Windows-1252 code page. 0xA7 is the section sign (§, like interlinked S characters) in that code page. You might see what you can find that way.
Did you import data into Trac?
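To illustrate why the byte 0xA7 points at Windows-1252 or Latin-1 data, here is a small sketch in current Python 3 (the helper name describe_byte is made up for this example). It tries a few candidate encodings on the raw byte and records what each one yields.

```python
def describe_byte(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Map each candidate encoding to the decoded text, or None if decoding fails."""
    results = {}
    for encoding in candidates:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


# A lone 0xA7 is a UTF-8 continuation byte, so it cannot start a character,
# but it is the section sign in both cp1252 and latin-1.
print(describe_byte(b"\xa7"))
```

On its own the byte is rejected by the UTF-8 decoder, which matches the error in the traceback, while both single-byte code pages decode it to the section sign.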
comment:6 by , 18 years ago
| Resolution: | → wontfix | 
|---|---|
| Status: | new → closed | 
Probably a problem with data imported into Trac. See above for the recovery procedure.



  
It looks like the original DB contains non-UTF8 characters; you'll probably have to sanitize the DB and re-import it into Trac.
Have you tried to 'resync' the repository cache?
Which version did you upgrade from?