Edgewall Software

Opened 5 years ago

Closed 5 years ago

#12715 closed defect (fixed)

UnicodeEncodeError: 'ascii' codec can't encode characters in position 82-83: ordinal not in range(128)

Reported by: Ryan J Ollos
Owned by: Ryan J Ollos
Priority: normal
Milestone: plugin - spam-filter
Component: plugin/spamfilter
Version:
Severity: normal
Keywords:
Cc: Dirk Stöcker
Branch:
Release Notes:

Fixed UnicodeEncodeError when submitting content to Akismet.

API Changes:
Internal Changes:


How to Reproduce

While doing a POST operation on /admin/spamfilter/monitor, Trac issued an internal error.

Request parameters:

{u'__FORM_TOKEN': u'08455ce73d6144f5f1320720',
 'cat_id': u'spamfilter',
 u'markspamdel': u'Delete selected as Spam',
 u'num': u'50',
 u'page': u'1',
 'panel_id': u'monitor',
 'path_info': None,
 u'sel': u'200783'}

User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36

System Information

Trac 1.3.2.dev0
Babel 2.3.4
dnspython 1.12.0
Docutils 0.12
Genshi 0.7 (with speedups)
GIT 2.1.4
Jinja2 2.9.5
Mercurial 3.1.2
mod_wsgi 4.5.13 (WSGIProcessGroup trac WSGIApplicationGroup %{GLOBAL})
Pillow 2.6.1
PostgreSQL server: 9.4.10, client: 9.4.10
psycopg2 2.5.4
Pygments 2.0.1
Python 2.7.9 (default, Jun 29 2016, 13:11:10)
[GCC 4.9.2]
pytz 2012c
setuptools 18.2
SpamBayes 1.1b1
Subversion 1.8.10 (r1615264)
jQuery 1.11.3
jQuery UI 1.11.4
jQuery Timepicker 1.5.5

Enabled Plugins

help-guide-version-notice N/A
milestone-to-version r15098
StatusFixer r6326
TracSpamFilter 1.3.0.dev0
TracVote 0.6.0.dev0
TracWikiExtras 1.3.1.dev0
TranslatedPagesMacro 0.5

Interface Customization

site-templates site.html, site_footer.html, site_head.html, site_header.html, site_leftbox.html

Python Traceback

Traceback (most recent call last):
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/trac/web/main.py", line 630, in _dispatch_request
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/trac/web/main.py", line 252, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/trac/admin/web_ui.py", line 96, in process_request
    resp = provider.render_admin_panel(req, cat_id, panel_id, path_info)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/TracSpamFilter-1.3.0.dev0-py2.7.egg/tracspamfilter/admin.py", line 89, in render_admin_panel
    if self._process_monitoring_panel(req):
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/TracSpamFilter-1.3.0.dev0-py2.7.egg/tracspamfilter/admin.py", line 285, in _process_monitoring_panel
    filtersys.train(req, entries, spam=spam, delete=delete)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/TracSpamFilter-1.3.0.dev0-py2.7.egg/tracspamfilter/filtersystem.py", line 347, in train
    entry.content, entry.ipnr, spam=spam)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/TracSpamFilter-1.3.0.dev0-py2.7.egg/tracspamfilter/filters/akismet.py", line 87, in train
    self._post(url, req, author, content, ip)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/TracSpamFilter-1.3.0.dev0-py2.7.egg/tracspamfilter/filters/akismet.py", line 160, in _post
    urlreq = urllib2.Request(url, urlencode(params),
  File "/usr/lib/python2.7/urllib.py", line 1338, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 82-83: ordinal not in range(128)

Attachments (1)

t12715.diff (2.1 KB ) - added by Ryan J Ollos 5 years ago.

Change History (7)

comment:1 by Ryan J Ollos, 5 years ago

user_agent string is u'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\xa3\xa9'.

>>> s = u'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\xa3\xa9'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rjollos/Documents/Workspace/trac-dev/trac-github-dev/pve/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 82-83: ordinal not in range(128)
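The failure can be reproduced without Trac or Akismet. The sketch below assumes that the implicit conversion in the traceback (`str(v)` on a `unicode` object under Python 2) behaves like an explicit ASCII encode; `ascii_failure_span` is a hypothetical helper written for illustration, not part of the plugin.

```python
# -*- coding: utf-8 -*-
# User-Agent value quoted in this comment; the last two characters are non-ASCII.
USER_AGENT = (u'Mozilla/5.0 (compatible; Baiduspider/2.0; '
              u'+http://www.baidu.com/search/spider.html\xa3\xa9')


def ascii_failure_span(text):
    """Return (start, end) of the first run that ASCII cannot encode, or None.

    Hypothetical helper: it makes explicit the ASCII encode that
    Python 2 performs implicitly in str(unicode_value).
    """
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        # exc.end is exclusive, so (82, 84) is reported as "position 82-83"
        return exc.start, exc.end
    return None
```

Calling `ascii_failure_span(USER_AGENT)` gives `(82, 84)`, which corresponds to the "position 82-83" in the error message.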

comment:2 by Ryan J Ollos, 5 years ago

I haven't tested the change yet, but it looks like we just need to encode the User-Agent string to utf-8: plugins/1.2/spam-filter/tracspamfilter/filters/akismet.py@15254:149#L136.

  • tracspamfilter/filters/akismet.py

             params = {
                 'blog': req.base_url,
                 'user_ip': ip,
    -            'user_agent': req.get_header('User-Agent'),
    +            'user_agent': req.get_header('User-Agent').encode('utf-8'),
                 'referrer': req.get_header('Referer') or 'unknown',
                 'comment_author': author_name,
                 'comment_type': 'trac',

Possibly for blogspam.py as well:

  • tracspamfilter/filters/blogspam.py

                 'ip': ip,
                 'name': author_name,
                 'comment': content.encode('utf-8'),
    -            'agent': req.get_header('User-Agent'),
    +            'agent': req.get_header('User-Agent').encode('utf-8'),
                 'site': req.base_url,
                 'version': user_agent
             }
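One caveat with calling `.encode('utf-8')` directly on the header value: it fails if the header is missing, and on Python 2 it triggers an implicit ASCII decode if the value is already a byte string. A defensive variant might look like the sketch below. `header_to_utf8` is a hypothetical helper, and it assumes that `req.get_header()` returns `None` when the header is absent.

```python
def header_to_utf8(value, default=b'unknown'):
    """Hypothetical helper: return a header value as UTF-8 bytes."""
    if value is None:
        # Assumption: a missing header is reported as None
        return default
    if isinstance(value, bytes):
        # Already a byte string (``str`` on Python 2): pass it through,
        # because .encode() would first decode it as ASCII
        return value
    # Text (``unicode`` on Python 2): encode explicitly as UTF-8
    return value.encode('utf-8')
```

In the patch above this would be used as `'user_agent': header_to_utf8(req.get_header('User-Agent'))`.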

comment:3 by Jun Omae, 5 years ago

I think req.get_header() returns str but req.base_url is unicode.

I think we could use unicode_urlencode from trac.util.text rather than adding .encode('utf-8') to each entry (untested):

  • tracspamfilter/filters/akismet.py

     from trac.config import IntOption, Option
     from trac.core import Component, implements
     from trac.mimeview.api import is_binary
    +from trac.util.text import unicode_urlencode
     from tracspamfilter.api import IFilterStrategy, N_
     ...
                 'referrer': req.get_header('Referer') or 'unknown',
                 'comment_author': author_name,
                 'comment_type': 'trac',
    -            'comment_content': content.encode('utf-8')
    +            'comment_content': content,
             }
             if author_email:
                 params['comment_author_email'] = author_email
             for k, v in req.environ.items():
                 if k.startswith('HTTP_') and k not in self.noheaders:
    -                params[k] = v.encode('utf-8')
    -        urlreq = urllib2.Request(url, urlencode(params),
    +                params[k] = v
    +        urlreq = urllib2.Request(url, unicode_urlencode(params),
                                      {'User-Agent': self.user_agent})
     ...
             resp = urllib2.urlopen(urlreq)
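To illustrate the idea of encoding once at the urlencode step instead of per entry, here is a minimal sketch. It is not Trac's `unicode_urlencode` implementation; `utf8_urlencode` is a hypothetical stand-in that assumes every key and value should be sent as UTF-8.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, as in the traceback

text_type = type(u'')  # ``unicode`` on Python 2, ``str`` on Python 3


def utf8_urlencode(params):
    """Hypothetical helper: UTF-8 encode keys and values, then urlencode."""
    def to_bytes(value):
        if isinstance(value, bytes):
            # Byte strings are passed through unchanged
            return value
        if not isinstance(value, text_type):
            # Numbers and other objects are converted to text first
            value = text_type(value)
        return value.encode('utf-8')

    # Sort by key so the output is deterministic
    pairs = [(to_bytes(k), to_bytes(v)) for k, v in sorted(params.items())]
    return urlencode(pairs)
```

With this approach the non-ASCII characters from comment:1 are percent-encoded as their UTF-8 bytes, for example `utf8_urlencode({'user_agent': u'spider.html\xa3\xa9'})` yields `user_agent=spider.html%C2%A3%C2%A9`, and mixed `str`/`unicode` values no longer need individual `.encode('utf-8')` calls.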

by Ryan J Ollos, 5 years ago

Attachment: t12715.diff added

comment:4 by Ryan J Ollos, 5 years ago

Thanks, I'm testing t12715.diff.

comment:5 by Ryan J Ollos, 5 years ago

Committed to 1.0 branch in r15643, merged to 1.2 branch in r15644, merged to trunk in r15645. Merged all pending changesets from trunk to jinja2 branch in r15646.

comment:6 by Ryan J Ollos, 5 years ago

Cc: Dirk Stöcker added
Release Notes: modified (diff)
Resolution: fixed
Status: assigned → closed
