Opened 5 years ago

Closed 4 years ago

#13200 closed defect (fixed)

ValueError: A string literal cannot contain NUL (0x00) characters.

Reported by: Ryan J Ollos
Owned by: Ryan J Ollos
Priority: normal
Milestone: plugin - spam-filter
Component: plugin/spamfilter
Version: 1.4
Severity: normal
Keywords:
Cc:
Branch:
Release Notes:

Replace nulls with spaces in the content before training the Bayesian strategy.

API Changes:
Internal Changes:

Description (last modified by Ryan J Ollos)

How to Reproduce

While doing a POST operation on /admin/spamfilter/monitor, Trac issued an internal error.

Spam entry that causes the issue:

Request parameters:

{u'__FORM_TOKEN': u'3fe406664047354be332d4a9',
 'cat_id': u'spamfilter',
 u'markspamdel': u'Delete selected as Spam',
 u'num': u'50',
 u'page': u'1',
 'panel_id': u'monitor',
 'path_info': None,
 u'sel': u'232668',
 u'toggle_group': u'on'}

User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36

System Information

Trac 1.4
Babel 2.7.0
dnspython 1.15.0
Docutils 0.14
Genshi 0.7.1 (with speedups)
GIT 2.11.0
Jinja2 2.10.1
Mercurial 4.8.2
mod_wsgi 4.5.13 (WSGIProcessGroup trac WSGIApplicationGroup %{GLOBAL})
Pillow 6.1.0
PostgreSQL server: 9.6.15, client: 9.6.15
psycopg2 2.8.3
Pygments 2.3.1
Python 2.7.13 (default, Sep 26 2018, 18:42:22)
[GCC 6.3.0 20170516]
pytz 2018.9
setuptools 41.0.1
SpamBayes 1.1b3
Subversion 1.9.5 (r1770682)
jQuery 1.12.4
jQuery UI 1.12.1
jQuery Timepicker 1.6.3

Enabled Plugins

conditional-clear-milestone-operation N/A
help-guide-version-notice N/A
milestone-to-version r15098
StatusFixer r6326
trac-releases N/A
TracMercurial 1.0.0.9.dev0
TracSpamFilter 1.3.0.dev0
TracVote 0.7.0.dev0
TracWikiExtras 1.3.1.dev0
TranslatedPages 1.1.0

Interface Customization

shared-htdocs
shared-templates
site-htdocs
site-templates site.html, site_footer.html, site_head.html, site_header.html, site_leftbox.html

Python Traceback

Traceback (most recent call last):
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/trac/web/main.py", line 639, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/trac/web/main.py", line 250, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/trac/admin/web_ui.py", line 103, in process_request
    resp = provider.render_admin_panel(req, cat_id, panel_id, path_info)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/tracspamfilter/admin.py", line 87, in render_admin_panel
    if self._process_monitoring_panel(req):
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/tracspamfilter/admin.py", line 288, in _process_monitoring_panel
    filtersys.train(req, entries, spam=spam, delete=delete)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/tracspamfilter/filtersystem.py", line 393, in train
    spam=spam)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/tracspamfilter/filters/bayes.py", line 90, in train
    hammie.train(testcontent.encode('utf-8', 'ignore'), spam)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/spambayes/hammie.py", line 164, in train
    self.bayes.learn(tokenize(msg), is_spam)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/spambayes/classifier.py", line 252, in learn
    self._add_msg(wordstream, is_spam)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/spambayes/classifier.py", line 354, in _add_msg
    record = self._wordinfoget(word)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/tracspamfilter/filters/bayes.py", line 211, in _wordinfoget
    row = self._get_row(word)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/tracspamfilter/filters/bayes.py", line 168, in _get_row
    """, (word,)):
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/trac/db/api.py", line 50, in execute
    return db.execute(query, params)
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/trac/db/util.py", line 129, in execute
    cursor.execute(query, params if params is not None else [])
  File "/usr/local/virtualenv/1.3dev/lib/python2.7/site-packages/trac/db/util.py", line 73, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
ValueError: A string literal cannot contain NUL (0x00) characters.

Attachments (1)

Screen Shot 2019-09-01 at 18.39.41.jpg (33.5 KB) - added by Ryan J Ollos 5 years ago.

Change History (9)

by Ryan J Ollos, 5 years ago

comment:1 by Ryan J Ollos, 5 years ago

Description: modified (diff)

comment:2 by Ryan J Ollos, 5 years ago

Owner: changed from Dirk Stöcker to Ryan J Ollos
Status: new → assigned

comment:3 by Ryan J Ollos, 5 years ago

The content is \x7fwg\x03\xc3\xbf\x00, so maybe we just need to remove null characters from the string.

  • tracspamfilter/filters/bayes.py

    @@ -156,6 +156,7 @@
                 self.nspam = self.nham = 0
     
         def _sanitize(self, text):
    +        text = text.replace('\x00', '')  # strip null characters
             if isinstance(text, unicode):
                 return text
             # Remove invalid byte sequences from utf-8 encoded text
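
As a quick check of the content quoted in comment:3, the following is a minimal sketch, assuming the value shown is the UTF-8 encoded byte string of the spam entry. The variable names are illustrative and not part of the plugin.

```python
# Bytes as quoted in comment:3; assumed here to be UTF-8 encoded content.
raw = b'\x7fwg\x03\xc3\xbf\x00'

# Decoding succeeds: \xc3\xbf is a valid two-byte sequence, and NUL is
# a legal code point in a Python string.
text = raw.decode('utf-8')
has_nul = u'\x00' in text

# Stripping the NUL, in the same way as the one-line change above.
stripped = text.replace(u'\x00', u'')
```

The decoded text is valid Unicode, so the failure does not come from the invalid-byte handling in `_sanitize`; only the trailing NUL has to be dealt with.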

comment:4 by Jun Omae, 5 years ago

An alternative workaround is to replace nulls with spaces before passing the content to hammie.train().

  • tracspamfilter/filters/bayes.py

    @@ -79,6 +79,8 @@
                         ("%3.2f" % (score * 100)))
     
         def train(self, req, author, content, ip, spam=True):
    +        # Split tokens at null characters by tokenizer in spambayes.hammie
    +        content = content.replace('\x00', ' ')
             if author is not None:
                 testcontent = author + '\n' + content
             else:
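
To illustrate what this change hands on to `hammie.train()`, here is a minimal standalone sketch. `encode_for_training` is a hypothetical helper, not a function in the plugin; it combines the replacement above with the `encode('utf-8', 'ignore')` call visible in the traceback.

```python
def encode_for_training(content):
    """Hypothetical helper: replace NULs with spaces, then encode as UTF-8."""
    content = content.replace(u'\x00', u' ')
    return content.encode('utf-8', 'ignore')


# The content from comment:3, written as a unicode string (u'\xff' is the
# character whose UTF-8 encoding is \xc3\xbf).
encoded = encode_for_training(u'\x7fwg\x03\xff\x00')
```

With this, no NUL byte remains in the text that is tokenized, so no token containing NUL should reach the database query in `_get_row`.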

comment:5 by Ryan J Ollos, 5 years ago

Okay, I'll apply comment:4 instead if you think it's better to strip out the nulls further up the call stack. Or perhaps even in FilterSystem.train?

in reply to:  5 comment:6 by Jun Omae, 5 years ago

Replying to Ryan J Ollos:

> Okay, I'll apply comment:4 instead if you think it's better to strip out the nulls further up the call stack.

Yes. Removing null characters would merge the surrounding text into one longer word, e.g. aaa\x00bbb → aaabbb. I think it would be better to avoid that.

> Or perhaps even in FilterSystem.train?

I don't think that is a good idea. If another filter relied on such null characters to detect spam, it would no longer be able to detect it.
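
The difference between removing and replacing nulls can be sketched as follows. Plain whitespace splitting is used here only as a stand-in for the SpamBayes tokenizer, which is more elaborate; both function names are illustrative.

```python
def tokens_after_removing(text):
    # Deleting the NUL joins its neighbours into a single word.
    return text.replace(u'\x00', u'').split()


def tokens_after_replacing(text):
    # A space in place of the NUL keeps the neighbours separate.
    return text.replace(u'\x00', u' ').split()


sample = u'aaa\x00bbb'
removed = tokens_after_removing(sample)
replaced = tokens_after_replacing(sample)
```

Under this simplification, `removed` holds the single merged word `aaabbb`, while `replaced` holds `aaa` and `bbb` as separate words.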

comment:7 by Jun Omae, 4 years ago

#13218 was closed as a duplicate.

comment:8 by Ryan J Ollos, 4 years ago

Release Notes: modified (diff)
Resolution: fixed
Status: assigned → closed

Fixed on 1.4-stable in r17267, merged in r17268.
