= Trac Spam Filtering =
[[PageOutline(2-3)]]

This plugin provides several ways to reject contributions that contain spam. It requires Trac release [milestone:0.12]. The plugin code for versions before 0.12 is no longer updated.

The spamfilter plugin has many options, but most of them are optional. Installing the plugin is enough to get basic spam protection. The following steps may still be helpful (in order of importance):
 * Train the Bayes database (using the entries of the log) to activate that filter and reach good performance
 * Set up a !BadContent page containing regular expressions to filter
 * Get API keys for Akismet, !TypePad, Defensio and/or HTTP:BL to use external services
 * Activate the captcha rejection handler to improve the user experience (reCAPTCHA access is needed if that method is used)
 * Fine-tune the karma settings and parameters for your system (e.g. you may increase the karma for a well-trained Bayes filter or stop trusting registered users)

WebAdmin is used for configuration, monitoring, and training. For monitoring and training purposes, it optionally logs all activity to a table in the database. Upgrading the environment is necessary to install the database table required for this logging.

== Supported Internal Filtering Strategies ==

The individual strategies assign scores (“karma”) to submitted content, and the total karma determines whether a submission is rejected or not.
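
As a rough illustration of how a karma total could decide the outcome, the following Python sketch sums the scores of several strategies and compares the total with a threshold. The point values and the `min_karma` threshold are assumptions made for this example; this is not the plugin's actual code, and the values are not its defaults.

{{{
#!python
def total_karma(scores):
    """Sum the karma points assigned by the individual strategies."""
    return sum(points for _name, points in scores)


def is_rejected(scores, min_karma=0):
    """Reject when the total falls below the (assumed) threshold."""
    return total_karma(scores) < min_karma


# Hypothetical scores for one submission: two filters vote against it,
# one filter votes for it.
scores = [("regex", -5), ("ip_throttle", -3), ("bayes", 6)]
print(total_karma(scores), is_rejected(scores))
}}}

In this sketch no single strategy decides alone: a positive score from one filter can be outweighed by negative scores from others.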

=== Regular Expressions ===

The [source:plugins/0.12/spam-filter-captcha/tracspamfilter/filters/regex.py regex] filter reads a list of regular expressions from a wiki page named “BadContent”, each regular expression being on a separate line inside the first code block on the page, using the [http://docs.python.org/lib/re-syntax.html Python syntax] for regular expressions.

If any of those regular expressions matches the submitted content, the submission will be rejected.
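
The matching step can be sketched in Python as follows. The two patterns are hypothetical examples of what a “BadContent” code block might contain, and the use of `search` (a match anywhere in the content) is an assumption about how the patterns are applied.

{{{
#!python
import re


def load_patterns(lines):
    """Compile one regular expression per non-empty line."""
    return [re.compile(line.strip()) for line in lines if line.strip()]


def matches_bad_content(patterns, content):
    """Return True if any pattern is found somewhere in the content."""
    return any(p.search(content) for p in patterns)


# Hypothetical page content, one expression per line.
bad_content_lines = [
    r"(?i)cheap\s+pills",
    r"https?://\S*\.example\.invalid",
]
patterns = load_patterns(bad_content_lines)
print(matches_bad_content(patterns, "Buy CHEAP  pills now"))
}}}

Testing new expressions this way before adding them to the page helps to avoid patterns that accidentally match legitimate submissions.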

=== Regular Expressions for IP ===

The [source:plugins/0.12/spam-filter-captcha/tracspamfilter/filters/ip_regex.py ip_regex] filter reads a list of regular expressions from a wiki page named “BadIP”, each regular expression being on a separate line inside the first code block on the page, using the [http://docs.python.org/lib/re-syntax.html Python syntax] for regular expressions.

If any of those regular expressions matches the submitter's IP, the submission will be rejected.

Regular expressions are much too powerful for the simple task of matching an IP or IP range, but to keep things simple for users, the design is the same as for the content-based regular expressions. You can simply specify full IPv4 addresses, even though the dot has a special meaning in regular expressions, as the match will still work correctly. More care is needed only when matching partial addresses.
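
The following Python sketch shows why partial addresses need more care. It assumes that patterns are applied from the start of the address (`re.match`); the helper name and the addresses are made up for this example.

{{{
#!python
import re


def ip_matches(pattern, ip):
    """Assumed behaviour: the pattern is matched from the start of the IP."""
    return re.match(pattern, ip) is not None


full_address = "10.0.0.1"          # unescaped dots still match literal dots
loose_prefix = "192.168.1."        # intended: 192.168.1.x
strict_prefix = r"192\.168\.1\."   # dots escaped

print(ip_matches(full_address, "10.0.0.1"))        # the address itself matches
print(ip_matches(loose_prefix, "192.168.105.7"))   # unintended match
print(ip_matches(strict_prefix, "192.168.105.7"))  # rejected as intended
}}}

In the loose prefix the final unescaped dot matches any character, so it also accepts the `0` in `192.168.105.7`. Escaping the dots restricts the pattern to the intended range.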

=== IP Throttling ===

The [source:plugins/0.12/spam-filter-captcha/tracspamfilter/filters/ip_throttle.py ip_throttle] filter limits the number of posts per hour allowed from a single IP.

The maximum number of posts per hour is configured in [wiki:TracIni trac.ini]:

{{{
[spam-filter]
max_posts_by_ip = 5
}}}

When this limit is exceeded, the filter starts giving submissions negative karma as specified by the `ip_throttle_karma` option.
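
A minimal Python sketch of this idea is shown below. It assumes that posts are counted in a sliding one-hour window and that a flat penalty is applied once the limit is exceeded; the value used for `ip_throttle_karma` is hypothetical, and the plugin may scale the penalty differently.

{{{
#!python
def throttle_karma(post_times, now, max_posts_by_ip=5, ip_throttle_karma=3):
    """Return the karma adjustment for one IP.

    post_times: timestamps (in seconds) of earlier posts from that IP.
    """
    one_hour_ago = now - 3600
    recent = [t for t in post_times if t > one_hour_ago]
    if len(recent) > max_posts_by_ip:
        # Limit exceeded: assumed flat negative karma.
        return -abs(ip_throttle_karma)
    return 0


now = 100000
six_recent_posts = [now - 60 * i for i in range(6)]
print(throttle_karma(six_recent_posts, now))
}}}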

=== Captcha ===

CAPTCHA-style "human" verification is integrated. Captcha usage is configured on the 'Captcha' administration page.

Currently three captcha types are supported:
 * Simple text captcha
 * Image captcha
 * External reCAPTCHA service: To use the reCAPTCHA method, you'll need to sign up at [http://www.google.com/recaptcha/whyrecaptcha] and set the keys on the 'Captcha' administration page.

=== Bayes ===

The Bayes filter is a very powerful tool when used properly. The following guidelines explain how to use and train the filter to get good results:

 * At the start, the filter needs at least 25 entries each for HAM (useful entries) and SPAM (advertising). Simply train every submission you get until these limits are reached.
 * Training is done in the administration menu under "Spam Filtering / Monitoring". The following buttons are available:
  * ''Mark selected as Spam'' - Mark the entries as SPAM and train them in the Bayes database
  * ''Mark selected as Ham'' - Mark the entries as HAM and train them in the Bayes database
  * ''Delete selected'' - Remove the entries without training
  * ''Delete selected as Spam'' - Mark the entries as SPAM and train them in the Bayes database, then remove them
  * ''Delete selected as Ham'' - Mark the entries as HAM and train them in the Bayes database, then remove them
  * When !JavaScript is enabled, a number of check boxes are available to help with selecting entries
 * Rules for a well-trained database:
  * Don't train the same stuff multiple times
  * HAM and SPAM counts should be nearly equal (in practice you will have more SPAM, but a ratio of 1 to 5 should be the maximum)
  * Restart from scratch when results are poor
  * It is hard to get rid of training errors, so be careful
  * See [http://spambayes.org/background.html SpamBayes pages] for more details.
 * Strategy for Trac usage:
  * Use the ''Delete selected as Spam'' and ''Delete selected as Ham'' buttons
  * Remove every strange entry (e.g. SandBox stuff) using ''Delete selected''
  * Train every valid HAM entry (or the database will become unbalanced)
  * Be sure to train every error: rejected user submissions as well as undetected SPAM
  * Train every SPAM entry with a score below 90% (at the beginning you may train everything below 100%)
  * Delete SPAM entries with a high score (100% in any case; after the initial phase, everything above 90%)
  * When in doubt whether an entry is SPAM or HAM, delete it
 * NOTE: When Akismet, Defensio, !BlogSpam or !TypePad are activated, training also sends the entries to these services.
 * If you append the parameter "num" with a value between 5 and 150 to the monitoring page URL ({{{url.../admin/spamfilter/monitor?num=100}}}), more entries are shown, but don't train a very large dataset at once.
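
The training strategy above can be summarised as a small decision helper. This Python sketch is only an illustration of the rules in the list: the function name, the labels and the handling of a score of exactly 90% are assumptions, and the returned strings are the names of the buttons on the monitoring page.

{{{
#!python
def triage(label, spam_score, initial_phase=False):
    """Suggest a monitoring button for one log entry.

    label: "ham", "spam" or "unsure" (the moderator's own judgement).
    spam_score: the score shown for the entry, in percent.
    """
    if label == "unsure":
        return "Delete selected"          # when in doubt, do not train
    if label == "ham":
        return "Delete selected as Ham"   # train every valid HAM entry
    # SPAM: train low-scoring entries, just delete the obvious ones.
    threshold = 100 if initial_phase else 90
    if spam_score < threshold:
        return "Delete selected as Spam"
    return "Delete selected"


print(triage("spam", 50))
print(triage("spam", 95))
print(triage("spam", 95, initial_phase=True))
}}}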

== Supported External Filtering Strategies ==

=== IP Blacklisting ===

The [source:plugins/0.12/spam-filter-captcha/tracspamfilter/filters/ip_blacklist.py ip_blacklist] filter uses the third-party Python library [http://www.dnspython.org/ dnspython] to make DNS requests to a configurable list of IP blacklist servers.

See e.g. [http://spamlinks.net/filter-dnsbl-lists.htm SpamLinks DNS Lists] for a list of DNS-based blacklists. A blacklist usable with this filter must return an IP for listed entries and no IP (NXDOMAIN) for unlisted entries.

'''NOTE''': The submitter's IP is sent to the configured servers.
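
The lookup convention used by DNS-based blacklists can be sketched in Python as follows. To keep the example free of dependencies it uses the standard library `socket` module instead of dnspython, handles IPv4 only, and uses a made-up blacklist zone (`bl.example.org`); it is not the plugin's implementation.

{{{
#!python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets plus the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist returns an address for the IP.

    Any resolution failure (including NXDOMAIN) counts as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


print(dnsbl_query_name("192.0.2.10", "bl.example.org"))
}}}

Calling `is_listed` performs a real DNS request, which is why the submitter's IP becomes visible to the blacklist operator.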

=== Akismet ===

The [source:plugins/0.12/spam-filter-captcha/tracspamfilter/filters/akismet.py Akismet] filter uses the [http://akismet.com/ Akismet] web service to check content for possible spam.

The use of this filter requires a [http://www.wordpress.com WordPress] API key. The API key is configured on the 'External' administration page.

'''NOTE''': Submitted content is sent to Akismet servers. Don't use this in private environments.

=== !TypePad ===

The [source:plugins/0.12/spam-filter-captcha/tracspamfilter/filters/typepad.py TypePad AntiSpam] filter uses the [http://antispam.typepad.com/ TypePad] web service to check content for possible spam.

The use of this filter requires an API key. The API key is configured on the 'External' administration page.

'''NOTE''': Submitted content is sent to !TypePad servers. Don't use this in private environments.

=== Defensio ===

The [source:plugins/0.12/spam-filter-captcha/tracspamfilter/filters/defensio.py Defensio] filter uses the [http://defensio.com/ Defensio] web service to check content for possible spam.

The use of this filter requires an API key. The API key is configured in the 'External' administration page.

'''NOTE''': Submitted content is sent to Defensio servers. Don't use this in private environments.

=== !StopForumSpam ===

The [source:plugins/0.12/spam-filter-captcha/tracspamfilter/filters/stopforumspam.py StopForumSpam] filter uses the [http://stopforumspam.com/ StopForumSpam] web service to check content for possible spam. This service tests the IP, username and/or email address.

Training this filter requires an API key. The API key is configured in the 'External' administration page.

'''NOTE''': The submitted username and IP are sent to !StopForumSpam servers. Don't use this in private environments.

=== !BlogSpam ===

The [source:plugins/0.12/spam-filter-captcha/tracspamfilter/filters/blogspam.py BlogSpam] filter uses the [http://blogspam.net/ BlogSpam] web service to check content for possible spam.

This service also includes DNS checks and other services identical to the checks in this plugin. Be sure to set the karma accordingly, or these checks are counted twice. You can also disable individual checks in the preferences.

'''NOTE''': Submitted content is sent to !BlogSpam servers. Don't use this in private environments.

=== HTTP:BL ===

The [source:plugins/0.12/spam-filter-captcha/tracspamfilter/filters/httpbl.py HTTP:BL] filter uses the [http://www.projecthoneypot.org/httpbl.php Project HoneyPot HTTP:BL] web service to check content for possible spam.

The use of this filter requires a [http://www.projecthoneypot.org/httpbl_configure.php HTTP:BL] API key. The API key is configured in the 'External' administration page.

'''NOTE''': The submitter's IP is sent to HTTP:BL servers.

== Get the Plugin ==

See the [wiki:TracPlugins#Requirements Trac plugin requirements] for instructions on installing `setuptools`.  `Setuptools` includes the `easy_install` application which you can use to install the SpamFilter:
{{{
easy_install TracSpamFilter
}}}

You can also obtain the code from the Trac Subversion repository:
{{{
svn co http://svn.edgewall.com/repos/trac/plugins/0.12/spam-filter-captcha
}}}

or download [http://trac.edgewall.org/changeset/latest/plugins/0.12/spam-filter-captcha?old_path=/&format=zip zipped source].

See TracPlugins for instructions on building and installing plugins.

You can [source:plugins/0.12/spam-filter-captcha browse the source in Trac].

''[http://svn.edgewall.com/repos/trac/plugins/0.12/spam-filter-captcha/#egg=TracSpamFilter-dev This is a link for setuptools to find the SVN download]''

== Enabling the Plugin ==

If you install the plugin globally (as described [wiki:TracPlugins#ForAllProjects here]), you'll also need to enable it in the web administration or in [wiki:TracIni trac.ini] as follows:
{{{
[components]
tracspamfilter.* = enabled
}}}

== Further Reading ==

 * More info about SpamFilter (and screenshots): [http://www.cmlenz.net/blog/2006/11/managing_trac_s.html Managing Trac Spam]
 * An alternate solution based on mod_security: [http://projects.otaku42.de/wiki/ScallyWhack ScallyWhack].

== Known Issues ==
 * '''Attention''': The 1.7 series of dnspython causes a massive slowdown of the whole of Trac. Use 1.6.x or 1.8.x.
[[TicketQuery(component=plugin/spamfilter,status=!closed)]]

== Requirements ==

 * The modules for IP blacklisting and HTTP:BL require [http://www.dnspython.org/ dnspython] to be installed.
   Install "setuptools" based on the [wiki:TracPlugins#Requirements Trac plugin requirements], then you can run "easy_install dnspython" to automatically download and install the package.
 * '''Attention''': The 1.7 series of dnspython causes a massive slowdown of the whole of Trac. Use 1.6.x or 1.8.x.
 * The !ImageCaptcha requires python-imaging to work.
 * Bayes filtering requires the spambayes software to be installed.
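
To check an installed dnspython against the warning above, a small helper such as the following Python sketch may be useful. The function name is made up for this example, and it only inspects a version string that you pass in.

{{{
#!python
def dnspython_series_is_slow(version):
    """Return True for version strings from the 1.7 series."""
    return version.split(".")[:2] == ["1", "7"]


print(dnspython_series_is_slow("1.7.1"))
print(dnspython_series_is_slow("1.8.0"))
}}}

The version string of an installed dnspython should be available as `dns.version.version`; treat that attribute name as an assumption and verify it against your installation.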

----
See also: TracPlugins, PluginList