[[PageOutline(2-5,Contents,pullout)]]

= Trac Spam Filtering

This plugin provides several ways to reject contributions that contain spam. It requires at least Trac release 1.0. The source code for version 0.12 and earlier is no longer updated, but is still available.

The spamfilter plugin has many options, but most of them are optional. Out of the box the plugin provides basic spam protection. The following steps improve on that, listed in order of importance:
 * Train the Bayes database using the entries of the log to activate that filter and reach good performance. The Bayes filter requires spambayes.
 * Set up a !BadContent page containing regular expressions to filter.
 * Get API keys for Akismet and/or HTTP:BL to use external services.
 * Activate the captcha rejection handler to improve user treatment; this requires reCAPTCHA access when that method is used.
 * Fine-tune the karma settings and parameters for your system, e.g. you may increase karma for well-trained Bayes filters or stop trusting registered users.
 * If necessary, get API keys for other services and activate them.

There are sections in the Trac admin module for configuring, monitoring, and training the spam filter. For monitoring and training purposes, it optionally logs all activity to a table in the database. Upgrading the environment is necessary to install the database table required for this logging.

== How good is the filtering?

The spam filter will never be perfect. You need to check submissions to Trac and improve the training or settings of the filter when necessary. But a well-trained setup will help you run a site even if it is actively spammed, i.e. receives thousands of spam attempts a day. Even large sites with completely anonymous edits are possible. From time to time, however, spam attacks will succeed and manual cleanup is required.

Try to remove successful spam as fast as possible. The longer it stays in the pages, the harder your work will get.
Some spammers even monitor successful attempts and retry more intensely.

Spam should be removed completely, including from the page history. Trac has options to delete tickets as well as wiki page versions. If done early enough, this does not produce gaps in the page history. Spam can also be in uploaded files. Delete them! Some spam bots edit a page twice, whereby the last change is harmless and the previous one contains the spam. Sometimes spam is submitted by humans, and while they usually succeed, humans are easily discouraged by fast deletion.

The Bayes filter, when properly trained, usually has the best detection rates and can be adapted quickly to new attacks by training it with the successful spam attempts. Akismet is a good second line of defense and it also uses adaptive algorithms. Training also helps the external service when a new type of attack begins. All other services are good at catching spam inserted through rather dumb methods, which is the majority.

A realistic goal is on the order of 1 spam getting through for every 10,000 attempts. However, at the start of a new type of spam wave, which happens once or twice a year, maybe 10-20 will slip through. False rejects should be on the order of one rejection per 1,000 or more successful submissions.

== Supported Internal Filtering Strategies

The individual strategies assign scores or "karma" to submitted content, and the total karma determines whether a submission is rejected or not.

=== Regular Expressions

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/regex.py regex] filter reads a list of regular expressions from a wiki page named "BadContent", with each regular expression on a separate line inside the first code block on the page, using the [https://docs.python.org/2/library/re.html Python syntax] for regular expressions. If any of those regular expressions matches the submitted content, the submission will be rejected.
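
The matching step described above can be sketched as follows. This is a minimal illustration and not the plugin's actual code: the function name `find_bad_pattern` and the sample patterns are hypothetical, the patterns are assumed to have already been read from the code block (one per line), and the use of `re.search` rather than another matching mode is an assumption.
{{{#!python
import re


def find_bad_pattern(pattern_lines, content):
    """Return the first pattern that matches content, or None.

    Hypothetical helper for illustration only.
    """
    for line in pattern_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between patterns
        if re.search(line, content):
            return line
    return None


# Sample patterns, as they might appear one per line in the code block.
patterns = [r"cheap\s+pills", r"https?://\S*casino"]

print(find_bad_pattern(patterns, "Buy cheap   pills now"))
print(find_bad_pattern(patterns, "Fixed the typo in the install notes"))
}}}

In this sketch the first call returns the matching pattern, which would lead to a rejection, while the second returns `None`.
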
=== Regular Expressions for IP

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/ip_regex.py ip_regex] filter reads a list of regular expressions from a wiki page named "BadIP", with each regular expression on a separate line inside the first code block on the page, using the [https://docs.python.org/2/library/re.html Python syntax] for regular expressions. If any of those regular expressions matches the submitter's IP, the submission will be rejected.

Regular expressions are more powerful than the simple task of matching an IP or an IP range requires, but to keep things simple for users the design is the same as for the content-based regular expressions. You can even specify full IPv4 addresses as they are: the dot has a special meaning in regular expressions, but since it also matches a literal dot, the match will still work correctly. More care is needed only when matching partial addresses.

=== IP Throttling

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/ip_throttle.py ip_throttle] filter limits the number of posts per hour allowed from a single IP. The maximum number of posts per hour is configured in [wiki:TracIni trac.ini]:
{{{#!ini
[spam-filter]
max_posts_by_ip = 5
}}}

When this limit is exceeded, the filter starts giving submissions negative karma as specified by the `ip_throttle_karma` option.

=== Captcha

Support for CAPTCHA-style "human" verification is integrated. Captcha usage is configured in the 'Captcha' administration page. Currently the following captcha types are supported:
 * Simple text captcha: Spam robots can bypass these, so they are not recommended.
 * Image captcha.
 * External reCAPTCHA service: To use the reCAPTCHA captcha method, you'll need to sign up at [https://www.google.com/recaptcha/intro/index.html] and set the keys on the 'Captcha' administration page.
 * External !KeyCaptcha service: To use the !KeyCaptcha captcha method, you'll need to sign up at [http://www.keycaptcha.com/] and set the user id and key on the 'Captcha' administration page. Note: requires !JavaScript on the user side.

The captcha in spamfilter is a rejection system: captchas are only displayed to the user when a submission would otherwise be rejected as spam. In this case a successfully solved captcha can improve the score of a submission. If a submission has too many spam points, even a successfully solved captcha can't save it, e.g. when the score is 30 and the captcha only removes 20 points.

=== Bayes

The Bayes filter is a very powerful tool when trained and used properly:
 * When beginning, the filter needs a minimum of 25 entries for HAM (useful entries) and also for SPAM (advertising). Simply train every submission you get until these limits are reached.
 * The training is done in the administration menu "Spam Filtering / Monitoring". The following buttons are available:
   * ''Mark selected as Spam'' - Mark the entries as SPAM and train them in the Bayes database (not visible by default in newer versions)
   * ''Mark selected as Ham'' - Mark the entries as HAM and train them in the Bayes database (not visible by default in newer versions)
   * ''Delete selected'' - Remove the entries without training
   * ''Delete selected as Spam'' - Mark the entries as SPAM and train them in the Bayes database, then remove them
   * ''Delete selected as Ham'' - Mark the entries as HAM and train them in the Bayes database, then remove them
 * When !JavaScript is enabled, a number of check boxes are available, which help with selecting entries.
 * Rules for a well-trained database are:
   * Don't train the same stuff multiple times
   * HAM and SPAM counts should be nearly equal; in reality you will have more SPAM, but a ratio of 1 to 5 should be the maximum
   * Start from scratch when results are poor
   * It is hard to get rid of training errors, so be careful
   * See [http://spambayes.org/background.html SpamBayes pages] for more details.
 * The Bayes admin pages have two options for database cleaning:
   * One resets the database completely - this is useful in cases where wrong training produces bad results.
   * The second option reduces the training database when it has grown too large.
   * Do not use these options if everything works!
   * Do not reduce the database at regular intervals or when only a few entries are removable.
   * The function to reduce the database is disabled when there are fewer than 10,000 lines to remove!
   * A valid use may be to strip ¾ of a database with 200,000 entries after several years of training.
 * Strategy for Trac usage:
   * Use the ''Delete selected as Spam'' and ''Delete selected as Ham'' buttons
   * Remove every strange entry using ''Delete selected'', e.g. SandBox stuff
   * Train every valid HAM entry or the database will get unbalanced
   * Be sure to train every error: rejected user submissions as well as undetected SPAM
   * Train every SPAM entry with a score below 90%; at the beginning you may train everything that is not at 100%
   * Delete SPAM entries with a high score; 100% in any case, and after the beginning phase everything above 90%
   * When in doubt whether an entry is SPAM or HAM, delete it
 * NOTE: When Akismet or !StopForumSpam (with API key) are activated, training will also send the entries to these services.
 * If you append the parameter "num" with a value between 5 and 150 to the monitoring page URL {{{url.../admin/spamfilter/monitor?num=100}}}, you can show more entries, but don't train with a very large dataset at once.

=== !TrapField

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/trapfield.py TrapField] filter uses a hidden form field to check content for possible spam. If enabled, an additional benefit is usually better performance for some of the external services as well.

== Supported External Filtering Strategies

=== IP Blacklisting

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/ip_blacklist.py ip_blacklist] filter uses the third-party Python library [http://www.dnspython.org/ dnspython] to make DNS requests to a configurable list of IP blacklist servers. See [wikipedia:Comparison_of_DNS_blacklists SpamLinks DNS Lists] for a list of DNS based blacklists.
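
As a rough illustration of how a DNS blacklist request is usually formed, the sketch below builds the name to query by following the common DNSBL convention of prepending the reversed IPv4 octets to the blacklist zone. The function name `dnsbl_query_name` and the zone `dnsbl.example.org` are hypothetical, and whether the plugin constructs its queries exactly this way is an assumption.
{{{#!python
def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone.

    Hypothetical helper: follows the common DNSBL convention, not
    necessarily the plugin's own implementation.
    """
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address: %r" % ip)
    # 192.0.2.1 is looked up as 1.2.0.192.<zone>
    return ".".join(reversed(octets)) + "." + zone


print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
}}}

The resulting name would then be resolved through DNS, for example with dnspython, and the answer interpreted by the filter.
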
A blacklist usable for this filter must return an IP for listed entries and no IP (NXDOMAIN) for unlisted entries.

'''Note''': The submitter's IP is sent to the configured servers.

=== URL Blacklisting

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/url_blacklist.py url_blacklist] filter uses the third-party Python library [http://www.dnspython.org/ dnspython] to make DNS requests to a configurable list of URL blacklist servers. It checks domains found in the submitted data. See [http://mxtoolbox.com/blacklists.aspx SpamLinks URL Lists] for a list of URL based blacklists.

A blacklist usable for this filter must return an IP for listed entries and no IP (NXDOMAIN) for unlisted entries.

'''Note''': Domain links contained in the submission are sent to the configured servers.

=== Akismet

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/akismet.py Akismet] filter uses the [http://akismet.com/ Akismet web service] to check content for possible spam.

The use of this filter requires a [http://www.wordpress.com WordPress] API key. The API key is configured in the 'External' administration page.

'''Note''': Submitted content is sent to Akismet servers. Don't use this in private environments.

=== !StopForumSpam

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/stopforumspam.py StopForumSpam] filter uses the [http://stopforumspam.com/ StopForumSpam web service] to check content for possible spam. This service tests the IP, username and/or email address.

Training this filter requires an API key. The API key is configured in the 'External' administration page.

'''Note''': The submitted username and IP are sent to !StopForumSpam servers. Don't use this in private environments.

=== HTTP:BL

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/httpbl.py HTTP:BL] filter uses the [http://www.projecthoneypot.org/httpbl.php Project HoneyPot HTTP:BL web service] to check content for possible spam.

The use of this filter requires a [http://www.projecthoneypot.org/httpbl_configure.php HTTP:BL] API key. The API key is configured in the 'External' administration page.

'''Note''': The submitter's IP is sent to HTTP:BL servers.

=== !BotScout

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/botscout.py BotScout] filter uses the [http://botscout.com/ BotScout web service] to check content for possible spam. This service tests the IP, username and/or email address.

Using this filter requires an API key. The API key is configured in the 'External' administration page.

'''Note''': The submitted username and IP are sent to !BotScout servers. Don't use this in private environments.

=== FSpamList

The [source:plugins/1.0/spam-filter/tracspamfilter/filters/fspamlist.py FSpamList] filter uses the [http://www.fspamlist.com/ FSpamList web service] to check content for possible spam. This service tests the IP, username and/or email address.

Using this filter requires an API key. The API key is configured in the 'External' administration page.

'''Note''': The submitted username and IP are sent to FSpamList servers. Don't use this in private environments.

== Get the Plugin

See TracPlugins for instructions on building and installing plugins.

The plugin can be installed from [pypi:TracSpamFilter PyPI] using `pip` (preferred) or `easy_install`.

For Trac 1.2.x:
{{{#!sh
$ pip install TracSpamFilter
}}}

For Trac 1.0.x:
{{{#!sh
$ pip install "TracSpamFilter<1.2"
}}}

You can also obtain the code from the Trac Subversion repository or download the zipped source.
{{{#!sh
svn co $svnurl
}}}

For Trac 1.0.x:
 * svnurl: `https://svn.edgewall.org/repos/trac/plugins/1.0/spam-filter`
 * [source:plugins/1.0/spam-filter Browse the source] or download the [browser:plugins/1.0/spam-filter?format=zip zipped source]

For Trac 1.2.x:
 * svnurl: `https://svn.edgewall.org/repos/trac/plugins/1.2/spam-filter`
 * [source:plugins/1.2/spam-filter Browse the source] or download the [browser:plugins/1.2/spam-filter?format=zip zipped source]

For Trac 1.4.x:
 * svnurl: `https://svn.edgewall.org/repos/trac/plugins/1.4/spam-filter`
 * [source:plugins/1.4/spam-filter Browse the source] or download the [browser:plugins/1.4/spam-filter?format=zip zipped source]

For Trac 1.5.x:
 * svnurl: `https://svn.edgewall.org/repos/trac/plugins/trunk/spam-filter`
 * [source:plugins/trunk/spam-filter Browse the source] or download the [browser:plugins/trunk/spam-filter?format=zip zipped source]

''[https://svn.edgewall.org/repos/trac/plugins/1.4/spam-filter/#egg=TracSpamFilter-dev This is a link for setuptools to find the SVN download]''.

== Enabling the Plugin

If you install the plugin globally as described [wiki:TracPlugins#ForAllProjects here], you also need to enable it in the web administration or in [wiki:TracIni trac.ini]:
{{{#!ini
[components]
tracspamfilter.* = enabled
}}}

You can disable individual strategies in several ways:
 * Disable the corresponding class in the plugin administration
 * Set its karma to 0
 * External services that require an API key are disabled when no key is set
 * All external services can be disabled in the 'External' section (either completely or only for training)

== Permissions

The Spamfilter adds new permissions to Trac:
||=Permission=||=Functions=||
|| SPAM_CHECKREPORTS || Allows reviewing and deleting user spam reports ||
|| SPAM_CONFIG || Shows the admin menu entries to configure the filter ||
|| SPAM_MONITOR || Shows the admin menu entries to monitor the submissions (spam or ham) ||
|| SPAM_REPORT || Adds a link for reporting spam to pages, so users can report pages as spam to the admins ||
|| SPAM_TRAIN || Gives access to the spam training functions in the monitoring panel (useless without SPAM_MONITOR) ||
|| SPAM_USER || Enables the user evaluation display, which allows detecting and deleting inactive accounts ||
|| SPAM_ADMIN || Combination of all six ||

The permission SPAM_REPORT should probably not be assigned to unauthenticated users, or else there will be many false reports. If it is assigned to them anyway, you should at least exclude the '/reportspam' URL in the robots.txt file.

== SpamFilter and !AccountManager

If the [th:AccountManagerPlugin] is used in version 0.4 or later, registrations can be checked for spam as well. To do so, the entry **!RegistrationFilterAdapter** needs to be added to the key **register_check** in section **account-manager** of the Trac configuration. There are several ways to do this:
 * Add it first in the line: the filter then displays reject reasons in the spamfilter log.
 * Add it last in the line: the !AccountManager checks run first, and the spamfilter is called only if they all pass.
 * Enable the "account_replace_checks" option to let spamfilter perform the !AccountManager checks (not recommended).

Newer versions of !AccountManager have a configuration dialog for the setup needed to enable the !RegistrationFilterAdapter.

The !SpamFilter plugin has several modules to check the contributions of users, and to find inactive users and delete unwanted ones. To delete users, the corresponding !AccountManager modules are called.

== Translation

You can translate the plugin into your language: [https://www.transifex.com/projects/p/Trac_Plugin-L10N/resource/spamfilter/]

Top translations: Trac_Plugin-L10N » [https://www.transifex.com/projects/p/Trac_Plugin-L10N/resource/spamfilter/ spamfilter][[BR]]
[[Image(https://www.transifex.com/projects/p/Trac_Plugin-L10N/resource/spamfilter/chart/image_png, title=Go to Trac_Plugin-L10N project page on Transifex.net, link=https://www.transifex.com/projects/p/Trac_Plugin-L10N/resource/spamfilter/)]]

== Known Issues

'''Attention''': dnspython v1.7 causes a massive slowdown of the Trac site.

[[TicketQuery(component=plugin/spamfilter,status=!closed)]]

== Requirements

 * The modules for IP blacklisting and HTTP:BL require [pypi:dnspython] (v1.8+).
 * The !ImageCaptcha requires [pypi:pillow] to work.
 * Bayes filtering requires [pypi:spambayes].

The packages can be installed using `pip install <package>`.

----
See also: TracPlugins, PluginList