Edgewall Software

Changes between Version 53 and Version 54 of SpamFilter


Timestamp:
Jan 1, 2011, 8:38:52 PM (13 years ago)
Author:
Dirk Stöcker
Comment:

Improve Bayes description

  • SpamFilter

    v53 v54  
    4747The use of this filter requires a [http://www.wordpress.com Wordpress] API key. The API key is configured in the 'External' administration page.
    4848
     49'''NOTE''': Submitted content is sent to Akismet servers. Don't use this in private environments.
     50
    4951=== !TypePad ===
    5052
     
    5254
    5355The use of this filter requires an API key. The API key is configured in the 'External' administration page.
     56
     57'''NOTE''': Submitted content is sent to !TypePad servers. Don't use this in private environments.
    5458
    5559=== HTTP:BL ===
     
    6771=== Bayes ===
    6872
    69 ''TODO''
     73The Bayes filter is a very powerful tool when used properly. The following guidelines describe how to use and train the filter to get good results:
    7074
    71 > (The code in svn uses [http://spambayes.org SpamBayes], which is a logical choice.  It would make sense to use a custom tokenizer, however, rather than the email-centric one that is included with [http://spambayes.org SpamBayes].  The bigger issue is that some form of training is required (e.g. the API could be extended so that (optionally) authenticated users (and the other filters) could report contributions as spam (using automatic training to assume that everything else is ham); however, this is a complex change).  An alternative to this would be a script that could be periodically executed that would train all existing contributions as ham, and gather spam from an appropriate source.  If you decide to continue with this in the future, please don't hesitate to ask [mailto:spambayes-dev@python.org spambayes-dev] for help.)
     75 * When starting out, the filter needs at least 25 entries each of HAM (useful entries) and SPAM (advertising). Simply train every submission you get until these limits are reached.
     76 * Training is done in the Administration menu under "Spam Filtering / Monitoring". The following buttons are available:
     77  * ''Mark selected as Spam'' - Mark the entries as SPAM and train them in the Bayes database
     78  * ''Mark selected as Ham'' - Mark the entries as HAM and train them in the Bayes database
     79  * ''Delete selected'' - Remove the entries without training
     80  * ''Delete selected as Spam'' - Mark the entries as SPAM and train them in the Bayes database, then remove them
     81  * ''Delete selected as Ham'' - Mark the entries as HAM and train them in the Bayes database, then remove them
     82  * When !JavaScript is enabled, several check boxes are available to help select entries
     83 * Rules for a well-trained database are:
     84  * Don't train the same entries multiple times
     85  * HAM and SPAM counts should be nearly equal (in practice you will have more SPAM, but a HAM-to-SPAM ratio of 1:5 should be the maximum)
     86  * Restart from scratch when results are poor
     87  * It is hard to get rid of training errors, so be careful
     88  * See the [http://spambayes.org/background.html SpamBayes pages] for more details.
     89 * Strategy for Trac usage:
     90  * Use the ''Delete selected as Spam'' and ''Delete selected as Ham'' buttons
     91  * Remove every strange entry (e.g. SandBox stuff) using ''Delete selected''
     92  * Train every valid HAM entry (otherwise the database will become unbalanced)
     93  * Be sure to train every error: wrongly rejected user submissions as well as undetected SPAM
     94  * Train every SPAM entry with a score below 90% (at the beginning you may train everything below 100%)
     95  * Delete SPAM entries with a high score (100% in any case; after the beginning phase, everything above 90%)
     96  * When in doubt whether an entry is SPAM or HAM, delete the entry
     97 * NOTE: When Akismet or !TypePad are activated, training also sends the entries to these services.
     98 * If you append the parameter "num" with a value between 5 and 150 to the monitoring page URL ({{{url.../admin/spamfilter/monitor?num=100}}}), more entries are shown, but don't train a very large dataset at once.
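
The Trac strategy described in the list can be summarised as a small decision function. The following is only an illustrative sketch in Python, not code from the plugin: the names `Action` and `choose_action` are hypothetical, `verdict` stands for the administrator's own judgement of an entry, and treating a score of exactly 90% as "high" is an assumption, since the guidelines only mention "below" and "above" 90%.

```python
from enum import Enum


class Action(Enum):
    # Values are the button labels on the monitoring page.
    TRAIN_AND_DELETE_AS_SPAM = "Delete selected as Spam"
    TRAIN_AND_DELETE_AS_HAM = "Delete selected as Ham"
    DELETE_WITHOUT_TRAINING = "Delete selected"


def choose_action(verdict, score, beginning_phase=False):
    """Pick a button for one monitoring entry (hypothetical helper).

    verdict: the admin's judgement, one of "ham", "spam" or "unsure".
    score: spam score reported for the entry, in percent (0-100).
    beginning_phase: True while the database is still being built up.
    """
    if verdict == "ham":
        # Every valid HAM entry is trained to keep the database balanced.
        return Action.TRAIN_AND_DELETE_AS_HAM
    if verdict == "spam":
        # Assumption: scores at or above the threshold count as "high".
        threshold = 100 if beginning_phase else 90
        if score >= threshold:
            return Action.DELETE_WITHOUT_TRAINING
        return Action.TRAIN_AND_DELETE_AS_SPAM
    # Strange or doubtful entries are removed without training.
    return Action.DELETE_WITHOUT_TRAINING
```

For example, `choose_action("spam", 95)` selects ''Delete selected'', while the same entry during the beginning phase (`beginning_phase=True`) would be trained via ''Delete selected as Spam''.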
    7299
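
The database rules from the Bayes guidelines (at least 25 HAM and 25 SPAM entries, and counts that differ by no more than a factor of 5) can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: `database_status` and its return strings are hypothetical, the counts would have to be read off the administration pages by hand, and applying the factor-of-5 limit in both directions is an assumption (the guidelines only discuss the case of more SPAM than HAM).

```python
MIN_ENTRIES = 25   # minimum HAM and SPAM entries named in the guidelines
MAX_FACTOR = 5     # maximum imbalance between HAM and SPAM counts


def database_status(ham_count, spam_count):
    """Classify a Bayes database by its training counts (hypothetical helper)."""
    if ham_count < MIN_ENTRIES or spam_count < MIN_ENTRIES:
        # Keep training every submission until both limits are reached.
        return "needs-training"
    # Assumption: the limit is applied symmetrically to HAM and SPAM.
    if spam_count > MAX_FACTOR * ham_count or ham_count > MAX_FACTOR * spam_count:
        return "unbalanced"
    return "ok"
```

With 30 HAM and 200 SPAM entries the function reports `"unbalanced"`, which suggests training more valid HAM entries before adding further SPAM.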
    73100== Get the Plugin ==