#9423 closed enhancement (fixed)
Replying to ticket comments judged as spam
Reported by: | anonymous | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | plugin - spam-filter |
Component: | plugin/spamfilter | Version: | |
Severity: | normal | Keywords: | |
Cc: | trac@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
I replied to a ticket comment on trac.edgewall.org and it was judged as 88% spam, without any indication of why or which words triggered it.
I tried several changes to reduce the number of links and the amount of formatting, but in the end my IP address was blocked with "too many spam trials in this hour." :-(
Hmm… if Trac blocks a reply on suspicion of spam, I think it should help the user find a way out.
Attachments (0)
Change History (15)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Cc: | added |
---|---|
Component: | ticket system → plugin/spamfilter |
Milestone: | → plugin - spam-filter |
Type: | defect → enhancement |
I think we already have a ticket for that… but I can't find it. Keeping this one for now.
In your particular case, the reject was because of this:
BayesianFilterStrategy (-4): SpamBayes determined spam probability of 81.76%
So I'm not sure if there's a way to get the Bayes filter to explain its reasoning…
But for other strategies (regexp), it can be worth showing the reason.
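To illustrate the idea of showing the reason per strategy, here is a minimal sketch of how several checks could each contribute karma points plus a human-readable reason, which are then collected for display. The function names, thresholds, and the `evaluate` helper are hypothetical and only mimic the log line above; this is not the plugin's actual interface.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration only.
Verdict = Optional[Tuple[int, str]]   # (karma points, reason), or None if no opinion
Strategy = Callable[[str], Verdict]


def link_strategy(content: str) -> Verdict:
    """Penalize content with many external links (the limit of 3 is an assumption)."""
    count = content.count("http://") + content.count("https://")
    if count > 3:
        return -3, f"Too many external links ({count})"
    return None


def bayes_strategy(content: str) -> Verdict:
    """Stand-in for a Bayesian check; the probability is hard-coded for the demo."""
    probability = 0.8176
    if probability > 0.5:
        return -4, f"spam probability of {probability:.2%}"
    return None


def evaluate(content: str, strategies: List[Strategy],
             min_karma: int = 0) -> Tuple[bool, int, List[str]]:
    """Sum the karma of all strategies and keep every reason for the user."""
    total = 0
    reasons = []
    for strategy in strategies:
        verdict = strategy(content)
        if verdict is None:
            continue
        points, reason = verdict
        total += points
        reasons.append(f"{strategy.__name__} ({points}): {reason}")
    rejected = total < min_karma
    return rejected, total, reasons


rejected, total, reasons = evaluate("just a short reply",
                                    [link_strategy, bayes_strategy])
for line in reasons:
    print(line)
```

With a list of reasons like this, a rejection page could at least name the strategy that tipped the balance, even when the Bayes check itself stays opaque.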
comment:3 by , 14 years ago
I think t.e.o is running an old spamfilter plugin. The version in SVN has displayed the reject reason for some time now.
comment:4 by , 14 years ago
I'll look into installing a new version - I'll use the latest version when you tell me you're done with the updates.
follow-up: 6 comment:5 by , 14 years ago
I think you can try. The current state (0.3.1) in SVN is basically what I have been running on a 0.11.7 and a 0.12 system for several months now. Maybe you also want to add http:BL as a check. It works well for JOSM. I also increased the spambayes score to 14 for JOSM, and after about 4000+1500 training entries the Bayes results I now get are usually either <5% or >95%.
The new javascript based checkbox buttons help a lot with training. :-)
comment:6 by , 14 years ago
follow-up: 8 comment:7 by , 14 years ago
Be happy. Essentially I forgot one file of the changes you did in 0.11 in my own tests, and exactly this file caused major issues for me as well. The missing content="" prevented file uploads for a whole day on my site. It is only fair you get some trouble from this as well. :-)
comment:8 by , 14 years ago
Replying to stoecker:
Be happy. Essentially I forgot one file of the changes you did in 0.11
That's why the mergeinfo in general and the eligible links in particular are useful. I forgot to set them up initially when doing the move to /plugins, but that's fixed now.
It is only fair you get some trouble from this as well. :-)
Grr … ;-)
comment:9 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:10 by , 14 years ago
Hmm… today I wrote a ticket comment which was judged as 85% spam. I didn't get a hint why. I removed all the links and it was still judged as 64% spam. Then I inserted a space into the word I actually wanted to be a wiki link, so that it read Sub Tickets. After adding that space character I got the comment through the spam filter. *sigh* It's very hard to write comments… *sigh* ;-)
follow-up: 12 comment:11 by , 14 years ago
And why are Wikipedia links judged as spam? This ticket should be reopened.
follow-up: 13 comment:12 by , 14 years ago
Replying to fbrettschneider@…:
Hmm… today I wrote a ticket comment which was judged as 85% spam. I didn't get a hint why. I removed all the links and it was still judged as 64% spam. Then I inserted a space into the word I actually wanted to be a wiki link, so that it read Sub Tickets. After adding that space character I got the comment through the spam filter. *sigh* It's very hard to write comments… *sigh*
There is an option "show_blacklisted", which gives the comment writer a bigger hint about the reason. But your text above suggests that you are referring to the Bayes filter (the only one which reports a percentage). I doubt there is a way to show the user why Bayes thinks it is spam. Maybe the Bayes filter of t.e.o needs better training. On my site spammers usually never go below 40% (and usually are above 98%) and real users never go above 20%.
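For what it's worth, a token-based Bayes classifier can in principle list the tokens that pushed the score most. The following is a toy sketch using plain naive Bayes with log-odds combining; it is not how SpamBayes combines evidence, and the class and method names are made up for illustration. Tokens are kept case-sensitive, so a CamelCase word counts as its own token.

```python
import math
from collections import Counter


class ToyBayes:
    """Toy naive Bayes filter that also reports per-token clues (illustration only)."""

    def __init__(self):
        self.spam = Counter()   # number of spam messages containing each token
        self.ham = Counter()    # number of ham messages containing each token
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(text.split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def token_prob(self, token):
        # Laplace-smoothed estimate of P(spam | token), assuming equal priors.
        p_spam = (self.spam[token] + 1) / (self.n_spam + 2)
        p_ham = (self.ham[token] + 1) / (self.n_ham + 2)
        return p_spam / (p_spam + p_ham)

    def score(self, text):
        """Return (spam probability, clues sorted by strength of evidence)."""
        log_odds = 0.0
        clues = []
        for token in sorted(set(text.split())):
            p = self.token_prob(token)
            log_odds += math.log(p) - math.log(1.0 - p)
            clues.append((token, p))
        probability = 1.0 / (1.0 + math.exp(-log_odds))
        # Strongest evidence first; ties are broken alphabetically.
        clues.sort(key=lambda clue: (-abs(clue[1] - 0.5), clue[0]))
        return probability, clues


bayes = ToyBayes()
bayes.train("cheap pills BuyNow http://x.example", is_spam=True)
bayes.train("the patch fixes SubTickets rendering", is_spam=False)

probability, clues = bayes.score("BuyNow the patch")
print(f"{probability:.0%}")
for token, p in clues:
    print(token, round(p, 2))
```

Showing the top few clues next to the percentage would give a rejected user something concrete to change, at the cost of also telling spammers which words to avoid.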
Replying to anonymous:
And why are Wikipedia links judged as spam? This ticket should be reopened.
Wikipedia links, like all links, are handled as external links. They increase the spam score. A rejection should only result from excessive use of links.
comment:13 by , 14 years ago
Replying to dstoecker:
There is an option "show_blacklisted", which gives the comment writer a bigger hint about the reason. But your text above suggests that you refer to the bayes filter (the only one which has percentage).
I'm talking about writing comments on trac.edgewall.org via Windows Firefox 3.6.12. The spam filter of Trac's homepage site is configured too restrictively.
I doubt there is a way to show the user why Bayes thinks it is spam. Maybe the Bayes filter of t.e.o needs better training. On my site spammers usually never go below 40% (and usually are above 98%) and real users never go above 20%.
It seems that using a mix of upper and lower case letters in words is not accepted, and links also increase the spam percentage a lot. Maybe there should be an email address where I can send my text to you when I'm blocked but know the text is actually OK.
Replying to anonymous: Wikipedia links, like all links, are handled as external links. They increase the spam score. A rejection should only result from excessive use of links.
Maybe some trusted sites could be taken from the blacklist.
follow-up: 15 comment:14 by , 14 years ago
The spam filter of Trac's homepage site is configured too restrictively.
I fear this is mainly a training issue. Bayes filters need training. For my site I train everything which does not reach 0% or 100%.
It seems using a mix of upper and lower case letters in words is not accepted.
There is no such rule in spamfilter, but words of this sort are often used in SPAM, so the bayes filter will catch these.
Maybe there should be an email address where I can send my text to you when I'm blocked but know the text is actually OK.
The operator of the site has the option to log every text and use it for training the filter. It is usually easy to detect wrong classifications because, unlike a spammer, a real user retries several times, with some modifications and captcha solving, until the text gets through. I'm not the operator of t.e.o.
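As a rough sketch of that retry heuristic: given a log of submissions, one could flag rejected texts that closely resemble a later accepted text from the same IP as candidates for ham training. The `Attempt` record, the `likely_false_positives` helper, and the 0.6 similarity threshold are all assumptions for illustration, not part of the plugin.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class Attempt:
    """One logged submission (hypothetical record layout)."""
    ip: str
    text: str
    rejected: bool


def likely_false_positives(log: List[Attempt],
                           similarity: float = 0.6) -> List[Attempt]:
    """Return rejected attempts that resemble a later accepted one from the same IP."""
    candidates = []
    for i, attempt in enumerate(log):
        if not attempt.rejected:
            continue
        for later in log[i + 1:]:
            if later.rejected or later.ip != attempt.ip:
                continue
            ratio = SequenceMatcher(None, attempt.text, later.text).ratio()
            if ratio >= similarity:
                candidates.append(attempt)
                break
    return candidates


log = [
    Attempt("192.0.2.1", "Please see SubTickets for details", rejected=True),
    Attempt("198.51.100.7", "cheap pills cheap pills", rejected=True),
    Attempt("192.0.2.1", "Please see Sub Tickets for details", rejected=False),
]
for attempt in likely_false_positives(log):
    print(attempt.text)
```

Matching on the IP alone would miss a user whose address changes between tries, so a real setup would probably also compare the author name or session.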
Maybe some trusted sites could be taken from the blacklist.
I don't think this is useful.
comment:15 by , 14 years ago
Replying to dstoecker:
It seems using a mix of upper and lower case letters in words is not accepted.
There is no such rule in spamfilter, but words of this sort are often used in SPAM, so the bayes filter will catch these.
It's needed for those wiki links.
Maybe there should be an email address where I can send my text to you when I'm blocked but know the text is actually OK.
The operator of the site has the option to log every text and use it for training the filter. It is usually easy to detect wrong classifications because, unlike a spammer, a real user retries several times, with some modifications and captcha solving, until the text gets through.
Yes, this sounds good (if they can use the several tries for training)
addon: I'm talking about http://trac.edgewall.org/ticket/8933#comment:6. This stripped text was still 62% spam until I was suddenly blocked as a spammer. I reset my DSL gateway to get another provider IP, and after that the same text, which I had saved to a text file, was accepted on the first try. Strange.