Edgewall Software

Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#9423 closed enhancement (fixed)

Replying to ticket comments judged as spam

Reported by: anonymous Owned by:
Priority: normal Milestone: plugin - spam-filter
Component: plugin/spamfilter Version:
Severity: normal Keywords:
Cc: trac@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description

I replied to a ticket comment on trac.edgewall.org and it was judged as 88% spam, with no explanation of why or which words were responsible.

I tried several changes to reduce the number of links and the amount of formatting, but in the end my IP address was blocked with "too many spam trials in this hour." :-(

Hmm… if Trac blocks a reply on suspicion of spam, I think it should help the user find a way out.

Attachments (0)

Change History (15)

comment:1 by anonymous, 14 years ago

Addendum: I'm talking about http://trac.edgewall.org/ticket/8933#comment:6. This stripped-down text was still rated 62% spam until I was suddenly blocked as a spammer. I reset my DSL gateway to get another IP address from my provider, and after that the same text, which I had saved to a text file, was accepted on the first try. Strange.

comment:2 by Christian Boos, 14 years ago

Cc: trac@… added
Component: ticket system → plugin/spamfilter
Milestone: plugin - spam-filter
Type: defect → enhancement

I think we already have a ticket for that… but I can't find it. Keeping this one for now.

In your particular case, the reject was because of this:

BayesianFilterStrategy (-4): SpamBayes determined spam probability of 81.76%

So I'm not sure if there's a way to get the Bayes filter to explain its reasoning…

But for other strategies (regexp), it can be worth showing the reason.
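
The log line above has the shape `Strategy (points): reason`. As a minimal sketch of how such per-strategy points could add up to a verdict, and how the reasons could be shown to the commenter, assume a simple sum compared against a threshold. The function names, the `min_karma` threshold, the link rule, and the hard-coded probability are illustrative assumptions here, not the plugin's actual code:

```python
import re
from typing import Callable, List, Optional, Tuple

# A strategy returns (points, reason), or None when it has no opinion.
Verdict = Optional[Tuple[int, str]]
Strategy = Callable[[str], Verdict]


def bayes_strategy(text: str) -> Verdict:
    # Stand-in for a trained classifier: the probability is hard-coded
    # purely for illustration (assumption, not a real model).
    probability = 0.8176
    if probability > 0.5:
        return (-4, f"SpamBayes determined spam probability of {probability:.2%}")
    return None


def link_strategy(text: str, max_links: int = 2) -> Verdict:
    # Hypothetical rule: one negative point per link above the allowance.
    links = len(re.findall(r"https?://", text))
    if links > max_links:
        return (-(links - max_links), f"{links} external links, {max_links} allowed")
    return None


def evaluate(text: str, strategies: List[Strategy], min_karma: int = 0):
    """Sum the points of all strategies and collect their reasons."""
    total = 0
    reasons = []
    for strategy in strategies:
        verdict = strategy(text)
        if verdict is None:
            continue
        points, reason = verdict
        total += points
        reasons.append(f"{strategy.__name__} ({points}): {reason}")
    # The submission is accepted only if the total reaches the threshold.
    return total >= min_karma, total, reasons


accepted, total, reasons = evaluate(
    "see http://a.example http://b.example http://c.example http://d.example",
    [bayes_strategy, link_strategy],
)
for line in reasons:
    print(line)
```

Rule-based strategies can name a concrete cause in their reason string, while the Bayesian one can only report a probability, which matches the distinction made in this comment.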

comment:3 by Dirk Stöcker, 14 years ago

I think t.e.o is running an old spamfilter plugin. The version in SVN has displayed the reject reason for some time now.

comment:4 by Christian Boos, 14 years ago

I'll look into installing a new version - I'll use the latest version when you tell me you're done with the updates.

comment:5 by Dirk Stöcker, 14 years ago

I think you can try. The current state (0.3.1) in SVN is basically what I have had running on a 0.11.7 and a 0.12 system for several months now. Maybe you also want to add http:BL as a check. It works well for josm. I also increased the spambayes score to 14 for josm, and after about 4000+1500 training entries I now usually have <5% and >95% bayes accuracy.

The new javascript based checkbox buttons help a lot with training. :-)

in reply to:  5 comment:6 by Christian Boos, 14 years ago

Replying to dstoecker:

I think you can try. The current state (0.3.1) in SVN is basically what I have running on a 0.11.7 and 0.12 system for several months now

I installed 0.3.2dev-r9893 this afternoon, and we already have our first problem report ;-) See #9460.

comment:7 by stoecker, 14 years ago

Be happy. Essentially I forgot one file of the changes you did in 0.11 in my own tests, and exactly this file caused major issues for me as well. The missing content="" prevented file uploads for a whole day on my site. It is only fair you get some trouble from this as well. :-)

in reply to:  7 comment:8 by Christian Boos, 14 years ago

Replying to stoecker:

Be happy. Essentially I forgot one file of the changes you did in 0.11

That's why the mergeinfo in general and the eligible links in particular are useful. I forgot to set them up initially when doing the move to /plugins, but that's fixed now.

It is only fair you get some trouble from this as well. :-)

Grr … ;-)

comment:9 by Dirk Stöcker, 14 years ago

Resolution: fixed
Status: new → closed

comment:10 by fbrettschneider@…, 14 years ago

Hmm… today I wrote a ticket comment which was judged as 85% spam. I didn't get a hint why. I removed all the links and it was still judged as 64% spam. Then I inserted a space in the word I actually wanted to be a wiki link, so that it read Sub Tickets. With that space character I got the comment through the spam filter. *sigh* It's very hard to write comments… *sigh* ;-)

comment:11 by anonymous, 14 years ago

And why are Wikipedia links judged as spam? This ticket should be reopened.

in reply to:  11 ; comment:12 by Dirk Stöcker, 14 years ago

Replying to fbrettschneider@…:

Hmm… today I wrote a ticket comment which was judged as 85% spam. I didn't get a hint why. I removed all the links and it was still judged as 64% spam. Then I inserted a space in the word I actually wanted to be a wiki link, so that it read Sub Tickets. With that space character I got the comment through the spam filter. *sigh* It's very hard to write comments… *sigh*

There is an option "show_blacklisted", which gives the comment writer a bigger hint about the reason. But your text above suggests that you are referring to the bayes filter (the only one which reports a percentage). I doubt there is a way to show the user why bayes thinks it is spam. Maybe the bayes filter of t.e.o needs better training. On my site spammers rarely go below 40% (and usually are above 98%) and real users never go above 20%.

Replying to anonymous:

And why are Wikipedia links judged as spam? This ticket should be reopened.

Wikipedia links, like all links, are handled as external links. They increase the spam score. A rejection should only result from excessive use of links.

in reply to:  12 comment:13 by anonymous, 14 years ago

Replying to dstoecker:

There is an option "show_blacklisted", which gives the comment writer a bigger hint about the reason. But your text above suggests that you refer to the bayes filter (the only one which has percentage).

I'm talking about writing comments on trac.edgewall.org via Windows Firefox 3.6.12. The spam filter of Trac's homepage site is configured too restrictively.

I doubt there is a way to show the user why bayes thinks it is spam. Maybe the bayes filter of t.e.o needs better training. On my site spammers rarely go below 40% (and usually are above 98%) and real users never go above 20%.

It seems using a mix of upper and lower case letters in words is not accepted, and links also increase the spam percentage a lot. Maybe there should be an email address where I can send my text to you when I'm blocked but know the text is actually OK.

Replying to anonymous: Wikipedia links, like all links, are handled as external links. They increase the spam score. A rejection should only result from excessive use of links.

Maybe some trusted sites could be taken from the blacklist.

comment:14 by Dirk Stöcker, 14 years ago

The spam filter of Trac's homepage site is configured too restrictively.

I fear this is mainly a training issue. Bayes filters need training. For my site I train everything which does not reach 0% or 100%.

It seems using a mix of upper and lower case letters in words is not accepted.

There is no such rule in spamfilter, but words of this sort are often used in SPAM, so the bayes filter will catch these.

Maybe there should be an email address where I can send my text to you when I'm blocked but know the text is actually OK.

The operator of the site has the option to log every text and use it for training the filter. It is usually easy to detect false positives because, unlike spammers, a real user retries for a while until he gets the text through with some modifications and captcha solving. I'm not the operator of t.e.o.

Maybe some trusted sites could be taken from the blacklist.

I don't think this is useful.
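
To illustrate the training point made in this comment, here is a toy sketch of a token-based Bayesian filter. It uses plain naive Bayes with add-one smoothing, which is a simplification and not how SpamBayes actually combines token scores; the class name, the tokenizer, and the training texts are all made up for the example. It shows that a mixed-case word seen only in spam raises the score without any explicit rule about letter case, and that training the same word as legitimate text lowers the score again:

```python
import math
import re
from collections import Counter


class ToyBayes:
    """Toy naive Bayes spam scorer (illustrative only, not SpamBayes)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam texts containing it
        self.ham = Counter()   # token -> number of legitimate texts containing it
        self.n_spam = 0
        self.n_ham = 0

    @staticmethod
    def tokens(text):
        # Case-sensitive tokens, so "CheapPills" differs from "cheappills".
        return set(re.findall(r"[A-Za-z0-9:/.]+", text))

    def train(self, text, is_spam):
        counts = self.spam if is_spam else self.ham
        for token in self.tokens(text):
            counts[token] += 1
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def spam_probability(self, text):
        # Sum per-token log-odds with add-one smoothing, assuming equal priors.
        log_odds = 0.0
        for token in self.tokens(text):
            p_spam = (self.spam[token] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[token] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


bayes = ToyBayes()
bayes.train("Buy CheapPills http://x.example now", is_spam=True)
bayes.train("CheapPills http://y.example", is_spam=True)
bayes.train("the ticket patch fails on trunk", is_spam=False)
bayes.train("please review the patch", is_spam=False)

plain = bayes.spam_probability("review the patch")
mixed = bayes.spam_probability("review the patch CheapPills")

# Training the mixed-case word as legitimate text lowers its later score.
before = bayes.spam_probability("CheapPills page")
bayes.train("CheapPills page", is_spam=False)
after = bayes.spam_probability("CheapPills page")
print(plain < mixed, after < before)
```

In this simplified model a legitimate CamelCase wiki link is scored purely by where the filter has seen that token before, which is why logged false positives are useful training material.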

in reply to:  14 comment:15 by fbrettschneider@…, 14 years ago

Replying to dstoecker:

It seems using a mix of upper and lower case letters in words is not accepted.

There is no such rule in spamfilter, but words of this sort are often used in SPAM, so the bayes filter will catch these.

It's needed for those wiki links.

Maybe there should be an email address where I can send my text to you when I'm blocked but know the text is actually OK.

The operator of the site has the option to log every text and use it for training the filter. It is usually easy to detect false positives because, unlike spammers, a real user retries for a while until he gets the text through with some modifications and captcha solving.

Yes, this sounds good (if they can use the several tries for training)
