Edgewall Software
Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#9872 closed enhancement (fixed)

provide steps for appealing spam-rejected tickets

Reported by: Andrew C Martin <andrew.c.martin@…> Owned by: Dirk Stöcker
Priority: normal Milestone: plugin - spam-filter
Component: plugin/spamfilter Version: 0.12dev
Severity: normal Keywords: spam
Cc: Branch:
Release Notes:
API Changes:
Internal Changes:

Description

I have been trying to submit a new ticket (attached), but every time I do I receive this error:

Submission rejected as potential spam (Akismet says content is spam, SpamBayes determined spam probability of 99.84%)

Obviously my ticket is not spam.

This ticket is a request to add a link in the above error for more information regarding how to proceed in the case of a legitimate ticket. This may be specific to the edgewall trac in the form of a wiki page or a general purpose solution. It's very frustrating trying to tweak a ticket description to pass a spam filter without any success.

I also question the accuracy of the SpamBayes probability based on my limited interaction with it, but that's outside the scope of Trac.

Attachments (0)

Change History (20)

comment:1 by Andrew C Martin <andrew.c.martin@…>, 14 years ago

I can't add the ticket as an attachment either — due to the spam filter. Here's a link:

http://pastebin.com/aQLzvH19

I would appreciate it if someone with appropriate credentials could add it as a new ticket.

comment:2 by Andrew C Martin <andrew.c.martin@…>, 14 years ago

Summary: provide → provide steps for appealing spam-rejected tickets

comment:3 by Remy Blank, 14 years ago

Component: general → plugin/spamfilter
Milestone: plugin - spam-filter
Owner: set to Dirk Stöcker

Interesting idea. What do you imagine as a way to proceed in case of a false positive? We could display a code (e.g. a hex hash) and let submissions through (and automatically learn them as ham) when the code is present in the content. Of course, the code would change every few minutes, and should be difficult to extract from the displayed page (e.g. generated with JavaScript).

Or could we even find a way to differentiate between a user clicking on the "Submit" button in a browser, and a bot just sending POST requests? The code could already be present on every page (again, encoded with JavaScript) and it could be added to the form on submission. This would detect the presence of a browser by the availability of JavaScript (which I assume is not likely to be available in a bot).

This could just become one additional filtering strategy, with a configurable weight.
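The rotating code suggested in this comment could be sketched roughly as follows. This is only an illustration of the idea, not code from the spam filter plugin: `SECRET_KEY`, `WINDOW_SECONDS`, `appeal_code` and `contains_valid_code` are all hypothetical names, and the five-minute window is an assumed value for "every few minutes".

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret
WINDOW_SECONDS = 300       # assumed: the code changes every five minutes


def appeal_code(now=None, offset=0):
    """Return the short hex code for the time window containing `now`."""
    now = time.time() if now is None else now
    window = int(now // WINDOW_SECONDS) + offset
    digest = hmac.new(SECRET_KEY, str(window).encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:8]


def contains_valid_code(content, now=None):
    """Accept the code of the current window or of the previous one."""
    return any(appeal_code(now, offset) in content for offset in (0, -1))
```

Accepting the previous window as well gives a user who loaded the page just before a rotation enough time to paste the code and resubmit.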

comment:4 by Dirk Stöcker, 14 years ago

Why should we do such a thing? We already have captcha support which can do human verification. When you use e.g. reCAPTCHA, the chances are very low that a spam bot gets through, but the other two have also never been broken on my site.

I think this issue is mainly a problem of t.e.o, which has a badly trained bayes filter and probably bad settings for the score points. Better training the filter and using e.g. 20 points for captcha response should fix the errors.
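To illustrate how score points interact, here is a minimal sketch of a point-based decision. It assumes each strategy contributes signed points and a submission is rejected when the total falls below a threshold. The names `MIN_KARMA`, `total_karma` and `is_rejected`, and the -10 values for the two spam verdicts, are hypothetical; only the 20 points for a captcha response come from the comment above.

```python
MIN_KARMA = 0  # hypothetical threshold: reject when the total falls below it


def total_karma(results):
    """Sum the signed points contributed by each filter strategy."""
    return sum(points for _name, points in results)


def is_rejected(results, min_karma=MIN_KARMA):
    """Return True when the combined score is below the threshold."""
    return total_karma(results) < min_karma


# Hypothetical verdicts for a false positive like the one reported here.
verdicts = [("akismet", -10), ("bayes", -10)]
with_captcha = verdicts + [("captcha", 20)]
```

With these assumed numbers, `is_rejected(verdicts)` is true, while `is_rejected(with_captcha)` is false because the captcha bonus offsets both negative verdicts.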

in reply to:  4 comment:5 by Remy Blank, 14 years ago

Replying to dstoecker:

Why should we do such a thing?

Because it's easier for users to do nothing than to solve a captcha (provided the method I suggested above is reliable against bots, of course).

I think this issue is mainly a problem of t.e.o, which has a badly trained bayes filter and probably bad settings for the score points.

I don't know about the score points, but the bayes filter certainly needs some help. I'm currently feeding it all the recent ham (we trained too much spam).

comment:6 by Dirk Stöcker, 14 years ago

It is not reliable. It will be broken extremely fast. Spammers already adapt to Trac, so you won't even have the advantage of a solution that works only because nobody bothers to work around it.

comment:7 by Christian Boos <cboos@…>, 14 years ago

Blue sky idea: the appealing procedure could be to just retry the submission this time with a captcha.

Though I'm not sure if in practice it would be possible to "conditionally" activate the captcha checks. Maybe by allowing a list in reject_handler, using the first component as the default, and allowing alternatives to be used if specified explicitly.

in reply to:  6 comment:8 by Remy Blank, 14 years ago

Replying to dstoecker:

It is not reliable. It will be broken extremely fast.

Do you have any data to back this claim? JavaScript has kept my e-mail address safe from spammers for many years, despite its being clickable on several pages of my website. I know it can be broken by integrating a JavaScript engine in a bot, but I doubt it's economically interesting for spammers. The easiest for them would probably be to implement the bot as a browser plugin.

comment:9 by Dirk Stöcker, 14 years ago

The captcha is sticky, so once done it increases the score by given amount until timeout is reached.

Regarding your code idea - One of the projects I work a little for is JDownloader - a tool to access hosting providers. The main task of the tool is to work around such protections as you describe them. Believe me - it is much easier to work around this stuff than to set it up. You constantly need to change the algorithms and the code and you will probably gain about 1 week until it is broken again. And most of the time it is not necessary to have a JavaScript engine at all.

It works nevertheless for private pages, but only because no one cares enough about your private page to break it. But we can't rely on this, as spammers already care about Trac, so they will easily adapt to such changes.

And actually I don't see any need to implement such stuff. reCAPTCHA, for example, is an established system which provides alternatives for disabled people as well. You only need to enter the captcha once for a complete session, and even then only if you go over the spam threshold. For the JOSM site, which is used more than this one, I get approximately 1 captcha test per week.
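The "sticky" behaviour described in this comment could look something like the sketch below: once a captcha is solved, the session keeps earning a fixed bonus until a timeout expires. This is an assumption-laden illustration, not the plugin's implementation; `CaptchaSession`, `CAPTCHA_POINTS` and `CAPTCHA_LIFETIME` are hypothetical names and values.

```python
import time

CAPTCHA_POINTS = 20      # hypothetical bonus for a solved captcha
CAPTCHA_LIFETIME = 3600  # hypothetical timeout in seconds


class CaptchaSession:
    """Remember when a captcha was solved and grant a bonus until it expires."""

    def __init__(self):
        self.solved_at = None

    def mark_solved(self, now=None):
        self.solved_at = time.time() if now is None else now

    def bonus(self, now=None):
        """Return the points to add to the spam score for this session."""
        now = time.time() if now is None else now
        if self.solved_at is None:
            return 0
        if now - self.solved_at > CAPTCHA_LIFETIME:
            return 0
        return CAPTCHA_POINTS
```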

in reply to:  1 comment:10 by Andrew C Martin <andrew.c.martin@…>, 14 years ago

Replying to Christian Boos <cboos@…>:

Blue sky idea: the appealing procedure could be to just retry the submission this time with a captcha.

I agree that this would be an ideal solution (if it is feasible).

Replying to Andrew C Martin <andrew.c.martin@…>:

I would appreciate it if someone with appropriate credentials could add it as a new ticket.

The additional data fed to the bayes filter was enough for me to submit the ticket: #9874

in reply to:  9 comment:11 by Remy Blank, 14 years ago

Replying to dstoecker:

The captcha is sticky, so once done it increases the score by given amount until timeout is reached.

Ok, I didn't know that.

Regarding your code idea - One of the projects I work a little for is JDownloader - a tool to access hosting providers. The main task of the tool is to work around such protections as you describe them.

Heh, interesting. Thanks for the explanation, now I understand why you were so negative.

And most of the time it is not necessary to have a JavaScript engine at all.

I wonder about that. I haven't looked at the spamfilter code yet, but I assume it is modular enough that I could add a filtering strategy in a plugin. I may try the idea, and if it doesn't work, nothing lost on your side.

Then again, maybe reCAPTCHA would be the best solution. We should definitely give it a try here.

in reply to:  description comment:12 by anonymous, 14 years ago

Replying to Andrew C Martin <andrew.c.martin@…>:

I have been trying to submit a new ticket (attached), but every time I do I receive this error:

Submission rejected as potential spam (Akismet says content is spam, SpamBayes determined spam probability of 99.84%)

Obviously my ticket is not spam.

This is the same effect as I experienced and described in #9423. The difference is it was just a comment to a ticket there.

comment:13 by anonymous, 14 years ago

If a submission is judged to be spam, maybe the filter could repeat the test for each individual word, sentence, or paragraph and report each score. This way one could find out which phrases to avoid.
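A rough sketch of this per-fragment idea is shown below, splitting a submission into paragraphs and ranking them by score. The scoring function is passed in as a parameter; `toy_probability` and its word list are a hypothetical stand-in for illustration only and have nothing to do with how SpamBayes or Akismet actually score text.

```python
import re


def split_paragraphs(text):
    """Split text on blank lines and drop empty fragments."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def rank_fragments(text, spam_probability):
    """Score each paragraph separately, most suspicious first."""
    scored = [(spam_probability(p), p) for p in split_paragraphs(text)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


SUSPICIOUS = {"free", "casino", "pills"}  # hypothetical word list


def toy_probability(fragment):
    """Stand-in scorer: fraction of words found in SUSPICIOUS."""
    words = re.findall(r"[a-z']+", fragment.lower())
    if not words:
        return 0.0
    return sum(w in SUSPICIOUS for w in words) / len(words)
```

Calling `rank_fragments(text, toy_probability)` returns `(score, paragraph)` pairs, so the paragraphs most likely to trigger a rejection appear at the top of the list.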

comment:14 by anonymous, 14 years ago

Additionally, provide an I-swear-this-is-not-spam button where I can push through the text via CAPTCHA check.

in reply to:  14 ; comment:15 by Dirk Stöcker, 14 years ago

Replying to anonymous:

Additionally, provide an I-swear-this-is-not-spam button where I can push through the text via CAPTCHA check.

This is what Captcha does. It is not activated for this site.

@rblank:

The SpamFilter is VERY badly trained. I tried to post as anonymous myself and failed. Maybe you can give me SPAM_ADMIN permissions, so I can train it better than it is now.

in reply to:  15 comment:16 by Christian Boos, 14 years ago

Replying to dstoecker:

@rblank: The SpamFilter is VERY badly trained. I tried to post as anonymous myself and failed. Maybe you can give me SPAM_ADMIN permissions, so I can train it better than it is now.

Thought it was already the case, repaired this omission. Besides, don't hesitate to tell us when an upgrade of the plugin is needed (currently running 0.3.3dev-r9994).

comment:17 by Dirk Stöcker, 14 years ago

You should update. Current is 0.3.4 something. This fixes a bug in 0.11, where wiki-pages aren't scanned (someone changed the meaning of page.text again without informing me *tss*) and also improves Akismet/Typepad a lot (Akismet people contacted me :-)

You need to enable the new "External" admin page in the plugin prefs after installing, as "Akismet" has been replaced by this more generic name.

I'm currently adding Admin-page for Captcha stuff, but this may take some time.

comment:18 by Dirk Stöcker, 14 years ago

Ok. 0.4.3 is now current (before was 0.4.2, not 0.3.4 as I said), including the captcha configuration pages. 0.3.3 is badly outdated; there have been many fixes in between.

comment:19 by Dirk Stöcker, 14 years ago

Resolution: fixed
Status: new → closed

SpamFilter updated and Captcha is activated. Also Bayes results are much better now. Reopen when new issues appear.

comment:21 by anonymous cboos, 14 years ago

Works great ;-)
