Edgewall Software
Modify

Opened 16 years ago

Closed 16 years ago

#6609 closed defect (fixed)

e-mail addresses containing hyphens are not recognized properly

Reported by: thomas.moschny@… Owned by: Christian Boos
Priority: normal Milestone: 0.11
Component: wiki system Version: 0.11b1
Severity: normal Keywords: email review
Cc: Branch:
Release Notes:
API Changes:
Internal Changes:

Description

The wiki parser does seem to stop at the hyphen when parsing a mail address of the form user@do-main.invalid. If obfuscated, it is rendered as user@...-main.invalid. If not, the caption and the href target of generated mailto: link also contain the part before the hyphen only.

Attachments (1)

EMAIL_LOOKALIKE_PATTERN-r6658.diff (2.0 KB ) - added by Christian Boos 16 years ago.
Put all the regexps together, shake, extract the best mix.

Download all attachments as: .zip

Change History (13)

comment:1 by Christian Boos, 16 years ago

Component: generalwiki
Keywords: email added
Milestone: 0.11
Owner: changed from Jonas Borgström to Christian Boos

Yes, adding hyphens to the corresponding regexp would be good.

  • trac/wiki/parser.py

     
    7373
    7474    _post_rules = [
    7575        # e-mails
    76         r"(?P<email>\w[\w.]+@\w[\w.]+\w)",
     76        r"(?P<email>\w[\w.-]+@\w[\w.-]+\w)",
    7777        # > ...
    7878        r"(?P<citation>^(?P<cdepth>>(?: *>)*))",
    7979        # &, < and > to &amp;, &lt; and &gt;

Anything else needed?

comment:2 by thomas.moschny@…, 16 years ago

Here are some more issues:

  • some addresses have a plus sign in the user part, and "_" and "%" signs seem to be valid there, too.
  • depending on the locale, "\w" ([[:alnum:]]) may match too many characters.

While I know that is almost impossible to come up with a valid regexp matching all valid email addresses, there are nevertheless some suggestions for catching most of all addresses in use today. The author of the www.regular-expressions.info site suggests this one:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

Might be worth a try.

in reply to:  2 ; comment:3 by Emmanuel Blot, 16 years ago

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

I think the ER for email addresses should be defined only once, so it might be useful to use the same definition as the one already defined in the notification subsytem - and maybe adapt it.

in reply to:  3 comment:4 by anonymous, 16 years ago

Replying to eblot:

s/ER/RE/ ;-)

in reply to:  3 comment:5 by anonymous, 16 years ago

Replying to eblot:

I think the ER for email addresses should be defined only once, so it might be useful to use the same definition as the one already defined in the notification subsytem - and maybe adapt it.

The notification subsystem currently uses this one

[\w\d_\.\-\+=]+\@(?:(?:[\w\d\-])+\.)+(?:[\w\d]{2,4})

comment:6 by osimons, 16 years ago

In the util for one of my custom plugins, I have this one:

^([0-9a-zA-Z]+[-._+&])*[0-9a-zA-Z]+@([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$

I have no idea any longer where that came from or how correct it is, but remember doing some research on it and picking this as the most correct in one of the regexp sites.

It would be very useful if there only was only one trac.util method for this that we all could use.

comment:7 by hyuga <hyugaricdeau@…>, 16 years ago

While we're on the topic of e-mail addresses, I should bring up #5834, which complains that addresses with apostrophes in them are rejected. I didn't think apostrophes were even valid…

comment:8 by thomas.moschny@…, 16 years ago

Ping? What is the status of this bug?

comment:9 by Christian Boos, 16 years ago

#5834 mention the need to support the "'" (single quote) character.

comment:10 by harningt@…, 16 years ago

This appears to contain the most complete regular expression for RFC822 email address validation: http://rosskendall.com/blog/web/javascript-function-to-check-an-email-address-conforms-to-rfc822

However that is JavaScript (could be a useful addition for pre-validation)..

Also there's the fact that RFC2822 supercedes that version…

I suggest a hunt for an RFC2822 compliant regex for python, so that no strange emails are rejected (I hate sites that don't accept a + in email addresses.. its a great method of working w/ Gmail and tracking down what site leaked my email address)

comment:11 by Christian Boos, 16 years ago

Keywords: review added
Status: newassigned

Please try out the attachment:EMAIL_LOOKALIKE_PATTERN-r6658.diff.

Summary of changes:

  • put the email pattern in one place (trac.notification.EMAIL_LOOKALIKE_PATTERN)
  • added +, - and _
  • added ' (#5834)
  • % seems to be used for user/host separation, when the @ is needed for proxying (see #3212) - it's not supported

by Christian Boos, 16 years ago

Put all the regexps together, shake, extract the best mix.

comment:12 by Christian Boos, 16 years ago

Resolution: fixed
Status: assignedclosed

Nearly same patch committed as [6676] (I had forgotten the undescore character in the patch) and tests committed as [6677].

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Christian Boos to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.