Edgewall Software
Modify

Opened 7 years ago

Closed 6 years ago

#6609 closed defect (fixed)

e-mail addresses containing hyphens are not recognized properly

Reported by: thomas.moschny@… Owned by: cboos
Priority: normal Milestone: 0.11
Component: wiki system Version: 0.11b1
Severity: normal Keywords: email review
Cc:
Release Notes:
API Changes:

Description

The wiki parser does seem to stop at the hyphen when parsing a mail address of the form user@do-main.invalid. If obfuscated, it is rendered as user@...-main.invalid. If not, the caption and the href target of generated mailto: link also contain the part before the hyphen only.

Attachments (1)

EMAIL_LOOKALIKE_PATTERN-r6658.diff (2.0 KB) - added by cboos 6 years ago.
Put all the regexps together, shake, extract the best mix.

Download all attachments as: .zip

Change History (13)

comment:1 Changed 7 years ago by cboos

  • Component changed from general to wiki
  • Keywords email added
  • Milestone set to 0.11
  • Owner changed from jonas to cboos

Yes, adding hyphens to the corresponding regexp would be good.

  • trac/wiki/parser.py

     
    7373 
    7474    _post_rules = [ 
    7575        # e-mails 
    76         r"(?P<email>\w[\w.]+@\w[\w.]+\w)", 
     76        r"(?P<email>\w[\w.-]+@\w[\w.-]+\w)", 
    7777        # > ... 
    7878        r"(?P<citation>^(?P<cdepth>>(?: *>)*))", 
    7979        # &, < and > to &amp;, &lt; and &gt; 

Anything else needed?

comment:2 follow-up: Changed 7 years ago by thomas.moschny@…

Here are some more issues:

  • some addresses have a plus sign in the user part, and "_" and "%" signs seem to be valid there, too.
  • depending on the locale, "\w" ([[:alnum:]]) may match too many characters.

While I know that is almost impossible to come up with a valid regexp matching all valid email addresses, there are nevertheless some suggestions for catching most of all addresses in use today. The author of the www.regular-expressions.info site suggests this one:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

Might be worth a try.

comment:3 in reply to: ↑ 2 ; follow-ups: Changed 7 years ago by eblot

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

I think the ER for email addresses should be defined only once, so it might be useful to use the same definition as the one already defined in the notification subsytem - and maybe adapt it.

comment:4 in reply to: ↑ 3 Changed 7 years ago by anonymous

Replying to eblot:

s/ER/RE/ ;-)

comment:5 in reply to: ↑ 3 Changed 7 years ago by anonymous

Replying to eblot:

I think the ER for email addresses should be defined only once, so it might be useful to use the same definition as the one already defined in the notification subsytem - and maybe adapt it.

The notification subsystem currently uses this one

[\w\d_\.\-\+=]+\@(?:(?:[\w\d\-])+\.)+(?:[\w\d]{2,4})

comment:6 Changed 7 years ago by osimons

In the util for one of my custom plugins, I have this one:

^([0-9a-zA-Z]+[-._+&])*[0-9a-zA-Z]+@([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$

I have no idea any longer where that came from or how correct it is, but remember doing some research on it and picking this as the most correct in one of the regexp sites.

It would be very useful if there only was only one trac.util method for this that we all could use.

comment:7 Changed 7 years ago by hyuga <hyugaricdeau@…>

While we're on the topic of e-mail addresses, I should bring up #5834, which complains that addresses with apostrophes in them are rejected. I didn't think apostrophes were even valid…

comment:8 Changed 7 years ago by thomas.moschny@…

Ping? What is the status of this bug?

comment:9 Changed 7 years ago by cboos

#5834 mention the need to support the "'" (single quote) character.

comment:10 Changed 7 years ago by harningt@…

This appears to contain the most complete regular expression for RFC822 email address validation: http://rosskendall.com/blog/web/javascript-function-to-check-an-email-address-conforms-to-rfc822

However that is JavaScript (could be a useful addition for pre-validation)..

Also there's the fact that RFC2822 supercedes that version…

I suggest a hunt for an RFC2822 compliant regex for python, so that no strange emails are rejected (I hate sites that don't accept a + in email addresses.. its a great method of working w/ Gmail and tracking down what site leaked my email address)

comment:11 Changed 6 years ago by cboos

  • Keywords review added
  • Status changed from new to assigned

Please try out the attachment:EMAIL_LOOKALIKE_PATTERN-r6658.diff.

Summary of changes:

  • put the email pattern in one place (trac.notification.EMAIL_LOOKALIKE_PATTERN)
  • added +, - and _
  • added ' (#5834)
  • % seems to be used for user/host separation, when the @ is needed for proxying (see #3212) - it's not supported

Changed 6 years ago by cboos

Put all the regexps together, shake, extract the best mix.

comment:12 Changed 6 years ago by cboos

  • Resolution set to fixed
  • Status changed from assigned to closed

Nearly same patch committed as [6676] (I had forgotten the undescore character in the patch) and tests committed as [6677].

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed The owner will remain cboos.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from cboos to the specified user.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.