Edgewall Software

Ticket #6609 (closed defect: fixed)

Opened 18 months ago

Last modified 16 months ago

e-mail addresses containing hyphens are not recognized properly

Reported by: thomas.moschny@… Owned by: cboos
Priority: normal Milestone: 0.11
Component: wiki system Version: 0.11b1
Severity: normal Keywords: email review
Cc:

Description

The wiki parser seems to stop at the hyphen when parsing a mail address of the form user@do-main.invalid. If obfuscated, the address is rendered as user@...-main.invalid. If not, the caption and the href target of the generated mailto: link contain only the part before the hyphen.

Attachments

EMAIL_LOOKALIKE_PATTERN-r6658.diff (2.0 KB) - added by cboos 16 months ago.
Put all the regexps together, shake, extract the best mix.

Change History

  Changed 18 months ago by cboos

  • keywords email added
  • owner changed from jonas to cboos
  • component changed from general to wiki
  • milestone set to 0.11

Yes, adding hyphens to the corresponding regexp would be good.

  • trac/wiki/parser.py

    @@ -73,7 +73,7 @@
     
         _post_rules = [
             # e-mails
    -        r"(?P<email>\w[\w.]+@\w[\w.]+\w)",
    +        r"(?P<email>\w[\w.-]+@\w[\w.-]+\w)",
             # > ...
             r"(?P<citation>^(?P<cdepth>>(?: *>)*))",
             # &, < and > to &amp;, &lt; and &gt;

Anything else needed?
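The effect of the one-character change in the diff can be checked by trying the two e-mail rules in isolation with Python's re module. This is a sketch outside the wiki parser, which merges all rules into one regexp; that merging is ignored here.

Taken alone, the old domain part \w[\w.]+\w needs at least three word characters or dots before the hyphen. With user@do-main.invalid it therefore finds nothing at all. The sketch adds the made-up address user@mail.do-main.invalid to show the truncation at the hyphen.

```python
import re

# The e-mail rule before and after the patch, compiled on its own.
OLD_EMAIL = re.compile(r"(?P<email>\w[\w.]+@\w[\w.]+\w)")
NEW_EMAIL = re.compile(r"(?P<email>\w[\w.-]+@\w[\w.-]+\w)")

def first_email(pattern, text):
    """Return the first address the rule finds in text, or None."""
    m = pattern.search(text)
    return m.group("email") if m else None

for address in ["user@do-main.invalid", "user@mail.do-main.invalid"]:
    print(address, "->", first_email(OLD_EMAIL, address),
          "/", first_email(NEW_EMAIL, address))
# user@do-main.invalid -> None / user@do-main.invalid
# user@mail.do-main.invalid -> user@mail.do / user@mail.do-main.invalid
```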

follow-up: ↓ 3   Changed 18 months ago by thomas.moschny@…

Here are some more issues:

  • some addresses have a plus sign in the user part, and "_" and "%" signs seem to be valid there, too.
  • depending on the locale, "\w" (roughly [[:alnum:]_]) may match too many characters.
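The \w concern can be illustrated in Python 3, which differs from the Python 2 of this ticket's era. There, str patterns are Unicode-aware by default, and re.LOCALE is only allowed with bytes patterns. As a result, \w matches accented letters as well as the underscore, while re.ASCII restricts it to [a-zA-Z0-9_]. A small sketch:

```python
import re

# In a Python 3 str pattern, \w matches Unicode letters such as "ü",
# plus digits and the underscore.
print(bool(re.fullmatch(r"\w+", "müller")))            # True
print(bool(re.fullmatch(r"\w+", "müller", re.ASCII)))  # False: ü is not ASCII
print(bool(re.fullmatch(r"\w", "_")))                  # True: \w includes "_"

# An explicit ASCII character class does not depend on any flag.
LOCAL_PART = re.compile(r"[A-Za-z0-9_.+-]+")
print(bool(LOCAL_PART.fullmatch("müller")))            # False
```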

While I know that it is almost impossible to come up with a regexp matching all valid email addresses, there are nevertheless some suggestions for catching most addresses in use today. The author of the www.regular-expressions.info site suggests this one:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

Might be worth a try.
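The suggested pattern is written in upper case, so it needs case-insensitive matching. A quick check in Python also exposes a side effect of the {2,4} limit on the top-level domain. The reporter's own example, with the seven-letter .invalid, is not matched at all. The sample addresses below are made up for illustration.

```python
import re

# Pattern from regular-expressions.info, used case-insensitively.
SUGGESTED = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",
                       re.IGNORECASE)

for address in ["user@do-main.org", "first.last+trac@example.com",
                "user@do-main.invalid"]:
    m = SUGGESTED.search(address)
    print(address, "->", m.group(0) if m else None)
# user@do-main.org -> user@do-main.org
# first.last+trac@example.com -> first.last+trac@example.com
# user@do-main.invalid -> None
```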

in reply to: ↑ 2 ; follow-ups: ↓ 4 ↓ 5   Changed 18 months ago by eblot

{{{ \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b }}}

I think the ER for email addresses should be defined only once, so it might be useful to use the same definition as the one already defined in the notification subsystem, and maybe adapt it.

in reply to: ↑ 3   Changed 18 months ago by anonymous

Replying to eblot:

s/ER/RE/ ;-)

in reply to: ↑ 3   Changed 18 months ago by anonymous

Replying to eblot:

I think the ER for email addresses should be defined only once, so it might be useful to use the same definition as the one already defined in the notification subsystem, and maybe adapt it.

The notification subsystem currently uses this one:

[\w\d_\.\-\+=]+\@(?:(?:[\w\d\-])+\.)+(?:[\w\d]{2,4})
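Trying this pattern in Python makes two things stand out. First, \w already covers \d and _, so those are redundant inside the character classes. Second, the end is not anchored, so re.search on user@do-main.invalid quietly stops after four letters of the top-level domain. The sketch below contrasts search with fullmatch.

```python
import re

# Pattern currently used by the notification subsystem.
NOTIFY_EMAIL = re.compile(r"[\w\d_\.\-\+=]+\@(?:(?:[\w\d\-])+\.)+(?:[\w\d]{2,4})")

address = "user@do-main.invalid"
print(NOTIFY_EMAIL.search(address).group(0))  # user@do-main.inva (truncated)
print(NOTIFY_EMAIL.fullmatch(address))        # None: "invalid" exceeds 4 chars
```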

  Changed 18 months ago by osimons

In the util for one of my custom plugins, I have this one:

^([0-9a-zA-Z]+[-._+&])*[0-9a-zA-Z]+@([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$

I no longer have any idea where that came from or how correct it is, but I remember doing some research on it and picking this one as the most correct on one of the regexp sites.

It would be very useful if there were only one trac.util method for this that we could all use.
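osimons's pattern is anchored with ^ and $, so it suits validating a whole field rather than finding addresses inside wiki text. The sketch below runs it against a few made-up addresses. It accepts hyphenated domains and ".", "_" and "+" as separators in the user part. It rejects an apostrophe (#5834), doubled separators, and a seven-letter TLD such as .invalid.

```python
import re

# osimons's pattern; ^ and $ make it a whole-string check.
PLUGIN_EMAIL = re.compile(
    r"^([0-9a-zA-Z]+[-._+&])*[0-9a-zA-Z]+@([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$")

samples = ["first.last@do-main.org", "first_last+trac@example.com",
           "o'brien@example.com", "a..b@example.com", "user@do-main.invalid"]
for address in samples:
    print(address, bool(PLUGIN_EMAIL.match(address)))
```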

  Changed 18 months ago by hyuga <hyugaricdeau@…>

While we're on the topic of e-mail addresses, I should bring up #5834, which complains that addresses with apostrophes in them are rejected. I didn't think apostrophes were even valid...

  Changed 18 months ago by thomas.moschny@…

Ping? What is the status of this bug?

  Changed 18 months ago by cboos

#5834 mentions the need to support the "'" (single quote) character.

  Changed 17 months ago by harningt@…

This page appears to contain the most complete regular expression for RFC 822 email address validation: http://rosskendall.com/blog/web/javascript-function-to-check-an-email-address-conforms-to-rfc822

However, that is JavaScript (it could be a useful addition for pre-validation).

Also, RFC 2822 supersedes that version...

I suggest hunting for an RFC 2822-compliant regex for Python, so that no unusual addresses are rejected (I hate sites that don't accept a + in email addresses; it's a great way of working with Gmail and tracking down which site leaked my email address).
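For reference, RFC 2822 builds the common form of an address (addr-spec) from dot-atoms. A dot-atom is a run of atext characters, optionally continued by further runs, each separated by a single dot. The atext set consists of letters, digits and !#$%&'*+-/=?^_`{|}~, so both "+" and "'" are allowed.

The sketch below covers only that dot-atom subset. It deliberately omits quoted local parts, [domain literals] and the obsolete syntax, so it is not a full RFC 2822 validator.

```python
import re

# atext from RFC 2822: letters, digits and these special characters.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom: one or more atext runs separated by single dots.
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# addr-spec restricted to dot-atom on both sides of the "@".
ADDR_SPEC = re.compile(rf"{DOT_ATOM}@{DOT_ATOM}")

def looks_like_addr_spec(text):
    """True if text is a dot-atom@dot-atom address in RFC 2822 terms."""
    return ADDR_SPEC.fullmatch(text) is not None

print(looks_like_addr_spec("user+tag@do-main.invalid"))  # True
print(looks_like_addr_spec("a..b@example.com"))          # False
```

Note that, by this grammar, user@localhost is accepted too, since a dot-atom needs no dot.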

  Changed 16 months ago by cboos

  • keywords review added
  • status changed from new to assigned

Please try out attachment:EMAIL_LOOKALIKE_PATTERN-r6658.diff.

Summary of changes:

  • put the email pattern in one place (trac.notification.EMAIL_LOOKALIKE_PATTERN)
  • added +, - and _
  • added ' (#5834)
  • % seems to be used for user/host separation when the @ is needed for proxying (see #3212); it's not supported
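The "put it in one place" idea from this summary can be sketched as a single module-level pattern string reused in two ways: as a wiki rule that searches running text, and as a whole-string check. The character set follows the summary: +, -, _ and ' are allowed, and % is left out.

The pattern value itself is an illustrative assumption, not the regexp committed for this ticket. The helper is_email_lookalike is likewise hypothetical.

```python
import re

# Illustrative value only, NOT the pattern committed for this ticket.
# User part: letters, digits, ".", "_", "+", "-" and "'" (no "%").
# Domain: dot-separated labels ending in a TLD of two or more letters.
EMAIL_LOOKALIKE_PATTERN = (
    r"[A-Za-z0-9_.'+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

# Reuse 1: a named-group rule for finding addresses inside wiki text.
EMAIL_RULE = re.compile(r"(?P<email>%s)" % EMAIL_LOOKALIKE_PATTERN)
# Reuse 2: the same pattern, applied to a whole string.
EMAIL_CHECK = re.compile(EMAIL_LOOKALIKE_PATTERN)

def is_email_lookalike(text):
    """Hypothetical helper: True if the whole string looks like an address."""
    return EMAIL_CHECK.fullmatch(text) is not None

m = EMAIL_RULE.search("Write to user@do-main.invalid for details.")
print(m.group("email"))                               # user@do-main.invalid
print(is_email_lookalike("o'brien@example.com"))      # True
print(is_email_lookalike("user%host@relay.example"))  # False: % not allowed
```

Keeping a single string and compiling it per use means the wiki parser and any validation code cannot drift apart when a character is added later.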

Changed 16 months ago by cboos

Put all the regexps together, shake, extract the best mix.

  Changed 16 months ago by cboos

  • status changed from assigned to closed
  • resolution set to fixed

Nearly the same patch was committed as [6676] (I had forgotten the underscore character in the patch), and the tests were committed as [6677].
