Opened 17 years ago
Closed 17 years ago
#6609 closed defect (fixed)
e-mail addresses containing hyphens are not recognized properly
Reported by: | | Owned by: | Christian Boos |
---|---|---|---|
Priority: | normal | Milestone: | 0.11 |
Component: | wiki system | Version: | 0.11b1 |
Severity: | normal | Keywords: | email review |
Cc: | | Branch: | |
Release Notes: | | | |
API Changes: | | | |
Internal Changes: | | | |
Description
The wiki parser seems to stop at the hyphen when parsing a mail address of the form user@do-main.invalid. If obfuscated, it is rendered as user@...-main.invalid. If not, both the caption and the href target of the generated mailto: link contain only the part before the hyphen.
Attachments (1)
Change History (13)
comment:1 by , 17 years ago
Component: | general → wiki |
---|---|
Keywords: | email added |
Milestone: | → 0.11 |
Owner: | changed from | to
follow-up: 3 comment:2 by , 17 years ago
Here are some more issues:
- Some addresses have a plus sign in the user part, and "_" and "%" signs seem to be valid there, too.
- Depending on the locale, "\w" ([[:alnum:]]) may match too many characters.
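To illustrate the second point, here is a small sketch assuming Python 3's `re` module (an assumption: the ticket does not say which flags the parser compiles its patterns with). In a `str` pattern, `\w` is Unicode-aware by default, so it also accepts accented letters; `re.ASCII` narrows it back to `[a-zA-Z0-9_]`.

```python
import re

# Default for str patterns in Python 3: \w matches any Unicode word
# character, including accented letters such as "é".
unicode_word = re.compile(r"\w+")
# With re.ASCII, \w only matches [a-zA-Z0-9_].
ascii_word = re.compile(r"\w+", re.ASCII)

print(bool(unicode_word.fullmatch("josé")))  # True
print(bool(ascii_word.fullmatch("josé")))    # False
```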
While I know that it is almost impossible to come up with a regexp matching every valid email address, there are nevertheless some suggestions that catch most of the addresses in use today. The author of the www.regular-expressions.info site suggests this one:
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
Might be worth a try.
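A quick way to try it is the sketch below (Python 3 `re`; the helper name `find_emails` is hypothetical). The pattern uses upper-case classes, so it is compiled with `re.IGNORECASE`, which is how regular-expressions.info intends it to be used.

```python
import re

# Pattern suggested by regular-expressions.info, matched case-insensitively.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

def find_emails(text):
    """Return every substring of text that looks like an e-mail address."""
    return EMAIL_RE.findall(text)

print(find_emails("Mail first.last+tag@do-main.org or admin@example.com."))
# ['first.last+tag@do-main.org', 'admin@example.com']
print(find_emails("user@do-main.invalid"))
# []
```

Hyphens and plus signs are handled, but the `{2,4}` limit on the last label means the `.invalid` address from the description is not matched at all.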
follow-ups: 4 5 comment:3 by , 17 years ago
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
I think the regexp for email addresses should be defined only once, so it might be useful to use the same definition as the one already used in the notification subsystem, and maybe adapt it.
comment:5 by , 17 years ago
Replying to eblot:
I think the regexp for email addresses should be defined only once, so it might be useful to use the same definition as the one already used in the notification subsystem, and maybe adapt it.
The notification subsystem currently uses this one:
[\w\d_\.\-\+=]+\@(?:(?:[\w\d\-])+\.)+(?:[\w\d]{2,4})
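A quick check with Python 3's `re` (assumed here for illustration) shows why this pattern would need adapting before the wiki parser could share it: used without anchors, it silently truncates a last label longer than four characters, such as the `.invalid` from the description.

```python
import re

# Pattern quoted above from the notification subsystem.
NOTIFY_EMAIL = r"[\w\d_\.\-\+=]+\@(?:(?:[\w\d\-])+\.)+(?:[\w\d]{2,4})"

address = "user@do-main.invalid"

# Unanchored search: {2,4} stops after four characters, so only a
# prefix of the last label ends up in the match.
print(re.search(NOTIFY_EMAIL, address).group(0))  # user@do-main.inva

# Requiring the whole string to match exposes the problem instead.
print(re.fullmatch(NOTIFY_EMAIL, address))  # None
```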
comment:6 by , 17 years ago
In the utility module of one of my custom plugins, I have this one:
^([0-9a-zA-Z]+[-._+&])*[0-9a-zA-Z]+@([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$
I no longer remember where it came from or how correct it is, but I do remember researching it and picking this one as the most correct on one of the regexp sites.
It would be very useful if there were only one trac.util method for this that we could all use.
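A single shared helper could look roughly like the sketch below. The name `is_email_address` is hypothetical and not an existing trac.util API; the pattern is the one quoted in this comment, compiled once at import time.

```python
import re

# Hypothetical shared helper; the pattern is the plugin regexp quoted above.
_EMAIL_ADDRESS_RE = re.compile(
    r"^([0-9a-zA-Z]+[-._+&])*[0-9a-zA-Z]+@([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$"
)

def is_email_address(value):
    """Return True if value, as a whole, looks like an e-mail address."""
    return _EMAIL_ADDRESS_RE.match(value) is not None

for candidate in ["first.last+tag@do-main.org", "o'brien@example.com",
                  "user..name@example.com", "user@do-main.invalid"]:
    print(candidate, is_email_address(candidate))
```

With this pattern only the first candidate is accepted: apostrophes (the case raised in #5834), doubled dots, and last labels longer than six letters are all rejected.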
comment:7 by , 17 years ago
While we're on the topic of e-mail addresses, I should bring up #5834, which complains that addresses with apostrophes in them are rejected. I didn't think apostrophes were even valid…
comment:10 by , 17 years ago
This appears to contain the most complete regular expression for RFC 822 email address validation: http://rosskendall.com/blog/web/javascript-function-to-check-an-email-address-conforms-to-rfc822
However, that is JavaScript (it could be a useful addition for pre-validation).
Also, RFC 2822 supersedes that version…
I suggest hunting for an RFC 2822-compliant regex for Python, so that no unusual but valid addresses are rejected (I hate sites that don't accept a + in email addresses; it's a great way of working with Gmail and tracking down which site leaked my email address).
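As a possible starting point for such a hunt, here is a simplified sketch built from the `atext` and `dot-atom` rules of RFC 2822 (section 3.2.4). It is not a full RFC 2822 parser: quoted local parts, domain literals and comments are deliberately left out, and the names are hypothetical. It does show that both "+" and apostrophes (as in #5834) are legal in the local part.

```python
import re

# atext from RFC 2822, section 3.2.4: letters, digits and these symbols.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text: runs of atext separated by single dots.
DOT_ATOM = ATEXT + r"+(?:\." + ATEXT + r"+)*"
# Simplified addr-spec: dot-atom "@" dot-atom (no quoted strings,
# no domain literals, no comments/whitespace).
ADDR_SPEC_RE = re.compile(DOT_ATOM + "@" + DOT_ATOM)

def is_dot_atom_address(value):
    """Return True if value is a dot-atom@dot-atom address."""
    return ADDR_SPEC_RE.fullmatch(value) is not None

print(is_dot_atom_address("o'brien@example.com"))     # True
print(is_dot_atom_address("user+site@gmail.com"))     # True
print(is_dot_atom_address("user..name@example.com"))  # False
```

This is broader than the other candidates: it also accepts single-label domains such as `user@localhost`, which may be more than a wiki parser wants to linkify.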
comment:11 by , 17 years ago
Keywords: | review added |
---|---|
Status: | new → assigned |
Please try out the attachment:EMAIL_LOOKALIKE_PATTERN-r6658.diff.
Summary of changes:
by , 17 years ago
Attachment: | EMAIL_LOOKALIKE_PATTERN-r6658.diff added |
---|
Put all the regexps together, shake, extract the best mix.
comment:12 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Yes, adding hyphens to the corresponding regexp would be good.
trac/wiki/parser.py
]+@\w[\w.]+\w)",
Anything else needed?
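The effect of adding the hyphen can be checked on the domain part alone. The local part of the parser's pattern is cut off in the diff fragment, so this sketch (Python 3 `re`, variable names hypothetical) only compares the domain fragment `\w[\w.]+\w` with a variant whose character class also contains `-`.

```python
import re

# Domain fragment as shown in the trac/wiki/parser.py diff.
OLD_DOMAIN = re.compile(r"\w[\w.]+\w")
# Same fragment with "-" added; placing "-" last in the class keeps it
# a literal character rather than a range.
NEW_DOMAIN = re.compile(r"\w[\w.-]+\w")

domain = "example-host.org"
print(OLD_DOMAIN.match(domain).group(0))  # example
print(NEW_DOMAIN.match(domain).group(0))  # example-host.org
```

Because the fragment must still start and end with `\w`, a trailing dot (as in an address ending a sentence) stays out of the match.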