#2580 closed defect (fixed)
RSS feed validation in 0.9.3
Reported by: | osimons | Owned by: | Christian Boos |
---|---|---|---|
Priority: | normal | Milestone: | 0.10 |
Component: | timeline | Version: | 0.9.3 |
Severity: | normal | Keywords: | |
Cc: | Branch: | ||
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
Upgraded to 0.9.3 and have found 2 problems with gettings timeline feeds validated in both RSS Bandit and Sage - and both Internet Explorer and Firefox also confirm "XML Parsing Error: not well-formed".
- /trunk/trac/wiki/formatter.py in line 685-687 (end of OneLineFormatter format method): Three dots are converted to
…
. Some feed readers do not like this, especially if they use validating xml components, as there is no DTD for the Trac RSS XML, and the hellip entity is not recognized. A quick workaround(fix?) is to use the numeric entity reference instead, which is: …. Works for us. As far as I can see, this code has not changed from 0.9.2-0.9.3, but new Marker/escape/sanitise handling have probably introduced new ways of handling the output.
- That fixed all my projects except one. The problem here is located to be the an entry in a Subversion log message (a changeset timeline event) that is based on a log link to a start-stop revision list for a folder
[log:..mysource/folder@34:45]
. The HTML for this gets converted tohref="..mysource/folder?rev=45&stop_rev=34
without escaping the HTML ampersand. The feed will not validate. Manually replacing&
with&
makes the feed validate without errors. The output from the log link seems to start with _format_link method at the end of trunk/trac/versioncontrol/web_ui/log.py, but with Href, Formatter and Markup etc. on it's path to output I have to leave it to someone who knows the source better.
PS! Wanted to, but cannot, set version to 0.9.3 for this ticket :-)
Attachments (3)
Change History (23)
comment:1 by , 19 years ago
Milestone: | → 0.9.4 |
---|---|
Version: | 0.9.2 → 0.9.3 |
comment:3 by , 19 years ago
I created ticket(715) at demo site to show alert dialog box (Firefox with Sage). https://ssl.bulix.org/projects/demo/timeline?milestone=on&ticket=on&changeset=on&wiki=on&max=50&daysback=90&format=rss
comment:4 by , 19 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:5 by , 19 years ago
Test fix for trac 0.9.3 with [2755] applied. … problem also disappear.
--- timeline_rss.cs (before) +++ timeline_rss.cs (after) @@ -12,19 +12,19 @@ <generator>Trac v<?cs var:trac.version ?></generator> <image> <title><?cs var:project.name_encoded ?></title> - <url><?cs if:!header_logo.src_abs ?><?cs var:base_host ?><?cs /if ?><?cs - var:header_logo.src ?></url> - <link><?cs var:base_host ?><?cs var:trac.href.timeline ?></link> + <url><?cs if:!header_logo.src_abs ?><?cs var:html_escape(base_host) ?><?cs /if ?><?cs + var:html_escape(header_logo.src) ?></url> + <link><?cs var:html_escape(base_host) ?><?cs var:html_escape(trac.href.timeline) ?></link> </image><?cs each:event = timeline.events ?> <item> - <title><?cs var:event.title ?></title><?cs + <title><?cs var:html_escape(event.title) ?></title><?cs if:event.author.email ?> - <author><?cs var:event.author.email ?></author><?cs + <author><?cs var:html_escape(event.author.email) ?></author><?cs /if ?> - <pubDate><?cs var:event.date ?></pubDate> - <link><?cs var:event.href ?></link> - <description><?cs var:event.message ?></description> + <pubDate><?cs var:html_escape(event.date) ?></pubDate> + <link><?cs var:html_escape(event.href) ?></link> + <description><?cs var:html_escape(html_escape(event.message)) ?></description> </item><?cs /each ?> </channel>
comment:6 by , 19 years ago
Actually, I'm working right now on a more drastic change. I found no evidence on the web that there should be support for rendering HTML found in <title> or <description> elements of RSS feeds… So we shouldn't put HTML there in the first place.
Also, we don't use Clearsilver's html_escape
, but HTML escape
everything except what we explicitely mark as being
Markup.
comment:7 by , 19 years ago
I found some more background information on this topic.
Actually, the RSS 2.0 spec suggests that you put HTML in the <content> element by escaping it…
That's quite controversial:
- Norman Walsh's Escaped Markup Considered Harmful
- Sam Ruby's comment on the above
- Tim Bray's view on the subject
And finally, there's a rich page about this problem in the Atom wiki (Atom:EscapedHtmlDiscussion).
comment:8 by , 19 years ago
Ok, here's my take on the issue: attachment:rss_html_escape.diff
All the RSS code has been reviewed.
- The <title> should never contain HTML, escaped or not.
- The <description> will contain HTML escaped text if a configuration setting enables it, otherwise the HTML tags will be stripped.
Initial tests shows that all the feeds are now valid (http://validator.w3.org/feed/check.cgi).
comment:9 by , 19 years ago
Actually, the only remaining concern are with character entities. It appears that some RSS readers (Thunderbird, Firefox, Wizz RSS Reader) don't even like the numerical entities…
So I revert to using …
and remove all the entities.
comment:11 by , 19 years ago
OK, here's the take 4 (attachment:rss_fixes.diff),
integrating some feedback form cmlenz
(got rid of the rss_escape_html
configuration option)
comment:12 by , 19 years ago
The latest patch allows Sage (Firefox extension) to accept the RSS feed from Trac - without the patch, Sage reports a XML parser error. This is a very good news! Thanks.
comment:13 by , 19 years ago
Applied slightly improved patch on trunk, in r2818.
In particular, I now let go through all numerical entities in <title> elemens, even if the W3C validator complains about that (*), as other feed readers are OK with them.
Please test once again before I'll port this to 0.9.4.
(*)
line 12, column 21: title contains bad characters [help] <title>Revision … <81>: Another attempt ^
comment:15 by , 19 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Both changesets ported to 0.9-stable in r2829.
comment:16 by , 18 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I believe this is not fixed, in the case when the config option
[timeline] changeset_show_files = 1
is on. In that case the rss feed's description element looks like this:
<description><ul class="changes"><li><div class="copy"></div>branches/dev_branch/sapphire/log.c</li><li>…</li></ul><p> Merged <a class="changeset" href="http://www/trac/changeset/11781" title="Logging facility ...">[11781]</a> (logging support) to trunk </p> </description>
which contains the invalid … element as well as some unescaped html code. The expat xml parser complains that it's invalid xml in that case.
I worked around it for my site by disabling changeset_show_files.
comment:17 by , 18 years ago
Milestone: | 0.9.4 → 0.10 |
---|
This problem happens because in some cases we produce HTML using a Fragment or an Element object.
Calling unicode
on this produces a Markup object, which is then purposedly not escaped.
I'm not 100% sure how to fix this class of issues… Should Fragment.__str__
really produce a Markup object?
comment:18 by , 18 years ago
Status: | reopened → new |
---|
After discussing with cmlenz, we concluded that we should produce directly a unicode
object. So I tested the following change:
Index: trac/util/markup.py =================================================================== --- trac/util/markup.py (revision 3592) +++ trac/util/markup.py (working copy) @@ -309,8 +309,11 @@ else: yield escape(child, quotes=False) + def __unicode__(self): + return u''.join(self.serialize()) + def __str__(self): - return Markup(''.join(self.serialize())) + return ''.join(self.serialize()) def __add__(self, other): return Fragment()(self, other)
which fixed the present issue. This change didn't seem to introduce any adverse effects, and I tested a big bunch of wiki pages/ macros and all the different subsystems, so I think that after some positive user feedback, we could safely apply it.
comment:19 by , 18 years ago
Status: | new → assigned |
---|
Thanks for the heads up, version added