Edgewall Software
Modify

Opened 14 years ago

Closed 13 years ago

Last modified 13 years ago

#2580 closed defect (fixed)

RSS feed validation in 0.9.3

Reported by: osimons Owned by: Christian Boos
Priority: normal Milestone: 0.10
Component: timeline Version: 0.9.3
Severity: normal Keywords:
Cc: Branch:
Release Notes:
API Changes:

Description

Upgraded to 0.9.3 and have found 2 problems with gettings timeline feeds validated in both RSS Bandit and Sage - and both Internet Explorer and Firefox also confirm "XML Parsing Error: not well-formed".

  1. /trunk/trac/wiki/formatter.py in line 685-687 (end of OneLineFormatter format method): Three dots are converted to …. Some feed readers do not like this, especially if they use validating xml components, as there is no DTD for the Trac RSS XML, and the hellip entity is not recognized. A quick workaround(fix?) is to use the numeric entity reference instead, which is: …. Works for us. As far as I can see, this code has not changed from 0.9.2-0.9.3, but new Marker/escape/sanitise handling have probably introduced new ways of handling the output.
  1. That fixed all my projects except one. The problem here is located to be the an entry in a Subversion log message (a changeset timeline event) that is based on a log link to a start-stop revision list for a folder [log:..mysource/folder@34:45]. The HTML for this gets converted to href="..mysource/folder?rev=45&stop_rev=34 without escaping the HTML ampersand. The feed will not validate. Manually replacing & with & makes the feed validate without errors. The output from the log link seems to start with _format_link method at the end of trunk/trac/versioncontrol/web_ui/log.py, but with Href, Formatter and Markup etc. on it's path to output I have to leave it to someone who knows the source better.

PS! Wanted to, but cannot, set version to 0.9.3 for this ticket :-)

Attachments (3)

rss_html_escape.diff (15.9 KB ) - added by Christian Boos 14 years ago.
patch on r2763, take 3
rss_fixes.diff (14.2 KB ) - added by Christian Boos 14 years ago.
patch on top of r2766
rss_fixes.2.diff (14.3 KB ) - added by Christian Boos 14 years ago.
patch on top of r2766 (aka take 5)

Download all attachments as: .zip

Change History (23)

comment:1 by Christopher Lenz, 14 years ago

Milestone: 0.9.4
Version: 0.9.20.9.3

Thanks for the heads up, version added

comment:2 by anonymous, 14 years ago

RSS of Ticket and Log's <, &hellip;, > has same problem(2).

comment:3 by anonymous, 14 years ago

I created ticket(715) at demo site to show alert dialog box (Firefox with Sage). https://ssl.bulix.org/projects/demo/timeline?milestone=on&ticket=on&changeset=on&wiki=on&max=50&daysback=90&format=rss

comment:4 by Christian Boos, 14 years ago

Owner: changed from Jonas Borgström to Christian Boos
Status: newassigned

comment:5 by anonymous, 14 years ago

Test fix for trac 0.9.3 with [2755] applied. &hellip; problem also disappear.

--- timeline_rss.cs     (before)
+++ timeline_rss.cs     (after)
@@ -12,19 +12,19 @@
   <generator>Trac v<?cs var:trac.version ?></generator>
   <image>
    <title><?cs var:project.name_encoded ?></title>
-   <url><?cs if:!header_logo.src_abs ?><?cs var:base_host ?><?cs /if ?><?cs
-    var:header_logo.src ?></url>
-   <link><?cs var:base_host ?><?cs var:trac.href.timeline ?></link>
+   <url><?cs if:!header_logo.src_abs ?><?cs var:html_escape(base_host) ?><?cs /if ?><?cs
+    var:html_escape(header_logo.src) ?></url>
+   <link><?cs var:html_escape(base_host) ?><?cs var:html_escape(trac.href.timeline) ?></link>
   </image><?cs
   each:event = timeline.events ?>
    <item>
-    <title><?cs var:event.title ?></title><?cs
+    <title><?cs var:html_escape(event.title) ?></title><?cs
     if:event.author.email ?>
-     <author><?cs var:event.author.email ?></author><?cs
+     <author><?cs var:html_escape(event.author.email) ?></author><?cs
     /if ?>
-    <pubDate><?cs var:event.date ?></pubDate>
-    <link><?cs var:event.href ?></link>
-    <description><?cs var:event.message ?></description>
+    <pubDate><?cs var:html_escape(event.date) ?></pubDate>
+    <link><?cs var:html_escape(event.href) ?></link>
+    <description><?cs var:html_escape(html_escape(event.message)) ?></description>
    </item><?cs
   /each ?>
  </channel>

comment:6 by Christian Boos, 14 years ago

Actually, I'm working right now on a more drastic change. I found no evidence on the web that there should be support for rendering HTML found in <title> or <description> elements of RSS feeds… So we shouldn't put HTML there in the first place.

Also, we don't use Clearsilver's html_escape, but HTML escape everything except what we explicitely mark as being Markup.

comment:7 by Christian Boos, 14 years ago

I found some more background information on this topic.

Actually, the RSS 2.0 spec suggests that you put HTML in the <content> element by escaping it…

That's quite controversial:

And finally, there's a rich page about this problem in the Atom wiki (Atom:EscapedHtmlDiscussion).

comment:8 by Christian Boos, 14 years ago

Ok, here's my take on the issue: attachment:rss_html_escape.diff

All the RSS code has been reviewed.

  • The <title> should never contain HTML, escaped or not.
  • The <description> will contain HTML escaped text if a configuration setting enables it, otherwise the HTML tags will be stripped.

Initial tests shows that all the feeds are now valid (http://validator.w3.org/feed/check.cgi).

comment:9 by Christian Boos, 14 years ago

Actually, the only remaining concern are with character entities. It appears that some RSS readers (Thunderbird, Firefox, Wizz RSS Reader) don't even like the numerical entities…

So I revert to using &hellip; and remove all the entities.

by Christian Boos, 14 years ago

Attachment: rss_html_escape.diff added

patch on r2763, take 3

comment:10 by Christian Boos, 14 years ago

I've updated the patch once again. Looks good to me. Comments?

comment:11 by Christian Boos, 14 years ago

OK, here's the take 4 (attachment:rss_fixes.diff), integrating some feedback form cmlenz (got rid of the rss_escape_html configuration option)

by Christian Boos, 14 years ago

Attachment: rss_fixes.diff added

patch on top of r2766

by Christian Boos, 14 years ago

Attachment: rss_fixes.2.diff added

patch on top of r2766 (aka take 5)

comment:12 by Emmanuel Blot, 14 years ago

The latest patch allows Sage (Firefox extension) to accept the RSS feed from Trac - without the patch, Sage reports a XML parser error. This is a very good news! Thanks.

comment:13 by Christian Boos, 14 years ago

Applied slightly improved patch on trunk, in r2818.

In particular, I now let go through all numerical entities in <title> elemens, even if the W3C validator complains about that (*), as other feed readers are OK with them.

Please test once again before I'll port this to 0.9.4.


(*)

      line 12, column 21: title contains bad characters [help]

               <title>Revision &#133; &lt;81&gt;: Another attempt
                               ^

comment:14 by Christian Boos, 14 years ago

(actually, r2819 is needed)

comment:15 by Christian Boos, 14 years ago

Resolution: fixed
Status: assignedclosed

Both changesets ported to 0.9-stable in r2829.

comment:16 by anonymous, 13 years ago

Resolution: fixed
Status: closedreopened

I believe this is not fixed, in the case when the config option

[timeline]
changeset_show_files = 1

is on. In that case the rss feed's description element looks like this:

    <description><ul class="changes"><li><div
class="copy"></div>branches/dev_branch/sapphire/log.c</li><li>&hellip;</li></ul><p>
Merged <a class="changeset" href="http://www/trac/changeset/11781"
title="Logging facility  ...">[11781]</a> (logging support) to trunk
</p>
</description>

which contains the invalid &hellip; element as well as some unescaped html code. The expat xml parser complains that it's invalid xml in that case.

I worked around it for my site by disabling changeset_show_files.

comment:17 by Christian Boos, 13 years ago

Milestone: 0.9.40.10

This problem happens because in some cases we produce HTML using a Fragment or an Element object. Calling unicode on this produces a Markup object, which is then purposedly not escaped.

I'm not 100% sure how to fix this class of issues… Should Fragment.__str__ really produce a Markup object?

comment:18 by Christian Boos, 13 years ago

Status: reopenednew

After discussing with cmlenz, we concluded that we should produce directly a unicode object. So I tested the following change:

Index: trac/util/markup.py
===================================================================
--- trac/util/markup.py (revision 3592)
+++ trac/util/markup.py (working copy)
@@ -309,8 +309,11 @@
             else:
                 yield escape(child, quotes=False)

+    def __unicode__(self):
+        return u''.join(self.serialize())
+
     def __str__(self):
-        return Markup(''.join(self.serialize()))
+        return ''.join(self.serialize())

     def __add__(self, other):
         return Fragment()(self, other)

which fixed the present issue. This change didn't seem to introduce any adverse effects, and I tested a big bunch of wiki pages/ macros and all the different subsystems, so I think that after some positive user feedback, we could safely apply it.

comment:19 by Christian Boos, 13 years ago

Status: newassigned

comment:21 by Christian Boos, 13 years ago

Resolution: fixed
Status: assignedclosed

Fix applied in r3609.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted. Next status will be 'reopened'.
to as closed The owner will be changed from Christian Boos to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.