Edgewall Software
Modify

Opened 17 years ago

Closed 17 years ago

Last modified 15 years ago

#4246 closed enhancement (fixed)

Adopt Pygments for syntax highlighting

Reported by: Christopher Lenz Owned by: Christopher Lenz
Priority: normal Milestone: 0.11
Component: general Version: 0.10.2
Severity: major Keywords: mimeview highlighting
Cc: Branch:
Release Notes:
API Changes:
Internal Changes:

Description

Pygments provides very nice—and pure Python—syntax highlighting. We should strongly consider adopting it as the default highlighter for future Trac releases, probably starting with 0.11, seeing how the current Enscript/Silvercity/PHP methods all suck.

Matt has already cooked up a Trac plugin for Pygments here.

I'm attaching some patches for both the Trac Mimeview component and the TracPygments plugin, accomplishing the following:

  • the Pygments highlighter generates a Genshi event stream instead of a big string,
  • the Mimeview component handles Genshi streams correctly, and supports putting individual lines in table rows,
  • the Pygments support comes with a preference panel that allows the user to choose the highlighting theme, including live preview.

Some of this requires recent Genshi trunk.

The patches are more of a preview of what I'd like to see integrated for 0.11, not really a patch ready to be integrated. For example, the TracPygments patch is intended to be applied to the plugin. Also, a lot of details need to be refined.

Attachments (7)

trac-pygments.diff (12.6 KB ) - added by Christopher Lenz 17 years ago.
Patch for Trac
pygments-plugin.diff (12.6 KB ) - added by Christopher Lenz 17 years ago.
Patch for TracPygments plugin
trac-pygments.2.diff (16.5 KB ) - added by Christopher Lenz 17 years ago.
Updated patch against Trac
pygments-plugin.2.diff (13.0 KB ) - added by Christopher Lenz 17 years ago.
Updated patch for Pygments plugin
trac-pygments.3.diff (29.4 KB ) - added by Christopher Lenz 17 years ago.
New patch: moves Pygments support into Trac proper (replacing the plugin)
generate1.py (1.3 KB ) - added by Christian Boos 17 years ago.
Clarify a bit the _generate() helper function within _group_lines()
generate2.py (1.3 KB ) - added by Christian Boos 17 years ago.
Alternate API for the _generate() helper function within _group_lines()

Download all attachments as: .zip

Change History (19)

by Christopher Lenz, 17 years ago

Attachment: trac-pygments.diff added

Patch for Trac

by Christopher Lenz, 17 years ago

Attachment: pygments-plugin.diff added

Patch for TracPygments plugin

comment:1 by Christian Boos, 17 years ago

Keywords: mimeview highlighting added
Severity: normalmajor

probably starting with 0.11, seeing how the current Enscript/Silvercity/PHP methods all suck.

Oh yes, big +1 from me.

in reply to:  1 ; comment:2 by Emmanuel Blot, 17 years ago

Replying to cboos:

Oh yes, big +1 from me.

+1 as well, this would be great. Does this mean we could drop support for other highlighting engines in the future?

in reply to:  2 comment:3 by Christopher Lenz, 17 years ago

Replying to eblot:

Does this mean we could drop support for other highlighting engines in the future?

We should keep them in for 0.11, but after that I guess they could be moved to trac-hacks.

by Christopher Lenz, 17 years ago

Attachment: trac-pygments.2.diff added

Updated patch against Trac

by Christopher Lenz, 17 years ago

Attachment: pygments-plugin.2.diff added

Updated patch for Pygments plugin

comment:4 by Christopher Lenz, 17 years ago

I've cleaned up and reworked both patches. The Trac patch is a candidate for merging into trunk, so I'd appreciate some testing. The Pygments part would go on trac-hacks first, and be merged into Trac trunk later on.

The one thing I don't like all that much is the added returns_source class variable on IHTMLPreviewRenderer. It is intended to detect the difference between renderers such as ImageRenderer and PatchRenderer, and the various syntax highlighters. The output by the former should not be messed with, while the syntax highlighter output should be put in a <pre> element or a table with annotations. Previously, this difference in behavior was dependent on whether the renderer returns a list or a string. Now I want syntax highlighters to be able to returns Genshi streams, so that method doesn't work any more.

comment:5 by Christian Boos, 17 years ago

I've tried the latest patches. The Pygments highlighter and its Trac plugin look great.

The patch for Trac also look fine, it contains a few changes I wanted to do myself (the use of the builder). The PHP highlighter doesn't work anymore though, but that's probably a trivial thing.

The only thing I'm a bit concerned with is the direct use of Genshi streams… the code is a bit hard to read and it seems like we're using the internals of Genshi. I would have preferred the usage of the builder there too, if possible. Is there really a big difference in term of speed between the two approaches?

In any case, I'm very much in favor that you commit those changes.

in reply to:  5 comment:6 by Christopher Lenz, 17 years ago

Replying to cboos:

The only thing I'm a bit concerned with is the direct use of Genshi streams… the code is a bit hard to read and it seems like we're using the internals of Genshi. I would have preferred the usage of the builder there too, if possible. Is there really a big difference in term of speed between the two approaches?

  • Which code exactly is hard to read? Assuming you mean the GenshiHtmlFormatter in the Pygments plugin, I don't think that's particularly hard to follow.
  • Streams aren't really internals, they are an important part of the public API, although admittedly low-level.
  • The difference between the builder and direct stream generation isn't so much about speed but about memory usage (and garbage collection). Content highlighted by Pygments will normally have a ton of little <span class="foo"> elements. You'd be creating/deleting Element objects for every one of them, holding them all in a tree until you call generate() and let the Fragment be garbage collected. And later the containing fragment has to do a number of runtime checks (isinstance etc) to determine what to do with all the child nodes. That overhead can be completely avoided by just generating the stream directly.

I general I'd advise against using builder to generate anything larger than small snippets of markup. In those cases, using a template or generating a stream directly is going to provide much better performance and memory usage.

comment:7 by Christian Boos, 17 years ago

No, I was talking about _group_lines(stream) in trac/mimeview/api.py.

Ok, thanks for the clarifications, if think I should start to getting used to it…

Still, I wonder if the Genshi Stream API couldn't be "humanized" a bit, like by using small objects (with __slots__) instead of tuples.

by Christopher Lenz, 17 years ago

Attachment: trac-pygments.3.diff added

New patch: moves Pygments support into Trac proper (replacing the plugin)

comment:8 by Christopher Lenz, 17 years ago

Above patch adds the PygmentsRenderer to the Trac code base, so that the plugin isn't necessary anymore. Unfortunately, we cannot name the module pygments because that would conflict with the name of the Pygments package itself. For the lack of a better idea, I chose pygment, but maybe something like trac.mimeview.pygments_renderer would be more appropriate — not sure.

in reply to:  7 ; comment:9 by Christopher Lenz, 17 years ago

Status: newassigned

Replying to cboos:

No, I was talking about _group_lines(stream) in trac/mimeview/api.py.

Ah right, that one's a bitch, and would definitely deserve some comments. It's the replacement for the old _html_splitlines(), but now works on Genshi streams instead of on flat strings.

Still, I wonder if the Genshi Stream API couldn't be "humanized" a bit, like by using small objects (with __slots__) instead of tuples.

That does unfortunately still have quite significant overhead compared to raw tuples.

One option for a more convenient API (on top of the current tuple stuff) would be to provide event creation functions, for example:

  yield Stream.start('div', class_='foobar')
  yield Stream.end('div')

But I don't think anything in that direction would actually help improving the _group_lines() function.

in reply to:  8 comment:10 by Christian Boos, 17 years ago

I've tested the attachment:trac-pygments.3.diff, it works fine.

I've sorted out the PHP issue, it's simply that it goes through the _group_lines(stream). The following would fix it:

Index: php.py
===================================================================
--- php.py	(revision 4376)
+++ php.py	(working copy)
@@ -102,7 +104,6 @@
                 break
 
         html = PhpDeuglifier().format(odata.decode('utf-8'))
-        for line in html.split('<br />'):
-            # PHP generates _way_ too many non-breaking spaces...
-            # We don't need them anyway, so replace them by normal spaces
-            yield line.replace('&nbsp;', ' ')
+        return [line.replace('&nbsp;', ' ') for line in html.split('<br />')]
+        # PHP generates _way_ too many non-breaking spaces...
+        # We don't need them anyway, so replace them by normal spaces

by Christian Boos, 17 years ago

Attachment: generate1.py added

Clarify a bit the _generate() helper function within _group_lines()

by Christian Boos, 17 years ago

Attachment: generate2.py added

Alternate API for the _generate() helper function within _group_lines()

in reply to:  9 comment:11 by Christian Boos, 17 years ago

Replying to cmlenz:

Replying to cboos:

Still, I wonder if the Genshi Stream API couldn't be "humanized" a bit,…

… One option for a more convenient API (on top of the current tuple stuff) would be to provide event creation functions, for example: …

Yes, that was something like that I had in mind. More specifically, those helper function could do the right thing to produce a new event from an old one, overriding the adequate parts. See attachment:generate2.py for an illustration.

Even without that, I think it's possible to improve a little bit the readability of this function, see attachment:generate1.py.

But I don't think anything in that direction would actually help improving the _group_lines() function.

What was the most problematic part I think was that event[1][0] expression, which gives you the feeling that you have to really know the internals to understand what's going on ;)

comment:12 by Christopher Lenz, 17 years ago

Resolution: fixed
Status: assignedclosed

Checked in in [4399].

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christopher Lenz.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Christopher Lenz to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.