Edgewall Software

Ticket #4246 (closed enhancement: fixed)

Opened 2 years ago

Last modified 2 years ago

Adopt Pygments for syntax highlighting

Reported by: cmlenz Owned by: cmlenz
Priority: normal Milestone: 0.11
Component: general Version: 0.10.2
Severity: major Keywords: mimeview highlighting
Cc:

Description

Pygments provides very nice—and pure Python—syntax highlighting. We should strongly consider adopting it as the default highlighter for future Trac releases, probably starting with 0.11, seeing how the current Enscript/Silvercity/PHP methods all suck.

Matt has already cooked up a Trac plugin for Pygments here.

I'm attaching some patches for both the Trac Mimeview component and the TracPygments plugin, accomplishing the following:

  • the Pygments highlighter generates a Genshi event stream instead of a big string,
  • the Mimeview component handles Genshi streams correctly, and supports putting individual lines in table rows,
  • the Pygments support comes with a preference panel that allows the user to choose the highlighting theme, including live preview.

Some of this requires recent Genshi trunk.

The patches are more of a preview of what I'd like to see integrated for 0.11, not really a patch ready to be integrated. For example, the TracPygments patch is intended to be applied to the plugin. Also, a lot of details need to be refined.

Attachments

trac-pygments.diff (12.6 KB) - added by cmlenz 2 years ago.
Patch for Trac
pygments-plugin.diff (12.6 KB) - added by cmlenz 2 years ago.
Patch for TracPygments plugin
trac-pygments.2.diff (16.5 KB) - added by cmlenz 2 years ago.
Updated patch against Trac
pygments-plugin.2.diff (13.0 KB) - added by cmlenz 2 years ago.
Updated patch for Pygments plugin
trac-pygments.3.diff (29.4 KB) - added by cmlenz 2 years ago.
New patch: moves Pygments support into Trac proper (replacing the plugin)
generate1.py (1.3 KB) - added by cboos 2 years ago.
Clarify a bit the _generate() helper function within _group_lines()
generate2.py (1.3 KB) - added by cboos 2 years ago.
Alternate API for the _generate() helper function within _group_lines()

Change History

Changed 2 years ago by cmlenz

Patch for Trac

Changed 2 years ago by cmlenz

Patch for TracPygments plugin

follow-up: ↓ 2   Changed 2 years ago by cboos

  • keywords mimeview highlighting added
  • severity changed from normal to major

probably starting with 0.11, seeing how the current Enscript/Silvercity/PHP methods all suck.

Oh yes, big +1 from me.

in reply to: ↑ 1 ; follow-up: ↓ 3   Changed 2 years ago by eblot

Replying to cboos:

Oh yes, big +1 from me.

+1 as well, this would be great. Does this mean we could drop support for other highlighting engines in the future?

in reply to: ↑ 2   Changed 2 years ago by cmlenz

Replying to eblot:

Does this mean we could drop support for other highlighting engines in the future?

We should keep them in for 0.11, but after that I guess they could be moved to trac-hacks.

Changed 2 years ago by cmlenz

Updated patch against Trac

Changed 2 years ago by cmlenz

Updated patch for Pygments plugin

  Changed 2 years ago by cmlenz

I've cleaned up and reworked both patches. The Trac patch is a candidate for merging into trunk, so I'd appreciate some testing. The Pygments part would go on trac-hacks first, and be merged into Trac trunk later on.

The one thing I don't like all that much is the added returns_source class variable on IHTMLPreviewRenderer. It is intended to detect the difference between renderers such as ImageRenderer and PatchRenderer, and the various syntax highlighters. The output by the former should not be messed with, while the syntax highlighter output should be put in a <pre> element or a table with annotations. Previously, this difference in behavior was dependent on whether the renderer returns a list or a string. Now I want syntax highlighters to be able to returns Genshi streams, so that method doesn't work any more.

follow-up: ↓ 6   Changed 2 years ago by cboos

I've tried the latest patches. The Pygments highlighter and its Trac plugin look great.

The patch for Trac also look fine, it contains a few changes I wanted to do myself (the use of the builder). The PHP highlighter doesn't work anymore though, but that's probably a trivial thing.

The only thing I'm a bit concerned with is the direct use of Genshi streams... the code is a bit hard to read and it seems like we're using the internals of Genshi. I would have preferred the usage of the builder there too, if possible. Is there really a big difference in term of speed between the two approaches?

In any case, I'm very much in favor that you commit those changes.

in reply to: ↑ 5   Changed 2 years ago by cmlenz

Replying to cboos:

The only thing I'm a bit concerned with is the direct use of Genshi streams... the code is a bit hard to read and it seems like we're using the internals of Genshi. I would have preferred the usage of the builder there too, if possible. Is there really a big difference in term of speed between the two approaches?

  • Which code exactly is hard to read? Assuming you mean the GenshiHtmlFormatter in the Pygments plugin, I don't think that's particularly hard to follow.
  • Streams aren't really internals, they are an important part of the public API, although admittedly low-level.
  • The difference between the builder and direct stream generation isn't so much about speed but about memory usage (and garbage collection). Content highlighted by Pygments will normally have a ton of little <span class="foo"> elements. You'd be creating/deleting Element objects for every one of them, holding them all in a tree until you call generate() and let the Fragment be garbage collected. And later the containing fragment has to do a number of runtime checks (isinstance etc) to determine what to do with all the child nodes. That overhead can be completely avoided by just generating the stream directly.

I general I'd advise against using builder to generate anything larger than small snippets of markup. In those cases, using a template or generating a stream directly is going to provide much better performance and memory usage.

follow-up: ↓ 9   Changed 2 years ago by cboos

No, I was talking about _group_lines(stream) in trac/mimeview/api.py.

Ok, thanks for the clarifications, if think I should start to getting used to it...

Still, I wonder if the Genshi Stream API couldn't be "humanized" a bit, like by using small objects (with __slots__) instead of tuples.

Changed 2 years ago by cmlenz

New patch: moves Pygments support into Trac proper (replacing the plugin)

follow-up: ↓ 10   Changed 2 years ago by cmlenz

Above patch adds the PygmentsRenderer to the Trac code base, so that the plugin isn't necessary anymore. Unfortunately, we cannot name the module pygments because that would conflict with the name of the Pygments package itself. For the lack of a better idea, I chose pygment, but maybe something like trac.mimeview.pygments_renderer would be more appropriate -- not sure.

in reply to: ↑ 7 ; follow-up: ↓ 11   Changed 2 years ago by cmlenz

  • status changed from new to assigned

Replying to cboos:

No, I was talking about _group_lines(stream) in trac/mimeview/api.py.

Ah right, that one's a bitch, and would definitely deserve some comments. It's the replacement for the old _html_splitlines(), but now works on Genshi streams instead of on flat strings.

Still, I wonder if the Genshi Stream API couldn't be "humanized" a bit, like by using small objects (with __slots__) instead of tuples.

That does unfortunately still have quite significant overhead compared to raw tuples.

One option for a more convenient API (on top of the current tuple stuff) would be to provide event creation functions, for example:

  yield Stream.start('div', class_='foobar')
  yield Stream.end('div')

But I don't think anything in that direction would actually help improving the _group_lines() function.

in reply to: ↑ 8   Changed 2 years ago by cboos

I've tested the attachment:trac-pygments.3.diff, it works fine.

I've sorted out the PHP issue, it's simply that it goes through the _group_lines(stream). The following would fix it:

Index: php.py
===================================================================
--- php.py	(revision 4376)
+++ php.py	(working copy)
@@ -102,7 +104,6 @@
                 break
 
         html = PhpDeuglifier().format(odata.decode('utf-8'))
-        for line in html.split('<br />'):
-            # PHP generates _way_ too many non-breaking spaces...
-            # We don't need them anyway, so replace them by normal spaces
-            yield line.replace('&nbsp;', ' ')
+        return [line.replace('&nbsp;', ' ') for line in html.split('<br />')]
+        # PHP generates _way_ too many non-breaking spaces...
+        # We don't need them anyway, so replace them by normal spaces

Changed 2 years ago by cboos

Clarify a bit the _generate() helper function within _group_lines()

Changed 2 years ago by cboos

Alternate API for the _generate() helper function within _group_lines()

in reply to: ↑ 9   Changed 2 years ago by cboos

Replying to cmlenz:

Replying to cboos:

Still, I wonder if the Genshi Stream API couldn't be "humanized" a bit,...

... One option for a more convenient API (on top of the current tuple stuff) would be to provide event creation functions, for example: ...

Yes, that was something like that I had in mind. More specifically, those helper function could do the right thing to produce a new event from an old one, overriding the adequate parts. See attachment:generate2.py for an illustration.

Even without that, I think it's possible to improve a little bit the readability of this function, see attachment:generate1.py.

But I don't think anything in that direction would actually help improving the _group_lines() function.

What was the most problematic part I think was that event[1][0] expression, which gives you the feeling that you have to really know the internals to understand what's going on ;)

  Changed 2 years ago by cmlenz

  • status changed from assigned to closed
  • resolution set to fixed

Checked in in [4399].

Add/Change #4246 (Adopt Pygments for syntax highlighting)

Author



Change Properties
<Author field>
Action
as closed
Next status will be 'reopened'
to The owner will change from cmlenz. Next status will be 'closed'
 
Note: See TracTickets for help on using tickets.