
Opened 8 years ago

Closed 8 years ago

Last modified 22 months ago

#2296 closed enhancement (wontfix)

Export wiki pages to latex

Reported by: emilk@…
Owned by: athomas
Priority: normal
Milestone:
Component: wiki system
Version: devel
Severity: normal
Keywords: mimetype converter
Cc: emilk@…, tapted@…
Release Notes:
API Changes:

Description

It would be very useful to export wiki pages to LaTeX. A wiki is a great tool for brainstorming and planning; LaTeX is better for final printed texts. It would be cool if I could use wiki pages as a starting point for LaTeX articles.

Attachments (14)

wikilatex.py (16.5 KB) - added by Trent Apted <tapted@…> 8 years ago.
site-packages/trac/wiki/wikilatex.py patch for LaTeX export (first cut)
test2.dvi (3.5 KB) - added by Trent Apted <tapted@…> 8 years ago.
Ignore the 0.8 stuff — this is just an old start page that was sitting in an upgraded Trac on my system. … need to fix quotes…
wikilatex.2.py (19.6 KB) - added by Trent Apted <tapted@…> 8 years ago.
/trunk/trac/wiki/wikilatex.py LaTeX formatter (wiki export)
test.tex (49.1 KB) - added by Trent Apted <tapted@…> 8 years ago.
Result of /trunk/trac/wiki/tests/wiki-tests.txt
WikiStart.tex (2.8 KB) - added by Trent Apted <tapted@…> 8 years ago.
Result of /trunk/wiki-default/WikiStart
WikiStart.dvi (3.7 KB) - added by Trent Apted <tapted@…> 8 years ago.
Result of /trunk/wiki-default/WikiStart (DVI)
test.dvi (34.9 KB) - added by Trent Apted <tapted@…> 8 years ago.
Result of /trunk/trac/wiki/tests/wiki-tests.txt (DVI)
wiki_latex_export.diff (1.4 KB) - added by Trent Apted <tapted@…> 8 years ago.
svn diff against trunk rev.3213@2006-04-19 09:59:20 UTC
mime-system.diff (13.6 KB) - added by athomas 8 years ago.
Migrated ticket, wiki and query interfaces
wikilatex.3.py (22.1 KB) - added by Trent Apted <tapted@…> 8 years ago.
Fixed some bugs. Tickets, reports, oneliners and revision logs and images are still broken (investigating)
WikiFormatting.tex (11.4 KB) - added by Trent Apted <tapted@…> 8 years ago.
Result of trunk/wiki-default/WikiFormatting (LaTeX)
WikiFormatting.pdf (111.3 KB) - added by Trent Apted <tapted@…> 8 years ago.
Typeset Result of trunk/wiki-default/WikiFormatting (PDF)
content-converter.diff (20.5 KB) - added by athomas 8 years ago.
New patch, but interface now returns the default extension for each conversion
pagetolatexplugin.tar.gz (6.4 KB) - added by Trent Apted <tapted@…> 8 years ago.
Plugin, built against r3425 [issues]


Change History (32)

comment:1 Changed 8 years ago by mgood

  • Component changed from general to wiki
  • Priority changed from normal to low

comment:2 Changed 8 years ago by tapted@…

I think this is an excellent idea, and intend to work on it myself as a patch for a system we are using to allow students to manage group software projects at the University of Sydney. Basically, a lot of their documentation is on the Wiki, but they need to generate a final report, ideally in LaTeX, but some use Word, which we try to discourage. An export to LaTeX would be a very convincing reason to (a) use LaTeX but also (b) maintain hyperlinked documentation on their wikis…

comment:3 Changed 8 years ago by Trent Apted <tapted@…>

Here is a first cut, against trac 0.9.3

--- /old/web_ui.py     2006-01-16 17:09:09.000000000 +1100
+++ /new/web_ui.py   2006-04-14 17:01:25.000000000 +1000
@@ -32,6 +32,7 @@
 from trac.web import IRequestHandler
 from trac.wiki.model import WikiPage
 from trac.wiki.formatter import wiki_to_html, wiki_to_oneliner
+from trac.wiki.wikilatex import wiki_to_latex
 
 
 class WikiModule(Component):
@@ -113,6 +114,12 @@
                 req.end_headers()
                 req.write(page.text)
                 return
+            if req.args.get('format') == 'latex':
+                req.send_response(200)
+                req.send_header('Content-Type', 'text/plain;charset=utf-8')
+                req.end_headers()
+                req.write(wiki_to_latex(page, self.env, req))
+                return
             self._render_view(req, db, page)
 
         req.hdf['wiki.action'] = action
@@ -358,6 +365,9 @@
         txt_href = self.env.href.wiki(page.name, version=version, format='txt')
         add_link(req, 'alternate', txt_href, 'Plain Text', 'text/plain')
 
+        latex_href = self.env.href.wiki(page.name, version=version, format='latex')
+        add_link(req, 'alternate', latex_href, 'LaTeX', 'text/plain')
+
         req.hdf['wiki'] = {'page_name': page.name, 'exists': page.exists,
                            'version': page.version, 'readonly': page.readonly}
         if page.exists:

and wikilatex.py attached

Changed 8 years ago by Trent Apted <tapted@…>

site-packages/trac/wiki/wikilatex.py patch for LaTeX export (first cut)

comment:4 Changed 8 years ago by Trent Apted <tapted@…>

I should add that the 'first cut' is extremely hackish, and needs a lot of work. The idea, in this case, was to get a workable LaTeX file for the students to fix up themselves.

Note that this has the potential to replace the unattractive 'export to PDF' options that are touted on other tickets, which first convert to HTML, then directly to PDF.

I'll attach a sample DVI file to give an impression of what this looks like..

  • Trent.

Changed 8 years ago by Trent Apted <tapted@…>

Ignore the 0.8 stuff — this is just an old start page that was sitting in an upgraded Trac on my system. … need to fix quotes…

Changed 8 years ago by Trent Apted <tapted@…>

/trunk/trac/wiki/wikilatex.py LaTeX formatter (wiki export)

Changed 8 years ago by Trent Apted <tapted@…>

Result of /trunk/trac/wiki/tests/wiki-tests.txt

Changed 8 years ago by Trent Apted <tapted@…>

Result of /trunk/wiki-default/WikiStart

Changed 8 years ago by Trent Apted <tapted@…>

Result of /trunk/wiki-default/WikiStart (DVI)

Changed 8 years ago by Trent Apted <tapted@…>

Result of /trunk/trac/wiki/tests/wiki-tests.txt (DVI)

Changed 8 years ago by Trent Apted <tapted@…>

svn diff against trunk rev.3213@2006-04-19 09:59:20 UTC

comment:5 Changed 8 years ago by Trent Apted <tapted@…>

  • Cc tapted@… added
  • Priority changed from low to normal
  • Version changed from 0.9 to devel

Righto.

A lot of the Wiki markup makes no sense in LaTeX. It does "stuff" with most things though. I got it to a point where it works on trunk/trac/wiki/tests/wiki-tests.txt. That is, without breaking Trac or latex (although one of the tests results in a 'too deeply nested' error, which you can just batchmode through). The result is ugly, but no uglier than the html that Trac makes by default.

I've done this against the SVN trunk, and the changes are very isolated, so I see little reason for this not to be bumped in.

related tickets: #1468, #2207 and ticket 76 on trac-hacks.org

comment:6 Changed 8 years ago by cboos

  • Keywords converter added
  • Owner changed from jonas to cboos

This should be packaged as a plugin, and we would need to add an interface extension point for exporting to alternate formats.

We should first discuss this, I think.

class IMIMETypeConverter:
  def get_supported_conversions(self):
    """Yield tuples corresponding to the supported export formats.

    Each tuple should be of the form `(key, name, in_mimetype, out_mimetype)`,

    e.g. ('latex', 'Wiki to LaTeX', 'text/x-trac-wiki', 'text/plain')
    """

  def convert(self, content, mimetype, key):
    """Perform the actual conversion of `content`.

    The actual MIME type is given in `mimetype` and the conversion mode
    is the chosen `key`.

    The result should be a `(converted_content, out_mimetype)` pair.
    """

With this, the WikiModule could build a list of alternate download links corresponding to the text/x-trac-wiki converters, and then perform the conversion in a generic way.

Having this interface at the Mimeview level would make it possible to install a similar mechanism for alternate download formats in the attachment view and in the repository browser view.
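
To illustrate the "generic way" mentioned above, here is a minimal standalone sketch. It deliberately does not use Trac's component machinery: `LatexStubConverter`, `alternate_links` and `convert_with` are hypothetical names, and the conversion itself is only a placeholder.

```python
# Standalone sketch of the proposed converter protocol (no Trac imports).
# All class and function names here are illustrative assumptions.

class LatexStubConverter:
    """Toy converter following the proposed IMIMETypeConverter shape."""

    def get_supported_conversions(self):
        # (key, name, in_mimetype, out_mimetype)
        yield ('latex', 'Wiki to LaTeX', 'text/x-trac-wiki', 'text/plain')

    def convert(self, content, mimetype, key):
        # Placeholder conversion: prepend a comment so the result is visible.
        return ('% converted from ' + mimetype + '\n' + content, 'text/plain')


def alternate_links(converters, in_mimetype):
    """List (key, name, out_mimetype) for converters accepting in_mimetype."""
    links = []
    for converter in converters:
        for key, name, in_type, out_type in converter.get_supported_conversions():
            if in_type == in_mimetype:
                links.append((key, name, out_type))
    return links


def convert_with(converters, content, in_mimetype, key):
    """Run the first converter that supports `key` for `in_mimetype`."""
    for converter in converters:
        for conv_key, _name, in_type, _out in converter.get_supported_conversions():
            if conv_key == key and in_type == in_mimetype:
                return converter.convert(content, in_mimetype, key)
    raise LookupError('no converter for %r from %s' % (key, in_mimetype))


converters = [LatexStubConverter()]
links = alternate_links(converters, 'text/x-trac-wiki')
result, out_type = convert_with(converters, '= Title =', 'text/x-trac-wiki', 'latex')
```

The module that renders a page would only need `alternate_links` to build its download links and `convert_with` to serve a request, without knowing anything about LaTeX.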

comment:7 Changed 8 years ago by athomas

  • Owner changed from cboos to athomas
  • Status changed from new to assigned

I've made an implementation based on your concept, and the one I added to #1468. This seems pretty clean to me; opinions?

I think this could supplant the existing IHTMLPreviewRenderer interface as well, either removing it entirely (not good for backwards compatibility) or adding an adaptor using the new interface (probably a better idea) for IMIMETypeConverters that convert to text/html.

This could also be used for adding CSV export for ticket data, which gets requested a bit on the IrcChannel and the MailingList; text/x-trac-ticket to text/csv.
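
As a rough idea of what the CSV side of such a converter might do, here is a standalone sketch using only the standard library. The function name `tickets_to_csv`, the dictionary-based ticket representation and the field names are assumptions for illustration, not Trac's ticket model; the second sample row is made up.

```python
import csv
import io


def tickets_to_csv(tickets, fields):
    """Serialise ticket dictionaries to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(fields)
    for ticket in tickets:
        # Missing fields become empty cells rather than raising KeyError.
        writer.writerow([ticket.get(field, '') for field in fields])
    return out.getvalue(), 'text/csv;charset=utf-8'


tickets = [
    {'id': 2296, 'summary': 'Export wiki pages to latex', 'status': 'closed'},
    {'id': 2, 'summary': 'Summary, with comma', 'status': 'new'},  # made-up row
]
content, mimetype = tickets_to_csv(tickets, ['id', 'summary', 'status'])
```

The return value mirrors the `(converted_content, out_mimetype)` pair discussed in comment:6.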

comment:8 Changed 8 years ago by athomas

And here's a quick example for converting Wiki text to text with the formatting stripped:

from trac.core import *
from trac.wiki.formatter import wiki_to_html
from trac.mimeview.api import IMIMETypeConverter

class StrippedWikiConverter(Component):
    implements(IMIMETypeConverter)

    def get_mime_conversions(self):
        yield ('strippedtxt', 'Plain Text (no formatting)', 'text/x-trac-wiki', 'text/plain', 9)

    def convert_mime_content(self, req, mimetype, content, key, filename=None, url=None):
        return (wiki_to_html(content, self.env, req).plaintext(), 'text/plain;charset=utf-8')

comment:9 Changed 8 years ago by cboos

I've started to make some comments here, but they grew a bit too much, so I'll post them on Trac-Devel instead, in a couple of minutes.

comment:10 Changed 8 years ago by cboos

comment:11 Changed 8 years ago by cboos

Hm, wrong, see trac-dev:494 (sorry)

Changed 8 years ago by athomas

Migrated ticket, wiki and query interfaces

Changed 8 years ago by Trent Apted <tapted@…>

Fixed some bugs. Tickets, reports, oneliners and revision logs and images are still broken (investigating)

Changed 8 years ago by Trent Apted <tapted@…>

Changed 8 years ago by Trent Apted <tapted@…>

Typeset Result of trunk/wiki-default/WikiFormatting (PDF)

comment:12 Changed 8 years ago by Trent Apted <tapted@…>

I've made some refinements to wikilatex.py. To be honest, I am still essentially hacking; I'm not yet familiar with the inner workings of Trac. When I have some more time I'll delve into the Trac code, clean up this stuff and see how this should fit in with the proposed IMIMETypeConverter.

Some comments on the LaTeX converter, as it currently stands (with reference to WikiFormatting.pdf):

  • I think the typeset stuff looks pretty good
    • but I'm biased — does anyone else have an opinion?
  • Keep in mind that the current idea is that each wiki page exported will probably be incorporated into some larger document
    • However, this should be an option flag to be passed somehow to the exporter
    • cgi request? LaTeX conversion staging page?
    • Hence:
  • Handling section/subsection/subsubsection
    • Currently the page name is put into the \section{} at the start, but convention might mean that this is not right
    • Maybe we should rely on the wiki page to have a =Heading= at the start, to be promoted to \section{}
    • Otherwise =h1= now maps to \subsection{}, ==h2== to \subsubsection{}, and ===h3=== and deeper to \subsubsection*{}
  • Handling hyperlinks
    • since the Wiki is inherently hyperlinked, it makes sense to carry this over to the PDF
    • hence, hyperref is now part of the preamble, with some sensible options set
      • pdfauthor could be set to the login ID, perhaps (pass in context..)
    • Wiki links obviously cannot be resolved until the all-in-one document is generated (so they come up as \S{}??), but otherwise work, and are clickable thanks to hyperref
    • To delineate links in the printed version, they are currently underlined (but this is easily changed by tweaking the \anchortext command in the preamble)
    • If it is something other than an automatic CamelCase or http://www.example.com link, then a footnote is also created, showing where the link goes
    • If we _know_ hyperref is going to be used (currently it does not rely on any specials in the hyperref package — it just overrides builtins to make links in the PDF), then the \anchortext{} command could be adjusted to accept an [optional] argument to make the actual anchortext an active link (but we should probably still do the footnote method for the printed version)
  • … about that preamble (see WikiFormatting.tex)
    • OK, so this is meant to be part of a larger document, so there should only be one preamble
    • But it's easy to delete a preamble (harder to make up your own), and this should help latex n00bs that just want a PDF
    • If this later goes the route required for #2207, individual preambles can probably be stripped automatically
  • line separation of list items
    • the default in LaTeX has quite a large spacing between list items, which is maybe not what people expect
    • this can be adjusted in the preamble..
  • Tickets
    • I've written a nice way of showing tickets for Trac v0.8 (which is what is running on a legacy system for students at my university), but the way tickets are handled in the formatter has changed quite drastically to v0.9
    • So this hasn't yet been implemented in the attached wikilatex.py (and actually causes some unicode freakout that I don't fully understand at this point)
    • Same goes for reports and revision logs
  • Images
    • obviously, the Image cannot be embedded in the LaTeX, but we can make a figure float for .{png,jpeg,gif,etc} links (not yet implemented)
    • this requires the image to be downloaded separately and put somewhere that pdflatex can find it (I wouldn't recommend /usr/bin/latex or dvipdfm because they want images in EPS format, so the images would need to be converted)
    • At some point there will need to be a 'LaTeX' flavour of the Image macro
  • Quotes
    • double quote characters (") are TeXified to `` or '', according to a regex
  • Oneliners
    • not yet properly implemented, but probably similar to the method for Wiki/HTML
  • Tables
    • Tables in LaTeX are represented radically differently from HTML (and are crap, quite frankly)
      • e.g. you need a column description before the table starts, with number of columns, etc.
    • The current implementation is a hack, but behaves most of the time
  • WikiProcessors
    • currently these are all dumped in a \verbatim{} environment
  • WikiMacros
    • PageOutline is converted to \tableofcontents{}
    • BR/br is converted to \\ (or \vspace{1em} if "there is no line to end")
    • Others are rendered in HTML, then the HTML code is put in a \verbatim{} environment (see, e.g., Timestamp in WikiFormatting.pdf)
  • subscript/superscript
    • These are implemented using math mode, making them very sensitive for anything much more complicated than plain text
  • The Implementation
    • probably needs cleaning up
    • needs a more rigorous approach to escaping of LaTeX specials like %, _, #, $, etc.
  • The Future
    • Math formulae using LaTeX
      • This is suggested on TracHacks
      • Perhaps an opportunity to test the MIME detection/flavours for Macros..
      • But the suggested solution on TracHacks is not very clean at the moment
    • Testing
      • This probably needs a test suite, which would be markedly different to the one for the HTML Wiki, testing handling of LaTeX specials, etc.
    • Unicode
      • LaTeX does not like unicode; the preferred way to write é, for example, is \'{e}, but there is no solid mapping between unicode and LaTeX escapes like this. Currently, if unicode is encountered, an exception is raised (not deliberately; some other bug is at work here), and the line is output as <bad unicode on this line> or similar
  • Trent.
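
Three of the points in the list above (escaping LaTeX specials, converting double quotes by regex, and mapping accented characters to LaTeX escapes) can be sketched as follows. This is not the code in wikilatex.py; every name here is hypothetical, the accent table covers only a handful of combining marks, and the quote rule is a simple heuristic.

```python
import re
import unicodedata

# Replacements for LaTeX special characters. Everything is substituted in a
# single pass so braces introduced by one replacement are never re-escaped.
LATEX_SPECIALS = {
    '\\': r'\textbackslash{}',
    '&': r'\&', '%': r'\%', '$': r'\$', '#': r'\#', '_': r'\_',
    '{': r'\{', '}': r'\}',
    '~': r'\textasciitilde{}', '^': r'\textasciicircum{}',
}
_SPECIALS_RE = re.compile('[' + re.escape(''.join(LATEX_SPECIALS)) + ']')

# A double quote at the start of the text, or after whitespace or an opening
# bracket, is treated as an opening quote; any other one as a closing quote.
_OPEN_QUOTE_RE = re.compile(r'(^|[\s(\[])"')

# Small, incomplete table: combining mark -> LaTeX accent command.
_COMBINING_TO_LATEX = {
    '\u0301': "\\'", '\u0300': '\\`', '\u0302': '\\^',
    '\u0308': '\\"', '\u0303': '\\~',
}


def escape_latex(text):
    """Escape characters that have a special meaning in LaTeX."""
    return _SPECIALS_RE.sub(lambda m: LATEX_SPECIALS[m.group(0)], text)


def texify_quotes(text):
    """Turn straight double quotes into `` and '' pairs."""
    text = _OPEN_QUOTE_RE.sub(r"\1``", text)
    return text.replace('"', "''")


def accents_to_latex(text):
    """Rewrite characters such as e-acute as \\'{e} via NFD decomposition."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize('NFD', ch)
        if len(decomposed) == 2 and decomposed[1] in _COMBINING_TO_LATEX:
            out.append('%s{%s}' % (_COMBINING_TO_LATEX[decomposed[1]], decomposed[0]))
        else:
            out.append(ch)
    return ''.join(out)


def to_latex_text(text):
    # Order matters: escape first, then quotes, and accents last, so that the
    # backslashes and quotes produced by accent commands are left untouched.
    return accents_to_latex(texify_quotes(escape_latex(text)))
```

Characters outside the small accent table pass through unchanged, so a real converter would still need a fallback for them.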

comment:13 Changed 8 years ago by cboos

alect: I've tested the patch, and even created a sample Ticket→Excel converter, see #2669; it works great (it only needed a small generalization of the export_csv methods)

comment:14 Changed 8 years ago by cboos

  • Keywords mimetype added

Trent: I've looked at the WikiFormatting.pdf (despite the difficulty of accessing it, because of the #2974 issue…) and it looks promising. However the current approach is a bit heavy-weight, as you are forced to reimplement most of the Formatter logic. It would be much better if there were a cleaner separation of the parsing and formatting methods within the Formatter class. E.g. the _xxx_formatter could be renamed _parse_xxx and would call format_this or format_that as appropriate.
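
A minimal sketch of that separation, assuming a much simplified inline syntax (only bold and italic) and hypothetical class names; it is not Trac's Formatter, just an illustration of `_parse_xxx` methods delegating to `format_xxx` methods of a pluggable formatter:

```python
import re


class WikiParser:
    """Finds inline markup and delegates all output to a formatter object."""

    # Regex-based recognition stays in one place, shared by every output format.
    _rules = re.compile(r"'''(?P<bold>.+?)'''|''(?P<italic>.+?)''")

    def __init__(self, formatter):
        self.formatter = formatter

    def parse(self, text):
        out = []
        pos = 0
        for match in self._rules.finditer(text):
            out.append(self.formatter.format_text(text[pos:match.start()]))
            # Dispatch on the name of the group that matched.
            handler = getattr(self, '_parse_' + match.lastgroup)
            out.append(handler(match.group(match.lastgroup)))
            pos = match.end()
        out.append(self.formatter.format_text(text[pos:]))
        return ''.join(out)

    def _parse_bold(self, inner):
        return self.formatter.format_bold(self.parse(inner))

    def _parse_italic(self, inner):
        return self.formatter.format_italic(self.parse(inner))


class HtmlFormatter:
    def format_text(self, text):
        return text.replace('&', '&amp;').replace('<', '&lt;')

    def format_bold(self, body):
        return '<strong>%s</strong>' % body

    def format_italic(self, body):
        return '<i>%s</i>' % body


class LatexFormatter:
    def format_text(self, text):
        # Only two specials are escaped here, to keep the sketch short.
        return text.replace('%', r'\%').replace('_', r'\_')

    def format_bold(self, body):
        return r'\textbf{%s}' % body

    def format_italic(self, body):
        return r'\emph{%s}' % body


source = "A '''bold''' and ''italic'' 100%"
html = WikiParser(HtmlFormatter()).parse(source)
latex = WikiParser(LatexFormatter()).parse(source)
```

With this shape, a LaTeX exporter only supplies the `format_*` methods and never touches the regular expressions.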

Changed 8 years ago by athomas

New patch, but interface now returns the default extension for each conversion

comment:15 Changed 8 years ago by athomas

Slightly updated version of the IContentConverter interface committed in r3305 and r3306.

I think we can close this now that it can be implemented as a plugin?

Changed 8 years ago by Trent Apted <tapted@…>

Plugin, built against r3425 [issues]

comment:16 Changed 8 years ago by Trent Apted <tapted@…>

OK, I've implemented it as a plugin (which can probably migrate to trac-hacks), so this ticket can probably be closed. But there are some other issues that implementing this has highlighted. These issues may have arisen because this 'plugin' is still in essence a hack, and they come in light of Christian's comment about the duplicated functionality in the latex formatter vs the html formatter that comes with Trac.

The change from 0.8/0.9 to 0.10 that puts HTML formatting for tickets, changesets, etc. into the extension architecture (ExtensionPoint(IWikiSyntaxProvider) and friends) means that I can no longer hook into the regex bindings for these the way I could for the versions of wikilatex implemented for 0.8 and 0.9. So, for now, I have just disabled them (maybe this is not such an issue for LaTeX, but it would be nice to have a footnote with the ticket description or changeset comment, as I did for 0.8). That is, I am no longer using wiki.rules, but generating my own that don't include those inserted by extensions (which includes ticket, changeset, etc.):

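import re

# Core Formatter rules only; rules contributed by extensions are skipped.
helpers = []  # names of helper groups found in the rules
syntax = Formatter._pre_rules[:]
syntax += Formatter._post_rules[:]
helper_re = re.compile(r'\?P<([a-z\d_]+)>')
for rule in syntax:
    # Collect each rule's named groups except the first.
    helpers += helper_re.findall(rule)[1:]
self.myrules = re.compile('(?:' + '|'.join(syntax) + ')')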

(borrowed from trunk/trac/wiki/api.py)

If we want the Formatter logic to be reusable for extensions, then there might need to be a clean way of overriding these or hooking into the functionality (maybe there is and I'm missing it). In any case, it quickly becomes messy because there is no way to anticipate which extensions have been overridden and will start trying to feed 'Element' objects to the processor, rather than strings. For wikilatex to work reliably, all of these would have to be overridden. Also, as Christian points out, the _parsing_ logic in Formatter should be able to be reused. Maybe this should be a new ticket. On that, after all this hacking, I am really starting to hate regex. I would suggest a recursive descent parser to generate a nice parse tree that could be passed to these content converters, but I don't think the grammar is context-free so this would have issues.

So where does this ticket stand?

With the current API, I don't think that Formatter can be reused, and for this plugin to remain maintainable, I feel as though I would have to write all the parsing logic from scratch. That's fair enough, I suppose, because inheriting from Formatter is what makes this a hack rather than a plugin, because I don't think the Formatter is an official API. So, perhaps if I run out of other work to do or feel like procrastinating, wikilatex will become the first attempt at implementing the wiki formatter as a recursive descent parser that can later be incorporated into the core to allow the parse tree to be used by other output generators (but I wouldn't be so presumptuous as to suggest that it be used for the main HTML Formatter, which may be more suited to the regex implementation).
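
For readers unfamiliar with the idea, a toy recursive descent parser that builds a parse tree and hands it to independent output generators might look like the sketch below. The grammar covers only nested bold and italic markers, treats unterminated markup as closed at end of input, and sidesteps the ambiguity concerns raised above; all names are hypothetical and none of this is Trac code.

```python
import re

# Tokens: a bold marker, an italic marker, a run of non-quote text, or a
# lone apostrophe.
TOKEN_RE = re.compile(r"'''|''|[^']+|'")
MARKERS = {"'''": 'bold', "''": 'italic'}


def parse(text):
    """Return a tree of (kind, payload) tuples rooted at a 'doc' node."""
    tokens = TOKEN_RE.findall(text)
    nodes, _pos = _parse_inline(tokens, 0, None)
    return ('doc', nodes)


def _parse_inline(tokens, pos, closer):
    """Parse tokens until `closer` is seen; return (nodes, next position)."""
    nodes = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == closer:
            return nodes, pos + 1
        if tok in MARKERS:
            # Descend: the same marker closes the span it opened.
            children, pos = _parse_inline(tokens, pos + 1, tok)
            nodes.append((MARKERS[tok], children))
        else:
            nodes.append(('text', tok))
            pos += 1
    return nodes, pos  # end of input also closes any open span


def to_latex(node):
    kind, payload = node
    if kind == 'text':
        return payload
    body = ''.join(to_latex(child) for child in payload)
    if kind == 'bold':
        return r'\textbf{%s}' % body
    if kind == 'italic':
        return r'\emph{%s}' % body
    return body  # 'doc'


def to_html(node):
    kind, payload = node
    if kind == 'text':
        return payload
    body = ''.join(to_html(child) for child in payload)
    if kind == 'bold':
        return '<strong>%s</strong>' % body
    if kind == 'italic':
        return '<i>%s</i>' % body
    return body  # 'doc'


tree = parse("a '''b ''c'' d''' e")
```

Both renderers walk the same tree, which is the reuse being asked for; whether this scales to the full wiki syntax is exactly the open question in this comment.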

comment:17 Changed 8 years ago by cboos

  • Resolution set to wontfix
  • Status changed from assigned to closed

Making the Wiki parser reusable by separating the parsing and formatting steps, and using a recursive descent parser instead of a regexp-based engine are two different things. The former can be achieved without the latter, and I have the feeling that getting away from regexps will be bad in terms of performance (see Trac-Dev:316 and the following DrProject's blog entry), and make it less flexible for introducing new constructions and being extensible by plugins.

Also, a Wiki engine is different than a parser for a programming language, as it parses text meant to be read by humans ;)

For Trent's plugin, see TracHacks:PageToLatexPlugin.

comment:18 in reply to: ↑ description Changed 22 months ago by anonymous

I did the recursive descent approach. Now there is a tool that generates LaTeX and PDF files:

http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf

It is available in binary form for Windows and Ubuntu Linux. Furthermore, the source code is available under GPL 2. It works from the client side and does not require any changes to the installation on the servers. So you essentially have the requested feature now.
