Opened 10 years ago

Closed 9 years ago

Export wiki pages to latex

Reported by: emilk@…
Owned by: Alec Thomas
Priority: normal
Component: wiki system
Version: devel
Severity: normal
Keywords: mimetype converter
Cc: emilk@…, tapted@…

Description

It would be very useful to export wiki pages to LaTeX. A wiki is a great tool for brainstorming and planning. LaTeX is better for definitive printed texts. It would be cool if I could use wiki pages as a starting point for LaTeX articles.

comment:1 Changed 10 years ago by Matthew Good

• Component changed from general to wiki
• Priority changed from normal to low

comment:2 Changed 10 years ago by tapted@…

I think this is an excellent idea, and intend to work on it myself as a patch for a system we are using to allow students to manage group software projects at the University of Sydney. Basically, a lot of their documentation is on the wiki, but they need to generate a final report, ideally in LaTeX, although some use Word, which we try to discourage. An export to LaTeX would be a very convincing reason to (a) use LaTeX but also (b) maintain hyperlinked documentation on their wikis…

comment:3 Changed 10 years ago by Trent Apted <tapted@…>

Here is a first cut, against trac 0.9.3

--- /old/web_ui.py     2006-01-16 17:09:09.000000000 +1100
+++ /new/web_ui.py   2006-04-14 17:01:25.000000000 +1000
@@ -32,6 +32,7 @@
 from trac.web import IRequestHandler
 from trac.wiki.model import WikiPage
 from trac.wiki.formatter import wiki_to_html, wiki_to_oneliner
+from trac.wiki.wikilatex import wiki_to_latex

 class WikiModule(Component):
@@ -113,6 +114,12 @@
                 req.write(page.text)
                 return
+            if req.args.get('format') == 'latex':
+                req.send_response(200)
+                req.write(wiki_to_latex(page, self.env, req))
+                return
             self._render_view(req, db, page)

         req.hdf['wiki.action'] = action
@@ -358,6 +365,9 @@
         txt_href = self.env.href.wiki(page.name, version=version, format='txt')

+        latex_href = self.env.href.wiki(page.name, version=version, format='latex')
+
         req.hdf['wiki'] = {'page_name': page.name, 'exists': page.exists,
         if page.exists:


and wikilatex.py attached

Changed 10 years ago by Trent Apted <tapted@…>

site-packages/trac/wiki/wikilatex.py patch for LaTeX export (first cut)

comment:4 Changed 10 years ago by Trent Apted <tapted@…>

I should add that the 'first cut' is extremely hackish, and needs a lot of work. The idea, in this case, was to get a workable LaTeX file for the students to fix up themselves.

Note that this has the potential to replace the unattractive 'export to PDF' options that are touted on other tickets, which first convert to HTML, then directly to PDF.

I'll attach a sample DVI file to give an impression of what this looks like.

• Trent.

Changed 10 years ago by Trent Apted <tapted@…>

Ignore the 0.8 stuff — this is just an old start page that was sitting in an upgraded Trac on my system. … need to fix quotes…

Changed 10 years ago by Trent Apted <tapted@…>

/trunk/trac/wiki/wikilatex.py LaTeX formatter (wiki export)

Changed 10 years ago by Trent Apted <tapted@…>

Result of /trunk/trac/wiki/tests/wiki-tests.txt

Changed 10 years ago by Trent Apted <tapted@…>

Result of /trunk/wiki-default/WikiStart

Changed 10 years ago by Trent Apted <tapted@…>

Result of /trunk/wiki-default/WikiStart (DVI)

Changed 10 years ago by Trent Apted <tapted@…>

Result of /trunk/trac/wiki/tests/wiki-tests.txt (DVI)

Changed 10 years ago by Trent Apted <tapted@…>

svn diff against trunk rev.3213@2006-04-19 09:59:20 UTC

comment:5 Changed 10 years ago by Trent Apted <tapted@…>

• Priority changed from low to normal
• Version changed from 0.9 to devel

Righto.

A lot of the Wiki markup makes no sense in LaTeX. It does "stuff" with most things though. I got it to a point where it works on trunk/trac/wiki/tests/wiki-tests.txt. That is, without breaking Trac or latex (although one of the tests results in a 'too deeply nested' error, which you can just batchmode through). The result is ugly, but no uglier than the html that Trac makes by default.

I've done this against the SVN trunk, and the changes are very isolated, so I see little reason for this not to be bumped in.

related tickets: #1468, #2207 and ticket 76 on trac-hacks.org

comment:6 Changed 10 years ago by Christian Boos

• Owner changed from Jonas Borgström to Christian Boos

This should be packaged as a plugin, and we would need to add an interface extension point for exporting to alternate formats.

We should first discuss this, I think.

class IMIMETypeConverter:

    def get_supported_conversions(self):
        """Yield tuples corresponding to the supported export formats.

        Each tuple should be of the form (key, name, in_mimetype, out_mimetype),
        e.g. ('latex', 'Wiki to LaTeX', 'text/x-trac-wiki', 'text/plain')
        """

    def convert(self, content, mimetype, key):
        """Perform the actual conversion of content.

        The actual MIME type is given in mimetype and the conversion mode
        is the chosen key.

        The result should be a (converted_content, out_mimetype) pair.
        """


With this, the WikiModule could build a list of alternate download links corresponding to the text/x-trac-wiki converters, and then perform the conversion in a generic way.

Having this interface at the Mimeview level would make it possible to install a similar mechanism for alternate download formats in the attachment view and in the repository browser view.
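The link-building idea can be sketched outside Trac. The following is only an illustration in plain Python 3, assuming converters expose the get_supported_conversions()/convert() pair proposed in this comment. StrippedTextConverter, alternate_links and convert_with are hypothetical names, and a real implementation would go through Trac's component architecture rather than a plain list.

```python
import re


class StrippedTextConverter:
    """Hypothetical converter following the interface proposed above."""

    def get_supported_conversions(self):
        # (key, name, in_mimetype, out_mimetype)
        yield ('txt', 'Plain Text', 'text/x-trac-wiki', 'text/plain')

    def convert(self, content, mimetype, key):
        # Crude illustration only: drop the '' and ''' emphasis markers.
        return re.sub(r"'{2,3}", '', content), 'text/plain'


def alternate_links(converters, mimetype, base_url):
    """List (name, url) download links for every converter accepting mimetype."""
    links = []
    for conv in converters:
        for key, name, in_type, _out_type in conv.get_supported_conversions():
            if in_type == mimetype:
                links.append((name, '%s?format=%s' % (base_url, key)))
    return links


def convert_with(converters, content, mimetype, key):
    """Dispatch to the first converter that supports (mimetype, key)."""
    for conv in converters:
        for k, _name, in_type, _out_type in conv.get_supported_conversions():
            if k == key and in_type == mimetype:
                return conv.convert(content, mimetype, key)
    raise LookupError('no converter for %s -> %s' % (mimetype, key))
```

The same two helpers would serve the attachment and browser views, since nothing in them is specific to wiki pages.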

comment:7 Changed 10 years ago by Alec Thomas

• Owner changed from Christian Boos to Alec Thomas
• Status changed from new to assigned

I've made an implementation based on your concept, and the one I added to #1468. This seems pretty clean to me, opinions?

I think this could supplant the existing IHTMLPreviewRenderer interface as well, either removing it entirely (not good for backwards compatibility) or adding an adaptor using the new interface (probably a better idea) for IMIMETypeConverters that convert to text/html.

This could also be used for adding CSV export for ticket data, which gets requested a bit on the IrcChannel and the MailingList; text/x-trac-ticket to text/csv.
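As a rough illustration of the text/x-trac-ticket to text/csv case, here is a standalone Python 3 sketch. It assumes tickets are available as plain dictionaries. The function name tickets_to_csv and the default field list are hypothetical and not part of the attached patch.

```python
import csv
import io


def tickets_to_csv(tickets, fields=('id', 'summary', 'status', 'owner')):
    """Serialize ticket dicts to CSV text; returns (content, out_mimetype)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(fields)  # header row
    for ticket in tickets:
        # Missing fields become empty cells rather than raising KeyError.
        writer.writerow([ticket.get(name, '') for name in fields])
    return out.getvalue(), 'text/csv;charset=utf-8'
```

Returning a (content, mimetype) pair mirrors the result shape of the proposed convert() method.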

comment:8 Changed 10 years ago by Alec Thomas

And here's a quick example for converting Wiki text to text with the formatting stripped:

from trac.core import *
from trac.wiki.formatter import wiki_to_html
from trac.mimeview.api import IMIMETypeConverter

class StrippedWikiConverter(Component):
    implements(IMIMETypeConverter)

    def get_mime_conversions(self):
        yield ('strippedtxt', 'Plain Text (no formatting)', 'text/x-trac-wiki', 'text/plain', 9)

    def convert_mime_content(self, req, mimetype, content, key, filename=None, url=None):
        return (wiki_to_html(content, self.env, req).plaintext(), 'text/plain;charset=utf-8')


comment:9 Changed 10 years ago by Christian Boos

I've started to make some comments here, but they grew a bit too much, so I'll post them on Trac-Devel instead, in a couple of minutes.

comment:11 Changed 10 years ago by Christian Boos

Hm, wrong, see trac-dev:494 (sorry)

Changed 10 years ago by Alec Thomas

Migrated ticket, wiki and query interfaces

Changed 10 years ago by Trent Apted <tapted@…>

Fixed some bugs. Tickets, reports, oneliners and revision logs and images are still broken (investigating)

Changed 10 years ago by Trent Apted <tapted@…>

Typeset Result of trunk/wiki-default/WikiFormatting (PDF)

comment:12 Changed 10 years ago by Trent Apted <tapted@…>

I've done some refinements to wikilatex.py. To be honest, I am still essentially hacking, as I'm not yet familiar with the inner workings of Trac. When I have some more time I'll delve into the Trac code, clean up this stuff and see how this should fit in with the proposed IMIMETypeConverter.

Some comments on the LaTeX converter, as it currently stands (with reference to WikiFormatting.pdf):

• I think the typeset stuff looks pretty good
• but I'm biased — does anyone else have an opinion?
• Keep in mind that the current idea is that each wiki page exported will probably be incorporated into some larger document
• However, this should be an option flag to be passed somehow to the exporter
• cgi request? LaTeX conversion staging page?
• Hence:
• Handling section/subsection/subsubsection
• Currently the page name is put into the \section{} at the start, but convention might mean that this is not right
• Maybe we should rely on the wiki page to have a =Heading= at the start, to be promoted to \section{}
• Otherwise =h1= now maps to \subsection{}, ==h2== to \subsubsection{}, and ===h3=== or deeper to \subsubsection*{}
• since the Wiki is inherently hyperlinked, it makes sense to carry this over to the PDF
• hence, hyperref is now part of the preamble, with some sensible options set
• pdfauthor could be set to the login ID, perhaps (pass in context..)
• Wiki links obviously cannot be resolved until the all-in-one document is generated (so they come up as \S{}??), but otherwise work, and are clickable thanks to hyperref
• To delineate links in the printed version, they are currently underlined (but this is easily changed by tweaking the \anchortext command in the preamble)
• If it is something other than an automatic CamelCase or http://www.example.com link, then a footnote is also created, showing where the link goes
• If we _know_ hyperref is going to be used (currently it does not rely on any specials in the hyperref package — it just overrides builtins to make links in the PDF), then the \anchortext{} command could be adjusted to accept an [optional] argument to make the actual anchortext an active link (but we should probably still do the footnote method for the printed version)
• … about that preamble (see WikiFormatting.tex)
• OK, so this is meant to be part of a larger document, so there should only be one preamble
• But it's easy to delete a preamble (harder to make up your own), and this should help latex n00bs that just want a PDF
• If this later goes the route required for #2207, individual preambles can probably be stripped automatically
• line separation of list items
• the default in LaTeX has quite a large spacing between list items, which is maybe not what people expect
• this can be adjusted in the preamble..
• Tickets
• I've written a nice way of showing tickets for Trac v0.8 (which is what is running on a legacy system for students at my university), but the way tickets are handled in the formatter has changed quite drastically to v0.9
• So this hasn't yet been implemented in the attached wikilatex.py (and actually causes some unicode freakout that I don't fully understand at this point)
• Same goes for reports and revision logs
• Images
• obviously, the Image cannot be embedded in the LaTeX, but we can make a figure float for .{png,jpeg,gif,etc} links (not yet implemented)
• this requires the image to be downloaded separately and put somewhere that pdflatex can find it (I wouldn't recommend /usr/bin/latex or dvipdfm because they want images in EPS format, so the images would need to be converted)
• At some point there will need to be a 'LaTeX' flavour of the Image macro
• Quotes
• double quote characters (") are TeXified to `` or '', according to a regex
• Oneliners
• not yet properly implemented, but probably similar to the method for Wiki/HTML
• Tables
• Tables in LaTeX are represented radically differently from HTML (and are crap, quite frankly)
• e.g. you need a column description before the table starts, with number of columns, etc.
• The current implementation is a hack, but behaves most of the time
• WikiProcessors
• currently these are all dumped in a \verbatim{} environment
• WikiMacros
• PageOutline is converted to \tableofcontents{}
• BR/br is converted to \\ (or \vspace{1em} if "there is no line to end")
• Others are rendered in HTML, then the HTML code is put in a \verbatim{} environment (see, e.g. Timestamp in WikiFormatting.pdf)
• subscript/superscript
• These are implemented using math mode, making them very sensitive for anything much more complicated than plain text
• The Implementation
• probably needs cleaning up
• needs a more rigorous approach to escaping of LaTeX specials like %, _, #, \$, etc.
• The Future
• Math formulae using LaTeX
• This is suggested on TracHacks
• Perhaps an opportunity to test the MIME detection/flavours for Macros..
• But the suggested solution on TracHacks is not very clean at the moment
• Testing
• This probably needs a test suite, which would be markedly different to the one for the HTML Wiki, testing handling of LaTeX specials, etc.
• Unicode
• LaTeX does not like Unicode. The preferred way to write é, for example, is \'{e}, but there is no solid mapping between Unicode and LaTeX escapes like this. Currently, if Unicode is encountered, an exception is raised (not deliberately; some other bug is at work here), and the line is output as <bad unicode on this line> or similar
• Trent.
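Three of the items in the list above (escaping LaTeX specials, turning " into `` or '', and the heading mapping) can be sketched in a few lines. This is a minimal Python 3 illustration and not the code in the attached wikilatex.py. The names escape_latex, texify_quotes and heading_to_latex are hypothetical, and the quote rule (a quote opens at the start of the text or after whitespace or an opening bracket) is an assumption.

```python
import re

# Characters that LaTeX treats specially, mapped to escaped forms.
_LATEX_SPECIALS = {
    '\\': r'\textbackslash{}',
    '{': r'\{',
    '}': r'\}',
    '%': r'\%',
    '_': r'\_',
    '#': r'\#',
    '$': r'\$',
    '&': r'\&',
    '~': r'\textasciitilde{}',
    '^': r'\textasciicircum{}',
}
_SPECIALS_RE = re.compile(r'[\\{}%_#$&~^]')


def escape_latex(text):
    """Escape specials in one pass so inserted braces are not re-escaped."""
    return _SPECIALS_RE.sub(lambda m: _LATEX_SPECIALS[m.group(0)], text)


def texify_quotes(text):
    """Turn "..." into ``...'' using a simple positional rule."""
    # A quote at the start, or after whitespace or an opening bracket, opens.
    text = re.sub(r'(?:^|(?<=[\s(\[]))"', '``', text)
    # Every remaining double quote is treated as a closing quote.
    return text.replace('"', "''")


_HEADING_COMMANDS = {1: r'\subsection', 2: r'\subsubsection'}


def heading_to_latex(line):
    r"""Map =h1= to \subsection, ==h2== to \subsubsection, deeper to \subsubsection*."""
    match = re.match(r'^(=+)\s*(.*?)\s*\1\s*$', line)
    if not match:
        return None  # not a heading line
    depth = len(match.group(1))
    command = _HEADING_COMMANDS.get(depth, r'\subsubsection*')
    return '%s{%s}' % (command, escape_latex(match.group(2)))
```

Escaping in a single regex pass avoids the ordering problems of chained replace() calls, where the backslashes and braces introduced by one replacement would be escaped again by the next.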

comment:13 Changed 10 years ago by Christian Boos

alect: I've tested the patch, and even created a sample Ticket→Excel converter, see #2669; it works great (it only needed a small generalization of the export_csv methods)

comment:14 Changed 10 years ago by Christian Boos

Trent: I've looked at the WikiFormatting.pdf (despite the difficulty of accessing it, because of the #2974 issue…) and it looks promising. However, the current approach is a bit heavyweight, as you are forced to reimplement most of the Formatter logic. It would be much better if there were a cleaner separation of the parsing and formatting methods within the Formatter class. E.g. the _xxx_formatter methods could be renamed _parse_xxx and would call format_this or format_that as appropriate.
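To make the suggested split concrete, here is a toy Python 3 sketch, not Trac's actual Formatter. It assumes a base class that owns the regex matching and calls format_* hooks, so each output format only overrides the hooks. WikiParser, HtmlFormatter, LatexFormatter and the hook names are all hypothetical.

```python
import re


class WikiParser:
    """Parsing lives here; subclasses only supply the format_* hooks."""

    _inline_re = re.compile(r"'''(?P<bold>.+?)'''|''(?P<italic>.+?)''")

    def render(self, text):
        out, pos = [], 0
        for match in self._inline_re.finditer(text):
            out.append(self.format_text(text[pos:match.start()]))
            out.append(self._parse_inline(match))
            pos = match.end()
        out.append(self.format_text(text[pos:]))
        return ''.join(out)

    def _parse_inline(self, match):
        # Decide what was matched, then delegate how it is written out.
        if match.group('bold') is not None:
            return self.format_bold(self.format_text(match.group('bold')))
        return self.format_italic(self.format_text(match.group('italic')))


class HtmlFormatter(WikiParser):
    def format_text(self, text):
        return text.replace('&', '&amp;').replace('<', '&lt;')

    def format_bold(self, body):
        return '<strong>%s</strong>' % body

    def format_italic(self, body):
        return '<i>%s</i>' % body


class LatexFormatter(WikiParser):
    def format_text(self, text):
        # Only two specials are handled in this sketch.
        return text.replace('%', r'\%').replace('_', r'\_')

    def format_bold(self, body):
        return r'\textbf{%s}' % body

    def format_italic(self, body):
        return r'\emph{%s}' % body
```

With this shape, a LaTeX exporter would not need to copy any of the matching logic, which is the duplication the comment above is concerned about.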

Changed 10 years ago by Alec Thomas

New patch, but interface now returns the default extension for each conversion

comment:15 Changed 10 years ago by Alec Thomas

Slightly updated version of the IContentConverter interface committed in r3305 and r3306.

I think we can close this now that it can be implemented as a plugin?

Changed 9 years ago by Trent Apted <tapted@…>

Plugin, built against r3425 [issues]

comment:16 Changed 9 years ago by Trent Apted <tapted@…>

OK, I've implemented it as a plugin (which can probably migrate to trac-hacks), so this ticket can probably be closed. But there are some other issues that implementing this has highlighted. These issues may have arisen because this 'plugin' is still in essence a hack, and they come in light of Christian's comment about the duplicated functionality in the latex formatter vs the html formatter that comes with Trac.

The change from 0.8/0.9 to 0.10 that puts HTML formatting for tickets, changesets, etc. into the extension architecture (ExtensionPoint(IWikiSyntaxProvider) and friends) means that I can no longer hook into the regex bindings for these the way I could for the versions of wikilatex implemented for 0.8 and 0.9. So, for now, I have just disabled them (maybe this is not such an issue for LaTeX, but it would be nice to have a footnote with the ticket description or changeset comment, as I did for 0.8). That is, I am no longer using wiki.rules, but generating my own that don't include those inserted by extensions (which includes ticket, changeset, etc.):

# Use only the built-in rules, skipping those contributed by extensions
helpers = []
syntax = Formatter._pre_rules[:]
syntax += Formatter._post_rules[:]
helper_re = re.compile(r'\?P<([a-z\d_]+)>')
for rule in syntax:
    helpers += helper_re.findall(rule)[1:]
self.myrules = re.compile('(?:' + '|'.join(syntax) + ')')


(borrowed from trunk/trac/wiki/api.py)
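For readers unfamiliar with this pattern, the following standalone Python 3 sketch shows what the helper extraction does on a miniature rule set. The three rules here are invented for illustration and are much simpler than the real Formatter._pre_rules and _post_rules. matched_rule is a hypothetical helper.

```python
import re

# Hypothetical miniature rule set; each rule is one top-level named group.
syntax = [
    r"(?P<bold>''')",
    r"(?P<italic>'')",
    r"(?P<heading>^(?P<hdepth>=+)\s.*\s(?P=hdepth)$)",
]

helpers = []
helper_re = re.compile(r'\?P<([a-z\d_]+)>')
for rule in syntax:
    # The first named group is the rule itself; any further ones are helpers.
    helpers += helper_re.findall(rule)[1:]

# One alternation of all rules, as in the snippet above.
rules = re.compile('(?:' + '|'.join(syntax) + ')')


def matched_rule(match):
    """Return the name of the top-level rule that fired, skipping helper groups."""
    for name, value in match.groupdict().items():
        if value is not None and name not in helpers:
            return name
    return None
```

Knowing which names are helpers lets a dispatcher ignore sub-groups such as hdepth and call only the handler for the rule that actually matched.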

If we want the Formatter logic to be reusable for extensions, then there might need to be a clean way of overriding these or hooking into the functionality (maybe there is and I'm missing it). In any case, it quickly becomes messy because there is no way to anticipate which extensions have been overridden and will start trying to feed 'Element' objects to the processor, rather than strings. For wikilatex to work reliably, all of these would have to be overridden. Also, as Christian points out, the _parsing_ logic in Formatter should be able to be reused. Maybe this should be a new ticket. On that, after all this hacking, I am really starting to hate regex. I would suggest a recursive descent parser to generate a nice parse tree that could be passed to these content converters, but I don't think the grammar is context-free so this would have issues.

So where does this ticket stand?

With the current API, I don't think that Formatter can be reused, and for this plugin to remain maintainable, I feel as though I would have to write all the parsing logic from scratch. That's fair enough, I suppose, because inheriting from Formatter is what makes this a hack rather than a plugin, and I don't think the Formatter is an official API. So, perhaps if I run out of other work to do or feel like procrastinating, wikilatex will become the first attempt at implementing the wiki formatter as a recursive descent parser that can later be incorporated into the core to allow the parse tree to be used by other output generators (but I wouldn't be so presumptuous as to suggest that it be used for the main HTML Formatter, which may be more suited to the regex implementation).
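As a very small illustration of the parse-tree idea, here is a Python 3 sketch covering only nested '''bold''' and ''italic'' markup. It is not the wikilatex plugin, and it sidesteps the context-sensitive cases mentioned above. parse, _parse_until and to_latex are hypothetical names.

```python
import re

# Tokens: bold marker, italic marker, a run of plain text, or a lone quote.
_TOKEN_RE = re.compile(r"'''|''|[^']+|'")


def parse(text):
    """Parse emphasis markup into a tree of (kind, payload) tuples."""
    tokens = _TOKEN_RE.findall(text)
    nodes, _pos = _parse_until(tokens, 0, None)
    return nodes


def _parse_until(tokens, pos, closer):
    """Collect nodes until the closing marker (or end of input) is reached."""
    nodes = []
    while pos < len(tokens):
        token = tokens[pos]
        if token == closer:
            return nodes, pos + 1
        if token in ("'''", "''"):
            kind = 'bold' if token == "'''" else 'italic'
            children, pos = _parse_until(tokens, pos + 1, token)
            nodes.append((kind, children))
        else:
            nodes.append(('text', token))
            pos += 1
    return nodes, pos  # unterminated markup is closed at end of input


def to_latex(nodes):
    """One possible output generator walking the tree."""
    commands = {'bold': r'\textbf{%s}', 'italic': r'\emph{%s}'}
    parts = []
    for kind, payload in nodes:
        if kind == 'text':
            parts.append(payload)
        else:
            parts.append(commands[kind] % to_latex(payload))
    return ''.join(parts)
```

An HTML generator would be a second function walking the same tree, which is what would let several output formats share one parser.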

comment:17 Changed 9 years ago by Christian Boos

• Resolution set to wontfix
• Status changed from assigned to closed

Making the Wiki parser reusable by separating the parsing and formatting steps, and using a recursive descent parser instead of a regexp-based engine are two different things. The former can be achieved without the latter, and I have the feeling that getting away from regexps will be bad in terms of performance (see Trac-Dev:316 and the following DrProject's blog entry), and make it less flexible for introducing new constructions and being extensible by plugins.

Also, a Wiki engine is different than a parser for a programming language, as it parses text meant to be read by humans ;)

For Trent's plugin, see TracHacks:PageToLatexPlugin.

comment:18 in reply to: ↑ description Changed 3 years ago by anonymous

I did the recursive descent approach. Now there is a tool that generates LaTeX and PDF files:

It is available in binary form for Windows and Ubuntu Linux. Furthermore, the source code is available under GPL 2. It works from the client side and does not require any changes to the installation on the servers. So you essentially have the requested feature now.
