#2296 closed enhancement (wontfix)
Export wiki pages to latex
Reported by: | | Owned by: | Alec Thomas |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | wiki system | Version: | devel |
Severity: | normal | Keywords: | mimetype converter |
Cc: | emilk@…, tapted@… | Branch: | |
Release Notes: | | | |
API Changes: | | | |
Internal Changes: | | | |
Description
It would be very useful to export wiki pages to LaTeX. A wiki is a great tool for brainstorming and planning. LaTeX is better for definitive printed texts. It would be cool if I could use wiki pages as a starting point for LaTeX articles.
Attachments (14)
Change History (32)
comment:1 by , 19 years ago
Component: | general → wiki |
---|---|
Priority: | normal → low |
comment:2 by , 19 years ago
comment:3 by , 19 years ago
Here is a first cut, against trac 0.9.3
```diff
--- /old/web_ui.py 2006-01-16 17:09:09.000000000 +1100
+++ /new/web_ui.py 2006-04-14 17:01:25.000000000 +1000
@@ -32,6 +32,7 @@
 from trac.web import IRequestHandler
 from trac.wiki.model import WikiPage
 from trac.wiki.formatter import wiki_to_html, wiki_to_oneliner
+from trac.wiki.wikilatex import wiki_to_latex
 
 
 class WikiModule(Component):
@@ -113,6 +114,12 @@
                 req.end_headers()
                 req.write(page.text)
                 return
+            if req.args.get('format') == 'latex':
+                req.send_response(200)
+                req.send_header('Content-Type', 'text/plain;charset=utf-8')
+                req.end_headers()
+                req.write(wiki_to_latex(page, self.env, req))
+                return
             self._render_view(req, db, page)
 
         req.hdf['wiki.action'] = action
@@ -358,6 +365,9 @@
         txt_href = self.env.href.wiki(page.name, version=version, format='txt')
         add_link(req, 'alternate', txt_href, 'Plain Text', 'text/plain')
 
+        latex_href = self.env.href.wiki(page.name, version=version, format='latex')
+        add_link(req, 'alternate', latex_href, 'LaTeX', 'text/plain')
+
         req.hdf['wiki'] = {'page_name': page.name, 'exists': page.exists,
                            'version': page.version, 'readonly': page.readonly}
         if page.exists:
```
and wikilatex.py attached
by , 19 years ago
Attachment: | wikilatex.py added |
---|
site-packages/trac/wiki/wikilatex.py patch for LaTeX export (first cut)
comment:4 by , 19 years ago
I should add that the 'first cut' is extremely hackish, and needs a lot of work. The idea, in this case, was to get a workable LaTeX file for the students to fix up themselves.
Note that this has the potential to replace the unattractive 'export to PDF' options that are touted on other tickets, which first convert to HTML, then directly to PDF.
I'll attach a sample DVI file to give an impression of what this looks like.
- Trent.
by , 19 years ago
Ignore the 0.8 stuff — this is just an old start page that was sitting in an upgraded Trac on my system. … need to fix quotes…
by , 19 years ago
Attachment: | wikilatex.2.py added |
---|
/trunk/trac/wiki/wikilatex.py LaTeX formatter (wiki export)
by , 19 years ago
Attachment: | wiki_latex_export.diff added |
---|
svn diff against trunk rev.3213@2006-04-19 09:59:20 UTC
comment:5 by , 19 years ago
Cc: | added |
---|---|
Priority: | low → normal |
Version: | 0.9 → devel |
Righto.
A lot of the Wiki markup makes no sense in LaTeX. It does "stuff" with most things though. I got it to a point where it works on trunk/trac/wiki/tests/wiki-tests.txt. That is, without breaking Trac or latex (although one of the tests results in a 'too deeply nested' error, which you can just batchmode through). The result is ugly, but no uglier than the html that Trac makes by default.
I've done this against the SVN trunk, and the changes are very isolated, so I see little reason for this not to be bumped in.
related tickets: #1468, #2207 and ticket 76 on trac-hacks.org
comment:6 by , 19 years ago
Keywords: | converter added |
---|---|
Owner: | changed from | to
This should be packaged as a plugin, and we would need to add an interface extension point for exporting to alternate formats.
We should first discuss this, I think.
```python
class IMIMETypeConverter:

    def get_supported_conversions(self):
        """Yield tuples corresponding to the supported export formats.

        Each tuple should be of the form
        `(key, name, in_mimetype, out_mimetype)`, e.g.
        ('latex', 'Wiki to LaTeX', 'text/x-trac-wiki', 'text/plain')
        """

    def convert(self, content, mimetype, key):
        """Perform the actual conversion of `content`.

        The actual MIMEType is given in `mimetype` and the
        conversion mode is the chosen `key`.
        The result should be a `(converted_content, out_mimetype)` pair.
        """
```
With this, the WikiModule could build a list of alternate download links corresponding to the `text/x-trac-wiki` converters, and then perform the conversion in a generic way.
Having this interface at the Mimeview level would make it possible to install a similar mechanism for alternate download formats in the attachment view and in the repository browser view.
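A minimal sketch of how that generic dispatch could look, written without Trac so that it runs standalone. The two converter classes, `alternate_links` and `convert` are hypothetical names invented for illustration; only the tuple shape and the `convert` return pair come from the interface proposed above.

```python
class LatexConverter:
    """Hypothetical converter; the body is a stand-in, not a real LaTeX export."""

    def get_supported_conversions(self):
        yield ('latex', 'Wiki to LaTeX', 'text/x-trac-wiki', 'text/plain')

    def convert(self, content, mimetype, key):
        return ('% converted from wiki\n' + content, 'text/plain')


class TicketCsvConverter:
    """Hypothetical converter for a different input type."""

    def get_supported_conversions(self):
        yield ('csv', 'Ticket to CSV', 'text/x-trac-ticket', 'text/csv')

    def convert(self, content, mimetype, key):
        return (content, 'text/csv')


def alternate_links(converters, in_mimetype):
    """Collect (key, name) pairs for every conversion accepting `in_mimetype`."""
    links = []
    for conv in converters:
        for key, name, in_type, _out_type in conv.get_supported_conversions():
            if in_type == in_mimetype:
                links.append((key, name))
    return links


def convert(converters, content, mimetype, key):
    """Find the converter registered for (`mimetype`, `key`) and run it."""
    for conv in converters:
        for k, _name, in_type, _out_type in conv.get_supported_conversions():
            if k == key and in_type == mimetype:
                return conv.convert(content, mimetype, key)
    raise LookupError('no converter for %r from %r' % (key, mimetype))
```

A wiki view would then only need `alternate_links(converters, 'text/x-trac-wiki')` to build its download links, and the attachment or browser views could call the same two functions with their own MIME types.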
comment:7 by , 19 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I've made an implementation based on your concept, and the one I added to #1468. This seems pretty clean to me, opinions?
I think this could supplant the existing IHTMLPreviewRenderer interface as well, either removing it entirely (not good for backwards compatibility) or adding an adaptor using the new interface (probably a better idea) for IMIMETypeConverters that convert to `text/html`.
This could also be used for adding CSV export for ticket data, which gets requested a bit on the IrcChannel and the MailingList; `text/x-trac-ticket` to `text/csv`.
comment:8 by , 19 years ago
And here's a quick example for converting Wiki text to text with the formatting stripped:
```python
from trac.core import *
from trac.wiki.formatter import wiki_to_html
from trac.mimeview.api import IMIMETypeConverter


class StrippedWikiConverter(Component):
    implements(IMIMETypeConverter)

    def get_mime_conversions(self):
        yield ('strippedtxt', 'Plain Text (no formatting)',
               'text/x-trac-wiki', 'text/plain', 9)

    def convert_mime_content(self, req, mimetype, content, key,
                             filename=None, url=None):
        return (wiki_to_html(content, self.env, req).plaintext(),
                'text/plain;charset=utf-8')
```
comment:9 by , 19 years ago
I've started to make some comments here, but they grew a bit too much, so I'll post them on Trac-Devel instead, in a couple of minutes.
by , 19 years ago
Attachment: | wikilatex.3.py added |
---|
Fixed some bugs. Tickets, reports, oneliners and revision logs and images are still broken (investigating)
by , 19 years ago
Attachment: | WikiFormatting.tex added |
---|
Result of trunk/wiki-default/WikiFormatting (LaTeX)
by , 19 years ago
Attachment: | WikiFormatting.pdf added |
---|
Typeset Result of trunk/wiki-default/WikiFormatting (PDF)
comment:12 by , 19 years ago
I've done some refinements to wikilatex.py. To be honest, I am still essentially hacking; I'm not yet familiar with the inner workings of Trac. When I have some more time I'll delve into the Trac code, clean up this stuff and see how this should fit in with the proposed IMIMETypeConverter.
Some comments on the LaTeX converter, as it currently stands (with reference to WikiFormatting.pdf):
- I think the typeset stuff looks pretty good
  - but I'm biased; does anyone else have an opinion?
- Keep in mind that the current idea is that each wiki page exported will probably be incorporated into some larger document
  - However, this should be an option flag to be passed somehow to the exporter
    - cgi request? LaTeX conversion staging page?
  - Hence:
- Handling section/subsection/subsubsection
  - Currently the page name is put into the \section{} at the start, but convention might mean that this is not right
  - Maybe we should rely on the wiki page to have a =Heading= at the start, to be promoted to \section{}
  - Otherwise =h1= now maps to \subsection{}, ==h2== to \subsubsection{} and ≥ ===h3=== to \subsubsection*{}
- Handling hyperlinks
  - since the Wiki is inherently hyperlinked, it makes sense to carry this over to the PDF
  - hence, hyperref is now part of the preamble, with some sensible options set
    - pdfauthor could be set to the login ID, perhaps (pass in context..)
  - Wiki links obviously cannot be resolved until the all-in-one document is generated (so they come up as \S{}??), but otherwise work, and are clickable thanks to hyperref
  - To delineate links in the printed version, they are currently underlined (but this is easily changed by tweaking the \anchortext command in the preamble)
    - If it is something other than an automatic CamelCase or http://www.example.com link, then a footnote is also created, showing where the link goes
  - If we _know_ hyperref is going to be used (currently it does not rely on any specials in the hyperref package; it just overrides builtins to make links in the PDF), then the \anchortext{} command could be adjusted to accept an [optional] argument to make the actual anchortext an active link (but we should probably still do the footnote method for the printed version)
- … about that preamble (see WikiFormatting.tex)
  - OK, so this is meant to be part of a larger document, so there should only be one preamble
  - But it's easy to delete a preamble (harder to make up your own), and this should help latex n00bs that just want a PDF
  - If this later goes the route required for #2207, individual preambles can probably be stripped automatically
- line separation of list items
  - the default in LaTeX has quite a large spacing between list items, which is maybe not what people expect
  - this can be adjusted in the preamble..
- Tickets
  - I've written a nice way of showing tickets for Trac v0.8 (which is what is running on a legacy system for students at my university), but the way tickets are handled in the formatter has changed quite drastically in v0.9
  - So this hasn't yet been implemented in the attached wikilatex.py (and actually causes some unicode freakout that I don't fully understand at this point)
  - Same goes for reports and revision logs
- Images
  - obviously, the Image cannot be embedded in the LaTeX, but we can make a figure float for .{png,jpeg,gif,etc} links (not yet implemented)
  - this requires the image to be downloaded separately and put somewhere that pdflatex can find it (I wouldn't recommend /usr/bin/latex or dvipdfm because they want images in eps format, and would need to be converted)
  - At some point there will need to be a 'LaTeX' flavour of the Image macro
- Quotes
  - double quote characters (") are TeXified to `` or '', according to a regex
- Oneliners
  - not yet properly implemented, but probably similar to the method for Wiki/HTML
- Tables
  - Tables in LaTeX are represented radically differently to HTML (and are crap, quite frankly)
    - e.g. you need a column description before the table starts, with number of columns, etc.
  - The current implementation is a hack, but behaves most of the time
- WikiProcessors
  - currently these are all dumped in a \verbatim{} environment
- WikiMacros
  - PageOutline is converted to \tableofcontents{}
  - BR/br is converted to \\ (or \vspace{1em} if "there is no line to end")
  - Others are rendered in HTML, then the HTML code is put in a \verbatim{} environment (see, e.g., Timestamp in WikiFormatting.pdf)
- subscript/superscript
  - These are implemented using math mode, making them very sensitive to anything much more complicated than plain text
- The Implementation
  - probably needs cleaning up
  - needs a more rigorous approach to escaping of LaTeX specials like %, _, #, $, etc.
- The Future
  - Math formulae using LaTeX
    - This is suggested on TracHacks
    - Perhaps an opportunity to test the MIME detection/flavours for Macros..
    - But the suggested solution on TracHacks is not very clean at the moment
  - Testing
    - This probably needs a test suite, which would be markedly different to the one for the HTML Wiki, testing handling of LaTeX specials, etc.
  - Unicode
    - LaTeX does not like unicode. The preferred way to do é, for example, is \'{e}, but there is no solid mapping between unicode and LaTeX escapes like this. Currently, if unicode is encountered, an exception is raised (not deliberately.. some other bug is at work here), and the line is output as <bad unicode on this line> or similar
- Trent.
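Several of the points above (escaping of LaTeX specials, quote handling, the accent problem and the heading mapping) can be illustrated with a small standalone sketch. This is not the code in the attached wikilatex.py; every function name here is hypothetical, the escape table is one common choice, and the accent handling covers only characters that decompose into a base letter plus one of five combining marks.

```python
import re
import unicodedata

# LaTeX specials and one common escape for each.
LATEX_SPECIALS = {
    '\\': r'\textbackslash{}', '~': r'\textasciitilde{}',
    '^': r'\textasciicircum{}', '&': r'\&', '%': r'\%', '$': r'\$',
    '#': r'\#', '_': r'\_', '{': r'\{', '}': r'\}',
}
_SPECIAL_RE = re.compile('|'.join(re.escape(c) for c in LATEX_SPECIALS))

# Combining marks mapped to the LaTeX accent command character.
ACCENTS = {'\u0301': "'", '\u0300': '`', '\u0302': '^',
           '\u0308': '"', '\u0303': '~'}

# Heading depth to sectioning command, following the mapping in the list
# above; anything deeper falls back to an unnumbered subsubsection.
HEADING_COMMANDS = {1: r'\subsection', 2: r'\subsubsection'}


def escape_latex(text):
    """Escape all specials in one pass so replacements are never re-escaped."""
    return _SPECIAL_RE.sub(lambda m: LATEX_SPECIALS[m.group(0)], text)


def texify_quotes(text):
    """A quote at the start, or after whitespace or a bracket, opens; others close."""
    text = re.sub(r'(?:^|(?<=[\s(\[]))"', '``', text)
    return text.replace('"', "''")


def accents_to_latex(text):
    """Rewrite e.g. é as \\'{e}; characters without a known accent pass through."""
    out = []
    for ch in text:
        parts = unicodedata.normalize('NFD', ch)
        if len(parts) == 2 and parts[1] in ACCENTS:
            out.append('\\%s{%s}' % (ACCENTS[parts[1]], parts[0]))
        else:
            out.append(ch)
    return ''.join(out)


def text_to_latex(text):
    # Order matters: escape first, then quotes, then accents, because the
    # accent commands themselves contain backslashes, braces and quotes.
    return accents_to_latex(texify_quotes(escape_latex(text)))


def heading_to_latex(line):
    """Map =h1=, ==h2==, ===h3=== (and deeper) to sectioning commands."""
    m = re.match(r'^(=+)\s*(.+?)\s*\1\s*$', line)
    if not m:
        return None
    command = HEADING_COMMANDS.get(len(m.group(1)), r'\subsubsection*')
    return '%s{%s}' % (command, text_to_latex(m.group(2)))
```

Characters that do not decompose this way are left untouched, so a real exporter would still need either a fuller table or a UTF-8 aware LaTeX setup.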
comment:13 by , 19 years ago
alect: I've tested the patch, and even created a sample Ticket→Excel converter, see #2669; it works great (it only needed a small generalization of the `export_csv` methods)
comment:14 by , 19 years ago
Keywords: | mimetype added |
---|
Trent: I've looked at the WikiFormatting.pdf (despite the difficulty of accessing it, because of the #2974 issue…) and it looks promising.
However, the current approach is a bit heavyweight, as you are forced to reimplement most of the Formatter logic. It would be much better if there were a cleaner separation of the parsing/formatting methods within the Formatter class. E.g. the `_xxx_formatter` methods could be renamed `_parse_xxx` and would call `format_this` or `format_that` as appropriate.
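A minimal sketch of that separation, assuming a toy grammar with only bold and italic markup. The class and method names are hypothetical and this is not Trac's actual Formatter; the point is only that the base class owns the regex and the `_parse_*` methods, while each output format overrides the `format_*` methods.

```python
import re
from html import escape


class WikiFormatter:
    """Owns the parsing; subclasses only decide how each construct is rendered."""

    _inline_re = re.compile(r"'''(?P<bold>.+?)'''|''(?P<italic>.+?)''")

    def format(self, text):
        out, pos = [], 0
        for m in self._inline_re.finditer(text):
            out.append(self.format_text(text[pos:m.start()]))
            # Dispatch on the name of the group that matched.
            out.append(getattr(self, '_parse_' + m.lastgroup)(m))
            pos = m.end()
        out.append(self.format_text(text[pos:]))
        return ''.join(out)

    def _parse_bold(self, m):
        return self.format_bold(self.format_text(m.group('bold')))

    def _parse_italic(self, m):
        return self.format_italic(self.format_text(m.group('italic')))


class HtmlFormatter(WikiFormatter):
    def format_text(self, s):
        return escape(s)

    def format_bold(self, s):
        return '<strong>%s</strong>' % s

    def format_italic(self, s):
        return '<i>%s</i>' % s


class LatexFormatter(WikiFormatter):
    def format_text(self, s):
        # Deliberately partial escaping, enough for the illustration.
        return s.replace('%', r'\%').replace('_', r'\_')

    def format_bold(self, s):
        return r'\textbf{%s}' % s

    def format_italic(self, s):
        return r'\emph{%s}' % s
```

With this split, a LaTeX exporter only has to supply the three `format_*` methods instead of reimplementing the parsing.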
by , 19 years ago
Attachment: | content-converter.diff added |
---|
New patch, but interface now returns the default extension for each conversion
comment:15 by , 19 years ago
comment:16 by , 19 years ago
OK, I've implemented it as a plugin (which can probably migrate to trac-hacks), so this ticket can probably be closed. But there are some other issues that implementing this has highlighted. These issues may have arisen because this 'plugin' is still in essence a hack, and they come in light of Christian's comment about the duplicated functionality in the latex formatter vs the html formatter that comes with Trac.
The change from 0.8/0.9 to 0.10 that puts HTML formatting for tickets, changesets, etc. into the extension architecture (ExtensionPoint(IWikiSyntaxProvider) and friends) means that I can no longer hook into the regex bindings for these the way I could for the versions of wikilatex implemented for 0.8 and 0.9. So, for now, I have just disabled them (maybe this is not such an issue for LaTeX, but it would be nice to have a footnote with the ticket description or changeset comment, as I did for 0.8). That is, I am no longer using wiki.rules, but generating my own that don't include those inserted by extensions (which includes ticket, changeset, etc.):
```python
syntax = Formatter._pre_rules[:]
syntax += Formatter._post_rules[:]
helper_re = re.compile(r'\?P<([a-z\d_]+)>')
for rule in syntax:
    helpers += helper_re.findall(rule)[1:]
self.myrules = re.compile('(?:' + '|'.join(syntax) + ')')
```
(borrowed from trunk/trac/wiki/api.py)
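For readers unfamiliar with the pattern, here is a standalone sketch of what the fragment does: each rule is a regex whose first named group identifies the rule, any further named groups are "helpers", and all rules are joined into one alternation. The three toy rules and the `classify` function are made up for illustration and are not Trac's real rule set.

```python
import re

# Toy rules: the first named group in each rule is the rule's own name.
rules = [
    r"(?P<bold>''')",
    r"(?P<italic>'')",
    r"(?P<heading>^(?P<hdepth>=+)\s.*\s(?P=hdepth)\s*$)",
]

# Collect helper group names: every named group after the first in a rule.
helper_re = re.compile(r'\?P<([a-z\d_]+)>')
helpers = []
for rule in rules:
    helpers += helper_re.findall(rule)[1:]

# One combined pattern; alternatives are tried in list order.
combined = re.compile('(?:' + '|'.join(rules) + ')')


def classify(line):
    """Return the rule name for each match, skipping helper groups."""
    found = []
    for m in combined.finditer(line):
        for name, value in m.groupdict().items():
            if value is not None and name not in helpers:
                found.append(name)
                break
    return found
```

Because only the rules in the list take part in the alternation, leaving the extension-provided rules out of `syntax` is what disables ticket and changeset links in the exporter.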
If we want the Formatter logic to be reusable for extensions, then there might need to be a clean way of overriding these or hooking into the functionality (maybe there is and I'm missing it). In any case, it quickly becomes messy because there is no way to anticipate which extensions have been overridden and will start trying to feed 'Element' objects to the processor, rather than strings. For wikilatex to work reliably, all of these would have to be overridden. Also, as Christian points out, the _parsing_ logic in Formatter should be able to be reused. Maybe this should be a new ticket. On that, after all this hacking, I am really starting to hate regex. I would suggest a recursive descent parser to generate a nice parse tree that could be passed to these content converters, but I don't think the grammar is context-free so this would have issues.
So where does this ticket stand?
With the current API, I don't think that Formatter can be reused, and for this plugin to remain maintainable, I feel as though I would have to write all the parsing logic from scratch. That's fair enough, I suppose, because inheriting from Formatter is what makes this a hack rather than a plugin, because I don't think the Formatter is an official API. So, perhaps if I run out of other work to do or feel like procrastinating, wikilatex will become the first attempt at implementing the wiki formatter as a recursive descent parser that can later be incorporated into the core to allow the parse tree to be used by other output generators (but I wouldn't be so presumptuous as to suggest that it be used for the main HTML Formatter, which may be more suited to the regex implementation).
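To make the idea concrete, a minimal sketch of a recursive descent parser that produces a parse tree, assuming a toy grammar with only nested bold and italic markup. The names are hypothetical, and it sidesteps the hard cases raised above: it only shows the tree being built once and then rendered by a separate function.

```python
import re

# Tokens: bold marker, italic marker, a run of plain text, or a lone quote.
TOKEN_RE = re.compile(r"'''|''|[^']+|'")


class Parser:
    MARKERS = {"'''": 'bold', "''": 'italic'}

    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def parse(self):
        return ('doc', self._inline(closing=None))

    def _inline(self, closing):
        """Parse children until the `closing` marker or the end of input."""
        children = []
        while self.pos < len(self.tokens):
            tok = self.tokens[self.pos]
            self.pos += 1
            if tok == closing:
                return children
            if tok in self.MARKERS:
                # Recurse: the same marker that opened the span closes it.
                children.append((self.MARKERS[tok], self._inline(closing=tok)))
            else:
                children.append(('text', tok))
        return children  # unterminated markup is closed at end of input


def to_latex(node):
    """One possible renderer over the tree; an HTML one would look the same."""
    kind, payload = node
    if kind == 'text':
        return payload
    body = ''.join(to_latex(child) for child in payload)
    return {'doc': '%s', 'bold': r'\textbf{%s}', 'italic': r'\emph{%s}'}[kind] % body
```

For example, `to_latex(Parser("x '''y ''z'' w''' v").parse())` walks the tree `doc > bold > italic` and emits nested `\textbf` and `\emph` commands.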
comment:17 by , 19 years ago
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
Making the Wiki parser reusable by separating the parsing and formatting steps, and using a recursive descent parser instead of a regexp-based engine are two different things. The former can be achieved without the latter, and I have the feeling that getting away from regexps will be bad in terms of performance (see Trac-Dev:316 and the following DrProject's blog entry), and make it less flexible for introducing new constructions and being extensible by plugins.
Also, a Wiki engine is different than a parser for a programming language, as it parses text meant to be read by humans ;)
For Trent's plugin, see TracHacks:PageToLatexPlugin.
comment:18 by , 13 years ago
I did the recursive descent approach. Now there is a tool that generates LaTeX and PDF files:
http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf
It is available in binary form for Windows and Ubuntu Linux. Furthermore, the source code is available under GPL 2. It works from the client side, and does not require any changes to the installation on the servers. So you essentially have the requested feature now.
I think this is an excellent idea, and intend to work on it myself as a patch for a system we are using to allow students to manage group software projects at the University of Sydney. Basically, a lot of their documentation is on the Wiki, but they need to generate a final report, ideally in LaTeX, but some use Word, which we try to discourage. An export to LaTeX would be a very convincing reason to (a) use LaTeX but also (b) maintain hyperlinked documentation on their wikis…