#5241 closed defect (wontfix)

Conversion of MediaWiki database to Trac Wiki database.

Reported by:	Koen Werdler <werdlerk@…>	Owned by:	Jonas Borgström
Priority:	normal	Milestone:
Component:	general	Version:
Severity:	normal	Keywords:	trac faq wiki mediawiki
Cc:	rganz@…, dclark@…	Branch:
Release Notes:
API Changes:
Internal Changes:

Description

The script provided on the wiki:TracFaq page doesn't work for MediaWiki 1.5 since the database structure has been changed.

Attached is the script I used to export the pages from Mediawiki and then import them into Trac.

Could someone add this to the FAQ for me? I don't know how to do that myself.

Thanks

Attachments (5)

mediawiki2trac.ph (2.1 KB) - added by Koen Werdler <werdlerk@…> 18 years ago: Modified MediaWiki 1.5 to Trac script (original from wiki:TracFaq)
mediawiki2trac.py (4.9 KB) - added by jason.dusek@… 17 years ago: This script handles revisions, User: pages and Talk: pages. It exports to SQL.
mediawiki2trac.2.py (5.3 KB) - added by jason.dusek@… 17 years ago: This version translates links correctly.
mediawiki2trac.3.py (5.2 KB) - added by jason.dusek@… 17 years ago: This one does links even a little more nicely.
mw2tw.py (6.3 KB) - added by jason.dusek@… 17 years ago: Grabs image data and puts it into the Trac tables. Links are more nicely formatted.


Change History (18)

by Koen Werdler <werdlerk@…>, 18 years ago

Attachment:	mediawiki2trac.ph added

Modified MediaWiki 1.5 to Trac script (original from wiki:TracFaq)

comment:1 by Koen Werdler <werdlerk@…>, 18 years ago

mediawiki2trac.ph should've been named mediawiki2trac.py :/

comment:2 by cobwebsmasher@…, 18 years ago

Thanks for this script. This is actually useful to me as I have an existing Mediawiki installation that folks want as part of Trac now.

Question: This code doesn't appear to address attachments. Is this a known issue or does trac-admin import mystically take care of this somehow?

comment:3 by anonymous, 18 years ago

Cc:	werdlerk@… added

comment:4 by cobwebsmasher@…, 18 years ago

Owner:	changed from Jonas Borgström to anonymous
Status:	new → assigned

OK. Well I've done some research about attachments.

MediaWiki handles attached files in a completely different way than Trac does. In Trac, attached files are associated with a given page, whereas in MediaWiki files are simply uploaded and it's up to the wiki editors to create links to the uploaded documents.

So, the process for dealing with "attachments" depends on a couple of factors.

If you want your uploaded documents to be unique in Trac like they are in MediaWiki, then that's a problem. AFAIK, uniqueness in MediaWiki is based on the filename, whereas in Trac it's a combination of the filename and the wiki page the attachment is associated with.

So if you have an attachment that is linked from multiple places in your MediaWiki pages, then to get the same effect you'd either have to set up an independent web location for these attachments (so everything links to the same file) or give up on the notion that the file is unique and accept attachments that are copies of the file in question.

Another interesting bit is the hash encoding that can occur with how Mediawiki stores the actual files.

I haven't looked into this supremely closely, but it appears that storing the files can be toggled between two methods. The method that is enabled on our MediaWiki instance is based on the following:

Attached files are stored in an images/ folder (and possibly a media/ folder; we don't have one, but our version is 1.6 and we only have the images/ folder to store uploads).

The subfolders an attachment is stored in are derived from the md5 hash of the filename: the first hex character names the top-level folder and the first two characters name the folder inside it. You can derive the file path using the following SQL query:

select 
img_name, 
concat(left(md5(img_name),1), '/', left(md5(img_name),2), '/', img_name) as path 
from image;
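
The same path can also be computed without touching the database. This is only a minimal Python sketch, assuming the hashed upload layout described above; the function name `hashed_upload_path` is made up for illustration and does not come from any of the attached scripts:

```python
import hashlib


def hashed_upload_path(img_name):
    """Return an upload's path relative to images/, mirroring the SQL above."""
    digest = hashlib.md5(img_name.encode("utf-8")).hexdigest()
    # First hex character, then the first two hex characters, then the file name.
    return "/".join([digest[0], digest[:2], img_name])
```

For example, `hashed_upload_path("hello")` returns `"5/5d/hello"`, because the md5 hex digest of `hello` starts with `5d`.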

So at this point, I'm going to modify the script you have here to include the possibility of bringing attachments along, moving the downloaded files to the appropriate trac directory and updating the trac database accordingly.

See you back here in a few minutes.

comment:5 by Emmanuel Blot, 18 years ago

Owner:	changed from anonymous to Jonas Borgström
Status:	assigned → new

comment:6 by anonymous, 17 years ago

Cc:	rganz@… added

by jason.dusek@…, 17 years ago

Attachment:	mediawiki2trac.py added

This script handles revisions, User: pages and Talk: pages. It exports to SQL.

by jason.dusek@…, 17 years ago

Attachment:	mediawiki2trac.2.py added

This version translates links correctly.

by jason.dusek@…, 17 years ago

Attachment:	mediawiki2trac.3.py added

This one does links even a little more nicely.

comment:7 by jason.dusek@…, 17 years ago

Summary:	Modified mediawiki 2 trac script → Image grabbing `sh` code.

I found out how to get the images and put them in an "Image page" that is kind of like the MediaWiki image page. I'll post the Python for generating the database part of that in a minute. In the meantime, I'll post the sh I used to gather and organize the images:

 :; find <your mediawiki installation>/images -type f > image-like-files 
 :; egrep -v 'archive|README' image-like-files > f
 :; cat f | sed -r "s|^.+/([^/]+)$|mkdir 'Image/\1' \&\& cp '&' 'Image/\1/\1'|" | sh
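
For anyone without a suitable `sed`, roughly the same gathering step can be sketched in Python. This is an approximation of the three commands above, not a tested replacement: the function name `collect_images` and its arguments are made up for illustration, the filter is applied to the path below images/ rather than to the full path, and the destination folder is created if it is missing.

```python
import shutil
from pathlib import Path


def collect_images(mediawiki_images, dest="Image"):
    """Copy each upload to <dest>/<name>/<name>, skipping archive and README entries."""
    src = Path(mediawiki_images)
    dest = Path(dest)
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(src).as_posix()
        # Same filter as the egrep above, applied to the path below images/.
        if "archive" in relative or "README" in relative:
            continue
        target_dir = dest / path.name
        if target_dir.exists():
            # Like `mkdir ... && cp`, leave an already existing entry alone.
            continue
        target_dir.mkdir(parents=True)
        shutil.copy(path, target_dir / path.name)
        copied.append(path.name)
    return copied
```

Calling `collect_images("<your mediawiki installation>/images")` returns the list of file names that were copied.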

by jason.dusek@…, 17 years ago

Attachment:	mw2tw.py added

Grabs image data and puts it into the Trac tables. Links are more nicely formatted.

comment:8 by werdlerk@…, 17 years ago

Cc:	werdlerk@… removed

comment:9 by jason.dusek@…, 17 years ago

Summary:	Image grabbing `sh` code. → Conversion of MediaWiki database to Trac Wiki database.

Still not really done, alas. The data store in MediaWiki is designed to be used, not transformed. What I've worked out so far:

Recovery of User:, Talk: and Image: pages.
Recovery of image metadata.
Recovery of revision history.
Reformatting of links, so they are both functional and attractive.

What I have not done:

The User_talk: pages are omitted.
Any HTML code in the MediaWiki documents is simply ignored.
Page moves and archived images are ignored.
Anonymous users, with just an IP address, are ignored.

If you ever find yourself afflicted with this task, you have my blessing.

comment:10 by dclark@…, 16 years ago

I was getting:

Traceback (most recent call last):
  File "./mw2tw.py", line 256, in <module>
    db.query(query)
_mysql_exceptions.OperationalError: (1054, "Unknown column 'cs_page.page_title' in 'on clause'")

with mysqld 5.0.51a - putting the FROM clause stuff in parens seems to have fixed the problem:

mw2tw.py

-              old
+              new
         ${p}image.img_size,
         ${p}page.page_namespace,
         ${p}revision.rev_page
-      FROM
+      FROM (
         ${p}page,
         ${p}revision,
         ${p}user,
-        ${p}text
+        ${p}text )
       LEFT JOIN ${p}image ON
         ${p}page.page_title = ${p}image.img_name
       WHERE
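
A likely explanation for why the parentheses help: since MySQL 5.0.12, JOIN binds more tightly than the comma operator, so in `a, b LEFT JOIN c ON a.x = c.y` the ON clause only sees `b` and `c`. Wrapping the comma-separated tables in parentheses makes them all visible again. Below is a heavily simplified Python sketch of that query shape. The selected columns and the WHERE condition are assumptions chosen for illustration, not the real query from mw2tw.py, and `build_query` is a made-up name; only the `${p}` prefix placeholder and the `cs_` prefix come from the ticket.

```python
from string import Template

# Simplified stand-in for the real query: the point is the parenthesized
# table list in front of the LEFT JOIN.
QUERY = Template("""
    SELECT ${p}page.page_title, ${p}image.img_size
    FROM ( ${p}page, ${p}revision )
    LEFT JOIN ${p}image ON ${p}page.page_title = ${p}image.img_name
    WHERE ${p}page.page_id = ${p}revision.rev_page
""")


def build_query(prefix):
    """Fill in the table prefix, e.g. 'cs_' as in the traceback above."""
    return QUERY.substitute(p=prefix)
```

With `build_query("cs_")`, the ON clause refers to `cs_page.page_title`, which is now inside the parenthesized group that the LEFT JOIN is applied to.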

comment:11 by dclark@…, 16 years ago

Cc:	dclark@… added

Some more patches (including the one above). They translate some more syntax; make MediaWiki headings like =heading= (no spaces) work; follow the MediaWiki convention that the first letter of a title is always capitalized; and use spaces instead of underscores in wiki links.

mw2tw.py

-              old
+               (this hunk was shorter than expected)
 pairs = [
     ("\n***","\n   *"),
     ("\n**", "\n  *"),
     ("\n*",  "\n *"),
+    ("\n#",  "\n 1."),
     ("<br>","[[BR]]"),
     ("\n:","\n "),
+    ("<pre>","{{{"),
+    ("</pre>","}}}"),
+    ("<code>","{{{"),
+    ("</code>","}}}"),
+    ]
+repairs = [
+    (r"(\=)([^\=]+)\=(\n)",r"\1 \2 \1\3"),
+    (r"(\=\=)([^\=]+)\=\=(\n)",r"\1 \2 \1\3"),
+    (r"(\=\=\=)([^\=]+)\=\=\=(\n)",r"\1 \2 \1\3"),
+    (r"(\=\=\=\=)([^\=]+)\=\=\=\=(\n)",r"\1 \2 \1\3"),
+    (r"(\=\=\=\=\=)([^\=]+)\=\=\=\=\=(\n)",r"\1 \2 \1\3"),
+    (r"(\=\=\=\=\=\=)([^\=]+)\=\=\=\=\=\=(\n)",r"\1 \2 \1\3"),
+    ]
 wiki_link_catcher = re.compile(r"""
 …
 def link_rewriter(match):
     (link, label) = match.group(1, 3)
     def wrap(a, b=()):
-        return '[wiki:' + a.replace(' ', '_') + ' ' + (b or a) + ' ]'
+        return '[wiki:"' + (a[0].upper() + a[1:]).replace('_', ' ') + '" ' + (b or a) + ']'
     if link.startswith("Image:"):
         return '[[Image(wiki:Image/' + link[6:] + ':' + link[6:] + ')]]'
     return wrap(link, label)
 …
     """ convert from mediawiki text to trac text """
     for (mw, tw) in pairs:
         mw_text = mw_text.replace(mw, tw)
+    for (mw, tw) in repairs:
+        #print >> sys.stderr, mw_text
+        mw_text = re.sub(mw, tw, mw_text)
     return q(wiki_link_catcher.sub(link_rewriter, mw_text))
 def title_fixer(namespace, title):
+    title = (title[0].upper() + title[1:]).replace('_', ' ')
+    #print title
     if namespace is 0:
         return q(title)
     if namespace is 1:
 …
         ${p}image.img_size,
         ${p}page.page_namespace,
         ${p}revision.rev_page
-      FROM
+      FROM (
         ${p}page,
         ${p}revision,
         ${p}user,
-        ${p}text
+        ${p}text )
       LEFT JOIN ${p}image ON
         ${p}page.page_title = ${p}image.img_name
       WHERE
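
The heading and title handling in this patch can be tried in isolation. The following is a sketch of a variant, not the code from the patch: it uses one back-referencing pattern instead of six, anchors at the start of a line, and does not require a trailing newline. The names `space_headings` and `normalize_title` are made up for illustration.

```python
import re

# One to six '=' signs, the heading text, then the same run of '=' again.
HEADING = re.compile(r"^(={1,6})[ \t]*([^=\n]+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def space_headings(text):
    """Turn '==Heading==' into '== Heading ==' so Trac recognizes it."""
    return HEADING.sub(r"\1 \2 \1", text)


def normalize_title(title):
    """Capitalize the first letter and use spaces instead of underscores."""
    return (title[:1].upper() + title[1:]).replace("_", " ")
```

For example, `space_headings("==Foo==\n")` gives `"== Foo ==\n"` and `normalize_title("main_page")` gives `"Main page"`. Headings that already have spaces are left unchanged.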

comment:12 by Christian Boos, 15 years ago

Milestone:	not applicable
Resolution:	→ wontfix
Status:	new → closed

Maybe time to create a NewHack from this script.

comment:13 by hinnerk.bruegmann@…, 15 years ago

In case someone has any use for this: I ported the above to PHP, adding the import features I missed in the Python version. See the features and the install/usage guide at:

http://consense-project.com/blog/mediawiki_export_to_trac-wiki
