-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
As some of you may know, a German company (DirectMedia) is currently
working on publishing a snapshot of the German Wikipedia on DVD. They
use XML as their internal format.
To help them out, I patched together a database-to-XML converter. Turns
out, though, that they've just written their own.
So, this converter is sitting there [1] without a job ;-)
If you have a Windoze installation running, with access to a MediaWiki
(pre-1.5) database, you might want to try it. It takes the database
parameters and generates a single XML file (out.xml in the installation
directory) containing the XML of all the articles in the main namespace
from that database. It can also automatically resolve templates, if
desired. Installing and running it is dead easy.
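For anyone wondering what resolving templates amounts to, here is a minimal Python sketch. It is not the converter's actual code. It assumes templates are plain parameterless {{Name}} inclusions and that the template texts have already been loaded into a dict; the function name and the depth limit are made up for illustration.

```python
import re

# Matches a parameterless inclusion such as {{Stub}}; templates with
# arguments ({{Foo|bar}}) are deliberately left untouched in this sketch.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\}\}")


def resolve_templates(text, templates, max_depth=5):
    """Replace {{Name}} with templates[Name], repeating for nested inclusions."""

    def substitute(match):
        name = match.group(1).strip()
        # Unknown templates stay as-is rather than vanishing.
        return templates.get(name, match.group(0))

    for _ in range(max_depth):
        expanded = TEMPLATE_RE.sub(substitute, text)
        if expanded == text:  # nothing left to expand
            break
        text = expanded
    return text
```

The depth limit is only there so that a template including itself cannot loop forever.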
It does *not* use the Flex/Bison-parser (I have developed a slight Bison
allergy when exposed to it;-) but works quite well otherwise. It also
generates XML extremely similar to what the Bison parser would produce.
I ran it on about 10000 articles from a de database dump, and the
generated XML was valid.
The software is GPL, but I haven't uploaded the source anywhere yet. I
can mail you source and instructions if desired.
A slight variation of this software might be quite useful for a number
of projects. For example, it could easily be altered to take a number of
article titles and generate XML only for these. The XML could then be
converted to PDF or whatever.
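A rough Python sketch of that title-list variation, just to illustrate the idea. The element names (articles, article) and the lookup callable are assumptions for the example, and it simply wraps the raw wikitext, whereas the real converter emits parsed XML.

```python
import xml.etree.ElementTree as ET


def articles_to_xml(titles, lookup):
    """Build an XML document holding only the requested articles.

    lookup is any callable mapping a title to its wikitext, or None if the
    article does not exist (in real use this would be a database query).
    """
    root = ET.Element("articles")
    for title in titles:
        text = lookup(title)
        if text is None:
            continue  # skip titles missing from the database
        article = ET.SubElement(root, "article", title=title)
        article.text = text  # raw wikitext, escaped by ElementTree
    return ET.tostring(root, encoding="unicode")
```

With a plain dict standing in for the database, articles_to_xml(["Berlin"], db.get) would return one document containing just that article.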
Just letting you know,
Magnus
[1]
http://www.magnusmanske.de/stuff/StaticWikiInstaller.exe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (MingW32)
Comment: Using GnuPG with Thunderbird -
http://enigmail.mozdev.org
iD8DBQFCLW3OCZKBJbEFcz0RArDCAJ9sndURQ3bt/Mt9TDaCNz2aMOjIZQCfS78E
8Q5m+vQ+hFB+6QHwqOsMBsw=
=DKB3
-----END PGP SIGNATURE-----