Brion Vibber wrote:
> Tim's started a script for this, it's in the maintenance directory in
> CVS. This is the development version of MediaWiki and is not directly
> usable yet on current page download dumps as the database format has
> changed.
I guess I'd better say a few words about it, since the topic keeps
coming up. The script produces quite nice HTML dumps, with HTML files
distributed between directories identified by the first two bytes. The
link URLs are rewritten appropriately. This is useful for English wikis
since it allows you to guess the URL, but it doesn't work so well for
languages with a different orthography. Splitting by character rather
than by byte could be useful; however, that would give a much broader
tree, especially for the CJK languages.
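To illustrate the idea (this is a hypothetical sketch, not the actual layout the script produces): nesting output directories by the first two bytes of the UTF-8 title splits evenly regardless of language, at the cost of unguessable hex directory names for non-ASCII titles.

```python
def dump_path(title: str, root: str = "articles") -> str:
    """Map a page title to an output path, nesting one directory per
    leading byte of the UTF-8 title (a sketch; names and layout are
    assumptions, not what HTMLDump.php does)."""
    name = title.replace(" ", "_")
    raw = name.encode("utf-8")
    # Hex-encode the bytes so multi-byte characters (CJK etc.) still
    # distribute across directories instead of collapsing into a few.
    d1 = format(raw[0], "02x")
    d2 = format(raw[1], "02x") if len(raw) > 1 else d1
    return f"{root}/{d1}/{d2}/{name}.html"

print(dump_path("Apple"))  # → articles/41/70/Apple.html
```

Splitting by character instead would mean deriving `d1`/`d2` from `name[0]` and `name[1]`, which keeps English URLs guessable but gives each CJK character its own branch, hence the much broader tree.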
It's currently clumsy to use, requiring you to move HTMLDump.php from
skins/disabled/ to skins/, which unfortunately enables the skin in the
user interface. Any user who changed their skin to it would find that
there are no user preference links allowing them to get back (the
interface is greatly stripped down). We need to distinguish between
skins that are installed and skins that users can select in preferences.
It rewrites stylesheet and image URLs to relative URLs in hard-coded
paths (../../../images and ../../../skins). This needs to be made more
flexible. It also doesn't yet rewrite URLs for images from Commons, or
provide a way to package those images.
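Making the rewriting flexible mostly means computing the `../` prefix from the page's own nesting depth instead of hard-coding it. A minimal sketch (the function names and the two-level layout are my assumptions, not the script's code):

```python
def relative_prefix(page_path: str) -> str:
    """Return the ../ run needed to climb from a dumped page back to
    the dump root, based on how deeply the page is nested."""
    depth = page_path.count("/")  # number of directories above the file
    return "../" * depth

def rewrite_url(abs_url: str, page_path: str) -> str:
    # Hypothetical helper: turn an absolute /skins/... or /images/...
    # URL into one relative to the page, replacing the hard-coded
    # ../../../skins and ../../../images paths.
    return relative_prefix(page_path) + abs_url.lstrip("/")

print(rewrite_url("/skins/monobook/main.css", "41/70/Apple.html"))
# → ../../skins/monobook/main.css
```

With this, moving pages to a deeper or shallower tree (byte- vs character-based, say) would not require touching the stylesheet rewriting at all.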
So it's a start, but there are still plenty of things to do. Luckily,
porting the parser to a different language isn't one of them. I'm not
working on it at the moment, so I won't mind if someone picks it up
where I left off.
-- Tim Starling