What's wrong with using dumpHTML.php?
On 10/27/05, Sebastian Albrecht <albrecht(a)fielax.de> wrote:
Hello,
Have you tried wget? Should work under Linux.
Sorry, I Don't Do Windows(tm)!
Me too ;)
This is what I've done so far, and it works for me:
It is a sh script; copy it into a file called wiki2html, make it
executable with chmod, and run it.
It fetches HTML files from a wiki using wget and also tries to grab a
few extra files, e.g. main.css and the logo. It then uses sed to replace
the absolute paths (/wiki/skins/...) in the CSS and JavaScript
references of the downloaded HTML pages. This is something wget won't
do, and it makes the result look a little better than the printable
format.
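The rewrite step can be sketched in isolation. A minimal example,
assuming a typical MediaWiki <link> tag (the snippet below is
illustrative, not taken from a real page):

```shell
# Hypothetical HTML line as MediaWiki might emit it, with an
# absolute skin path that breaks offline browsing:
html='<link rel="stylesheet" href="/wiki/skins/monobook/main.css">'

# The same substitution the script applies to every downloaded page:
# turn the absolute /wiki/skins/... path into a relative skins/... one.
echo "$html" | sed 's/\/wiki\/skin/skin/g'
# -> <link rel="stylesheet" href="skins/monobook/main.css">
```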
Please note that this script is quite specific to my personal wiki, so
have a look at it before using it yourself. The reject patterns in the
wget command can certainly be optimized.
DON'T try to run it against Wikipedia: it won't even dent their
servers, but it will kill your client.
Best regards,
Sebastian
#!/bin/sh
######################################################
#
# WIKI Export script - Wgets a wiki to static html.
#
######################################################
# Check input
if [ "$2" = "" ] ; then
echo "
$0 - Wgets a wiki to static html, 10/2005
This script does a wget to retrieve static html pages from a wiki.
Several typical wiki pages (edit, history, and special pages) are
excluded because they are unimportant for offline use.
URLs in the html pages are changed automatically so you can
browse the static wiki offline.
Usage:
$0 <URL_to_wiki> <destination_dir> [<recursive_depth> default=2]
Examples:
$0 http://url/wiki ./wiki
$0 http://url/wiki ./wiki 3
Requires:
sed, wget
"
exit 1
fi
# Define input variables
URL=$1
DEST_DIR=$2
DEST_DIR_COMPLETE=$DEST_DIR/`echo "$URL" | sed 's/[a-zA-Z]*:\/\///g'`
REC_LEVEL=$3
if [ -z "$3" ] || [ "$3" -le 0 ] ; then
REC_LEVEL=2
fi
# WGET pages recursively
echo "
> Getting wiki pages to static html...
> URL: $URL
> Destination: $DEST_DIR
"
wget \
-nv \
--convert-links \
--page-requisites \
--html-extension \
--recursive \
--level="$REC_LEVEL" \
--directory-prefix="$DEST_DIR" \
--reject "*edit*,*history*,*Spezial*,*oldid*" \
"$URL"
# Get main.css for having a nicer static wiki
echo "
> Trying to get some files for more beauty
(main.css, logo.png)...
"
wget \
-nv \
--directory-prefix="$DEST_DIR" \
--recursive \
--level=1 \
"$URL/skins/monobook/main.css"
wget \
-nv \
--directory-prefix="$DEST_DIR" \
--recursive \
--level=1 \
"$URL/skins/common/images/wiki.png"
# Find and replace absolute wiki css paths in static pages
echo "
> Replacing absolute wiki paths...
"
for FILE in "$DEST_DIR_COMPLETE"/*.html ; do
sed 's/\/wiki\/skin/skin/g' "$FILE" > "$FILE.new" && mv "$FILE.new" "$FILE" ;
done;
# Try copying index file
echo "
> Trying to copy index?index=Hauptseite.html to index.html
> for an easier entry point..."
cp "$DEST_DIR_COMPLETE"/*Hauptseite.html "$DEST_DIR_COMPLETE/index.html"
# DONE
echo "
> FINISHED! Look for the results at
$DEST_DIR_COMPLETE
> Your browser might be able to load the following URL:
> file://$PWD/$DEST_DIR_COMPLETE/
"
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)Wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/mediawiki-l