Graham-
> I'm planning on using the text of Wikipedia for research purposes, but
> I would like to remove the Wikicode, leaving only plain text. Does
> anyone know of scripts or programs that do this automatically?
I advise against that approach, since stripping the markup properly means
you effectively have to parse the wikitext anyway. This especially applies
to templates: the syntax {{Abc}} dynamically includes the contents of the
page [[Template:Abc]] in the page where it is used. Templates are used
extensively on Wikipedia, including some with parameterized name/value
substitution, so if you use the wikitext as your starting point, you will
have to load and expand them yourself.
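To illustrate why templates make raw wikitext awkward, here is a minimal
Python sketch of template expansion. It assumes a hypothetical in-memory
TEMPLATES dict standing in for pages in the Template: namespace, and it
handles only positional ({{{1}}}) and named ({{{place}}}) parameters with
optional defaults. MediaWiki's real preprocessor does much more (parser
functions, magic words, <noinclude> sections), none of which is modeled here.

```python
import re

# Hypothetical template store: page name -> wikitext body. A real tool would
# have to fetch these pages from the wiki's Template: namespace.
TEMPLATES = {
    "Template:Abc": "born in {{{place|somewhere}}} in {{{1}}}",
    "Template:Bio": "{{{name}}} ({{Abc|{{{year}}}|place={{{city}}}}})",
}

# An innermost template call: {{...}} with no braces inside it.
CALL_RE = re.compile(r"\{\{([^{}]*)\}\}")
# A parameter reference: {{{name}}} or {{{name|default}}}.
PARAM_RE = re.compile(r"\{\{\{([^{}|]*)(?:\|([^{}]*))?\}\}\}")


def parse_call(inner):
    """Split 'Name|a|key=b' into ('Template:Name', {'1': 'a', 'key': 'b'})."""
    name, *parts = inner.split("|")
    args, position = {}, 1
    for part in parts:
        key, sep, value = part.partition("=")
        if sep:
            args[key.strip()] = value.strip()
        else:
            args[str(position)] = part
            position += 1
    return "Template:" + name.strip(), args


def substitute_params(body, args):
    """Replace {{{name}}} / {{{name|default}}} with the caller's values."""
    def repl(m):
        name, default = m.group(1).strip(), m.group(2)
        if name in args:
            return args[name]
        # Unset parameter with no default: this sketch substitutes "".
        return default if default is not None else ""
    return PARAM_RE.sub(repl, body)


def expand(text, args=None, depth=0):
    """Recursively expand template calls, innermost first."""
    if depth > 20:
        raise RecursionError("template nesting too deep")
    if args is not None:
        text = substitute_params(text, args)

    def repl(m):
        name, call_args = parse_call(m.group(1))
        body = TEMPLATES.get(name)
        if body is None:
            # Missing template: roughly what MediaWiki renders (a red link).
            return "[[" + name + "]]"
        return expand(body, call_args, depth + 1)

    # Each pass expands the innermost calls; stop when nothing changes.
    while True:
        new_text = CALL_RE.sub(repl, text)
        if new_text == text:
            return new_text
        text = new_text


print(expand("{{Bio|name=Ada|year=1815|city=London}}"))
# → Ada (born in London in 1815)
```

Even in this toy version, the plain text of a page cannot be recovered from
its own wikitext without fetching and expanding every template it uses.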
I recommend generating a static HTML dump instead and converting it to
plain text, for which there are a number of tools (notably lynx -dump).
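For a single page, the lynx route is simply `lynx -dump page.html > page.txt`.
Where lynx is not available, a much cruder stdlib-only Python substitute
might look like the sketch below. This is a swapped-in technique, not what
lynx does: it drops script/style/head content, starts new lines at a
hand-picked set of block tags, and collapses whitespace, with no link lists
or table layout.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    # Tags whose contents are not visible page text.
    SKIP = {"script", "style", "head"}
    # Block-level tags that should start a new line in the output
    # (an assumed, non-exhaustive list).
    BLOCK = {"p", "div", "br", "li", "ul", "ol", "table", "tr",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc.
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace within lines and drop empty lines.
        raw = "".join(self.parts)
        lines = (" ".join(line.split()) for line in raw.splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(html_to_text("<h1>Foo</h1><p>Hello <b>bold</b> &amp; more</p>"))
```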
The current CVS version of MediaWiki includes a basic static HTML dumper,
maintenance/dumpHTML.php. See Tim Starling's mailing list post about it:
http://mail.wikipedia.org/pipermail/wikitech-l/2005-April/028741.html
All best,
Erik