[Mediawiki-l] Mass Import
John Blumel
johnblumel at earthlink.net
Wed Apr 13 20:57:17 UTC 2005
On Apr 13, 2005, at 4:23pm, Wolfe, Jeff wrote:
> I'm seeking a way to mass import lots of data into a MediaWiki. I can
> massage my data in most reasonable ways and have direct access to the
> database. I can use existing PHP, generate fake URLS, or hit the SQL
> database directly. Does anyone have a suggestion?
I'm working on a similar issue and decided to load the data through
MediaWiki's web interface, using a bot written in Perl (using LWP). I
went that way for a couple of reasons, chiefly because I want the
original submission attributable to a specific source (depending on the
user name I give the bot) and I want all the file updates that normally
take place (category assignment, recent changes, etc.) to occur without
me having to worry about what exactly the MediaWiki code does and when
it does it.
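
A minimal sketch of what such a bot submits, written in Python rather than
Perl/LWP purely for illustration. It only builds the POST request and does
not send it. The function name, the index.php URL, and the form field names
(wpTextbox1, wpSummary, wpEditToken, wpSave) are assumptions; check them
against the HTML of the edit form on your own install, since they can differ
between MediaWiki versions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_edit_request(index_url, title, wikitext, summary, edit_token):
    """Build (but do not send) a POST that mimics saving the edit form.

    Hypothetical helper: the field names are assumed, not taken from the
    original post.
    """
    # The query string selects the page and the form's "submit" action.
    query = urlencode({"title": title, "action": "submit"})
    # Assumed edit-form fields; verify them against your wiki's edit page.
    form = {
        "wpTextbox1": wikitext,      # page body
        "wpSummary": summary,        # edit summary shown in recent changes
        "wpEditToken": edit_token,   # token scraped from the edit form, if any
        "wpSave": "Save page",       # the submit button
    }
    body = urlencode(form).encode("utf-8")
    return Request(index_url + "?" + query, data=body, method="POST")
```

To have the edits attributed to the bot's user name, the request would be
sent through an opener that keeps the login cookies, for example
urllib.request.build_opener(urllib.request.HTTPCookieProcessor()), after
logging in with that account.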
One of my sources has about 900 entries and there are several others
that are smaller, so it's a lot less work than creating all these
entries manually, even though some of the sources are non-trivial to
parse, and I expect fewer errors in the final text using this method.
I'm also creating category info off the extracted data and will insert
that into the final wiki text before it is uploaded so that the
submitted entries will be assigned to specific categories.
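
The category step could look roughly like this: a sketch, in Python, that
appends one [[Category:...]] link per extracted category to the end of an
entry's wiki text. The function name and the de-duplication behaviour are
illustrative assumptions, not a description of the actual scripts.

```python
def add_categories(wikitext, categories):
    """Append a [[Category:Name]] link for each category, skipping repeats."""
    tags = []
    seen = set()
    for name in categories:
        name = name.strip()
        # Ignore empty names and categories already added.
        if not name or name in seen:
            continue
        seen.add(name)
        tags.append("[[Category:%s]]" % name)
    if not tags:
        return wikitext
    # Separate the category links from the body with a blank line.
    return wikitext.rstrip() + "\n\n" + "\n".join(tags) + "\n"
```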
The bot, in this case, simply does the work of submitting the generated
entries and I'm creating individual scripts to parse the various source
materials. The next step is to generate HTML output (one file per entry)
from the data files I've generated (again with individual scripts, since
the sources contain different types of information) and then convert that
to wiki text for the bot to upload. (I could skip the HTML but I'd like
to be able to "preview" a sampling of the entries before I start
uploading them and it's not that much more work.) I'll probably also
create a second bot to delete a set of entries, just so that I can get
rid of the entries resulting from "test runs" on a test wiki I set up.
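
As a rough illustration of the HTML-to-wiki-text step, here is a Python
sketch using the standard library's html.parser. It handles only a tiny,
assumed subset of tags (h2, p, b, i) and passes all other text through
unchanged; the class and function names are hypothetical, and a real
converter would need to cover whatever markup the generated HTML uses.

```python
from html.parser import HTMLParser


class WikiTextConverter(HTMLParser):
    """Convert a small HTML subset (h2, p, b, i) to wiki text."""

    # Wiki markup emitted when each supported tag opens and closes.
    MARKUP = {
        "h2": ("== ", " ==\n"),
        "p": ("", "\n\n"),
        "b": ("'''", "'''"),
        "i": ("''", "''"),
    }

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Unsupported tags contribute no markup.
        self.parts.append(self.MARKUP.get(tag, ("", ""))[0])

    def handle_endtag(self, tag):
        self.parts.append(self.MARKUP.get(tag, ("", ""))[1])

    def handle_data(self, data):
        # Text is copied through as-is.
        self.parts.append(data)


def html_to_wikitext(html):
    """Return wiki text for one HTML entry, ending in a single newline."""
    converter = WikiTextConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts).strip() + "\n"
```

Keeping the converter separate from the upload bot means the same HTML
files can be previewed in a browser and then converted unchanged.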
You're welcome to the scripts I'm working on, although none of them is
completely finished at the moment, other than a couple of parsing
scripts that wouldn't be of much use to you.
John Blumel