Jim Hu wrote:
Ran into a weird dump problem yesterday, which has me wondering if
there's a problem with my artificially created XML for upload.
Here's what happens. I have a script that builds wiki pages from an
external source and embeds them in xml for upload via importDump.
The script can be toggled to either generate a single page or a bunch
of them. The same script failed to load some pages that load just
fine if you specify them individually, but ImportDump.php does NOT
crash during the import.
I suspect that there is something wrong with the upstream items, but
I can't find it. The Brown Univ XML validator complains about the
following:
line 3, ecoliwiki20070730123135.xml:
error (1102): tag uses GI for an undeclared element: mediawiki
That sounds like you didn't include a schema declaration (dunno what
your thingy takes, maybe it's doctype only?)
There's an XML Schema description file -- you can use any XML Schema
validator, such as one of the demo scripts packaged with the Apache
Xalan java library, to run over your .xml file.
In theory, anyway. :)
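Before reaching for a schema validator, a plain well-formedness pass can narrow things down, since it needs no schema or doctype at all. A minimal sketch using Python's standard-library expat bindings; the function name and the sample document are made up for illustration, and this checks well-formedness only, not conformance to the MediaWiki export schema:

```python
import xml.parsers.expat


def first_wellformedness_error(xml_bytes):
    """Return (line, column, message) for the first parse error, or None.

    Hypothetical helper: it only checks that the input is well-formed XML.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(xml_bytes, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None


# Made-up sample: &nbsp; is not declared anywhere, so the parser rejects it.
sample = b"<mediawiki>\n<page>\n<text>a&nbsp;b</text>\n</page>\n</mediawiki>"
print(first_wellformedness_error(sample))
```

For a real dump you would read the file in binary mode and pass its contents in; for very large files, feeding the parser in chunks would be kinder to memory.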
line 166616, ecoliwiki20070730123135.xml:
error (1012): reference to undeclared entity:
line 166616, ecoliwiki20070730123135.xml:
error (1003): entity (or its expansion) is invalid:
line 166616, ecoliwiki20070730123135.xml:
error (1012): reference to undeclared entity:
line 166616, ecoliwiki20070730123135.xml:
error (1003): entity (or its expansion) is invalid:
line 184234, ecoliwiki20070730123135.xml:
The only predefined named entities in XML are &lt; &gt; &amp; &apos;
and &quot;.
For any other characters that you really intend to be interpreted *as
the character*, use decimal or hexadecimal numeric character
references -- e.g. &#160; or &#xA0;
For things you want to appear *as the HTML character reference*, you need
to escape the & as &amp; -- for instance "&amp;nbsp;" -- to be producing
correct XML.
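If the generating script is assembling the XML by string concatenation, the escaping can be done in one place. A rough sketch in Python, assuming the page text is available as a plain string; `wrap_text` is a made-up helper and the bare `<text>` element is a simplified stand-in for a full export page, not the complete format:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_text(wikitext):
    """Hypothetical helper: escape &, < and > and wrap in a <text> element."""
    return "<text>" + escape(wikitext) + "</text>"


raw = "A&nbsp;B < C"
fragment = wrap_text(raw)

# Parsing the escaped fragment gives back the original wikitext unchanged,
# so the wiki still sees the literal characters "&nbsp;".
element = ET.fromstring(fragment)
assert element.text == raw

# Unescaped, the same text is not well-formed XML.
try:
    ET.fromstring("<text>" + raw + "</text>")
except ET.ParseError as err:
    print("rejected:", err)
```

The point of the round trip is that escaping happens once, at the XML layer, and the importer undoes it when it reads the text back.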
error (402): EOF encountered; no doctype declaration
found: mediawiki
but I'm pretty sure these are all red herrings. So...is there a
validator out there I should be using?
-- brion vibber (brion @ wikimedia.org)