I am so glad that someone is re-re-resurrecting this topic :-)
On Fri, Oct 23, 2009 at 1:27 PM, Andrew Dunbar <hippytrail(a)gmail.com> wrote:
> I've been spending hours on the parsing now and don't find it simple
> at all due to the fact that templates can be nested. Just extracting
> the Infobox as one big lump is hard due to the need to match nested {{
> and }}
Not perfect, but try
http://toolserver.org/~magnus/wiki2xml/w2x.php
1. Uncheck "Use API", choose "Do not use templates"
2. Enter article name(s)
3. Get XML
4. Parse XML, re-submit the wiki text in templates to process the next
level of templates
I should really offer #4 in this...
Caveat: Will break on things like HTML attributes that are filled by
templates etc.
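If you only need the Infobox as one lump, the nested {{ and }} matching
can be done with a simple depth counter. A minimal sketch in Python, not
what wiki2xml does internally; the function name extract_template is made
up for illustration, and it ignores <nowiki>, HTML comments and
{{{parameter}}} subtleties:

```python
def extract_template(text, name):
    """Return the first {{name ...}} block, nested templates included.

    Hypothetical helper for illustration only. Returns None when the
    template is absent or its braces never balance.
    """
    start = text.find("{{" + name)
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1          # entering a (possibly nested) template
            i += 2
        elif pair == "}}":
            depth -= 1          # leaving a template
            i += 2
            if depth == 0:      # closed the template we started with
                return text[start:i]
        else:
            i += 1
    return None                 # ran out of text: unbalanced braces


sample = ("Intro {{Infobox person|name=X"
          "|born={{birth date|1970|1|1}}|note=y}} tail {{other}}")
infobox = extract_template(sample, "Infobox")
```

Here infobox ends up holding the whole Infobox call, including the inner
{{birth date}} template, and nothing after its closing }}.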
Cheers,
Magnus