Hello Emmanuel,
You can work with the XML dumps. Import them into MySQL and have a look at
http://meta.wikimedia.org/wiki/Database_layout
I already had a look at it, but couldn't figure out how to acquire the
desired information.
The table "categorylinks" only stores which categories an article
belongs to. Additionally, I need to know:
- which are the parent categories of category x
- which are the subcategories of category x
I couldn't figure out where this information is stored within the
database.
Another problem was the import of the XML file. It stopped somewhere in
the middle (the script kept running, but no more entries were written),
without any hint about the cause of the problem.
Parsing the HTML documents proved very stable, and I would be happy to be
able to continue using it.
Ciao,
Frank