On Thu, Jan 12, 2012 at 2:37 PM, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:

Hello all,
is there a query language for wiki syntax?
(NOTE: I really do not mean the Wikipedia API here.)

I am looking for an easy way to scrape data from Wiki pages.
In this way, we could apply a crowd-sourcing approach to knowledge
extraction from Wikis.

There must be thousands of data scraping approaches. But is there one
amongst them that has developed a "wiki scraper language"?
Maybe with some sort of fuzziness involved, if the pages are too messy.
I have not yet worked with the XML transformation of the wiki markup:

action=expandtemplates
generatexml - Generate XML parse tree

Is it any good for issuing XPath queries?

1. XPath requires XML; MediaWiki markup is not XML.
2. The only application that (correctly!?) expands templates is MediaWiki itself.
3. You neglected to explain what you are trying to scrape and what constitutes a messy page.
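
To make point 1 concrete: XPath only becomes usable once the markup has been turned into an XML parse tree, such as the one generatexml is described as producing. Below is a minimal sketch in Python of querying such a tree. The sample XML is hand-written, not real API output, and the element names (root, template, title, part, name, value) are an assumption about the shape of that tree. The helper template_params is hypothetical, and xml.etree.ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the element names are an ASSUMPTION about the
# parse tree's shape, not something copied from real API output.
SAMPLE_TREE = (
    "<root><template><title>Infobox person\n</title>"
    "<part><name>name</name>=<value>Ada Lovelace</value></part>"
    "<part><name>born</name>=<value>1815</value></part>"
    "</template></root>"
)


def template_params(xml_text, template_name):
    """Collect the named parameters of every matching template call.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    params = {}
    # ".//template" is part of the XPath subset ElementTree understands.
    for tpl in root.findall(".//template"):
        title = (tpl.findtext("title") or "").strip()
        if title.lower() != template_name.lower():
            continue
        for part in tpl.findall("part"):
            value_el = part.find("value")
            if value_el is None:
                continue
            name = (part.findtext("name") or "").strip()
            # itertext() also picks up text inside nested elements.
            params[name] = "".join(value_el.itertext()).strip()
    return params


print(template_params(SAMPLE_TREE, "Infobox person"))
```

Note that this only reads the template calls as written on the page. It does not expand them, so point 2 still stands.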

Thank you very much,
Sebastian

--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

_______________________________________________
Wikitext-l mailing list
Wikitext-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitext-l