[Wiktionary-l] Wiktionary parsing ; multiple languages

4 Apr 2013

Hi All,

Greetings,

I am a CS grad student from the Data Science Lab at Stony
Brook<https://sites.google.com/site/datascienceslab/>, and I am writing to
request information about parsing multilingual Wiktionary data. Our lab has been using
Wikipedia data for quite a while now, but we are really interested in taking advantage of
the massive Wiktionary content, which we feel, after proper parsing, can become a rich
multi-language corpus.

The big hurdle is finding a parsing tool. We have tried a few Wiktionary parsers:

1.       https://github.com/clbecker/perl-wiktionary-parser/

2.       https://code.google.com/p/wikokit/wiki/GettingStartedWiktionaryParser

3.       https://github.com/benreynwar/wiktionary-parser/tree/master/wiktionary_pars…

4.       http://www.ukp.tu-darmstadt.de/software/jwktl/

but none of them is ready to use or easy to extend in a multi-language
mode. (I am currently trying to work with wikokit, parser 2 above.)

I would appreciate any advice, suggestions, or pointers towards the best available Wiktionary
parser. We are mainly looking to extract meanings, parts of speech (POS), examples,
translations, etc. (more can never hurt).

Any help is appreciated. Kindly let me know if further information is needed.

Regards,

Moutupsi
