Dear Mr. Kinzler,
  Could you tell me whether your code is ready for other languages as well? I am asking particularly about the Unicode processing, because I am very interested in trying it out in an East Asian context (e.g. Chinese, Japanese, and Korean).

Best regards,

--
Liao,Han-Teng
DPhil student at the OII(web)
needs you(blog)
Daniel Kinzler wrote:
My diploma thesis about a system to automatically build a multilingual thesaurus
from Wikipedia, "WikiWord", is finally done. I handed it in yesterday. My
research will hopefully help to make Wikipedia more accessible for automatic
processing, especially for applications in natural language processing, machine
translation and information retrieval. What this could mean for Wikipedia is:
better search and conceptual navigation, tools for suggesting categories, and more.

Here's the thesis (in German, I'm afraid): <http://brightbyte.de/DA/WikiWord.pdf>

  Daniel Kinzler, "Automatischer Aufbau eines multilingualen Thesaurus durch
  Extraktion semantischer und lexikalischer Relationen aus der Wikipedia",
  Diplomarbeit an der Abteilung für Automatische Sprachverarbeitung, Institut
  für Informatik, Universität Leipzig, 2008.

For the curious, http://brightbyte.de/DA/ also contains source code and data.
See <http://brightbyte.de/page/WikiWord> for more information.

Some more data is available for now at
<http://aspra27.informatik.uni-leipzig.de/~dkinzler/rdfdumps/>. This includes
full SKOS dumps for en, de, fr, nl, and no, covering about six million concepts.

The thesis ended up being rather large: a 220-page thesis and 30k lines of
code. I'm planning to write a research paper in English soon, which will give an
overview of WikiWord and what it can be used for.

The thesis is licensed under the GFDL; WikiWord is GPL software. All data taken
or derived from Wikipedia is GFDL.


Enjoy,
Daniel

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

  

