Dori wrote:
> How hard would it be to come up with these word-stem normalizers for other
> languages (i.e. did you base Esperanto off of another similar language or
> did you come up with it yourself relatively easily)? Is there a good
> description somewhere on how to come up with them?
I took a quick look at the PorterStemFilter class (for English) that
ships with the Lucene distribution to see which class I had to extend
and which interface I had to implement, then just whipped up some regular
expressions as a quick hack. It seems relatively straightforward, as
long as the language's morphology isn't too tricky. :)
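For illustration, here is a minimal standalone sketch of the kind of regex suffix-stripping described above. It is not the actual Esperanto filter: the class name `EsperantoStemmer`, the `stem` method and the exact set of endings are assumptions made for this example. It strips the regular grammatical endings (noun -o, adjective -a, plural -j, accusative -n, adverb -e/-en, verb -as/-is/-os/-us/-u/-i) and leaves out the Lucene filter plumbing so it runs without Lucene on the classpath.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EsperantoStemmer {
    // Assumed ending set: noun/adjective [oa] + optional plural j + optional
    // accusative n; adverb e(n); verb tenses as/is/os/us; volitive u; infinitive i.
    // The lazy stem group (at least 2 chars) keeps the ending as long as possible
    // and leaves very short words such as "la" or "kaj" untouched.
    private static final Pattern ENDING =
        Pattern.compile("^(.{2,}?)(?:[oa]j?n?|en?|[aiou]s|[iu])$");

    // Returns the word lowercased with one grammatical ending removed, if any.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        Matcher m = ENDING.matcher(w);
        return m.matches() ? m.group(1) : w;
    }

    public static void main(String[] args) {
        String[] words = {"hundo", "hundojn", "belaj", "kuris", "rapide", "la", "kaj"};
        for (String w : words) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Inside Lucene, a function like `stem` would be called from a TokenFilter subclass modelled on PorterStemFilter; the exact method to override depends on the Lucene version, so check its API before wiring this in. A plain regex like this has no dictionary of exceptions, so irregular or foreign words may be over-stemmed.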
Stemming filters for other languages already exist, as mentioned earlier in the thread.
-- brion vibber (brion @ pobox.com)