Kate's Lucene-based search server is now up and running experimentally
to cover searches on en.wikipedia.org. It's compiled with GCJ, so it's
not polluted by any of that dirty icky not-quite-free Sun Java VM stuff. ;)
For those of you new to the game, Lucene is a text search engine written
in Java, sponsored by the Apache project:
http://lucene.apache.org/
Using a separate search server like this instead of MySQL's fulltext
index lets us take some load off the main databases.
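For anyone new to full-text search: an engine like Lucene answers queries from an
inverted index, which maps each term to the documents that contain it, so a search
doesn't need to scan rows at all. Below is a toy sketch of that idea. The class
TinyInvertedIndex and its methods are hypothetical illustrations, not Lucene's API.
It lowercases text, splits on non-alphanumeric characters, and answers AND queries
by intersecting the posting sets. Real Lucene adds analyzers, relevance scoring and
on-disk index segments on top of this.

```java
import java.util.*;

/** Hypothetical toy inverted index (NOT Lucene's API): term -> sorted set of doc ids. */
public class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercase the text and split on runs of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Record docId in the posting set of every term the text contains.
    public void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: intersect the posting sets of all query terms.
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex idx = new TinyInvertedIndex();
        idx.add(1, "Lucene is a text search engine");
        idx.add(2, "MySQL has a fulltext index");
        idx.add(3, "Search engine written in Java");
        System.out.println(idx.search("search engine")); // prints [1, 3]
    }
}
```

Lookups only touch the posting sets for the query terms, which is why this kind of
work can live on its own box instead of loading the main databases.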
To compare our options I did an experimental port to C# using dotlucene;
some benchmarking showed that while the C# version running on Mono
outpaced the Java version on GCJ for building the index, Java+GCJ did
better on actual searches (even surpassing Sun's Java in some tests).
Since searches are more time-critical (as long as updates can keep up
with the rate of edits), we'll probably stick with Java.
More info at:
* http://www.livejournal.com/community/wikitech/9608.html
* http://meta.wikimedia.org/wiki/User:Brion_VIBBER/MWDaemon
At the moment the drop-down suggest-while-you-type box is disabled, because
GCJ and BerkeleyDB Java Edition really don't get along. I'll either hack
it to use the native-library version of BDB or just rewrite the title
prefix matcher to use a different backend.
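For illustration only, here is one possible shape for such a backend. This is a
hypothetical sketch, not the actual MWDaemon code: it keeps titles in an in-memory
sorted set, so every title starting with a given prefix sits in one contiguous run
that can be read straight from tailSet(). TitlePrefixMatcher and suggest() are
made-up names, and matching is case-sensitive.

```java
import java.util.*;

/** Hypothetical in-memory title prefix matcher backed by a sorted set (no BDB involved). */
public class TitlePrefixMatcher {
    private final TreeSet<String> titles = new TreeSet<>();

    public void addTitle(String title) {
        titles.add(title);
    }

    // In String.compareTo order, all titles starting with `prefix` form one contiguous run
    // beginning at the first title >= prefix; stop at the first non-match or at `limit`.
    public List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        for (String t : titles.tailSet(prefix, true)) {
            if (!t.startsWith(prefix) || out.size() >= limit) break;
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        TitlePrefixMatcher m = new TitlePrefixMatcher();
        for (String t : new String[] {"Java", "JavaScript", "Jakarta", "Lucene", "Java (programming language)"}) {
            m.addTitle(t);
        }
        System.out.println(m.suggest("Java", 10));
    }
}
```

The trade-off is that every title has to fit in RAM, in exchange for dropping the
BerkeleyDB dependency entirely.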
-- brion vibber (brion @ pobox.com)