Robert Stojnic wrote:
After much delay, I've completed a new release candidate for our internal
search engine. The testing site where you can see it in action is the same as
before [1], with indexes rebuilt from the latest dumps.
Here are some highlights:
* spell checking (aka did you mean...)
* ajax prefix suggestions (reimplemented Julien's engine)
* nicer highlighting
* improved scoring
* fuzzy queries, e.g. sarah~ thomson~ will give you all the variations
of both of the words
* suffix wildcards (works on title words only), e.g. *stan will give you
all the -stan countries of Central Asia - for performance reasons it
won't work nicely on huge sets of words
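
To illustrate the suffix wildcard idea, here is a minimal sketch in Python. It is not the engine's actual implementation. It assumes one common approach: store each title word reversed in a sorted list, so that a suffix query such as *stan becomes a prefix lookup on the reversed string. The names build_reversed_index and suffix_matches, and the sample words, are hypothetical.

```python
import bisect


def build_reversed_index(title_words):
    """Return a sorted list of reversed, lowercased, de-duplicated words."""
    return sorted({w.lower()[::-1] for w in title_words})


def suffix_matches(reversed_index, suffix):
    """Return all indexed words ending in `suffix`, alphabetically sorted."""
    key = suffix.lower()[::-1]
    # Words sharing the reversed prefix sit next to each other in the list.
    i = bisect.bisect_left(reversed_index, key)
    matches = []
    while i < len(reversed_index) and reversed_index[i].startswith(key):
        matches.append(reversed_index[i][::-1])
        i += 1
    return sorted(matches)


# Hypothetical sample of title words, for illustration only.
words = ["Kazakhstan", "Uzbekistan", "Tajikistan", "Standard", "Istanbul"]
index = build_reversed_index(words)
print(suffix_matches(index, "stan"))
```

The lookup itself is a binary search, but the number of matching words that then have to be expanded grows with the vocabulary, which is one plausible reason a feature like this would struggle on huge sets of words.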
Sweeeet! :)
Search is a bit slowish, especially on enwiki, since I've crammed all of
its revision text, spellcheck indexes, search indexes and other stuff on
a single host. According to my tests, a typical search should be in the
150-180ms range (of CPU time), which is much slower than the current
engine (25-30ms).
Most overhead comes from spell checking and highlighting. I was
thinking of trying to use some of the 8-cpu boxes...
Yeah, we might need to dedicate more hardware to handle that.
The ajax suggestions (when properly cached in RAM) are pretty fast
(0.2-0.4ms), so we could probably enable them site-wide on search boxes
and such. Initially they would be updated once a day, but we could cut
that down, depending on the number of servers and the actual number of
requests.
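
As a rough illustration of why prefix suggestions held in RAM can be answered this quickly, here is a minimal Python sketch. It is not the reimplemented engine. It assumes a sorted in-memory list of titles searched with binary search, and the PrefixSuggester class, the popularity scores, and the sample titles are all hypothetical.

```python
import bisect


class PrefixSuggester:
    """In-memory prefix lookup over titles, ranked by a score."""

    def __init__(self, scored_titles):
        # scored_titles: dict mapping title -> hypothetical popularity score.
        self._entries = sorted(
            (title.lower(), title, score)
            for title, score in scored_titles.items()
        )
        self._keys = [entry[0] for entry in self._entries]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` titles starting with `prefix`, best first."""
        key = prefix.lower()
        # Binary search for the first candidate, then scan the matching run.
        i = bisect.bisect_left(self._keys, key)
        candidates = []
        while i < len(self._keys) and self._keys[i].startswith(key):
            candidates.append(self._entries[i])
            i += 1
        # Stable sort: equal scores keep their alphabetical order.
        candidates.sort(key=lambda entry: -entry[2])
        return [title for _, title, _ in candidates[:limit]]


# Hypothetical titles and scores, for illustration only.
suggester = PrefixSuggester(
    {"Sarah Thomson": 3, "Sarajevo": 10, "Saratov": 5, "Paris": 7}
)
print(suggester.suggest("sara"))
```

With a structure like this, a daily update would amount to rebuilding the sorted list from a fresh dump and swapping it in.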
Cooooool!
-- brion