On 13/04/06, Jakob Voss <jakob.voss(a)nichtich.de> wrote:
> Search engines don't update their search index
> live with every new item.
>
> The problem with Wikipedia is its size and the quick changes. Normally
> you would generate a new index every week or night - and to generate a
> search index for millions of records takes hours! A powerful MediaWiki
> search engine with a time lag of 1 to 2 days would also be fine for me -
> you could also think of a smart search engine that works on an old dump
> in the first run and checks on the live database in the second.
That would be more than fine. I gather the search db is currently
several months out of date? But that wasn't my major complaint.
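The two-pass approach Jakob describes (query a stale index built from a dump, then check the hits against the live database) could look roughly like this. All the names and data structures here are illustrative stand-ins, not MediaWiki's actual schema:

```python
# Hypothetical two-pass search: a stale index (from a dump) supplies
# candidates, and a second pass against live data filters out pages
# that have since been deleted or changed so they no longer match.

def two_pass_search(term, stale_index, live_pages):
    """stale_index: dict mapping a term to page titles (built from a dump).
    live_pages: dict of current page titles to text (the live database)."""
    candidates = stale_index.get(term, [])
    results = []
    for title in candidates:
        text = live_pages.get(title)
        # Keep only hits that still exist and still contain the term.
        if text is not None and term in text:
            results.append(title)
    return results

stale_index = {"wiki": ["MediaWiki", "Old page"]}
live_pages = {"MediaWiki": "MediaWiki is a wiki engine"}  # "Old page" was deleted
print(two_pass_search("wiki", stale_index, live_pages))  # ['MediaWiki']
```

The cheap first pass can be a day or two stale, and the second pass only touches the handful of candidate rows, so the live database is never scanned.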
> To get such a powerful search it's better to build it from scratch
> in an independent application instead of coding it into MediaWiki (but
> I'm no MediaWiki developer so I may be wrong) so you can optimize for
> searching only.
Well, it should be as easily accessible as the search box is now.
> SELECT page_id FROM page WHERE page_title RLIKE $regexp
>   AND $conditions
>   LIMIT $limit
That would be nice, but even the simple mechanism of exact matches
would be a start. Then you could add fallbacks, like all upper case,
all lower case, upper case on the first letter of each word and so on,
if performance is the issue here.
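A minimal sketch of that fallback cascade, assuming a hypothetical indexed title lookup (`pages` here is just an in-memory set standing in for it):

```python
# Try the title exactly as typed, then a few case variants, stopping
# at the first hit. Each variant is a single exact (indexable) lookup,
# so the whole cascade is at most a few cheap point queries - no
# regular-expression scan needed. Names are illustrative.

def find_title(query, pages):
    variants = [
        query,          # exact match, as typed
        query.upper(),  # ALL UPPER CASE
        query.lower(),  # all lower case
        query.title(),  # Upper Case First Letter Of Each Word
    ]
    for variant in variants:
        if variant in pages:
            return variant
    return None

pages = {"Main Page", "MEDIAWIKI", "sandbox"}
print(find_title("main page", pages))   # 'Main Page'
print(find_title("mediawiki", pages))   # 'MEDIAWIKI'
```

Since every step is an exact match against an indexed column, this stays fast even when a regex search would not.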
Steve