On Thu, 2002-05-23 at 04:52, Jan.Hidders wrote:
On Thu, May 23, 2002 at 03:42:59AM -0700, Brion L. VIBBER wrote:
Fulltext search is already broken in a million ways! It doesn't know
character references (ü, ĉ, į etc), it can't find partial matches or
sounds-likes, it can't find "X" when you search for "Xs" or "Xs" when
you search for "X", it doesn't return *ANY* results for words it thinks
are too common...
All true, although the last problem will be solved when we move to the new
MySQL and use the boolean search there. But at least the semantics were
well-defined: if a word shows up in the edit-text, the search will
find it. This is no longer the case, and no clever PHP programming can
solve this.
What's not well-defined about "We're having some problems with the
search engine right now; if you don't find what you're looking for at
first, try capitalizing it and search again." ?
UTF-8 is the least of our problems; it just means that case-folding is a
little trickier (and if we had a decent $*#@%# database, it would take
care of that for us).
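To illustrate why case-folding gets trickier with UTF-8, here is a minimal Python sketch. It is not the wiki's PHP code or MySQL's behaviour; the helper name `byte_fold` and the sample word are assumptions made up for the example. It only shows that lowercasing the raw bytes leaves non-ASCII letters untouched, while lowercasing the decoded text folds them.

```python
def byte_fold(data: bytes) -> bytes:
    # Hypothetical helper: folds case byte by byte.
    # bytes.lower() only maps ASCII A-Z to a-z; all other bytes pass through.
    return data.lower()

title = "Ĉapelo"                      # sample word, capital C-circumflex
query = "ĉapelo"                      # same word, lowercase

folded_bytes = byte_fold(title.encode("utf-8"))
folded_text = title.lower().encode("utf-8")

print(folded_bytes == query.encode("utf-8"))  # False: 0xC4 0x88 was not folded
print(folded_text == query.encode("utf-8"))   # True: Unicode-aware folding
```

So a byte-oriented fold would treat "Ĉapelo" and "ĉapelo" as different words, which is roughly the situation the "try capitalizing it" workaround papers over.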
Case folding is not the only problem. The problem is that the fulltext index
does not index certain byte values above 128. That means that words
containing multibyte characters encoded with such bytes will
not be indexed. That's a bit harder to explain to the users than the
previous problems.
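A rough sketch of the effect described here, in Python. This is a toy model, not MySQL's actual tokenizer: the function `toy_index_tokens` is invented for the example, and it assumes the indexer treats only ASCII letters and digits as word characters and drops every byte of 128 or above.

```python
def toy_index_tokens(text: str) -> list:
    """Split the UTF-8 bytes of `text` into words, treating only ASCII
    letters and digits as word characters (assumption: toy indexer)."""
    tokens = []
    current = bytearray()
    for byte in text.encode("utf-8"):
        if byte < 128 and chr(byte).isalnum():
            current.append(byte)
        else:
            # Any other byte, including every byte of a multibyte
            # UTF-8 sequence, ends the current word.
            if current:
                tokens.append(current.decode("ascii"))
                current = bytearray()
    if current:
        tokens.append(current.decode("ascii"))
    return tokens

tokens = toy_index_tokens("Zürich is a city")
print(tokens)              # ['Z', 'rich', 'is', 'a', 'city']
print("Zürich" in tokens)  # False
```

Under that assumption the word is in the edit-text but never reaches the index as a whole word, so a search for it comes back empty.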
If it is in fact having problems indexing chars over 128, install the
attached hacked character set definition file and reindex the database.
my.cnf -> /etc/ (or wherever)
Index, custom.conf -> /usr/share/mysql/charsets/ (or wherever)
Run 'myisamchk -r -q' over the tables.
Works great on my machine...
-- brion vibber (brion @ pobox.com)