[Wikitech-l] Re: full text search by wiki engine

17 Jun 2004

Alexander Prudnikov wrote:
...
  Hello.

 Can you explain to me in a few words how the Wiki engine performs full-text search in
 UTF-8 encoded articles?

 This is a very important problem for me. I have a database in UTF-8. MySQL prior to 4.1
 doesn't support full-text search in UTF-8 text. Only an alpha version of MySQL
 4.1 is available at the moment, so I don't want to install it.

 I tried to look for the answer in the Wiki sources. But I realized
 that this would take a rather long time. The only thing I understood is
 that search keys are somehow stored in the table 'searchindex'.

 So can anyone tell me the basic idea of how Wiki performs the full-text search?

 Thanks for your time.  

 Best regards,
 Alexander Prudnikov. 

The handling depends on the language. The basic UTF-8 handling is to 
convert to lower case using an internal table, then to encode any 
non-ASCII characters as hexadecimal using bin2hex(). The Chinese and 
Japanese language files have special routines to insert spaces into 
strings, since MySQL uses a word search and those languages don't 
usually use spaces.

The relevant functions are doUpdate() in includes/SearchUpdate.php, and 
stripForSearch() in languages/LanguageUtf8.php.
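
As a rough illustration of the idea described above, here is a minimal sketch 
in Python. It is not the PHP code from those files. The function names, the 
use of str.lower() in place of the internal lower-case table, and the 
character ranges treated as Chinese/Japanese are all assumptions made for the 
example.

```python
import re

# Assumed character ranges for the example: hiragana, katakana and the
# main CJK ideograph block. The real language files may use other ranges.
CJK_CHAR = re.compile(r"([\u3040-\u30ff\u4e00-\u9fff])")
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def insert_cjk_spaces(text: str) -> str:
    """Put spaces around each CJK character so a word-based index
    sees every character as a separate word."""
    spaced = CJK_CHAR.sub(r" \1 ", text)
    return " ".join(spaced.split())  # collapse repeated whitespace


def strip_for_search(text: str) -> str:
    """Lower-case the text, then replace each run of non-ASCII characters
    with the hexadecimal form of its UTF-8 bytes (like PHP's bin2hex())."""
    lowered = text.lower()  # stand-in for the internal lower-case table
    return NON_ASCII_RUN.sub(
        lambda m: m.group(0).encode("utf-8").hex(), lowered
    )


def search_key(text: str, cjk: bool = False) -> str:
    """Build the ASCII-only string that would be stored in the index."""
    if cjk:
        text = insert_cjk_spaces(text)
    return strip_for_search(text)
```

The same transformation has to be applied to the user's query before it is 
matched against the index, so that both sides are compared in the same 
ASCII-only form.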

-- Tim Starling
