Re: [Wikitech-l] International upgrades

10 Mar 2002

On Sat, 2002-03-09 at 18:50, Lars Aronsson wrote:
...
  Brion L. VIBBER wrote:
    That's only relevant for accented Latin characters, obviously. Hebrew,
    Arabic, Cyrillic, Greek, Chinese and Japanese characters still need to
    be retained and searchable.

  Are we talking about Greek/Hebrew characters in the English/German
  Wikipedia now?  I think users of the English/German Wikipedia won't
  have Greek/Hebrew keyboards,
Excepting Greeks and Israelis, obviously. ;)

...
  so ASCII searching would do just fine. 
But why bother creating a special separate ASCII-only search, when the
non-Latin code is necessary for other languages and we're using a
unified character set?

Why *shouldn't* I be able to search for the occasional Greek, Hebrew, or
Japanese word in the original spelling on the English wikipedia, if we
allow people to put them in in the first place?

...
  I have no idea how to implement search in the Greek/Hebrew Wikipedia.
As stated above: do whatever accent/case/other equivalence conversion is
necessary (exactly as you propose for Latin characters), and perform
some conversion so that MySQL doesn't treat the non-ASCII UTF-8 bytes
as word separators (in an ideal world, we'd just configure
MySQL to understand UTF-8; otherwise, replacing raw bytes with hex codes
should work fine).
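
To make that concrete, here is a rough Python sketch of the two steps: fold
case and accents, then rewrite any word that still contains non-ASCII
characters as the hex of its UTF-8 bytes, so the indexer only ever sees plain
ASCII "words". The function names and the "u8" prefix are made up for
illustration; this is not the wiki's actual code, and it assumes the same
conversion is applied to both the indexed text and the search query.

```python
import unicodedata


def fold(text):
    """Case-fold the text and drop combining accent marks."""
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_index_form(text):
    """Return an ASCII-only form of the text, one token per input word.

    Pure-ASCII words pass through after folding. Any other word becomes
    a hypothetical "u8" prefix followed by the hex of its UTF-8 bytes.
    """
    tokens = []
    for word in fold(text).split():
        if word.isascii():
            tokens.append(word)
        else:
            tokens.append("u8" + word.encode("utf-8").hex())
    return " ".join(tokens)


if __name__ == "__main__":
    print(to_index_form("Café Ελλάδα"))
```

Because the query goes through the same function as the article text, a
search typed in Greek on the English wiki matches the hex token stored in
the index, and no separate ASCII-only search path is needed.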

-- brion vibber (brion @ pobox.com)
