On Tue, May 20, 2003 at 11:29:31AM -0700, Brion Vibber wrote:
At some point in the near future I'll be adding in a per-language sort
order adjustment, so that various sorted lists should turn out in more
or less correct order for a change. :)
I'd appreciate pointers to descriptions of various languages' sorting
requirements so I can try to get them right.
I don't know if we can handle Japanese and Chinese sensibly, but
alphabetic languages should generally work fairly well by making a
munged copy of the string such that, e.g., if "ó" sorts the same as
"o" we just change it to "o"; if "ó" sorts after "o" (as in Polish
IIRC), it becomes "o~", which should always sort after any "o" and
before any "p" in a binary ASCII-order string sort.
Simple replacements should generally work, though we can also do more
complicated replacements of certain sequences of characters.
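
A minimal sketch of the munged-copy idea, assuming a small per-language
replacement table. The names MUNGE_TABLES, munge_key and sort_titles and
the table entries are made up for illustration; they are not taken from
the actual code.

```python
# Hypothetical per-language replacement tables (illustrative entries only).
MUNGE_TABLES = {
    "pl": {"ó": "o~"},  # Polish: "ó" is a separate letter sorting after "o"
    "es": {"ó": "o"},   # Spanish: the accent is ignored for ordering
}


def munge_key(text, lang):
    """Return a munged copy of text that sorts correctly byte by byte."""
    table = MUNGE_TABLES.get(lang, {})
    return "".join(table.get(ch, ch) for ch in text.lower())


def sort_titles(titles, lang):
    """Sort titles by their munged key; the titles themselves are unchanged."""
    return sorted(titles, key=lambda title: munge_key(title, lang))


print(sort_titles(["pan", "ósmy", "ozon", "oko"], "pl"))
```

Only the sort key is munged, so the displayed title keeps its accents.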
1. In some languages certain letter pairs are treated as a single letter.
For example, in Czech "ch" is a letter that sorts after "h", so "ca", "cz",
"da", "ha", "ch" would be the correct sort order ;)
Polish is 100% sane about that, maybe with the exception of having two
letters with diacritics based on z (order: y z ź ż).
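
The same munging trick can be stretched to cover a digraph. This is only a
sketch, assuming Czech "ch" sorts as one letter between "h" and "i"; the
function names are made up, and the other Czech letters (č, ř, ...) are
deliberately left out.

```python
def czech_key(text):
    """Munge "ch" to "h~" so it sorts after every "h..." and before "i"."""
    return text.lower().replace("ch", "h~")


def sort_czech(words):
    """Sort words treating "ch" as a single letter following "h"."""
    return sorted(words, key=czech_key)


print(sort_czech(["ch", "da", "ca", "ha", "cz"]))
```

A plain binary sort would put "ch" between "ca" and "cz" instead.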
2. Some languages sort first by primary and then by secondary characteristics,
so it's *not lexicographical order*.
For example, to sort Japanese kana you have to do something like this,
where strip_marks() removes the dakuten (") and handakuten (o) marks:
    if (strip_marks(x) != strip_marks(y))
        return compare(strip_marks(x), strip_marks(y));
    else
        return compare(x, y);
So the order is like: kou, gou, kouin.
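
The two-level comparison can be sketched as a sort key. This assumes the
marks are removed via Unicode decomposition (NFD), which splits a voiced
kana into its base kana plus a combining mark; strip_marks, kana_key and
sort_kana are hypothetical names, and real Japanese collation has more
levels than this.

```python
import unicodedata

# Combining dakuten and handakuten, as produced by NFD decomposition.
COMBINING_MARKS = {"\u3099", "\u309a"}


def strip_marks(text):
    """Remove dakuten/handakuten so that e.g. "ご" becomes "こ"."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if ch not in COMBINING_MARKS)
    return unicodedata.normalize("NFC", stripped)


def kana_key(text):
    """Primary key: text without marks. Secondary key: the text itself."""
    return (strip_marks(text), unicodedata.normalize("NFC", text))


def sort_kana(words):
    """Sort kana strings by the primary key, breaking ties by the secondary."""
    return sorted(words, key=kana_key)


print(sort_kana(["こういん", "ごう", "こう"]))  # kouin, gou, kou as input
```

Comparing the (primary, secondary) tuples is equivalent to the if/else
comparison: the secondary key only matters when the primary keys are equal.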
Then, sorting kanji is even worse.