On Tue, 27 May 2003, Lee Daniel Crocker wrote:
I confess ignorance here. Are there really languages for which the
simplest canonical representation in Unicode requires combining forms?
Off the top of my head, one Aleutian language (Unangam Tunuu) uses
x-with-circumflex; Guarani apparently uses g-with-tilde. Tone marks for
Chinese Zhuyin phonetic script are combining characters; I think the
Indian scripts are pretty dependent on this kind of thing as well.
Precomposed characters are theoretically only included for round-trip
conversion with legacy character sets, so they're not really making new
ones for orthographies that are just getting started in the wonderful
world of character encoding.
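
A quick way to see the difference, as a rough sketch: the snippet below
uses Python's standard unicodedata module (my choice for illustration,
not something anyone in this thread is committed to). It composes two
sequences to NFC and counts the resulting code points. The variable
names are made up for the example.

```python
import unicodedata

# x + COMBINING CIRCUMFLEX ACCENT (U+0302): Unicode has no precomposed
# x-with-circumflex, so composition has nothing to fold this into.
x_circumflex = "x\u0302"

# e + COMBINING ACUTE ACCENT (U+0301): a precomposed form (U+00E9)
# exists, inherited from legacy character sets such as Latin-1.
e_acute = "e\u0301"

nfc_x = unicodedata.normalize("NFC", x_circumflex)
nfc_e = unicodedata.normalize("NFC", e_acute)

# The x sequence stays as two code points; the e sequence collapses to one.
print([hex(ord(c)) for c in nfc_x])
print([hex(ord(c)) for c in nfc_e])
```

So for x-with-circumflex, the composed form still contains a combining
character; there is no way to avoid it.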
If so, then I remove the restriction, but we must then specify a
specific canonical representation for titles in each language, as you
suggest; perhaps something like a Stringprep profile would be needed.
They've thought of that already too, it seems. :)
See Unicode Standard Annex #15, "Unicode normalization forms":
http://www.unicode.org/unicode/reports/tr15/
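
For titles, the idea would be to pick one normalization form and apply
it before storing or comparing. A minimal sketch, assuming NFC is the
form chosen (the thread has not settled that) and using a hypothetical
helper named canonical_title:

```python
import unicodedata


def canonical_title(title: str) -> str:
    """Hypothetical helper: return the title in NFC.

    Assumes NFC is the agreed canonical form; NFD would work equally
    well as long as it is applied consistently.
    """
    return unicodedata.normalize("NFC", title)


decomposed = "Ame\u0301lie"   # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "Am\u00e9lie"   # single precomposed U+00E9

# The raw strings differ code point by code point...
same_raw = decomposed == precomposed

# ...but they are canonically equivalent, so they match once normalized.
same_canonical = canonical_title(decomposed) == canonical_title(precomposed)

print(same_raw, same_canonical)
```

Without a step like this, two visually identical titles could end up as
separate pages.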
-- brion vibber (brion @ pobox.com)