(Brion Vibber <vibber(a)aludra.usc.edu>):
* Should not allow Unicode diacriticals, combining forms, display forms
(ligatures), controls, and other specials.
Waitaminute... that would seem to exclude the use of accented characters
that do not have a precombined form. This could be seriously detrimental
to some languages.
(In any case, we ought to do a little fancier work with UTF-8 to make sure
that canonical forms are used to prevent false non-matches. I don't know
if there's a library we can link into PHP to do this or if we'd have to
write something.)
I confess ignorance here. Are there really languages for which the
simplest canonical representation in Unicode requires combining forms?
If so, then I remove the restriction, but we must then specify a
single canonical representation for titles in each language, as you
suggest; perhaps something like a Stringprep profile would be needed.
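
To make the canonical-form idea concrete, here is a rough sketch in
Python rather than PHP, using the standard unicodedata module. It assumes
NFC (Normalization Form C) as the canonical form for titles; that choice
and the helper names canonical_title and same_title are illustrative
only, not anything we have settled on. It also shows one sequence
(q plus a combining tilde) for which Unicode defines no precombined
character, so normalization leaves the combining mark in place.

```python
import unicodedata


def canonical_title(title: str) -> str:
    """Return the title in Unicode Normalization Form C (NFC).

    NFC is an assumed choice of canonical form for this sketch.
    """
    return unicodedata.normalize("NFC", title)


def same_title(a: str, b: str) -> bool:
    """Compare two titles after normalizing both to the same form."""
    return canonical_title(a) == canonical_title(b)


# "cafe" with a precombined e-acute (U+00E9) ...
precomposed = "caf\u00e9"
# ... and with a plain "e" followed by a combining acute (U+0301).
decomposed = "cafe\u0301"

# A raw comparison gives a false non-match; the normalized one matches.
assert precomposed != decomposed
assert same_title(precomposed, decomposed)

# "q" + combining tilde (U+0303) has no precombined form in Unicode,
# so it is still two code points after NFC normalization.
no_precombined = "q\u0303"
assert len(canonical_title(no_precombined)) == 2
```

If something like this holds up, a blanket ban on combining forms would
reject titles such as the last one, while normalizing before comparison
would still prevent the false non-matches.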
--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC