On 6/8/06, Roman Nosov <rnosov(a)gmail.com> wrote:
Well it looks like my question about why some quotation marks do break
words and others don't will remain unanswered ("rareness" of high
numbered punctuation doesn't make it part of a word) … Anyway if such
level of supporting UTF-8 is sufficient for Mediawiki then Unicode
issue is "solved". Unicode über alles.
I think it was adequately explained: the character isn't detected because
the algorithm doesn't know it's a separator character, so it isn't
separated. If the algorithm did know, it would be separated properly.
So perhaps someone, like you, should submit a quick patch to that part of
the diff engine, as outlined by Tim, that makes it properly interpret that
code point. If there's a general rule or table in the Unicode standard then
implementing that might be an even better option.
The Unicode site, by the way, is www.unicode.org, and you can find a
database of Unicode character properties here:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
with information on interpreting them here:
http://ftp.lanet.lv/ftp/mirror/unicode/3.2-Update/UnicodeData-3.2.0.html
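
To illustrate the "general rule" idea, here is a rough sketch in Python (not
MediaWiki's actual diff code, and not a patch). It assumes that any character
whose Unicode general category starts with "P" (punctuation) or "Z"
(separator), plus ordinary whitespace, should break a word. The function
names is_separator and split_words are made up for this example; the
category lookup uses Python's standard unicodedata module, which is built
from the same UnicodeData.txt properties.

```python
import unicodedata


def is_separator(ch):
    """Return True if ch should break a word (assumed rule, see above)."""
    # Whitespace such as "\n" and "\t" has category Cc, so check it separately.
    if ch.isspace():
        return True
    # General categories: P* = punctuation (Pi/Pf cover curly quotes),
    # Z* = space, line and paragraph separators.
    return unicodedata.category(ch)[0] in ("P", "Z")


def split_words(text):
    """Split text into words, dropping every separator character."""
    words = []
    current = []
    for ch in text:
        if is_separator(ch):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words
```

With this rule the curly quotes U+201C and U+201D break words just like the
ASCII quote does, while letters such as "ü" stay inside the word. A rule this
blunt also splits on apostrophes and hyphens, so a real patch would probably
need exceptions.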
Enjoy!
--
Ben Garney
Torque Technologies Director
GarageGames.Com, Inc.