Mark Williamson wrote:
One thing that could be done is to make a list of homographs -- most
of them are rather obvious -- and to consider, for example,
User:Маtthеw.Вrоwn as "taken" as soon as the username
User:Matthew.Brown is registered (in case you couldn't tell, the
former uses several Cyrillic letters).
There's a list of "confusable" characters as an appendix to TR 39:
http://www.unicode.org/reports/tr39/data/confusables.txt
So it's not necessary to make a list; it's already been done for us.
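As a rough illustration of how such a list could drive conflict checking,
here is a minimal Python sketch. It assumes the data file uses
semicolon-separated fields of hexadecimal code points (source ; target ;
type), and the five sample lines are an illustrative excerpt, not a
verbatim copy of the file. The names parse_confusables and skeleton are
invented for this example, and only single-character sources are handled.

```python
import unicodedata

# Illustrative excerpt in the assumed "source ; target ; type # comment" layout.
SAMPLE = """\
041C ; 004D ; MA # CYRILLIC CAPITAL LETTER EM -> LATIN CAPITAL LETTER M
0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
0435 ; 0065 ; MA # CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
0412 ; 0042 ; MA # CYRILLIC CAPITAL LETTER VE -> LATIN CAPITAL LETTER B
043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
"""

def parse_confusables(text):
    """Build a {source string: target string} map from confusables-style lines."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table

def skeleton(name, table):
    """Replace each confusable character with its prototype character."""
    decomposed = unicodedata.normalize("NFD", name)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

TABLE = parse_confusables(SAMPLE)
latin = "Matthew.Brown"
mixed = "\u041c\u0430tth\u0435w.\u0412r\u043ewn"  # Cyrillic M, a, e, B, o
print(skeleton(latin, TABLE) == skeleton(mixed, TABLE))  # prints True
```

Two usernames would then count as conflicting whenever their skeletons
are equal, even though the strings themselves differ.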
The question you may ask is, why would people want characters from
multiple scripts? The answer is that, depending on the language, this
may be fashionable, a pun, or have some other effect. For example, a
username such as User:田中志郎jp is not unreasonable to allow for ja.wp,
especially for a Japanese person named Shiro Tanaka.
Before seriously considering implementing this, we would obviously need to
gather some statistics on existing cross-script usernames. We could implement
exceptions as existing culture dictates.
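One way such statistics might be gathered is sketched below in Python.
The standard library does not expose the Unicode Script property, so this
uses the first word of each character's name ("LATIN", "CYRILLIC", "CJK",
...) as a crude stand-in; that shortcut, the function names and the sample
list are all assumptions made for illustration.

```python
import unicodedata
from collections import Counter

def scripts_of(name):
    """Return a crude set of script labels for the letters in a username."""
    labels = set()
    for ch in name:
        if not ch.isalpha():
            continue  # ignore digits, punctuation and spaces
        # First word of the character name, e.g. "LATIN", "CYRILLIC", "CJK".
        labels.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return labels

def cross_script_stats(usernames):
    """Count how often each combination of scripts occurs among mixed names."""
    counts = Counter()
    for name in usernames:
        labels = scripts_of(name)
        if len(labels) > 1:
            counts[tuple(sorted(labels))] += 1
    return counts

sample = [
    "Matthew.Brown",
    "\u041c\u0430tth\u0435w.\u0412r\u043ewn",  # Latin mixed with Cyrillic
    "\u7530\u4e2d\u5fd7\u90cejp",              # kanji followed by "jp"
]
for combo, n in cross_script_stats(sample).items():
    print(combo, n)
```

Run over a real user table, the resulting counts would show which script
combinations are common enough to deserve an exception.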
Also, with some scripts, there are few or no homographs with certain
other scripts -- for example, Hiragana and Basic Latin have no
homographs; you can add Hangul, most Indic scripts, Thai...
Yes, that too.
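A validity check along these lines could look roughly like the Python
sketch below. The ALLOWED_MIXES policy is purely hypothetical, and script
detection again relies on the first word of the Unicode character name as
a crude approximation of the real Script property.

```python
import unicodedata

# Hypothetical policy: script combinations tolerated within one username.
ALLOWED_MIXES = [
    {"CJK", "HIRAGANA", "KATAKANA", "LATIN"},
    {"HANGUL", "LATIN"},
    {"THAI", "LATIN"},
]

def scripts_of(name):
    """Return a crude set of script labels for the letters in a username."""
    labels = set()
    for ch in name:
        if ch.isalpha():
            labels.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return labels

def is_valid_username(name):
    """Accept single-script names and explicitly allowed script mixes."""
    labels = scripts_of(name)
    if len(labels) <= 1:
        return True
    return any(labels <= allowed for allowed in ALLOWED_MIXES)

print(is_valid_username("Matthew.Brown"))                           # True
print(is_valid_username("\u041c\u0430tth\u0435w.\u0412r\u043ewn"))  # False
print(is_valid_username("\u7530\u4e2d\u5fd7\u90cejp"))              # True
```

Note that this needs no lookup against existing accounts: the decision
depends only on the candidate name itself.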
The main reason I'm advocating validity checking over conflict checking is
my intuition about the elegance, simplicity and consequences of each solution.
* Do we really want to encourage untypeable usernames like "Маtthеw Вrоwn"?
* Remember that conflict checking goes both ways. We could conceivably have
trolls squatting on conflicting usernames of sysops in foreign wikis. This
might be an ambiguous policy situation.
* Conflict checking requires an extra database column, an extra index and an
extra database query on account creation, compared to validation. A script
would need to be written to populate the new column.
* A validation module could be reused in applications where conflict
checking is irrelevant or impossible.
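To make the storage cost of conflict checking concrete, here is a small
Python sketch using an in-memory SQLite database. The table and column
names are invented for illustration and are not MediaWiki's schema, and
the five-entry confusables map is a stand-in for a real table.

```python
import sqlite3

# Tiny stand-in for a real confusables table (assumption for this sketch).
CONFUSABLES = str.maketrans({
    "\u041c": "M", "\u0430": "a", "\u0435": "e", "\u0412": "B", "\u043e": "o",
})

def skeleton(name):
    """Map confusable characters to a canonical look-alike."""
    return name.translate(CONFUSABLES)

conn = sqlite3.connect(":memory:")
# The extra column and the extra index that conflict checking requires.
conn.execute(
    "CREATE TABLE accounts (account_name TEXT PRIMARY KEY, name_skeleton TEXT)"
)
conn.execute("CREATE INDEX accounts_skeleton_idx ON accounts (name_skeleton)")

def register(conn, name):
    """Create the account unless a confusable name already exists."""
    skel = skeleton(name)
    # The extra query on account creation.
    row = conn.execute(
        "SELECT account_name FROM accounts WHERE name_skeleton = ?", (skel,)
    ).fetchone()
    if row is not None:
        return False
    conn.execute("INSERT INTO accounts VALUES (?, ?)", (name, skel))
    return True

print(register(conn, "Matthew.Brown"))                           # True
print(register(conn, "\u041c\u0430tth\u0435w.\u0412r\u043ewn"))  # False
```

Existing rows would also need their skeleton column filled in by a
one-off script, which is the migration step mentioned in the list above.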
Of course, these aren't particularly strong arguments; feel free to pick
your favourite strawman and give it a beating. But I haven't heard any
particularly strong arguments which favour conflict checking over validation
either.
IDN uses conflict checking because it is aiming to have particularly high
standards of security. It aims to address phishing, where a confusable
domain name may lead to a user being tricked out of large amounts of money.
On Wikipedia, the only problem is temporary confusion over the identity of a
vandal or troll. More experienced editors are no doubt aware that this
confusion can be cleared up by simply clicking the contributions link.
Wikipedia benefits from usernames which are typeable, memorable and
preferably pronounceable. Validation can assist with these goals. In fact,
IDN would probably benefit from attention to these goals as well. Perhaps we
should suggest it to the relevant committees.
-- Tim Starling