Re: [Wikitech-l] Eliminating homographs in usernames: was: Re: [WikiEN-l] Pruning "dead" accounts (was Re: New York Times article)

21 Jun 2006

Mark Williamson wrote:
...
  One thing that could be done is to make a list of homographs -- most
 of them are rather obvious -- and to consider, for example,
 User:Маtthеw.Вrоwn as "taken" as soon as the username
 User:Matthew.Brown is registered (in case you couldn't tell, the
 former uses several Cyrillic letters).

There's a list of "confusable" characters as an appendix to TR 39:

http://www.unicode.org/reports/tr39/data/confusables.txt

So it's not necessary to make a list; it's already been done for us.
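
As a rough sketch of how that file could be used: the snippet below parses lines in the "source ; target ; type # comment" layout that confusables.txt uses and folds a name to a comparison key, loosely following the "skeleton" idea in TR 39. The SAMPLE string is a small hand-written excerpt in that layout rather than the real file, and parse_confusables and skeleton are names made up for this illustration.

```python
import unicodedata

# Hand-written excerpt in the confusables.txt layout (an assumption for
# illustration; the real file is far larger and would be loaded from disk).
SAMPLE = """\
0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
0435 ; 0065 ; MA # CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
041C ; 004D ; MA # CYRILLIC CAPITAL LETTER EM -> LATIN CAPITAL LETTER M
0412 ; 0042 ; MA # CYRILLIC CAPITAL LETTER VE -> LATIN CAPITAL LETTER B
"""


def parse_confusables(text):
    """Build a {source string: target string} map from confusables-style lines."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table


def skeleton(name, table):
    """Simplified skeleton: decompose, map each character, decompose again."""
    decomposed = unicodedata.normalize("NFD", name)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


table = parse_confusables(SAMPLE)
spoof = "\u041c\u0430tth\u0435w.\u0412r\u043ewn"  # the mixed Cyrillic/Latin name
print(skeleton(spoof, table) == skeleton("Matthew.Brown", table))  # prints True
```

Two names whose skeletons match would be treated as the same name for registration purposes.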

...
  The question you may ask is, why would people want characters from
 multiple scripts? The answer is that, depending on the language, this
 may be fashionable, a pun, or have some other effect. For example, a
 username such as User:田中志郎jp is not unreasonable to allow for ja.wp,
 especially for a Japanese person named Shiro Tanaka.

Before seriously considering implementing this, we would obviously need to 
do some statistics on existing cross-script usernames. We could implement 
exceptions as existing culture dictates.
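
A survey like that could start from something as small as the sketch below. Python's standard library does not expose the Unicode Script property, so this uses the first word of each character's Unicode name ("LATIN", "CYRILLIC", "CJK", ...) as a crude stand-in; that heuristic, and the function names scripts_in and cross_script_counts, are assumptions for illustration only. The input would be whatever list of existing usernames can be dumped from the user table.

```python
import unicodedata
from collections import Counter


def scripts_in(name):
    """Return the set of rough script labels used by the letters in a name."""
    scripts = set()
    for ch in name:
        if not ch.isalpha():
            continue  # ignore digits, punctuation and spaces
        # First word of the character name approximates the script.
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return frozenset(scripts)


def cross_script_counts(usernames):
    """Count usernames per script combination, keeping only mixed-script ones."""
    counts = Counter()
    for name in usernames:
        scripts = scripts_in(name)
        if len(scripts) > 1:
            counts[scripts] += 1
    return counts


sample = ["Matthew.Brown", "\u7530\u4e2d\u5fd7\u90cejp", "Tim Starling"]
for combo, count in cross_script_counts(sample).items():
    print(sorted(combo), count)
```

The combinations that turn up most often on a given wiki would be the candidates for exceptions.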

...
  Also, with some scripts, there are few or no homographs with certain
 other scripts -- for example, Hiragana and Basic Latin have no
 homographs, you can add Hangul, most Indic scripts, Thai...

Yes, that too.

The main reason I'm advocating validity checking over conflict checking is 
my intuition about the elegance, simplicity and consequences of each solution.

* Do we really want to encourage untypeable usernames like "Маtthеw Вrоwn"?
* Remember that conflict checking goes both ways. We could conceivably have 
trolls squatting on conflicting usernames of sysops in foreign wikis. This 
might be an ambiguous policy situation.
* Conflict checking requires an extra database column, an extra index and an 
extra database query on account creation, compared to validation. A script 
would need to be written to populate the new column.
* A validation module could be reused in applications where conflict 
checking is irrelevant or impossible.
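
To make the structural difference in the points above concrete, here is a minimal sketch. It assumes "validation" means rejecting names that mix scripts (one reading of the proposal, using the first word of each character's Unicode name as a rough script label) and that "conflict checking" means comparing a folded key against every existing account. HOMOGRAPHS, is_valid and ConflictChecker are hypothetical names, and the dictionary stands in for the extra indexed column.

```python
import unicodedata

# Hypothetical homograph map; a real one would be derived from confusables.txt.
HOMOGRAPHS = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o",
    "\u041c": "M", "\u0412": "B",
}


def is_valid(name):
    """Validation: stateless, inspects only the candidate name."""
    scripts = {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in name
        if ch.isalpha()
    }
    return len(scripts) <= 1


class ConflictChecker:
    """Conflict checking: needs a stored key for every existing account."""

    def __init__(self):
        self.by_key = {}  # stands in for the extra column and its index

    def key(self, name):
        return "".join(HOMOGRAPHS.get(ch, ch) for ch in name)

    def register(self, name):
        """Return True if registered, False if a look-alike already exists."""
        folded = self.key(name)
        if folded in self.by_key:  # the extra lookup on account creation
            return False
        self.by_key[folded] = name
        return True


spoof = "\u041c\u0430tth\u0435w.\u0412r\u043ewn"
print(is_valid("Matthew.Brown"), is_valid(spoof))

checker = ConflictChecker()
print(checker.register(spoof), checker.register("Matthew.Brown"))
```

In the last line the look-alike is registered first, so the plain Latin name is the one that gets refused; that is the squatting case. Validation rejects the look-alike outright and never consults existing accounts.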

Of course, these aren't particularly strong arguments; feel free to pick 
your favourite strawman and give it a beating. But I haven't heard any 
particularly strong arguments that favour conflict checking over validation 
either.

IDN uses conflict checking because it is aiming to have particularly high 
standards of security. It aims to address phishing, where a confusable 
domain name may lead to a user being tricked out of large amounts of money. 
On Wikipedia, the only problem is temporary confusion over the identity of a 
vandal or troll. More experienced editors are no doubt aware that this 
confusion can be cleared up by simply clicking the contributions link.

Wikipedia benefits from usernames which are typeable, memorable and 
preferably pronounceable. Validation can assist with these goals. In fact, 
IDN would probably benefit from attention to these goals as well. Perhaps we 
should suggest it to the relevant committees.

-- Tim Starling
