On Fri, Jun 06, 2003 at 01:07:40PM -0700, Brion Vibber wrote:
Currently, before creating a new user account we do some limited
validation on the given name:
* Trim beginning and trailing whitespace
* Check if it looks like an IP address (four sequences of 1-3 digits with
dots between them), if so reject it.
* Check if there's a slash character, if so reject it [I just added this
check; it was I think supposed to be added when we set up partial subpage
support for userspace, which conflicts with the slash character in names
if remaining problems with the contribs/email sidebar links are fixed.
Unless there's some huge objection... there don't appear to be any valid
usernames on this pattern. Note also that this check applies only to new
names; existing ones which are legitimate would be grandfathered in.]
* Canonicalize the name (run through the title canonicalizer and take the
version without underscores) and check for an exact existing match. If
there is one, reject it.
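
As a rough illustration only, the four checks listed above could be sketched
like this in Python. The function names, the `canonicalize` stand-in for the
title canonicalizer, and the rejection of empty names are assumptions made for
the example, not the actual wiki code.

```python
import re

# Four groups of 1-3 digits separated by dots, as described in the list above.
IP_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def canonicalize(name):
    """Hypothetical stand-in for the title canonicalizer.

    Assumed behaviour: underscores become spaces, runs of whitespace
    collapse to one space, and the first letter is uppercased.
    """
    name = " ".join(name.replace("_", " ").split())
    return name[:1].upper() + name[1:]


def validate_new_username(raw, existing):
    """Return the canonical name if it is acceptable, otherwise None."""
    name = raw.strip()              # trim leading and trailing whitespace
    if not name:                    # assumption: empty names are rejected
        return None
    if IP_LIKE.fullmatch(name):     # looks like an IP address
        return None
    if "/" in name:                 # slash conflicts with user subpages
        return None
    canonical = canonicalize(name)
    if canonical in existing:       # exact match with an existing account
        return None
    return canonical
```

Here `existing` is assumed to be a set of canonical names that are already
taken; only new names pass through this function, so older accounts are
unaffected.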
We may wish to do a case-_in_sensitive check, and/or a
same-except-for-accents check. Or not. Anyway, I think it could use some
tidying up.
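
One possible way to do the case-insensitive and same-except-for-accents
comparison is to reduce both names to a comparison key first. The sketch below
assumes Python's standard `unicodedata` module; the helper names
`comparison_key` and `collides` are made up for the example.

```python
import unicodedata


def comparison_key(name):
    """Fold case and drop accents so near-identical names compare equal."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()


def collides(candidate, existing_names):
    """True if the candidate matches any existing name under the key."""
    key = comparison_key(candidate)
    return any(comparison_key(other) == key for other in existing_names)
```

With this key, "evariste" would collide with an existing "Évariste". Whether
that is desirable for every language is a policy question, since in some
languages the accented and unaccented letters are genuinely different.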
For ISO 8859 wikis that sounds right, but we may need stronger checks
for Unicode wikis to protect against people pretending to be someone else.
There are many ways to make two Unicode strings that differ at the binary
level look the same. Converting to one of the Unicode normalization forms
and then checking the result against an allowed range of characters should
do the trick. We could use a different allowed set per wiki, so that East
Asian wikis can allow Han characters, Russian wikis Cyrillic, etc.
But it's not very important right now.
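
A minimal sketch of the normalize-then-check idea, assuming Python's standard
`unicodedata` module and NFKC as the normalization form. The code point ranges,
the `COMMON` set and the function name are illustrative assumptions; a real
per-wiki configuration would need much more complete tables.

```python
import unicodedata

# Illustrative and incomplete code point ranges per script (assumed values).
SCRIPT_RANGES = {
    "latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "cyrillic": [(0x0400, 0x04FF)],
    "han": [(0x4E00, 0x9FFF)],
}

# Characters assumed to be acceptable on every wiki.
COMMON = set("0123456789 -.'")


def normalize_and_check(name, scripts):
    """Return the NFKC-normalized name if every character is allowed, else None."""
    # NFKC composes accents and folds compatibility variants
    # (for example fullwidth Latin letters) into their ordinary forms.
    normalized = unicodedata.normalize("NFKC", name)
    ranges = [r for script in scripts for r in SCRIPT_RANGES[script]]
    for ch in normalized:
        if ch in COMMON:
            continue
        if not any(lo <= ord(ch) <= hi for lo, hi in ranges):
            return None
    return normalized
```

A Latin-only wiki would call `normalize_and_check(name, ["latin"])` and so
reject a name whose first letter is the Cyrillic look-alike of "B". A wiki that
allows both Latin and Cyrillic would still need a further check for names that
mix scripts, which this sketch does not attempt.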