On Fri, May 13, 2011 at 3:31 AM, M. Williamson <node.ue(a)gmail.com> wrote:
> I still don't think page titles should be case sensitive. Last time I asked
> how useful this really was, back in 2005 or so, I got a tersely-worded
> response that we need it to disambiguate certain pages. OK, but how many
> cases does that actually apply to? I would think that the increased
> usability from removing case sensitivity would far outweigh the benefit of
> natural disambiguation that only applies to a tiny minority of pages, and
> which could easily be replaced with disambiguation pages.
From a software perspective, the way to do this would be to store a
canonicalized version of each page's title, and require that to be
unique instead of the title itself. This would be nice because we
could allow underscores in page titles, for instance, in addition to
being able to do case-folding.
Note that Unicode capitalization is locale-dependent, but case-folding
is not. Thus we could use the same case-folding on all projects,
including international projects like Commons. There's only one
exception -- Turkish, with its dotless and dotted i's. But that's
minor enough that we should be able to work around it without too much
pain.
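To make the Turkish exception concrete, here is a small Python sketch.
str.casefold() applies the default, locale-independent Unicode folding;
fold_turkic is a hypothetical helper that applies the two Turkic
mappings (I to dotless ı, dotted İ to i) first. How a wiki would decide
which one to use is left open here:

```python
# The two Turkic special cases, applied before default case-folding.
_TURKIC_MAP = str.maketrans({"I": "\u0131", "\u0130": "i"})


def fold_default(text: str) -> str:
    """Locale-independent Unicode case-folding."""
    return text.casefold()


def fold_turkic(text: str) -> str:
    """Hypothetical variant for Turkish-language projects."""
    return text.translate(_TURKIC_MAP).casefold()


# Default folding maps "I" to dotted "i", which is wrong for Turkish.
print(fold_default("DIYARBAKIR"))  # diyarbakir
print(fold_turkic("DIYARBAKIR"))   # dıyarbakır

# Default folding turns "İ" into "i" plus a combining dot (U+0307).
print(len(fold_default("\u0130stanbul")))  # 9
print(fold_turkic("\u0130stanbul"))        # istanbul
```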
Some projects, like probably all Wiktionaries, would doubtless not
want case-folding at all, so we should support different
canonicalization algorithms. Even the ones that don't want
case-folding could still benefit from allowing underscores in titles.
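Supporting several algorithms could be as simple as a per-project
setting that picks a function. Another Python sketch; the project names
and the CANONICALIZERS table are purely illustrative, not an actual
MediaWiki configuration option:

```python
from typing import Callable, Dict


def underscores_only(title: str) -> str:
    """Treat underscores and spaces alike, but keep case significant."""
    return " ".join(title.replace("_", " ").split())


def underscores_and_case(title: str) -> str:
    """Same as above, plus locale-independent case-folding."""
    return underscores_only(title).casefold()


# Hypothetical per-project choice of canonicalization algorithm.
CANONICALIZERS: Dict[str, Callable[[str], str]] = {
    "wiktionary": underscores_only,
    "wikipedia": underscores_and_case,
}


def canonical_key_for(project: str, title: str) -> str:
    return CANONICALIZERS[project](title)
```

Under this setup a Wiktionary keeps "Polish" and "polish" as separate
entries, while a Wikipedia would treat them as the same page.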
But all this would require a very intrusive rewrite. Assumptions like
"replace spaces by underscores to get dbkey" are hardwired into
MediaWiki all over the place, unfortunately. It's not clear that it's
worth it, since there are downsides to case-folding too. It might
make more sense to auto-generate redirects instead, which would be a
much easier project that wouldn't have the downsides.