2010/7/22 Tim Starling <tstarling(a)wikimedia.org>rg>:
An alternative would be to add a column to the
categorylinks table,
say cl_type. It could be an ENUM or some short text type. Then the
index could be altered to include this field at the start of it.
Presumably the rationale for combining these two things into
cl_sortkey is to avoid a schema change, and to make the paging code
slightly simpler. But I worry that future generations of MediaWiki
developers will curse us for laziness and obfuscation.
There is another reason to prefer this schema, which is that the
orginially proposed one is susceptible to weird transition bugs. After
this feature is deployed, there will be old-format (i.e. plain)
sortkeys sticking around in the database for quite some time after
(they won't go away until LinksUpdate fixes them), which means that
pages whose sortkey starts with a C or F will be recognized as
categories and files respectively, even if they're normal pages.
The best way to mitigate that is to populate the namespace information
prior to deployment. In Tim's schema, that means filling the cl_type
field based on page_namespace. In the sortkey prefix schema, that
means prefixing the sortkey with the relevant sortkey, but that also
requires the sortkey updating code has already been updated at that
time (so it doesn't overwrite new-style sortkeys with old-style ones),
which means you'd have to partially deploy the code while running the
population script. Yuck.
Roan is also asking for a store of the plain text form
in this thread.
I have since conceded that it takes space for almost zero gain (as it
doesn't solve the first letter problem and doesn't have much gain for
the API either), see my previous post.
Roan Kattouw (Catrope)