On Wed, Apr 2, 2008 at 2:03 AM, Samuel Wantman <wantman(a)earthlink.net> wrote:
I don't know if this has been discussed, but I'm hoping some serious
consideration could be put into creating a category history that can be
viewed and used for reverting.
That would be a very good feature, yes. It's also worth considering
at some point.
On Wed, Apr 2, 2008 at 4:58 AM, Bryan Tong Minh
<bryan.tongminh(a)gmail.com> wrote:
Wouldn't it be easier for upgrading and backwards compatibility to
keep the current cl_to field, which should indicate the category that
is indicated in wikitext, and add a cl_id field, which indicates the
real category that is being pointed to?
cl_to is a VARCHAR(255) times 200 million rows. Being able to get rid
of it would significantly reduce the size (therefore also, to some
extent, improve the speed) of the categorylinks table. Furthermore,
having both the name and ID stored will unnecessarily allow
inconsistency, i.e., it's gratuitously denormalized.
There will probably have to be a transitional period where both fields
are present, just for the sake of updating. However, I think this is
best kept to an intra-version period, so that the schema changes
completely from one release to the next. This is a breaking schema change, but we can't
*always* avoid those. We don't have major versions that we can pack
them all into; instead we sprinkle them in minor versions.
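
To make the transitional period concrete, here is a rough sketch of the backfill step while both fields exist. It uses Python's sqlite3 purely for illustration. The cl_id column is the one proposed in this thread, the table layouts are cut down to the relevant columns, and the function name is made up; this is not an actual updater script.

```python
import sqlite3


def backfill_category_ids(conn):
    """Fill the proposed cl_id column from the existing cl_to names.

    Returns the number of categorylinks rows still lacking an ID.
    """
    # Categories can have members without having a row of their own,
    # so create rows for any names that are referenced but missing.
    conn.execute(
        "INSERT INTO category (cat_title) "
        "SELECT DISTINCT cl_to FROM categorylinks "
        "WHERE cl_to NOT IN (SELECT cat_title FROM category)"
    )
    # Copy the matching ID onto every link that does not have one yet.
    conn.execute(
        "UPDATE categorylinks SET cl_id = "
        "(SELECT cat_id FROM category WHERE cat_title = cl_to) "
        "WHERE cl_id IS NULL"
    )
    conn.commit()
    return conn.execute(
        "SELECT COUNT(*) FROM categorylinks WHERE cl_id IS NULL"
    ).fetchone()[0]


def demo():
    """Build a tiny in-memory schema (simplified, hypothetical) and backfill it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE category (
            cat_id INTEGER PRIMARY KEY,
            cat_title TEXT UNIQUE NOT NULL
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER NOT NULL,
            cl_to TEXT NOT NULL,   -- old name-based field
            cl_id INTEGER          -- proposed ID-based field
        );
        INSERT INTO category (cat_title) VALUES ('A');
        INSERT INTO categorylinks (cl_from, cl_to)
            VALUES (1, 'A'), (2, 'A'), (3, 'B');
        """
    )
    remaining = backfill_category_ids(conn)
    rows = conn.execute(
        "SELECT cl_from, cl_to, cl_id FROM categorylinks ORDER BY cl_from"
    ).fetchall()
    return remaining, rows
```

Once the backfill reports zero remaining rows, cl_to carries no information that cl_id plus the category table does not, which is the point at which it could be dropped.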
On Wed, Apr 2, 2008 at 8:02 AM, Roan Kattouw <roan.kattouw(a)home.nl> wrote:
Simetrical wrote:
Well, the simple SQL query could turn out to be a problem for very
large categories. I might be wrong; a single update may well run
faster than the insert/delete we have right now for large page
deletions.
That's why I suggested using the category table rather than changing
lots of rows in categorylinks.
Using the category table how? Just changing the IDs? It doesn't
work if you want to then change them back, or alter redirects. You
could do a join, but that seems like it would break sorted retrieval.
There is one thing nobody mentioned yet: nonexistent categories can have
members, so it's possible to move one category on top of another one.
For example, let [[Category:A]] be an existent category and
[[Category:B]] a nonexistent one that does have members. If
[[Category:A]] is then moved to [[Category:B]] (which is allowed, since
the target doesn't exist), the categories would have to be merged. The
thing is that A and B had different category IDs before the move, but
the merged category will only have one ID after the move. This again
means updating category IDs in the categorylinks table. We could
probably use row count estimates here to decide which ID the unified
category gets (A's or B's, depending on which one would result in more
rows being changed) and stuff the UPDATEs in the job queue if both
estimates are unacceptably large.
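
The row-count heuristic described above could be sketched roughly like this. It assumes the intent is to keep whichever ID leaves fewer rows to rewrite. The names and the max_inline_rows threshold are hypothetical, and the estimates are taken as given rather than fetched from the database.

```python
from dataclasses import dataclass


@dataclass
class MergePlan:
    surviving_id: int         # ID the merged category keeps
    retired_id: int           # ID whose categorylinks rows get rewritten
    rows_to_update: int       # estimated number of rows to rewrite
    defer_to_job_queue: bool  # True if the rewrite is too big to run inline


def plan_category_merge(id_a, est_rows_a, id_b, est_rows_b,
                        max_inline_rows=1000):
    """Pick the surviving category ID so that fewer rows need updating.

    max_inline_rows is an illustrative threshold, not a real setting.
    """
    # Keep the ID that already has more rows; only the smaller side
    # then has to be rewritten.
    if est_rows_a >= est_rows_b:
        surviving, retired, to_update = id_a, id_b, est_rows_b
    else:
        surviving, retired, to_update = id_b, id_a, est_rows_a
    # The smaller estimate exceeds the threshold only when both
    # estimates are large, which is the case for the job queue.
    defer = to_update > max_inline_rows
    return MergePlan(surviving, retired, to_update, defer)
```

For example, merging a category with about 50 members into one with about 200,000 would keep the larger category's ID and rewrite the 50 rows inline, while two categories with hundreds of thousands of members each would be deferred.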
Why would we want to allow moving one category on top of another? Why
not ban it, and allow people to create a redirect if they want to
"merge" them?