Thanks, both of you, for your in-depth answers. It helped a lot :)
I have started a branch (category-redirects) to work on this. So far,
category redirects are working, and I've created a new Job subclass to
handle categorylinks big changes after a category move. Category moves
are apparently working, as far as the target category (not category
page) does not exist.
I am, however, considering to disallow category moves over an existing
category (again, the *category*, not its attached page) :
* When the user does this, he probably expects the categories to be
merged together.
* However as said before, the move has to be reversible: the proper
way to do this is to redirect the former category to the latter;
reversing this is only deleting a redirect.
* But actually, after that redirect, there will be one page for two
category objects. If we merged A and B to B, category A will redirect
to category B, while the only valid Page will be B. That might not be
a big problem, but it differs from the structure I currently use : 1
Category object <-> 1 Page object
* It gets worse if you consider a third Category C, that was
redirecting to A before the move : The user moves/merges A and B, so
he expects C to redirect to the merged categories. However, after the
move, we'll only have a redirect chain : C->A->B. And like double
redirects for pages, it wont work. I don't see a way to resolve this
use case in a reversible manner with my new schema.
Overriding move would be forbidden, and the user would have to create
a redirect. Do you see a case where the user would suffer from
creating a redirect instead of doing a hard, plain move ?
There is a little problem with that constraint, though : if a category
once contained pages, it will still exist. It means that the user can
be asked not to move a category to an... empty category.
The easy answer here is "just check for category membership, and allow
the move when no categorylinks entries point to the target category";
however, how will we deal with CategoryLinksUpdate Jobs that might be
pending ? In other words, the target category might appear empty at
the query time, when a move / redirect change is undergo.
Should I try to address that ? If on moves, I make the job_title the
target category title, searching the job table for
job_cmd=categoryLinksUpdate AND job_title=target could help, but how
does it sound ?
2008/7/1 Simetrical <Simetrical+wikilist(a)gmail.com>om>:
On Tue, Jul 1, 2008 at 12:02 PM, Roan Kattouw
<roan.kattouw(a)home.nl> wrote:
I had the idea joining on integers would be
faster, I guess I
underestimated how much. I didn't know about the primary key thing, but
it makes sense (it's called *primary* key for a reason).
The primary key benefits are specific to InnoDB, since it clusters the
table data in the primary key (basically the table data is in the
leaves of the primary key B-tree, if I understand right). A primary
key lookup is therefore one B-tree lookup instead of two. In MyISAM,
and in many other DBMSes, the primary key isn't special.
With the schema I was backing, yes. However,
schema #1 doesn't eliminate
the need to check for a category's redirect target when adding a page to
it, and since joining on cat_page=page_id is faster than joining on
cat_title=page_title (because of the int vs. varchar and primary vs.
non-primary issues), that would constitute a "specific performance
benefit" for adding cat_page, wouldn't it?
Not a big enough one. Adding an extra column means every row is that
much larger, reducing key buffer efficiency and thereby hurting
performance slightly for all queries on the table. And if you're
going to join on it you also may need an extra index (depending on
join direction), which takes time to maintain on every insert and
delete and also competes for the cache. Plus you get the headache of
denormalization.
In this case it's almost certainly not worth it. In the case of
cl_final, where you're avoiding cripplingly large filesorts, it's
definitely worth it, because otherwise the feature is completely
untenable.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Nicolas Dumazet — NicDumZ [ nIk.d̪ymz ]
pywikipedia & mediawiki
Deuxième année ENSIMAG.
06 03 88 92 29