Hi,
I wrote a little PHP object that takes
* a list of article IDs
* a list of categories
and returns those article IDs which belong to these categories *or their
subcategories*.
The algorithm is written to minimize the number of read queries when
walking the category tree. The maximum number of read queries is twice
the depth of the "tallest" category tree in the set of articles (two
queries per level: one for the "parents", one for their page_ids). That
depth should IMHO mostly be below 10 (I guess it's usually 5 or less,
but I have no data to back that up).
So, if the tree depth for an article is 8, and I add more articles to
search for which all have a depth of 8 or less, no additional database
queries are necessary. The individual queries will grow in size, though.
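To illustrate the level-by-level idea, here is a rough sketch, in Python
rather than PHP to keep it short. It is not the code in
categoryfinder.php. The functions fetch_parents and fetch_page_ids are
hypothetical stand-ins for the two batched queries per level, and the
sample data is made up.

```python
from typing import Callable, Dict, Iterable, List, Set


def filter_by_categories(
    article_ids: List[int],
    targets: Iterable[str],
    fetch_parents: Callable[[Set[int]], Dict[int, List[str]]],
    fetch_page_ids: Callable[[Set[str]], Dict[str, int]],
) -> List[int]:
    """Return the article IDs that sit in a target category or a subcategory."""
    target_set = set(targets)
    # origins[page_id] = articles whose ancestry passes through that page
    origins: Dict[int, Set[int]] = {a: {a} for a in article_ids}
    frontier: Set[int] = set(article_ids)
    matched: Set[int] = set()

    while frontier:
        # Query 1 of this level: parent category titles for ALL frontier pages.
        pending: Dict[str, Set[int]] = {}
        for page_id, titles in fetch_parents(frontier).items():
            for title in titles:
                if title in target_set:
                    matched |= origins[page_id]
                else:
                    pending.setdefault(title, set()).update(origins[page_id])

        # Drop categories that only lead back to already-matched articles.
        pending = {t: arts - matched for t, arts in pending.items() if arts - matched}
        if not pending:
            break

        # Query 2 of this level: page_ids of those parent categories.
        frontier = set()
        for title, page_id in fetch_page_ids(set(pending)).items():
            new = pending[title] - origins.get(page_id, set())
            if new:  # only revisit a category if it carries new articles (guards cycles)
                origins.setdefault(page_id, set()).update(new)
                frontier.add(page_id)

    return [a for a in article_ids if a in matched]


# Made-up sample data standing in for the database.
PARENTS = {1: ["Particle physics"], 2: ["Composers"], 3: ["Optics"],
           10: ["Physics"], 11: ["Music"], 12: ["Physics"]}
PAGE_IDS = {"Particle physics": 10, "Composers": 11, "Optics": 12,
            "Physics": 13, "Music": 14}


def fetch_parents(ids: Set[int]) -> Dict[int, List[str]]:
    return {i: PARENTS[i] for i in ids if i in PARENTS}


def fetch_page_ids(titles: Set[str]) -> Dict[str, int]:
    return {t: PAGE_IDS[t] for t in titles if t in PAGE_IDS}


physics_articles = filter_by_categories([1, 2, 3], ["Physics"],
                                        fetch_parents, fetch_page_ids)
```

Each pass of the loop handles one level of the tree for all articles at
once, so adding more articles makes the two queries bigger but does not
add more passes, as long as their trees are not deeper.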
I intend to use it on the Tasks feature search, which is why I put it
into that extension ("extensions" module,
"Tasks/categoryfinder.php").
But I hope this can be used for other searches as well. The idea is,
instead of searching for, say, 25 articles, to internally search for
more (e.g. a few hundred), then filter that set through the
categoryfinder, until there are 25 matching articles.
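A minimal sketch of that over-fetch-and-filter loop, again in Python and
under assumptions: search and keep are hypothetical callbacks (the
underlying search and the category filter), and max_batches is a cap I
made up here to bound the DB load, not something the extension has.

```python
from typing import Callable, List


def search_in_categories(
    search: Callable[[int, int], List[int]],    # (offset, limit) -> article IDs
    keep: Callable[[List[int]], List[int]],     # category filter over one batch
    wanted: int = 25,
    batch_size: int = 200,
    max_batches: int = 5,                       # assumed cap to limit DB stress
) -> List[int]:
    """Fetch larger batches and filter them until `wanted` matches are found."""
    results: List[int] = []
    offset = 0
    for _ in range(max_batches):
        batch = search(offset, batch_size)
        if not batch:
            break                               # underlying search is exhausted
        results.extend(keep(batch))
        if len(results) >= wanted:
            break
        offset += len(batch)
    return results[:wanted]
```

With a cap like this, a search restricted to a rare category may return
fewer than 25 hits instead of scanning the whole result set.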
Before I start implementing the actual search interface, can anyone tell
me if that would put too much stress on the DB slaves? Keep in mind that
limiting several searches to "articles in [[Category:Physics]] and its
subcategories" appears to be immensely useful, so it might be well worth
the DB stress from a user standpoint. (Damn you, users! ;-)
Magnus