On Thu, Feb 28, 2008 at 3:39 AM, Samuel Wantman <wantman(a)earthlink.net> wrote:
I'm wondering about creating a new namespace, called (you guessed it)
INDEX. Any category of people could be put in an index by adding
[[Index:People]] on the category page. The "People" INDEX page, into
which the category gets put, would have links to all the articles and
subcategories from the categories in the INDEX. The contents of the
subcategories of those categories would NOT be added automatically.
Each would have to be manually added to the index if appropriate. Just
like a category, there would be text that could be edited for each INDEX
page. So in essence, an INDEX is a way to do category unions. This
would be much, much easier than trying to create and maintain these
indexes manually using categories.
So you're basically suggesting manually-created but
automatically-populated category unions. Category unions are not so
hard to do on the backend. They aren't great, though, if you want to
retrieve the results in sorted order. It's possible to do so if you're okay with
some fairly sharp restrictions, like unioning a max of three
categories. But in MySQL, I'm not sure there'd be an efficient way to
union a *large* number of categories and retrieve the results in
sorted order.
For a small number of categories, you can just do a MySQL UNION, like this:
mysql> EXPLAIN
    (SELECT * FROM categorylinks WHERE cl_to='Living_people'
        ORDER BY cl_sortkey LIMIT 200)
    UNION ALL
    (SELECT * FROM categorylinks WHERE cl_to='Vegetables'
        ORDER BY cl_sortkey LIMIT 200)
    ORDER BY cl_sortkey LIMIT 200;
+------+--------------+---------------+------+-------------------------+------------+---------+-------+--------+----------------+
| id   | select_type  | table         | type | possible_keys           | key        | key_len | ref   | rows   | Extra          |
+------+--------------+---------------+------+-------------------------+------------+---------+-------+--------+----------------+
| 1    | PRIMARY      | categorylinks | ref  | cl_sortkey,cl_timestamp | cl_sortkey | 257     | const | 543730 | Using where    |
| 2    | UNION        | categorylinks | ref  | cl_sortkey,cl_timestamp | cl_sortkey | 257     | const | 31     | Using where    |
| NULL | UNION RESULT | <union1,2>    | ALL  | NULL                    | NULL       | NULL    | NULL  | NULL   | Using filesort |
+------+--------------+---------------+------+-------------------------+------------+---------+-------+--------+----------------+
3 rows in set (0.04 sec)
This filesorts, but only a limited number of rows: at most the
per-category limit times the number of categories (at most
2 * 200 = 400 here). This is potentially acceptable (although
undesirable) for a small number of categories in the union, especially
if the limit (in this case 200) is small, say more like 20. For a
large number of categories with a reasonable limit size you could
easily be talking about filesorts of thousands of rows, which isn't
really acceptable.
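
As a rough sketch of how the query above generalizes to N categories, and of how the filesort bound grows, here is a small Python helper. The function names are made up for illustration, and the `%s` placeholders assume a driver with the "format" parameter style (as MySQLdb and PyMySQL use); nothing here is taken from MediaWiki itself.

```python
def build_union_query(categories, limit):
    """Return (sql, params) for a sorted union of the given categories.

    Hypothetical helper: it only mirrors the shape of the hand-written
    two-category query, with one LIMITed branch per category.
    """
    if not categories:
        raise ValueError("need at least one category")
    limit = int(limit)  # formatted into the SQL, so force it to an integer
    branch = ("(SELECT * FROM categorylinks WHERE cl_to = %s "
              f"ORDER BY cl_sortkey LIMIT {limit})")
    # One branch per category, then a final sort over the combined rows.
    sql = " UNION ALL ".join([branch] * len(categories))
    sql += f" ORDER BY cl_sortkey LIMIT {limit}"
    return sql, list(categories)


def max_filesort_rows(num_categories, limit):
    """Upper bound on the rows the final ORDER BY has to filesort."""
    return num_categories * limit


sql, params = build_union_query(["Living_people", "Vegetables"], 200)
print(sql)
print(params)
print(max_filesort_rows(2, 200))    # bound for the query above
print(max_filesort_rows(50, 200))   # bound for a 50-category union
```

The bound is linear in the number of categories, which is why a union of dozens of categories at a limit of 200 already means sorting thousands of rows per request.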
The thing is, I'm pretty sure (although I'm not a computer science
whiz) that MySQL should be able to use a merge sort here, rather than
an explicit sort. That might be acceptably fast. You'd still have to
scan a lot of index rows, but at least you wouldn't have to sort them.
I don't know if there's any way to get it to do a merge sort here,
though.
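
To illustrate the idea outside the database: each category's rows already come back from the index in cl_sortkey order, so combining them only needs the merge step of a merge sort (a k-way merge), not a full sort. The sketch below does this in Python with `heapq.merge`; the row tuples and the `merged_page` name are invented for the example, and it says nothing about whether MySQL can be persuaded to do the same.

```python
import heapq
from itertools import islice


def merged_page(sorted_streams, limit, key=None):
    """Return the first `limit` rows across several already-sorted streams.

    heapq.merge is lazy, so only about `limit` rows (plus one look-ahead
    row per stream) are ever pulled from the inputs.
    """
    merged = heapq.merge(*sorted_streams, key=key)
    return list(islice(merged, limit))


# Hypothetical (sortkey, page_id) rows, each stream sorted by sortkey,
# standing in for per-category index scans.
living_people = iter([("Adams", 1), ("Baker", 2), ("Young", 3)])
vegetables = iter([("Beet", 10), ("Carrot", 11)])

page = merged_page([living_people, vegetables], 3, key=lambda row: row[0])
print(page)
```

With k streams the merge keeps a heap of k entries, so each output row costs O(log k) comparisons instead of a sort over every fetched row.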
On Thu, Feb 28, 2008 at 10:06 AM, Ben <chuwiey(a)gmail.com> wrote:
The solution to your idea/request exists in the combination of
Semantic MediaWiki and the Halo Extension - and in fact, implementation
could be quite easy, by adding the semantic properties in the different
taxonomy templates. So for an example taxonomy dealing with people:
Name (property) : <nameofperson>
Profession (property) : <professionofperson>
and so on..
Unfortunately, Semantic MediaWiki is not efficient enough to be
enabled on Wikipedia. This kind of problem is very easy to solve
inefficiently but hard to do scalably.
On Thu, Feb 28, 2008 at 12:20 PM, Jim Hu <jimhu(a)tamu.edu> wrote:
This also leads to massive issues about whether Categories in
Wikipedia are a well-formed ontology (which is a fancy way of
expressing Lars Aronsson's reply). I'm barely conversant in
ontologies through my participation in Gene Ontology activities as a
newbie, but my gut reaction is .... not even close.
It has been previously observed that there are quite a few cyclic
subcategory relationships on Wikipedia, so if that precludes being a
"well-formed ontology", then yeah, it's not.
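
For what it's worth, finding such cycles is a standard depth-first search over the subcategory graph. A minimal Python sketch follows, assuming the graph is available as a dict mapping each category to its direct subcategories; the category names and the `find_cycle` function are made up for the example.

```python
def find_cycle(subcats):
    """Return one cycle as a list of category names, or None if acyclic.

    `subcats` maps a category to its direct subcategories. The search is
    recursive, so a very deep category tree would need an iterative version.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for child in subcats.get(node, ()):
            state = colour.get(child, WHITE)
            if state == GREY:
                # child is already on the current path: that is a cycle.
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(subcats):
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


cyclic = {"Science": ["Physics"], "Physics": ["Energy"], "Energy": ["Science"]}
acyclic = {"People": ["Living_people", "Chefs"], "Chefs": ["Living_people"]}
print(find_cycle(cyclic))
print(find_cycle(acyclic))
```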