-----Original Message-----
From: wikitech-l-bounces(a)wikimedia.org
[mailto:wikitech-l-bounces@wikimedia.org] On Behalf Of Aerik Sylvan
Sent: 14 December 2006 08:06
To: wikitech-l(a)wikimedia.org
Subject: Re: [Wikitech-l] Category Intersections (again)
Okay, I've got some better numbers (and a better method of
building the
category intersections table from an existing database).
I build a table with 2.6 million records, all the pages from
the snapshot I
had. It's 90% populated with categories (this will make more
sense in a
minute) and I'm still getting good results from intersections
queries. For
instance:
Showing rows 0 - 1 (2 total, Query took 0.3459 sec) SQL
query: SELECT *
FROM `pageindex`
WHERE MATCH (catlist
)
AGAINST ( '+Living_People +People_from_Maine +1956_births'
IN BOOLEAN
MODE
)
LIMIT 0 , 30
(which has two results, btw, Cynthia McFadden and David Kelley).
This takes consistently a third of a second. Not bad. And that's for
2.6million rows. I think we should restrict it to just current
articles in the
main namespace, but that's my opinion.
I figured out a better way to populate the table, too:
UPDATE pageindex,categorylinks SET pageindex.catlist=CONCAT_WS(' ',
pageindex.catlist,categorylinks.cl_to) WHERE (
pageindex.pageid=categorylinks.cl_from AND INSTR(pageindex.catlist,
categorylinks.cl_to) =0)
You have to run this query for a number of times equal to the greatest
number of distinct categories a page has. Most pages have
less than 7, but
a few apparently have outrageous numbers. After doing a lot
of searching
around for a solution to concatenating multiple rows into one
value, I think
this is pretty good.
MySQL has GROUP_CONCAT() if that's what your after.
Jared