Okay, I've got some better numbers (and a better method of building the
category intersections table from an existing database).
I built a table with 2.6 million records, all the pages from the snapshot I
had. It's 90% populated with categories (this will make more sense in a
minute) and I'm still getting good results from intersection queries. For
instance:
Showing rows 0 - 1 (2 total, Query took 0.3459 sec)

SQL query:

SELECT *
FROM `pageindex`
WHERE MATCH (catlist)
    AGAINST ('+Living_People +People_from_Maine +1956_births' IN BOOLEAN MODE)
LIMIT 0, 30
(which has two results, btw, Cynthia McFadden and David Kelley).
This consistently takes a third of a second. Not bad, and that's for
2.6 million rows. I think we should restrict it to just current articles in
the main namespace, but that's my opinion.
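If you want to play with the intersection idea without a MySQL snapshot, here's a self-contained sketch using SQLite's FTS5 instead (the query syntax differs from MySQL's boolean mode, and the titles/data below are invented, not pulled from my table):

```python
import sqlite3

# Toy model of the boolean-AND category intersection using SQLite FTS5.
# tokenchars '_' keeps each underscored category name as a single token.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE pageindex "
    "USING fts5(title, catlist, tokenize = \"unicode61 tokenchars '_'\")"
)
rows = [  # made-up data matching the example categories
    ("Cynthia McFadden", "Living_People People_from_Maine 1956_births"),
    ("David Kelley", "Living_People People_from_Maine 1956_births"),
    ("Someone Else", "Living_People 1956_births"),
]
conn.executemany("INSERT INTO pageindex VALUES (?, ?)", rows)

# FTS5's AND plays the role of MySQL's '+term' in boolean mode.
hits = sorted(row[0] for row in conn.execute(
    "SELECT title FROM pageindex WHERE pageindex MATCH "
    "'Living_People AND People_from_Maine AND 1956_births'"
))
```

Only the two pages carrying all three categories come back; the page missing People_from_Maine is filtered out, same as a '+' term would do it.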
I figured out a better way to populate the table, too:
UPDATE pageindex, categorylinks
SET pageindex.catlist = CONCAT_WS(' ', pageindex.catlist, categorylinks.cl_to)
WHERE pageindex.pageid = categorylinks.cl_from
  AND INSTR(pageindex.catlist, categorylinks.cl_to) = 0
You have to run this query a number of times equal to the greatest number of
distinct categories any one page has: a multiple-table UPDATE touches each
pageindex row at most once per statement, so each run appends at most one
new category per page. Most pages have fewer than 7, but a few apparently
have outrageous numbers. After doing a lot of searching around for a way to
concatenate multiple rows into one value, I think this is pretty good.
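To see why the number of runs is bounded by the deepest per-page category count, here's a toy Python model of the repeat-until-nothing-changes process (the table data is invented; also note it checks whole tokens, whereas INSTR is a substring test and could skip a category whose name is contained in one already present):

```python
# Toy model of re-running the multiple-table UPDATE: each pass appends at
# most ONE missing category per page, so the number of passes needed tracks
# the largest per-page category count. Data is made up.

categorylinks = [  # (cl_from pageid, cl_to category)
    (1, "Living_People"),
    (1, "People_from_Maine"),
    (1, "1956_births"),
    (2, "Living_People"),
]
catlist = {1: "", 2: ""}  # pageindex: pageid -> space-separated catlist

passes = 0
changed = True
while changed:
    changed = False
    touched = set()  # each pageindex row is updated at most once per pass
    for pageid, cat in categorylinks:
        if pageid in touched or cat in catlist[pageid].split():
            continue
        catlist[pageid] = (catlist[pageid] + " " + cat).strip()
        touched.add(pageid)
        changed = True
    passes += 1
```

Here page 1 has three categories, so it takes three appending passes plus one final pass that changes nothing, and that's when you know you can stop.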
Aerik