On Fri, Mar 7, 2008 at 12:15 PM, Ilya Haykinson <haykinson(a)gmail.com> wrote:
> For what it's worth, the extension
> http://www.mediawiki.org/wiki/DynamicPageList has been in use on
> various Wikimedia sites for a while now with great success to allow
> for category intersections, and I think the latest versions support
> image galleries etc.

We know. DPL is not suitable for use on large wikis.
On Fri, Mar 7, 2008 at 12:17 PM, Jared Williams
<jared.williams1(a)ntlworld.com> wrote:
> Yeah, I did notice that; I think it could be replaced with
> something like:
>
> SELECT ci_page FROM {$table_categoryintersections} WHERE ci_hash IN
> (implode(',', $hashes))
> GROUP BY ci_page
> HAVING COUNT(*) = count($hashes)
> LIMIT $this->max_hash_results

I'm not going to spend too much time parsing that, but it's an
automatic filesort of the entire set included by the WHERE clause,
i.e., the union of all the category intersections in question, since
MySQL doesn't support loose index scans for WHERE x IN (...) GROUP BY
y. A repeated join seems likely to be faster, although I haven't
benchmarked it.
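
To make the two query shapes concrete, here is a minimal sketch that
builds both and checks that they return the same pages. It uses Python
with an in-memory SQLite table as a stand-in, so it says nothing about
MySQL's query plans or relative speed. The table name
categoryintersections, the sample rows, and the helper names are
assumptions for illustration. The GROUP BY form also assumes that
(ci_hash, ci_page) pairs are unique; otherwise COUNT(*) would overcount.

```python
import sqlite3

def group_by_query(hashes):
    # IN (...) + GROUP BY/HAVING: keep pages that matched every hash.
    placeholders = ",".join("?" * len(hashes))
    sql = (
        "SELECT ci_page FROM categoryintersections "
        f"WHERE ci_hash IN ({placeholders}) "
        "GROUP BY ci_page HAVING COUNT(*) = ?"
    )
    return sql, list(hashes) + [len(hashes)]

def repeated_join_query(hashes):
    # One self-join per additional hash (requires a non-empty list).
    joins = [
        f"JOIN categoryintersections AS c{i} "
        f"ON c{i}.ci_page = c0.ci_page AND c{i}.ci_hash = ?"
        for i in range(1, len(hashes))
    ]
    sql = (
        "SELECT c0.ci_page FROM categoryintersections AS c0 "
        + " ".join(joins)
        + " WHERE c0.ci_hash = ?"
    )
    # Placeholders appear in the joins first, then in the WHERE clause.
    return sql, list(hashes[1:]) + [hashes[0]]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categoryintersections ("
    "ci_hash TEXT NOT NULL, ci_page INTEGER NOT NULL, "
    "PRIMARY KEY (ci_hash, ci_page))"
)
# Hypothetical sample data: page 1 and page 4 carry both h1 and h2.
conn.executemany(
    "INSERT INTO categoryintersections (ci_hash, ci_page) VALUES (?, ?)",
    [("h1", 1), ("h2", 1), ("h1", 2), ("h2", 3),
     ("h1", 4), ("h2", 4), ("h3", 4)],
)

def pages(build_query, hashes):
    sql, params = build_query(hashes)
    return sorted(row[0] for row in conn.execute(sql, params))

wanted = ["h1", "h2"]
print(pages(group_by_query, wanted))
print(pages(repeated_join_query, wanted))
```

Both calls should list the same pages. Which form is faster on MySQL
would still need benchmarking against real data.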

> Yeah, I think hash collisions are unlikely; what's far more likely
> is someone recategorizing a page after a search, which means the
> double check could be removed.

It's not just unlikely, it's so unlikely as to be impossible to all
intents and purposes, barring deliberately constructed collisions
(which are possible with MD5, although maybe not for such short
strings, I forget). Worry about a meteor wiping out the data center
before you worry about MD5 collisions by chance on sets with
cardinality in the billions.
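
For a rough sense of scale, the standard birthday approximation can be
evaluated directly. This is a sketch under two assumptions that are not
stated above: that MD5 behaves like a uniformly random 128-bit function
on non-adversarial input, and that the full 128-bit digest is stored
rather than a truncated one. The function name is illustrative.

```python
import math

def collision_probability(n_items, hash_bits=128):
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    # expm1 keeps precision when the exponent is extremely close to zero.
    exponent = -n_items * (n_items - 1) / (2 * 2 ** hash_bits)
    return -math.expm1(exponent)

# One billion and ten billion distinct hashed values.
for n in (10**9, 10**10):
    print(n, collision_probability(n))
```

Under those assumptions, a billion hashed values give a chance
collision probability on the order of 1e-21.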