On Fri, Feb 29, 2008 at 2:50 PM, Steve Sanbeg <ssanbeg(a)ask.com> wrote:
How important are intersections for non-existent categories? Without
we could have something like (page_id int, cat_intersect bigint) or
(page_id int, cat1 int, cat2 int) to get two cat intersection without
collisions; and maybe even scale up by defining n-intersections
recursively, without collisions.
Maybe, except we don't have category IDs. If we did, there would
logically be no such thing as a nonexistent category: there would be
categories with no associated article pages, but they would still have
category IDs. You could be proposing that we use article IDs instead,
but currently categories do not need any article associated with them,
and I'm not sure it's valuable to change that.
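
For concreteness, here is a rough sketch of the (page_id, cat1, cat2) layout from the quoted proposal, assuming integer category IDs existed. It uses Python's sqlite3 module purely as a stand-in for the real database. The table name cat_pair, the helper names, and the choice to store each pair once with the smaller ID first are all my own assumptions for illustration, not anything in the current schema.

```python
import sqlite3
from itertools import combinations

def build_pair_table(conn, page_cats):
    """page_cats maps page_id -> iterable of (hypothetical) integer category IDs."""
    conn.execute(
        "CREATE TABLE cat_pair (page_id INTEGER, cat1 INTEGER, cat2 INTEGER, "
        "PRIMARY KEY (cat1, cat2, page_id))"
    )
    for page_id, cats in page_cats.items():
        # Store each unordered pair exactly once, smaller ID first,
        # so (20, 30) and (30, 20) map to the same row.
        for a, b in combinations(sorted(set(cats)), 2):
            conn.execute("INSERT INTO cat_pair VALUES (?, ?, ?)", (page_id, a, b))

def pages_in_both(conn, cat_a, cat_b):
    """Return the IDs of pages that are in both categories."""
    lo, hi = sorted((cat_a, cat_b))
    rows = conn.execute(
        "SELECT page_id FROM cat_pair WHERE cat1 = ? AND cat2 = ? ORDER BY page_id",
        (lo, hi),
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
build_pair_table(conn, {1: [10, 20, 30], 2: [20, 30], 3: [10, 30]})
result = pages_in_both(conn, 30, 20)  # pages 1 and 2 are in both 20 and 30
```

Note that a page in k categories needs k*(k-1)/2 rows under this layout, which is the main cost of precomputing pairs.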
On Fri, Feb 29, 2008 at 2:59 PM, Thomas Dalton <thomas.dalton(a)gmail.com> wrote:
How fast are ANDs in SELECT WHEREs? I would guess it's quicker to
search by hash than by 2 ints.
It makes no difference, even if category IDs existed (which they
should, and sooner or later will). It's a sub-millisecond query
either way. A B-tree index on (page, 32-bit cat1, 32-bit cat2) would
have exactly the same cardinality as a B-tree index on (page, 64-bit
hash), and values of the same length, so traversing them should take
the same time, I'd imagine. (But I don't know how the storage works
exactly for composite indexes, or anything about B-trees except the
most basic things.)
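
To illustrate the "values of the same length" point only: a small Python sketch showing that two 32-bit IDs, laid out big-endian, occupy the same eight bytes as one 64-bit value built from them. The function names are made up for the example, and this packs the IDs rather than hashing anything, so it says nothing about how a real hash of category names would behave or about how any particular database stores composite index entries.

```python
import struct

def pack_pair(cat1, cat2):
    """Combine two unsigned 32-bit IDs into one unsigned 64-bit key."""
    assert 0 <= cat1 < 2**32 and 0 <= cat2 < 2**32
    return (cat1 << 32) | cat2

def unpack_pair(key):
    """Recover the two 32-bit IDs from a packed 64-bit key."""
    return key >> 32, key & 0xFFFFFFFF

# Two 32-bit big-endian integers side by side...
pair_bytes = struct.pack(">II", 7, 42)
# ...versus a single 64-bit big-endian integer holding the packed key.
key_bytes = struct.pack(">Q", pack_pair(7, 42))

same_layout = pair_bytes == key_bytes
```

Because the packing is reversible, distinct pairs always give distinct keys, and comparing the packed keys orders entries the same way as comparing (cat1, cat2) lexicographically.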