On Fri, Feb 29, 2008 at 6:35 PM, Steve Sanbeg <ssanbeg(a)ask.com> wrote:
> That was my thinking; that categories without a page ID are probably
> typos, and anyway less useful for intersection; if not, the articles could
> be added. So using the IDs and some recursion could be simpler and more
> scalable than using a hash.
I don't really see that it's simpler or more scalable, particularly.
It does have moderately better locality of reference, although it's
still not great. The denormalization means that, on average, half of a
category's entries are scattered across the entire table (the rows where
the category is in the second position), and only the other half, where
it's first, will be clustered in the same pages. I don't know if there's a very good
reason to prefer either way.
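
To make the locality point concrete, here is a rough Python sketch. It
assumes (my reading of the scheme, not something spelled out above) that
each pair of a page's categories is stored once as a row
(lower ID, higher ID, page) and that rows are physically ordered by that
key. The names build_pair_rows and positions_for and the sample data are
made up for illustration.

```python
from itertools import combinations


def build_pair_rows(page_categories):
    """Return (cat_lo, cat_hi, page_id) rows, sorted the way a clustered
    index on (cat_lo, cat_hi, page_id) would order them (assumed layout)."""
    rows = []
    for page_id, cats in page_categories.items():
        # One row per unordered pair of categories on the page,
        # with the smaller category ID always in the first position.
        for lo, hi in combinations(sorted(set(cats)), 2):
            rows.append((lo, hi, page_id))
    rows.sort()
    return rows


def positions_for(rows, cat):
    """Row offsets where `cat` is in the first vs. the second position."""
    first = [i for i, (lo, _, _) in enumerate(rows) if lo == cat]
    second = [i for i, (_, hi, _) in enumerate(rows) if hi == cat]
    return first, second


# Hypothetical data: page ID -> category IDs on that page.
pages = {
    1: [10, 20, 30],
    2: [20, 40],
    3: [5, 20],
    4: [15, 20, 50],
}

rows = build_pair_rows(pages)
first, second = positions_for(rows, 20)
print("category 20 in first position, row offsets: ", first)
print("category 20 in second position, row offsets:", second)
```

The rows with category 20 in the first position come out as one
contiguous run, while the rows with it in the second position are
interleaved with other categories' rows, which is the scattering
described above.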
Domas, do you have any thoughts on this scheme?