On Fri, Feb 29, 2008 at 11:51 AM, Platonides <Platonides(a)gmail.com> wrote:
Magnus Manske wrote:
I just had the following thought: For a tag intersection system,
* we limit queries to two intersections (show all pages with categories A and B)
* we assume on average 5 categories per page (can someone check that?)
then we have 5*4=20 intersections per page.
Now, for each intersection, we calculate MD5("A|B") to get an integer
hash, and store that in a new table (page_id INTEGER, intersection_hash
INTEGER).
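
As a rough illustration of the scheme described above, the following Python sketch enumerates the ordered category pairs for one page and hashes each "A|B" key with MD5. The category names and the helper names (intersection_keys, md5_hex) are made up for the example and are not part of any existing code.

```python
import hashlib
from itertools import permutations


def intersection_keys(categories):
    """Return the "A|B" key for every ordered pair of distinct categories."""
    return [f"{a}|{b}" for a, b in permutations(categories, 2)]


def md5_hex(key):
    """Hexadecimal MD5 digest of a key string (UTF-8 encoded)."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Hypothetical page with 5 categories (names invented for illustration).
cats = ["Physics", "1905 births", "Nobel laureates", "Patent examiners", "Violinists"]
keys = intersection_keys(cats)

print(len(keys))              # 5*4 = 20 ordered pairs
print(len(md5_hex(keys[0])))  # 32 hexadecimal characters, i.e. 16 bytes
```

Counting ordered pairs gives the 5*4=20 rows per page. If the two names were sorted before hashing, "A|B" and "B|A" would share one key and a page with 5 categories would need only 10 rows.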
A MySQL integer is 32 *bits*, while an MD5 hash has 32 *characters* when
expressed in hexadecimal (16 *bytes*).
If you want a 4-byte hash, maybe you could use CRC32. What's its
collision resistance without the NUL char?
Or, we could use the first 32 bits of the 128-bit MD5 value... ;-)
Or the first 64 bits, to make false duplicates less likely. Does MySQL
have a 64-bit value? If so, how does that scale on 32-bit systems?
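
For what it's worth, MySQL's BIGINT type is 8 bytes wide, and INT UNSIGNED covers 0 to 2^32-1. A minimal Python sketch of the options discussed here (a truncated MD5 prefix versus CRC32) might look like the following. The function names are hypothetical, and the collision figure is only the usual birthday-style estimate under the assumption that hash values are uniformly distributed.

```python
import hashlib
import zlib


def md5_prefix(key, bits):
    """First `bits` bits of MD5(key) as an unsigned integer (bits must be a multiple of 8)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # 16 bytes
    return int.from_bytes(digest[: bits // 8], "big")


def crc32_hash(key):
    """CRC32 of the key as an unsigned 32-bit integer."""
    return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF


def expected_colliding_pairs(n, bits):
    """Birthday-style estimate: n*(n-1)/2 pairs, each colliding with probability 2**-bits."""
    return n * (n - 1) / 2 / 2 ** bits


h32 = md5_prefix("A|B", 32)  # fits an unsigned 32-bit column
h64 = md5_prefix("A|B", 64)  # needs an unsigned 64-bit column
c32 = crc32_hash("A|B")      # alternative 4-byte hash

# Hypothetical workload: one million distinct intersections.
print(expected_colliding_pairs(1_000_000, 32))
print(expected_colliding_pairs(1_000_000, 64))
```

Both values are unsigned, so the columns would have to be declared UNSIGNED (or the values shifted into the signed range) to store them without overflow. A collision only produces a false candidate, which could still be filtered out by checking the page's actual categories.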
Magnus