Earlier today I noticed that it was taking quite a long time to refill
the random page queue. It is currently refilled with:
    INSERT
    INTO random(ra_current,ra_title)
    SELECT 0,cur_title
    FROM cur
    WHERE cur_namespace=0 AND cur_is_redirect=0
    ORDER BY RAND()
    LIMIT 1000
Is it actually trying to reorder all 100,000+ entries randomly and then
taking the first 1000, or does it just _seem_ that slow? And is there
actually a faster way of doing this?
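The LIMIT probably doesn't save much here. `ORDER BY RAND()` gives every matching row its own random value, and the server can't know which 1000 values are smallest until it has computed all of them. Whatever sort strategy MySQL actually uses internally, the query still has to read every non-redirect article. The Python sketch below is only an analogy for that cost, and the function name `order_by_rand_limit` is made up for illustration. It keeps just the k best rows, as a LIMIT-aware sort might, yet it still generates one random key per row.

```python
import heapq
import random


def order_by_rand_limit(rows, k, rng=random.random):
    """Analogy for ORDER BY RAND() LIMIT k (hypothetical helper, not MySQL code).

    Every row gets a fresh random key, and only after all keys exist can the
    k smallest be chosen. Returns the picked rows and how many keys were made.
    """
    calls = 0

    def key(_row):
        nonlocal calls
        calls += 1
        return rng()

    # nsmallest only keeps a heap of size k, but it must still compute a key
    # for every input row before it knows which k keys are the smallest.
    picked = heapq.nsmallest(k, rows, key=key)
    return picked, calls


titles = [f"Article_{i}" for i in range(100_000)]
picked, keys_generated = order_by_rand_limit(titles, 1000)
print(len(picked), keys_generated)  # 1000 100000
```

That suggests it isn't just seeming slow: each refill pays for a full pass over the whole article set, however small the LIMIT is.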
Off the top of my head: what if the random queue listed *every* page;
each page would be queued when it was created (and removed if deleted or
changed to non-"article" state), and associated with a random index
number. When asked to view a random page, we sort on the random index
column (which would be indexed!) and take the lowest number; then assign
that same article a new random index.
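A minimal sketch of the proposed scheme follows. It uses Python's built-in `sqlite3` module as a stand-in for the real database. The table `random_page` and its columns `rp_title` and `rp_random` are hypothetical names chosen for illustration, not the existing `random` table. Likewise, `add_article`, `remove_article` and `pick_random_article` are hypothetical hooks for page creation, page deletion and the random-page request.

```python
import random
import sqlite3

# Hypothetical schema: one row per article, each with an indexed random key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE random_page (rp_title TEXT PRIMARY KEY, rp_random REAL NOT NULL)"
)
conn.execute("CREATE INDEX rp_random_idx ON random_page (rp_random)")


def add_article(title):
    """Hypothetical hook: queue a page when it is created or becomes an article."""
    conn.execute(
        "INSERT INTO random_page (rp_title, rp_random) VALUES (?, ?)",
        (title, random.random()),
    )


def remove_article(title):
    """Hypothetical hook: drop a page when it is deleted or stops being an article."""
    conn.execute("DELETE FROM random_page WHERE rp_title = ?", (title,))


def pick_random_article():
    """Take the row with the lowest random key, then give that row a new key."""
    row = conn.execute(
        "SELECT rp_title FROM random_page ORDER BY rp_random LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # no articles queued at all
    title = row[0]
    conn.execute(
        "UPDATE random_page SET rp_random = ? WHERE rp_title = ?",
        (random.random(), title),
    )
    return title


for t in ["Apple", "Banana", "Cherry"]:
    add_article(t)
remove_article("Banana")
print(pick_random_article())  # either Apple or Cherry
```

On a live server, the SELECT and UPDATE in `pick_random_article` would probably need to run in one transaction. Otherwise, two simultaneous requests could both grab the same lowest row. The sketch ignores concurrency entirely.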
This wouldn't require the occasional delay to refill the queue, and
since the random index would be indexed, retrieval should be quick
enough even with a large number of articles.
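One way to check that the indexed column keeps retrieval cheap is to ask the database for its query plan. This sketch again uses SQLite as a stand-in, and MySQL's `EXPLAIN` output looks different. It uses the same hypothetical `random_page` table. The two-column index is an assumption that lets the lookup be answered from the index alone. The sketch compares the indexed lookup with an `ORDER BY RANDOM()` query, which is SQLite's equivalent of `ORDER BY RAND()`. The exact plan wording varies between SQLite versions, so the code only looks for the index name and for a temporary B-tree, which is how SQLite reports an extra sort step.

```python
import sqlite3

# Hypothetical table, in SQLite rather than the site's MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE random_page (rp_title TEXT NOT NULL, rp_random REAL NOT NULL)"
)
# Covering index: the ORDER BY column first, the returned title second.
conn.execute("CREATE INDEX rp_random_idx ON random_page (rp_random, rp_title)")


def plan_details(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def summarize(sql):
    """Report (uses rp_random_idx, needs an extra sort pass) for a query."""
    details = plan_details(sql)
    uses_index = any("rp_random_idx" in d for d in details)
    # SQLite reports an extra sort pass as "USE TEMP B-TREE FOR ORDER BY".
    needs_sort = any("TEMP B-TREE" in d for d in details)
    return uses_index, needs_sort


indexed = summarize("SELECT rp_title FROM random_page ORDER BY rp_random LIMIT 1")
shuffled = summarize("SELECT rp_title FROM random_page ORDER BY RANDOM() LIMIT 1000")
print("indexed lookup (uses index, extra sort):", indexed)
print("ORDER BY RANDOM() (uses index, extra sort):", shuffled)
```

The indexed lookup should walk the index and stop at the first entry. The `ORDER BY RANDOM()` version needs a separate sort over every row, however small its LIMIT is.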
Thoughts?
-- brion vibber (brion @ pobox.com)