Robert Dodier wrote:
Hello,
If I'm not mistaken, the random page scheme makes some pages
much more likely to be retrieved than others. If I understand
correctly, each page is assigned a random number (cur_random)
when it's created. To retrieve a random page, a random number x
is generated, and the page whose cur_random is the smallest
value greater than x is returned.
The probability of retrieving a page is proportional to the
gap between its cur_random and the next lower cur_random.
If cur_random is generated uniformly, the gaps follow an
approximately exponential distribution. This implies some
gaps will be much larger than others. I guess we could
directly test this by dumping out cur_random and computing
the gaps.
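That test can be sketched as a quick simulation (the page count and
seed here are arbitrary, not taken from any real wiki):

```python
import random

random.seed(42)

N = 100_000  # hypothetical page count
cur_random = sorted(random.random() for _ in range(N))

# Gap between each page's cur_random and the next lower one;
# the lowest page's gap is measured from 0.
gaps = [cur_random[0]] + [b - a for a, b in zip(cur_random, cur_random[1:])]

mean_gap = sum(gaps) / len(gaps)  # close to 1/N
max_gap = max(gaps)               # roughly ln(N)/N for uniform draws
print(f"mean gap: {mean_gap:.2e}")
print(f"max gap:  {max_gap:.2e}")
```

Since the selection probability is proportional to the gap, the
luckiest page comes up roughly ln(N) times more often than a typical
one -- about an order of magnitude at this page count.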
If you like that one, you might be interested in a bug we had with
Special:Randompage back in 2002 and early 2003. Pages were selected by
taking the next article with a cur_random value greater than some
randomly selected value on [0,1). Then, the cur_random value was
updated, set to another random value. If this new random value was close
to zero, the chance of the page being selected again was small. Hence,
the net effect was to cause the cur_random of all articles to drift
towards zero.
Eventually, there were only a few hundred articles out of 100,000 which
had a cur_random value greater than 0.01.
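A rough simulation of that mechanism shows the drift. The numbers here
are scaled down from Wikipedia's, and the thread doesn't say what
happened when x exceeded every cur_random, so this sketch simply skips
that request:

```python
import bisect
import random

random.seed(1)

N = 200          # pages; far fewer than the real 100,000, to keep the demo fast
STEPS = 500_000  # simulated Special:Randompage requests

vals = sorted(random.random() for _ in range(N))

for _ in range(STEPS):
    x = random.random()
    i = bisect.bisect_right(vals, x)  # smallest cur_random greater than x
    if i == len(vals):
        continue  # no page above x; original behaviour unknown, so skip
    # Select that page, then give it a fresh random value --
    # the resetting step that causes the downward drift.
    vals.pop(i)
    bisect.insort(vals, random.random())

mean_val = sum(vals) / N
frac_high = sum(v > 0.01 for v in vals) / N
print(f"mean cur_random after churn: {mean_val:.3f}")
print(f"fraction with cur_random > 0.01: {frac_high:.3f}")
```

Values near zero are almost never selected (their gap is at most their
own cur_random), so they pile up there, while the remaining pages churn
through uniform reassignments -- the concentration Tim describes.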
I plotted the distribution of cur_random at the time, and it seemed to
make a nice curve. I've been interested since then in whether there is
an analytic form for that distribution. I suspect it tends to a delta
function as t->infinity, but is there some normalised distribution which
describes the shape?
-- Tim Starling