Robert Dodier wrote:
Hello,
If I'm not mistaken, the random page scheme makes some pages
much more likely to be retrieved than others. If I understand
correctly, each page is assigned a random number (cur_random)
when it's created. To retrieve a random page, a random number x
is generated, and the page whose cur_random is the smallest
value greater than x is returned.
The probability of retrieving a page is proportional to the
gap between its cur_random and the next lower cur_random.
If cur_random is generated uniformly, the gaps follow an
approximately exponential distribution. This implies some
gaps will be much larger than others. I guess we could
directly test this by dumping out cur_random and computing
the gaps.
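That test can be sketched as a quick simulation (the page count and
seed here are arbitrary, not taken from any real wiki):

```python
import random

random.seed(42)

N = 100_000  # hypothetical page count
cur_random = sorted(random.random() for _ in range(N))

# Gap between each page's cur_random and the next lower one;
# the lowest page's gap is measured from 0.
gaps = [cur_random[0]] + [b - a for a, b in zip(cur_random, cur_random[1:])]

mean_gap = sum(gaps) / len(gaps)  # close to 1/N
max_gap = max(gaps)               # roughly ln(N)/N for uniform draws
print(f"mean gap: {mean_gap:.2e}")
print(f"max gap:  {max_gap:.2e}")
```

Since the selection probability is proportional to the gap, the
luckiest page comes up roughly ln(N) times more often than a typical
one -- about an order of magnitude at this page count.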
If you like that one, you might be interested in a bug we had with
Special:Randompage back in 2002 and early 2003. Pages were selected by
taking the next article with a cur_random value greater than some
randomly selected value on [0,1). Then, the cur_random value was
updated, set to another random value. If this new random value was close
to zero, the chance of the page being selected again was small. Hence,
the net effect was to cause the cur_random of all articles to drift
towards zero.
Eventually, there were only a few hundred articles out of 100,000 which
had a cur_random value greater than 0.01.
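A rough simulation of that mechanism shows the drift. The numbers here
are scaled down from Wikipedia's, and the thread doesn't say what
happened when x exceeded every cur_random, so this sketch simply skips
that request:

```python
import bisect
import random

random.seed(1)

N = 200          # pages; far fewer than the real 100,000, to keep the demo fast
STEPS = 500_000  # simulated Special:Randompage requests

vals = sorted(random.random() for _ in range(N))

for _ in range(STEPS):
    x = random.random()
    i = bisect.bisect_right(vals, x)  # smallest cur_random greater than x
    if i == len(vals):
        continue  # no page above x; original behaviour unknown, so skip
    # Select that page, then give it a fresh random value --
    # the resetting step that causes the downward drift.
    vals.pop(i)
    bisect.insort(vals, random.random())

mean_val = sum(vals) / N
frac_high = sum(v > 0.01 for v in vals) / N
print(f"mean cur_random after churn: {mean_val:.3f}")
print(f"fraction with cur_random > 0.01: {frac_high:.3f}")
```

Values near zero are almost never selected (their gap is at most their
own cur_random), so they pile up there, while the remaining pages churn
through uniform reassignments -- the concentration Tim describes.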
I plotted the distribution of cur_random at the time, and it seemed to
make a nice curve. I've been interested since then in whether there is
an analytic form for that distribution. I suspect it tends to a delta
function as t->infinity, but is there some normalised distribution which
describes the shape?
-- Tim Starling