[Foundation-l] New wikistats

epzachte at chello.nl
Wed Aug 31 15:32:02 UTC 2005


Cormac wrote: 
> For July 13th, under the columns C and D, I have the numbers 6,884 and
> 798 respectively. Below this then are two much higher figures (11,285
> and 1,629) - are these the maximum estimates and the above published
> figure the conservative estimate, or how does it work exactly?

Cormac, the July 13th figures are actual counts.
The much higher figures below them are forecasts for the complete month, hence the +/- sign.

These forecasts are based on the previous three months, comparing the number of wikipedians that fulfilled the criteria on day x with the number that did so by the end of the month.

The resulting multiplication factors for all wikipedias are combined into a weighted average, to minimize the distorting effect of an activity peak in a single wikipedia in previous months.
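A minimal sketch of how such a forecast could be computed, under stated assumptions. The post does not say what the weights are; here each factor (end-of-month count divided by day-x count) is weighted by its day-x count, which is equivalent to dividing the pooled end-of-month total by the pooled day-x total. The names MonthHistory, multiplication_factor and forecast_month are hypothetical, not wikistats code. The caller is assumed to pass one MonthHistory per wikipedia per previous month (three months, per the post), all taken at the same day x as the current run. The day-6 cutoff mentioned later in the post is included as a guard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MonthHistory:
    """Hypothetical record: one wikipedia in one of the previous months."""
    count_on_day_x: int      # editors meeting the criterion by day x
    count_at_month_end: int  # editors meeting it by the last day of the month


def multiplication_factor(histories: List[MonthHistory]) -> float:
    """Weighted average of (month-end / day-x) factors over all wikis and months.

    ASSUMPTION: weights are the day-x counts, so a spike in one small wiki
    shifts the combined factor only in proportion to its size.
    """
    weighted_sum = 0.0
    total_weight = 0
    for h in histories:
        if h.count_on_day_x == 0:
            continue  # factor is undefined for this wiki/month; skip it
        factor = h.count_at_month_end / h.count_on_day_x
        weighted_sum += h.count_on_day_x * factor
        total_weight += h.count_on_day_x
    if total_weight == 0:
        raise ValueError("no usable history to derive a factor from")
    return weighted_sum / total_weight


def forecast_month(actual_count: int, day: int,
                   histories: List[MonthHistory]) -> Optional[float]:
    """Full-month forecast, or None when run on the 6th or earlier."""
    if day <= 6:
        return None  # margin of error would be too high this early
    return actual_count * multiplication_factor(histories)
```

For example, forecast_month(100, 13, [MonthHistory(600, 1000), MonthHistory(50, 100)]) scales 100 by 1100/650, the pooled ratio of the two histories.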

This is better than simply multiplying the actual counts for the 13th by 31/13 to arrive at full-month forecasts.

This matters especially for columns C and D, as the number of wikipedians that meet the threshold for C or D grows highly nonlinearly over a month.
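To make the comparison concrete, the snippet below applies the naive 31/13 scaling to the July 13th counts quoted above and prints it next to the published forecasts. It only reproduces arithmetic from the thread; naive_forecast is a hypothetical helper, and the output says nothing about which estimate turns out to be closer.

```python
# Figures quoted in the thread for July 13th (columns C and D).
DAY, DAYS_IN_MONTH = 13, 31
actual = {"C": 6884, "D": 798}
forecast = {"C": 11285, "D": 1629}  # wikistats full-month forecasts


def naive_forecast(count: int, day: int, days_in_month: int) -> float:
    """Linear extrapolation: assumes qualifying editors accrue evenly over the month."""
    return count * days_in_month / day


for col in ("C", "D"):
    naive = naive_forecast(actual[col], DAY, DAYS_IN_MONTH)
    print(f"{col}: naive {naive:,.0f} vs. forecast {forecast[col]:,}")
# C: naive 16,416 vs. forecast 11,285
# D: naive 1,903 vs. forecast 1,629
```

For both columns, linear scaling gives a noticeably higher figure than the factor-based forecast.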

This is also why for C: 6884/11285 > 0.5, and for D: 798/1629 < 0.5.
In words: more than half of the wikipedians that will count as active at the end of the month had already made the grade by the 13th, whereas most wikipedians who eventually qualify as very active only do so later in the month.
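The two ratios can be checked directly. This sketch computes the share of each full-month forecast already reached by July 13th; share_reached is a hypothetical helper name.

```python
def share_reached(actual_count: int, month_forecast: int) -> float:
    """Fraction of the forecast full-month count already reached."""
    return actual_count / month_forecast


for col, actual_count, month_forecast in (("C", 6884, 11285), ("D", 798, 1629)):
    share = share_reached(actual_count, month_forecast)
    side = "more" if share > 0.5 else "less"
    print(f"{col}: {share:.2%} reached by July 13th ({side} than half)")
# C: 61.00% reached by July 13th (more than half)
# D: 48.99% reached by July 13th (less than half)
```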

When wikistats is run on the 6th of the month or earlier, no forecasts are given, as the margin of error would be too high.

Erik Zachte



