[Foundation-l] Tragical dynamics: that run for the number of articles

Stephen Bain stephen.bain at gmail.com
Sun Jun 29 08:27:50 UTC 2008


On Sun, Jun 29, 2008 at 10:03 AM, Lars Aronsson <lars at aronsson.se> wrote:
>
> I'd like to propose a quality metric: The difference in rank
> between the article count and the compressed database size.

I think this is a good metric, especially because it's a relative
metric (since it's effectively comparing projects against their peers
to see how mature they are).
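
For anyone who wants to try it on other tables, the metric as Lars
describes it could be sketched roughly like this in Python. The
function name and the sample figures are made up for illustration (they
are not real project statistics), and ties are not handled:

```python
def rank_difference(article_counts, dump_sizes):
    """Return {project: rank by article count - rank by compressed size}.

    Rank 1 is the largest. A negative result means the project ranks
    lower by database size than by article count.
    """
    def ranks(values):
        # Sort project names from largest to smallest value.
        ordered = sorted(values, key=values.get, reverse=True)
        return {name: position + 1 for position, name in enumerate(ordered)}

    by_count = ranks(article_counts)
    by_size = ranks(dump_sizes)
    return {name: by_count[name] - by_size[name] for name in article_counts}


# Hypothetical projects and numbers, for illustration only.
article_counts = {"aa": 900, "bb": 800, "cc": 700, "dd": 600}
dump_sizes = {"aa": 50, "bb": 20, "cc": 40, "dd": 30}

scores = rank_difference(article_counts, dump_sizes)
for name, score in scores.items():
    print(name, score)
```

Because both rankings cover the same set of projects, the scores always
sum to zero, which is another way of seeing that the metric is relative.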

Someone earlier was discussing article sizes, so I hacked up a script
to graph the distribution of article sizes:

http://www.toolserver.org/~thebainer/articlesizes/

Most graphs share the same basic shape, with a roughly logarithmic
distribution once you get past the initial peak (see the English
Wikipedia graph for an example of what I mean). Some are different,
though, and the differences tend to coincide with what has already
been observed.
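
To give an idea of the kind of counting behind such a graph, here is a
minimal sketch in Python. It is not the script that produced the graphs
above; the function names, the 500-byte bucket width and the sample
sizes are all assumptions made for illustration:

```python
from collections import Counter


def size_histogram(sizes, bucket=500):
    """Map each bucket's starting size in bytes to its article count."""
    counts = Counter((size // bucket) * bucket for size in sizes)
    return dict(sorted(counts.items()))


def mean_size(sizes):
    """Average article size in bytes."""
    return sum(sizes) / len(sizes)


# Hypothetical article sizes in bytes, not taken from any real project.
sizes = [120, 480, 950, 1020, 1400, 2100, 2300, 2450, 7800]

hist = size_histogram(sizes)
for start, count in hist.items():
    # One '#' per article gives a crude text version of the graph.
    print(f"{start:>6} {'#' * count}")
print("mean:", round(mean_size(sizes)))
```

A second peak, like the one a large batch of similar-sized bot-created
stubs would produce, shows up as a bucket well above its neighbours.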

> The Swedish Wikipedia was (when this table was compiled) the 10th
> biggest by article count, but the 12th biggest by compressed
> database size, so its quality is 10 - 12 = -2.

Swedish Wikipedia's distribution has almost exactly the same shape as
English Wikipedia's. The difference is that its average article size,
at around 1900 bytes, is less than half of En's.

> The Russian Wikipedia was the 11th by article count, but 9th by
> compressed database size, so its quality is +2. This doesn't mean
> the Russian Wikipedia is better than the English one, only that it
> is better than (two of) its peers of similar size.

Not only does the Russian Wikipedia have a high average article size
(about 5500 bytes, compared with, for example, English Wikipedia at
around 4100 bytes) but its graph, which has multiple peaks, seems to
show that, unlike many other projects, it has more mature, medium-size
articles than it does stubs.

> The Volapük Wikipedia was the 15th by article count, but worse
> than the 30th by compressed database size (the table is
> incomplete), so its quality is worse than -15.

The Volapük Wikipedia has an unusual distribution, with two peaks. One
is in the usual place, just below the average size (which is low, at
just over 1000 bytes), while the other is at around 2 to 2.5 KB, which
corresponds to the size of all the geography stubs created by
SmeiraBot.

-- 
Stephen Bain
stephen.bain at gmail.com


