[Wikipedia-l] Re: Death to the comma count!

Daniel Mayer maveric149 at yahoo.com
Mon Mar 10 23:06:41 UTC 2003


On Sunday 09 March 2003 12:01 pm, Brion Vibber wrote:
> Aha, again demonstrating the obsession over the count. Why was it
> important to hit or not hit 100,000? Because of an offhand remark made a
> couple years ago about "we hope to reach 100,000 articles"?
>
> When did this become our holy mission?

Round numbers, especially large ones, are milestones that get people's 
attention. That is why x.0 is so important in the software world, why cities 
celebrate the day they reach 1,000,000 inhabitants, why there was so much 
mania when our calendars hit the year 2000, why the first billion-dollar 
business and billionaires are mentioned in history books, and why we got a lot 
of media attention after en.wiki hit the 100,000 count. 

The article count is also a measure (however crude) of our progress. So there 
is nothing wrong with trying to improve that measure and make it more 
conservative where it makes sense (Jimbo has already stated he wanted a more 
conservative count. However, right after he said that, we had already hit the 
100,000 mark and were being slashdotted). 

> Did the messianic age begin when the counter flipped into six digits?
> Have we all been betrayed by a sinister being who wants to make us look
> bad by leading us astray and "inflating our count"?
>
> What the *heck* does it matter?

Boy, are you in a really bad mood today. See above.

> Bad to whom? Embarrassing to whom? Is it solely the use of the word
> "article" that throws us off? Are we obsessed with proving that our
> "articles" are so fricking wonderful that every single one of them must
> be the greatest pinnacle of writing prowess or we must lock it in the
> basement of shame and never admit its existence?

No - a simple automatic measure is all that is needed. We mention the 
definition of the count on en.wiki's [[Wikipedia:What is an article]] page. 

> Go open up a paper encyclopedia sometime. Look at it. A fair chunk of
> the articles are *one paragraph long*. Do their editors worry themselves
> over the metric they use to stamp "over 60,000 articles!" on the cover?
> Or do they just count the number of entries at some point and say "at
> least this many"?

Exactly - and how many bytes would a smallish complete paragraph be in such an 
encyclopedia? Around 500 bytes. Then we could say that we *at least* have x 
number of articles. Right now the count includes many entries that do not 
consist of even one complete paragraph. A {{HEADLINEARTICLECOUNT}} with its 
threshold set per language would be flexible enough for both large and small 
wikis. {{NUMBEROFARTICLES}} would be used for comparison purposes.

> Mav, thanks for proving my point again about count-mania. Are you
> seriously suggesting that the pseudo-random number spit out on the front
> page actually *defines* what articles are in a meaningful way?

Again, more unnecessary anger. Please calm down - we are not talking about 
anything of such cosmic importance as to warrant such feelings. :-) 

The answer to your question is above (the part talking about tracking our 
progress and how the outside world sees our progress). So, yes it is 
important to have a conservative estimate of the number of articles we have. 
That's not to say that everything a computer would recognize as an article is 
actually what a human would consider to be one. But since the computer will 
also miss entries that /could/ be considered articles, then everything 
averages out in the end (some really obscure subjects can, in fact, be 
covered in a sub-500 byte entry). 

In short, I'm not asking for an AI article count - I just would like to see a 
more conservative crude method used on en.wiki that excludes more entries 
that are probably not articles (however we shouldn't go live with such a 
count until after we have enough entries to still be above 100,000 - otherwise 
we could get some negative media attention and a drop in morale). 

IMO the best way to do that is to have a per-wiki 
{{HEADLINEARTICLECOUNT}} in addition to {{NUMBEROFARTICLES}}. It would be up 
to each language to define its own byte threshold for its headline count 
(or it could choose to ignore {{HEADLINEARTICLECOUNT}} and use the much less 
conservative {{NUMBEROFARTICLES}}). Of course, each wiki that uses 
{{HEADLINEARTICLECOUNT}} would then have to publicly document the threshold 
for its headline count. 
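A minimal Python sketch of the two counts, for illustration only. It assumes the loose {{NUMBEROFARTICLES}} counts non-redirect pages in the main namespace that contain at least one comma (the "comma count" in the subject line), and adds a per-wiki byte threshold (500 bytes by default, the rough size of one complete paragraph suggested above) for the headline count. The Page record and both function names are hypothetical, not actual MediaWiki code.

```python
from dataclasses import dataclass

ARTICLE_NAMESPACE = 0  # assumption: 0 is the main (article) namespace


@dataclass
class Page:
    # Hypothetical page record, not a MediaWiki data structure.
    title: str
    namespace: int
    is_redirect: bool
    text: str


def _passes_comma_test(page: Page) -> bool:
    """Assumed 'comma count' rule: main namespace, not a redirect, has a comma."""
    return (page.namespace == ARTICLE_NAMESPACE
            and not page.is_redirect
            and "," in page.text)


def number_of_articles(pages: list[Page]) -> int:
    """Loose count, used for comparison between wikis."""
    return sum(1 for p in pages if _passes_comma_test(p))


def headline_article_count(pages: list[Page], min_bytes: int = 500) -> int:
    """Conservative count: pages passing the comma test that are also at least
    min_bytes long (measured as UTF-8 bytes). min_bytes is the per-wiki knob."""
    return sum(1 for p in pages
               if _passes_comma_test(p)
               and len(p.text.encode("utf-8")) >= min_bytes)
```

Because the headline count only adds a condition on top of the loose test, it can never exceed {{NUMBEROFARTICLES}}, which is what makes it the "at least this many" figure. Measuring bytes rather than characters is also an assumption; on wikis using non-Latin scripts the same threshold corresponds to less text, which is one more reason to let each language pick its own value.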

-- Daniel Mayer (aka mav)

WikiKarma
The usual at [[March 8]] (I'm fresh out of WikiKarma so I need to work on 
creating some more balance in the Universe before I respond to your 
response).
