On Sat, Mar 22, 2008 at 5:56 AM, Lars Aronsson <lars(a)aronsson.se> wrote:
According to [[sv:Special:Statistics]] there are 58,087 user
accounts, but <contributor><username> has 28,416 distinct values.
Is it realistic that half of all registered usernames have never
contributed a single edit (to non-deleted pages)?
Yes, this is very common on websites. People sign up and then never
use the account for some reason. Half is a figure I'd expect. On
enwiki,
mysql> SELECT COUNT(*) FROM user WHERE user_editcount=0;
+----------+
| COUNT(*) |
+----------+
| 4424031 |
+----------+
1 row in set (6 min 14.05 sec)
versus
mysql> SELECT ss_users FROM site_stats;
+----------+
| ss_users |
+----------+
| 6721545 |
+----------+
1 row in set (0.11 sec)
an even worse ratio: about 66% of enwiki accounts have an edit count
of zero. I only just noticed that you actually used svwiki, so here
are the same queries for that.
mysql> SELECT COUNT(*) FROM user WHERE user_editcount=0;
+----------+
| COUNT(*) |
+----------+
| 26838 |
+----------+
1 row in set (3.60 sec)
mysql> SELECT ss_users FROM site_stats;
+----------+
| ss_users |
+----------+
| 58125 |
+----------+
1 row in set (0.01 sec)
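To make the ratios explicit, here is a small sketch that divides the zero-edit counts by the total account counts from the query results above. The function name and the dictionary layout are my own, chosen only for illustration; the numbers are copied from the output shown.

```python
def zero_edit_share(zero_edit_accounts, total_accounts):
    """Fraction of registered accounts whose edit count is zero."""
    return zero_edit_accounts / total_accounts

# (accounts with user_editcount=0, ss_users), copied from the queries above
counts = {
    "enwiki": (4424031, 6721545),
    "svwiki": (26838, 58125),
}

for wiki, (zero, total) in counts.items():
    print(f"{wiki}: {zero_edit_share(zero, total):.1%} of accounts have no edits")
# enwiki: 65.8% of accounts have no edits
# svwiki: 46.2% of accounts have no edits
```

The svwiki figure is in the same range as the dump-based estimate in the question (58,087 accounts against 28,416 distinct contributor names, so roughly 51% without a surviving edit), though the two are not measuring quite the same thing.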
Can we find out what happened to them? Did they write spam that was
deleted and the username permanently blocked? Did they just register
their name to stop others from doing so? Or did something go wrong
during the registration?
I expect most weren't really sure what they were doing, and thought
they'd edit, only to find out they couldn't or didn't want to; or they
registered in case they wanted to edit later, but then forgot the
account password; or something in that vein. Some percentage will
have been blocked for WP:USERNAME violations, of course, but I don't
think it's going to be very high, since I've seen identical things on
many Internet forums. In those cases you basically never have people
patrolling new usernames (and for objectionable names, a forced name
change is more common than a block), or any very high level of
spammers. On forums you might have a failed e-mail confirmation, but
that's not going to matter on Wikimedia. When registering, you get
immediately logged in, right? So typing a password and then
forgetting it five minutes later isn't going to be a problem?
Of those who did contribute something, of course most usernames
only made very few contributions. This is a long tail. So how do
we separate the regular/serious/active contributors from the
occasional ones? In [[m:board elections]] to the WMF, a limit of
400 edits is used, and this threshold is as good as any.
That's okay for established contributors. A probably more interesting
general-purpose statistic is the number of currently active
contributors, namely the number who have made edits in the past week,
two weeks, month, or whatever.
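As a rough sketch of that statistic, the following counts distinct users with at least one edit inside a given window. The `(user, timestamp)` input format, the function name, and the sample data are all made up for illustration; this is not how MediaWiki stores or computes anything.

```python
from datetime import datetime, timedelta

def active_contributors(edits, now, days):
    """Number of distinct users with at least one edit in the last `days` days.

    `edits` is an iterable of (user, timestamp) pairs (hypothetical format).
    """
    cutoff = now - timedelta(days=days)
    return len({user for user, ts in edits if ts > cutoff})

# Made-up sample data, only to show the call.
now = datetime(2008, 3, 22)
sample_edits = [
    ("Alice", datetime(2008, 3, 21)),
    ("Alice", datetime(2008, 3, 20)),
    ("Bob", datetime(2008, 3, 10)),
    ("Carol", datetime(2008, 1, 1)),
]
for days in (7, 14, 30):
    print(days, active_contributors(sample_edits, now, days))
```

This recomputes from scratch each time, which is fine for a sketch but is exactly the cost question that comes up for a large wiki.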
On Sat, Mar 22, 2008 at 6:52 PM, Alex <mrzmanwiki(a)gmail.com> wrote:
Some sort of statistic that gives the number of active accounts would
be ideal, say any account that has made an edit in the past week. Not sure
how computationally expensive that would be though. For a large site
like enwiki, it would probably have to be cached and updated on a
regular basis.
Caching it is somewhat tricky, since you have to be able to decrement
it when any revision hits the one-week mark, *but* only if no
intervening edit was made by the same user. That makes maintenance in
O(dN/dT) time (with retrieval in O(1) time) not quite so simple as
with most counters. Scanning a bunch of recentchanges rows every hour
or every day and caching that might be okay, although it's not quite
as nice as most counters (needs to be recomputed, can't be updated in
real time).
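For what it's worth, the incremental version can be sketched in memory roughly as follows. It assumes edits arrive in timestamp order and that you can afford to keep every edit in the window plus each user's latest edit time; the class and method names are invented for this example, and a real implementation would have to live in the database rather than in a Python object.

```python
from collections import deque

WEEK = 7 * 24 * 3600  # window length in seconds

class ActiveUserCounter:
    """Sliding-window count of users with at least one edit in the window."""

    def __init__(self, window=WEEK):
        self.window = window
        self.events = deque()  # (timestamp, user), oldest first
        self.last_edit = {}    # user -> timestamp of that user's latest edit

    def expire(self, now):
        """Drop edits that have aged out of the window."""
        cutoff = now - self.window
        while self.events and self.events[0][0] <= cutoff:
            ts, user = self.events.popleft()
            # Decrement only if the user made no later edit: if they did,
            # last_edit[user] holds a newer timestamp and the user stays.
            if self.last_edit.get(user) == ts:
                del self.last_edit[user]

    def record_edit(self, user, ts):
        self.expire(ts)
        self.events.append((ts, user))
        self.last_edit[user] = ts

    def count(self, now):
        self.expire(now)
        return len(self.last_edit)

# Tiny demonstration with a 10-second window.
c = ActiveUserCounter(window=10)
c.record_edit("a", 0)
c.record_edit("b", 3)
c.record_edit("a", 8)
print(c.count(10))  # 2: a's first edit expired, but a edited again at 8
print(c.count(13))  # 1: b's only edit has expired
```

Each edit is appended once and popped once, so the maintenance cost is proportional to the edit rate, and reading the count is a dictionary length lookup. The extra per-user state is the price of handling the intervening-edit case.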