Wikipedia isn't special in that people's participation is a long tail (few
people do quite a bit and many people do almost nothing). Basically any
community (online or offline) has this dynamic[1]. If we say, "# accounts
have registered and saved an edit", we're not inflating and the number
isn't "fuzzy" in meaning.
Let's say that we set the threshold for inclusion higher -- e.g. to 5+
edits per month. The funny thing about power laws (this variant of long
tail) is that they are self-similar. Mathematically, the distribution
has the same shape (up to rescaling) when you cut off everyone who made
fewer than 5 edits. There's no clear "truth" to setting the threshold
higher and we've shown that it doesn't affect the overall trends that we
observe.
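
A minimal sketch of the self-similarity point, assuming edit counts follow a continuous Pareto distribution with a hypothetical index of 1.0. Real edit counts are discrete and the actual best-fit index is not given in this thread. The samples are synthetic, not Wikipedia data, and the function names are made up for illustration. It compares the tail of everyone (threshold 1) with the tail of those who pass a threshold of 5:

```python
import random

ALPHA = 1.0  # hypothetical Pareto index, chosen only for illustration


def tail_share(samples, threshold, factor):
    """Of the samples >= threshold, return the share that are >= factor * threshold."""
    kept = [x for x in samples if x >= threshold]
    return sum(1 for x in kept if x >= factor * threshold) / len(kept)


def simulate(n=200_000, seed=0):
    """Draw n synthetic 'edit counts' from a Pareto distribution with minimum 1."""
    rng = random.Random(seed)
    return [rng.paretovariate(ALPHA) for _ in range(n)]


if __name__ == "__main__":
    samples = simulate()
    for threshold in (1, 5):
        share = tail_share(samples, threshold, factor=4)
        print(f"threshold {threshold}: {share:.3f} of remaining accounts reach {4 * threshold}+")
```

Under these assumptions both shares should land near 4^-ALPHA = 0.25: the accounts that survive the higher cutoff are distributed, relative to that cutoff, the same way the full population is distributed relative to 1.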
If we want to critique how we communicate about something, we can't do it
in such general terms as "use 5+ edits". We need to know what meaning is
intended to be expressed. Only within the context of "meaning" can we talk
about "deception" and "misunderstanding". As an empiricist, I'd like to
challenge the speculation about the low competencies of our audience.
So, if we're going to communicate how people contribute to Wikipedia and
not "mislead", we're going to need to give people a primer on power laws of
participation and discuss the implications of the best-fit Pareto index
<https://en.wikipedia.org/wiki/Pareto_index> for Wikipedia edits. I don't
think that such a discussion of the limitations of simple metrics is
tractable in such communications. Further, I don't think our audience
wants it. Sometimes you just want a quick stat to get a sense of the
scale. How many countries are there on Earth? 196. I'm sure some of those
countries are much larger than others. Some are really just cities with
interesting political situations. I still enjoy knowing that the answer is
196 and I don't feel like I have been misled.
1.
-Aaron
On Tue, Oct 27, 2015 at 11:03 AM, Erik Zachte <ezachte(a)wikimedia.org> wrote:
I do agree that we reject good contributions. I also agree this is a messy
filter.
The main point, however, is: do we want to communicate to the general
public using such messy, fuzzy, (partially) inflated, easily misunderstood
numbers?
We have a history of using vanity metrics (800+ wikis, 280+ Wikipedias).
Not untrue in some very formal sense, but totally misleading in that they
play on expectations which are totally false.
Erik
*From:* Analytics [mailto:analytics-bounces@lists.wikimedia.org] *On
Behalf Of *Aaron Halfaker
*Sent:* Tuesday, October 27, 2015 16:41
*To:* A mailing list for the Analytics Team at WMF and everybody who has
an interest in Wikipedia and analytics.
*Subject:* Re: [Analytics] [Spam] Re: User statistics for video marking
ENWP 5m article milestone
I don't agree. There are a lot of good-faith page creations that get
deleted every day. There are also many edits that get reverted. Arguably,
those edits aren't productive either, but they don't disappear from the
dumps like article drafts do. This is a messy filter at best.
On Tue, Oct 27, 2015 at 10:28 AM, Erik Zachte <ezachte(a)wikimedia.org>
wrote:
As Aaron says. I'd like to add that if almost 3 million accounts
disappeared from the dumps altogether (vandals? school kids?) that makes
the case for not using such a count even more convincing.
Erik
*From:* Analytics [mailto:analytics-bounces@lists.wikimedia.org] *On
Behalf Of *Aaron Halfaker
*Sent:* Tuesday, October 27, 2015 15:48
*To:* A mailing list for the Analytics Team at WMF and everybody who has
an interest in Wikipedia and analytics.
*Subject:* Re: [Analytics] [Spam] Re: User statistics for video marking
ENWP 5m article milestone
user_editcount includes edits to deleted pages and revdeleted edits.
Erik's perl scripts use the XML dumps that do not include edits to deleted
pages.
Strictly speaking, user_editcount is a better proxy for the number of
people who have "ever edited". Erik's is the number of people whose edits
appear in the history of a page at the time of an XML dump.
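
To make the difference between the two counts concrete, here is a toy sketch using an in-memory SQLite database. The `toy_user` and `toy_revision` tables are hypothetical, heavily simplified stand-ins (not the real MediaWiki schema, and not Erik's scripts or Andrew's Quarry query); only the `user_editcount` column name is taken from the discussion above.

```python
import sqlite3


def build_toy_db():
    """Create a tiny, made-up dataset with three accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE toy_user (user_id INTEGER PRIMARY KEY, user_editcount INTEGER)")
    conn.execute("CREATE TABLE toy_revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER)")
    # Account 1: two edits, both still in page history.
    # Account 2: one edit, to a page that was later deleted (no surviving revision).
    # Account 3: registered but never edited.
    conn.executemany("INSERT INTO toy_user VALUES (?, ?)", [(1, 2), (2, 1), (3, 0)])
    conn.executemany("INSERT INTO toy_revision VALUES (?, ?)", [(10, 1), (11, 1)])
    return conn


def count_editors(conn):
    """Return (accounts with user_editcount > 0, accounts with a surviving revision)."""
    by_editcount = conn.execute(
        "SELECT COUNT(*) FROM toy_user WHERE user_editcount > 0"
    ).fetchone()[0]
    by_history = conn.execute(
        "SELECT COUNT(DISTINCT rev_user) FROM toy_revision"
    ).fetchone()[0]
    return by_editcount, by_history


if __name__ == "__main__":
    ever_edited, in_history = count_editors(build_toy_db())
    print(f"ever edited: {ever_edited}, visible in page history: {in_history}")
```

In this toy data the edit-count approach finds two accounts and the history-based approach finds one, because account 2's only edit no longer appears in any page history.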
-Aaron
On Tue, Oct 27, 2015 at 9:34 AM, Jonathan Morgan <jmorgan(a)wikimedia.org>
wrote:
I also wonder about this discrepancy. I ran a more explicit version of
Andrew's query, trying to eliminate some possible edge cases, and came up
with the same number.
Now I'm curious. Are there junk rows in our user table, retained for
legacy reasons maybe? Is user_editcount inaccurate? Erik, can you describe
the processing you perform to winnow down from 8.2 million?
J
On Tue, Oct 27, 2015 at 7:06 AM, Andrew Gray <andrew.gray(a)dunelm.org.uk>
wrote:
Interesting - wonder why my query's giving a higher number?
I agree entirely that we should be very careful with quoting these
figures. I think you'd probably be safe to say that more than a
million people have edited... but even then I'd be cautious.
Andrew.
On 27 October 2015 at 11:11, Erik Zachte <ezachte(a)wikimedia.org> wrote:
Wikistats has it that 5,644,681 registered accounts published at least
once till Oct 1, 2015, and 2,181,006 three or more times.
It used to publish that on [1][2] but I just removed it.
I've been campaigning against us publishing overly inflated counts for
about two years (since Wikimania London).
Since this thread is going on and on, I'll repost my (reworded)
reservations on this particular metric, for newcomers.
Even if we state explicitly that this is not unique people, any audience
will think it may be close and that we are being overly correct by adding
the caveat. It may not be so close. For that reason imo such a metric would
be of questionable value, to put it mildly.
Pine:
> Is there a way to get counts for the number of accounts, including or
> excluding IPs, that have ever edited English Wikipedia?
First the anon contributors: if we counted every ip address that shows up
in the dumps, we'd count *very* many people who were just vandalizing
willfully, or just pressing edit for fun, or forgot to log in once, and who
also moved from one ip address to another over the years. On top of that,
many people get a new ip address (from a pool) on every session, depending
on provider policy.
As for registered editors, the number Wikistats used to publish may be a
rather empty metric for several reasons:
- How many casual editors will have forgotten their password and just
created a new user id? Only veteran editors know about sockpuppeting and
how one is supposed not to do that.
- How many people will have registered in good faith just out of habit,
or to tweak presentation preferences, and then played with the edit button
just to see what happens? Note that roughly 2 out of 3 accounts don't
even reach 3 edits.
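
As a quick check of the "roughly 2 out of 3" figure against the two Wikistats counts quoted at the top of this message, the arithmetic can be sketched as follows (the variable names are just for illustration):

```python
# Figures quoted from Wikistats in this message (as of Oct 1, 2015).
accounts_with_1_plus_edits = 5_644_681
accounts_with_3_plus_edits = 2_181_006

# Accounts that edited at least once but never reached three edits.
below_three = accounts_with_1_plus_edits - accounts_with_3_plus_edits
share_below_three = below_three / accounts_with_1_plus_edits

print(f"{below_three:,} accounts ({share_below_three:.1%}) never reached 3 edits")
```

That works out to 3,463,675 accounts, or about 61%, so a little under two out of three.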
Cheers,
Erik Zachte
[1]
https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution
[2] BTW I use the term wikipedians overly inclusively in that report. A
person who edited once or twice isn't a wikipedian in my book, just like a
person who writes two post-it notes per month and nothing else isn't called
a writer. Some terms only apply above some threshold.
-----Original Message-----
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On
Behalf Of Andrew Gray
Sent: Tuesday, October 27, 2015 11:06
To: A mailing list for the Analytics Team at WMF and everybody who has
an interest in Wikipedia and analytics.
Subject: Re: [Analytics] User statistics for video marking ENWP 5m
article milestone
To a very crude approximation, there are 8.2 million accounts which have
at least one edit on English Wikipedia - at least assuming my SQL query is
correct!
http://quarry.wmflabs.org/query/1911
This is all user accounts with one or more edits in the contributions
record; it does not contain IPs, and it does not contain any accounts whose
sole contributions have since been deleted (which is probably quite a
substantial number). Conversely, it includes a vast panoply of single-use
vandalism accounts, sockpuppets, etc etc etc. And bots, of course.
Andrew.
On 27 October 2015 at 05:50, Pine W <wiki.pine(a)gmail.com> wrote:
Is there a way to get counts for the number of accounts, including or
excluding IPs, that have ever edited English Wikipedia? It would be
preferable to know the number of unique people, but of course that's
impossible.
Thanks,
Pine
Aha, that is important for me to know. Thanks Andrew.
Pine
On Thu, Sep 17, 2015 at 11:07 AM, Andrew Gray <andrew.gray(a)dunelm.org.uk>
wrote:
On 11 September 2015 at 19:19, James Forrester <jforrester(a)wikimedia.org>
wrote:
>> Does it include editors on all Wikimedia projects
>
> No.
>
>> or just those who have registered and/or edited on ENWP?
>
> Registered, regardless of having edited.
James is of course correct, but one small caveat worth adding:
because of SUL, a substantial proportion of these will be "autocreated"
accounts from other projects - so even 'registration' may not mean
what it seems.
--
- Andrew Gray
andrew.gray(a)dunelm.org.uk
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>