Erik,
Thanks a lot for the appreciation.
As Sajjad mentioned, we have already obtained an edit-per-location
dataset from Evan (Rosen) with the following column structure:
*language,country,city,start,end,fraction,ts*
*start* and *end* denote the beginning and ending dates for counting
edits, and *ts* is the timestamp.
The *fraction*, however, gives a national ratio of edit activity: it is
the total edits from a city for a given language Wikipedia project
divided by the total edits from that city's country for the same
project. Hence, it cannot be used to understand global edit
contributions to a Wikipedia project (for a time period).
It seems that the original data (from which this dataset was extracted)
should also allow computing global fractions -- total edits from a city
divided by total edits from the whole world, per project, per time period.
Would you know if the global fractions can also be derived from the XML
dumps? Or, even better, is the relevant raw data available in CSV form
somewhere else?
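If absolute edit counts per city were available (a column the fraction-only dataset described above lacks; the `edits` field below is hypothetical), the global fraction would be a simple group-by over (language, period). A minimal sketch:

```python
from collections import defaultdict

def global_fractions(rows):
    """Compute each city's share of worldwide edits per (language, start, end).

    rows: dicts with keys 'language', 'city', 'start', 'end', and a
    hypothetical absolute 'edits' count. Returns the rows augmented with
    a 'global_fraction' field: city edits / worldwide edits for that
    project and period.
    """
    # Sum worldwide edits per (language, start, end) period.
    world = defaultdict(int)
    for r in rows:
        world[(r["language"], r["start"], r["end"])] += r["edits"]
    # Divide each city's count by the worldwide total for its project/period.
    return [
        {**r, "global_fraction": r["edits"] / world[(r["language"], r["start"], r["end"])]}
        for r in rows
    ]

rows = [
    {"language": "hi", "city": "Delhi", "start": "2013-04", "end": "2013-05", "edits": 30},
    {"language": "hi", "city": "Mumbai", "start": "2013-04", "end": "2013-05", "edits": 10},
]
out = global_fractions(rows)
# Delhi's share: 30 / (30 + 10) = 0.75
```

The same pattern would compute the national fractions already present in the dataset by grouping on (language, country, start, end) instead.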
Bests,
sumandro
-------------
sumandro
ajantriks.net
On Wednesday 15 May 2013 12:32 AM, analytics-request(a)lists.wikimedia.org
wrote:
> Send Analytics mailing list submissions to
> analytics(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/analytics
> or, via email, send a message with subject or body 'help' to
> analytics-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> analytics-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Analytics digest..."
>
> ----------------------------------------------------------------------
>
>
> Date: Tue, 14 May 2013 19:40:00 +0200
> From: "Erik Zachte" <ezachte(a)wikimedia.org>
> To: "'A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics.'"
> <analytics(a)lists.wikimedia.org>
> Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
> Message-ID: <016f01ce50ca$0fe736b0$2fb5a410$(a)wikimedia.org>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Awesome work! I like the flexibility of the charts, easy to switch metrics
> and presentation mode.
>
>
>
> 1. WMF has never captured ip->geo data on city level, but afaik this is
> going to change with Kraken.
>
>
>
> 2. Total edits per article per year can be derived from the XML dumps. I may
> have some CSV data that could come in handy.
>
> For edit wars you need to track reverts on a per-article basis, right? That
> can also be derived from the dumps.
>
> For a long history you need the full archive dumps and have to compute a
> checksum per revision text. (Stub dumps include checksums, but only for the
> last year or two.)
>
>
>
> Erik Zachte
>
>
>
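The checksum approach Erik describes can be sketched as follows: within one page's chronological revision history, a revision whose text checksum matches any earlier revision restores an earlier page state, i.e. an identity revert. The field names here are illustrative, not the actual dump schema:

```python
def find_reverts(revisions):
    """Flag identity reverts in one page's history.

    revisions: list of (rev_id, checksum) pairs in chronological order.
    A revision whose checksum equals that of any earlier revision of the
    same page is counted as an identity revert.
    """
    seen = set()
    reverts = []
    for rev_id, checksum in revisions:
        if checksum in seen:
            reverts.append(rev_id)
        seen.add(checksum)
    return reverts

# Revision 3 restores the exact text of revision 1, so it is a revert.
history = [(1, "aaa"), (2, "bbb"), (3, "aaa"), (4, "ccc")]
```

For full-history dumps without stored checksums, the checksum would first have to be computed (e.g. a SHA-1 of each revision text) while streaming the XML.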
Henrik updated the top view charts, and a few days ago foundationwiki was
added to webstatscollector. http://stats.grok.se/www.f/top shows the
most viewed articles in 201304:
Rank  Article         Page views
1     Trang chủ       912
2     Portada galega  324
3     Home            182
4     Local chapters  172
etc.
These numbers seem highly unlikely; is this a known problem?
Nemo
Hi all,
as you might know, I have a few GLAM-related tools on the toolserver. Some
are updated once a month, some can be used live, but all are in high demand
by GLAM institutions.
Now, the monthly updated stats have always been slow to run, but they
almost ground to a halt recently. The on-demand tools have stalled
completely.
All these tools get their data from stats.grok.se, which works well but
is not exactly high-speed; my on-demand tools have apparently been shut
out recently because too many people were using them, effectively
DDoSing the server :-(
I know you are working on page view numbers, and from what I gather it's
up and running internally already. My requirements are simple: I have a
list of pages on many Wikimedia projects; I need view counts for these
pages for a specific month, per-page.
Now, I know that there is no public API yet, but is there any way I can get
to the data, at least for the monthly stats?
Cheers,
Magnus
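Until a public API exists, one stopgap for Magnus's use case would be to query stats.grok.se per page and sum the daily counts. The JSON URL layout below is an assumption inferred from the site's per-article pages and should be verified before relying on it:

```python
import json
from urllib.parse import quote

def monthly_views_url(lang, yyyymm, title):
    """Build a stats.grok.se JSON URL for one page and month.

    The /json/<lang>/<yyyymm>/<title> layout is assumed, not documented;
    verify it against the live service.
    """
    return "http://stats.grok.se/json/%s/%s/%s" % (lang, yyyymm, quote(title))

def total_views(payload):
    """Sum the per-day counts in one month's decoded JSON payload."""
    return sum(payload.get("daily_views", {}).values())

url = monthly_views_url("en", "201304", "Wikipedia")
# A payload shaped like the assumed response:
sample = json.loads('{"daily_views": {"2013-04-01": 10, "2013-04-02": 5}}')
# total_views(sample) == 15
```

Fetching one URL per page per month is exactly the access pattern that got the on-demand tools shut out, so any real use would need throttling or a bulk data source instead.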
Hi!
As of the past two sprints, we are no longer dedicating each sprint to a
theme; instead we are doing time-based releases. Each release spans 13
weeks (one quarter), divided into 6 two-week sprints plus one week of
Quarterly Review meeting preparation.
Apologies for cross-posting; ideally you should receive this on the
Analytics mailing list so we can have one focal point for the conversation. If
you are not on the Analytics list then please subscribe at
https://lists.wikimedia.org/mailman/listinfo/analytics
## Defects & Features completed (Ready for Showcase/Shipping/Done) during
Sprint ending 2013-05-29 ##
#134 I - Puppetize Hadoop CDH4 (13) Done requested by Ops
#236 I - Improve robustness of customer dashboards on Labs (8) Done
requested by Analytics
#329 F - Dump stats: collect countable namespaces via api for all 800+
wikis, and use this in dump stats (N/E) Done requested by Community
#341 D - Traffic reports: fix region Oceania (1) Done requested by Community
#356 D - Squid log based traffic report SquidReportDevices.htm for mobile
devices is broken (3) Done requested by Community
#358 D - Different squid log based traffic reports have different %mobile
(N/E) Done requested by Community
#503 F - Page View Metrics report for non Wikipedia Mobile Apps (5) Done
requested by Mobile (Tomasz)
#509 I - Reinstall Oxygen with Precise (8) Done requested by Ops
#646 F - X-CS header measurements (3) Done requested by Wikipedia Zero
(Amit)
#660 F - Improve Mobile App Pageview Stats (3) Done requested by Mobile
(Tomasz)
#673 F - UMAPI stability (1) Done requested by E3 (Dario)
#718 D - Sqoop as Hive table throws metastore error (1) Done requested by
Analytics
#724 D - Unique uploaders graph hover no longer works (1) Done requested
by Mobile (Tomasz)
## Current Sprint (ending 2013-06-05) ##
Stories in progress from last sprint:
#131 I - Puppetize + Debianize Kafka 0.8 (8) requested by Analytics / Ops
#244 F - Track user adoption of Wikipedia Zero (5) requested by Amit
(Wikipedia Zero)
#545 F - Librdkafka supports Kafka 0.8 (13) requested by Analytics / Ops
New stories:
#573 F - Flag undefined/missing values (right-censorship) (8) requested by
Dario (E3)
#677 F - Metrics Meeting May (1) requested by Erik Moeller
#693 F - Implement new DB Design (8) requested by various UMAPI stakeholders
#716 I - submit libdclass-dev to apt.wikimedia.org (5) requested by
Analytics / Ops
(Number in parentheses) = estimate of complexity
N/E = not estimated;
F = Feature
D = Defect
I = Infrastructure Task
S = Spike
Any Mingle card can be accessed via the base URL
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ, where XYZ is
the Mingle card id.
If you have any questions, comments or feedback: please let us know!
Best,
Diederik
Hi, just a heads up:
I'm discussing with Ryan Lane, Sumana and Rob a plan to contract the
setup and basic maintenance of automated community metrics to measure
the activity of the Wikimedia tech community.
The questions we want to answer with metrics data are defined here:
http://www.mediawiki.org/wiki/Community_metrics#Problems_we_want_to_solve
We would start by watching Git/Gerrit, Bugzilla and Mailman. The idea is
to identify the individual contributors behind multiple handles and to
differentiate WMF employees from the rest.
For this we would be using the Grimoire set of tools developed by
Bitergia. See https://github.com/bitergia and an example at
http://activity.openstack.org/dash/browser/
In an ideal world the Analytics team would have infinite (or at least
enough) resources to support and even lead this, but we understand you
are more than busy. Still, your expertise and help are welcome, if only
in following what we are doing. If you have any questions, just ask.
--
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
What would be a good time and date for the next office hour with WMF researchers? I hope the next meeting can be scheduled at a substantially different time of day, so that a different set of community members has the opportunity to attend. I'd like to let the WMF people determine a time among themselves, and I'll post a meeting invitation to the other lists after the decision is made.
Thanks,
Pine
Hi everyone!
Are there any analytics for WMF's Bugzilla? How many bugs have been
created and resolved, and perhaps some metrics and ratios?
-----
Yury Katkov, WikiVote
Hi all!
Today I rebooted gadolinium, the machine responsible for forwarding many
of the webrequest logs to logging hosts. There will be a small gap in
some webrequest logs between 15:48 and 15:52 UTC on 2013-05-20.
This reboot was to upgrade the linux kernel running there.
Thanks!
-Andrew