Adding analytics list to make sure everyone in the team sees this thread.
> Is there a way to visualize where editors of a specific page come from?
I assume you mean whether this data is available to the general public. From
reading the couple of links you posted, it seems the consensus was that this
data is too private to be made public, and thus it is only accessible
inside WMF or to users with CheckUser rights.
As far as we know, nothing has changed in this regard, so this type of
"geo-data" is not available to the general public.
On Sun, Apr 13, 2014 at 8:15 PM, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
> Recent discussion on the topic:
> * http://lists.wikimedia.org/pipermail/analytics/2013-August/thread.html#857
> * http://thread.gmane.org/gmane.org.wikimedia.analytics/103
>
> Nemo
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi Everyone,
We had our quarterly review with WMF management last week. The minutes[1]
are posted on meta along with the deck we presented. (Thank you to
Tilman for taking the minutes and helping post the slides.)
Please take a look at the deck and let me know if you have any questions.
In particular, I'd like to highlight our reprioritization[2] of our
projects. We continue to focus on our Editor Engagement Vital Signs project
and have added a couple of new projects, including taking over the Event
Logging system from the Platform Team.
I want to call out the Page View API project specifically. Everyone on the
team wants to work on this but we have prioritized other projects ahead of
it. While this is challenging for everyone, Editor Growth remains the
priority for the Foundation and Analytics needs to support this initiative.
In the meantime, we have worked with Henrik, the maintainer of
stats.grok.se, to help scale out this service. We've purchased a new
machine and the initial performance numbers are very encouraging. We'll
have more updates on this shortly.
In conclusion, now that the team is fully staffed, I'll have more time to
communicate about our projects and how they will interact with the
community. I'm looking forward to it :)
Thanks,
-Toby
[1]
https://meta.wikimedia.org/wiki/Metrics_and_activities_meetings/Quarterly_r…
[2] https://www.mediawiki.org/wiki/Analytics/Prioritization_Planning
https://data.quora.com/The-Quora-Topic-Network-1
I recently read this post by a researcher at Quora, about determining
relationships between content. While they most likely have a lot more
personally identifiable information about their users than we do, some of
the concepts might be applicable.
*Jared Zimmerman * \\ Director of User Experience \\ Wikimedia Foundation
M : +1 415 609 4043 | @JaredZimmerman <https://twitter.com/JaredZimmerman>
Hello!
EventLogging Schemas are in JSON, and I hate hand-writing JSON. It
feels rather painful. So I whipped up this little python script
(requires pyyaml) that will let me write the actual schema in YAML,
and convert it trivially to JSON for copy pasting into metawiki. You
can find it at https://gist.github.com/yuvipanda/10481205
My workflow now is:
1. Write YAML file describing the schema
2. run `python yaml-to-json.py <yaml-file-name> | pbcopy`
3. Paste that into appropriate metawiki page
(2) and (3) can be further consolidated if we want, but that is for
another day (and I had to write only 4 schemas from scratch, so this
was useful enough).
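For reference, the conversion itself is tiny. A minimal sketch of the
approach (assuming pyyaml; the actual gist may differ in its details):

import json
import sys

import yaml

def main():
    # Read the hand-written YAML schema...
    with open(sys.argv[1]) as f:
        schema = yaml.safe_load(f)
    # ...and emit pretty-printed JSON, ready to paste into metawiki.
    print(json.dumps(schema, indent=4, sort_keys=True))

if __name__ == '__main__':
    main()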
Hope someone finds it useful!
--
Yuvi Panda T
http://yuvi.in/blog
Hi all,
We finished our sprint on Tuesday and made plans for the next one on
Thursday and I wanted to let you know the updates.
I will also provide updates on our epics and our quarterly review in a
follow-up email.
We finished two tasks that we committed to for this sprint:
- Puppet allows wikimetrics user to write files (WikiMetrics)
- Reports results can be made public in vagrant (WikiMetrics)
We (technically ops) also got our Archiva deployment server up and running.
This is a significant step towards a production Hadoop/Kafka environment
and will also be used by other JVM-based tools such as Search.
We did not finish one task that we committed to:
- UI Changes for Recurrent and Public Reports (WikiMetrics)
We also finished a number of unplanned tasks:
- Camus and Kraken review (Hadoop/Kafka)
- Changing Kraken to Apache for Camus folks (Hadoop/Kafka)
- User Agent discussions (EventLogging)
- Look at X-Analytics change (Wikipedia Zero)
- Meeting with grants team wikimetrics consulting (WikiMetrics)
- Flake8 work due to upgrade in Jenkins (WikiMetrics)
- README changes (WikiMetrics)
- Reworking limn-mobile-data patch from Jan (Limn)
- Repair data and run the 3 reports for the Mobile team (Mobile Metrics)
- Put the results in files and send to mobile-web team (Mobile Metrics)
We fixed the following defects:
- 62922 Wikipedia Zero: Doubled zero tags in varnish logs [1]
- 57371 Limn: SSL-Error for https at
ee-dashboard.wmflabs.org (ssl_error_rx_record_too_long) [2]
- 62830 Wikimetrics: New reports not working; New cohorts all invalid [3]
For this sprint, we committed to the following tasks to complete two
features for WikiMetrics, Publicly Sharable Report Results and Scheduled
Reports:
- Verify and apply script to migrate report table
- Finish testing 112165 + puppet changes on staging
- Unit test concatenated recurrent public reports
- Code review 112165
- Code review 122638
- Test in vagrant
- Test in staging
Please let me know if you have any questions.
-Toby
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=62922
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=57371
[3] https://bugzilla.wikimedia.org/show_bug.cgi?id=62830
Minutes and slides from Monday's quarterly review meeting of the
Foundation's Analytics team are now available at
https://meta.wikimedia.org/wiki/Metrics_and_activities_meetings/Quarterly_r…
.
On Wed, Dec 19, 2012 at 6:49 PM, Erik Moeller <erik(a)wikimedia.org> wrote:
> Hi folks,
>
> to increase accountability and create more opportunities for course
> corrections and resourcing adjustments as necessary, Sue's asked me
> and Howie Fung to set up a quarterly project evaluation process,
> starting with our highest priority initiatives. These are, according
> to Sue's narrowing focus recommendations which were approved by the
> Board [1]:
>
> - Visual Editor
> - Mobile (mobile contributions + Wikipedia Zero)
> - Editor Engagement (also known as the E2 and E3 teams)
> - Funds Dissemination Committee and expanded grant-making capacity
>
> I'm proposing the following initial schedule:
>
> January:
> - Editor Engagement Experiments
>
> February:
> - Visual Editor
> - Mobile (Contribs + Zero)
>
> March:
> - Editor Engagement Features (Echo, Flow projects)
> - Funds Dissemination Committee
>
> We'll try doing this on the same day or adjacent to the monthly
> metrics meetings [2], since the team(s) will give a presentation on
> their recent progress, which will help set some context that would
> otherwise need to be covered in the quarterly review itself. This will
> also create open opportunities for feedback and questions.
>
> My goal is to do this in a manner where even though the quarterly
> review meetings themselves are internal, the outcomes are captured as
> meeting minutes and shared publicly, which is why I'm starting this
> discussion on a public list as well. I've created a wiki page here
> which we can use to discuss the concept further:
>
> https://meta.wikimedia.org/wiki/Metrics_and_activities_meetings/Quarterly_r…
>
> The internal review will, at minimum, include:
>
> Sue Gardner
> myself
> Howie Fung
> Team members and relevant director(s)
> Designated minute-taker
>
> So for example, for Visual Editor, the review team would be the Visual
> Editor / Parsoid teams, Sue, me, Howie, Terry, and a minute-taker.
>
> I imagine the structure of the review roughly as follows, with a
> duration of about 2 1/2 hours divided into 25-30 minute blocks:
>
> - Brief team intro and recap of team's activities through the quarter,
> compared with goals
> - Drill into goals and targets: Did we achieve what we said we would?
> - Review of challenges, blockers and successes
> - Discussion of proposed changes (e.g. resourcing, targets) and other
> action items
> - Buffer time, debriefing
>
> Once again, the primary purpose of these reviews is to create improved
> structures for internal accountability, escalation points in cases
> where serious changes are necessary, and transparency to the world.
>
> In addition to these priority initiatives, my recommendation would be
> to conduct quarterly reviews for any activity that requires more than
> a set amount of resources (people/dollars). These additional reviews
> may however be conducted in a more lightweight manner and internally
> to the departments. We're slowly getting into that habit in
> engineering.
>
> As we pilot this process, the format of the high priority reviews can
> help inform and support reviews across the organization.
>
> Feedback and questions are appreciated.
>
> All best,
> Erik
>
> [1] https://wikimediafoundation.org/wiki/Vote:Narrowing_Focus
> [2] https://meta.wikimedia.org/wiki/Metrics_and_activities_meetings
> --
> Erik Möller
> VP of Engineering and Product Development, Wikimedia Foundation
>
> Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate
>
> _______________________________________________
> Wikimedia-l mailing list
> Wikimedia-l(a)lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
--
Tilman Bayer
Senior Operations Analyst (Movement Communications)
Wikimedia Foundation
IRC (Freenode): HaeB
Hi Niklas -- I have no idea. Forwarding to the general list.
thanks,
-Toby
On Tue, Apr 1, 2014 at 2:10 AM, Niklas Laxström <nlaxstrom(a)wikimedia.org> wrote:
> Hi,
>
> While trying to figure out whether I can accomplish the following [1],
> I noticed that user_properties table is not available in the toollabs
> database replicas.
>
> [1] Gather daily statistics of the numbers of users who have activated
> beta feature X, for multiple wikis, possibly excluding users who
> auto-roll to all beta features.
>
> Is there a way to access the user_properties table? Or is there
> someone already gathering this kind of stats?
> -Niklas
>
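For context, the statistic described in [1] boils down to a count over
user_properties per wiki. A hypothetical sketch, assuming the table were
replicated: the property key 'betafeature-x' and the host are made up,
though the up_user/up_property/up_value columns match MediaWiki core.

import pymysql

def count_activations(wiki_db, prop='betafeature-x'):
    # Placeholder host and credentials file; real BetaFeatures
    # preference keys differ per feature.
    conn = pymysql.connect(host='replica.example', database=wiki_db,
                           read_default_file='~/.my.cnf')
    with conn.cursor() as cur:
        cur.execute(
            "SELECT COUNT(*) FROM user_properties"
            " WHERE up_property = %s AND up_value = '1'",
            (prop,))
        return cur.fetchone()[0]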
For your information. The thread about our Gerrit review queue metrics is
continuing in wikitech-l under
http://lists.wikimedia.org/pipermail/wikitech-l/2014-April/075635.html
---------- Forwarded message ----------
From: *Quim Gil* <qgil(a)wikimedia.org>
Date: Thursday, April 3, 2014
Subject: Data to improve our code review queue
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Please have a look at some graphs visualizing interesting data from our
code review queues in Gerrit, focusing on the key Wikimedia software
projects.[1]
http://korma.wmflabs.org/browser/gerrit_review_queue.html
The queue of open changesets keeps growing. We have open changesets
submitted in every month since March 2012. However, since last December we
must be doing something right, because the median times to update and
resolve submissions are decreasing.
Looking at http://korma.wmflabs.org/browser/scr.html , one reason for this
improvement might be that the volume of new changesets has also decreased
during the same period. Maybe newer patches get faster reviews? Any ideas?
We need to dig further.
We have created a "hall of shame" (add your preferred smiley here) to shine
a light on the repositories with open changesets that have seen no
activity for the longest time. The principle is simple: you don't want
to see one of your repos appearing in the top 10.
Many of the _leading_ repos have only a couple of open changesets, and our
hope is that by showing up there, the maintainers will act on them quickly
(e.g. OpenStackManager, fluoride, commons, UserMerge, TorBlock, Vipscaler,
luasandbox...). This will leave the fight for the pole position to the
projects that actually have a real problem dealing with patches received
(Donationinterface, GuidedTour, UploadWizard...)
Who knows, perhaps we should organize "patch days", in the same way that we
have organized bug days in the past (which we want to recover now). We also
want to look at ways to promote the oldest inactive requests. For instance,
what about directing new volunteers there, asking them to submit code
reviews? For a patch that has been waiting in silence for over a year,
any feedback will be better than no feedback.
One last detail. Our initial motivation to look at the age of open
changesets by affiliation was to check whether submissions from WMF
employees and independent developers were treated equally. Interestingly,
there are no big differences between these groups. However, there are big
differences between the median age of open WMDE changesets (16.5 days) and
open Wikia changesets (almost 283 days). All this is according to our
estimation of the origin of patches (domain of the submitter's email +
affiliation submitted by the developers who filled out our survey).[2]
Your feedback about these metrics is welcome. Please reply here or file
Bugzilla reports directly to Analytics > Tech Community metrics
https://bugzilla.wikimedia.org/buglist.cgi?component=Tech%20community%20met…
(Short link just in case: http://bit.ly/1q0itsl )
[1] https://wikitech.wikimedia.org/wiki/Key_Wikimedia_software_projects
[2]
https://docs.google.com/forms/d/1RFUa2zBAOolw78W-ozJPoYlR2lYbrAOYvOZYgjaAYQ…
--
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
--
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
Hi!
In another thread Oliver asked about the progress of One Machine To Rule
Them All :-)
In fact it looks like it will now be two machines to rule them all, or
rather, two machines to cooperatively rule them all in roughly equal
capacity. I know, it doesn't have the same ring to it...
I posted the following to RT 6383, but who knows who reads RT, so here it
is again:
-- quote --
An update on this. Some Analytics folk have probably already heard bits and
pieces via mailing lists, but my fellow Opsen on RT duty have rightly begun
to wonder about this ticket.
We have procured dbstore100[12], both of which will replicate shards
S[1-7] into a single MariaDB 10 instance each, using the new multi-source
replication. The boxes are still being set up (recombining the
shards requires a full dump/reload, plus getting all seven in sync, plus
compressing tables -- slow going). The x1 shard and event logging will
replicate to dbstore too, but that's pending RT 7081. Analytics will have
direct, but read-only, access to dbstore1002.
db1047, the current s1-analytics-slave, has the required disk space so it
will likely become a slave to dbstore1002, or else make use of the MariaDB
10 CONNECT engine to access the data (like federation, but better than the
old FEDERATED engine, thanks to ECP: engine condition pushdown). As ever,
Analytics will have read/write access to db1047 with scratch space.
The situation will result in:
- cross joins/unions on any wiki on either db1047 or dbstore1002
- ability to spread load across both boxes with a single SQL query
- less likely to block others due to locking
- less likely to cause replag
I'm happy to go into more technical detail if anyone is interested.
When will it be ready, you ask? :-) Not until after the Ops meet in
Athens, which realistically means: in May.
-- endquote --
The bit about spreading load across two machines with one query will
require people to be a bit careful in designing the SQL. Alternatively, you
guys might simply choose to dictate which box should run expensive queries,
to avoid tripping each other up.
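To make that concrete, here is a hypothetical sketch (Python + pymysql) of
pinning queries to one box or the other. The hostnames follow the naming
above; the .eqiad.wmnet suffixes, credentials file, and routing scheme are
all assumptions for illustration, not the actual setup:

import pymysql

HOSTS = {
    'dbstore': 'dbstore1002.eqiad.wmnet',  # read-only, all shards in one instance
    'scratch': 'db1047.eqiad.wmnet',       # read/write, with scratch space
}

def run_query(sql, box='dbstore'):
    """Run a query on the designated box, so expensive scans can be
    pinned to one machine instead of tripping up users of the other."""
    conn = pymysql.connect(host=HOSTS[box], read_default_file='~/.my.cnf')
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()

An expensive cross-wiki union could then be sent with
`run_query(sql, box='scratch')`, leaving dbstore1002 free for everyone else.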
Incidentally, MariaDB 10 has the Cassandra storage engine, which might be
of some use to you guys in time. But so far I've only been trialing
CONNECT+ECP.
BR
Sean
--
DBA @ WMF