Hi all,
join the teams from Analytics and Research for their monthly office hours
next Wednesday, 2020-06-24 from 9.00-10.00am (UTC)*. Bring all your
research/analytics questions and ideas to discuss projects, data, analysis,
etc. To participate, please join the IRC channel: #wikimedia-research [1].
More detailed information can be found here [2].
Note the earlier starting time compared to previous meetings -- starting
this month we are experimenting with alternating time slots from month to
month to provide different options for participation and accommodate a
wider range of time zones.
Looking forward to your participation,
Martin
[1] irc://chat.freenode.net:6667/wikimedia-research
[2] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
* find local times here:
https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200624T09
--
Martin Gerlach
Research Scientist
Wikimedia Foundation
Hi all,
The next Research Showcase will be live-streamed on Wednesday, June 17, at
9:30 AM PDT/16:30 UTC.
In the era of 'information explosion,' we often strive to stay informed
and relevant too quickly, and hence risk consuming false or distorted
facts. This month, our invited speakers will help us
understand these dynamics, especially in the context of Wikipedia's content
and readership. First, Connie will talk about an initiative she's been
leading to source and rank credible information from the news, and its
overlap with Wikipedia. In the second talk, Tiziano will present his recent
work on quantifying and understanding how the readers of Wikipedia interact
with an article's citations to verify specific claims.
YouTube stream: https://www.youtube.com/watch?v=GS9Jc3IFhVQ
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Today’s News, Tomorrow’s Reference, and The Problem of Information
Reliability - An Introduction to NewsQ
By: Connie Moon Sehat, NewsQ, Hacks/Hackers
The effort to make Wikipedia more reliable is related to the larger
challenges facing the information ecosystem overall. These challenges
include the discovery of and accessibility to reliable news amid the
transformation of news distribution through platform and social media
products. Connie will present some of the challenges related to the ranking
and recommendation of news that are addressed by the NewsQ Initiative, a
collaboration between the Tow-Knight Center for Entrepreneurial Journalism
at the Craig Newmark Graduate School of Journalism and Hacks/Hackers. In
addition, she’ll share some of the ways that the project intersects with
Wikipedia, such as supporting research around the US Perennial Sources list
(https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Perennial_sources
).
Related resources:
- NewsQ Initiative site (https://newsq.net/)
- DUE JUNE 15 (Please apply if interested!): Social Science Research
  Council Call for Papers, “News Quality in the Platform Era”
  https://www.ssrc.org/programs/component/media-democracy/news-quality-in-the…
- M. Bhuiyan, A. Zhang, C. Sehat, T. Mitra, 2020. Investigating "Who" in
  the Crowdsourcing of News Credibility, C+J 2020 (
  https://cpb-us-w2.wpmucdn.com/express.northeastern.edu/dist/d/53/files/2020…
  )
Quantifying Engagement with Citations on Wikipedia
By: Tiziano Piccardi, EPFL
Wikipedia, the free online encyclopedia that anyone can edit, is one of the
most visited sites on the Web and a common source of information for many
users. As an encyclopedia, Wikipedia is not a source of original
information, but was conceived as a gateway to secondary sources: according
to Wikipedia's guidelines, facts must be backed up by reliable sources that
reflect the full spectrum of views on the topic. Although citations lie at
the very heart of Wikipedia, little is known about how users interact with
them. To close this gap, we built client-side instrumentation for logging
all interactions with links leading from English Wikipedia articles to
cited references for one month and conducted the first analysis of readers'
interaction with citations on Wikipedia. We find that overall engagement
with citations is low: about one in 300 page views results in a reference
click (0.29% overall; 0.56% on desktop; 0.13% on mobile). Matched
observational studies of the factors associated with reference clicking
reveal that clicks occur more frequently on shorter pages and on pages of
lower quality, suggesting that references are consulted more commonly when
Wikipedia itself does not contain the information sought by the user.
Moreover, we observe that recent content, open access sources, and
references about life events (births, deaths, marriages, etc.) are
particularly popular. Taken together, our findings open the door to a
deeper understanding of Wikipedia's role in a global information economy
where reliability is ever less certain, and source attribution ever more
vital.
Paper: https://arxiv.org/abs/2001.08614
--
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello Analytics Team-
The Security Team has recently spent some cycles investigating improved
anti-automation (bad bots, high-volume spammers, etc.) solutions,
particularly around an improved Wikimedia captcha. We were curious if your
team has any methods or advice regarding the analysis of nefarious
automated traffic within the context of raw web requests or any other
relevant analytics data. If the answer is "not really", that's fine. But
if there are some relevant tools, methods, research, etc. your team has
performed that you would like to share with us, that would be much
appreciated. If it makes sense to discuss this further during a quick
call, I can try to find some time for a few of us over the next couple of
weeks. We also have an extremely barebones task where we are attempting to
document various methods of measurement which might be helpful:
https://phabricator.wikimedia.org/T255208.
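In case it helps frame the discussion, one common first-pass signal for
automated traffic is a per-client request-rate threshold over raw request
logs. A minimal sketch (the field names, sample data, and threshold are
purely illustrative, not from any Wikimedia schema):

```python
# Sketch: flag clients whose request count within a time window exceeds
# a threshold -- a crude first-pass heuristic for automated traffic.
from collections import Counter

# (client_id, unix_timestamp) pairs; real input would be parsed from
# webrequest logs. Here c1 fires 60 requests in one minute, c2 only 2.
requests = [("c1", t) for t in range(0, 60)] + [("c2", 0), ("c2", 30)]

WINDOW = 60      # seconds
THRESHOLD = 30   # requests per window considered suspicious

counts = Counter(cid for cid, ts in requests if ts < WINDOW)
suspicious = {cid for cid, n in counts.items() if n > THRESHOLD}
print(suspicious)  # {'c1'}
```

Real detection would of course need more dimensions (user agent, URI
patterns, session behavior), but a rate cut like this is often the
starting point.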
Thanks,
--
Scott Bassett
sbassett(a)wikimedia.org
Hi everybody,
tomorrow morning (EU time), June 17th, I'll reboot stat1007 and stat1008
for Linux kernel upgrades. The hosts will be down for a few minutes;
please let me know if this interferes with your work or ongoing
projects.
Luca (on behalf of the Analytics team)
The pageviews statistics for the Italian Wikisource are very confusing
to me:
<https://stats.wikimedia.org/#/it.wikisource.org/reading/total-page-views/no…>
In May there were supposedly more than 5 million pageviews, of which 3M
desktop + 2M mobile and 3M "user" + 2M "spider". Do the "spider"
pageviews include both the desktop and mobile URLs?
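For reference, the per-access and per-agent splits come from the public
Wikimedia pageview API, where access (desktop/mobile) and agent
(user/spider) are separate path parameters. A minimal sketch of how the
four combinations could be queried for it.wikisource.org in May 2020 (the
helper function name is mine; the endpoint path follows the REST API):

```python
# Sketch: build pageview API URLs for each access/agent combination, so
# the desktop/mobile and user/spider breakdowns can be compared directly.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"

def pageviews_url(project, access, agent, start, end):
    """URL for monthly aggregate pageviews (timestamps are YYYYMMDD00)."""
    return f"{BASE}/{project}/{access}/{agent}/monthly/{start}/{end}"

for access in ("desktop", "mobile-web"):
    for agent in ("user", "spider"):
        print(pageviews_url("it.wikisource.org", access, agent,
                            "2020050100", "2020053100"))
```

If I understand the API correctly, agent and access are independent
dimensions, so a "spider" total would span both desktop and mobile URLs
-- but someone from the team can confirm.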
Federico
Hi Analytics team,
Quick question:
Does the Clickstream data
<https://dumps.wikimedia.org/other/clickstream/readme.html> lump
together mobile and desktop?
It seems to be hinted at here
<https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream>, but it's
not mentioned explicitly. It just says that the 2015 data is for desktop
only, which seems to imply that after that it's desktop + mobile.
Also, I was wondering if anyone has any insights into what might cause
referrers to be empty? I tried googling, but the issue is clouded in
mystery and seems to depend a lot on browser and website specificities.
Any insights (small or big) would be appreciated!
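For what it's worth, the clickstream dump records referrer-less views
under the `other-empty` source, so their share can be tallied directly
from the TSV (columns per the readme: prev, curr, type, n). A minimal
sketch -- the sample rows below are made up for illustration:

```python
# Sketch: tally clickstream counts by referrer category (the 'prev'
# column). 'other-empty' collects page views that arrived with no
# referrer. Fields are tab-separated: prev, curr, type, n.
from collections import Counter

sample = (
    "other-empty\tMain_Page\texternal\t1200\n"
    "other-search\tAlbert_Einstein\texternal\t800\n"
    "Physics\tAlbert_Einstein\tlink\t300\n"
)

totals = Counter()
for line in sample.strip().splitlines():
    prev, curr, typ, n = line.split("\t")
    totals[prev] += int(n)

print(totals["other-empty"])  # 1200 views with an empty referrer
```

Empty referrers themselves can come from many places (HTTPS-to-HTTP
transitions, referrer-policy headers, privacy extensions, apps opening
links), which may be why the answers online are so inconsistent.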
Thanks a lot!
Bob
Hi everybody,
Turnilo has been upgraded to v1.24.0 (was 1.17.0), all tracked in
https://phabricator.wikimedia.org/T253294. Please let me know in the task
if you see anything weird (a regression, undesired behavior, etc.).
Thanks!
Luca (on behalf of the Analytics team)