Hi all,
join the teams from Analytics and Research for their monthly office hours
next Wednesday, 2020-06-24 from 9.00-10.00am (UTC)*. Bring all your
research/analytics questions and ideas to discuss projects, data, analysis,
etc. To participate, please join the IRC channel: #wikimedia-research [1].
More detailed information can be found here [2].
Note the earlier starting time compared to previous meetings -- starting
this month we are experimenting with alternating time slots from month to
month to provide different options for participation and accommodate a
wider range of time zones.
Looking forward to your participation,
Martin
[1] irc://chat.freenode.net:6667/wikimedia-research
[2] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
* find local times here:
https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200624T09
--
Martin Gerlach
Research Scientist
Wikimedia Foundation
Hi all,
The next Research Showcase will be live-streamed on Wednesday, June 17, at
9:30 AM PDT/16:30 UTC.
In the era of 'information explosion,' we often strive to stay informed
and relevant too quickly, and hence risk consuming false or distorted
facts. This month, our invited speakers will help us
understand these dynamics, especially in the context of Wikipedia's content
and readership. First, Connie will talk about an initiative she's been
leading to source and rank credible information from the news, and its
overlap with Wikipedia. In the second talk, Tiziano will present his recent
work on quantifying and understanding how the readers of Wikipedia interact
with an article's citations to verify specific claims.
YouTube stream: https://www.youtube.com/watch?v=GS9Jc3IFhVQ
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Today’s News, Tomorrow’s Reference, and The Problem of Information
Reliability - An Introduction to NewsQ
By: Connie Moon Sehat, NewsQ, Hacks/Hackers
The effort to make Wikipedia more reliable is related to the larger
challenges facing the information ecosystem overall. These challenges
include the discovery of and accessibility to reliable news amid the
transformation of news distribution through platform and social media
products. Connie will present some of the challenges related to the ranking
and recommendation of news that are addressed by the NewsQ Initiative, a
collaboration between the Tow-Knight Center for Entrepreneurial Journalism
at the Craig Newmark Graduate School of Journalism and Hacks/Hackers. In
addition, she’ll share some of the ways that the project intersects with
Wikipedia, such as supporting research around the US Perennial Sources list
(https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Perennial_sources
).
Related resources:
- NewsQ Initiative site (https://newsq.net/)
- DUE JUNE 15 (Please apply if interested!): Social Science Research
  Council Call for Papers, “News Quality in the Platform Era”
  https://www.ssrc.org/programs/component/media-democracy/news-quality-in-the…
- M. Bhuiyan, A. Zhang, C. Sehat, T. Mitra, 2020. Investigating "Who" in
  the Crowdsourcing of News Credibility, C+J 2020 (
  https://cpb-us-w2.wpmucdn.com/express.northeastern.edu/dist/d/53/files/2020…
  )
Quantifying Engagement with Citations on Wikipedia
By: Tiziano Piccardi, EPFL
Wikipedia, the free online encyclopedia that anyone can edit, is one of the
most visited sites on the Web and a common source of information for many
users. As an encyclopedia, Wikipedia is not a source of original
information, but was conceived as a gateway to secondary sources: according
to Wikipedia's guidelines, facts must be backed up by reliable sources that
reflect the full spectrum of views on the topic. Although citations lie at
the very heart of Wikipedia, little is known about how users interact with
them. To close this gap, we built client-side instrumentation for logging
all interactions with links leading from English Wikipedia articles to
cited references for one month and conducted the first analysis of readers'
interaction with citations on Wikipedia. We find that overall engagement
with citations is low: about one in 300 page views results in a reference
click (0.29% overall; 0.56% on desktop; 0.13% on mobile). Matched
observational studies of the factors associated with reference clicking
reveal that clicks occur more frequently on shorter pages and on pages of
lower quality, suggesting that references are consulted more commonly when
Wikipedia itself does not contain the information sought by the user.
Moreover, we observe that recent content, open access sources, and
references about life events (births, deaths, marriages, etc.) are
particularly popular. Taken together, our findings open the door to a
deeper understanding of Wikipedia's role in a global information economy
where reliability is ever less certain, and source attribution ever more
vital.
Paper: https://arxiv.org/abs/2001.08614
--
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello Analytics Team-
The Security Team has recently spent some cycles investigating improved
anti-automation (bad bots, high-volume spammers, etc.) solutions,
particularly around an improved Wikimedia captcha. We were curious if your
team has any methods or advice regarding the analysis of nefarious
automated traffic within the context of raw web requests or any other
relevant analytics data. If the answer is "not really", that's fine. But
if there are some relevant tools, methods, research, etc. your team has
performed that you would like to share with us, that would be much
appreciated. If it makes sense to discuss this further during a quick
call, I can try to find some time for a few of us over the next couple of
weeks. We also have an extremely barebones task where we are attempting to
document various methods of measurement which might be helpful:
https://phabricator.wikimedia.org/T255208.
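In case it helps frame the discussion, one common first-pass signal for
automated traffic is a per-client request-rate threshold over raw request
logs. A minimal sketch (the field names, sample data, and threshold are
purely illustrative, not from any Wikimedia schema):

```python
# Sketch: flag clients whose request count within a time window exceeds
# a threshold -- a crude first-pass heuristic for automated traffic.
from collections import Counter

# (client_id, unix_timestamp) pairs; real input would be parsed from
# webrequest logs. Here c1 fires 60 requests in one minute, c2 only 2.
requests = [("c1", t) for t in range(0, 60)] + [("c2", 0), ("c2", 30)]

WINDOW = 60      # seconds
THRESHOLD = 30   # requests per window considered suspicious

counts = Counter(cid for cid, ts in requests if ts < WINDOW)
suspicious = {cid for cid, n in counts.items() if n > THRESHOLD}
print(suspicious)  # {'c1'}
```

Real detection would of course need more dimensions (user agent, URI
patterns, session behavior), but a rate cut like this is often the
starting point.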
Thanks,
--
Scott Bassett
sbassett(a)wikimedia.org
Hi everybody,
tomorrow morning (EU time), June 17th, I'll reboot stat1007 and stat1008
for Linux kernel upgrades. The hosts will be down for a few minutes;
please let me know if this interferes with your work or ongoing
projects.
Luca (on behalf of the Analytics team)
The pageviews statistics for the Italian Wikisource are very confusing
to me:
<https://stats.wikimedia.org/#/it.wikisource.org/reading/total-page-views/no…>
In May there were supposedly more than 5 million pageviews, of which 3M
desktop + 2M mobile and 3M "user" + 2M "spider". Do the "spider"
pageviews include both the desktop and mobile URLs?
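For reference, the per-access and per-agent splits come from the public
Wikimedia pageview API, where access (desktop/mobile) and agent
(user/spider) are separate path parameters. A minimal sketch of how the
four combinations could be queried for it.wikisource.org in May 2020 (the
helper function name is mine; the endpoint path follows the REST API):

```python
# Sketch: build pageview API URLs for each access/agent combination, so
# the desktop/mobile and user/spider breakdowns can be compared directly.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"

def pageviews_url(project, access, agent, start, end):
    """URL for monthly aggregate pageviews (timestamps are YYYYMMDD00)."""
    return f"{BASE}/{project}/{access}/{agent}/monthly/{start}/{end}"

for access in ("desktop", "mobile-web"):
    for agent in ("user", "spider"):
        print(pageviews_url("it.wikisource.org", access, agent,
                            "2020050100", "2020053100"))
```

If I understand the API correctly, agent and access are independent
dimensions, so a "spider" total would span both desktop and mobile URLs
-- but someone from the team can confirm.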
Federico
Hi Analytics team,
Quick question:
Does the Clickstream data
<https://dumps.wikimedia.org/other/clickstream/readme.html> lump
together mobile and desktop?
It seems to be hinted at here
<https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream>, but it's
not mentioned explicitly. It just says that the 2015 data is for desktop
only, which seems to imply that after that it's desktop + mobile.
Also, I was wondering if anyone has any insights into what might cause
referrers to be empty? I tried googling, but the issue is clouded in
mystery and seems to depend a lot on browser and website specificities.
Any insights (small or big) would be appreciated!
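For what it's worth, the clickstream dump records referrer-less views
under the `other-empty` source, so their share can be tallied directly
from the TSV (columns per the readme: prev, curr, type, n). A minimal
sketch -- the sample rows below are made up for illustration:

```python
# Sketch: tally clickstream counts by referrer category (the 'prev'
# column). 'other-empty' collects page views that arrived with no
# referrer. Fields are tab-separated: prev, curr, type, n.
from collections import Counter

sample = (
    "other-empty\tMain_Page\texternal\t1200\n"
    "other-search\tAlbert_Einstein\texternal\t800\n"
    "Physics\tAlbert_Einstein\tlink\t300\n"
)

totals = Counter()
for line in sample.strip().splitlines():
    prev, curr, typ, n = line.split("\t")
    totals[prev] += int(n)

print(totals["other-empty"])  # 1200 views with an empty referrer
```

Empty referrers themselves can come from many places (HTTPS-to-HTTP
transitions, referrer-policy headers, privacy extensions, apps opening
links), which may be why the answers online are so inconsistent.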
Thanks a lot!
Bob
Hi everybody,
Turnilo has been upgraded to v1.24.0 (was 1.17.0), all tracked in
https://phabricator.wikimedia.org/T253294. Please let me know in the task
if you see anything weird (a regression, undesired behavior, etc.).
Thanks!
Luca (on behalf of the Analytics team)