Hi all,
Just curious if there is a known cause for the multiple long delays we've
had on the AQS API data being available this week? I know periodic delays
are not uncommon but these seem beyond normal levels.
Thanks!
~Josh
Hi all,
Please see the call for papers for the 10th edition of Wiki Workshop below.
The call is for extended abstracts (2 pages) of ongoing or completed work.
The deadline is March 23. The submissions are non-archival which means you
can submit work that is already published as well! :)
Submit and join us in conversations about research on the Wikimedia
projects.
Best,
Leila
--
Leila Zia
Head of Research
Wikimedia Foundation
---------- Forwarded message ---------
From: Martin Gerlach <mgerlach(a)wikimedia.org>
Date: Mon, Feb 20, 2023 at 1:29 AM
Subject: [Wiki-research-l] [events] Wiki Workshop 2023 Call for Papers
To: <wiki-research-l(a)lists.wikimedia.org>
Hi everyone,
The call for papers for the 10th Wiki Workshop in 2023 is out:
https://wikiworkshop.org/2023/#call
Submit your 2-page abstracts by March 23 (all submissions are
non-archival). The workshop will take place on May 11, 2023. For more
information, see the workshop website [1].
If you have questions about the workshop, please let us know on this list
or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing many of you at this year's edition.
Best,
Pablo Aragón, Wikimedia Foundation
Martin Gerlach, Wikimedia Foundation
Evelin Heidel, Wikimedistas de Uruguay
Emily Lescak, Wikimedia Foundation
Francesca Tripodi, University of North Carolina
Bob West, EPFL
Leila Zia, Wikimedia Foundation
[1] https://wikiworkshop.org/2023/
—
We invite contributions to the 10th edition (!) of Wiki Workshop, which
will take place virtually on May 11, 2023 (tentatively 12:00-19:00 UTC).
Wiki Workshop is the largest Wikimedia research event of the year, aimed at
bringing together researchers who study all aspects of Wikimedia projects
(including, but not limited to, Wikipedia, Wikidata, Wikimedia Commons,
Wikisource, and Wiktionary) as well as Wikimedia developers, affiliate
organizations, and volunteer editors. Co-organized by the Wikimedia
Foundation’s Research team and members of the Wikimedia research community,
the workshop facilitates a direct pathway for exchanging ideas between the
organizations that serve Wikimedia projects and the researchers actively
studying them.

New this year: Building on the successful experiences of organizing Wiki
Workshop in 2015 <https://wikiworkshop.org/2015/>, 2016
<https://wikiworkshop.org/2016/>, 2017 <https://wikiworkshop.org/2017/>,
2018 <https://wikiworkshop.org/2018/>, 2019 <https://wikiworkshop.org/2019/>,
2020 <https://wikiworkshop.org/2020/>, 2021 <https://wikiworkshop.org/2021/>,
and 2022 <https://wikiworkshop.org/2022/>, and based on feedback from authors
and participants over the years, we are introducing a few updates to the
research track of the workshop for 2023:
- This 10th edition will take place as a standalone event (rather than in
  co-location with a conference, as in previous years).
- We have changed the format of submissions and will only accept 2-page
  extended abstracts (following the successful IC2S2 model).
- Submissions are non-archival, so we welcome ongoing, completed, and
  already published work.
- We are excited to share that the authors of Wiki Workshop 2023 will have
  the opportunity to receive feedback, improve their work, and submit the
  extended version of their research paper to a special issue of the ACM
  Transactions on the Web, which will have a dedicated open call for papers
  later in 2023.
Topics include, but are not limited to:
- new technologies and initiatives to grow content, quality, equity,
  diversity, and participation across Wikimedia projects
- use of bots, algorithms, and crowdsourcing strategies to curate, source,
  or verify content and structured data
- bias in content and gaps of knowledge on Wikimedia projects
- relation between Wikimedia projects and the broader (open) knowledge
  ecosystem
- exploration of what constitutes a source and how/if the incorporation of
  other kinds of sources is possible (e.g., oral histories, video)
- detection of low-quality, promotional, or fake content (misinformation
  or disinformation), as well as fake accounts (e.g., sock puppets)
- questions related to community health (e.g., sentiment analysis,
  harassment detection, tools that could increase harmony)
- motivations, engagement models, incentives, and needs of editors,
  readers, and/or developers of Wikimedia projects
- innovative uses of Wikipedia and other Wikimedia projects for AI and NLP
  applications and vice versa
- consensus-finding and conflict resolution on editorial issues
- dynamics of content reuse across projects and the impact of policies and
  community norms on reuse
- privacy, security, and trust
- collaborative content creation
- innovative uses of Wikimedia projects' content and consumption patterns
  as sensors for real-world events, culture, etc.
- open-source research code, datasets, and tools to support research on
  Wikimedia contents and communities
- connections between Wikimedia projects and the Semantic Web
- strategies for how to incorporate Wikimedia projects into media literacy
  interventions
This year’s Wiki Workshop solicits extended abstracts (PDF format, maximum
2 pages, including references). Submissions that exceed the 2-page limit
will be automatically rejected. Authors may include 1 additional page with
figures and/or tables (including captions) only. Initial submissions
require names and affiliations of authors, 5 keywords, a title, abstract,
and a main text outlining the contribution, methods, findings, and impact
of the work, whichever is relevant. Submissions will be non-archival and as
a result may have already been published, under review, or ongoing
research. All submissions will be reviewed by multiple members of the Wiki
Workshop Program Committee. The names of the authors will be revealed to
the reviewers, whereas reviewers will remain anonymous to authors. Authors
of accepted abstracts will be invited to present their research in a
pre-recorded oral presentation with dedicated time for live Q&A on May 11,
2023. Accepted abstracts may be shared on the website prior to the event.
The template for formatting the submission as well as the submission link
to easychair will be made available by February 23.
--
Martin Gerlach (he/him) | Senior Research Scientist | Wikimedia Foundation
_______________________________________________
Wiki-research-l mailing list -- wiki-research-l(a)lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-leave(a)lists.wikimedia.org
Hello everyone,
The next Research Showcase will be livestreamed next Wednesday, February 15
at 9:30AM PT / 17:30 UTC. The theme is The Free Knowledge Ecosystem.
YouTube stream: https://www.youtube.com/watch?v=8VJmR-3lTac
We welcome you to join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
The evolution of humanitarian mapping in OpenStreetMap (OSM) and how it
affects map completeness and inequalities in OSM
By *Benjamin Herfort, Heidelberg Institute for Geoinformation Technology*
Mapping efforts of
communities in OpenStreetMap (OSM) over the previous decade have created a
unique global geographic database, which is accessible to all with no
licensing costs. The collaborative maps of OSM have been used to support
humanitarian efforts around the world as well as to fill important data
gaps for implementing major development frameworks such as the Sustainable
Development Goals (SDGs). Besides the well-examined Global North - Global
South bias in OSM, the OSM data as of 2023 shows a much more spatially
diverse spread pattern than previously considered, which was shaped by
regional, socio-economic and demographic factors across several scales.
Humanitarian mapping efforts of the previous decade have already made OSM
more inclusive, contributing to diversify and expand the spatial footprint
of the areas mapped. However, methods to quantify and account for the
remaining biases in OSM’s coverage are needed so that researchers and
practitioners will be able to draw the right conclusions, e.g. about
progress towards the SDGs in cities.
Dataset reuse: Toward translating principles to practice
By *Laura Koesten, University of Vienna*
The web provides access to millions of datasets. These
data can have additional impact when used beyond the context for which they
were originally created. But using a dataset beyond the context in which it
originated remains challenging. Simply making data available does not mean
it will be or can be easily used by others. At the same time, we have
little empirical insight into what makes a dataset reusable and which of
the existing guidelines and frameworks have an impact. In this talk, I will
discuss our research on what makes data reusable in practice. This is
informed by a synthesis of literature on the topic, our studies on how
people evaluate and make sense of data, and a case study on datasets on
GitHub. In the case study, we describe a corpus of more than 1.4 million
data files from over 65,000 repositories. Building on reuse features from
the literature, we use GitHub’s engagement metrics as proxies for dataset
reuse and devise an initial model, using deep neural networks, to predict a
dataset’s reusability. This demonstrates the practical gap between
principles and actionable insights that might allow data publishers and
tool designers to implement functionalities that facilitate reuse.
We hope you can join us!
Warm regards,
Emily
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
Hi Willy,
(Forwarding your question to the public analytics list for others who might
know more.)
> Do you have any data that shows how many times audio files were
> downloaded in 2022?
I think your best bet is the Mediacounts dataset
<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts>,
which is available in a public API
<https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests>. E.g.,
to get the number of audio requests in 2022:
https://wikimedia.org/api/rest_v1/metrics/mediarequests/aggregate/all-refer…
However, it doesn't look like data transfer details are available in the
Public API. The backing dataset in Hive does have a total_response_size field
so you could probably get this info more specifically by querying for it in
Hive.
Good luck!
On Wed, Feb 1, 2023 at 7:11 PM Willy Pao <wpao(a)wikimedia.org> wrote:
> Hey Andrew - hope all is going well. I've been working on gathering some
> data for Wikimedia's Annual Sustainability Report, and there was a question
> that Deb sent over regarding the usage of Audio files. With Jaime's help
> from Data Persistence SRE, we were able to figure out some of the numbers
> around storage and energy consumption. There was one part I was hoping you
> (or someone from your team) might be able to help with though. Do you have
> any data that shows how many times audio files were downloaded in 2022?
> Much appreciated in advance.
>
> Thanks,
> Willy
>
> ---------- Forwarded message ---------
> From: Deb Tankersley <dtankersley(a)wikimedia.org>
> Date: Mon, Jan 30, 2023 at 1:41 PM
> Subject: energy used to store
> To: Willy Pao <wpao(a)wikimedia.org>, Erin Morris <emorris(a)wikimedia.org>,
> Cassie Casares <ccasares(a)wikimedia.org>
>
>
> Hey Willy!
>
> I got an interesting question (bolded below) from Wikimedia Sweden on the
> energy that we use to store and serve audio files. Here's their full
> comment / question:
>
> *"As part of my yearly planning for 2023, we are conducting a study
>> regarding digitization of audio tapes, which climate footprints the various
>> stages in the process generate and whether some of these can be made more
>> energy efficient. We have limited the study to audio tapes, because it is a
>> prioritized material category and a very data-intensive business, and
>> because the limitation hopefully gives us relatively accurate numbers.
>> Since we have been publishing digital audio originally from audio tapes on
>> Wikimedia Commons for the past few years, I was wondering if there are any
>> statistics related to energy consumption and carbon dioxide emissions
>> available?*
>>
>>
>> *What we would like to know is how much energy is required in the year
>> 2022 to store our total amount of uploaded audio files (with the exception
>> of Karl Tirén's phonograph recordings), how many times they have been
>> downloaded and how large a total amount of data is involved. We suspect
>> that downloading the high-resolution audio files is also relatively data
>> intensive. As mentioned, the goal is not to stop this activity, or even
>> reduce it without seeing how it looks and then investigating whether there
>> are any links in the chain that can be tweaked to possibly reduce the
>> climate impact. If numbers cannot be obtained, this is also valuable
>> information."*
>>
>
>
> I'm not sure if we can narrow down this enough to get them a decent /
> solid answer. What are your thoughts?
>
>
> Thanks,
>
>
> Deb
>
> --
>
> deb tankersley (she/her)
>
> senior program manager, engineering
>
> Wikimedia Foundation
>
>
>
>
>
Hi,
I just joined this list, thanks to Dan Andreescu, who let me know
about it, and I have a question on processing clickstream data.
I downloaded last month's clickstream data file
(https://dumps.wikimedia.org/other/clickstream/2022-12/clickstream-eswiki-20…)
and am having trouble opening and processing it.
The only program that could open it was OpenRefine. Other programs
(Numbers and LibreOffice) just couldn't cope with it.
I can use OpenRefine to do some transformations and delete some rows I
don't need, but even then, with some 1.5 million rows left, I cannot open
it with Numbers or LibreOffice to sum column 4.
Which tools do you use to work with such big files?
Thanks.
--
========================
Robert Garrigós i Castro
https://garrigos.cat
+34 620 91 87 01
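
For the clickstream question above, one possible approach is to skip the
spreadsheet entirely and stream the file line by line. This is a minimal
sketch, not a recommendation from the list. It assumes the download is a
gzip-compressed, tab-separated file whose fourth column is an integer
count. The function names and the file path are hypothetical.

```python
import gzip
from typing import Iterable


def sum_count_column(lines: Iterable[str], column: int = 3) -> int:
    """Sum one integer column of tab-separated lines (0-based index).

    Lines that are too short, or whose value is not an integer (for
    example a header row), are skipped rather than treated as errors.
    """
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            continue
        try:
            total += int(fields[column])
        except ValueError:
            continue
    return total


def sum_clickstream_file(path: str, column: int = 3) -> int:
    """Stream a gzip-compressed TSV from disk and sum one column.

    Only one line is held in memory at a time, so the row count of the
    file does not matter much.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return sum_count_column(handle, column)
```

Calling `sum_clickstream_file("path/to/clickstream-file.tsv.gz")` with the
real path of the downloaded file would return the total of column 4. The
same loop can be extended with a filter on the other columns before
adding to the total.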
Hi:
This is my first time working with Wikimedia analytics, and I found a case
that looks odd to me. In some cases I can't get data from the API.
- en.wikivoyage.org
- Culturally significant landscapes in Jaén
- 2022121700
- API call: https://w.wiki/6DjC
I got the «The date(s) you used are valid, but we either do not have data
for those date(s)» message, which looks strange to me because the resource
exists, as the call for the previous day shows:
- 2022121600
- API call: https://w.wiki/6DjE
If there were no visits for 2022121700, I would have expected a successful
response with value=0.
Is this the expected behavior, or have I found a glitch? I found a few other
cases, so I prefer to ask here.
Thanks.
--
Ismael Olea
http://olea.org/diario/
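
On the missing-day question above, a client-side workaround can be
sketched as follows. It rests on an assumption that should be checked
against the API documentation: that a day absent from the response means
zero recorded views rather than data that has not been loaded yet. The
sketch also assumes each returned item carries a `timestamp` in
YYYYMMDDHH form and a `views` count. The function name is hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, Mapping


def daily_views_with_zeros(
    items: Iterable[Mapping[str, object]],
    start: date,
    end: date,
) -> Dict[date, int]:
    """Return views per day for start..end inclusive, using 0 for gaps.

    ASSUMPTION: a day missing from `items` had no views. It could also
    mean the data is not available yet, so treat the zeros with care.
    """
    views_by_day: Dict[date, int] = {}
    for item in items:
        # Timestamps are assumed to look like "2022121600" (YYYYMMDDHH).
        day = datetime.strptime(str(item["timestamp"]), "%Y%m%d%H").date()
        views_by_day[day] = int(item["views"])

    filled: Dict[date, int] = {}
    current = start
    while current <= end:
        filled[current] = views_by_day.get(current, 0)
        current += timedelta(days=1)
    return filled
```

Requesting a range that spans several days and passing the returned items
through this function gives a gap-free daily series, at the cost of not
distinguishing "no views" from "no data".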
Hi:
Do we have tools, metrics or traces about the evolution of quality in
articles? Or something like that. Not sure if the ORES technology is
appropriate for it.
--
Ismael Olea
http://olea.org/diario/
Hello everyone,
The next Research Showcase, focused on Editor Retention, will be
live-streamed Wednesday, January 18. Find your local time here
<https://zonestamp.toolforge.org/1674063059>.
YouTube stream: https://www.youtube.com/watch?v=gS8ELcVZ8Q4
You can join the conversation on IRC at #wikimedia-research. You can also
watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Vital Signs: Measuring Wikipedia Communities’ Health
By *Cristian Consonni, Eurecat - Centre Tecnològic de Catalunya, Barcelona*
Community health in
Wikipedia is a complex topic that has been at the center of discussion for
Wikipedia and the scientific community for years. Researchers observed that
the number of active editors for the largest Wikipedias started declining
after an initial phase of exponential growth. Some media outlets picked
this fact as a death announcement for the project, but the news of
Wikipedia's death turned out to be greatly exaggerated. However, it remains
true that researchers and community activists need to understand how to
measure community health and describe it more accurately. In this
presentation, we would like to go beyond the traditional metrics used to
describe the status of the community. We propose the creation of 6 sets of
language-independent indicators that we call "Vital Signs." We borrow the
analogy from the medical field, as these indicators represent a first step
in defining the health status of a community; they can constitute a
valuable reference point to foresee and prevent future risks. We present
our analysis for several Wikipedia language editions, showing that
communities renew their productive force even with stagnating absolute
numbers; we observe a general need for renewal in positions related to
particular functions or administratorship. We created a dashboard to
visualize all the indicators we have computed and hope that the communities
will find it helpful for improving their health.
- Paper: Community Vital Signs: Measuring Wikipedia Communities’
Sustainable Growth and Renewal
<https://meta.wikimedia.org/wiki/File:Community_Vital_Signs_Research_Paper_-…>
Learning to Predict the Departure Dynamics of Wikidata Editors
By *Guangyuan Piao, Maynooth University*
Wikidata as one of the largest open collaborative
knowledge bases has drawn much attention from researchers and practitioners
since its launch in 2012. As it is collaboratively developed and maintained
by a community of a great number of volunteer editors, understanding and
predicting the departure dynamics of those editors are crucial but have not
been studied extensively in previous works. In this paper, we investigate
the synergistic effect of two different types of features: statistical and
pattern-based ones with DeepFM as our classification model which has not
been explored in a similar context and problem for predicting whether a
Wikidata editor will stay or leave the platform. Our experimental results
show that using the two sets of features with DeepFM provides the best
performance regarding AUROC (0.9561) and F1 score (0.8843), and achieves
substantial improvement compared to using either of the sets of features
and over a wide range of baselines.
- Paper: Learning to Predict the Departure Dynamics of Wikidata Editors
<https://parklize.github.io/publications/ISWC2021.pdf>
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation