Hello all!
We have been hard at work on our Graph Split experiment [1], and we now
have a working graph split that is loaded onto 3 test servers. We are
running tests on a selection of queries from our logs to help understand
the impact of the split. We need your help to validate the impact on
various use cases and workflows around the Wikidata Query Service.
**What is the WDQS Graph Split experiment?**
We want to address the growing size of the Wikidata graph by splitting it
into 2 subgraphs of roughly half the size of the full graph, which should
support the growth of Wikidata for the next 5 years. This experiment is
about splitting the full Wikidata graph into a scholarly articles subgraph
and a “main” graph that contains everything else.
See our previous update for more details [2].
**Who should care?**
Anyone who uses WDQS through the UI or programmatically should check the
impact on their use cases, scripts, bots, code, etc.
**What are those test endpoints?**
We expose 3 test endpoints, for the full, main and scholarly articles
graphs. These graphs are all created from the same dump and are not updated
live. This allows us to compare queries across the different endpoints
against stable, unchanging data (the data are from mid-October 2023).
The endpoints are:
* https://query-full-experimental.wikidata.org/
* https://query-main-experimental.wikidata.org/
* https://query-scholarly-experimental.wikidata.org/
Each of the endpoints is backed by a single dedicated server of performance
similar to the production WDQS servers. We don’t expect performance to be
representative of production due to the different load and to the lack of
updates on the test servers.
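To compare a query's results across the test endpoints, something along these lines may help. It is a minimal sketch, assuming that each test endpoint exposes SPARQL at a `/sparql` path like production WDQS (`https://query.wikidata.org/sparql`) and returns standard SPARQL JSON results; the function names and the User-Agent string are hypothetical. Rows are compared as multisets, so result ordering does not matter.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Assumption: the test endpoints serve SPARQL at /sparql, like production WDQS.
ENDPOINTS = {
    "full": "https://query-full-experimental.wikidata.org/sparql",
    "main": "https://query-main-experimental.wikidata.org/sparql",
    "scholarly": "https://query-scholarly-experimental.wikidata.org/sparql",
}

def run_query(endpoint: str, query: str) -> dict:
    """Send a SPARQL query via GET and return the parsed JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    req = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        # Hypothetical User-Agent: identify your tool and a contact address.
        "User-Agent": "graph-split-check/0.1 (contact@example.org)",
    })
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)

def bindings(results: dict) -> Counter:
    """Turn SPARQL JSON results into a multiset of rows (order-insensitive)."""
    rows = []
    for b in results["results"]["bindings"]:
        rows.append(tuple(sorted((var, cell["value"]) for var, cell in b.items())))
    return Counter(rows)

def compare(full: dict, other: dict) -> tuple[Counter, Counter]:
    """Return (rows only in `full`, rows only in `other`)."""
    a, b = bindings(full), bindings(other)
    return a - b, b - a
```

For example, `missing, extra = compare(run_query(ENDPOINTS["full"], q), run_query(ENDPOINTS["main"], q))` lists the rows a query `q` loses (or gains) on the main graph. Blank-node labels are server-specific and may show up as spurious differences.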
**What kind of feedback is useful?**
We expect queries that don’t require scholarly articles to work
transparently on the “main” subgraph. We expect queries that do require
scholarly articles to need rewriting with SPARQL federation between
the “main” and scholarly subgraphs (federation is already supported for some
external SPARQL servers [3]; here it would be used for internal
server-to-server communication). We are running our own tests and analysis
on a sample of query logs.
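A query that needs both subgraphs might be rewritten roughly as follows. This is a sketch only: it assumes the main test endpoint can reach the scholarly test endpoint as a federated `SERVICE` at `https://query-scholarly-experimental.wikidata.org/sparql`, and that `wd:Q202864` is the Zika virus item. The articles and their author links are fetched from the scholarly subgraph, while the authors' labels come from the main subgraph. On the full graph, the same triple patterns should work without the `SERVICE` wrapper.

```python
import urllib.parse

MAIN = "https://query-main-experimental.wikidata.org/sparql"
# Assumption: reachable from the main endpoint as a federated SERVICE.
SCHOLARLY = "https://query-scholarly-experimental.wikidata.org/sparql"

# Scholarly articles about wd:Q202864 (assumed: Zika virus) and their authors.
# Article statements live in the scholarly subgraph; author labels in "main".
FEDERATED_QUERY = f"""SELECT ?article ?title ?authorLabel WHERE {{
  SERVICE <{SCHOLARLY}> {{
    ?article wdt:P921 wd:Q202864 ;
             wdt:P1476 ?title ;
             wdt:P50 ?author .
  }}
  ?author rdfs:label ?authorLabel .
  FILTER(LANG(?authorLabel) = "en")
}}
LIMIT 10"""

def query_url(endpoint: str, query: str) -> str:
    """Build a GET URL asking the endpoint for SPARQL JSON results."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})

# The query is sent to the *main* endpoint, which evaluates the SERVICE block
# by querying the scholarly endpoint.
url = query_url(MAIN, FEDERATED_QUERY)
```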
**We want to hear about:**
* General use cases or classes of queries that break under federation
* Bots or applications that need significant rewrites of queries to work
  with federation
* Use cases that work just fine, too!
Examples of queries and pointers to code will be helpful in your feedback.
**Where should feedback be sent?**
You can reach out to us on the project’s talk page [1], on the Phabricator
ticket for community feedback [4], or by pinging Sannita (WMF) [5] directly.
**Will feedback be taken into account?**
Yes! We will review feedback and it will influence our path forward. That
being said, there are limits to what is possible. The size of the Wikidata
graph is a threat to the stability of WDQS and thus a threat to the whole
Wikidata project. Splitting out scholarly articles is the only split we know
of that would reduce the graph size sufficiently. We can work together on providing
support for a migration, on reviewing the rules used for the graph split,
but we can’t just ignore the problem and continue with a WDQS that provides
transparent access to the full Wikidata graph.
Have fun!
Guillaume
[1] https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split
[2] https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_up…
[3] https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Federation
[4] https://phabricator.wikimedia.org/T356773
[5] https://www.wikidata.org/wiki/User:Sannita_(WMF)
--
Guillaume Lederrey (he/him)
Engineering Manager
Wikimedia Foundation
Hello all!
The Search Platform Team usually holds an open meeting on the first
Wednesday of each month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service (WDQS), Wikimedia Commons Query
Service (WCQS), etc.!
Feel free to add your items to the Etherpad agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, February 7, 2024
Time: 16:00-17:00 UTC / 08:00 PST / 11:00 EST / 17:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Have fun and see you soon!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello,
Community Wishathon is coming next month: March 15th-17th, 2024! So far, 78
participants have signed up to attend this online event since its first
announcement in December 2023 🎉
You can check out the project ideas and schedule on the event page at
<https://meta.wikimedia.org/wiki/Event:WishathonMarch2024>. These ideas come
from the wishes people share in the Community Wishlist Survey
<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey>. If you're a
user, developer, designer, or product lead, you are welcome to join! You
can sign up for the event on the wiki page and add your name, time zone,
and preferred contact method to a project by *February 15th, 2024*.
Before the event, you can learn about the projects and connect with others
who are interested. You can contact them using their preferred contact
method and consider forming project groups on Telegram.
During the event, there will be an opening session to introduce the event
and project ideas, async time for working on projects, self-organized
virtual project group meetings, and a showcase ceremony to share what each
group accomplished during the event. If you want to help with the event or
have questions, let us know on the event's talk page.
Cheers,
Srishti
On behalf of the Wishathon organizing committee
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello everyone,
The second edition of the Language & Internationalization newsletter
(January 2024) is available at this link:
<https://www.mediawiki.org/wiki/Wikimedia_Language_engineering/Newsletter/20…>.
This newsletter is compiled by the Wikimedia Language team. It provides
updates from the October–December 2023 quarter on new feature development,
improvements in various language-related technical projects and support
efforts, details about community meetings, and ideas for getting
involved in projects.
To stay updated, you can subscribe to the newsletter on its wiki page. If
you have any feedback or ideas for topics to feature in the newsletter,
please share them on the discussion page, accessible here:
<https://www.mediawiki.org/w/index.php?title=Talk:Wikimedia_Language_enginee…>.
Cheers,
Srishti
On behalf of the WMF Language team
Hi all,
Wiki Workshop 2024 (now in its 11th edition) will take place as a
standalone virtual event on June 20, 2024. For more information, see the
workshop website: https://wikiworkshop.org/2024/
The call for papers is now open:
https://wikiworkshop.org/2024/call-for-papers
The call is for extended abstracts (2 pages) of ongoing or completed work.
The deadline is April 22. Submissions are non-archival, which means you
can also submit work that has already been published!
If you have questions about the workshop, please let us know on this list
or at wikiworkshop(a)googlegroups.com.
On behalf of the organizing committee,
Pablo Aragón, Wikimedia Foundation
Pablo Beytía, Catholic University of Chile
Martin Gerlach, Wikimedia Foundation
Kinneret Gordon, Wikimedia Foundation
Robert West, EPFL
Leila Zia, Wikimedia Foundation
----
We invite contributions to the research track of the 11th edition of Wiki
Workshop, which will take place virtually on June 20, 2024 (tentatively
12:00-19:00 UTC) as a standalone event.
The Wiki Workshop is the largest Wikimedia research event of the year,
aimed at bringing together researchers who study all aspects of Wikimedia
projects (including, but not limited to, Wikipedia, Wikidata, Wikimedia
Commons, Wikisource, and Wiktionary) as well as Wikimedia developers,
affiliate organizations, and volunteer editors. Co-organized by the
Wikimedia Foundation’s Research team and members of the Wikimedia research
community, the workshop provides a direct pathway for exchanging ideas
between the organizations that serve Wikimedia projects and the researchers
actively studying them.
Building on the successful experiences of organizing Wiki Workshop in 2015
<https://wikiworkshop.org/2015/>, 2016 <https://wikiworkshop.org/2016/>,
2017 <https://wikiworkshop.org/2017/>, 2018 <https://wikiworkshop.org/2018/>,
2019 <https://wikiworkshop.org/2019/>, 2020 <https://wikiworkshop.org/2020/>,
2021 <https://wikiworkshop.org/2021/>, 2022 <https://wikiworkshop.org/2022/>,
2023 <https://wikiworkshop.org/2023/> and based on feedback from authors
and participants over the years, this year’s research track is organized as
follows:
- Submissions are non-archival, meaning we welcome ongoing, completed, and
  already published work.
- We accept submissions in the form of 2-page extended abstracts.
- Authors of accepted abstracts will be invited to present their research
  in a pre-recorded oral presentation with dedicated time for live Q&A on
  June 20, 2024.
- Accepted abstracts will be shared on the website prior to the event.
Topics include, but are not limited to:
- new technologies and initiatives to grow content, quality, equity,
  diversity, and participation across Wikimedia projects;
- use of bots, algorithms, and crowdsourcing strategies to curate, source,
  or verify content and structured data;
- bias in content and gaps of knowledge on Wikimedia projects;
- relation between Wikimedia projects and the broader (open) knowledge
  ecosystem;
- exploration of what constitutes a source and whether (and how) other
  kinds of sources can be incorporated (e.g., oral histories, video);
- detection of low-quality, promotional, or fake content (misinformation
  or disinformation), as well as fake accounts (e.g., sock puppets);
- questions related to community health (e.g., sentiment analysis,
  harassment detection, tools that could increase harmony);
- motivations, engagement models, incentives, and needs of editors,
  readers, and/or developers of Wikimedia projects;
- innovative uses of Wikipedia and other Wikimedia projects for AI and NLP
  applications and vice versa;
- consensus-finding and conflict resolution on editorial issues;
- dynamics of content reuse across projects and the impact of policies and
  community norms on reuse;
- privacy, security, and trust;
- collaborative content creation;
- innovative uses of Wikimedia projects' content and consumption patterns
  as sensors for real-world events, culture, etc.;
- open-source research code, datasets, and tools to support research on
  Wikimedia contents and communities;
- connections between Wikimedia projects and the Semantic Web;
- strategies for how to incorporate Wikimedia projects into media literacy
  interventions.
Important dates and timeline:
- Submission deadline: April 22, 2024 (23:59 AoE
  <https://en.wikipedia.org/wiki/Anywhere_on_Earth>)
- Author notification: May 27, 2024
- Final version due: June 10, 2024 (23:59 AoE)
- Workshop date: June 20, 2024
Submission instructions:
https://wikiworkshop.org/2024/call-for-papers#submission
--
Martin Gerlach (he/him) | Senior Research Scientist | Wikimedia Foundation
Dear Community,
My name is Adetayo Boluwatife Christiana, and I am a current Outreachy intern
at Wikimedia, contributing to the project “Wikidata For Education”, also known
as “WikiCurricula”. I have been following the impactful work that Wikimedia is
doing to promote free knowledge, and I am eager to contribute my skills and
passion to such a noble cause.
To give you some context about my background, I recently wrote a blog post
about myself, which you can read on my user page
<https://meta.wikimedia.org/wiki/User:BhbeeX>. I have also been
contributing to WikiCurricula
<https://github.com/wikicurricula-uy/wikicurricula-boilerplate>, a curriculum
digitisation project that aims to align Wikimedia projects with school
curricula in three countries (Ghana, Uruguay, and Italy) with the help of
Wikidata. I wrote a blog post introducing WikiCurricula
<https://meta.wikimedia.org/wiki/User:BhbeeX/Think_About_Your_Audience>,
and I’m working on improving this tool for a better user experience.
While contributing to this project, I’ve been exposed to other Wikimedia
projects and tools such as Wikidata, Wikipedia, translatewiki.net, Toolforge,
and Meta-Wiki.
My motivation for joining Wikimedia's FOSS community as a developer is to
meet other contributors and get their help with other Wikimedia tools,
especially translatewiki.net. I look forward to connecting with the
community and hearing your ideas and suggestions for the dashboard.
Best regards,
Adetayo Boluwatife Christiana
Dear all,
Wikidata items for cities in Ukraine's Odesa Oblast have a peculiarity:
https://www.wikidata.org/wiki/Q805554
their KOATUU ID is given as a deprecated value (since these numbers were only valid until 2020).
Which SPARQL query retrieves these deprecated IDs?
Much obliged,
Olaf
Dr. Olaf Simons
Wirtschafts- und Sozialgeschichte
Historisches Datenzentrum Sachsen-Anhalt
Martin-Luther-Universität Halle-Wittenberg
Emil-Abderhalden-Str. 26-27
06108 Halle (Saale)
---
Office in Gotha:
Forschungszentrum Gotha der Universität Erfurt
Schloßberg 2
99867 Gotha
Home: Hauptmarkt 17b, 99867 Gotha
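One way to reach deprecated values: the `wdt:` prefix only returns best-rank ("truthy") statements, so deprecated values never appear through `wdt:P1077`. Going through the full statement node with `p:`/`ps:` and filtering on `wikibase:rank` exposes them. The sketch below builds such a query; it assumes KOATUU ID is property P1077 (worth checking on the property page), and the helper name is hypothetical. Passing an empty item list returns every item with a deprecated KOATUU ID.

```python
import urllib.parse

WDQS = "https://query.wikidata.org/sparql"

def deprecated_values_query(prop: str, items: list[str]) -> str:
    """Build a SPARQL query for deprecated-rank values of `prop`.

    If `items` is non-empty, the query is restricted to those item IDs.
    p:/ps: read the full statement (any rank), unlike truthy wdt:.
    """
    values = ""
    if items:
        values = "VALUES ?item { " + " ".join(f"wd:{q}" for q in items) + " }\n  "
    return f"""SELECT ?item ?itemLabel ?value WHERE {{
  {values}?item p:{prop} ?statement .
  ?statement ps:{prop} ?value ;
             wikibase:rank wikibase:DeprecatedRank .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,uk". }}
}}"""

# Assumption: P1077 is the KOATUU ID property.
query = deprecated_values_query("P1077", ["Q805554"])
url = WDQS + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
```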
(*apologies for cross-posting*)
Hello,
This is a breaking change announcement relevant to those working with
Lexeme dumps.
In Lexeme dumps, "senses" and "forms" values, when not empty, are shown as
arrays. When these lists are empty, they are currently displayed as
objects. For example, values with content are displayed in array
format: "senses":[{"id":"L4-S1",...}]
but empty values are treated as objects: "senses":{}
However, empty lists should be presented as arrays as well: "senses":[]
With this change, empty lists of forms and senses will be switched from
objects to arrays. This adjustment makes the dumps more consistent and
matches the way non-empty values are presented. We will roll this
change out on February 8th.
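Consumers that must read dumps from both before and after the switch could accept either shape. The sketch below uses hypothetical helper names and assumes the usual Wikibase JSON dump layout of one entity per line inside a top-level array; it maps an empty object to an empty list and leaves arrays untouched.

```python
import json
from typing import Iterable, Iterator

def normalize_lexeme(lexeme: dict) -> dict:
    """Coerce empty "senses"/"forms" objects ({}) to empty lists ([])."""
    for key in ("senses", "forms"):
        # Dumps before the change used {} for an empty list; after it, [].
        if lexeme.get(key) == {}:
            lexeme[key] = []
    return lexeme

def iter_lexemes(lines: Iterable[str]) -> Iterator[dict]:
    """Yield normalized lexemes from a JSON dump with one entity per line.

    Assumes the opening "[" and closing "]" sit on their own lines and that
    entity lines may end with a trailing comma.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield normalize_lexeme(json.loads(line))
```

Passing an opened, decompressed dump file to `iter_lexemes` then yields lexemes whose `senses` and `forms` are always lists, whichever side of the switch the dump comes from.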
We expect the impact of this change to be minimal and harmless for most
use cases. Therefore, we haven't generated a test dump, as it would demand
substantial resources and time. If you have any questions or concerns about
this change, please don’t hesitate to reach out to us on this ticket:
T305660 <https://phabricator.wikimedia.org/T305660>.
Cheers,
--
Mohammed S. Abdulai
*Community Communications Manager, Wikidata*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0) 30 577 116 2466
https://wikimedia.de
Grab a spot in my calendar for a chat: calendly.com/masssly.
A lot is happening around Wikidata - Keep up to date!
<https://www.wikidata.org/wiki/Wikidata:Status_updates>
Current news and exciting stories about Wikimedia, Wikipedia and Free
Knowledge in our newsletter (in German): Subscribe now
<https://www.wikimedia.de/newsletter/>.
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us to achieve our vision!
https://spenden.wikimedia.de
Wikimedia Deutschland — Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Charlottenburg
(Charlottenburg Local Court), VR 23855 B. Recognized as a non-profit
organization by the Finanzamt für Körperschaften I Berlin (Tax Office for
Corporations I, Berlin), tax number 27/029/42207. Executive Board:
Franziska Heine, Dr. Christian Humborg