Hello all!
The Search Platform Team usually holds an open meeting on the first
Wednesday of each month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service (WDQS), Wikimedia Commons Query
Service (WCQS), etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, February 1st, 2023
Time: 16:00-17:00 UTC / 08:00 PST / 11:00 EST / 17:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Have fun and see you soon!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Dear Wikimedia,
Let me introduce myself first. My name is Ivan Heibi; I am a researcher at the University of Bologna working at OpenCitations (directed by Silvio Peroni), where I am responsible for the technical infrastructure.
We are currently facing a technical issue in managing our triplestore, and I wanted to share it with you in the hope that your experience with similar issues might give us some new insights. Thank you in advance for your time and support; I will briefly explain the issue here.
Currently OpenCitations stores and maintains its data (citations and bibliographic metadata) in one big triplestore (JNL format) using the Blazegraph database. The JNL file has reached almost 1.5 TB and is regularly updated (roughly every two months) with new triples (data about new citations). However, the current JNL file no longer seems to accept new data, even though its size and total number of triples (almost 8 billion) are below the limit that Blazegraph states (50 billion). Any attempt to DATA LOAD additional triples into the JNL file makes the process hang forever, with no effect on the triplestore.
We tried to LOAD new data into the JNL file using different properties when launching the Blazegraph triplestore, but every test gave the same negative result.
Have you ever faced similar behaviour? Are you aware of any Blazegraph limits that we are overlooking? If you have faced such problems, what solutions did you adopt, and which would you suggest?
Thank you in advance for your support and help,
Have a nice day,
Ivan Heibi
----------------------------------------------------------------
Ivan Heibi, Ph.D.
Digital Humanities Advanced Research Centre (DHARC),
Department of Classical Philology and Italian Studies,
University of Bologna, Bologna (Italy)
E-mail: ivan.heibi2(a)unibo.it
Twitter: @ivanHeiB <https://twitter.com/ivanheib>
Personal web site: ivanhb.it <http://ivanhb.it>
University web page: unibo.it/sitoweb/ivan.heibi2 <https://www.unibo.it/sitoweb/ivan.heibi2/>
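For anyone trying to reproduce the hang described above in isolation, here is a minimal sketch of issuing a single SPARQL 1.1 Update LOAD over HTTP with a client-side timeout. It is not a diagnosis or a fix. The endpoint URL, the file IRI and the graph IRI are assumptions chosen for illustration, and the helper names are hypothetical; adjust them to the local installation.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: adjust host, port and namespace to the local install.
BLAZEGRAPH_SPARQL = "http://localhost:9999/blazegraph/namespace/kb/sparql"


def build_load_request(source_iri, graph_iri=None, endpoint=BLAZEGRAPH_SPARQL):
    """Build (but do not send) a SPARQL 1.1 Update LOAD request."""
    update = f"LOAD <{source_iri}>"
    if graph_iri:
        update += f" INTO GRAPH <{graph_iri}>"
    # SPARQL 1.1 Protocol: updates are POSTed as a form-encoded `update` field.
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send(request, timeout_s=3600):
    """Send the request and return (HTTP status, response text).

    timeout_s is a socket timeout: a connection that stays silent for that
    long raises an error instead of blocking indefinitely.
    """
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

A call such as send(build_load_request("file:///data/new_citations.nt")) with a small test file (a hypothetical path) would show whether even a tiny LOAD stalls, which separates a journal-level problem from a problem with one particular batch.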
Hello all!
The Search Platform Team usually holds an open meeting on the first
Wednesday of each month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service (WDQS), Wikimedia Commons Query
Service (WCQS), etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, January 11, 2023
Time: 16:00-17:00 UTC / 08:00 PST / 11:00 EST / 17:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Have fun and see you soon!
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi,
I am getting frequent timeouts trying to use the SPARQL endpoint GUI at https://query.wikidata.org/.
I'll admit I have some complex queries, but I really feel like this is something the system should be able to handle, or it should at least allow me to request a longer timeout.
For example, this query:
SELECT ?item ?item2
WHERE {
  ?item wdt:P625 ?location .
  ?item <http://www.w3.org/2002/07/owl#sameAs> ?item2 .
}
LIMIT 10
or this query:
SELECT DISTINCT ?item ?itemname ?location
WHERE {
  ?item wdt:P625 ?location ;
        wdt:P31 ?type ;
        rdfs:label ?itemname .
  ?type wdt:P279 ?supertype .
  FILTER(
    LANG(?itemname) = "en" &&
    ?supertype NOT IN (
      wd:Q5, wd:Q4991371, wd:Q7283, wd:Q36180, wd:Q7094076, wd:Q905511, wd:Q1063801,
      wd:Q1062856, wd:Q35127, wd:Q68, wd:Q42848, wd:Q2858615, wd:Q241317, wd:Q1662611, wd:Q7397, wd:Q151885,
      wd:Q1301371, wd:Q1068715, wd:Q7366, wd:Q18602249, wd:Q16521, wd:Q746549, wd:Q13485782, wd:Q36963
    )
  )
}
LIMIT 200000
When I use Python's SPARQLWrapper, things improve somewhat, but some of my queries still time out.
I tried the first query above on an old Wikidata dump from 2021 that we loaded into Jena TDB, and it managed to complete (0 results, but I had to run it to figure that out...).
It seems strange to get such poor performance.
Cheers
Tomer
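For readers who want to reproduce the first query outside the GUI, the following is a minimal sketch using only the Python standard library rather than SPARQLWrapper. The User-Agent string, the helper names and the timeout value are assumptions for illustration. Note that the timeout here is only a client-side socket timeout: it cannot extend whatever limit the public endpoint applies on the server side.

```python
import json
import urllib.parse
import urllib.request

# SPARQL endpoint behind the GUI at https://query.wikidata.org/
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# First query from the message above, unchanged apart from layout.
QUERY = """
SELECT ?item ?item2
WHERE {
  ?item wdt:P625 ?location .
  ?item <http://www.w3.org/2002/07/owl#sameAs> ?item2 .
}
LIMIT 10
"""


def build_request(query, user_agent="example-sparql-client/0.1 (hypothetical contact)"):
    """Build a GET request for the endpoint; nothing is sent yet."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "User-Agent": user_agent,  # placeholder value, replace with your own
            "Accept": "application/sparql-results+json",
        },
    )


def run_query(query, timeout_s=70):
    """Send the query and return the result bindings as a list of dicts."""
    with urllib.request.urlopen(build_request(query), timeout=timeout_s) as resp:
        return json.load(resp)["results"]["bindings"]
```

Calling run_query(QUERY) performs the actual network request; build_request alone is enough to inspect the URL that would be sent.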
Hello all!
The Search Platform Team usually holds an open meeting on the first
Wednesday of each month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service (WDQS), Wikimedia Commons Query
Service (WCQS), etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, December 7, 2022
Time: 16:00-17:00 UTC / 08:00 PST / 11:00 EST / 17:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Have fun and see you soon!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello all!
The Search Platform Team usually holds an open meeting on the first
Wednesday of each month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service (WDQS), Wikimedia Commons Query
Service (WCQS), etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, November 2nd, 2022
Time: 15:00-16:00 UTC / 08:00 PDT / 11:00 EDT / 16:00 CET / 19:00 GST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Have fun and see you soon!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello all!
For the last 2 quarters, the Search Platform team has been working on
upgrading our Elasticsearch clusters to version 7.10.2 [1]. Keeping our
software up to date is part of the usual project hygiene, allowing us to
benefit from bug and security fixes, performance improvements, and new
features. In our case, upgrading to Elasticsearch 7.10.2 is also a required
step towards a potential move to OpenSearch [2].
After much testing, fixes and validations, we are now ready to start the
final migration process. We are anticipating a 3-week migration process,
starting on August 29, 2022. You can follow along on Phabricator [3].
What does this mean for you?
For users of Special:Search, Special:MediaSearch and other user-facing
Search interfaces, the upgrade should be fully seamless, and should not
cause any disruptions to normal usage.
For users of Cloudelastic [4] who are accessing the Elasticsearch API
directly, there might be minor API changes that could affect your queries.
Please review the documented breaking changes [5]. Most of the breaking
changes are not related to queries, so it is unlikely that any client code
will break with this upgrade.
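For client code that talks to the Elasticsearch API directly, one way to notice when a cluster has been migrated is to read the version reported by the root endpoint (GET /). The sketch below is illustrative only: the helper names are hypothetical, and the base URL you pass in is whatever replica endpoint you already use (none is assumed here).

```python
import json
import urllib.request


def parse_version(root_info):
    """Extract (major, minor, patch) from the JSON returned by GET / on a node."""
    number = root_info["version"]["number"]  # e.g. "7.10.2"
    core = number.split("-")[0]              # drop any suffix such as "-SNAPSHOT"
    return tuple(int(part) for part in core.split(".")[:3])


def is_upgraded(root_info, target=(7, 10, 2)):
    """True if the reported version is at least the target version."""
    return parse_version(root_info) >= target


def fetch_root_info(base_url, timeout_s=10):
    """Fetch and decode the root endpoint of the cluster at base_url."""
    with urllib.request.urlopen(base_url, timeout=timeout_s) as resp:
        return json.load(resp)
```

A client could call is_upgraded(fetch_root_info(base_url)) at start-up and switch to request bodies that account for the documented breaking changes once it returns True.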
If you have any questions about this process, you can find us in
#wikimedia-search on IRC, or at discovery(a)lists.wikimedia.org. Have fun!
The Search Platform team
[1] https://phabricator.wikimedia.org/T263142
[2] https://phabricator.wikimedia.org/T280482
[3] https://phabricator.wikimedia.org/T308676
[4] https://wikitech.wikimedia.org/wiki/Help:CirrusSearch_elasticsearch_replicas
[5] https://www.elastic.co/guide/en/elasticsearch/reference/7.17/breaking-chang…
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds an
open meeting on the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search, Wikidata Query Service (WDQS),
Wikimedia Commons Query Service (WCQS), etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, September 7th, 2022
Time: 15:00-16:00 UTC / 08:00 PDT / 11:00 EDT / 17:00 CEST / 19:00 GST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Hope to talk to you next week!
—Trey
Trey Jones
Staff Computational Linguist, Search Platform
Wikimedia Foundation
UTC–4 / EDT