Hi David Causse,

I'm curious why https://www.wikidata.org/wiki/Q24033349 is not returned by the SPARQL query below:

https://w.wiki/YwL

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 ?instance .
  FILTER(?instance != wd:Q13442814).
  SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:endpoint "www.wikidata.org";
        wikibase:api "EntitySearch";
        mwapi:search "front matter";
        mwapi:language "en".
      ?item wikibase:apiOutputItem mwapi:item.
  }
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}
LIMIT 100




On Fri, Jul 17, 2020 at 12:37 PM Thad Guidry <thadguidry@gmail.com> wrote:
Thank you so much David!

This was such a great example that I had to add this to our SPARQL Examples page in a new section "Mediawiki API":

The community thanks you sincerely!



On Mon, Jul 13, 2020 at 2:26 AM David Causse <dcausse@wikimedia.org> wrote:
On Sat, Jul 11, 2020 at 7:12 PM Thad Guidry <thadguidry@gmail.com> wrote:
This query times out:

SELECT ?item ?label
WHERE
{
  ?item wdt:P31 ?instance ;
    rdfs:label ?label ;
    rdfs:label ?enLabel .
  FILTER(CONTAINS(lcase(?label), "Soriano")).
  FILTER(?instance != wd:Q5).
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}
LIMIT 100

I have this feeling that it's not actually using an index or even asking the right question and so is slow and times out?


Indeed, none of the criteria in your query allows the triple store to determine an index to follow to extract the results in a timely manner.
The only positive (non-negated) criterion is FILTER(CONTAINS(lcase(?label), "Soriano")), but because it sits in a FILTER and, moreover, applies a function, it cannot be used to determine an index to work on.
The only way to speed up your query would be to introduce a discriminating "matching" criterion.

However, the MediaWiki wbsearchentities API does seem to use an index and performs well for label searches:
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=soriano&language=en


wbsearchentities is backed by Elasticsearch, which is optimized for such lookups.
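
For anyone who wants to call that API directly, here is a minimal Python sketch of how the request URL above could be assembled and how the item IDs might be pulled out of the JSON reply. The helper names (build_search_url, extract_ids) are made up for this illustration. The "format" and "limit" parameters, and the assumption that the reply carries a "search" list whose entries have an "id" field, go beyond what the URL in this thread shows, so please check them against the API documentation.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in this thread.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=None):
    """Build a wbsearchentities request URL (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "format": "json",  # assumption: ask for a JSON reply
    }
    if limit is not None:
        params["limit"] = str(limit)  # assumption: caps the number of hits
    return API_ENDPOINT + "?" + urlencode(params)


def extract_ids(response):
    """Collect item IDs from a decoded JSON reply.

    Assumes the reply looks like {"search": [{"id": "Q..."}, ...]}.
    """
    return [hit["id"] for hit in response.get("search", [])]


if __name__ == "__main__":
    print(build_search_url("soriano"))
```

The sketch only builds the URL and parses an already decoded reply. Fetching it (for example with urllib.request) is left out so the example runs without network access.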

How can I get my SPARQL query to be more performant or asking the right question?


Unfortunately, I don't see an obvious way to adapt your SPARQL query and keep exactly the same semantics, but to illustrate the problem:

SELECT ?item ?label WHERE {
  ?item wdt:P31 ?instance ;
        rdfs:label "Soriano"@en .
  FILTER(?instance != wd:Q5).
}
LIMIT 100

will return results in a timely manner, only because we helped the graph traversal with an initial path on ?item rdfs:label "Soriano"@en.
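
To make the difference concrete, here is a toy Python sketch (not how the triple store is actually implemented) that contrasts an exact-label lookup through an index with a substring filter that has to visit every label. The item IDs and labels are hypothetical sample data, and the function names are invented for this example.

```python
from collections import defaultdict

# Hypothetical (item, label) pairs standing in for rdfs:label triples.
TRIPLES = [
    ("Q900001", "Soriano"),
    ("Q900002", "Soriano nel Cimino"),
    ("Q900003", "Berlin"),
    ("Q900004", "Soriano"),
]


def build_label_index(triples):
    """Map each exact label to the items carrying it (a stand-in for an index)."""
    index = defaultdict(list)
    for item, label in triples:
        index[label].append(item)
    return index


def exact_lookup(index, label):
    """Like `?item rdfs:label "Soriano"@en`: one direct lookup, no scan."""
    return index.get(label, [])


def contains_scan(triples, needle):
    """Like FILTER(CONTAINS(lcase(?label), needle)): every label is visited."""
    return [item for item, label in triples if needle in label.lower()]


if __name__ == "__main__":
    index = build_label_index(TRIPLES)
    print(exact_lookup(index, "Soriano"))
    print(contains_scan(TRIPLES, "soriano"))
```

The exact lookup touches only the matching entries, while the scan costs time proportional to the number of labels, which is what hurts on a graph the size of Wikidata. As a side note, since the label is lowercased before the comparison, the search string has to be lowercase as well: in this sketch contains_scan(TRIPLES, "Soriano") matches nothing.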

But by combining the query service and the Wikidata API[0] backed by Elasticsearch, I think you can extract what you want:

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 ?instance .
  FILTER(?instance != wd:Q5).
  SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:endpoint "www.wikidata.org";
        wikibase:api "EntitySearch";
        mwapi:search "soriano";
        mwapi:language "en".
      ?item wikibase:apiOutputItem mwapi:item.
  }
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}
LIMIT 100

This query first contacts EntitySearch (an alias for wbsearchentities), which passes the items it finds to the triple store, which in turn can now query the graph in a timely manner. Obviously, this solution only works if the number of items returned by wbsearchentities remains reasonable.
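
The two-step flow can be sketched in Python as follows. This is only an illustration of the idea under stated assumptions: the candidate IDs and the instance-of table are hypothetical in-memory data standing in for the search results and the graph, and search_then_filter is a name invented here.

```python
def search_then_filter(candidate_ids, instance_of, excluded_class):
    """Keep candidates that have at least one class other than excluded_class.

    candidate_ids: IDs as they might come back from the search step.
    instance_of:   hypothetical mapping item -> list of P31 values.
    Mirrors `?item wdt:P31 ?instance . FILTER(?instance != excluded)`:
    items without any P31 value are dropped, and an item that has the
    excluded class plus another class is still kept.
    """
    results = []
    for item in candidate_ids:
        classes = instance_of.get(item, [])
        if any(cls != excluded_class for cls in classes):
            results.append(item)
    return results


if __name__ == "__main__":
    # Hypothetical sample data; only Q5 is taken from the query above.
    candidates = ["Q900010", "Q900011", "Q900012", "Q900013"]
    p31 = {
        "Q900010": ["Q5"],
        "Q900011": ["Q999001"],
        "Q900012": ["Q5", "Q999002"],
        # Q900013 has no P31 statement at all.
    }
    print(search_then_filter(candidates, p31, "Q5"))
```

Because the graph only has to be consulted for the handful of candidates, the cost depends on the size of the search result rather than on the size of the whole graph. Unlike the sketch, the SPARQL query returns one row per matching ?instance value, so an item can appear more than once there.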

--
David C.
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata