Hello all!
Here is a short update on what we have been doing around WDQS lately.
The update lag over the last 30 days has been slightly better [1]. We have
not done anything more to improve it, or to analyze why it has been less
problematic lately. My guess is that it is a combination of luck and the
self-throttling of edits based on the WDQS lag exposed through the
Wikidata API.
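For tool authors wondering what that self-throttling looks like on the
client side, here is a minimal sketch. It assumes the standard MediaWiki
"maxlag" request parameter, the "maxlag" error code and the Retry-After
response header; the helper names (with_maxlag, seconds_to_wait) and the
5-second defaults are made up for the example and are not taken from any
particular bot.

```python
from typing import Mapping, Optional


def with_maxlag(params: Mapping[str, str], maxlag: int = 5) -> dict:
    """Return a copy of the API parameters with the maxlag parameter set.

    The value is the highest lag (in seconds) the client is willing to
    tolerate before the API should reject the request.
    """
    out = dict(params)
    out["maxlag"] = str(maxlag)
    return out


def seconds_to_wait(
    body: Mapping, headers: Mapping[str, str], default: float = 5.0
) -> Optional[float]:
    """Return how long to pause before retrying.

    Returns None when the response was not rejected because of lag.
    """
    error = body.get("error")
    if not isinstance(error, Mapping) or error.get("code") != "maxlag":
        return None
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return float(lowered.get("retry-after", default))
    except (TypeError, ValueError):
        # Unparseable header: fall back to the (assumed) default pause.
        return default
```

A client would call with_maxlag() on every edit request, and sleep for
seconds_to_wait() seconds whenever it returns a number before retrying.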
We are now collecting more metrics from the WDQS updater [2] and exposing
them through a new dashboard [3]. We are also collecting queries for
analysis. Our hope is that digging into those queries (when we have
time) will allow us to discover patterns of queries that might be better
served with a different solution than Blazegraph.
We have loaded Wikidata dumps into Hadoop. This allows us to run analyses
that would not be possible with Blazegraph. For example, we analyzed the
usage of common qualifiers for “unknown value” [4].
There is an ongoing discussion about the use of blank nodes [5]. Blank
nodes are problematic for our updater, as finding them is by design a
non-trivial operation. The discussion is not settled yet, but it is likely
that we will need to introduce a breaking change in the way we use blank
nodes. We will provide an update once we know more precisely what we need
to do and have a migration path for the use cases that rely on them.
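To illustrate one reason blank nodes are awkward for anything that
compares two versions of the data, here is a small sketch. It is not how
our updater works: triples are plain tuples, the "ex:" identifiers are
placeholders, and masking the labels is a crude stand-in for a real graph
comparison. The point is only that a blank node label is local to one
serialization, so the same statement can appear under two different
labels.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def is_blank(term: str) -> bool:
    """Blank nodes are written with a '_:' prefix, as in N-Triples."""
    return term.startswith("_:")


def naive_diff(
    old: Iterable[Triple], new: Iterable[Triple]
) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (removed, added) by comparing triples literally."""
    old_set, new_set = set(old), set(new)
    return old_set - new_set, new_set - old_set


def mask_blank_labels(triples: Iterable[Triple]) -> Set[Triple]:
    """Replace every blank node label with the same anonymous marker."""
    return {
        tuple("_:" if is_blank(term) else term for term in triple)
        for triple in triples
    }


def diff_ignoring_labels(
    old: Iterable[Triple], new: Iterable[Triple]
) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (removed, added) after discarding blank node labels."""
    return naive_diff(mask_blank_labels(old), mask_blank_labels(new))


# The same hypothetical statement, serialized twice with different labels.
old_dump = [("ex:Q1", "ex:P1", "_:b0")]
new_dump = [("ex:Q1", "ex:P1", "_:genid7")]
```

Here naive_diff(old_dump, new_dump) reports one removal and one addition
even though nothing changed, while diff_ignoring_labels() reports none.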
We are now focused on a complete rewrite of the WDQS Updater [6]. We are
investigating Flink [7] as a stream processing solution. This should allow
us to both considerably simplify the update process and make it much more
efficient. There is still a lot of work to be done before this is complete,
but we think we have a good path forward.
Misc:
* some aliases for Wikidata have been deployed [8]
As always, thank you for your patience!
Guillaume
[1] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&am…
[2] https://phabricator.wikimedia.org/T239908
[3] https://grafana.wikimedia.org/d/dSksY08Zk/wikidata-query-service-updater?or…
[4] https://phabricator.wikimedia.org/T246238
[5] https://phabricator.wikimedia.org/T244341
[6] https://phabricator.wikimedia.org/T244590
[7] https://flink.apache.org/
[8] https://phabricator.wikimedia.org/T222321
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET