On 8/18/21 5:07 PM, Mike Pham wrote:
Wikidata community members,
Thank you for all of your work helping Wikidata grow and improve over
the years. In the spirit of better communication, we would like to
take this opportunity to share some of the current challenges Wikidata
Query Service (WDQS) is facing, and some strategies we have for
dealing with them.
WDQS currently risks failing to provide acceptable service quality for
the following reasons:
1. Blazegraph scaling
   1. Graph size. WDQS uses Blazegraph as our graph backend. While
      Blazegraph can theoretically support 50 billion edges
      <https://blazegraph.com/>, in reality Wikidata is the largest
      graph we know of running on Blazegraph (~13 billion triples
      <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>),
      and there is a risk that we will reach a size limit
      <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
      of what it can realistically support
      <https://phabricator.wikimedia.org/T213210>. Once Blazegraph
      is maxed out, WDQS can no longer be updated. This would also
      break Wikidata tools that rely on WDQS.
   2. Software support. Blazegraph is end-of-life software that is
      no longer actively maintained, which makes it an unsustainable
      backend to keep relying on in the long term.
Blazegraph maxing out in size poses the greatest risk of catastrophic
failure, as it would effectively prevent WDQS from being updated
further, so WDQS would inevitably fall out of date. Our long-term
strategy to address this is to move to a new graph backend that best
meets our WDQS needs and is actively maintained, and to begin the
migration off Blazegraph as soon as a viable alternative is identified
<https://phabricator.wikimedia.org/T206560>.
Hi Mike,
Do bear in mind that, both before and after Blazegraph was selected for
Wikidata, we've always offered an RDF-based DBMS that can handle current
and future requirements for Wikidata, just as we do for DBpedia.
At the time of our first rendezvous, handling 50 billion triples would
typically have required our Cluster Edition, which is a commercial-only
offering -- basically, that was the deal breaker back then.
Anyway, in recent times, our Open Source Edition has evolved to handle
some 80+ billion triples (as exemplified by the live UniProt instance),
with performance and scale being primarily a function of available memory.
I hope this helps.
Related:
[1] https://wikidata.demo.openlinksw.com/sparql -- Our Live Wikidata
SPARQL Query Endpoint
[2] https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0
-- Google Spreadsheet about various Virtuoso configurations associated
with some well-known public endpoints
[3] https://t.co/EjAAO73wwE -- this query doesn't complete with the
current Blazegraph-based Wikidata endpoint
[4] https://t.co/GTATPPJNBI -- the same query completing when applied to
the Virtuoso-based endpoint
[5] https://t.co/X7mLmcYC69 -- about loading Wikidata's datasets into a
Virtuoso instance
[6] https://twitter.com/search?q=%23Wikidata%20%23VirtuosoRDBMS%20%40kidehen&src=typed_query&f=live
-- various demos shared via Twitter over the years regarding Wikidata
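Reference [1] is a public SPARQL endpoint. As a minimal sketch, assuming it follows the standard SPARQL 1.1 Protocol (the query sent as a `query` parameter over HTTP GET) and can return the SPARQL 1.1 JSON results format, a client needs nothing beyond the Python standard library. The helper names `build_sparql_request` and `rows_from_json` are hypothetical, and the sketch has not been checked against that endpoint.

```python
import json
import urllib.parse
import urllib.request

# Endpoint from reference [1]. That it accepts exactly this request shape is
# an assumption based on the SPARQL 1.1 Protocol, not something verified here.
ENDPOINT = "https://wikidata.demo.openlinksw.com/sparql"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol "query via GET" request that asks for
    results in the SPARQL 1.1 JSON results format. Nothing is sent yet."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_json(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into one {variable: value}
    dict per solution; variables left unbound in a solution are omitted."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]
```

To actually run a query, pass the request to `urllib.request.urlopen(req, timeout=60)` and feed the decoded response body to `rows_from_json`. The timeout value is arbitrary, and whether a given query completes still depends on the endpoint behind it, as references [3] and [4] illustrate.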
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com
Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
Personal Weblogs (Blogs):
Medium Blog: https://medium.com/@kidehen
Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
http://kidehen.blogspot.com
Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen
Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
: http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this