> How can I speed up the queries processing even more?
IMHO: drop the unwanted data as early as you can (aggressive
prefiltering; ideally, do not import it at all).
> Any suggestion will be appreciated.
In your case:
- I would check the RDF dumps:
https://www.wikidata.org/wiki/Wikidata:Database_download#RDF_dumps
- I would try to write a custom pre-filter for the 2 million
parameters (simple text parsing in Go, using multiple cores,
or in some other fast language),
- and then just load the results into PostgreSQL.
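To make the pre-filter idea concrete, here is a minimal sketch. It is written in Python only for brevity (the same structure should carry over to Go), and it assumes the input is an N-Triples dump with one statement per line. The function name `prefilter` and the regular expression are my own illustration, not tested against the full dump. It keeps a triple when its subject or its object is one of the parameters, which mirrors the two SPARQL queries.

```python
import re

# One N-Triples statement per line: subject, predicate, object, then " ."
# Assumption: subjects are IRIs or blank nodes, predicates are IRIs.
TRIPLE = re.compile(r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.*?)\s*\.\s*$')

def prefilter(lines, params):
    """Yield (s, p, o) for statements whose subject or object is in params.

    params is a set of terms in N-Triples syntax, e.g.
    '<http://www.wikidata.org/entity/Q42>'.
    """
    for line in lines:
        m = TRIPLE.match(line)
        if m is None:
            continue  # comment, blank line or something this sketch cannot parse
        s, p, o = m.groups()
        if s in params or o in params:
            yield s, p, o
```

Because every line is handled independently, the input can be split into chunks and processed on several cores; only the matching triples need to be written out for loading.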
I have had good experience parsing and filtering the Wikidata JSON dump
(gzipped) and loading the result into a PostgreSQL database.
I can run the full code on my laptop, and the result in my case is
~12 GB in PostgreSQL.
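As a rough illustration of the JSON route (this is not my actual code), the gzipped dump can be streamed without unpacking it to disk. The sketch assumes the usual dump layout: one JSON array, one entity per line, each line ending with a comma. The names `iter_entities` and `keep` are hypothetical helpers, and the filter condition is only a placeholder.

```python
import gzip
import json

def iter_entities(path):
    """Yield one entity dict per line of a gzipped Wikidata JSON dump."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip().rstrip(",")
            if line in ("[", "]", ""):
                continue  # array brackets or blank line
            yield json.loads(line)

def keep(entity, wanted_ids):
    """Placeholder filter: keep entities whose id is in wanted_ids."""
    return entity.get("id") in wanted_ids
```

Only the entities that pass the filter would then be written out (for example as tab-separated rows) and loaded into PostgreSQL.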
The biggest problem is the memory requirement of the "2 million
parameters", but you can choose some fast key-value storage, like RocksDB.
There are also other low-tech parsing solutions.
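If the parameter set does not fit comfortably in memory, the lookup can live on disk. The sketch below uses SQLite from the Python standard library purely as a stand-in for a key-value store such as RocksDB; it is not RocksDB, and the table name `params` and both function names are assumptions for illustration.

```python
import sqlite3

def build_key_store(path, keys):
    """Store the lookup keys in an on-disk table indexed by primary key."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS params (k TEXT PRIMARY KEY) WITHOUT ROWID"
    )
    db.executemany(
        "INSERT OR IGNORE INTO params (k) VALUES (?)", ((k,) for k in keys)
    )
    db.commit()
    return db

def contains(db, key):
    """Return True if key is one of the stored parameters."""
    row = db.execute("SELECT 1 FROM params WHERE k = ?", (key,)).fetchone()
    return row is not None
```

The filter then calls `contains(db, term)` instead of testing membership in an in-memory set, trading some speed for a much smaller memory footprint.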
Regards,
Imre
Adam Sanchez <a.sanchez75(a)gmail.com> wrote (on Mon, 13 Jul 2020,
19:42):
> Hi,
>
> I have to launch 2 million queries against a Wikidata instance.
> I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with
> RAID 0).
> The queries are simple, just 2 types.
>
> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?s = ?param)
> }
>
> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?o = ?param)
> }
>
> If I use a Java ThreadPoolExecutor, it takes 6 hours.
> How can I speed up the queries processing even more?
>
> I was thinking :
>
> a) to implement a Virtuoso cluster to distribute the queries or
> b) to load Wikidata in a Spark dataframe (since Sansa framework is
> very slow, I would use my own implementation) or
> c) to load Wikidata in a Postgresql table and use Presto to distribute
> the queries or
> d) to load Wikidata in a PG-Strom table to use GPU parallelism.
>
> What do you think? I am looking for ideas.
> Any suggestion will be appreciated.
>
> Best,
>
> _______________________________________________
> Wikidata mailing list
> Wikidata(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>