Hi Maarten,
Thank you very much for your answer and your pointers. The page (which I
did not know existed) containing a federated SPARQL query is definitely
close to what I mean. It is just missing one more step: deciding who is right.
If we look at the first result of the table
<https://www.wikidata.org/wiki/Property_talk:P1006/Mismatches> of
mismatches (Dmitry Bortniansky <https://www.wikidata.org/wiki/Q316505>) and
we draw a little graph, the result is:
[image: Diagram.png]
We can see that the error probably comes from VIAF, which contains a
duplicate, and from NTA, which evidently created an authority record based
on this bad VIAF ID.
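The "deciding who is right" step could start as a simple majority vote over the dates the linked databases report. A minimal sketch in Python; the source names and date values below are hypothetical stand-ins for the Bortniansky case, not values checked against the live databases:

```python
from collections import Counter

def majority_date(claims):
    """Pick the date most sources agree on; None on a tie.

    claims: mapping of source name -> date string as that source reports it.
    """
    counts = Counter(claims.values())
    (best, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:  # tie: no majority, flag for a human
        return None
    return best

# Hypothetical values for an item like Dmitry Bortniansky (Q316505):
claims = {
    "Wikidata": "1751-10-28",
    "VIAF (duplicate)": "1752-01-01",
    "NTA": "1752-01-01",
    "GND": "1751-10-28",
    "BnF": "1751-10-28",
}
print(majority_date(claims))  # prints "1751-10-28"
```

Note the obvious caveat, which is exactly the case in the graph above: a naive vote is fooled when one source derives from another, as NTA apparently copied the bad VIAF record, so their two "votes" are really one.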
My research is very close to this kind of case, and I am very interested in
knowing what is already implemented in Wikidata.
Cheers,
Ettore Rizza
On Sat, 29 Sep 2018 at 13:03, Maarten Dammers <maarten(a)mdammers.nl> wrote:
Hi Ettore,
On 26-09-18 14:31, Ettore RIZZA wrote:
Dear all,
I'm looking for Wikidata bots that perform accuracy audits. For
example, comparing the birth dates of persons with the same date
indicated in databases linked to the item by an external-id.
Let's have a look
at the evolution of automated editing. The first step
is to add missing data from anywhere. Bots importing date of birth are
an example of this. The next step is to add data from somewhere with a
source or add sources to existing unsourced or badly sourced statements.
As far as I can see that's where we are right now; see for example edits
like
https://www.wikidata.org/w/index.php?title=Q41264&type=revision&dif…
. Of course the next step would be to compare existing sourced statements
with external data to find differences. But what would the workflow be?
Take for example Johannes Vermeer (
https://www.wikidata.org/wiki/Q41264 ). Extremely well documented and
researched, but
http://www.getty.edu/vow/ULANFullDisplay?find=&role=&nation=&su…
and
https://rkd.nl/nl/explore/artists/80476 combined provide 3 different
dates of birth and 3 different dates of death. When it comes to these
kinds of date mismatches, it's generally first come, first served (the
first date added doesn't get replaced). This mismatch could show up in
some report. I can check it as a human and maybe make some adjustments,
but how would I sign it off to prevent other people from doing the same
thing over and over again?
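One way to "sign it off" would be to keep a persistent list of reviewed mismatches, keyed by (item, property, external value), so the report generator skips anything a human has already looked at. A minimal sketch, where the file name, record layout, and the example dates are all assumptions for illustration:

```python
import json
from pathlib import Path

REVIEWED_FILE = Path("reviewed_mismatches.json")  # hypothetical store

def load_reviewed():
    """Load previously signed-off mismatches as a set of tuples."""
    if REVIEWED_FILE.exists():
        return {tuple(k) for k in json.loads(REVIEWED_FILE.read_text())}
    return set()

def sign_off(item, prop, external_value, reviewed):
    """Record that a human has reviewed this mismatch and persist it."""
    reviewed.add((item, prop, external_value))
    REVIEWED_FILE.write_text(json.dumps(sorted(reviewed)))

def report(mismatches, reviewed):
    """Return only the mismatches no one has reviewed yet."""
    return [m for m in mismatches if m not in reviewed]

# Hypothetical mismatch rows (item, property, external date):
mismatches = [
    ("Q41264", "P569", "1632-10-31"),   # Vermeer, date from one source
    ("Q316505", "P569", "1752-01-01"),  # Bortniansky, bad VIAF date
]
reviewed = load_reviewed()
sign_off("Q41264", "P569", "1632-10-31", reviewed)
print(report(mismatches, reviewed))  # only the unreviewed mismatch remains
```

The same idea could of course live on-wiki (a subpage the report bot reads back), rather than in a local file; the point is only that a sign-off must be stored somewhere the next report run can see it.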
With federated SPARQL queries it becomes much easier to generate reports
of mismatches. See for example
https://www.wikidata.org/wiki/Property_talk:P1006/Mismatches .
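A federated mismatch report of that kind boils down to one query shape: fetch the external ID and the local date from Wikidata, ask the remote endpoint (via SERVICE) for its date, and keep the rows that disagree. Below is a sketch that only assembles the query text; the NTA endpoint URL, the IRI pattern, and the schema:birthDate predicate are assumptions, not checked against the live service, and the common prefixes are taken to be predeclared as on the Wikidata Query Service:

```python
# Sketch of a federated SPARQL query in the style of the P1006 mismatch
# report: compare the date of birth stored in Wikidata with the one the
# external endpoint reports for the same linked entity.
QUERY = """
SELECT ?item ?wdDate ?extDate WHERE {
  ?item wdt:P1006 ?ntaId ;
        wdt:P569 ?wdDate .
  # Build the external entity IRI from the ID (pattern is an assumption):
  BIND(IRI(CONCAT("http://data.bibliotheken.nl/id/thes/p", ?ntaId)) AS ?ext)
  SERVICE <http://data.bibliotheken.nl/sparql> {
    ?ext schema:birthDate ?extDate .
  }
  FILTER(STR(?wdDate) != STR(?extDate))
}
LIMIT 100
"""

# The SERVICE block is where the federation happens: that triple pattern
# is evaluated by the remote endpoint, everything else by Wikidata's.
print("SERVICE" in QUERY)
```

Running it would be a matter of pasting the text into https://query.wikidata.org/ ; the FILTER at the end is what turns a plain join into a mismatch report.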
Maarten
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata