Hello,


As you may know, the Wikidata development team has been working on a tool that lets editors review mismatching data between Wikidata and external databases. The tool is now ready to be used, and you can access it here and read more details on Wikidata:Mismatch Finder. We hope that this tool can be useful to people who are working on data quality and matching external databases with Wikidata, and we are looking forward to your feedback if you give it a try!


What is the purpose of Mismatch Finder?

The tool helps highlight differences in the data between Wikidata and other databases, in order to improve data quality in Wikidata and make the whole linked open data web more robust. The tool itself doesn’t check these databases automatically: it is necessary for someone to compare an external database to Wikidata first and then upload a list of possible mismatches into the Mismatch Finder, so they can be analyzed and processed by Wikidata editors.
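As a rough illustration of the comparison step described above, here is a minimal sketch in Python. It assumes you have already loaded both datasets into plain dictionaries keyed by Item ID and property ID. The `PossibleMismatch` class, the `find_possible_mismatches` function and the sample values are hypothetical and only serve the example; they are not part of Mismatch Finder, and the real upload format is described in the user guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PossibleMismatch:
    """One candidate mismatch (hypothetical structure, for illustration only)."""
    item_id: str
    property_id: str
    wikidata_value: str
    external_value: str


def find_possible_mismatches(wikidata_values, external_values):
    """Compare two {(item_id, property_id): value} mappings.

    Only keys present on both sides are compared; a value missing from
    Wikidata is a gap rather than a mismatch, so it is skipped here.
    """
    mismatches = []
    for key, external_value in sorted(external_values.items()):
        wikidata_value = wikidata_values.get(key)
        if wikidata_value is None:
            continue
        # Ignore differences that are only leading/trailing whitespace.
        if wikidata_value.strip() != external_value.strip():
            item_id, property_id = key
            mismatches.append(
                PossibleMismatch(item_id, property_id, wikidata_value, external_value)
            )
    return mismatches


# Made-up sample data, purely to exercise the function.
wikidata_side = {("Q42", "P569"): "1952-03-11", ("Q64", "P1082"): "3644826"}
external_side = {("Q42", "P569"): "1952-03-12", ("Q64", "P1082"): "3644826 "}

for mismatch in find_possible_mismatches(wikidata_side, external_side):
    print(mismatch)
```

A real comparison would usually need datatype-aware normalization (dates, quantities, identifiers) before values are compared; the whitespace handling above is only a placeholder for that step.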


By providing such a tool, we hope to help Wikidata editors spot and fix mistakes in Wikidata, and to support organizations reusing Wikidata’s data, who now have a convenient way to contribute back by reporting lists of possible mismatches.


How can you use the tool to check mismatches?

On the Mismatch Finder tool page, you can check Items by entering a list of Q-IDs (for example taken from a SPARQL query). After clicking on “Check Items”, the tool will check if there are mismatches for these Items in the mismatch store, and display any issue that was found with a specific part of the data.
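If your Q-IDs come from a SPARQL query, a small script can turn the query result into a list you can paste into the tool. The sketch below assumes the result was saved in the standard SPARQL JSON results format and that the query variable is named `item`. The function name `qids_from_sparql_json` is hypothetical, and printing one Q-ID per line is an assumption about what the input field accepts.

```python
import re

# Wikidata entity URIs for Items look like http://www.wikidata.org/entity/Q42
ENTITY_URI = re.compile(r"^http://www\.wikidata\.org/entity/(Q[1-9][0-9]*)$")


def qids_from_sparql_json(results, variable="item"):
    """Return the distinct Q-IDs bound to `variable`, in first-seen order."""
    seen = set()
    qids = []
    for binding in results["results"]["bindings"]:
        cell = binding.get(variable)
        if cell is None:
            continue  # this row has no value for the variable
        match = ENTITY_URI.match(cell["value"])
        if match is None:
            continue  # not an Item URI (for example a property or a literal)
        qid = match.group(1)
        if qid not in seen:
            seen.add(qid)
            qids.append(qid)
    return qids


# Made-up sample in the SPARQL JSON results format.
sample = {
    "head": {"vars": ["item"]},
    "results": {
        "bindings": [
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q42"}},
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q64"}},
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q42"}},
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/P31"}},
        ]
    },
}

print("\n".join(qids_from_sparql_json(sample)))
```

Duplicates are dropped so that each Item is only checked once.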


From this page, after logging in with your Wikidata account via OAuth, you can choose a status for each mismatch to indicate which part of the data is wrong, and open the Item on wikidata.org to edit the data if needed. Mismatch Finder does not perform any automatic editing on Wikidata.


Once the status is changed from “waiting for review” to another value, the mismatch will no longer appear in the list.


You can also use the Mismatch Finder user script that will display an alert at the top of the Item pages on wikidata.org and a link to the Mismatch Finder tool to learn more about the potential mismatches. See Help:User scripts for how to enable the user script for your account.


Where does the information come from?

Information about the potential mismatches is stored in the Mismatch Store, a database separate from Wikidata where organizations, researchers and editors can upload lists of mismatches.


The Mismatch Store is hosted on Toolforge and its content can be accessed via an API. You can find more information about the database, how to get data from the API, and how to prepare and upload a mismatches file in this user guide.


We hope that the Mismatch Finder tool will help to build up feedback loops with data re-users to get them actively involved in improving the data on Wikidata. Feel free to try out the tool and let us know what you think on the talk page. You can also join us for an intro session and discussion at the upcoming Data Reuse Days.


For a quick introduction and a demo of how the Mismatch Finder tool works, please see this short video.


We would especially like to thank Mike Peel and Marco Fossati for providing the first mismatches and real-world testing data for the Mismatch Finder to get us started. More will follow in the coming days and weeks.


Cheers,

--
Mohammed Sadat
Community Communications Manager for Wikidata/Wikibase

Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
