I think the general idea of documenting collections is a good one, though I haven't thought carefully about this or about some of the responses already sent. However, I think using P361 (part of) for this purpose might not be a good idea; either a new property should be proposed for it, or some other mechanism should be used for handling large collections (collections added through Mix'n'match, for example, generally have external identifiers as their collection-specific properties). My main concern is that the relationship is not generally going to be intrinsic to the item and has more to do with the project doing the import work, whereas P361 should generally describe some intrinsic relationship that an item has (for example a subsidiary being part of a parent company, a component being part of a device, a research article being part of a particular journal issue, etc.).

We do have a very new property that might be usable for this purpose, though it is intended to link to WikiProjects rather than "collection" items: P4570 (Wikidata project). Or perhaps something similar should be proposed?

   Arthur



On Fri, Nov 24, 2017 at 6:30 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
Hey all,

I'd like to hear from you on a proposal to add some order and structure to the various bibliographic corpora we currently have in Wikidata.

As you may know, coverage of creative works in Wikidata has seen significant growth over the last year. [1][2] Different groups and projects have started importing source metadata for various reasons: 
  • to provide sources for machine-extracted statements (WikiFactMine [3], StrepHit [4])
  • to represent sources cited in Wikipedia (e.g. DOIs and PMIDs imported via the mwcite identifier dumps) or other Wikimedia projects (Wikisource, Wikispecies, Wikinews)
  • to create collections of the open access literature citable and reusable in Wikimedia projects (e.g. open access PMC review articles)
  • to maintain small, curated corpora about specific topics (e.g. the Zika corpus [5])
Because all these efforts have grown organically and with little coordination, it's hard to keep track of who initiated them, to clearly communicate their purpose, to understand their completion criteria and their data quality needs, and last but not least to offer any contribution opportunities (in terms of code or manual labor) to other community members. It's unclear whether these efforts should continue within Wikidata in the future or leverage the power of federated Wikibase-powered wikis (see our discussion at the end of the WikiCite session at WikidataCon [6]). Irrespective of the best long-term solution, we need to provide some better structure to these efforts today if we want to address the above problems.

I'd like to propose a fairly simple solution and hear your feedback on whether it makes sense to implement it as is or with some modifications.
  1. create a Wikidata class called "Wikidata item collection" [Q-X]
  2. create and document individual collections (e.g. the Wikidata Zika corpus [Q-Y]) as instances of this class: [Q-Y] --P31--> [Q-X] 
  3. add appropriate metadata to describe each collection (its main topic(s), creators, any external identifiers, if applicable)
  4. mark individual bibliographic items as part of [P361] the corresponding collections
Note that this approach can apply to bibliographic item collections but also to any other set of items not directly identifiable via Wikidata properties. The same items could of course be part of multiple collections. Some criteria would be needed to determine an appropriate threshold for legitimate collections (we wouldn't want arbitrary collections to be created for sets of items generated as part of a test import).
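
As a rough illustration of the four steps above, the statements involved could be modelled as plain (subject, property, value) triples. This is only a sketch: `Q-X` and `Q-Y` are the unassigned placeholders from the list above, `Q-Z`, `Q-A` and `Q-B` are invented for the example, and the helper `members_by_collection` is a hypothetical name, not an existing tool.

```python
from collections import defaultdict

# Placeholder identifiers: Q-X and Q-Y come from the proposal and are not real
# Wikidata IDs; Q-Z, Q-A and Q-B are made up purely for illustration.
COLLECTION_CLASS = "Q-X"   # the "Wikidata item collection" class (step 1)
ZIKA_CORPUS = "Q-Y"        # an individual collection (step 2)
OTHER_COLLECTION = "Q-Z"   # a hypothetical second collection

statements = [
    # Step 2: each collection is an instance of (P31) the collection class.
    (ZIKA_CORPUS, "P31", COLLECTION_CLASS),
    (OTHER_COLLECTION, "P31", COLLECTION_CLASS),
    # Step 4: bibliographic items are marked as part of (P361) a collection.
    ("Q-A", "P361", ZIKA_CORPUS),
    ("Q-A", "P361", OTHER_COLLECTION),  # one item, two collections
    ("Q-B", "P361", ZIKA_CORPUS),
]


def members_by_collection(triples, class_id=COLLECTION_CLASS):
    """Group items by the collections they are marked as part of."""
    # Only P361 targets that are declared instances of the class count.
    collections = {s for s, p, o in triples if p == "P31" and o == class_id}
    members = defaultdict(set)
    for s, p, o in triples:
        if p == "P361" and o in collections:
            members[o].add(s)
    return dict(members)
```

Calling `members_by_collection(statements)` groups `Q-A` under both collections, which is the multiple-membership case mentioned above.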

Beyond solving the issues listed above, this approach would also allow us to generate dedicated statistics on the growth or data quality of each collection via the SPARQL endpoint. It would also allow us to design constraints for arbitrary item collections, something that right now is not possible (unless these sets can already be identified via a query).
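
To make the statistics idea a little more concrete, here is a minimal sketch of how a per-collection count query might be built, assuming items are linked to their collection with P361 as proposed. The function name `collection_size_query` and its parameters are hypothetical, and the QID in the usage note is a made-up placeholder, since no collection item exists yet. The `wd:` and `wdt:` prefixes are the ones predefined on the Wikidata Query Service.

```python
import re


def collection_size_query(collection_qid, membership_property="P361"):
    """Build a SPARQL query counting the items linked to a collection.

    Assumption: membership is expressed as a direct ("truthy") statement
    from the item to the collection, e.g. ?item wdt:P361 wd:<collection>.
    """
    # Reject anything that is not a well-formed ID so that the result
    # is always a syntactically valid query.
    if not re.fullmatch(r"Q[1-9]\d*", collection_qid):
        raise ValueError(f"not a valid item ID: {collection_qid!r}")
    if not re.fullmatch(r"P[1-9]\d*", membership_property):
        raise ValueError(f"not a valid property ID: {membership_property!r}")
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        f"  ?item wdt:{membership_property} wd:{collection_qid} .\n"
        "}"
    )
```

For example, `collection_size_query("Q123456789")` (a placeholder ID, not the actual collection item) returns a query that could be pasted into the query service. Passing a different `membership_property` would cover the case where a dedicated property ends up being used instead of P361.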

If something similar already exists in the context of structured data donations/imports for GLAM, I'd be most grateful for any pointers.

Dario
 

[1] http://wikicite.org/statistics.html
[2] https://doi.org/10.6084/m9.figshare.5548591.v1
[3] https://meta.wikimedia.org/wiki/Grants:Project/ContentMine/WikiFactMine
[4] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Renewal
[5] https://www.wikidata.org/wiki/Wikidata:WikiProject_Zika_Corpus
[6] https://mirror.netcologne.de/CCC/events/wikidatacon/2017/h264-hd/wikidatacon2017-10009-eng-WikiCite_Wikidata_as_a_structured_repository_of_bibliographic_data_hd.mp4

--
Meta: https://meta.wikimedia.org/wiki/WikiCite
Twitter: https://twitter.com/wikicite