Hi all,
You should in any case be sure to avoid allowing collections that fall
into Russell's paradox
<https://en.wikipedia.org/wiki/Russell%27s_paradox>. So if a predicate
"belongs to collection QX" is added such that a Wikidata item can be
stated as being part of another, it must be envisioned that at some
point a query may ask "What is the collection of items that do not
belong to themselves?"
Paradoxically logical,
mathieu
On 27/11/2017 at 02:07, Arthur Smith wrote:
I think the general idea of documenting collections is a good one,
though I haven't thought carefully about this or some of the responses
already sent. However, I think the use of P361 (part of) for this
purpose might not be a good idea and a new property should be proposed
for it, or some other mechanism used for large collection handling
(collections added through Mix n Match for example generally have
external identifiers as their collection-specific properties). My
concern here is mainly that the relationship is not generally going to
be intrinsic to the item, and is more related to the project doing the
import work, while P361 should generally describe some intrinsic
relationship that an item has (for example a subsidiary being part of
a parent company, a component of a device being part of the device, a
research article being part of a particular journal issue, etc).
We do have a very new property that might be usable for this purpose,
though it is intended to link to WikiProjects rather than "collection"
items - P4570 (Wikidata project). Or perhaps something similar should
be proposed?
Arthur
On Fri, Nov 24, 2017 at 6:30 PM, Dario Taraborelli
<dtaraborelli(a)wikimedia.org> wrote:
Hey all,
I'd like to hear from you on a proposal to add some order and
structure to the various bibliographic corpora we currently have
in Wikidata.
As you may know, coverage of creative works in Wikidata has seen
significant growth over the last year. [1][2] Different groups and
projects have started importing source metadata for various reasons:
* to provide sources for machine-extracted statements (WikiFactMine
[3], StrepHit [4])
* to represent sources cited in Wikipedia (e.g. DOIs and PMIDs
imported via the mwcite identifier dumps) or other Wikimedia
projects (Wikisource, Wikispecies, Wikinews)
* to create collections of the open access literature citable
and reusable in Wikimedia projects (e.g. open access PMC
review articles)
* to maintain small, curated corpora about specific topics (e.g.
the Zika corpus [5])
Because all these efforts have grown organically and with little
coordination, it's hard to keep track of who initiated them, to
clearly communicate their purpose, to understand their completion
criteria and their data quality needs, and last but not least to
offer any contribution opportunities (in terms of code, or manual
labor) to other community members. It's unclear if the future of
these efforts should continue to be within Wikidata, or leverage
the power of federated Wikibase-powered wikis (see our discussion
at the end of the WikiCite session at WikidataCon [6]).
Irrespective of the best long term solution, we need to provide
some better structure to these efforts today if we want to address
the above problems.
I'd like to propose a fairly simple solution and hear your
feedback on whether it makes sense to implement it as is or with
some modifications.
1. create a Wikidata class called "Wikidata item collection" [Q-X]
2. create and document individual collections (e.g. the Wikidata
Zika corpus [Q-Y]) as instances of this class: [Q-Y] --P31-->
[Q-X]
3. add appropriate metadata to describe each collection (its
main topic(s), creators, any external identifiers, if applicable)
4. mark individual bibliographic items as part of [P361] the
corresponding collections
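As a rough illustration of the four steps above, here is a minimal Python sketch that writes the proposed statements out as plain (subject, property, value) triples. It is only a sketch under assumptions: Q-X and Q-Y are the placeholders used in this proposal, Q-A, Q-B and Q-T are made-up stand-ins for two bibliographic items and a topic item, and the helper name build_statements is hypothetical. P921 (main subject) is used as one example of collection metadata.

```python
# Placeholders from the proposal; neither item exists yet.
COLLECTION_CLASS = "Q-X"  # the "Wikidata item collection" class (step 1)
ZIKA_CORPUS = "Q-Y"       # an individual collection, e.g. the Zika corpus


def build_statements(collection, members, metadata):
    """Return (subject, property, value) triples describing one collection."""
    # Step 2: the collection is an instance of (P31) the collection class.
    triples = [(collection, "P31", COLLECTION_CLASS)]
    # Step 3: descriptive metadata attached to the collection item itself.
    triples += [(collection, prop, value) for prop, value in metadata]
    # Step 4: each member item is marked as part of (P361) the collection.
    triples += [(item, "P361", collection) for item in members]
    return triples


statements = build_statements(
    ZIKA_CORPUS,
    members=["Q-A", "Q-B"],      # hypothetical bibliographic items
    metadata=[("P921", "Q-T")],  # P921 = main subject; Q-T is a made-up topic
)

for subject, prop, value in statements:
    print(f"[{subject}] --{prop}--> [{value}]")
```

Nothing here talks to Wikidata; it only shows the shape of the statements, so the same structure would apply if a different property than P361 were chosen for membership.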
Note that this approach can apply to bibliographic item
collections but also to any other set of items not directly
identifiable via Wikidata properties. Of course, the same items
could be part of multiple collections. Some criteria
would be needed to determine an appropriate threshold for
legitimate collections (we wouldn't want arbitrary collections to
be created for sets of items generated as part of a test import).
Beyond solving the issues listed above, this approach would also
allow us to generate dedicated statistics on the growth or data
quality of each collection via the SPARQL endpoint. It would also
allow us to design constraints for arbitrary item collections,
something that right now is not possible (unless these sets can
already be identified via a query).
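To make the statistics idea a bit more concrete, the following Python sketch builds the kind of SPARQL query that could count the members of one collection. It assumes membership is expressed with P361 as proposed above and that the query would be run on the Wikidata Query Service, which predefines the wd: and wdt: prefixes. The function name member_count_query is hypothetical, and Q123 is only a syntactically valid stand-in for the collection's real identifier.

```python
import re


def member_count_query(collection_qid: str) -> str:
    """Build a SPARQL query counting items that are part of (P361) a collection."""
    # Reject anything that is not a plain item identifier such as "Q123".
    if not re.fullmatch(r"Q\d+", collection_qid):
        raise ValueError(f"not a valid item identifier: {collection_qid!r}")
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?members) WHERE {\n"
        f"  ?item wdt:P361 wd:{collection_qid} .\n"
        "}"
    )


# Q123 is a placeholder standing in for the collection item [Q-Y].
query = member_count_query("Q123")
print(query)
```

The function only returns the query text; sending it to the endpoint and tracking the count over time is left out of this sketch.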
If something similar already exists in the context of structured
data donations/imports for GLAM, I'd be most grateful for any
pointers.
Dario
[1] http://wikicite.org/statistics.html
[2] https://doi.org/10.6084/m9.figshare.5548591.v1
[3] https://meta.wikimedia.org/wiki/Grants:Project/ContentMine/WikiFactMine
[4] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Renewal
[5] https://www.wikidata.org/wiki/Wikidata:WikiProject_Zika_Corpus
[6] https://mirror.netcologne.de/CCC/events/wikidatacon/2017/h264-hd/wikidatacon2017-10009-eng-WikiCite_Wikidata_as_a_structured_repository_of_bibliographic_data_hd.mp4
--
Meta: https://meta.wikimedia.org/wiki/WikiCite
Twitter: https://twitter.com/wikicite
---
You received this message because you are subscribed to the Google
Groups "wikicite-discuss" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to wikicite-discuss+unsubscribe(a)wikimedia.org.
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata