Daniel and Gergo,
I've been thinking some more about Daniel's replies in IRC chat last
week, about using qualifiers to handle underlying works that images are
derived from, if the underlying work isn't either (i) another image, or
(ii) something with its own Wikidata item
https://tools.wmflabs.org/meetbot/wikimedia-office/2014/wikimedia-office.20…
I think the issue I'm stuck on is: what property would the qualifier be
attached to?
For sets of images to have in mind, we might consider
https://commons.wikimedia.org/wiki/Category:Twenty-four_Views_by_Henry_Salt…
https://commons.wikimedia.org/wiki/Category:Pyne%27s_Royal_Residences
The first choice might be attaching the information to a "Creator" property.
But for the underlying works of these engravings, there are typically
*two* creators, both of whom are significant -- the artist, and the
engraver.
So instead, we might consider an "Underlying work" property, analogous
to the "Work" class in the Multimedia API development, "a creation to
which copyright, authorship, etc is attached", as per
https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVj…
But can we then capture the whole of the work class in such a property?
There seem to me a couple of issues:
(1) What should be the value of the property? There doesn't seem to be
an obvious choice (eg if one were importing from a repository or
catalogue). What would be the datatype, and what should we store in
this field?
(2) It seems to me that we would need to enable qualifiers on qualifiers
-- for example, if we represented the creator of an underlying engraving
using a qualifier, we would then seem to need another qualifier to
indicate whether the role was as artist, or as engraver.
Similarly, if there is sourcing, there are sources that might apply to
one (1st level) qualifier, but not another. But normally the WD
sourcing model is for a whole statement, not part of it.
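
To make point (2) concrete, here is a minimal sketch of a flat
statement-plus-qualifiers structure. It is an illustration only, not the
actual Wikibase data model: the class names (Snak, Statement), the property
labels and the placeholder values ("Artist A", "Engraver B", "Catalogue X")
are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snak:
    """One property/value pair (hypothetical, much simplified)."""
    prop: str
    value: str


@dataclass
class Statement:
    """A main value plus a flat list of qualifiers.

    References hang off the statement as a whole, not off a qualifier.
    """
    main: Snak
    qualifiers: List[Snak] = field(default_factory=list)
    references: List[str] = field(default_factory=list)


# Placeholder data: an underlying engraving with two creators.
underlying = Statement(
    main=Snak("underlying work", "engraving (no item of its own)"),
    qualifiers=[
        Snak("creator", "Artist A"),
        Snak("creator", "Engraver B"),
        Snak("role", "artist"),
        Snak("role", "engraver"),
    ],
    references=["Catalogue X"],
)


def candidate_roles(statement: Statement, creator: str) -> List[str]:
    """Return every role that could belong to `creator`.

    With flat qualifiers nothing ties a role to one creator, so all
    role qualifiers come back for any creator that is present.
    """
    present = any(
        q.prop == "creator" and q.value == creator
        for q in statement.qualifiers
    )
    if not present:
        return []
    return [q.value for q in statement.qualifiers if q.prop == "role"]
```

Asking for the roles of "Artist A" returns both "artist" and "engraver",
which is the ambiguity that qualifiers on qualifiers would be needed to
resolve. The single references list likewise cannot say which qualifier a
source supports.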
What we would really be doing, if we did this in full, would in effect
be storing the contents of what might otherwise be an entire item in a
property.
That has some attractiveness, if at a future time one wanted to promote
the 'underlying work' to have a Wikidata item in its own right -- the
two structures would then match exactly.
But it would mean CommonsData having a slightly different data structure
to Wikidata.
It's maybe worth thinking also about what happens if information is
sometimes stored on the Commons file item, and sometimes on a Wikidata item.
For example, if we are looking for views similar to the categories
above, we might be searching for
* Best version of each engraving from the book-title, ordered by
sequence number; or
* Best version of each engraving from the book-edition, ordered by page
number; or
* Best version of each engraving from a scan-set, ordered by scan number
Alternatively we might want to sort by artist; or engraver; or date of
first publication (the engravings were often issued first as individual
prints, or partwork sets, sometimes well before publication of a final
volume).
If we're looking to support these searches and orderings, does it matter
that a particular field may sometimes be on the file item, but sometimes
on a Wikidata item?
(For example, in the Henry Salt set, suppose we were to have the policy
that engravings we only have one copy of only get a file item, but
engravings we have multiple copies of get a Wikidata item to store their
common information.
Would it matter that for one of the engravings we have two copies, so
the information that we would be wanting for search and selection and
ordering would be stored on a Wikidata item; whereas for the rest, with
only a single copy, it would be stored on a Commons item?)
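
As a rough sketch of what the split would mean for a query, assume each
file item is a plain dictionary that may carry a field itself, or may
point to a shared item through a "work" key. The key name, the item id
"Q_HYPOTHETICAL_1" and the file names are invented for the example; none
of this is an existing API.

```python
from typing import Any, Dict, List, Optional

FileItem = Dict[str, Any]
WorkItems = Dict[str, Dict[str, Any]]


def resolve(file_item: FileItem, works: WorkItems, name: str) -> Optional[Any]:
    """Look for a field on the file item first, then on its linked work."""
    if name in file_item:
        return file_item[name]
    work_id = file_item.get("work")  # hypothetical link to a shared item
    if work_id is not None:
        return works.get(work_id, {}).get(name)
    return None


def ordered_by(files: List[FileItem], works: WorkItems, name: str) -> List[FileItem]:
    """Sort files by a field, wherever it lives; files without it go last."""
    def key(f: FileItem):
        value = resolve(f, works, name)
        return (value is None, value if value is not None else 0)
    return sorted(files, key=key)


# One engraving has a shared item; the others keep the field locally.
works = {"Q_HYPOTHETICAL_1": {"sequence": 1}}
files = [
    {"name": "plate_b.jpg", "sequence": 2},
    {"name": "plate_a_copy1.jpg", "work": "Q_HYPOTHETICAL_1"},
    {"name": "plate_c.jpg", "sequence": 3},
]

names_in_order = [f["name"] for f in ordered_by(files, works, "sequence")]
```

The ordering still works, but only because every query goes through the
extra resolve step; choosing the "best version" among several copies is
not addressed here.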
None of these questions is without a solution. But they do, I think,
require a decisive view to be reached as to what we propose to do.
Thanks for all your work on this,
All best,
James.
On 16/10/2014 18:56, James Heald wrote:
On 13/10/2014 13:03, Daniel Kinzler wrote:
On 13.10.2014 00:17, Jane Darnell wrote:
I think the place for all data about an image
should be Wikidata.
Do you really mean *any* image?
E.g., if we have a scan of an old book with 50 engravings, do you want
to make a wikidata item for each engraving? Or just for the book?
Engravings are often simple illustrations, not notable of and by
themselves, and there is frequently very little we can say about them,
except for which book they were published in.
It seems to me that it makes more sense to just model the book on
Wikidata, not each illustration (or even every page, including the
text-only ones, in case they are extracted to a png file or something).
Thinking about books of engravings, eg a set like this:
https://commons.wikimedia.org/wiki/Category:Views_of_the_Seats_of_Noblemen_…
There is a fair amount one can say about each of these engravings: what
the subject is; and where that location is; who was the artist, and who
was the engraver; when the engraving was first published (which may or
may not be the same as the date at which it was first collected).
We probably also want to identify the *edition* of the book it was taken
from, and probably also the scan-set -- each with a page number or
sequence number, so the set can be easily retrieved and displayed in the
right order.
In terms of items required, at the moment membership of a scan-set or an
edition of the book might be handled by membership of a category. It's
not clear how such categories and their memberships are intended to be
represented in the new structured approach. Does one associate the
scanset item directly with a category? Or is the scanset item its own
thing, that one maps the category onto? And is the scanset an item on
Wikidata, or an item somewhere else?
A further issue arises when we have more than one copy of the same
engraving.
eg:
https://commons.wikimedia.org/wiki/File:Neale%281818%29_p6.190_-_Fleurs,_Ro…
https://www.wikidata.org/wiki/File:MA%281829%29_p.340_-_Fleurs_-_John_Prest…
At the moment on Commons one can make a gallery of "other versions" on
the filepage, each with a short footer to explain what that version is.
So it probably makes sense to be able to record that we have multiple
representations or versions of the same basic thing, which presumably
means some kind of object to represent that basic thing - here an
engraving.
Turning to Gergo's model of "squashing" all of the information onto a
limited number of nodes (ie an item per file, plus some floating items
on Wikidata), and just making information into properties of those
items, I think there is a problem.
The specific thing is that we want to associate various properties
together, as all being tied to a particular stage of development of the
work -- ie a distinguishable "work" entity, in the language of the draft
"Multimedia data model" API at
https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVj…
In particular, in the case of rights information, we need to carefully
associate the rights information with the other fields it relates to:
the author, the date, the nature of the contribution, the act of
licensing or release or assessment.
This is tricky because there may be multiple "stages of development"
associated with a single file, each with its own
author/date/contribution/license information. Yet there may
nevertheless only be the one file on Commons.
Even if the image has been 'restored' by a Commons user, this will not
necessarily generate a separate file -- standard practice for many
restorers (myself included) is to upload the restored version over the
previous version, so the reader can easily compare the two by looking at
the file history (and access an earlier version to download, if they so
wish).
(Another example could be where we may want to associate a particular
music file of a piece of classical music with a particular modern
edition of the score, even if the piece was originally from the 18th
century. Even if the only file we have is the recording, we still need
to be able to reflect the rights in the score.)
Another important class of data is date information. There may be
multiple dates associated with an image -- and we may want to sort, or
filter, or order by any of them. But to be meaningful, we don't
really want to associate the dates with the image, but rather with a
stage of development in the derivative chain that has led to the image.
So again, the idea of what the API calls the "work" comes forward, but
again there cannot be presumed to be a bi-directionally unique 1 <-> 1
identity between a "work" in this sense and any image on Commons, nor
(unless decreed otherwise) an item on Wikidata.
I don't know the right way to go forward, which is why I started this
thread.
On the one hand, I'd like to avoid if possible a vast multiplication of
items on Wikidata, for all the reasons I brought up a couple of months
ago, when I wondered whether there should be an item created on Wikidata
for every present Commons Category -- something which made me uneasy.
But on the other hand, there is a huge virtue in consistency -- on there
being a particular place where you know a particular piece of
information will be (if it exists); rather than there being a complexity
of multiple places it could be, depending on whether this has an item or
not, or that has an item or not, or the other.
I think something we definitely do need is worked-through examples of
how data might be stored for some quite complicated cases, for people to
be able to discuss and critique, rather than only the most simple type
of cases discussed so far.
So, for example, suppose we had as a particular test-case the following:
An image that has been enhanced & overwritten by a User in 2014 -- based
on a scan from a set made and released by an Institution in 2012 -- of
an engraving published in an 1850s book -- but created and first
published in the 1830s, by an engraver after a sketching artist -- after
an oil painting (since destroyed) painted by an important painter in the
1540s.
How in detail do we think that might be stored, identifying the
different contributors and dates and contributions, so one could sort by
* (a) contributor and the nature of their contribution
-- eg best surviving representation of every known painting associated
with Holbein.
* (b) date and the nature of the contribution
-- eg best surviving representation of every known painting made in the
1540s
-- eg engravings first published in the 1830s
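
One possible shape for the test-case above, as a sketch only: each file
carries an ordered chain of "stages", each with its own contribution type,
contributor and date range. The class and field names are assumptions for
the example, the decades are stored as inclusive year ranges, and the
sketch's date is left unknown because the test-case does not give one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stage:
    contribution: str                  # e.g. "painting", "engraving", "scan"
    contributor: Optional[str]         # None where the test-case names nobody
    years: Optional[Tuple[int, int]]   # inclusive range; None if unknown


@dataclass
class FileRecord:
    name: str
    chain: List[Stage]                 # oldest stage first


test_case = FileRecord(
    name="example.jpg",  # placeholder file name
    chain=[
        Stage("painting", "important painter", (1540, 1549)),
        Stage("sketch", "sketching artist", None),
        Stage("engraving", "engraver", (1830, 1839)),
        Stage("book publication", None, (1850, 1859)),
        Stage("scan", "Institution", (2012, 2012)),
        Stage("enhancement", "User", (2014, 2014)),
    ],
)


def matches(record: FileRecord, contribution: str,
            contributor: Optional[str] = None,
            decade: Optional[int] = None) -> bool:
    """True if one single stage satisfies all the given criteria."""
    for stage in record.chain:
        if stage.contribution != contribution:
            continue
        if contributor is not None and stage.contributor != contributor:
            continue
        if decade is not None:
            if stage.years is None:
                continue
            first, last = stage.years
            # Require some overlap with the ten years starting at `decade`.
            if last < decade or first > decade + 9:
                continue
        return True
    return False
```

With this shape, matches(test_case, "painting", decade=1540) and
matches(test_case, "engraving", decade=1830) both succeed, while
matches(test_case, "painting", decade=1830) does not, because the date and
the contribution have to belong to the same stage. Whether each stage
would live on the file item, on a Wikidata item, or somewhere else is
exactly the open question.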
I don't know what the way forward is, but I think this is the kind of
information we ought to be able to represent, and the kind of sort we
ought to be able to do.
-- James.