Re: [Wikidata-l] [Multimedia] Inclusion criteria for Wikidata items for paintings, engravings, illustrations, manuscript folios, photographs, old postcards, etc ?

24 Oct 2014

Daniel and Gergo,

I've been thinking some more about Daniel's replies in the IRC chat last
week, about using qualifiers to handle the underlying works that images
are derived from, when the underlying work is neither (i) another image,
nor (ii) something with its own Wikidata item.

https://tools.wmflabs.org/meetbot/wikimedia-office/2014/wikimedia-office.20…

I think the issue I'm stuck on is: what property would the qualifier be
attached to?

For sets of images to have in mind, we might consider

https://commons.wikimedia.org/wiki/Category:Twenty-four_Views_by_Henry_Salt…

https://commons.wikimedia.org/wiki/Category:Pyne%27s_Royal_Residences

The first choice might be attaching the information to a "Creator" property.

But for the underlying works of these engravings, there are typically 
*two* creators, both of which are significant -- the artist, and the 
engraver.

So instead, we might consider an "Underlying work" property, analogous 
to the "Work" class in the Multimedia API development, "a creation to 
which copyright, authorship, etc is attached", as per

https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVj…

But can we then capture the whole of the work class in such a property?

There seem to me a couple of issues:
(1) What should the value of the property be?  There doesn't seem to be
an obvious choice (eg if one were importing from a repository or
catalogue).  What would the datatype be, and what should we store in
this field?

(2) It seems to me that we would need to enable qualifiers on qualifiers 
-- for example, if we represented the creator of an underlying engraving 
using a qualifier, we would then seem to need another qualifier to 
indicate whether the role was as artist, or as engraver.

Similarly, if there is sourcing, there are sources that might apply to 
one (1st level) qualifier, but not another.  But normally the WD 
sourcing model is for a whole statement, not part of it.
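A minimal sketch of point (2), assuming a simplified, hypothetical rendering of the Wikibase statement model: a statement has a main value, a flat list of qualifier snaks, and references that apply to the statement as a whole. The property names ("underlying work", "creator", "role"), the `NestedSnak` extension and "Engraver X" are illustrative only, not anything Wikibase offers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Snak:
    """A single property/value pair; in the flat model it has no qualifiers of its own."""
    prop: str
    value: str


@dataclass
class Statement:
    """Main value plus flat qualifiers; references apply to the statement as a whole."""
    prop: str
    value: str
    qualifiers: list[Snak] = field(default_factory=list)
    references: list[str] = field(default_factory=list)


# Both creators of the underlying engraving recorded as flat qualifiers.
stmt = Statement(
    prop="underlying work",
    value="view no. 1 (hypothetical label)",
    qualifiers=[Snak("creator", "Henry Salt"), Snak("creator", "Engraver X")],
    references=["catalogue entry (hypothetical)"],
)


def flat_roles(statement: Statement) -> dict[str, Optional[str]]:
    # Nothing in a flat qualifier records who was the artist and who the engraver.
    return {q.value: None for q in statement.qualifiers if q.prop == "creator"}


@dataclass
class NestedSnak:
    """Hypothetical extension: a qualifier carrying its own qualifiers and sources."""
    prop: str
    value: str
    qualifiers: list[Snak] = field(default_factory=list)
    references: list[str] = field(default_factory=list)


nested = [
    NestedSnak("creator", "Henry Salt", [Snak("role", "artist")], ["source A (hypothetical)"]),
    NestedSnak("creator", "Engraver X", [Snak("role", "engraver")]),
]


def nested_roles(snaks: list[NestedSnak]) -> dict[str, Optional[str]]:
    # With qualifiers on qualifiers, each creator can carry its role (and its own sources).
    return {
        s.value: next((q.value for q in s.qualifiers if q.prop == "role"), None)
        for s in snaks
        if s.prop == "creator"
    }
```

In the flat form both creators come back with no role, and the single reference list cannot say which creator it supports; the nested form answers both, at the price of a second level of qualifiers.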

What we would really be doing, if we did this in full, would be in
effect to store the contents of what might otherwise be an entire item
in a property.

That has some appeal, if at a future time one wanted to promote the
'underlying work' to a Wikidata item in its own right -- the two
structures would then match exactly.

But it would mean CommonsData having a slightly different data structure 
to Wikidata.
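To make the "structures would match exactly" point concrete, here is a rough sketch, assuming a hypothetical `Item` type that is used both for top-level items and for an item-shaped value embedded in a file item's "underlying work" property. `promote`, `ItemRef` and every ID and name below are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class ItemRef:
    """Pointer to a top-level item; the IDs used below are made up."""
    item_id: str


@dataclass
class Item:
    """Same shape whether stored top-level or embedded as a property value."""
    label: str
    statements: dict[str, list[Union[str, Item, ItemRef]]] = field(default_factory=dict)


# A file item whose "underlying work" value is an embedded, item-shaped record.
file_item = Item(
    "File:Example engraving.jpg (hypothetical)",
    {
        "underlying work": [
            Item("Example engraving", {"artist": ["Artist A"], "engraver": ["Engraver B"]})
        ]
    },
)


def promote(item: Item, prop: str, index: int, registry: dict[str, Item], new_id: str) -> Item:
    """Move an embedded value out to a top-level item, leaving a reference in its place."""
    embedded = item.statements[prop][index]
    if not isinstance(embedded, Item):
        raise TypeError(f"{prop}[{index}] is not an embedded item")
    registry[new_id] = embedded  # nothing needs converting: the shapes already match
    item.statements[prop][index] = ItemRef(new_id)
    return embedded
```

Promotion is then a move plus a pointer swap. The other side of the trade-off is visible in the type itself: file items must allow item-valued properties that ordinary Wikidata items do not have.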

It's maybe worth thinking also about what happens if information is 
sometimes stored on the Commons file item, and sometimes on a Wikidata item.

For example, if we are looking for views similar to the categories 
above, we might be searching for

* Best version of each engraving from the book-title, ordered by 
sequence number; or
* Best version of each engraving from the book-edition, ordered by page 
number;  or
* Best version of each engraving from a scan-set, ordered by scan number

Alternatively we might want to sort by artist; or engraver; or date of 
first publication (the engravings were often issued first as individual 
prints, or partwork sets, sometimes well before publication of a final 
volume).
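The three views listed above can be sketched as "filter, keep one file per engraving, then sort". This assumes each file record carries its own sequence, page and scan numbers, and that "best version" can be reduced to a single hypothetical `quality` score; none of these field names come from an existing schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FileRecord:
    name: str
    engraving: str                  # identifier of the underlying engraving
    quality: int                    # hypothetical score used to pick the "best version"
    sequence: Optional[int] = None  # position of the plate in the book-title
    edition: Optional[str] = None
    page: Optional[int] = None
    scanset: Optional[str] = None
    scan: Optional[int] = None


def best_versions(files: list[FileRecord]) -> list[FileRecord]:
    """Keep the highest-scoring file for each engraving."""
    best: dict[str, FileRecord] = {}
    for f in files:
        if f.engraving not in best or f.quality > best[f.engraving].quality:
            best[f.engraving] = f
    return list(best.values())


def view(files: list[FileRecord],
         where: Callable[[FileRecord], bool],
         order: Callable[[FileRecord], Any]) -> list[str]:
    """Filter, reduce to one file per engraving, then sort by the chosen key."""
    chosen = best_versions([f for f in files if where(f)])
    return [f.name for f in sorted(chosen, key=order)]


files = [
    FileRecord("a.jpg", "E1", 2, sequence=1, edition="ed-1", page=12, scanset="S1", scan=3),
    FileRecord("b.jpg", "E1", 3, sequence=1, edition="ed-2", page=340, scanset="S2", scan=40),
    FileRecord("c.jpg", "E2", 1, sequence=2, edition="ed-1", page=190, scanset="S1", scan=12),
]

by_title = view(files, lambda f: f.sequence is not None, lambda f: f.sequence)
by_edition = view(files, lambda f: f.edition == "ed-1", lambda f: f.page)
by_scanset = view(files, lambda f: f.scanset == "S1", lambda f: f.scan)
```

Note that the "best version" of E1 differs between the book-title view (b.jpg) and the edition or scan-set views (a.jpg), because the filter runs before the reduction.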

If we're looking to support these searches and orderings, does it matter
that a particular field may sometimes be on the file item, but sometimes
on a Wikidata item?

(For example, in the Henry Salt set, suppose our policy were that an
engraving of which we have only one copy gets only a file item, while an
engraving of which we have several copies gets a Wikidata item to store
their common information.

Would it matter that, for the one engraving of which we have two copies,
the information we would want for search, selection and ordering would
be stored on a Wikidata item, whereas for all the rest, with only a
single copy, it would be stored on a Commons item?)
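One way to see what that mixed policy costs is a small sketch, assuming two hypothetical in-memory stores (file items and Wikidata-style items) and a made-up "work item" link from a file to the shared item. All IDs, names and years are invented for illustration.

```python
from typing import Any, Optional

# Hypothetical stores, keyed by ID; names, IDs and years are made up.
# Plate 1 has a single copy, so its data sits on the file item.
# Plate 2 has two copies, so their shared data sits on a Wikidata-style item.
file_items: dict[str, dict[str, Any]] = {
    "File:Plate 1.jpg": {"engraver": "Engraver X", "first published": 1809},
    "File:Plate 2a.jpg": {"work item": "Q-w2"},
    "File:Plate 2b.jpg": {"work item": "Q-w2"},
}
wikidata_items: dict[str, dict[str, Any]] = {
    "Q-w2": {"engraver": "Engraver Y", "first published": 1805},
}


def resolve(file_id: str, prop: str) -> Optional[Any]:
    """Read prop from the file item, else follow its (hypothetical) 'work item' link."""
    item = file_items[file_id]
    if prop in item:
        return item[prop]
    linked = item.get("work item")
    if linked is not None:
        return wikidata_items.get(linked, {}).get(prop)
    return None


# Every query has to go through resolve(); a lookup on one store alone would miss data.
by_date = sorted(file_items, key=lambda fid: resolve(fid, "first published"))
```

The sort works, but only because every reader knows about the fallback; any tool that queries just the file items, or just the Wikidata items, sees an incomplete set.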

None of these questions is without a solution.  But they do, I think,
require us to reach a decisive view on what we propose to do.

Thanks for all your work on this,

All best,

     James.

On 16/10/2014 18:56, James Heald wrote:
...
  On 13/10/2014 13:03, Daniel Kinzler wrote:
  On 13.10.2014 00:17, Jane Darnell wrote:
  I think the place for all data about an image should be Wikidata.

 Do you really mean *any* image?

 E.g., if we have a scan of an old book with 50 engravings, do you want
 to make a Wikidata item for each engraving? Or just for the book?
 Engravings are often simple illustrations, not notable in and of
 themselves, and there is frequently very little we can say about them,
 except for which book they were published in.

 It seems to me that it makes more sense to just model the book on
 Wikidata, not each illustration (or even every page, including the
 text-only ones, in case they are extracted to a png file or
 something).

 Thinking about books of engravings, eg a set like this:

https://commons.wikimedia.org/wiki/Category:Views_of_the_Seats_of_Noblemen_…

 There is a fair amount one can say about each of these engravings: what
 the subject is; and where that location is; who was the artist, and who
 was the engraver; when the engraving was first published (which may or
 may not be the same as the date at which it was first collected).

 We probably also want to identify the *edition* of the book it was taken
 from, and probably also the scan-set -- each with a page number or
 sequence number, so the set can be easily retrieved and displayed in the
 right order.

 In terms of items required, at the moment membership of a scan-set or an
 edition of the book might be handled by membership of a category.  It's
 not clear how it is intended to represent such categories and their
 memberships in the new structured approach.  Does one associate the
 scanset item directly with a category?  Or is the scanset item its own
 thing, that one maps the category onto?  And is the scanset an item on
 Wikidata, or an item somewhere else?

 A further issue arises when we have more than one copy of the same
 engraving.

 eg:

https://commons.wikimedia.org/wiki/File:Neale%281818%29_p6.190_-_Fleurs,_Ro…

 https://www.wikidata.org/wiki/File:MA%281829%29_p.340_-_Fleurs_-_John_Prest…

 At the moment on Commons one can make a gallery of "other versions" on
 the filepage, each with a short footer to explain what that version is.

 So it probably makes sense to be able to record that we have multiple
 representations or versions of the same basic thing, which presumably
 means some kind of object to represent that basic thing - here an
 engraving.
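A minimal sketch of such an object, assuming a hypothetical `Engraving` record that lists its versions, each with the kind of short note the "other versions" footers carry today. The file names are placeholders; the notes echo the two examples above.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    file: str
    note: str  # the short footer now written by hand in an "other versions" gallery


@dataclass
class Engraving:
    """The 'basic thing' of which several files are versions."""
    title: str
    versions: list[Version] = field(default_factory=list)

    def other_versions(self, current_file: str) -> list[Version]:
        # What a file page could list automatically instead of a hand-built gallery.
        return [v for v in self.versions if v.file != current_file]


fleurs = Engraving("Fleurs", [
    Version("File:copy-1.jpg", "Neale(1818) p6.190"),
    Version("File:copy-2.jpg", "MA(1829) p.340"),
])
```

Each file page would then derive its gallery from the shared object, rather than every copy maintaining its own list of the others.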

 Turning to Gergo's model of "squashing" all of the information onto a
 limited number of nodes (ie an item per file, plus some floating items
 on Wikidata), and just making information into properties of those
 items, I think there is a problem.

 The specific thing is that we want to associate various properties
 together, as all being tied to a particular stage of development of the
 work -- ie a distinguishable "work" entity, in the language of the draft
 "Multimedia data model" API at
 https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVj…

 In particular, in the case of rights information, we need to carefully
 associate the rights information with the other fields it relates to:
 the author, the date, the nature of the contribution, the act of
 licensing or release or assessment.

 This is tricky because there may be multiple "stages of development"
 associated with a single file, each with its own
 author/date/contribution/license information.  Yet there may
 nevertheless only be the one file on Commons.

 Even if the image has been 'restored' by a Commons user, this will not
 necessarily generate a separate file -- standard practice for many
 restorers (myself included) is to upload the restored version over the
 previous version, so the reader can easily compare the two by looking at
 the file history (and access an earlier version to download, if they so
 wish).
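A rough sketch of one file carrying several stages of development, assuming a hypothetical `Stage` record that keeps author, date, contribution and rights together. The names, the year 1835 and the rights strings are placeholders, not real licence assessments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One 'work' in the derivative chain, carrying its own rights-relevant fields."""
    contribution: str
    author: str
    year: int
    rights: str  # illustrative placeholder, not a real licence assessment


@dataclass
class CommonsFile:
    title: str
    stages: list[Stage]   # oldest first; several stages may share one file
    revisions: list[str]  # upload history; an overwrite adds a revision, not a file


cf = CommonsFile(
    "File:Example engraving.jpg (hypothetical)",
    stages=[
        Stage("engraving", "Engraver E", 1835, "public domain (assumed)"),
        Stage("scan", "Institution I", 2012, "as released by the institution"),
        Stage("restoration", "User U", 2014, "as licensed by the restorer"),
    ],
    revisions=["2012 upload of the scan", "2014 restored version, uploaded over it"],
)


def rights_statements(f: CommonsFile) -> list[tuple[str, str, int, str]]:
    """Each stage's contribution, author, date and rights, kept together per stage."""
    return [(s.contribution, s.author, s.year, s.rights) for s in f.stages]
```

Three stages, two revisions, one file: the rights information hangs off the stages, not off the file or any single revision.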

 (Another example could be where we may want to associate a particular
 music file of a piece of classical music with a particular modern
 edition of the score, even if the piece was originally from the 18th
 century.  Even if the only file we have is the recording, we still need
 to be able to reflect the rights in the score.)

 Another important class of data is date information.  There may be
 multiple dates associated with an image -- and we may want to sort, or
 filter, or order by any of them.  But to be meaningful, we don't really
 want to associate the dates with the image, but rather with a stage of
 development in the derivative chain that has led to the image.  So
 again, the idea of what the API calls a "work" comes to the fore; but
 again, there cannot be presumed to be a bi-directionally unique 1 <-> 1
 identity between a "work" in this sense and any image on Commons, nor
 (unless decreed otherwise) an item on Wikidata.

 I don't know the right way to go forward, which is why I started this
 thread.

 On the one hand, I'd like to avoid if possible a vast multiplication of
 items on Wikidata, for all the reasons I brought up a couple of months
 ago, when I wondered whether there should be an item created on Wikidata
 for every present Commons Category -- something which made me uneasy.

 But on the other hand, there is a huge virtue in consistency -- on there
 being a particular place where you know a particular piece of
 information will be (if it exists); rather than there being a complexity
 of multiple places it could be, depending on whether this has an item or
 not, or that has an item or not, or the other.

 I think something we definitely do need is worked-through examples of
 how data might be stored for some quite complicated cases, for people to
 be able to discuss and critique, rather than only the most simple type
 of cases discussed so far.

 So, for example, suppose we had as a particular test-case the following:

 An image that has been enhanced & overwritten by a User in 2014 -- based
 on a scan from a set made and released by an Institution in 2012 -- of
 an engraving published in an 1850s book -- but created and first
 published in the 1830s, by an engraver after a sketching artist -- after
 an oil painting (since destroyed) painted by an important painter in the
 1540s.

 How in detail do we think that might be stored, identifying the
 different contributors and dates and contributions, so one could sort by

 * (a) contributor and the nature of their contribution
 -- eg best surviving representation of every known painting associated
 with Holbein.
 * (b) date and the nature of the contribution
 -- eg best surviving representation of every known painting made in the
 1540s
 -- eg engravings first published in the 1830s
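One possible storage for the test case, sketched under stated assumptions: each stage is a hypothetical `Stage` record linked to the stage it was derived from, dates are stored as earliest/latest years on the stage rather than on the image, and the engraving's creation and first publication are treated as one stage for brevity. "Best surviving representation" is taken, as an assumption, to be the work itself if it survives, else its most derived surviving descendant. All identifiers and names are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    work: str                         # hypothetical work identifier
    contribution: str
    contributor: str                  # hypothetical names throughout
    years: Optional[tuple[int, int]]  # earliest/latest year; None if unknown
    derived_from: Optional[str] = None
    surviving: bool = True


# The test case, oldest stage first.
chain = [
    Stage("W1", "painting", "Painter P", (1540, 1549), surviving=False),
    Stage("W2", "sketch", "Sketching artist S", None, "W1"),
    Stage("W3", "engraving", "Engraver E", (1830, 1839), "W2"),
    Stage("W4", "book edition", "Publisher B", (1850, 1859), "W3"),
    Stage("W5", "scan", "Institution I", (2012, 2012), "W4"),
    Stage("W6", "restoration", "User U", (2014, 2014), "W5"),
]
by_id = {s.work: s for s in chain}


def descendants(work_id: str) -> list[Stage]:
    """Stages derived, directly or indirectly, from work_id (relies on oldest-first order)."""
    found, seen = [], {work_id}
    for s in chain:
        if s.derived_from in seen:
            found.append(s)
            seen.add(s.work)
    return found


def best_surviving(work_id: str) -> Optional[Stage]:
    # Assumption: the work itself if it survives, else its most derived surviving descendant.
    stage = by_id[work_id]
    if stage.surviving:
        return stage
    survivors = [d for d in descendants(work_id) if d.surviving]
    return survivors[-1] if survivors else None


def in_decade(stage: Stage, decade: int) -> bool:
    return stage.years is not None and decade <= stage.years[0] and stage.years[1] <= decade + 9


# (a) by contributor and the nature of the contribution
paintings_by_p = [best_surviving(s.work) for s in chain
                  if s.contribution == "painting" and s.contributor == "Painter P"]
# (b) by date and the nature of the contribution
paintings_1540s = [best_surviving(s.work) for s in chain
                   if s.contribution == "painting" and in_decade(s, 1540)]
engravings_1830s = [s.work for s in chain
                    if s.contribution == "engraving" and in_decade(s, 1830)]
```

Both painting queries land on the 2014 restoration (W6), since the painting itself no longer survives, while the 1830s filter picks out the engraving stage of the same chain; each date is read from its own stage, never from the file.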

 I don't know what the way forward is, but I think this is the kind of
 information we ought to be able to represent, and the kind of sorting we
 ought to be able to do.

    -- James.

