On 4 sep. 2013, at 18:59, Brian Wolff <bawolff(a)gmail.com> wrote:
On 9/1/13, Jean-Frédéric <jeanfrederic.wiki(a)gmail.com> wrote:
[..]
The downside to this is that, in order to effectively get metadata out of
Commons given the current practices, one essentially has to screen
scrape and do slightly ugly things.
This [1] looks quite acrobatic indeed. Can’t we make better use of the
machine-readable markings provided by templates?
<https://commons.wikimedia.org/wiki/Commons:Machine-readable_data>
[1] https://gerrit.wikimedia.org/r/#/c/80403/4/CommonsMetadata_body.php
It is using the machine-readable data from that page. (Although it's
debatable how machine-readable "Look for a <td> with this id, and then
look at the contents of the next sibling <td> you encounter" is.)
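For readers who have not seen this kind of scraping, here is a minimal sketch of the "find the <td> with this id, then read the next <td>" lookup. It is not the code in the patch; it uses only Python's standard html.parser. The class and function names are made up for illustration, the id `fileinfotpl_aut` and the sample markup are assumptions, and the sketch simply takes the first <td> opened after the marker cell, so nested tables inside the marker cell are not handled.

```python
from html.parser import HTMLParser


class NextCellScraper(HTMLParser):
    """Capture the text of the first <td> that follows a <td> with a given id."""

    def __init__(self, marker_id):
        super().__init__()
        self.marker_id = marker_id
        # State machine: searching -> marker_seen -> capturing -> done
        self.state = "searching"
        self.depth = 0  # nesting depth of <td> elements while capturing
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        if self.state == "capturing":
            self.depth += 1  # a nested cell inside the value cell
        elif self.state == "searching" and dict(attrs).get("id") == self.marker_id:
            self.state = "marker_seen"
        elif self.state == "marker_seen":
            self.state = "capturing"
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "td" and self.state == "capturing":
            self.depth -= 1
            if self.depth == 0:
                self.state = "done"

    def handle_data(self, data):
        if self.state == "capturing":
            self.parts.append(data)


def next_cell_text(html, marker_id):
    """Return the stripped text of the value cell, or None if it was not found."""
    scraper = NextCellScraper(marker_id)
    scraper.feed(html)
    scraper.close()
    if scraper.state != "done":
        return None
    return "".join(scraper.parts).strip()


# Hypothetical markup, shaped like a templated information table.
SAMPLE = (
    '<table><tr>'
    '<td id="fileinfotpl_aut">Author</td>'
    '<td><a href="/wiki/User:Example">Example User</a></td>'
    '</tr></table>'
)
print(next_cell_text(SAMPLE, "fileinfotpl_aut"))
```

The lookup depends entirely on the order of cells in the rendered HTML, which is why it is fair to call it only loosely machine-readable.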
Almost all of that is templated, so of course we can choose to actually fix some of those
templates if we really want to. Especially for the licenses, my intent was EXACTLY to
feed a system like the one you are building right now, while at the same time making Magnus'
StockPhoto gadget possible for the immediate future, so I love what you are doing here.
I have not had time to read your patches unfortunately, but can I suggest creating a
separate table of licenses? The licenses are very well suited as 'managed' data
units I think and would give you a lot of flexibility. You could have columns like:
id, abbreviation, short name, long name, license version, long description page, default
template, scrapeid, canonical license URL, canonical RDFa, PD/CC, BY, NC, SA, other
properties of the license requirements
Then use the 'scrapeid' to link the licenses to the file metadata. That will make it
a lot easier to search through the database and to dynamically give suitable
representations of the license in different forms (very short linked, long linked,
full text, full linked) and in different languages.
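To make the idea a bit more concrete, here is a rough sketch of such a table and of the lookup through 'scrapeid', using Python's built-in sqlite3 with an in-memory database. Everything specific in it is an assumption for illustration: the column names, the `file_metadata` table, the scrapeid value, the sample row and the `render_license` helper. Only a subset of the columns listed above is included, and translations are left out.

```python
import sqlite3

# Hypothetical schema: a managed licenses table plus per-file metadata
# that points at a license through its scrapeid.
SCHEMA = """
CREATE TABLE licenses (
    id               INTEGER PRIMARY KEY,
    abbreviation     TEXT NOT NULL,
    long_name        TEXT NOT NULL,
    version          TEXT,
    default_template TEXT,
    scrapeid         TEXT NOT NULL UNIQUE,
    canonical_url    TEXT,
    family           TEXT,      -- 'PD' or 'CC'
    attribution      INTEGER,   -- BY
    noncommercial    INTEGER,   -- NC
    share_alike      INTEGER    -- SA
);
CREATE TABLE file_metadata (
    file_name        TEXT PRIMARY KEY,
    license_scrapeid TEXT REFERENCES licenses(scrapeid)
);
"""


def build_demo_db():
    """Create an in-memory database with one license and one file."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO licenses (abbreviation, long_name, version, default_template,"
        " scrapeid, canonical_url, family, attribution, noncommercial, share_alike)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            "CC BY-SA 3.0",
            "Creative Commons Attribution-ShareAlike 3.0 Unported",
            "3.0",
            "Cc-by-sa-3.0",
            "cc-by-sa-3.0",  # illustrative scrapeid, not a confirmed value
            "https://creativecommons.org/licenses/by-sa/3.0/",
            "CC", 1, 0, 1,
        ),
    )
    conn.execute(
        "INSERT INTO file_metadata (file_name, license_scrapeid) VALUES (?, ?)",
        ("Example.jpg", "cc-by-sa-3.0"),
    )
    return conn


def render_license(conn, file_name, style="short"):
    """Render a file's license as 'short', 'short_linked', 'long' or 'long_linked'."""
    row = conn.execute(
        "SELECT l.abbreviation, l.long_name, l.canonical_url"
        " FROM file_metadata f"
        " JOIN licenses l ON l.scrapeid = f.license_scrapeid"
        " WHERE f.file_name = ?",
        (file_name,),
    ).fetchone()
    if row is None:
        return None
    abbreviation, long_name, url = row
    name = abbreviation if style.startswith("short") else long_name
    return f"{name} <{url}>" if style.endswith("linked") else name


conn = build_demo_db()
print(render_license(conn, "Example.jpg", "short"))
print(render_license(conn, "Example.jpg", "long_linked"))
```

Because the license properties live in their own columns, a search such as "all files under a share-alike license" becomes a plain join with `WHERE l.share_alike = 1` instead of another scraping pass.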
For the other metadata it would also be very nice to take a much more structured and even
Wikidata-like approach, but I think a licenses table is much simpler than most other metadata,
would give us a lot of flexibility and advantages, and would be easy to import into
Wikidata once we think we are up to that. Something to consider.
DJ