If anyone has a moment today and would like to comment, I would
appreciate extra eyes on the test upload of images on beta at:
http://commons.wikimedia.beta.wmflabs.org/w/index.php?title=Category:Images…
The way GWT handles the Artwork template is not ideal, and things like
the use of a hyphen in the filename, or the creator template for
non-existent authors are not user customizable. If I proceed with this
run as is, then I will also need to run a little post-upload
housekeeping on the way apostrophes have been converted to HTML
codes.
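
For the post-upload housekeeping, a minimal sketch in Python. It assumes the damage is limited to the usual HTML character references for the apostrophe; `restore_apostrophes` is a hypothetical helper name, and fetching and saving the page text is left out.

```python
# Character references that stand for the apostrophe (assumed list;
# extend it if the uploaded pages show other forms).
APOSTROPHE_REFS = ("&#39;", "&#039;", "&#x27;", "&apos;")


def restore_apostrophes(wikitext: str) -> str:
    """Turn HTML character references for the apostrophe back into "'"."""
    # Only apostrophe references are touched, so deliberate entities
    # such as &amp; or &lt; elsewhere in the page text are left alone.
    for ref in APOSTROPHE_REFS:
        wikitext = wikitext.replace(ref, "'")
    return wikitext
```

Restricting the replacement to the apostrophe keeps the clean-up narrow, so wiki markup for italics and bold comes back without disturbing other escaped characters.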
This run should be around 2,000 images and is limited to pre-20th
century artworks from Japan. If this works well, then I'll look at
other collection themes. I may need to tweak some category mapping in
the background but the source of metadata as well as the image is now
direct from the Rijksmuseum. Unfortunately there was limited reliable
metadata in English, so I have stuck to one language.
I suggest comments are raised on the project page at
<https://commons.wikimedia.org/wiki/Commons:Batch_uploading/Art_of_Japan_in_…>
(opinions section) unless they are of more general interest for all
GWT users.
Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
I am at the final test run stage of a Rijksmuseum upload and getting
annoying results when using the Artwork template in GWT, compared to
my experience when using the Information template.
FILENAME
The filename appears to be built from <title> + <title identifier> +
{mime-type}. This is incredibly annoying, as I would like to be able to
specify the filename myself. The formula means that I cannot identify
the language of the Artwork title parameter (for example with
{{nl | 1=<title>}}, as this puts leading junk in the filename), and it
gives the uploader no control over the extension (i.e. it forces "jpeg"
instead of the community norm of "jpg").
Are there reasons for this approach, or could it be improved?
HTML TRANSLATION
A second trap is that using " ' " in wikicode is impossible, as the
apostrophe is translated into an HTML code. Similarly "<br>" vanishes. This makes
formatting of text inside the Artwork template extremely limited and
is not the approach used for the Information template. There could be
some underpinning rationale due to the differences in these templates,
but the behaviour appears quite inconsistent to me.
Fae
Hi,
I'm about to upload a few hundred images that have been released by the
British Library.
I am all set to go, with carefully designed Commons filenames; but the
GWToolset uploader is wrecking all the commas and brackets in my filenames.
What I want is:
File:Large flowering sensitive plant (Mimosa grandiflora) - New
illustration of the Sexual System of Carolus von Linnaeus (1807) - BL.jpg
File:Cherries - Pomona Britannica (1812), pl.10 - BL.jpg
File:Rape threshing - The costume of Yorkshire (1814), plate XV - BL.jpg
What it's giving me is:
File:Large flowering sensitive plant -Mimosa grandiflora- - New
illustration of the Sexual System of Carolus von Linnaeus -1807- - BL.jpg
File:Cherries - Pomona Britannica -1812-- pl.10 - BL.jpg
File:Rape threshing - The costume of Yorkshire -1814-- plate XV - BL.jpg
How do I turn this behaviour off, please, or how do I work around it, to
get the more easily human-readable names that I want?
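
A minimal sketch in Python for spotting affected names before an upload. The mapping is inferred only from the three examples above (opening bracket, closing bracket and comma each become a hyphen). It is not taken from the GWToolset source, and `predict_observed_name` and `OBSERVED_REPLACEMENTS` are hypothetical names.

```python
# Replacements inferred from the examples in this message only;
# the real tool may well change other characters too.
OBSERVED_REPLACEMENTS = {"(": "-", ")": "-", ",": "-"}


def predict_observed_name(filename: str) -> str:
    """Apply the substitutions seen in the examples to a filename."""
    return "".join(OBSERVED_REPLACEMENTS.get(ch, ch) for ch in filename)


def will_change(filename: str) -> bool:
    """Report whether a filename contains any character seen to be replaced."""
    return predict_observed_name(filename) != filename
```

Running `will_change` over the filenames in the XML would give a list of the titles to rename or fix by hand afterwards.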
Thanks,
James Heald.
Hi,
I had an odd problem with files not being created, which I think I can
put down to how long filenames are handled by GWT.
As an example, my xml specified (A) but GWT created (B):
A. File:Index Map No.2 of a part of Suffolk County. South Side - Ocean
Shore, Long Island. Part of Islip and Part of Brookhaven. Published by
E. Belcher Hyde. 97 Liberty Street, Brooklyn. 5 Beekman Street,
NYPL1633883.tiff (209 chars) (see link)
B. File:Index Map No. 2 of a part of Suffolk County. South Side -
Ocean Shore, Long Island. Easthampton. Published by E. Belcher Hyde.
97 Liberty Street, Brooklyn. 5 Beekman Street, Manhattan. 1916. Volume
NYPL1633.tiff (206 chars)
This seems an easy thing to warn the user about when reading the XML.
In terms of behaviour, I would expect the tool to reject the XML as
malformed and warn about the maximum allowed filename length, rather
than truncate the name. In this case truncation meant corrupting the
unique NYPL identifier.
It would be better if GWT allowed the maximum title length that
Commons allows (240 bytes, the number of visible characters varying by
charset).
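
A minimal sketch in Python of the pre-upload check I have in mind. It assumes the 240-byte figure above, UTF-8 encoding, and that the length is measured without the "File:" prefix; `title_too_long` and `overlong_titles` are hypothetical helper names.

```python
MAX_TITLE_BYTES = 240  # figure quoted above; treated here as an assumption


def title_too_long(filename: str, limit: int = MAX_TITLE_BYTES) -> bool:
    """True if the UTF-8 encoded filename exceeds the byte limit."""
    # Length is counted in bytes, not characters: "ö" takes two bytes.
    return len(filename.encode("utf-8")) > limit


def overlong_titles(filenames, limit: int = MAX_TITLE_BYTES):
    """Return (filename, byte_length) pairs for every name over the limit."""
    return [
        (name, len(name.encode("utf-8")))
        for name in filenames
        if title_too_long(name, limit)
    ]
```

Rejecting or listing overlong names up front avoids silent truncation, which is what damaged the identifier in the example.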
I vaguely recall the Steering Committee discussing this last year, so
I'm unsure if this is worth raising in bugzilla. Suggestions?
Links
1. https://commons.wikimedia.org/wiki/File:Index_Map_No.2_of_a_part_of_Suffolk…
2. https://bugzilla.wikimedia.org/show_bug.cgi?id=30202
3. https://commons.wikimedia.org/wiki/Commons:Filenames
Fae
Does anyone have a working definition of characters allowed in the
filenames that I could apply in my pre-processing of the xml files?
See [1] and [2] for the technical standards that apply by default.
In my NYPL uploads I have found that characters in the chosen filename like:
* ö (o umlaut)
* Æ (upper case ash / ae ligature)
caused GWT to halt the upload at that point (no warning back to me).
These characters should be acceptable to the MediaWiki software.
These characters seem to be okay in the image page body, just not the
filename. Other characters like é (e acute) appear to process fine.
For the 18th century and earlier maps from the NYPL, this is a major
time-sink. :-(
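
A minimal sketch in Python of the kind of pre-processing report that would help here. It simply lists every non-ASCII character in a filename with its code point and Unicode name, so suspect titles can be reviewed before upload. It does not know which characters GWT actually rejects, and `non_ascii_report` is a hypothetical helper name.

```python
import unicodedata


def non_ascii_report(filename: str):
    """List (character, code point, Unicode name) for non-ASCII characters."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in filename
        if ord(ch) > 127
    ]
```

An empty list means the filename is plain ASCII. Anything else is a candidate for manual checking against the characters that have halted uploads so far.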
Links
1. https://commons.wikimedia.org/wiki/MediaWiki:Filename-prefix-blacklist
2. https://commons.wikimedia.org/wiki/MediaWiki:Titleblacklist
Fae
Hi,
On behalf of the Amsterdam Museum I'm prepping a batch upload of about 300 images of their collection of paintings. They've made the selection.
I note that a number of their paintings have already been uploaded by individual Commonists here: https://commons.wikimedia.org/wiki/Category:Paintings_in_the_Amsterdam_Muse…
Question: Should I upload all images in "my" batch anyway, even though this risks duplicating images? Is there a best practice for cases like this?
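
One possible way to spot exact duplicates before uploading, as a minimal sketch in Python. It assumes the MediaWiki API's `list=allimages` query with its `aisha1` parameter, and it only builds the query URL and does not send it. `duplicate_query_url` is a hypothetical helper name. A SHA-1 match finds byte-identical files only, so a re-encoded or resized copy of the same painting would be missed.

```python
import hashlib
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def duplicate_query_url(data: bytes) -> str:
    """Build an API URL that lists Commons files with the same SHA-1."""
    sha1 = hashlib.sha1(data).hexdigest()
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1,  # hex SHA-1 of the file contents
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"
```

Passing the bytes of each image in the batch gives one URL per file. A non-empty `allimages` list in the response would indicate an identical file already on Commons.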
Cheers,
David Haskiya
Product Development Manager
T: +31 (0)70 314 0696
M: +31 (0)64 217 2542
E: david.haskiya(a)europeana.eu
Skype: davidhaskiya
Hi Charles,
I am arriving at 10.25 in Neuchâtel. Held up at the kindergarten. Will be at UNINE asap.
Best,
Rromir
-------- Original message --------
From: charles andrès <charles.andres(a)wikimedia.ch>
Date: 03/05/2014 13:32 (GMT+01:00)
To: Conversations revolving around the development of GLAM Digital Tools <glamtools(a)lists.wikimedia.org>
Subject: Re: [Glamtools] Advice on uploading a batch from a GLAM when individuals have already uploaded some of that GLAMs images?
>
> It probably should still output a warning and list all identical files, so they can be tackled manually after the upload.
> Giving preference to the media file from the GLAM probably makes sense, but you still want to substitute any other identical files, right?
>
Do I understand correctly that you suggest replacing volunteer-uploaded files with GLAM-uploaded files?
charles
_______________________________________________
Glamtools mailing list
Glamtools(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/glamtools