Replies in-line.
On 24 May 2016 at 06:57, Dr. Trigon <dr.trigon(a)surfeu.ch> wrote:
>> * incomplete uploads resulting from server failures. Checksum
>> comparisons would mean re-downloading files, which would be
>> unnecessarily bandwidth expensive, but local image analysis would
>> highlight these.
>
> What about local checksum comparison?
Yes, we have SHA1 values for the Commons-hosted images. However, a
local checksum is not normally available from the source (e.g. NYPL),
which means re-downloading the original to do the comparison. As some
of my uploads are over 100 MB for one page, it's an expensive solution.
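
For the case where the original file is still on local disk after the upload run, the comparison can be done without any re-download. The following is only a minimal sketch of that idea: the function names are hypothetical, and it assumes the SHA1 value reported for the Commons copy is available as a hexadecimal string.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of a file, reading it in chunks
    so that a 100 MB scan is never held in memory all at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_commons_sha1(path, commons_sha1_hex):
    """Compare a local file against the SHA-1 reported for the Commons copy.

    commons_sha1_hex is assumed to be a hexadecimal string (hypothetical
    input; how it is fetched is outside this sketch).
    """
    return sha1_of_file(path) == commons_sha1_hex.strip().lower()
```

A mismatch would flag the file as a candidate incomplete upload. This does not help when the local copy has already been discarded, which is the expensive case described above.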
>> * uploads that are mostly blank pages in old scanned books. I have a
>> simple detection process, but it would be neat to have a more common
>> standard way of doing this.
>
> Depends on the format. For PDF you can try to use Poppler/poppler-utils or
> MuPDF. For images it will be a bit more involved ... but interesting.
Formats are normally JPEG or TIFF. My blank detection analyses
pixel colour deviations over parts of the image to deduce if it looks
blank. This uses the basic Python Imaging Library rather than any
sophisticated math. It can happen pre-upload by testing a
client-side image. See
<https://commons.wikimedia.org/wiki/User:Fae/Project_list/Internet_Archive#B…>
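
To make the idea of "pixel colour deviations over parts of the image" concrete, here is a rough sketch and not the actual script linked above. It assumes the page has already been loaded and converted to a flat, row-major sequence of 0-255 grey values (that step is left out), splits the page into a grid of tiles, and calls the page blank when nearly all tiles show very little deviation. The function name, the tile count and both thresholds are hypothetical and would need tuning on real scans.

```python
from statistics import pstdev


def looks_blank(pixels, width, height, tiles=4,
                max_stddev=8.0, min_blank_share=0.9):
    """Guess whether a greyscale page is mostly blank.

    pixels: flat, row-major sequence of 0-255 grey values.
    A tile counts as blank when the standard deviation of its
    grey values is at most max_stddev (assumed threshold).
    """
    blank = 0
    total = 0
    for ty in range(tiles):
        y0, y1 = ty * height // tiles, (ty + 1) * height // tiles
        for tx in range(tiles):
            x0, x1 = tx * width // tiles, (tx + 1) * width // tiles
            tile = [pixels[y * width + x]
                    for y in range(y0, y1) for x in range(x0, x1)]
            if not tile:
                continue  # image smaller than the grid in this direction
            total += 1
            if pstdev(tile) <= max_stddev:
                blank += 1
    # Blank if enough of the tiles are near-uniform.
    return total > 0 and blank / total >= min_blank_share
```

Working per tile rather than over the whole page means a small stamp or page number in one corner does not stop an otherwise empty page from being flagged, while a page with text spread across it is not.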
...
>> Hi Fæ,
>>
>> Thanks a lot for the ideas!
>> The ideas you mentioned are awesome, and something I'll definitely look
>> into!
>>
>> The second and third ideas mentioned are, I believe, doable within the
>> scope of my GSoC. For the first idea to be implemented, as you mentioned,
>> local image analysis would be needed, which we've not planned (but I'll add
>> it to the "to plan" list :) ). Currently we're planning on downloading the
>> image and performing the analysis on ToolsLab or a personal computer.
>>
>> Thank you for the project list! I was looking for a good dataset to test
>> things out on and this will be immensely helpful.
>>
>> Regards
>> Abdeali JK
>>
>> On Wed, May 18, 2016 at 5:25 PM, Fæ <faewik(a)gmail.com> wrote:
>>>
>>> (Just replying on Commons-l with a non-tech observation. If more tech
>>> stuff arises I'll add it to Phabricator instead)
>>>
>>> This looks like a useful contained project, though a lot to be done in
>>> 12 weeks. :-)
>>>
>>> I was not familiar with catimages.py. It would be great if using the
>>> module for the preparation or housekeeping of large batch uploads were
>>> easy and not time consuming to try. As Commons grows we are seeing
>>> more donations over 10,000 images and have had a few with over 1m.
>>> Uploads of this size make manual categorization a huge hurdle, so
>>> automatic 'tagging' of image characteristics would be a useful way of
>>> breaking down such a large batch to highlight the more interesting
>>> outliers or mistakes, which can then be prioritized on a backlog for
>>> human review.
>>>
>>> For example, in my upload projects I have problems detecting:
>>> * incomplete uploads resulting from server failures. Checksum
>>> comparisons would mean re-downloading files, which would be
>>> unnecessarily bandwidth expensive, but local image analysis would
>>> highlight these.
>>> * uploads that are mostly blank pages in old scanned books. I have a
>>> simple detection process, but it would be neat to have a more common
>>> standard way of doing this.
>>> * distinguishing between scans with diagrams and line
>>> drawings/cartoons, printed old photographs, newsprint and text pages.
>>>
>>> It would be great if the testing routines you use during the project
>>> could tackle any of these and be written up as practical case studies.
>>>
>>> As well as the Phabricator write-up/tracking of the project, it would
>>> be useful to have an on-wiki Commons or Mediawiki user guide. Perhaps
>>> this can be sketched out as you go along during the project, giving an
>>> insight into what other users or amateur Python programmers might do
>>> to customize or make better use of the module? Having an easier-to-find
>>> manual might avoid others going off on their own tangents using
>>> various off-the-shelf image modules, when they could just plug in
>>> catimages with a smallish amount of configuration.
>>>
>>> P.S. If you would like to test the tool on some large collections with
>>> predictable formats, try looking through
>>> <https://commons.wikimedia.org/wiki/User:Fae/Project_list>. The 1/2
>>> million images in the book plates project would be an interesting
>>> sample set.
>>>
>>> Thanks,
>>> Fae
>>>
>>> On 18 May 2016 at 02:53, Abdeali Kothari <abdealikothari(a)gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > I'm a student from Chennai, India and my project is going to be related
>>> > to performing image processing on the images on commons.wikimedia to
>>> > automate categorization. DrTrigon wrote the script catimages.py a few
>>> > years ago for the old pywikipedia-bot framework. I'll be working
>>> > towards updating the script to the pywikibot-core framework, updating
>>> > its dependencies, and using newer techniques when possible.
>>> >
>>> > catimages.py is a script that analyzes an image using various computer
>>> > vision algorithms and assigns categories to the image on Commons. For
>>> > example, consider algorithms that detect faces, barcodes, etc. The
>>> > script uses these to categorize images to Category:Unidentified People,
>>> > Category:Barcode, and so on.
>>> >
>>> > If you have any suggestions and categorizations you think might be
>>> > useful to you, drop in at #gsoc-catimages on freenode or my talk
>>> > page[0]. You can find out more about me on User:AbdealiJK[1] and about
>>> > the project at T129611[2].
>>> >
>>> > Regards
>>> >
>>> > [0] - https://commons.wikimedia.org/wiki/User_talk:AbdealiJK
>>> > [1] - https://meta.wikimedia.org/wiki/User:AbdealiJK
>>> > [2] - https://phabricator.wikimedia.org/T129611
>>> >
>>> >
>>> > _______________________________________________
>>> > Commons-l mailing list
>>> > Commons-l(a)lists.wikimedia.org
>>> > https://lists.wikimedia.org/mailman/listinfo/commons-l
>>> >
>>>
>>>
>>>
>>> --
>>> faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
>>> Personal and confidential, please do not circulate or re-quote.
>>
>>
>
> Dr. Trigon
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
Personal and confidential, please do not circulate or re-quote.
---------- Forwarded message ----------
From: Robin Owain <info(a)cymruwales.com>
Date: 19 May 2016 at 15:11
Subject: [Wikimediauk-l] 3rd party Dictionary of Species taking a feed
from Commons via Wikidata
To: wikimediauk-l(a)lists.wikimedia.org
Hi all
The Dictionary of Welsh species has now become an Illustrated Dictionary.
Take a look:
http://www.llennatur.com/Drupal7/llennatur/?q=node/6#Gl%C3%B6yn
It also includes clickbacks to corresponding articles on Wikipedia.
All thanks to Wikimedia UK and the University of Bangor.
There are a couple of thousand images missing on Commons, and if anyone
can help, there's a Wikiproject here to fill the gaps.
Best regards
Robin
_______________________________________________
Wikimedia UK mailing list
wikimediauk-l(a)wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediauk-l
WMUK: https://wikimedia.org.uk
Hi,
I'm a student from Chennai, India and my project is going to be related to
performing image processing on the images on commons.wikimedia to automate
categorization. DrTrigon wrote the script catimages.py a few years ago
for the old pywikipedia-bot framework. I'll be working
towards updating the script to the pywikibot-core framework, updating its
dependencies, and using newer techniques when possible.
catimages.py is a script that analyzes an image using various computer
vision algorithms and assigns categories to the image on Commons. For
example, consider algorithms that detect faces, barcodes, etc. The script
uses these to categorize images to Category:Unidentified People,
Category:Barcode, and so on.
If you have any suggestions and categorizations you think might be useful
to you, drop in at #gsoc-catimages on freenode or my talk page[0]. You can
find out more about me on User:AbdealiJK[1] and about the project
at T129611[2].
Regards
[0] - https://commons.wikimedia.org/wiki/User_talk:AbdealiJK
[1] - https://meta.wikimedia.org/wiki/User:AbdealiJK
[2] - https://phabricator.wikimedia.org/T129611
All,
I have some videos of the seabed of the Dogger Bank, which includes some
footage of wrecks on the bed, marine life, and parts of prehistoric
settlements.
I have the exact co-ordinates of the videos - however, because it's a
video, the co-ordinates change over time, and the *moving* co-ordinates of
the file can't really be entered into Commons - or can they?
Can anyone help with this?
*What's the best way to record the co-ordinates if they move over the
duration of the video?*
Richard Symonds
Wikimedia UK
0207 065 0992
Wikimedia UK is a Company Limited by Guarantee registered in England and
Wales, Registered No. 6741827. Registered Charity No.1144513. Registered
Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT.
United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia
movement. The Wikimedia projects are run by the Wikimedia Foundation (who
operate Wikipedia, amongst other projects).
*Wikimedia UK is an independent non-profit charity with no legal control
over Wikipedia nor responsibility for its contents.*
Because it doesn't work. Probably because my account is globally blocked to prevent me from improving the projects and to enforce my bullshit abusive ban on enwp.
Sent from my T-Mobile 4G LTE device
------ Original message ------
From: Nahid Sultan
Date: Tue, May 10, 2016 6:28 AM
To: Wikimedia Commons Discussion List
Subject: Re: [Commons-l] Co-ordinates for a path
There is an 'Unsubscribe' button at the bottom of every mail. Why don't you use that?
---
Nahid Sultan
User:NahidSultan on all Wikimedia Foundation's public wikis
Member of Wikimedia ombudsman commission
Secretary, Wikimedia Bangladesh
http://wikimedia.org.bd
Facebook | Nahid Sultan
Twitter | @nahidunlimited
Date: Tue, 10 May 2016 03:22:22 -0700
From: reguyla(a)gmail.com
To: richard.symonds(a)wikimedia.org.uk; commons-l(a)lists.wikimedia.org
Subject: Re: [Commons-l] Co-ordinates for a path
Take me off these spam lists. Since editors aren't wanted on the WMF projects and the WMF wants to enable bully behavior by admins, I don't want to be spammed with this crap anymore.
Sent from my T-Mobile 4G LTE device
------ Original message ------
From: Richard Symonds
Date: Tue, May 10, 2016 5:18 AM
To: Wikimedia Commons Discussion List
Subject: [Commons-l] Co-ordinates for a path
All,
I have some videos of the seabed of the Dogger Bank, which includes some footage of wrecks on the bed, marine life, and parts of prehistoric settlements.
I have the exact co-ordinates of the videos - however, because it's a video, the co-ordinates change over time, and the moving co-ordinates of the file can't really be entered into Commons - or can they?
Can anyone help with this?
*What's the best way to record the co-ordinates if they move over the duration of the video?*
Richard Symonds
Wikimedia UK
0207 065 0992
Take me off these spam lists. Since editors aren't wanted on the WMF projects and the WMF wants to enable bully behavior by admins, I don't want to be spammed with this crap anymore.
Sent from my T-Mobile 4G LTE device
------ Original message ------
From: Richard Symonds
Date: Tue, May 10, 2016 5:18 AM
To: Wikimedia Commons Discussion List
Subject: [Commons-l] Co-ordinates for a path
All,
I have some videos of the seabed of the Dogger Bank, which includes some footage of wrecks on the bed, marine life, and parts of prehistoric settlements.
I have the exact co-ordinates of the videos - however, because it's a video, the co-ordinates change over time, and the moving co-ordinates of the file can't really be entered into Commons - or can they?
Can anyone help with this?
*What's the best way to record the co-ordinates if they move over the duration of the video?*
Richard Symonds
Wikimedia UK
0207 065 0992
Hi all,
A professional photo agency is offering us (Wikimedia Belgium) a donation of
images of art works. As a start, they are offering these images at 595 x 842
pixels at 72 dpi. This is almost double the size of a thumbnail
on Wikipedia. My own (not the most modern) smartphone makes images at 5312
× 2988 pixels at 72 dpi. Seeing the size of these images, I think their
resolution is too low.
My question is: what is the minimum quality we should ask for?
Thanks!
Romaine
Please disenroll me from this list. If neither the WMF nor its communities want editors, and they want to support bullies because they are admins, I don't want the WMF's spam.
Sent from my T-Mobile 4G LTE device
------ Original message ------
From: Romaine Wiki
Date: Thu, May 5, 2016 7:51 AM
To: Wikimedia Commons Discussion List; Affiliates discussion list; Wikimedia Chapters cultural partners coordination
Subject: [Commons-l] Size of images donated
Hi all,
A professional photo agency is offering us (Wikimedia Belgium) a donation of images of art works. As a start, they are offering these images at 595 x 842 pixels at 72 dpi. This is almost double the size of a thumbnail on Wikipedia. My own (not the most modern) smartphone makes images at 5312 × 2988 pixels at 72 dpi. Seeing the size of these images, I think their resolution is too low.
My question is: what is the minimum quality we should ask for?
Thanks!
Romaine