Tool to list images by license - Wikitech-l

1 Sep 2004

Hello,
The image copyright problem is still largely unresolved. The English
Wikipedia has about 90,000 images, and the other Wikipedias have a smaller
but growing number.

I saw on [[:en:Wikipedia:Image_copyright_tags]] that some users have
started categorizing images using templates like {{PD}}, {{GFDL}} and so
on placed in the image description pages. I think this is very useful for
semi-automatic sorting of the various images for different purposes (and
outright deletion of the illegal ones).

Since we are about to do the same on the Italian wiki, I wrote a small
Perl script to read a database dump and write out several image lists,
one for each template, listing which images contain that particular
template, and an "unassigned" list for images without any template. Each
list is in wiki format, ready to be copied and pasted into a Wikipedia
page if needed. We plan to use the tool on it: (the Italian Wikipedia) to
generate lists of images still to be categorized.
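
For anyone who wants the idea without reading the Perl, here is a minimal sketch of the grouping step in Python. It is not the script itself: it assumes the image description pages have already been pulled out of the dump as (title, text) pairs, and the tag list and the names group_by_template and to_wiki_list are made up for illustration.

```python
import re
from collections import defaultdict

# Illustrative subset of license templates (an assumption); the full list
# is on [[:en:Wikipedia:Image_copyright_tags]].
LICENSE_TAGS = ["PD", "GFDL", "fairuse"]


def _tag_pattern(tag):
    # Template names are matched with a case-insensitive first letter,
    # followed by either "}}" or a "|" that starts the parameters.
    first = "[" + tag[0].upper() + tag[0].lower() + "]"
    return re.compile(r"\{\{\s*" + first + re.escape(tag[1:]) + r"\s*(?:\||\}\})")


def group_by_template(pages, tags=LICENSE_TAGS):
    """Map each tag, plus "unassigned", to the image titles that use it."""
    patterns = {tag: _tag_pattern(tag) for tag in tags}
    lists = defaultdict(list)
    for title, text in pages:
        found = [tag for tag, pat in patterns.items() if pat.search(text)]
        for tag in found:
            lists[tag].append(title)
        if not found:
            lists["unassigned"].append(title)
    return lists


def to_wiki_list(titles):
    """One bullet per image; the leading colon links instead of embedding."""
    return "\n".join("* [[:Image:%s]]" % t for t in sorted(titles))
```

An image carrying two templates ends up in both lists, which seems the safer choice when the lists are meant for manual review.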

Of course, all this can be done with a few clever SQL queries, but not all
of us have access to the DB or have MySQL installed.

In case anyone wants to use it, the URL is
http://www.tommasoconforti.com/wiki/tools/images.pl.gz

The number of lines in each generated file is the number of images with
that particular template. For example, this is the situation for the Aug 28
English dump, showing that a bit less than 25% of the images have a
proper template (the images.pl line is the script itself, not an image list):

$ wc -l *

     73 CopyrightedFreeUse
    229 CopyrightedFreeUseProvided
     13 CrownCopyright
  10373 GFDL
     56 GPL
      2 LGPL
   5864 PD
      1 PD_USGov
    104 PermissionAndFairUse
     33 Sovietpd
     66 copyrighted
   3503 fairuse
      1 freefairusein
    237 images.pl
    253 noncommercial
    137 noncommercialProvided
  64568 unassigned
    131 unknown
   1984 unverified
     18 verifieduse
  87646 total
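
The percentage can be checked from the counts above with a few lines of Python. This is only a sketch of the arithmetic: leaving out images.pl (the script, not an image list) and treating "unknown" and "unverified" as not being proper templates are assumptions about how the figure was meant.

```python
# Line counts copied from the wc output above, without images.pl and total.
counts = {
    "CopyrightedFreeUse": 73,
    "CopyrightedFreeUseProvided": 229,
    "CrownCopyright": 13,
    "GFDL": 10373,
    "GPL": 56,
    "LGPL": 2,
    "PD": 5864,
    "PD_USGov": 1,
    "PermissionAndFairUse": 104,
    "Sovietpd": 33,
    "copyrighted": 66,
    "fairuse": 3503,
    "freefairusein": 1,
    "noncommercial": 253,
    "noncommercialProvided": 137,
    "unassigned": 64568,
    "unknown": 131,
    "unverified": 1984,
    "verifieduse": 18,
}

total_images = sum(counts.values())

# Images that carry any template at all.
tagged = total_images - counts["unassigned"]

# Stricter reading (an assumption): "unknown" and "unverified" are
# placeholders rather than proper license templates.
proper = tagged - counts["unknown"] - counts["unverified"]

share_tagged = 100.0 * tagged / total_images
share_proper = 100.0 * proper / total_images
print("any template: %.1f%%, proper template: %.1f%%" % (share_tagged, share_proper))
```

Counting every template gives a bit more than a quarter of the images; the stricter reading gives a bit less.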

Processing the dump takes a while, especially if it must be decompressed
on the way.
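
Decompressing on the way does not require unpacking the dump to disk first. A minimal Python sketch of the idea follows; the helper names, the choice of extensions, and the UTF-8 encoding are assumptions, not a description of how images.pl does it.

```python
import bz2
import gzip


def open_dump(path, encoding="utf-8"):
    """Open a dump as text, decompressing on the fly based on the extension."""
    # errors="replace" keeps a stray bad byte from aborting a long run.
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding=encoding, errors="replace")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding=encoding, errors="replace")
    return open(path, "r", encoding=encoding, errors="replace")


def count_lines(path):
    """Stream through the dump line by line without loading it into memory."""
    with open_dump(path) as dump:
        return sum(1 for _ in dump)
```

Reading line by line keeps memory use flat, but the run time is still dominated by decompression and by the size of the dump.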

Alfio