[Foundation-l] Re: Hosting scans of the 1911 Britannica on Wikimedia

Wed Nov 9 17:41:36 UTC 2005

Anthony DiPierro wrote:

>On 11/9/05, Robert Scott Horning <robert_horning at netzero.net> wrote:
>  
>
>>What??? Wikimedia Commons is the best place for images, and indeed
>>there have already been several scans of this encyclopedia that have
>>been put into Wikimedia projects. We don't need to use bit torrents
>>unless this is a move to do bit torrents for all Wikimedia projects
>>(perhaps a good idea but a separate discussion). There is also a
>>license tag that has been specifically established on commons just for
>>content from the 1911 Encyclopaedia Britannica because of the large
>>number of potential images that can come from this source. Look them up
>>right now with the associated categories at
>>http://commons.wikimedia.org/wiki/Template%3APD-Britannica
>>    
>>
>
> Wikimedia Commons is the best place for images of text? If that's what
>you're saying, I disagree. I think maybe we were talking about two different
>things, though.
>
No, this is still the same issue.  I'm not exactly sure where scanned 
pages of historical text ought to go in this case.  The images 
themselves should be in Commons, and a temporary "Wikiproject" within 
Commons that keeps the full scanned pages available while those images 
are extracted might be useful.  Wikisource also has an image 
repository independent of Commons, so that may be more appropriate, but 
that is something that ought to be decided within the Wikisource 
community itself.  Figures and engravings do need to go to Commons.

>
>One thing I see missing from this discussion is working in cooperation
>  
>
>>with Distributed Proofreaders, who is not only transcribing the contents
>>of this encyclopedia into plain ASCII text (and XML markup as well), but
>>is also providing scans of the figures and images from within the
>>volumes and making them available with a public domain license. What
>>more do we want here? The slow going on that project with Distributed
>>Proofreaders is something that goes to show how large of a project it is.
>>    
>>
>
> AFAIK Distributed Proofreaders hasn't released the raw images out to the
>public. If that's still the case, I'd say *that* is the reason for the slow
>going. The wiki process would be much more efficient.
>
>That trying to organize the content onto a Wiki has been difficult, yes.
>
It is not that difficult to get the raw image scans from Distributed 
Proofreaders if you really want them.  They are not of the best quality 
(DP has other goals in mind) but they are usable for the purpose of 
transcribing the text.  I also fail to see how using a Wiki for 
proofreading is going to be any better than what DP is doing.  Indeed, 
the DP standards for proofreading are much higher than anything on any 
Wikimedia project, and once a text has gone through the DP proofreading 
rounds you can generally be assured of transcription accuracy that 
is as good as, if not better than, any other transcription service, 
professional or amateur.  I have seen the efforts of the 1911 
Encyclopaedia Britannica project on Wikisource, and the attempts to 
improve textual fidelity for those articles have been absolutely 
miserable; Distributed Proofreaders does a much, much better job. 
All we are trying to do on Wikisource anyway is to add MediaWiki markup 
and links into existing Wikimedia projects like Wikipedia and 
Wiktionary where appropriate, as well as to link back to Wikisource for 
historical reference.

The few articles where cleanup was attempted because the text did not 
come from DP or Project Gutenberg sources have frankly been a joke and 
have incredible errors in the transcription.  Performing textual 
transcriptions of historical documents is simply not something that 
MediaWiki software is set up to deal with except on a very limited 
basis.  Markup (adding bold words and italics) and hyperlinking are 
much more appropriate tasks and something MediaWiki software does very 
well, which is the big strength of Wikisource as a project in general.

>  
>
>>That is the real issue here, because you can copyright a scan of an
>>image. Weak copyright protection at best, but you can copyright the
>>scan itself which would in turn force you to have to find the original
>>materials and do the scan separately. In the case of the 1911
>>Encyclopaedia Britannica, however, that is much easier to do than some
>>other older works. Again, working with the Distributed Proofreaders on
>>something like this is going to make life much easier because they have
>>done the scans themselves and are granting explicitly the scanned images
>>and content into the public domain. It also avoids duplication of labor
>>with a huge project like this.
>>
>>--
>>Robert Scott Horning
>>    
>>
>
> I thought scans of 2D public domain images were public domain. I've
>certainly read that on Wikipedia somewhere.
>
>Anthony
>  
>
This is an area of copyright law that is still working its way through 
the court system.  Most notable is the assertion of copyright by museums 
on classical artwork and university library special collections 
departments who have scanned images of historical works.  The only way 
they can assert copyright is to claim copyright on the scan or the image 
of the artwork, not on the original material itself.  For instance, 
there is a huge collection of photographs from the University of 
Michigan that you can look at here:

http://www.lib.umich.edu/spec-coll/labadie/labadie.html

The University of Michigan is asserting copyright on the whole 
collection, even though many photographs in the collection were 
physically made before 1923 and therefore would be considered in the 
public domain through copyright expiration.  Personally, I think there 
are some interesting photos in this collection and I'd like to add them 
to the Wikimedia Commons, but because of the copyright assertion I am 
very reluctant to do so.  It is also very unlikely that I would gain 
access to the photos in the special collections area of that library 
to do my own scans just for licensing purposes.  I am using this as 
just an example, but it illustrates an issue that matters for all 
Wikimedia projects: the problem of grabbing images at random and 
assuming that you have copyright authority to do with them as you 
please.

BTW, using Wikipedia as a scholarly reference is hardly a supporting 
argument.

-- 
Robert Scott Horning