[Wikisource-l] [Commons-l] Dream a little...

Fri Oct 20 08:22:22 UTC 2006

daniwo59 at aol.com wrote:

>My own thoughts on this, which I also expressed on the meta page: 
> 
>1. There is plenty of material out there that is already public domain. Part
>of the problem is that it can take forever and a day to digitize it all. In
>the case of books and magazines, digitization often involves destroying the
>hard copies in the process. There are, however, specialized scanners that can do
>the work without ruining the books themselves. These are expensive (about US
>$30,000 a machine). Ten machines, strategically located around the world,
>along with student staff to operate them around the clock, could help to
>preserve these texts and store them for posterity. Additional people (paid and
>volunteer) will be needed to OCR, proof, and hyperlink the material to ensure
>that it doesn't get lost in a glut of material.
>
All that scanning and OCR work could be quite tedious, and people might 
even need to be paid for this.  As long as these workers don't develop 
an addiction to the money we provide it could work.  These machines 
could go into key small institutions with significant archives who would 
appreciate having the machine.  Perhaps they could even keep the machine 
once our work there is done.  Students could be paid on a per semester 
contract basis, with renewal available when the previous year's targets 
are met.

>5. To ensure all of this remains accessible, we will need a LOT of servers
>and bandwidth: Initial outlay: $10 million.
> 
>Total: $100 million, spent over 5 years. Costs include staffing,
>identifying prospective targets, transportation, overhead, etc. Just coordinating
>a project of this scope will take a lot of effort.
>
A long-term hardware optimization strategy would make interesting reading.

>And there is competition too. As an example,
>http://historical.library.cornell.edu/IWP/ is a collection of
>International Women's Journals, some of which are very important historically.
>They are already scanned, but they are inaccessible because a private
>company has (rightfully or wrongfully) copyrighted the scans.
>
This is where we need to decide where we should and where we shouldn't 
co-operate.  Our bottom line must remain to make everything accessible 
to everybody.  If they insist on the proprietary nature of this 
material, or try to invoke database protection laws, it might be 
necessary to scan our own copies of everything that they have.

Ec