[Wikipedia-l] [Wikisource-l] [Commons-l] Dream a little...

Tue Oct 17 05:51:23 UTC 2006

Hi Guys,

I am a new member of the Wikipedia community. I plan to start a Wikipedia
in a new language. Any suggestions would be welcome.

On 16/10/06, Yann Forget <yann at forget-me.net> wrote:
>
> Hi,
>
> Danny tells it right. I have little to add to his mail.
>
> daniwo59 at aol.com a écrit :
> > My own thoughts on this, which I also expressed on the meta page:
> >
> > 1. There is plenty of material out there that is already public domain.
> > Part of the problem is that it can take forever and a day to digitize it
> > all. In the case of books and magazines, digitization often involves
> > destroying the hard copies in the process. There are, however,
> > specialized scanners that can do the work without ruining the books
> > themselves. These are expensive (about US $30,000 a machine). Ten
> > machines, strategically located around the world, along with student
> > staff to operate them around the clock could help to preserve these
> > texts and store them for posterity. Additional people (paid and
> > volunteer) will be needed to OCR, proof, and hyperlink the material to
> > ensure that it doesn't get lost in a glut of material (I have visions of
> > the final scene of Raiders of the Lost Ark, when the Ark was finally
> > stored in some crate in an army warehouse).
> >
> > 2. While OCR capacities exist for some languages, they do not exist for
> > other languages, where the material is much more likely to get lost.
> > Manuscripts in Tibetan monasteries, for example, can be scanned but not
> > OCRed easily. To make this information available, developers should be
> > paid to create adequate OCR tools for these languages. Rough cost: $5
> > million.
>
> Much of what limits Wikisource now is the capability to scan and
> OCR documents. There is no good free OCR software, apart from the new
> software recently released under the GPL by Google, but it works only for
> English and still has limitations. So developing good, free,
> multilingual OCR software would be my priority. AFAIK there is no good
> OCR software (free or not) for any Indian language, including Sanskrit.
> I have never seen any for Tibetan either.
>
> But having the software is not enough. A few OCR servers managed by the
> Foundation, where anyone can send an automated OCR request, would be very
> useful. There is already proprietary OCR software that can do that.
>
> > 3. Music has been recorded around the world for well over a century, yet
> > many of the early recordings are being lost, especially those on wax
> > cylinders and porcelain records. Preservation includes locating,
> > identifying, and remastering. People must be trained to do this. Rough
> > cost: $35 million over two years.
> >
> > 4. This is true of old films as well. Celluloid copies are extremely
> > rare and extremely flammable. Restoration is exceedingly costly. For
> > example, [[Theda Bara]] is a well-known vamp of early Hollywood (the
> > word "vamp" was first used to describe her), yet none of her films
> > survive, and they were made less than a hundred years ago. Films are
> > international, they include important historic documents such as
> > newsreels, and they are being lost every day. Today, most
> > preservation work is being done by major studios, since it is so costly.
> > In other words, they are taking important works now in the public
> > domain, restoring them, and contending that the restoration is an
> > original work, i.e., another hundred years at least until some Vigo or
> > Charlie Chaplin films enter the public domain ... and little attention
> > is being paid to newsreels of events like the Russian revolution, World
> > War I, etc. Like music, people should be offered scholarships to learn
> > the art of film restoration and work on these projects. Until this
> > happens it can be outsourced. Rough cost: $50 million.
>
> I would add a special request for some of Cartier-Bresson's photographs of
> Gandhi's funeral. I would have said a copy of the Encyclopedia of the
> Enlightenment (1750, by Diderot and d'Alembert), but we already have it. ;o)
>
> > 5. To ensure all of this remains accessible, we will need a LOT of
> > servers and bandwidth: Initial outlay: $10 million.
>
> Yes, it's important not to forget that point.
>
> > Total: $100 million, spent over 5 years. Costs include staffing,
> > identifying prospective targets, transportation, overhead, etc. Just
> > coordinating a project of this scope will take a lot of effort.
>
> Yes, I would generally put more money into people's work than into documents.
>
> > And there is competition too. As an example,
> > http://historical.library.cornell.edu/IWP/ is a collection of
> > International Women's Journals, some of which are very important
> > historically. They are already scanned, but they are inaccessible
> > because a private company has (rightfully or wrongfully) copyrighted the
> > scans.
> >
> > Lots to be done. You will see how quickly $100 million can be spent.
> >
> > Danny
> >
> > In a message dated 10/15/2006 11:27:57 AM Eastern Daylight Time,
> > jwales at wikia.com writes:
> >
> >     I would like to gather from the community some examples of works
> >     you would like to see made free, works that we are not doing a good
> >     job of generating free replacements for, works that could in theory
> >     be purchased and freed.
> >
> >     Dream big.  Imagine there existed a budget of $100 million to
> >     purchase copyrights to be made available under a free license.  What
> >     would you like to see purchased and released under a free license?
> >
> >     Photos libraries? textbooks? newspaper archives? Be bold, be
> >     specific, be general, brainstorm, have fun with it.
> >
> >     I was recently asked this question by someone who is potentially in
> >     a position to make this happen, and he wanted to know what we need,
> >     what we dream of, that we can't accomplish on our own, or that we
> >     would expect to take a long time to accomplish on our own.
>
> Yes, the fun has just started.
>
> >     --Jimbo
>
> Regards,
>
> Yann
> _______________________________________________
> Wikipedia-l mailing list
> Wikipedia-l at Wikimedia.org
> http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
>

-- 
Amanuel Amente
XL in life